* [PATCH][linux-next] soc: fsl: dpio: Unsigned compared against 0 in qbman_swp_set_irq_coalescing()
From: Tim Gardner @ 2021-10-18 16:05 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Roy Pledge, tim.gardner, linuxppc-dev, linux-kernel, Li Yang
Coverity complains of unsigned compare against 0. There are 2 cases in
this function:
1821 itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;
CID 121131 (#1 of 1): Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. itp < 0U.
1822 if (itp < 0 || itp > 4096) {
1823 max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
1824 pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
1825 return -EINVAL;
1826 }
1827
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. irq_threshold < 0U.
1828 if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
1829 pr_err("irq_threshold must be between 0..%d\n",
1830 p->dqrr.dqrr_size - 1);
1831 return -EINVAL;
1832 }
Fix this by checking for 0. Also fix a minor comment typo.
Fixes ed1d2143fee53755ec601eb4d48a337a93933f71 ("soc: fsl: dpio: add support for
irq coalescing per software portal")
Cc: Roy Pledge <Roy.Pledge@nxp.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
I'm not 100% sure this is the right way to fix the warning, but according to the
pr_err() comments these values should never be 0.
---
drivers/soc/fsl/dpio/qbman-portal.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index d3c58df6240d..b768a14bb271 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -1816,16 +1816,16 @@ int qbman_swp_set_irq_coalescing(struct qbman_swp *p, u32 irq_threshold,
u32 itp, max_holdoff;
/* Convert irq_holdoff value from usecs to 256 QBMAN clock cycles
- * increments. This depends to the QBMAN internal frequency.
+ * increments. This depends on the QBMAN internal frequency.
*/
itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;
- if (itp < 0 || itp > 4096) {
+ if (!itp || itp > 4096) {
max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
return -EINVAL;
}
- if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
+ if (irq_threshold >= p->dqrr.dqrr_size || !irq_threshold) {
pr_err("irq_threshold must be between 0..%d\n",
p->dqrr.dqrr_size - 1);
return -EINVAL;
--
2.33.1
^ permalink raw reply related
* Re: [PATCH 00/13] block: add_disk() error handling stragglers
From: Luis Chamberlain @ 2021-10-18 16:15 UTC (permalink / raw)
To: Geoff Levand
Cc: nvdimm, vigneshr, linux-nvme, paulus, miquel.raynal, ira.weiny,
hch, dave.jiang, sagi, minchan, vishal.l.verma, ngupta,
linux-block, kbusch, dan.j.williams, axboe, linux-kernel, jim,
senozhatsky, richard, linux-mtd, linuxppc-dev
In-Reply-To: <a31970d6-8631-9d9d-a36f-8f4fcebfb1e6@infradead.org>
On Sun, Oct 17, 2021 at 08:26:33AM -0700, Geoff Levand wrote:
> Hi Luis,
>
> On 10/15/21 4:52 PM, Luis Chamberlain wrote:
> > This patch set consists of al the straggler drivers for which we have
> > have no patch reviews done for yet. I'd like to ask for folks to please
> > consider chiming in, specially if you're the maintainer for the driver.
> > Additionally if you can specify if you'll take the patch in yourself or
> > if you want Jens to take it, that'd be great too.
>
> Do you have a git repo with the patch set applied that I can use to test with?
Sure, although the second to last patch is in a state of flux given
the ataflop driver currently is broken and so we're seeing how to fix
that first:
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20211011-for-axboe-add-disk-error-handling
Luis
^ permalink raw reply
* [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates
From: Nathan Lynch @ 2021-10-18 16:34 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, cheloha, ldufour
On VMs with NX encryption, compression, and/or RNG offload, these
capabilities are described by nodes in the ibm,platform-facilities device
tree hierarchy:
$ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
/sys/firmware/devicetree/base/ibm,platform-facilities/
├── ibm,compression-v1
├── ibm,random-v1
└── ibm,sym-encryption-v1
3 directories
The acceleration functions that these nodes describe are not disrupted by
live migration, not even temporarily.
But the post-migration ibm,update-nodes sequence firmware always sends
"delete" messages for this hierarchy, followed by an "add" directive to
reconstruct it via ibm,configure-connector (log with debugging statements
enabled in mobility.c):
mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
mobility: removing node /ibm,platform-facilities:4294967286
...
mobility: added node /ibm,platform-facilities:4294967286
Note we receive a single "add" message for the entire hierarchy, and what
we receive from the ibm,configure-connector sequence is the top-level
platform-facilities node along with its three children. The debug message
simply reports the parent node and not the whole subtree.
Also, significantly, the nodes added are almost completely equivalent to
the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
the leaf nodes is the only property I've observed to differ, and Linux does
not use that. So in practice, the sum of update messages Linux receives for
this hierarchy is equivalent to minor property updates.
We succeed in removing the original hierarchy from the device tree. But the
drivers bound to the leaf nodes are ignorant of this, and do not relinquish
their references. The leaf nodes, still reachable through sysfs, of course
still refer to the now-freed ibm,platform-facilities parent node, which
makes use-after-free possible:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
refcount_warn_saturate+0x160/0x1f0 (unreliable)
kobject_get+0xf0/0x100
of_node_get+0x30/0x50
of_get_parent+0x50/0xb0
of_fwnode_get_parent+0x54/0x90
fwnode_count_parents+0x50/0x150
fwnode_full_name_string+0x30/0x110
device_node_string+0x49c/0x790
vsnprintf+0x1c0/0x4c0
sprintf+0x44/0x60
devspec_show+0x34/0x50
dev_attr_show+0x40/0xa0
sysfs_kf_seq_show+0xbc/0x200
kernfs_seq_show+0x44/0x60
seq_read_iter+0x2a4/0x740
kernfs_fop_read_iter+0x254/0x2e0
new_sync_read+0x120/0x190
vfs_read+0x1d0/0x240
Moreover, the "new" replacement subtree is not correctly added to the
device tree, resulting in ibm,platform-facilities parent node without the
appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
$ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
/sys/firmware/devicetree/base/ibm,platform-facilities/
0 directories
$ cd /sys/devices/vio ; find . -xtype l -exec file {} +
./ibm,sym-encryption-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
./ibm,random-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
./ibm,compression-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
This is because add_dt_node() -> dlpar_attach_node() attaches only the
parent node returned from configure-connector, ignoring any children. This
should be corrected for the general case, but fixing that won't help with
the stale OF node references, which is the more urgent problem.
One way to address that would be to make the drivers respond to node
removal notifications, so that node references can be dropped
appropriately. But this would likely force the drivers to disrupt active
clients for no useful purpose: equivalent nodes are immediately re-added.
And recall that the acceleration capabilities described by the nodes remain
available throughout the whole process.
The solution I believe to be robust for this situation is to convert
remove+add of a node with an unchanged phandle to an update of the node's
properties in the Linux device tree structure. That would involve changing
and adding a fair amount of code, and may take several iterations to land.
Until that can be realized we have a confirmed use-after-free and the
possibility of memory corruption. So add a limited workaround that
discriminates on the node type, ignoring adds and removes. This should be
amenable to backporting in the meantime.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Cc: stable@vger.kernel.org
---
arch/powerpc/platforms/pseries/mobility.c | 34 +++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e83e0891272d..210a37a065fb 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -63,6 +63,27 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
static int delete_dt_node(struct device_node *dn)
{
+ struct device_node *pdn;
+ bool is_platfac;
+
+ pdn = of_get_parent(dn);
+ is_platfac = of_node_is_type(dn, "ibm,platform-facilities") ||
+ of_node_is_type(pdn, "ibm,platform-facilities");
+ of_node_put(pdn);
+
+ /*
+ * The drivers that bind to nodes in the platform-facilities
+ * hierarchy don't support node removal, and the removal directive
+ * from firmware is always followed by an add of an equivalent
+ * node. The capability (e.g. RNG, encryption, compression)
+ * represented by the node is never interrupted by the migration.
+ * So ignore changes to this part of the tree.
+ */
+ if (is_platfac) {
+ pr_notice("ignoring remove operation for %pOFfp\n", dn);
+ return 0;
+ }
+
pr_debug("removing node %pOFfp\n", dn);
dlpar_detach_node(dn);
return 0;
@@ -222,6 +243,19 @@ static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
if (!dn)
return -ENOENT;
+ /*
+ * Since delete_dt_node() ignores this node type, this is the
+ * necessary counterpart. We also know that a platform-facilities
+ * node returned from dlpar_configure_connector() has children
+ * attached, and dlpar_attach_node() only adds the parent, leaking
+ * the children. So ignore these on the add side for now.
+ */
+ if (of_node_is_type(dn, "ibm,platform-facilities")) {
+ pr_notice("ignoring add operation for %pOF\n", dn);
+ dlpar_free_cc_nodes(dn);
+ return 0;
+ }
+
rc = dlpar_attach_node(dn, parent_dn);
if (rc)
dlpar_free_cc_nodes(dn);
--
2.31.1
^ permalink raw reply related
* Re: [PATCH v1 04/11] powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
From: Christophe Leroy @ 2021-10-18 17:09 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20211015154624.922960-5-npiggin@gmail.com>
Le 15/10/2021 à 17:46, Nicholas Piggin a écrit :
> slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
> segment interrupts that occur with radix MMU as well.
> ---
> arch/powerpc/include/asm/interrupt.h | 2 +-
> arch/powerpc/kernel/exceptions-64s.S | 4 ++--
> arch/powerpc/mm/book3s64/slb.c | 16 ----------------
> arch/powerpc/mm/fault.c | 17 +++++++++++++++++
> 4 files changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index a1d238255f07..3487aab12229 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -564,7 +564,7 @@ DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
>
> /* slb.c */
> DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);
> -DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
> +DECLARE_INTERRUPT_HANDLER(do_bad_segment_interrupt);
>
> /* hash_utils.c */
> DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault);
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index eaf1f72131a1..046c99e31d01 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1430,7 +1430,7 @@ MMU_FTR_SECTION_ELSE
> ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> std r3,RESULT(r1)
> addi r3,r1,STACK_FRAME_OVERHEAD
> - bl do_bad_slb_fault
> + bl do_bad_segment_interrupt
> b interrupt_return_srr
>
>
> @@ -1510,7 +1510,7 @@ MMU_FTR_SECTION_ELSE
> ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> std r3,RESULT(r1)
> addi r3,r1,STACK_FRAME_OVERHEAD
> - bl do_bad_slb_fault
> + bl do_bad_segment_interrupt
> b interrupt_return_srr
>
>
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index f0037bcc47a0..31f4cef3adac 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -868,19 +868,3 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
> return err;
> }
> }
> -
> -DEFINE_INTERRUPT_HANDLER(do_bad_slb_fault)
> -{
> - int err = regs->result;
> -
> - if (err == -EFAULT) {
> - if (user_mode(regs))
> - _exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
> - else
> - bad_page_fault(regs, SIGSEGV);
> - } else if (err == -EINVAL) {
> - unrecoverable_exception(regs);
> - } else {
> - BUG();
> - }
> -}
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index a8d0ce85d39a..53ddcae0ac9e 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -35,6 +35,7 @@
> #include <linux/kfence.h>
> #include <linux/pkeys.h>
>
> +#include <asm/asm-prototypes.h>
> #include <asm/firmware.h>
> #include <asm/interrupt.h>
> #include <asm/page.h>
> @@ -620,4 +621,20 @@ DEFINE_INTERRUPT_HANDLER(do_bad_page_fault_segv)
> {
> bad_page_fault(regs, SIGSEGV);
> }
> +
> +DEFINE_INTERRUPT_HANDLER(do_bad_segment_interrupt)
> +{
> + int err = regs->result;
> +
> + if (err == -EFAULT) {
> + if (user_mode(regs))
> + _exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
> + else
> + bad_page_fault(regs, SIGSEGV);
> + } else if (err == -EINVAL) {
> + unrecoverable_exception(regs);
> + } else {
> + BUG();
> + }
> +}
> #endif
>
You could do something more flat:
if (err == -EINVAL)
unrecoverable_exception(regs);
BUG_ON(err != -EFAULT);
if (user_mode(regs))
_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
else
bad_page_fault(regs, SIGSEGV);
I know you are just moving existing code but moving code is always an
opportunity to clean it without additional churn.
^ permalink raw reply
* Re: [PATCH] powerpc/eeh: skip slot presence check when PE is temporarily unavailable.
From: Mahesh J Salgaonkar @ 2021-10-18 17:28 UTC (permalink / raw)
To: Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CAOSf1CG4H2GrxV5C=55vcNue4taSAkgFOUg32yVspgw9+meDAg@mail.gmail.com>
On 2021-05-07 10:41:46 Fri, Oliver O'Halloran wrote:
> On Fri, May 7, 2021 at 3:43 AM Mahesh Salgaonkar <mahesh@linux.ibm.com> wrote:
> >
> > When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> > state as temporarily unavailable. In this case, per PAPR, rtas call
> > ibm,read-slot-reset-state2 returns a PE state as temporarily unavailable(5)
> > and OS has to wait until that recovery is complete. During this state the
> > slot presence check 'get-sensor-state(dr-entity-sense)' returns as DR
> > connector empty which leads to assumption that the device has been
> > hot-removed. This results into no EEH recovery on this device and it stays
> > in failed state forever.
> >
> > This patch fixes this issue by skipping slot presence check only if device
> > PE state is temporarily unavailable(5).
> >
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> > ---
> > * snip*
> >
> > /*
> > * It should be corner case that the parent PE has been
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index 3eff6a4888e79..a0913768f33de 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -851,6 +851,17 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> > return;
> > }
> >
> > + /*
> > + * When PE's state is temporarily unavailable, the slot
> > + * presence check returns as DR connector empty.
>
> That sounds like a bug in either RTAS or the hotplug slot driver (or
> both). The presence check is there largely to filter out events that
> we can guarantee are not recoverable (i.e. surprise hot-unplug). In
> every other case (especially if we can't determine the state) we
> should be going down the recovery path. If the hotplug slot driver is
> incorrectly reporting the card has been removed then you should be
> fixing the slot driver.
Thanks Oliver for the comment.
So phyp fixed the issue where it was incorrectly reporting the card has
been removed. After the phyp fix, the slot presence check
'get-sensor-state(dr-entity-sense)' returns extended busy error (9902)
until PHB is recovered by phyp. And once PHB is recovered, the
get-sensor-state() returns success with correct presence status.
But now we have different problem. The Linux rtas call interface
rtas_get_sensor() loops over the rtas call on extended delay return code
(9902) until the return value is either success (0) or error (-1). This
causes EEH handler to get stuck at presence check 'rtas_get_sensor()'
for ~6 seconds before it could indicate network driver that error has
been detected and stop any active operations. With no I/O traffic this
doesn't cause any issue and EEH recovery works fine. However with
running I/O traffic, during this 6 seconds, network driver continues its
operation and hits timeout (netdev watchdog). On timeouts, network
driver go into ffdc capture mode and reset path assuming PCI device is
in fatal condition. This causes EEH recovery to fail and sometimes it
leads to system hang or crash.
------------
[52732.244731] DEBUG: ibm_read_slot_reset_state2()
[52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=0x0
[52732.244798] DEBUG: in eeh_slot_presence_check
[52732.244804] DEBUG: error state check
[52732.244807] DEBUG: Is slot hotpluggable
[52732.244810] DEBUG: hotpluggable ops ?
[52732.244953] DEBUG: Calling ops->get_adapter_status
[52732.244958] DEBUG: calling rpaphp_get_sensor_state
[52736.564262] ------------[ cut here ]------------
[52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed out
[52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev_watchdog+0x438/0x440
[...]
[52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440
[52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440
------------
I am working on ways to fix this and looking at below two options. More
ideas are welcome.
1. There is an alternate call rtas_get_sensor_fast() available that does
not use rtas_busy_delay() and returns immediately with error code. Using
rtas_get_sensor_fast() for slot presence check fixes the above issue and
EEH recovery works fine. However there is no provision in
hotplug_slot_ops struct to do a quick check of adapter status that can
be used to call rtas_get_sensor_fast().
2. Another option is to move the slot presence check after reporting
network driver that error has been detected. This also fixes the issue.
However need to verify the hotplug case where if slot is empty, inform
driver to resume while skiping the recovery.
Let me know what do you think about above options and if there is any
other better way to fix this.
Thanks,
-Mahesh.
^ permalink raw reply
* [PATCH v2] tracing: Have all levels of checks prevent recursion
From: Steven Rostedt @ 2021-10-18 19:44 UTC (permalink / raw)
To: LKML
Cc: 王贇, Peter Zijlstra (Intel), Paul Walmsley,
James E.J. Bottomley, Paul Mackerras, Jisheng Zhang,
H. Peter Anvin, live-patching, linux-riscv, Miroslav Benes,
Joe Lawrence, Helge Deller, x86, linux-csky, Ingo Molnar,
Petr Mladek, Albert Ou, Jiri Kosina, Nicholas Piggin,
Borislav Petkov, Josh Poimboeuf, Thomas Gleixner, linux-parisc,
linuxppc-dev, Palmer Dabbelt, Masami Hiramatsu, Guo Ren,
Colin Ian King, Linus Torvalds
[
Linus,
I have patches that clean this up that are not marked for stable, but
will depend on this patch. As I already have commits in my next queue,
I can do one of the following:
1. Cherry pick this from my urgent tree, and build everything on top.
2. Add this on top of the mainline tag my next branch is based on and
merge it.
3. Add this to my next branch, and have it go in at the next merge
window.
]
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.
The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.
Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.
Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.
Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.
If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.
Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.
This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.
i.e.
traced_function_1: [ more than one callback tracing it ]
call loop_func
loop_func:
trace_recursion set internal bit
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
traced_function_2: [ only traced by above callback ]
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
[ wash, rinse, repeat, BOOM! out of shampoo! ]
Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.
Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.
[*] There is on layer of recursion that is allowed, and that is to allow
for the transition between interrupt context (normal -> softirq ->
irq -> NMI), because a trace may occur before the context update is
visible to the trace recursion logic.
Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Changes since v1:
- Removed "max" parameter (Petr Mladek)
Petr, I did not update the comment, because that can/should be a
separate patch, as the comment was there before this patch.
include/linux/trace_recursion.h | 49 ++++++---------------------------
kernel/trace/ftrace.c | 4 +--
2 files changed, 11 insertions(+), 42 deletions(-)
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c5714e65..fe95f0922526 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -16,23 +16,8 @@
* When function tracing occurs, the following steps are made:
* If arch does not support a ftrace feature:
* call internal function (uses INTERNAL bits) which calls...
- * If callback is registered to the "global" list, the list
- * function is called and recursion checks the GLOBAL bits.
- * then this function calls...
* The function callback, which can use the FTRACE bits to
* check for recursion.
- *
- * Now if the arch does not support a feature, and it calls
- * the global list function which calls the ftrace callback
- * all three of these steps will do a recursion protection.
- * There's no reason to do one if the previous caller already
- * did. The recursion that we are protecting against will
- * go through the same steps again.
- *
- * To prevent the multiple recursion checks, if a recursion
- * bit is set that is higher than the MAX bit of the current
- * check, then we know that the check was made by the previous
- * caller, and we can skip the current check.
*/
enum {
/* Function recursion bits */
@@ -40,12 +25,14 @@ enum {
TRACE_FTRACE_NMI_BIT,
TRACE_FTRACE_IRQ_BIT,
TRACE_FTRACE_SIRQ_BIT,
+ TRACE_FTRACE_TRANSITION_BIT,
- /* INTERNAL_BITs must be greater than FTRACE_BITs */
+ /* Internal use recursion bits */
TRACE_INTERNAL_BIT,
TRACE_INTERNAL_NMI_BIT,
TRACE_INTERNAL_IRQ_BIT,
TRACE_INTERNAL_SIRQ_BIT,
+ TRACE_INTERNAL_TRANSITION_BIT,
TRACE_BRANCH_BIT,
/*
@@ -86,12 +73,6 @@ enum {
*/
TRACE_GRAPH_NOTRACE_BIT,
- /*
- * When transitioning between context, the preempt_count() may
- * not be correct. Allow for a single recursion to cover this case.
- */
- TRACE_TRANSITION_BIT,
-
/* Used to prevent recursion recording from recursing. */
TRACE_RECORD_RECURSION_BIT,
};
@@ -113,12 +94,10 @@ enum {
#define TRACE_CONTEXT_BITS 4
#define TRACE_FTRACE_START TRACE_FTRACE_BIT
-#define TRACE_FTRACE_MAX ((1 << (TRACE_FTRACE_START + TRACE_CONTEXT_BITS)) - 1)
#define TRACE_LIST_START TRACE_INTERNAL_BIT
-#define TRACE_LIST_MAX ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
-#define TRACE_CONTEXT_MASK TRACE_LIST_MAX
+#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
* Used for setting context
@@ -132,6 +111,7 @@ enum {
TRACE_CTX_IRQ,
TRACE_CTX_SOFTIRQ,
TRACE_CTX_NORMAL,
+ TRACE_CTX_TRANSITION,
};
static __always_inline int trace_get_context_bit(void)
@@ -160,45 +140,34 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
#endif
static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
- int start, int max)
+ int start)
{
unsigned int val = READ_ONCE(current->trace_recursion);
int bit;
- /* A previous recursion check was made */
- if ((val & TRACE_CONTEXT_MASK) > max)
- return 0;
-
bit = trace_get_context_bit() + start;
if (unlikely(val & (1 << bit))) {
/*
* It could be that preempt_count has not been updated during
* a switch between contexts. Allow for a single recursion.
*/
- bit = TRACE_TRANSITION_BIT;
+ bit = TRACE_CTX_TRANSITION + start;
if (val & (1 << bit)) {
do_ftrace_record_recursion(ip, pip);
return -1;
}
- } else {
- /* Normal check passed, clear the transition to allow it again */
- val &= ~(1 << TRACE_TRANSITION_BIT);
}
val |= 1 << bit;
current->trace_recursion = val;
barrier();
- return bit + 1;
+ return bit;
}
static __always_inline void trace_clear_recursion(int bit)
{
- if (!bit)
- return;
-
barrier();
- bit--;
trace_recursion_clear(bit);
}
@@ -214,7 +183,7 @@ static __always_inline void trace_clear_recursion(int bit)
static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
unsigned long parent_ip)
{
- return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+ return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
}
/**
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7efbc8aaf7f6..635fbdc9d589 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6977,7 +6977,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op;
int bit;
- bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
+ bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
if (bit < 0)
return;
@@ -7052,7 +7052,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
{
int bit;
- bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
+ bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
if (bit < 0)
return;
--
2.31.1
^ permalink raw reply related
* Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference
From: Tyrel Datwyler @ 2021-10-18 21:09 UTC (permalink / raw)
To: Gustavo A. R. Silva, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Nicholas Piggin, Haren Myneni
Cc: linuxppc-dev, linux-hardening, linux-kernel
In-Reply-To: <20211015050345.GA1161918@embeddedor>
On 10/14/21 10:03 PM, Gustavo A. R. Silva wrote:
> (!ptr && !ptr->foo) strikes again. :)
>
> The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
> it leads to a NULL pointer dereference: ptr->foo.
>
> Fix this by converting && to ||
>
> This issue was detected with the help of Coccinelle, and audited and
> fixed manually.
>
> Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window operations")
> Cc: stable@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Looking at the usage pattern it is obvious that if we determine !ptr attempting
to also confirm !ptr->ops is going to blow up.
LGTM.
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> arch/powerpc/platforms/book3s/vas-api.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
> index 30172e52e16b..4d82c92ddd52 100644
> --- a/arch/powerpc/platforms/book3s/vas-api.c
> +++ b/arch/powerpc/platforms/book3s/vas-api.c
> @@ -303,7 +303,7 @@ static int coproc_ioc_tx_win_open(struct file *fp, unsigned long arg)
> return -EINVAL;
> }
>
> - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->open_win) {
> + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->open_win) {
> pr_err("VAS API is not registered\n");
> return -EACCES;
> }
> @@ -373,7 +373,7 @@ static int coproc_mmap(struct file *fp, struct vm_area_struct *vma)
> return -EINVAL;
> }
>
> - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->paste_addr) {
> + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->paste_addr) {
> pr_err("%s(): VAS API is not registered\n", __func__);
> return -EACCES;
> }
>
^ permalink raw reply
* Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference
From: Gustavo A. R. Silva @ 2021-10-18 21:26 UTC (permalink / raw)
To: Tyrel Datwyler
Cc: Haren Myneni, linux-kernel, Nicholas Piggin, Paul Mackerras,
linux-hardening, linuxppc-dev
In-Reply-To: <97c42e43-15b9-5db6-d460-dbb16f31954d@linux.ibm.com>
On Mon, Oct 18, 2021 at 02:09:31PM -0700, Tyrel Datwyler wrote:
> On 10/14/21 10:03 PM, Gustavo A. R. Silva wrote:
> > (!ptr && !ptr->foo) strikes again. :)
> >
> > The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
> > it leads to a NULL pointer dereference: ptr->foo.
> >
> > Fix this by converting && to ||
> >
> > This issue was detected with the help of Coccinelle, and audited and
> > fixed manually.
> >
> > Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window operations")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Looking at the usage pattern it is obvious that if we determine !ptr attempting
> to also confirm !ptr->ops is going to blow up.
>
> LGTM.
>
> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Thanks, Tyrel.
--
Gustavo
>
> > ---
> > arch/powerpc/platforms/book3s/vas-api.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
> > index 30172e52e16b..4d82c92ddd52 100644
> > --- a/arch/powerpc/platforms/book3s/vas-api.c
> > +++ b/arch/powerpc/platforms/book3s/vas-api.c
> > @@ -303,7 +303,7 @@ static int coproc_ioc_tx_win_open(struct file *fp, unsigned long arg)
> > return -EINVAL;
> > }
> >
> > - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->open_win) {
> > + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->open_win) {
> > pr_err("VAS API is not registered\n");
> > return -EACCES;
> > }
> > @@ -373,7 +373,7 @@ static int coproc_mmap(struct file *fp, struct vm_area_struct *vma)
> > return -EINVAL;
> > }
> >
> > - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->paste_addr) {
> > + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->paste_addr) {
> > pr_err("%s(): VAS API is not registered\n", __func__);
> > return -EACCES;
> > }
> >
>
^ permalink raw reply
* Re: [PATCH v2] tracing: Have all levels of checks prevent recursion
From: Steven Rostedt @ 2021-10-18 21:55 UTC (permalink / raw)
To: LKML
Cc: 王贇, Peter Zijlstra (Intel), Paul Walmsley,
James E.J. Bottomley, Paul Mackerras, Jisheng Zhang,
H. Peter Anvin, live-patching, linux-riscv, Miroslav Benes,
Joe Lawrence, Helge Deller, x86, linux-csky, Ingo Molnar,
Petr Mladek, Albert Ou, Jiri Kosina, Nicholas Piggin,
Borislav Petkov, Josh Poimboeuf, Thomas Gleixner, linux-parisc,
linuxppc-dev, Palmer Dabbelt, Masami Hiramatsu, Guo Ren,
Colin Ian King, Linus Torvalds
In-Reply-To: <20211018154412.09fcad3c@gandalf.local.home>
On Mon, 18 Oct 2021 15:44:12 -0400
Steven Rostedt <rostedt@goodmis.org> (by way of Steven Rostedt
<rostedt@goodmis.org>) wrote:
> [
> Linus,
> I have patches that clean this up that are not marked for stable, but
> will depend on this patch. As I already have commits in my next queue,
> I can do one of the following:
>
> 1. Cherry pick this from my urgent tree, and build everything on top.
> 2. Add this on top of the mainline tag my next branch is based on and
> merge it.
> 3. Add this to my next branch, and have it go in at the next merge
> window.
Hmm, I take this back. Although the clean up affects the same code block,
the updates don't actually conflict. (Although, if I do update the comment
that Petr asked, that will conflict. But nothing you can't handle ;-)
I'll start running this change through my tests and post it separately.
-- Steve
> ]
^ permalink raw reply
* Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates
From: Tyrel Datwyler @ 2021-10-18 22:37 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev; +Cc: cheloha, ldufour
In-Reply-To: <20211018163424.2491472-1-nathanl@linux.ibm.com>
On 10/18/21 9:34 AM, Nathan Lynch wrote:
> On VMs with NX encryption, compression, and/or RNG offload, these
> capabilities are described by nodes in the ibm,platform-facilities device
> tree hierarchy:
>
> $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
> /sys/firmware/devicetree/base/ibm,platform-facilities/
> ├── ibm,compression-v1
> ├── ibm,random-v1
> └── ibm,sym-encryption-v1
>
> 3 directories
>
> The acceleration functions that these nodes describe are not disrupted by
> live migration, not even temporarily.
>
> But the post-migration ibm,update-nodes sequence firmware always sends
> "delete" messages for this hierarchy, followed by an "add" directive to
> reconstruct it via ibm,configure-connector (log with debugging statements
> enabled in mobility.c):
>
> mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
> mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
> mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
> mobility: removing node /ibm,platform-facilities:4294967286
> ...
> mobility: added node /ibm,platform-facilities:4294967286
It always bothered me that the update from firmware here was an delete/add at
the entire '/ibm,platform-facilities' node level instead of update properties
calls. When I asked about it years ago the claim was it was just easier to pass
an entire sub-tree as a single node add.
>
> Note we receive a single "add" message for the entire hierarchy, and what
> we receive from the ibm,configure-connector sequence is the top-level
> platform-facilities node along with its three children. The debug message
> simply reports the parent node and not the whole subtree.
>
> Also, significantly, the nodes added are almost completely equivalent to
> the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
> the leaf nodes is the only property I've observed to differ, and Linux does
> not use that. So in practice, the sum of update messages Linux receives for
> this hierarchy is equivalent to minor property updates.
The two props I would think maybe we would need to be most be concerned about
ensuring don't change are "ibm,resource-id" which gets used in the vio bus code
when configuring platform facilities nodes, and 'ibm,max-sync-cop' used by the
pseries-nx driver.
>
> We succeed in removing the original hierarchy from the device tree. But the
> drivers bound to the leaf nodes are ignorant of this, and do not relinquish
> their references. The leaf nodes, still reachable through sysfs, of course
> still refer to the now-freed ibm,platform-facilities parent node, which
> makes use-after-free possible:
It is actually more subtle then the drivers themselves being ignorant. They
could register node update notifiers, but the real problem here is that the vio
bus device itself takes a reference to each child node registered to the bus.
I'm not sure we really want to unbind/rebind drivers as a result of LPM, but it
would be generic to the vio bus instead of updating each driver to ensure its
handling it device node references properly.
>
> refcount_t: addition on 0; use-after-free.
> WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
> refcount_warn_saturate+0x160/0x1f0 (unreliable)
> kobject_get+0xf0/0x100
> of_node_get+0x30/0x50
> of_get_parent+0x50/0xb0
> of_fwnode_get_parent+0x54/0x90
> fwnode_count_parents+0x50/0x150
> fwnode_full_name_string+0x30/0x110
> device_node_string+0x49c/0x790
> vsnprintf+0x1c0/0x4c0
> sprintf+0x44/0x60
> devspec_show+0x34/0x50
> dev_attr_show+0x40/0xa0
> sysfs_kf_seq_show+0xbc/0x200
> kernfs_seq_show+0x44/0x60
> seq_read_iter+0x2a4/0x740
> kernfs_fop_read_iter+0x254/0x2e0
> new_sync_read+0x120/0x190
> vfs_read+0x1d0/0x240
>
> Moreover, the "new" replacement subtree is not correctly added to the
> device tree, resulting in ibm,platform-facilities parent node without the
> appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
>
> $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
> /sys/firmware/devicetree/base/ibm,platform-facilities/
>
> 0 directories
>
> $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
> ./ibm,sym-encryption-v1/of_node: broken symbolic link to
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
> ./ibm,random-v1/of_node: broken symbolic link to
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
> ./ibm,compression-v1/of_node: broken symbolic link to
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
>
> This is because add_dt_node() -> dlpar_attach_node() attaches only the
> parent node returned from configure-connector, ignoring any children. This
> should be corrected for the general case, but fixing that won't help with
> the stale OF node references, which is the more urgent problem.
I don't follow. If the code path you mention is truly broken in the way you say
then DLPAR operations involving nodes with child nodes should also be broken.
Last I had seen was that sysfs adds of the new nodes got renamed because the old
nodes still existed as a result of there reference count not going to zero. Has
this behavior changed, or am I misremembering the observed behavior?
>
> One way to address that would be to make the drivers respond to node
> removal notifications, so that node references can be dropped
> appropriately. But this would likely force the drivers to disrupt active
> clients for no useful purpose: equivalent nodes are immediately re-added.
> And recall that the acceleration capabilities described by the nodes remain
> available throughout the whole process.
See my comments above about its the vio bus more at fault here then the drivers
themselves. I'm inclined to agree though that disrupting active operations with
a driver unbind/rebind is a little extreme.
This also brings me back to firmware removing and re-adding the whole
'/ibm,platform-facilities' node instead of simply updating changed properties
could avoid this whole fiasco.
>
> The solution I believe to be robust for this situation is to convert
> remove+add of a node with an unchanged phandle to an update of the node's
> properties in the Linux device tree structure. That would involve changing
> and adding a fair amount of code, and may take several iterations to land.
>
> Until that can be realized we have a confirmed use-after-free and the
> possibility of memory corruption. So add a limited workaround that
> discriminates on the node type, ignoring adds and removes. This should be
> amenable to backporting in the meantime.
The reality is that '/ibm,platform-facilities' and 'cache' nodes are the only
LPM scoped device tree nodes that allow node delete/add. So, as a one-off
workaround to deal with what I consider a bad firmware approach I think this is
probably the best approach barring getting firmware to move to an update
properties approach.
An audit of the drivers is probably still a valid exercise to ensure any device
tree props they care about they pick up a new value should it change.
-Tyrel
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
> Cc: stable@vger.kernel.org
> ---
> arch/powerpc/platforms/pseries/mobility.c | 34 +++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e0891272d..210a37a065fb 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -63,6 +63,27 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
>
> static int delete_dt_node(struct device_node *dn)
> {
> + struct device_node *pdn;
> + bool is_platfac;
> +
> + pdn = of_get_parent(dn);
> + is_platfac = of_node_is_type(dn, "ibm,platform-facilities") ||
> + of_node_is_type(pdn, "ibm,platform-facilities");
> + of_node_put(pdn);
> +
> + /*
> + * The drivers that bind to nodes in the platform-facilities
> + * hierarchy don't support node removal, and the removal directive
> + * from firmware is always followed by an add of an equivalent
> + * node. The capability (e.g. RNG, encryption, compression)
> + * represented by the node is never interrupted by the migration.
> + * So ignore changes to this part of the tree.
> + */
> + if (is_platfac) {
> + pr_notice("ignoring remove operation for %pOFfp\n", dn);
> + return 0;
> + }
> +
> pr_debug("removing node %pOFfp\n", dn);
> dlpar_detach_node(dn);
> return 0;
> @@ -222,6 +243,19 @@ static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
> if (!dn)
> return -ENOENT;
>
> + /*
> + * Since delete_dt_node() ignores this node type, this is the
> + * necessary counterpart. We also know that a platform-facilities
> + * node returned from dlpar_configure_connector() has children
> + * attached, and dlpar_attach_node() only adds the parent, leaking
> + * the children. So ignore these on the add side for now.
> + */
> + if (of_node_is_type(dn, "ibm,platform-facilities")) {
> + pr_notice("ignoring add operation for %pOF\n", dn);
> + dlpar_free_cc_nodes(dn);
> + return 0;
> + }
> +
> rc = dlpar_attach_node(dn, parent_dn);
> if (rc)
> dlpar_free_cc_nodes(dn);
>
^ permalink raw reply
* Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates
From: Tyrel Datwyler @ 2021-10-18 23:16 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev; +Cc: cheloha, ldufour
In-Reply-To: <6de8b295-112f-651e-a18e-3ab3e499ad69@linux.ibm.com>
On 10/18/21 3:37 PM, Tyrel Datwyler wrote:
> On 10/18/21 9:34 AM, Nathan Lynch wrote:
<<snip>>
>>
>> One way to address that would be to make the drivers respond to node
>> removal notifications, so that node references can be dropped
>> appropriately. But this would likely force the drivers to disrupt active
>> clients for no useful purpose: equivalent nodes are immediately re-added.
>> And recall that the acceleration capabilities described by the nodes remain
>> available throughout the whole process.
>
> See my comments above about its the vio bus more at fault here then the drivers
> themselves. I'm inclined to agree though that disrupting active operations with
> a driver unbind/rebind is a little extreme.
>
> This also brings me back to firmware removing and re-adding the whole
> '/ibm,platform-facilities' node instead of simply updating changed properties
> could avoid this whole fiasco.
>
Thinking more on this and trying to recall my discussion so very long ago with
firmware I now recall that I had complained that the idea of a node remove/add
is akin to a DLPAR operation which we have no notion of for platform facilities.
They know better than to do this with other virtual devices so I'm still not
sure why they insist on doing it here.
-Tyrel
^ permalink raw reply
* Re: [PATCH] tracing: Have all levels of checks prevent recursion
From: Steven Rostedt @ 2021-10-19 2:02 UTC (permalink / raw)
To: Petr Mladek
Cc: 王贇, Peter Zijlstra (Intel), Paul Walmsley,
James E.J. Bottomley, Paul Mackerras, Jisheng Zhang,
H. Peter Anvin, live-patching, linux-riscv, Miroslav Benes,
Joe Lawrence, Helge Deller, x86, linux-csky, Ingo Molnar,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, LKML,
Palmer Dabbelt, Masami Hiramatsu, Guo Ren, Colin Ian King,
linuxppc-dev
In-Reply-To: <YW1KKCFallDG+E01@alley>
On Mon, 18 Oct 2021 12:19:20 +0200
Petr Mladek <pmladek@suse.com> wrote:
> > -
> > bit = trace_get_context_bit() + start;
> > if (unlikely(val & (1 << bit))) {
> > /*
> > * It could be that preempt_count has not been updated during
> > * a switch between contexts. Allow for a single recursion.
> > */
> > - bit = TRACE_TRANSITION_BIT;
> > + bit = TRACE_CTX_TRANSITION + start;
>
[..]
> Could we please update the comment? I mean to say if it is a race
> or if we trace a function that should not get traced.
What do you think of this change?
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 1d8cce02c3fb..24f284eb55a7 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -168,8 +168,12 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
bit = trace_get_context_bit() + start;
if (unlikely(val & (1 << bit))) {
/*
- * It could be that preempt_count has not been updated during
- * a switch between contexts. Allow for a single recursion.
+ * If an interrupt occurs during a trace, and another trace
+ * happens in that interrupt but before the preempt_count is
+ * updated to reflect the new interrupt context, then this
+ * will think a recursion occurred, and the event will be dropped.
+ * Let a single instance happen via the TRANSITION_BIT to
+ * not drop those events.
*/
bit = TRACE_TRANSITION_BIT;
if (val & (1 << bit)) {
-- Steve
^ permalink raw reply related
* [PATCH v2] soc: fsl: dpio: instead smp_processor_id with raw_smp_processor_id
From: Meng.Li @ 2021-10-19 2:32 UTC (permalink / raw)
To: Roy.Pledge, leoyang.li, ruxandra.radulescu, horia.geanta
Cc: meng.li, linuxppc-dev, linux-kernel, linux-arm-kernel
From: Meng Li <Meng.Li@windriver.com>
When enable debug kernel configs,there will be calltrace as below:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x20/0x30
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.63-yocto-standard #1
Hardware name: NXP Layerscape LX2160ARDB (DT)
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x24/0x30
dump_stack+0xf0/0x13c
check_preemption_disabled+0x100/0x110
debug_smp_processor_id+0x20/0x30
dpaa2_io_query_fq_count+0xdc/0x154
dpaa2_eth_stop+0x144/0x314
__dev_close_many+0xdc/0x160
__dev_change_flags+0xe8/0x220
dev_change_flags+0x30/0x70
ic_close_devs+0x50/0x78
ip_auto_config+0xed0/0xf10
do_one_initcall+0xac/0x460
kernel_init_freeable+0x30c/0x378
kernel_init+0x20/0x128
ret_from_fork+0x10/0x38
Based on comment in the context, it doesn't matter whether
preemption is disable or not. So, instead smp_processor_id()
with raw_smp_processor_id() to avoid above call trace.
v2:
Remove the preempt_disable/enable() protection, instead smp_processor_id
with raw_smp_processor_id.
Fixes: c89105c9b390 ("staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl")
Cc: stable@vger.kernel.org
Signed-off-by: Meng Li <Meng.Li@windriver.com>
---
drivers/soc/fsl/dpio/dpio-service.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 19f47ea9dab0..3050a534d42c 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -59,7 +59,7 @@ static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d,
* potentially being migrated away.
*/
if (cpu < 0)
- cpu = smp_processor_id();
+ cpu = raw_smp_processor_id();
/* If a specific cpu was requested, pick it up immediately */
return dpio_by_cpu[cpu];
--
2.17.1
^ permalink raw reply related
* [PATCH] driver: soc: dpio: use the whole functions to protect critical zone
From: Meng.Li @ 2021-10-19 3:05 UTC (permalink / raw)
To: Roy.Pledge, leoyang.li, youri.querry_1
Cc: meng.li, linuxppc-dev, linux-kernel, linux-arm-kernel
From: Meng Li <Meng.Li@windriver.com>
In orininal code, use 2 function spin_lock() and local_irq_save() to
protect the critical zone. But when enable the kernel debug config,
there are below inconsistent lock state detected.
================================
WARNING: inconsistent lock state
5.10.63-yocto-standard #1 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
lock_torture_wr/226 [HC0[0]:SC1[5]:HE1:SE0] takes:
ffff002005b2dd80 (&p->access_spinlock){+.?.}-{3:3}, at: qbman_swp_enqueue_multiple_mem_back+0x44/0x270
{SOFTIRQ-ON-W} state was registered at:
lock_acquire.part.0+0xf8/0x250
lock_acquire+0x68/0x84
_raw_spin_lock+0x68/0x90
qbman_swp_enqueue_multiple_mem_back+0x44/0x270
......
cryptomgr_test+0x38/0x60
kthread+0x158/0x164
ret_from_fork+0x10/0x38
irq event stamp: 4498
hardirqs last enabled at (4498): [<ffff800010fcf980>] _raw_spin_unlock_irqrestore+0x90/0xb0
hardirqs last disabled at (4497): [<ffff800010fcffc4>] _raw_spin_lock_irqsave+0xd4/0xe0
softirqs last enabled at (4458): [<ffff8000100108c4>] __do_softirq+0x674/0x724
softirqs last disabled at (4465): [<ffff80001005b2a4>] __irq_exit_rcu+0x190/0x19c
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&p->access_spinlock);
<Interrupt>
lock(&p->access_spinlock);
*** DEADLOCK ***
So, in order to avoid deadlock, use the whole functinos
spin_lock_irqsave/spin_unlock_irqrestore() to protect critical zone.
Fixes: 3b2abda7d28c ("soc: fsl: dpio: Replace QMAN array mode with ring mode enqueue")
Cc: stable@vger.kernel.org
Signed-off-by: Meng Li <Meng.Li@windriver.com>
---
drivers/soc/fsl/dpio/qbman-portal.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index 845e91416b58..a56dbe4de34f 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -785,8 +785,7 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
int i, num_enqueued = 0;
unsigned long irq_flags;
- spin_lock(&s->access_spinlock);
- local_irq_save(irq_flags);
+ spin_lock_irqsave(&s->access_spinlock, irq_flags);
half_mask = (s->eqcr.pi_ci_mask>>1);
full_mask = s->eqcr.pi_ci_mask;
@@ -797,8 +796,7 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
eqcr_ci, s->eqcr.ci);
if (!s->eqcr.available) {
- local_irq_restore(irq_flags);
- spin_unlock(&s->access_spinlock);
+ spin_unlock_irqrestore(&s->access_spinlock, irq_flags);
return 0;
}
}
@@ -837,8 +835,7 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
dma_wmb();
qbman_write_register(s, QBMAN_CINH_SWP_EQCR_PI,
(QB_RT_BIT)|(s->eqcr.pi)|s->eqcr.pi_vb);
- local_irq_restore(irq_flags);
- spin_unlock(&s->access_spinlock);
+ spin_unlock_irqrestore(&s->access_spinlock, irq_flags);
return num_enqueued;
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v2] powerpc/smp: do not decrement idle task preempt count in CPU offline
From: Srikar Dronamraju @ 2021-10-19 4:45 UTC (permalink / raw)
To: Nathan Lynch
Cc: peterz, linux-kernel, mingo, clg, linuxppc-dev,
valentin.schneider
In-Reply-To: <20211015173902.2278118-1-nathanl@linux.ibm.com>
* Nathan Lynch <nathanl@linux.ibm.com> [2021-10-15 12:39:02]:
> With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
> get:
>
> BUG: scheduling while atomic: swapper/1/0/0x00000000
> no locks held by swapper/1/0.
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
> Call Trace:
> dump_stack_lvl+0xac/0x108
> __schedule_bug+0xac/0xe0
> __schedule+0xcf8/0x10d0
> schedule_idle+0x3c/0x70
> do_idle+0x2d8/0x4a0
> cpu_startup_entry+0x38/0x40
> start_secondary+0x2ec/0x3a0
> start_secondary_prolog+0x10/0x14
>
> This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
> preempt count, for reasons explained in commit a7c2bb8279d2 ("powerpc:
> Re-enable preemption before cpu_die()"), specifically "start_secondary()
> expects a preempt_count() of 0."
>
> However, since commit 2c669ef6979c ("powerpc/preempt: Don't touch the idle
> task's preempt_count during hotplug") and commit f1a0a376ca0c ("sched/core:
> Initialize the idle task with preemption disabled"), that justification no
> longer holds.
>
> The idle task isn't supposed to re-enable preemption, so remove the
> vestigial preempt_enable() from the CPU offline path.
>
> Tested with pseries and powernv in qemu, and pseries on PowerVM.
>
> Fixes: 2c669ef6979c ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Looks good to me.
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>
> Notes:
> Changes since v1:
>
> - remove incorrect Fixes: tag, add Valentin's r-b.
>
> arch/powerpc/kernel/smp.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 9cc7d3dbf439..605bab448f84 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1730,8 +1730,6 @@ void __cpu_die(unsigned int cpu)
>
> void arch_cpu_idle_dead(void)
> {
> - sched_preempt_enable_no_resched();
> -
> /*
> * Disable on the down path. This will be re-enabled by
> * start_secondary() via start_secondary_resume() below
> --
> 2.31.1
>
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH] tracing: Have all levels of checks prevent recursion
From: Petr Mladek @ 2021-10-19 6:41 UTC (permalink / raw)
To: Steven Rostedt
Cc: 王贇, Peter Zijlstra (Intel), Paul Walmsley,
James E.J. Bottomley, Paul Mackerras, Jisheng Zhang,
H. Peter Anvin, live-patching, linux-riscv, Miroslav Benes,
Joe Lawrence, Helge Deller, x86, linux-csky, Ingo Molnar,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, LKML,
Palmer Dabbelt, Masami Hiramatsu, Guo Ren, Colin Ian King,
linuxppc-dev
In-Reply-To: <20211018220203.064a42ed@gandalf.local.home>
On Mon 2021-10-18 22:02:03, Steven Rostedt wrote:
> On Mon, 18 Oct 2021 12:19:20 +0200
> Petr Mladek <pmladek@suse.com> wrote:
>
> > > -
> > > bit = trace_get_context_bit() + start;
> > > if (unlikely(val & (1 << bit))) {
> > > /*
> > > * It could be that preempt_count has not been updated during
> > > * a switch between contexts. Allow for a single recursion.
> > > */
> > > - bit = TRACE_TRANSITION_BIT;
> > > + bit = TRACE_CTX_TRANSITION + start;
> >
>
> [..]
>
> > Could we please update the comment? I mean to say if it is a race
> > or if we trace a function that should not get traced.
>
> What do you think of this change?
>
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index 1d8cce02c3fb..24f284eb55a7 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -168,8 +168,12 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
> bit = trace_get_context_bit() + start;
> if (unlikely(val & (1 << bit))) {
> /*
> - * It could be that preempt_count has not been updated during
> - * a switch between contexts. Allow for a single recursion.
> + * If an interrupt occurs during a trace, and another trace
> + * happens in that interrupt but before the preempt_count is
> + * updated to reflect the new interrupt context, then this
> + * will think a recursion occurred, and the event will be dropped.
> + * Let a single instance happen via the TRANSITION_BIT to
> + * not drop those events.
> */
> bit = TRACE_TRANSITION_BIT;
> if (val & (1 << bit)) {
>
>
Looks good to me. Thanks for the update.
Feel free to postpone this change. I do not want to complicate
upstreaming the fix for stable. I am sorry if I already
complicated it.
Best Regards,
Petr
^ permalink raw reply
* [PATCH v3 08/22] powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
Deactivating KUEP at boot time is unrelevant for PPC32 and BOOK3E/64.
Remove it.
It allows to refactor setup_kuep() via a __weak function
that only PPC64s will overide for now.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
arch/powerpc/include/asm/kup.h | 5 -----
arch/powerpc/mm/book3s32/Makefile | 1 -
arch/powerpc/mm/book3s32/kuep.c | 13 -------------
arch/powerpc/mm/init-common.c | 15 +++++++++++++++
arch/powerpc/mm/nohash/44x.c | 10 ----------
arch/powerpc/mm/nohash/8xx.c | 7 -------
7 files changed, 16 insertions(+), 37 deletions(-)
delete mode 100644 arch/powerpc/mm/book3s32/kuep.c
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..fce1c13ce642 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3354,7 +3354,7 @@
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
- nosmep [X86,PPC]
+ nosmep [X86,PPC64s]
Disable SMEP (Supervisor Mode Execution Prevention)
even if it is supported by processor.
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 94734a8eb54d..fa8513b7acca 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -33,12 +33,7 @@ extern bool disable_kuap;
#include <linux/pgtable.h>
void setup_kup(void);
-
-#ifdef CONFIG_PPC_KUEP
void setup_kuep(bool disabled);
-#else
-static inline void setup_kuep(bool disabled) { }
-#endif /* CONFIG_PPC_KUEP */
#ifdef CONFIG_PPC_KUAP
void setup_kuap(bool disabled);
diff --git a/arch/powerpc/mm/book3s32/Makefile b/arch/powerpc/mm/book3s32/Makefile
index 15f4773643d2..50dd8f6bdf46 100644
--- a/arch/powerpc/mm/book3s32/Makefile
+++ b/arch/powerpc/mm/book3s32/Makefile
@@ -9,5 +9,4 @@ endif
obj-y += mmu.o mmu_context.o
obj-$(CONFIG_PPC_BOOK3S_603) += nohash_low.o
obj-$(CONFIG_PPC_BOOK3S_604) += hash_low.o tlb.o
-obj-$(CONFIG_PPC_KUEP) += kuep.o
obj-$(CONFIG_PPC_KUAP) += kuap.o
diff --git a/arch/powerpc/mm/book3s32/kuep.c b/arch/powerpc/mm/book3s32/kuep.c
deleted file mode 100644
index 78fc48eee510..000000000000
--- a/arch/powerpc/mm/book3s32/kuep.c
+++ /dev/null
@@ -1,13 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-
-#include <asm/code-patching.h>
-#include <asm/kup.h>
-#include <asm/smp.h>
-
-void setup_kuep(bool disabled)
-{
- if (smp_processor_id() != boot_cpuid)
- return;
-
- pr_info("Activating Kernel Userspace Execution Prevention\n");
-}
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index b4f3437aee38..b22e0649e2d3 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -20,6 +20,7 @@
#include <linux/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/kup.h>
+#include <asm/smp.h>
phys_addr_t memstart_addr __ro_after_init = (phys_addr_t)~0ull;
EXPORT_SYMBOL_GPL(memstart_addr);
@@ -33,6 +34,9 @@ bool disable_kuap = !IS_ENABLED(CONFIG_PPC_KUAP);
static int __init parse_nosmep(char *p)
{
+ if (!IS_ENABLED(CONFIG_PPC_BOOKS_64))
+ return 0;
+
disable_kuep = true;
pr_warn("Disabling Kernel Userspace Execution Prevention\n");
return 0;
@@ -47,6 +51,17 @@ static int __init parse_nosmap(char *p)
}
early_param("nosmap", parse_nosmap);
+void __weak setup_kuep(bool disabled)
+{
+ if (!IS_ENABLED(CONFIG_PPC_KUEP) || disabled)
+ return;
+
+ if (smp_processor_id() != boot_cpuid)
+ return;
+
+ pr_info("Activating Kernel Userspace Execution Prevention\n");
+}
+
void setup_kup(void)
{
setup_kuap(disable_kuap);
diff --git a/arch/powerpc/mm/nohash/44x.c b/arch/powerpc/mm/nohash/44x.c
index ceb290df1fb5..796c824acc8c 100644
--- a/arch/powerpc/mm/nohash/44x.c
+++ b/arch/powerpc/mm/nohash/44x.c
@@ -240,13 +240,3 @@ void __init mmu_init_secondary(int cpu)
}
}
#endif /* CONFIG_SMP */
-
-#ifdef CONFIG_PPC_KUEP
-void setup_kuep(bool disabled)
-{
- if (smp_processor_id() != boot_cpuid)
- return;
-
- pr_info("Activating Kernel Userspace Execution Prevention\n");
-}
-#endif
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index e878e8124ee6..36010d1c0bc4 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -212,13 +212,6 @@ void __init setup_initial_memory_limit(phys_addr_t first_memblock_base,
memblock_set_current_limit(min_t(u64, first_memblock_size, SZ_32M));
}
-#ifdef CONFIG_PPC_KUEP
-void setup_kuep(bool disabled)
-{
- pr_info("Activating Kernel Userspace Execution Prevention\n");
-}
-#endif
-
#ifdef CONFIG_PPC_KUAP
struct static_key_false disable_kuap_key;
EXPORT_SYMBOL(disable_kuap_key);
--
2.31.1
^ permalink raw reply related
* [PATCH v3 12/22] powerpc/kuap: Add kuap_lock()
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
Add kuap_lock() and call it when entering interrupts from user.
It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.
However book3s/32 already have a kuap_lock(). Rename it
kuap_lock_addr().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/kup.h | 14 +++++++++-----
arch/powerpc/include/asm/interrupt.h | 5 ++++-
arch/powerpc/include/asm/kup.h | 9 +++++++++
arch/powerpc/include/asm/nohash/32/kup-8xx.h | 4 ++++
arch/powerpc/kernel/interrupt.c | 2 ++
5 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index bc245e9f0bcc..678f9c9d89b6 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -57,7 +57,7 @@ static inline void kuap_unlock_all(void)
void kuap_lock_all_ool(void);
void kuap_unlock_all_ool(void);
-static inline void kuap_lock(unsigned long addr, bool ool)
+static inline void kuap_lock_addr(unsigned long addr, bool ool)
{
if (likely(addr != KUAP_ALL))
kuap_lock_one(addr);
@@ -77,6 +77,10 @@ static inline void kuap_unlock(unsigned long addr, bool ool)
kuap_unlock_all_ool();
}
+static inline void __kuap_lock(void)
+{
+}
+
static inline void __kuap_save_and_lock(struct pt_regs *regs)
{
unsigned long kuap = current->thread.kuap;
@@ -86,7 +90,7 @@ static inline void __kuap_save_and_lock(struct pt_regs *regs)
return;
current->thread.kuap = KUAP_NONE;
- kuap_lock(kuap, false);
+ kuap_lock_addr(kuap, false);
}
static inline void kuap_user_restore(struct pt_regs *regs)
@@ -97,7 +101,7 @@ static inline void __kuap_kernel_restore(struct pt_regs *regs, unsigned long kua
{
if (unlikely(kuap != KUAP_NONE)) {
current->thread.kuap = KUAP_NONE;
- kuap_lock(kuap, false);
+ kuap_lock_addr(kuap, false);
}
if (likely(regs->kuap == KUAP_NONE))
@@ -139,7 +143,7 @@ static __always_inline void __prevent_user_access(unsigned long dir)
return;
current->thread.kuap = KUAP_NONE;
- kuap_lock(kuap, true);
+ kuap_lock_addr(kuap, true);
}
static inline unsigned long __prevent_user_access_return(void)
@@ -148,7 +152,7 @@ static inline unsigned long __prevent_user_access_return(void)
if (flags != KUAP_NONE) {
current->thread.kuap = KUAP_NONE;
- kuap_lock(flags, true);
+ kuap_lock_addr(flags, true);
}
return flags;
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 3bbca1fbbe1e..ae719e200c80 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -140,9 +140,12 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
trace_hardirqs_off();
if (user_mode(regs))
- account_cpu_user_entry();
+ kuap_lock();
else
kuap_save_and_lock(regs);
+
+ if (user_mode(regs))
+ account_cpu_user_entry();
#endif
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 5d3c1e8060f9..34574a7455ce 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -49,6 +49,7 @@ __bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
}
static inline void __kuap_assert_locked(void) { }
+static inline void __kuap_lock(void) { }
static inline void __kuap_save_and_lock(struct pt_regs *regs) { }
static inline void kuap_user_restore(struct pt_regs *regs) { }
static inline void __kuap_kernel_restore(struct pt_regs *regs, unsigned long amr) { }
@@ -91,6 +92,14 @@ static __always_inline void kuap_assert_locked(void)
}
#ifdef CONFIG_PPC32
+static __always_inline void kuap_lock(void)
+{
+ if (kuap_is_disabled())
+ return;
+
+ __kuap_lock();
+}
+
static __always_inline void kuap_save_and_lock(struct pt_regs *regs)
{
if (kuap_is_disabled())
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 37fe4b32b658..c44d97751723 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -20,6 +20,10 @@ static __always_inline bool kuap_is_disabled(void)
return static_branch_unlikely(&disable_kuap_key);
}
+static inline void __kuap_lock(void)
+{
+}
+
static inline void __kuap_save_and_lock(struct pt_regs *regs)
{
regs->kuap = mfspr(SPRN_MD_AP);
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 0d12aa66e1f9..dc56a514df0a 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -81,6 +81,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
{
syscall_fn f;
+ kuap_lock();
+
regs->orig_gpr3 = r3;
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
--
2.31.1
^ permalink raw reply related
* [PATCH v3 05/22] powerpc/32s: Remove capability to disable KUEP at boottime
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
Disabling KUEP at boottime makes things unnecessarily complex.
Still allow disabling KUEP at build time, but when it's built-in
it is always there.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/kup.h | 3 +--
arch/powerpc/mm/book3s32/kuep.c | 10 ++--------
2 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 9f38040f0641..fb6c39225dd1 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -12,7 +12,6 @@
#include <linux/jump_label.h>
extern struct static_key_false disable_kuap_key;
-extern struct static_key_false disable_kuep_key;
static __always_inline bool kuap_is_disabled(void)
{
@@ -21,7 +20,7 @@ static __always_inline bool kuap_is_disabled(void)
static __always_inline bool kuep_is_disabled(void)
{
- return !IS_ENABLED(CONFIG_PPC_KUEP) || static_branch_unlikely(&disable_kuep_key);
+ return !IS_ENABLED(CONFIG_PPC_KUEP);
}
static inline void kuep_lock(void)
diff --git a/arch/powerpc/mm/book3s32/kuep.c b/arch/powerpc/mm/book3s32/kuep.c
index c20733d6e02c..8474edce3df9 100644
--- a/arch/powerpc/mm/book3s32/kuep.c
+++ b/arch/powerpc/mm/book3s32/kuep.c
@@ -3,18 +3,12 @@
#include <asm/kup.h>
#include <asm/smp.h>
-struct static_key_false disable_kuep_key;
-
void setup_kuep(bool disabled)
{
- if (!disabled)
- kuep_lock();
+ kuep_lock();
if (smp_processor_id() != boot_cpuid)
return;
- if (disabled)
- static_branch_enable(&disable_kuep_key);
- else
- pr_info("Activating Kernel Userspace Execution Prevention\n");
+ pr_info("Activating Kernel Userspace Execution Prevention\n");
}
--
2.31.1
^ permalink raw reply related
* [PATCH v3 04/22] powerpc/book3e: Activate KUEP at all time
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
On book3e,
- When using 64 bits PTE: User pages don't have the SX bit defined
so KUEP is always active.
- When using 32 bits PTE: Implement KUEP by clearing SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot time nor build time selection.
Activate it at all time.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_fsl_booke.S | 1 +
arch/powerpc/platforms/Kconfig.cputype | 2 ++
2 files changed, 3 insertions(+)
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 0a9a0f301474..4622b50a5208 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -777,6 +777,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_BIG_PHYS)
andi. r10, r11, _PAGE_USER /* Test for _PAGE_USER */
slwi r10, r12, 1
or r10, r10, r12
+ rlwinm r10, r10, 0, ~_PAGE_EXEC /* Clear SX on user pages */
iseleq r12, r12, r10
rlwimi r13, r12, 0, 20, 31 /* Get RPN from PTE, merge w/ perms */
mtspr SPRN_MAS3, r13
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 6f2e8a4026ff..68361ab7078d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -292,6 +292,8 @@ config PPC_FSL_BOOK3E
select FSL_EMB_PERFMON
select PPC_SMP_MUXED_IPI
select PPC_DOORBELL
+ select PPC_HAVE_KUEP
+ select PPC_KUEP
default y if FSL_BOOKE
config PTE_64BIT
--
2.31.1
^ permalink raw reply related
* [PATCH v3 07/22] powerpc/32s: Save content of sr0 to avoid 'mfsr'
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
Calling 'mfsr' to get the content of segment registers is heavy,
in addition it requires clearing of the 'reserved' bits.
In order to avoid this operation, save it in mm context and in
thread struct.
The saved sr0 is the one used by kernel, this means that on
locking entry it can be used as is.
For unlocking, the only thing to do is to clear SR_NX.
This improves null_syscall selftest by 12 cycles, ie 4%.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/mmu-hash.h | 5 +++++
arch/powerpc/include/asm/processor.h | 9 +++++++++
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_32.S | 8 +++-----
arch/powerpc/mm/book3s32/kuap.c | 5 ++++-
arch/powerpc/mm/book3s32/kuep.c | 1 +
arch/powerpc/mm/book3s32/mmu_context.c | 15 +++++++--------
arch/powerpc/mm/mmu_context.c | 3 +++
8 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index e2f7ccc13edb..7be27862329f 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -175,9 +175,14 @@ struct hash_pte {
typedef struct {
unsigned long id;
+ unsigned long sr0;
void __user *vdso;
} mm_context_t;
+#ifdef CONFIG_PPC_KUEP
+#define INIT_MM_CONTEXT(mm) .context.sr0 = SR_NX
+#endif
+
void update_bats(void);
static inline void cleanup_cpu_mmu_context(void) { }
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index f348e564f7dd..e64ec54398c6 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -157,6 +157,7 @@ struct thread_struct {
#ifdef CONFIG_PPC_BOOK3S_32
unsigned long r0, r3, r4, r5, r6, r8, r9, r11;
unsigned long lr, ctr;
+ unsigned long sr0;
#endif
#endif /* CONFIG_PPC32 */
/* Debug Registers */
@@ -276,6 +277,12 @@ struct thread_struct {
#define SPEFSCR_INIT
#endif
+#ifdef CONFIG_PPC_BOOK3S_32
+#define SR0_INIT .sr0 = IS_ENABLED(CONFIG_PPC_KUEP) ? SR_NX : 0,
+#else
+#define SR0_INIT
+#endif
+
#if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_PPC_KUAP)
#define INIT_THREAD { \
.ksp = INIT_SP, \
@@ -283,6 +290,7 @@ struct thread_struct {
.kuap = ~0UL, /* KUAP_NONE */ \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
+ SR0_INIT \
}
#elif defined(CONFIG_PPC32)
#define INIT_THREAD { \
@@ -290,6 +298,7 @@ struct thread_struct {
.pgdir = swapper_pg_dir, \
.fpexc_mode = MSR_FE0 | MSR_FE1, \
SPEFSCR_INIT \
+ SR0_INIT \
}
#else
#define INIT_THREAD { \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index e563d3222d69..256aa669cf80 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -141,6 +141,7 @@ int main(void)
OFFSET(THR11, thread_struct, r11);
OFFSET(THLR, thread_struct, lr);
OFFSET(THCTR, thread_struct, ctr);
+ OFFSET(THSR0, thread_struct, sr0);
#endif
#ifdef CONFIG_SPE
OFFSET(THREAD_EVR0, thread_struct, evr[0]);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4ba6a8c43475..cf3cc0e52d07 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -76,15 +76,13 @@ _ASM_NOKPROBE_SYMBOL(prepare_transfer_to_handler)
#if defined(CONFIG_PPC_KUEP) && defined(CONFIG_PPC_BOOK3S_32)
.globl __kuep_lock
__kuep_lock:
- mfsr r9,0
- rlwinm r9,r9,0,8,3
- oris r9,r9,SR_NX@h
+ lwz r9, THREAD+THSR0(r2)
update_user_segments_by_4 r9, r10, r11, r12
blr
__kuep_unlock:
- mfsr r9,0
- rlwinm r9,r9,0,8,2
+ lwz r9, THREAD+THSR0(r2)
+ rlwinm r9,r9,0,~SR_NX
update_user_segments_by_4 r9, r10, r11, r12
blr
diff --git a/arch/powerpc/mm/book3s32/kuap.c b/arch/powerpc/mm/book3s32/kuap.c
index 0f920f09af57..28676cabb005 100644
--- a/arch/powerpc/mm/book3s32/kuap.c
+++ b/arch/powerpc/mm/book3s32/kuap.c
@@ -20,8 +20,11 @@ EXPORT_SYMBOL(kuap_unlock_all_ool);
void setup_kuap(bool disabled)
{
- if (!disabled)
+ if (!disabled) {
kuap_lock_all_ool();
+ init_mm.context.sr0 |= SR_KS;
+ current->thread.sr0 |= SR_KS;
+ }
if (smp_processor_id() != boot_cpuid)
return;
diff --git a/arch/powerpc/mm/book3s32/kuep.c b/arch/powerpc/mm/book3s32/kuep.c
index bac1420d028b..78fc48eee510 100644
--- a/arch/powerpc/mm/book3s32/kuep.c
+++ b/arch/powerpc/mm/book3s32/kuep.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-or-later
+#include <asm/code-patching.h>
#include <asm/kup.h>
#include <asm/smp.h>
diff --git a/arch/powerpc/mm/book3s32/mmu_context.c b/arch/powerpc/mm/book3s32/mmu_context.c
index e2708e387dc3..269a3eb25a73 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -69,6 +69,12 @@ EXPORT_SYMBOL_GPL(__init_new_context);
int init_new_context(struct task_struct *t, struct mm_struct *mm)
{
mm->context.id = __init_new_context();
+ mm->context.sr0 = CTX_TO_VSID(mm->context.id, 0);
+
+ if (!kuep_is_disabled())
+ mm->context.sr0 |= SR_NX;
+ if (!kuap_is_disabled())
+ mm->context.sr0 |= SR_KS;
return 0;
}
@@ -108,20 +114,13 @@ void __init mmu_context_init(void)
void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk)
{
long id = next->context.id;
- unsigned long val;
if (id < 0)
panic("mm_struct %p has no context ID", next);
isync();
- val = CTX_TO_VSID(id, 0);
- if (!kuep_is_disabled())
- val |= SR_NX;
- if (!kuap_is_disabled())
- val |= SR_KS;
-
- update_user_segments(val);
+ update_user_segments(next->context.sr0);
if (IS_ENABLED(CONFIG_BDI_SWITCH))
abatron_pteptrs[1] = next->pgd;
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 74246536b832..e618d5442a28 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -18,6 +18,9 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
{
/* 32-bit keeps track of the current PGDIR in the thread struct */
tsk->thread.pgdir = mm->pgd;
+#ifdef CONFIG_PPC_BOOK3S_32
+ tsk->thread.sr0 = mm->context.sr0;
+#endif
}
#elif defined(CONFIG_PPC_BOOK3E_64)
static inline void switch_mm_pgdir(struct task_struct *tsk,
--
2.31.1
^ permalink raw reply related
* [PATCH v3 06/22] powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
When interrupt and syscall entries where converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.
However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).
A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.
In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is counter
performant here.
So let's do it in assembly, when we have full control of which
register can be used. It also has the advantage of locking earlier
and unlocking later and it helps GCC generating less tricky code.
The only drawback is to make boot time deactivation less straight
forward and require 'hand' instruction patching.
Group 'mtsr's by 4.
With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/kup.h | 34 --------
arch/powerpc/include/asm/book3s/32/mmu-hash.h | 77 ++++++++++++++++++-
arch/powerpc/include/asm/interrupt.h | 6 +-
arch/powerpc/include/asm/kup.h | 5 --
arch/powerpc/kernel/entry_32.S | 31 ++++++++
arch/powerpc/kernel/head_32.h | 6 ++
arch/powerpc/kernel/head_book3s_32.S | 4 +
arch/powerpc/kernel/interrupt.c | 3 -
arch/powerpc/mm/book3s32/kuep.c | 2 -
9 files changed, 119 insertions(+), 49 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index fb6c39225dd1..e3db5ed4b255 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -23,40 +23,6 @@ static __always_inline bool kuep_is_disabled(void)
return !IS_ENABLED(CONFIG_PPC_KUEP);
}
-static inline void kuep_lock(void)
-{
- if (kuep_is_disabled())
- return;
-
- update_user_segments(mfsr(0) | SR_NX);
- /*
- * This isync() shouldn't be necessary as the kernel is not excepted to
- * run any instruction in userspace soon after the update of segments,
- * but hash based cores (at least G3) seem to exhibit a random
- * behaviour when the 'isync' is not there. 603 cores don't have this
- * behaviour so don't do the 'isync' as it saves several CPU cycles.
- */
- if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
- isync(); /* Context sync required after mtsr() */
-}
-
-static inline void kuep_unlock(void)
-{
- if (kuep_is_disabled())
- return;
-
- update_user_segments(mfsr(0) & ~SR_NX);
- /*
- * This isync() shouldn't be necessary as a 'rfi' will soon be executed
- * to return to userspace, but hash based cores (at least G3) seem to
- * exhibit a random behaviour when the 'isync' is not there. 603 cores
- * don't have this behaviour so don't do the 'isync' as it saves several
- * CPU cycles.
- */
- if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
- isync(); /* Context sync required after mtsr() */
-}
-
#ifdef CONFIG_PPC_KUAP
#include <linux/sched.h>
diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index f5be185cbdf8..e2f7ccc13edb 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -64,7 +64,82 @@ struct ppc_bat {
#define SR_KP 0x20000000 /* User key */
#define SR_KS 0x40000000 /* Supervisor key */
-#ifndef __ASSEMBLY__
+#ifdef __ASSEMBLY__
+
+#include <asm/asm-offsets.h>
+
+.macro uus_addi sr reg1 reg2 imm
+ .if NUM_USER_SEGMENTS > \sr
+ addi \reg1,\reg2,\imm
+ .endif
+.endm
+
+.macro uus_mtsr sr reg1
+ .if NUM_USER_SEGMENTS > \sr
+ mtsr \sr, \reg1
+ .endif
+.endm
+
+/*
+ * This isync() shouldn't be necessary as the kernel is not excepted to run
+ * any instruction in userspace soon after the update of segments and 'rfi'
+ * instruction is used to return to userspace, but hash based cores
+ * (at least G3) seem to exhibit a random behaviour when the 'isync' is not
+ * there. 603 cores don't have this behaviour so don't do the 'isync' as it
+ * saves several CPU cycles.
+ */
+.macro uus_isync
+#ifdef CONFIG_PPC_BOOK3S_604
+BEGIN_MMU_FTR_SECTION
+ isync
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
+#endif
+.endm
+
+.macro update_user_segments_by_4 tmp1 tmp2 tmp3 tmp4
+ uus_addi 1, \tmp2, \tmp1, 0x111
+ uus_addi 2, \tmp3, \tmp1, 0x222
+ uus_addi 3, \tmp4, \tmp1, 0x333
+
+ uus_mtsr 0, \tmp1
+ uus_mtsr 1, \tmp2
+ uus_mtsr 2, \tmp3
+ uus_mtsr 3, \tmp4
+
+ uus_addi 4, \tmp1, \tmp1, 0x444
+ uus_addi 5, \tmp2, \tmp2, 0x444
+ uus_addi 6, \tmp3, \tmp3, 0x444
+ uus_addi 7, \tmp4, \tmp4, 0x444
+
+ uus_mtsr 4, \tmp1
+ uus_mtsr 5, \tmp2
+ uus_mtsr 6, \tmp3
+ uus_mtsr 7, \tmp4
+
+ uus_addi 8, \tmp1, \tmp1, 0x444
+ uus_addi 9, \tmp2, \tmp2, 0x444
+ uus_addi 10, \tmp3, \tmp3, 0x444
+ uus_addi 11, \tmp4, \tmp4, 0x444
+
+ uus_mtsr 8, \tmp1
+ uus_mtsr 9, \tmp2
+ uus_mtsr 10, \tmp3
+ uus_mtsr 11, \tmp4
+
+ uus_addi 12, \tmp1, \tmp1, 0x444
+ uus_addi 13, \tmp2, \tmp2, 0x444
+ uus_addi 14, \tmp3, \tmp3, 0x444
+ uus_addi 15, \tmp4, \tmp4, 0x444
+
+ uus_mtsr 12, \tmp1
+ uus_mtsr 13, \tmp2
+ uus_mtsr 14, \tmp3
+ uus_mtsr 15, \tmp4
+
+ uus_isync
+.endm
+
+#else
/*
* This macro defines the mapping from contexts to VSIDs (virtual
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index a1d238255f07..3bbca1fbbe1e 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -139,12 +139,10 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
if (!arch_irq_disabled_regs(regs))
trace_hardirqs_off();
- if (user_mode(regs)) {
- kuep_lock();
+ if (user_mode(regs))
account_cpu_user_entry();
- } else {
+ else
kuap_save_and_lock(regs);
- }
#endif
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 8699ca5884b9..94734a8eb54d 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -40,11 +40,6 @@ void setup_kuep(bool disabled);
static inline void setup_kuep(bool disabled) { }
#endif /* CONFIG_PPC_KUEP */
-#ifndef CONFIG_PPC_BOOK3S_32
-static inline void kuep_lock(void) { }
-static inline void kuep_unlock(void) { }
-#endif
-
#ifdef CONFIG_PPC_KUAP
void setup_kuap(bool disabled);
#else
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 61fdd53cdd9a..4ba6a8c43475 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -73,6 +73,34 @@ prepare_transfer_to_handler:
_ASM_NOKPROBE_SYMBOL(prepare_transfer_to_handler)
#endif /* CONFIG_PPC_BOOK3S_32 || CONFIG_E500 */
+#if defined(CONFIG_PPC_KUEP) && defined(CONFIG_PPC_BOOK3S_32)
+ .globl __kuep_lock
+__kuep_lock:
+ mfsr r9,0
+ rlwinm r9,r9,0,8,3
+ oris r9,r9,SR_NX@h
+ update_user_segments_by_4 r9, r10, r11, r12
+ blr
+
+__kuep_unlock:
+ mfsr r9,0
+ rlwinm r9,r9,0,8,2
+ update_user_segments_by_4 r9, r10, r11, r12
+ blr
+
+.macro kuep_lock
+ bl __kuep_lock
+.endm
+.macro kuep_unlock
+ bl __kuep_unlock
+.endm
+#else
+.macro kuep_lock
+.endm
+.macro kuep_unlock
+.endm
+#endif
+
.globl transfer_to_syscall
transfer_to_syscall:
stw r11, GPR1(r1)
@@ -94,6 +122,7 @@ transfer_to_syscall:
SAVE_2GPRS(7, r1)
addi r2,r10,-THREAD
SAVE_NVGPRS(r1)
+ kuep_lock
/* Calling convention has r9 = orig r0, r10 = regs */
addi r10,r1,STACK_FRAME_OVERHEAD
@@ -110,6 +139,7 @@ ret_from_syscall:
cmplwi cr0,r5,0
bne- 2f
#endif /* CONFIG_PPC_47x */
+ kuep_unlock
lwz r4,_LINK(r1)
lwz r5,_CCR(r1)
mtlr r4
@@ -273,6 +303,7 @@ interrupt_return:
beq .Lkernel_interrupt_return
bl interrupt_exit_user_prepare
cmpwi r3,0
+ kuep_unlock
bne- .Lrestore_nvgprs
.Lfast_user_interrupt_return:
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 6b1ec9e3541b..133197039775 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -136,6 +136,12 @@ _ASM_NOKPROBE_SYMBOL(\name\()_virt)
andi. r12,r9,MSR_PR
bne 777f
bl prepare_transfer_to_handler
+#ifdef CONFIG_PPC_KUEP
+ b 778f
+777:
+ bl __kuep_lock
+778:
+#endif
777:
#endif
.endm
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 68e5c0a7e99d..fa84744d6b24 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -931,7 +931,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
_GLOBAL(load_segment_registers)
li r0, NUM_USER_SEGMENTS /* load up user segment register values */
mtctr r0 /* for context 0 */
+#ifdef CONFIG_PPC_KUEP
+ lis r3, SR_NX@h /* Kp = 0, Ks = 0, VSID = 0 */
+#else
li r3, 0 /* Kp = 0, Ks = 0, VSID = 0 */
+#endif
li r4, 0
3: mtsrin r3, r4
addi r3, r3, 0x111 /* increment VSID */
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index de10a2697258..0d12aa66e1f9 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -81,8 +81,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
{
syscall_fn f;
- kuep_lock();
-
regs->orig_gpr3 = r3;
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
@@ -406,7 +404,6 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
/* Restore user access locks last */
kuap_user_restore(regs);
- kuep_unlock();
return ret;
}
diff --git a/arch/powerpc/mm/book3s32/kuep.c b/arch/powerpc/mm/book3s32/kuep.c
index 8474edce3df9..bac1420d028b 100644
--- a/arch/powerpc/mm/book3s32/kuep.c
+++ b/arch/powerpc/mm/book3s32/kuep.c
@@ -5,8 +5,6 @@
void setup_kuep(bool disabled)
{
- kuep_lock();
-
if (smp_processor_id() != boot_cpuid)
return;
--
2.31.1
^ permalink raw reply related
* [PATCH v3 02/22] powerpc/8xx: Activate KUEP at all time
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups with contain all user pages as having swapped
protection rights, in extenso EX for user and NA for supervisor.
Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 6 ++----
arch/powerpc/mm/nohash/8xx.c | 5 -----
arch/powerpc/platforms/Kconfig.cputype | 1 +
3 files changed, 3 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 997cec973406..0e93a4728c9e 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -39,12 +39,10 @@
* 0 => Kernel => 11 (all accesses performed according as user iaw page definition)
* 1 => Kernel+Accessed => 01 (all accesses performed according to page definition)
* 2 => User => 11 (all accesses performed according as user iaw page definition)
- * 3 => User+Accessed => 00 (all accesses performed as supervisor iaw page definition) for INIT
- * => 10 (all accesses performed according to swaped page definition) for KUEP
+ * 3 => User+Accessed => 10 (all accesses performed according to swaped page definition) for KUEP
* 4-15 => Not Used
*/
-#define MI_APG_INIT 0xdc000000
-#define MI_APG_KUEP 0xde000000
+#define MI_APG_INIT 0xde000000
/* The effective page number register. When read, contains the information
* about the last instruction TLB miss. When MI_RPN is written, bits in
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index baa1f8a40af8..e878e8124ee6 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -215,12 +215,7 @@ void __init setup_initial_memory_limit(phys_addr_t first_memblock_base,
#ifdef CONFIG_PPC_KUEP
void setup_kuep(bool disabled)
{
- if (disabled)
- return;
-
pr_info("Activating Kernel Userspace Execution Prevention\n");
-
- mtspr(SPRN_MI_AP, MI_APG_KUEP);
}
#endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index a208997ade88..66650ec1c7e6 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -43,6 +43,7 @@ config PPC_8xx
select ARCH_SUPPORTS_HUGETLBFS
select FSL_SOC
select PPC_HAVE_KUEP
+ select PPC_KUEP
select PPC_HAVE_KUAP
select HAVE_ARCH_VMAP_STACK
select HUGETLBFS
--
2.31.1
^ permalink raw reply related
* [PATCH v3 11/22] powerpc/kuap: Remove __kuap_assert_locked()
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
__kuap_assert_locked() is redundant with
__kuap_get_and_assert_locked().
Move the verification of CONFIG_PPC_KUAP_DEBUG in kuap_assert_locked()
and make it call __kuap_get_and_assert_locked() directly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/book3s/32/kup.h | 5 -----
arch/powerpc/include/asm/book3s/64/kup.h | 6 ------
arch/powerpc/include/asm/kup.h | 3 ++-
arch/powerpc/include/asm/nohash/32/kup-8xx.h | 6 ------
4 files changed, 2 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 35ca48f7c293..bc245e9f0bcc 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -117,11 +117,6 @@ static inline unsigned long __kuap_get_and_assert_locked(void)
return kuap;
}
-static inline void __kuap_assert_locked(void)
-{
- __kuap_get_and_assert_locked();
-}
-
static __always_inline void __allow_user_access(void __user *to, const void __user *from,
u32 size, unsigned long dir)
{
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 9f2099790658..503828709d55 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -298,12 +298,6 @@ static inline unsigned long __kuap_get_and_assert_locked(void)
return amr;
}
-static inline void __kuap_assert_locked(void)
-{
- if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
- WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
-}
-
/*
* We support individually allowing read or write, but we don't support nesting
* because that would require an expensive read/modify write of the AMR.
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 33e93a6c5d19..5d3c1e8060f9 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -86,7 +86,8 @@ static __always_inline void kuap_assert_locked(void)
if (kuap_is_disabled())
return;
- __kuap_assert_locked();
+ if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
+ __kuap_get_and_assert_locked();
}
#ifdef CONFIG_PPC32
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 74f15c386476..37fe4b32b658 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -47,12 +47,6 @@ static inline unsigned long __kuap_get_and_assert_locked(void)
return kuap;
}
-static inline void __kuap_assert_locked(void)
-{
- if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
- __kuap_get_and_assert_locked();
-}
-
static inline void __allow_user_access(void __user *to, const void __user *from,
unsigned long size, unsigned long dir)
{
--
2.31.1
^ permalink raw reply related
* [PATCH v3 01/22] Revert "powerpc: Inline setup_kup()"
From: Christophe Leroy @ 2021-10-19 7:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cover.1634627931.git.christophe.leroy@csgroup.eu>
This reverts commit 1791ebd131c46539b024c0f2ebf12b6c88a265b9.
setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.
But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.
In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.
Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/kup.h | 8 ++------
arch/powerpc/mm/init-common.c | 6 ++++++
arch/powerpc/mm/nohash/8xx.c | 4 ++--
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 1df763002726..8699ca5884b9 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -32,6 +32,8 @@ extern bool disable_kuap;
#include <linux/pgtable.h>
+void setup_kup(void);
+
#ifdef CONFIG_PPC_KUEP
void setup_kuep(bool disabled);
#else
@@ -78,12 +80,6 @@ static inline void restore_user_access(unsigned long flags) { }
#endif /* CONFIG_PPC_BOOK3S_64 */
#endif /* CONFIG_PPC_KUAP */
-static __always_inline void setup_kup(void)
-{
- setup_kuep(disable_kuep);
- setup_kuap(disable_kuap);
-}
-
static __always_inline void allow_read_from_user(const void __user *from, unsigned long size)
{
barrier_nospec();
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 3a82f89827a5..b4f3437aee38 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -47,6 +47,12 @@ static int __init parse_nosmap(char *p)
}
early_param("nosmap", parse_nosmap);
+void setup_kup(void)
+{
+ setup_kuap(disable_kuap);
+ setup_kuep(disable_kuep);
+}
+
#define CTOR(shift) static void ctor_##shift(void *addr) \
{ \
memset(addr, 0, sizeof(void *) << (shift)); \
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 0df9fe29dd56..baa1f8a40af8 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -213,7 +213,7 @@ void __init setup_initial_memory_limit(phys_addr_t first_memblock_base,
}
#ifdef CONFIG_PPC_KUEP
-void __init setup_kuep(bool disabled)
+void setup_kuep(bool disabled)
{
if (disabled)
return;
@@ -228,7 +228,7 @@ void __init setup_kuep(bool disabled)
struct static_key_false disable_kuap_key;
EXPORT_SYMBOL(disable_kuap_key);
-void __init setup_kuap(bool disabled)
+void setup_kuap(bool disabled)
{
if (disabled) {
static_branch_enable(&disable_kuap_key);
--
2.31.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox