LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [next-20210827][ppc][multipathd] INFO: task hung in dm_table_add_target
From: Christoph Hellwig @ 2021-09-01 13:36 UTC (permalink / raw)
  To: Abdul Haleem
  Cc: axboe, sachinp, jack, linux-scsi, linux-kernel, dm-devel,
	linux-next, dougmill, Brian King, linuxppc-dev, hch
In-Reply-To: <68dde454-965a-0c44-374a-a0ca277150ee@linux.vnet.ibm.com>

On Wed, Sep 01, 2021 at 04:47:26PM +0530, Abdul Haleem wrote:
> Greeting's
>
> multiple task hung while adding the vfc disk back to the multipath on my 
> powerpc box running linux-next kernel

Can you retry to reproduce this with lockdep enabled to see if there
is anything interesting holding this lock?

^ permalink raw reply

* [PATCH v2] powerpc/bug: Cast to unsigned long before passing to inline asm
From: Michael Ellerman @ 2021-09-01 11:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nathan

In commit 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
it would take the warning condition, x, and double negate it before
converting the result to int, and passing that int to the underlying
inline asm. ie:

  #define WARN_ON(x) ({
  	int __ret_warn_on = !!(x);
  	if (__builtin_constant_p(__ret_warn_on)) {
  	...
  	} else {
  		BUG_ENTRY(PPC_TLNEI " %4, 0",
  			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
  			  "r" (__ret_warn_on));

The asm then does a full register width comparison with zero and traps
if it is non-zero (PPC_TLNEI).

The new code instead passes the full expression, x, with some arbitrary
type, to the inline asm:

  #define WARN_ON(x) ({
	...
	do {
		if (__builtin_constant_p((x))) {
		...
		} else {
			...
			WARN_ENTRY(PPC_TLNEI " %4, 0",
				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
				   __label_warn_on, "r" (x));

As reported[1] by Nathan, when building with clang this can cause
spurious warnings to fire repeatedly at boot:

  WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
  NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
  REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
  MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
  CFAR: c00000000090a034 IRQMASK: 0
  GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
  GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
  GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
  GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
  GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
  NIP .klist_add_tail+0x3c/0x110
  LR  .bus_add_driver+0x148/0x290
  Call Trace:
    0xc0000000073c35d0 (unreliable)
    .bus_add_driver+0x148/0x290
    .driver_register+0xb8/0x190
    .__hid_register_driver+0x70/0xd0
    .redragon_driver_init+0x34/0x58
    .do_one_initcall+0x130/0x3b0
    .do_initcall_level+0xd8/0x188
    .do_initcalls+0x7c/0xdc
    .kernel_init_freeable+0x178/0x21c
    .kernel_init+0x34/0x220
    .ret_from_kernel_thread+0x58/0x60
  Instruction dump:
  fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
  fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024

The instruction dump shows that we are trapping because r30 is not zero:
  tdnei   r30,0

Where r30 = c000000007de72c8

The WARN_ON() comes from:

  static void knode_set_klist(struct klist_node *knode, struct klist *klist)
  {
  	knode->n_klist = klist;
  	/* no knode deserves to start its life dead */
  	WARN_ON(knode_dead(knode));
  		      ^^^^^^^^^^^^^^^^^

Where:

  #define KNODE_DEAD		1LU

  static bool knode_dead(struct klist_node *knode)
  {
  	return (unsigned long)knode->n_klist & KNODE_DEAD;
  }

The full disassembly shows that clang has not generated any code to
apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.

Nathan filed an LLVM bug [2], in which Eli Friedman explained that clang
believes it is only passing a single bit to the asm (ie. a bool) and so
the mask of bit 0 with 1 can be omitted, and suggested that if we want
the full 64-bit value passed to the inline asm we should cast to a
64-bit type (or 32-bit on 32-bits).

In fact we already do that for BUG_ENTRY(), which was added to fix a
possibly similar bug in 2005 in commit 32818c2eb6b8 ("[PATCH] ppc64: Fix
issue with gcc 4.0 compiled kernels").

So cast the value we pass to the inline asm to long.

For GCC this appears to have no effect on code generation, other than
causing sign extension in some cases.

[1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
[2]: https://bugs.llvm.org/show_bug.cgi?id=51634

Fixes: 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/bug.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

v2: Reword the change log a bit to hopefully make it clearer that it's clang that believes
it only needs to pass a single bit for bool, whether that's correct behaviour can be
discussed on the list at a later date :)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..02c08d1492f8 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,8 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on,		\
+				   "r" ((__force long)(x)));	\
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\
-- 
2.25.1


^ permalink raw reply related

* [next-20210827][ppc][multipathd] INFO: task hung in dm_table_add_target
From: Abdul Haleem @ 2021-09-01 11:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: axboe, sachinp, jack, linux-scsi, dm-devel, linux-next, dougmill,
	Brian King, linuxppc-dev, hch

Greeting's

multiple task hung while adding the vfc disk back to the multipath on my 
powerpc box running linux-next kernel

Test:
$ multipathd -k"remove path sde"
$ multipathd -k"add path sde"

After above command multipathd task hung with continuous call traces on 
console, it requires reboot to stop call traces

systemd-udevd[180359]: Process '/sbin/mdadm -I /dev/dm-8' failed with 
exit code 1.
multipathd[180272]: mpathf: load table [0 62914560 multipath 1 
queue_if_no_path 1 alua 2 1 round-robin 0 2 1 8:64 1 65:0 1 round-robin 
0 2 1 8:144 1 65:80 1]
systemd[1]: Started Device-Mapper Multipath Device Controller.
multipathd[180272]: mpatha: sdl - tur checker reports path is up
multipathd[180272]: 8:176: reinstated
multipathd[180272]: mpatha: remaining active paths: 2
multipathd[180272]: sde: remove path (operator)
multipathd[180272]: mpathf: load table [0 62914560 multipath 1 
queue_if_no_path 1 alua 2 1 round-robin 0 1 1 65:0 1 round-robin 0 2 1 
8:144 1 65:80 1]
multipathd[180272]: sde [8:64]: path removed from map mpathf
multipathd[180272]: sdq: remove path (operator)
multipathd[180272]: mpathf: load table [0 62914560 multipath 1 
queue_if_no_path 1 alua 1 1 round-robin 0 2 1 8:144 1 65:80 1]
multipathd[180272]: sdq [65:0]: path removed from map mpathf
multipathd[180272]: sdj: remove path (operator)
multipathd[180272]: mpathf: load table [0 62914560 multipath 1 
queue_if_no_path 1 alua 1 1 round-robin 0 1 1 65:80 1]
multipathd[180272]: sdj [8:144]: path removed from map mpathf
multipathd[180272]: sdv: remove path (operator)
multipathd[180272]: mpathf: map in use
multipathd[180272]: mpathf: can't flush
multipathd[180272]: mpathf: load table [0 62914560 multipath 1 
queue_if_no_path 0 0 0]
multipathd[180272]: sdv [65:80]: path removed from map mpathf
systemd[1]: Starting system activity accounting tool...
systemd[1]: sysstat-collect.service: Succeeded.
systemd[1]: Started system activity accounting tool.
systemd-udevd[1156]: seq 5678 '/devices/virtual/block/dm-10' is taking a 
long time
systemd-udevd[1156]: seq 5678 '/devices/virtual/block/dm-10' killed
multipathd[180272]: sde: add path (operator)
systemd[1]: Stopping Device-Mapper Multipath Device Controller...
systemd[1]: multipathd.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: multipathd.service: Killing process 180272 (multipathd) with 
signal SIGKILL.
systemd[1]: multipathd.service: Processes still around after SIGKILL. 
Ignoring.
dbus-daemon[1920]: [system] Activating via systemd: service 
name='net.reactivated.Fprint' unit='fprintd.service' requested by 
':1.255' (uid=0 pid=95173 comm="/bin/login -p --      ")
systemd[1]: Starting Fingerprint Authentication Daemon...
dbus-daemon[1920]: [system] Successfully activated service 
'net.reactivated.Fprint'
systemd[1]: Started Fingerprint Authentication Daemon.
systemd-logind[2032]: New session 6 of user root.
systemd[1]: Started Session 6 of user root.
kernel: INFO: task multipathd:180274 blocked for more than 122 seconds.
kernel:      Not tainted 5.14.0-rc7-next-20210827-autotest #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
kernel: task:multipathd      state:D stack:    0 pid:180274 ppid:     1 
flags:0x00040082
kernel: Call Trace:
kernel: [c00000002aaa72c0] [c00000002aaa73a0] 0xc00000002aaa73a0 
(unreliable)
kernel: [c00000002aaa74b0] [c00000000001e638] __switch_to+0x278/0x490
kernel: [c00000002aaa7510] [c000000000c89cfc] __schedule+0x31c/0xa10
kernel: [c00000002aaa75d0] [c000000000c8a468] schedule+0x78/0x130
kernel: [c00000002aaa7600] [c000000000c8aa08] 
schedule_preempt_disabled+0x18/0x30
kernel: [c00000002aaa7620] [c000000000c8cf0c] 
__mutex_lock.isra.11+0x36c/0x700
kernel: [c00000002aaa76b0] [c00000000067efe4] bd_link_disk_holder+0x34/0x270
kernel: [c00000002aaa7700] [c0080000003e3b78] 
dm_get_table_device+0x1e0/0x2c0 [dm_mod]
kernel: [c00000002aaa77a0] [c0080000003e7e48] dm_get_device+0x130/0x2f0 
[dm_mod]
kernel: [c00000002aaa7850] [c008000000745234] multipath_ctr+0x9bc/0xff0 
[dm_multipath]
kernel: [c00000002aaa79d0] [c0080000003e883c] 
dm_table_add_target+0x1a4/0x420 [dm_mod]
kernel: [c00000002aaa7a90] [c0080000003ee874] table_load+0x15c/0x4a0 
[dm_mod]
kernel: [c00000002aaa7b40] [c0080000003f1454] ctl_ioctl+0x27c/0x770 [dm_mod]
kernel: [c00000002aaa7d40] [c0080000003f1960] dm_ctl_ioctl+0x18/0x30 
[dm_mod]
kernel: [c00000002aaa7d60] [c000000000481198] sys_ioctl+0xf8/0x150
kernel: [c00000002aaa7db0] [c00000000002ff28] 
system_call_exception+0x158/0x2c0
kernel: [c00000002aaa7e10] [c00000000000c764] system_call_common+0xf4/0x258
kernel: --- interrupt: c00 at 0x7fffa06f4480
kernel: NIP:  00007fffa06f4480 LR: 00007fffa0ad6714 CTR: 0000000000000000
kernel: REGS: c00000002aaa7e80 TRAP: 0c00   Not tainted 
(5.14.0-rc7-next-20210827-autotest)
kernel: MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 24042204  
XER: 00000000
kernel: IRQMASK: 0 #012GPR00: 0000000000000036 00007fff9fd4c3a0 
00007fffa07e7100 0000000000000005 #012GPR04: 00000000c138fd09 
00007fff9806a0c0 00007fffa0ad9f18 00007fff9fd4a298 #012GPR08: 
0000000000000005 0000000000000000 0000000000000000 0000000000000000 
#012GPR12: 0000000000000000 00007fff9fd56300 00007fff9806a0c0 
00007fffa0ad9c80 #012GPR16: 00007fffa0ad9c80 00007fffa0ad9c80 
00007fffa0b13670 0000000000000000 #012GPR20: 00007fffa0ae3260 
00007fffa0b12040 00007fff9806a0f0 00007fff9800eca0 #012GPR24: 
00007fffa0ad9c80 00007fffa0ad9c80 00007fffa0ad9c80 0000000000000000 
#012GPR28: 00007fffa0ad9c80 00007fffa0ad9c80 0000000000000000 
00007fffa0ad9c80
kernel: NIP [00007fffa06f4480] 0x7fffa06f4480
kernel: LR [00007fffa0ad6714] 0x7fffa0ad6714
kernel: --- interrupt: c00
kernel: INFO: task systemd-udevd:180404 blocked for more than 122 seconds.
kernel:      Not tainted 5.14.0-rc7-next-20210827-autotest #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
kernel: task:systemd-udevd   state:D stack:    0 pid:180404 ppid:  1156 
flags:0x00042482
kernel: Call Trace:

Before test
------------
$multipath -ll
mpathf (360050768108001b3a8000000000000e8) dm-10 IBM,2145
size=30G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:4 sde 8:64  active ready running
| `- 2:0:0:4 sdq 65:0  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
   |- 1:0:1:4 sdj 8:144 active ready running
   `- 2:0:1:4 sdv 65:80 active ready running
$

After test fail
------------
$ multipath -ll
mpathf (360050768108001b3a8000000000000e8) dm-10 ##,##
size=30G features='1 queue_if_no_path' hwhandler='0' wp=rw
$

-- 
Regard's

Abdul Haleem
IBM Linux Technology Center


^ permalink raw reply

* Re: [next-20210820][ppc][nvme/raid] pci unbind WARNS at fs/kernfs/dir.c:1524 kernfs_remove_by_name_ns+
From: Abdul Haleem @ 2021-09-01 10:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Brian King, linux-next, linuxppc-dev, linux-kernel, linux-scsi
In-Reply-To: <YSfBwj1zo/w2GCH4@infradead.org>

Thanks Christoph,

The problem is fixed with below commit and tested on next-20210823

Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>

On 8/26/21 10:00 PM, Christoph Hellwig wrote:

> Are you sure this is the very latest linux-next?  This should hav been
> fixed by:
>
>      "block: add back the bd_holder_dir reference in bd_link_disk_holder"

-- 
Regard's

Abdul Haleem
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v2 3/3] powerpc/numa: Fill distance_lookup_table for offline nodes
From: Srikar Dronamraju @ 2021-09-01 10:22 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Vincent Guittot,
	kernel test robot, Peter Zijlstra, Geetika Moolchandani,
	Valentin Schneider, Laurent Dufour, linuxppc-dev, Ingo Molnar
In-Reply-To: <875yvsba4q.fsf@mpe.ellerman.id.au>

* Michael Ellerman <mpe@ellerman.id.au> [2021-08-26 23:36:53]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> > Scheduler expects unique number of node distances to be available at
> > boot.
> 
> I think it needs "the number of unique node distances" ?
> 
> > It uses node distance to calculate this unique node distances.
> 
> It iterates over all pairs of nodes and records node_distance() for that
> pair, and then calculates the set of unique distances.
> 
> > On POWER, node distances for offline nodes is not available. However,
> > POWER already knows unique possible node distances.
> 
> I think it would be more accurate to say PAPR rather than POWER there.
> It's PAPR that defines the way we determine distances and imposes that
> limitation.
> 

Okay, will do all the necessary modifications as suggested above.

> > Fake the offline node's distance_lookup_table entries so that all
> > possible node distances are updated.
> 
> Does this work if we have a single node offline at boot?
> 

It should.

> Say we start with:
> 
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> 
> And node 2 is offline at boot. We can only initialise that nodes entries
> in the distance_lookup_table:
> 
> 		while (i--)
> 			distance_lookup_table[node][i] = node;
> 
> By filling them all with 2 that causes node_distance(2, X) to return the
> maximum distance for all other nodes X, because we won't break out of
> the loop in __node_distance():
> 
> 	for (i = 0; i < distance_ref_points_depth; i++) {
> 		if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
> 			break;
> 
> 		/* Double the distance for each NUMA level */
> 		distance *= 2;
> 	}
> 
> If distance_ref_points_depth was 4 we'd return 160.

As you already know, distance 10, 20, .. are defined by Powerpc, form1
affinity. PAPR doesn't define actual distances, it only provides us the
associativity. If there are distance_ref_points_depth is 4,
(distance_ref_points_depth doesn't take local distance into consideration)
10, 20, 40, 80, 160.

> 
> That'd leave us with 3 unique distances at boot, 10, 20, 160.
> 

So if there are unique distances, then the distances as per the current
code has to be 10, 20, 40, 80.. I dont see a way in which we have a break in
the series. like having 160 without 80.

> But when node 2 comes online it might introduce more than 1 new distance
> value, eg. it could be that the actual distances are:
> 
> node distances:
> node   0   1   2
>   0:  10  20  40
>   1:  20  10  80
>   2:  40  80  10
> 
> ie. we now have 4 distances, 10, 20, 40, 80.
> 
> What am I missing?
> 

As I said above, I am not sure how we can have a break in the series.
If distance_ref_points_depth is 3, the distances has to be 10,20,40,80 as
atleast for form1 affinity.

> > However this only needs to be done if the number of unique node
> > distances that can be computed for online nodes is less than the
> > number of possible unique node distances as represented by
> > distance_ref_points_depth.
> 
> Looking at a few machines they all have distance_ref_points_depth = 2.
> 
> So maybe that explains it, in practice we only see 10, 20, 40.
> 
> > When the node is actually onlined, distance_lookup_table will be
> > updated with actual entries.
> 
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
> > Cc: Laurent Dufour <ldufour@linux.ibm.com>
> > Cc: kernel test robot <lkp@intel.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/mm/numa.c | 70 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 70 insertions(+)
> >
> > Changelog:
> > v1: https://lore.kernel.org/linuxppc-dev/20210701041552.112072-3-srikar@linux.vnet.ibm.com/t/#u
> > [ Fixed a missing prototype warning Reported-by: kernel test robot <lkp@intel.com>]
> >
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 3c124928a16d..0ee79a08c9e1 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -856,6 +856,75 @@ void __init dump_numa_cpu_topology(void)
> >  	}
> >  }
> >  
> > +/*
> > + * Scheduler expects unique number of node distances to be available at
> > + * boot. It uses node distance to calculate this unique node distances. On
> > + * POWER, node distances for offline nodes is not available. However, POWER
> > + * already knows unique possible node distances. Fake the offline node's
> > + * distance_lookup_table entries so that all possible node distances are
> > + * updated.
> > + */
> 
> > +static void __init fake_update_distance_lookup_table(void)
> > +{
> > +	unsigned long distance_map;
> > +	int i, nr_levels, nr_depth, node;
> 
> Are they distances, depths, or levels? :)
> 
> Bit more consistency in the variable names might make the code easier to
> follow.
> 
> > +
> > +	if (!numa_enabled)
> > +		return;
> > +
> > +	if (!form1_affinity)
> > +		return;
> 
> That doesn't exist since Aneesh's FORM2 series, so that will need a
> rebase, and possibly some more rework to interact with that series.
> 

We only have to handle for form1, so it should be easier to handle.

> > +	/*
> > +	 * distance_ref_points_depth lists the unique numa domains
> > +	 * available. However it ignore LOCAL_DISTANCE. So add +1
> > +	 * to get the actual number of unique distances.
> > +	 */
> > +	nr_depth = distance_ref_points_depth + 1;
> 
> num_depths would be a better name IMHO.

Okay,
s/nr_depth/num_depths/g
s/nr_level/depth/g

> 
> > +
> > +	WARN_ON(nr_depth > sizeof(distance_map));
> 
> Warn but then continue, and corrupt something on the stack? Seems like a
> bad idea :)
> 
> I guess it's too early to use bitmap_alloc(). But can we at least return
> if nr_depth is too big.

Yes, we can't use bitmap_alloc here.
Now should we continue if nr_depth is greater than sizeof(distance_map)

If we don't and return immediately, then we can end up not creating enough
scheduler domains and may later on lead to build_sched_domain OOPs, when we
online nodes.

However if don't return, chance of surviving when the domains are actually
onlined is more.
We could probably reset nr_depth to be same as sizeof(distance_map).

That said, I think we are too far away from nr_depths being anywhere closer
to sizeof(long). So I am okay either way.

> 
> > +
> > +	bitmap_zero(&distance_map, nr_depth);
> > +	bitmap_set(&distance_map, 0, 1);
> > +
> > +	for_each_online_node(node) {
> > +		int nd, distance = LOCAL_DISTANCE;
> > +
> > +		if (node == first_online_node)
> > +			continue;
> > +
> > +		nd = __node_distance(node, first_online_node);
> > +		for (i = 0; i < nr_depth; i++, distance *= 2) {
> 
> 		for (i = 0, distance = LOCAL_DISTANCE; i < nr_depth; i++, distance *= 2) {
> 
> Could make it clearer what the for loop is doing I think.
> 
> > +			if (distance == nd) {
> > +				bitmap_set(&distance_map, i, 1);
> > +				break;
> > +			}
> > +		}
> > +		nr_levels = bitmap_weight(&distance_map, nr_depth);
> > +		if (nr_levels == nr_depth)
> > +			return;
> > +	}
> > +
> > +	for_each_node(node) {
> > +		if (node_online(node))
> > +			continue;
> > +
> > +		i = find_first_zero_bit(&distance_map, nr_depth);
> > +		if (i >= nr_depth || i == 0) {
> 
> Neither of those can happen can they?
> 
> We checked the bitmap weight in the previous for loop, or at the bottom
> of this one, and returned if we'd filled the map already.
> 
> And we set bit zero explicitly with bitmap_set().
> 

Agree, I can drop the hunk.

> > +			pr_warn("Levels(%d) not matching levels(%d)", nr_levels, nr_depth);
> > +			return;
> > +		}


> > +
> > +		bitmap_set(&distance_map, i, 1);
> > +		while (i--)
> > +			distance_lookup_table[node][i] = node;
> 
> That leaves distance_lookup_table[node][i+1] and so on uninitialised, or
> initialised to zero because it's static, is that OK?

Yes, this should be fine, because we are only interested in finding number
of unique numa distances, By the time actual distances come and overwrite,
we would no more use these fake distances.

But if you are comfortable with updating for all the depths, I can update
too.

> 
> > +		nr_levels = bitmap_weight(&distance_map, nr_depth);
> > +		if (nr_levels == nr_depth)
> > +			return;
> > +	}
> > +}
> > +
> >  /* Initialize NODE_DATA for a node on the local memory */
> >  static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> >  {
> > @@ -971,6 +1040,7 @@ void __init mem_topology_setup(void)
> >  		 */
> >  		numa_setup_cpu(cpu);
> >  	}
> > +	fake_update_distance_lookup_table();
> >  }
> >  
> >  void __init initmem_init(void)
> 
> 
> cheers

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH v3] powerpc/32: Add support for out-of-line static calls
From: Peter Zijlstra @ 2021-09-01  8:52 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-kernel, Steven Rostedt, Jason Baron, Paul Mackerras,
	Josh Poimboeuf, linuxppc-dev, Ard Biesheuvel
In-Reply-To: <6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu>

On Wed, Sep 01, 2021 at 08:30:21AM +0000, Christophe Leroy wrote:
> Add support for out-of-line static calls on PPC32. This change
> improve performance of calls to global function pointers by
> using direct calls instead of indirect calls.
> 
> The trampoline is initialy populated with a 'blr' or branch to target,
> followed by an unreachable long jump sequence.
> 
> In order to cater with parallele execution, the trampoline needs to
> be updated in a way that ensures it remains consistent at all time.
> This means we can't use the traditional lis/addi to load r12 with
> the target address, otherwise there would be a window during which
> the first instruction contains the upper part of the new target
> address while the second instruction still contains the lower part of
> the old target address. To avoid that the target address is stored
> just after the 'bctr' and loaded from there with a single instruction.
> 
> Then, depending on the target distance, arch_static_call_transform()
> will either replace the first instruction by a direct 'bl <target>' or
> 'nop' in order to have the trampoline fall through the long jump
> sequence.
> 
> For the special case of __static_call_return0(), to avoid the risk of
> a far branch, a version of it is inlined at the end of the trampoline.

(also, it's in the same line, so it avoids another cachemiss and it
nicely fills the hole you had in your 32byte chunk)

> Performancewise the long jump sequence is probably not better than
> the indirect calls set by GCC when we don't use static calls, but
> such calls are unlikely to be required on powerpc32: With most
> configurations the kernel size is far below 32 Mbytes so only
> modules may happen to be too far. And even modules are likely to
> be close enough as they are allocated below the kernel core and
> as close as possible of the kernel text.
> 
> static_call selftest is running successfully with this change.

Nice!, I'd ask if you'd tried PREEMPT_DYNAMIC, since that should really
stress the thing, but I see that also requires GENERIC_ENTRY and you
don't have that. Alas.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply

* [PATCH kernel] KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
From: Alexey Kardashevskiy @ 2021-09-01  8:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc

H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
in __alloc_pages() by constructing a tiny VM which only does
H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy).

This silences the warning by adding __GFP_NOWARN.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index e57c08b968c0..a2e34efb8d31 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -580,7 +580,7 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 	if (eaddr & (0xFFFUL << 52))
 		return H_PARAMETER;
 
-	buf = kzalloc(n, GFP_KERNEL);
+	buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
 	if (!buf)
 		return H_NO_MEM;
 
-- 
2.30.2


^ permalink raw reply related

* [PATCH kernel] KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
From: Alexey Kardashevskiy @ 2021-09-01  8:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc

The userspace can trigger "vmalloc size %lu allocation failure: exceeds
total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.

This silences the warning by checking the limit before calling vzalloc()
and returns ENOMEM if failed.

This does not call underlying valloc helpers as __vmalloc_node() is only
exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is not
exported at all.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_hv.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 474c0cfde384..a59f1cccbcf9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4830,8 +4830,12 @@ static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
 	unsigned long npages = mem->memory_size >> PAGE_SHIFT;
 
 	if (change == KVM_MR_CREATE) {
-		slot->arch.rmap = vzalloc(array_size(npages,
-					  sizeof(*slot->arch.rmap)));
+		unsigned long cb = array_size(npages, sizeof(*slot->arch.rmap));
+
+		if ((cb >> PAGE_SHIFT) > totalram_pages())
+			return -ENOMEM;
+
+		slot->arch.rmap = vzalloc(cb);
 		if (!slot->arch.rmap)
 			return -ENOMEM;
 	}
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH kernel] KVM: PPC: Book3S HV: Make unique debugfs nodename
From: Alexey Kardashevskiy @ 2021-09-01  8:43 UTC (permalink / raw)
  To: Fabiano Rosas, Michael Ellerman; +Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <a1be1913-f564-924b-1750-03efa859a0b1@ozlabs.ru>



On 24/08/2021 18:37, Alexey Kardashevskiy wrote:
> 
> 
> On 18/08/2021 08:20, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>
>>> On 07/07/2021 14:13, Alexey Kardashevskiy wrote:
>>
>>> alternatively move this debugfs stuff under the platform-independent
>>> directory, how about that?
>>
>> That's a good idea. I only now realized we have two separate directories
>> for the same guest:
>>
>> $ ls /sys/kernel/debug/kvm/ | grep $pid
>> 19062-11
>> vm19062
>>
>> Looks like we would have to implement kvm_arch_create_vcpu_debugfs for
>> the vcpu information and add a similar hook for the vm.
> 
> Something like that. From the git history, it looks like the ppc folder 
> was added first and then the generic kvm folder was added but apparently 
> they did not notice the ppc one due to natural reasons :)
> 
> If you are not too busy, can you please merge the ppc one into the 
> generic one and post the patch, so we won't need to fix these 
> duplication warnings again? Thanks,



Turns out it is not that straight forward as I thought as the common KVM 
debugfs entry is created after PPC HV KVM created its own and there is 
no obvious way to change the order (no "post init" hook in kvmppc_ops).

Also, unlike the common KVM debugfs setup, we do not allocate structures 
to support debugfs nodes so we do not leak anything to bother with a 
mutex like 85cd39af14f4 did.

So I'd stick to the original patch to reduce the noise in the dmesg, and 
it also exposes lpid which I find rather useful for finding the right 
partition scope tree in partition_tb.

Michael?


> 
> 
> 
>>>> ---
>>>>    arch/powerpc/kvm/book3s_hv.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c 
>>>> b/arch/powerpc/kvm/book3s_hv.c
>>>> index 1d1fcc290fca..0223ddc0eed0 100644
>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>> @@ -5227,7 +5227,7 @@ static int kvmppc_core_init_vm_hv(struct kvm 
>>>> *kvm)
>>>>        /*
>>>>         * Create a debugfs directory for the VM
>>>>         */
>>>> -    snprintf(buf, sizeof(buf), "vm%d", current->pid);
>>>> +    snprintf(buf, sizeof(buf), "vm%d-lp%ld", current->pid, lpid);
>>>>        kvm->arch.debugfs_dir = debugfs_create_dir(buf, 
>>>> kvm_debugfs_dir);
>>>>        kvmppc_mmu_debugfs_init(kvm);
>>>>        if (radix_enabled())
>>>>
> 

-- 
Alexey

^ permalink raw reply

* [PATCH v3] powerpc/32: Add support for out-of-line static calls
From: Christophe Leroy @ 2021-09-01  8:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Jason Baron,
	Josh Poimboeuf, linuxppc-dev, Ard Biesheuvel

Add support for out-of-line static calls on PPC32. This change
improve performance of calls to global function pointers by
using direct calls instead of indirect calls.

The trampoline is initialy populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.

In order to cater with parallele execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all time.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that the target address is stored
just after the 'bctr' and loaded from there with a single instruction.

Then, depending on the target distance, arch_static_call_transform()
will either replace the first instruction by a direct 'bl <target>' or
'nop' in order to have the trampoline fall through the long jump
sequence.

For the special case of __static_call_return0(), to avoid the risk of
a far branch, a version of it is inlined at the end of the trampoline.

Performancewise the long jump sequence is probably not better than
the indirect calls set by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: With most
configurations the kernel size is far below 32 Mbytes so only
modules may happen to be too far. And even modules are likely to
be close enough as they are allocated below the kernel core and
as close as possible of the kernel text.

static_call selftest is running successfully with this change.

With this patch, __do_irq() has the following sequence to trace
irq entries:

	c0004a00 <__SCT__tp_func_irq_entry>:
	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
	c0004a04:	3d 80 c0 00 	lis     r12,-16384
	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
	c0004a0c:	7d 89 03 a6 	mtctr   r12
	c0004a10:	4e 80 04 20 	bctr
	c0004a14:	38 60 00 00 	li      r3,0
	c0004a18:	4e 80 00 20 	blr
	c0004a1c:	00 00 00 00 	.long 0x0
...
	c0005654 <__do_irq>:
...
	c0005664:	7c 7f 1b 78 	mr      r31,r3
...
	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
	c00056a4:	39 29 00 01 	addi    r9,r9,1
	c00056a8:	91 22 00 00 	stw     r9,0(r2)
	c00056ac:	3d 20 c0 af 	lis     r9,-16209
	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
	c00056b4:	2c 09 00 00 	cmpwi   r9,0
	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
	c00056c0:	7f e4 fb 78 	mr      r4,r31
	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>

Before this patch, __do_irq() was doing the following to trace irq
entries:

	c0005700 <__do_irq>:
...
	c0005710:	7c 7e 1b 78 	mr      r30,r3
...
	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
	c0005750:	81 22 00 00 	lwz     r9,0(r2)
	c0005754:	39 29 00 01 	addi    r9,r9,1
	c0005758:	91 22 00 00 	stw     r9,0(r2)
	c000575c:	3d 20 c0 af 	lis     r9,-16209
	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
	c0005764:	2c 1f 00 00 	cmpwi   r31,0
	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
	c0005774:	7d 29 03 a6 	mtctr   r9
	c0005778:	7f c4 f3 78 	mr      r4,r30
	c000577c:	4e 80 04 21 	bctrl
	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
	c0005784:	2c 09 00 00 	cmpwi   r9,0
	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>

Behind the fact of now using a direct 'bl' instead of a
'load/mtctr/bctr' sequence, we can also see that we get one less
register on the stack.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v3: Adding the special case of __static_call_return0()

v2: Use indirect load in long jump sequence to cater with parallele execution and preemption.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/include/asm/static_call.h | 28 +++++++++++++++++++
 arch/powerpc/kernel/Makefile           |  2 +-
 arch/powerpc/kernel/static_call.c      | 37 ++++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/static_call.h
 create mode 100644 arch/powerpc/kernel/static_call.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36b72d972568..a0fe69d8ec83 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -247,6 +247,7 @@ config PPC
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
 	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
+	select HAVE_STATIC_CALL			if PPC32
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select HUGETLB_PAGE_SIZE_VARIABLE	if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
new file mode 100644
index 000000000000..0a0bc79bd1fa
--- /dev/null
+++ b/arch/powerpc/include/asm/static_call.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STATIC_CALL_H
+#define _ASM_POWERPC_STATIC_CALL_H
+
+#define __PPC_SCT(name, inst)					\
+	asm(".pushsection .text, \"ax\"				\n"	\
+	    ".align 5						\n"	\
+	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
+	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
+	    inst "						\n"	\
+	    "	lis	12,2f@ha				\n"	\
+	    "	lwz	12,2f@l(12)				\n"	\
+	    "	mtctr	12					\n"	\
+	    "	bctr						\n"	\
+	    "1:	li	3, 0					\n"	\
+	    "	blr						\n"	\
+	    "2:	.long 0						\n"	\
+	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
+	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
+	    ".popsection					\n")
+
+#define PPC_SCT_RET0		20		/* Offset of label 1 */
+#define PPC_SCT_DATA		28		/* Offset of label 2 */
+
+#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)	__PPC_SCT(name, "b " #func)
+#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)	__PPC_SCT(name, "blr")
+
+#endif /* _ASM_POWERPC_STATIC_CALL_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 7be36c1e1db6..0e3640e14eb1 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -106,7 +106,7 @@ extra-y				+= vmlinux.lds
 
 obj-$(CONFIG_RELOCATABLE)	+= reloc_$(BITS).o
 
-obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o
+obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o static_call.o
 obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
new file mode 100644
index 000000000000..863a7aa24650
--- /dev/null
+++ b/arch/powerpc/kernel/static_call.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/memory.h>
+#include <linux/static_call.h>
+
+#include <asm/code-patching.h>
+
+void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
+{
+	int err;
+	bool is_ret0 = (func == __static_call_return0);
+	unsigned long target = (unsigned long)(is_ret0 ? tramp + PPC_SCT_RET0 : func);
+	bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
+
+	if (!tramp)
+		return;
+
+	mutex_lock(&text_mutex);
+
+	if (func && !is_short) {
+		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+		if (err)
+			goto out;
+	}
+
+	if (!func)
+		err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
+	else if (is_short)
+		err = patch_branch(tramp, target, 0);
+	else
+		err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
+out:
+	mutex_unlock(&text_mutex);
+
+	if (err)
+		panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
+}
+EXPORT_SYMBOL_GPL(arch_static_call_transform);
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm
From: Michael Ellerman @ 2021-09-01  7:17 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: nathan, linuxppc-dev
In-Reply-To: <20210831213432.GF1583@gate.crashing.org>

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
>> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
>> you pass a value of a type that's narrower than a register to an inline
>> asm, the high bits are undefined". In this case we are passing a bool
>> to the inline asm, which is a single bit value, and so the compiler
>> thinks it can leave the high bits of r30 unmasked, because only the
>> value of bit 0 matters.
>> 
>> Because the inline asm compares the full width of the register (32 or
>> 64-bit) we need to ensure the value passed to the inline asm has all
>> bits defined. So fix it by casting to long.
>> 
>> We also already cast to long for the similar case in BUG_ENTRY(), which
>> it turns out was added to fix a similar bug in 2005 in commit
>> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").
>
> That points to <https://gcc.gnu.org/PR23422>, which shows the correct
> explanation.

That's a pity because I don't understand that explanation ^_^

Why does sign extension matter when we're comparing against zero?

> The code as it was did **not** pass a bool.  It perhaps passed an int
> (so many macros in play, it is hard to tell for sure, but it is int or
> long int, perhaps unsigned (which does not change anything here).

I don't understand that. It definitely is passing a bool at the source
level. Are you saying it's getting promoted somehow?

It expands to:

      asm goto(
              "1:   "
              "tdnei"
              "
              " " % 4,
              0 " "\n " ".section __ex_table,\"a\";"
                                              " "
                                              ".balign 4;"
                                              " "
                                              ".long (1b) - . ;"
                                              " "
                                              ".long (%l[__label_warn_on]) - . ;"
                                              " "
                                              ".previous"
                                              " "
                                              ".section __bug_table,\"aw\"\n"
                                              "2:\t.4byte 1b - 2b, %0 - 2b\n"
                                              "\t.short %1, %2\n"
                                              ".org 2b+%3\n"
                                              ".previous\n"
              :
              : "i"("lib/klist.c"), "i"(62),
                "i"((1 << 0) | ((9) << 8)),
                "i"(sizeof(struct bug_entry)),
                "r"(knode_dead(knode))
                ^^^^^^^^^^^^^^^^^^^^^
              :
              : __label_warn_on);

And knode_dead() returns bool:

  static bool knode_dead(struct klist_node *knode)
  {
  	return (unsigned long)knode->n_klist & KNODE_DEAD;
  }

So in my book that means the type there is bool. But I'm not a compiler
guy so I guessing I'm missing something.

> But td wants a 64-bit register, not a 32-bit one (which are the only two
> possibilities for the "r" constraint on PowerPC).  The cast to "long" is
> fine for powerpc64, but it has nothing to do with "narrower than a
> register".

If it's 32-bit vs 64-bit, and the clang explanation is correct, then
we'd expect the low 32-bits of the value passed to the asm to have the
correct value, ie. have been masked with KNODE_DEAD.

> If this is not the correct explanation for LLVM, it needs to solve some
> other bug.

OK. I just need to get this fixed in the kernel, so I'll do a new
version with a changelog that is ~= "shrug not sure what's going on" and
merge that. Then we can argue about what is really going on later :)

cheeers

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm
From: Segher Boessenkool @ 2021-08-31 21:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: nathan, linuxppc-dev
In-Reply-To: <20210831132720.881643-1-mpe@ellerman.id.au>

Hi!

On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
> you pass a value of a type that's narrower than a register to an inline
> asm, the high bits are undefined". In this case we are passing a bool
> to the inline asm, which is a single bit value, and so the compiler
> thinks it can leave the high bits of r30 unmasked, because only the
> value of bit 0 matters.
> 
> Because the inline asm compares the full width of the register (32 or
> 64-bit) we need to ensure the value passed to the inline asm has all
> bits defined. So fix it by casting to long.
> 
> We also already cast to long for the similar case in BUG_ENTRY(), which
> it turns out was added to fix a similar bug in 2005 in commit
> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").

That points to <https://gcc.gnu.org/PR23422>, which shows the correct
explanation.

The code as it was did **not** pass a bool.  It perhaps passed an int
(so many macros in play, it is hard to tell for sure, but it is int or
long int, perhaps unsigned (which does not change anything here).  But
td wants a 64-bit register, not a 32-bit one (which are the only two
possibilities for the "r" constraint on PowerPC).  The cast to "long" is
fine for powerpc64, but it has nothing to do with "narrower than a
register".

If this is not the correct explanation for LLVM, it needs to solve some
other bug.


Segher

^ permalink raw reply

* Re: [PATCH v6 00/11] DDW + Indirect Mapping
From: Leonardo Brás @ 2021-08-31 20:49 UTC (permalink / raw)
  To: David Christensen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <311ece8f-bedc-b3f7-0d1b-2cb78438890f@linux.vnet.ibm.com>

On Tue, 2021-08-31 at 13:39 -0700, David Christensen wrote:
> > 
> > This series allow Indirect DMA using DDW when available, which
> > usually
> > means bigger pagesizes and more TCEs, and so more DMA space.
> 
> How is the mapping method selected?  LPAR creation via the HMC, Linux
> kernel load parameter, or some other method?

At device/bus probe, if there is enough DMA space available for Direct
DMA, then it's used. If not, it uses indirect DMA.

> 
> The hcall overhead doesn't seem too worrisome when mapping 1GB pages
> so 
> the Indirect DMA method might be best in my situation (DPDK).

Well, it depends on usage.
I mean, the recommended use of IOMMU is to map, transmit and then
unmap, but this will vary on the implementation of the driver.

If, for example, there is some reuse of the DMA mapping, as in a
previous patchset I sent (IOMMU Pagecache), then the hcall overhead can
be reduced drastically.

> 
> Dave
Best regards,
Leonardo


^ permalink raw reply

* Re: [PATCH v6 00/11] DDW + Indirect Mapping
From: David Christensen @ 2021-08-31 20:39 UTC (permalink / raw)
  To: Leonardo Brás, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <e867b4ddce01adf318bc8837c997ceb64e44c1c5.camel@gmail.com>



On 8/31/21 1:18 PM, Leonardo Brás wrote:
> Hello David,
> 
> Sorry for the delay, I did not get your mail because I was not CC'd
> in your reply (you sent the mail just to the mailing list).
> 
> Replies bellow:
> 
> On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
>> On 8/16/21 11:39 PM, Leonardo Bras wrote:
>>> So far it's assumed possible to map the guest RAM 1:1 to the bus,
>>> which
>>> works with a small number of devices. SRIOV changes it as the user
>>> can
>>> configure hundreds VFs and since phyp preallocates TCEs and does not
>>> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
>>> per a PE to limit waste of physical pages.
>>>
>>> As of today, if the assumed direct mapping is not possible, DDW
>>> creation
>>> is skipped and the default DMA window "ibm,dma-window" is used
>>> instead.
>>>
>>> Using the DDW instead of the default DMA window may allow to expand
>>> the
>>> amount of memory that can be DMA-mapped, given the number of pages
>>> (TCEs)
>>> may stay the same (or increase) and the default DMA window offers
>>> only
>>> 4k-pages while DDW may offer larger pages (4k, 64k, 16M ...).
>>
>> So if I'm reading this correctly, VFIO applications requiring hugepage
>> DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR
>> after this change, is that correct?
> 
> Different DDW IOMMU page sizes were already supported in Linux (4k,
> 64k, 16M) for a while now, and the remaining page sizes in LoPAR were
> enabled in the following patch:
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobras.c@gmail.com/
> (commit 472724111f0f72042deb6a9dcee9578e5398a1a1)
> 
> The thing is there are two ways of using DMA:
> - Direct DMA, mapping the whole memory space of the host, which
> requires a lot of DMA space if the guest memory is huge. This already
> supports DDW and allows using the bigger pagesizes.
> This happens on device/bus probe.
> 
> - Indirect DMA with IOMMU, mapping memory regions on demand, and un-
> mapping after use. This requires much less DMA space, but causes an
> overhead because an hcall is necessary for mapping and un-mapping.
> Before this series, Indirect DMA was only possible with the 'default
> DMA window' which allows using only 4k pages.
> 
> This series allow Indirect DMA using DDW when available, which usually
> means bigger pagesizes and more TCEs, and so more DMA space.

How is the mapping method selected?  LPAR creation via the HMC, Linux 
kernel load parameter, or some other method?

The hcall overhead doesn't seem too worrisome when mapping 1GB pages so 
the Indirect DMA method might be best in my situation (DPDK).

Dave

^ permalink raw reply

* Re: [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode
From: Leonardo Brás @ 2021-08-31 20:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc
In-Reply-To: <20210827040706.517652-1-aik@ozlabs.ru>

Hello Alexey,

On Fri, 2021-08-27 at 14:07 +1000, Alexey Kardashevskiy wrote:
> Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
> only when needed. This allows skipping any update when clearing TCEs.
> This works mostly fine as TCE updates are handled when MMU is enabled.
> The realmode handlers fail with H_TOO_HARD when pages are not yet
> allocated except when clearing a TCE in which case KVM prints a warning
> but proceeds to dereference a NULL pointer which crashes the host OS.
> 
> This has not been caught so far as the change is reasonably new,
> POWER9 runs mostly radix which does not use realmode handlers.
> With hash, the default TCE table is memset() by QEMU the machine reset
> which triggers page faults and the KVM TCE device's
> kvm_spapr_tce_fault()
> handles those with MMU on. And the huge DMA windows are not cleared
> by VMs whicn instead successfully create a DMA window big enough to map
> the VM memory 1:1 and then VMs just map everything without clearing.
> 
> This started crashing now as upcoming sriov-under-powervm support added
> a mode when a dymanic DMA window not big enough to map the VM memory
> 1:1
> but it is used anyway, and the VM now is the first (i.e. not QEMU) to
> clear a just created table. Note that the upstream QEMU needs to be
> modified to trigger the VM to trigger the host OS crash.
> 
> This replaces WARN_ON_ONCE_RM() with a check and return.
> This adds another warning if TCE is not being cleared.
> 
> Cc: Leonardo Bras <leobras.c@gmail.com>
> Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand
> too")
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> With recent changes in the printk() department, calling pr_err() when
> MMU
> off causes lockdep lockups which I did not dig any further so we should
> start getting rid of the realmode's WARN_ON_ONCE_RM().
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 083a4e037718..e5ba96c41f3f 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct
> kvmppc_spapr_tce_table *stt,
>         idx -= stt->offset;
>         page = stt->pages[idx / TCES_PER_PAGE];
>         /*
> -        * page must not be NULL in real mode,
> -        * kvmppc_rm_ioba_validate() must have taken care of this.
> +        * kvmppc_rm_ioba_validate() allows pages not be allocated if
> TCE is
> +        * being cleared, otherwise it returns H_TOO_HARD and we skip
> this.
>          */
> -       WARN_ON_ONCE_RM(!page);
> +       if (!page) {
> +               WARN_ON_ONCE_RM(tce != 0);
> +               return;
> +       }
>         tbl = kvmppc_page_address(page);
>  
>         tbl[idx % TCES_PER_PAGE] = tce;

That looks reasonable IMHO.

FWIW:
Reviewed-by: Leonardo Bras <leobras.c@gmail.com>


^ permalink raw reply

* Re: [PATCH v6 00/11] DDW + Indirect Mapping
From: Leonardo Brás @ 2021-08-31 20:18 UTC (permalink / raw)
  To: David Christensen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <82ca56ab-6a0a-7cbb-a5e7-facc40f35c3c@linux.vnet.ibm.com>

Hello David,

Sorry for the delay, I did not get your mail because I was not CC'd
in your reply (you sent the mail just to the mailing list).

Replies bellow:

On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
> On 8/16/21 11:39 PM, Leonardo Bras wrote:
> > So far it's assumed possible to map the guest RAM 1:1 to the bus,
> > which
> > works with a small number of devices. SRIOV changes it as the user
> > can
> > configure hundreds VFs and since phyp preallocates TCEs and does not
> > allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> > per a PE to limit waste of physical pages.
> > 
> > As of today, if the assumed direct mapping is not possible, DDW
> > creation
> > is skipped and the default DMA window "ibm,dma-window" is used
> > instead.
> > 
> > Using the DDW instead of the default DMA window may allow to expand
> > the
> > amount of memory that can be DMA-mapped, given the number of pages
> > (TCEs)
> > may stay the same (or increase) and the default DMA window offers
> > only
> > 4k-pages while DDW may offer larger pages (4k, 64k, 16M ...).
> 
> So if I'm reading this correctly, VFIO applications requiring hugepage 
> DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR 
> after this change, is that correct?  

Different DDW IOMMU page sizes were already supported in Linux (4k,
64k, 16M) for a while now, and the remaining page sizes in LoPAR were
enabled in the following patch:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobras.c@gmail.com/
(commit 472724111f0f72042deb6a9dcee9578e5398a1a1)

The thing is there are two ways of using DMA:
- Direct DMA, mapping the whole memory space of the host, which
requires a lot of DMA space if the guest memory is huge. This already
supports DDW and allows using the bigger pagesizes.
This happens on device/bus probe.

- Indirect DMA with IOMMU, mapping memory regions on demand, and un-
mapping after use. This requires much less DMA space, but causes an
overhead because an hcall is necessary for mapping and un-mapping.
Before this series, Indirect DMA was only possible with the 'default
DMA window' which allows using only 4k pages. 

This series allow Indirect DMA using DDW when available, which usually
means bigger pagesizes and more TCEs, and so more DMA space.


tl;dr this patchset means you can have more DMA space in Indirect DMA,
because you are using DDW instead of the Default DMA window.

> Any limitations based on processor 
> or pHyp revision levels?

The IOMMU page size will be limited by the sizes offered by processor
and hypervisor. They are announced at  "IO Page Sizes" output of
ibm,query-pe-dma-window, but now the biggest pagesize automatically
selected with commit 472724111f0f72042deb6a9dcee9578e5398a1a1 above
mentioned. 

Hope this helps, please let me know if there is any remaining question.

Best regards,
Leonardo


^ permalink raw reply

* Re: [PATCH] powerpc/ptdump: Fix generic ptdump for 64-bit
From: Nathan Chancellor @ 2021-08-31 19:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <20210831135151.886620-1-mpe@ellerman.id.au>

On Tue, Aug 31, 2021 at 11:51:51PM +1000, Michael Ellerman wrote:
> Since the conversion to generic ptdump we see crashes on 64-bit:
> 
>   BUG: Unable to handle kernel data access on read at 0xc0eeff7f00000000
>   Faulting instruction address: 0xc00000000045e5fc
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   ...
>   NIP __walk_page_range+0x2bc/0xce0
>   LR  __walk_page_range+0x240/0xce0
>   Call Trace:
>     __walk_page_range+0x240/0xce0 (unreliable)
>     walk_page_range_novma+0x74/0xb0
>     ptdump_walk_pgd+0x98/0x170
>     ptdump_check_wx+0x88/0xd0
>     mark_rodata_ro+0x48/0x80
>     kernel_init+0x74/0x1a0
>     ret_from_kernel_thread+0x5c/0x64
> 
> What's happening is that have walked off the end of the kernel page
> tables, and started dereferencing junk values.
> 
> That happens because we initialised the ptdump_range to span all the way
> up to 0xffffffffffffffff:
> 
> static struct ptdump_range ptdump_range[] __ro_after_init = {
> 	{TASK_SIZE_MAX, ~0UL},
> 
> But the kernel page tables don't span that far. So on 64-bit set the end
> of the range to be the address immediately past the end of the kernel
> page tables, to limit the page table walk to valid addresses.
> 
> Fixes: e084728393a5 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  arch/powerpc/mm/ptdump/ptdump.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index 2d80d775d15e..bf251191e78d 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -359,6 +359,8 @@ static int __init ptdump_init(void)
>  		ptdump_range[0].start = KERN_VIRT_START;
>  	else
>  		ptdump_range[0].start = PAGE_OFFSET;
> +
> +	ptdump_range[0].end = PAGE_OFFSET + (PGDIR_SIZE * PTRS_PER_PGD);
>  #endif
>  
>  	populate_markers();
> 
> base-commit: e1ab9a730b426fadc018f91b7c98412473e542fb
> prerequisite-patch-id: 942553bda7d83bbae8bf6b2b718033d488ee2410
> prerequisite-patch-id: a14c44e671eba8648c4fe385a2552fd57875ec8a
> prerequisite-patch-id: 94f5c890f54da2b46f06c60562e879171fab2be3
> prerequisite-patch-id: 330af32f2aa34a432d450acc9f6e9fd1cec96417
> prerequisite-patch-id: b46c65afa63944f3fb02f4b9bdf940507bb25de6
> prerequisite-patch-id: c4ba00ee949f70d7745f75bad11bbb2416f329f1
> prerequisite-patch-id: f479601944d0aa615716d5349d93bd6e3d5619c1
> prerequisite-patch-id: 9523cde933393b2d68648cecb740efdba9dd8601
> prerequisite-patch-id: 034afc97c841a6dcd2b9932406f391d65d18bf87
> prerequisite-patch-id: effd7ac8a7db6b59a2677c9c3a7ef8b3ef8bdaf8
> prerequisite-patch-id: 23883cf116ee69b452db3c6e10dd49e756e7b5d5
> prerequisite-patch-id: 37b6695321c96db466b0faba9308bacfb79c7ced
> prerequisite-patch-id: 83420e68ca4476c9ba5a67aa19e1fdc0b6d656a4
> prerequisite-patch-id: 362219acf820b78b83c6c09071a636b28976a1ce
> prerequisite-patch-id: 857513c5f431887d16a59d193834dcec636c73dc
> prerequisite-patch-id: 49f6879a819e205b5361280ab923664fcd29daaf
> prerequisite-patch-id: 5a37bcf70c5cb44d78de63a64e5ce920a0a7e419
> prerequisite-patch-id: 2c06dd3833117b0498baa198694f6c7e84975840
> prerequisite-patch-id: 5794a211ebbf7f0d416ae882443201621c00f615
> prerequisite-patch-id: 19ed5ae34e233079c7f66376b8d309cac2b57dbc
> prerequisite-patch-id: 1d4c82277473e8dbecf83faf6c4a6788538b064d
> prerequisite-patch-id: 8cb5ecc4fe23dafb4a43192f93b669c80a548985
> prerequisite-patch-id: 763b8d98c3aefd120862154b94814e3ef3578b5c
> prerequisite-patch-id: f45e04e6d030eb157be550976b07dc891fa0836d
> prerequisite-patch-id: 07b6fb682675845aca694deff1847bc7a40e1fec
> prerequisite-patch-id: 7f1082effa12b1eba445cef90e4749155662888c
> prerequisite-patch-id: 76743814dd8e6151c27676ae2e318579d658bf8b
> prerequisite-patch-id: 8a6b12c11dbbcd5dda0ccc9877dee1be785e0173
> prerequisite-patch-id: e98f013ce41c27d16f75ac3eb1c7eec4235cca0a
> prerequisite-patch-id: 285e11f96169ec82702a69b2fca5318c0e307508
> prerequisite-patch-id: 9fa89fb9f4ac839177307891bb240009f1d55e88
> prerequisite-patch-id: feebaed3f6e0c15e8fa468d64129fe9aa4411d57
> prerequisite-patch-id: 8f1093cf40180a623439d82e563e1dd18194cc19
> prerequisite-patch-id: d0466662674595d0678e71e5258d55b93d54b5c4
> prerequisite-patch-id: 286812aaed6630139583fd21d388137b8d5a6931
> prerequisite-patch-id: 54af8aa735a12282bb40a0ed87455e268ae356d9
> prerequisite-patch-id: cc5ee85759d99a6ebf18e39634dde65f15476f84
> prerequisite-patch-id: 3f8437c8bfda23c45839596ec432d81a95505061
> prerequisite-patch-id: f30d6fa2c7c7c417ee4bee0827c0ce587570db34
> prerequisite-patch-id: fa402f5deaa301587ced629dfa523728aece4705
> prerequisite-patch-id: 51f326f5de947cea58003cc8b988b54436689d1b
> prerequisite-patch-id: 4003c9a6b2792e797c333875e63a184df8fcc7e7
> prerequisite-patch-id: f73fd878eb9b65ecbed3c3ee8ca6725f7e55d5d2
> prerequisite-patch-id: 5e55b3e9b3809da22b8742f0ed356df6d6fdd301
> prerequisite-patch-id: 1fde98fffabd6313d1921d8b2f28691e9a191b1d
> prerequisite-patch-id: 51c0595fe54ad077c736b7a4351c2f2700ab66d7
> prerequisite-patch-id: e490360db8c2dc7cbf693258ca93e4597f165c6f
> prerequisite-patch-id: c4354b3226d31d8ddb6992956cf0ed12ea97cb8e
> prerequisite-patch-id: c67a26ed658da4b11a3319e0e99c4a84afb68d80
> prerequisite-patch-id: a7165946b90250fe64f5fd89502c8b681ceb081c
> prerequisite-patch-id: 5d08a5118d0f54a376d9391db767a54a15af9007
> prerequisite-patch-id: 17f4db4239b3cdeed1c73fc7949fb88486670253
> prerequisite-patch-id: 6bbbe2843772c041816d424e8d413c78d1296167
> prerequisite-patch-id: 2e08a0fa85e090442cad0d6570fbdcda6ba77e52
> prerequisite-patch-id: 4a6da55322b1e84315f0358890df7c1160f3bb76
> prerequisite-patch-id: 66ef17a0eb92d9756f05f9ea43066d794b878db1
> prerequisite-patch-id: 504a94de0570b8f6e509fad140088543edfab60d
> prerequisite-patch-id: 254030d04e05a4f8275850705976bce02947e334
> prerequisite-patch-id: cbf69c10f9d79b3902e87190c6abcc6ad05146d6
> prerequisite-patch-id: 65d12c13deecc37aacb6bcbee69e65353032f672
> prerequisite-patch-id: d1361cfe1939801bd397f89958b43899be233cec
> prerequisite-patch-id: db8ea3183bc27e148f84482d75a7d835d94cedd7
> prerequisite-patch-id: 1974681313eaa98eb00464056d64aad1a3816237
> prerequisite-patch-id: 24ef0746bc6b7503e9a8f75754b786fb315c4f2d
> prerequisite-patch-id: 24d2ca0fb3c90a57fed495c2fb17f86447e04860
> prerequisite-patch-id: 6f1ca0ce4f9f8b65a56765bd31386decac28e9ea
> prerequisite-patch-id: 1715a4e4bbd75871049978b5551837b22f6263bb
> prerequisite-patch-id: 30c74cfb31b65e120611cb28ec8f0d79e0b9804e
> prerequisite-patch-id: 339583602a8c734ad1708d92a2d7af32284d1215
> prerequisite-patch-id: 22fc0eb9e2b45e62cc473bd43a51bb941d419063
> prerequisite-patch-id: 84cb7b5a209feedd520150c7193124cc1f9c80d7
> prerequisite-patch-id: ef807430dbb9e43cff53087a8d62767ce1d91ae1
> prerequisite-patch-id: 3b7b1f05efba0907f9a6d217ff67d2fdbd99df99
> prerequisite-patch-id: a991327b6eff6be015e345962707fabd6fd4227c
> prerequisite-patch-id: 26b68754714f850a7e9c80490a2f1bd67faf3529
> prerequisite-patch-id: 6638b702ed937493bfe7aa47af628f4f15549b73
> prerequisite-patch-id: c7dfcf376999604a0ccdc26dd2ffa4445f92e3c1
> prerequisite-patch-id: 14849c2c4012cc133be0db3eabcefb3f318ca342
> prerequisite-patch-id: fa197269ec80c97d03a750667db30bd5887bed54
> prerequisite-patch-id: 0e470afd61f97d5d7c08ebf515934ac1ec4e748e
> prerequisite-patch-id: 35b8e958c05de8b7005293b0cf217293a9eeaa58
> prerequisite-patch-id: 9bac761d812645c0aa485c4d2d9eed340ab11f0a
> prerequisite-patch-id: 01834ee99459f3c861da4910753b6480fe827dab
> prerequisite-patch-id: 4859596f0522be112f41e2f69c6f90c39bc967dd
> prerequisite-patch-id: 08c3397e11ef439e85a7bca2089a30016d36ea0e
> prerequisite-patch-id: fa5507ab30165df410e22d5c22de0537bb4ded59
> prerequisite-patch-id: 869dce1db9832c582994c6ed6f751b07b48125d1
> prerequisite-patch-id: 324bc707b7f150a18fab383536515aa833855ced
> prerequisite-patch-id: e88296896fb188f7c472c1dc53dbd367a230487b
> prerequisite-patch-id: 5ab1cbfb81aa91f5e81b3deb014cc0c21ec1bab5
> prerequisite-patch-id: cb4615af3de52eff7224280e92ad4758fc0c3343
> prerequisite-patch-id: 0237d4817f895c5ddbb36df05efc4f91d9edffc4
> prerequisite-patch-id: 6a110281aac979dec5e2e71f909fce1212278977
> prerequisite-patch-id: ee3b2d2acffd08a33c1217b658f8c88b876e35c4
> prerequisite-patch-id: 12b29eccf400cea67a55c74b56e577f49834047e
> prerequisite-patch-id: e29ebe2684a71e790a58f62085190a813ae5a5f1
> prerequisite-patch-id: d1e8c04f704e47bc8a3b614b6a385c01e25f4ff3
> prerequisite-patch-id: 87cb9aa0f7e7fbcdaaa6a2d62ccfe185b861c4e6
> prerequisite-patch-id: 22883bca2200f42c0024bdf1d8543375e699557f
> prerequisite-patch-id: 336e23cd5be248bdc0787ce164582cb371d97657
> prerequisite-patch-id: b11f7ac383774f86e86f6ae517b65c8c065c5797
> prerequisite-patch-id: f2c6c9391594477ce4f655d8b51d82ab0526fa3b
> prerequisite-patch-id: a045c2861288933be9c4cd1100e786b18843ebb7
> prerequisite-patch-id: 3a3cadbe351e63f3403a127d24ee9c5b142ec15c
> prerequisite-patch-id: 326bdc5846c89345e7a222ade9ec17c3bd62b3d0
> prerequisite-patch-id: 401ccfc11e13f684b88a7287296fad1569c9b25e
> prerequisite-patch-id: dd09a722c66ab1ba63c2203d520ba090c29c8283
> prerequisite-patch-id: 89804991ad73df809315885d92d97f4109234573
> prerequisite-patch-id: c94655023a07fbb27dfd12b5dca0758f64a1cc4c
> prerequisite-patch-id: f9762f94fb9fa39deef961626e938925837e8c4c
> prerequisite-patch-id: cdbec35dbdd3ce2f6e17c9595976ff128f029ec1
> prerequisite-patch-id: 503b7d6e7369f31241a3705a52d036e135599602
> prerequisite-patch-id: 489404df41c4e8bd6cc353d2eb8f714a7f7cb95d
> prerequisite-patch-id: 530df13ef25239dc259319098336dcebd6cc87de
> prerequisite-patch-id: a246c487a84b9a4932dc7ce77d4f9606afe7b85c
> prerequisite-patch-id: 393087a4e9f1a0a09be0343ff94925d17c4082ca
> prerequisite-patch-id: 89a56b4c9349d123dea13732015671e1aeb62b18
> prerequisite-patch-id: eb86942c6da6188efea6c6d67c6bdd60339e0476
> prerequisite-patch-id: afa740a40e3b07e5a9378918d725e5802167bdc1
> prerequisite-patch-id: e342e49df2157001bf6322211e75fe025786e30a
> prerequisite-patch-id: b3c799971c5da26f68f07028496a2eeb3bdaaf33
> prerequisite-patch-id: a735a743d7661d448ee6d124166298eae0ddf0e2
> prerequisite-patch-id: 510af96fd5e0fd92151b3a8b541f376792c39bec
> prerequisite-patch-id: a13a33ba2b519bfc40ff4997fa845b63d3653f39
> prerequisite-patch-id: c7df3d67d6395d33bd9751de80b55568f3fb6186
> prerequisite-patch-id: b258c071f38b60baf1030f8151ea94fe75a662bf
> prerequisite-patch-id: 2692c2e15a9895ee28ea5ce3f5028dfcbd25594d
> prerequisite-patch-id: 48709efeb5030256d124df36025e5bcc850d45b7
> prerequisite-patch-id: 6e9724110834a8337ea4c6792fd65b5c9d468d3c
> prerequisite-patch-id: d0250201c811f6f0428a3806bd5103c58b4e1e83
> prerequisite-patch-id: 1cb3e528643b493849ac4d1053261de195f410ad
> prerequisite-patch-id: a8d02c6e093c2a1239c134adab5b52bfe1b2a402
> prerequisite-patch-id: 38cd600a6d248b1962fb13c1bbf141bb95927f88
> prerequisite-patch-id: 7877ab5caf1d61e3dc3e4d1d4cdf09b102232e6c
> prerequisite-patch-id: 843bf6118787a648ebd0ee8d419d19e89f771ea3
> prerequisite-patch-id: 9a446cb0a9d212fc8ce5e3946a946ae39c2c349f
> prerequisite-patch-id: 8ac1f15124b2bd0d270e3663056c0f0cc1cb0ec8
> prerequisite-patch-id: 2c62e90eb7cd826b7206bf17cb307a63173535cb
> prerequisite-patch-id: b7683837ef6de9540e4c1b584b5f7f09f5a93462
> prerequisite-patch-id: 8229a9ffdf02c2387bca160368c0c979a8195dbf
> prerequisite-patch-id: 6996c60bfe5146e4a249d1eae10eff826cfc4d7f
> prerequisite-patch-id: bf13ed36a4a78760e89139117b900ab575f7e580
> prerequisite-patch-id: 195c4cbcf2370dcfa843c8b12bb733a4b1b5e8de
> prerequisite-patch-id: 4e0506e90c3ca82e30d058761e1a529b37924274
> prerequisite-patch-id: 21ff6da108be555677f80188b31e22a1e72926b8
> prerequisite-patch-id: 4644fd201040a92da4e8d2b2b63d903595614ab6
> prerequisite-patch-id: 9a09071502e021e0b537c934e45b21221daef912
> prerequisite-patch-id: fe8d55d4361067ca36e5826fd953c2211d4681e7
> prerequisite-patch-id: b177aae7e5f05b53b5f31f754a24492d85b8b4e8
> prerequisite-patch-id: 98f78c471a07fdc399d7d92041b3208a9228f627
> prerequisite-patch-id: d59f939e25301f9ee72f79b3c63cbc90b872743a
> prerequisite-patch-id: 727952f4af1baf3b0f53aa275f2bcd13a03dc270
> prerequisite-patch-id: c015879876072980d932ff39c7bf04b0d96d43f8
> -- 
> 2.25.1

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm
From: Nathan Chancellor @ 2021-08-31 19:46 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <20210831132720.881643-1-mpe@ellerman.id.au>

On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
> In commit 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to
> WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
> it would take the warning condition, x, and double negate it before
> converting the result to int, and passing that int to the underlying
> inline asm. ie:
> 
>   #define WARN_ON(x) ({
>   	int __ret_warn_on = !!(x);
>   	if (__builtin_constant_p(__ret_warn_on)) {
>   	...
>   	} else {
>   		BUG_ENTRY(PPC_TLNEI " %4, 0",
>   			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
>   			  "r" (__ret_warn_on));
> 
> The asm then does a full register width comparison with zero and traps
> if it is non-zero (PPC_TLNEI).
> 
> The new code instead passes the full expression, x, with some unknown
> type, to the inline asm:
> 
>   #define WARN_ON(x) ({
> 	...
> 	do {
> 		if (__builtin_constant_p((x))) {
> 		...
> 		} else {
> 			...
> 			WARN_ENTRY(PPC_TLNEI " %4, 0",
> 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
> 				   __label_warn_on, "r" (x));
> 
> This was not seen to cause any issues with GCC, however with clang in at
> least one case it leads to a WARN_ON() that fires incorrectly and
> repeatedly at boot, as reported[1] by Nathan:
> 
>   WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
>   NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
>   REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
>   MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
>   CFAR: c00000000090a034 IRQMASK: 0
>   GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
>   GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
>   GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
>   GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
>   GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
>   GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
>   NIP .klist_add_tail+0x3c/0x110
>   LR  .bus_add_driver+0x148/0x290
>   Call Trace:
>     0xc0000000073c35d0 (unreliable)
>     .bus_add_driver+0x148/0x290
>     .driver_register+0xb8/0x190
>     .__hid_register_driver+0x70/0xd0
>     .redragon_driver_init+0x34/0x58
>     .do_one_initcall+0x130/0x3b0
>     .do_initcall_level+0xd8/0x188
>     .do_initcalls+0x7c/0xdc
>     .kernel_init_freeable+0x178/0x21c
>     .kernel_init+0x34/0x220
>     .ret_from_kernel_thread+0x58/0x60
>   Instruction dump:
>   fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
>   fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> 
> The instruction dump shows that we are trapping because r30 is not zero:
>   tdnei   r30,0
> 
> Where r30 = c000000007de72c8
> 
> The WARN_ON() comes from:
> 
>   static void knode_set_klist(struct klist_node *knode, struct klist *klist)
>   {
>   	knode->n_klist = klist;
>   	/* no knode deserves to start its life dead */
>   	WARN_ON(knode_dead(knode));
>   		^^^^^^^^^^^^^^^^^
> 
> Where:
> 
>   #define KNODE_DEAD		1LU
> 
>   static bool knode_dead(struct klist_node *knode)
>   {
>   	return (unsigned long)knode->n_klist & KNODE_DEAD;
>   }
> 
> The full disassembly shows that the compiler has not generated any code
> to apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.
> 
> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
> you pass a value of a type that's narrower than a register to an inline
> asm, the high bits are undefined". In this case we are passing a bool
> to the inline asm, which is a single bit value, and so the compiler
> thinks it can leave the high bits of r30 unmasked, because only the
> value of bit 0 matters.
> 
> Because the inline asm compares the full width of the register (32 or
> 64-bit) we need to ensure the value passed to the inline asm has all
> bits defined. So fix it by casting to long.
> 
> We also already cast to long for the similar case in BUG_ENTRY(), which
> it turns out was added to fix a similar bug in 2005 in commit
> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").
> 
> [1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
> [2]: https://bugs.llvm.org/show_bug.cgi?id=51634
> 
> Fixes: 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto")
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  arch/powerpc/include/asm/bug.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..02c08d1492f8 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,8 @@ __label_warn_on:						\
>  								\
>  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on,		\
> +				   "r" ((__force long)(x)));	\
>  			break;					\
>  __label_warn_on:						\
>  			__ret_warn_on = true;			\
> 
> -- 
> 2.25.1

^ permalink raw reply

* Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
From: Paul Moore @ 2021-08-31 18:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-efi, Linux PCI, linux-cxl, Linux Kernel Mailing List,
	Steffen Klassert, Herbert Xu, X86 ML, James Morris, Linux ACPI,
	Ingo Molnar, linux-serial, Linux-pm mailing list, SElinux list,
	Steven Rostedt, Casey Schaufler, Netdev, Stephen Smalley,
	Kexec Mailing List, Ondrej Mosnacek, Linux Security Module list,
	linux-fsdevel, bpf, linuxppc-dev, David S . Miller
In-Reply-To: <CAPcyv4i8YXo=xOL2vO67KLABQRDNAxzrzT=a1xtwtrts5pVPKw@mail.gmail.com>

On Tue, Aug 31, 2021 at 2:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Aug 31, 2021 at 6:53 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Tue, Aug 31, 2021 at 5:09 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > On Sat, Jun 19, 2021 at 12:18 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > ...
> >
> > > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > > index 2acc6173da36..c1747b6555c7 100644
> > > > > --- a/drivers/cxl/mem.c
> > > > > +++ b/drivers/cxl/mem.c
> > > > > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > > > >         if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > > > >                 return false;
> > > > >
> > > > > -       if (security_locked_down(LOCKDOWN_NONE))
> > > > > +       if (security_locked_down(current_cred(), LOCKDOWN_NONE))
> > > >
> > > > Acked-by: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > ...however that usage looks wrong. The expectation is that if kernel
> > > > integrity protections are enabled then raw command access should be
> > > > disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> > > > in terms of the command capabilities to filter.
> > >
> > > Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
> > > and I didn't want to go down yet another rabbit hole trying to fix it.
> > > I'll look at this again once this patch is settled - it may indeed be
> > > as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.
> >
> > At this point you should be well aware of my distaste for merging
> > patches that have known bugs in them.  Yes, this is a pre-existing
> > condition, but it seems well within the scope of this work to address
> > it as well.
> >
> > This isn't something that is going to get merged while the merge
> > window is open, so at the very least you've got almost two weeks to
> > sort this out - please do that.
>
> Yes, apologies, I should have sent the fix shortly after noticing the
> problem. I'll get the CXL bug fix out of the way so Ondrej can move
> this along.

Thanks Dan.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
From: Dan Williams @ 2021-08-31 18:58 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-efi, Linux PCI, linux-cxl, Linux Kernel Mailing List,
	Steffen Klassert, Herbert Xu, X86 ML, James Morris, Linux ACPI,
	Ingo Molnar, linux-serial, Linux-pm mailing list, SElinux list,
	Steven Rostedt, Casey Schaufler, Netdev, Stephen Smalley,
	Kexec Mailing List, Ondrej Mosnacek, Linux Security Module list,
	linux-fsdevel, bpf, linuxppc-dev, David S . Miller
In-Reply-To: <CAHC9VhTGECM2p+Q8n48aSdfJzY6XrpXQ5tcFurjWc4A3n8Qxjg@mail.gmail.com>

On Tue, Aug 31, 2021 at 6:53 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Aug 31, 2021 at 5:09 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > On Sat, Jun 19, 2021 at 12:18 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > > On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> ...
>
> > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > index 2acc6173da36..c1747b6555c7 100644
> > > > --- a/drivers/cxl/mem.c
> > > > +++ b/drivers/cxl/mem.c
> > > > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > > >         if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > > >                 return false;
> > > >
> > > > -       if (security_locked_down(LOCKDOWN_NONE))
> > > > +       if (security_locked_down(current_cred(), LOCKDOWN_NONE))
> > >
> > > Acked-by: Dan Williams <dan.j.williams@intel.com>
> > >
> > > ...however that usage looks wrong. The expectation is that if kernel
> > > integrity protections are enabled then raw command access should be
> > > disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> > > in terms of the command capabilities to filter.
> >
> > Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
> > and I didn't want to go down yet another rabbit hole trying to fix it.
> > I'll look at this again once this patch is settled - it may indeed be
> > as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.
>
> At this point you should be well aware of my distaste for merging
> patches that have known bugs in them.  Yes, this is a pre-existing
> condition, but it seems well within the scope of this work to address
> it as well.
>
> This isn't something that is going to get merged while the merge
> window is open, so at the very least you've got almost two weeks to
> sort this out - please do that.

Yes, apologies, I should have sent the fix shortly after noticing the
problem. I'll get the CXL bug fix out of the way so Ondrej can move
this along.

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS eba39ec79104b8b498e9862da658748e7539b100
From: kernel test robot @ 2021-08-31 15:38 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: eba39ec79104b8b498e9862da658748e7539b100  Automatic merge of 'next' into merge (2021-08-29 12:19)

elapsed time: 2995m

configs tested: 117
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm64                               defconfig
arm                                 defconfig
arm64                            allyesconfig
arm                              allyesconfig
arm                              allmodconfig
arm                            zeus_defconfig
h8300                       h8s-sim_defconfig
sh                         ap325rxa_defconfig
sh                          sdk7786_defconfig
sh                            titan_defconfig
sh                           se7722_defconfig
arc                         haps_hs_defconfig
sh                             espt_defconfig
sh                        edosk7705_defconfig
arm                          imote2_defconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
i386                 randconfig-a005-20210831
i386                 randconfig-a002-20210831
i386                 randconfig-a003-20210831
i386                 randconfig-a006-20210831
i386                 randconfig-a004-20210831
i386                 randconfig-a001-20210831
x86_64               randconfig-a014-20210829
x86_64               randconfig-a016-20210829
x86_64               randconfig-a015-20210829
x86_64               randconfig-a012-20210829
x86_64               randconfig-a013-20210829
x86_64               randconfig-a011-20210829
x86_64               randconfig-a014-20210830
x86_64               randconfig-a015-20210830
x86_64               randconfig-a013-20210830
x86_64               randconfig-a016-20210830
x86_64               randconfig-a012-20210830
x86_64               randconfig-a011-20210830
i386                 randconfig-a011-20210829
i386                 randconfig-a016-20210829
i386                 randconfig-a012-20210829
i386                 randconfig-a014-20210829
i386                 randconfig-a013-20210829
i386                 randconfig-a015-20210829
i386                 randconfig-a016-20210830
i386                 randconfig-a011-20210830
i386                 randconfig-a015-20210830
i386                 randconfig-a014-20210830
i386                 randconfig-a012-20210830
i386                 randconfig-a013-20210830
arc                  randconfig-r043-20210829
riscv                randconfig-r042-20210829
s390                 randconfig-r044-20210829
s390                 randconfig-r044-20210830
arc                  randconfig-r043-20210830
riscv                randconfig-r042-20210830
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                    rhel-8.3-kselftests
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-a001-20210829
x86_64               randconfig-a006-20210829
x86_64               randconfig-a005-20210829
x86_64               randconfig-a003-20210829
x86_64               randconfig-a004-20210829
x86_64               randconfig-a002-20210829
x86_64               randconfig-a005-20210830
x86_64               randconfig-a001-20210830
x86_64               randconfig-a003-20210830
x86_64               randconfig-a002-20210830
x86_64               randconfig-a004-20210830
x86_64               randconfig-a006-20210830
i386                 randconfig-a005-20210830
i386                 randconfig-a002-20210830
i386                 randconfig-a003-20210830
i386                 randconfig-a006-20210830
i386                 randconfig-a004-20210830
i386                 randconfig-a001-20210830
hexagon              randconfig-r041-20210829
hexagon              randconfig-r045-20210829

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH] powerpc/head_check: Fix shellcheck errors
From: Michael Ellerman @ 2021-08-31 14:02 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20210817125154.3369884-1-mpe@ellerman.id.au>

On Tue, 17 Aug 2021 22:51:54 +1000, Michael Ellerman wrote:
> Replace "cat file | grep pattern" with "grep pattern file", and quote a
> few variables. Together that fixes all shellcheck errors.

Applied to powerpc/next.

[1/1] powerpc/head_check: Fix shellcheck errors
      https://git.kernel.org/powerpc/c/e95ad5f21693a37c0d318b5988c5a0de324eb3e3

cheers

^ permalink raw reply

* Re: [PATCH v2] powerpc/32: Add support for out-of-line static calls
From: Peter Zijlstra @ 2021-08-31 14:00 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-kernel, Steven Rostedt, Jason Baron, Paul Mackerras,
	Josh Poimboeuf, linuxppc-dev, Ard Biesheuvel
In-Reply-To: <97f252fcd63e145f54fbf85124c75fb01e96e1bb.1630415517.git.christophe.leroy@csgroup.eu>

On Tue, Aug 31, 2021 at 01:12:26PM +0000, Christophe Leroy wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 36b72d972568..a0fe69d8ec83 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -247,6 +247,7 @@ config PPC
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
>  	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>  	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> +	select HAVE_STATIC_CALL			if PPC32
>  	select HAVE_SYSCALL_TRACEPOINTS
>  	select HAVE_VIRT_CPU_ACCOUNTING
>  	select HUGETLB_PAGE_SIZE_VARIABLE	if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> new file mode 100644
> index 000000000000..2402c6d32439
> --- /dev/null
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_STATIC_CALL_H
> +#define _ASM_POWERPC_STATIC_CALL_H
> +
> +#define __POWERPC_SCT(name, inst)					\
> +	asm(".pushsection .text, \"ax\"				\n"	\
> +	    ".align 5						\n"	\
> +	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
> +	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
> +	    inst "						\n"	\
> +	    "	lis	12,1f@ha				\n"	\
> +	    "	lwz	12,1f@l(12)				\n"	\
> +	    "	mtctr	12					\n"	\
> +	    "	bctr						\n"	\
> +	    "1:	.long 0						\n"	\
> +	    "	nop						\n"	\
> +	    "	nop						\n"	\
> +	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
> +	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> +	    ".popsection					\n")
> +
> +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)	__POWERPC_SCT(name, "b " #func)
> +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)	__POWERPC_SCT(name, "blr")
> +
> +#endif /* _ASM_POWERPC_STATIC_CALL_H */

> diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> new file mode 100644
> index 000000000000..e5e78205ccb4
> --- /dev/null
> +++ b/arch/powerpc/kernel/static_call.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/memory.h>
> +#include <linux/static_call.h>
> +
> +#include <asm/code-patching.h>
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> +{
> +	int err;
> +	unsigned long target = (long)func;
> +	bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
> +
> +	if (!tramp)
> +		return;
> +
> +	mutex_lock(&text_mutex);
> +
> +	if (func && !is_short) {
> +		err = patch_instruction(tramp + 20, ppc_inst(target));
> +		if (err)
> +			goto out;
> +	}
> +
> +	if (!func)
> +		err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> +	else if (is_short)
> +		err = patch_branch(tramp, target, 0);
> +	else
> +		err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
> +out:
> +	mutex_unlock(&text_mutex);
> +
> +	if (err)
> +		panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
> +}
> +EXPORT_SYMBOL_GPL(arch_static_call_transform);

Yes, this should work nicely!

Since you have the two nop's at the end, you could frob in an
optimization for __static_call_return0 without too much issue.

Replace the two nops with (excuse my ppc asm):

	li r3, 0
	blr

and augment arch_static_call_transform() with something like:

	if (func == &__static_call_return0)
		err = patch_branch(tramp, tramp+24, 0);


^ permalink raw reply

* Re: [PATCH v3 1/3] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
From: Michael Ellerman @ 2021-08-31 14:00 UTC (permalink / raw)
  To: Christophe Leroy, Paul Mackerras, npiggin, Michael Ellerman,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <385ead49ccb66a259b25fee3eebf0bd4094068f3.1629707037.git.christophe.leroy@csgroup.eu>

On Mon, 23 Aug 2021 08:24:20 +0000 (UTC), Christophe Leroy wrote:
> In those hot functions that are called at every interrupt, any saved
> cycle is worth it.
> 
> interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
> called from three places:
> - From entry_32.S
> - From interrupt_64.S
> - From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
      https://git.kernel.org/powerpc/c/133c17a1788d68c9fff59d5f724a4ba14647a16d

cheers

^ permalink raw reply

* Re: [PATCH v3 0/5] Updates to powerpc for robust CPU online/offline
From: Michael Ellerman @ 2021-08-31 13:57 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Vincent Guittot, Peter Zijlstra,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar
In-Reply-To: <20210826100521.412639-1-srikar@linux.vnet.ibm.com>

On Thu, 26 Aug 2021 15:35:16 +0530, Srikar Dronamraju wrote:
> Changelog v2 -> v3:
> v2: https://lore.kernel.org/linuxppc-dev/20210821102535.169643-1-srikar@linux.vnet.ibm.com/t/#u
> Add patch 1: to drop dbg and numa=debug (Suggested by Michael Ellerman)
> Add patch 2: to convert printk to pr_xxx (Suggested by Michael Ellerman)
> 	Use pr_warn instead of pr_debug(WARNING) (Suggested by Laurent)
> 
> Changelog v1 -> v2:
> Moved patch to this series: powerpc/numa: Fill distance_lookup_table for offline nodes
> fixed a missing prototype warning
> 
> [...]

Patches 1-4 applied to powerpc/next.

[1/5] powerpc/numa: Drop dbg in favour of pr_debug
      https://git.kernel.org/powerpc/c/544af6429777cefae2f8af9a9866df5e8cb21763
[2/5] powerpc/numa: convert printk to pr_xxx
      https://git.kernel.org/powerpc/c/506c2075ffd8db352c53201ef166948a272e3bce
[3/5] powerpc/numa: Print debug statements only when required
      https://git.kernel.org/powerpc/c/544a09ee7434b949552266d20ece538d35535bd5
[4/5] powerpc/numa: Update cpu_cpu_map on CPU online/offline
      https://git.kernel.org/powerpc/c/9a245d0e1f006bc7ccf0285d0d520ed304d00c4a

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox