LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Vinod Koul @ 2020-08-05 13:19 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Song Liu, Alexei Starovoitov, Shawn Guo, Fabio Estevam,
	Daniel Borkmann, John Fastabend, Zhang Wei, NXP Linux Team,
	Yonghong Song, Andrii Nakryiko, Sascha Hauer, KP Singh,
	Dan Williams, linux-arm-kernel, netdev, linux-kernel, Li Yang,
	Pengutronix Kernel Team, dmaengine, bpf, linuxppc-dev,
	Martin KaFai Lau
In-Reply-To: <20200727203413.GA6245@embeddedor>

On 27-07-20, 15:34, Gustavo A. R. Silva wrote:

> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 2c508ee672b9..9b69716172a4 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>  
>  		if (_state(thrd) == PL330_STATE_KILLING)
>  			UNTIL(thrd, PL330_STATE_STOPPED)
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_FAULTING:
>  		_stop(thrd);
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_KILLING:
>  	case PL330_STATE_COMPLETING:
>  		UNTIL(thrd, PL330_STATE_STOPPED)
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_STOPPED:
>  		return _trigger(thrd);
> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>  
>  	switch (direction) {
>  	case DMA_MEM_TO_MEM:
> -		/* fall through */
>  	case DMA_MEM_TO_DEV:
>  		off += _emit_LD(dry_run, &buf[off], cond);
>  		break;
> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>  
>  	switch (direction) {
>  	case DMA_MEM_TO_MEM:
> -		/* fall through */
>  	case DMA_DEV_TO_MEM:
>  		off += _emit_ST(dry_run, &buf[off], cond);
>  		break;
> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>  
>  	switch (pxs->desc->rqtype) {
>  	case DMA_MEM_TO_DEV:
> -		/* fall through */
>  	case DMA_DEV_TO_MEM:
>  		off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>  			cond);
> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>  
>  	switch (pxs->desc->rqtype) {
>  	case DMA_MEM_TO_DEV:
> -		/* fall through */

replacement missed here and above few

-- 
~Vinod

^ permalink raw reply

* [PATCH v2] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 13:35 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>

All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.

Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.

This should prevent some soft lockups when starting large guests

Code has meaning only if CONFIG_MEMORY_HOTPLUG is set, otherwise the nid
field is not present in the drmem_lmb structure.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/mm/drmem.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..c11b6ec99ea3 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -402,6 +402,9 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	const __be32 *p;
 	u32 i, j, lmb_sets;
 	int lmb_index;
+#ifdef CONFIG_MEMORY_HOTPLUG
+	struct drmem_lmb *first = NULL;
+#endif
 
 	lmb_sets = of_read_number(prop++, 1);
 	if (lmb_sets == 0)
@@ -426,6 +429,15 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	for (i = 0; i < lmb_sets; i++) {
 		read_drconf_v2_cell(&dr_cell, &p);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+		/*
+		 * Fetch the NUMA node id for the fist set or if the
+		 * associativity index is different from the previous set.
+		 */
+		if (first && dr_cell.aa_index != first->aa_index)
+			first = NULL;
+#endif
+
 		for (j = 0; j < dr_cell.seq_lmbs; j++) {
 			lmb = &drmem_info->lmbs[lmb_index++];
 
@@ -438,7 +450,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
 
-			lmb_set_nid(lmb);
+#ifdef CONFIG_MEMORY_HOTPLUG
+			/*
+			 * All the LMB in the set share the same NUMA
+			 * associativity property. So read that node only once.
+			 */
+			if (!first) {
+				lmb_set_nid(lmb);
+				first = lmb;
+			} else {
+				lmb->nid = first->nid;
+			}
+#endif
 		}
 	}
 }
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 13:35 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Christophe Leroy, nathanl, arnd, linux-kernel,
	Tulio Magno Quites Machado Filho, Paul Mackerras, luto,
	linux-arch, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <87wo2dy5in.fsf@mpe.ellerman.id.au>

Hi!

On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> > frame whenever it has anything to same.
> 
> Yeah OK that would explain it.
> 
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> >     fbb60:	94 21 ff e0 	stwu    r1,-32(r1)

This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).

> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> >     .cfi_startproc
> >   	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
> >   	mflr		r0
> > +	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
> >     .cfi_register lr, r0
> 
> The cfi_register should come directly after the mflr I think.

That is the idiomatic way to write it, and most obviously correct.  But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding.  Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)

> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if 
> > you think that's needed, just let's do it:
> 
> Hmm, true.
> 
> The compiler will use the toc for globals (and possibly also for large
> constants?)

And anything else it bloody well wants to, yeah :-)

> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.

Yes.

> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.

If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)

> So yeah we should probably do as below.

[ snip ]

Looks good yes.


Segher

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 14:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
> 
>   18:	35 25 ff e0 	addic.  r9,r5,-32
>   1c:	41 80 00 10 	blt     2c <shift+0x14>
>   20:	7c 64 4c 30 	srw     r4,r3,r9
>   24:	38 60 00 00 	li      r3,0
> ...
>   2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
>   30:	21 45 00 1f 	subfic  r10,r5,31
>   34:	7c 84 2c 30 	srw     r4,r4,r5
>   38:	7d 29 50 30 	slw     r9,r9,r10
>   3c:	7c 63 2c 30 	srw     r3,r3,r5
>   40:	7d 24 23 78 	or      r4,r9,r4
> 
> In our case the shift is always <= 32. In addition,  the upper 32 bits
> of the result are likely nul. Lets GCC know it, it also optimises the
> following calculations.
> 
> With the patch, we get:
>    0:	21 25 00 20 	subfic  r9,r5,32
>    4:	7c 69 48 30 	slw     r9,r3,r9
>    8:	7c 84 2c 30 	srw     r4,r4,r5
>    c:	7d 24 23 78 	or      r4,r9,r4
>   10:	7c 63 2c 30 	srw     r3,r3,r5

See below.  Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will do that (it is on various TODO lists, it is
bound to happen *some* day ;-), but it won't help you yet of course).


> +/*
> + * The macros sets two stack frames, one for the caller and one for the callee
> + * because there are no requirement for the caller to set a stack frame when
> + * calling VDSO so it may have omitted to set one, especially on PPC64
> + */

If the caller follows the ABI, there always is a stack frame.  So what
is going on?


> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
> + * which would result in a pointless operation. The compiler cannot
> + * optimize it away as the mask comes from the vdso data and is not compile
> + * time constant.
> + */

It cannot optimise it because it does not know shift < 32.  The code
below is incorrect for shift equal to 32, fwiw.

> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> +{
> +	return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> +	u32 hi = ns >> 32;
> +	u32 lo = ns;
> +
> +	lo >>= shift;
> +	lo |= hi << (32 - shift);
> +	hi >>= shift;


> +	if (likely(hi == 0))
> +		return lo;

Removing these two lines shouldn't change generated object code?  Or not
make it worse, at least.

> +	return ((u64)hi << 32) | lo;
> +}


What does the compiler do for just

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
	return ns >> (shift & 31);
}

?


Segher

^ permalink raw reply

* [PATCH] powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

On BOOK3S_32, when we have modules and strict kernel RWX, modules
are not in vmalloc space but in a dedicated segment that is
below PAGE_OFFSET.

So KASAN_SHADOW_START must take it into account.

MODULES_VADDR can't be used because it is not defined yet
in kasan.h

Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/kasan.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index d635b96c7ea6..7355ed05e65e 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -15,11 +15,18 @@
 #ifndef __ASSEMBLY__
 
 #include <asm/page.h>
+#include <linux/sizes.h>
 
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
+#if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_MODULES) && defined(CONFIG_STRICT_KERNEL_RWX)
+#define KASAN_KERN_START	ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
+#else
+#define KASAN_KERN_START	PAGE_OFFSET
+#endif
+
 #define KASAN_SHADOW_START	(KASAN_SHADOW_OFFSET + \
-				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
+				 (KASAN_KERN_START >> KASAN_SHADOW_SCALE_SHIFT))
 
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

When MODULES_VADDR is defined, is_module_segment() shall check the
address against it instead of checking agains VMALLOC_START.

Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/mm/book3s32/mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index c0162911f6cb..82ae9e06a773 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -191,10 +191,17 @@ static bool is_module_segment(unsigned long addr)
 {
 	if (!IS_ENABLED(CONFIG_MODULES))
 		return false;
+#ifdef MODULES_VADDR
+	if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
+		return false;
+	if (addr >= ALIGN(MODULES_END, SZ_256M))
+		return false;
+#else
 	if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
 		return false;
 	if (addr >= ALIGN(VMALLOC_END, SZ_256M))
 		return false;
+#endif
 	return true;
 }
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-05 15:52 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200722022422.825197-1-mpe@ellerman.id.au>



Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
> The assembler says:
>    arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression

I get exactly the same with head_32.S, for the exact same reason.

Christophe

> 
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
> 
> Fix it to use a literal 0, the generated code is identical.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>   arch/powerpc/kernel/head_40x.S | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 926bfa73586a..5b282d9965a5 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -620,7 +620,7 @@ _ENTRY(saved_ksp_limit)
>   	ori	r6, r6, swapper_pg_dir@l
>   	lis	r5, abatron_pteptrs@h
>   	ori	r5, r5, abatron_pteptrs@l
> -	stw	r5, 0xf0(r0)	/* Must match your Abatron config file */
> +	stw	r5, 0xf0(0)	/* Must match your Abatron config file */
>   	tophys(r5,r5)
>   	stw	r6, 0(r5)
>   
> 

^ permalink raw reply

* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Tyrel Datwyler @ 2020-08-05 16:14 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20200805131922.GZ12965@vkoul-mobl>

On 8/5/20 6:19 AM, Vinod Koul wrote:
> On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
> 
>> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
>> index 2c508ee672b9..9b69716172a4 100644
>> --- a/drivers/dma/pl330.c
>> +++ b/drivers/dma/pl330.c
>> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>>  
>>  		if (_state(thrd) == PL330_STATE_KILLING)
>>  			UNTIL(thrd, PL330_STATE_STOPPED)
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_FAULTING:
>>  		_stop(thrd);
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_KILLING:
>>  	case PL330_STATE_COMPLETING:
>>  		UNTIL(thrd, PL330_STATE_STOPPED)
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_STOPPED:
>>  		return _trigger(thrd);
>> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>>  
>>  	switch (direction) {
>>  	case DMA_MEM_TO_MEM:
>> -		/* fall through */
>>  	case DMA_MEM_TO_DEV:
>>  		off += _emit_LD(dry_run, &buf[off], cond);
>>  		break;
>> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>>  
>>  	switch (direction) {
>>  	case DMA_MEM_TO_MEM:
>> -		/* fall through */
>>  	case DMA_DEV_TO_MEM:
>>  		off += _emit_ST(dry_run, &buf[off], cond);
>>  		break;
>> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>>  
>>  	switch (pxs->desc->rqtype) {
>>  	case DMA_MEM_TO_DEV:
>> -		/* fall through */
>>  	case DMA_DEV_TO_MEM:
>>  		off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>>  			cond);
>> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>>  
>>  	switch (pxs->desc->rqtype) {
>>  	case DMA_MEM_TO_DEV:
>> -		/* fall through */
> 
> replacement missed here and above few
> 

Its not obvious via most of the documentation, but a case label followed
immediately by another case label is the only allowed implicit fall through as
it is obvious to the eye that there is no postceding statement.

For example the following is legal:

    case FOO:
    	/* fallthrough */ (or fallthrough;)
    case BAR:

is converted to:

    case FOO:
    case BAR:

I would assume the justification is that it is common to have a switch statement
where several case statements fall directly through to a single code block and
annotating each case label seems like overkill.

        switch (vhost->state) {
        case IBMVFC_LINK_DEAD:
        case IBMVFC_HOST_OFFLINE:
                result = DID_NO_CONNECT << 16;
                break;
        case IBMVFC_NO_CRQ:
        case IBMVFC_INITIALIZING:
        case IBMVFC_HALTED:
        case IBMVFC_LINK_DOWN:
                result = DID_REQUEUE << 16;
                break;
        case IBMVFC_ACTIVE:
                result = 0;
                break;
        }

Regards,

Tyrel

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>

Hi,

On 08/05/2020 02:03 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * powerpc specific delta calculation.
>> + *
>> + * This variant removes the masking of the subtraction because the
>> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
>> + * which would result in a pointless operation. The compiler cannot
>> + * optimize it away as the mask comes from the vdso data and is not compile
>> + * time constant.
>> + */
> 
> It cannot optimise it because it does not know shift < 32.  The code
> below is incorrect for shift equal to 32, fwiw.

Is there a way to tell it ?

> 
>> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
>> +{
>> +	return (cycles - last) * mult;
>> +}
>> +#define vdso_calc_delta vdso_calc_delta
>> +
>> +#ifndef __powerpc64__
>> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
>> +{
>> +	u32 hi = ns >> 32;
>> +	u32 lo = ns;
>> +
>> +	lo >>= shift;
>> +	lo |= hi << (32 - shift);
>> +	hi >>= shift;
> 
> 
>> +	if (likely(hi == 0))
>> +		return lo;
> 
> Removing these two lines shouldn't change generated object code?  Or not
> make it worse, at least.

I remember it made noticeable difference allthough I can't remember the 
details. See below with GCC 10.1. At least we see that with those two 
lines, GCC only sets a 16 bytes stack frame. Without those lines it sets 
a 32 bytes stack frame and seems to save some values for no reason.

With the two lines:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 04 	bgt     7b4 <__c_kernel_clock_gettime+0x108>
  6b4:	39 40 00 01 	li      r10,1
  6b8:	7d 4a 18 30 	slw     r10,r10,r3
  6bc:	71 47 08 83 	andi.   r7,r10,2179
  6c0:	41 82 01 2c 	beq     7ec <__c_kernel_clock_gettime+0x140>
  6c4:	94 21 ff f0 	stwu    r1,-16(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 e1 00 0c 	stw     r31,12(r1)
  6d0:	7d 85 1a 14 	add     r12,r5,r3
  6d4:	80 05 00 00 	lwz     r0,0(r5)
  6d8:	70 06 00 01 	andi.   r6,r0,1
  6dc:	40 82 00 d4 	bne     7b0 <__c_kernel_clock_gettime+0x104>
  6e0:	7d 4d 42 e6 	mftbu   r10
  6e4:	7d 6c 42 e6 	mftb    r11
  6e8:	7c ed 42 e6 	mftbu   r7
  6ec:	7c 0a 38 40 	cmplw   r10,r7
  6f0:	40 82 ff f0 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  6f4:	80 e5 00 0c 	lwz     r7,12(r5)
  6f8:	80 65 00 08 	lwz     r3,8(r5)
  6fc:	7c e7 58 10 	subfc   r7,r7,r11
  700:	81 65 00 18 	lwz     r11,24(r5)
  704:	7d 43 51 10 	subfe   r10,r3,r10
  708:	7f e7 58 16 	mulhwu  r31,r7,r11
  70c:	7d 4a 59 d6 	mullw   r10,r10,r11
  710:	7c e7 59 d6 	mullw   r7,r7,r11
  714:	80 6c 00 2c 	lwz     r3,44(r12)
  718:	81 6c 00 28 	lwz     r11,40(r12)
  71c:	7c e7 18 14 	addc    r7,r7,r3
  720:	7d 4a fa 14 	add     r10,r10,r31
  724:	80 65 00 1c 	lwz     r3,28(r5)
  728:	7d 4a 59 14 	adde    r10,r10,r11
  72c:	7c e7 1c 30 	srw     r7,r7,r3
  730:	21 63 00 20 	subfic  r11,r3,32
  734:	7d 43 1c 31 	srw.    r3,r10,r3
  738:	7d 4a 58 30 	slw     r10,r10,r11
  73c:	7d 49 3b 78 	or      r9,r10,r7
  740:	39 00 00 00 	li      r8,0
  744:	40 82 00 84 	bne     7c8 <__c_kernel_clock_gettime+0x11c>
  748:	80 6c 00 24 	lwz     r3,36(r12)
  74c:	81 45 00 00 	lwz     r10,0(r5)
  750:	7c 00 50 40 	cmplw   r0,r10
  754:	40 a2 ff 80 	bne     6d4 <__c_kernel_clock_gettime+0x28>
  758:	2c 08 00 00 	cmpwi   r8,0
  75c:	41 82 00 7c 	beq     7d8 <__c_kernel_clock_gettime+0x12c>
  760:	3c e0 c4 65 	lis     r7,-15259
  764:	3c 00 3b 9a 	lis     r0,15258
  768:	60 e7 36 00 	ori     r7,r7,13824
  76c:	60 00 c9 ff 	ori     r0,r0,51711
  770:	7c a9 38 14 	addc    r5,r9,r7
  774:	7d 48 01 d4 	addme   r10,r8
  778:	2c 0a 00 00 	cmpwi   r10,0
  77c:	7d 48 53 78 	mr      r8,r10
  780:	7c a9 2b 78 	mr      r9,r5
  784:	38 c6 00 01 	addi    r6,r6,1
  788:	40 82 ff e8 	bne     770 <__c_kernel_clock_gettime+0xc4>
  78c:	7c 05 00 40 	cmplw   r5,r0
  790:	41 81 ff e0 	bgt     770 <__c_kernel_clock_gettime+0xc4>
  794:	7c 66 18 14 	addc    r3,r6,r3
  798:	90 64 00 00 	stw     r3,0(r4)
  79c:	91 24 00 04 	stw     r9,4(r4)
  7a0:	38 60 00 00 	li      r3,0
  7a4:	83 e1 00 0c 	lwz     r31,12(r1)
  7a8:	38 21 00 10 	addi    r1,r1,16
  7ac:	4e 80 00 20 	blr
  7b0:	4b ff ff 24 	b       6d4 <__c_kernel_clock_gettime+0x28>
  7b4:	38 00 00 f6 	li      r0,246
  7b8:	44 00 00 02 	sc
  7bc:	40 a3 00 08 	bns     7c4 <__c_kernel_clock_gettime+0x118>
  7c0:	7c 63 00 d0 	neg     r3,r3
  7c4:	4e 80 00 20 	blr
  7c8:	7d 2a 4b 78 	mr      r10,r9
  7cc:	7c 68 1b 78 	mr      r8,r3
  7d0:	7d 49 53 78 	mr      r9,r10
  7d4:	4b ff ff 74 	b       748 <__c_kernel_clock_gettime+0x9c>
  7d8:	3d 40 3b 9a 	lis     r10,15258
  7dc:	61 4a c9 ff 	ori     r10,r10,51711
  7e0:	7c 09 50 40 	cmplw   r9,r10
  7e4:	41 81 ff 7c 	bgt     760 <__c_kernel_clock_gettime+0xb4>
  7e8:	4b ff ff b0 	b       798 <__c_kernel_clock_gettime+0xec>
  7ec:	71 47 00 60 	andi.   r7,r10,96
  7f0:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  7f4:	7d 25 4a 14 	add     r9,r5,r9
  7f8:	40 82 00 14 	bne     80c <__c_kernel_clock_gettime+0x160>
  7fc:	71 4a 00 10 	andi.   r10,r10,16
  800:	41 a2 ff b4 	beq     7b4 <__c_kernel_clock_gettime+0x108>
  804:	38 a5 00 f0 	addi    r5,r5,240
  808:	4b ff fe bc 	b       6c4 <__c_kernel_clock_gettime+0x18>
  80c:	81 05 00 00 	lwz     r8,0(r5)
  810:	71 0a 00 01 	andi.   r10,r8,1
  814:	40 a2 ff f8 	bne     80c <__c_kernel_clock_gettime+0x160>
  818:	80 69 00 24 	lwz     r3,36(r9)
  81c:	81 49 00 2c 	lwz     r10,44(r9)
  820:	80 e5 00 00 	lwz     r7,0(r5)
  824:	7c 08 38 40 	cmplw   r8,r7
  828:	40 a2 ff e4 	bne     80c <__c_kernel_clock_gettime+0x160>
  82c:	90 64 00 00 	stw     r3,0(r4)
  830:	91 44 00 04 	stw     r10,4(r4)
  834:	38 60 00 00 	li      r3,0
  838:	4e 80 00 20 	blr


Without the two lines:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 14 	bgt     7c4 <__c_kernel_clock_gettime+0x118>
  6b4:	39 20 00 01 	li      r9,1
  6b8:	7d 29 18 30 	slw     r9,r9,r3
  6bc:	71 2a 08 83 	andi.   r10,r9,2179
  6c0:	41 82 01 2c 	beq     7ec <__c_kernel_clock_gettime+0x140>
  6c4:	94 21 ff e0 	stwu    r1,-32(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 81 00 10 	stw     r28,16(r1)
  6d0:	93 a1 00 14 	stw     r29,20(r1)
  6d4:	93 c1 00 18 	stw     r30,24(r1)
  6d8:	93 e1 00 1c 	stw     r31,28(r1)
  6dc:	7c 65 1a 14 	add     r3,r5,r3
  6e0:	81 85 00 00 	lwz     r12,0(r5)
  6e4:	71 87 00 01 	andi.   r7,r12,1
  6e8:	40 82 00 d8 	bne     7c0 <__c_kernel_clock_gettime+0x114>
  6ec:	7d 2d 42 e6 	mftbu   r9
  6f0:	7c cc 42 e6 	mftb    r6
  6f4:	7d 4d 42 e6 	mftbu   r10
  6f8:	7c 09 50 40 	cmplw   r9,r10
  6fc:	40 82 ff f0 	bne     6ec <__c_kernel_clock_gettime+0x40>
  700:	83 83 00 28 	lwz     r28,40(r3)
  704:	83 a3 00 2c 	lwz     r29,44(r3)
  708:	81 65 00 08 	lwz     r11,8(r5)
  70c:	81 05 00 0c 	lwz     r8,12(r5)
  710:	83 c5 00 18 	lwz     r30,24(r5)
  714:	83 e5 00 1c 	lwz     r31,28(r5)
  718:	80 03 00 24 	lwz     r0,36(r3)
  71c:	81 45 00 00 	lwz     r10,0(r5)
  720:	7c 0c 50 40 	cmplw   r12,r10
  724:	40 a2 ff bc 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  728:	7d 48 30 10 	subfc   r10,r8,r6
  72c:	7c cb 49 10 	subfe   r6,r11,r9
  730:	7c c6 f1 d6 	mullw   r6,r6,r30
  734:	7d 2a f0 16 	mulhwu  r9,r10,r30
  738:	7d 4a f1 d6 	mullw   r10,r10,r30
  73c:	7c c6 4a 14 	add     r6,r6,r9
  740:	7d 4a e8 14 	addc    r10,r10,r29
  744:	7c c6 e1 14 	adde    r6,r6,r28
  748:	7c c8 fc 30 	srw     r8,r6,r31
  74c:	2c 08 00 00 	cmpwi   r8,0
  750:	20 bf 00 20 	subfic  r5,r31,32
  754:	7d 4a fc 30 	srw     r10,r10,r31
  758:	7c c5 28 30 	slw     r5,r6,r5
  75c:	7c a9 53 78 	or      r9,r5,r10
  760:	41 82 00 78 	beq     7d8 <__c_kernel_clock_gettime+0x12c>
  764:	3c c0 c4 65 	lis     r6,-15259
  768:	3c 60 3b 9a 	lis     r3,15258
  76c:	60 c6 36 00 	ori     r6,r6,13824
  770:	60 63 c9 ff 	ori     r3,r3,51711
  774:	7c a9 30 14 	addc    r5,r9,r6
  778:	7d 48 01 d4 	addme   r10,r8
  77c:	2c 0a 00 00 	cmpwi   r10,0
  780:	7d 48 53 78 	mr      r8,r10
  784:	7c a9 2b 78 	mr      r9,r5
  788:	38 e7 00 01 	addi    r7,r7,1
  78c:	40 82 ff e8 	bne     774 <__c_kernel_clock_gettime+0xc8>
  790:	7c 05 18 40 	cmplw   r5,r3
  794:	41 81 ff e0 	bgt     774 <__c_kernel_clock_gettime+0xc8>
  798:	7c 07 00 14 	addc    r0,r7,r0
  79c:	90 04 00 00 	stw     r0,0(r4)
  7a0:	91 24 00 04 	stw     r9,4(r4)
  7a4:	38 60 00 00 	li      r3,0
  7a8:	83 81 00 10 	lwz     r28,16(r1)
  7ac:	83 a1 00 14 	lwz     r29,20(r1)
  7b0:	83 c1 00 18 	lwz     r30,24(r1)
  7b4:	83 e1 00 1c 	lwz     r31,28(r1)
  7b8:	38 21 00 20 	addi    r1,r1,32
  7bc:	4e 80 00 20 	blr
  7c0:	4b ff ff 20 	b       6e0 <__c_kernel_clock_gettime+0x34>
  7c4:	38 00 00 f6 	li      r0,246
  7c8:	44 00 00 02 	sc
  7cc:	40 a3 00 08 	bns     7d4 <__c_kernel_clock_gettime+0x128>
  7d0:	7c 63 00 d0 	neg     r3,r3
  7d4:	4e 80 00 20 	blr
  7d8:	3d 40 3b 9a 	lis     r10,15258
  7dc:	61 4a c9 ff 	ori     r10,r10,51711
  7e0:	7c 09 50 40 	cmplw   r9,r10
  7e4:	41 81 ff 80 	bgt     764 <__c_kernel_clock_gettime+0xb8>
  7e8:	4b ff ff b4 	b       79c <__c_kernel_clock_gettime+0xf0>
  7ec:	71 2a 00 60 	andi.   r10,r9,96
  7f0:	40 82 00 14 	bne     804 <__c_kernel_clock_gettime+0x158>
  7f4:	71 29 00 10 	andi.   r9,r9,16
  7f8:	41 a2 ff cc 	beq     7c4 <__c_kernel_clock_gettime+0x118>
  7fc:	38 a5 00 f0 	addi    r5,r5,240
  800:	4b ff fe c4 	b       6c4 <__c_kernel_clock_gettime+0x18>
  804:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  808:	7d 25 4a 14 	add     r9,r5,r9
  80c:	81 05 00 00 	lwz     r8,0(r5)
  810:	71 0a 00 01 	andi.   r10,r8,1
  814:	40 82 00 28 	bne     83c <__c_kernel_clock_gettime+0x190>
  818:	80 09 00 24 	lwz     r0,36(r9)
  81c:	81 49 00 2c 	lwz     r10,44(r9)
  820:	80 e5 00 00 	lwz     r7,0(r5)
  824:	7c 08 38 40 	cmplw   r8,r7
  828:	40 a2 ff e4 	bne     80c <__c_kernel_clock_gettime+0x160>
  82c:	90 04 00 00 	stw     r0,0(r4)
  830:	91 44 00 04 	stw     r10,4(r4)
  834:	38 60 00 00 	li      r3,0
  838:	4e 80 00 20 	blr
  83c:	4b ff ff d0 	b       80c <__c_kernel_clock_gettime+0x160>


> 
>> +	return ((u64)hi << 32) | lo;
>> +}
> 
> 
> What does the compiler do for just
> 
> static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> 	return ns >> (shift & 31);
> }
> 

Worse:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 30 	bgt     7e0 <__c_kernel_clock_gettime+0x134>
  6b4:	39 20 00 01 	li      r9,1
  6b8:	7d 29 18 30 	slw     r9,r9,r3
  6bc:	71 2a 08 83 	andi.   r10,r9,2179
  6c0:	41 82 01 48 	beq     808 <__c_kernel_clock_gettime+0x15c>
  6c4:	94 21 ff e0 	stwu    r1,-32(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 81 00 10 	stw     r28,16(r1)
  6d0:	93 a1 00 14 	stw     r29,20(r1)
  6d4:	93 c1 00 18 	stw     r30,24(r1)
  6d8:	93 e1 00 1c 	stw     r31,28(r1)
  6dc:	7c 65 1a 14 	add     r3,r5,r3
  6e0:	80 c5 00 00 	lwz     r6,0(r5)
  6e4:	70 c7 00 01 	andi.   r7,r6,1
  6e8:	40 82 00 f4 	bne     7dc <__c_kernel_clock_gettime+0x130>
  6ec:	7d 2d 42 e6 	mftbu   r9
  6f0:	7d 0c 42 e6 	mftb    r8
  6f4:	7d 4d 42 e6 	mftbu   r10
  6f8:	7c 09 50 40 	cmplw   r9,r10
  6fc:	40 82 ff f0 	bne     6ec <__c_kernel_clock_gettime+0x40>
  700:	83 83 00 28 	lwz     r28,40(r3)
  704:	83 c3 00 2c 	lwz     r30,44(r3)
  708:	81 65 00 08 	lwz     r11,8(r5)
  70c:	81 45 00 0c 	lwz     r10,12(r5)
  710:	83 e5 00 18 	lwz     r31,24(r5)
  714:	81 85 00 1c 	lwz     r12,28(r5)
  718:	80 03 00 24 	lwz     r0,36(r3)
  71c:	83 a5 00 00 	lwz     r29,0(r5)
  720:	7c 06 e8 40 	cmplw   r6,r29
  724:	40 a2 ff bc 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  728:	7d 0a 40 10 	subfc   r8,r10,r8
  72c:	7c cb 49 10 	subfe   r6,r11,r9
  730:	7c c6 f9 d6 	mullw   r6,r6,r31
  734:	7d 28 f8 16 	mulhwu  r9,r8,r31
  738:	7d 08 f9 d6 	mullw   r8,r8,r31
  73c:	55 8c 06 fe 	clrlwi  r12,r12,27
  740:	7f c8 f0 14 	addc    r30,r8,r30
  744:	7c c6 4a 14 	add     r6,r6,r9
  748:	7c c6 e1 14 	adde    r6,r6,r28
  74c:	34 6c ff e0 	addic.  r3,r12,-32
  750:	41 80 00 70 	blt     7c0 <__c_kernel_clock_gettime+0x114>
  754:	7c c9 1c 30 	srw     r9,r6,r3
  758:	39 00 00 00 	li      r8,0
  75c:	2c 08 00 00 	cmpwi   r8,0
  760:	41 82 00 94 	beq     7f4 <__c_kernel_clock_gettime+0x148>
  764:	3c c0 c4 65 	lis     r6,-15259
  768:	3c 60 3b 9a 	lis     r3,15258
  76c:	60 c6 36 00 	ori     r6,r6,13824
  770:	60 63 c9 ff 	ori     r3,r3,51711
  774:	7c a9 30 14 	addc    r5,r9,r6
  778:	7d 48 01 d4 	addme   r10,r8
  77c:	2c 0a 00 00 	cmpwi   r10,0
  780:	7d 48 53 78 	mr      r8,r10
  784:	7c a9 2b 78 	mr      r9,r5
  788:	38 e7 00 01 	addi    r7,r7,1
  78c:	40 82 ff e8 	bne     774 <__c_kernel_clock_gettime+0xc8>
  790:	7c 05 18 40 	cmplw   r5,r3
  794:	41 81 ff e0 	bgt     774 <__c_kernel_clock_gettime+0xc8>
  798:	7c 07 00 14 	addc    r0,r7,r0
  79c:	90 04 00 00 	stw     r0,0(r4)
  7a0:	91 24 00 04 	stw     r9,4(r4)
  7a4:	38 60 00 00 	li      r3,0
  7a8:	83 81 00 10 	lwz     r28,16(r1)
  7ac:	83 a1 00 14 	lwz     r29,20(r1)
  7b0:	83 c1 00 18 	lwz     r30,24(r1)
  7b4:	83 e1 00 1c 	lwz     r31,28(r1)
  7b8:	38 21 00 20 	addi    r1,r1,32
  7bc:	4e 80 00 20 	blr
  7c0:	54 c3 08 3c 	rlwinm  r3,r6,1,0,30
  7c4:	21 6c 00 1f 	subfic  r11,r12,31
  7c8:	7c 63 58 30 	slw     r3,r3,r11
  7cc:	7f c9 64 30 	srw     r9,r30,r12
  7d0:	7c 69 4b 78 	or      r9,r3,r9
  7d4:	7c c8 64 30 	srw     r8,r6,r12
  7d8:	4b ff ff 84 	b       75c <__c_kernel_clock_gettime+0xb0>
  7dc:	4b ff ff 04 	b       6e0 <__c_kernel_clock_gettime+0x34>
  7e0:	38 00 00 f6 	li      r0,246
  7e4:	44 00 00 02 	sc
  7e8:	40 a3 00 08 	bns     7f0 <__c_kernel_clock_gettime+0x144>
  7ec:	7c 63 00 d0 	neg     r3,r3
  7f0:	4e 80 00 20 	blr
  7f4:	3d 40 3b 9a 	lis     r10,15258
  7f8:	61 4a c9 ff 	ori     r10,r10,51711
  7fc:	7c 09 50 40 	cmplw   r9,r10
  800:	41 81 ff 64 	bgt     764 <__c_kernel_clock_gettime+0xb8>
  804:	4b ff ff 98 	b       79c <__c_kernel_clock_gettime+0xf0>
  808:	71 2a 00 60 	andi.   r10,r9,96
  80c:	40 82 00 14 	bne     820 <__c_kernel_clock_gettime+0x174>
  810:	71 29 00 10 	andi.   r9,r9,16
  814:	41 a2 ff cc 	beq     7e0 <__c_kernel_clock_gettime+0x134>
  818:	38 a5 00 f0 	addi    r5,r5,240
  81c:	4b ff fe a8 	b       6c4 <__c_kernel_clock_gettime+0x18>
  820:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  824:	7d 25 4a 14 	add     r9,r5,r9
  828:	81 05 00 00 	lwz     r8,0(r5)
  82c:	71 0a 00 01 	andi.   r10,r8,1
  830:	40 82 00 28 	bne     858 <__c_kernel_clock_gettime+0x1ac>
  834:	80 09 00 24 	lwz     r0,36(r9)
  838:	81 49 00 2c 	lwz     r10,44(r9)
  83c:	80 e5 00 00 	lwz     r7,0(r5)
  840:	7c 08 38 40 	cmplw   r8,r7
  844:	40 a2 ff e4 	bne     828 <__c_kernel_clock_gettime+0x17c>
  848:	90 04 00 00 	stw     r0,0(r4)
  84c:	91 44 00 04 	stw     r10,4(r4)
  850:	38 60 00 00 	li      r3,0
  854:	4e 80 00 20 	blr
  858:	4b ff ff d0 	b       828 <__c_kernel_clock_gettime+0x17c>

Christophe

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>

Hi Again,

Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> Hi!
> 
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * The macros sets two stack frames, one for the caller and one for the callee
>> + * because there are no requirement for the caller to set a stack frame when
>> + * calling VDSO so it may have omitted to set one, especially on PPC64
>> + */
> 
> If the caller follows the ABI, there always is a stack frame.  So what
> is going on?

Looks like it is not the case. See discussion at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/

Seems like GCC uses the redzone and doesn't set a stack frame. I guess 
it doesn't know that the inline assembly contains a function call so it 
doesn't set the frame.


Christophe

^ permalink raw reply

* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Miguel Ojeda @ 2020-08-05 17:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
	Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
	linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>

On Sun, Aug 2, 2020 at 6:41 PM Mike Rapoport <rppt@kernel.org> wrote:
>
>  .clang-format                  |  3 ++-

The .clang-format bit:

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Miguel Ojeda @ 2020-08-05 17:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
	Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
	linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>

On Sun, Aug 2, 2020 at 6:40 PM Mike Rapoport <rppt@kernel.org> wrote:
>
>  .clang-format                    |  2 +-

The .clang-format bit:

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 18:08 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>

Travis reported successful compilation with mpe/merge:

https://travis-ci.org/github/LeoBras/linux-ppc/builds/715028857


^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 18:40 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <3db2a590-b842-83db-ed2b-f3ee62595f18@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 04:40:16PM +0000, Christophe Leroy wrote:
> >It cannot optimise it because it does not know shift < 32.  The code
> >below is incorrect for shift equal to 32, fwiw.
> 
> Is there a way to tell it ?

Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)

> >What does the compiler do for just
> >
> >static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> >	return ns >> (shift & 31);
> >}
> >
> 
> Worse:

I cannot make heads or tails of all that branch spaghetti, sorry.

>  73c:	55 8c 06 fe 	clrlwi  r12,r12,27
>  740:	7f c8 f0 14 	addc    r30,r8,r30
>  744:	7c c6 4a 14 	add     r6,r6,r9
>  748:	7c c6 e1 14 	adde    r6,r6,r28
>  74c:	34 6c ff e0 	addic.  r3,r12,-32
>  750:	41 80 00 70 	blt     7c0 <__c_kernel_clock_gettime+0x114>

This branch is always true.  Hrm.


Segher

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 20:55 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <7d409421-6396-8eba-8250-b6c9ff8232d9@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 06:51:44PM +0200, Christophe Leroy wrote:
> Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> >On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> >>+/*
> >>+ * The macros sets two stack frames, one for the caller and one for the 
> >>callee
> >>+ * because there are no requirement for the caller to set a stack frame 
> >>when
> >>+ * calling VDSO so it may have omitted to set one, especially on PPC64
> >>+ */
> >
> >If the caller follows the ABI, there always is a stack frame.  So what
> >is going on?
> 
> Looks like it is not the case. See discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
> 
> Seems like GCC uses the redzone and doesn't set a stack frame. I guess 
> it doesn't know that the inline assembly contains a function call so it 
> doesn't set the frame.

Yes, that is the problem.  See
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Extended-Asm.html#AssemblerTemplate
where this is (briefly) discussed:
  "Accessing data from C programs without using input/output operands
  (such as by using global symbols directly from the assembler
  template) may not work as expected. Similarly, calling functions
  directly from an assembler template requires a detailed understanding
  of the target assembler and ABI."

I don't know of a good way to tell GCC some function needs a frame (that
is, one that doesn't result in extra code other than to set up the
frame).  I'll think about it.


Segher

^ permalink raw reply

* How would I code a Write to a proc file from within the kernel without reading anything from user space?
From: thefirst ECS @ 2020-08-05 21:52 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <3103222.25048.1596318084416.JavaMail.Dan@DanHP>

In order to help debug a certain discrepancy, I need to "simulate" an "echo 1 > /proc/file" but doing it from kernel even when root file system is unavailable. 

I have simulated it just fine via call_usermodehelper (with argv etc of "echo 1 > /proc/file") from inside the kernel which triggers:
[ee897ec0] [c0122704] proc_reg_write+0x80/0xb4
[ee897ef0] [c00d3b7c] vfs_write+0xb4/0x184

just as I had wanted. But now I need to trigger the "vfs_write" and "proc_reg_write" but without using call_usermodehelper since I will be doing it when root "/" is unavailable and so I can no longer access /bin/echo and call the usermodehelper etc. So my question is how can I do that in kernel?

Not sure if I'm supposed to look in fs read_write.c and fs.h and write a method based on those or if there's some other way etc.

Thanks,
-Dan

^ permalink raw reply

* Re: [PATCH] macintosh: windfarm: fix spelling mistake "detatch" -> "detach"
From: Wolfram Sang @ 2020-08-05 22:08 UTC (permalink / raw)
  To: Colin King; +Cc: kernel-janitors, linuxppc-dev, linux-kernel
In-Reply-To: <20200805104337.16104-1-colin.king@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 375 bytes --]

On Wed, Aug 05, 2020 at 11:43:37AM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> There are spelling mistakes in DBG messages. Fix them.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

These comments can go entirely. i2c_detach is long history. And for
remove, we have debugging output in the driver core meanwhile.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-05 22:29 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman
  Cc: Nathan Lynch, linuxppc-dev, Cedric Le Goater,
	Thiago Jung Bauermann
In-Reply-To: <159660225263.15440.2633856149684894440@sif>

Quoting Michael Roth (2020-08-04 23:37:32)
> Quoting Michael Ellerman (2020-08-04 22:07:08)
> > Greg Kurz <groug@kaod.org> writes:
> > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > Michael Ellerman <mpe@ellerman.id.au> wrote:
> > >> Spinning forever seems like a bad idea, but as has been demonstrated at
> > >> least twice now, continuing when we don't know the state of the other
> > >> CPU can lead to straight up crashes.
> > >> 
> > >> So I think I'm persuaded that it's preferable to have the kernel stuck
> > >> spinning rather than oopsing.
> > >> 
> > >
> > > +1
> > >
> > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > >> first instinct is no, if we're stuck here for 20s a stack trace would be
> > >> good. But then we will probably hit that on some big and/or heavily
> > >> loaded machine.
> > >> 
> > >> So possibly we should call cond_resched() but have some custom logic in
> > >> the loop to print a warning if we are stuck for more than some
> > >> sufficiently long amount of time.
> > >
> > > How long should that be ?
> > 
> > Yeah good question.
> > 
> > I guess step one would be seeing how long it can take on the 384 vcpu
> > machine. And we can probably test on some other big machines.
> > 
> > Hopefully Nathan can give us some idea of how long he's seen it take on
> > large systems? I know he was concerned about the 20s timeout of the
> > softlockup detector.
> > 
> > Maybe a minute or two?
> 
> Hmm, so I took a stab at this where I called cond_resched() after
> every 5 seconds of polling and printed a warning at the same time (FWIW
> that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> system using KVM running the 384vcpu unplug loop)
> 
> But it sounds like that's not quite what you had in mind. How frequently
> do you think we should call cond_resched()? Maybe after 25 iterations
> of polling smp_query_cpu_stopped() to keep original behavior somewhat
> similar?
> 
> I'll let the current patch run on the mihawk system overnight in the
> meantime so we at least have that data point, but would be good to
> know what things look like a large pHyp machine.

At one point I did manage to get the system in a state where unplug
operations were taking 1-2s, but still not enough to trigger any
5s warning, and I wasn't able to reproduce that in subsequent runs.

I also tried reworking the patch so that we print a warning and
cond_resched() after 200 ms to make sure that path gets executed, but
only managed to trigger the warning twice after a few hours.

So, if we print a warning after a couple minutes, that seems pretty
conservative as far as avoiding spurious warnings. And if we
cond_resched() after 25 loops of polling (~0.1 ms in the cases
that caused the original crash), that would avoid most of the
default RCU/lockup warnings.

But having a second timeout to trigger the cond_resched() after some
set interval like 2s seems more deterministic since we're less
susceptible to longer delays due to things like the RTAS calls
contending for QEMU's global mutex in the the KVM case.


> 
> Thanks!
> 
> > 
> > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > >> 
> > >> This is not public.
> > >
> > > I'll have a look at changing that.
> > 
> > Thanks.
> > 
> > cheers

^ permalink raw reply

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-05 22:31 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman
  Cc: Nathan Lynch, linuxppc-dev, Cedric Le Goater,
	Thiago Jung Bauermann
In-Reply-To: <159666656828.15440.9097316124875217814@sif>

Quoting Michael Roth (2020-08-05 17:29:28)
> Quoting Michael Roth (2020-08-04 23:37:32)
> > Quoting Michael Ellerman (2020-08-04 22:07:08)
> > > Greg Kurz <groug@kaod.org> writes:
> > > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > > Michael Ellerman <mpe@ellerman.id.au> wrote:
> > > >> Spinning forever seems like a bad idea, but as has been demonstrated at
> > > >> least twice now, continuing when we don't know the state of the other
> > > >> CPU can lead to straight up crashes.
> > > >> 
> > > >> So I think I'm persuaded that it's preferable to have the kernel stuck
> > > >> spinning rather than oopsing.
> > > >> 
> > > >
> > > > +1
> > > >
> > > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > > >> first instinct is no, if we're stuck here for 20s a stack trace would be
> > > >> good. But then we will probably hit that on some big and/or heavily
> > > >> loaded machine.
> > > >> 
> > > >> So possibly we should call cond_resched() but have some custom logic in
> > > >> the loop to print a warning if we are stuck for more than some
> > > >> sufficiently long amount of time.
> > > >
> > > > How long should that be ?
> > > 
> > > Yeah good question.
> > > 
> > > I guess step one would be seeing how long it can take on the 384 vcpu
> > > machine. And we can probably test on some other big machines.
> > > 
> > > Hopefully Nathan can give us some idea of how long he's seen it take on
> > > large systems? I know he was concerned about the 20s timeout of the
> > > softlockup detector.
> > > 
> > > Maybe a minute or two?
> > 
> > Hmm, so I took a stab at this where I called cond_resched() after
> > every 5 seconds of polling and printed a warning at the same time (FWIW
> > that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> > system using KVM running the 384vcpu unplug loop)
> > 
> > But it sounds like that's not quite what you had in mind. How frequently
> > do you think we should call cond_resched()? Maybe after 25 iterations
> > of polling smp_query_cpu_stopped() to keep original behavior somewhat
> > similar?
> > 
> > I'll let the current patch run on the mihawk system overnight in the
> > meantime so we at least have that data point, but would be good to
> > know what things look like a large pHyp machine.
> 
> At one point I did manage to get the system in a state where unplug
> operations were taking 1-2s, but still not enough to trigger any
> 5s warning, and I wasn't able to reproduce that in subsequent runs.
> 
> I also tried reworking the patch so that we print a warning and
> cond_resched() after 200 ms to make sure that path gets executed, but
> only managed to trigger the warning twice after a few hours.
> 
> So, if we print a warning after a couple minutes, that seems pretty
> conservative as far as avoiding spurious warnings. And if we
> cond_resched() after 25 loops of polling (~0.1 ms in the cases

~0.1 seconds I mean

> that caused the original crash), that would avoid most of the
> default RCU/lockup warnings.
> 
> But having a second timeout to trigger the cond_resched() after some
> set interval like 2s seems more deterministic since we're less
> susceptible to longer delays due to things like the RTAS calls
> contending for QEMU's global mutex in the the KVM case.
> 
> 
> > 
> > Thanks!
> > 
> > > 
> > > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > > >> 
> > > >> This is not public.
> > > >
> > > > I'll have a look at changing that.
> > > 
> > > Thanks.
> > > 
> > > cheers

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/perf: consolidate GPCI hcall structs into asm/hvcall.h
From: Tyrel Datwyler @ 2020-08-05 22:37 UTC (permalink / raw)
  To: Scott Cheloha, linuxppc-dev; +Cc: Nathan Lynch
In-Reply-To: <20200727184605.2945095-1-cheloha@linux.ibm.com>

On 7/27/20 11:46 AM, Scott Cheloha wrote:
> The H_GetPerformanceCounterInfo (GPCI) hypercall input/output structs are
> useful to modules outside of perf/, so move them into asm/hvcall.h to live
> alongside the other powerpc hypercall structs.
> 
> Leave the perf-specific GPCI stuff in perf/hv-gpci.h.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/hvcall.h | 36 +++++++++++++++++++++++++++++++
>  arch/powerpc/perf/hv-gpci.c       |  9 --------
>  arch/powerpc/perf/hv-gpci.h       | 27 -----------------------
>  3 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index e90c073e437e..c338480b4551 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -527,6 +527,42 @@ struct hv_guest_state {
>  /* Latest version of hv_guest_state structure */
>  #define HV_GUEST_STATE_VERSION	1
>  
> +/*
> + * From the document "H_GetPerformanceCounterInfo Interface" v1.07
> + *
> + * H_GET_PERF_COUNTER_INFO argument
> + */
> +struct hv_get_perf_counter_info_params {
> +	__be32 counter_request; /* I */
> +	__be32 starting_index;  /* IO */
> +	__be16 secondary_index; /* IO */
> +	__be16 returned_values; /* O */
> +	__be32 detail_rc; /* O, only needed when called via *_norets() */
> +
> +	/*
> +	 * O, size each of counter_value element in bytes, only set for version
> +	 * >= 0x3
> +	 */
> +	__be16 cv_element_size;
> +
> +	/* I, 0 (zero) for versions < 0x3 */
> +	__u8 counter_info_version_in;
> +
> +	/* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
> +	__u8 counter_info_version_out;
> +	__u8 reserved[0xC];
> +	__u8 counter_value[];
> +} __packed;
> +
> +#define HGPCI_REQ_BUFFER_SIZE	4096
> +#define HGPCI_MAX_DATA_BYTES \
> +	(HGPCI_REQ_BUFFER_SIZE - sizeof(struct hv_get_perf_counter_info_params))
> +
> +struct hv_gpci_request_buffer {
> +	struct hv_get_perf_counter_info_params params;
> +	uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> +} __packed;
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_HVCALL_H */
> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> index 6884d16ec19b..1667315b82e9 100644
> --- a/arch/powerpc/perf/hv-gpci.c
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -123,17 +123,8 @@ static const struct attribute_group *attr_groups[] = {
>  	NULL,
>  };
>  
> -#define HGPCI_REQ_BUFFER_SIZE	4096
> -#define HGPCI_MAX_DATA_BYTES \
> -	(HGPCI_REQ_BUFFER_SIZE - sizeof(struct hv_get_perf_counter_info_params))
> -
>  static DEFINE_PER_CPU(char, hv_gpci_reqb[HGPCI_REQ_BUFFER_SIZE]) __aligned(sizeof(uint64_t));
>  
> -struct hv_gpci_request_buffer {
> -	struct hv_get_perf_counter_info_params params;
> -	uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> -} __packed;
> -
>  static unsigned long single_gpci_request(u32 req, u32 starting_index,
>  		u16 secondary_index, u8 version_in, u32 offset, u8 length,
>  		u64 *value)
> diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
> index a3053eda5dcc..4d108262bed7 100644
> --- a/arch/powerpc/perf/hv-gpci.h
> +++ b/arch/powerpc/perf/hv-gpci.h
> @@ -2,33 +2,6 @@
>  #ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
>  #define LINUX_POWERPC_PERF_HV_GPCI_H_
>  
> -#include <linux/types.h>
> -
> -/* From the document "H_GetPerformanceCounterInfo Interface" v1.07 */
> -
> -/* H_GET_PERF_COUNTER_INFO argument */
> -struct hv_get_perf_counter_info_params {
> -	__be32 counter_request; /* I */
> -	__be32 starting_index;  /* IO */
> -	__be16 secondary_index; /* IO */
> -	__be16 returned_values; /* O */
> -	__be32 detail_rc; /* O, only needed when called via *_norets() */
> -
> -	/*
> -	 * O, size each of counter_value element in bytes, only set for version
> -	 * >= 0x3
> -	 */
> -	__be16 cv_element_size;
> -
> -	/* I, 0 (zero) for versions < 0x3 */
> -	__u8 counter_info_version_in;
> -
> -	/* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
> -	__u8 counter_info_version_out;
> -	__u8 reserved[0xC];
> -	__u8 counter_value[];
> -} __packed;
> -

Hmm, this pretty much guts this header which normally I'd be inclined to suggest
moving the whole thing. The remainder of that header for context:

>  /*
>   * counter info version => fw version/reference (spec version)
>   *
>
 * 8 => power8 (1.07)
 * [7 is skipped by spec 1.07]
 * 6 => TLBIE (1.07)
 * 5 => v7r7m0.phyp (1.05)
 * [4 skipped]
 * 3 => v7r6m0.phyp (?)
 * [1,2 skipped]
 * 0 => v7r{2,3,4}m0.phyp (?)
 */
#define COUNTER_INFO_VERSION_CURRENT 0x8

/* capability mask masks. */
enum {
        HV_GPCI_CM_GA = (1 << 7),
        HV_GPCI_CM_EXPANDED = (1 << 6),
        HV_GPCI_CM_LAB = (1 << 5)
};

#define REQUEST_FILE "../hv-gpci-requests.h"
#define NAME_LOWER hv_gpci
#define NAME_UPPER HV_GPCI
#include "req-gen/perf.h"
#undef REQUEST_FILE
#undef NAME_LOWER
#undef NAME_UPPER

The side effect of moving seems that we would have to drag "hv-gpci-requests.h"
along as well. So, maybe its best just moving the struct as you've done so it
can be used by code outside of perf.

-Tyrel

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/pseries: new lparcfg key/value pair: partition_affinity_score
From: Tyrel Datwyler @ 2020-08-05 22:42 UTC (permalink / raw)
  To: Scott Cheloha, linuxppc-dev; +Cc: Nathan Lynch
In-Reply-To: <20200727184605.2945095-2-cheloha@linux.ibm.com>

On 7/27/20 11:46 AM, Scott Cheloha wrote:
> The H_GetPerformanceCounterInfo (GPCI) PHYP hypercall has a subcall,
> Affinity_Domain_Info_By_Partition, which returns, among other things,
> a "partition affinity score" for a given LPAR.  This score, a value on
> [0-100], represents the processor-memory affinity for the LPAR in
> question.  A score of 0 indicates the worst possible affinity while a
> score of 100 indicates perfect affinity.  The score can be used to
> reason about performance.
> 
> This patch adds the score for the local LPAR to the lparcfg procfile
> under a new 'partition_affinity_score' key.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>

I was hoping Michael would chime in the first time around on this patch series
about adding another key/value pair to lparcfg. So, barring a NACK from mpe:

Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>

> ---
>  arch/powerpc/platforms/pseries/lparcfg.c | 35 ++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
> index b8d28ab88178..e278390ab28d 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -136,6 +136,39 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
>  	return rc;
>  }
>  
> +static void show_gpci_data(struct seq_file *m)
> +{
> +	struct hv_gpci_request_buffer *buf;
> +	unsigned int affinity_score;
> +	long ret;
> +
> +	buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> +	if (buf == NULL)
> +		return;
> +
> +	/*
> +	 * Show the local LPAR's affinity score.
> +	 *
> +	 * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
> +	 * The score is at byte 0xB in the output buffer.
> +	 */
> +	memset(&buf->params, 0, sizeof(buf->params));
> +	buf->params.counter_request = cpu_to_be32(0xB1);
> +	buf->params.starting_index = cpu_to_be32(-1);	/* local LPAR */
> +	buf->params.counter_info_version_in = 0x5;	/* v5+ for score */
> +	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
> +				 sizeof(*buf));
> +	if (ret != H_SUCCESS) {
> +		pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
> +			 ret, be32_to_cpu(buf->params.detail_rc));
> +		goto out;
> +	}
> +	affinity_score = buf->bytes[0xB];
> +	seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
> +out:
> +	kfree(buf);
> +}
> +
>  static unsigned h_pic(unsigned long *pool_idle_time,
>  		      unsigned long *num_procs)
>  {
> @@ -487,6 +520,8 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
>  			   partition_active_processors * 100);
>  	}
>  
> +	show_gpci_data(m);
> +
>  	seq_printf(m, "partition_active_processors=%d\n",
>  		   partition_active_processors);
>  
> 


^ permalink raw reply

* Re: linux-next: manual merge of the char-misc tree with the powerpc tree
From: Stephen Rothwell @ 2020-08-05 23:40 UTC (permalink / raw)
  To: Greg KH, Arnd Bergmann, Michael Ellerman, PowerPC
  Cc: Linux Next Mailing List, Lee Jones, Linux Kernel Mailing List,
	Alastair D'Silva
In-Reply-To: <20200803165546.6ab5ab6f@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

Hi all,

On Mon, 3 Aug 2020 16:55:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the char-misc tree got a conflict in:
> 
>   drivers/misc/ocxl/config.c
> 
> between commit:
> 
>   3591538a31af ("ocxl: Address kernel doc errors & warnings")
> 
> from the powerpc tree and commit:
> 
>   28fc491e9be6 ("misc: ocxl: config: Provide correct formatting to function headers")
> 
> from the char-misc tree.
> 
> I fixed it up (as it was just differences in comments, I just arbitrarily
> chose the latter version) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

This is now a conflict between the powerpc tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay
From: Joel Stanley @ 2020-08-05 23:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Tyrel Datwyler, linuxppc-dev, Gautham R Shenoy
In-Reply-To: <87eeomzknj.fsf@mpe.ellerman.id.au>

On Tue, 4 Aug 2020 at 11:59, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Joel Stanley <joel@jms.id.au> writes:
> > It's not done anything for a long time. Save the percpu variable, and
> > emit a warning to remind users to not expect it to do anything.
> >
> > Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> > Cc: stable@vger.kernel.org # v3.14
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > --
> > v2:
> >  Use pr_warn instead of WARN
> >  Reword and print proccess name with pid in message
> >  Leave CPU_FTR_SMT test in
> >  Add Fixes line
> >
> > mpe, if you don't agree then feel free to drop the cc stable.
> >
> > Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> > as the online/offline loop that ppc64_cpu runs is slow.
>
> Hmm, that is pretty spammy.
>
> I know I suggested the ratelimit, but I was thinking it would print once
> for each ppc64_cpu invocation, not ~30 times.
>
> How about pr_warn_once(), that should still be sufficient for people to
> notice if they're looking for it.

I think that's a reasonable suggestion.

>
> ...
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> > index 571b3259697e..ba6d4cee19ef 100644
> > --- a/arch/powerpc/kernel/sysfs.c
> > +++ b/arch/powerpc/kernel/sysfs.c
> > @@ -32,29 +32,26 @@
> >
> >  static DEFINE_PER_CPU(struct cpu, cpu_devices);
> >
> > -/*
> > - * SMT snooze delay stuff, 64-bit only for now
> > - */
> > -
> >  #ifdef CONFIG_PPC64
> >
> > -/* Time in microseconds we delay before sleeping in the idle loop */
> > -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> > +/*
> > + * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
> > + * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
> > + * 2014:
> > + *
> > + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
> > + *  up the kernel code."
> > + *
> > + * At some point in the future this code should be removed.
> > + */
> >
> >  static ssize_t store_smt_snooze_delay(struct device *dev,
> >                                     struct device_attribute *attr,
> >                                     const char *buf,
> >                                     size_t count)
> >  {
> > -     struct cpu *cpu = container_of(dev, struct cpu, dev);
> > -     ssize_t ret;
> > -     long snooze;
> > -
> > -     ret = sscanf(buf, "%ld", &snooze);
> > -     if (ret != 1)
> > -             return -EINVAL;
> > -
> > -     per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> > +     pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> > +                         current->comm, current->pid);
>
> Can we make this:
>
>         "%s (%d) stored to unsupported smt_snooze_delay, which has no effect.\n",

ack

>
>
> >       return count;
> >  }
> >
> > @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
> >                                    struct device_attribute *attr,
> >                                    char *buf)
> >  {
> > -     struct cpu *cpu = container_of(dev, struct cpu, dev);
> > -
> > -     return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> > +     pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> > +                         current->comm, current->pid);
>
> It has as much effect as it ever did :)
>
> So maybe:
>
>         "%s (%d) read from unsupported smt_snooze_delay.\n",
>
>
> I can do those changes when applying if you like rather than making you
> do a v3.

Yes please! Your suggested changes lgtm.

Cheers,

Joel

^ permalink raw reply

* Re: [PATCH v2 1/5] powerpc/mm: Introduce temporary mm
From: Daniel Axtens @ 2020-08-06  1:27 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200709040316.12789-2-cmr@informatik.wtf>

Hi Chris,
  
>  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
>  bool ppc_breakpoint_available(void);
>  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
>  extern void do_send_trap(struct pt_regs *regs, unsigned long address,
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 1a474f6b1992..9269c7c7b04e 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -10,6 +10,7 @@
>  #include <asm/mmu.h>	
>  #include <asm/cputable.h>
>  #include <asm/cputhreads.h>
> +#include <asm/debug.h>
>  
>  /*
>   * Most if the context management is out of line
> @@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>  	return 0;
>  }
>  
> +struct temp_mm {
> +	struct mm_struct *temp;
> +	struct mm_struct *prev;
> +	bool is_kernel_thread;
> +	struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> +};

This is on the nitpicky end, but I wonder if this should be named
temp_mm, or should be labelled something else to capture its broader
purpose as a context for code patching? I'm thinking that a store of
breakpoints is perhaps unusual in a memory-managment structure?

I don't have a better suggestion off the top of my head and I'm happy
for you to leave it, I just wanted to flag it as a possible way we could
be clearer.

> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> +	temp_mm->temp = mm;
> +	temp_mm->prev = NULL;
> +	temp_mm->is_kernel_thread = false;
> +	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	temp_mm->is_kernel_thread = current->mm == NULL;
> +	if (temp_mm->is_kernel_thread)
> +		temp_mm->prev = current->active_mm;

You don't seem to restore active_mm below. I don't know what active_mm
does, so I don't know if this is a problem.

> +	else
> +		temp_mm->prev = current->mm;
> +
> +	/*
> +	 * Hash requires a non-NULL current->mm to allocate a userspace address
> +	 * when handling a page fault. Does not appear to hurt in Radix either.
> +	 */
> +	current->mm = temp_mm->temp;
> +	switch_mm_irqs_off(NULL, temp_mm->temp, current);
> +
> +	if (ppc_breakpoint_available()) {

I wondered if this could be changed during a text-patching operation.
AIUI, it potentially can on a P9 via "dawr_enable_dangerous" in debugfs.

I don't know if that's a problem. My concern is that you could turn off
breakpoints, call 'use_temporary_mm', then turn them back on again
before 'unuse_temporary_mm' and get a breakpoint while that can access
the temporary mm. Is there something else that makes that safe?
disabling IRQs maybe?

> +		struct arch_hw_breakpoint null_brk = {0};
> +		int i = 0;
> +
> +		for (; i < nr_wp_slots(); ++i) {

super nitpicky, and I'm not sure if this is actually documented, but I'd
usually see this written as:

for (i = 0; i < nr_wp_slots(); i++) {

Not sure if there's any reason that it _shouldn't_ be written the way
you've written it (and I do like initialising the variable when it's
defined!), I'm just not used to it. (Likewise with the unuse function.)

> +			__get_breakpoint(i, &temp_mm->brk[i]);
> +			if (temp_mm->brk[i].type != 0)
> +				__set_breakpoint(i, &null_brk);
> +		}
> +	}
> +}
> +

Kind regards,
Daniel

> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	if (temp_mm->is_kernel_thread)
> +		current->mm = NULL;
> +	else
> +		current->mm = temp_mm->prev;
> +	switch_mm_irqs_off(NULL, temp_mm->prev, current);
> +
> +	if (ppc_breakpoint_available()) {
> +		int i = 0;
> +
> +		for (; i < nr_wp_slots(); ++i)
> +			if (temp_mm->brk[i].type != 0)
> +				__set_breakpoint(i, &temp_mm->brk[i]);
> +	}
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 4650b9bb217f..b6c123bf5edd 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -824,6 +824,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
>  	return 0;
>  }
>  
> +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> +{
> +	memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
> +}
> +
>  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
>  {
>  	memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
> -- 
> 2.27.0

^ permalink raw reply

* Re: How would I code a Write to a proc file from within the kernel without reading anything from user space?
From: Michael Ellerman @ 2020-08-06  1:30 UTC (permalink / raw)
  To: thefirst ECS, linuxppc-dev
In-Reply-To: <32112832.26959.1596664366002.JavaMail.Dan@DanHP>

thefirst ECS <ecs_dn@yahoo.com> writes:
> In order to help debug a certain discrepancy, I need to "simulate" an "echo 1 > /proc/file" but doing it from kernel even when root file system is unavailable. 
>
> I have simulated it just fine via call_usermodehelper (with argv etc of "echo 1 > /proc/file") from inside the kernel which triggers:
> [ee897ec0] [c0122704] proc_reg_write+0x80/0xb4
> [ee897ef0] [c00d3b7c] vfs_write+0xb4/0x184
>
> just as I had wanted. But now I need to trigger the "vfs_write" and "proc_reg_write" but without using call_usermodehelper since I will be doing it when root "/" is unavailable and so I can no longer access /bin/echo and call the usermodehelper etc. So my question is how can I do that in kernel?
>
> Not sure if I'm supposed to look in fs read_write.c and fs.h and write a method based on those or if there's some other way etc.

You don't say which proc file, which would be useful information.

I'll assume it's /proc/sysrq-trigger based on your previous mail.

Rather than trying to invoke the write path from inside the kernel, the
simplest option is to just call __handle_sysrq() directly.

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox