* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Vinod Koul @ 2020-08-05 13:19 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Song Liu, Alexei Starovoitov, Shawn Guo, Fabio Estevam,
Daniel Borkmann, John Fastabend, Zhang Wei, NXP Linux Team,
Yonghong Song, Andrii Nakryiko, Sascha Hauer, KP Singh,
Dan Williams, linux-arm-kernel, netdev, linux-kernel, Li Yang,
Pengutronix Kernel Team, dmaengine, bpf, linuxppc-dev,
Martin KaFai Lau
In-Reply-To: <20200727203413.GA6245@embeddedor>
On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 2c508ee672b9..9b69716172a4 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>
> if (_state(thrd) == PL330_STATE_KILLING)
> UNTIL(thrd, PL330_STATE_STOPPED)
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_FAULTING:
> _stop(thrd);
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_KILLING:
> case PL330_STATE_COMPLETING:
> UNTIL(thrd, PL330_STATE_STOPPED)
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_STOPPED:
> return _trigger(thrd);
> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>
> switch (direction) {
> case DMA_MEM_TO_MEM:
> - /* fall through */
> case DMA_MEM_TO_DEV:
> off += _emit_LD(dry_run, &buf[off], cond);
> break;
> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>
> switch (direction) {
> case DMA_MEM_TO_MEM:
> - /* fall through */
> case DMA_DEV_TO_MEM:
> off += _emit_ST(dry_run, &buf[off], cond);
> break;
> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>
> switch (pxs->desc->rqtype) {
> case DMA_MEM_TO_DEV:
> - /* fall through */
> case DMA_DEV_TO_MEM:
> off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
> cond);
> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>
> switch (pxs->desc->rqtype) {
> case DMA_MEM_TO_DEV:
> - /* fall through */
replacement missed here and above few
--
~Vinod
^ permalink raw reply
* [PATCH v2] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 13:35 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>
All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.
Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.
This should prevent some soft lockups when starting large guests
Code has meaning only if CONFIG_MEMORY_HOTPLUG is set, otherwise the nid
field is not present in the drmem_lmb structure.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/mm/drmem.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..c11b6ec99ea3 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -402,6 +402,9 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
const __be32 *p;
u32 i, j, lmb_sets;
int lmb_index;
+#ifdef CONFIG_MEMORY_HOTPLUG
+ struct drmem_lmb *first = NULL;
+#endif
lmb_sets = of_read_number(prop++, 1);
if (lmb_sets == 0)
@@ -426,6 +429,15 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
for (i = 0; i < lmb_sets; i++) {
read_drconf_v2_cell(&dr_cell, &p);
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Fetch the NUMA node id for the fist set or if the
+ * associativity index is different from the previous set.
+ */
+ if (first && dr_cell.aa_index != first->aa_index)
+ first = NULL;
+#endif
+
for (j = 0; j < dr_cell.seq_lmbs; j++) {
lmb = &drmem_info->lmbs[lmb_index++];
@@ -438,7 +450,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
- lmb_set_nid(lmb);
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * All the LMB in the set share the same NUMA
+ * associativity property. So read that node only once.
+ */
+ if (!first) {
+ lmb_set_nid(lmb);
+ first = lmb;
+ } else {
+ lmb->nid = first->nid;
+ }
+#endif
}
}
}
--
2.28.0
^ permalink raw reply related
* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 13:35 UTC (permalink / raw)
To: Michael Ellerman
Cc: Christophe Leroy, nathanl, arnd, linux-kernel,
Tulio Magno Quites Machado Filho, Paul Mackerras, luto,
linux-arch, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <87wo2dy5in.fsf@mpe.ellerman.id.au>
Hi!
On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack
> > frame whenever it has anything to same.
>
> Yeah OK that would explain it.
>
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> > fbb60: 94 21 ff e0 stwu r1,-32(r1)
This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).
> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> > .cfi_startproc
> > PPC_STLU r1, -STACK_FRAME_OVERHEAD(r1)
> > mflr r0
> > + PPC_STLU r1, -STACK_FRAME_OVERHEAD(r1)
> > .cfi_register lr, r0
>
> The cfi_register should come directly after the mflr I think.
That is the idiomatic way to write it, and most obviously correct. But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding. Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)
> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if
> > you think that's needed, just let's do it:
>
> Hmm, true.
>
> The compiler will use the toc for globals (and possibly also for large
> constants?)
And anything else it bloody well wants to, yeah :-)
> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.
Yes.
> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.
If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)
> So yeah we should probably do as below.
[ snip ]
Looks good yes.
Segher
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 14:03 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
>
> 18: 35 25 ff e0 addic. r9,r5,-32
> 1c: 41 80 00 10 blt 2c <shift+0x14>
> 20: 7c 64 4c 30 srw r4,r3,r9
> 24: 38 60 00 00 li r3,0
> ...
> 2c: 54 69 08 3c rlwinm r9,r3,1,0,30
> 30: 21 45 00 1f subfic r10,r5,31
> 34: 7c 84 2c 30 srw r4,r4,r5
> 38: 7d 29 50 30 slw r9,r9,r10
> 3c: 7c 63 2c 30 srw r3,r3,r5
> 40: 7d 24 23 78 or r4,r9,r4
>
> In our case the shift is always <= 32. In addition, the upper 32 bits
> of the result are likely nul. Lets GCC know it, it also optimises the
> following calculations.
>
> With the patch, we get:
> 0: 21 25 00 20 subfic r9,r5,32
> 4: 7c 69 48 30 slw r9,r3,r9
> 8: 7c 84 2c 30 srw r4,r4,r5
> c: 7d 24 23 78 or r4,r9,r4
> 10: 7c 63 2c 30 srw r3,r3,r5
See below. Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will do that (it is on various TODO lists, it is
bound to happen *some* day ;-), but it won't help you yet of course).
> +/*
> + * The macros sets two stack frames, one for the caller and one for the callee
> + * because there are no requirement for the caller to set a stack frame when
> + * calling VDSO so it may have omitted to set one, especially on PPC64
> + */
If the caller follows the ABI, there always is a stack frame. So what
is going on?
> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
> + * which would result in a pointless operation. The compiler cannot
> + * optimize it away as the mask comes from the vdso data and is not compile
> + * time constant.
> + */
It cannot optimise it because it does not know shift < 32. The code
below is incorrect for shift equal to 32, fwiw.
> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> +{
> + return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> + u32 hi = ns >> 32;
> + u32 lo = ns;
> +
> + lo >>= shift;
> + lo |= hi << (32 - shift);
> + hi >>= shift;
> + if (likely(hi == 0))
> + return lo;
Removing these two lines shouldn't change generated object code? Or not
make it worse, at least.
> + return ((u64)hi << 32) | lo;
> +}
What does the compiler do for just
static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
return ns >> (shift & 31);
}
?
Segher
^ permalink raw reply
* [PATCH] powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
On BOOK3S_32, when we have modules and strict kernel RWX, modules
are not in vmalloc space but in a dedicated segment that is
below PAGE_OFFSET.
So KASAN_SHADOW_START must take it into account.
MODULES_VADDR can't be used because it is not defined yet
in kasan.h
Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/kasan.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index d635b96c7ea6..7355ed05e65e 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -15,11 +15,18 @@
#ifndef __ASSEMBLY__
#include <asm/page.h>
+#include <linux/sizes.h>
#define KASAN_SHADOW_SCALE_SHIFT 3
+#if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_MODULES) && defined(CONFIG_STRICT_KERNEL_RWX)
+#define KASAN_KERN_START ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
+#else
+#define KASAN_KERN_START PAGE_OFFSET
+#endif
+
#define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
- (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
+ (KASAN_KERN_START >> KASAN_SHADOW_SCALE_SHIFT))
#define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
--
2.25.0
^ permalink raw reply related
* [PATCH] powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
When MODULES_VADDR is defined, is_module_segment() shall check the
address against it instead of checking agains VMALLOC_START.
Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/book3s32/mmu.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index c0162911f6cb..82ae9e06a773 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -191,10 +191,17 @@ static bool is_module_segment(unsigned long addr)
{
if (!IS_ENABLED(CONFIG_MODULES))
return false;
+#ifdef MODULES_VADDR
+ if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
+ return false;
+ if (addr >= ALIGN(MODULES_END, SZ_256M))
+ return false;
+#else
if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
return false;
if (addr >= ALIGN(VMALLOC_END, SZ_256M))
return false;
+#endif
return true;
}
--
2.25.0
^ permalink raw reply related
* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-05 15:52 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200722022422.825197-1-mpe@ellerman.id.au>
Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
> The assembler says:
> arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
I get exactly the same with head_32.S, for the exact same reason.
Christophe
>
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
>
> Fix it to use a literal 0, the generated code is identical.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> arch/powerpc/kernel/head_40x.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 926bfa73586a..5b282d9965a5 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -620,7 +620,7 @@ _ENTRY(saved_ksp_limit)
> ori r6, r6, swapper_pg_dir@l
> lis r5, abatron_pteptrs@h
> ori r5, r5, abatron_pteptrs@l
> - stw r5, 0xf0(r0) /* Must match your Abatron config file */
> + stw r5, 0xf0(0) /* Must match your Abatron config file */
> tophys(r5,r5)
> stw r6, 0(r5)
>
>
^ permalink raw reply
* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Tyrel Datwyler @ 2020-08-05 16:14 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20200805131922.GZ12965@vkoul-mobl>
On 8/5/20 6:19 AM, Vinod Koul wrote:
> On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
>
>> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
>> index 2c508ee672b9..9b69716172a4 100644
>> --- a/drivers/dma/pl330.c
>> +++ b/drivers/dma/pl330.c
>> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>>
>> if (_state(thrd) == PL330_STATE_KILLING)
>> UNTIL(thrd, PL330_STATE_STOPPED)
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_FAULTING:
>> _stop(thrd);
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_KILLING:
>> case PL330_STATE_COMPLETING:
>> UNTIL(thrd, PL330_STATE_STOPPED)
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_STOPPED:
>> return _trigger(thrd);
>> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>>
>> switch (direction) {
>> case DMA_MEM_TO_MEM:
>> - /* fall through */
>> case DMA_MEM_TO_DEV:
>> off += _emit_LD(dry_run, &buf[off], cond);
>> break;
>> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>>
>> switch (direction) {
>> case DMA_MEM_TO_MEM:
>> - /* fall through */
>> case DMA_DEV_TO_MEM:
>> off += _emit_ST(dry_run, &buf[off], cond);
>> break;
>> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>>
>> switch (pxs->desc->rqtype) {
>> case DMA_MEM_TO_DEV:
>> - /* fall through */
>> case DMA_DEV_TO_MEM:
>> off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>> cond);
>> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>>
>> switch (pxs->desc->rqtype) {
>> case DMA_MEM_TO_DEV:
>> - /* fall through */
>
> replacement missed here and above few
>
Its not obvious via most of the documentation, but a case label followed
immediately by another case label is the only allowed implicit fall through as
it is obvious to the eye that there is no postceding statement.
For example the following is legal:
case FOO:
/* fallthrough */ (or fallthrough;)
case BAR:
is converted to:
case FOO:
case BAR:
I would assume the justification is that it is common to have a switch statement
where several case statements fall directly through to a single code block and
annotating each case label seems like overkill.
switch (vhost->state) {
case IBMVFC_LINK_DEAD:
case IBMVFC_HOST_OFFLINE:
result = DID_NO_CONNECT << 16;
break;
case IBMVFC_NO_CRQ:
case IBMVFC_INITIALIZING:
case IBMVFC_HALTED:
case IBMVFC_LINK_DOWN:
result = DID_REQUEUE << 16;
break;
case IBMVFC_ACTIVE:
result = 0;
break;
}
Regards,
Tyrel
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:40 UTC (permalink / raw)
To: Segher Boessenkool
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>
Hi,
On 08/05/2020 02:03 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * powerpc specific delta calculation.
>> + *
>> + * This variant removes the masking of the subtraction because the
>> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
>> + * which would result in a pointless operation. The compiler cannot
>> + * optimize it away as the mask comes from the vdso data and is not compile
>> + * time constant.
>> + */
>
> It cannot optimise it because it does not know shift < 32. The code
> below is incorrect for shift equal to 32, fwiw.
Is there a way to tell it ?
>
>> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
>> +{
>> + return (cycles - last) * mult;
>> +}
>> +#define vdso_calc_delta vdso_calc_delta
>> +
>> +#ifndef __powerpc64__
>> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
>> +{
>> + u32 hi = ns >> 32;
>> + u32 lo = ns;
>> +
>> + lo >>= shift;
>> + lo |= hi << (32 - shift);
>> + hi >>= shift;
>
>
>> + if (likely(hi == 0))
>> + return lo;
>
> Removing these two lines shouldn't change generated object code? Or not
> make it worse, at least.
I remember it made noticeable difference allthough I can't remember the
details. See below with GCC 10.1. At least we see that with those two
lines, GCC only sets a 16 bytes stack frame. Without those lines it sets
a 32 bytes stack frame and seems to save some values for no reason.
With the two lines:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 04 bgt 7b4 <__c_kernel_clock_gettime+0x108>
6b4: 39 40 00 01 li r10,1
6b8: 7d 4a 18 30 slw r10,r10,r3
6bc: 71 47 08 83 andi. r7,r10,2179
6c0: 41 82 01 2c beq 7ec <__c_kernel_clock_gettime+0x140>
6c4: 94 21 ff f0 stwu r1,-16(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 e1 00 0c stw r31,12(r1)
6d0: 7d 85 1a 14 add r12,r5,r3
6d4: 80 05 00 00 lwz r0,0(r5)
6d8: 70 06 00 01 andi. r6,r0,1
6dc: 40 82 00 d4 bne 7b0 <__c_kernel_clock_gettime+0x104>
6e0: 7d 4d 42 e6 mftbu r10
6e4: 7d 6c 42 e6 mftb r11
6e8: 7c ed 42 e6 mftbu r7
6ec: 7c 0a 38 40 cmplw r10,r7
6f0: 40 82 ff f0 bne 6e0 <__c_kernel_clock_gettime+0x34>
6f4: 80 e5 00 0c lwz r7,12(r5)
6f8: 80 65 00 08 lwz r3,8(r5)
6fc: 7c e7 58 10 subfc r7,r7,r11
700: 81 65 00 18 lwz r11,24(r5)
704: 7d 43 51 10 subfe r10,r3,r10
708: 7f e7 58 16 mulhwu r31,r7,r11
70c: 7d 4a 59 d6 mullw r10,r10,r11
710: 7c e7 59 d6 mullw r7,r7,r11
714: 80 6c 00 2c lwz r3,44(r12)
718: 81 6c 00 28 lwz r11,40(r12)
71c: 7c e7 18 14 addc r7,r7,r3
720: 7d 4a fa 14 add r10,r10,r31
724: 80 65 00 1c lwz r3,28(r5)
728: 7d 4a 59 14 adde r10,r10,r11
72c: 7c e7 1c 30 srw r7,r7,r3
730: 21 63 00 20 subfic r11,r3,32
734: 7d 43 1c 31 srw. r3,r10,r3
738: 7d 4a 58 30 slw r10,r10,r11
73c: 7d 49 3b 78 or r9,r10,r7
740: 39 00 00 00 li r8,0
744: 40 82 00 84 bne 7c8 <__c_kernel_clock_gettime+0x11c>
748: 80 6c 00 24 lwz r3,36(r12)
74c: 81 45 00 00 lwz r10,0(r5)
750: 7c 00 50 40 cmplw r0,r10
754: 40 a2 ff 80 bne 6d4 <__c_kernel_clock_gettime+0x28>
758: 2c 08 00 00 cmpwi r8,0
75c: 41 82 00 7c beq 7d8 <__c_kernel_clock_gettime+0x12c>
760: 3c e0 c4 65 lis r7,-15259
764: 3c 00 3b 9a lis r0,15258
768: 60 e7 36 00 ori r7,r7,13824
76c: 60 00 c9 ff ori r0,r0,51711
770: 7c a9 38 14 addc r5,r9,r7
774: 7d 48 01 d4 addme r10,r8
778: 2c 0a 00 00 cmpwi r10,0
77c: 7d 48 53 78 mr r8,r10
780: 7c a9 2b 78 mr r9,r5
784: 38 c6 00 01 addi r6,r6,1
788: 40 82 ff e8 bne 770 <__c_kernel_clock_gettime+0xc4>
78c: 7c 05 00 40 cmplw r5,r0
790: 41 81 ff e0 bgt 770 <__c_kernel_clock_gettime+0xc4>
794: 7c 66 18 14 addc r3,r6,r3
798: 90 64 00 00 stw r3,0(r4)
79c: 91 24 00 04 stw r9,4(r4)
7a0: 38 60 00 00 li r3,0
7a4: 83 e1 00 0c lwz r31,12(r1)
7a8: 38 21 00 10 addi r1,r1,16
7ac: 4e 80 00 20 blr
7b0: 4b ff ff 24 b 6d4 <__c_kernel_clock_gettime+0x28>
7b4: 38 00 00 f6 li r0,246
7b8: 44 00 00 02 sc
7bc: 40 a3 00 08 bns 7c4 <__c_kernel_clock_gettime+0x118>
7c0: 7c 63 00 d0 neg r3,r3
7c4: 4e 80 00 20 blr
7c8: 7d 2a 4b 78 mr r10,r9
7cc: 7c 68 1b 78 mr r8,r3
7d0: 7d 49 53 78 mr r9,r10
7d4: 4b ff ff 74 b 748 <__c_kernel_clock_gettime+0x9c>
7d8: 3d 40 3b 9a lis r10,15258
7dc: 61 4a c9 ff ori r10,r10,51711
7e0: 7c 09 50 40 cmplw r9,r10
7e4: 41 81 ff 7c bgt 760 <__c_kernel_clock_gettime+0xb4>
7e8: 4b ff ff b0 b 798 <__c_kernel_clock_gettime+0xec>
7ec: 71 47 00 60 andi. r7,r10,96
7f0: 54 69 20 36 rlwinm r9,r3,4,0,27
7f4: 7d 25 4a 14 add r9,r5,r9
7f8: 40 82 00 14 bne 80c <__c_kernel_clock_gettime+0x160>
7fc: 71 4a 00 10 andi. r10,r10,16
800: 41 a2 ff b4 beq 7b4 <__c_kernel_clock_gettime+0x108>
804: 38 a5 00 f0 addi r5,r5,240
808: 4b ff fe bc b 6c4 <__c_kernel_clock_gettime+0x18>
80c: 81 05 00 00 lwz r8,0(r5)
810: 71 0a 00 01 andi. r10,r8,1
814: 40 a2 ff f8 bne 80c <__c_kernel_clock_gettime+0x160>
818: 80 69 00 24 lwz r3,36(r9)
81c: 81 49 00 2c lwz r10,44(r9)
820: 80 e5 00 00 lwz r7,0(r5)
824: 7c 08 38 40 cmplw r8,r7
828: 40 a2 ff e4 bne 80c <__c_kernel_clock_gettime+0x160>
82c: 90 64 00 00 stw r3,0(r4)
830: 91 44 00 04 stw r10,4(r4)
834: 38 60 00 00 li r3,0
838: 4e 80 00 20 blr
Without the two lines:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 14 bgt 7c4 <__c_kernel_clock_gettime+0x118>
6b4: 39 20 00 01 li r9,1
6b8: 7d 29 18 30 slw r9,r9,r3
6bc: 71 2a 08 83 andi. r10,r9,2179
6c0: 41 82 01 2c beq 7ec <__c_kernel_clock_gettime+0x140>
6c4: 94 21 ff e0 stwu r1,-32(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 81 00 10 stw r28,16(r1)
6d0: 93 a1 00 14 stw r29,20(r1)
6d4: 93 c1 00 18 stw r30,24(r1)
6d8: 93 e1 00 1c stw r31,28(r1)
6dc: 7c 65 1a 14 add r3,r5,r3
6e0: 81 85 00 00 lwz r12,0(r5)
6e4: 71 87 00 01 andi. r7,r12,1
6e8: 40 82 00 d8 bne 7c0 <__c_kernel_clock_gettime+0x114>
6ec: 7d 2d 42 e6 mftbu r9
6f0: 7c cc 42 e6 mftb r6
6f4: 7d 4d 42 e6 mftbu r10
6f8: 7c 09 50 40 cmplw r9,r10
6fc: 40 82 ff f0 bne 6ec <__c_kernel_clock_gettime+0x40>
700: 83 83 00 28 lwz r28,40(r3)
704: 83 a3 00 2c lwz r29,44(r3)
708: 81 65 00 08 lwz r11,8(r5)
70c: 81 05 00 0c lwz r8,12(r5)
710: 83 c5 00 18 lwz r30,24(r5)
714: 83 e5 00 1c lwz r31,28(r5)
718: 80 03 00 24 lwz r0,36(r3)
71c: 81 45 00 00 lwz r10,0(r5)
720: 7c 0c 50 40 cmplw r12,r10
724: 40 a2 ff bc bne 6e0 <__c_kernel_clock_gettime+0x34>
728: 7d 48 30 10 subfc r10,r8,r6
72c: 7c cb 49 10 subfe r6,r11,r9
730: 7c c6 f1 d6 mullw r6,r6,r30
734: 7d 2a f0 16 mulhwu r9,r10,r30
738: 7d 4a f1 d6 mullw r10,r10,r30
73c: 7c c6 4a 14 add r6,r6,r9
740: 7d 4a e8 14 addc r10,r10,r29
744: 7c c6 e1 14 adde r6,r6,r28
748: 7c c8 fc 30 srw r8,r6,r31
74c: 2c 08 00 00 cmpwi r8,0
750: 20 bf 00 20 subfic r5,r31,32
754: 7d 4a fc 30 srw r10,r10,r31
758: 7c c5 28 30 slw r5,r6,r5
75c: 7c a9 53 78 or r9,r5,r10
760: 41 82 00 78 beq 7d8 <__c_kernel_clock_gettime+0x12c>
764: 3c c0 c4 65 lis r6,-15259
768: 3c 60 3b 9a lis r3,15258
76c: 60 c6 36 00 ori r6,r6,13824
770: 60 63 c9 ff ori r3,r3,51711
774: 7c a9 30 14 addc r5,r9,r6
778: 7d 48 01 d4 addme r10,r8
77c: 2c 0a 00 00 cmpwi r10,0
780: 7d 48 53 78 mr r8,r10
784: 7c a9 2b 78 mr r9,r5
788: 38 e7 00 01 addi r7,r7,1
78c: 40 82 ff e8 bne 774 <__c_kernel_clock_gettime+0xc8>
790: 7c 05 18 40 cmplw r5,r3
794: 41 81 ff e0 bgt 774 <__c_kernel_clock_gettime+0xc8>
798: 7c 07 00 14 addc r0,r7,r0
79c: 90 04 00 00 stw r0,0(r4)
7a0: 91 24 00 04 stw r9,4(r4)
7a4: 38 60 00 00 li r3,0
7a8: 83 81 00 10 lwz r28,16(r1)
7ac: 83 a1 00 14 lwz r29,20(r1)
7b0: 83 c1 00 18 lwz r30,24(r1)
7b4: 83 e1 00 1c lwz r31,28(r1)
7b8: 38 21 00 20 addi r1,r1,32
7bc: 4e 80 00 20 blr
7c0: 4b ff ff 20 b 6e0 <__c_kernel_clock_gettime+0x34>
7c4: 38 00 00 f6 li r0,246
7c8: 44 00 00 02 sc
7cc: 40 a3 00 08 bns 7d4 <__c_kernel_clock_gettime+0x128>
7d0: 7c 63 00 d0 neg r3,r3
7d4: 4e 80 00 20 blr
7d8: 3d 40 3b 9a lis r10,15258
7dc: 61 4a c9 ff ori r10,r10,51711
7e0: 7c 09 50 40 cmplw r9,r10
7e4: 41 81 ff 80 bgt 764 <__c_kernel_clock_gettime+0xb8>
7e8: 4b ff ff b4 b 79c <__c_kernel_clock_gettime+0xf0>
7ec: 71 2a 00 60 andi. r10,r9,96
7f0: 40 82 00 14 bne 804 <__c_kernel_clock_gettime+0x158>
7f4: 71 29 00 10 andi. r9,r9,16
7f8: 41 a2 ff cc beq 7c4 <__c_kernel_clock_gettime+0x118>
7fc: 38 a5 00 f0 addi r5,r5,240
800: 4b ff fe c4 b 6c4 <__c_kernel_clock_gettime+0x18>
804: 54 69 20 36 rlwinm r9,r3,4,0,27
808: 7d 25 4a 14 add r9,r5,r9
80c: 81 05 00 00 lwz r8,0(r5)
810: 71 0a 00 01 andi. r10,r8,1
814: 40 82 00 28 bne 83c <__c_kernel_clock_gettime+0x190>
818: 80 09 00 24 lwz r0,36(r9)
81c: 81 49 00 2c lwz r10,44(r9)
820: 80 e5 00 00 lwz r7,0(r5)
824: 7c 08 38 40 cmplw r8,r7
828: 40 a2 ff e4 bne 80c <__c_kernel_clock_gettime+0x160>
82c: 90 04 00 00 stw r0,0(r4)
830: 91 44 00 04 stw r10,4(r4)
834: 38 60 00 00 li r3,0
838: 4e 80 00 20 blr
83c: 4b ff ff d0 b 80c <__c_kernel_clock_gettime+0x160>
>
>> + return ((u64)hi << 32) | lo;
>> +}
>
>
> What does the compiler do for just
>
> static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> return ns >> (shift & 31);
> }
>
Worse:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 30 bgt 7e0 <__c_kernel_clock_gettime+0x134>
6b4: 39 20 00 01 li r9,1
6b8: 7d 29 18 30 slw r9,r9,r3
6bc: 71 2a 08 83 andi. r10,r9,2179
6c0: 41 82 01 48 beq 808 <__c_kernel_clock_gettime+0x15c>
6c4: 94 21 ff e0 stwu r1,-32(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 81 00 10 stw r28,16(r1)
6d0: 93 a1 00 14 stw r29,20(r1)
6d4: 93 c1 00 18 stw r30,24(r1)
6d8: 93 e1 00 1c stw r31,28(r1)
6dc: 7c 65 1a 14 add r3,r5,r3
6e0: 80 c5 00 00 lwz r6,0(r5)
6e4: 70 c7 00 01 andi. r7,r6,1
6e8: 40 82 00 f4 bne 7dc <__c_kernel_clock_gettime+0x130>
6ec: 7d 2d 42 e6 mftbu r9
6f0: 7d 0c 42 e6 mftb r8
6f4: 7d 4d 42 e6 mftbu r10
6f8: 7c 09 50 40 cmplw r9,r10
6fc: 40 82 ff f0 bne 6ec <__c_kernel_clock_gettime+0x40>
700: 83 83 00 28 lwz r28,40(r3)
704: 83 c3 00 2c lwz r30,44(r3)
708: 81 65 00 08 lwz r11,8(r5)
70c: 81 45 00 0c lwz r10,12(r5)
710: 83 e5 00 18 lwz r31,24(r5)
714: 81 85 00 1c lwz r12,28(r5)
718: 80 03 00 24 lwz r0,36(r3)
71c: 83 a5 00 00 lwz r29,0(r5)
720: 7c 06 e8 40 cmplw r6,r29
724: 40 a2 ff bc bne 6e0 <__c_kernel_clock_gettime+0x34>
728: 7d 0a 40 10 subfc r8,r10,r8
72c: 7c cb 49 10 subfe r6,r11,r9
730: 7c c6 f9 d6 mullw r6,r6,r31
734: 7d 28 f8 16 mulhwu r9,r8,r31
738: 7d 08 f9 d6 mullw r8,r8,r31
73c: 55 8c 06 fe clrlwi r12,r12,27
740: 7f c8 f0 14 addc r30,r8,r30
744: 7c c6 4a 14 add r6,r6,r9
748: 7c c6 e1 14 adde r6,r6,r28
74c: 34 6c ff e0 addic. r3,r12,-32
750: 41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>
754: 7c c9 1c 30 srw r9,r6,r3
758: 39 00 00 00 li r8,0
75c: 2c 08 00 00 cmpwi r8,0
760: 41 82 00 94 beq 7f4 <__c_kernel_clock_gettime+0x148>
764: 3c c0 c4 65 lis r6,-15259
768: 3c 60 3b 9a lis r3,15258
76c: 60 c6 36 00 ori r6,r6,13824
770: 60 63 c9 ff ori r3,r3,51711
774: 7c a9 30 14 addc r5,r9,r6
778: 7d 48 01 d4 addme r10,r8
77c: 2c 0a 00 00 cmpwi r10,0
780: 7d 48 53 78 mr r8,r10
784: 7c a9 2b 78 mr r9,r5
788: 38 e7 00 01 addi r7,r7,1
78c: 40 82 ff e8 bne 774 <__c_kernel_clock_gettime+0xc8>
790: 7c 05 18 40 cmplw r5,r3
794: 41 81 ff e0 bgt 774 <__c_kernel_clock_gettime+0xc8>
798: 7c 07 00 14 addc r0,r7,r0
79c: 90 04 00 00 stw r0,0(r4)
7a0: 91 24 00 04 stw r9,4(r4)
7a4: 38 60 00 00 li r3,0
7a8: 83 81 00 10 lwz r28,16(r1)
7ac: 83 a1 00 14 lwz r29,20(r1)
7b0: 83 c1 00 18 lwz r30,24(r1)
7b4: 83 e1 00 1c lwz r31,28(r1)
7b8: 38 21 00 20 addi r1,r1,32
7bc: 4e 80 00 20 blr
7c0: 54 c3 08 3c rlwinm r3,r6,1,0,30
7c4: 21 6c 00 1f subfic r11,r12,31
7c8: 7c 63 58 30 slw r3,r3,r11
7cc: 7f c9 64 30 srw r9,r30,r12
7d0: 7c 69 4b 78 or r9,r3,r9
7d4: 7c c8 64 30 srw r8,r6,r12
7d8: 4b ff ff 84 b 75c <__c_kernel_clock_gettime+0xb0>
7dc: 4b ff ff 04 b 6e0 <__c_kernel_clock_gettime+0x34>
7e0: 38 00 00 f6 li r0,246
7e4: 44 00 00 02 sc
7e8: 40 a3 00 08 bns 7f0 <__c_kernel_clock_gettime+0x144>
7ec: 7c 63 00 d0 neg r3,r3
7f0: 4e 80 00 20 blr
7f4: 3d 40 3b 9a lis r10,15258
7f8: 61 4a c9 ff ori r10,r10,51711
7fc: 7c 09 50 40 cmplw r9,r10
800: 41 81 ff 64 bgt 764 <__c_kernel_clock_gettime+0xb8>
804: 4b ff ff 98 b 79c <__c_kernel_clock_gettime+0xf0>
808: 71 2a 00 60 andi. r10,r9,96
80c: 40 82 00 14 bne 820 <__c_kernel_clock_gettime+0x174>
810: 71 29 00 10 andi. r9,r9,16
814: 41 a2 ff cc beq 7e0 <__c_kernel_clock_gettime+0x134>
818: 38 a5 00 f0 addi r5,r5,240
81c: 4b ff fe a8 b 6c4 <__c_kernel_clock_gettime+0x18>
820: 54 69 20 36 rlwinm r9,r3,4,0,27
824: 7d 25 4a 14 add r9,r5,r9
828: 81 05 00 00 lwz r8,0(r5)
82c: 71 0a 00 01 andi. r10,r8,1
830: 40 82 00 28 bne 858 <__c_kernel_clock_gettime+0x1ac>
834: 80 09 00 24 lwz r0,36(r9)
838: 81 49 00 2c lwz r10,44(r9)
83c: 80 e5 00 00 lwz r7,0(r5)
840: 7c 08 38 40 cmplw r8,r7
844: 40 a2 ff e4 bne 828 <__c_kernel_clock_gettime+0x17c>
848: 90 04 00 00 stw r0,0(r4)
84c: 91 44 00 04 stw r10,4(r4)
850: 38 60 00 00 li r3,0
854: 4e 80 00 20 blr
858: 4b ff ff d0 b 828 <__c_kernel_clock_gettime+0x17c>
Christophe
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:51 UTC (permalink / raw)
To: Segher Boessenkool
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>
Hi Again,
Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> Hi!
>
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * The macros sets two stack frames, one for the caller and one for the callee
>> + * because there are no requirement for the caller to set a stack frame when
>> + * calling VDSO so it may have omitted to set one, especially on PPC64
>> + */
>
> If the caller follows the ABI, there always is a stack frame. So what
> is going on?
Looks like it is not the case. See discussion at
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
Seems like GCC uses the redzone and doesn't set a stack frame. I guess
it doesn't know that the inline assembly contains a function call so it
doesn't set the frame.
Christophe
^ permalink raw reply
* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Miguel Ojeda @ 2020-08-05 17:10 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>
On Sun, Aug 2, 2020 at 6:41 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> .clang-format | 3 ++-
The .clang-format bit:
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Miguel Ojeda @ 2020-08-05 17:11 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>
On Sun, Aug 2, 2020 at 6:40 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> .clang-format | 2 +-
The .clang-format bit:
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 18:08 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
Travis reported successful compilation with mpe/merge:
https://travis-ci.org/github/LeoBras/linux-ppc/builds/715028857
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 18:40 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <3db2a590-b842-83db-ed2b-f3ee62595f18@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 04:40:16PM +0000, Christophe Leroy wrote:
> >It cannot optimise it because it does not know shift < 32. The code
> >below is incorrect for shift equal to 32, fwiw.
>
> Is there a way to tell it ?
Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)
> >What does the compiler do for just
> >
> >static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> > return ns >> (shift & 31);
> >}
> >
>
> Worse:
I cannot make heads or tails of all that branch spaghetti, sorry.
> 73c: 55 8c 06 fe clrlwi r12,r12,27
> 740: 7f c8 f0 14 addc r30,r8,r30
> 744: 7c c6 4a 14 add r6,r6,r9
> 748: 7c c6 e1 14 adde r6,r6,r28
> 74c: 34 6c ff e0 addic. r3,r12,-32
> 750: 41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>
This branch is always true. Hrm.
Segher
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 20:55 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <7d409421-6396-8eba-8250-b6c9ff8232d9@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 06:51:44PM +0200, Christophe Leroy wrote:
> Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> >On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> >>+/*
> >>+ * The macros sets two stack frames, one for the caller and one for the
> >>callee
> >>+ * because there are no requirement for the caller to set a stack frame
> >>when
> >>+ * calling VDSO so it may have omitted to set one, especially on PPC64
> >>+ */
> >
> >If the caller follows the ABI, there always is a stack frame. So what
> >is going on?
>
> Looks like it is not the case. See discussion at
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
>
> Seems like GCC uses the redzone and doesn't set a stack frame. I guess
> it doesn't know that the inline assembly contains a function call so it
> doesn't set the frame.
Yes, that is the problem. See
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Extended-Asm.html#AssemblerTemplate
where this is (briefly) discussed:
"Accessing data from C programs without using input/output operands
(such as by using global symbols directly from the assembler
template) may not work as expected. Similarly, calling functions
directly from an assembler template requires a detailed understanding
of the target assembler and ABI."
I don't know of a good way to tell GCC some function needs a frame (that
is, one that doesn't result in extra code other than to set up the
frame). I'll think about it.
Segher
^ permalink raw reply
* How would I code a Write to a proc file from within the kernel without reading anything from user space?
From: thefirst ECS @ 2020-08-05 21:52 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <3103222.25048.1596318084416.JavaMail.Dan@DanHP>
In order to help debug a certain discrepancy, I need to "simulate" an "echo 1 > /proc/file" but doing it from kernel even when root file system is unavailable.
I have simulated it just fine via call_usermodehelper (with argv etc of "echo 1 > /proc/file") from inside the kernel which triggers:
[ee897ec0] [c0122704] proc_reg_write+0x80/0xb4
[ee897ef0] [c00d3b7c] vfs_write+0xb4/0x184
just as I had wanted. But now I need to trigger the "vfs_write" and "proc_reg_write" but without using call_usermodehelper since I will be doing it when root "/" is unavailable and so I can no longer access /bin/echo and call the usermodehelper etc. So my question is how can I do that in kernel?
Not sure if I'm supposed to look in fs read_write.c and fs.h and write a method based on those or if there's some other way etc.
Thanks,
-Dan
^ permalink raw reply
* Re: [PATCH] macintosh: windfarm: fix spelling mistake "detatch" -> "detach"
From: Wolfram Sang @ 2020-08-05 22:08 UTC (permalink / raw)
To: Colin King; +Cc: kernel-janitors, linuxppc-dev, linux-kernel
In-Reply-To: <20200805104337.16104-1-colin.king@canonical.com>
[-- Attachment #1: Type: text/plain, Size: 375 bytes --]
On Wed, Aug 05, 2020 at 11:43:37AM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> There are spelling mistakes in DBG messages. Fix them.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
These comments can go entirely. i2c_detach is long history. And for
remove, we have debugging output in the driver core meanwhile.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-05 22:29 UTC (permalink / raw)
To: Greg Kurz, Michael Ellerman
Cc: Nathan Lynch, linuxppc-dev, Cedric Le Goater,
Thiago Jung Bauermann
In-Reply-To: <159660225263.15440.2633856149684894440@sif>
Quoting Michael Roth (2020-08-04 23:37:32)
> Quoting Michael Ellerman (2020-08-04 22:07:08)
> > Greg Kurz <groug@kaod.org> writes:
> > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > Michael Ellerman <mpe@ellerman.id.au> wrote:
> > >> Spinning forever seems like a bad idea, but as has been demonstrated at
> > >> least twice now, continuing when we don't know the state of the other
> > >> CPU can lead to straight up crashes.
> > >>
> > >> So I think I'm persuaded that it's preferable to have the kernel stuck
> > >> spinning rather than oopsing.
> > >>
> > >
> > > +1
> > >
> > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > >> first instinct is no, if we're stuck here for 20s a stack trace would be
> > >> good. But then we will probably hit that on some big and/or heavily
> > >> loaded machine.
> > >>
> > >> So possibly we should call cond_resched() but have some custom logic in
> > >> the loop to print a warning if we are stuck for more than some
> > >> sufficiently long amount of time.
> > >
> > > How long should that be ?
> >
> > Yeah good question.
> >
> > I guess step one would be seeing how long it can take on the 384 vcpu
> > machine. And we can probably test on some other big machines.
> >
> > Hopefully Nathan can give us some idea of how long he's seen it take on
> > large systems? I know he was concerned about the 20s timeout of the
> > softlockup detector.
> >
> > Maybe a minute or two?
>
> Hmm, so I took a stab at this where I called cond_resched() after
> every 5 seconds of polling and printed a warning at the same time (FWIW
> that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> system using KVM running the 384vcpu unplug loop)
>
> But it sounds like that's not quite what you had in mind. How frequently
> do you think we should call cond_resched()? Maybe after 25 iterations
> of polling smp_query_cpu_stopped() to keep original behavior somewhat
> similar?
>
> I'll let the current patch run on the mihawk system overnight in the
> meantime so we at least have that data point, but would be good to
> know what things look like a large pHyp machine.
At one point I did manage to get the system in a state where unplug
operations were taking 1-2s, but still not enough to trigger any
5s warning, and I wasn't able to reproduce that in subsequent runs.
I also tried reworking the patch so that we print a warning and
cond_resched() after 200 ms to make sure that path gets executed, but
only managed to trigger the warning twice after a few hours.
So, if we print a warning after a couple minutes, that seems pretty
conservative as far as avoiding spurious warnings. And if we
cond_resched() after 25 loops of polling (~0.1 ms in the cases
that caused the original crash), that would avoid most of the
default RCU/lockup warnings.
But having a second timeout to trigger the cond_resched() after some
set interval like 2s seems more deterministic since we're less
susceptible to longer delays due to things like the RTAS calls
contending for QEMU's global mutex in the the KVM case.
>
> Thanks!
>
> >
> > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > >>
> > >> This is not public.
> > >
> > > I'll have a look at changing that.
> >
> > Thanks.
> >
> > cheers
^ permalink raw reply
* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-05 22:31 UTC (permalink / raw)
To: Greg Kurz, Michael Ellerman
Cc: Nathan Lynch, linuxppc-dev, Cedric Le Goater,
Thiago Jung Bauermann
In-Reply-To: <159666656828.15440.9097316124875217814@sif>
Quoting Michael Roth (2020-08-05 17:29:28)
> Quoting Michael Roth (2020-08-04 23:37:32)
> > Quoting Michael Ellerman (2020-08-04 22:07:08)
> > > Greg Kurz <groug@kaod.org> writes:
> > > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > > Michael Ellerman <mpe@ellerman.id.au> wrote:
> > > >> Spinning forever seems like a bad idea, but as has been demonstrated at
> > > >> least twice now, continuing when we don't know the state of the other
> > > >> CPU can lead to straight up crashes.
> > > >>
> > > >> So I think I'm persuaded that it's preferable to have the kernel stuck
> > > >> spinning rather than oopsing.
> > > >>
> > > >
> > > > +1
> > > >
> > > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > > >> first instinct is no, if we're stuck here for 20s a stack trace would be
> > > >> good. But then we will probably hit that on some big and/or heavily
> > > >> loaded machine.
> > > >>
> > > >> So possibly we should call cond_resched() but have some custom logic in
> > > >> the loop to print a warning if we are stuck for more than some
> > > >> sufficiently long amount of time.
> > > >
> > > > How long should that be ?
> > >
> > > Yeah good question.
> > >
> > > I guess step one would be seeing how long it can take on the 384 vcpu
> > > machine. And we can probably test on some other big machines.
> > >
> > > Hopefully Nathan can give us some idea of how long he's seen it take on
> > > large systems? I know he was concerned about the 20s timeout of the
> > > softlockup detector.
> > >
> > > Maybe a minute or two?
> >
> > Hmm, so I took a stab at this where I called cond_resched() after
> > every 5 seconds of polling and printed a warning at the same time (FWIW
> > that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> > system using KVM running the 384vcpu unplug loop)
> >
> > But it sounds like that's not quite what you had in mind. How frequently
> > do you think we should call cond_resched()? Maybe after 25 iterations
> > of polling smp_query_cpu_stopped() to keep original behavior somewhat
> > similar?
> >
> > I'll let the current patch run on the mihawk system overnight in the
> > meantime so we at least have that data point, but would be good to
> > know what things look like a large pHyp machine.
>
> At one point I did manage to get the system in a state where unplug
> operations were taking 1-2s, but still not enough to trigger any
> 5s warning, and I wasn't able to reproduce that in subsequent runs.
>
> I also tried reworking the patch so that we print a warning and
> cond_resched() after 200 ms to make sure that path gets executed, but
> only managed to trigger the warning twice after a few hours.
>
> So, if we print a warning after a couple minutes, that seems pretty
> conservative as far as avoiding spurious warnings. And if we
> cond_resched() after 25 loops of polling (~0.1 ms in the cases
~0.1 seconds I mean
> that caused the original crash), that would avoid most of the
> default RCU/lockup warnings.
>
> But having a second timeout to trigger the cond_resched() after some
> set interval like 2s seems more deterministic since we're less
> susceptible to longer delays due to things like the RTAS calls
> contending for QEMU's global mutex in the the KVM case.
>
>
> >
> > Thanks!
> >
> > >
> > > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > > >>
> > > >> This is not public.
> > > >
> > > > I'll have a look at changing that.
> > >
> > > Thanks.
> > >
> > > cheers
^ permalink raw reply
* Re: [PATCH v2 1/2] powerpc/perf: consolidate GPCI hcall structs into asm/hvcall.h
From: Tyrel Datwyler @ 2020-08-05 22:37 UTC (permalink / raw)
To: Scott Cheloha, linuxppc-dev; +Cc: Nathan Lynch
In-Reply-To: <20200727184605.2945095-1-cheloha@linux.ibm.com>
On 7/27/20 11:46 AM, Scott Cheloha wrote:
> The H_GetPerformanceCounterInfo (GPCI) hypercall input/output structs are
> useful to modules outside of perf/, so move them into asm/hvcall.h to live
> alongside the other powerpc hypercall structs.
>
> Leave the perf-specific GPCI stuff in perf/hv-gpci.h.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
> arch/powerpc/include/asm/hvcall.h | 36 +++++++++++++++++++++++++++++++
> arch/powerpc/perf/hv-gpci.c | 9 --------
> arch/powerpc/perf/hv-gpci.h | 27 -----------------------
> 3 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index e90c073e437e..c338480b4551 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -527,6 +527,42 @@ struct hv_guest_state {
> /* Latest version of hv_guest_state structure */
> #define HV_GUEST_STATE_VERSION 1
>
> +/*
> + * From the document "H_GetPerformanceCounterInfo Interface" v1.07
> + *
> + * H_GET_PERF_COUNTER_INFO argument
> + */
> +struct hv_get_perf_counter_info_params {
> + __be32 counter_request; /* I */
> + __be32 starting_index; /* IO */
> + __be16 secondary_index; /* IO */
> + __be16 returned_values; /* O */
> + __be32 detail_rc; /* O, only needed when called via *_norets() */
> +
> + /*
> + * O, size each of counter_value element in bytes, only set for version
> + * >= 0x3
> + */
> + __be16 cv_element_size;
> +
> + /* I, 0 (zero) for versions < 0x3 */
> + __u8 counter_info_version_in;
> +
> + /* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
> + __u8 counter_info_version_out;
> + __u8 reserved[0xC];
> + __u8 counter_value[];
> +} __packed;
> +
> +#define HGPCI_REQ_BUFFER_SIZE 4096
> +#define HGPCI_MAX_DATA_BYTES \
> + (HGPCI_REQ_BUFFER_SIZE - sizeof(struct hv_get_perf_counter_info_params))
> +
> +struct hv_gpci_request_buffer {
> + struct hv_get_perf_counter_info_params params;
> + uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> +} __packed;
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __KERNEL__ */
> #endif /* _ASM_POWERPC_HVCALL_H */
> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> index 6884d16ec19b..1667315b82e9 100644
> --- a/arch/powerpc/perf/hv-gpci.c
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -123,17 +123,8 @@ static const struct attribute_group *attr_groups[] = {
> NULL,
> };
>
> -#define HGPCI_REQ_BUFFER_SIZE 4096
> -#define HGPCI_MAX_DATA_BYTES \
> - (HGPCI_REQ_BUFFER_SIZE - sizeof(struct hv_get_perf_counter_info_params))
> -
> static DEFINE_PER_CPU(char, hv_gpci_reqb[HGPCI_REQ_BUFFER_SIZE]) __aligned(sizeof(uint64_t));
>
> -struct hv_gpci_request_buffer {
> - struct hv_get_perf_counter_info_params params;
> - uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> -} __packed;
> -
> static unsigned long single_gpci_request(u32 req, u32 starting_index,
> u16 secondary_index, u8 version_in, u32 offset, u8 length,
> u64 *value)
> diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
> index a3053eda5dcc..4d108262bed7 100644
> --- a/arch/powerpc/perf/hv-gpci.h
> +++ b/arch/powerpc/perf/hv-gpci.h
> @@ -2,33 +2,6 @@
> #ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
> #define LINUX_POWERPC_PERF_HV_GPCI_H_
>
> -#include <linux/types.h>
> -
> -/* From the document "H_GetPerformanceCounterInfo Interface" v1.07 */
> -
> -/* H_GET_PERF_COUNTER_INFO argument */
> -struct hv_get_perf_counter_info_params {
> - __be32 counter_request; /* I */
> - __be32 starting_index; /* IO */
> - __be16 secondary_index; /* IO */
> - __be16 returned_values; /* O */
> - __be32 detail_rc; /* O, only needed when called via *_norets() */
> -
> - /*
> - * O, size each of counter_value element in bytes, only set for version
> - * >= 0x3
> - */
> - __be16 cv_element_size;
> -
> - /* I, 0 (zero) for versions < 0x3 */
> - __u8 counter_info_version_in;
> -
> - /* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
> - __u8 counter_info_version_out;
> - __u8 reserved[0xC];
> - __u8 counter_value[];
> -} __packed;
> -
Hmm, this pretty much guts this header which normally I'd be inclined to suggest
moving the whole thing. The remainder of that header for context:
> /*
> * counter info version => fw version/reference (spec version)
> *
>
* 8 => power8 (1.07)
* [7 is skipped by spec 1.07]
* 6 => TLBIE (1.07)
* 5 => v7r7m0.phyp (1.05)
* [4 skipped]
* 3 => v7r6m0.phyp (?)
* [1,2 skipped]
* 0 => v7r{2,3,4}m0.phyp (?)
*/
#define COUNTER_INFO_VERSION_CURRENT 0x8
/* capability mask masks. */
enum {
HV_GPCI_CM_GA = (1 << 7),
HV_GPCI_CM_EXPANDED = (1 << 6),
HV_GPCI_CM_LAB = (1 << 5)
};
#define REQUEST_FILE "../hv-gpci-requests.h"
#define NAME_LOWER hv_gpci
#define NAME_UPPER HV_GPCI
#include "req-gen/perf.h"
#undef REQUEST_FILE
#undef NAME_LOWER
#undef NAME_UPPER
The side effect of moving seems that we would have to drag "hv-gpci-requests.h"
along as well. So, maybe its best just moving the struct as you've done so it
can be used by code outside of perf.
-Tyrel
^ permalink raw reply
* Re: [PATCH v2 2/2] powerpc/pseries: new lparcfg key/value pair: partition_affinity_score
From: Tyrel Datwyler @ 2020-08-05 22:42 UTC (permalink / raw)
To: Scott Cheloha, linuxppc-dev; +Cc: Nathan Lynch
In-Reply-To: <20200727184605.2945095-2-cheloha@linux.ibm.com>
On 7/27/20 11:46 AM, Scott Cheloha wrote:
> The H_GetPerformanceCounterInfo (GPCI) PHYP hypercall has a subcall,
> Affinity_Domain_Info_By_Partition, which returns, among other things,
> a "partition affinity score" for a given LPAR. This score, a value on
> [0-100], represents the processor-memory affinity for the LPAR in
> question. A score of 0 indicates the worst possible affinity while a
> score of 100 indicates perfect affinity. The score can be used to
> reason about performance.
>
> This patch adds the score for the local LPAR to the lparcfg procfile
> under a new 'partition_affinity_score' key.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
I was hoping Michael would chime in the first time around on this patch series
about adding another key/value pair to lparcfg. So, barring a NACK from mpe:
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> arch/powerpc/platforms/pseries/lparcfg.c | 35 ++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
> index b8d28ab88178..e278390ab28d 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -136,6 +136,39 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
> return rc;
> }
>
> +static void show_gpci_data(struct seq_file *m)
> +{
> + struct hv_gpci_request_buffer *buf;
> + unsigned int affinity_score;
> + long ret;
> +
> + buf = kmalloc(sizeof(*buf), GFP_KERNEL);
> + if (buf == NULL)
> + return;
> +
> + /*
> + * Show the local LPAR's affinity score.
> + *
> + * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
> + * The score is at byte 0xB in the output buffer.
> + */
> + memset(&buf->params, 0, sizeof(buf->params));
> + buf->params.counter_request = cpu_to_be32(0xB1);
> + buf->params.starting_index = cpu_to_be32(-1); /* local LPAR */
> + buf->params.counter_info_version_in = 0x5; /* v5+ for score */
> + ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
> + sizeof(*buf));
> + if (ret != H_SUCCESS) {
> + pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
> + ret, be32_to_cpu(buf->params.detail_rc));
> + goto out;
> + }
> + affinity_score = buf->bytes[0xB];
> + seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
> +out:
> + kfree(buf);
> +}
> +
> static unsigned h_pic(unsigned long *pool_idle_time,
> unsigned long *num_procs)
> {
> @@ -487,6 +520,8 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
> partition_active_processors * 100);
> }
>
> + show_gpci_data(m);
> +
> seq_printf(m, "partition_active_processors=%d\n",
> partition_active_processors);
>
>
^ permalink raw reply
* Re: linux-next: manual merge of the char-misc tree with the powerpc tree
From: Stephen Rothwell @ 2020-08-05 23:40 UTC (permalink / raw)
To: Greg KH, Arnd Bergmann, Michael Ellerman, PowerPC
Cc: Linux Next Mailing List, Lee Jones, Linux Kernel Mailing List,
Alastair D'Silva
In-Reply-To: <20200803165546.6ab5ab6f@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]
Hi all,
On Mon, 3 Aug 2020 16:55:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the char-misc tree got a conflict in:
>
> drivers/misc/ocxl/config.c
>
> between commit:
>
> 3591538a31af ("ocxl: Address kernel doc errors & warnings")
>
> from the powerpc tree and commit:
>
> 28fc491e9be6 ("misc: ocxl: config: Provide correct formatting to function headers")
>
> from the char-misc tree.
>
> I fixed it up (as it was just differences in comments, I just arbitrarily
> chose the latter version) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
This is now a conflict between the powerpc tree and Linus' tree.
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH v2] powerpc: Warn about use of smt_snooze_delay
From: Joel Stanley @ 2020-08-05 23:57 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Tyrel Datwyler, linuxppc-dev, Gautham R Shenoy
In-Reply-To: <87eeomzknj.fsf@mpe.ellerman.id.au>
On Tue, 4 Aug 2020 at 11:59, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Joel Stanley <joel@jms.id.au> writes:
> > It's not done anything for a long time. Save the percpu variable, and
> > emit a warning to remind users to not expect it to do anything.
> >
> > Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.")
> > Cc: stable@vger.kernel.org # v3.14
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > --
> > v2:
> > Use pr_warn instead of WARN
> > Reword and print proccess name with pid in message
> > Leave CPU_FTR_SMT test in
> > Add Fixes line
> >
> > mpe, if you don't agree then feel free to drop the cc stable.
> >
> > Testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system it's quite noisy
> > as the online/offline loop that ppc64_cpu runs is slow.
>
> Hmm, that is pretty spammy.
>
> I know I suggested the ratelimit, but I was thinking it would print once
> for each ppc64_cpu invocation, not ~30 times.
>
> How about pr_warn_once(), that should still be sufficient for people to
> notice if they're looking for it.
I think that's a reasonable suggestion.
>
> ...
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> > index 571b3259697e..ba6d4cee19ef 100644
> > --- a/arch/powerpc/kernel/sysfs.c
> > +++ b/arch/powerpc/kernel/sysfs.c
> > @@ -32,29 +32,26 @@
> >
> > static DEFINE_PER_CPU(struct cpu, cpu_devices);
> >
> > -/*
> > - * SMT snooze delay stuff, 64-bit only for now
> > - */
> > -
> > #ifdef CONFIG_PPC64
> >
> > -/* Time in microseconds we delay before sleeping in the idle loop */
> > -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
> > +/*
> > + * Snooze delay has not been hooked up since 3fa8cad82b94 ("powerpc/pseries/cpuidle:
> > + * smt-snooze-delay cleanup.") and has been broken even longer. As was foretold in
> > + * 2014:
> > + *
> > + * "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
> > + * up the kernel code."
> > + *
> > + * At some point in the future this code should be removed.
> > + */
> >
> > static ssize_t store_smt_snooze_delay(struct device *dev,
> > struct device_attribute *attr,
> > const char *buf,
> > size_t count)
> > {
> > - struct cpu *cpu = container_of(dev, struct cpu, dev);
> > - ssize_t ret;
> > - long snooze;
> > -
> > - ret = sscanf(buf, "%ld", &snooze);
> > - if (ret != 1)
> > - return -EINVAL;
> > -
> > - per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
> > + pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> > + current->comm, current->pid);
>
> Can we make this:
>
> "%s (%d) stored to unsupported smt_snooze_delay, which has no effect.\n",
ack
>
>
> > return count;
> > }
> >
> > @@ -62,9 +59,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
> > struct device_attribute *attr,
> > char *buf)
> > {
> > - struct cpu *cpu = container_of(dev, struct cpu, dev);
> > -
> > - return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
> > + pr_warn_ratelimited("%s (%d) used unsupported smt_snooze_delay, this has no effect\n",
> > + current->comm, current->pid);
>
> It has as much effect as it ever did :)
>
> So maybe:
>
> "%s (%d) read from unsupported smt_snooze_delay.\n",
>
>
> I can do those changes when applying if you like rather than making you
> do a v3.
Yes please! Your suggested changes lgtm.
Cheers,
Joel
^ permalink raw reply
* Re: [PATCH v2 1/5] powerpc/mm: Introduce temporary mm
From: Daniel Axtens @ 2020-08-06 1:27 UTC (permalink / raw)
To: Christopher M. Riedl, linuxppc-dev; +Cc: kernel-hardening
In-Reply-To: <20200709040316.12789-2-cmr@informatik.wtf>
Hi Chris,
> void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> bool ppc_breakpoint_available(void);
> #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> extern void do_send_trap(struct pt_regs *regs, unsigned long address,
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 1a474f6b1992..9269c7c7b04e 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -10,6 +10,7 @@
> #include <asm/mmu.h>
> #include <asm/cputable.h>
> #include <asm/cputhreads.h>
> +#include <asm/debug.h>
>
> /*
> * Most if the context management is out of line
> @@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
> return 0;
> }
>
> +struct temp_mm {
> + struct mm_struct *temp;
> + struct mm_struct *prev;
> + bool is_kernel_thread;
> + struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> +};
This is on the nitpicky end, but I wonder if this should be named
temp_mm, or should be labelled something else to capture its broader
purpose as a context for code patching? I'm thinking that a store of
breakpoints is perhaps unusual in a memory-managment structure?
I don't have a better suggestion off the top of my head and I'm happy
for you to leave it, I just wanted to flag it as a possible way we could
be clearer.
> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> + temp_mm->temp = mm;
> + temp_mm->prev = NULL;
> + temp_mm->is_kernel_thread = false;
> + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> + lockdep_assert_irqs_disabled();
> +
> + temp_mm->is_kernel_thread = current->mm == NULL;
> + if (temp_mm->is_kernel_thread)
> + temp_mm->prev = current->active_mm;
You don't seem to restore active_mm below. I don't know what active_mm
does, so I don't know if this is a problem.
> + else
> + temp_mm->prev = current->mm;
> +
> + /*
> + * Hash requires a non-NULL current->mm to allocate a userspace address
> + * when handling a page fault. Does not appear to hurt in Radix either.
> + */
> + current->mm = temp_mm->temp;
> + switch_mm_irqs_off(NULL, temp_mm->temp, current);
> +
> + if (ppc_breakpoint_available()) {
I wondered if this could be changed during a text-patching operation.
AIUI, it potentially can on a P9 via "dawr_enable_dangerous" in debugfs.
I don't know if that's a problem. My concern is that you could turn off
breakpoints, call 'use_temporary_mm', then turn them back on again
before 'unuse_temporary_mm' and get a breakpoint while that can access
the temporary mm. Is there something else that makes that safe?
disabling IRQs maybe?
> + struct arch_hw_breakpoint null_brk = {0};
> + int i = 0;
> +
> + for (; i < nr_wp_slots(); ++i) {
super nitpicky, and I'm not sure if this is actually documented, but I'd
usually see this written as:
for (i = 0; i < nr_wp_slots(); i++) {
Not sure if there's any reason that it _shouldn't_ be written the way
you've written it (and I do like initialising the variable when it's
defined!), I'm just not used to it. (Likewise with the unuse function.)
> + __get_breakpoint(i, &temp_mm->brk[i]);
> + if (temp_mm->brk[i].type != 0)
> + __set_breakpoint(i, &null_brk);
> + }
> + }
> +}
> +
Kind regards,
Daniel
> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> + lockdep_assert_irqs_disabled();
> +
> + if (temp_mm->is_kernel_thread)
> + current->mm = NULL;
> + else
> + current->mm = temp_mm->prev;
> + switch_mm_irqs_off(NULL, temp_mm->prev, current);
> +
> + if (ppc_breakpoint_available()) {
> + int i = 0;
> +
> + for (; i < nr_wp_slots(); ++i)
> + if (temp_mm->brk[i].type != 0)
> + __set_breakpoint(i, &temp_mm->brk[i]);
> + }
> +}
> +
> #endif /* __KERNEL__ */
> #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 4650b9bb217f..b6c123bf5edd 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -824,6 +824,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
> return 0;
> }
>
> +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> +{
> + memcpy(brk, this_cpu_ptr(¤t_brk[nr]), sizeof(*brk));
> +}
> +
> void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> {
> memcpy(this_cpu_ptr(¤t_brk[nr]), brk, sizeof(*brk));
> --
> 2.27.0
^ permalink raw reply
* Re: How would I code a Write to a proc file from within the kernel without reading anything from user space?
From: Michael Ellerman @ 2020-08-06 1:30 UTC (permalink / raw)
To: thefirst ECS, linuxppc-dev
In-Reply-To: <32112832.26959.1596664366002.JavaMail.Dan@DanHP>
thefirst ECS <ecs_dn@yahoo.com> writes:
> In order to help debug a certain discrepancy, I need to "simulate" an "echo 1 > /proc/file" but doing it from kernel even when root file system is unavailable.
>
> I have simulated it just fine via call_usermodehelper (with argv etc of "echo 1 > /proc/file") from inside the kernel which triggers:
> [ee897ec0] [c0122704] proc_reg_write+0x80/0xb4
> [ee897ef0] [c00d3b7c] vfs_write+0xb4/0x184
>
> just as I had wanted. But now I need to trigger the "vfs_write" and "proc_reg_write" but without using call_usermodehelper since I will be doing it when root "/" is unavailable and so I can no longer access /bin/echo and call the usermodehelper etc. So my question is how can I do that in kernel?
>
> Not sure if I'm supposed to look in fs read_write.c and fs.h and write a method based on those or if there's some other way etc.
You don't say which proc file, which would be useful information.
I'll assume it's /proc/sysrq-trigger based on your previous mail.
Rather than trying to invoke the write path from inside the kernel, the
simplest option is to just call __handle_sysrq() directly.
cheers
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox