* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 20:55 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <7d409421-6396-8eba-8250-b6c9ff8232d9@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 06:51:44PM +0200, Christophe Leroy wrote:
> Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> >On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> >>+/*
> >>+ * The macros sets two stack frames, one for the caller and one for the
> >>callee
> >>+ * because there are no requirement for the caller to set a stack frame
> >>when
> >>+ * calling VDSO so it may have omitted to set one, especially on PPC64
> >>+ */
> >
> >If the caller follows the ABI, there always is a stack frame. So what
> >is going on?
>
> Looks like it is not the case. See discussion at
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
>
> Seems like GCC uses the redzone and doesn't set a stack frame. I guess
> it doesn't know that the inline assembly contains a function call so it
> doesn't set the frame.
Yes, that is the problem. See
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Extended-Asm.html#AssemblerTemplate
where this is (briefly) discussed:
"Accessing data from C programs without using input/output operands
(such as by using global symbols directly from the assembler
template) may not work as expected. Similarly, calling functions
directly from an assembler template requires a detailed understanding
of the target assembler and ABI."
I don't know of a good way to tell GCC some function needs a frame (that
is, one that doesn't result in extra code other than to set up the
frame). I'll think about it.
Segher
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 18:40 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <3db2a590-b842-83db-ed2b-f3ee62595f18@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 04:40:16PM +0000, Christophe Leroy wrote:
> >It cannot optimise it because it does not know shift < 32. The code
> >below is incorrect for shift equal to 32, fwiw.
>
> Is there a way to tell it ?
Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)
> >What does the compiler do for just
> >
> >static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> > return ns >> (shift & 31);
> >}
> >
>
> Worse:
I cannot make heads or tails of all that branch spaghetti, sorry.
> 73c: 55 8c 06 fe clrlwi r12,r12,27
> 740: 7f c8 f0 14 addc r30,r8,r30
> 744: 7c c6 4a 14 add r6,r6,r9
> 748: 7c c6 e1 14 adde r6,r6,r28
> 74c: 34 6c ff e0 addic. r3,r12,-32
> 750: 41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>
This branch is always true. Hrm.
Segher
^ permalink raw reply
* Re: [PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 18:08 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
Travis reported successful compilation with mpe/merge:
https://travis-ci.org/github/LeoBras/linux-ppc/builds/715028857
^ permalink raw reply
* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Miguel Ojeda @ 2020-08-05 17:11 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>
On Sun, Aug 2, 2020 at 6:40 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> .clang-format | 2 +-
The .clang-format bit:
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Miguel Ojeda @ 2020-08-05 17:10 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>
On Sun, Aug 2, 2020 at 6:41 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> .clang-format | 3 ++-
The .clang-format bit:
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:51 UTC (permalink / raw)
To: Segher Boessenkool
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>
Hi Again,
Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> Hi!
>
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * The macros sets two stack frames, one for the caller and one for the callee
>> + * because there are no requirement for the caller to set a stack frame when
>> + * calling VDSO so it may have omitted to set one, especially on PPC64
>> + */
>
> If the caller follows the ABI, there always is a stack frame. So what
> is going on?
Looks like it is not the case. See discussion at
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
Seems like GCC uses the redzone and doesn't set a stack frame. I guess
it doesn't know that the inline assembly contains a function call so it
doesn't set the frame.
Christophe
^ permalink raw reply
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:40 UTC (permalink / raw)
To: Segher Boessenkool
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>
Hi,
On 08/05/2020 02:03 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * powerpc specific delta calculation.
>> + *
>> + * This variant removes the masking of the subtraction because the
>> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
>> + * which would result in a pointless operation. The compiler cannot
>> + * optimize it away as the mask comes from the vdso data and is not compile
>> + * time constant.
>> + */
>
> It cannot optimise it because it does not know shift < 32. The code
> below is incorrect for shift equal to 32, fwiw.
Is there a way to tell it ?
>
>> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
>> +{
>> + return (cycles - last) * mult;
>> +}
>> +#define vdso_calc_delta vdso_calc_delta
>> +
>> +#ifndef __powerpc64__
>> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
>> +{
>> + u32 hi = ns >> 32;
>> + u32 lo = ns;
>> +
>> + lo >>= shift;
>> + lo |= hi << (32 - shift);
>> + hi >>= shift;
>
>
>> + if (likely(hi == 0))
>> + return lo;
>
> Removing these two lines shouldn't change generated object code? Or not
> make it worse, at least.
I remember it made noticeable difference allthough I can't remember the
details. See below with GCC 10.1. At least we see that with those two
lines, GCC only sets a 16 bytes stack frame. Without those lines it sets
a 32 bytes stack frame and seems to save some values for no reason.
With the two lines:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 04 bgt 7b4 <__c_kernel_clock_gettime+0x108>
6b4: 39 40 00 01 li r10,1
6b8: 7d 4a 18 30 slw r10,r10,r3
6bc: 71 47 08 83 andi. r7,r10,2179
6c0: 41 82 01 2c beq 7ec <__c_kernel_clock_gettime+0x140>
6c4: 94 21 ff f0 stwu r1,-16(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 e1 00 0c stw r31,12(r1)
6d0: 7d 85 1a 14 add r12,r5,r3
6d4: 80 05 00 00 lwz r0,0(r5)
6d8: 70 06 00 01 andi. r6,r0,1
6dc: 40 82 00 d4 bne 7b0 <__c_kernel_clock_gettime+0x104>
6e0: 7d 4d 42 e6 mftbu r10
6e4: 7d 6c 42 e6 mftb r11
6e8: 7c ed 42 e6 mftbu r7
6ec: 7c 0a 38 40 cmplw r10,r7
6f0: 40 82 ff f0 bne 6e0 <__c_kernel_clock_gettime+0x34>
6f4: 80 e5 00 0c lwz r7,12(r5)
6f8: 80 65 00 08 lwz r3,8(r5)
6fc: 7c e7 58 10 subfc r7,r7,r11
700: 81 65 00 18 lwz r11,24(r5)
704: 7d 43 51 10 subfe r10,r3,r10
708: 7f e7 58 16 mulhwu r31,r7,r11
70c: 7d 4a 59 d6 mullw r10,r10,r11
710: 7c e7 59 d6 mullw r7,r7,r11
714: 80 6c 00 2c lwz r3,44(r12)
718: 81 6c 00 28 lwz r11,40(r12)
71c: 7c e7 18 14 addc r7,r7,r3
720: 7d 4a fa 14 add r10,r10,r31
724: 80 65 00 1c lwz r3,28(r5)
728: 7d 4a 59 14 adde r10,r10,r11
72c: 7c e7 1c 30 srw r7,r7,r3
730: 21 63 00 20 subfic r11,r3,32
734: 7d 43 1c 31 srw. r3,r10,r3
738: 7d 4a 58 30 slw r10,r10,r11
73c: 7d 49 3b 78 or r9,r10,r7
740: 39 00 00 00 li r8,0
744: 40 82 00 84 bne 7c8 <__c_kernel_clock_gettime+0x11c>
748: 80 6c 00 24 lwz r3,36(r12)
74c: 81 45 00 00 lwz r10,0(r5)
750: 7c 00 50 40 cmplw r0,r10
754: 40 a2 ff 80 bne 6d4 <__c_kernel_clock_gettime+0x28>
758: 2c 08 00 00 cmpwi r8,0
75c: 41 82 00 7c beq 7d8 <__c_kernel_clock_gettime+0x12c>
760: 3c e0 c4 65 lis r7,-15259
764: 3c 00 3b 9a lis r0,15258
768: 60 e7 36 00 ori r7,r7,13824
76c: 60 00 c9 ff ori r0,r0,51711
770: 7c a9 38 14 addc r5,r9,r7
774: 7d 48 01 d4 addme r10,r8
778: 2c 0a 00 00 cmpwi r10,0
77c: 7d 48 53 78 mr r8,r10
780: 7c a9 2b 78 mr r9,r5
784: 38 c6 00 01 addi r6,r6,1
788: 40 82 ff e8 bne 770 <__c_kernel_clock_gettime+0xc4>
78c: 7c 05 00 40 cmplw r5,r0
790: 41 81 ff e0 bgt 770 <__c_kernel_clock_gettime+0xc4>
794: 7c 66 18 14 addc r3,r6,r3
798: 90 64 00 00 stw r3,0(r4)
79c: 91 24 00 04 stw r9,4(r4)
7a0: 38 60 00 00 li r3,0
7a4: 83 e1 00 0c lwz r31,12(r1)
7a8: 38 21 00 10 addi r1,r1,16
7ac: 4e 80 00 20 blr
7b0: 4b ff ff 24 b 6d4 <__c_kernel_clock_gettime+0x28>
7b4: 38 00 00 f6 li r0,246
7b8: 44 00 00 02 sc
7bc: 40 a3 00 08 bns 7c4 <__c_kernel_clock_gettime+0x118>
7c0: 7c 63 00 d0 neg r3,r3
7c4: 4e 80 00 20 blr
7c8: 7d 2a 4b 78 mr r10,r9
7cc: 7c 68 1b 78 mr r8,r3
7d0: 7d 49 53 78 mr r9,r10
7d4: 4b ff ff 74 b 748 <__c_kernel_clock_gettime+0x9c>
7d8: 3d 40 3b 9a lis r10,15258
7dc: 61 4a c9 ff ori r10,r10,51711
7e0: 7c 09 50 40 cmplw r9,r10
7e4: 41 81 ff 7c bgt 760 <__c_kernel_clock_gettime+0xb4>
7e8: 4b ff ff b0 b 798 <__c_kernel_clock_gettime+0xec>
7ec: 71 47 00 60 andi. r7,r10,96
7f0: 54 69 20 36 rlwinm r9,r3,4,0,27
7f4: 7d 25 4a 14 add r9,r5,r9
7f8: 40 82 00 14 bne 80c <__c_kernel_clock_gettime+0x160>
7fc: 71 4a 00 10 andi. r10,r10,16
800: 41 a2 ff b4 beq 7b4 <__c_kernel_clock_gettime+0x108>
804: 38 a5 00 f0 addi r5,r5,240
808: 4b ff fe bc b 6c4 <__c_kernel_clock_gettime+0x18>
80c: 81 05 00 00 lwz r8,0(r5)
810: 71 0a 00 01 andi. r10,r8,1
814: 40 a2 ff f8 bne 80c <__c_kernel_clock_gettime+0x160>
818: 80 69 00 24 lwz r3,36(r9)
81c: 81 49 00 2c lwz r10,44(r9)
820: 80 e5 00 00 lwz r7,0(r5)
824: 7c 08 38 40 cmplw r8,r7
828: 40 a2 ff e4 bne 80c <__c_kernel_clock_gettime+0x160>
82c: 90 64 00 00 stw r3,0(r4)
830: 91 44 00 04 stw r10,4(r4)
834: 38 60 00 00 li r3,0
838: 4e 80 00 20 blr
Without the two lines:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 14 bgt 7c4 <__c_kernel_clock_gettime+0x118>
6b4: 39 20 00 01 li r9,1
6b8: 7d 29 18 30 slw r9,r9,r3
6bc: 71 2a 08 83 andi. r10,r9,2179
6c0: 41 82 01 2c beq 7ec <__c_kernel_clock_gettime+0x140>
6c4: 94 21 ff e0 stwu r1,-32(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 81 00 10 stw r28,16(r1)
6d0: 93 a1 00 14 stw r29,20(r1)
6d4: 93 c1 00 18 stw r30,24(r1)
6d8: 93 e1 00 1c stw r31,28(r1)
6dc: 7c 65 1a 14 add r3,r5,r3
6e0: 81 85 00 00 lwz r12,0(r5)
6e4: 71 87 00 01 andi. r7,r12,1
6e8: 40 82 00 d8 bne 7c0 <__c_kernel_clock_gettime+0x114>
6ec: 7d 2d 42 e6 mftbu r9
6f0: 7c cc 42 e6 mftb r6
6f4: 7d 4d 42 e6 mftbu r10
6f8: 7c 09 50 40 cmplw r9,r10
6fc: 40 82 ff f0 bne 6ec <__c_kernel_clock_gettime+0x40>
700: 83 83 00 28 lwz r28,40(r3)
704: 83 a3 00 2c lwz r29,44(r3)
708: 81 65 00 08 lwz r11,8(r5)
70c: 81 05 00 0c lwz r8,12(r5)
710: 83 c5 00 18 lwz r30,24(r5)
714: 83 e5 00 1c lwz r31,28(r5)
718: 80 03 00 24 lwz r0,36(r3)
71c: 81 45 00 00 lwz r10,0(r5)
720: 7c 0c 50 40 cmplw r12,r10
724: 40 a2 ff bc bne 6e0 <__c_kernel_clock_gettime+0x34>
728: 7d 48 30 10 subfc r10,r8,r6
72c: 7c cb 49 10 subfe r6,r11,r9
730: 7c c6 f1 d6 mullw r6,r6,r30
734: 7d 2a f0 16 mulhwu r9,r10,r30
738: 7d 4a f1 d6 mullw r10,r10,r30
73c: 7c c6 4a 14 add r6,r6,r9
740: 7d 4a e8 14 addc r10,r10,r29
744: 7c c6 e1 14 adde r6,r6,r28
748: 7c c8 fc 30 srw r8,r6,r31
74c: 2c 08 00 00 cmpwi r8,0
750: 20 bf 00 20 subfic r5,r31,32
754: 7d 4a fc 30 srw r10,r10,r31
758: 7c c5 28 30 slw r5,r6,r5
75c: 7c a9 53 78 or r9,r5,r10
760: 41 82 00 78 beq 7d8 <__c_kernel_clock_gettime+0x12c>
764: 3c c0 c4 65 lis r6,-15259
768: 3c 60 3b 9a lis r3,15258
76c: 60 c6 36 00 ori r6,r6,13824
770: 60 63 c9 ff ori r3,r3,51711
774: 7c a9 30 14 addc r5,r9,r6
778: 7d 48 01 d4 addme r10,r8
77c: 2c 0a 00 00 cmpwi r10,0
780: 7d 48 53 78 mr r8,r10
784: 7c a9 2b 78 mr r9,r5
788: 38 e7 00 01 addi r7,r7,1
78c: 40 82 ff e8 bne 774 <__c_kernel_clock_gettime+0xc8>
790: 7c 05 18 40 cmplw r5,r3
794: 41 81 ff e0 bgt 774 <__c_kernel_clock_gettime+0xc8>
798: 7c 07 00 14 addc r0,r7,r0
79c: 90 04 00 00 stw r0,0(r4)
7a0: 91 24 00 04 stw r9,4(r4)
7a4: 38 60 00 00 li r3,0
7a8: 83 81 00 10 lwz r28,16(r1)
7ac: 83 a1 00 14 lwz r29,20(r1)
7b0: 83 c1 00 18 lwz r30,24(r1)
7b4: 83 e1 00 1c lwz r31,28(r1)
7b8: 38 21 00 20 addi r1,r1,32
7bc: 4e 80 00 20 blr
7c0: 4b ff ff 20 b 6e0 <__c_kernel_clock_gettime+0x34>
7c4: 38 00 00 f6 li r0,246
7c8: 44 00 00 02 sc
7cc: 40 a3 00 08 bns 7d4 <__c_kernel_clock_gettime+0x128>
7d0: 7c 63 00 d0 neg r3,r3
7d4: 4e 80 00 20 blr
7d8: 3d 40 3b 9a lis r10,15258
7dc: 61 4a c9 ff ori r10,r10,51711
7e0: 7c 09 50 40 cmplw r9,r10
7e4: 41 81 ff 80 bgt 764 <__c_kernel_clock_gettime+0xb8>
7e8: 4b ff ff b4 b 79c <__c_kernel_clock_gettime+0xf0>
7ec: 71 2a 00 60 andi. r10,r9,96
7f0: 40 82 00 14 bne 804 <__c_kernel_clock_gettime+0x158>
7f4: 71 29 00 10 andi. r9,r9,16
7f8: 41 a2 ff cc beq 7c4 <__c_kernel_clock_gettime+0x118>
7fc: 38 a5 00 f0 addi r5,r5,240
800: 4b ff fe c4 b 6c4 <__c_kernel_clock_gettime+0x18>
804: 54 69 20 36 rlwinm r9,r3,4,0,27
808: 7d 25 4a 14 add r9,r5,r9
80c: 81 05 00 00 lwz r8,0(r5)
810: 71 0a 00 01 andi. r10,r8,1
814: 40 82 00 28 bne 83c <__c_kernel_clock_gettime+0x190>
818: 80 09 00 24 lwz r0,36(r9)
81c: 81 49 00 2c lwz r10,44(r9)
820: 80 e5 00 00 lwz r7,0(r5)
824: 7c 08 38 40 cmplw r8,r7
828: 40 a2 ff e4 bne 80c <__c_kernel_clock_gettime+0x160>
82c: 90 04 00 00 stw r0,0(r4)
830: 91 44 00 04 stw r10,4(r4)
834: 38 60 00 00 li r3,0
838: 4e 80 00 20 blr
83c: 4b ff ff d0 b 80c <__c_kernel_clock_gettime+0x160>
>
>> + return ((u64)hi << 32) | lo;
>> +}
>
>
> What does the compiler do for just
>
> static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> return ns >> (shift & 31);
> }
>
Worse:
000006ac <__c_kernel_clock_gettime>:
6ac: 28 03 00 0f cmplwi r3,15
6b0: 41 81 01 30 bgt 7e0 <__c_kernel_clock_gettime+0x134>
6b4: 39 20 00 01 li r9,1
6b8: 7d 29 18 30 slw r9,r9,r3
6bc: 71 2a 08 83 andi. r10,r9,2179
6c0: 41 82 01 48 beq 808 <__c_kernel_clock_gettime+0x15c>
6c4: 94 21 ff e0 stwu r1,-32(r1)
6c8: 54 63 20 36 rlwinm r3,r3,4,0,27
6cc: 93 81 00 10 stw r28,16(r1)
6d0: 93 a1 00 14 stw r29,20(r1)
6d4: 93 c1 00 18 stw r30,24(r1)
6d8: 93 e1 00 1c stw r31,28(r1)
6dc: 7c 65 1a 14 add r3,r5,r3
6e0: 80 c5 00 00 lwz r6,0(r5)
6e4: 70 c7 00 01 andi. r7,r6,1
6e8: 40 82 00 f4 bne 7dc <__c_kernel_clock_gettime+0x130>
6ec: 7d 2d 42 e6 mftbu r9
6f0: 7d 0c 42 e6 mftb r8
6f4: 7d 4d 42 e6 mftbu r10
6f8: 7c 09 50 40 cmplw r9,r10
6fc: 40 82 ff f0 bne 6ec <__c_kernel_clock_gettime+0x40>
700: 83 83 00 28 lwz r28,40(r3)
704: 83 c3 00 2c lwz r30,44(r3)
708: 81 65 00 08 lwz r11,8(r5)
70c: 81 45 00 0c lwz r10,12(r5)
710: 83 e5 00 18 lwz r31,24(r5)
714: 81 85 00 1c lwz r12,28(r5)
718: 80 03 00 24 lwz r0,36(r3)
71c: 83 a5 00 00 lwz r29,0(r5)
720: 7c 06 e8 40 cmplw r6,r29
724: 40 a2 ff bc bne 6e0 <__c_kernel_clock_gettime+0x34>
728: 7d 0a 40 10 subfc r8,r10,r8
72c: 7c cb 49 10 subfe r6,r11,r9
730: 7c c6 f9 d6 mullw r6,r6,r31
734: 7d 28 f8 16 mulhwu r9,r8,r31
738: 7d 08 f9 d6 mullw r8,r8,r31
73c: 55 8c 06 fe clrlwi r12,r12,27
740: 7f c8 f0 14 addc r30,r8,r30
744: 7c c6 4a 14 add r6,r6,r9
748: 7c c6 e1 14 adde r6,r6,r28
74c: 34 6c ff e0 addic. r3,r12,-32
750: 41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>
754: 7c c9 1c 30 srw r9,r6,r3
758: 39 00 00 00 li r8,0
75c: 2c 08 00 00 cmpwi r8,0
760: 41 82 00 94 beq 7f4 <__c_kernel_clock_gettime+0x148>
764: 3c c0 c4 65 lis r6,-15259
768: 3c 60 3b 9a lis r3,15258
76c: 60 c6 36 00 ori r6,r6,13824
770: 60 63 c9 ff ori r3,r3,51711
774: 7c a9 30 14 addc r5,r9,r6
778: 7d 48 01 d4 addme r10,r8
77c: 2c 0a 00 00 cmpwi r10,0
780: 7d 48 53 78 mr r8,r10
784: 7c a9 2b 78 mr r9,r5
788: 38 e7 00 01 addi r7,r7,1
78c: 40 82 ff e8 bne 774 <__c_kernel_clock_gettime+0xc8>
790: 7c 05 18 40 cmplw r5,r3
794: 41 81 ff e0 bgt 774 <__c_kernel_clock_gettime+0xc8>
798: 7c 07 00 14 addc r0,r7,r0
79c: 90 04 00 00 stw r0,0(r4)
7a0: 91 24 00 04 stw r9,4(r4)
7a4: 38 60 00 00 li r3,0
7a8: 83 81 00 10 lwz r28,16(r1)
7ac: 83 a1 00 14 lwz r29,20(r1)
7b0: 83 c1 00 18 lwz r30,24(r1)
7b4: 83 e1 00 1c lwz r31,28(r1)
7b8: 38 21 00 20 addi r1,r1,32
7bc: 4e 80 00 20 blr
7c0: 54 c3 08 3c rlwinm r3,r6,1,0,30
7c4: 21 6c 00 1f subfic r11,r12,31
7c8: 7c 63 58 30 slw r3,r3,r11
7cc: 7f c9 64 30 srw r9,r30,r12
7d0: 7c 69 4b 78 or r9,r3,r9
7d4: 7c c8 64 30 srw r8,r6,r12
7d8: 4b ff ff 84 b 75c <__c_kernel_clock_gettime+0xb0>
7dc: 4b ff ff 04 b 6e0 <__c_kernel_clock_gettime+0x34>
7e0: 38 00 00 f6 li r0,246
7e4: 44 00 00 02 sc
7e8: 40 a3 00 08 bns 7f0 <__c_kernel_clock_gettime+0x144>
7ec: 7c 63 00 d0 neg r3,r3
7f0: 4e 80 00 20 blr
7f4: 3d 40 3b 9a lis r10,15258
7f8: 61 4a c9 ff ori r10,r10,51711
7fc: 7c 09 50 40 cmplw r9,r10
800: 41 81 ff 64 bgt 764 <__c_kernel_clock_gettime+0xb8>
804: 4b ff ff 98 b 79c <__c_kernel_clock_gettime+0xf0>
808: 71 2a 00 60 andi. r10,r9,96
80c: 40 82 00 14 bne 820 <__c_kernel_clock_gettime+0x174>
810: 71 29 00 10 andi. r9,r9,16
814: 41 a2 ff cc beq 7e0 <__c_kernel_clock_gettime+0x134>
818: 38 a5 00 f0 addi r5,r5,240
81c: 4b ff fe a8 b 6c4 <__c_kernel_clock_gettime+0x18>
820: 54 69 20 36 rlwinm r9,r3,4,0,27
824: 7d 25 4a 14 add r9,r5,r9
828: 81 05 00 00 lwz r8,0(r5)
82c: 71 0a 00 01 andi. r10,r8,1
830: 40 82 00 28 bne 858 <__c_kernel_clock_gettime+0x1ac>
834: 80 09 00 24 lwz r0,36(r9)
838: 81 49 00 2c lwz r10,44(r9)
83c: 80 e5 00 00 lwz r7,0(r5)
840: 7c 08 38 40 cmplw r8,r7
844: 40 a2 ff e4 bne 828 <__c_kernel_clock_gettime+0x17c>
848: 90 04 00 00 stw r0,0(r4)
84c: 91 44 00 04 stw r10,4(r4)
850: 38 60 00 00 li r3,0
854: 4e 80 00 20 blr
858: 4b ff ff d0 b 828 <__c_kernel_clock_gettime+0x17c>
Christophe
^ permalink raw reply
* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Tyrel Datwyler @ 2020-08-05 16:14 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20200805131922.GZ12965@vkoul-mobl>
On 8/5/20 6:19 AM, Vinod Koul wrote:
> On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
>
>> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
>> index 2c508ee672b9..9b69716172a4 100644
>> --- a/drivers/dma/pl330.c
>> +++ b/drivers/dma/pl330.c
>> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>>
>> if (_state(thrd) == PL330_STATE_KILLING)
>> UNTIL(thrd, PL330_STATE_STOPPED)
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_FAULTING:
>> _stop(thrd);
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_KILLING:
>> case PL330_STATE_COMPLETING:
>> UNTIL(thrd, PL330_STATE_STOPPED)
>> - /* fall through */
>> + fallthrough;
>>
>> case PL330_STATE_STOPPED:
>> return _trigger(thrd);
>> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>>
>> switch (direction) {
>> case DMA_MEM_TO_MEM:
>> - /* fall through */
>> case DMA_MEM_TO_DEV:
>> off += _emit_LD(dry_run, &buf[off], cond);
>> break;
>> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>>
>> switch (direction) {
>> case DMA_MEM_TO_MEM:
>> - /* fall through */
>> case DMA_DEV_TO_MEM:
>> off += _emit_ST(dry_run, &buf[off], cond);
>> break;
>> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>>
>> switch (pxs->desc->rqtype) {
>> case DMA_MEM_TO_DEV:
>> - /* fall through */
>> case DMA_DEV_TO_MEM:
>> off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>> cond);
>> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>>
>> switch (pxs->desc->rqtype) {
>> case DMA_MEM_TO_DEV:
>> - /* fall through */
>
> replacement missed here and above few
>
Its not obvious via most of the documentation, but a case label followed
immediately by another case label is the only allowed implicit fall through as
it is obvious to the eye that there is no postceding statement.
For example the following is legal:
case FOO:
/* fallthrough */ (or fallthrough;)
case BAR:
is converted to:
case FOO:
case BAR:
I would assume the justification is that it is common to have a switch statement
where several case statements fall directly through to a single code block and
annotating each case label seems like overkill.
switch (vhost->state) {
case IBMVFC_LINK_DEAD:
case IBMVFC_HOST_OFFLINE:
result = DID_NO_CONNECT << 16;
break;
case IBMVFC_NO_CRQ:
case IBMVFC_INITIALIZING:
case IBMVFC_HALTED:
case IBMVFC_LINK_DOWN:
result = DID_REQUEUE << 16;
break;
case IBMVFC_ACTIVE:
result = 0;
break;
}
Regards,
Tyrel
^ permalink raw reply
* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-05 15:52 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200722022422.825197-1-mpe@ellerman.id.au>
Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
> The assembler says:
> arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
I get exactly the same with head_32.S, for the exact same reason.
Christophe
>
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
>
> Fix it to use a literal 0, the generated code is identical.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> arch/powerpc/kernel/head_40x.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 926bfa73586a..5b282d9965a5 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -620,7 +620,7 @@ _ENTRY(saved_ksp_limit)
> ori r6, r6, swapper_pg_dir@l
> lis r5, abatron_pteptrs@h
> ori r5, r5, abatron_pteptrs@l
> - stw r5, 0xf0(r0) /* Must match your Abatron config file */
> + stw r5, 0xf0(0) /* Must match your Abatron config file */
> tophys(r5,r5)
> stw r6, 0(r5)
>
>
^ permalink raw reply
* [PATCH] powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
When MODULES_VADDR is defined, is_module_segment() shall check the
address against it instead of checking agains VMALLOC_START.
Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/book3s32/mmu.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index c0162911f6cb..82ae9e06a773 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -191,10 +191,17 @@ static bool is_module_segment(unsigned long addr)
{
if (!IS_ENABLED(CONFIG_MODULES))
return false;
+#ifdef MODULES_VADDR
+ if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
+ return false;
+ if (addr >= ALIGN(MODULES_END, SZ_256M))
+ return false;
+#else
if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
return false;
if (addr >= ALIGN(VMALLOC_END, SZ_256M))
return false;
+#endif
return true;
}
--
2.25.0
^ permalink raw reply related
* [PATCH] powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
On BOOK3S_32, when we have modules and strict kernel RWX, modules
are not in vmalloc space but in a dedicated segment that is
below PAGE_OFFSET.
So KASAN_SHADOW_START must take it into account.
MODULES_VADDR can't be used because it is not defined yet
in kasan.h
Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/kasan.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index d635b96c7ea6..7355ed05e65e 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -15,11 +15,18 @@
#ifndef __ASSEMBLY__
#include <asm/page.h>
+#include <linux/sizes.h>
#define KASAN_SHADOW_SCALE_SHIFT 3
+#if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_MODULES) && defined(CONFIG_STRICT_KERNEL_RWX)
+#define KASAN_KERN_START ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
+#else
+#define KASAN_KERN_START PAGE_OFFSET
+#endif
+
#define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
- (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
+ (KASAN_KERN_START >> KASAN_SHADOW_SCALE_SHIFT))
#define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
--
2.25.0
^ permalink raw reply related
* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 14:03 UTC (permalink / raw)
To: Christophe Leroy
Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu>
Hi!
On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
>
> 18: 35 25 ff e0 addic. r9,r5,-32
> 1c: 41 80 00 10 blt 2c <shift+0x14>
> 20: 7c 64 4c 30 srw r4,r3,r9
> 24: 38 60 00 00 li r3,0
> ...
> 2c: 54 69 08 3c rlwinm r9,r3,1,0,30
> 30: 21 45 00 1f subfic r10,r5,31
> 34: 7c 84 2c 30 srw r4,r4,r5
> 38: 7d 29 50 30 slw r9,r9,r10
> 3c: 7c 63 2c 30 srw r3,r3,r5
> 40: 7d 24 23 78 or r4,r9,r4
>
> In our case the shift is always <= 32. In addition, the upper 32 bits
> of the result are likely nul. Lets GCC know it, it also optimises the
> following calculations.
>
> With the patch, we get:
> 0: 21 25 00 20 subfic r9,r5,32
> 4: 7c 69 48 30 slw r9,r3,r9
> 8: 7c 84 2c 30 srw r4,r4,r5
> c: 7d 24 23 78 or r4,r9,r4
> 10: 7c 63 2c 30 srw r3,r3,r5
See below. Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will do that (it is on various TODO lists, it is
bound to happen *some* day ;-), but it won't help you yet of course).
> +/*
> + * The macros sets two stack frames, one for the caller and one for the callee
> + * because there are no requirement for the caller to set a stack frame when
> + * calling VDSO so it may have omitted to set one, especially on PPC64
> + */
If the caller follows the ABI, there always is a stack frame. So what
is going on?
> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
> + * which would result in a pointless operation. The compiler cannot
> + * optimize it away as the mask comes from the vdso data and is not compile
> + * time constant.
> + */
It cannot optimise it because it does not know shift < 32. The code
below is incorrect for shift equal to 32, fwiw.
> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> +{
> + return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> + u32 hi = ns >> 32;
> + u32 lo = ns;
> +
> + lo >>= shift;
> + lo |= hi << (32 - shift);
> + hi >>= shift;
> + if (likely(hi == 0))
> + return lo;
Removing these two lines shouldn't change generated object code? Or not
make it worse, at least.
> + return ((u64)hi << 32) | lo;
> +}
What does the compiler do for just
static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
return ns >> (shift & 31);
}
?
Segher
^ permalink raw reply
* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 13:35 UTC (permalink / raw)
To: Michael Ellerman
Cc: Christophe Leroy, nathanl, arnd, linux-kernel,
Tulio Magno Quites Machado Filho, Paul Mackerras, luto,
linux-arch, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <87wo2dy5in.fsf@mpe.ellerman.id.au>
Hi!
On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack
> > frame whenever it has anything to same.
>
> Yeah OK that would explain it.
>
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> > fbb60: 94 21 ff e0 stwu r1,-32(r1)
This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).
> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> > .cfi_startproc
> > PPC_STLU r1, -STACK_FRAME_OVERHEAD(r1)
> > mflr r0
> > + PPC_STLU r1, -STACK_FRAME_OVERHEAD(r1)
> > .cfi_register lr, r0
>
> The cfi_register should come directly after the mflr I think.
That is the idiomatic way to write it, and most obviously correct. But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding. Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)
> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if
> > you think that's needed, just let's do it:
>
> Hmm, true.
>
> The compiler will use the toc for globals (and possibly also for large
> constants?)
And anything else it bloody well wants to, yeah :-)
> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.
Yes.
> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.
If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)
> So yeah we should probably do as below.
[ snip ]
Looks good yes.
Segher
^ permalink raw reply
* [PATCH v2] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 13:35 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>
All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.
Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.
This should prevent some soft lockups when starting large guests
Code has meaning only if CONFIG_MEMORY_HOTPLUG is set, otherwise the nid
field is not present in the drmem_lmb structure.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/mm/drmem.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..c11b6ec99ea3 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -402,6 +402,9 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
const __be32 *p;
u32 i, j, lmb_sets;
int lmb_index;
+#ifdef CONFIG_MEMORY_HOTPLUG
+ struct drmem_lmb *first = NULL;
+#endif
lmb_sets = of_read_number(prop++, 1);
if (lmb_sets == 0)
@@ -426,6 +429,15 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
for (i = 0; i < lmb_sets; i++) {
read_drconf_v2_cell(&dr_cell, &p);
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Fetch the NUMA node id for the fist set or if the
+ * associativity index is different from the previous set.
+ */
+ if (first && dr_cell.aa_index != first->aa_index)
+ first = NULL;
+#endif
+
for (j = 0; j < dr_cell.seq_lmbs; j++) {
lmb = &drmem_info->lmbs[lmb_index++];
@@ -438,7 +450,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
- lmb_set_nid(lmb);
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * All the LMB in the set share the same NUMA
+ * associativity property. So read that node only once.
+ */
+ if (!first) {
+ lmb_set_nid(lmb);
+ first = lmb;
+ } else {
+ lmb->nid = first->nid;
+ }
+#endif
}
}
}
--
2.28.0
^ permalink raw reply related
* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Vinod Koul @ 2020-08-05 13:19 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Song Liu, Alexei Starovoitov, Shawn Guo, Fabio Estevam,
Daniel Borkmann, John Fastabend, Zhang Wei, NXP Linux Team,
Yonghong Song, Andrii Nakryiko, Sascha Hauer, KP Singh,
Dan Williams, linux-arm-kernel, netdev, linux-kernel, Li Yang,
Pengutronix Kernel Team, dmaengine, bpf, linuxppc-dev,
Martin KaFai Lau
In-Reply-To: <20200727203413.GA6245@embeddedor>
On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 2c508ee672b9..9b69716172a4 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>
> if (_state(thrd) == PL330_STATE_KILLING)
> UNTIL(thrd, PL330_STATE_STOPPED)
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_FAULTING:
> _stop(thrd);
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_KILLING:
> case PL330_STATE_COMPLETING:
> UNTIL(thrd, PL330_STATE_STOPPED)
> - /* fall through */
> + fallthrough;
>
> case PL330_STATE_STOPPED:
> return _trigger(thrd);
> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>
> switch (direction) {
> case DMA_MEM_TO_MEM:
> - /* fall through */
> case DMA_MEM_TO_DEV:
> off += _emit_LD(dry_run, &buf[off], cond);
> break;
> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>
> switch (direction) {
> case DMA_MEM_TO_MEM:
> - /* fall through */
> case DMA_DEV_TO_MEM:
> off += _emit_ST(dry_run, &buf[off], cond);
> break;
> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>
> switch (pxs->desc->rqtype) {
> case DMA_MEM_TO_DEV:
> - /* fall through */
> case DMA_DEV_TO_MEM:
> off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
> cond);
> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>
> switch (pxs->desc->rqtype) {
> case DMA_MEM_TO_DEV:
> - /* fall through */
replacement missed here and above few
--
~Vinod
^ permalink raw reply
* [PATCH] usb: gadget: fix spelling mistake "Dectected" -> "Detected"
From: Colin King @ 2020-08-05 13:14 UTC (permalink / raw)
To: Li Yang, Felipe Balbi, Greg Kroah-Hartman, linux-usb,
linuxppc-dev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
There is a spelling mistake in a literal string. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/usb/gadget/udc/fsl_udc_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index b2638e83bb49..c79407f2d577 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -2061,7 +2061,7 @@ static int fsl_proc_read(struct seq_file *m, void *v)
"Sleep Enable: %d SOF Received Enable: %d "
"Reset Enable: %d\n"
"System Error Enable: %d "
- "Port Change Dectected Enable: %d\n"
+ "Port Change Detected Enable: %d\n"
"USB Error Intr Enable: %d USB Intr Enable: %d\n\n",
(tmp_reg & USB_INTR_DEVICE_SUSPEND) ? 1 : 0,
(tmp_reg & USB_INTR_SOF_EN) ? 1 : 0,
--
2.27.0
^ permalink raw reply related
* Re: [PATCH v3 2/4] powerpc/sstep: support emulation for vsx vector paired storage access instructions
From: Naveen N. Rao @ 2020-08-05 12:53 UTC (permalink / raw)
To: Balamuruhan S, mpe; +Cc: paulus, linuxppc-dev, sandipan, ravi.bangoria
In-Reply-To: <20200731081637.1837559-3-bala24@linux.ibm.com>
Balamuruhan S wrote:
> add emulate_step() changes to support vsx vector paired storage
> access instructions that provides octword operands loads/stores
> between storage and set of 64 Vector Scalar Registers (VSRs).
This should be squashed in with the previous patch. Otherwise, emulation
of these instructions won't be complete, which affects bisectability.
- Naveen
^ permalink raw reply
* Re: [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 12:23 UTC (permalink / raw)
To: kernel test robot, linuxppc-dev, linux-kernel
Cc: nathanl, cheloha, kbuild-all
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>
Le 05/08/2020 à 12:43, kernel test robot a écrit :
> Hi Laurent,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on linux/master linus/master v5.8 next-20200804]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Laurent-Dufour/powerpc-drmem-Don-t-compute-the-NUMA-node-for-each-LMB/20200805-173213
> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-mpc885_ads_defconfig (attached as .config)
> compiler: powerpc-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> arch/powerpc/mm/drmem.c: In function 'init_drmem_v2_lmbs':
>>> arch/powerpc/mm/drmem.c:457:8: error: 'struct drmem_lmb' has no member named 'nid'
> 457 | lmb->nid = first->nid;
> | ^~
> arch/powerpc/mm/drmem.c:457:21: error: 'struct drmem_lmb' has no member named 'nid'
> 457 | lmb->nid = first->nid;
> | ^~
My mistake, the nid member is only present when CONFIG_MEMORY_HOTPLUG is set.
I'll send a new version fixing this.
> vim +457 arch/powerpc/mm/drmem.c
>
> 397
> 398 static void __init init_drmem_v2_lmbs(const __be32 *prop)
> 399 {
> 400 struct drmem_lmb *lmb, *first;
> 401 struct of_drconf_cell_v2 dr_cell;
> 402 const __be32 *p;
> 403 u32 i, j, lmb_sets;
> 404 int lmb_index;
> 405
> 406 lmb_sets = of_read_number(prop++, 1);
> 407 if (lmb_sets == 0)
> 408 return;
> 409
> 410 /* first pass, calculate the number of LMBs */
> 411 p = prop;
> 412 for (i = 0; i < lmb_sets; i++) {
> 413 read_drconf_v2_cell(&dr_cell, &p);
> 414 drmem_info->n_lmbs += dr_cell.seq_lmbs;
> 415 }
> 416
> 417 drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
> 418 GFP_KERNEL);
> 419 if (!drmem_info->lmbs)
> 420 return;
> 421
> 422 /* second pass, read in the LMB information */
> 423 lmb_index = 0;
> 424 p = prop;
> 425 first = NULL;
> 426
> 427 for (i = 0; i < lmb_sets; i++) {
> 428 read_drconf_v2_cell(&dr_cell, &p);
> 429
> 430 /*
> 431 * Fetch the NUMA node id for the fist set or if the
> 432 * associativity index is different from the previous set.
> 433 */
> 434 if (first && dr_cell.aa_index != first->aa_index)
> 435 first = NULL;
> 436
> 437 for (j = 0; j < dr_cell.seq_lmbs; j++) {
> 438 lmb = &drmem_info->lmbs[lmb_index++];
> 439
> 440 lmb->base_addr = dr_cell.base_addr;
> 441 dr_cell.base_addr += drmem_info->lmb_size;
> 442
> 443 lmb->drc_index = dr_cell.drc_index;
> 444 dr_cell.drc_index++;
> 445
> 446 lmb->aa_index = dr_cell.aa_index;
> 447 lmb->flags = dr_cell.flags;
> 448
> 449 /*
> 450 * All the LMB in the set share the same NUMA
> 451 * associativity property. So read that node only once.
> 452 */
> 453 if (!first) {
> 454 lmb_set_nid(lmb);
> 455 first = lmb;
> 456 } else {
> > 457 lmb->nid = first->nid;
> 458 }
> 459 }
> 460 }
> 461 }
> 462
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>
^ permalink raw reply
* Re: [PATCH v2 12/17] arch, drivers: replace for_each_membock() with for_each_mem_range()
From: Thomas Bogendoerfer @ 2020-08-05 11:00 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
linux-mm, linux-kernel, iommu, Palmer Dabbelt, Andrew Morton,
linuxppc-dev
In-Reply-To: <20200802163601.8189-13-rppt@kernel.org>
On Sun, Aug 02, 2020 at 07:35:56PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> There are several occurrences of the following pattern:
>
> for_each_memblock(memory, reg) {
> start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
> end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));
>
> /* do something with start and end */
> }
>
> Using for_each_mem_range() iterator is more appropriate in such cases and
> allows simpler and cleaner code.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> arch/mips/cavium-octeon/dma-octeon.c | 12 +++---
> arch/mips/kernel/setup.c | 31 +++++++--------
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Thomas Bogendoerfer @ 2020-08-05 10:58 UTC (permalink / raw)
To: Mike Rapoport
Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
linux-mm, linux-kernel, iommu, Palmer Dabbelt, Andrew Morton,
linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>
On Sun, Aug 02, 2020 at 07:36:01PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> for_each_memblock() is used to iterate over memblock.memory in
> a few places that use data from memblock_region rather than the memory
> ranges.
>
> Introduce separate for_each_mem_region() and for_each_reserved_mem_region()
> to improve encapsulation of memblock internals from its users.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> arch/mips/netlogic/xlp/setup.c | 2 +-
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* Re: [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: kernel test robot @ 2020-08-05 10:43 UTC (permalink / raw)
To: Laurent Dufour, linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha, kbuild-all
In-Reply-To: <20200805092858.64143-1-ldufour@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]
Hi Laurent,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on linux/master linus/master v5.8 next-20200804]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Laurent-Dufour/powerpc-drmem-Don-t-compute-the-NUMA-node-for-each-LMB/20200805-173213
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-mpc885_ads_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/powerpc/mm/drmem.c: In function 'init_drmem_v2_lmbs':
>> arch/powerpc/mm/drmem.c:457:8: error: 'struct drmem_lmb' has no member named 'nid'
457 | lmb->nid = first->nid;
| ^~
arch/powerpc/mm/drmem.c:457:21: error: 'struct drmem_lmb' has no member named 'nid'
457 | lmb->nid = first->nid;
| ^~
vim +457 arch/powerpc/mm/drmem.c
397
398 static void __init init_drmem_v2_lmbs(const __be32 *prop)
399 {
400 struct drmem_lmb *lmb, *first;
401 struct of_drconf_cell_v2 dr_cell;
402 const __be32 *p;
403 u32 i, j, lmb_sets;
404 int lmb_index;
405
406 lmb_sets = of_read_number(prop++, 1);
407 if (lmb_sets == 0)
408 return;
409
410 /* first pass, calculate the number of LMBs */
411 p = prop;
412 for (i = 0; i < lmb_sets; i++) {
413 read_drconf_v2_cell(&dr_cell, &p);
414 drmem_info->n_lmbs += dr_cell.seq_lmbs;
415 }
416
417 drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
418 GFP_KERNEL);
419 if (!drmem_info->lmbs)
420 return;
421
422 /* second pass, read in the LMB information */
423 lmb_index = 0;
424 p = prop;
425 first = NULL;
426
427 for (i = 0; i < lmb_sets; i++) {
428 read_drconf_v2_cell(&dr_cell, &p);
429
430 /*
431 * Fetch the NUMA node id for the fist set or if the
432 * associativity index is different from the previous set.
433 */
434 if (first && dr_cell.aa_index != first->aa_index)
435 first = NULL;
436
437 for (j = 0; j < dr_cell.seq_lmbs; j++) {
438 lmb = &drmem_info->lmbs[lmb_index++];
439
440 lmb->base_addr = dr_cell.base_addr;
441 dr_cell.base_addr += drmem_info->lmb_size;
442
443 lmb->drc_index = dr_cell.drc_index;
444 dr_cell.drc_index++;
445
446 lmb->aa_index = dr_cell.aa_index;
447 lmb->flags = dr_cell.flags;
448
449 /*
450 * All the LMB in the set share the same NUMA
451 * associativity property. So read that node only once.
452 */
453 if (!first) {
454 lmb_set_nid(lmb);
455 first = lmb;
456 } else {
> 457 lmb->nid = first->nid;
458 }
459 }
460 }
461 }
462
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 10853 bytes --]
^ permalink raw reply
* [PATCH] macintosh: windfarm: fix spelling mistake "detatch" -> "detach"
From: Colin King @ 2020-08-05 10:43 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Wolfram Sang, colin.king, linuxppc-dev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
There are spelling mistakes in DBG messages. Fix them.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/macintosh/windfarm_lm75_sensor.c | 2 +-
drivers/macintosh/windfarm_lm87_sensor.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/macintosh/windfarm_lm75_sensor.c b/drivers/macintosh/windfarm_lm75_sensor.c
index 1e5fa09845e7..a88f73af4d5e 100644
--- a/drivers/macintosh/windfarm_lm75_sensor.c
+++ b/drivers/macintosh/windfarm_lm75_sensor.c
@@ -152,7 +152,7 @@ static int wf_lm75_remove(struct i2c_client *client)
{
struct wf_lm75_sensor *lm = i2c_get_clientdata(client);
- DBG("wf_lm75: i2c detatch called for %s\n", lm->sens.name);
+ DBG("wf_lm75: i2c detach called for %s\n", lm->sens.name);
/* Mark client detached */
lm->i2c = NULL;
diff --git a/drivers/macintosh/windfarm_lm87_sensor.c b/drivers/macintosh/windfarm_lm87_sensor.c
index d011899c0a8a..de8ef76a0ac8 100644
--- a/drivers/macintosh/windfarm_lm87_sensor.c
+++ b/drivers/macintosh/windfarm_lm87_sensor.c
@@ -149,7 +149,7 @@ static int wf_lm87_remove(struct i2c_client *client)
{
struct wf_lm87_sensor *lm = i2c_get_clientdata(client);
- DBG("wf_lm87: i2c detatch called for %s\n", lm->sens.name);
+ DBG("wf_lm87: i2c detach called for %s\n", lm->sens.name);
/* Mark client detached */
lm->i2c = NULL;
--
2.27.0
^ permalink raw reply related
* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Baoquan He @ 2020-08-05 9:29 UTC (permalink / raw)
To: Mike Rapoport
Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>
On 08/02/20 at 07:36pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> for_each_memblock() is used to iterate over memblock.memory in
> a few places that use data from memblock_region rather than the memory
> ranges.
>
> Introduce separate for_each_mem_region() and for_each_reserved_mem_region()
> to improve encapsulation of memblock internals from its users.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> .clang-format | 3 ++-
> arch/arm64/kernel/setup.c | 2 +-
> arch/arm64/mm/numa.c | 2 +-
> arch/mips/netlogic/xlp/setup.c | 2 +-
> arch/x86/mm/numa.c | 2 +-
> include/linux/memblock.h | 19 ++++++++++++++++---
> mm/memblock.c | 4 ++--
> mm/page_alloc.c | 8 ++++----
> 8 files changed, 28 insertions(+), 14 deletions(-)
>
...
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 9e51b3fd4134..a6970e058bd7 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -522,9 +522,22 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
> return PFN_UP(reg->base + reg->size);
> }
>
> -#define for_each_memblock(memblock_type, region) \
> - for (region = memblock.memblock_type.regions; \
> - region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
> +/**
> + * for_each_mem_region - itereate over registered memory regions
~~~~~~~~~~~~~~~~~
Wonder why emphasize 'registered' memory.
Other than this confusion to me, this patch looks good.
Reviewed-by: Baoquan He <bhe@redhat.com>
^ permalink raw reply
* [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 9:28 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha
All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.
Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.
This should prevent some soft lockups when starting large guests
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/mm/drmem.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..3819c523c65b 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -397,7 +397,7 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
static void __init init_drmem_v2_lmbs(const __be32 *prop)
{
- struct drmem_lmb *lmb;
+ struct drmem_lmb *lmb, *first;
struct of_drconf_cell_v2 dr_cell;
const __be32 *p;
u32 i, j, lmb_sets;
@@ -422,10 +422,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
/* second pass, read in the LMB information */
lmb_index = 0;
p = prop;
+ first = NULL;
for (i = 0; i < lmb_sets; i++) {
read_drconf_v2_cell(&dr_cell, &p);
+ /*
+ * Fetch the NUMA node id for the fist set or if the
+ * associativity index is different from the previous set.
+ */
+ if (first && dr_cell.aa_index != first->aa_index)
+ first = NULL;
+
for (j = 0; j < dr_cell.seq_lmbs; j++) {
lmb = &drmem_info->lmbs[lmb_index++];
@@ -438,7 +446,16 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
- lmb_set_nid(lmb);
+ /*
+ * All the LMB in the set share the same NUMA
+ * associativity property. So read that node only once.
+ */
+ if (!first) {
+ lmb_set_nid(lmb);
+ first = lmb;
+ } else {
+ lmb->nid = first->nid;
+ }
}
}
}
--
2.28.0
^ permalink raw reply related
* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Baoquan He @ 2020-08-05 9:11 UTC (permalink / raw)
To: Mike Rapoport
Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>
On 08/02/20 at 07:36pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Iteration over memblock.reserved with for_each_reserved_mem_region() used
> __next_reserved_mem_region() that implemented a subset of
> __next_mem_region().
>
> Use __for_each_mem_range() and, essentially, __next_mem_region() with
> appropriate parameters to reduce code duplication.
>
> While on it, rename for_each_reserved_mem_region() to
> for_each_reserved_mem_range() for consistency.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> .clang-format | 2 +-
> arch/arm64/kernel/setup.c | 2 +-
> drivers/irqchip/irq-gic-v3-its.c | 2 +-
> include/linux/memblock.h | 12 +++------
> mm/memblock.c | 46 +++++++-------------------------
> 5 files changed, 17 insertions(+), 47 deletions(-)
Reviewed-by: Baoquan He <bhe@redhat.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox