* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Nathan Chancellor @ 2021-12-14  2:36 UTC
To: Ard Biesheuvel
Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, llvm

Hi Ard,

On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> Use the EOR3 instruction to implement xor_blocks() if the instruction is
> available, which is the case if the CPU implements the SHA-3 extension.
> This is about 20% faster on Apple M1 when using the 5-way version.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
EOR3 instructions when available") in the arm64 tree breaks
allyesconfig:

https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true

I also see this when building with GCC 11.2.0:

WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
arch/arm64/lib/xor-neon.o:(.data+0x0): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data+0x18): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data+0x20): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x0): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x8): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x10): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x18): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x20): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x28): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x30): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x38): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x40): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x48): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.fini_array+0x0): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.init_array+0x0): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x8): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x18): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x20): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x28): dangerous relocation: unsupported relocation
arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x30): dangerous relocation: unsupported relocation

Cheers,
Nathan
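For readers unfamiliar with the instruction the patch description refers to: EOR3 computes the exclusive OR of three inputs in one operation, so a 5-way XOR needs two EOR3 steps per element instead of four two-input EORs. The following is only a scalar sketch of that idea, not the kernel's NEON code; the names eor3_model and xor_5_model are hypothetical and operate on plain 64-bit words rather than vector registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of EOR3: one three-input exclusive OR (hypothetical helper). */
static inline uint64_t eor3_model(uint64_t a, uint64_t b, uint64_t c)
{
	return a ^ b ^ c;
}

/*
 * Hypothetical 5-way XOR in the style of xor_blocks(): p1 ^= p2 ^ p3 ^ p4 ^ p5.
 * With a three-input XOR this takes two steps per word instead of four.
 */
static void xor_5_model(size_t words, uint64_t *p1, const uint64_t *p2,
			const uint64_t *p3, const uint64_t *p4,
			const uint64_t *p5)
{
	for (size_t i = 0; i < words; i++) {
		/* first EOR3: fold p2 and p3 into p1 */
		uint64_t t = eor3_model(p1[i], p2[i], p3[i]);

		/* second EOR3: fold p4 and p5 into the running value */
		p1[i] = eor3_model(t, p4[i], p5[i]);
	}
}
```

The real implementation works on 128-bit NEON registers and processes several per loop iteration; this sketch only shows why the instruction count drops.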

* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Ard Biesheuvel @ 2021-12-14  8:19 UTC
To: Nathan Chancellor, Arnd Bergmann
Cc: Linux ARM, Catalin Marinas, Will Deacon, Mark Rutland, llvm

+ Arnd

On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Ard,
>
> On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > available, which is the case if the CPU implements the SHA-3 extension.
> > This is about 20% faster on Apple M1 when using the 5-way version.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> EOR3 instructions when available") in the arm64 tree breaks
> allyesconfig:
>
> https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
>
> I also see this when building with GCC 11.2.0:
>
> WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object

I suspect this is another genksyms crash, preventing the
__crc_xor_block_inner_neon symbol from ever being emitted.

This is a recurring annoyance and I am not sure how to address this
properly. Arnd might have some thoughts on the matter as well.

> arch/arm64/lib/xor-neon.o:(.data+0x0): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data+0x18): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data+0x20): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x0): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x8): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x10): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x18): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x20): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x28): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x30): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x38): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x40): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(__patchable_function_entries+0x48): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.fini_array+0x0): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.init_array+0x0): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x8): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x18): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x20): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x28): dangerous relocation: unsupported relocation
> arch/arm64/lib/xor-neon.o:(.data..ro_after_init+0x30): dangerous relocation: unsupported relocation
>
> Cheers,
> Nathan

* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Ard Biesheuvel @ 2021-12-14 11:05 UTC
To: Nathan Chancellor, Arnd Bergmann
Cc: Linux ARM, Catalin Marinas, Will Deacon, Mark Rutland, llvm

On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> + Arnd
>
> On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Ard,
> >
> > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > available, which is the case if the CPU implements the SHA-3 extension.
> > > This is about 20% faster on Apple M1 when using the 5-way version.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >
> > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > EOR3 instructions when available") in the arm64 tree breaks
> > allyesconfig:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> >
> > I also see this when building with GCC 11.2.0:
> >
> > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
>
> I suspect this is another genksyms crash, preventing the
> __crc_xor_block_inner_neon symbol from ever being emitted.
>
> This is a recurring annoyance and I am not sure how to address this
> properly. Arnd might have some thoughts on the matter as well.
>

I managed to reproduce this: it's not a crash but definitely a bug in
genksyms, as it simply fails to produce the output containing the
assignment of __crc_xor_block_inner_neon.

Moving the definition of xor_block_inner_neon as below works around the issue.

Catalin: would you like me to spin a v3? Or do you prefer to just
fold this into the existing one?

diff --git a/arch/arm64/lib/xor-neon.c b/arch/arm64/lib/xor-neon.c
index 5c8688700f63..d189cf4e70ea 100644
--- a/arch/arm64/lib/xor-neon.c
+++ b/arch/arm64/lib/xor-neon.c
@@ -167,6 +167,15 @@ void xor_arm64_neon_5(unsigned long bytes, unsigned long *p1,
 	} while (--lines > 0);
 }
 
+struct xor_block_template xor_block_inner_neon __ro_after_init = {
+	.name = "__inner_neon__",
+	.do_2 = xor_arm64_neon_2,
+	.do_3 = xor_arm64_neon_3,
+	.do_4 = xor_arm64_neon_4,
+	.do_5 = xor_arm64_neon_5,
+};
+EXPORT_SYMBOL(xor_block_inner_neon);
+
 static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 {
 	uint64x2_t res;
@@ -296,15 +305,6 @@ static void xor_arm64_eor3_5(unsigned long bytes, unsigned long *p1,
 	} while (--lines > 0);
 }
 
-struct xor_block_template xor_block_inner_neon __ro_after_init = {
-	.name = "__inner_neon__",
-	.do_2 = xor_arm64_neon_2,
-	.do_3 = xor_arm64_neon_3,
-	.do_4 = xor_arm64_neon_4,
-	.do_5 = xor_arm64_neon_5,
-};
-EXPORT_SYMBOL(xor_block_inner_neon);
-
 static int __init xor_neon_init(void)
 {
 	if (IS_ENABLED(CONFIG_AS_HAS_SHA3) && cpu_have_named_feature(SHA3)) {

* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Catalin Marinas @ 2021-12-14 11:36 UTC
To: Ard Biesheuvel
Cc: Nathan Chancellor, Arnd Bergmann, Linux ARM, Will Deacon, Mark Rutland, llvm

On Tue, Dec 14, 2021 at 12:05:34PM +0100, Ard Biesheuvel wrote:
> On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > + Arnd
> >
> > On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan@kernel.org> wrote:
> > >
> > > Hi Ard,
> > >
> > > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > > available, which is the case if the CPU implements the SHA-3 extension.
> > > > This is about 20% faster on Apple M1 when using the 5-way version.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > > EOR3 instructions when available") in the arm64 tree breaks
> > > allyesconfig:
> > >
> > > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> > >
> > > I also see this when building with GCC 11.2.0:
> > >
> > > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
> >
> > I suspect this is another genksyms crash, preventing the
> > __crc_xor_block_inner_neon symbol from ever being emitted.
> >
> > This is a recurring annoyance and I am not sure how to address this
> > properly. Arnd might have some thoughts on the matter as well.
>
> I managed to reproduce this: it's not a crash but definitely a bug in
> genksyms, as it simply fails to produce the output containing the
> assignment of __crc_xor_block_inner_neon.
>
> Moving the definition of xor_block_inner_neon as below works around the issue.
>
> Catalin: would you like me to spin a v3? Or do you prefer to just
> fold this into the existing one?

I'll fold it in. Thanks.

--
Catalin

* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Ard Biesheuvel @ 2021-12-14 12:57 UTC
To: Catalin Marinas
Cc: Nathan Chancellor, Arnd Bergmann, Linux ARM, Will Deacon, Mark Rutland, llvm

On Tue, 14 Dec 2021 at 12:36, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Dec 14, 2021 at 12:05:34PM +0100, Ard Biesheuvel wrote:
> > On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > + Arnd
> > >
> > > On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan@kernel.org> wrote:
> > > >
> > > > Hi Ard,
> > > >
> > > > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > > > available, which is the case if the CPU implements the SHA-3 extension.
> > > > > This is about 20% faster on Apple M1 when using the 5-way version.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > > > EOR3 instructions when available") in the arm64 tree breaks
> > > > allyesconfig:
> > > >
> > > > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> > > >
> > > > I also see this when building with GCC 11.2.0:
> > > >
> > > > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > > > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > > > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
> > >
> > > I suspect this is another genksyms crash, preventing the
> > > __crc_xor_block_inner_neon symbol from ever being emitted.
> > >
> > > This is a recurring annoyance and I am not sure how to address this
> > > properly. Arnd might have some thoughts on the matter as well.
> >
> > I managed to reproduce this: it's not a crash but definitely a bug in
> > genksyms, as it simply fails to produce the output containing the
> > assignment of __crc_xor_block_inner_neon.
> >
> > Moving the definition of xor_block_inner_neon as below works around the issue.
> >
> > Catalin: would you like me to spin a v3? Or do you prefer to just
> > fold this into the existing one?
>
> I'll fold it in. Thanks.
>

The root cause appears to be that genksyms gives up when it encounters

static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
{

because the types are not defined. This is because our
asm/neon-intrinsics.h header avoids #include'ing arm_neon.h in the
context of genksyms, as doing so does result in a genksyms crash.

I have very little motivation to go and figure out why genksyms
crashes in that case, so I think for now, we can stick with the fix I
proposed. Alternatively, we could typedef uint64x2_t to something
arbitrary if __GENKSYMS__ is defined, or use a macro instead of a
static inline for eor3()
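The two alternatives mentioned above could look roughly like the sketch below. It is an assumption-laden illustration, not a tested kernel change: vec128_t is a hypothetical stand-in for uint64x2_t so that the snippet builds on any host, EOR3() is a hypothetical macro name, and neither variant has been checked against genksyms itself.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the NEON vector type. In the kernel the real
 * uint64x2_t would come from arm_neon.h via asm/neon-intrinsics.h.
 */
#ifdef __GENKSYMS__
/*
 * Alternative 1: when the symbol-version parser runs, give the type an
 * arbitrary definition so that declarations mentioning it can be parsed.
 */
typedef int vec128_t;
#else
typedef struct {
	uint64_t lane[2];
} vec128_t;
#endif

/*
 * Alternative 2: express the helper as a macro rather than a static inline
 * function, so no function signature mentions the vector type at all.
 * Note that each argument is evaluated twice, so callers should pass
 * side-effect-free expressions.
 */
#define EOR3(p, q, r)						\
	((vec128_t){ { (p).lane[0] ^ (q).lane[0] ^ (r).lane[0],	\
		       (p).lane[1] ^ (q).lane[1] ^ (r).lane[1] } })
```

Either approach trades a little clarity for robustness against the parser; the fix actually proposed in this thread instead moves the exported definition ahead of eor3().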

* Re: [PATCH v2] arm64/xor: use EOR3 instructions when available
From: Catalin Marinas @ 2021-12-15 15:15 UTC
To: Ard Biesheuvel
Cc: Nathan Chancellor, Arnd Bergmann, Linux ARM, Will Deacon, Mark Rutland, llvm

On Tue, Dec 14, 2021 at 01:57:47PM +0100, Ard Biesheuvel wrote:
> On Tue, 14 Dec 2021 at 12:36, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Tue, Dec 14, 2021 at 12:05:34PM +0100, Ard Biesheuvel wrote:
> > > On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan@kernel.org> wrote:
> > > > > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > > > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > > > > available, which is the case if the CPU implements the SHA-3 extension.
> > > > > > This is about 20% faster on Apple M1 when using the 5-way version.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > >
> > > > > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > > > > EOR3 instructions when available") in the arm64 tree breaks
> > > > > allyesconfig:
> > > > >
> > > > > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> > > > >
> > > > > I also see this when building with GCC 11.2.0:
> > > > >
> > > > > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > > > > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > > > > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
> > > >
> > > > I suspect this is another genksyms crash, preventing the
> > > > __crc_xor_block_inner_neon symbol from ever being emitted.
> > > >
> > > > This is a recurring annoyance and I am not sure how to address this
> > > > properly. Arnd might have some thoughts on the matter as well.
> > >
> > > I managed to reproduce this: it's not a crash but definitely a bug in
> > > genksyms, as it simply fails to produce the output containing the
> > > assignment of __crc_xor_block_inner_neon.
> > >
> > > Moving the definition of xor_block_inner_neon as below works around the issue.
> > >
> > > Catalin: would you like me to spin a v3? Or do you prefer to just
> > > fold this into the existing one?
> >
> > I'll fold it in. Thanks.
>
> The root cause appears to be that genksyms gives up when it encounters
>
> static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
> {
>
> because the types are not defined. This is because our
> asm/neon-intrinsics.h header avoids #include'ing arm_neon.h in the
> context of genksyms, as doing so does result in a genksyms crash.
>
> I have very little motivation to go and figure out why genksyms
> crashes in that case, so I think for now, we can stick with the fix I
> proposed. Alternatively, we could typedef uint64x2_t to something
> arbitrary if __GENKSYMS__ is defined, or use a macro instead of a
> static inline for eor3()

I'll stick to the fix you proposed (already folded in). If we ever add
another EXPORT_SYMBOL after the eor3() function, we'd better look into
fixing genksyms or defining a dummy uint64x2_t.

--
Catalin

Thread overview: 6+ messages
[not found] <20211213140252.2856053-1-ardb@kernel.org>
2021-12-14 2:36 ` [PATCH v2] arm64/xor: use EOR3 instructions when available Nathan Chancellor
2021-12-14 8:19 ` Ard Biesheuvel
2021-12-14 11:05 ` Ard Biesheuvel
2021-12-14 11:36 ` Catalin Marinas
2021-12-14 12:57 ` Ard Biesheuvel
2021-12-15 15:15 ` Catalin Marinas