From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Mar 2026 10:50:21 -0700
To: mm-commits@vger.kernel.org,hch@lst.de,akpm@linux-foundation.org
From:
Andrew Morton
Subject: + loongarch-move-the-xor-code-to-lib-raid.patch added to mm-nonmm-unstable branch
Message-Id: <20260327175022.33C99C19424@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org

The patch titled
     Subject: loongarch: move the XOR code to lib/raid/
has been added to the -mm mm-nonmm-unstable branch.  Its filename is
     loongarch-move-the-xor-code-to-lib-raid.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/loongarch-move-the-xor-code-to-lib-raid.patch

This patch will later appear in the mm-nonmm-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via various branches at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days

------------------------------------------------------
From: Christoph Hellwig
Subject: loongarch: move the XOR code to lib/raid/
Date: Fri, 27 Mar 2026 07:16:46 +0100

Move the optimized XOR code into lib/raid/ and include it in xor.ko
instead of always building it into the main kernel image.

Link: https://lkml.kernel.org/r/20260327061704.3707577-15-hch@lst.de
Signed-off-by: Christoph Hellwig
Cc: Albert Ou
Cc: Alexander Gordeev
Cc: Alexandre Ghiti
Cc: Andreas Larsson
Cc: Anton Ivanov
Cc: Ard Biesheuvel
Cc: Arnd Bergmann
Cc: "Borislav Petkov (AMD)"
Cc: Catalin Marinas
Cc: Chris Mason
Cc: Christian Borntraeger
Cc: Dan Williams
Cc: David S. Miller
Cc: David Sterba
Cc: Heiko Carstens
Cc: Herbert Xu
Cc: "H. Peter Anvin"
Cc: Huacai Chen
Cc: Ingo Molnar
Cc: Jason A. Donenfeld
Cc: Johannes Berg
Cc: Li Nan
Cc: Madhavan Srinivasan
Cc: Magnus Lindholm
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Palmer Dabbelt
Cc: Richard Henderson
Cc: Richard Weinberger
Cc: Russell King
Cc: Song Liu
Cc: Sven Schnelle
Cc: Ted Ts'o
Cc: Vasily Gorbik
Cc: WANG Xuerui
Cc: Will Deacon
Signed-off-by: Andrew Morton
---

 arch/loongarch/include/asm/xor.h       |   24 -----
 arch/loongarch/include/asm/xor_simd.h  |   34 -------
 arch/loongarch/lib/Makefile            |    2 
 arch/loongarch/lib/xor_simd.c          |   93 ------------------
 arch/loongarch/lib/xor_simd.h          |   38 -------
 arch/loongarch/lib/xor_simd_glue.c     |   72 --------------
 arch/loongarch/lib/xor_template.c      |  110 -----------------------
 lib/raid/xor/Makefile                  |    2 
 lib/raid/xor/loongarch/xor_simd.c      |   93 ++++++++++++++++++
 lib/raid/xor/loongarch/xor_simd.h      |   38 ++++++
 lib/raid/xor/loongarch/xor_simd_glue.c |   77 ++++++++++++++
 lib/raid/xor/loongarch/xor_template.c  |  110 +++++++++++++++++++++
 12 files changed, 323 insertions(+), 370 deletions(-)

--- a/arch/loongarch/include/asm/xor.h~loongarch-move-the-xor-code-to-lib-raid
+++ a/arch/loongarch/include/asm/xor.h
@@ -6,27 +6,6 @@
 #define _ASM_LOONGARCH_XOR_H
 
 #include
-#include
-
-#ifdef CONFIG_CPU_HAS_LSX
-static struct xor_block_template xor_block_lsx = {
-        .name = "lsx",
-        .do_2 = xor_lsx_2,
-        .do_3 = xor_lsx_3,
-        .do_4 = xor_lsx_4,
-        .do_5 = xor_lsx_5,
-};
-#endif /* CONFIG_CPU_HAS_LSX */
-
-#ifdef CONFIG_CPU_HAS_LASX
-static struct xor_block_template xor_block_lasx = {
-        .name = "lasx",
-        .do_2 = xor_lasx_2,
-        .do_3 = xor_lasx_3,
-        .do_4 = xor_lasx_4,
-        .do_5 = xor_lasx_5,
-};
-#endif /* CONFIG_CPU_HAS_LASX */
 
 /*
  * For grins, also test the generic routines.
@@ -38,6 +17,9 @@ static struct xor_block_template xor_blo
  */
 #include
 
+extern struct xor_block_template xor_block_lsx;
+extern struct xor_block_template xor_block_lasx;
+
 #define arch_xor_init arch_xor_init
 static __always_inline void __init arch_xor_init(void)
 {
diff --git a/arch/loongarch/include/asm/xor_simd.h a/arch/loongarch/include/asm/xor_simd.h
deleted file mode 100644
--- a/arch/loongarch/include/asm/xor_simd.h
+++ /dev/null
@@ -1,34 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Copyright (C) 2023 WANG Xuerui
- */
-#ifndef _ASM_LOONGARCH_XOR_SIMD_H
-#define _ASM_LOONGARCH_XOR_SIMD_H
-
-#ifdef CONFIG_CPU_HAS_LSX
-void xor_lsx_2(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2);
-void xor_lsx_3(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
-void xor_lsx_4(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4);
-void xor_lsx_5(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
-#endif /* CONFIG_CPU_HAS_LSX */
-
-#ifdef CONFIG_CPU_HAS_LASX
-void xor_lasx_2(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2);
-void xor_lasx_3(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
-void xor_lasx_4(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4);
-void xor_lasx_5(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
-#endif /* CONFIG_CPU_HAS_LASX */
-
-#endif /* _ASM_LOONGARCH_XOR_SIMD_H */
--- a/arch/loongarch/lib/Makefile~loongarch-move-the-xor-code-to-lib-raid
+++ a/arch/loongarch/lib/Makefile
@@ -8,6 +8,4 @@ lib-y += delay.o memset.o memcpy.o memmo
 
 obj-$(CONFIG_ARCH_SUPPORTS_INT128) += tishift.o
 
-obj-$(CONFIG_CPU_HAS_LSX) += xor_simd.o xor_simd_glue.o
-
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/loongarch/lib/xor_simd.c a/arch/loongarch/lib/xor_simd.c
deleted file mode 100644
--- a/arch/loongarch/lib/xor_simd.c
+++ /dev/null
@@ -1,93 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * LoongArch SIMD XOR operations
- *
- * Copyright (C) 2023 WANG Xuerui
- */
-
-#include "xor_simd.h"
-
-/*
- * Process one cache line (64 bytes) per loop. This is assuming all future
- * popular LoongArch cores are similar performance-characteristics-wise to the
- * current models.
- */
-#define LINE_WIDTH 64
-
-#ifdef CONFIG_CPU_HAS_LSX
-
-#define LD(reg, base, offset) \
-        "vld $vr" #reg ", %[" #base "], " #offset "\n\t"
-#define ST(reg, base, offset) \
-        "vst $vr" #reg ", %[" #base "], " #offset "\n\t"
-#define XOR(dj, k) "vxor.v $vr" #dj ", $vr" #dj ", $vr" #k "\n\t"
-
-#define LD_INOUT_LINE(base) \
-        LD(0, base, 0) \
-        LD(1, base, 16) \
-        LD(2, base, 32) \
-        LD(3, base, 48)
-
-#define LD_AND_XOR_LINE(base) \
-        LD(4, base, 0) \
-        LD(5, base, 16) \
-        LD(6, base, 32) \
-        LD(7, base, 48) \
-        XOR(0, 4) \
-        XOR(1, 5) \
-        XOR(2, 6) \
-        XOR(3, 7)
-
-#define ST_LINE(base) \
-        ST(0, base, 0) \
-        ST(1, base, 16) \
-        ST(2, base, 32) \
-        ST(3, base, 48)
-
-#define XOR_FUNC_NAME(nr) __xor_lsx_##nr
-#include "xor_template.c"
-
-#undef LD
-#undef ST
-#undef XOR
-#undef LD_INOUT_LINE
-#undef LD_AND_XOR_LINE
-#undef ST_LINE
-#undef XOR_FUNC_NAME
-
-#endif /* CONFIG_CPU_HAS_LSX */
-
-#ifdef CONFIG_CPU_HAS_LASX
-
-#define LD(reg, base, offset) \
-        "xvld $xr" #reg ", %[" #base "], " #offset "\n\t"
-#define ST(reg, base, offset) \
-        "xvst $xr" #reg ", %[" #base "], " #offset "\n\t"
-#define XOR(dj, k) "xvxor.v $xr" #dj ", $xr" #dj ", $xr" #k "\n\t"
-
-#define LD_INOUT_LINE(base) \
-        LD(0, base, 0) \
-        LD(1, base, 32)
-
-#define LD_AND_XOR_LINE(base) \
-        LD(2, base, 0) \
-        LD(3, base, 32) \
-        XOR(0, 2) \
-        XOR(1, 3)
-
-#define ST_LINE(base) \
-        ST(0, base, 0) \
-        ST(1, base, 32)
-
-#define XOR_FUNC_NAME(nr) __xor_lasx_##nr
-#include "xor_template.c"
-
-#undef LD
-#undef ST
-#undef XOR
-#undef LD_INOUT_LINE
-#undef LD_AND_XOR_LINE
-#undef ST_LINE
-#undef XOR_FUNC_NAME
-
-#endif /* CONFIG_CPU_HAS_LASX */
diff --git a/arch/loongarch/lib/xor_simd_glue.c a/arch/loongarch/lib/xor_simd_glue.c
deleted file mode 100644
--- a/arch/loongarch/lib/xor_simd_glue.c
+++ /dev/null
@@ -1,72 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * LoongArch SIMD XOR operations
- *
- * Copyright (C) 2023 WANG Xuerui
- */
-
-#include
-#include
-#include
-#include
-#include "xor_simd.h"
-
-#define MAKE_XOR_GLUE_2(flavor) \
-void xor_##flavor##_2(unsigned long bytes, unsigned long * __restrict p1, \
-        const unsigned long * __restrict p2) \
-{ \
-        kernel_fpu_begin(); \
-        __xor_##flavor##_2(bytes, p1, p2); \
-        kernel_fpu_end(); \
-} \
-EXPORT_SYMBOL_GPL(xor_##flavor##_2)
-
-#define MAKE_XOR_GLUE_3(flavor) \
-void xor_##flavor##_3(unsigned long bytes, unsigned long * __restrict p1, \
-        const unsigned long * __restrict p2, \
-        const unsigned long * __restrict p3) \
-{ \
-        kernel_fpu_begin(); \
-        __xor_##flavor##_3(bytes, p1, p2, p3); \
-        kernel_fpu_end(); \
-} \
-EXPORT_SYMBOL_GPL(xor_##flavor##_3)
-
-#define MAKE_XOR_GLUE_4(flavor) \
-void xor_##flavor##_4(unsigned long bytes, unsigned long * __restrict p1, \
-        const unsigned long * __restrict p2, \
-        const unsigned long * __restrict p3, \
-        const unsigned long * __restrict p4) \
-{ \
-        kernel_fpu_begin(); \
-        __xor_##flavor##_4(bytes, p1, p2, p3, p4); \
-        kernel_fpu_end(); \
-} \
-EXPORT_SYMBOL_GPL(xor_##flavor##_4)
-
-#define MAKE_XOR_GLUE_5(flavor) \
-void xor_##flavor##_5(unsigned long bytes, unsigned long * __restrict p1, \
-        const unsigned long * __restrict p2, \
-        const unsigned long * __restrict p3, \
-        const unsigned long * __restrict p4, \
-        const unsigned long * __restrict p5) \
-{ \
-        kernel_fpu_begin(); \
-        __xor_##flavor##_5(bytes, p1, p2, p3, p4, p5); \
-        kernel_fpu_end(); \
-} \
-EXPORT_SYMBOL_GPL(xor_##flavor##_5)
-
-#define MAKE_XOR_GLUES(flavor) \
-        MAKE_XOR_GLUE_2(flavor); \
-        MAKE_XOR_GLUE_3(flavor); \
-        MAKE_XOR_GLUE_4(flavor); \
-        MAKE_XOR_GLUE_5(flavor)
-
-#ifdef CONFIG_CPU_HAS_LSX
-MAKE_XOR_GLUES(lsx);
-#endif
-
-#ifdef CONFIG_CPU_HAS_LASX
-MAKE_XOR_GLUES(lasx);
-#endif
diff --git a/arch/loongarch/lib/xor_simd.h a/arch/loongarch/lib/xor_simd.h
deleted file mode 100644
--- a/arch/loongarch/lib/xor_simd.h
+++ /dev/null
@@ -1,38 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Simple interface to link xor_simd.c and xor_simd_glue.c
- *
- * Separating these files ensures that no SIMD instructions are run outside of
- * the kfpu critical section.
- */
-
-#ifndef __LOONGARCH_LIB_XOR_SIMD_H
-#define __LOONGARCH_LIB_XOR_SIMD_H
-
-#ifdef CONFIG_CPU_HAS_LSX
-void __xor_lsx_2(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2);
-void __xor_lsx_3(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
-void __xor_lsx_4(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4);
-void __xor_lsx_5(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
-#endif /* CONFIG_CPU_HAS_LSX */
-
-#ifdef CONFIG_CPU_HAS_LASX
-void __xor_lasx_2(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2);
-void __xor_lasx_3(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
-void __xor_lasx_4(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4);
-void __xor_lasx_5(unsigned long bytes, unsigned long * __restrict p1,
-        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
-        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
-#endif /* CONFIG_CPU_HAS_LASX */
-
-#endif /* __LOONGARCH_LIB_XOR_SIMD_H */
diff --git a/arch/loongarch/lib/xor_template.c a/arch/loongarch/lib/xor_template.c
deleted file mode 100644
--- a/arch/loongarch/lib/xor_template.c
+++ /dev/null
@@ -1,110 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright (C) 2023 WANG Xuerui
- *
- * Template for XOR operations, instantiated in xor_simd.c.
- *
- * Expected preprocessor definitions:
- *
- * - LINE_WIDTH
- * - XOR_FUNC_NAME(nr)
- * - LD_INOUT_LINE(buf)
- * - LD_AND_XOR_LINE(buf)
- * - ST_LINE(buf)
- */
-
-void XOR_FUNC_NAME(2)(unsigned long bytes,
-        unsigned long * __restrict v1,
-        const unsigned long * __restrict v2)
-{
-        unsigned long lines = bytes / LINE_WIDTH;
-
-        do {
-                __asm__ __volatile__ (
-                        LD_INOUT_LINE(v1)
-                        LD_AND_XOR_LINE(v2)
-                        ST_LINE(v1)
-                        : : [v1] "r"(v1), [v2] "r"(v2) : "memory"
-                );
-
-                v1 += LINE_WIDTH / sizeof(unsigned long);
-                v2 += LINE_WIDTH / sizeof(unsigned long);
-        } while (--lines > 0);
-}
-
-void XOR_FUNC_NAME(3)(unsigned long bytes,
-        unsigned long * __restrict v1,
-        const unsigned long * __restrict v2,
-        const unsigned long * __restrict v3)
-{
-        unsigned long lines = bytes / LINE_WIDTH;
-
-        do {
-                __asm__ __volatile__ (
-                        LD_INOUT_LINE(v1)
-                        LD_AND_XOR_LINE(v2)
-                        LD_AND_XOR_LINE(v3)
-                        ST_LINE(v1)
-                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3) : "memory"
-                );
-
-                v1 += LINE_WIDTH / sizeof(unsigned long);
-                v2 += LINE_WIDTH / sizeof(unsigned long);
-                v3 += LINE_WIDTH / sizeof(unsigned long);
-        } while (--lines > 0);
-}
-
-void XOR_FUNC_NAME(4)(unsigned long bytes,
-        unsigned long * __restrict v1,
-        const unsigned long * __restrict v2,
-        const unsigned long * __restrict v3,
-        const unsigned long * __restrict v4)
-{
-        unsigned long lines = bytes / LINE_WIDTH;
-
-        do {
-                __asm__ __volatile__ (
-                        LD_INOUT_LINE(v1)
-                        LD_AND_XOR_LINE(v2)
-                        LD_AND_XOR_LINE(v3)
-                        LD_AND_XOR_LINE(v4)
-                        ST_LINE(v1)
-                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3), [v4] "r"(v4)
-                        : "memory"
-                );
-
-                v1 += LINE_WIDTH / sizeof(unsigned long);
-                v2 += LINE_WIDTH / sizeof(unsigned long);
-                v3 += LINE_WIDTH / sizeof(unsigned long);
-                v4 += LINE_WIDTH / sizeof(unsigned long);
-        } while (--lines > 0);
-}
-
-void XOR_FUNC_NAME(5)(unsigned long bytes,
-        unsigned long * __restrict v1,
-        const unsigned long * __restrict v2,
-        const unsigned long * __restrict v3,
-        const unsigned long * __restrict v4,
-        const unsigned long * __restrict v5)
-{
-        unsigned long lines = bytes / LINE_WIDTH;
-
-        do {
-                __asm__ __volatile__ (
-                        LD_INOUT_LINE(v1)
-                        LD_AND_XOR_LINE(v2)
-                        LD_AND_XOR_LINE(v3)
-                        LD_AND_XOR_LINE(v4)
-                        LD_AND_XOR_LINE(v5)
-                        ST_LINE(v1)
-                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3), [v4] "r"(v4),
-                          [v5] "r"(v5) : "memory"
-                );
-
-                v1 += LINE_WIDTH / sizeof(unsigned long);
-                v2 += LINE_WIDTH / sizeof(unsigned long);
-                v3 += LINE_WIDTH / sizeof(unsigned long);
-                v4 += LINE_WIDTH / sizeof(unsigned long);
-                v5 += LINE_WIDTH / sizeof(unsigned long);
-        } while (--lines > 0);
-}
diff --git a/lib/raid/xor/loongarch/xor_simd.c a/lib/raid/xor/loongarch/xor_simd.c
new file mode 100664
--- /dev/null
+++ a/lib/raid/xor/loongarch/xor_simd.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * LoongArch SIMD XOR operations
+ *
+ * Copyright (C) 2023 WANG Xuerui
+ */
+
+#include "xor_simd.h"
+
+/*
+ * Process one cache line (64 bytes) per loop. This is assuming all future
+ * popular LoongArch cores are similar performance-characteristics-wise to the
+ * current models.
+ */
+#define LINE_WIDTH 64
+
+#ifdef CONFIG_CPU_HAS_LSX
+
+#define LD(reg, base, offset) \
+        "vld $vr" #reg ", %[" #base "], " #offset "\n\t"
+#define ST(reg, base, offset) \
+        "vst $vr" #reg ", %[" #base "], " #offset "\n\t"
+#define XOR(dj, k) "vxor.v $vr" #dj ", $vr" #dj ", $vr" #k "\n\t"
+
+#define LD_INOUT_LINE(base) \
+        LD(0, base, 0) \
+        LD(1, base, 16) \
+        LD(2, base, 32) \
+        LD(3, base, 48)
+
+#define LD_AND_XOR_LINE(base) \
+        LD(4, base, 0) \
+        LD(5, base, 16) \
+        LD(6, base, 32) \
+        LD(7, base, 48) \
+        XOR(0, 4) \
+        XOR(1, 5) \
+        XOR(2, 6) \
+        XOR(3, 7)
+
+#define ST_LINE(base) \
+        ST(0, base, 0) \
+        ST(1, base, 16) \
+        ST(2, base, 32) \
+        ST(3, base, 48)
+
+#define XOR_FUNC_NAME(nr) __xor_lsx_##nr
+#include "xor_template.c"
+
+#undef LD
+#undef ST
+#undef XOR
+#undef LD_INOUT_LINE
+#undef LD_AND_XOR_LINE
+#undef ST_LINE
+#undef XOR_FUNC_NAME
+
+#endif /* CONFIG_CPU_HAS_LSX */
+
+#ifdef CONFIG_CPU_HAS_LASX
+
+#define LD(reg, base, offset) \
+        "xvld $xr" #reg ", %[" #base "], " #offset "\n\t"
+#define ST(reg, base, offset) \
+        "xvst $xr" #reg ", %[" #base "], " #offset "\n\t"
+#define XOR(dj, k) "xvxor.v $xr" #dj ", $xr" #dj ", $xr" #k "\n\t"
+
+#define LD_INOUT_LINE(base) \
+        LD(0, base, 0) \
+        LD(1, base, 32)
+
+#define LD_AND_XOR_LINE(base) \
+        LD(2, base, 0) \
+        LD(3, base, 32) \
+        XOR(0, 2) \
+        XOR(1, 3)
+
+#define ST_LINE(base) \
+        ST(0, base, 0) \
+        ST(1, base, 32)
+
+#define XOR_FUNC_NAME(nr) __xor_lasx_##nr
+#include "xor_template.c"
+
+#undef LD
+#undef ST
+#undef XOR
+#undef LD_INOUT_LINE
+#undef LD_AND_XOR_LINE
+#undef ST_LINE
+#undef XOR_FUNC_NAME
+
+#endif /* CONFIG_CPU_HAS_LASX */
diff --git a/lib/raid/xor/loongarch/xor_simd_glue.c a/lib/raid/xor/loongarch/xor_simd_glue.c
new file mode 100664
--- /dev/null
+++ a/lib/raid/xor/loongarch/xor_simd_glue.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * LoongArch SIMD XOR operations
+ *
+ * Copyright (C) 2023 WANG Xuerui
+ */
+
+#include
+#include
+#include
+#include
+#include "xor_simd.h"
+
+#define MAKE_XOR_GLUE_2(flavor) \
+static void xor_##flavor##_2(unsigned long bytes, unsigned long * __restrict p1, \
+        const unsigned long * __restrict p2) \
+{ \
+        kernel_fpu_begin(); \
+        __xor_##flavor##_2(bytes, p1, p2); \
+        kernel_fpu_end(); \
+} \
+
+#define MAKE_XOR_GLUE_3(flavor) \
+static void xor_##flavor##_3(unsigned long bytes, unsigned long * __restrict p1, \
+        const unsigned long * __restrict p2, \
+        const unsigned long * __restrict p3) \
+{ \
+        kernel_fpu_begin(); \
+        __xor_##flavor##_3(bytes, p1, p2, p3); \
+        kernel_fpu_end(); \
+} \
+
+#define MAKE_XOR_GLUE_4(flavor) \
+static void xor_##flavor##_4(unsigned long bytes, unsigned long * __restrict p1, \
+        const unsigned long * __restrict p2, \
+        const unsigned long * __restrict p3, \
+        const unsigned long * __restrict p4) \
+{ \
+        kernel_fpu_begin(); \
+        __xor_##flavor##_4(bytes, p1, p2, p3, p4); \
+        kernel_fpu_end(); \
+} \
+
+#define MAKE_XOR_GLUE_5(flavor) \
+static void xor_##flavor##_5(unsigned long bytes, unsigned long * __restrict p1, \
+        const unsigned long * __restrict p2, \
+        const unsigned long * __restrict p3, \
+        const unsigned long * __restrict p4, \
+        const unsigned long * __restrict p5) \
+{ \
+        kernel_fpu_begin(); \
+        __xor_##flavor##_5(bytes, p1, p2, p3, p4, p5); \
+        kernel_fpu_end(); \
+} \
+
+#define MAKE_XOR_GLUES(flavor) \
+        MAKE_XOR_GLUE_2(flavor); \
+        MAKE_XOR_GLUE_3(flavor); \
+        MAKE_XOR_GLUE_4(flavor); \
+        MAKE_XOR_GLUE_5(flavor); \
+ \
+struct xor_block_template xor_block_##flavor = { \
+        .name = __stringify(flavor), \
+        .do_2 = xor_##flavor##_2, \
+        .do_3 = xor_##flavor##_3, \
+        .do_4 = xor_##flavor##_4, \
+        .do_5 = xor_##flavor##_5, \
+}
+
+
+#ifdef CONFIG_CPU_HAS_LSX
+MAKE_XOR_GLUES(lsx);
+#endif /* CONFIG_CPU_HAS_LSX */
+
+#ifdef CONFIG_CPU_HAS_LASX
+MAKE_XOR_GLUES(lasx);
+#endif /* CONFIG_CPU_HAS_LASX */
diff --git a/lib/raid/xor/loongarch/xor_simd.h a/lib/raid/xor/loongarch/xor_simd.h
new file mode 100664
--- /dev/null
+++ a/lib/raid/xor/loongarch/xor_simd.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Simple interface to link xor_simd.c and xor_simd_glue.c
+ *
+ * Separating these files ensures that no SIMD instructions are run outside of
+ * the kfpu critical section.
+ */
+
+#ifndef __LOONGARCH_LIB_XOR_SIMD_H
+#define __LOONGARCH_LIB_XOR_SIMD_H
+
+#ifdef CONFIG_CPU_HAS_LSX
+void __xor_lsx_2(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2);
+void __xor_lsx_3(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
+void __xor_lsx_4(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
+        const unsigned long * __restrict p4);
+void __xor_lsx_5(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
+        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
+#endif /* CONFIG_CPU_HAS_LSX */
+
+#ifdef CONFIG_CPU_HAS_LASX
+void __xor_lasx_2(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2);
+void __xor_lasx_3(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3);
+void __xor_lasx_4(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
+        const unsigned long * __restrict p4);
+void __xor_lasx_5(unsigned long bytes, unsigned long * __restrict p1,
+        const unsigned long * __restrict p2, const unsigned long * __restrict p3,
+        const unsigned long * __restrict p4, const unsigned long * __restrict p5);
+#endif /* CONFIG_CPU_HAS_LASX */
+
+#endif /* __LOONGARCH_LIB_XOR_SIMD_H */
diff --git a/lib/raid/xor/loongarch/xor_template.c a/lib/raid/xor/loongarch/xor_template.c
new file mode 100664
--- /dev/null
+++ a/lib/raid/xor/loongarch/xor_template.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 WANG Xuerui
+ *
+ * Template for XOR operations, instantiated in xor_simd.c.
+ *
+ * Expected preprocessor definitions:
+ *
+ * - LINE_WIDTH
+ * - XOR_FUNC_NAME(nr)
+ * - LD_INOUT_LINE(buf)
+ * - LD_AND_XOR_LINE(buf)
+ * - ST_LINE(buf)
+ */
+
+void XOR_FUNC_NAME(2)(unsigned long bytes,
+        unsigned long * __restrict v1,
+        const unsigned long * __restrict v2)
+{
+        unsigned long lines = bytes / LINE_WIDTH;
+
+        do {
+                __asm__ __volatile__ (
+                        LD_INOUT_LINE(v1)
+                        LD_AND_XOR_LINE(v2)
+                        ST_LINE(v1)
+                        : : [v1] "r"(v1), [v2] "r"(v2) : "memory"
+                );
+
+                v1 += LINE_WIDTH / sizeof(unsigned long);
+                v2 += LINE_WIDTH / sizeof(unsigned long);
+        } while (--lines > 0);
+}
+
+void XOR_FUNC_NAME(3)(unsigned long bytes,
+        unsigned long * __restrict v1,
+        const unsigned long * __restrict v2,
+        const unsigned long * __restrict v3)
+{
+        unsigned long lines = bytes / LINE_WIDTH;
+
+        do {
+                __asm__ __volatile__ (
+                        LD_INOUT_LINE(v1)
+                        LD_AND_XOR_LINE(v2)
+                        LD_AND_XOR_LINE(v3)
+                        ST_LINE(v1)
+                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3) : "memory"
+                );
+
+                v1 += LINE_WIDTH / sizeof(unsigned long);
+                v2 += LINE_WIDTH / sizeof(unsigned long);
+                v3 += LINE_WIDTH / sizeof(unsigned long);
+        } while (--lines > 0);
+}
+
+void XOR_FUNC_NAME(4)(unsigned long bytes,
+        unsigned long * __restrict v1,
+        const unsigned long * __restrict v2,
+        const unsigned long * __restrict v3,
+        const unsigned long * __restrict v4)
+{
+        unsigned long lines = bytes / LINE_WIDTH;
+
+        do {
+                __asm__ __volatile__ (
+                        LD_INOUT_LINE(v1)
+                        LD_AND_XOR_LINE(v2)
+                        LD_AND_XOR_LINE(v3)
+                        LD_AND_XOR_LINE(v4)
+                        ST_LINE(v1)
+                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3), [v4] "r"(v4)
+                        : "memory"
+                );
+
+                v1 += LINE_WIDTH / sizeof(unsigned long);
+                v2 += LINE_WIDTH / sizeof(unsigned long);
+                v3 += LINE_WIDTH / sizeof(unsigned long);
+                v4 += LINE_WIDTH / sizeof(unsigned long);
+        } while (--lines > 0);
+}
+
+void XOR_FUNC_NAME(5)(unsigned long bytes,
+        unsigned long * __restrict v1,
+        const unsigned long * __restrict v2,
+        const unsigned long * __restrict v3,
+        const unsigned long * __restrict v4,
+        const unsigned long * __restrict v5)
+{
+        unsigned long lines = bytes / LINE_WIDTH;
+
+        do {
+                __asm__ __volatile__ (
+                        LD_INOUT_LINE(v1)
+                        LD_AND_XOR_LINE(v2)
+                        LD_AND_XOR_LINE(v3)
+                        LD_AND_XOR_LINE(v4)
+                        LD_AND_XOR_LINE(v5)
+                        ST_LINE(v1)
+                        : : [v1] "r"(v1), [v2] "r"(v2), [v3] "r"(v3), [v4] "r"(v4),
+                          [v5] "r"(v5) : "memory"
+                );
+
+                v1 += LINE_WIDTH / sizeof(unsigned long);
+                v2 += LINE_WIDTH / sizeof(unsigned long);
+                v3 += LINE_WIDTH / sizeof(unsigned long);
+                v4 += LINE_WIDTH / sizeof(unsigned long);
+                v5 += LINE_WIDTH / sizeof(unsigned long);
+        } while (--lines > 0);
+}
--- a/lib/raid/xor/Makefile~loongarch-move-the-xor-code-to-lib-raid
+++ a/lib/raid/xor/Makefile
@@ -14,6 +14,8 @@ ifeq ($(CONFIG_ARM),y)
 xor-$(CONFIG_KERNEL_MODE_NEON) += arm/xor-neon.o arm/xor-neon-glue.o
 endif
 xor-$(CONFIG_ARM64) += arm64/xor-neon.o arm64/xor-neon-glue.o
+xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd.o
+xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd_glue.o
 
 CFLAGS_arm/xor-neon.o += $(CC_FLAGS_FPU)
_

Patches currently in -mm which might be from hch@lst.de are

xor-assert-that-xor_blocks-is-not-call-from-interrupt-context.patch
arm-xor-remove-in_interrupt-handling.patch
arm64-xor-fix-conflicting-attributes-for-xor_block_template.patch
um-xor-cleanup-xorh.patch
xor-move-to-lib-raid.patch
xor-small-cleanups.patch
xor-cleanup-registration-and-probing.patch
xor-split-xorh.patch
xor-remove-macro-abuse-for-xor-implementation-registrations.patch
xor-move-generic-implementations-out-of-asm-generic-xorh.patch
alpha-move-the-xor-code-to-lib-raid.patch
arm-move-the-xor-code-to-lib-raid.patch
arm64-move-the-xor-code-to-lib-raid.patch
loongarch-move-the-xor-code-to-lib-raid.patch
powerpc-move-the-xor-code-to-lib-raid.patch
riscv-move-the-xor-code-to-lib-raid.patch
sparc-move-the-xor-code-to-lib-raid.patch
s390-move-the-xor-code-to-lib-raid.patch
x86-move-the-xor-code-to-lib-raid.patch
xor-avoid-indirect-calls-for-arm64-optimized-ops.patch
xor-make-xorko-self-contained-in-lib-raid.patch
xor-add-a-better-public-api.patch
xor-add-a-better-public-api-2.patch
async_xor-use-xor_gen.patch
btrfs-use-xor_gen.patch
xor-pass-the-entire-operation-to-the-low-level-ops.patch
xor-use-static_call-for-xor_gen.patch
xor-add-a-kunit-test-case.patch