From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9FF433FF89D for ; Fri, 27 Mar 2026 17:49:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774633799; cv=none; b=FMtdloY3d3U0qXjzHOxos3jOrQ6/2+uRM8MQB5H2aUtlaLo7rO0PPf0lMZFcQPSHvU52Iq3mOPS7xwATWc+oRm4ILN2OjP3FEfHUCmXKIL7FqQoMi227rZxGemWy/p6mXBw44BKQ92nYqqqYG5YZ9HiEJzQaTIpscDWe+JHu1Tg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774633799; c=relaxed/simple; bh=bxWvOHOI02Dbm/rGXc0tqpLAbdkP+KRkB38yfk7ZT1Q=; h=Date:To:From:Subject:Message-Id; b=bYhiSy3FPAedofRW6ITTnppOOqA9Bcgt5DCu4xv3Mt0RYgbwKtfKtHN3p7vhSuHFIUikK5AtSBdaolZrZDK5M/IHJ9ECaOORf+3N5OBhHIjm2+sbf+Gy+DqIJYyV2RV+IhtcY5fKC0uKhF1+ONH2cbD3pRSohcseKyeCcKuH26I= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b=EWxEcCyH; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b="EWxEcCyH" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2BEEEC19423; Fri, 27 Mar 2026 17:49:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1774633799; bh=bxWvOHOI02Dbm/rGXc0tqpLAbdkP+KRkB38yfk7ZT1Q=; h=Date:To:From:Subject:From; b=EWxEcCyH2hAO2m4ZgyTKxpn07yWfLLNjBgzlCplVTEBpBrqo8rd+4CAiSuuTmy0wv JM5aHMJe21WEhAy9rz2NNOroQrENQDNpkT8Jn+e31dMv2g3nmjWQNerya4yE+wnWdt Zl2Fsxi9fpbEu3dvurN6NeDUZcjTbFAyc8Xq8Ib8= Date: Fri, 27 Mar 2026 10:49:58 -0700 To: mm-commits@vger.kernel.org,hch@lst.de,akpm@linux-foundation.org From: Andrew Morton Subject: + xor-move-generic-implementations-out-of-asm-generic-xorh.patch added to mm-nonmm-unstable branch Message-Id: <20260327174959.2BEEEC19423@smtp.kernel.org> Precedence: bulk X-Mailing-List: mm-commits@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: The patch titled Subject: xor: move generic implementations out of asm-generic/xor.h has been added to the -mm mm-nonmm-unstable branch. Its filename is xor-move-generic-implementations-out-of-asm-generic-xorh.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/xor-move-generic-implementations-out-of-asm-generic-xorh.patch This patch will later appear in the mm-nonmm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via various branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there most days ------------------------------------------------------ From: Christoph Hellwig Subject: xor: move generic implementations out of asm-generic/xor.h Date: Fri, 27 Mar 2026 07:16:42 +0100 Move the generic implementations from asm-generic/xor.h to per-implementaion .c files in lib/raid. This will build them unconditionally even when an architecture forces a specific implementation, but as we'll need at least one generic version for the static_call optimization later on we'll pay that price. Note that this would cause the second xor_block_8regs instance created by arch/arm/lib/xor-neon.c to be generated instead of discarded as dead code, so add a NO_TEMPLATE symbol to disable it for this case. Link: https://lkml.kernel.org/r/20260327061704.3707577-11-hch@lst.de Signed-off-by: Christoph Hellwig Cc: Albert Ou Cc: Alexander Gordeev Cc: Alexandre Ghiti Cc: Andreas Larsson Cc: Anton Ivanov Cc: Ard Biesheuvel Cc: Arnd Bergmann Cc: "Borislav Petkov (AMD)" Cc: Catalin Marinas Cc: Chris Mason Cc: Christian Borntraeger Cc: Dan Williams Cc: David S. Miller Cc: David Sterba Cc: Heiko Carstens Cc: Herbert Xu Cc: "H. Peter Anvin" Cc: Huacai Chen Cc: Ingo Molnar Cc: Jason A. Donenfeld Cc: Johannes Berg Cc: Li Nan Cc: Madhavan Srinivasan Cc: Magnus Lindholm Cc: Matt Turner Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Richard Henderson Cc: Richard Weinberger Cc: Russell King Cc: Song Liu Cc: Sven Schnelle Cc: Ted Ts'o Cc: Vasily Gorbik Cc: WANG Xuerui Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/arm/lib/xor-neon.c | 4 include/asm-generic/xor.h | 727 --------------------------- lib/raid/xor/Makefile | 4 lib/raid/xor/xor-32regs-prefetch.c | 268 +++++++++ lib/raid/xor/xor-32regs.c | 219 ++++++++ lib/raid/xor/xor-8regs-prefetch.c | 146 +++++ lib/raid/xor/xor-8regs.c | 105 +++ 7 files changed, 748 insertions(+), 725 deletions(-) --- a/arch/arm/lib/xor-neon.c~xor-move-generic-implementations-out-of-asm-generic-xorh +++ a/arch/arm/lib/xor-neon.c @@ -26,8 +26,8 @@ MODULE_LICENSE("GPL"); #pragma GCC optimize "tree-vectorize" #endif -#pragma GCC diagnostic ignored "-Wunused-variable" -#include +#define NO_TEMPLATE +#include "../../../lib/raid/xor/xor-8regs.c" struct xor_block_template const xor_block_neon_inner = { .name = "__inner_neon__", --- a/include/asm-generic/xor.h~xor-move-generic-implementations-out-of-asm-generic-xorh +++ a/include/asm-generic/xor.h @@ -5,726 +5,7 @@ * Generic optimized RAID-5 checksumming functions. */ -#include - -static void -xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - p1[0] ^= p2[0]; - p1[1] ^= p2[1]; - p1[2] ^= p2[2]; - p1[3] ^= p2[3]; - p1[4] ^= p2[4]; - p1[5] ^= p2[5]; - p1[6] ^= p2[6]; - p1[7] ^= p2[7]; - p1 += 8; - p2 += 8; - } while (--lines > 0); -} - -static void -xor_8regs_3(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - p1[0] ^= p2[0] ^ p3[0]; - p1[1] ^= p2[1] ^ p3[1]; - p1[2] ^= p2[2] ^ p3[2]; - p1[3] ^= p2[3] ^ p3[3]; - p1[4] ^= p2[4] ^ p3[4]; - p1[5] ^= p2[5] ^ p3[5]; - p1[6] ^= p2[6] ^ p3[6]; - p1[7] ^= p2[7] ^ p3[7]; - p1 += 8; - p2 += 8; - p3 += 8; - } while (--lines > 0); -} - -static void -xor_8regs_4(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; - p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; - p1[2] ^= p2[2] ^ p3[2] ^ p4[2]; - p1[3] ^= p2[3] ^ p3[3] ^ p4[3]; - p1[4] ^= p2[4] ^ p3[4] ^ p4[4]; - p1[5] ^= p2[5] ^ p3[5] ^ p4[5]; - p1[6] ^= p2[6] ^ p3[6] ^ p4[6]; - p1[7] ^= p2[7] ^ p3[7] ^ p4[7]; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - } while (--lines > 0); -} - -static void -xor_8regs_5(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4, - const unsigned long * __restrict p5) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; - p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; - p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2]; - p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3]; - p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4]; - p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5]; - p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6]; - p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7]; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - p5 += 8; - } while (--lines > 0); -} - -static void -xor_32regs_2(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - } while (--lines > 0); -} - -static void -xor_32regs_3(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - } while (--lines > 0); -} - -static void -xor_32regs_4(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - d0 ^= p4[0]; - d1 ^= p4[1]; - d2 ^= p4[2]; - d3 ^= p4[3]; - d4 ^= p4[4]; - d5 ^= p4[5]; - d6 ^= p4[6]; - d7 ^= p4[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - } while (--lines > 0); -} - -static void -xor_32regs_5(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4, - const unsigned long * __restrict p5) -{ - long lines = bytes / (sizeof (long)) / 8; - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - d0 ^= p4[0]; - d1 ^= p4[1]; - d2 ^= p4[2]; - d3 ^= p4[3]; - d4 ^= p4[4]; - d5 ^= p4[5]; - d6 ^= p4[6]; - d7 ^= p4[7]; - d0 ^= p5[0]; - d1 ^= p5[1]; - d2 ^= p5[2]; - d3 ^= p5[3]; - d4 ^= p5[4]; - d5 ^= p5[5]; - d6 ^= p5[6]; - d7 ^= p5[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - p5 += 8; - } while (--lines > 0); -} - -static void -xor_8regs_p_2(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - prefetchw(p1); - prefetch(p2); - - do { - prefetchw(p1+8); - prefetch(p2+8); - once_more: - p1[0] ^= p2[0]; - p1[1] ^= p2[1]; - p1[2] ^= p2[2]; - p1[3] ^= p2[3]; - p1[4] ^= p2[4]; - p1[5] ^= p2[5]; - p1[6] ^= p2[6]; - p1[7] ^= p2[7]; - p1 += 8; - p2 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_8regs_p_3(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - prefetchw(p1); - prefetch(p2); - prefetch(p3); - - do { - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - once_more: - p1[0] ^= p2[0] ^ p3[0]; - p1[1] ^= p2[1] ^ p3[1]; - p1[2] ^= p2[2] ^ p3[2]; - p1[3] ^= p2[3] ^ p3[3]; - p1[4] ^= p2[4] ^ p3[4]; - p1[5] ^= p2[5] ^ p3[5]; - p1[6] ^= p2[6] ^ p3[6]; - p1[7] ^= p2[7] ^ p3[7]; - p1 += 8; - p2 += 8; - p3 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_8regs_p_4(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - prefetch(p3); - prefetch(p4); - - do { - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - prefetch(p4+8); - once_more: - p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; - p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; - p1[2] ^= p2[2] ^ p3[2] ^ p4[2]; - p1[3] ^= p2[3] ^ p3[3] ^ p4[3]; - p1[4] ^= p2[4] ^ p3[4] ^ p4[4]; - p1[5] ^= p2[5] ^ p3[5] ^ p4[5]; - p1[6] ^= p2[6] ^ p3[6] ^ p4[6]; - p1[7] ^= p2[7] ^ p3[7] ^ p4[7]; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_8regs_p_5(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4, - const unsigned long * __restrict p5) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - prefetch(p3); - prefetch(p4); - prefetch(p5); - - do { - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - prefetch(p4+8); - prefetch(p5+8); - once_more: - p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; - p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; - p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2]; - p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3]; - p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4]; - p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5]; - p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6]; - p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7]; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - p5 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_32regs_p_2(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - - prefetchw(p1+8); - prefetch(p2+8); - once_more: - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_32regs_p_3(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - prefetch(p3); - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - once_more: - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_32regs_p_4(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - prefetch(p3); - prefetch(p4); - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - prefetch(p4+8); - once_more: - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - d0 ^= p4[0]; - d1 ^= p4[1]; - d2 ^= p4[2]; - d3 ^= p4[3]; - d4 ^= p4[4]; - d5 ^= p4[5]; - d6 ^= p4[6]; - d7 ^= p4[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static void -xor_32regs_p_5(unsigned long bytes, unsigned long * __restrict p1, - const unsigned long * __restrict p2, - const unsigned long * __restrict p3, - const unsigned long * __restrict p4, - const unsigned long * __restrict p5) -{ - long lines = bytes / (sizeof (long)) / 8 - 1; - - prefetchw(p1); - prefetch(p2); - prefetch(p3); - prefetch(p4); - prefetch(p5); - - do { - register long d0, d1, d2, d3, d4, d5, d6, d7; - - prefetchw(p1+8); - prefetch(p2+8); - prefetch(p3+8); - prefetch(p4+8); - prefetch(p5+8); - once_more: - d0 = p1[0]; /* Pull the stuff into registers */ - d1 = p1[1]; /* ... in bursts, if possible. */ - d2 = p1[2]; - d3 = p1[3]; - d4 = p1[4]; - d5 = p1[5]; - d6 = p1[6]; - d7 = p1[7]; - d0 ^= p2[0]; - d1 ^= p2[1]; - d2 ^= p2[2]; - d3 ^= p2[3]; - d4 ^= p2[4]; - d5 ^= p2[5]; - d6 ^= p2[6]; - d7 ^= p2[7]; - d0 ^= p3[0]; - d1 ^= p3[1]; - d2 ^= p3[2]; - d3 ^= p3[3]; - d4 ^= p3[4]; - d5 ^= p3[5]; - d6 ^= p3[6]; - d7 ^= p3[7]; - d0 ^= p4[0]; - d1 ^= p4[1]; - d2 ^= p4[2]; - d3 ^= p4[3]; - d4 ^= p4[4]; - d5 ^= p4[5]; - d6 ^= p4[6]; - d7 ^= p4[7]; - d0 ^= p5[0]; - d1 ^= p5[1]; - d2 ^= p5[2]; - d3 ^= p5[3]; - d4 ^= p5[4]; - d5 ^= p5[5]; - d6 ^= p5[6]; - d7 ^= p5[7]; - p1[0] = d0; /* Store the result (in bursts) */ - p1[1] = d1; - p1[2] = d2; - p1[3] = d3; - p1[4] = d4; - p1[5] = d5; - p1[6] = d6; - p1[7] = d7; - p1 += 8; - p2 += 8; - p3 += 8; - p4 += 8; - p5 += 8; - } while (--lines > 0); - if (lines == 0) - goto once_more; -} - -static struct xor_block_template xor_block_8regs = { - .name = "8regs", - .do_2 = xor_8regs_2, - .do_3 = xor_8regs_3, - .do_4 = xor_8regs_4, - .do_5 = xor_8regs_5, -}; - -static struct xor_block_template xor_block_32regs = { - .name = "32regs", - .do_2 = xor_32regs_2, - .do_3 = xor_32regs_3, - .do_4 = xor_32regs_4, - .do_5 = xor_32regs_5, -}; - -static struct xor_block_template xor_block_8regs_p __maybe_unused = { - .name = "8regs_prefetch", - .do_2 = xor_8regs_p_2, - .do_3 = xor_8regs_p_3, - .do_4 = xor_8regs_p_4, - .do_5 = xor_8regs_p_5, -}; - -static struct xor_block_template xor_block_32regs_p __maybe_unused = { - .name = "32regs_prefetch", - .do_2 = xor_32regs_p_2, - .do_3 = xor_32regs_p_3, - .do_4 = xor_32regs_p_4, - .do_5 = xor_32regs_p_5, -}; +extern struct xor_block_template xor_block_8regs; +extern struct xor_block_template xor_block_32regs; +extern struct xor_block_template xor_block_8regs_p; +extern struct xor_block_template xor_block_32regs_p; --- a/lib/raid/xor/Makefile~xor-move-generic-implementations-out-of-asm-generic-xorh +++ a/lib/raid/xor/Makefile @@ -3,3 +3,7 @@ obj-$(CONFIG_XOR_BLOCKS) += xor.o xor-y += xor-core.o +xor-y += xor-8regs.o +xor-y += xor-32regs.o +xor-y += xor-8regs-prefetch.o +xor-y += xor-32regs-prefetch.o diff --git a/lib/raid/xor/xor-32regs.c a/lib/raid/xor/xor-32regs.c new file mode 100644 --- /dev/null +++ a/lib/raid/xor/xor-32regs.c @@ -0,0 +1,219 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include + +static void +xor_32regs_2(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + } while (--lines > 0); +} + +static void +xor_32regs_3(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + } while (--lines > 0); +} + +static void +xor_32regs_4(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + d0 ^= p4[0]; + d1 ^= p4[1]; + d2 ^= p4[2]; + d3 ^= p4[3]; + d4 ^= p4[4]; + d5 ^= p4[5]; + d6 ^= p4[6]; + d7 ^= p4[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + } while (--lines > 0); +} + +static void +xor_32regs_5(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4, + const unsigned long * __restrict p5) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + d0 ^= p4[0]; + d1 ^= p4[1]; + d2 ^= p4[2]; + d3 ^= p4[3]; + d4 ^= p4[4]; + d5 ^= p4[5]; + d6 ^= p4[6]; + d7 ^= p4[7]; + d0 ^= p5[0]; + d1 ^= p5[1]; + d2 ^= p5[2]; + d3 ^= p5[3]; + d4 ^= p5[4]; + d5 ^= p5[5]; + d6 ^= p5[6]; + d7 ^= p5[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + p5 += 8; + } while (--lines > 0); +} + +struct xor_block_template xor_block_32regs = { + .name = "32regs", + .do_2 = xor_32regs_2, + .do_3 = xor_32regs_3, + .do_4 = xor_32regs_4, + .do_5 = xor_32regs_5, +}; diff --git a/lib/raid/xor/xor-32regs-prefetch.c a/lib/raid/xor/xor-32regs-prefetch.c new file mode 100644 --- /dev/null +++ a/lib/raid/xor/xor-32regs-prefetch.c @@ -0,0 +1,268 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include +#include + +static void +xor_32regs_p_2(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + + prefetchw(p1+8); + prefetch(p2+8); + once_more: + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_32regs_p_3(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + prefetch(p3); + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + once_more: + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_32regs_p_4(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + prefetch(p3); + prefetch(p4); + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + prefetch(p4+8); + once_more: + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + d0 ^= p4[0]; + d1 ^= p4[1]; + d2 ^= p4[2]; + d3 ^= p4[3]; + d4 ^= p4[4]; + d5 ^= p4[5]; + d6 ^= p4[6]; + d7 ^= p4[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_32regs_p_5(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4, + const unsigned long * __restrict p5) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + prefetch(p3); + prefetch(p4); + prefetch(p5); + + do { + register long d0, d1, d2, d3, d4, d5, d6, d7; + + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + prefetch(p4+8); + prefetch(p5+8); + once_more: + d0 = p1[0]; /* Pull the stuff into registers */ + d1 = p1[1]; /* ... in bursts, if possible. */ + d2 = p1[2]; + d3 = p1[3]; + d4 = p1[4]; + d5 = p1[5]; + d6 = p1[6]; + d7 = p1[7]; + d0 ^= p2[0]; + d1 ^= p2[1]; + d2 ^= p2[2]; + d3 ^= p2[3]; + d4 ^= p2[4]; + d5 ^= p2[5]; + d6 ^= p2[6]; + d7 ^= p2[7]; + d0 ^= p3[0]; + d1 ^= p3[1]; + d2 ^= p3[2]; + d3 ^= p3[3]; + d4 ^= p3[4]; + d5 ^= p3[5]; + d6 ^= p3[6]; + d7 ^= p3[7]; + d0 ^= p4[0]; + d1 ^= p4[1]; + d2 ^= p4[2]; + d3 ^= p4[3]; + d4 ^= p4[4]; + d5 ^= p4[5]; + d6 ^= p4[6]; + d7 ^= p4[7]; + d0 ^= p5[0]; + d1 ^= p5[1]; + d2 ^= p5[2]; + d3 ^= p5[3]; + d4 ^= p5[4]; + d5 ^= p5[5]; + d6 ^= p5[6]; + d7 ^= p5[7]; + p1[0] = d0; /* Store the result (in bursts) */ + p1[1] = d1; + p1[2] = d2; + p1[3] = d3; + p1[4] = d4; + p1[5] = d5; + p1[6] = d6; + p1[7] = d7; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + p5 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +struct xor_block_template xor_block_32regs_p = { + .name = "32regs_prefetch", + .do_2 = xor_32regs_p_2, + .do_3 = xor_32regs_p_3, + .do_4 = xor_32regs_p_4, + .do_5 = xor_32regs_p_5, +}; diff --git a/lib/raid/xor/xor-8regs.c a/lib/raid/xor/xor-8regs.c new file mode 100644 --- /dev/null +++ a/lib/raid/xor/xor-8regs.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include + +static void +xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + p1[0] ^= p2[0]; + p1[1] ^= p2[1]; + p1[2] ^= p2[2]; + p1[3] ^= p2[3]; + p1[4] ^= p2[4]; + p1[5] ^= p2[5]; + p1[6] ^= p2[6]; + p1[7] ^= p2[7]; + p1 += 8; + p2 += 8; + } while (--lines > 0); +} + +static void +xor_8regs_3(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + p1[0] ^= p2[0] ^ p3[0]; + p1[1] ^= p2[1] ^ p3[1]; + p1[2] ^= p2[2] ^ p3[2]; + p1[3] ^= p2[3] ^ p3[3]; + p1[4] ^= p2[4] ^ p3[4]; + p1[5] ^= p2[5] ^ p3[5]; + p1[6] ^= p2[6] ^ p3[6]; + p1[7] ^= p2[7] ^ p3[7]; + p1 += 8; + p2 += 8; + p3 += 8; + } while (--lines > 0); +} + +static void +xor_8regs_4(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; + p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; + p1[2] ^= p2[2] ^ p3[2] ^ p4[2]; + p1[3] ^= p2[3] ^ p3[3] ^ p4[3]; + p1[4] ^= p2[4] ^ p3[4] ^ p4[4]; + p1[5] ^= p2[5] ^ p3[5] ^ p4[5]; + p1[6] ^= p2[6] ^ p3[6] ^ p4[6]; + p1[7] ^= p2[7] ^ p3[7] ^ p4[7]; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + } while (--lines > 0); +} + +static void +xor_8regs_5(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4, + const unsigned long * __restrict p5) +{ + long lines = bytes / (sizeof (long)) / 8; + + do { + p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; + p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; + p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2]; + p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3]; + p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4]; + p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5]; + p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6]; + p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7]; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + p5 += 8; + } while (--lines > 0); +} + +#ifndef NO_TEMPLATE +struct xor_block_template xor_block_8regs = { + .name = "8regs", + .do_2 = xor_8regs_2, + .do_3 = xor_8regs_3, + .do_4 = xor_8regs_4, + .do_5 = xor_8regs_5, +}; +#endif /* NO_TEMPLATE */ diff --git a/lib/raid/xor/xor-8regs-prefetch.c a/lib/raid/xor/xor-8regs-prefetch.c new file mode 100644 --- /dev/null +++ a/lib/raid/xor/xor-8regs-prefetch.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include +#include +#include + +static void +xor_8regs_p_2(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + prefetchw(p1); + prefetch(p2); + + do { + prefetchw(p1+8); + prefetch(p2+8); + once_more: + p1[0] ^= p2[0]; + p1[1] ^= p2[1]; + p1[2] ^= p2[2]; + p1[3] ^= p2[3]; + p1[4] ^= p2[4]; + p1[5] ^= p2[5]; + p1[6] ^= p2[6]; + p1[7] ^= p2[7]; + p1 += 8; + p2 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_8regs_p_3(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + prefetchw(p1); + prefetch(p2); + prefetch(p3); + + do { + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + once_more: + p1[0] ^= p2[0] ^ p3[0]; + p1[1] ^= p2[1] ^ p3[1]; + p1[2] ^= p2[2] ^ p3[2]; + p1[3] ^= p2[3] ^ p3[3]; + p1[4] ^= p2[4] ^ p3[4]; + p1[5] ^= p2[5] ^ p3[5]; + p1[6] ^= p2[6] ^ p3[6]; + p1[7] ^= p2[7] ^ p3[7]; + p1 += 8; + p2 += 8; + p3 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_8regs_p_4(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + prefetch(p3); + prefetch(p4); + + do { + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + prefetch(p4+8); + once_more: + p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; + p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; + p1[2] ^= p2[2] ^ p3[2] ^ p4[2]; + p1[3] ^= p2[3] ^ p3[3] ^ p4[3]; + p1[4] ^= p2[4] ^ p3[4] ^ p4[4]; + p1[5] ^= p2[5] ^ p3[5] ^ p4[5]; + p1[6] ^= p2[6] ^ p3[6] ^ p4[6]; + p1[7] ^= p2[7] ^ p3[7] ^ p4[7]; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +static void +xor_8regs_p_5(unsigned long bytes, unsigned long * __restrict p1, + const unsigned long * __restrict p2, + const unsigned long * __restrict p3, + const unsigned long * __restrict p4, + const unsigned long * __restrict p5) +{ + long lines = bytes / (sizeof (long)) / 8 - 1; + + prefetchw(p1); + prefetch(p2); + prefetch(p3); + prefetch(p4); + prefetch(p5); + + do { + prefetchw(p1+8); + prefetch(p2+8); + prefetch(p3+8); + prefetch(p4+8); + prefetch(p5+8); + once_more: + p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; + p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; + p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2]; + p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3]; + p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4]; + p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5]; + p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6]; + p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7]; + p1 += 8; + p2 += 8; + p3 += 8; + p4 += 8; + p5 += 8; + } while (--lines > 0); + if (lines == 0) + goto once_more; +} + +struct xor_block_template xor_block_8regs_p = { + .name = "8regs_prefetch", + .do_2 = xor_8regs_p_2, + .do_3 = xor_8regs_p_3, + .do_4 = xor_8regs_p_4, + .do_5 = xor_8regs_p_5, +}; _ Patches currently in -mm which might be from hch@lst.de are xor-assert-that-xor_blocks-is-not-call-from-interrupt-context.patch arm-xor-remove-in_interrupt-handling.patch arm64-xor-fix-conflicting-attributes-for-xor_block_template.patch um-xor-cleanup-xorh.patch xor-move-to-lib-raid.patch xor-small-cleanups.patch xor-cleanup-registration-and-probing.patch xor-split-xorh.patch xor-remove-macro-abuse-for-xor-implementation-registrations.patch xor-move-generic-implementations-out-of-asm-generic-xorh.patch alpha-move-the-xor-code-to-lib-raid.patch arm-move-the-xor-code-to-lib-raid.patch arm64-move-the-xor-code-to-lib-raid.patch loongarch-move-the-xor-code-to-lib-raid.patch powerpc-move-the-xor-code-to-lib-raid.patch riscv-move-the-xor-code-to-lib-raid.patch sparc-move-the-xor-code-to-lib-raid.patch s390-move-the-xor-code-to-lib-raid.patch x86-move-the-xor-code-to-lib-raid.patch xor-avoid-indirect-calls-for-arm64-optimized-ops.patch xor-make-xorko-self-contained-in-lib-raid.patch xor-add-a-better-public-api.patch xor-add-a-better-public-api-2.patch async_xor-use-xor_gen.patch btrfs-use-xor_gen.patch xor-pass-the-entire-operation-to-the-low-level-ops.patch xor-use-static_call-for-xor_gen.patch xor-add-a-kunit-test-case.patch