From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Hellwig
To: Andrew Morton
Cc: Richard Henderson, Matt Turner, Magnus Lindholm, Russell King,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Huacai Chen,
	WANG Xuerui, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	"Christophe Leroy (CS GROUP)", Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	"David S. Miller", Andreas Larsson, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86@kernel.org, "H. Peter Anvin", Herbert Xu,
	Dan Williams, Chris Mason, David Sterba, Arnd Bergmann, Song Liu,
	Yu Kuai, Li Nan, "Theodore Ts'o", "Jason A. Donenfeld",
	linux-alpha@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org,
	linux-um@lists.infradead.org, linux-crypto@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-raid@vger.kernel.org
Subject: [PATCH 20/28] xor: avoid indirect calls for arm64-optimized ops
Date: Fri, 27 Mar 2026 07:16:52 +0100
Message-ID: <20260327061704.3707577-21-hch@lst.de>
In-Reply-To: <20260327061704.3707577-1-hch@lst.de>
References: <20260327061704.3707577-1-hch@lst.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Remove the inner xor_block_template, and instead provide two separate
templates, xor_block_neon and xor_block_eor3, that call directly into
the NEON-enabled compilation unit.  The SHA3 (EOR3) check moves from
xor_neon_init() into arch_xor_init(), which now registers one of the
two templates when NEON is available.

Signed-off-by: Christoph Hellwig
---
 arch/arm64/include/asm/xor.h       | 13 ++--
 lib/raid/xor/arm64/xor-neon-glue.c | 95 +++++++++++++++---------------
 lib/raid/xor/arm64/xor-neon.c      | 73 +++++++++--------------
 lib/raid/xor/arm64/xor-neon.h      | 30 ++++++++++
 4 files changed, 114 insertions(+), 97 deletions(-)
 create mode 100644 lib/raid/xor/arm64/xor-neon.h

diff --git a/arch/arm64/include/asm/xor.h b/arch/arm64/include/asm/xor.h
index 81718f010761..4782c760bcac 100644
--- a/arch/arm64/include/asm/xor.h
+++ b/arch/arm64/include/asm/xor.h
@@ -7,15 +7,18 @@
 #include
 #include
 
-extern struct xor_block_template xor_block_arm64;
-void __init xor_neon_init(void);
+extern struct xor_block_template xor_block_neon;
+extern struct xor_block_template xor_block_eor3;
 
 #define arch_xor_init arch_xor_init
 static __always_inline void __init arch_xor_init(void)
 {
-	xor_neon_init();
 	xor_register(&xor_block_8regs);
 	xor_register(&xor_block_32regs);
-	if (cpu_has_neon())
-		xor_register(&xor_block_arm64);
+	if (cpu_has_neon()) {
+		if (cpu_have_named_feature(SHA3))
+			xor_register(&xor_block_eor3);
+		else
+			xor_register(&xor_block_neon);
+	}
 }
diff --git a/lib/raid/xor/arm64/xor-neon-glue.c b/lib/raid/xor/arm64/xor-neon-glue.c
index 067a2095659a..08c3e3573388 100644
--- a/lib/raid/xor/arm64/xor-neon-glue.c
+++ b/lib/raid/xor/arm64/xor-neon-glue.c
@@ -7,51 +7,54 @@
 #include
 #include
 #include
+#include "xor-neon.h"
 
-extern struct xor_block_template const xor_block_inner_neon;
-
-static void
-xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
-	   const unsigned long * __restrict p2)
-{
-	scoped_ksimd()
-		xor_block_inner_neon.do_2(bytes, p1, p2);
-}
-
-static void
-xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
-	   const unsigned long * __restrict p2,
-	   const unsigned long * __restrict p3)
-{
-	scoped_ksimd()
-		xor_block_inner_neon.do_3(bytes, p1, p2, p3);
-}
-
-static void
-xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
-	   const unsigned long * __restrict p2,
-	   const unsigned long * __restrict p3,
-	   const unsigned long * __restrict p4)
-{
-	scoped_ksimd()
-		xor_block_inner_neon.do_4(bytes, p1, p2, p3, p4);
-}
-
-static void
-xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
-	   const unsigned long * __restrict p2,
-	   const unsigned long * __restrict p3,
-	   const unsigned long * __restrict p4,
-	   const unsigned long * __restrict p5)
-{
-	scoped_ksimd()
-		xor_block_inner_neon.do_5(bytes, p1, p2, p3, p4, p5);
-}
-
-struct xor_block_template xor_block_arm64 = {
-	.name = "arm64_neon",
-	.do_2 = xor_neon_2,
-	.do_3 = xor_neon_3,
-	.do_4 = xor_neon_4,
-	.do_5 = xor_neon_5
+#define XOR_TEMPLATE(_name)						\
+static void								\
+xor_##_name##_2(unsigned long bytes, unsigned long * __restrict p1,	\
+	   const unsigned long * __restrict p2)				\
+{									\
+	scoped_ksimd()							\
+		__xor_##_name##_2(bytes, p1, p2);			\
+}									\
+									\
+static void								\
+xor_##_name##_3(unsigned long bytes, unsigned long * __restrict p1,	\
+	   const unsigned long * __restrict p2,				\
+	   const unsigned long * __restrict p3)				\
+{									\
+	scoped_ksimd()							\
+		__xor_##_name##_3(bytes, p1, p2, p3);			\
+}									\
+									\
+static void								\
+xor_##_name##_4(unsigned long bytes, unsigned long * __restrict p1,	\
+	   const unsigned long * __restrict p2,				\
+	   const unsigned long * __restrict p3,				\
+	   const unsigned long * __restrict p4)				\
+{									\
+	scoped_ksimd()							\
+		__xor_##_name##_4(bytes, p1, p2, p3, p4);		\
+}									\
+									\
+static void								\
+xor_##_name##_5(unsigned long bytes, unsigned long * __restrict p1,	\
+	   const unsigned long * __restrict p2,				\
+	   const unsigned long * __restrict p3,				\
+	   const unsigned long * __restrict p4,				\
+	   const unsigned long * __restrict p5)				\
+{									\
+	scoped_ksimd()							\
+		__xor_##_name##_5(bytes, p1, p2, p3, p4, p5);		\
+}									\
+									\
+struct xor_block_template xor_block_##_name = {			\
+	.name = __stringify(_name),					\
+	.do_2 = xor_##_name##_2,					\
+	.do_3 = xor_##_name##_3,					\
+	.do_4 = xor_##_name##_4,					\
+	.do_5 = xor_##_name##_5						\
 };
+
+XOR_TEMPLATE(neon);
+XOR_TEMPLATE(eor3);
diff --git a/lib/raid/xor/arm64/xor-neon.c b/lib/raid/xor/arm64/xor-neon.c
index 8d2d185090db..61194c292917 100644
--- a/lib/raid/xor/arm64/xor-neon.c
+++ b/lib/raid/xor/arm64/xor-neon.c
@@ -8,9 +8,10 @@
 #include
 #include
 #include
+#include "xor-neon.h"
 
-static void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
-	const unsigned long * __restrict p2)
+void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -36,9 +37,9 @@ static void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-static void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3)
+void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -72,10 +73,10 @@ static void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-static void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3,
-	const unsigned long * __restrict p4)
+void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -117,11 +118,11 @@ static void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-static void xor_arm64_neon_5(unsigned long bytes, unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3,
-	const unsigned long * __restrict p4,
-	const unsigned long * __restrict p5)
+void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4,
+		  const unsigned long * __restrict p5)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -171,14 +172,6 @@ static void xor_arm64_neon_5(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-struct xor_block_template xor_block_inner_neon __ro_after_init = {
-	.name = "__inner_neon__",
-	.do_2 = xor_arm64_neon_2,
-	.do_3 = xor_arm64_neon_3,
-	.do_4 = xor_arm64_neon_4,
-	.do_5 = xor_arm64_neon_5,
-};
-
 static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 {
 	uint64x2_t res;
@@ -189,10 +182,9 @@ static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 	return res;
 }
 
-static void xor_arm64_eor3_3(unsigned long bytes,
-	unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3)
+void __xor_eor3_3(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -224,11 +216,10 @@ static void xor_arm64_eor3_3(unsigned long bytes,
 	} while (--lines > 0);
 }
 
-static void xor_arm64_eor3_4(unsigned long bytes,
-	unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3,
-	const unsigned long * __restrict p4)
+void __xor_eor3_4(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -268,12 +259,11 @@ static void xor_arm64_eor3_4(unsigned long bytes,
 	} while (--lines > 0);
 }
 
-static void xor_arm64_eor3_5(unsigned long bytes,
-	unsigned long * __restrict p1,
-	const unsigned long * __restrict p2,
-	const unsigned long * __restrict p3,
-	const unsigned long * __restrict p4,
-	const unsigned long * __restrict p5)
+void __xor_eor3_5(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4,
+		  const unsigned long * __restrict p5)
 {
 	uint64_t *dp1 = (uint64_t *)p1;
 	uint64_t *dp2 = (uint64_t *)p2;
@@ -314,12 +304,3 @@ static void xor_arm64_eor3_5(unsigned long bytes,
 		dp5 += 8;
 	} while (--lines > 0);
 }
-
-void __init xor_neon_init(void)
-{
-	if (cpu_have_named_feature(SHA3)) {
-		xor_block_inner_neon.do_3 = xor_arm64_eor3_3;
-		xor_block_inner_neon.do_4 = xor_arm64_eor3_4;
-		xor_block_inner_neon.do_5 = xor_arm64_eor3_5;
-	}
-}
diff --git a/lib/raid/xor/arm64/xor-neon.h b/lib/raid/xor/arm64/xor-neon.h
new file mode 100644
index 000000000000..cec0ac846fea
--- /dev/null
+++ b/lib/raid/xor/arm64/xor-neon.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2);
+void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3);
+void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4);
+void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4,
+		  const unsigned long * __restrict p5);
+
+#define __xor_eor3_2 __xor_neon_2
+void __xor_eor3_3(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3);
+void __xor_eor3_4(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4);
+void __xor_eor3_5(unsigned long bytes, unsigned long * __restrict p1,
+		  const unsigned long * __restrict p2,
+		  const unsigned long * __restrict p3,
+		  const unsigned long * __restrict p4,
+		  const unsigned long * __restrict p5);
-- 
2.47.3
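The pattern this patch adopts can be sketched in plain user-space C: one macro invocation per backend generates the wrapper functions and the ops table, each wrapper calls its backend routine directly, and the SHA3 decision is made once when a table is selected. This is a minimal sketch under stated assumptions, not the kernel code: struct xor_ops, impl_neon_N, impl_eor3_N, fake_simd_begin()/fake_simd_end() and select_xor_ops() are hypothetical names, plain C loops stand in for the NEON intrinsics, the fake begin/end pair stands in for scoped_ksimd(), and only the two- and three-source cases are shown.

```c
/* User-space sketch; all identifiers are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct xor_ops {
	const char *name;
	void (*do_2)(size_t bytes, uint64_t *p1, const uint64_t *p2);
	void (*do_3)(size_t bytes, uint64_t *p1, const uint64_t *p2,
		     const uint64_t *p3);
};

static int simd_entries;	/* wrapper calls that entered a SIMD section */
static int simd_depth;		/* back to 0 once every section has ended */

/* Hypothetical stand-ins for scoped_ksimd() entry and exit. */
static void fake_simd_begin(void) { simd_entries++; simd_depth++; }
static void fake_simd_end(void) { simd_depth--; }

/* Number of 64-bit words in a buffer; callers pass whole words only. */
static size_t nwords(size_t bytes)
{
	assert(bytes % sizeof(uint64_t) == 0);
	return bytes / sizeof(uint64_t);
}

/* Backend routines (cf. __xor_neon_N): plain loops, called directly. */
static void impl_neon_2(size_t bytes, uint64_t *p1, const uint64_t *p2)
{
	size_t n = nwords(bytes);

	for (size_t i = 0; i < n; i++)
		p1[i] ^= p2[i];
}

static void impl_neon_3(size_t bytes, uint64_t *p1, const uint64_t *p2,
			const uint64_t *p3)
{
	size_t n = nwords(bytes);

	for (size_t i = 0; i < n; i++) {
		p1[i] ^= p2[i];		/* two separate XORs per word */
		p1[i] ^= p3[i];
	}
}

/* Three-way XOR in one step, the operation EOR3 performs per lane. */
static inline uint64_t eor3(uint64_t p, uint64_t q, uint64_t r)
{
	return p ^ q ^ r;
}

static void impl_eor3_3(size_t bytes, uint64_t *p1, const uint64_t *p2,
			const uint64_t *p3)
{
	size_t n = nwords(bytes);

	for (size_t i = 0; i < n; i++)
		p1[i] = eor3(p1[i], p2[i], p3[i]);
}

/* With only two inputs EOR3 has nothing to fold, so reuse the plain one. */
#define impl_eor3_2 impl_neon_2

/* One invocation stamps out the wrappers and the ops table for a backend. */
#define XOR_TEMPLATE(_name)						\
static void xor_##_name##_2(size_t bytes, uint64_t *p1,		\
			    const uint64_t *p2)				\
{									\
	fake_simd_begin();						\
	impl_##_name##_2(bytes, p1, p2);	/* direct call */	\
	fake_simd_end();						\
}									\
static void xor_##_name##_3(size_t bytes, uint64_t *p1,		\
			    const uint64_t *p2, const uint64_t *p3)	\
{									\
	fake_simd_begin();						\
	impl_##_name##_3(bytes, p1, p2, p3);	/* direct call */	\
	fake_simd_end();						\
}									\
static const struct xor_ops xor_ops_##_name = {			\
	.name = #_name,							\
	.do_2 = xor_##_name##_2,					\
	.do_3 = xor_##_name##_3,					\
}

XOR_TEMPLATE(neon);
XOR_TEMPLATE(eor3);

/* The feature check happens once, at selection time (cf. arch_xor_init()). */
const struct xor_ops *select_xor_ops(bool has_sha3)
{
	return has_sha3 ? &xor_ops_eor3 : &xor_ops_neon;
}
```

In this sketch, select_xor_ops(true)->do_3(...) goes through one function pointer to xor_eor3_3(), which then calls impl_eor3_3() directly; before this patch the arm64 code took a second indirect hop through the inner template's do_3 pointer. The cost is one generated ops table per backend, in exchange for deciding the SHA3 question once at selection time rather than by rewriting function pointers during init. The `#define impl_eor3_2 impl_neon_2` line works because the name produced by `##` pasting is rescanned for macros, mirroring `#define __xor_eor3_2 __xor_neon_2` in xor-neon.h.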