From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 31 Mar 2026 09:49:44 +0200
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>
Mime-Version: 1.0
References: <20260331074940.55502-7-ardb+git@google.com>
Message-ID: <20260331074940.55502-11-ardb+git@google.com>
Subject: [PATCH v2 4/5] xor/arm64: Use shared NEON intrinsics implementation from 32-bit ARM
From: Ard Biesheuvel
To: linux-raid@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, linux-crypto@vger.kernel.org,
	Ard Biesheuvel, Christoph Hellwig, Russell King, Arnd Bergmann,
	Eric Biggers
Content-Type: text/plain; charset="UTF-8"

From: Ard Biesheuvel

Tweak the arm64 code so that the pure NEON intrinsics implementation of
XOR is shared between arm64 and ARM.
Signed-off-by: Ard Biesheuvel
---
 lib/raid/xor/Makefile         |   3 +-
 lib/raid/xor/arm/xor-neon.c   |   4 +
 lib/raid/xor/arm64/xor-neon.c | 172 +-------------------
 3 files changed, 9 insertions(+), 170 deletions(-)

diff --git a/lib/raid/xor/Makefile b/lib/raid/xor/Makefile
index 4d633dfd5b90..b27bf5156784 100644
--- a/lib/raid/xor/Makefile
+++ b/lib/raid/xor/Makefile
@@ -19,7 +19,8 @@ xor-$(CONFIG_ARM) += arm/xor.o
 ifeq ($(CONFIG_ARM),y)
 xor-$(CONFIG_KERNEL_MODE_NEON) += arm/xor-neon.o arm/xor-neon-glue.o
 endif
-xor-$(CONFIG_ARM64) += arm64/xor-neon.o arm64/xor-neon-glue.o
+xor-$(CONFIG_ARM64) += arm/xor-neon.o arm64/xor-neon.o \
+			arm64/xor-neon-glue.o
 xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd.o
 xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd_glue.o
 xor-$(CONFIG_ALTIVEC) += powerpc/xor_vmx.o powerpc/xor_vmx_glue.o
diff --git a/lib/raid/xor/arm/xor-neon.c b/lib/raid/xor/arm/xor-neon.c
index a3e2b4af8d36..c7c3cf634e23 100644
--- a/lib/raid/xor/arm/xor-neon.c
+++ b/lib/raid/xor/arm/xor-neon.c
@@ -173,3 +173,7 @@ static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
 
 __DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4,
 		__xor_neon_5);
+
+#ifdef CONFIG_ARM64
+extern typeof(__xor_neon_2) __xor_eor3_2 __alias(__xor_neon_2);
+#endif
diff --git a/lib/raid/xor/arm64/xor-neon.c b/lib/raid/xor/arm64/xor-neon.c
index 97ef3cb92496..e44016c363f1 100644
--- a/lib/raid/xor/arm64/xor-neon.c
+++ b/lib/raid/xor/arm64/xor-neon.c
@@ -1,8 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/*
- * Authors: Jackie Liu
- * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd.
- */
 #include
 #include
@@ -10,170 +6,8 @@
 #include "xor_arch.h"
 #include "xor-neon.h"
 
-static void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
-			 const unsigned long * __restrict p2)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
-		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
-		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
-		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));
-
-		/* store */
-		vst1q_u64(dp1 + 0, v0);
-		vst1q_u64(dp1 + 2, v1);
-		vst1q_u64(dp1 + 4, v2);
-		vst1q_u64(dp1 + 6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
-			 const unsigned long * __restrict p2,
-			 const unsigned long * __restrict p3)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
-		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
-		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
-		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));
-
-		/* store */
-		vst1q_u64(dp1 + 0, v0);
-		vst1q_u64(dp1 + 2, v1);
-		vst1q_u64(dp1 + 4, v2);
-		vst1q_u64(dp1 + 6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
-			 const unsigned long * __restrict p2,
-			 const unsigned long * __restrict p3,
-			 const unsigned long * __restrict p4)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-	uint64_t *dp4 = (uint64_t *)p4;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
-		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
-		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
-		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));
-
-		/* p1 ^= p4 */
-		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));
-
-		/* store */
-		vst1q_u64(dp1 + 0, v0);
-		vst1q_u64(dp1 + 2, v1);
-		vst1q_u64(dp1 + 4, v2);
-		vst1q_u64(dp1 + 6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-		dp4 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
-			 const unsigned long * __restrict p2,
-			 const unsigned long * __restrict p3,
-			 const unsigned long * __restrict p4,
-			 const unsigned long * __restrict p5)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-	uint64_t *dp4 = (uint64_t *)p4;
-	uint64_t *dp5 = (uint64_t *)p5;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
-		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
-		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
-		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));
-
-		/* p1 ^= p4 */
-		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));
-
-		/* p1 ^= p5 */
-		v0 = veorq_u64(v0, vld1q_u64(dp5 + 0));
-		v1 = veorq_u64(v1, vld1q_u64(dp5 + 2));
-		v2 = veorq_u64(v2, vld1q_u64(dp5 + 4));
-		v3 = veorq_u64(v3, vld1q_u64(dp5 + 6));
-
-		/* store */
-		vst1q_u64(dp1 + 0, v0);
-		vst1q_u64(dp1 + 2, v1);
-		vst1q_u64(dp1 + 4, v2);
-		vst1q_u64(dp1 + 6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-		dp4 += 8;
-		dp5 += 8;
-	} while (--lines > 0);
-}
-
-__DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4,
-		__xor_neon_5);
+extern void __xor_eor3_2(unsigned long bytes, unsigned long * __restrict p1,
+			 const unsigned long * __restrict p2);
 
 static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 {
@@ -308,5 +142,5 @@ static void __xor_eor3_5(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-__DO_XOR_BLOCKS(eor3_inner, __xor_neon_2, __xor_eor3_3, __xor_eor3_4,
+__DO_XOR_BLOCKS(eor3_inner, __xor_eor3_2, __xor_eor3_3, __xor_eor3_4,
 		__xor_eor3_5);
-- 
2.53.0.1018.g2bb0e51243-goog