From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com, will@kernel.org, Ard Biesheuvel, Mark Rutland, Peter Zijlstra
Subject: [PATCH 2/2] arm64/xor: use EOR3 instructions when available
Date: Tue, 9 Nov 2021 13:03:36 +0100
Message-Id: <20211109120336.3561463-3-ardb@kernel.org>
In-Reply-To: <20211109120336.3561463-1-ardb@kernel.org>
References: <20211109120336.3561463-1-ardb@kernel.org>

Use the EOR3 instruction to implement xor_blocks() if the instruction is
available, which is the case if the CPU implements the SHA-3 extension.
This is about 20% faster on Apple M1 when using the 5-way version.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/Kconfig        |   3 +
 arch/arm64/lib/xor-neon.c | 145 ++++++++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6f2d3e31fb54..14354acba5b4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2034,6 +2034,9 @@ config SYSVIPC_COMPAT
 	def_bool y
 	depends on COMPAT && SYSVIPC
 
+config CC_HAVE_SHA3
+	def_bool $(cc-option, -march=armv8.2-a+sha3)
+
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/arm64/lib/xor-neon.c b/arch/arm64/lib/xor-neon.c
index ee4795f3e166..0415cb94c781 100644
--- a/arch/arm64/lib/xor-neon.c
+++ b/arch/arm64/lib/xor-neon.c
@@ -172,6 +172,135 @@ void xor_arm64_neon_5(unsigned long bytes, unsigned long *p1,
 }
 EXPORT_SYMBOL(xor_arm64_neon_5);
 
+static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
+{
+	uint64x2_t res;
+
+	asm(".arch armv8.2-a+sha3 \n"
+	    "eor3 %0.16b, %1.16b, %2.16b, %3.16b"
+	    : "=w"(res) : "w"(p), "w"(q), "w"(r));
+	return res;
+}
+
+static void xor_arm64_eor3_3(unsigned long bytes, unsigned long *p1,
+			     unsigned long *p2, unsigned long *p3)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 ^ p3 */
+		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
+			  vld1q_u64(dp3 + 0));
+		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
+			  vld1q_u64(dp3 + 2));
+		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
+			  vld1q_u64(dp3 + 4));
+		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
+			  vld1q_u64(dp3 + 6));
+
+		/* store */
+		vst1q_u64(dp1 + 0, v0);
+		vst1q_u64(dp1 + 2, v1);
+		vst1q_u64(dp1 + 4, v2);
+		vst1q_u64(dp1 + 6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+	} while (--lines > 0);
+}
+
+static void xor_arm64_eor3_4(unsigned long bytes, unsigned long *p1,
+			     unsigned long *p2, unsigned long *p3,
+			     unsigned long *p4)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+	uint64_t *dp4 = (uint64_t *)p4;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 ^ p3 */
+		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
+			  vld1q_u64(dp3 + 0));
+		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
+			  vld1q_u64(dp3 + 2));
+		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
+			  vld1q_u64(dp3 + 4));
+		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
+			  vld1q_u64(dp3 + 6));
+
+		/* p1 ^= p4 */
+		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
+		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
+		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
+		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));
+
+		/* store */
+		vst1q_u64(dp1 + 0, v0);
+		vst1q_u64(dp1 + 2, v1);
+		vst1q_u64(dp1 + 4, v2);
+		vst1q_u64(dp1 + 6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+		dp4 += 8;
+	} while (--lines > 0);
+}
+
+static void xor_arm64_eor3_5(unsigned long bytes, unsigned long *p1,
+			     unsigned long *p2, unsigned long *p3,
+			     unsigned long *p4, unsigned long *p5)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+	uint64_t *dp4 = (uint64_t *)p4;
+	uint64_t *dp5 = (uint64_t *)p5;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 ^ p3 */
+		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
+			  vld1q_u64(dp3 + 0));
+		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
+			  vld1q_u64(dp3 + 2));
+		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
+			  vld1q_u64(dp3 + 4));
+		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
+			  vld1q_u64(dp3 + 6));
+
+		/* p1 ^= p4 ^ p5 */
+		v0 = eor3(v0, vld1q_u64(dp4 + 0), vld1q_u64(dp5 + 0));
+		v1 = eor3(v1, vld1q_u64(dp4 + 2), vld1q_u64(dp5 + 2));
+		v2 = eor3(v2, vld1q_u64(dp4 + 4), vld1q_u64(dp5 + 4));
+		v3 = eor3(v3, vld1q_u64(dp4 + 6), vld1q_u64(dp5 + 6));
+
+		/* store */
+		vst1q_u64(dp1 + 0, v0);
+		vst1q_u64(dp1 + 2, v1);
+		vst1q_u64(dp1 + 4, v2);
+		vst1q_u64(dp1 + 6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+		dp4 += 8;
+		dp5 += 8;
+	} while (--lines > 0);
+}
+
 DEFINE_STATIC_CALL(xor_arm64_3, xor_arm64_neon_3);
 DEFINE_STATIC_CALL(xor_arm64_4, xor_arm64_neon_4);
 DEFINE_STATIC_CALL(xor_arm64_5, xor_arm64_neon_5);
@@ -180,6 +309,22 @@ EXPORT_STATIC_CALL(xor_arm64_3);
 EXPORT_STATIC_CALL(xor_arm64_4);
 EXPORT_STATIC_CALL(xor_arm64_5);
 
+static int __init xor_neon_init(void)
+{
+	if (IS_ENABLED(CONFIG_CC_HAVE_SHA3) && cpu_have_named_feature(SHA3)) {
+		static_call_update(xor_arm64_3, xor_arm64_eor3_3);
+		static_call_update(xor_arm64_4, xor_arm64_eor3_4);
+		static_call_update(xor_arm64_5, xor_arm64_eor3_5);
+	}
+	return 0;
+}
+module_init(xor_neon_init);
+
+static void __exit xor_neon_exit(void)
+{
+}
+module_exit(xor_neon_exit);
+
 MODULE_AUTHOR("Jackie Liu ");
 MODULE_DESCRIPTION("ARMv8 XOR Extensions");
 MODULE_LICENSE("GPL");
-- 
2.30.2
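
For anyone reading along without SHA-3 capable hardware, here is a rough
scalar model of what the 5-way routine computes. It is not part of the patch
and is only a sketch: it works on plain 64-bit words rather than NEON
vectors, and the names eor3_model(), xor5_ref(), xor5_eor3_model() and
xor5_selftest() are made up for illustration. The point it tries to show is
that the 5-way case needs two three-input XORs per word instead of four
two-input XORs, which is presumably where most of the reported gain on the
5-way version comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for EOR3: one operation, three inputs. */
uint64_t eor3_model(uint64_t p, uint64_t q, uint64_t r)
{
	return p ^ q ^ r;
}

/* Reference: p1 ^= p2 ^ p3 ^ p4 ^ p5 with four two-input XORs per word. */
void xor5_ref(size_t words, uint64_t *p1, const uint64_t *p2,
	      const uint64_t *p3, const uint64_t *p4, const uint64_t *p5)
{
	for (size_t i = 0; i < words; i++)
		p1[i] = p1[i] ^ p2[i] ^ p3[i] ^ p4[i] ^ p5[i];
}

/* Same result with two three-input XORs per word (shape of the 5-way loop). */
void xor5_eor3_model(size_t words, uint64_t *p1, const uint64_t *p2,
		     const uint64_t *p3, const uint64_t *p4,
		     const uint64_t *p5)
{
	for (size_t i = 0; i < words; i++) {
		uint64_t v = eor3_model(p1[i], p2[i], p3[i]);	/* p1 ^= p2 ^ p3 */

		p1[i] = eor3_model(v, p4[i], p5[i]);		/* p1 ^= p4 ^ p5 */
	}
}

/* Fill two identical destinations and check that both variants agree. */
void xor5_selftest(void)
{
	uint64_t a[8], b[8], s2[8], s3[8], s4[8], s5[8];

	for (size_t i = 0; i < 8; i++) {
		a[i] = b[i] = 0x0123456789abcdefULL * (i + 1);
		s2[i] = ~a[i];
		s3[i] = i;
		s4[i] = 0xffULL << i;
		s5[i] = a[i] >> 3;
	}

	xor5_ref(8, a, s2, s3, s4, s5);
	xor5_eor3_model(8, b, s2, s3, s4, s5);

	for (size_t i = 0; i < 8; i++)
		assert(a[i] == b[i]);
}
```

Calling xor5_selftest() from a small userspace program is enough to confirm
that the two formulations agree. It says nothing about performance, which
depends on the actual EOR3 throughput of the CPU.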