Date: Fri, 18 Oct 2024 16:56:18 -0700
From: Eric Biggers
To: Ard Biesheuvel
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, will@kernel.org, catalin.marinas@arm.com, Ard Biesheuvel, Kees Cook
Subject: Re: [PATCH v4 3/3] arm64/crc32: Implement 4-way interleave using PMULL
Message-ID: <20241018235618.GB2589@sol.localdomain>
References: <20241018075347.2821102-5-ardb+git@google.com> <20241018075347.2821102-8-ardb+git@google.com>
In-Reply-To: <20241018075347.2821102-8-ardb+git@google.com>

On Fri, Oct 18, 2024 at 09:53:51AM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel
>
> Now that kernel mode NEON no longer disables preemption, using FP/SIMD
> in library code which is not obviously part of the crypto subsystem is
> no longer problematic, as it will no longer incur unexpected latencies.
>
> So accelerate the CRC-32 library code on arm64 to use a 4-way
> interleave, using PMULL instructions to implement the folding.
>
> On Apple M2, this results in a speedup of 2 - 2.8x when using input
> sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
> restoring the FP/SIMD register file may not be worth it, so 1k is used
> as a threshold for choosing this code path.
>
> The coefficient tables were generated using code provided by Eric. [0]
>
> [0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c
>
> Cc: Eric Biggers
> Signed-off-by: Ard Biesheuvel
> ---
>  arch/arm64/lib/crc32-glue.c |  48 ++++
>  arch/arm64/lib/crc32.S      | 231 +++++++++++++++++++-
>  2 files changed, 276 insertions(+), 3 deletions(-)

Reviewed-by: Eric Biggers

> +	if (len >= min_len && cpu_have_named_feature(PMULL) && crypto_simd_usable()) {

Using crypto_simd_usable() here causes a build error when CRYPTO_ALGAPI2=m.
https://lore.kernel.org/linux-crypto/20241018235343.425758-1-ebiggers@kernel.org
will fix that.

- Eric