From: Ard Biesheuvel <ardb+git@google.com>
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, ebiggers@kernel.org,
herbert@gondor.apana.org.au, keescook@chromium.org,
Ard Biesheuvel <ardb@kernel.org>
Subject: [PATCH 0/6] Clean up and improve ARM/arm64 CRC-T10DIF code
Date: Mon, 28 Oct 2024 20:02:08 +0100
Message-ID: <20241028190207.1394367-8-ardb+git@google.com>
From: Ard Biesheuvel <ardb@kernel.org>

I realized that the generic sequence implementing a 64x64 bit polynomial
multiply using 8x8 bit PMULL instructions, which the CRC-T10DIF code uses
as a fallback on cores that lack the 64x64 bit PMULL instruction, is not
very efficient.

The folding coefficients used when processing the bulk of the data are
only 16 bits wide, so 3/4 of the partial products of all those
8x8->16 bit multiplications contribute nothing to the end result. This
means we can use a much faster 16x64 bit multiply instead, producing a
speedup of 3.3x on a Cortex-A72 without the Crypto Extensions
(Raspberry Pi 4).
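
A scalar C model may help show where the 3/4 figure comes from. This is
only a sketch, not the NEON/PMULL sequence in the patches: poly128,
clmul8(), xor_at(), clmul64_bytewise(), clmul16x64_bytewise(),
clmul64_ref() and check() are hypothetical names, and clmul8() merely
models one lane of an 8x8->16 bit polynomial multiply. Building a 64x64
product from byte products takes all 8 x 8 = 64 of them, but when one
operand has only 16 significant bits, at most 2 x 8 = 16 can be nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model; not the kernel's NEON implementation. */
typedef struct { uint64_t lo, hi; } poly128;	/* 128-bit GF(2)[x] product */

/* 8x8 -> 16 bit carry-less multiply (models one 8-bit PMULL lane). */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
	uint16_t r = 0;

	for (int i = 0; i < 8; i++)
		if (b & (1u << i))
			r ^= (uint16_t)(a << i);
	return r;
}

/* XOR a 16-bit partial product into acc at bit offset 'shift' (0..112). */
static void xor_at(poly128 *acc, uint16_t v, unsigned int shift)
{
	if (shift >= 64) {
		acc->hi ^= (uint64_t)v << (shift - 64);
	} else {
		acc->lo ^= (uint64_t)v << shift;
		if (shift > 48)		/* product straddles bit 64 */
			acc->hi ^= (uint64_t)v >> (64 - shift);
	}
}

/* 64x64 -> 128 bit multiply from byte products: all 64 pairs (a_i, b_j)
 * are multiplied and accumulated at bit offset 8 * (i + j). */
static poly128 clmul64_bytewise(uint64_t a, uint64_t b)
{
	poly128 r = { 0, 0 };

	for (unsigned int i = 0; i < 8; i++)
		for (unsigned int j = 0; j < 8; j++)
			xor_at(&r, clmul8((uint8_t)(a >> (8 * i)),
					  (uint8_t)(b >> (8 * j))), 8 * (i + j));
	return r;
}

/* If a has only 16 significant bits, bytes a_2..a_7 are zero, so the 48
 * products involving them (3/4 of the 64) can be skipped. */
static poly128 clmul16x64_bytewise(uint16_t a, uint64_t b)
{
	poly128 r = { 0, 0 };

	for (unsigned int i = 0; i < 2; i++)
		for (unsigned int j = 0; j < 8; j++)
			xor_at(&r, clmul8((uint8_t)(a >> (8 * i)),
					  (uint8_t)(b >> (8 * j))), 8 * (i + j));
	return r;
}

/* Bit-serial reference carry-less multiply. */
static poly128 clmul64_ref(uint64_t a, uint64_t b)
{
	poly128 r = { 0, 0 };

	for (unsigned int i = 0; i < 64; i++) {
		if (!((b >> i) & 1))
			continue;
		r.lo ^= a << i;
		if (i)
			r.hi ^= a >> (64 - i);
	}
	return r;
}

/* Check both byte-wise variants against the reference for one input. */
static void check(uint16_t a, uint64_t b)
{
	poly128 ref = clmul64_ref(a, b);
	poly128 full = clmul64_bytewise(a, b);
	poly128 fast = clmul16x64_bytewise(a, b);

	assert(full.lo == ref.lo && full.hi == ref.hi);
	assert(fast.lo == ref.lo && fast.hi == ref.hi);
}
```

For example, check(0x8bb7, 0x0123456789abcdefULL) should pass; 0x8bb7
(the CRC-T10DIF generator polynomial without its x^16 term) is used here
only as an arbitrary 16-bit operand, not as one of the actual folding
coefficients.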

The same logic can be ported to 32-bit ARM too, where it produces a
speedup of 6.6x compared with the generic C implementation on the same
platform.

Ard Biesheuvel (6):
  crypto: arm64/crct10dif - Remove obsolete chunking logic
  crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply
  crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
  crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl
  crypto: arm/crct10dif - Macroify PMULL asm code
  crypto: arm/crct10dif - Implement plain NEON variant

 arch/arm/crypto/crct10dif-ce-core.S   | 201 ++++++++------
 arch/arm/crypto/crct10dif-ce-glue.c   |  54 +++-
 arch/arm64/crypto/crct10dif-ce-core.S | 282 +++++++-------------
 arch/arm64/crypto/crct10dif-ce-glue.c |  43 ++-
 4 files changed, 274 insertions(+), 306 deletions(-)
--
2.47.0.163.g1226f6d8fa-goog

Thread overview: 13+ messages
2024-10-28 19:02 Ard Biesheuvel [this message]
2024-10-28 19:02 ` [PATCH 1/6] crypto: arm64/crct10dif - Remove obsolete chunking logic Ard Biesheuvel
2024-10-30 3:54 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 2/6] crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply Ard Biesheuvel
2024-10-30 4:01 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 3/6] crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code Ard Biesheuvel
2024-10-30 4:15 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 4/6] crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl Ard Biesheuvel
2024-10-30 4:29 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 5/6] crypto: arm/crct10dif - Macroify PMULL asm code Ard Biesheuvel
2024-10-30 4:31 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant Ard Biesheuvel
2024-10-30 4:33 ` Eric Biggers