From: Kamlesh Gurudasani <kamlesh@ti.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
"David S. Miller" <davem@davemloft.net>,
Rob Herring <robh+dt@kernel.org>,
"Krzysztof Kozlowski" <krzysztof.kozlowski+dt@linaro.org>,
Conor Dooley <conor+dt@kernel.org>, Nishanth Menon <nm@ti.com>,
Vignesh Raghavendra <vigneshr@ti.com>,
Tero Kristo <kristo@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
Alexandre Torgue <alexandre.torgue@foss.st.com>,
<linux-crypto@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<devicetree@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-stm32@st-md-mailman.stormreply.com>
Subject: Re: [EXTERNAL] Re: [EXTERNAL] Re: [PATCH v2 0/6] Add support for Texas Instruments MCRC64 engine
Date: Wed, 30 Aug 2023 20:16:26 +0530
Message-ID: <87pm34d1vh.fsf@kamlesh.i-did-not-set--mail-host-address--so-tickle-me>
In-Reply-To: <20230822051710.GC1661@sol.localdomain>
Somehow a couple of my earlier mails got blocked by the mailing list,
because of table formatting, I guess. Resending. Please accept my
apologies for the spam.

Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote:
>> Hi Eric,
>>
>> We are more interested in offload than performance, with splice system
>> call and DMA mode in driver(will be implemented after this series gets
>> merged), good amount of cpu cycles will be saved.
>
> So it's for power usage, then? Or freeing up CPU for other tasks?
It's for freeing up the CPU for other tasks.
>
>> There is one more mode(auto mode) in mcrc64 which helps to verify crc64
>> values against pre calculated crc64, saving the efforts of comparing in
>> userspace.
>
> Is there any path forward to actually support this?
>
>>
>> Current generic implementation of crc64-iso(part of this series)
>> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812
>> Mb/s when tested with tcrypt.
>
> This doesn't answer my question, which to reiterate was:
>
> How does performance compare to a properly optimized software CRC
> implementation on your platform, i.e. an implementation using carryless
> multiplication instructions (e.g. ARMv8 CE) if available on your platform,
> otherwise an implementation using the slice-by-8 or slice-by-16 method?
>
> The implementation you tested was slice-by-1. Compared to that, it's common for
> slice-by-8 to speed up CRCs by about 4 times and for folding with carryless
> multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory
> bandwidth. I don't know what specific results you would get on your specific
> CPU and for this specific CRC, and you could certainly see something different
> if you e.g. have some low-end embedded CPU. But those are the typical results
> I've seen for other CRCs on different CPUs. So, a software implementation may
> be more attractive than you realize. It could very well be the case that a
> PMULL based CRC implementation actually ends up with less CPU load than your
> "hardware offload", when taking into syscall, algif_hash, and driver overhead...
>
> - Eric
Hi Eric, thanks for your detailed and valuable input.

As per your suggestion, we did some profiling.
The use case is calculating crc32/crc64 for file input from user space.

Instead of directly implementing a PMULL based CRC64, we first compared
the following cases:

Case 1.
CRC32 (splice() + kernel space SW driver)
https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21

Case 2.
CRC32 (mmap() + userspace armv8 crc32 instruction implementation)
(tried read() as well to get the contents of the file, but that lost to
mmap(), so not mentioning the numbers here)
https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d

Case 3.
CRC64 (splice() + MCRC64 HW)
https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0
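
For readers without access to the gists, the userspace side of Case 2
can be sketched roughly as below. This is only an illustration, not the
program that was measured: it is Python rather than C, zlib.crc32()
stands in for the armv8 crc32 instructions, and the function name
crc32_of_file is made up for this example.

```python
import mmap
import os
import zlib


def crc32_of_file(path):
    """Return the CRC32 of a file, reading it through a read-only mmap().

    zlib.crc32() is a software stand-in (assumption) for the armv8 crc32
    instructions used in the measured Case 2 program.
    """
    with open(path, "rb") as f:
        # mmap() cannot map an empty file, so handle that case directly.
        if os.fstat(f.fileno()).st_size == 0:
            return zlib.crc32(b"")
        # Length 0 maps the whole file; the CRC runs over the mapping
        # without first copying the file into a userspace buffer.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return zlib.crc32(m)
```

The point of mmap() over read() here is only that the file contents do
not need an extra copy into a buffer before the CRC runs over them.
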

Overall, the overhead of userspace + af_alg + driver in Case 1 and
Case 3 is ~0.025s, which is constant for any file size.
This is calculated as:

  real time to calculate crc
  - driver time (time spent inside init() + update() + final())
  = overhead (~0.025s)

Here, if we assume that a crc64 PMULL implementation would give numbers
similar to crc32 (Case 2), we save a good number of CPU cycles using
mcrc64 for files bigger than 5-10 MB, as most of the time is spent in
the HW offload.
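
The break-even argument can be written out as a small model. This is a
rough sketch under stated assumptions, not a measurement: it treats the
~0.025s overhead as the only CPU time the offload path costs, and it
takes the software CRC throughput as an input, since the measured
figures are in the gists rather than in this mail. The function names
and the 300 MB/s figure in the example call are hypothetical.

```python
OVERHEAD_S = 0.025  # userspace + af_alg + driver overhead reported above


def overhead_s(real_s, driver_s):
    """Overhead as defined above: real time minus time spent in the driver."""
    return real_s - driver_s


def break_even_bytes(sw_bytes_per_s, overhead=OVERHEAD_S):
    """File size above which offload costs less CPU time than software CRC.

    Simplifying assumptions (not from the measurements):
      software CPU time = size / sw_bytes_per_s
      offload CPU time  = overhead, constant for any file size
    Setting the two equal gives size = overhead * sw_bytes_per_s.
    """
    return overhead * sw_bytes_per_s


# Hypothetical software throughput, chosen only for illustration.
print(break_even_bytes(300e6) / 1e6, "MB")
```

Under this model a faster software CRC pushes the break-even file size
up in direct proportion, so the real crossover depends on what a PMULL
based crc64 would actually achieve on this platform.
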
Comparison table:
https://gist.github.com/ti-kamlesh/8117b6f7120960a71541ab67c671602a