From: Daniel Axtens <dja@axtens.net>
To: David Laight <David.Laight@ACULAB.COM>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>
Cc: "anton@samba.org" <anton@samba.org>
Subject: RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
Date: Thu, 16 Mar 2017 09:30:17 +1100
Message-ID: <87efxy41hi.fsf@possimpible.ozlabs.ibm.com>
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DCFFB1A81@AcuExch.aculab.com>
Hi David,
> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like to be convinced that anything does CRC over long enough buffers
> to make it a gain at all.
>
> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.
> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.
Internal testing on an NVMe device with T10DIF enabled on 4k blocks
shows a 20x to 30x improvement. Without these patches, crc_t10dif_generic
uses over 60% of CPU time - with these patches CRC drops to single
digits.
I should probably have led with that, sorry.
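For anyone following along who hasn't looked at what is being computed here, a minimal bit-at-a-time sketch of CRC-T10DIF in plain C is below. It assumes the usual parameters (16-bit, polynomial 0x8BB7, MSB-first, initial value 0, no final XOR). The names crc_t10dif_ref and crc_t10dif_selfcheck are hypothetical and exist only for illustration; this is neither the kernel's generic routine nor the vpmsum code in this series, just the serial per-byte dependency that both have to get around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CRC-T10DIF parameters: 16-bit, polynomial 0x8BB7, MSB-first,
 * initial value 0, no final XOR. */
#define T10DIF_POLY 0x8BB7u

/* Hypothetical reference routine: processes one bit per inner iteration.
 * Each step depends on the previous crc value, so the loop is serial. */
static uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf,
			       size_t len)
{
	while (len--) {
		/* Fold the next input byte into the top of the register. */
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/* Hypothetical self-check against the commonly published check value
 * for the ASCII string "123456789". */
static void crc_t10dif_selfcheck(void)
{
	assert(crc_t10dif_ref(0, (const unsigned char *)"123456789", 9)
	       == 0xD0DB);
}
```

Because the initial value is 0 and there is no final XOR, a buffer can be fed in pieces by passing the previous return value back in as crc, which is handy when checking a faster implementation against this one on 4k blocks.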
FWIW, the original patch showed a 3.7x gain on btrfs as well -
6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
When Anton wrote the original code he had access to IBM's internal
tooling for looking at how instructions flow through the various stages
of the CPU, so I trust it's pretty much optimal from that point of view.
Regards,
Daniel