linux-ext4.vger.kernel.org archive mirror
* [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code
@ 2011-12-01 20:13 Darrick J. Wong
  2011-12-01 20:13 ` [PATCH 01/14] crc32: removed two instances of trailing whitespaces Darrick J. Wong
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Darrick J. Wong @ 2011-12-01 20:13 UTC
  To: Andrew Morton
  Cc: Theodore Tso, Joakim Tjernlund, Bob Pearson, linux-kernel,
	Andreas Dilger, linux-crypto, linux-fsdevel, Mingming Cao,
	linux-ext4, Herbert Xu

Hi all,

This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
software crc32c implementation.  It removes the crc32c implementation in
crypto/ in favor of using the stamped-out one in lib/.  There is also a change
to Kconfig so that the kernel builder can pick an implementation best suited
for the hardware.
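
For readers unfamiliar with the technique, here is a minimal userspace sketch
of slice-by-8 applied to crc32c.  It is not the code from these patches: the
function and table names are hypothetical, the tables are built at run time,
and the input is assembled byte by byte as little-endian words so the sketch
does not depend on host endianness or alignment.  Like the kernel's crc32c
interface, the functions take a seed and apply no implicit inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_LE 0x82F63B78u	/* Castagnoli polynomial, bit-reversed */

static uint32_t crc32c_table[8][256];	/* hypothetical name, filled at run time */

/* Reference implementation: consume the input one bit at a time. */
static uint32_t crc32c_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
	}
	return crc;
}

/* table[t][b] is the crc of byte b followed by t zero bytes. */
static void crc32c_init_tables(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int j = 0; j < 8; j++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
		crc32c_table[0][i] = crc;
	}
	for (uint32_t i = 0; i < 256; i++)
		for (int t = 1; t < 8; t++) {
			uint32_t prev = crc32c_table[t - 1][i];

			crc32c_table[t][i] = (prev >> 8) ^ crc32c_table[0][prev & 0xff];
		}
}

/* Assemble four bytes as a little-endian 32-bit word. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Slice-by-8: eight table lookups per eight input bytes. */
static uint32_t crc32c_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len >= 8) {
		uint32_t lo = crc ^ load_le32(p);
		uint32_t hi = load_le32(p + 4);

		crc = crc32c_table[7][lo & 0xff] ^
		      crc32c_table[6][(lo >> 8) & 0xff] ^
		      crc32c_table[5][(lo >> 16) & 0xff] ^
		      crc32c_table[4][lo >> 24] ^
		      crc32c_table[3][hi & 0xff] ^
		      crc32c_table[2][(hi >> 8) & 0xff] ^
		      crc32c_table[1][(hi >> 16) & 0xff] ^
		      crc32c_table[0][hi >> 24];
		p += 8;
		len -= 8;
	}
	while (len--)	/* leftover bytes: classic one-table loop */
		crc = (crc >> 8) ^ crc32c_table[0][(crc ^ *p++) & 0xff];
	return crc;
}

/* Both variants must produce the standard CRC-32C check value. */
static void crc32c_selfcheck(void)
{
	const unsigned char msg[] = "123456789";

	crc32c_init_tables();
	assert((crc32c_bitwise(~0u, msg, 9) ^ ~0u) == 0xE3069283u);
	assert((crc32c_slice8(~0u, msg, 9) ^ ~0u) == 0xE3069283u);
}
```

Calling crc32c_selfcheck() from a test program exercises both paths.  The
speedup comes from replacing the per-bit (or per-byte) loop with eight
independent table lookups per 8-byte chunk, at the cost of 8 KiB of tables.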

The motivation for this patchset is that I am working on adding full metadata
checksumming to ext4.  As far as performance impact of adding checksumming
goes, I see nearly no change with a standard mail server ffsb simulation.  On a
test that involves only file creation and deletion and extent tree writes, I
see a drop of about 50 percent with the current kernel crc32c implementation;
this improves to a drop of about 20 percent with the enclosed crc32c code.

Because metadata is usually a small fraction of total IO, this new
implementation doesn't help much in the common case.  However, when we are
doing IO that is almost all metadata (such as rm -rf'ing a tree), this patch
speeds up the operation substantially.

Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
should improve their speed as well.  I have not yet quantified that, however.
This latest submission combines Bob's patches from late August 2011 with mine
so that they can be one coherent patch set.  Please excuse my inability to
combine some of the patches; I've been advised to leave Bob's patches alone and
build atop them instead. :/

Since the last posting, I've also collected some crc32c test results on a bunch
of different x86/powerpc/sparc platforms.  The results can be viewed here:
http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c" columns describe the
performance of the kernel's current crc32 and crc32c software implementations.
The "crc32c-by8-le" column shows crc32c performance with this patchset applied.
I expect crc32 performance to be roughly the same.

The two _boost columns at the right side of the spreadsheet show how much
faster the new implementation is than the old one.  As you can see, crc32 rises
substantially, and crc32c experiences a huge increase.  I'm hoping this patch
set meets with everyone's approval and can go in soon.  Herbert Xu didn't
appear to have any strong objections to last month's posting, so I'm wondering
if Andrew has an opinion?

v2: Use the crypto testmgr api for self-test.
v3: Get rid of the -be version, which had no users.
v4: Allow kernel builder a choice of speed vs. space optimization.
v5: Reuse lib/crc32 for crc32c as well, and make crypto/crc32c use lib/crc32.c.
v5.1: Include Bob Pearson's patches in submission request.
v5.2: Fix changelogs for Bob's patches per akpm request.

--D


* [PATCH v5.3 00/14] crc32c: Add faster algorithm and self-test code
@ 2012-01-07  5:50 Darrick J. Wong
  2012-01-07  5:51 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
  0 siblings, 1 reply; 29+ messages in thread
From: Darrick J. Wong @ 2012-01-07  5:50 UTC
  To: Andrew Morton, Herbert Xu, Darrick J. Wong
  Cc: Theodore Tso, Joakim Tjernlund, Bob Pearson, linux-kernel,
	Andreas Dilger, linux-crypto, linux-fsdevel, Mingming Cao,
	linux-ext4

Hi all,

This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
software crc32c implementation.  It removes the crc32c implementation in
crypto/ in favor of using the stamped-out one in lib/.  There is also a change
to Kconfig so that the kernel builder can pick an implementation best suited
for the hardware.

The motivation for this patchset is that I am working on adding full metadata
checksumming to ext4.  As far as performance impact of adding checksumming
goes, I see nearly no change with a standard mail server ffsb simulation.  On a
test that involves only file creation and deletion and extent tree writes, I
see a drop of about 50 percent with the current kernel crc32c implementation;
this improves to a drop of about 20 percent with the enclosed crc32c code.

Because metadata is usually a small fraction of total IO, this new
implementation doesn't help much in the common case.  However, when we are
doing IO that is almost all metadata (such as rm -rf'ing a tree), this patch
speeds up the operation substantially.

Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
should improve their speed as well.  I have not yet quantified that, however.
This latest submission combines Bob's patches from late August 2011 with mine
so that they can be one coherent patch set.  Please excuse my inability to
combine some of the patches; I've been advised to leave Bob's patches alone and
build atop them instead. :/

Since the last posting, I've also collected some crc32c test results on a bunch
of different x86/powerpc/sparc platforms.  The results can be viewed here:
http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c" columns describe the
performance of the kernel's current crc32 and crc32c software implementations.
The "crc32c-by8-le" column shows crc32c performance with this patchset applied.
I expect crc32 performance to be roughly the same.

The two _boost columns at the right side of the spreadsheet show how much
faster the new implementation is than the old one.  As you can see, crc32 rises
substantially, and crc32c experiences a huge increase.

Since this patch has been out for review for several weeks now without
objections, can this go into 3.3, please?

v2: Use the crypto testmgr api for self-test.
v3: Get rid of the -be version, which had no users.
v4: Allow kernel builder a choice of speed vs. space optimization.
v5: Reuse lib/crc32 for crc32c as well, and make crypto/crc32c use lib/crc32.c.
v5.1: Include Bob Pearson's patches in submission request.
v5.2: Fix changelogs for Bob's patches per akpm request.
v5.3: Fix "From:" header bug in patch mail generation scripts.

--D

* [PATCH v5.1 00/14] crc32c: Add faster algorithm and self-test code
@ 2011-11-28 22:36 Darrick J. Wong
  2011-11-28 22:38 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
  0 siblings, 1 reply; 29+ messages in thread
From: Darrick J. Wong @ 2011-11-28 22:36 UTC
  To: Andrew Morton, Herbert Xu, Darrick J. Wong
  Cc: Theodore Tso, Joakim Tjernlund, Bob Pearson, linux-kernel,
	Andreas Dilger, linux-crypto, linux-fsdevel, Mingming Cao,
	linux-ext4

Hi all,

This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
software crc32c implementation.  It removes the crc32c implementation in
crypto/ in favor of using the stamped-out one in lib/.  There is also a change
to Kconfig so that the kernel builder can pick an implementation best suited
for the hardware.

The motivation for this patchset is that I am working on adding full metadata
checksumming to ext4.  As far as performance impact of adding checksumming
goes, I see nearly no change with a standard mail server ffsb simulation.  On a
test that involves only file creation and deletion and extent tree writes, I
see a drop of about 50 percent with the current kernel crc32c implementation;
this improves to a drop of about 20 percent with the enclosed crc32c code.

Because metadata is usually a small fraction of total IO, this new
implementation doesn't help much in the common case.  However, when we are
doing IO that is almost all metadata (such as rm -rf'ing a tree), this patch
speeds up the operation substantially.

Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
should improve their speed as well.  I have not yet quantified that, however.
This latest submission combines Bob's patches from late August 2011 with mine
so that they can be one coherent patch set.  Please excuse my inability to
combine some of the patches; I've been advised to leave Bob's patches alone and
build atop them instead. :/

Since the last posting, I've also collected some crc32c test results on a bunch
of different x86/powerpc/sparc platforms.  The results can be viewed here:
http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c" columns describe the
performance of the kernel's current crc32 and crc32c software implementations.
The "crc32c-by8-le" column shows crc32c performance with this patchset applied.
I expect crc32 performance to be roughly the same.

The two _boost columns at the right side of the spreadsheet show how much
faster the new implementation is than the old one.  As you can see, crc32 rises
substantially, and crc32c experiences a huge increase.  I'm hoping this patch
set meets with everyone's approval and can go in soon.  Herbert Xu didn't
appear to have any strong objections to last month's posting, so I'm wondering
if Andrew has an opinion?

--D



Thread overview: 29+ messages
2011-12-01 20:13 [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code Darrick J. Wong
2011-12-01 20:13 ` [PATCH 01/14] crc32: removed two instances of trailing whitespaces Darrick J. Wong
2011-12-01 20:13 ` [PATCH 02/14] crc32: Move long comment about crc32 fundamentals to Documentation/ Darrick J. Wong
2011-12-01 20:14 ` [PATCH 03/14] crc32: Simplify unit test code Darrick J. Wong
2011-12-01 20:14 ` [PATCH 04/14] crc32: Speed up memory table access on powerpc Darrick J. Wong
2011-12-01 20:14 ` [PATCH 05/14] crc32: Miscellaneous cleanups Darrick J. Wong
2011-12-01 20:14 ` [PATCH 06/14] crc32: Fix mixing of endian-specific types Darrick J. Wong
2011-12-01 20:14 ` [PATCH 07/14] crc32: Make CRC_*_BITS definition correspond to actual bit counts Darrick J. Wong
2011-12-01 20:14 ` [PATCH 08/14] crc32: Add slice-by-8 algorithm to existing code Darrick J. Wong
2011-12-01 20:14 ` [PATCH 09/14] crc32: Optimize loop counter for x86 Darrick J. Wong
2011-12-01 20:14 ` [PATCH 10/14] crc32: Add note about this patchset to crc32.c Darrick J. Wong
2011-12-01 20:14 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
2011-12-01 20:15 ` [PATCH 12/14] crypto: crc32c should use library implementation Darrick J. Wong
2011-12-01 20:15 ` [PATCH 13/14] crc32: Add self-test code for crc32c Darrick J. Wong
2011-12-01 20:15 ` [PATCH 14/14] crc32: Select an algorithm via kconfig Darrick J. Wong
2011-12-02  0:25   ` Herbert Xu
2011-12-03  2:36     ` Darrick J. Wong
2011-12-12 22:58       ` Darrick J. Wong
2011-12-12 23:10         ` Bob Pearson
2011-12-13  6:32           ` Darrick J. Wong
2011-12-13  8:27             ` Joakim Tjernlund
2011-12-13 18:36               ` Darrick J. Wong
2011-12-01 20:20 ` [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code Joel Becker
2011-12-01 20:31   ` Darrick J. Wong
2011-12-02  0:23     ` Herbert Xu
2011-12-03  2:30       ` Darrick J. Wong
2011-12-03 11:00         ` Herbert Xu
  -- strict thread matches above, loose matches on Subject: below --
2012-01-07  5:50 [PATCH v5.3 " Darrick J. Wong
2012-01-07  5:51 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
2011-11-28 22:36 [PATCH v5.1 00/14] crc32c: Add faster algorithm and self-test code Darrick J. Wong
2011-11-28 22:38 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
