From: "Darrick J. Wong" <djwong@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
"Darrick J. Wong" <djwong@us.ibm.com>
Cc: Theodore Tso <tytso@mit.edu>,
Joakim Tjernlund <joakim.tjernlund@transmode.se>,
Bob Pearson <rpearson@systemfabricworks.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Andreas Dilger <adilger.kernel@dilger.ca>,
linux-crypto <linux-crypto@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Mingming Cao <cmm@us.ibm.com>,
linux-ext4@vger.kernel.org
Subject: [PATCH 09/14] Add two changes that improve the performance of x86 systems
Date: Mon, 28 Nov 2011 14:38:07 -0800
Message-ID: <20111128223807.28705.45122.stgit@elm3c44.beaverton.ibm.com>
In-Reply-To: <20111128223659.28705.56719.stgit@elm3c44.beaverton.ibm.com>

1. Replace the main loop with an incrementing counter.
   This change improves the performance of the selftest
   by about 5-6% on Nehalem CPUs. The apparent
   reason is that the compiler can use the loop index
   to perform an indexed memory access. The change is
   reported to make performance worse on PowerPC CPUs,
   so it is enabled only under CONFIG_X86.

2. Replace the rem_len loop with an incrementing counter.
   This change improves the performance of the selftest
   by about 1-2% on x86 CPUs. The selftest exercises this
   loop more often than usual; in actual workloads the
   length is most often a multiple of 4 bytes, so this
   code is executed rarely, if at all. Again, the change
   is reported to make performance worse on PowerPC.

Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
---
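The two loop forms discussed above can be compared outside the kernel
with a minimal userspace sketch. It is not the lib/crc32.c code: it
assumes a plain byte-at-a-time table lookup, and init_tab(), step(),
crc_down(), crc_up() and check_equivalence() are hypothetical names
chosen for illustration. It uses post-increment rather than the
kernel's pre-increment so that no pointer is formed before the start
of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t tab[256];

/* Build the byte-wise table for the reflected CRC-32 polynomial. */
static void init_tab(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;

		for (int k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0xedb88320u : c >> 1;
		tab[n] = c;
	}
}

/* One table-driven step, standing in for the kernel's DO_CRC(). */
static uint32_t step(uint32_t crc, unsigned char x)
{
	return tab[(crc ^ x) & 0xff] ^ (crc >> 8);
}

/* Decrementing form: len doubles as the loop counter. */
static uint32_t crc_down(uint32_t crc, const unsigned char *p, size_t len)
{
	for (; len; --len)
		crc = step(crc, *p++);
	return crc;
}

/* Incrementing form: a separate index counts up to len. */
static uint32_t crc_up(uint32_t crc, const unsigned char *p, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		crc = step(crc, *p++);
	return crc;
}

/* Both forms must agree for every prefix length, including zero. */
static void check_equivalence(void)
{
	static const unsigned char msg[] = "123456789";
	size_t n;

	init_tab();
	for (n = 0; n <= 9; n++)
		assert(crc_down(~0u, msg, n) == crc_up(~0u, msg, n));
}
```

Calling check_equivalence() confirms that both forms return the same
value for every prefix of the test string. Which form runs faster
depends on the compiler and the CPU, so it has to be measured on each
target.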
lib/crc32.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/lib/crc32.c b/lib/crc32.c
index 6311712..2c8e8c0 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -66,6 +66,9 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
# endif
const u32 *b;
size_t rem_len;
+# ifdef CONFIG_X86
+ size_t i;
+# endif
const u32 *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];
const u32 *t4 = tab[4], *t5 = tab[5], *t6 = tab[6], *t7 = tab[7];
u32 q;
@@ -86,7 +89,12 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
# endif
b = (const u32 *)buf;
+# ifdef CONFIG_X86
+ --b;
+ for (i = 0; i < len; i++) {
+# else
for (--b; len; --len) {
+# endif
q = crc ^ *++b; /* use pre increment for speed */
# if CRC_LE_BITS == 32
crc = DO_CRC4;
@@ -100,9 +108,14 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
/* And the last few bytes */
if (len) {
u8 *p = (u8 *)(b + 1) - 1;
+# ifdef CONFIG_X86
+ for (i = 0; i < len; i++)
+ DO_CRC(*++p); /* use pre increment for speed */
+# else
do {
DO_CRC(*++p); /* use pre increment for speed */
} while (--len);
+# endif
}
return crc;
#undef DO_CRC
Thread overview: 17+ messages
2011-11-28 22:36 [PATCH v5.1 00/14] crc32c: Add faster algorithm and self-test code Darrick J. Wong
2011-11-28 22:37 ` [PATCH 01/14] removed two instances of trailing whitespaces Darrick J. Wong
2011-11-28 22:37 ` [PATCH 02/14] Moved a long comment from lib/crc32.c to Documentation/crc32.txt Darrick J. Wong
2011-11-28 22:37 ` [PATCH 03/14] Replaced the unit test provided in crc32.c, which doesn't have a Darrick J. Wong
2011-11-28 22:37 ` [PATCH 04/14] Replace 2D array references by pointer references in loops Darrick J. Wong
2011-11-28 22:37 ` [PATCH 05/14] Misc cleanup of lib/crc32.c and related files Darrick J. Wong
2011-11-28 22:37 ` [PATCH 06/14] crc32.c in its original version freely mixed u32, __le32 and __be32 types Darrick J. Wong
2011-11-28 22:37 ` [PATCH 07/14] crc32.c provides a choice of one of several algorithms for Darrick J. Wong
2011-11-28 22:37 ` [PATCH 08/14] add slicing-by-8 algorithm to the existing Darrick J. Wong
2011-11-28 22:38 ` [PATCH 09/14] Add two changes that improve the performance of x86 systems Darrick J. Wong
2011-11-28 22:38 ` [PATCH 10/14] Some final changes Darrick J. Wong
2011-11-28 22:38 ` [PATCH 11/14] crc32: Bolt on crc32c Darrick J. Wong
2011-11-28 22:38 ` [PATCH 12/14] crypto: crc32c should use library implementation Darrick J. Wong
2011-11-28 22:38 ` [PATCH 13/14] crc32: Add self-test code for crc32c Darrick J. Wong
2011-11-28 22:38 ` [PATCH 14/14] crc32: Select an algorithm via kconfig Darrick J. Wong
2011-11-30 22:29 ` [PATCH v5.1 00/14] crc32c: Add faster algorithm and self-test code Andrew Morton
2011-12-01 20:12 ` Darrick J. Wong