* [PATCH v2] ext4: improve str2hashbuf by processing 4-byte chunks and removing function pointers
From: Guan-Chun Wu @ 2025-11-22 4:39 UTC (permalink / raw)
To: tytso, adilger.kernel
Cc: linux-ext4, linux-kernel, visitorckw, david.laight.linux,
Guan-Chun Wu

The original byte-by-byte implementation with modulo checks is less
efficient. Refactor str2hashbuf_unsigned() and str2hashbuf_signed()
to process input in explicit 4-byte chunks instead of using a
modulus-based loop to emit words byte by byte.

Additionally, the use of function pointers for selecting the appropriate
str2hashbuf implementation has been removed. Instead, the functions are
directly invoked based on the hash type, eliminating the overhead of
dynamic function calls.

Performance test (x86_64, Intel Core i7-10700 @ 2.90GHz, average over 10000
runs, using kernel module for testing):

 len | orig_s | new_s | orig_u | new_u
-----+--------+-------+--------+-------
   1 |     70 |    71 |     63 |    63
   8 |     68 |    64 |     64 |    62
  32 |     75 |    70 |     75 |    63
  64 |     96 |    71 |    100 |    68
 255 |    192 |   108 |    187 |    84

This change improves performance, especially for larger input sizes.

Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
---
v1 -> v2:
- Drop redundant (int) casts.
- Replace indirect calls with simple conditionals.
- Use get_unaligned_be32() instead of manual byte extraction.
- Link to v1: https://lore.kernel.org/lkml/20251116130105.1988020-1-409411716@gms.tku.edu.tw/
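
Note on the get_unaligned_be32() item above: the following is a minimal
userspace sketch, not kernel code, of why a big-endian 32-bit load yields
the same word as four iterations of the old "val = ucp[i] + (val << 8)"
loop. load_be32() is a hypothetical stand-in for get_unaligned_be32();
old_loop_word() and words_match() are illustrative names that do not
appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's get_unaligned_be32(). */
uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Four iterations of the pre-patch loop body, starting from the pad word. */
uint32_t old_loop_word(const unsigned char *p, uint32_t pad)
{
	uint32_t val = pad;
	int i;

	for (i = 0; i < 4; i++)
		val = p[i] + (val << 8);
	return val;
}

/* Returns 1 if both forms agree for the given 4 bytes and pad value. */
int words_match(const unsigned char *p, uint32_t pad)
{
	return load_be32(p) == old_loop_word(p, pad);
}
```

After four 8-bit shifts every bit of the initial pad value has left the
32-bit word, so the starting value of val does not affect a full chunk.
That is why the patch can assign val = pad only once, after the chunk loop.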
---
fs/ext4/hash.c | 64 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 22 deletions(-)
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index 33cd5b6b02d5..97b7a3b0603e 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -9,6 +9,7 @@
#include <linux/unicode.h>
#include <linux/compiler.h>
#include <linux/bitops.h>
+#include <linux/unaligned.h>
#include "ext4.h"
#define DELTA 0x9E3779B9
@@ -141,21 +142,28 @@ static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
- val = pad;
if (len > num*4)
len = num * 4;
- for (i = 0; i < len; i++) {
- val = ((int) scp[i]) + (val << 8);
- if ((i % 4) == 3) {
- *buf++ = val;
- val = pad;
- num--;
- }
+
+ while (len >= 4) {
+ val = (scp[0] << 24) + (scp[1] << 16) + (scp[2] << 8) + scp[3];
+ *buf++ = val;
+ scp += 4;
+ len -= 4;
+ num--;
}
+
+ val = pad;
+
+ for (i = 0; i < len; i++)
+ val = scp[i] + (val << 8);
+
if (--num >= 0)
*buf++ = val;
+
while (--num >= 0)
*buf++ = pad;
+
}
static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
@@ -167,21 +175,28 @@ static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
- val = pad;
if (len > num*4)
len = num * 4;
- for (i = 0; i < len; i++) {
- val = ((int) ucp[i]) + (val << 8);
- if ((i % 4) == 3) {
- *buf++ = val;
- val = pad;
- num--;
- }
+
+ while (len >= 4) {
+ val = get_unaligned_be32(ucp);
+ *buf++ = val;
+ ucp += 4;
+ len -= 4;
+ num--;
}
+
+ val = pad;
+
+ for (i = 0; i < len; i++)
+ val = ucp[i] + (val << 8);
+
if (--num >= 0)
*buf++ = val;
+
while (--num >= 0)
*buf++ = pad;
+
}
/*
@@ -205,8 +220,7 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
const char *p;
int i;
__u32 in[8], buf[4];
- void (*str2hashbuf)(const char *, int, __u32 *, int) =
- str2hashbuf_signed;
+ bool use_unsigned = false;
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
@@ -232,12 +246,15 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
- str2hashbuf = str2hashbuf_unsigned;
+ use_unsigned = true;
fallthrough;
case DX_HASH_HALF_MD4:
p = name;
while (len > 0) {
- (*str2hashbuf)(p, len, in, 8);
+ if (use_unsigned)
+ str2hashbuf_unsigned(p, len, in, 8);
+ else
+ str2hashbuf_signed(p, len, in, 8);
half_md4_transform(buf, in);
len -= 32;
p += 32;
@@ -246,12 +263,15 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
hash = buf[1];
break;
case DX_HASH_TEA_UNSIGNED:
- str2hashbuf = str2hashbuf_unsigned;
+ use_unsigned = true;
fallthrough;
case DX_HASH_TEA:
p = name;
while (len > 0) {
- (*str2hashbuf)(p, len, in, 4);
+ if (use_unsigned)
+ str2hashbuf_unsigned(p, len, in, 4);
+ else
+ str2hashbuf_signed(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
--
2.34.1
* Re: [PATCH v2] ext4: improve str2hashbuf by processing 4-byte chunks and removing function pointers
From: Theodore Tso @ 2026-04-09 14:10 UTC (permalink / raw)
To: Guan-Chun Wu
Cc: adilger.kernel, linux-ext4, linux-kernel, visitorckw,
david.laight.linux
On Sat, Nov 22, 2025 at 12:39:29PM +0800, Guan-Chun Wu wrote:
> The original byte-by-byte implementation with modulo checks is less
> efficient. Refactor str2hashbuf_unsigned() and str2hashbuf_signed()
> to process input in explicit 4-byte chunks instead of using a
> modulus-based loop to emit words byte by byte.
>
> Additionally, the use of function pointers for selecting the appropriate
> str2hashbuf implementation has been removed. Instead, the functions are
> directly invoked based on the hash type, eliminating the overhead of
> dynamic function calls.
>
> Performance test (x86_64, Intel Core i7-10700 @ 2.90GHz, average over 10000
> runs, using kernel module for testing):
>
>  len | orig_s | new_s | orig_u | new_u
> -----+--------+-------+--------+-------
>    1 |     70 |    71 |     63 |    63
>    8 |     68 |    64 |     64 |    62
>   32 |     75 |    70 |     75 |    63
>   64 |     96 |    71 |    100 |    68
>  255 |    192 |   108 |    187 |    84
>
> This change improves performance, especially for larger input sizes.
>
> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
Apologies for the delay in looking at this. It fell through the
cracks on my end.

Because I'm a bit late reviewing patches before the merge window, I'm
going to be very conservative about which patches I land. So this is
going to be deferred until the next cycle, but I wanted to let you know
that I haven't forgotten about it.

If there were a comprehensive set of KUnit tests for fs/ext4/hash.c, I
might have taken it. That's something that I would look at adding
for the next cycle, but if you'd be interested in creating the KUnit
tests for hash.c, that would be great.

- Ted
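
For illustration only, here is a userspace sketch of the kind of
equivalence check such tests might perform. It is not KUnit code and not
part of the patch. It runs the pre-patch and post-patch loop shapes on
the same input and compares the output words. All names here
(old_str2hashbuf(), new_str2hashbuf(), byte_at(), make_pad(),
count_mismatches(), and the is_signed flag) are hypothetical. The sketch
folds the signed and unsigned variants into one function for brevity,
and uses uint32_t casts where the patch shifts signed char values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read one byte as signed or unsigned, mirroring the scp/ucp pointers. */
static int byte_at(const char *msg, int i, int is_signed)
{
	return is_signed ? (int)((const signed char *)msg)[i]
			 : (int)((const unsigned char *)msg)[i];
}

/* Pad word: the low byte of len replicated into all four bytes. */
static uint32_t make_pad(int len)
{
	uint32_t pad = (uint32_t)len | ((uint32_t)len << 8);

	return pad | (pad << 16);
}

/* Pre-patch shape: one loop, emitting a word whenever i % 4 == 3. */
void old_str2hashbuf(const char *msg, int len, uint32_t *buf, int num,
		     int is_signed)
{
	uint32_t pad = make_pad(len), val = pad;
	int i;

	if (len > num * 4)
		len = num * 4;
	for (i = 0; i < len; i++) {
		val = (uint32_t)byte_at(msg, i, is_signed) + (val << 8);
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

/* Post-patch shape: whole 4-byte chunks first, then the padded tail. */
void new_str2hashbuf(const char *msg, int len, uint32_t *buf, int num,
		     int is_signed)
{
	uint32_t pad = make_pad(len), val;
	int i;

	if (len > num * 4)
		len = num * 4;
	while (len >= 4) {
		/* uint32_t casts keep the shifts well defined in plain C. */
		val = ((uint32_t)byte_at(msg, 0, is_signed) << 24) +
		      ((uint32_t)byte_at(msg, 1, is_signed) << 16) +
		      ((uint32_t)byte_at(msg, 2, is_signed) << 8) +
		      (uint32_t)byte_at(msg, 3, is_signed);
		*buf++ = val;
		msg += 4;
		len -= 4;
		num--;
	}
	val = pad;
	for (i = 0; i < len; i++)
		val = (uint32_t)byte_at(msg, i, is_signed) + (val << 8);
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

/* Count mismatches over every length 0..255 for num = 4 and num = 8. */
int count_mismatches(void)
{
	char name[255];
	uint32_t a[8], b[8];
	int len, num, is_signed, i, bad = 0;

	/* Deterministic filler that includes bytes >= 0x80. */
	for (i = 0; i < 255; i++)
		name[i] = (char)(unsigned char)(i * 37 + 0x80);

	for (is_signed = 0; is_signed <= 1; is_signed++)
		for (num = 4; num <= 8; num += 4)
			for (len = 0; len <= 255; len++) {
				old_str2hashbuf(name, len, a, num, is_signed);
				new_str2hashbuf(name, len, b, num, is_signed);
				if (memcmp(a, b, num * sizeof(a[0])) != 0)
					bad++;
			}
	return bad;
}
```

Calling count_mismatches() from a small main() and checking for a zero
return exercises both loop shapes. The filler deliberately contains bytes
with the high bit set, since those are the inputs on which the signed and
unsigned variants produce different words.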