* Re: [PATCH] ext4: make mballoc max prealloc size configurable
From: Jan Kara @ 2026-04-13 8:37 UTC
To: guzebing
Cc: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
yi.zhang, guzebing, linux-kernel, linux-ext4
In-Reply-To: <20260410035635.1381920-1-guzebing1612@gmail.com>
On Fri 10-04-26 11:56:35, guzebing wrote:
> From: Guzebing <guzebing@bytedance.com>
>
> Add per-superblock sysfs knob mb_max_prealloc_kb (min 8MiB, roundup
> pow2) and use it in request normalization.
>
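For reference, the store-side normalization described above (clamp to a minimum of 8192 KiB, then round up to a power of two) can be sketched in userspace C. This is only an illustrative sketch: normalize_prealloc_kb() and roundup_pow2() are hypothetical names, roundup_pow2() is a simple stand-in for the kernel's roundup_pow_of_two(), and returning 0 on overflow is an assumption standing in for the -EINVAL the patch returns.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long long roundup_pow2(unsigned long long v)
{
	unsigned long long r = 1;

	assert(v > 0);
	while (r < v)
		r <<= 1;
	return r;
}

/*
 * Clamp the requested size (in KiB) to at least 8192, then round it up
 * to a power of two. Returns 0 (assumption: stands in for -EINVAL) when
 * the rounded value no longer fits in an unsigned int.
 */
static unsigned int normalize_prealloc_kb(unsigned int v)
{
	unsigned long long rounded;

	if (v < 8192)
		v = 8192;
	rounded = roundup_pow2(v);
	if (rounded > UINT_MAX)
		return 0;
	return (unsigned int)rounded;
}
```

With this sketch, a request of 4096 is raised to 8192, 20000 is rounded up to 32768, and 32768 is kept unchanged.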
> When multiple tasks write to different files on the same filesystem
> concurrently, each file ends up with 8 MiB extents. If the preallocation
> size is increased, the resulting extent size grows accordingly. Due
> to the readahead mechanism on NVMe SSDs, files with larger extents
> achieve higher sequential read throughput.
>
> On an ext4 filesystem on an NVMe Gen4 data drive, dd read throughput
> for a file with 8 MiB extents is 455 MB/s, while for a file with
> 32 MiB extents it reaches 702 MB/s.
Hum, I think you are not talking about the general Linux readahead code here.
> Steps to reproduce:
> 1. Configure the maximum preallocation size to 8 MiB or 32 MiB:
> echo 8192 > /sys/fs/ext4/nvme13n1/mb_max_prealloc_kb
> echo 32768 > /sys/fs/ext4/nvme13n1/mb_max_prealloc_kb
>
> 2. Run the following commands simultaneously so that the extents of
> the two files are physically interleaved, resulting in 8 MiB or 32 MiB
> extents:
> dd if=/dev/zero of=/mnt/store1/501.txt bs=128K count=80K oflag=direct
> dd if=/dev/zero of=/mnt/store1/502.txt bs=128K count=80K oflag=direct
>
> 3. Read back the file and measure the read throughput:
> dd if=/mnt/store1/501.txt of=/dev/null bs=128K count=80K iflag=direct
OK, seeing that you are using direct IO here, you are likely talking about
some internal mechanism within the SSD that is happier when the IO is more
contiguous in the LBA space?
In general, I don't find the dd example you show very relevant for
performance. If you care about performance, you should be running
multiple direct IO requests in parallel (either with AIO/DIO or with
io_uring), or you should be using buffered IO to do this for you behind
the scenes. So do you have a more realistic use case where the extent
allocation size matters so much, or is this mostly a benchmarking exercise?
Honza
>
> Signed-off-by: Guzebing <guzebing@bytedance.com>
> ---
> Documentation/ABI/testing/sysfs-fs-ext4 | 8 +++++++
> fs/ext4/ext4.h | 1 +
> fs/ext4/mballoc.c | 2 +-
> fs/ext4/super.c | 1 +
> fs/ext4/sysfs.c | 28 ++++++++++++++++++++++++-
> 5 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-fs-ext4 b/Documentation/ABI/testing/sysfs-fs-ext4
> index 2edd0a6672d3a..316ae1d1ec18b 100644
> --- a/Documentation/ABI/testing/sysfs-fs-ext4
> +++ b/Documentation/ABI/testing/sysfs-fs-ext4
> @@ -48,6 +48,14 @@ Description:
> will have its blocks allocated out of its own unique
> preallocation pool.
>
> +What: /sys/fs/ext4/<disk>/mb_max_prealloc_kb
> +Date: April 2026
> +Contact: "Linux Ext4 Development List" <linux-ext4@vger.kernel.org>
> +Description:
> + Maximum size (in kilobytes) used by the multiblock allocator's
> + normalized request preallocation heuristic. Values are rounded
> + up to a power of two and clamped to a minimum of 8192 (8MiB).
> +
> What: /sys/fs/ext4/<disk>/inode_readahead_blks
> Date: March 2008
> Contact: "Theodore Ts'o" <tytso@mit.edu>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 7617e2d454ea5..bce99740740f5 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1634,6 +1634,7 @@ struct ext4_sb_info {
> unsigned int s_mb_best_avail_max_trim_order;
> unsigned int s_sb_update_sec;
> unsigned int s_sb_update_kb;
> + unsigned int s_mb_max_prealloc_kb;
>
> /* where last allocation was done - for stream allocation */
> ext4_group_t *s_mb_last_groups;
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index bb58eafb87bcd..f5f63c56fcdac 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4589,7 +4589,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
> (8<<20)>>bsbits, max, 8 * 1024)) {
> start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
> (23 - bsbits)) << 23;
> - size = 8 * 1024 * 1024;
> + size = (loff_t)sbi->s_mb_max_prealloc_kb << 10;
> } else {
> start_off = (loff_t) ac->ac_o_ex.fe_logical << bsbits;
> size = (loff_t) EXT4_C2B(sbi,
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a34efb44e73d7..f815e31657cc9 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5447,6 +5447,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> sbi->s_stripe = 0;
> }
> sbi->s_extent_max_zeroout_kb = 32;
> + sbi->s_mb_max_prealloc_kb = 8 * 1024;
>
> /*
> * set up enough so that it can read an inode
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 923b375e017fa..6339492eb2fa7 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -10,6 +10,8 @@
>
> #include <linux/time.h>
> #include <linux/fs.h>
> +#include <linux/log2.h>
> +#include <linux/limits.h>
> #include <linux/seq_file.h>
> #include <linux/slab.h>
> #include <linux/proc_fs.h>
> @@ -41,6 +43,7 @@ typedef enum {
> attr_pointer_atomic,
> attr_journal_task,
> attr_err_report_sec,
> + attr_mb_max_prealloc_kb,
> } attr_id_t;
>
> typedef enum {
> @@ -115,6 +118,25 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
> return count;
> }
>
> +static ssize_t mb_max_prealloc_kb_store(struct ext4_sb_info *sbi,
> + const char *buf, size_t count)
> +{
> + unsigned int v;
> + int ret;
> + unsigned long rounded;
> +
> + ret = kstrtouint(skip_spaces(buf), 0, &v);
> + if (ret)
> + return ret;
> + if (v < 8192)
> + v = 8192;
> + rounded = roundup_pow_of_two((unsigned long)v);
> + if (rounded > UINT_MAX)
> + return -EINVAL;
> + sbi->s_mb_max_prealloc_kb = (unsigned int)rounded;
> + return count;
> +}
> +
> static ssize_t trigger_test_error(struct ext4_sb_info *sbi,
> const char *buf, size_t count)
> {
> @@ -288,6 +310,7 @@ EXT4_RW_ATTR_SBI_UI(mb_prefetch_limit, s_mb_prefetch_limit);
> EXT4_RW_ATTR_SBI_UL(last_trim_minblks, s_last_trim_minblks);
> EXT4_RW_ATTR_SBI_UI(sb_update_sec, s_sb_update_sec);
> EXT4_RW_ATTR_SBI_UI(sb_update_kb, s_sb_update_kb);
> +EXT4_ATTR_OFFSET(mb_max_prealloc_kb, 0644, mb_max_prealloc_kb, ext4_sb_info, s_mb_max_prealloc_kb);
>
> static unsigned int old_bump_val = 128;
> EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val);
> @@ -341,6 +364,7 @@ static struct attribute *ext4_attrs[] = {
> ATTR_LIST(last_trim_minblks),
> ATTR_LIST(sb_update_sec),
> ATTR_LIST(sb_update_kb),
> + ATTR_LIST(mb_max_prealloc_kb),
> ATTR_LIST(err_report_sec),
> NULL,
> };
> @@ -431,6 +455,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
> case attr_mb_order:
> case attr_pointer_pi:
> case attr_pointer_ui:
> + case attr_mb_max_prealloc_kb:
> if (a->attr_ptr == ptr_ext4_super_block_offset)
> return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
> return sysfs_emit(buf, "%u\n", *((unsigned int *) ptr));
> @@ -557,6 +582,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
> return reserved_clusters_store(sbi, buf, len);
> case attr_inode_readahead:
> return inode_readahead_blks_store(sbi, buf, len);
> + case attr_mb_max_prealloc_kb:
> + return mb_max_prealloc_kb_store(sbi, buf, len);
> case attr_trigger_test_error:
> return trigger_test_error(sbi, buf, len);
> case attr_err_report_sec:
> @@ -695,4 +722,3 @@ void ext4_exit_sysfs(void)
> remove_proc_entry(proc_dirname, NULL);
> ext4_proc_root = NULL;
> }
> -
> --
> 2.20.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* (no subject)
From: Harry Yoo (Oracle) @ 2026-04-13 7:58 UTC
To: Thomas Gleixner
Cc: LKML, Vlastimil Babka, linux-mm, Arnd Bergmann, x86, Lu Baolu,
iommu, Michael Grzeschik, netdev, linux-wireless, Herbert Xu,
linux-crypto, David Woodhouse, Bernie Thompson, linux-fbdev,
Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux, Hao Li, Christoph Lameter, David Rientjes,
Roman Gushchin, Shengming Hu
Subject: Re: [patch 14/38] slub: Use prandom instead of get_cycles()
In-Reply-To: <20260410120318.525653921@kernel.org>
On Fri, Apr 10, 2026 at 02:19:37PM +0200, Thomas Gleixner wrote:
> The decision whether to scan remote nodes is based on a 'random' number
> retrieved via get_cycles(). get_cycles() is about to be removed.
>
> There is already prandom state in the code, so use that instead.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: linux-mm@kvack.org
> ---
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Is this for this merge window?
This may conflict with upcoming changes on freelist shuffling [1]
(not queued for slab/for-next yet though), but it should be easy to
resolve.
[Cc'ing Shengming and SLAB ALLOCATOR folks]
[1] https://lore.kernel.org/linux-mm/20260409204352095kKWVYKtZImN59ybO6iRNj@zte.com.cn
--
Cheers,
Harry / Hyeonggon
> mm/slub.c | 37 +++++++++++++++++++++++--------------
> 1 file changed, 23 insertions(+), 14 deletions(-)
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3302,6 +3302,25 @@ static inline struct slab *alloc_slab_pa
> return slab;
> }
>
> +#if defined(CONFIG_SLAB_FREELIST_RANDOM) || defined(CONFIG_NUMA)
> +static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
> +
> +static unsigned int slab_get_prandom_state(unsigned int limit)
> +{
> + struct rnd_state *state;
> + unsigned int res;
> +
> + /*
> + * An interrupt or NMI handler might interrupt and change
> + * the state in the middle, but that's safe.
> + */
> + state = &get_cpu_var(slab_rnd_state);
> + res = prandom_u32_state(state) % limit;
> + put_cpu_var(slab_rnd_state);
> + return res;
> +}
> +#endif
> +
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> /* Pre-initialize the random sequence cache */
> static int init_cache_random_seq(struct kmem_cache *s)
> @@ -3365,8 +3384,6 @@ static void *next_freelist_entry(struct
> return (char *)start + idx;
> }
>
> -static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
> -
> /* Shuffle the single linked freelist based on a random pre-computed sequence */
> static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
> bool allow_spin)
> @@ -3383,15 +3400,7 @@ static bool shuffle_freelist(struct kmem
> if (allow_spin) {
> pos = get_random_u32_below(freelist_count);
> } else {
> - struct rnd_state *state;
> -
> - /*
> - * An interrupt or NMI handler might interrupt and change
> - * the state in the middle, but that's safe.
> - */
> - state = &get_cpu_var(slab_rnd_state);
> - pos = prandom_u32_state(state) % freelist_count;
> - put_cpu_var(slab_rnd_state);
> + pos = slab_get_prandom_state(freelist_count);
> }
>
> page_limit = slab->objects * s->size;
> @@ -3882,7 +3891,7 @@ static void *get_from_any_partial(struct
> * with available objects.
> */
> if (!s->remote_node_defrag_ratio ||
> - get_cycles() % 1024 > s->remote_node_defrag_ratio)
> + slab_get_prandom_state(1024) > s->remote_node_defrag_ratio)
> return NULL;
>
> do {
> @@ -7102,7 +7111,7 @@ static unsigned int
>
> /* see get_from_any_partial() for the defrag ratio description */
> if (!s->remote_node_defrag_ratio ||
> - get_cycles() % 1024 > s->remote_node_defrag_ratio)
> + slab_get_prandom_state(1024) > s->remote_node_defrag_ratio)
> return 0;
>
> do {
> @@ -8421,7 +8430,7 @@ void __init kmem_cache_init_late(void)
> flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM | WQ_PERCPU,
> 0);
> WARN_ON(!flushwq);
> -#ifdef CONFIG_SLAB_FREELIST_RANDOM
> +#if defined(CONFIG_SLAB_FREELIST_RANDOM) || defined(CONFIG_NUMA)
> prandom_init_once(&slab_rnd_state);
> #endif
> }
>
>
* [PATCH v3 2/2] ext4: improve str2hashbuf by processing 4-byte chunks and removing function pointers
From: Guan-Chun Wu @ 2026-04-13 6:51 UTC
To: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi
Cc: linux-ext4, linux-kernel, edward062254, visitorckw,
david.laight.linux, Guan-Chun Wu
In-Reply-To: <20260413065114.730231-1-409411716@gms.tku.edu.tw>
The original implementation assembles each word byte by byte and checks
(i % 4) on every iteration, which is inefficient. Refactor
str2hashbuf_unsigned() and str2hashbuf_signed() to process input in
explicit 4-byte chunks instead of using a modulus-based loop to emit
words byte by byte.
Additionally, remove the function pointer used to select the
str2hashbuf implementation. Call the signed or unsigned helper directly
based on the hash type, which avoids the overhead of an indirect call.
Performance test (x86_64, Intel Core i7-10700 @ 2.90GHz, average over 10000
runs, using kernel module for testing):
len | orig_s | new_s | orig_u | new_u
----+--------+-------+--------+-------
  1 |     70 |    71 |     63 |    63
  8 |     68 |    64 |     64 |    62
 32 |     75 |    70 |     75 |    63
 64 |     96 |    71 |    100 |    68
255 |    192 |   108 |    187 |    84
This change improves performance, especially for larger input sizes: at
len 255 the signed variant drops from 192 to 108 and the unsigned variant
from 187 to 84.
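To illustrate why the chunked loop is a drop-in replacement, here is a minimal userspace sketch of the unsigned helper in both forms. It is not the kernel code: the function names str2hashbuf_bytewise(), str2hashbuf_chunked() and variants_agree() are hypothetical, uint32_t stands in for __u32, and the open-coded shift/or expression is an assumed portable stand-in for get_unaligned_be32().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time variant, modelled on the loop the patch removes. */
static void str2hashbuf_bytewise(const char *msg, int len, uint32_t *buf, int num)
{
	const unsigned char *ucp = (const unsigned char *)msg;
	uint32_t pad, val;
	int i;

	pad = (uint32_t)len | ((uint32_t)len << 8);
	pad |= pad << 16;

	val = pad;
	if (len > num * 4)
		len = num * 4;
	for (i = 0; i < len; i++) {
		val = ucp[i] + (val << 8);
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

/* Chunked variant, modelled on the loop the patch adds. */
static void str2hashbuf_chunked(const char *msg, int len, uint32_t *buf, int num)
{
	const unsigned char *ucp = (const unsigned char *)msg;
	uint32_t pad, val;
	int i;

	pad = (uint32_t)len | ((uint32_t)len << 8);
	pad |= pad << 16;

	if (len > num * 4)
		len = num * 4;

	/* Full words: a big-endian load of four bytes. */
	while (len >= 4) {
		val = ((uint32_t)ucp[0] << 24) | ((uint32_t)ucp[1] << 16) |
		      ((uint32_t)ucp[2] << 8) | ucp[3];
		*buf++ = val;
		ucp += 4;
		len -= 4;
		num--;
	}

	/* Tail: at most three bytes shifted into the padding word. */
	val = pad;
	for (i = 0; i < len; i++)
		val = ucp[i] + (val << 8);

	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

/* Returns 1 when both variants fill the first num words identically. */
static int variants_agree(const char *msg, int len, int num)
{
	uint32_t a[8], b[8];

	assert(num >= 0 && num <= 8);
	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	str2hashbuf_bytewise(msg, len, a, num);
	str2hashbuf_chunked(msg, len, b, num);
	return memcmp(a, b, (size_t)num * sizeof(uint32_t)) == 0;
}
```

The two forms agree because, after four bytes have been shifted in, the initial pad value has been shifted out of the 32-bit word entirely, so only the four input bytes remain.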
Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
---
fs/ext4/hash.c | 64 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 22 deletions(-)
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index 48483cd01..c3fb2df44 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -9,6 +9,7 @@
#include <linux/unicode.h>
#include <linux/compiler.h>
#include <linux/bitops.h>
+#include <linux/unaligned.h>
#include "ext4.h"
#define DELTA 0x9E3779B9
@@ -141,21 +142,28 @@ static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
- val = pad;
if (len > num*4)
len = num * 4;
- for (i = 0; i < len; i++) {
- val = ((int) scp[i]) + (val << 8);
- if ((i % 4) == 3) {
- *buf++ = val;
- val = pad;
- num--;
- }
+
+ while (len >= 4) {
+ val = (scp[0] << 24) + (scp[1] << 16) + (scp[2] << 8) + scp[3];
+ *buf++ = val;
+ scp += 4;
+ len -= 4;
+ num--;
}
+
+ val = pad;
+
+ for (i = 0; i < len; i++)
+ val = scp[i] + (val << 8);
+
if (--num >= 0)
*buf++ = val;
+
while (--num >= 0)
*buf++ = pad;
+
}
static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
@@ -167,21 +175,28 @@ static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
- val = pad;
if (len > num*4)
len = num * 4;
- for (i = 0; i < len; i++) {
- val = ((int) ucp[i]) + (val << 8);
- if ((i % 4) == 3) {
- *buf++ = val;
- val = pad;
- num--;
- }
+
+ while (len >= 4) {
+ val = get_unaligned_be32(ucp);
+ *buf++ = val;
+ ucp += 4;
+ len -= 4;
+ num--;
}
+
+ val = pad;
+
+ for (i = 0; i < len; i++)
+ val = ucp[i] + (val << 8);
+
if (--num >= 0)
*buf++ = val;
+
while (--num >= 0)
*buf++ = pad;
+
}
/*
@@ -205,8 +220,7 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
const char *p;
int i;
__u32 in[8], buf[4];
- void (*str2hashbuf)(const char *, int, __u32 *, int) =
- str2hashbuf_signed;
+ bool use_unsigned = false;
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
@@ -232,12 +246,15 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
- str2hashbuf = str2hashbuf_unsigned;
+ use_unsigned = true;
fallthrough;
case DX_HASH_HALF_MD4:
p = name;
while (len > 0) {
- (*str2hashbuf)(p, len, in, 8);
+ if (use_unsigned)
+ str2hashbuf_unsigned(p, len, in, 8);
+ else
+ str2hashbuf_signed(p, len, in, 8);
half_md4_transform(buf, in);
len -= 32;
p += 32;
@@ -246,12 +263,15 @@ static int __ext4fs_dirhash(const struct inode *dir, const char *name, int len,
hash = buf[1];
break;
case DX_HASH_TEA_UNSIGNED:
- str2hashbuf = str2hashbuf_unsigned;
+ use_unsigned = true;
fallthrough;
case DX_HASH_TEA:
p = name;
while (len > 0) {
- (*str2hashbuf)(p, len, in, 4);
+ if (use_unsigned)
+ str2hashbuf_unsigned(p, len, in, 4);
+ else
+ str2hashbuf_signed(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
--
2.34.1
* [PATCH v3 1/2] ext4: add Kunit coverage for directory hash computation
From: Guan-Chun Wu @ 2026-04-13 6:51 UTC
To: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi
Cc: linux-ext4, linux-kernel, edward062254, visitorckw,
david.laight.linux, Guan-Chun Wu
In-Reply-To: <20260413065114.730231-1-409411716@gms.tku.edu.tw>
Introduce KUnit tests for fs/ext4/hash.c to verify ext4fs_dirhash()
across the legacy, half-MD4, and TEA hash variants.
The tests cover empty names, seeded hashing, and non-ASCII names.
They also verify error paths, including invalid hash versions and
SipHash without a configured key, and check that the signed and
unsigned hash variants differ on non-ASCII input as expected.
When CONFIG_UNICODE is enabled, the tests further verify casefolded-name
hashing and the fallback behavior for invalid input.
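The reason the signed and unsigned variants diverge only on non-ASCII input can be shown with a small userspace sketch. It is a simplification under stated assumptions: fold_signed() and fold_unsigned() are hypothetical helpers that fold at most four bytes into one word starting from 0, whereas the real str2hashbuf helpers start from a length-derived pad value.

```c
#include <assert.h>
#include <stdint.h>

/* Fold up to four bytes into one word, sign-extending each byte. */
static uint32_t fold_signed(const char *msg, int len)
{
	const signed char *scp = (const signed char *)msg;
	uint32_t val = 0;
	int i;

	assert(len >= 0 && len <= 4);
	for (i = 0; i < len; i++)
		val = (uint32_t)(int)scp[i] + (val << 8);
	return val;
}

/* Fold up to four bytes into one word, zero-extending each byte. */
static uint32_t fold_unsigned(const char *msg, int len)
{
	const unsigned char *ucp = (const unsigned char *)msg;
	uint32_t val = 0;
	int i;

	assert(len >= 0 && len <= 4);
	for (i = 0; i < len; i++)
		val = (uint32_t)ucp[i] + (val << 8);
	return val;
}
```

For ASCII bytes (below 0x80) both helpers return the same word, so "abc" folds to 0x616263 either way. A byte such as 0x80 is sign-extended to 0xFFFFFF80 in the signed helper but stays 0x80 in the unsigned one, which is why the test vectors for "\x80\x81\x82\x83\x84" differ between the two variants.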
Co-developed-by: Chen Hao Yu <edward062254@gmail.com>
Signed-off-by: Chen Hao Yu <edward062254@gmail.com>
Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
---
fs/ext4/Makefile | 2 +-
fs/ext4/hash-test.c | 546 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 547 insertions(+), 1 deletion(-)
create mode 100644 fs/ext4/hash-test.c
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 3baee4e7c..3f9fc0eb8 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -15,7 +15,7 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
ext4-test-objs += inode-test.o mballoc-test.o \
- extents-test.o
+ extents-test.o hash-test.o
obj-$(CONFIG_EXT4_KUNIT_TESTS) += ext4-test.o
ext4-$(CONFIG_FS_VERITY) += verity.o
ext4-$(CONFIG_FS_ENCRYPTION) += crypto.o
diff --git a/fs/ext4/hash-test.c b/fs/ext4/hash-test.c
new file mode 100644
index 000000000..a151b5684
--- /dev/null
+++ b/fs/ext4/hash-test.c
@@ -0,0 +1,546 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit tests for ext4 directory hash computation.
+ */
+
+#include <kunit/test.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/unicode.h>
+#include "ext4.h"
+
+static void ext4_hash_init_fake_dir(struct inode *dir, struct super_block *sb)
+{
+ memset(sb, 0, sizeof(*sb));
+ memset(dir, 0, sizeof(*dir));
+ dir->i_sb = sb;
+ strscpy(sb->s_id, "kunit-ext4", sizeof(sb->s_id));
+}
+
+static void ext4_hash_init_fake_dir_with_sbi(struct inode *dir,
+ struct super_block *sb,
+ struct ext4_sb_info *sbi)
+{
+ ext4_hash_init_fake_dir(dir, sb);
+ memset(sbi, 0, sizeof(*sbi));
+ sb->s_fs_info = sbi;
+ sbi->s_sb = sb;
+}
+
+#if IS_ENABLED(CONFIG_UNICODE)
+static void ext4_hash_init_fake_ext4_dir(struct ext4_inode_info *ei,
+ struct super_block *sb,
+ struct ext4_sb_info *sbi)
+{
+ memset(sb, 0, sizeof(*sb));
+ memset(ei, 0, sizeof(*ei));
+ memset(sbi, 0, sizeof(*sbi));
+
+ inode_init_once(&ei->vfs_inode);
+ ei->vfs_inode.i_sb = sb;
+ ei->vfs_inode.i_mode = S_IFDIR;
+
+ strscpy(sb->s_id, "kunit-ext4", sizeof(sb->s_id));
+ sb->s_fs_info = sbi;
+ sbi->s_sb = sb;
+}
+#endif
+
+struct ext4_dirhash_test_case {
+ const char *name;
+ u32 hash_version;
+ const char *input;
+ int len;
+ u32 seed[4];
+ bool use_seed;
+ u32 expected_hash;
+ u32 expected_minor_hash;
+};
+
+static const struct ext4_dirhash_test_case ext4_dirhash_test_cases[] = {
+ {
+ .name = "legacy_abc",
+ .hash_version = DX_HASH_LEGACY,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0x75afd992,
+ .expected_minor_hash = 0x00000000,
+ },
+ {
+ .name = "legacy_unsigned_abc",
+ .hash_version = DX_HASH_LEGACY_UNSIGNED,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0x75afd992,
+ .expected_minor_hash = 0x00000000,
+ },
+ {
+ .name = "half_md4_abc",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0xd196a868,
+ .expected_minor_hash = 0xc420eb28,
+ },
+ {
+ .name = "half_md4_unsigned_abc",
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0xd196a868,
+ .expected_minor_hash = 0xc420eb28,
+ },
+ {
+ .name = "tea_abc",
+ .hash_version = DX_HASH_TEA,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0xb1435ec4,
+ .expected_minor_hash = 0x3f7eaa0e,
+ },
+ {
+ .name = "tea_unsigned_abc",
+ .hash_version = DX_HASH_TEA_UNSIGNED,
+ .input = "abc",
+ .len = 3,
+ .use_seed = false,
+ .expected_hash = 0xb1435ec4,
+ .expected_minor_hash = 0x3f7eaa0e,
+ },
+ {
+ .name = "empty_half_md4",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "",
+ .len = 0,
+ .use_seed = false,
+ .expected_hash = 0xefcdab88,
+ .expected_minor_hash = 0x98badcfe,
+ },
+ {
+ .name = "half_md4_31bytes",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "1234567890123456789012345678901",
+ .len = 31,
+ .use_seed = false,
+ .expected_hash = 0xc4db1f78,
+ .expected_minor_hash = 0xea23921b,
+ },
+ {
+ .name = "half_md4_32bytes",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "12345678901234567890123456789012",
+ .len = 32,
+ .use_seed = false,
+ .expected_hash = 0xfa6cc63e,
+ .expected_minor_hash = 0x2f77bd1c,
+ },
+ {
+ .name = "half_md4_33bytes",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "123456789012345678901234567890123",
+ .len = 33,
+ .use_seed = false,
+ .expected_hash = 0xdc0c2dec,
+ .expected_minor_hash = 0x5ca23365,
+ },
+ {
+ .name = "half_md4_unsigned_31bytes",
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ .input = "1234567890123456789012345678901",
+ .len = 31,
+ .use_seed = false,
+ .expected_hash = 0xc4db1f78,
+ .expected_minor_hash = 0xea23921b,
+ },
+ {
+ .name = "half_md4_unsigned_32bytes",
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ .input = "12345678901234567890123456789012",
+ .len = 32,
+ .use_seed = false,
+ .expected_hash = 0xfa6cc63e,
+ .expected_minor_hash = 0x2f77bd1c,
+ },
+ {
+ .name = "half_md4_unsigned_33bytes",
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ .input = "123456789012345678901234567890123",
+ .len = 33,
+ .use_seed = false,
+ .expected_hash = 0xdc0c2dec,
+ .expected_minor_hash = 0x5ca23365,
+ },
+ {
+ .name = "tea_15bytes",
+ .hash_version = DX_HASH_TEA,
+ .input = "123456789abcdef",
+ .len = 15,
+ .use_seed = false,
+ .expected_hash = 0xa562903a,
+ .expected_minor_hash = 0x6174a00f,
+ },
+ {
+ .name = "tea_16bytes",
+ .hash_version = DX_HASH_TEA,
+ .input = "1234567890abcdef",
+ .len = 16,
+ .use_seed = false,
+ .expected_hash = 0x8449f258,
+ .expected_minor_hash = 0x49a16d46,
+ },
+ {
+ .name = "tea_17bytes",
+ .hash_version = DX_HASH_TEA,
+ .input = "123456789abcdefgh",
+ .len = 17,
+ .use_seed = false,
+ .expected_hash = 0xf32ec10c,
+ .expected_minor_hash = 0x58ceae61,
+ },
+ {
+ .name = "half_md4_seeded",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "same-name",
+ .len = 9,
+ .seed = { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
+ .use_seed = true,
+ .expected_hash = 0x8aebf604,
+ .expected_minor_hash = 0x66ce48fe,
+ },
+ {
+ .name = "half_md4_non_ascii_signed",
+ .hash_version = DX_HASH_HALF_MD4,
+ .input = "\x80\x81\x82\x83\x84",
+ .len = 5,
+ .use_seed = false,
+ .expected_hash = 0x8bab0498,
+ .expected_minor_hash = 0xc326632d,
+ },
+ {
+ .name = "half_md4_non_ascii_unsigned",
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ .input = "\x80\x81\x82\x83\x84",
+ .len = 5,
+ .use_seed = false,
+ .expected_hash = 0xbc48596e,
+ .expected_minor_hash = 0xde0fad41,
+ },
+ {
+ .name = "tea_non_ascii_signed",
+ .hash_version = DX_HASH_TEA,
+ .input = "\x80\x81\x82\x83\x84",
+ .len = 5,
+ .use_seed = false,
+ .expected_hash = 0x21e3a154,
+ .expected_minor_hash = 0x90112c3d,
+ },
+ {
+ .name = "tea_non_ascii_unsigned",
+ .hash_version = DX_HASH_TEA_UNSIGNED,
+ .input = "\x80\x81\x82\x83\x84",
+ .len = 5,
+ .use_seed = false,
+ .expected_hash = 0x9b648616,
+ .expected_minor_hash = 0x011dd507,
+ },
+};
+
+static void test_ext4fs_dirhash_vectors(struct kunit *test)
+{
+ struct super_block *sb;
+ struct inode *dir;
+ int i;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ dir = kunit_kzalloc(test, sizeof(*dir), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, dir);
+
+ ext4_hash_init_fake_dir(dir, sb);
+
+ for (i = 0; i < ARRAY_SIZE(ext4_dirhash_test_cases); i++) {
+ const struct ext4_dirhash_test_case *tc =
+ &ext4_dirhash_test_cases[i];
+ struct dx_hash_info hinfo;
+ int ret;
+
+ memset(&hinfo, 0, sizeof(hinfo));
+ hinfo.hash_version = tc->hash_version;
+ hinfo.seed = tc->use_seed ? (u32 *)tc->seed : NULL;
+
+ ret = ext4fs_dirhash(dir, tc->input, tc->len, &hinfo);
+
+ KUNIT_ASSERT_EQ_MSG(test, ret, 0, "case=%s", tc->name);
+ KUNIT_EXPECT_EQ_MSG(test, hinfo.hash, tc->expected_hash,
+ "case=%s", tc->name);
+ KUNIT_EXPECT_EQ_MSG(test, hinfo.minor_hash,
+ tc->expected_minor_hash,
+ "case=%s", tc->name);
+ }
+}
+
+static void test_ext4fs_dirhash_seed_changes_result(struct kunit *test)
+{
+ struct super_block *sb;
+ struct inode *dir;
+ u32 seed[4] = { 0x11111111, 0x22222222, 0x33333333, 0x44444444 };
+ struct dx_hash_info plain = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ struct dx_hash_info seeded = {
+ .hash_version = DX_HASH_HALF_MD4,
+ .seed = seed,
+ };
+ int ret_plain, ret_seeded;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ dir = kunit_kzalloc(test, sizeof(*dir), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, dir);
+
+ ext4_hash_init_fake_dir(dir, sb);
+
+ ret_plain = ext4fs_dirhash(dir, "same-name", 9, &plain);
+ ret_seeded = ext4fs_dirhash(dir, "same-name", 9, &seeded);
+
+ KUNIT_ASSERT_EQ(test, ret_plain, 0);
+ KUNIT_ASSERT_EQ(test, ret_seeded, 0);
+
+ KUNIT_EXPECT_TRUE(test,
+ plain.hash != seeded.hash ||
+ plain.minor_hash != seeded.minor_hash);
+}
+
+static void test_ext4fs_dirhash_invalid_version_returns_einval(struct kunit *test)
+{
+ struct super_block *sb;
+ struct inode *dir;
+ struct ext4_sb_info *sbi;
+ struct dx_hash_info hinfo = {
+ .hash = 0xdeadbeef,
+ .minor_hash = 0xcafebabe,
+ .hash_version = DX_HASH_LAST + 1,
+ };
+ int ret;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ dir = kunit_kzalloc(test, sizeof(*dir), GFP_KERNEL);
+ sbi = kunit_kzalloc(test, sizeof(*sbi), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, dir);
+ KUNIT_ASSERT_NOT_NULL(test, sbi);
+
+ ext4_hash_init_fake_dir_with_sbi(dir, sb, sbi);
+
+ ret = ext4fs_dirhash(dir, "abc", 3, &hinfo);
+
+ KUNIT_EXPECT_EQ(test, ret, -EINVAL);
+ KUNIT_EXPECT_EQ(test, hinfo.hash, 0);
+ KUNIT_EXPECT_EQ(test, hinfo.minor_hash, 0);
+}
+
+static void test_ext4fs_dirhash_siphash_without_key_returns_einval(struct kunit *test)
+{
+ struct super_block *sb;
+ struct inode *dir;
+ struct ext4_sb_info *sbi;
+ struct dx_hash_info hinfo = {
+ .hash_version = DX_HASH_SIPHASH,
+ };
+ int ret;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ dir = kunit_kzalloc(test, sizeof(*dir), GFP_KERNEL);
+ sbi = kunit_kzalloc(test, sizeof(*sbi), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, dir);
+ KUNIT_ASSERT_NOT_NULL(test, sbi);
+
+ ext4_hash_init_fake_dir_with_sbi(dir, sb, sbi);
+
+ ret = ext4fs_dirhash(dir, "abc", 3, &hinfo);
+
+ KUNIT_EXPECT_EQ(test, ret, -EINVAL);
+}
+
+static void test_ext4fs_dirhash_signed_unsigned_differ_on_nonascii(struct kunit *test)
+{
+ struct super_block *sb;
+ struct inode *dir;
+ static const char input[] = "\x80\xff\x81\xfe\101bc";
+ struct dx_hash_info legacy_signed = {
+ .hash_version = DX_HASH_LEGACY,
+ };
+ struct dx_hash_info legacy_unsigned = {
+ .hash_version = DX_HASH_LEGACY_UNSIGNED,
+ };
+ struct dx_hash_info md4_signed = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ struct dx_hash_info md4_unsigned = {
+ .hash_version = DX_HASH_HALF_MD4_UNSIGNED,
+ };
+ struct dx_hash_info tea_signed = {
+ .hash_version = DX_HASH_TEA,
+ };
+ struct dx_hash_info tea_unsigned = {
+ .hash_version = DX_HASH_TEA_UNSIGNED,
+ };
+ int ret;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ dir = kunit_kzalloc(test, sizeof(*dir), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, dir);
+
+ ext4_hash_init_fake_dir(dir, sb);
+
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &legacy_signed);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &legacy_unsigned);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ KUNIT_EXPECT_NE(test, legacy_signed.hash, legacy_unsigned.hash);
+
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &md4_signed);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &md4_unsigned);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ KUNIT_EXPECT_TRUE(test,
+ md4_signed.hash != md4_unsigned.hash ||
+ md4_signed.minor_hash != md4_unsigned.minor_hash);
+
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &tea_signed);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ ret = ext4fs_dirhash(dir, input, sizeof(input) - 1, &tea_unsigned);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ KUNIT_EXPECT_TRUE(test,
+ tea_signed.hash != tea_unsigned.hash ||
+ tea_signed.minor_hash != tea_unsigned.minor_hash);
+}
+
+#if IS_ENABLED(CONFIG_UNICODE)
+static void test_ext4fs_dirhash_casefolded_names_hash_consistently(struct kunit *test)
+{
+ struct super_block *sb;
+ struct ext4_inode_info *ei;
+ struct ext4_sb_info *sbi;
+ struct unicode_map *um;
+ struct dx_hash_info h1 = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ struct dx_hash_info h2 = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ int ret1, ret2;
+
+ sb = kunit_kzalloc(test, sizeof(*sb), GFP_KERNEL);
+ ei = kunit_kzalloc(test, sizeof(*ei), GFP_KERNEL);
+ sbi = kunit_kzalloc(test, sizeof(*sbi), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb);
+ KUNIT_ASSERT_NOT_NULL(test, ei);
+ KUNIT_ASSERT_NOT_NULL(test, sbi);
+
+ um = utf8_load(UTF8_LATEST);
+ if (IS_ERR(um)) {
+ kunit_skip(test, "utf8_load(UTF8_LATEST) failed");
+ return;
+ }
+
+ ext4_hash_init_fake_ext4_dir(ei, sb, sbi);
+ sb->s_encoding = um;
+ ei->vfs_inode.i_flags |= S_CASEFOLD;
+
+ KUNIT_ASSERT_TRUE(test, IS_CASEFOLDED(&ei->vfs_inode));
+
+ ret1 = ext4fs_dirhash(&ei->vfs_inode, "Alpha", 5, &h1);
+ ret2 = ext4fs_dirhash(&ei->vfs_inode, "aLPHa", 5, &h2);
+
+ KUNIT_ASSERT_EQ(test, ret1, 0);
+ KUNIT_ASSERT_EQ(test, ret2, 0);
+ KUNIT_EXPECT_EQ(test, h1.hash, h2.hash);
+ KUNIT_EXPECT_EQ(test, h1.minor_hash, h2.minor_hash);
+
+ utf8_unload(um);
+}
+
+static void test_ext4fs_dirhash_casefold_fallback(struct kunit *test)
+{
+ struct super_block *sb_cf, *sb_plain;
+ struct ext4_inode_info *ei;
+ struct ext4_sb_info *sbi;
+ struct inode *plain_dir;
+ struct unicode_map *um;
+ static const char invalid_utf8[] = "\xc3\x28";
+ struct dx_hash_info folded_dir = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ struct dx_hash_info plain = {
+ .hash_version = DX_HASH_HALF_MD4,
+ };
+ int ret_cf, ret_plain;
+
+ sb_cf = kunit_kzalloc(test, sizeof(*sb_cf), GFP_KERNEL);
+ sb_plain = kunit_kzalloc(test, sizeof(*sb_plain), GFP_KERNEL);
+ ei = kunit_kzalloc(test, sizeof(*ei), GFP_KERNEL);
+ sbi = kunit_kzalloc(test, sizeof(*sbi), GFP_KERNEL);
+ plain_dir = kunit_kzalloc(test, sizeof(*plain_dir), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sb_cf);
+ KUNIT_ASSERT_NOT_NULL(test, sb_plain);
+ KUNIT_ASSERT_NOT_NULL(test, ei);
+ KUNIT_ASSERT_NOT_NULL(test, sbi);
+ KUNIT_ASSERT_NOT_NULL(test, plain_dir);
+
+ um = utf8_load(UTF8_LATEST);
+ if (IS_ERR(um)) {
+ kunit_skip(test, "utf8_load(UTF8_LATEST) failed");
+ return;
+ }
+
+ ext4_hash_init_fake_ext4_dir(ei, sb_cf, sbi);
+ sb_cf->s_encoding = um;
+ ei->vfs_inode.i_flags |= S_CASEFOLD;
+
+ KUNIT_ASSERT_TRUE(test, IS_CASEFOLDED(&ei->vfs_inode));
+
+ ext4_hash_init_fake_dir(plain_dir, sb_plain);
+
+ ret_cf = ext4fs_dirhash(&ei->vfs_inode, invalid_utf8,
+ sizeof(invalid_utf8) - 1, &folded_dir);
+ ret_plain = ext4fs_dirhash(plain_dir, invalid_utf8,
+ sizeof(invalid_utf8) - 1, &plain);
+
+ KUNIT_ASSERT_EQ(test, ret_cf, 0);
+ KUNIT_ASSERT_EQ(test, ret_plain, 0);
+ KUNIT_EXPECT_EQ(test, folded_dir.hash, plain.hash);
+ KUNIT_EXPECT_EQ(test, folded_dir.minor_hash, plain.minor_hash);
+
+ utf8_unload(um);
+}
+#endif
+
+static struct kunit_case ext4_hash_test_cases[] = {
+ KUNIT_CASE(test_ext4fs_dirhash_vectors),
+ KUNIT_CASE(test_ext4fs_dirhash_seed_changes_result),
+ KUNIT_CASE(test_ext4fs_dirhash_invalid_version_returns_einval),
+ KUNIT_CASE(test_ext4fs_dirhash_siphash_without_key_returns_einval),
+ KUNIT_CASE(test_ext4fs_dirhash_signed_unsigned_differ_on_nonascii),
+#if IS_ENABLED(CONFIG_UNICODE)
+ KUNIT_CASE(test_ext4fs_dirhash_casefolded_names_hash_consistently),
+ KUNIT_CASE(test_ext4fs_dirhash_casefold_fallback),
+#endif
+ {}
+};
+
+static struct kunit_suite ext4_hash_test_suite = {
+ .name = "ext4_hash",
+ .test_cases = ext4_hash_test_cases,
+};
+
+kunit_test_suites(&ext4_hash_test_suite);
+
+MODULE_LICENSE("GPL");
--
2.34.1
* [PATCH v3 0/2] ext4: add hash Kunit tests and optimize str2hashbuf
From: Guan-Chun Wu @ 2026-04-13 6:51 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi
Cc: linux-ext4, linux-kernel, edward062254, visitorckw,
david.laight.linux, Guan-Chun Wu
This series adds KUnit tests for fs/ext4/hash.c and refactors
the str2hashbuf_{signed,unsigned}() helpers.
Patch 1 adds test coverage for ext4fs_dirhash(), including the main
hash variants and relevant edge cases.
Patch 2 simplifies the str2hashbuf helper implementation by processing
input in 4-byte chunks and removing function-pointer dispatch. This also
reduces overhead and shows roughly 2x improvement on longer inputs in
local testing.
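As a rough illustration of the chunking idea, the userspace sketch below packs a name into 32-bit words two ways: one byte at a time, and four bytes at a time with a short tail loop. It is not the code from patch 2. The names make_pad(), pack_bytewise() and pack_chunked() are hypothetical, only the unsigned variant is shown, and the byte-wise reference is an assumption modeled loosely on the existing helper.

```c
#include <stdint.h>

/* Padding word derived from the (unclamped) name length. */
static uint32_t make_pad(int len)
{
	uint32_t pad = (uint32_t)len | ((uint32_t)len << 8);

	return pad | (pad << 16);
}

/* Reference: shift one byte at a time into val, flush every 4 bytes. */
static void pack_bytewise(const unsigned char *msg, int len,
			  uint32_t *buf, int num)
{
	uint32_t pad = make_pad(len);
	uint32_t val = pad;
	int i;

	if (len > num * 4)
		len = num * 4;
	for (i = 0; i < len; i++) {
		val = msg[i] + (val << 8);
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

/* Sketch: build each full word from 4 bytes, then handle the tail. */
static void pack_chunked(const unsigned char *msg, int len,
			 uint32_t *buf, int num)
{
	uint32_t pad = make_pad(len);
	uint32_t val = pad;
	int i;

	if (len > num * 4)
		len = num * 4;
	while (len >= 4) {
		/* After four 8-bit shifts the pad bits are gone entirely. */
		*buf++ = ((uint32_t)msg[0] << 24) | ((uint32_t)msg[1] << 16) |
			 ((uint32_t)msg[2] << 8) | (uint32_t)msg[3];
		msg += 4;
		len -= 4;
		num--;
	}
	/* At most 3 bytes remain; they are shifted into the pad word. */
	for (i = 0; i < len; i++)
		val = msg[i] + (val << 8);
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}
```

Both functions are meant to fill buf identically for the same input, so a caller can compare their outputs with memcmp() as a sanity check before trusting the chunked form.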
Thanks,
Guan-Chun Wu
Link: https://lore.kernel.org/lkml/20251122043929.1908643-1-409411716@gms.tku.edu.tw/
---
v2 -> v3 :
- Added KUnit tests for fs/ext4/hash.c.
---
Guan-Chun Wu (2):
ext4: add Kunit coverage for directory hash computation
ext4: improve str2hashbuf by processing 4-byte chunks and removing
function pointers
fs/ext4/Makefile | 2 +-
fs/ext4/hash-test.c | 546 ++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/hash.c | 64 ++++--
3 files changed, 589 insertions(+), 23 deletions(-)
create mode 100644 fs/ext4/hash-test.c
--
2.34.1
* [RFC v2 1/1] ext4: fail fast on repeated buffer_head reads after IO failure
From: Diangang Li @ 2026-04-13 6:25 UTC (permalink / raw)
To: tytso, adilger.kernel
Cc: linux-ext4, linux-fsdevel, linux-kernel, changfengnan, yizhang089,
willy, Diangang Li
In-Reply-To: <20260413062500.1380307-1-diangangli@gmail.com>
From: Diangang Li <lidiangang@bytedance.com>
ext4 buffer_head reads serialize on BH_Lock. If a read fails, the buffer
remains !Uptodate. With concurrent callers, each waiter may resubmit the
same failing read after the previous holder drops BH_Lock. This can turn
a single read error into long stalls and hung tasks.
The block layer already retries reads. After it gives up, re-submitting
the same buffer_head read from ext4 makes no forward progress and just
keeps waiters serialized on BH_Lock.
Record read failures on buffer_head (BH_Read_EIO + b_err_timestamp) and,
when a retry window is configured (sysfs: err_retry_sec), fail fast for
repeated ext4 buffer_head reads within the window. Clear the state on
successful completion so the buffer can recover.
err_retry_sec defaults to 0, which keeps the current behavior (subsequent
callers may retry the same read). Set it to a non-zero value to throttle
repeated reads within the window.
Example hung stacks:
INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.
Call Trace:
__schedule
io_schedule
__wait_on_bit_lock
bh_uptodate_or_lock
__read_extent_tree_block
ext4_find_extent
ext4_ext_map_blocks
ext4_map_blocks
ext4_getblk
ext4_bread
__ext4_read_dirblock
dx_probe
ext4_htree_fill_tree
ext4_readdir
iterate_dir
ksys_getdents64
INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.
Call Trace:
__schedule
io_schedule
__wait_on_bit_lock
ext4_read_bh_lock
ext4_bread
__ext4_read_dirblock
htree_dirblock_to_tree
ext4_htree_fill_tree
ext4_readdir
iterate_dir
ksys_getdents64
Signed-off-by: Diangang Li <lidiangang@bytedance.com>
---
fs/buffer.c | 2 ++
fs/ext4/balloc.c | 2 +-
fs/ext4/ext4.h | 13 ++++++----
fs/ext4/extents.c | 2 +-
fs/ext4/ialloc.c | 3 ++-
fs/ext4/indirect.c | 2 +-
fs/ext4/inode.c | 10 ++++----
fs/ext4/mmp.c | 2 +-
fs/ext4/move_extent.c | 2 +-
fs/ext4/resize.c | 2 +-
fs/ext4/super.c | 51 +++++++++++++++++++++++++++----------
fs/ext4/sysfs.c | 2 ++
include/linux/buffer_head.h | 16 ++++++++++++
13 files changed, 79 insertions(+), 30 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 22b43642ba574..10b1f60368db4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -159,6 +159,7 @@ static void __end_buffer_read_notouch(struct buffer_head *bh, int uptodate)
void end_buffer_read_sync(struct buffer_head *bh, int uptodate)
{
put_bh(bh);
+ bh_update_read_io_error(bh, uptodate, jiffies);
__end_buffer_read_notouch(bh, uptodate);
}
EXPORT_SYMBOL(end_buffer_read_sync);
@@ -167,6 +168,7 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
{
if (uptodate) {
set_buffer_uptodate(bh);
+ bh_update_read_io_error(bh, 1, jiffies);
} else {
buffer_io_error(bh, ", lost sync page write");
mark_buffer_write_io_error(bh);
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 8040c731b3e45..8d7797adbb63e 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -548,7 +548,7 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
*/
set_buffer_new(bh);
trace_ext4_read_block_bitmap_load(sb, block_group, ignore_locked);
- ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO |
+ ext4_read_bh_nowait(sb, bh, REQ_META | REQ_PRIO |
(ignore_locked ? REQ_RAHEAD : 0),
ext4_end_bitmap_read,
ext4_simulate_fail(sb, EXT4_SIM_BBITMAP_EIO));
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7617e2d454ea5..4b6ff26201933 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1682,6 +1682,8 @@ struct ext4_sb_info {
struct timer_list s_err_report;
/* timeout in seconds for s_err_report; 0 disables the timer. */
unsigned long s_err_report_sec;
+ /* timeout in seconds for read error retry window; 0 disables. */
+ unsigned long s_err_retry_sec;
/* Lazy inode table initialization info */
struct ext4_li_request *s_li_request;
@@ -3185,11 +3187,12 @@ extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
sector_t block);
extern struct buffer_head *ext4_sb_bread_nofail(struct super_block *sb,
sector_t block);
-extern void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
- bh_end_io_t *end_io, bool simu_fail);
-extern int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
- bh_end_io_t *end_io, bool simu_fail);
-extern int ext4_read_bh_lock(struct buffer_head *bh, blk_opf_t op_flags, bool wait);
+extern void ext4_read_bh_nowait(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bh_end_io_t *end_io, bool simu_fail);
+extern int ext4_read_bh(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bh_end_io_t *end_io, bool simu_fail);
+extern int ext4_read_bh_lock(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bool wait);
extern void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block);
extern int ext4_seq_options_show(struct seq_file *seq, void *offset);
extern int ext4_calculate_overhead(struct super_block *sb);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8cce1479be6d1..b7fb195ded3e3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -567,7 +567,7 @@ __read_extent_tree_block(const char *function, unsigned int line,
if (!bh_uptodate_or_lock(bh)) {
trace_ext4_ext_load_extent(inode, pblk, _RET_IP_);
- err = ext4_read_bh(bh, 0, NULL, false);
+ err = ext4_read_bh(inode->i_sb, bh, 0, NULL, false);
if (err < 0)
goto errout;
}
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b1bc1950c9f03..25a177eb89bf1 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -72,6 +72,7 @@ void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate)
set_buffer_uptodate(bh);
set_bitmap_uptodate(bh);
}
+ bh_update_read_io_error(bh, uptodate, jiffies);
unlock_buffer(bh);
put_bh(bh);
}
@@ -193,7 +194,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
* submit the buffer_head for reading
*/
trace_ext4_load_inode_bitmap(sb, block_group);
- ext4_read_bh(bh, REQ_META | REQ_PRIO,
+ ext4_read_bh(sb, bh, REQ_META | REQ_PRIO,
ext4_end_bitmap_read,
ext4_simulate_fail(sb, EXT4_SIM_IBITMAP_EIO));
if (!buffer_uptodate(bh)) {
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index da76353b3a575..1ff2b5872e8b0 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -170,7 +170,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
}
if (!bh_uptodate_or_lock(bh)) {
- if (ext4_read_bh(bh, 0, NULL, false) < 0) {
+ if (ext4_read_bh(sb, bh, 0, NULL, false) < 0) {
put_bh(bh);
goto failure;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1123d995494b5..49c03c485a8d5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1053,7 +1053,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
if (!bh || ext4_buffer_uptodate(bh))
return bh;
- ret = ext4_read_bh_lock(bh, REQ_META | REQ_PRIO, true);
+ ret = ext4_read_bh_lock(inode->i_sb, bh, REQ_META | REQ_PRIO, true);
if (ret) {
put_bh(bh);
return ERR_PTR(ret);
@@ -1079,7 +1079,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
for (i = 0; i < bh_count; i++)
/* Note that NULL bhs[i] is valid because of holes. */
if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
- ext4_read_bh_lock(bhs[i], REQ_META | REQ_PRIO, false);
+ ext4_read_bh_lock(inode->i_sb, bhs[i], REQ_META | REQ_PRIO, false);
if (!wait)
return 0;
@@ -1239,7 +1239,7 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
if (!buffer_uptodate(bh) && !buffer_delay(bh) &&
!buffer_unwritten(bh) &&
(block_start < from || block_end > to)) {
- ext4_read_bh_lock(bh, 0, false);
+ ext4_read_bh_lock(inode->i_sb, bh, 0, false);
wait[nr_wait++] = bh;
}
}
@@ -4063,7 +4063,7 @@ static int __ext4_block_zero_page_range(handle_t *handle,
set_buffer_uptodate(bh);
if (!buffer_uptodate(bh)) {
- err = ext4_read_bh_lock(bh, 0, true);
+ err = ext4_read_bh_lock(inode->i_sb, bh, 0, true);
if (err)
goto unlock;
if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
@@ -4891,7 +4891,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
* Read the block from disk.
*/
trace_ext4_load_inode(sb, ino);
- ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL,
+ ext4_read_bh_nowait(sb, bh, REQ_META | REQ_PRIO, NULL,
ext4_simulate_fail(sb, EXT4_SIM_INODE_EIO));
blk_finish_plug(&plug);
wait_on_buffer(bh);
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 6f57c181ff778..6407b7fbdd3e8 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -90,7 +90,7 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
}
lock_buffer(*bh);
- ret = ext4_read_bh(*bh, REQ_META | REQ_PRIO, NULL, false);
+ ret = ext4_read_bh(sb, *bh, REQ_META | REQ_PRIO, NULL, false);
if (ret)
goto warn_exit;
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index ce1f738dff938..a304352a0741f 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -162,7 +162,7 @@ static int mext_folio_mkuptodate(struct folio *folio, size_t from, size_t to)
unlock_buffer(bh);
continue;
}
- ext4_read_bh_nowait(bh, 0, NULL, false);
+ ext4_read_bh_nowait(inode->i_sb, bh, 0, NULL, false);
nr++;
} while (block++, (bh = bh->b_this_page) != head);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 2c5b851c552a6..0350e85cc58fb 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1299,7 +1299,7 @@ static struct buffer_head *ext4_get_bitmap(struct super_block *sb, __u64 block)
if (unlikely(!bh))
return NULL;
if (!bh_uptodate_or_lock(bh)) {
- if (ext4_read_bh(bh, 0, NULL, false) < 0) {
+ if (ext4_read_bh(sb, bh, 0, NULL, false) < 0) {
brelse(bh);
return NULL;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a34efb44e73d7..c0e4d8106e4f3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -160,8 +160,26 @@ MODULE_ALIAS("ext3");
#define IS_EXT3_SB(sb) ((sb)->s_type == &ext3_fs_type)
-static inline void __ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
- bh_end_io_t *end_io, bool simu_fail)
+static bool ext4_bh_throttle_read(struct super_block *sb, struct buffer_head *bh)
+{
+ unsigned long retry_sec = EXT4_SB(sb)->s_err_retry_sec;
+
+ if (!retry_sec || !buffer_read_io_error(bh))
+ return false;
+
+ if (bh->b_err_timestamp &&
+ time_before(jiffies, bh->b_err_timestamp +
+ secs_to_jiffies(retry_sec)))
+ return true;
+
+ clear_buffer_read_io_error(bh);
+ bh->b_err_timestamp = 0;
+ return false;
+}
+
+static inline void __ext4_read_bh(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bh_end_io_t *end_io,
+ bool simu_fail)
{
if (simu_fail) {
clear_buffer_uptodate(bh);
@@ -169,6 +187,12 @@ static inline void __ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
return;
}
+ if (ext4_bh_throttle_read(sb, bh)) {
+ clear_buffer_uptodate(bh);
+ unlock_buffer(bh);
+ return;
+ }
+
/*
* buffer's verified bit is no longer valid after reading from
* disk again due to write out error, clear it to make sure we
@@ -181,8 +205,8 @@ static inline void __ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
submit_bh(REQ_OP_READ | op_flags, bh);
}
-void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
- bh_end_io_t *end_io, bool simu_fail)
+void ext4_read_bh_nowait(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bh_end_io_t *end_io, bool simu_fail)
{
BUG_ON(!buffer_locked(bh));
@@ -190,11 +214,11 @@ void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
unlock_buffer(bh);
return;
}
- __ext4_read_bh(bh, op_flags, end_io, simu_fail);
+ __ext4_read_bh(sb, bh, op_flags, end_io, simu_fail);
}
-int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
- bh_end_io_t *end_io, bool simu_fail)
+int ext4_read_bh(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bh_end_io_t *end_io, bool simu_fail)
{
BUG_ON(!buffer_locked(bh));
@@ -203,7 +227,7 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
return 0;
}
- __ext4_read_bh(bh, op_flags, end_io, simu_fail);
+ __ext4_read_bh(sb, bh, op_flags, end_io, simu_fail);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
@@ -211,14 +235,15 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
return -EIO;
}
-int ext4_read_bh_lock(struct buffer_head *bh, blk_opf_t op_flags, bool wait)
+int ext4_read_bh_lock(struct super_block *sb, struct buffer_head *bh,
+ blk_opf_t op_flags, bool wait)
{
lock_buffer(bh);
if (!wait) {
- ext4_read_bh_nowait(bh, op_flags, NULL, false);
+ ext4_read_bh_nowait(sb, bh, op_flags, NULL, false);
return 0;
}
- return ext4_read_bh(bh, op_flags, NULL, false);
+ return ext4_read_bh(sb, bh, op_flags, NULL, false);
}
/*
@@ -240,7 +265,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
if (ext4_buffer_uptodate(bh))
return bh;
- ret = ext4_read_bh_lock(bh, REQ_META | op_flags, true);
+ ret = ext4_read_bh_lock(sb, bh, REQ_META | op_flags, true);
if (ret) {
put_bh(bh);
return ERR_PTR(ret);
@@ -282,7 +307,7 @@ void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
if (likely(bh)) {
if (trylock_buffer(bh))
- ext4_read_bh_nowait(bh, REQ_RAHEAD, NULL, false);
+ ext4_read_bh_nowait(sb, bh, REQ_RAHEAD, NULL, false);
brelse(bh);
}
}
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 923b375e017fa..21fed223c9e86 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -249,6 +249,7 @@ EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
EXT4_ATTR_OFFSET(mb_best_avail_max_trim_order, 0644, mb_order,
ext4_sb_info, s_mb_best_avail_max_trim_order);
EXT4_ATTR_OFFSET(err_report_sec, 0644, err_report_sec, ext4_sb_info, s_err_report_sec);
+EXT4_RW_ATTR_SBI_UL(err_retry_sec, s_err_retry_sec);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
@@ -342,6 +343,7 @@ static struct attribute *ext4_attrs[] = {
ATTR_LIST(sb_update_sec),
ATTR_LIST(sb_update_kb),
ATTR_LIST(err_report_sec),
+ ATTR_LIST(err_retry_sec),
NULL,
};
ATTRIBUTE_GROUPS(ext4);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e7..77e42e706d1e5 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -29,6 +29,7 @@ enum bh_state_bits {
BH_Delay, /* Buffer is not yet allocated on disk */
BH_Boundary, /* Block is followed by a discontiguity */
BH_Write_EIO, /* I/O error on write */
+ BH_Read_EIO, /* I/O error on read */
BH_Unwritten, /* Buffer is allocated on disk but not written */
BH_Quiet, /* Buffer Error Prinks to be quiet */
BH_Meta, /* Buffer contains metadata */
@@ -79,6 +80,7 @@ struct buffer_head {
spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to
* serialise IO completion of other
* buffers in the page */
+ unsigned long b_err_timestamp; /* timestamp of last IO error (jiffies) */
};
/*
@@ -132,11 +134,25 @@ BUFFER_FNS(Async_Write, async_write)
BUFFER_FNS(Delay, delay)
BUFFER_FNS(Boundary, boundary)
BUFFER_FNS(Write_EIO, write_io_error)
+BUFFER_FNS(Read_EIO, read_io_error)
BUFFER_FNS(Unwritten, unwritten)
BUFFER_FNS(Meta, meta)
BUFFER_FNS(Prio, prio)
BUFFER_FNS(Defer_Completion, defer_completion)
+static __always_inline void bh_update_read_io_error(struct buffer_head *bh,
+ int uptodate,
+ unsigned long now)
+{
+ if (uptodate) {
+ clear_buffer_read_io_error(bh);
+ bh->b_err_timestamp = 0;
+ } else if (!buffer_read_io_error(bh)) {
+ set_buffer_read_io_error(bh);
+ bh->b_err_timestamp = now;
+ }
+}
+
static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
{
/*
--
2.39.5
* [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO failure
From: Diangang Li @ 2026-04-13 6:24 UTC (permalink / raw)
To: tytso, adilger.kernel
Cc: linux-ext4, linux-fsdevel, linux-kernel, changfengnan, yizhang089,
willy, Diangang Li
In-Reply-To: <20260325093349.630193-1-diangangli@gmail.com>
From: Diangang Li <lidiangang@bytedance.com>
A production system reported hung tasks blocked for 300s+ in ext4 buffer_head
paths. Hung task reports were accompanied by disk IO errors, but profiling
showed that most individual reads completed (or failed) within 10s, with
the worst case around 60s.
At the same time, we observed many repeated reads to the same disk LBAs.
The repeated reads frequently showed seconds-level latency and ended with
IO errors, e.g.:
[Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,
sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,
sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,
sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
We also sampled repeated-LBA latency histograms on /dev/sdi and saw that
the same error-prone LBAs were re-submitted many times with ~1-4s latency:
LBA 10704488160 (count=22): 1-2s: 20, 2-4s: 2
LBA 10704382912 (count=21): 1-2s: 20, 2-4s: 1
LBA 10704150288 (count=21): 1-2s: 19, 2-4s: 2
Root cause
==========
ext4 buffer_head reads serialize IO via BH_Lock. When one read fails, the
buffer remains !Uptodate. With multiple threads concurrently accessing
the same buffer_head, each waiter wakes up after the previous owner drops
BH_Lock, then submits the same read again and waits again. This makes the
latency grow linearly with the number of contending threads, leading to
300s+ hung tasks.
The failing IOs are repeatedly issued to the same LBA. The observed 1s+
per-IO latency is likely from device-side retry/error recovery. On SCSI the
driver typically retries reads several times (e.g. 5 retries in our
environment), so a single filesystem submission can easily accumulate 5s+
delay before failing. When multiple threads then re-submit the same failing
read and serialize on BH_Lock, the delay is amplified into 300s+ hung tasks.
Similar behavior exists for other devices (e.g. NVMe with multiple internal
retries).
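As a rough illustration of the amplification described above, the following C sketch models waiters that serialize on one lock and each resubmit the same failing read. The function name and the sample numbers used below (60 waiters, 5 seconds per failed submission) are assumptions chosen for illustration, not measurements from this report.

```c
/*
 * Hypothetical model: waiters queue on one lock, and each one resubmits
 * the same failing read once it gets the lock. The k-th waiter (1-based)
 * therefore sees k failed submissions complete before it returns.
 */
static unsigned long waiter_delay_sec(unsigned long k,
				      unsigned long per_read_sec)
{
	return k * per_read_sec;
}
```

Under this model, waiter_delay_sec(60, 5) is 300: the 60th waiter is blocked for about 300 seconds even though no single read took longer than 5 seconds.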
Example hung stacks:
INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.
Call Trace:
__schedule
io_schedule
__wait_on_bit_lock
bh_uptodate_or_lock
__read_extent_tree_block
ext4_find_extent
ext4_ext_map_blocks
ext4_map_blocks
ext4_getblk
ext4_bread
__ext4_read_dirblock
dx_probe
ext4_htree_fill_tree
ext4_readdir
iterate_dir
ksys_getdents64
INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.
Call Trace:
__schedule
io_schedule
__wait_on_bit_lock
ext4_read_bh_lock
ext4_bread
__ext4_read_dirblock
htree_dirblock_to_tree
ext4_htree_fill_tree
ext4_readdir
iterate_dir
ksys_getdents64
Approach
========
Record read failures on buffer_head (BH_Read_EIO + b_err_timestamp). When a
retry window is configured (sysfs: err_retry_sec), ext4 will skip submitting
another read for the buffer_head that already failed within the window and
return/unlock immediately. Clear the state on successful completion so the
buffer can recover if the error is transient.
err_retry_sec defaults to 0, which keeps the current behavior: after a read
error, callers may keep retrying the same read. Set it to a non-zero value
to throttle repeated reads within the window.
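The retry-window logic can be sketched in userspace as follows. This is a simplified model and not the patch itself: struct buf_state and both function names are hypothetical, time is counted in plain seconds, and the jiffies wraparound handling that time_before() provides in the kernel is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-buffer error state. */
struct buf_state {
	bool read_eio;		/* models BH_Read_EIO */
	uint64_t err_stamp;	/* models b_err_timestamp; 0 means unset */
};

/* Completion side: clear on success, record only the first failure. */
static void buf_update_read_error(struct buf_state *b, bool uptodate,
				  uint64_t now)
{
	if (uptodate) {
		b->read_eio = false;
		b->err_stamp = 0;
	} else if (!b->read_eio) {
		b->read_eio = true;
		b->err_stamp = now;
	}
}

/* Submission side: true means "skip the read and fail fast". */
static bool buf_throttle_read(struct buf_state *b, uint64_t retry_sec,
			      uint64_t now)
{
	/* A window of 0 disables throttling and leaves the state alone. */
	if (!retry_sec || !b->read_eio)
		return false;

	if (b->err_stamp && now < b->err_stamp + retry_sec)
		return true;

	/* Window expired: forget the error and allow a fresh attempt. */
	b->read_eio = false;
	b->err_stamp = 0;
	return false;
}
```

With a 10 second window and a failure recorded at t=100, a read at t=105 is skipped, while a read at t=110 is allowed through and resets the state. Recording only the first failure keeps the window anchored to when the error was first seen rather than sliding forward on every skipped attempt.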
Patch summary
=============
1) Add BH_Read_EIO, b_err_timestamp and a small helper for tracking read
failures on buffer_head.
2) Update end_buffer_read_sync() and end_buffer_write_sync() (success path)
to maintain that state.
3) Add ext4 sysfs knob err_retry_sec and throttle ext4 buffer_head reads
within the configured window.
4) Pass sb into ext4_read_bh_nowait(), ext4_read_bh() and ext4_read_bh_lock()
so __ext4_read_bh() can apply the per-sb retry window check.
Diangang Li (1):
ext4: fail fast on repeated buffer_head reads after IO failure
fs/buffer.c | 2 ++
fs/ext4/balloc.c | 2 +-
fs/ext4/ext4.h | 13 ++++++----
fs/ext4/extents.c | 2 +-
fs/ext4/ialloc.c | 3 ++-
fs/ext4/indirect.c | 2 +-
fs/ext4/inode.c | 10 ++++----
fs/ext4/mmp.c | 2 +-
fs/ext4/move_extent.c | 2 +-
fs/ext4/resize.c | 2 +-
fs/ext4/super.c | 51 +++++++++++++++++++++++++++----------
fs/ext4/sysfs.c | 2 ++
include/linux/buffer_head.h | 16 ++++++++++++
13 files changed, 79 insertions(+), 30 deletions(-)
--
2.39.5
* Re: [patch 28/38] mips: Select ARCH_HAS_RANDOM_ENTROPY
From: Maciej W. Rozycki @ 2026-04-13 5:47 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik,
netdev, linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120319.462206386@kernel.org>
On Fri, 10 Apr 2026, Thomas Gleixner wrote:
> The only solution for now is to uninline random_get_entropy(). Fix up all
> other dependencies on the content of asm/timex.h in those files which
> really depend on it.
Oh dear! I'd yet have to fully evaluate the consequences, but offhand
this has clearly turned what compiles to a single CPU instruction on the
vast majority of MIPS platforms into an expensive function call, possibly
also changing the caller from a leaf to a nested function with all the
associated execution penalty. Is there no other way?
Cf. commit 06947aaaf9bf ("MIPS: Implement random_get_entropy with CP0
Random").
Maciej
* Re: [patch 23/38] alpha: Select ARCH_HAS_RANDOM_ENTROPY
From: Magnus Lindholm @ 2026-04-12 13:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Richard Henderson, linux-alpha, Arnd Bergmann, x86,
Lu Baolu, iommu, Michael Grzeschik, netdev, linux-wireless,
Herbert Xu, linux-crypto, Vlastimil Babka, linux-mm,
David Woodhouse, Bernie Thompson, linux-fbdev, Theodore Tso,
linux-ext4, Andrew Morton, Uladzislau Rezki, Marco Elver,
Dmitry Vyukov, kasan-dev, Andrey Ryabinin, Thomas Sailer,
linux-hams, Jason A. Donenfeld, Russell King, linux-arm-kernel,
Catalin Marinas, Huacai Chen, loongarch, Geert Uytterhoeven,
linux-m68k, Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120319.131582521@kernel.org>
On Fri, Apr 10, 2026 at 2:36 PM Thomas Gleixner <tglx@kernel.org> wrote:
>
> The only remaining usage of get_cycles() is to provide
> random_get_entropy().
>
> Switch alpha over to the new scheme of selecting ARCH_HAS_RANDOM_ENTROPY
> and providing random_get_entropy() in asm/random.h.
>
> Remove asm/timex.h as it has no functionality anymore.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: linux-alpha@vger.kernel.org
> ---
> arch/alpha/Kconfig | 1 +
> arch/alpha/include/asm/random.h | 14 ++++++++++++++
> arch/alpha/include/asm/timex.h | 26 --------------------------
> 3 files changed, 15 insertions(+), 26 deletions(-)
Hi,
The Alpha side looks fine to me.
I've applied this patch on top of v7.0-rc7, built a kernel successfully,
and boot-tested it on an Alpha UP2000+ (SMP) without issues.
Acked-by: Magnus Lindholm <linmag7@gmail.com>
Tested-by: Magnus Lindholm <linmag7@gmail.com>
* Re: [patch 30/38] openrisc: Select ARCH_HAS_RANDOM_ENTROPY
From: Stafford Horne @ 2026-04-12 8:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Jonas Bonn, linux-openrisc, Arnd Bergmann, x86, Lu Baolu,
iommu, Michael Grzeschik, netdev, linux-wireless, Herbert Xu,
linux-crypto, Vlastimil Babka, linux-mm, David Woodhouse,
Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
Andrew Morton, Uladzislau Rezki, Marco Elver, Dmitry Vyukov,
kasan-dev, Andrey Ryabinin, Thomas Sailer, linux-hams,
Jason A. Donenfeld, Richard Henderson, linux-alpha, Russell King,
linux-arm-kernel, Catalin Marinas, Huacai Chen, loongarch,
Geert Uytterhoeven, linux-m68k, Dinh Nguyen, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120319.593798781@kernel.org>
On Fri, Apr 10, 2026 at 02:20:55PM +0200, Thomas Gleixner wrote:
> The only remaining non-architecture usage of get_cycles() is to provide
> random_get_entropy().
>
> Switch openrisc over to the new scheme of selecting ARCH_HAS_RANDOM_ENTROPY
> and providing random_get_entropy() in asm/random.h.
>
> Add 'asm/timex.h' includes to the relevant files, so the global include can
> be removed once all architectures are converted over.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: linux-openrisc@vger.kernel.org
This looks good to me.
Acked-by: Stafford Horne <shorne@gmail.com>
> ---
> arch/openrisc/Kconfig | 1 +
> arch/openrisc/include/asm/random.h | 12 ++++++++++++
> arch/openrisc/include/asm/timex.h | 5 -----
> arch/openrisc/lib/delay.c | 1 +
> 4 files changed, 14 insertions(+), 5 deletions(-)
>
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -10,6 +10,7 @@ config OPENRISC
> select ARCH_HAS_DELAY_TIMER
> select ARCH_HAS_DMA_SET_UNCACHED
> select ARCH_HAS_DMA_CLEAR_UNCACHED
> + select ARCH_HAS_RANDOM_ENTROPY
> select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> select GENERIC_BUILTIN_DTB
> select COMMON_CLK
> --- /dev/null
> +++ b/arch/openrisc/include/asm/random.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef __ASM_OPENRISC_RANDOM_H
> +#define __ASM_OPENRISC_RANDOM_H
> +
> +#include <asm/timex.h>
> +
> +static inline unsigned long random_get_entropy(void)
> +{
> + return get_cycles();
> +}
> +
> +#endif
> --- a/arch/openrisc/include/asm/timex.h
> +++ b/arch/openrisc/include/asm/timex.h
> @@ -9,13 +9,9 @@
> * OpenRISC implementation:
> * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
> */
> -
> #ifndef __ASM_OPENRISC_TIMEX_H
> #define __ASM_OPENRISC_TIMEX_H
>
> -#define get_cycles get_cycles
> -
> -#include <asm-generic/timex.h>
> #include <asm/spr.h>
> #include <asm/spr_defs.h>
>
> @@ -23,6 +19,5 @@ static inline cycles_t get_cycles(void)
> {
> return mfspr(SPR_TTCR);
> }
> -#define get_cycles get_cycles
>
> #endif
> --- a/arch/openrisc/lib/delay.c
> +++ b/arch/openrisc/lib/delay.c
> @@ -18,6 +18,7 @@
> #include <linux/init.h>
>
> #include <asm/param.h>
> +#include <asm/timex.h>
> #include <asm/processor.h>
>
> bool delay_read_timer(unsigned long *timer_value)
>
>
* [patch V1.1 11/38] misc: sgi-gru: Remove get_cycles() [ab]use
From: Thomas Gleixner @ 2026-04-10 20:56 UTC (permalink / raw)
To: LKML
Cc: Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik, netdev,
linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120318.320727701@kernel.org>
Calculating a timeout from get_cycles() is a historical leftover without
any functional requirement.
Use ktime_get() instead.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V2: Fix typo
---
drivers/misc/sgi-gru/gruhandles.c | 20 ++++++++------------
drivers/misc/sgi-gru/grukservices.c | 3 ++-
drivers/misc/sgi-gru/grutlbpurge.c | 5 ++---
3 files changed, 12 insertions(+), 16 deletions(-)
--- a/drivers/misc/sgi-gru/gruhandles.c
+++ b/drivers/misc/sgi-gru/gruhandles.c
@@ -6,26 +6,22 @@
*/
#include <linux/kernel.h>
+#include <linux/timekeeping.h>
#include "gru.h"
#include "grulib.h"
#include "grutables.h"
-/* 10 sec */
#include <linux/sync_core.h>
-#include <asm/tsc.h>
-#define GRU_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
-#define CLKS2NSEC(c) ((c) * 1000000 / tsc_khz)
+
+#define GRU_OPERATION_TIMEOUT_NSEC (((ktime_t)10 * NSEC_PER_SEC))
/* Extract the status field from a kernel handle */
#define GET_MSEG_HANDLE_STATUS(h) (((*(unsigned long *)(h)) >> 16) & 3)
struct mcs_op_statistic mcs_op_statistics[mcsop_last];
-static void update_mcs_stats(enum mcs_op op, unsigned long clks)
+static void update_mcs_stats(enum mcs_op op, unsigned long nsec)
{
- unsigned long nsec;
-
- nsec = CLKS2NSEC(clks);
atomic_long_inc(&mcs_op_statistics[op].count);
atomic_long_add(nsec, &mcs_op_statistics[op].total);
if (mcs_op_statistics[op].max < nsec)
@@ -58,21 +54,21 @@ static void report_instruction_timeout(v
static int wait_instruction_complete(void *h, enum mcs_op opc)
{
+ ktime_t start_time = ktime_get();
int status;
- unsigned long start_time = get_cycles();
while (1) {
cpu_relax();
status = GET_MSEG_HANDLE_STATUS(h);
if (status != CCHSTATUS_ACTIVE)
break;
- if (GRU_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
+ if (GRU_OPERATION_TIMEOUT_NSEC < (ktime_get() - start_time)) {
report_instruction_timeout(h);
- start_time = get_cycles();
+ start_time = ktime_get();
}
}
if (gru_options & OPT_STATS)
- update_mcs_stats(opc, get_cycles() - start_time);
+ update_mcs_stats(opc, (unsigned long)(ktime_get() - start_time));
return status;
}
--- a/drivers/misc/sgi-gru/grukservices.c
+++ b/drivers/misc/sgi-gru/grukservices.c
@@ -20,6 +20,7 @@
#include <linux/uaccess.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include <linux/random.h>
#include <asm/io_apic.h>
#include "gru.h"
#include "grulib.h"
@@ -1106,7 +1107,7 @@ static int quicktest3(unsigned long arg)
int ret = 0;
memset(buf2, 0, sizeof(buf2));
- memset(buf1, get_cycles() & 255, sizeof(buf1));
+ memset(buf1, get_random_u32() & 255, sizeof(buf1));
gru_copy_gpa(uv_gpa(buf2), uv_gpa(buf1), BUFSIZE);
if (memcmp(buf1, buf2, BUFSIZE)) {
printk(KERN_DEBUG "GRU:%d quicktest3 error\n", smp_processor_id());
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -22,13 +22,12 @@
#include <linux/delay.h>
#include <linux/timex.h>
#include <linux/srcu.h>
+#include <linux/random.h>
#include <asm/processor.h>
#include "gru.h"
#include "grutables.h"
#include <asm/uv/uv_hub.h>
-#define gru_random() get_cycles()
-
/* ---------------------------------- TLB Invalidation functions --------
* get_tgh_handle
*
@@ -49,7 +48,7 @@ static inline int get_off_blade_tgh(stru
int n;
n = GRU_NUM_TGH - gru->gs_tgh_first_remote;
- n = gru_random() % n;
+ n = get_random_u32() % n;
n += gru->gs_tgh_first_remote;
return n;
}
* [patch V1.1 02/38] x86: Cleanup include recursion hell
From: Thomas Gleixner @ 2026-04-10 20:55 UTC (permalink / raw)
To: LKML
Cc: Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik, netdev,
linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120317.709923681@kernel.org>
Including a random architecture specific header which requires global
headers just to avoid including that header at the two usage sites is
really beyond lazy and tasteless. Including global headers just to get the
__percpu macro from linux/compiler_types.h falls into the same category.
Remove the linux/percpu.h and asm/cpumask.h includes from msr.h and smp.h
and fix the resulting fallout by a simple forward struct declaration and by
including the x86 specific asm/cpumask.h header where it is actually
required.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
V1.1: Fix PARAVIRT_XXL fallout....
---
arch/x86/include/asm/cache.h | 1 +
arch/x86/include/asm/msr.h | 5 +++--
arch/x86/include/asm/paravirt.h | 3 ++-
arch/x86/include/asm/pvclock.h | 1 +
arch/x86/include/asm/smp.h | 2 --
arch/x86/include/asm/vdso/gettimeofday.h | 5 ++---
arch/x86/kernel/cpu/mce/core.c | 1 +
arch/x86/kernel/nmi.c | 1 +
arch/x86/kernel/smpboot.c | 1 +
9 files changed, 12 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/cache.h
+++ b/arch/x86/include/asm/cache.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_CACHE_H
#define _ASM_X86_CACHE_H
+#include <vdso/page.h>
#include <linux/linkage.h>
/* L1 cache line size */
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -8,12 +8,11 @@
#include <asm/asm.h>
#include <asm/errno.h>
-#include <asm/cpumask.h>
#include <uapi/asm/msr.h>
#include <asm/shared/msr.h>
+#include <linux/compiler_types.h>
#include <linux/types.h>
-#include <linux/percpu.h>
struct msr_info {
u32 msr_no;
@@ -256,6 +255,8 @@ int msr_set_bit(u32 msr, u8 bit);
int msr_clear_bit(u32 msr, u8 bit);
#ifdef CONFIG_SMP
+struct cpumask;
+
int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);
int rdmsrq_on_cpu(unsigned int cpu, u32 msr_no, u64 *q);
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -16,9 +16,10 @@
#ifndef __ASSEMBLER__
#include <linux/types.h>
-#include <linux/cpumask.h>
#include <asm/frame.h>
+struct cpumask;
+
/* The paravirtualized I/O functions */
static inline void slow_down_io(void)
{
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_PVCLOCK_H
#define _ASM_X86_PVCLOCK_H
+#include <asm/barrier.h>
#include <asm/clocksource.h>
#include <asm/pvclock-abi.h>
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -5,8 +5,6 @@
#include <linux/cpumask.h>
#include <linux/thread_info.h>
-#include <asm/cpumask.h>
-
DECLARE_PER_CPU_CACHE_HOT(int, cpu_number);
DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -11,13 +11,12 @@
#define __ASM_VDSO_GETTIMEOFDAY_H
#ifndef __ASSEMBLER__
-
+#include <clocksource/hyperv_timer.h>
#include <uapi/linux/time.h>
+
#include <asm/vgtod.h>
#include <asm/unistd.h>
-#include <asm/msr.h>
#include <asm/pvclock.h>
-#include <clocksource/hyperv_timer.h>
#include <asm/vdso/sys_call.h>
#define VDSO_HAS_TIME 1
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -48,6 +48,7 @@
#include <linux/vmcore_info.h>
#include <asm/fred.h>
+#include <asm/cpumask.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
#include <asm/traps.h>
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -26,6 +26,7 @@
#include <linux/sched/clock.h>
#include <linux/kvm_types.h>
+#include <asm/cpumask.h>
#include <asm/cpu_entry_area.h>
#include <asm/traps.h>
#include <asm/mach_traps.h>
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -70,6 +70,7 @@
#include <asm/irq.h>
#include <asm/realmode.h>
#include <asm/cpu.h>
+#include <asm/cpumask.h>
#include <asm/numa.h>
#include <asm/tlbflush.h>
#include <asm/mtrr.h>
* [tytso-ext4:dev] BUILD SUCCESS 981fcc5674e67158d24d23e841523eccba19d0e7
From: kernel test robot @ 2026-04-10 18:07 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
branch HEAD: 981fcc5674e67158d24d23e841523eccba19d0e7 jbd2: fix deadlock in jbd2_journal_cancel_revoke()
elapsed time: 730m
configs tested: 163
configs skipped: 2
The following configs have been built successfully.
More configs may be tested in the coming days.
tested configs:
alpha allnoconfig gcc-15.2.0
alpha allyesconfig gcc-15.2.0
alpha defconfig gcc-15.2.0
arc allmodconfig clang-16
arc allnoconfig gcc-15.2.0
arc allyesconfig clang-23
arc defconfig gcc-15.2.0
arc randconfig-001-20260410 gcc-8.5.0
arc randconfig-002-20260410 gcc-8.5.0
arm allnoconfig gcc-15.2.0
arm allyesconfig clang-16
arm defconfig gcc-15.2.0
arm randconfig-001-20260410 gcc-8.5.0
arm randconfig-002-20260410 gcc-8.5.0
arm randconfig-003-20260410 gcc-8.5.0
arm randconfig-004-20260410 gcc-8.5.0
arm64 allmodconfig clang-23
arm64 allnoconfig gcc-15.2.0
arm64 defconfig gcc-15.2.0
arm64 randconfig-001-20260410 gcc-10.5.0
arm64 randconfig-002-20260410 gcc-10.5.0
arm64 randconfig-003-20260410 gcc-10.5.0
arm64 randconfig-004-20260410 gcc-10.5.0
csky allmodconfig gcc-15.2.0
csky allnoconfig gcc-15.2.0
csky defconfig gcc-15.2.0
csky randconfig-001-20260410 gcc-10.5.0
csky randconfig-002-20260410 gcc-10.5.0
hexagon allmodconfig gcc-15.2.0
hexagon allnoconfig gcc-15.2.0
hexagon defconfig gcc-15.2.0
hexagon randconfig-001-20260410 gcc-15.2.0
hexagon randconfig-002-20260410 gcc-15.2.0
i386 allmodconfig clang-20
i386 allnoconfig gcc-15.2.0
i386 allyesconfig clang-20
i386 buildonly-randconfig-001-20260410 clang-20
i386 buildonly-randconfig-002-20260410 clang-20
i386 buildonly-randconfig-003-20260410 clang-20
i386 buildonly-randconfig-004-20260410 clang-20
i386 buildonly-randconfig-005-20260410 clang-20
i386 buildonly-randconfig-006-20260410 clang-20
i386 defconfig gcc-15.2.0
i386 randconfig-001-20260410 gcc-14
i386 randconfig-002-20260410 gcc-14
i386 randconfig-003-20260410 gcc-14
i386 randconfig-004-20260410 gcc-14
i386 randconfig-005-20260410 gcc-14
i386 randconfig-006-20260410 gcc-14
i386 randconfig-007-20260410 gcc-14
i386 randconfig-011-20260410 clang-20
i386 randconfig-012-20260410 clang-20
i386 randconfig-013-20260410 clang-20
i386 randconfig-014-20260410 clang-20
i386 randconfig-015-20260410 clang-20
i386 randconfig-016-20260410 clang-20
i386 randconfig-017-20260410 clang-20
loongarch allmodconfig clang-23
loongarch allnoconfig gcc-15.2.0
loongarch defconfig clang-19
loongarch randconfig-001-20260410 gcc-15.2.0
loongarch randconfig-002-20260410 gcc-15.2.0
m68k allmodconfig gcc-15.2.0
m68k allnoconfig gcc-15.2.0
m68k allyesconfig clang-16
m68k defconfig clang-19
microblaze allnoconfig gcc-15.2.0
microblaze allyesconfig gcc-15.2.0
microblaze defconfig clang-19
mips allmodconfig gcc-15.2.0
mips allnoconfig gcc-15.2.0
mips allyesconfig gcc-15.2.0
mips vocore2_defconfig clang-23
nios2 allmodconfig clang-23
nios2 allnoconfig clang-23
nios2 defconfig clang-19
nios2 randconfig-001-20260410 gcc-15.2.0
nios2 randconfig-002-20260410 gcc-15.2.0
openrisc allmodconfig clang-23
openrisc allnoconfig clang-23
openrisc defconfig gcc-15.2.0
openrisc simple_smp_defconfig gcc-15.2.0
parisc allmodconfig gcc-15.2.0
parisc allnoconfig clang-23
parisc allyesconfig clang-19
parisc defconfig gcc-15.2.0
parisc randconfig-001-20260410 gcc-14.3.0
parisc randconfig-002-20260410 gcc-14.3.0
parisc64 defconfig clang-19
powerpc allmodconfig gcc-15.2.0
powerpc allnoconfig clang-23
powerpc randconfig-001-20260410 gcc-14.3.0
powerpc randconfig-002-20260410 gcc-14.3.0
powerpc64 randconfig-001-20260410 gcc-14.3.0
powerpc64 randconfig-002-20260410 gcc-14.3.0
riscv allmodconfig clang-23
riscv allnoconfig clang-23
riscv allyesconfig clang-16
riscv defconfig gcc-15.2.0
s390 allmodconfig clang-19
s390 allnoconfig clang-23
s390 allyesconfig gcc-15.2.0
s390 defconfig gcc-15.2.0
sh allmodconfig gcc-15.2.0
sh allnoconfig clang-23
sh allyesconfig clang-19
sh defconfig gcc-14
sparc allnoconfig clang-23
sparc defconfig gcc-15.2.0
sparc randconfig-001-20260410 clang-23
sparc randconfig-002-20260410 clang-23
sparc64 allmodconfig clang-23
sparc64 defconfig gcc-14
sparc64 randconfig-001-20260410 clang-23
sparc64 randconfig-002-20260410 clang-23
um allmodconfig clang-19
um allnoconfig clang-23
um allyesconfig gcc-15.2.0
um defconfig gcc-14
um i386_defconfig gcc-14
um randconfig-001-20260410 clang-23
um randconfig-002-20260410 clang-23
um x86_64_defconfig gcc-14
x86_64 allmodconfig clang-20
x86_64 allnoconfig clang-23
x86_64 allyesconfig clang-20
x86_64 buildonly-randconfig-001-20260410 clang-20
x86_64 buildonly-randconfig-002-20260410 clang-20
x86_64 buildonly-randconfig-003-20260410 clang-20
x86_64 buildonly-randconfig-004-20260410 clang-20
x86_64 buildonly-randconfig-005-20260410 clang-20
x86_64 buildonly-randconfig-006-20260410 clang-20
x86_64 defconfig gcc-14
x86_64 kexec clang-20
x86_64 randconfig-001-20260410 clang-20
x86_64 randconfig-002-20260410 clang-20
x86_64 randconfig-003-20260410 clang-20
x86_64 randconfig-004-20260410 clang-20
x86_64 randconfig-005-20260410 clang-20
x86_64 randconfig-006-20260410 clang-20
x86_64 randconfig-011-20260410 gcc-14
x86_64 randconfig-012-20260410 gcc-14
x86_64 randconfig-013-20260410 gcc-14
x86_64 randconfig-014-20260410 gcc-14
x86_64 randconfig-015-20260410 gcc-14
x86_64 randconfig-016-20260410 gcc-14
x86_64 randconfig-071-20260410 clang-20
x86_64 randconfig-072-20260410 clang-20
x86_64 randconfig-073-20260410 clang-20
x86_64 randconfig-074-20260410 clang-20
x86_64 randconfig-075-20260410 clang-20
x86_64 randconfig-076-20260410 clang-20
x86_64 rhel-9.4 clang-20
x86_64 rhel-9.4-bpf gcc-14
x86_64 rhel-9.4-func clang-20
x86_64 rhel-9.4-kselftests clang-20
x86_64 rhel-9.4-kunit gcc-14
x86_64 rhel-9.4-ltp gcc-14
x86_64 rhel-9.4-rust clang-20
xtensa allnoconfig clang-23
xtensa allyesconfig clang-23
xtensa randconfig-001-20260410 clang-23
xtensa randconfig-002-20260410 clang-23
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [patch 27/38] m68k: Select ARCH_HAS_RANDOM_ENTROPY
From: Daniel Palmer @ 2026-04-10 15:31 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Geert Uytterhoeven, linux-m68k, Arnd Bergmann, x86,
Lu Baolu, iommu, Michael Grzeschik, netdev, linux-wireless,
Herbert Xu, linux-crypto, Vlastimil Babka, linux-mm,
David Woodhouse, Bernie Thompson, linux-fbdev, Theodore Tso,
linux-ext4, Andrew Morton, Uladzislau Rezki, Marco Elver,
Dmitry Vyukov, kasan-dev, Andrey Ryabinin, Thomas Sailer,
linux-hams, Jason A. Donenfeld, Richard Henderson, linux-alpha,
Russell King, linux-arm-kernel, Catalin Marinas, Huacai Chen,
loongarch, Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
sparclinux
In-Reply-To: <20260410120319.397219631@kernel.org>
Hi
On Fri, 10 Apr 2026 at 21:39, Thomas Gleixner <tglx@kernel.org> wrote:
>
> The only remaining usage of get_cycles() is to provide
> random_get_entropy().
>
> Switch m68k over to the new scheme of selecting ARCH_HAS_RANDOM_ENTROPY and
> providing random_get_entropy() in asm/random.h.
I have built and booted this on my Amiga 4000 and it apparently still
works so FWIW:
Tested-by: Daniel Palmer <daniel@thingy.jp>
* Re: [PATCH v2] ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: linux-ext4, skoyama.kernel
Cc: Theodore Ts'o, adilger.kernel, libaokun, jack, ojaswin,
ritesh.list, yi.zhang, bhupesh, Sohei Koyama, Andreas Dilger,
stable
In-Reply-To: <20260406074830.8480-1-skoyama@ddn.com>
On Mon, 06 Apr 2026 16:48:30 +0900, skoyama.kernel@gmail.com wrote:
> The commit c8e008b60492 ("ext4: ignore xattrs past end")
> introduced a refcount leak when block_csum is false.
>
> ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to
> get iloc.bh, but never releases it with brelse().
>
>
> [...]
Applied, thanks!
[1/1] ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
commit: 77d059519382bd66283e6a4e83ee186e87e7708f
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH 04/61] ext4: Prefer IS_ERR_OR_NULL over manual NULL check
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Philipp Hahn
Cc: Theodore Ts'o, Andreas Dilger
In-Reply-To: <20260310-b4-is_err_or_null-v1-4-bd63b656022d@avm.de>
On Tue, 10 Mar 2026 12:48:30 +0100, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Change generated with coccinelle.
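For readers outside the kernel tree, the equivalence this change relies on can be sketched in userspace. This is a minimal sketch, not the kernel's <linux/err.h>: ERR_PTR(), IS_ERR() and IS_ERR_OR_NULL() are re-implemented here under the assumption that error pointers occupy the top MAX_ERRNO addresses, and the two lookup_failed*() helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: errno values are encoded in the top 4095 addresses. */
#define MAX_ERRNO 4095

/* Userspace stand-in: encode a negative errno as a pointer value. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True if ptr is NULL or an encoded error. */
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Open-coded form: manual NULL check plus IS_ERR(). */
static bool lookup_failed_open_coded(const void *p)
{
	return !p || IS_ERR(p);
}

/* Combined form: the single helper covers both cases. */
static bool lookup_failed(const void *p)
{
	return IS_ERR_OR_NULL(p);
}
```

Both helpers return the same answer for valid pointers, NULL and encoded errors, which is why the conversion is expected to be behaviour-neutral.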
Applied, thanks!
[04/61] ext4: Prefer IS_ERR_OR_NULL over manual NULL check
commit: 1d749e110277ce4103f27bd60d6181e52c0cc1e3
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v4 00/13] ext4: refactor partial block zero-out for iomap conversion
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: linux-ext4, Zhang Yi
Cc: Theodore Ts'o, linux-fsdevel, linux-kernel, adilger.kernel,
jack, ojaswin, ritesh.list, libaokun, yi.zhang, yizhang089,
yangerkun, yukuai
In-Reply-To: <20260327102939.1095257-1-yi.zhang@huaweicloud.com>
On Fri, 27 Mar 2026 18:29:26 +0800, Zhang Yi wrote:
> Changes since v3:
> - In patch 04, fix the comments of ext4_block_zero_range().
> - Add patch 09 to ensure zeroed partial blocks are persisted in SYNC mode
> for ext4_punch_hole() and ext4_zero_range(). This resolves the
> break in the data-ordered mode guarantee caused by patch 05, as
> pointed out by Sashiko.
> - Add patch 10, unify the inconsistent SYNC mode checks in all
> fallocate paths.
> - In patch 12, fix two issues related to updating the file size in
> error paths, as pointed out by Sashiko.
>
> [...]
Applied, thanks!
[01/13] ext4: add did_zero output parameter to ext4_block_zero_page_range()
commit: 5447c8b9de7581ca7254d712652678cc460a18c2
[02/13] ext4: rename and extend ext4_block_truncate_page()
commit: bd099a0565fce5c771e1d0bfcefec26fb5b1c1b7
[03/13] ext4: factor out journalled block zeroing range
commit: 3b312a6f510ca217607ffacf5cbca2f08c402ec0
[04/13] ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range()
commit: ad11526d1504641b632918e202e23c9c80923fff
[05/13] ext4: move ordered data handling out of ext4_block_do_zero_range()
commit: 69e2d5c1f544982389327ff90b491a0f7d1afe48
[06/13] ext4: remove handle parameters from zero partial block functions
commit: d3609a71b777d073ea6ead2e6eed93e97841fa21
[07/13] ext4: pass allocate range as loff_t to ext4_alloc_file_blocks()
commit: ad1876bc4c4cae59f747b4225007cdc31f834597
[08/13] ext4: move zero partial block range functions out of active handle
commit: c4602a1d09ec7c6dd6f53e5faf3f04e9c02d71eb
[09/13] ext4: ensure zeroed partial blocks are persisted in SYNC mode
commit: 7d81ec0246ff74b10d92a4617fea84eaf06162c0
[10/13] ext4: unify SYNC mode checks in fallocate paths
commit: c3688d212fc6306bbb7136fbc1d0be0f175a5270
[11/13] ext4: remove ctime/mtime update from ext4_alloc_file_blocks()
commit: 116c0bdac2ec059d91045ba3f57cc90cb1e3b71d
[12/13] ext4: move pagecache_isize_extended() out of active handle
commit: 1ad0f42823291bcac371dafd37533f5e8d92acc3
[13/13] ext4: zero post-EOF partial block before appending write
commit: 3f60efd65412dfe4ff33b376a983220ef74056b1
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v2] jbd2: fix deadlock in jbd2_journal_cancel_revoke()
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: linux-ext4, Zhang Yi
Cc: Theodore Ts'o, linux-fsdevel, linux-kernel, dave,
adilger.kernel, jack, ojaswin, ritesh.list, libaokun, yi.zhang,
yizhang089, yangerkun, yukuai
In-Reply-To: <20260409114204.917154-1-yi.zhang@huaweicloud.com>
On Thu, 09 Apr 2026 19:42:03 +0800, Zhang Yi wrote:
> Commit f76d4c28a46a ("fs/jbd2: use sleeping version of
> __find_get_block()") changed jbd2_journal_cancel_revoke() to use
> __find_get_block_nonatomic() which holds the folio lock instead of
> i_private_lock. This breaks the lock ordering (folio -> buffer) and
> causes an ABBA deadlock when the filesystem blocksize < pagesize:
>
> T1 T2
> ext4_mkdir()
> ext4_init_new_dir()
> ext4_append()
> ext4_getblk()
> lock_buffer() <- A
> sync_blockdev()
> blkdev_writepages()
> writeback_iter()
> writeback_get_folio()
> folio_lock() <- B
> ext4_journal_get_create_access()
> jbd2_journal_cancel_revoke()
> __find_get_block_nonatomic()
> folio_lock() <- B
> block_write_full_folio()
> lock_buffer() <- A
>
> [...]
Applied, thanks!
[1/1] jbd2: fix deadlock in jbd2_journal_cancel_revoke()
commit: 981fcc5674e67158d24d23e841523eccba19d0e7
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v4 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Li Chen
Cc: Theodore Ts'o, Andreas Dilger, Joel Becker, Joseph Qi,
linux-kernel
In-Reply-To: <20260306085643.465275-1-me@linux.beauty>
On Fri, 06 Mar 2026 16:56:38 +0800, Li Chen wrote:
> This series makes the jbd2_inode dirty range tracking safe for lockless
> reads in jbd2 and filesystem callbacks used by ext4 and ocfs2.
>
> Some paths access jinode fields without holding journal->j_list_lock
> (e.g. fast commit helpers and ordered truncate helpers). v1 used READ_ONCE()
> on i_dirty_start/end, but Matthew pointed out that loff_t can be torn on
> 32-bit platforms, and Jan suggested storing the dirty range in PAGE_SIZE
> units as pgoff_t.
>
> [...]
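The unit change described in the cover letter can be illustrated roughly as follows. This is only a sketch under stated assumptions: 4 KiB pages, an inclusive end offset, and hypothetical names (sketch_pgoff_t, struct sketch_dirty_range, bytes_to_pages()) that are not taken from the series. The point is that a page index fits in an unsigned long, which a 32-bit CPU can load in one access, whereas a 64-bit byte offset may be read in two halves.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for pgoff_t: one machine word. */
typedef unsigned long sketch_pgoff_t;

/* Dirty range tracked in page units; end_page is inclusive. */
struct sketch_dirty_range {
	sketch_pgoff_t start_page;
	sketch_pgoff_t end_page;
};

/* Widen an inclusive byte range to the pages that contain it. */
static struct sketch_dirty_range bytes_to_pages(int64_t start_byte,
						int64_t end_byte)
{
	struct sketch_dirty_range r;

	assert(start_byte >= 0 && start_byte <= end_byte);
	r.start_page = (sketch_pgoff_t)(start_byte >> SKETCH_PAGE_SHIFT);
	r.end_page = (sketch_pgoff_t)(end_byte >> SKETCH_PAGE_SHIFT);
	return r;
}

/* First byte covered by the page range. */
static int64_t page_range_start_byte(struct sketch_dirty_range r)
{
	return (int64_t)r.start_page << SKETCH_PAGE_SHIFT;
}

/* Last byte covered by the page range (inclusive). */
static int64_t page_range_end_byte(struct sketch_dirty_range r)
{
	return (((int64_t)r.end_page + 1) << SKETCH_PAGE_SHIFT) - 1;
}
```

The page-unit range is never narrower than the byte range it was derived from, so writeback driven by it can only cover more data, not less.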
Applied, thanks!
[1/4] jbd2: add jinode dirty range accessors
commit: 5267f6ef49cb5fba426f2d286817b1355fde31da
[2/4] ext4: use jbd2 jinode dirty range accessor
commit: 660d23669982202c99798658e2a15ccdd001f82b
[3/4] ocfs2: use jbd2 jinode dirty range accessor
commit: be81084e032c2d74f51173e30f687ce13476cb73
[4/4] jbd2: store jinode dirty range in PAGE_SIZE units
commit: 4edafa81a1d6020272d0c6eb68faeb810dd083c1
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v3] ext4: unmap invalidated folios from page tables in mpage_release_unused_pages()
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: adilger.kernel, willy, Deepanshu Kartikey
Cc: Theodore Ts'o, linux-ext4, linux-kernel, yi.zhang, djwong,
syzbot+b0a0670332b6b3230a0a
In-Reply-To: <20251205055914.1393799-1-kartikey406@gmail.com>
On Fri, 05 Dec 2025 11:29:14 +0530, Deepanshu Kartikey wrote:
> When delayed block allocation fails (e.g., due to filesystem corruption
> detected in ext4_map_blocks()), the writeback error handler calls
> mpage_release_unused_pages(invalidate=true) which invalidates affected
> folios by clearing their uptodate flag via folio_clear_uptodate().
>
> However, these folios may still be mapped in process page tables. If a
> subsequent operation (such as ftruncate calling ext4_block_truncate_page)
> triggers a write fault, the existing page table entry allows access to
> the now-invalidated folio. This leads to ext4_page_mkwrite() being called
> with a non-uptodate folio, which then gets marked dirty, triggering:
>
> [...]
Applied, thanks!
[1/1] ext4: unmap invalidated folios from page tables in mpage_release_unused_pages()
commit: 9b25f381de6b8942645f43735cb0a4fb0ab3a6d1
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH next] ext4: Fix diagnostic printf formats
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: Andreas Dilger, linux-ext4, linux-kernel, david.laight.linux
Cc: Theodore Ts'o, Masami Hiramatsu, Petr Mladek,
Rasmus Villemoes, Andy Shevchenko, Steven Rostedt,
Sergey Senozhatsky, Andrew Morton
In-Reply-To: <20260326201804.3881-1-david.laight.linux@gmail.com>
On Thu, 26 Mar 2026 20:18:04 +0000, david.laight.linux@gmail.com wrote:
> The formats for non-terminated names should be "%.*s" not "%*.s".
> The kernel currently treats "%*.s" as equivalent to "%*s" whereas
> userspace requires it be equivalent to "%*.0s".
> Neither is correct here.
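The difference between the two formats is easy to reproduce in userspace. The sketch below assumes a C99-conforming snprintf(); per the quoted description the kernel's own formatter treats "%*.s" differently, so this only demonstrates the userspace half of the statement. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * "%.*s": len is the precision, i.e. the maximum number of bytes taken
 * from name. This is the form a length-limited name needs.
 */
static int format_name_precision(char *out, size_t outsz,
				 const char *name, int len)
{
	assert(len >= 0);
	return snprintf(out, outsz, "%.*s", len, name);
}

/*
 * "%*.s": len is the minimum field width and the empty precision means
 * zero in C99, so no bytes of name are printed and the field is filled
 * with spaces.
 */
static int format_name_width(char *out, size_t outsz,
			     const char *name, int len)
{
	assert(len >= 0);
	return snprintf(out, outsz, "%*.s", len, name);
}
```

With name "foobar" and len 3, the first helper produces "foo" while the second produces three spaces, which is why neither the kernel nor the userspace reading of "%*.s" gives the intended output.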
Applied, thanks!
[1/1] ext4: Fix diagnostic printf formats
commit: 6ea3b34d8625ef5544d1c619bd67e2c6080ea4c2
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v5 0/5] Fix some issues about ext4-test
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: adilger.kernel, linux-ext4, Ye Bin; +Cc: Theodore Ts'o, jack
In-Reply-To: <20260330133035.287842-1-yebin@huaweicloud.com>
On Mon, 30 Mar 2026 21:30:30 +0800, Ye Bin wrote:
> This patch series is based on:
> [1]: https://lore.kernel.org/linux-ext4/5bb9041471dab8ce870c191c19cbe4df57473be8.1772381213.git.ritesh.list@gmail.com/
>
> Diff v5 vs v4:
> 1. Patch[3]
> Move ext4_es_register_shrinker() before setting up the mock inode.
>
> [...]
Applied, thanks!
[1/5] ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init()
commit: 5941a072d48841255005e3a5b5a620692d81d1a7
[2/5] ext4: call deactivate_super() in extents_kunit_exit()
commit: f9c1f7647ac8fb70bebb1615ac112d1568abe339
[3/5] ext4: fix the error handling process in extents_kunit_init).
commit: 17f73c95d47325000ee68492be3ad76ae09f6f19
[4/5] ext4: fix possible null-ptr-deref in extents_kunit_exit()
commit: ca78c31af467ffe94b15f6a2e4e1cc1c164db19b
[5/5] ext4: fix possible null-ptr-deref in mbt_kunit_exit()
commit: 22f53f08d9eb837ce69b1a07641d414aac8d045f
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH] ext4/move_extent: use folio_next_pos()
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: Matthew Wilcox, Julia Lawall
Cc: Theodore Ts'o, kernel-janitors, Andreas Dilger, linux-ext4,
linux-kernel
In-Reply-To: <20260222125049.1309075-1-Julia.Lawall@inria.fr>
On Sun, 22 Feb 2026 13:50:49 +0100, Julia Lawall wrote:
> A series of patches such as commit 60a70e61430b ("mm: Use
> folio_next_pos()") replace folio_pos() + folio_size() by
> folio_next_pos(). The former performs (x << z) + (y << z) while
> the latter performs (x + y) << z, which is slightly more
> efficient. This case was not taken into account, perhaps
> because the argument is not named folio.
>
> [...]
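The arithmetic behind the quoted description can be checked with plain integers. This is a sketch, not the folio API: index, nr_pages and shift are hypothetical stand-ins for x, y and z, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two shifts and an add: (x << z) + (y << z). */
static uint64_t end_pos_two_shifts(uint64_t index, uint64_t nr_pages,
				   unsigned int shift)
{
	assert(shift < 64);
	return (index << shift) + (nr_pages << shift);
}

/* One add and one shift: (x + y) << z. */
static uint64_t end_pos_one_shift(uint64_t index, uint64_t nr_pages,
				  unsigned int shift)
{
	assert(shift < 64);
	return (index + nr_pages) << shift;
}
```

As long as nothing overflows, both forms give the same result, since a left shift distributes over addition; the second simply needs one shift fewer.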
Applied, thanks!
[1/1] ext4/move_extent: use folio_next_pos()
commit: a804ecc399d91a529726fa1b10ff699bb531253d
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH v4] ext4: simplify mballoc preallocation size rounding for small files
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: dilger.kernel, Weixie Cui
Cc: Theodore Ts'o, linux-ext4, linux-kernel, Weixie Cui,
Andreas Dilger
In-Reply-To: <tencent_E9C5F1B2E9939B3037501FD04A7E9CF0C407@qq.com>
On Wed, 25 Feb 2026 13:02:31 +0800, Weixie Cui wrote:
> The if-else ladder in ext4_mb_normalize_request() manually rounds up
> the preallocation size to the next power of two for files up to 1MB,
> enumerating each step from 16KB to 1MB individually. Replace this with
> a single roundup_pow_of_two() call clamped to a 16KB minimum, which
> is functionally equivalent but much more concise.
>
> Also replace raw byte constants with SZ_1M and SZ_16K from
> <linux/sizes.h> for clarity, and remove the stale "XXX: should this
> table be tunable?" comment that has been there since the original
> mballoc code.
>
> [...]
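The rounding rule described above can be sketched in userspace as follows. This is a hedged illustration, not the code in ext4_mb_normalize_request(): sketch_roundup_pow_of_two() is a simple loop standing in for the kernel's roundup_pow_of_two(), the SKETCH_SZ_* constants stand in for SZ_16K and SZ_1M, and small_file_prealloc_size() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_16K (16ULL * 1024)
#define SKETCH_SZ_1M  (1024ULL * 1024)

/* Smallest power of two that is >= n (n must be non-zero). */
static uint64_t sketch_roundup_pow_of_two(uint64_t n)
{
	uint64_t p = 1;

	assert(n > 0 && n <= (1ULL << 63));
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * For sizes up to 1 MiB: round up to the next power of two, but never
 * go below 16 KiB. This replaces a ladder of per-size comparisons.
 */
static uint64_t small_file_prealloc_size(uint64_t size)
{
	uint64_t rounded;

	assert(size > 0 && size <= SKETCH_SZ_1M);
	rounded = sketch_roundup_pow_of_two(size);
	return rounded < SKETCH_SZ_16K ? SKETCH_SZ_16K : rounded;
}
```

For example, a 1-byte request maps to 16 KiB, 16385 bytes map to 32 KiB, and anything between 512 KiB + 1 and 1 MiB maps to 1 MiB, matching the steps the ladder enumerated one by one.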
Applied, thanks!
[1/1] ext4: simplify mballoc preallocation size rounding for small files
commit: af1502f98e2cdd43504596cd438f3aa6d0be8712
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
* Re: [PATCH] ext4: remove unused i_fc_wait
From: Theodore Ts'o @ 2026-04-10 15:18 UTC (permalink / raw)
To: Andreas Dilger, linux-ext4, linux-kernel, Li Chen; +Cc: Theodore Ts'o
In-Reply-To: <20260120121941.144192-1-me@linux.beauty>
On Tue, 20 Jan 2026 20:19:41 +0800, Li Chen wrote:
> i_fc_wait is only initialized in ext4_fc_init_inode() and never used for
> waiting or wakeups. Drop it.
Applied, thanks!
[1/1] ext4: remove unused i_fc_wait
commit: eb10607628acd1408a02e49b545e6421bb7a6ea2
Best regards,
--
Theodore Ts'o <tytso@mit.edu>