From: "Theodore Ts'o" <tytso@mit.edu>
To: Matt Fleming <matt@readmodwrite.com>
Cc: adilger.kernel@dilger.ca, kernel-team@cloudflare.com,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, willy@infradead.org,
Baokun Li <libaokun1@huawei.com>, Jan Kara <jack@suse.cz>
Subject: Re: ext4 writeback performance issue in 6.12
Date: Wed, 8 Oct 2025 12:26:55 -0400
Message-ID: <20251008162655.GB502448@mit.edu>
In-Reply-To: <20251008150705.4090434-1-matt@readmodwrite.com>
On Wed, Oct 08, 2025 at 04:07:05PM +0100, Matt Fleming wrote:
> >
> > These machines are striped and are using noatime:
> >
> > $ grep ext4 /proc/mounts
> > /dev/md127 /state ext4 rw,noatime,stripe=1280 0 0
> >
> > Is there some tunable or configuration option that I'm missing that
> > could help here to avoid wasting time in
> > ext4_mb_find_good_group_avg_frag_lists() when it's most likely going to
> > fail an order 9 allocation anyway?
Can you try disabling the stripe parameter? If you are willing to try the
latest mainline kernel, there are some changes that *might* make a
difference, but RAID stripe alignment has been causing problems.
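Before changing anything, it can help to confirm which ext4 mounts actually carry a stripe setting. The following is a minimal sketch, not part of the original message: the function name `ext4_stripe_mounts` is hypothetical, and it only parses text in the `/proc/mounts` format quoted above.

```python
def ext4_stripe_mounts(mounts_text):
    """Return {mount_point: stripe_blocks} for ext4 mounts that set stripe=N."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            if key == "stripe" and value.isdigit():
                result[fields[1]] = int(value)
    return result


# Sample line taken from the quoted report; on a live system the text
# could instead be read from /proc/mounts.
sample = "/dev/md127 /state ext4 rw,noatime,stripe=1280 0 0\n"
print(ext4_stripe_mounts(sample))  # {'/state': 1280}
```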
In fact, in the latest e2fsprogs release, we have added this change:
commit b61f182b2de1ea75cff935037883ba1a8c7db623
Author: Theodore Ts'o <tytso@mit.edu>
Date: Sun May 4 14:07:14 2025 -0400
mke2fs: don't set the raid stripe for non-rotational devices by default
The ext4 block allocator is not at all efficient when it is asked to
enforce RAID alignment. It is especially bad for flash-based devices,
or when the file system is highly fragmented. For non-rotational
devices, it's fine to set the stride parameter (which controls
spreading the allocation bitmaps across the RAID component devices,
which always makes sense); but for the stripe parameter (which asks the
ext4 block allocator to try _very_ hard to find RAID stripe aligned
devices) it's probably not a good idea.
Add new mke2fs.conf parameters with the defaults:
[defaults]
set_raid_stride = always
set_raid_stripe = disk
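The effect of these two defaults can be modelled roughly as follows. This is a sketch under stated assumptions, not the mke2fs implementation: it assumes "disk" means "only for rotational devices" (inferred from the commit subject), it models only the two values shown above, and the names `should_set` and `raid_params_to_set` are hypothetical.

```python
# Defaults as listed in the commit message.
DEFAULTS = {"set_raid_stride": "always", "set_raid_stripe": "disk"}


def should_set(policy, rotational):
    """Decide whether a RAID parameter would be set under a given policy.

    Assumption: "disk" applies the parameter only to rotational devices.
    """
    if policy == "always":
        return True
    if policy == "disk":
        return rotational
    raise ValueError(f"policy not modelled in this sketch: {policy!r}")


def raid_params_to_set(rotational, conf=DEFAULTS):
    """Return the names of the parameters that would be set for a device."""
    return sorted(name for name, policy in conf.items()
                  if should_set(policy, rotational))


# On Linux the rotational flag is exposed in
# /sys/block/<dev>/queue/rotational ("1" for rotational, "0" otherwise).
print(raid_params_to_set(rotational=True))   # ['set_raid_stride', 'set_raid_stripe']
print(raid_params_to_set(rotational=False))  # ['set_raid_stride']
```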
Even for RAID arrays based on HDD's, we can still have problems for
highly fragmented file systems. This will need to be solved in the
kernel, probably by having some kind of wall clock or CPU time
limitation for each block allocation or adding some kind of
optimization which is faster than using our current buddy bitmap
implementation, especially if the stripe size is not a multiple of a
power of two. But for SSD's, it's much less likely to make sense even
if we have an optimized block allocator, because if you've paid $$$
for a flash-based RAID array, the cost/benefit tradeoffs of doing less
optimized stripe RMW cycles versus the block allocator time and CPU
overhead is harder to justify without a lot of optimization effort.
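To relate the remark about stripe sizes and powers of two to the quoted report, the arithmetic for stripe=1280 can be checked directly. This is only illustrative arithmetic, not the kernel's allocator logic; the helper names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def largest_pow2_divisor(n):
    """Largest power of two that divides n (n > 0)."""
    return n & -n


stripe = 1280  # blocks, from the quoted mount options
print(is_power_of_two(stripe))       # False
print(largest_pow2_divisor(stripe))  # 256, since 1280 = 5 * 256
print(1 << 9)                        # 512 blocks in an order 9 allocation
```

So a 1280-block stripe does not line up with any single power-of-two order, which is the kind of case the paragraph above singles out as especially awkward for a buddy bitmap.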
If and when we can improve the ext4 kernel implementation (and it gets
rolled out to users using LTS kernels), we can change the defaults.
And of course, system administrators can always change
/etc/mke2fs.conf settings.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
- Ted