From: Matthew Wilcox <willy@infradead.org>
To: Jia Zhu <zhujia.zj@bytedance.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Baokun Li <libaokun@linux.alibaba.com>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Zhang Yi <yi.zhang@huawei.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
Date: Fri, 5 Jun 2026 15:24:20 +0100
Message-ID: <aiLcFP2drmHGjEL2@casper.infradead.org>
In-Reply-To: <20260605090253.32822-1-zhujia.zj@bytedance.com>

On Fri, Jun 05, 2026 at 05:02:53PM +0800, Jia Zhu wrote:
> On Wed, Jun 03, 2026 at 07:11:48PM +0100, Matthew Wilcox wrote:
> > Is this a common case for you, or is this something you noticed by
> > inspection?
>
> This was found by our kernel release benchmark. We run libMicro as part
> of that test suite:
>
> https://github.com/rzezeski/libMicro
>
> The regression shows up in buffered write/pwrite/writev overwrite tests
> on ext4 large folios.

Makes sense. I'll assume this can correspond to a reasonable workload.
It certainly seems like something that could exist.

> > Wouldn't you get just as much benefit from this?
>
> Yes. I tested this approach, and it gives almost the same result as my
> original partial-commit helper.

Excellent! Obviously it'd be even better if we didn't have to walk the
leading buffer_heads ... but there's no way to do this with the data
structure we have.

> Agreed. The original ext4_block_write_begin() change was too aggressive.
> Seeking directly to @from also skips the prefix buffers, which makes the
> old side effects harder to prove.
>
> For v2 I plan to drop that part and keep the existing walk from the head.
> The ext4 change would only stop after @to when the folio was already
> uptodate on entry, similar to your block_commit_write() suggestion:
>
> +	bool folio_uptodate = folio_test_uptodate(folio);
> +
>  	for (bh = head, block_start = 0;
> -	     bh != head || !block_start;
> +	     (bh != head || !block_start) &&
> +	     (!folio_uptodate || block_start < to);
>  	     block++, block_start = block_end, bh = bh->b_this_page) {
>  		...
>  	}

Yes, I think that's a good approach.
> So the prefix path and all in-range handling stay unchanged. The only
> skipped work is the tail part after @to, and only for a folio that was
> already uptodate before write_begin() started.
>
> > ... converting ext4 to use iomap instead of buffer heads.
>
> I strongly agree that iomap is the right direction for ext4. The iomap
> buffered write path would make this particular buffer-head walk cost go
> away.
>
> The reason I am still looking at this path is that the regression is
> visible in our LTS upgrade testing from 6.12 to 6.18. It was introduced
> by the ext4 large-folio enablement in v6.16. For example, in our
> libMicro release benchmark with THP always enabled, usecs/call, lower is
> better:
>
> case          v6.12    v6.18    regression
> write_u1k     0.609    4.659    +665.0%
> write_u10k    1.408    4.869    +245.8%

Ouch ;-) No wonder you want to address this. Do you recover all the
regression with this fix?
> The iomap conversion is the long-term fix, but it does not help kernels
> which still use the buffer-head buffered write path. I would like to keep
> this as a small regression fix for that path, and make it minimal enough
> to be suitable for stable/LTS backport.

Is it that you're using some ext4 features that aren't supported by
iomap yet? Could you say which ones? That might motivate someone to
prioritise that support.

> Would this v2 direction look OK to you?

Absolutely. Very happy with this approach.