From: Matthew Wilcox <willy@infradead.org>
To: Jia Zhu <zhujia.zj@bytedance.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Baokun Li <libaokun@linux.alibaba.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Ritesh Harjani <ritesh.list@gmail.com>,
Zhang Yi <yi.zhang@huawei.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
Date: Fri, 5 Jun 2026 15:24:20 +0100
Message-ID: <aiLcFP2drmHGjEL2@casper.infradead.org>
In-Reply-To: <20260605090253.32822-1-zhujia.zj@bytedance.com>
On Fri, Jun 05, 2026 at 05:02:53PM +0800, Jia Zhu wrote:
> On Wed, Jun 03, 2026 at 07:11:48PM +0100, Matthew Wilcox wrote:
> > Is this a common case for you, or is this something you noticed by
> > inspection?
>
> This was found by our kernel release benchmark. We run libMicro as part
> of that test suite:
>
> https://github.com/rzezeski/libMicro
>
> The regression shows up in buffered write/pwrite/writev overwrite tests
> on ext4 large folios.

Makes sense.  I'll assume this can correspond to a reasonable workload.
It certainly seems like something that could exist.

> > Wouldn't you get just as much benefit from this?
>
> Yes. I tested this approach, and it gives almost the same result as my
> original partial-commit helper.

Excellent!  Obviously it'd be even better if we didn't have to walk the
leading buffer_heads ... but there's no way to do this with the data
structure we have.

> Agreed. The original ext4_block_write_begin() change was too aggressive.
> Seeking directly to @from also skips the prefix buffers, which makes the
> old side effects harder to prove.
>
> For v2 I plan to drop that part and keep the existing walk from the head.
> The ext4 change would only stop after @to when the folio was already
> uptodate on entry, similar to your block_commit_write() suggestion:
>
> +	bool folio_uptodate = folio_test_uptodate(folio);
> +
>  	for (bh = head, block_start = 0;
> -	     bh != head || !block_start;
> +	     (bh != head || !block_start) &&
> +	     (!folio_uptodate || block_start < to);
>  	     block++, block_start = block_end, bh = bh->b_this_page) {
>  		...
>  	}
Yes, I think that's a good approach.
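
For anyone following along, here is a minimal userspace sketch of what that
extra loop term buys. It is not the ext4 code: struct toy_bh, toy_link_ring()
and toy_walk() are hypothetical names made up for illustration, and only the
circular b_this_page link and the loop condition from the hunk above are
modelled. The per-buffer work inside the loop is left out, so the function
only counts how many buffers the walk would visit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the ring link is modelled. */
struct toy_bh {
	struct toy_bh *b_this_page;
};

/* Link nr buffers into a ring, the way the buffers of one folio are chained. */
static void toy_link_ring(struct toy_bh *bhs, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		bhs[i].b_this_page = &bhs[(i + 1) % nr];
}

/*
 * Count the buffers visited for a write covering [from, to) in the folio.
 * The walk always starts at the head, so prefix buffers are still visited.
 * When folio_uptodate is true, the walk stops at the first buffer that
 * starts at or after 'to'; otherwise it goes all the way around the ring.
 */
static size_t toy_walk(struct toy_bh *head, size_t blocksize,
		       size_t from, size_t to, bool folio_uptodate)
{
	size_t block_start, block_end = 0, visited = 0;
	struct toy_bh *bh;

	assert(blocksize > 0 && from <= to);
	(void)from;	/* only used by the assertion above */

	for (bh = head, block_start = 0;
	     (bh != head || !block_start) &&
	     (!folio_uptodate || block_start < to);
	     block_start = block_end, bh = bh->b_this_page) {
		block_end = block_start + blocksize;
		visited++;	/* the real loop does its per-buffer work here */
	}
	return visited;
}
```

Assuming a 2MB folio with 4KB blocks (512 buffers; both sizes are assumptions
for the example, not figures from this thread), a 1KB write at offset 0 makes
toy_walk() visit all 512 buffers when folio_uptodate is false and just 1 when
it is true. A write that ends inside the third block visits 3, because the
two prefix buffers are still walked.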
> So the prefix path and all in-range handling stay unchanged. The only
> skipped work is the tail part after @to, and only for a folio that was
> already uptodate before write_begin() started.
>
> > ... converting ext4 to use iomap instead of buffer heads.
>
> I strongly agree that iomap is the right direction for ext4. The iomap
> buffered write path would make this particular buffer-head walk cost go
> away.
>
> The reason I am still looking at this path is that the regression is
> visible in our LTS upgrade testing from 6.12 to 6.18. It was introduced
> by the ext4 large-folio enablement in v6.16. For example, in our
> libMicro release benchmark with THP always enabled, usecs/call, lower is
> better:
>
>   case          v6.12    v6.18    regression
>   write_u1k     0.609    4.659    +665.0%
>   write_u10k    1.408    4.869    +245.8%

Ouch ;-)  No wonder you want to address this.  Do you recover all the
regression with this fix?

> The iomap conversion is the long-term fix, but it does not help kernels
> which still use the buffer-head buffered write path. I would like to keep
> this as a small regression fix for that path, and make it minimal enough
> to be suitable for stable/LTS backport.

Is it that you're using some ext4 features that aren't supported by
iomap yet?  Could you say which ones?  That might motivate someone to
prioritise that support.

> Would this v2 direction look OK to you?
Absolutely. Very happy with this approach.
Thread overview: 11+ messages
2026-06-03 13:48 [PATCH] ext4: avoid full buffer walks for large folio partial writes Jia Zhu
2026-06-03 18:11 ` Matthew Wilcox
2026-06-05 9:02 ` Jia Zhu
2026-06-05 14:24 ` Matthew Wilcox [this message]
2026-06-08 11:56 ` Jia Zhu
2026-06-08 12:01 ` [PATCH v2 0/2] ext4: avoid tail walks for cached large-folio writes Jia Zhu
2026-06-08 12:01 ` [PATCH v2 1/2] fs/buffer: avoid tail commit walk for uptodate folios Jia Zhu
2026-06-08 13:06 ` Jan Kara
2026-06-08 12:01 ` [PATCH v2 2/2] ext4: avoid tail write_begin " Jia Zhu
2026-06-08 14:29 ` Jan Kara
2026-06-09 3:54 ` Jia Zhu