Linux EXT4 FS development
From: Matthew Wilcox <willy@infradead.org>
To: Jia Zhu <zhujia.zj@bytedance.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Baokun Li <libaokun@linux.alibaba.com>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Zhang Yi <yi.zhang@huawei.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
Date: Wed, 3 Jun 2026 19:11:48 +0100
Message-ID: <aiBuZE5NWMfOGAA6@casper.infradead.org>
In-Reply-To: <20260603134800.25155-1-zhujia.zj@bytedance.com>

On Wed, Jun 03, 2026 at 09:48:00PM +0800, Jia Zhu wrote:
> Ext4 buffered writes into large folios still walk every buffer_head in the
> folio in ext4_block_write_begin() and again in block_commit_write(). Before
> regular files used large folios this was cheap, but a large folio can
> contain hundreds of buffer_heads. Small overwrites of an existing large
> folio therefore pay work proportional to the folio size instead of the
> write size.

Is this a common case for you, or is this something you noticed by
inspection?

> Start the ext4 write_begin walk at the first buffer that overlaps the
> write. For already-uptodate large folio overwrites, add a partial commit
> path which marks only the written buffers uptodate and dirty. Leave
> non-uptodate folios on the old full-buffer commit path so BH_New cleanup
> and folio-uptodate discovery are preserved.

Wouldn't you get just as much benefit from this?

+++ b/fs/buffer.c
@@ -2096,6 +2096,7 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
 {
        size_t block_start, block_end;
        bool partial = false;
+       bool uptodate = folio_test_uptodate(folio);
        unsigned blocksize;
        struct buffer_head *bh, *head;

@@ -2118,6 +2119,8 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
                        clear_buffer_new(bh);

                block_start = block_end;
+               if (uptodate && block_start >= to)
+                       break;
                bh = bh->b_this_page;
        } while (bh != head);
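
For illustration only, the effect of that early break can be modelled in
userspace. This is a rough sketch, not kernel code: struct toy_bh,
toy_link_ring() and toy_commit_walk() are made-up names, and only the
circular b_this_page link and the byte ranges are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct buffer_head: only the ring link is modelled. */
struct toy_bh {
	struct toy_bh *b_this_page;
};

/* What one commit-style walk did. */
struct toy_walk {
	size_t visited;	/* buffers the loop looked at */
	size_t touched;	/* buffers overlapping [from, to) */
};

/* Link n buffers into a ring, the way one folio's buffers are chained. */
static void toy_link_ring(struct toy_bh *bhs, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		bhs[i].b_this_page = &bhs[(i + 1) % n];
}

/*
 * Walk the ring from head for a write covering [from, to).  When
 * folio_uptodate is true, stop as soon as the walk has passed the end
 * of the write; otherwise visit every buffer in the ring.
 */
static struct toy_walk toy_commit_walk(struct toy_bh *head, size_t blocksize,
				       size_t from, size_t to,
				       bool folio_uptodate)
{
	struct toy_walk w = { 0, 0 };
	size_t block_start = 0, block_end;
	struct toy_bh *bh = head;

	assert(blocksize > 0);
	do {
		block_end = block_start + blocksize;
		w.visited++;
		/* Same overlap test as the kernel loop, inverted. */
		if (block_end > from && block_start < to)
			w.touched++;

		block_start = block_end;
		if (folio_uptodate && block_start >= to)
			break;
		bh = bh->b_this_page;
	} while (bh != head);

	return w;
}
```

In this model, a 4096-byte write at offset 8192 into a ring of 512
buffers of 4096 bytes visits 3 buffers when the folio is uptodate and
all 512 otherwise. The walk still begins at head, so a write near the
end of the folio visits every buffer either way.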

> @@ -1191,17 +1191,18 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
>  	head = folio_buffers(folio);
>  	if (!head)
>  		head = create_empty_buffers(folio, blocksize, 0);
> -	block = EXT4_PG_TO_LBLK(inode, folio->index);
> +	if (from == to)
> +		return 0;
> +	block_start = round_down(from, blocksize);
> +	block = EXT4_PG_TO_LBLK(inode, folio->index) +
> +		(block_start >> inode->i_blkbits);
> +	bh = head;
> +	for (i = 0; i < block_start; i += blocksize)
> +		bh = bh->b_this_page;
>  
> -	for (bh = head, block_start = 0; bh != head || !block_start;
> -	    block++, block_start = block_end, bh = bh->b_this_page) {
> +	for (; block_start < to;
> +	     block++, block_start = block_end, bh = bh->b_this_page) {
>  		block_end = block_start + blocksize;
> -		if (block_end <= from || block_start >= to) {
> -			if (folio_test_uptodate(folio)) {
> -				set_buffer_uptodate(bh);
> -			}
> -			continue;
> -		}
>  		if (WARN_ON_ONCE(buffer_new(bh)))
>  			clear_buffer_new(bh);
>  		if (!buffer_mapped(bh)) {
> 
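
The starting-point arithmetic in the quoted hunk can be checked in
isolation. The sketch below is a userspace illustration under stated
assumptions: toy_first_block() and struct toy_start are hypothetical
names, folio_lblk stands in for the value of
EXT4_PG_TO_LBLK(inode, folio->index), and the block size is assumed to
be a power of two so that round_down() reduces to a mask.

```c
#include <assert.h>
#include <stddef.h>

struct toy_start {
	size_t block_start;		/* byte offset of first overlapping block */
	unsigned long long block;	/* logical block number of that block */
	size_t skipped;			/* b_this_page steps taken before the walk */
};

/*
 * Compute where the shortened write_begin walk would start for a write
 * beginning at byte offset 'from' within the folio.
 */
static struct toy_start toy_first_block(unsigned long long folio_lblk,
					unsigned int blkbits, size_t from)
{
	struct toy_start s;
	size_t blocksize;

	assert(blkbits < 8 * sizeof(size_t));
	blocksize = (size_t)1 << blkbits;

	/* round_down(from, blocksize) for a power-of-two blocksize */
	s.block_start = from & ~(blocksize - 1);
	s.block = folio_lblk + (s.block_start >> blkbits);
	/* The quoted patch follows b_this_page once per skipped block. */
	s.skipped = s.block_start / blocksize;
	return s;
}
```

With 4096-byte blocks (blkbits of 12), a folio whose first block is 1024
and a write starting at byte 10000, this gives block_start 8192, block
1026, and two skipped buffers. The skip loop in the patch still chases
one pointer per skipped buffer, so that part stays linear in the write
offset.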

I'm unconvinced that this is safe ... but all of this is a distraction
from what we should really be doing, which is converting ext4 to use
iomap instead of buffer heads.
