From: Matthew Wilcox <willy@infradead.org>
To: Jia Zhu <zhujia.zj@bytedance.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Baokun Li <libaokun@linux.alibaba.com>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Zhang Yi <yi.zhang@huawei.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
Date: Wed, 3 Jun 2026 19:11:48 +0100
Message-ID: <aiBuZE5NWMfOGAA6@casper.infradead.org>
In-Reply-To: <20260603134800.25155-1-zhujia.zj@bytedance.com>

On Wed, Jun 03, 2026 at 09:48:00PM +0800, Jia Zhu wrote:
> Ext4 buffered writes into large folios still walk every buffer_head in the
> folio in ext4_block_write_begin() and again in block_commit_write(). Before
> regular files used large folios this was cheap, but a large folio can
> contain hundreds of buffer_heads. Small overwrites of an existing large
> folio therefore pay work proportional to the folio size instead of the
> write size.

Is this a common case for you, or is this something you noticed by
inspection?

> Start the ext4 write_begin walk at the first buffer that overlaps the
> write. For already-uptodate large folio overwrites, add a partial commit
> path which marks only the written buffers uptodate and dirty. Leave
> non-uptodate folios on the old full-buffer commit path so BH_New cleanup
> and folio-uptodate discovery are preserved.

Wouldn't you get just as much benefit from this?

+++ b/fs/buffer.c
@@ -2096,6 +2096,7 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
 {
        size_t block_start, block_end;
        bool partial = false;
+       bool uptodate = folio_test_uptodate(folio);
        unsigned blocksize;
        struct buffer_head *bh, *head;

@@ -2118,6 +2119,8 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
                        clear_buffer_new(bh);

                block_start = block_end;
+               if (uptodate && block_start >= to)
+                       break;
                bh = bh->b_this_page;
        } while (bh != head);

> @@ -1191,17 +1191,18 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
>  	head = folio_buffers(folio);
>  	if (!head)
>  		head = create_empty_buffers(folio, blocksize, 0);
> -	block = EXT4_PG_TO_LBLK(inode, folio->index);
> +	if (from == to)
> +		return 0;
> +	block_start = round_down(from, blocksize);
> +	block = EXT4_PG_TO_LBLK(inode, folio->index) +
> +		(block_start >> inode->i_blkbits);
> +	bh = head;
> +	for (i = 0; i < block_start; i += blocksize)
> +		bh = bh->b_this_page;
>  
> -	for (bh = head, block_start = 0; bh != head || !block_start;
> -	    block++, block_start = block_end, bh = bh->b_this_page) {
> +	for (; block_start < to;
> +	     block++, block_start = block_end, bh = bh->b_this_page) {
>  		block_end = block_start + blocksize;
> -		if (block_end <= from || block_start >= to) {
> -			if (folio_test_uptodate(folio)) {
> -				set_buffer_uptodate(bh);
> -			}
> -			continue;
> -		}
>  		if (WARN_ON_ONCE(buffer_new(bh)))
>  			clear_buffer_new(bh);
>  		if (!buffer_mapped(bh)) {
> 

I'm unconvinced that this is safe ... but all of this is a distraction
from what we should really be doing, which is converting ext4 to use
iomap instead of buffer heads.

Thread overview: 11+ messages
2026-06-03 13:48 [PATCH] ext4: avoid full buffer walks for large folio partial writes Jia Zhu
2026-06-03 18:11 ` Matthew Wilcox [this message]
2026-06-05  9:02   ` Jia Zhu
2026-06-05 14:24     ` Matthew Wilcox
2026-06-08 11:56       ` Jia Zhu
2026-06-08 12:01 ` [PATCH v2 0/2] ext4: avoid tail walks for cached large-folio writes Jia Zhu
2026-06-08 12:01   ` [PATCH v2 1/2] fs/buffer: avoid tail commit walk for uptodate folios Jia Zhu
2026-06-08 13:06     ` Jan Kara
2026-06-08 12:01   ` [PATCH v2 2/2] ext4: avoid tail write_begin " Jia Zhu
2026-06-08 14:29     ` Jan Kara
2026-06-09  3:54       ` Jia Zhu
