From: Matthew Wilcox <willy@infradead.org>
To: Baokun Li <libaokun@huaweicloud.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, jack@suse.cz,
linux-kernel@vger.kernel.org, kernel@pankajraghav.com,
mcgrof@kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, yi.zhang@huawei.com, yangerkun@huawei.com,
chengzhihao1@huawei.com, libaokun1@huawei.com
Subject: Re: [PATCH 22/25] fs/buffer: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS
Date: Thu, 30 Oct 2025 21:25:43 +0000
Message-ID: <aQPX1-XWQjKaMTZB@casper.infradead.org>
In-Reply-To: <adccaa99-ffbc-4fbf-9210-47932724c184@huaweicloud.com>

On Sat, Oct 25, 2025 at 02:32:45PM +0800, Baokun Li wrote:
> On 2025-10-25 12:45, Matthew Wilcox wrote:
> > No, absolutely not. We're not having open-coded GFP_NOFAIL semantics.
> > The right way forward is for ext4 to use iomap, not for buffer heads
> > to support large block sizes.
>
> ext4 only calls getblk_unmovable or __getblk when reading critical
> metadata. Both of these functions set __GFP_NOFAIL to ensure that
> metadata reads do not fail due to memory pressure.
>
> Both functions eventually call grow_dev_folio(), which is why we
> handle the __GFP_NOFAIL logic there. xfs_buf_alloc_backing_mem()
> has similar logic, but XFS manages its own metadata, allowing it
> to use vmalloc for memory allocation.

In today's ext4 call, we discussed various options:

1. Change folios to be potentially fragmented. This change would be
   ridiculously large and nobody thinks this is a good idea. Included here
   for completeness.

2. Separate the buffer cache from the page cache again. They were
   unified about 25 years ago, and this also feels like a very big job.

3. Duplicate the buffer cache into ext4/jbd2, remove the functionality
   not needed and make _this_ version of the buffer cache allocate
   its own memory instead of aliasing into the page cache. More feasible
   than 1 or 2; still quite a big job.

4. Pick up Catherine's work and make ext4/jbd2 use it. Seems to be
   about an equivalent amount of work to option 3.

5. Make __GFP_NOFAIL work for allocations up to 64KiB (we decided this was
   probably the practical limit of sector sizes that people actually want).
   In terms of programming, it's a one-line change. But we need to sell
   this change to the MM people. I think it's doable because if we have
   a filesystem with 64KiB sectors, there will be many clean folios in the
   pagecache which are 64KiB or larger.

So, we liked option 5 best.
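
To put rough numbers on option 5, here is a small userspace sketch of the
order arithmetic. It is not kernel code and not the proposed one-line change.
COSTLY_ORDER mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER (3); the helper
names alloc_order() and is_costly() are made up for illustration. The sketch
assumes the warning in the subject line is the one emitted for __GFP_NOFAIL
requests above that order.

```c
#include <stddef.h>

/* Mirrors PAGE_ALLOC_COSTLY_ORDER from include/linux/mmzone.h. */
#define COSTLY_ORDER 3

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= size, i.e. the buddy-allocator order
 * needed to back one block of `size` bytes.
 */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/*
 * Hypothetical helper: nonzero if the request is above the costly
 * threshold, which is where a __GFP_NOFAIL request is assumed to warn.
 */
static int is_costly(size_t size, size_t page_size)
{
	return alloc_order(size, page_size) > COSTLY_ORDER;
}
```

With 4KiB pages, alloc_order(64 * 1024, 4096) is 4, which is above the
threshold, so a 64KiB block falls in the range option 5 would have to cover.
With 16KiB pages the same block is order 2 and stays below it.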