From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Matthew Wilcox <willy@infradead.org>,
Baokun Li <libaokun@huaweicloud.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, jack@suse.cz,
linux-kernel@vger.kernel.org, kernel@pankajraghav.com,
mcgrof@kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, yangerkun@huawei.com,
chengzhihao1@huawei.com, libaokun1@huawei.com
Subject: Re: [PATCH 22/25] fs/buffer: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS
Date: Fri, 31 Oct 2025 09:47:43 +0800
Message-ID: <1901ccda-bed8-4f83-a959-7a6acccf2754@huaweicloud.com>
In-Reply-To: <aQPX1-XWQjKaMTZB@casper.infradead.org>
Hi!
On 10/31/2025 5:25 AM, Matthew Wilcox wrote:
> On Sat, Oct 25, 2025 at 02:32:45PM +0800, Baokun Li wrote:
>> On 2025-10-25 12:45, Matthew Wilcox wrote:
>>> No, absolutely not. We're not having open-coded GFP_NOFAIL semantics.
>>> The right way forward is for ext4 to use iomap, not for buffer heads
>>> to support large block sizes.
>>
>> ext4 only calls getblk_unmovable or __getblk when reading critical
>> metadata. Both of these functions set __GFP_NOFAIL to ensure that
>> metadata reads do not fail due to memory pressure.
>>
>> Both functions eventually call grow_dev_folio(), which is why we
>> handle the __GFP_NOFAIL logic there. xfs_buf_alloc_backing_mem()
>> has similar logic, but XFS manages its own metadata, allowing it
>> to use vmalloc for memory allocation.
>
> In today's ext4 call, we discussed various options:
>
> 1. Change folios to be potentially fragmented. This change would be
> ridiculously large and nobody thinks this is a good idea. Included here
> for completeness.
>
> 2. Separate the buffer cache from the page cache again. They were
> unified about 25 years ago, and this also feels like a very big job.
>
> 3. Duplicate the buffer cache into ext4/jbd2, remove the functionality
> not needed and make _this_ version of the buffer cache allocate
> its own memory instead of aliasing into the page cache. More feasible
> than 1 or 2; still quite a big job.
>
> 4. Pick up Catherine's work and make ext4/jbd2 use it. Seems to be
> about an equivalent amount of work to option 3.
>

Regarding these two proposals (options 3 and 4), would you consider them
for the long term? Beyond the case under discussion, they offer
additional benefits, such as making ext4's metadata management more
flexible and secure, and enabling more robust error handling.

Thanks,
Yi.

> 5. Make __GFP_NOFAIL work for allocations up to 64KiB (we decided this was
> probably the practical limit of sector sizes that people actually want).
> In terms of programming, it's a one-line change. But we need to sell
> this change to the MM people. I think it's doable because if we have
> a filesystem with 64KiB sectors, there will be many clean folios in the
> pagecache which are 64KiB or larger.
>
> So, we liked option 5 best.
>
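
For reference, the arithmetic behind option 5 can be sketched as follows.
This is only a userspace illustration, not kernel code: the sketch_* names
are hypothetical, a 4KiB page size is assumed in the usage note, and the
threshold of 3 mirrors PAGE_ALLOC_COSTLY_ORDER as I understand current
kernels to define it. The exact condition behind the WARN_ON should be
checked against mm/page_alloc.c rather than taken from here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: orders above this value are treated as "costly" by the
 * page allocator. Mirrors PAGE_ALLOC_COSTLY_ORDER; not taken from the
 * thread itself.
 */
#define SKETCH_COSTLY_ORDER 3u

/* Smallest order such that (page_size << order) >= size. */
static unsigned int sketch_order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(size > 0);
	/* page_size must be a non-zero power of two */
	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

	while ((page_size << order) < size)
		order++;
	return order;
}

/* Non-zero if the order is above the assumed costly threshold. */
static int sketch_is_costly(unsigned int order)
{
	return order > SKETCH_COSTLY_ORDER;
}
```

With 4KiB pages, sketch_order_for_size(65536, 4096) yields order 4, one
above the assumed threshold, while a 32KiB block yields order 3 and stays
within it. On a 64KiB-page system the same block is order 0, so the
question only arises when the block size exceeds the page size.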