From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, willy@infradead.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, jack@suse.cz, yi.zhang@huawei.com,
	libaokun1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 0/8] ext4: enable large folio for regular files
Date: Mon, 19 May 2025 09:19:10 +0800
Message-ID: <33b938e9-bd81-4017-a7e0-e5ffb216ac70@huaweicloud.com>
In-Reply-To: <aCcmGyse9prx-D7S@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

On 2025/5/16 19:48, Ojaswin Mujoo wrote:
> On Mon, May 12, 2025 at 02:33:11PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Changes since v1:
>>  - Rebase codes on 6.15-rc6.
>>  - Drop the modifications to block_read_full_folio(), which are
>>    already covered by commit b72e591f74de ("fs/buffer: remove batching
>>    from async read").
>>  - Fine-tune patch 6 without modifying its logic.
>>
>> v1: https://lore.kernel.org/linux-ext4/20241125114419.903270-1-yi.zhang@huaweicloud.com/
>>
>> Original Description:
>>
>> Since almost all of the code paths in ext4 have already been converted
>> to use folios, not much additional work is required to support large
>> folios. This series completes the remaining work and enables large
>> folios for regular files on ext4, with the exception of fsverity,
>> fscrypt, and data=journal mode.
>>
>> Unlike my other series [1], which enables large folios by converting
>> the buffered I/O path from the classic buffer_head to iomap, this
>> solution is based on the original buffer_head code. It primarily
>> modifies the block offset and length calculations within a single
>> folio in the buffer write, buffer read, zero range, writeback, and
>> move extents paths to support large folios, and does no further code
>> refactoring or optimization.
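>>
>> To give a feel for the kind of change involved, here is a minimal,
>> illustrative sketch (these helpers are hypothetical and not part of
>> the series; only the pattern matters): per-folio block offsets and
>> counts that used to be derived from PAGE_SIZE must now be derived
>> from the folio itself, which may span multiple pages.
>>
>> /* Number of fs blocks covered by this folio; folio_size() may be a
>>  * multiple of PAGE_SIZE once large folios are enabled, so block
>>  * counts can no longer be hard-coded from PAGE_SIZE. */
>> static unsigned int blocks_in_folio(struct inode *inode,
>>                                     struct folio *folio)
>> {
>>         return folio_size(folio) >> inode->i_blkbits;
>> }
>>
>> /* First fs block covered by this folio, e.g. for get_block lookups. */
>> static sector_t folio_first_block(struct inode *inode,
>>                                   struct folio *folio)
>> {
>>         return (sector_t)(folio_pos(folio) >> inode->i_blkbits);
>> }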
>>
>> This series has passed kvm-xfstests in auto mode several times, and
>> everything looks fine. Any comments are welcome.
>>
>> About performance:
>>
>> I used the same test script as in my iomap series (with the MOUNT_OPT
>> mount options parameter dropped) [2] and ran fio tests on the same
>> machine: an Intel Xeon Gold 6240 CPU with 400GB of system RAM, a 200GB
>> ramdisk, and a 4TB NVMe SSD. The results are compared against both the
>> base and the iomap + large folio changes.
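>>
>> For reference, a minimal sketch of the kind of fio job this
>> corresponds to (parameters here are illustrative; the actual script
>> and its exact options are in [2]):
>>
>> ; illustrative fio job, not the exact configuration from [2]
>> ; bs was varied across 4K/64K/1M for the tables below
>> [bufread]
>> ioengine=psync
>> rw=read
>> bs=64k
>> size=4g
>> ; hypothetical mount point of the filesystem under test
>> directory=/mnt/ext4
>> numjobs=1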
>>
>>  == buffer read ==
>>
>>                 base          iomap+large folio base+large folio
>>  type     bs    IOPS  BW(M/s) IOPS  BW(M/s)     IOPS   BW(M/s)
>>  ----------------------------------------------------------------
>>  hole     4K  | 576k  2253  | 762k  2975(+32%) | 747k  2918(+29%)
>>  hole     64K | 48.7k 3043  | 77.8k 4860(+60%) | 76.3k 4767(+57%)
>>  hole     1M  | 2960  2960  | 4942  4942(+67%) | 4737  4738(+60%)
>>  ramdisk  4K  | 443k  1732  | 530k  2069(+19%) | 494k  1930(+11%)
>>  ramdisk  64K | 34.5k 2156  | 45.6k 2850(+32%) | 41.3k 2584(+20%)
>>  ramdisk  1M  | 2093  2093  | 2841  2841(+36%) | 2585  2586(+24%)
>>  nvme     4K  | 339k  1323  | 364k  1425(+8%)  | 344k  1341(+1%)
>>  nvme     64K | 23.6k 1471  | 25.2k 1574(+7%)  | 25.4k 1586(+8%)
>>  nvme     1M  | 2012  2012  | 2153  2153(+7%)  | 2122  2122(+5%)
>>
>>
>>  == buffer write ==
>>
>>  O: Overwrite; S: Sync; W: Writeback
>>
>>                      base         iomap+large folio    base+large folio
>>  type    O S W bs    IOPS  BW(M/s) IOPS  BW(M/s)       IOPS  BW(M/s)
>>  ----------------------------------------------------------------------
>>  cache   N N N 4K  | 417k  1631 | 440k  1719 (+5%)   | 423k  1655 (+2%)
>>  cache   N N N 64K | 33.4k 2088 | 81.5k 5092 (+144%) | 59.1k 3690 (+77%)
>>  cache   N N N 1M  | 2143  2143 | 5716  5716 (+167%) | 3901  3901 (+82%)
>>  cache   Y N N 4K  | 449k  1755 | 469k  1834 (+5%)   | 452k  1767 (+1%)
>>  cache   Y N N 64K | 36.6k 2290 | 82.3k 5142 (+125%) | 67.2k 4200 (+83%)
>>  cache   Y N N 1M  | 2352  2352 | 5577  5577 (+137%) | 4275  4276 (+82%)
>>  ramdisk N N Y 4K  | 365k  1424 | 354k  1384 (-3%)   | 372k  1449 (+2%)
>>  ramdisk N N Y 64K | 31.2k 1950 | 74.2k 4640 (+138%) | 56.4k 3528 (+81%)
>>  ramdisk N N Y 1M  | 1968  1968 | 5201  5201 (+164%) | 3814  3814 (+94%)
>>  ramdisk N Y N 4K  | 9984  39   | 12.9k 51   (+29%)  | 9871  39   (-1%)
>>  ramdisk N Y N 64K | 5936  371  | 8960  560  (+51%)  | 6320  395  (+6%)
>>  ramdisk N Y N 1M  | 1050  1050 | 1835  1835 (+75%)  | 1656  1657 (+58%)
>>  ramdisk Y N Y 4K  | 411k  1609 | 443k  1731 (+8%)   | 441k  1723 (+7%)
>>  ramdisk Y N Y 64K | 34.1k 2134 | 77.5k 4844 (+127%) | 66.4k 4151 (+95%)
>>  ramdisk Y N Y 1M  | 2248  2248 | 5372  5372 (+139%) | 4209  4210 (+87%)
>>  ramdisk Y Y N 4K  | 182k  711  | 186k  730  (+3%)   | 182k  711  (0%)
>>  ramdisk Y Y N 64K | 18.7k 1170 | 34.7k 2171 (+86%)  | 31.5k 1969 (+68%)
>>  ramdisk Y Y N 1M  | 1229  1229 | 2269  2269 (+85%)  | 1943  1944 (+58%)
>>  nvme    N N Y 4K  | 373k  1458 | 387k  1512 (+4%)   | 399k  1559 (+7%)
>>  nvme    N N Y 64K | 29.2k 1827 | 70.9k 4431 (+143%) | 54.3k 3390 (+86%)
>>  nvme    N N Y 1M  | 1835  1835 | 4919  4919 (+168%) | 3658  3658 (+99%)
>>  nvme    N Y N 4K  | 11.7k 46   | 11.7k 46   (0%)    | 11.5k 45   (-1%)
>>  nvme    N Y N 64K | 6453  403  | 8661  541  (+34%)  | 7520  470  (+17%)
>>  nvme    N Y N 1M  | 649   649  | 1351  1351 (+108%) | 885   886  (+37%)
>>  nvme    Y N Y 4K  | 372k  1456 | 433k  1693 (+16%)  | 419k  1637 (+12%)
>>  nvme    Y N Y 64K | 33.0k 2064 | 74.7k 4669 (+126%) | 64.1k 4010 (+94%)
>>  nvme    Y N Y 1M  | 2131  2131 | 5273  5273 (+147%) | 4259  4260 (+100%)
>>  nvme    Y Y N 4K  | 56.7k 222  | 56.4k 220  (-1%)   | 59.4k 232  (+5%)
>>  nvme    Y Y N 64K | 13.4k 840  | 19.4k 1214 (+45%)  | 18.5k 1156 (+38%)
>>  nvme    Y Y N 1M  | 714   714  | 1504  1504 (+111%) | 1319  1320 (+85%)
>>
>> [1] https://lore.kernel.org/linux-ext4/20241022111059.2566137-1-yi.zhang@huaweicloud.com/
>> [2] https://lore.kernel.org/linux-ext4/3c01efe6-007a-4422-ad79-0bad3af281b1@huaweicloud.com/
>>
>> Thanks,
>> Yi.
>>
>> Zhang Yi (8):
>>   ext4: make ext4_mpage_readpages() support large folios
>>   ext4: make regular file's buffered write path support large folios
>>   ext4: make __ext4_block_zero_page_range() support large folio
>>   ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large
>>     folio
>>   ext4: correct the journal credits calculations of allocating blocks
>>   ext4: make the writeback path support large folios
>>   ext4: make online defragmentation support large folios
>>   ext4: enable large folio for regular file
>>
>>  fs/ext4/ext4.h        |  1 +
>>  fs/ext4/ext4_jbd2.c   |  3 +-
>>  fs/ext4/ext4_jbd2.h   |  4 +--
>>  fs/ext4/extents.c     |  5 +--
>>  fs/ext4/ialloc.c      |  3 ++
>>  fs/ext4/inode.c       | 72 ++++++++++++++++++++++++++++++-------------
>>  fs/ext4/move_extent.c | 11 +++----
>>  fs/ext4/readpage.c    | 28 ++++++++++-------
>>  fs/jbd2/journal.c     |  7 +++--
>>  include/linux/jbd2.h  |  2 +-
>>  10 files changed, 88 insertions(+), 48 deletions(-)
>>
>> -- 
>> 2.46.1
> 
> Hi Zhang,
> 
> I'm currently testing the patches with a 4k block size and 64k page
> size on Power and noticed that ext4/046 is hitting a BUG at:
> 
> [  188.351668][ T1320] NIP [c0000000006f15a4] block_read_full_folio+0x444/0x450
> [  188.351782][ T1320] LR [c0000000006f15a0] block_read_full_folio+0x440/0x450
> [  188.351868][ T1320] --- interrupt: 700
> [  188.351919][ T1320] [c0000000058176e0] [c0000000007d7564] ext4_mpage_readpages+0x204/0x910
> [  188.352027][ T1320] [c0000000058177e0] [c0000000007a55d4] ext4_readahead+0x44/0x60
> [  188.352119][ T1320] [c000000005817800] [c00000000052bd80] read_pages+0xa0/0x3d0
> [  188.352216][ T1320] [c0000000058178a0] [c00000000052cb84] page_cache_ra_order+0x2c4/0x560
> [  188.352312][ T1320] [c000000005817990] [c000000000514614] filemap_readahead.isra.0+0x74/0xe0
> [  188.352427][ T1320] [c000000005817a00] [c000000000519fe8] filemap_get_pages+0x548/0x9d0
> [  188.352529][ T1320] [c000000005817af0] [c00000000051a59c] filemap_read+0x12c/0x520
> [  188.352624][ T1320] [c000000005817cc0] [c000000000793ae8] ext4_file_read_iter+0x78/0x320
> [  188.352724][ T1320] [c000000005817d10] [c000000000673e54] vfs_read+0x314/0x3d0
> [  188.352813][ T1320] [c000000005817dc0] [c000000000674ad8] ksys_read+0x88/0x150
> [  188.352905][ T1320] [c000000005817e10] [c00000000002fff4] system_call_exception+0x114/0x300
> [  188.353019][ T1320] [c000000005817e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> 
> which is:
> 
> int block_read_full_folio(struct folio *folio, get_block_t *get_block)
> {
> 	...
> 	/* This is needed for ext4. */
> 	if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
> 		limit = inode->i_sb->s_maxbytes;
> 
> 	VM_BUG_ON_FOLIO(folio_test_large(folio), folio);    <-------------
> 
> 	head = folio_create_buffers(folio, inode, 0);
> 	blocksize = head->b_size;
> 
> It seems the removal of this line was mistakenly left out. Without this
> line I'm not hitting the BUG; however, it's strange that none of the
> x86 testing caught this. I can only replicate it with a 4k block size
> and 64k page size on the PowerPC architecture. I'll spend some time
> understanding why it is not getting hit on x86 with 1k bs (maybe
> ext4_mpage_readpages() is not falling back to block_read_full_folio()
> that easily).
> 
> I'll continue testing with the line removed.

Hi Ojaswin,

Thanks for testing again. I checked, and this line has already been
removed by commit e59e97d42b05 ("fs/buffer fs/mpage: remove large
folio restriction").

Thanks,
Yi.


