From: Zhang Yi <yi.zhang@huaweicloud.com>
To: "D, Suneeth" <Suneeth.D@amd.com>, linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	willy@infradead.org, tytso@mit.edu, adilger.kernel@dilger.ca,
	jack@suse.cz, yi.zhang@huawei.com, libaokun1@huawei.com,
	yukuai3@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 8/8] ext4: enable large folio for regular file
Date: Thu, 26 Jun 2025 21:26:41 +0800
Message-ID: <94de227e-23c1-4089-b99c-e8fc0beae5da@huaweicloud.com>
In-Reply-To: <f59ef632-0d11-4ae7-bdad-d552fe1f1d78@amd.com>

Hello Suneeth D!

On 2025/6/26 19:29, D, Suneeth wrote:
> 
> Hello Zhang Yi,
> 
> On 5/12/2025 12:03 PM, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Besides fsverity, fscrypt, and the data=journal mode, ext4 now supports
>> large folios for regular files. Enable this feature by default. However,
>> since we cannot change the folio order limitation of mappings on active
>> inodes, setting the data=journal mode via ioctl on an active inode will
>> not take immediate effect in non-delalloc mode.
>>
> 
> We run lmbench3 as part of our weekly CI for kernel performance regression testing between a stable and an rc kernel. We noticed a regression of 8-12% on kernels from 6.16-rc1 through 6.16-rc3. Bisecting between 6.15 and 6.16-rc1 pointed to 7ac67301e82f02b77a5c8e7377a1f414ef108b84 as the first bad commit. The machine configuration and test parameters were as follows:
> 
> Model name:           AMD EPYC 9754 128-Core Processor [Bergamo]
> Thread(s) per core:   2
> Core(s) per socket:   128
> Socket(s):            1
> Total online memory:  258G
> 
> micro-benchmark_variant: "lmbench3-development-1-0-MMAP-50%" which has the following parameters,
> 
> -> nr_thread:     1
> -> memory_size: 50%
> -> mode:     development
> -> test:        MMAP
> 
> The following are the stats after bisection:-
> 
> (the KPI used here is lmbench3.MMAP.read.latency.us)
> 
> v6.15                                                  -  97.3K
> v6.16-rc1                                              - 107.5K
> v6.16-rc3                                              - 107.4K
> 6.15.0-rc4badcommit                                    - 103.5K
> 6.15.0-rc4badcommit_m1 (one commit before bad-commit)  -  94.2K

Thanks for the report. I will try to reproduce this performance regression on
my machine and find out what caused it.
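
As a quick sanity check on the figures quoted above, the relative change can
be computed directly. The sketch below is illustrative only: pct_change() and
print_report_deltas() are hypothetical helper names, and the inputs are the
latency values from the report, in thousands of microseconds.

```c
#include <stdio.h>

/*
 * Relative change of `after` versus `before`, in percent.
 * For a latency KPI a positive result means a regression.
 */
double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print the deltas for the figures quoted in the report. */
void print_report_deltas(void)
{
    /* Release kernels: v6.15 against v6.16-rc1 and v6.16-rc3. */
    printf("v6.15 -> v6.16-rc1:           %+.1f%%\n", pct_change(97.3, 107.5));
    printf("v6.15 -> v6.16-rc3:           %+.1f%%\n", pct_change(97.3, 107.4));
    /* Bisection endpoints: parent of the bad commit against the bad commit. */
    printf("badcommit_m1 -> badcommit:    %+.1f%%\n", pct_change(94.2, 103.5));
}
```

All three comparisons come out at roughly 10%, which is within the 8-12%
range given in the report.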

Thanks,
Yi.

> 
> I also ran the micro-benchmark with tools/testing/perf record; the following is the output of tools/testing/perf diff between the bad commit and the commit just before it.
> 
> # ./perf diff perf.data.old  perf.data
> No kallsyms or vmlinux with build-id da8042fb274c5e3524318e5e3afbeeef5df2055e was found
> # Event 'cycles:P'
> #
> # Baseline  Delta Abs  Shared Object            Symbol
> # ........  .........  .......................  ..............................
> #
>                +4.34%  [kernel.kallsyms]        [k] __lruvec_stat_mod_folio
>                +3.41%  [kernel.kallsyms]        [k] unmap_page_range
>                +3.33%  [kernel.kallsyms]        [k] __mod_memcg_lruvec_state
>                +2.04%  [kernel.kallsyms]        [k] srso_alias_return_thunk
>                +2.02%  [kernel.kallsyms]        [k] srso_alias_safe_ret
>     22.22%     -1.78%  bw_mmap_rd               [.] bread
>                +1.76%  [kernel.kallsyms]        [k] __handle_mm_fault
>                +1.70%  [kernel.kallsyms]        [k] filemap_map_pages
>                +1.58%  [kernel.kallsyms]        [k] set_pte_range
>                +1.58%  [kernel.kallsyms]        [k] next_uptodate_folio
>                +1.33%  [kernel.kallsyms]        [k] do_anonymous_page
>                +1.01%  [kernel.kallsyms]        [k] get_page_from_freelist
>                +0.98%  [kernel.kallsyms]        [k] __mem_cgroup_charge
>                +0.85%  [kernel.kallsyms]        [k] asm_exc_page_fault
>                +0.82%  [kernel.kallsyms]        [k] native_irq_return_iret
>                +0.82%  [kernel.kallsyms]        [k] do_user_addr_fault
>                +0.77%  [kernel.kallsyms]        [k] clear_page_erms
>                +0.75%  [kernel.kallsyms]        [k] handle_mm_fault
>                +0.73%  [kernel.kallsyms]        [k] set_ptes.isra.0
>                +0.70%  [kernel.kallsyms]        [k] lru_add
>                +0.69%  [kernel.kallsyms]        [k] folio_add_file_rmap_ptes
>                +0.68%  [kernel.kallsyms]        [k] folio_remove_rmap_ptes
>     12.45%     -0.65%  line                     [.] mem_benchmark_0
>                +0.64%  [kernel.kallsyms]        [k] __alloc_frozen_pages_noprof
>                +0.63%  [kernel.kallsyms]        [k] vm_normal_page
>                +0.63%  [kernel.kallsyms]        [k] free_pages_and_swap_cache
>                +0.63%  [kernel.kallsyms]        [k] lock_vma_under_rcu
>                +0.60%  [kernel.kallsyms]        [k] __rcu_read_unlock
>                +0.59%  [kernel.kallsyms]        [k] cgroup_rstat_updated
>                +0.57%  [kernel.kallsyms]        [k] get_mem_cgroup_from_mm
>                +0.52%  [kernel.kallsyms]        [k] __mod_lruvec_state
>                +0.51%  [kernel.kallsyms]        [k] exc_page_fault
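
Judging by the symbol names, the largest deltas above are in the page-fault,
unmap, and memcg statistics paths. A minimal sketch of a read-only mmap scan
that exercises those paths is shown below. This is not lmbench3 code:
mmap_scan_ns() is a hypothetical helper, it touches one byte per page rather
than reading the whole mapping, and the timing covers mmap, the page touches,
and munmap together.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Map `path` read-only, read one byte from every page so that each page
 * is faulted in, then unmap. Returns the elapsed time in nanoseconds, or
 * -1 on error. `*sum` receives the sum of the bytes read so the compiler
 * cannot drop the loads.
 */
long long mmap_scan_ns(const char *path, uint64_t *sum)
{
    struct timespec t0, t1;
    struct stat st;
    uint64_t acc = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (size_t off = 0; off < len; off += page)
        acc += p[off];              /* first touch triggers a page fault */
    munmap(p, len);                 /* goes through the unmap path */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    *sum = acc;
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (t1.tv_nsec - t0.tv_nsec);
}
```

Calling this on a large file on an ext4 mount, once per kernel under test,
gives a rough number to compare. The result depends on whether the file is
already in the page cache, and it does not replace the lmbench3 run.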
> 
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>>   fs/ext4/ext4.h      |  1 +
>>   fs/ext4/ext4_jbd2.c |  3 ++-
>>   fs/ext4/ialloc.c    |  3 +++
>>   fs/ext4/inode.c     | 20 ++++++++++++++++++++
>>   4 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 5a20e9cd7184..2fad90c30493 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -2993,6 +2993,7 @@ int ext4_walk_page_buffers(handle_t *handle,
>>                        struct buffer_head *bh));
>>   int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>>                   struct buffer_head *bh);
>> +bool ext4_should_enable_large_folio(struct inode *inode);
>>   #define FALL_BACK_TO_NONDELALLOC 1
>>   #define CONVERT_INLINE_DATA     2
>>
>> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
>> index 135e278c832e..b3e9b7bd7978 100644
>> --- a/fs/ext4/ext4_jbd2.c
>> +++ b/fs/ext4/ext4_jbd2.c
>> @@ -16,7 +16,8 @@ int ext4_inode_journal_mode(struct inode *inode)
>>           ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
>>           test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
>>           (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
>> -        !test_opt(inode->i_sb, DELALLOC))) {
>> +        !test_opt(inode->i_sb, DELALLOC) &&
>> +        !mapping_large_folio_support(inode->i_mapping))) {
>>           /* We do not support data journalling for encrypted data */
>>           if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
>>               return EXT4_INODE_ORDERED_DATA_MODE;  /* ordered */
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index e7ecc7c8a729..4938e78cbadc 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -1336,6 +1336,9 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idmap,
>>           }
>>       }
>>
>> +    if (ext4_should_enable_large_folio(inode))
>> +        mapping_set_large_folios(inode->i_mapping);
>> +
>>       ext4_update_inode_fsync_trans(handle, inode, 1);
>>
>>       err = ext4_mark_inode_dirty(handle, inode);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 29eccdf8315a..7fd3921cfe46 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4774,6 +4774,23 @@ static int check_igot_inode(struct inode *inode, ext4_iget_flags flags,
>>       return -EFSCORRUPTED;
>>   }
>>
>> +bool ext4_should_enable_large_folio(struct inode *inode)
>> +{
>> +    struct super_block *sb = inode->i_sb;
>> +
>> +    if (!S_ISREG(inode->i_mode))
>> +        return false;
>> +    if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
>> +        ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
>> +        return false;
>> +    if (ext4_has_feature_verity(sb))
>> +        return false;
>> +    if (ext4_has_feature_encrypt(sb))
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>>   struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>>                 ext4_iget_flags flags, const char *function,
>>                 unsigned int line)
>> @@ -5096,6 +5113,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>>           ret = -EFSCORRUPTED;
>>           goto bad_inode;
>>       }
>> +    if (ext4_should_enable_large_folio(inode))
>> +        mapping_set_large_folios(inode->i_mapping);
>> +
>>       ret = check_igot_inode(inode, flags, function, line);
>>       /*
>>        * -ESTALE here means there is nothing inherently wrong with the inode,
> 
> ---
> Thanks and Regards,
> Suneeth D
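
The commit message above notes that setting the journal data mode via ioctl
on an active inode does not take immediate effect in non-delalloc mode. A
minimal userspace model of the per-inode clause in the patched
ext4_inode_journal_mode() condition might look like the following. struct
inode_model and its fields are made up for illustration; they stand in for
the EXT4_INODE_JOURNAL_DATA flag, the DELALLOC mount option, and
mapping_large_folio_support(). The other clauses of the real condition
(mount-wide data=journal, EA inodes, encryption) are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three inputs of the per-inode clause. */
struct inode_model {
    bool journal_data_flag;   /* EXT4_INODE_JOURNAL_DATA set on the inode */
    bool delalloc;            /* filesystem mounted with delalloc */
    bool large_folios;        /* mapping already allows large folios */
};

/*
 * Mirrors only this part of the patched condition:
 *   flag && !DELALLOC && !mapping_large_folio_support()
 */
bool per_inode_journal_data_effective(const struct inode_model *m)
{
    return m->journal_data_flag && !m->delalloc && !m->large_folios;
}
```

With large_folios already true on an active inode, setting journal_data_flag
leaves the result false, which matches the behaviour described in the commit
message.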

