From: Jan Kara <jack@suse.cz>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Ritesh Harjani <riteshh@linux.ibm.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Andreas Dilger <adilger.kernel@dilger.ca>,
Jan Kara <jack@suse.cz>, rookxu <brookxu.cn@gmail.com>,
Ritesh Harjani <ritesh.list@gmail.com>
Subject: Re: [RFC v3 8/8] ext4: Remove the logic to trim inode PAs
Date: Thu, 29 Sep 2022 14:53:11 +0200
Message-ID: <20220929125311.bmkta7gp4a2hmcny@quack3>
In-Reply-To: <a26fdd12f4f60cf506a42b6a95e8014e5f380b05.1664269665.git.ojaswin@linux.ibm.com>
On Tue 27-09-22 14:46:48, Ojaswin Mujoo wrote:
> Earlier, inode PAs were stored in a linked list. This caused a need to
> periodically trim the list down in order to avoid growing it to a very
> large size, as this would severely affect performance during list
> iteration.
>
> Recent patches changed this list to an rbtree, and since the tree scales
> up much better, we no longer need to have the trim functionality, hence
> remove it.
>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
I'm kind of wondering: with the rbtree, having many more inode PAs won't
cause performance issues anymore, but we probably still don't want to let
them grow completely out of control? E.g. I can imagine that if we had
1 billion inode PAs attached to an inode, things would get wonky both in
terms of memory consumption and in terms of CPU time spent in the cases
where we still iterate over all of the PAs... Is there anything that keeps
the number of inode PAs reasonably bounded?
Honza
> ---
> Documentation/admin-guide/ext4.rst | 3 ---
> fs/ext4/ext4.h | 1 -
> fs/ext4/mballoc.c | 20 --------------------
> fs/ext4/mballoc.h | 5 -----
> fs/ext4/sysfs.c | 2 --
> 5 files changed, 31 deletions(-)
>
> diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
> index 4c559e08d11e..5740d85439ff 100644
> --- a/Documentation/admin-guide/ext4.rst
> +++ b/Documentation/admin-guide/ext4.rst
> @@ -489,9 +489,6 @@ Files in /sys/fs/ext4/<devname>:
> multiple of this tuning parameter if the stripe size is not set in the
> ext4 superblock
>
> - mb_max_inode_prealloc
> - The maximum length of per-inode ext4_prealloc_space list.
> -
> mb_max_to_scan
> The maximum number of extents the multiblock allocator will search to
> find the best extent.
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index d54b972f1f0f..bca4b41cc192 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1612,7 +1612,6 @@ struct ext4_sb_info {
> unsigned int s_mb_stats;
> unsigned int s_mb_order2_reqs;
> unsigned int s_mb_group_prealloc;
> - unsigned int s_mb_max_inode_prealloc;
> unsigned int s_max_dir_size_kb;
> /* where last allocation was done - for stream allocation */
> unsigned long s_mb_last_group;
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index cd19b9e84767..57e1ec88477a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3420,7 +3420,6 @@ int ext4_mb_init(struct super_block *sb)
> sbi->s_mb_stats = MB_DEFAULT_STATS;
> sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
> sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
> - sbi->s_mb_max_inode_prealloc = MB_DEFAULT_MAX_INODE_PREALLOC;
> /*
> * The default group preallocation is 512, which for 4k block
> * sizes translates to 2 megabytes. However for bigalloc file
> @@ -5546,29 +5545,11 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
> return ;
> }
>
> -/*
> - * if per-inode prealloc list is too long, trim some PA
> - */
> -static void ext4_mb_trim_inode_pa(struct inode *inode)
> -{
> - struct ext4_inode_info *ei = EXT4_I(inode);
> - struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> - int count, delta;
> -
> - count = atomic_read(&ei->i_prealloc_active);
> - delta = (sbi->s_mb_max_inode_prealloc >> 2) + 1;
> - if (count > sbi->s_mb_max_inode_prealloc + delta) {
> - count -= sbi->s_mb_max_inode_prealloc;
> - ext4_discard_preallocations(inode, count);
> - }
> -}
> -
> /*
> * release all resource we used in allocation
> */
> static int ext4_mb_release_context(struct ext4_allocation_context *ac)
> {
> - struct inode *inode = ac->ac_inode;
> struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> struct ext4_prealloc_space *pa = ac->ac_pa;
> if (pa) {
> @@ -5604,7 +5585,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
> if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
> mutex_unlock(&ac->ac_lg->lg_mutex);
> ext4_mb_collect_stats(ac);
> - ext4_mb_trim_inode_pa(inode);
> return 0;
> }
>
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index f8e8ee493867..6d85ee8674a6 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -73,11 +73,6 @@
> */
> #define MB_DEFAULT_GROUP_PREALLOC 512
>
> -/*
> - * maximum length of inode prealloc list
> - */
> -#define MB_DEFAULT_MAX_INODE_PREALLOC 512
> -
> /*
> * Number of groups to search linearly before performing group scanning
> * optimization.
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index d233c24ea342..f0d42cf44c71 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -214,7 +214,6 @@ EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
> EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
> EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
> EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
> -EXT4_RW_ATTR_SBI_UI(mb_max_inode_prealloc, s_mb_max_inode_prealloc);
> EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
> EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
> EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
> @@ -264,7 +263,6 @@ static struct attribute *ext4_attrs[] = {
> ATTR_LIST(mb_order2_req),
> ATTR_LIST(mb_stream_req),
> ATTR_LIST(mb_group_prealloc),
> - ATTR_LIST(mb_max_inode_prealloc),
> ATTR_LIST(mb_max_linear_groups),
> ATTR_LIST(max_writeback_mb_bump),
> ATTR_LIST(extent_max_zeroout_kb),
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR