From: Wang Jianchao <jianchao.wan9@gmail.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
lishujin@kuaishou.com,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH V2 7/7] ext4: get discard out of jbd2 commit kthread contex
Date: Fri, 28 May 2021 11:06:52 +0800
Message-ID: <60e8710a-1d4a-e415-a364-f6f1c75c54d6@gmail.com>
In-Reply-To: <2BC51066-DC7C-4DAF-80B4-EEE8BD9FD814@dilger.ca>
On 2021/5/28 4:18 AM, Andreas Dilger wrote:
> On May 26, 2021, at 2:44 AM, Wang Jianchao <jianchao.wan9@gmail.com> wrote:
>>
>> Right now, discard is issued and waited to be completed in jbd2
>> commit kthread context after the logs are committed. When large
>> amount of files are deleted and discard is flooding, jbd2 commit
>> kthread can be blocked for long time. Then all of the metadata
>> operations can be blocked to wait the log space.
>>
>> One case is the page fault path with read mm->mmap_sem held, which
>> wants to update the file time but has to wait for the log space.
>> When other threads in the task wants to do mmap, then write mmap_sem
>> is blocked. Finally all of the following read mmap_sem requirements
>> are blocked, even the ps command which need to read
>> /proc/pid/cmdline. Our monitor service which needs to read /proc/pid/cmdline
>> used to be blocked for 5 mins.
>>
>> This patch frees the blocks back to buddy after commit and then do
>> discard in a async kworker context in fstrim fashion, namely,
>> - mark blocks to be discarded as used if they have not been allocated
>> - do discard
>> - mark them free
>> After this, jbd2 commit kthread won't be blocked any more by discard
>> and we won't get NOSPC even if the discard is slow or throttled.
>
> I definitely agree that sharing the existing fstrim functionality makes
> the most sense here. Some comments inline on the implementation.
>
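The three fstrim-style steps quoted above can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct toy_group, toy_trim_range() and GROUP_BLOCKS are hypothetical names invented for illustration, the "discard" is just a flag flip, and there is no locking. It is not the ext4 buddy code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUP_BLOCKS 16	/* hypothetical toy group size */

/* Hypothetical model of one block group; not an ext4 structure. */
struct toy_group {
	bool in_use[GROUP_BLOCKS];	/* allocator view: true = not allocatable */
	bool discarded[GROUP_BLOCKS];	/* device view: true = discard was issued */
};

/* Trim [start, start + len) and return the number of blocks discarded. */
static size_t toy_trim_range(struct toy_group *g, size_t start, size_t len)
{
	bool claimed[GROUP_BLOCKS] = { false };
	size_t i, end, n = 0;

	assert(g != NULL);
	if (start >= GROUP_BLOCKS)
		return 0;
	if (len > GROUP_BLOCKS - start)
		len = GROUP_BLOCKS - start;
	end = start + len;

	/* Step 1: mark still-free blocks as used so nobody allocates them. */
	for (i = start; i < end; i++) {
		if (!g->in_use[i]) {
			g->in_use[i] = true;
			claimed[i] = true;
		}
	}

	/* Step 2: discard only what we claimed (the slow part). */
	for (i = start; i < end; i++) {
		if (claimed[i]) {
			g->discarded[i] = true;
			n++;
		}
	}

	/* Step 3: hand the claimed blocks back to the allocator. */
	for (i = start; i < end; i++) {
		if (claimed[i])
			g->in_use[i] = false;
	}

	return n;
}
```

Blocks that were reallocated between the free and the trim are simply skipped in step 1, which is why a slow or throttled discard in this scheme delays only the trim and not the allocator.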
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@mit.edu>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
>> ---
>> fs/ext4/ext4.h | 2 +
>> fs/ext4/mballoc.c | 162 +++++++++++++++++++++++++++++++++---------------------
>> fs/ext4/mballoc.h | 3 -
>> 3 files changed, 101 insertions(+), 66 deletions(-)
>>
>> @@ -3024,30 +3039,77 @@ static inline int ext4_issue_discard(struct super_block *sb,
>>  		return sb_issue_discard(sb, discard_block, count, GFP_NOFS, 0);
>>  }
>>
>> -static void ext4_free_data_in_buddy(struct super_block *sb,
>> -				    struct ext4_free_data *entry)
>> +static void ext4_discard_work(struct work_struct *work)
>>  {
>> +	struct ext4_sb_info *sbi = container_of(work,
>> +			struct ext4_sb_info, s_discard_work);
>> +	struct super_block *sb = sbi->s_sb;
>> +	ext4_group_t ngroups = ext4_get_groups_count(sb);
>> +	struct ext4_group_info *grp;
>> +	struct ext4_free_data *fd, *nfd;
>>  	struct ext4_buddy e4b;
>> -	struct ext4_group_info *db;
>> -	int err, count = 0, count2 = 0;
>> +	int i, err;
>> +
>> +	for (i = 0; i < ngroups; i++) {
>> +		grp = ext4_get_group_info(sb, i);
>> +		if (RB_EMPTY_ROOT(&grp->bb_discard_root))
>> +			continue;
>
> For large filesystems there may be millions of block groups, so it
> seems inefficient to scan all of the groups each time the work queue
Yes, it does seem inefficient. When I wrote the patch, I assumed that since
the kworker runs in the background, the scan would not be a big deal.
> is run. Having a list of block group numbers, or bitmap/rbtree/xarray
> of the group numbers that need to be trimmed may be more efficient?
Maybe we can use a bitmap to record the block groups that need to be trimmed.
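To make the bitmap idea concrete, here is a rough userspace sketch, not patch code. It assumes one bit per block group; struct trim_pending, trim_pending_mark(), trim_pending_drain() and the group_log helper are hypothetical names invented for illustration. In the kernel the same shape would presumably be built on the existing bitmap helpers (set_bit(), for_each_set_bit()) with proper locking.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical tracker: one bit per block group that has pending trims. */
struct trim_pending {
	unsigned long *bits;
	unsigned int ngroups;
};

static size_t trim_pending_words(unsigned int ngroups)
{
	return (ngroups + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

static int trim_pending_init(struct trim_pending *tp, unsigned int ngroups)
{
	size_t nwords = trim_pending_words(ngroups);

	tp->bits = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	if (!tp->bits)
		return -1;
	tp->ngroups = ngroups;
	return 0;
}

/* Called where blocks are freed: remember that this group needs a trim. */
static void trim_pending_mark(struct trim_pending *tp, unsigned int group)
{
	assert(group < tp->ngroups);
	tp->bits[group / BITS_PER_WORD] |= 1UL << (group % BITS_PER_WORD);
}

/*
 * Called from the worker: visit only marked groups, clearing their bits
 * first. Returns the number of groups handed to trim_group().
 */
static unsigned int trim_pending_drain(struct trim_pending *tp,
		void (*trim_group)(unsigned int group, void *arg), void *arg)
{
	size_t w, nwords = trim_pending_words(tp->ngroups);
	unsigned int b, visited = 0;

	for (w = 0; w < nwords; w++) {
		unsigned long word = tp->bits[w];

		if (!word)
			continue;	/* skips BITS_PER_WORD clean groups at once */
		tp->bits[w] = 0;
		for (b = 0; b < BITS_PER_WORD; b++) {
			if (word & (1UL << b)) {
				trim_group((unsigned int)(w * BITS_PER_WORD + b), arg);
				visited++;
			}
		}
	}
	return visited;
}

/* Hypothetical handler used only to show which groups were visited. */
struct group_log {
	unsigned int groups[8];
	unsigned int n;
};

static void log_group(unsigned int group, void *arg)
{
	struct group_log *log = arg;

	if (log->n < 8)
		log->groups[log->n++] = group;
}
```

With a million groups and three of them marked, the drain still walks every bitmap word, but that is a test per word rather than a group-info lookup and rbtree check per group, and only the three marked groups reach the handler.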
Best regards
Jianchao
>
> Most of the complexity in the rest of the patch goes away if the trim
> tracking is only done on a whole-group basis (min/max or just a single
> bit per group).
>
> Cheers, Andreas
>
>> - mb_debug(sb, "gonna free %u blocks in group %u (0x%p):",
>
>
>
>