* Re: [PATCH v6] ext4: improve trim efficiency
2023-09-01 9:28 [PATCH v6] ext4: improve trim efficiency Fengnan Chang
@ 2023-09-15 9:19 ` Fengnan Chang
2023-10-12 11:57 ` Fengnan Chang
2024-01-08 17:15 ` Jan Kara
2 siblings, 0 replies; 7+ messages in thread
From: Fengnan Chang @ 2023-09-15 9:19 UTC (permalink / raw)
To: tytso, adilger.kernel; +Cc: linux-ext4
ping
Fengnan Chang <changfengnan@bytedance.com> wrote on Fri, Sep 1, 2023 at 17:28:
>
> In commit a015434480dc ("ext4: send parallel discards on commit
> completions"), all discard commands are issued in parallel so that the
> bios can be merged into one request, letting the low-level driver issue
> multiple segments at a time, which is more efficient. However, commit
> 55cdd0af2bc5 ("ext4: get discard out of jbd2 commit kthread contex")
> seems to have broken this behaviour; let's fix it.
>
> In my test:
> 1. create 10 normal files, each 10G in size.
> 2. deallocate the files, punching a 16k hole every 32k.
> 3. trim the whole fs.
> The time of fstrim drops from 6.7s to 1.3s.
>
> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> ---
> fs/ext4/mballoc.c | 95 +++++++++++++++++++++++++----------------------
> 1 file changed, 51 insertions(+), 44 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 1e4c667812a9..9fc69a92c496 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6874,70 +6874,61 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> return err;
> }
>
> -/**
> - * ext4_trim_extent -- function to TRIM one single free extent in the group
> - * @sb: super block for the file system
> - * @start: starting block of the free extent in the alloc. group
> - * @count: number of blocks to TRIM
> - * @e4b: ext4 buddy for the group
> - *
> - * Trim "count" blocks starting at "start" in the "group". To assure that no
> - * one will allocate those blocks, mark it as used in buddy bitmap. This must
> - * be called with under the group lock.
> - */
> -static int ext4_trim_extent(struct super_block *sb,
> - int start, int count, struct ext4_buddy *e4b)
> -__releases(bitlock)
> -__acquires(bitlock)
> -{
> - struct ext4_free_extent ex;
> - ext4_group_t group = e4b->bd_group;
> - int ret = 0;
> -
> - trace_ext4_trim_extent(sb, group, start, count);
> -
> - assert_spin_locked(ext4_group_lock_ptr(sb, group));
> -
> - ex.fe_start = start;
> - ex.fe_group = group;
> - ex.fe_len = count;
> -
> - /*
> - * Mark blocks used, so no one can reuse them while
> - * being trimmed.
> - */
> - mb_mark_used(e4b, &ex);
> - ext4_unlock_group(sb, group);
> - ret = ext4_issue_discard(sb, group, start, count, NULL);
> - ext4_lock_group(sb, group);
> - mb_free_blocks(NULL, e4b, start, ex.fe_len);
> - return ret;
> -}
> -
> static int ext4_try_to_trim_range(struct super_block *sb,
> struct ext4_buddy *e4b, ext4_grpblk_t start,
> ext4_grpblk_t max, ext4_grpblk_t minblocks)
> __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
> __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
> {
> - ext4_grpblk_t next, count, free_count;
> + ext4_grpblk_t next, count, free_count, bak;
> void *bitmap;
> + struct ext4_free_data *entry = NULL, *fd, *nfd;
> + struct list_head discard_data_list;
> + struct bio *discard_bio = NULL;
> + struct blk_plug plug;
> + ext4_group_t group = e4b->bd_group;
> + struct ext4_free_extent ex;
> + bool noalloc = false;
> + int ret = 0;
> +
> + INIT_LIST_HEAD(&discard_data_list);
>
> bitmap = e4b->bd_bitmap;
> start = max(e4b->bd_info->bb_first_free, start);
> count = 0;
> free_count = 0;
>
> + blk_start_plug(&plug);
> while (start <= max) {
> start = mb_find_next_zero_bit(bitmap, max + 1, start);
> if (start > max)
> break;
> + bak = start;
> next = mb_find_next_bit(bitmap, max + 1, start);
> -
> if ((next - start) >= minblocks) {
> - int ret = ext4_trim_extent(sb, start, next - start, e4b);
> + /* when only one segment, there is no need to alloc entry */
> + noalloc = (free_count == 0) && (next >= max);
>
> - if (ret && ret != -EOPNOTSUPP)
> + trace_ext4_trim_extent(sb, group, start, next - start);
> + ex.fe_start = start;
> + ex.fe_group = group;
> + ex.fe_len = next - start;
> + /*
> + * Mark blocks used, so no one can reuse them while
> + * being trimmed.
> + */
> + mb_mark_used(e4b, &ex);
> + ext4_unlock_group(sb, group);
> + ret = ext4_issue_discard(sb, group, start, next - start, &discard_bio);
> + if (!noalloc) {
> + entry = kmem_cache_alloc(ext4_free_data_cachep,
> + GFP_NOFS|__GFP_NOFAIL);
> + entry->efd_start_cluster = start;
> + entry->efd_count = next - start;
> + list_add_tail(&entry->efd_list, &discard_data_list);
> + }
> + ext4_lock_group(sb, group);
> + if (ret < 0)
> break;
> count += next - start;
> }
> @@ -6959,6 +6950,22 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
> break;
> }
>
> + if (discard_bio) {
> + ext4_unlock_group(sb, e4b->bd_group);
> + submit_bio_wait(discard_bio);
> + bio_put(discard_bio);
> + ext4_lock_group(sb, e4b->bd_group);
> + }
> + blk_finish_plug(&plug);
> +
> + if (noalloc && free_count)
> + mb_free_blocks(NULL, e4b, bak, free_count);
> +
> + list_for_each_entry_safe(fd, nfd, &discard_data_list, efd_list) {
> + mb_free_blocks(NULL, e4b, fd->efd_start_cluster, fd->efd_count);
> + kmem_cache_free(ext4_free_data_cachep, fd);
> + }
> +
> return count;
> }
>
> --
> 2.20.1
>
* Re: [PATCH v6] ext4: improve trim efficiency
2023-09-01 9:28 [PATCH v6] ext4: improve trim efficiency Fengnan Chang
2023-09-15 9:19 ` Fengnan Chang
@ 2023-10-12 11:57 ` Fengnan Chang
2024-01-08 17:15 ` Jan Kara
2 siblings, 0 replies; 7+ messages in thread
From: Fengnan Chang @ 2023-10-12 11:57 UTC (permalink / raw)
To: tytso, adilger.kernel; +Cc: linux-ext4
Hi Ted,
any new comments?
* Re: [PATCH v6] ext4: improve trim efficiency
2023-09-01 9:28 [PATCH v6] ext4: improve trim efficiency Fengnan Chang
2023-09-15 9:19 ` Fengnan Chang
2023-10-12 11:57 ` Fengnan Chang
@ 2024-01-08 17:15 ` Jan Kara
2024-01-09 11:28 ` [External] " Fengnan Chang
2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2024-01-08 17:15 UTC (permalink / raw)
To: Fengnan Chang; +Cc: tytso, adilger.kernel, linux-ext4
On Fri 01-09-23 17:28:20, Fengnan Chang wrote:
This seems to have fallen through the cracks... I'm sorry for that.
> static int ext4_try_to_trim_range(struct super_block *sb,
> struct ext4_buddy *e4b, ext4_grpblk_t start,
> ext4_grpblk_t max, ext4_grpblk_t minblocks)
> __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
> __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
> {
> - ext4_grpblk_t next, count, free_count;
> + ext4_grpblk_t next, count, free_count, bak;
> void *bitmap;
> + struct ext4_free_data *entry = NULL, *fd, *nfd;
> + struct list_head discard_data_list;
> + struct bio *discard_bio = NULL;
> + struct blk_plug plug;
> + ext4_group_t group = e4b->bd_group;
> + struct ext4_free_extent ex;
> + bool noalloc = false;
> + int ret = 0;
> +
> + INIT_LIST_HEAD(&discard_data_list);
>
> bitmap = e4b->bd_bitmap;
> start = max(e4b->bd_info->bb_first_free, start);
> count = 0;
> free_count = 0;
>
> + blk_start_plug(&plug);
> while (start <= max) {
> start = mb_find_next_zero_bit(bitmap, max + 1, start);
> if (start > max)
> break;
> + bak = start;
> next = mb_find_next_bit(bitmap, max + 1, start);
> -
> if ((next - start) >= minblocks) {
> - int ret = ext4_trim_extent(sb, start, next - start, e4b);
> + /* when only one segment, there is no need to alloc entry */
> + noalloc = (free_count == 0) && (next >= max);
Is the single extent case really worth the complications to save one
allocation? I don't think it is but maybe I'm missing something. Otherwise
the patch looks good to me!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [External] Re: [PATCH v6] ext4: improve trim efficiency
2024-01-08 17:15 ` Jan Kara
@ 2024-01-09 11:28 ` Fengnan Chang
2024-01-09 12:08 ` Jan Kara
0 siblings, 1 reply; 7+ messages in thread
From: Fengnan Chang @ 2024-01-09 11:28 UTC (permalink / raw)
To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4
Jan Kara <jack@suse.cz> wrote on Tue, Jan 9, 2024 at 01:15:
> > if ((next - start) >= minblocks) {
> > - int ret = ext4_trim_extent(sb, start, next - start, e4b);
> > + /* when only one segment, there is no need to alloc entry */
> > + noalloc = (free_count == 0) && (next >= max);
>
> Is the single extent case really worth the complications to save one
> allocation? I don't think it is but maybe I'm missing something. Otherwise
> the patch looks good to me!
Yeah, it's necessary: if there is only one segment, allocating memory may
cause a performance regression.
Refer to https://lore.kernel.org/linux-ext4/CALWNXx-6y0=ZDBMicv2qng9pKHWcpJbCvUm9TaRBwg81WzWkWQ@mail.gmail.com/
Thanks.
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
* Re: [External] Re: [PATCH v6] ext4: improve trim efficiency
2024-01-09 11:28 ` [External] " Fengnan Chang
@ 2024-01-09 12:08 ` Jan Kara
2024-01-10 1:55 ` Fengnan Chang
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2024-01-09 12:08 UTC (permalink / raw)
To: Fengnan Chang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4
On Tue 09-01-24 19:28:07, Fengnan Chang wrote:
> Jan Kara <jack@suse.cz> wrote on Tue, Jan 9, 2024 at 01:15:
> > > if ((next - start) >= minblocks) {
> > > - int ret = ext4_trim_extent(sb, start, next - start, e4b);
> > > + /* when only one segment, there is no need to alloc entry */
> > > + noalloc = (free_count == 0) && (next >= max);
> >
> > Is the single extent case really worth the complications to save one
> > allocation? I don't think it is but maybe I'm missing something. Otherwise
> > the patch looks good to me!
> yeah, it's necessary, if there is only one segment, alloc memory may cause
> performance regression.
> Refer to this https://lore.kernel.org/linux-ext4/CALWNXx-6y0=ZDBMicv2qng9pKHWcpJbCvUm9TaRBwg81WzWkWQ@mail.gmail.com/
Ah, thanks for the reference! Then what I'd suggest is something like:
struct ext4_free_data first_entry;
/*
* We preallocate the first entry on stack to optimize for the common
* case of trimming single extent in each group. It has measurable
* performance impact.
*/
struct ext4_free_data *entry = &first_entry;
then when we allocate we do:
if (!entry)
entry = kmem_cache_alloc(...)
entry->efd_start_cluster = start;
entry->efd_count = next - start;
list_add_tail(&entry->efd_list, &discard_data_list);
entry = NULL;
and then when freeing we can have:
list_for_each_entry_safe(fd, nfd, &discard_data_list, efd_list) {
mb_free_blocks(NULL, e4b, fd->efd_start_cluster, fd->efd_count);
if (fd != &first_entry)
kmem_cache_free(ext4_free_data_cachep, fd);
}
Then it is more understandable what's going on...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [External] Re: [PATCH v6] ext4: improve trim efficiency
2024-01-09 12:08 ` Jan Kara
@ 2024-01-10 1:55 ` Fengnan Chang
0 siblings, 0 replies; 7+ messages in thread
From: Fengnan Chang @ 2024-01-10 1:55 UTC (permalink / raw)
To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4
Jan Kara <jack@suse.cz> wrote on Tue, Jan 9, 2024 at 20:09:
> > > > if ((next - start) >= minblocks) {
> > > > - int ret = ext4_trim_extent(sb, start, next - start, e4b);
> > > > + /* when only one segment, there is no need to alloc entry */
> > > > + noalloc = (free_count == 0) && (next >= max);
> > >
> > > Is the single extent case really worth the complications to save one
> > > allocation? I don't think it is but maybe I'm missing something. Otherwise
> > > the patch looks good to me!
> > yeah, it's necessary, if there is only one segment, alloc memory may cause
> > performance regression.
> > Refer to this https://lore.kernel.org/linux-ext4/CALWNXx-6y0=ZDBMicv2qng9pKHWcpJbCvUm9TaRBwg81WzWkWQ@mail.gmail.com/
>
> Ah, thanks for the reference! Then what I'd suggest is something like:
>
> struct ext4_free_data first_entry;
> /*
> * We preallocate the first entry on stack to optimize for the common
> * case of trimming single extent in each group. It has measurable
> * performance impact.
> */
> struct ext4_free_data *entry = &first_entry;
>
> then when we allocate we do:
>
> if (!entry)
> entry = kmem_cache_alloc(...)
> entry->efd_start_cluster = start;
> entry->efd_count = next - start;
> list_add_tail(&entry->efd_list, &discard_data_list);
> entry = NULL;
>
> and then when freeing we can have:
>
> list_for_each_entry_safe(fd, nfd, &discard_data_list, efd_list) {
> mb_free_blocks(NULL, e4b, fd->efd_start_cluster, fd->efd_count);
> if (fd != &first_entry)
> kmem_cache_free(ext4_free_data_cachep, fd);
> }
>
> Then it is more understandable what's going on...
Looks better, I'll modify it in the next version.
Thanks.