* [PATCH 0/2] btrfs: defrag: better lone extent handling
@ 2024-02-05 23:45 Qu Wenruo
2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-02-05 23:45 UTC (permalink / raw)
To: linux-btrfs
When a lone file extent (a file extent without any adjacent non-hole
file extent) is involved, it would haven no chance to be touched by
defrag.
This would mean that, if we have some lone extents with very low
utilization ratio, or defragging it can free up a lot of space, defrag
would not touch them no matter what.
This is not ideal for some situations, e.g.:
# mkfs.btrfs -f $dev
# mount $dev $mnt
# xfs_io -f -c "pwrite 0 128M" $mnt/foobar
# sync
# truncate -s 4k $mnt/foobar
# btrfs filesystem defrag $mnt/foobar
# sync
In above case, if defrag touches the 4k extent, it would free up the
whole 128M extent, which should be a good win.
This patchset would address the problem by introducing a special
entrance for lone file extents.
Lone file extents meeting either ratio or wasted bytes threshold would
be considered as a defrag target, allowing end uesrs to address above
situation.
This change requires progs support (or direct ioctl() calling), by
default they would be disabled.
And my personal recommendation for the ratio would be (4069 / 65536)
(aka, 1/16 of the on-disk file extent size), and 16MiB (if freeing up
the extent can free more than 16MiB, excluding the lone extent size).
Qu Wenruo (2):
btrfs: defrag: add under utilized extent to defrag target list
btrfs: defrag: allow fine-tuning on lone extent defrag behavior
fs/btrfs/defrag.c | 68 +++++++++++++++++++++++++++++++++-----
fs/btrfs/ioctl.c | 9 +++++
include/uapi/linux/btrfs.h | 28 +++++++++++++---
3 files changed, 92 insertions(+), 13 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list 2024-02-05 23:45 [PATCH 0/2] btrfs: defrag: better lone extent handling Qu Wenruo @ 2024-02-05 23:45 ` Qu Wenruo 2024-02-06 16:23 ` Filipe Manana 2024-02-05 23:45 ` [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior Qu Wenruo 2024-02-06 20:27 ` [PATCH 0/2] btrfs: defrag: better lone extent handling Christoph Anton Mitterer 2 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2024-02-05 23:45 UTC (permalink / raw) To: linux-btrfs; +Cc: Christoph Anton Mitterer [BUG] The following script can lead to a very under utilized extent and we have no way to use defrag to properly reclaim its wasted space: # mkfs.btrfs -f $dev # mount $dev $mnt # xfs_io -f -c "pwrite 0 128M" $mnt/foobar # sync # truncate -s 4k $mnt/foobar # btrfs filesystem defrag $mnt/foobar # sync After the above operations, the file "foobar" is still utilizing the whole 128M: item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160 generation 7 transid 8 size 4096 nbytes 4096 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 32770 flags 0x0(none) item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14 index 2 namelen 4 name: file item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 generation 7 type 1 (regular) extent data disk byte 298844160 nr 134217728 <<< extent data offset 0 nr 4096 ram 134217728 extent compression 0 (none) This is the common btrfs bookend behavior, that 128M extent would only be freed if the last referencer of the extent is freed. The problem is, normally we would go defrag to free that 128M extent, but defrag would not touch the extent at all. [CAUSE] The file extent has no adjacent extent at all, thus all existing defrag code consider it a perfectly good file extent, even if it's only utilizing a very tiny amount of space. [FIX] For a file extent without any adjacent file extent, we should still consider to defrag such under utilized extent, base on the following conditions: - utilization ratio If the extent is utilizing less than 1/16 of the on-disk extent size, then it would be a defrag target. - wasted space If we defrag the extent and can free at least 16MiB, then it would be a defrag target. Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/defrag.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index 8fc8118c3225..85c6e45d0cd4 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -950,6 +950,38 @@ struct defrag_target_range { u64 len; }; +/* + * Special entry for extents that do not have any adjacent extents. + * + * This is for cases like the only and truncated extent of a file. + * Normally they won't be defraged at all (as they won't be merged with + * any adjacent ones), but we may still want to defrag them, to free up + * some space if possible. + */ +static bool should_defrag_under_utilized(struct extent_map *em) +{ + /* + * Ratio based check. + * + * If the current extent is only utilizing 1/16 of its on-disk size, + * it's definitely under-utilized, and defragging it may free up + * the whole extent. + */ + if (em->len < em->orig_block_len / 16) + return true; + + /* + * Wasted space based check. + * + * If we can free up at least 16MiB, then it may be a good idea + * to defrag. + */ + if (em->len < em->orig_block_len && + em->orig_block_len - em->len > SZ_16M) + return true; + return false; +} + /* * Collect all valid target extents. * @@ -1070,6 +1102,16 @@ static int defrag_collect_targets(struct btrfs_inode *inode, if (!next_mergeable) { struct defrag_target_range *last; + /* + * This is a single extent without any chance to merge + * with any adjacent extent. + * + * But if we may free up some space, it is still worth + * defragging. + */ + if (should_defrag_under_utilized(em)) + goto add; + /* Empty target list, no way to merge with last entry */ if (list_empty(target_list)) goto next; -- 2.43.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list 2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo @ 2024-02-06 16:23 ` Filipe Manana 2024-02-06 20:41 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Filipe Manana @ 2024-02-06 16:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, Christoph Anton Mitterer On Mon, Feb 5, 2024 at 11:46 PM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > The following script can lead to a very under utilized extent and we > have no way to use defrag to properly reclaim its wasted space: > > # mkfs.btrfs -f $dev > # mount $dev $mnt > # xfs_io -f -c "pwrite 0 128M" $mnt/foobar > # sync > # truncate -s 4k $mnt/foobar > # btrfs filesystem defrag $mnt/foobar > # sync > > After the above operations, the file "foobar" is still utilizing the > whole 128M: > > item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160 > generation 7 transid 8 size 4096 nbytes 4096 > block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 > sequence 32770 flags 0x0(none) > item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14 > index 2 namelen 4 name: file > item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 > generation 7 type 1 (regular) > extent data disk byte 298844160 nr 134217728 <<< > extent data offset 0 nr 4096 ram 134217728 > extent compression 0 (none) > > This is the common btrfs bookend behavior, that 128M extent would only > be freed if the last referencer of the extent is freed. > > The problem is, normally we would go defrag to free that 128M extent, > but defrag would not touch the extent at all. > > [CAUSE] > The file extent has no adjacent extent at all, thus all existing defrag > code consider it a perfectly good file extent, even if it's only > utilizing a very tiny amount of space. > > [FIX] > For a file extent without any adjacent file extent, we should still > consider to defrag such under utilized extent, base on the following > conditions: > > - utilization ratio > If the extent is utilizing less than 1/16 of the on-disk extent size, > then it would be a defrag target. > > - wasted space > If we defrag the extent and can free at least 16MiB, then it would be > a defrag target. > > Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> > Signed-off-by: Qu Wenruo <wqu@suse.com> As I said in my previous review, please include the lore link to the report, it's always useful. > --- > fs/btrfs/defrag.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > index 8fc8118c3225..85c6e45d0cd4 100644 > --- a/fs/btrfs/defrag.c > +++ b/fs/btrfs/defrag.c > @@ -950,6 +950,38 @@ struct defrag_target_range { > u64 len; > }; > > +/* > + * Special entry for extents that do not have any adjacent extents. > + * > + * This is for cases like the only and truncated extent of a file. > + * Normally they won't be defraged at all (as they won't be merged with > + * any adjacent ones), but we may still want to defrag them, to free up > + * some space if possible. > + */ > +static bool should_defrag_under_utilized(struct extent_map *em) Can be made const. > +{ > + /* > + * Ratio based check. > + * > + * If the current extent is only utilizing 1/16 of its on-disk size, > + * it's definitely under-utilized, and defragging it may free up > + * the whole extent. > + */ > + if (em->len < em->orig_block_len / 16) > + return true; > + > + /* > + * Wasted space based check. > + * > + * If we can free up at least 16MiB, then it may be a good idea > + * to defrag. > + */ > + if (em->len < em->orig_block_len && > + em->orig_block_len - em->len > SZ_16M) The first check, len < orig_block_len, is redundant, isn't it? em->len can only be less than or equals to em->orig_block_len. The second condition is enough to have. > + return true; > + return false; > +} > + > /* > * Collect all valid target extents. > * > @@ -1070,6 +1102,16 @@ static int defrag_collect_targets(struct btrfs_inode *inode, > if (!next_mergeable) { > struct defrag_target_range *last; > > + /* > + * This is a single extent without any chance to merge > + * with any adjacent extent. > + * > + * But if we may free up some space, it is still worth > + * defragging. > + */ > + if (should_defrag_under_utilized(em)) > + goto add; > + So this logic is making some cases worse actually, making us use more disk space. For example: DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT xfs_io -f -c "pwrite 0 128M" $MNT/foobar sync xfs_io -c "truncate 40M" $MNT/foobar btrfs filesystem defrag $MNT/foobar After this patch, we get: item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 generation 7 type 1 (regular) extent data disk byte 1104150528 nr 134217728 extent data offset 0 nr 8650752 ram 134217728 extent compression 0 (none) item 7 key (257 EXTENT_DATA 8650752) itemoff 15757 itemsize 53 generation 8 type 1 (regular) extent data disk byte 1238368256 nr 33292288 extent data offset 0 nr 33292288 ram 33292288 extent compression 0 (none) So we're now using 128M + 32M of disk space where before defrag we used 128M. Before the defrag we had: item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 generation 7 type 1 (regular) extent data disk byte 1104150528 nr 134217728 extent data offset 0 nr 41943040 ram 134217728 extent compression 0 (none) So something's not good in this logic. Also, there's something else worth considering here, which is extent sharedness. If an extent is shared we don't want to always dirty the respective range and rewrite it, as that may result in using more disk space... Example: xfs_io -f -c "pwrite 0 128M" $MNT/foobar btrfs subvolume snapshot $MNT $MNT/snap xfs_io -c "truncate 16M" $MNT/foobar btrfs filesystem defrag $MNT/foobar We end up consuming an additional 16M of data without any benefits. Without this patch, this wouldn't happen. Thanks. > /* Empty target list, no way to merge with last entry */ > if (list_empty(target_list)) > goto next; > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list 2024-02-06 16:23 ` Filipe Manana @ 2024-02-06 20:41 ` Qu Wenruo 2024-02-06 22:26 ` Qu Wenruo 2024-02-07 12:19 ` Filipe Manana 0 siblings, 2 replies; 12+ messages in thread From: Qu Wenruo @ 2024-02-06 20:41 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs, Christoph Anton Mitterer On 2024/2/7 02:53, Filipe Manana wrote: > On Mon, Feb 5, 2024 at 11:46 PM Qu Wenruo <wqu@suse.com> wrote: >> >> [BUG] >> The following script can lead to a very under utilized extent and we >> have no way to use defrag to properly reclaim its wasted space: >> >> # mkfs.btrfs -f $dev >> # mount $dev $mnt >> # xfs_io -f -c "pwrite 0 128M" $mnt/foobar >> # sync >> # truncate -s 4k $mnt/foobar >> # btrfs filesystem defrag $mnt/foobar >> # sync >> >> After the above operations, the file "foobar" is still utilizing the >> whole 128M: >> >> item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160 >> generation 7 transid 8 size 4096 nbytes 4096 >> block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 >> sequence 32770 flags 0x0(none) >> item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14 >> index 2 namelen 4 name: file >> item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 >> generation 7 type 1 (regular) >> extent data disk byte 298844160 nr 134217728 <<< >> extent data offset 0 nr 4096 ram 134217728 >> extent compression 0 (none) >> >> This is the common btrfs bookend behavior, that 128M extent would only >> be freed if the last referencer of the extent is freed. >> >> The problem is, normally we would go defrag to free that 128M extent, >> but defrag would not touch the extent at all. >> >> [CAUSE] >> The file extent has no adjacent extent at all, thus all existing defrag >> code consider it a perfectly good file extent, even if it's only >> utilizing a very tiny amount of space. >> >> [FIX] >> For a file extent without any adjacent file extent, we should still >> consider to defrag such under utilized extent, base on the following >> conditions: >> >> - utilization ratio >> If the extent is utilizing less than 1/16 of the on-disk extent size, >> then it would be a defrag target. >> >> - wasted space >> If we defrag the extent and can free at least 16MiB, then it would be >> a defrag target. >> >> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > As I said in my previous review, please include the lore link to the > report, it's always useful. > >> --- >> fs/btrfs/defrag.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 42 insertions(+) >> >> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c >> index 8fc8118c3225..85c6e45d0cd4 100644 >> --- a/fs/btrfs/defrag.c >> +++ b/fs/btrfs/defrag.c >> @@ -950,6 +950,38 @@ struct defrag_target_range { >> u64 len; >> }; >> >> +/* >> + * Special entry for extents that do not have any adjacent extents. >> + * >> + * This is for cases like the only and truncated extent of a file. >> + * Normally they won't be defraged at all (as they won't be merged with >> + * any adjacent ones), but we may still want to defrag them, to free up >> + * some space if possible. >> + */ >> +static bool should_defrag_under_utilized(struct extent_map *em) > > Can be made const. > >> +{ >> + /* >> + * Ratio based check. >> + * >> + * If the current extent is only utilizing 1/16 of its on-disk size, >> + * it's definitely under-utilized, and defragging it may free up >> + * the whole extent. >> + */ >> + if (em->len < em->orig_block_len / 16) >> + return true; >> + >> + /* >> + * Wasted space based check. >> + * >> + * If we can free up at least 16MiB, then it may be a good idea >> + * to defrag. >> + */ >> + if (em->len < em->orig_block_len && >> + em->orig_block_len - em->len > SZ_16M) > > The first check, len < orig_block_len, is redundant, isn't it? > em->len can only be less than or equals to em->orig_block_len. > The second condition is enough to have. Not exactly, don't forget compressed file extents. > >> + return true; >> + return false; >> +} >> + >> /* >> * Collect all valid target extents. >> * >> @@ -1070,6 +1102,16 @@ static int defrag_collect_targets(struct btrfs_inode *inode, >> if (!next_mergeable) { >> struct defrag_target_range *last; >> >> + /* >> + * This is a single extent without any chance to merge >> + * with any adjacent extent. >> + * >> + * But if we may free up some space, it is still worth >> + * defragging. >> + */ >> + if (should_defrag_under_utilized(em)) >> + goto add; >> + > > So this logic is making some cases worse actually, making us use more > disk space. > For example: > > DEV=/dev/sdi > MNT=/mnt/sdi > > mkfs.btrfs -f $DEV > mount $DEV $MNT > > xfs_io -f -c "pwrite 0 128M" $MNT/foobar > sync > xfs_io -c "truncate 40M" $MNT/foobar > btrfs filesystem defrag $MNT/foobar > > After this patch, we get: > > item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 > generation 7 type 1 (regular) > extent data disk byte 1104150528 nr 134217728 > extent data offset 0 nr 8650752 ram 134217728 > extent compression 0 (none) > item 7 key (257 EXTENT_DATA 8650752) itemoff 15757 itemsize 53 > generation 8 type 1 (regular) > extent data disk byte 1238368256 nr 33292288 > extent data offset 0 nr 33292288 ram 33292288 > extent compression 0 (none) This behavior is unexpected, as we should redirty the whole 40M, but the first 8.25M didn't got re-dirtied is a big problem to me. Will look into the situation. > > So we're now using 128M + 32M of disk space where before defrag we used 128M. > Before the defrag we had: > > item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 > generation 7 type 1 (regular) > extent data disk byte 1104150528 nr 134217728 > extent data offset 0 nr 41943040 ram 134217728 > extent compression 0 (none) > > So something's not good in this logic. > > Also, there's something else worth considering here, which is extent sharedness. This is a known problem for defrag, thus for snapshot/reflink it's really on the end user to determine whether they really need defrag. Thanks, Qu > If an extent is shared we don't want to always dirty the respective > range and rewrite it, as that may result in using more disk space... > > Example: > > xfs_io -f -c "pwrite 0 128M" $MNT/foobar > btrfs subvolume snapshot $MNT $MNT/snap > xfs_io -c "truncate 16M" $MNT/foobar > btrfs filesystem defrag $MNT/foobar > > We end up consuming an additional 16M of data without any benefits. > Without this patch, this wouldn't happen. > > Thanks. > > >> /* Empty target list, no way to merge with last entry */ >> if (list_empty(target_list)) >> goto next; >> -- >> 2.43.0 >> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list 2024-02-06 20:41 ` Qu Wenruo @ 2024-02-06 22:26 ` Qu Wenruo 2024-02-07 12:19 ` Filipe Manana 1 sibling, 0 replies; 12+ messages in thread From: Qu Wenruo @ 2024-02-06 22:26 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs, Christoph Anton Mitterer On 2024/2/7 07:11, Qu Wenruo wrote: [...] >> >> item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 >> generation 7 type 1 (regular) >> extent data disk byte 1104150528 nr 134217728 >> extent data offset 0 nr 8650752 ram 134217728 >> extent compression 0 (none) >> item 7 key (257 EXTENT_DATA 8650752) itemoff 15757 itemsize 53 >> generation 8 type 1 (regular) >> extent data disk byte 1238368256 nr 33292288 >> extent data offset 0 nr 33292288 ram 33292288 >> extent compression 0 (none) > > This behavior is unexpected, as we should redirty the whole 40M, but the > first 8.25M didn't got re-dirtied is a big problem to me. > Will look into the situation. > Got the cause. The problem is related to a check in the existing code: range_len = em->len - (cur - em->start); /* Skip too large extent */ if (range_len >= extent_thresh) goto next; Since the range_len is to the end of the extent, thus at the beginning 8MiB it's always larger than the 32M extent threshold. But at around 8M, we no longer meet the 32M extent threshold thus begin to defrag the remaining 32M. To me, the check itself is not correct, it should go em->len instead. But on the other hand, that "goto next" would skip our wasted ratio/bytes checks. At least with that fixed, we won't make the situation worse though. I'll submit a fix for it with a test case. Thanks for exposing the bug, Qu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list 2024-02-06 20:41 ` Qu Wenruo 2024-02-06 22:26 ` Qu Wenruo @ 2024-02-07 12:19 ` Filipe Manana 1 sibling, 0 replies; 12+ messages in thread From: Filipe Manana @ 2024-02-07 12:19 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, Christoph Anton Mitterer On Tue, Feb 6, 2024 at 8:41 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2024/2/7 02:53, Filipe Manana wrote: > > On Mon, Feb 5, 2024 at 11:46 PM Qu Wenruo <wqu@suse.com> wrote: > >> > >> [BUG] > >> The following script can lead to a very under utilized extent and we > >> have no way to use defrag to properly reclaim its wasted space: > >> > >> # mkfs.btrfs -f $dev > >> # mount $dev $mnt > >> # xfs_io -f -c "pwrite 0 128M" $mnt/foobar > >> # sync > >> # truncate -s 4k $mnt/foobar > >> # btrfs filesystem defrag $mnt/foobar > >> # sync > >> > >> After the above operations, the file "foobar" is still utilizing the > >> whole 128M: > >> > >> item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160 > >> generation 7 transid 8 size 4096 nbytes 4096 > >> block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 > >> sequence 32770 flags 0x0(none) > >> item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14 > >> index 2 namelen 4 name: file > >> item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 > >> generation 7 type 1 (regular) > >> extent data disk byte 298844160 nr 134217728 <<< > >> extent data offset 0 nr 4096 ram 134217728 > >> extent compression 0 (none) > >> > >> This is the common btrfs bookend behavior, that 128M extent would only > >> be freed if the last referencer of the extent is freed. > >> > >> The problem is, normally we would go defrag to free that 128M extent, > >> but defrag would not touch the extent at all. > >> > >> [CAUSE] > >> The file extent has no adjacent extent at all, thus all existing defrag > >> code consider it a perfectly good file extent, even if it's only > >> utilizing a very tiny amount of space. > >> > >> [FIX] > >> For a file extent without any adjacent file extent, we should still > >> consider to defrag such under utilized extent, base on the following > >> conditions: > >> > >> - utilization ratio > >> If the extent is utilizing less than 1/16 of the on-disk extent size, > >> then it would be a defrag target. > >> > >> - wasted space > >> If we defrag the extent and can free at least 16MiB, then it would be > >> a defrag target. > >> > >> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > > > As I said in my previous review, please include the lore link to the > > report, it's always useful. > > > >> --- > >> fs/btrfs/defrag.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 42 insertions(+) > >> > >> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > >> index 8fc8118c3225..85c6e45d0cd4 100644 > >> --- a/fs/btrfs/defrag.c > >> +++ b/fs/btrfs/defrag.c > >> @@ -950,6 +950,38 @@ struct defrag_target_range { > >> u64 len; > >> }; > >> > >> +/* > >> + * Special entry for extents that do not have any adjacent extents. > >> + * > >> + * This is for cases like the only and truncated extent of a file. > >> + * Normally they won't be defraged at all (as they won't be merged with > >> + * any adjacent ones), but we may still want to defrag them, to free up > >> + * some space if possible. > >> + */ > >> +static bool should_defrag_under_utilized(struct extent_map *em) > > > > Can be made const. > > > >> +{ > >> + /* > >> + * Ratio based check. > >> + * > >> + * If the current extent is only utilizing 1/16 of its on-disk size, > >> + * it's definitely under-utilized, and defragging it may free up > >> + * the whole extent. > >> + */ > >> + if (em->len < em->orig_block_len / 16) > >> + return true; > >> + > >> + /* > >> + * Wasted space based check. > >> + * > >> + * If we can free up at least 16MiB, then it may be a good idea > >> + * to defrag. > >> + */ > >> + if (em->len < em->orig_block_len && > >> + em->orig_block_len - em->len > SZ_16M) > > > > The first check, len < orig_block_len, is redundant, isn't it? > > em->len can only be less than or equals to em->orig_block_len. > > The second condition is enough to have. > > Not exactly, don't forget compressed file extents. Right. > > > > >> + return true; > >> + return false; > >> +} > >> + > >> /* > >> * Collect all valid target extents. > >> * > >> @@ -1070,6 +1102,16 @@ static int defrag_collect_targets(struct btrfs_inode *inode, > >> if (!next_mergeable) { > >> struct defrag_target_range *last; > >> > >> + /* > >> + * This is a single extent without any chance to merge > >> + * with any adjacent extent. > >> + * > >> + * But if we may free up some space, it is still worth > >> + * defragging. > >> + */ > >> + if (should_defrag_under_utilized(em)) > >> + goto add; > >> + > > > > So this logic is making some cases worse actually, making us use more > > disk space. > > For example: > > > > DEV=/dev/sdi > > MNT=/mnt/sdi > > > > mkfs.btrfs -f $DEV > > mount $DEV $MNT > > > > xfs_io -f -c "pwrite 0 128M" $MNT/foobar > > sync > > xfs_io -c "truncate 40M" $MNT/foobar > > btrfs filesystem defrag $MNT/foobar > > > > After this patch, we get: > > > > item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 > > generation 7 type 1 (regular) > > extent data disk byte 1104150528 nr 134217728 > > extent data offset 0 nr 8650752 ram 134217728 > > extent compression 0 (none) > > item 7 key (257 EXTENT_DATA 8650752) itemoff 15757 itemsize 53 > > generation 8 type 1 (regular) > > extent data disk byte 1238368256 nr 33292288 > > extent data offset 0 nr 33292288 ram 33292288 > > extent compression 0 (none) > > This behavior is unexpected, as we should redirty the whole 40M, but the > first 8.25M didn't got re-dirtied is a big problem to me. > Will look into the situation. > > > > > So we're now using 128M + 32M of disk space where before defrag we used 128M. > > Before the defrag we had: > > > > item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 > > generation 7 type 1 (regular) > > extent data disk byte 1104150528 nr 134217728 > > extent data offset 0 nr 41943040 ram 134217728 > > extent compression 0 (none) > > > > So something's not good in this logic. > > > > Also, there's something else worth considering here, which is extent sharedness. > > This is a known problem for defrag, thus for snapshot/reflink it's > really on the end user to determine whether they really need defrag. It is a known problem for defrag that is intentional, because the goal of defrag is to optimize extent layout, reduce the number of extents, and have larger extents to optimize reads. However this new behavior, rewriting "lone" extents, we are adding to here defrag is different, as its goal is to release unused space. So having it increase space in case an extent is shared, goes against the goal of eliminating wasted space. From a user perspective it's confusing. It's documented that for the normal mode we can break extent sharing because the intention is to optimize extent layout to optimize reads (and also reduce metadata). Now this new defrag mode, which gets a new flag in the path, its goal is only to reduce wasted space... > > Thanks, > Qu > > If an extent is shared we don't want to always dirty the respective > > range and rewrite it, as that may result in using more disk space... > > > > Example: > > > > xfs_io -f -c "pwrite 0 128M" $MNT/foobar > > btrfs subvolume snapshot $MNT $MNT/snap > > xfs_io -c "truncate 16M" $MNT/foobar > > btrfs filesystem defrag $MNT/foobar > > > > We end up consuming an additional 16M of data without any benefits. > > Without this patch, this wouldn't happen. > > > > Thanks. > > > > > >> /* Empty target list, no way to merge with last entry */ > >> if (list_empty(target_list)) > >> goto next; > >> -- > >> 2.43.0 > >> > >> > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior 2024-02-05 23:45 [PATCH 0/2] btrfs: defrag: better lone extent handling Qu Wenruo 2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo @ 2024-02-05 23:45 ` Qu Wenruo 2024-02-06 17:03 ` Filipe Manana 2024-02-06 20:27 ` [PATCH 0/2] btrfs: defrag: better lone extent handling Christoph Anton Mitterer 2 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2024-02-05 23:45 UTC (permalink / raw) To: linux-btrfs Previously we're using a fixed ratio and fixed bytes for lone extents defragging, which may not fit all situations. This patch would enhance the behavior by allowing fine-tuning using some extra members inside btrfs_ioctl_defrag_range_args. This would introduce two flags and two new members: - BTRFS_DEFRAG_RANGE_LONE_RATIO and BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES With these flags set, defrag would consider lone extents with their utilization ratio and wasted bytes as a defrag condition. - lone_ratio This is a u32 value, but only [0, 65536] is allowed (still beyond u16 range, thus have to go u32). 0 means disable lone ratio detection. [1, 65536] means the ratio. If a lone extent is utilizing less than (lone_ratio / 65536) * on-disk size of an extent, it would be considered as a defrag target. - lone_wasted_bytes This is a u32 value. If we free the lone extent, and can free up to @lone_wasted_bytes (excluding the extent itself), then it would be considered as a defrag target. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/defrag.c | 40 ++++++++++++++++++++++++-------------- fs/btrfs/ioctl.c | 9 +++++++++ include/uapi/linux/btrfs.h | 28 +++++++++++++++++++++----- 3 files changed, 57 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index 85c6e45d0cd4..3566845ee3e6 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -958,26 +958,28 @@ struct defrag_target_range { * any adjacent ones), but we may still want to defrag them, to free up * some space if possible. */ -static bool should_defrag_under_utilized(struct extent_map *em) +static bool should_defrag_under_utilized(struct extent_map *em, u32 lone_ratio, + u32 lone_wasted_bytes) { /* * Ratio based check. * - * If the current extent is only utilizing 1/16 of its on-disk size, + * If the current extent is only utilizing less than + * (%lone_ratio/65536) of its on-disk size, * it's definitely under-utilized, and defragging it may free up * the whole extent. */ - if (em->len < em->orig_block_len / 16) + if (lone_ratio && em->len < em->orig_block_len * lone_ratio / 65536) return true; /* * Wasted space based check. * - * If we can free up at least 16MiB, then it may be a good idea - * to defrag. + * If we can free up at least @lone_wasted_bytes, then it may be a + * good idea to defrag. */ if (em->len < em->orig_block_len && - em->orig_block_len - em->len > SZ_16M) + em->orig_block_len - em->len > lone_wasted_bytes) return true; return false; } @@ -998,7 +1000,8 @@ static bool should_defrag_under_utilized(struct extent_map *em) */ static int defrag_collect_targets(struct btrfs_inode *inode, u64 start, u64 len, u32 extent_thresh, - u64 newer_than, bool do_compress, + u64 newer_than, u32 lone_ratio, + u32 lone_wasted_bytes, bool do_compress, bool locked, struct list_head *target_list, u64 *last_scanned_ret) { @@ -1109,7 +1112,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, * But if we may free up some space, it is still worth * defragging. */ - if (should_defrag_under_utilized(em)) + if (should_defrag_under_utilized(em, lone_ratio, lone_wasted_bytes)) goto add; /* Empty target list, no way to merge with last entry */ @@ -1240,7 +1243,8 @@ static int defrag_one_locked_target(struct btrfs_inode *inode, } static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, - u32 extent_thresh, u64 newer_than, bool do_compress, + u32 extent_thresh, u64 newer_than, + u32 lone_ratio, u32 lone_wasted_bytes, bool do_compress, u64 *last_scanned_ret) { struct extent_state *cached_state = NULL; @@ -1286,7 +1290,8 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, * so that we won't relock the extent range and cause deadlock. */ ret = defrag_collect_targets(inode, start, len, extent_thresh, - newer_than, do_compress, true, + newer_than, lone_ratio, lone_wasted_bytes, + do_compress, true, &target_list, last_scanned_ret); if (ret < 0) goto unlock_extent; @@ -1318,7 +1323,9 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, static int defrag_one_cluster(struct btrfs_inode *inode, struct file_ra_state *ra, u64 start, u32 len, u32 extent_thresh, - u64 newer_than, bool do_compress, + u64 newer_than, u32 lone_ratio, + u32 lone_wasted_bytes, + bool do_compress, unsigned long *sectors_defragged, unsigned long max_sectors, u64 *last_scanned_ret) @@ -1330,8 +1337,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode, int ret; ret = defrag_collect_targets(inode, start, len, extent_thresh, - newer_than, do_compress, false, - &target_list, NULL); + newer_than, lone_ratio, lone_wasted_bytes, + do_compress, false, &target_list, NULL); if (ret < 0) goto out; @@ -1369,7 +1376,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode, * accounting. */ ret = defrag_one_range(inode, entry->start, range_len, - extent_thresh, newer_than, do_compress, + extent_thresh, newer_than, lone_ratio, + lone_wasted_bytes, do_compress, last_scanned_ret); if (ret < 0) break; @@ -1495,7 +1503,9 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra, BTRFS_I(inode)->defrag_compress = compress_type; ret = defrag_one_cluster(BTRFS_I(inode), ra, cur, cluster_end + 1 - cur, extent_thresh, - newer_than, do_compress, §ors_defragged, + newer_than, range->lone_ratio, + range->lone_wasted_bytes, do_compress, + §ors_defragged, max_to_defrag, &last_scanned); if (sectors_defragged > prev_sectors_defragged) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 46f9a6645bf6..d2d9b6d440b3 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2616,6 +2616,15 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) range.flags |= BTRFS_DEFRAG_RANGE_START_IO; range.extent_thresh = (u32)-1; } + + if (!(range.flags & BTRFS_DEFRAG_RANGE_LONE_RATIO)) + range.lone_ratio = 0; + else if (range.lone_ratio > 65536) { + ret = -EINVAL; + goto out; + } + if (!(range.flags & BTRFS_DEFRAG_RANGE_LONE_RATIO)) + range.lone_wasted_bytes = U32_MAX; } else { /* the rest are all set to zero by kzalloc */ range.len = (u64)-1; diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index f8bc34a6bcfa..3d5517c33f42 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -612,10 +612,14 @@ struct btrfs_ioctl_clone_range_args { * Used by: * struct btrfs_ioctl_defrag_range_args.flags */ -#define BTRFS_DEFRAG_RANGE_COMPRESS 1 -#define BTRFS_DEFRAG_RANGE_START_IO 2 -#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS | \ - BTRFS_DEFRAG_RANGE_START_IO) +#define BTRFS_DEFRAG_RANGE_COMPRESS (1ULL << 0) +#define BTRFS_DEFRAG_RANGE_START_IO (1ULL << 1) +#define BTRFS_DEFRAG_RANGE_LONE_RATIO (1ULL << 2) +#define BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES (1ULL << 3) +#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS | \ + BTRFS_DEFRAG_RANGE_START_IO | \ + BTRFS_DEFRAG_RANGE_LONE_RATIO |\ + BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES) struct btrfs_ioctl_defrag_range_args { /* start of the defrag operation */ @@ -644,8 +648,22 @@ struct btrfs_ioctl_defrag_range_args { */ __u32 compress_type; + /* + * If a lone extent (has no adjacent file extent) is utilizing less + * than the ratio (0 means 0%, while 65536 means 100%) of the on-disk + * space, it would be considered as a defrag target. + */ + __u32 lone_ratio; + + /* + * If we defrag a lone extent (has no adjacent file extent) can free + * up more space than this value (in bytes), it would be considered + * as a defrag target. + */ + __u32 lone_wasted_bytes; + /* spare for later */ - __u32 unused[4]; + __u32 unused[2]; }; -- 2.43.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior 2024-02-05 23:45 ` [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior Qu Wenruo @ 2024-02-06 17:03 ` Filipe Manana 2024-02-06 20:50 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Filipe Manana @ 2024-02-06 17:03 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Mon, Feb 5, 2024 at 11:48 PM Qu Wenruo <wqu@suse.com> wrote: > > Previously we're using a fixed ratio and fixed bytes for lone extents > defragging, which may not fit all situations. > > This patch would enhance the behavior by allowing fine-tuning using some > extra members inside btrfs_ioctl_defrag_range_args. > > This would introduce two flags and two new members: > > - BTRFS_DEFRAG_RANGE_LONE_RATIO and BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES > With these flags set, defrag would consider lone extents with their > utilization ratio and wasted bytes as a defrag condition. > > - lone_ratio > This is a u32 value, but only [0, 65536] is allowed (still beyond u16 > range, thus have to go u32). > 0 means disable lone ratio detection. > [1, 65536] means the ratio. If a lone extent is utilizing less than > (lone_ratio / 65536) * on-disk size of an extent, it would be > considered as a defrag target. > > - lone_wasted_bytes > This is a u32 value. > If we free the lone extent, and can free up to @lone_wasted_bytes > (excluding the extent itself), then it would be considered as a defrag > target. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/defrag.c | 40 ++++++++++++++++++++++++-------------- > fs/btrfs/ioctl.c | 9 +++++++++ > include/uapi/linux/btrfs.h | 28 +++++++++++++++++++++----- > 3 files changed, 57 insertions(+), 20 deletions(-) > > diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > index 85c6e45d0cd4..3566845ee3e6 100644 > --- a/fs/btrfs/defrag.c > +++ b/fs/btrfs/defrag.c > @@ -958,26 +958,28 @@ struct defrag_target_range { > * any adjacent ones), but we may still want to defrag them, to free up > * some space if possible. > */ > -static bool should_defrag_under_utilized(struct extent_map *em) > +static bool should_defrag_under_utilized(struct extent_map *em, u32 lone_ratio, > + u32 lone_wasted_bytes) > { > /* > * Ratio based check. > * > - * If the current extent is only utilizing 1/16 of its on-disk size, > + * If the current extent is only utilizing less than > + * (%lone_ratio/65536) of its on-disk size, I really don't understand what this notation "(%lone_ratio/65536)" means... > * it's definitely under-utilized, and defragging it may free up > * the whole extent. > */ > - if (em->len < em->orig_block_len / 16) > + if (lone_ratio && em->len < em->orig_block_len * lone_ratio / 65536) Why don't you use SZ_64K everywhere instead of 65536? > return true; > > /* > * Wasted space based check. > * > - * If we can free up at least 16MiB, then it may be a good idea > - * to defrag. > + * If we can free up at least @lone_wasted_bytes, then it may be a > + * good idea to defrag. > */ > if (em->len < em->orig_block_len && > - em->orig_block_len - em->len > SZ_16M) > + em->orig_block_len - em->len > lone_wasted_bytes) According to the comment it should be >=. So either fix the comment, or the comparison. > return true; > return false; > } > @@ -998,7 +1000,8 @@ static bool should_defrag_under_utilized(struct extent_map *em) > */ > static int defrag_collect_targets(struct btrfs_inode *inode, > u64 start, u64 len, u32 extent_thresh, > - u64 newer_than, bool do_compress, > + u64 newer_than, u32 lone_ratio, > + u32 lone_wasted_bytes, bool do_compress, > bool locked, struct list_head *target_list, > u64 *last_scanned_ret) > { > @@ -1109,7 +1112,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, > * But if we may free up some space, it is still worth > * defragging. > */ > - if (should_defrag_under_utilized(em)) > + if (should_defrag_under_utilized(em, lone_ratio, lone_wasted_bytes)) > goto add; > > /* Empty target list, no way to merge with last entry */ > @@ -1240,7 +1243,8 @@ static int defrag_one_locked_target(struct btrfs_inode *inode, > } > > static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, > - u32 extent_thresh, u64 newer_than, bool do_compress, > + u32 extent_thresh, u64 newer_than, > + u32 lone_ratio, u32 lone_wasted_bytes, bool do_compress, > u64 *last_scanned_ret) > { > struct extent_state *cached_state = NULL; > @@ -1286,7 +1290,8 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, > * so that we won't relock the extent range and cause deadlock. > */ > ret = defrag_collect_targets(inode, start, len, extent_thresh, > - newer_than, do_compress, true, > + newer_than, lone_ratio, lone_wasted_bytes, > + do_compress, true, > &target_list, last_scanned_ret); > if (ret < 0) > goto unlock_extent; > @@ -1318,7 +1323,9 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, > static int defrag_one_cluster(struct btrfs_inode *inode, > struct file_ra_state *ra, > u64 start, u32 len, u32 extent_thresh, > - u64 newer_than, bool do_compress, > + u64 newer_than, u32 lone_ratio, > + u32 lone_wasted_bytes, > + bool do_compress, > unsigned long *sectors_defragged, > unsigned long max_sectors, > u64 *last_scanned_ret) > @@ -1330,8 +1337,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode, > int ret; > > ret = defrag_collect_targets(inode, start, len, extent_thresh, > - newer_than, do_compress, false, > - &target_list, NULL); > + newer_than, lone_ratio, lone_wasted_bytes, > + do_compress, false, &target_list, NULL); > if (ret < 0) > goto out; > > @@ -1369,7 +1376,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode, > * accounting. > */ > ret = defrag_one_range(inode, entry->start, range_len, > - extent_thresh, newer_than, do_compress, > + extent_thresh, newer_than, lone_ratio, > + lone_wasted_bytes, do_compress, > last_scanned_ret); > if (ret < 0) > break; > @@ -1495,7 +1503,9 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra, > BTRFS_I(inode)->defrag_compress = compress_type; > ret = defrag_one_cluster(BTRFS_I(inode), ra, cur, > cluster_end + 1 - cur, extent_thresh, > - newer_than, do_compress, §ors_defragged, > + newer_than, range->lone_ratio, > + range->lone_wasted_bytes, do_compress, > + §ors_defragged, > max_to_defrag, &last_scanned); > > if (sectors_defragged > prev_sectors_defragged) > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 46f9a6645bf6..d2d9b6d440b3 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -2616,6 +2616,15 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) > range.flags |= BTRFS_DEFRAG_RANGE_START_IO; > range.extent_thresh = (u32)-1; > } > + > + if (!(range.flags & BTRFS_DEFRAG_RANGE_LONE_RATIO)) > + range.lone_ratio = 0; > + else if (range.lone_ratio > 65536) { > + ret = -EINVAL; > + goto out; > + } > + if (!(range.flags & BTRFS_DEFRAG_RANGE_LONE_RATIO)) > + range.lone_wasted_bytes = U32_MAX; > } else { > /* the rest are all set to zero by kzalloc */ > range.len = (u64)-1; > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h > index f8bc34a6bcfa..3d5517c33f42 100644 > --- a/include/uapi/linux/btrfs.h > +++ b/include/uapi/linux/btrfs.h > @@ -612,10 +612,14 @@ struct btrfs_ioctl_clone_range_args { > * Used by: > * struct btrfs_ioctl_defrag_range_args.flags > */ > -#define BTRFS_DEFRAG_RANGE_COMPRESS 1 > -#define BTRFS_DEFRAG_RANGE_START_IO 2 > -#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS | \ > - BTRFS_DEFRAG_RANGE_START_IO) > +#define BTRFS_DEFRAG_RANGE_COMPRESS (1ULL << 0) > +#define BTRFS_DEFRAG_RANGE_START_IO (1ULL << 1) > +#define BTRFS_DEFRAG_RANGE_LONE_RATIO (1ULL << 2) > +#define BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES (1ULL << 3) > +#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS | \ > + BTRFS_DEFRAG_RANGE_START_IO | \ > + BTRFS_DEFRAG_RANGE_LONE_RATIO |\ > + BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES) > > struct btrfs_ioctl_defrag_range_args { > /* start of the defrag operation */ > @@ -644,8 +648,22 @@ struct btrfs_ioctl_defrag_range_args { > */ > __u32 compress_type; > > + /* > + * If a lone extent (has no adjacent file extent) is utilizing less > + * than the ratio (0 means 0%, while 65536 means 100%) of the on-disk > + * space, it would be considered as a defrag target. > + */ > + __u32 lone_ratio; > + > + /* > + * If we defrag a lone extent (has no adjacent file extent) can free Using this term "lone" is confusing for me, and this description also suggests it's about a file with only one extent. I would call it wasted_ratio, and the other one wasted_bytes, without the "lone" in the name. Because a case like this: xfs_io -f -c "pwrite 0 4K" $MNT/foobar sync xfs_io -c "pwrite 8K 128M" $MNT/foobar sync xfs_io -c "truncate 16K" $MNT/foobar btrfs filesystem defrag $MNT/foobar There's an adjacent extent, it's simply not mergeable, but we want to rewrite the second extent to avoid pinning 128M - 8K of data space. A similar example with a next extent that is not mergeable is also doable, or examples with previous and next extents that aren't mergeable. And I still don't get the logic of using 65536... Why not have the ratio as an integer between 0 and 99? That's a lot more intuitive IMO... Thanks. > + * up more space than this value (in bytes), it would be considered > + * as a defrag target. > + */ > + __u32 lone_wasted_bytes; > + > /* spare for later */ > - __u32 unused[4]; > + __u32 unused[2]; > }; > > > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior 2024-02-06 17:03 ` Filipe Manana @ 2024-02-06 20:50 ` Qu Wenruo 2024-02-07 12:27 ` Filipe Manana 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2024-02-06 20:50 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs On 2024/2/7 03:33, Filipe Manana wrote: > On Mon, Feb 5, 2024 at 11:48 PM Qu Wenruo <wqu@suse.com> wrote: >> >> Previously we're using a fixed ratio and fixed bytes for lone extents >> defragging, which may not fit all situations. >> >> This patch would enhance the behavior by allowing fine-tuning using some >> extra members inside btrfs_ioctl_defrag_range_args. >> >> This would introduce two flags and two new members: >> >> - BTRFS_DEFRAG_RANGE_LONE_RATIO and BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES >> With these flags set, defrag would consider lone extents with their >> utilization ratio and wasted bytes as a defrag condition. >> >> - lone_ratio >> This is a u32 value, but only [0, 65536] is allowed (still beyond u16 >> range, thus have to go u32). >> 0 means disable lone ratio detection. >> [1, 65536] means the ratio. If a lone extent is utilizing less than >> (lone_ratio / 65536) * on-disk size of an extent, it would be >> considered as a defrag target. >> >> - lone_wasted_bytes >> This is a u32 value. >> If we free the lone extent, and can free up to @lone_wasted_bytes >> (excluding the extent itself), then it would be considered as a defrag >> target. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/defrag.c | 40 ++++++++++++++++++++++++-------------- >> fs/btrfs/ioctl.c | 9 +++++++++ >> include/uapi/linux/btrfs.h | 28 +++++++++++++++++++++----- >> 3 files changed, 57 insertions(+), 20 deletions(-) >> >> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c >> index 85c6e45d0cd4..3566845ee3e6 100644 >> --- a/fs/btrfs/defrag.c >> +++ b/fs/btrfs/defrag.c >> @@ -958,26 +958,28 @@ struct defrag_target_range { >> * any adjacent ones), but we may still want to defrag them, to free up >> * some space if possible. >> */ >> -static bool should_defrag_under_utilized(struct extent_map *em) >> +static bool should_defrag_under_utilized(struct extent_map *em, u32 lone_ratio, >> + u32 lone_wasted_bytes) >> { >> /* >> * Ratio based check. >> * >> - * If the current extent is only utilizing 1/16 of its on-disk size, >> + * If the current extent is only utilizing less than >> + * (%lone_ratio/65536) of its on-disk size, > > I really don't understand what this notation "(%lone_ratio/65536)" means... That means the ratio is between [0, 65536]. 0 means disable ratio check, 1 means 1/65536 of the on-disk size and 65536 mean 65536/65536 of the on-disk size (aka, all regular extents would meet the ratio) As normally we don't do any float inside kernel, the ratio must be some unsigned int/long based. Any better ideas on the description? > >> * it's definitely under-utilized, and defragging it may free up >> * the whole extent. >> */ >> - if (em->len < em->orig_block_len / 16) >> + if (lone_ratio && em->len < em->orig_block_len * lone_ratio / 65536) > > Why don't you use SZ_64K everywhere instead of 65536? Maybe for the ratio, I should use some macro to define the division base. > >> return true; >> >> /* >> * Wasted space based check. >> * >> - * If we can free up at least 16MiB, then it may be a good idea >> - * to defrag. >> + * If we can free up at least @lone_wasted_bytes, then it may be a >> + * good idea to defrag. >> */ >> if (em->len < em->orig_block_len && >> - em->orig_block_len - em->len > SZ_16M) >> + em->orig_block_len - em->len > lone_wasted_bytes) > > According to the comment it should be >=. So either fix the comment, > or the comparison. OK, would change that. [...] >> + >> + /* >> + * If we defrag a lone extent (has no adjacent file extent) can free > > Using this term "lone" is confusing for me, and this description also > suggests it's about a file with only one extent. > I would call it wasted_ratio, and the other one wasted_bytes, without > the "lone" in the name. The "lone" would describe a file extent without non-hole neighbor file extent, thus I though "lone" would be good enough, but it's not. > > Because a case like this: > > xfs_io -f -c "pwrite 0 4K" $MNT/foobar > sync > xfs_io -c "pwrite 8K 128M" $MNT/foobar > sync > xfs_io -c "truncate 16K" $MNT/foobar > btrfs filesystem defrag $MNT/foobar > > There's an adjacent extent, it's simply not mergeable, but we want to > rewrite the second extent to avoid pinning 128M - 8K of data space. > A similar example with a next extent that is not mergeable is also > doable, or examples with previous and next extents that aren't > mergeable. Yeah, such extent would always have hole neighbors, just not mergeable. Thus "lone" is not good enough. > > And I still don't get the logic of using 65536... Why not have the > ratio as an integer between 0 and 99? That's a lot more intuitive > IMO... The choose of 65536 is to make the division faster, as regular division is always slow no matter whatever arch we're one. "/ 65536" would be optimized by compiler to ">> 4". But I guess I'm over optimizing here? Thanks, Qu > > Thanks. > >> + * up more space than this value (in bytes), it would be considered >> + * as a defrag target. >> + */ >> + __u32 lone_wasted_bytes; >> + >> /* spare for later */ >> - __u32 unused[4]; >> + __u32 unused[2]; >> }; >> >> >> -- >> 2.43.0 >> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior 2024-02-06 20:50 ` Qu Wenruo @ 2024-02-07 12:27 ` Filipe Manana 2024-02-07 17:40 ` Christoph Anton Mitterer 0 siblings, 1 reply; 12+ messages in thread From: Filipe Manana @ 2024-02-07 12:27 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Tue, Feb 6, 2024 at 8:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2024/2/7 03:33, Filipe Manana wrote: > > On Mon, Feb 5, 2024 at 11:48 PM Qu Wenruo <wqu@suse.com> wrote: > >> > >> Previously we're using a fixed ratio and fixed bytes for lone extents > >> defragging, which may not fit all situations. > >> > >> This patch would enhance the behavior by allowing fine-tuning using some > >> extra members inside btrfs_ioctl_defrag_range_args. > >> > >> This would introduce two flags and two new members: > >> > >> - BTRFS_DEFRAG_RANGE_LONE_RATIO and BTRFS_DEFRAG_RANGE_LONE_WASTED_BYTES > >> With these flags set, defrag would consider lone extents with their > >> utilization ratio and wasted bytes as a defrag condition. > >> > >> - lone_ratio > >> This is a u32 value, but only [0, 65536] is allowed (still beyond u16 > >> range, thus have to go u32). > >> 0 means disable lone ratio detection. > >> [1, 65536] means the ratio. If a lone extent is utilizing less than > >> (lone_ratio / 65536) * on-disk size of an extent, it would be > >> considered as a defrag target. > >> > >> - lone_wasted_bytes > >> This is a u32 value. > >> If we free the lone extent, and can free up to @lone_wasted_bytes > >> (excluding the extent itself), then it would be considered as a defrag > >> target. > >> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/defrag.c | 40 ++++++++++++++++++++++++-------------- > >> fs/btrfs/ioctl.c | 9 +++++++++ > >> include/uapi/linux/btrfs.h | 28 +++++++++++++++++++++----- > >> 3 files changed, 57 insertions(+), 20 deletions(-) > >> > >> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > >> index 85c6e45d0cd4..3566845ee3e6 100644 > >> --- a/fs/btrfs/defrag.c > >> +++ b/fs/btrfs/defrag.c > >> @@ -958,26 +958,28 @@ struct defrag_target_range { > >> * any adjacent ones), but we may still want to defrag them, to free up > >> * some space if possible. > >> */ > >> -static bool should_defrag_under_utilized(struct extent_map *em) > >> +static bool should_defrag_under_utilized(struct extent_map *em, u32 lone_ratio, > >> + u32 lone_wasted_bytes) > >> { > >> /* > >> * Ratio based check. > >> * > >> - * If the current extent is only utilizing 1/16 of its on-disk size, > >> + * If the current extent is only utilizing less than > >> + * (%lone_ratio/65536) of its on-disk size, > > > > I really don't understand what this notation "(%lone_ratio/65536)" means... > > That means the ratio is between [0, 65536]. > > 0 means disable ratio check, 1 means 1/65536 of the on-disk size and > 65536 mean 65536/65536 of the on-disk size (aka, all regular extents > would meet the ratio) > > As normally we don't do any float inside kernel, the ratio must be some > unsigned int/long based. > > Any better ideas on the description? > > > > >> * it's definitely under-utilized, and defragging it may free up > >> * the whole extent. > >> */ > >> - if (em->len < em->orig_block_len / 16) > >> + if (lone_ratio && em->len < em->orig_block_len * lone_ratio / 65536) > > > > Why don't you use SZ_64K everywhere instead of 65536? > > Maybe for the ratio, I should use some macro to define the division base. > > > > >> return true; > >> > >> /* > >> * Wasted space based check. > >> * > >> - * If we can free up at least 16MiB, then it may be a good idea > >> - * to defrag. > >> + * If we can free up at least @lone_wasted_bytes, then it may be a > >> + * good idea to defrag. > >> */ > >> if (em->len < em->orig_block_len && > >> - em->orig_block_len - em->len > SZ_16M) > >> + em->orig_block_len - em->len > lone_wasted_bytes) > > > > According to the comment it should be >=. So either fix the comment, > > or the comparison. > > OK, would change that. > > [...] > >> + > >> + /* > >> + * If we defrag a lone extent (has no adjacent file extent) can free > > > > Using this term "lone" is confusing for me, and this description also > > suggests it's about a file with only one extent. > > I would call it wasted_ratio, and the other one wasted_bytes, without > > the "lone" in the name. > > The "lone" would describe a file extent without non-hole neighbor file > extent, thus I though "lone" would be good enough, but it's not. > > > > Because a case like this: > > > > xfs_io -f -c "pwrite 0 4K" $MNT/foobar > > sync > > xfs_io -c "pwrite 8K 128M" $MNT/foobar > > sync > > xfs_io -c "truncate 16K" $MNT/foobar > > btrfs filesystem defrag $MNT/foobar > > > > There's an adjacent extent, it's simply not mergeable, but we want to > > rewrite the second extent to avoid pinning 128M - 8K of data space. > > A similar example with a next extent that is not mergeable is also > > doable, or examples with previous and next extents that aren't > > mergeable. > > Yeah, such extent would always have hole neighbors, just not mergeable. > Thus "lone" is not good enough. > > > > > And I still don't get the logic of using 65536... Why not have the > > ratio as an integer between 0 and 99? That's a lot more intuitive > > IMO... > > The choose of 65536 is to make the division faster, as regular division > is always slow no matter whatever arch we're one. > "/ 65536" would be optimized by compiler to ">> 4". > > But I guess I'm over optimizing here? Honestly I don't even know how to answer you... I would be surprised if integer divisions are that heavy in the context of defrag... probably not even 1% of the time is spent on divisions. And really exposing an interface that requires a ratio with such an odd formula to optimize divisions, is really ugly, odd and confusing for users. Specially taking into account that it even leaks to the user interface in btrfs-progs, as looking into the btrfs-progs patch I see the following: + --lone-ratio <ratio> + For a lone file extent (which has no adjacent file exctent), if its utilizing + less than (ratio / 65536) of the on-disk extent size, it would also be defraged + for its potential to free up the on-disk extent. + + Valid values are in the range [0, 65536]. A percentage, in the integer range from 1 to 99% for example, is a lot more user friendly, intuitive, obvious. Thanks. > > Thanks, > Qu > > > > > Thanks. > > > >> + * up more space than this value (in bytes), it would be considered > >> + * as a defrag target. > >> + */ > >> + __u32 lone_wasted_bytes; > >> + > >> /* spare for later */ > >> - __u32 unused[4]; > >> + __u32 unused[2]; > >> }; > >> > >> > >> -- > >> 2.43.0 > >> > >> > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior 2024-02-07 12:27 ` Filipe Manana @ 2024-02-07 17:40 ` Christoph Anton Mitterer 0 siblings, 0 replies; 12+ messages in thread From: Christoph Anton Mitterer @ 2024-02-07 17:40 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Wed, 2024-02-07 at 12:27 +0000, Filipe Manana wrote: > A percentage, in the integer range from 1 to 99% for example, is a > lot > more user friendly, intuitive, obvious. If by that you'd mean the percentage of wasted storage, then I'm not really sure whether I can agree to that. Especially percentage of what? The original untruncated extent size? The user would never really know what that is. If I have e.g. a 10.000 files with extent sizes of 1 GB, then 10% wasted may already be too much for me. But if I had 10.000 files with extents sizes of 1MB, the same 10% may not be that much of a problem. OTOH, if I have 10 million files with extent sizes of a few kB, than even 1% might be already too much. If the clean up of such wasted space is a manual operation, then I would be best if one could somehow see not only how much is wasted (which can, to some extent already be done via compsize) but also some estimates how that is distributed like in the examples above... and from that, which precentage or ratio or whatever one has to specify for clean up to get so and so much wasted storage back. Cheers Chris. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] btrfs: defrag: better lone extent handling 2024-02-05 23:45 [PATCH 0/2] btrfs: defrag: better lone extent handling Qu Wenruo 2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo 2024-02-05 23:45 ` [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior Qu Wenruo @ 2024-02-06 20:27 ` Christoph Anton Mitterer 2 siblings, 0 replies; 12+ messages in thread From: Christoph Anton Mitterer @ 2024-02-06 20:27 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs Hey. From an admin PoV, I wonder whether this (i.e. doing it via defrag) is the (only) desired approach to this problem. defrag always sounds like a manual intervention on a degrading filesystem, which for an admin is always rater undesirable than desirable. (questions like: How often needs on to run it? What are the best settings? Does it impact the regular system IO? ...) Your patches to btrfs-progs, would that allow one to defrag *only* the extents with the cut-off? I mean from an admin-PoV I may want to fix that - but not defrag all other files, which may involve quite some reading/writing? Would it technically even be possible, to fix (= shrink by rewriting) the extent right at the time when the userspace truncates it? I mean has btrfs even a chance to notice it at that point ("oops, we've just wasted n MB, shall we do something about it right, or wait for the user to manually defrag?"). Feels a bit like a mount option where one could say if a truncate causes more than n bytes to be lost, re-write instantaneously. *If* that would even be possible, it doesn't mean that the defrag way couldn't be beneficial, too. Some people may rather want to waste the space first, and only clean up later, when IO is less busy. Cheers, Chris. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-02-07 18:57 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-05 23:45 [PATCH 0/2] btrfs: defrag: better lone extent handling Qu Wenruo 2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo 2024-02-06 16:23 ` Filipe Manana 2024-02-06 20:41 ` Qu Wenruo 2024-02-06 22:26 ` Qu Wenruo 2024-02-07 12:19 ` Filipe Manana 2024-02-05 23:45 ` [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior Qu Wenruo 2024-02-06 17:03 ` Filipe Manana 2024-02-06 20:50 ` Qu Wenruo 2024-02-07 12:27 ` Filipe Manana 2024-02-07 17:40 ` Christoph Anton Mitterer 2024-02-06 20:27 ` [PATCH 0/2] btrfs: defrag: better lone extent handling Christoph Anton Mitterer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox