* [PATCH v2 0/6] Ioctl to clear unused space in various ways
@ 2025-03-12 11:12 David Sterba
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

Add an ioctl that is similar to FITRIM and, in addition to trim, can also
do zeroing (either plain overwrite, or unmapping the blocks if the device
supports it) and secure erase.

This can be used to zero the unused space in e.g. VM images (when run
from inside the guest, if fstrim is not supported) or free space on
thin-provisioned devices.

The secure erase is provided by the blkdiscard command but I'm not aware
of an equivalent that can be run on a filesystem, so this is for parity.
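
For illustration, a minimal userspace sketch of calling the proposed ioctl, assuming the series is applied to the running kernel. The structure, the enum values and the ioctl number are copied locally from patches 1 to 3 because no released linux/btrfs.h carries them; clear_free_args_init() and clear_free() are hypothetical helper names, not part of the series.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copies of the proposed UAPI (assumption: layout as posted in v2). */
enum btrfs_clear_op_type {
	BTRFS_CLEAR_OP_DISCARD,
	BTRFS_CLEAR_OP_ZERO,
};

struct btrfs_ioctl_clear_free_args {
	uint32_t type;		/* in: enum btrfs_clear_op_type */
	uint32_t reserved1;	/* must be zero */
	uint64_t start;		/* in: logical start offset */
	uint64_t length;	/* in: bytes to cover, out: bytes cleared */
	uint64_t minlen;	/* in: minimal extent length to clear */
	uint64_t reserved2[4];	/* must be zero */
};

#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_CLEAR_FREE _IOWR(BTRFS_IOCTL_MAGIC, 66, \
				   struct btrfs_ioctl_clear_free_args)

/* Hypothetical helper: request the whole filesystem, no minimal length. */
static void clear_free_args_init(struct btrfs_ioctl_clear_free_args *args,
				 uint32_t type)
{
	memset(args, 0, sizeof(*args));
	args->type = type;
	args->length = UINT64_MAX;
}

/* Hypothetical helper: run one clearing pass on the filesystem at @mnt. */
static int clear_free(const char *mnt, uint32_t type, uint64_t *cleared)
{
	struct btrfs_ioctl_clear_free_args args;
	int fd, ret;

	clear_free_args_init(&args, type);
	fd = open(mnt, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, BTRFS_IOC_CLEAR_FREE, &args);
	if (ret == 0)
		*cleared = args.length;
	close(fd);
	return ret;
}
```

A caller with CAP_SYS_ADMIN would use it as clear_free("/mnt", BTRFS_CLEAR_OP_ZERO, &cleared); on a kernel without the series the ioctl is expected to fail with ENOTTY.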

v2:
- return -EOPNOTSUPP for zoned mode, with a note
- add new operation to reset the chunk map status wrt trim

David Sterba (6):
  btrfs: extend trim callchains to pass the operation type
  btrfs: add new ioctl CLEAR_FREE
  btrfs: add zeroout mode to CLEAR_FREE ioctl
  btrfs: add secure erase mode to CLEAR_FREE ioctl
  btrfs: add more zeroout modes to CLEAR_FREE ioctl
  btrfs: add mode to clear chunk map status to CLEAR_FREE ioctl

 fs/btrfs/discard.c          |   4 +-
 fs/btrfs/extent-tree.c      | 159 +++++++++++++++++++++++++++++++-----
 fs/btrfs/extent-tree.h      |   5 +-
 fs/btrfs/free-space-cache.c |  29 ++++---
 fs/btrfs/free-space-cache.h |   8 +-
 fs/btrfs/inode.c            |   2 +-
 fs/btrfs/ioctl.c            |  62 ++++++++++++++
 fs/btrfs/volumes.c          |   8 +-
 fs/btrfs/volumes.h          |   1 +
 include/uapi/linux/btrfs.h  |  53 ++++++++++++
 10 files changed, 291 insertions(+), 40 deletions(-)

--
2.47.1
* [PATCH v2 1/6] btrfs: extend trim callchains to pass the operation type
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

Preparatory work for supporting operations other than trim/discard on
the unused space from an ioctl. As FITRIM is not extensible, we'll need
a new one. Extend every function that takes part in the trim/discard
call chain to take one parameter defining the type of operation.

The operation multiplexer btrfs_issue_clear_op() will be extended in
followup patches.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/discard.c          |  4 +--
 fs/btrfs/extent-tree.c      | 51 +++++++++++++++++++++++--------------
 fs/btrfs/extent-tree.h      |  3 ++-
 fs/btrfs/free-space-cache.c | 29 +++++++++++----------
 fs/btrfs/free-space-cache.h |  8 +++---
 fs/btrfs/inode.c            |  2 +-
 fs/btrfs/volumes.c          |  3 ++-
 include/uapi/linux/btrfs.h  |  8 ++++++
 8 files changed, 68 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index d6eef4bd9e9d..4515548b107b 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -524,13 +524,13 @@ static void btrfs_discard_workfn(struct work_struct *work)
 		btrfs_trim_block_group_bitmaps(block_group, &trimmed,
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
-				       minlen, maxlen, true);
+				       minlen, maxlen, true, BTRFS_CLEAR_OP_DISCARD);
 		discard_ctl->discard_bitmap_bytes += trimmed;
 	} else {
 		btrfs_trim_block_group_extents(block_group, &trimmed,
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
-				       minlen, true);
+				       minlen, true, BTRFS_CLEAR_OP_DISCARD);
 		discard_ctl->discard_extent_bytes += trimmed;
 	}
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 957230abd827..dcc16ca91f11 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1247,8 +1247,20 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int btrfs_issue_clear_op(struct block_device *bdev, u64 start, u64 size,
+				enum btrfs_clear_op_type clear)
+{
+	switch (clear) {
+	case BTRFS_CLEAR_OP_DISCARD:
+		return blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
+					    size >> SECTOR_SHIFT, GFP_NOFS);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
-			       u64 *discarded_bytes)
+			       u64 *discarded_bytes, enum btrfs_clear_op_type clear)
 {
 	int j, ret = 0;
 	u64 bytes_left, end;
@@ -1293,11 +1305,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			bytes_left = end - start;
 			continue;
 		}
-
 		if (size) {
-			ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-						   size >> SECTOR_SHIFT,
-						   GFP_NOFS);
+			ret = btrfs_issue_clear_op(bdev, start, size, clear);
 			if (!ret)
 				*discarded_bytes += size;
 			else if (ret != -EOPNOTSUPP)
@@ -1315,9 +1324,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 	while (bytes_left) {
 		u64 bytes_to_discard = min(BTRFS_MAX_DISCARD_CHUNK_SIZE, bytes_left);
 
-		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					   bytes_to_discard >> SECTOR_SHIFT,
-					   GFP_NOFS);
+		ret = btrfs_issue_clear_op(bdev, start, bytes_to_discard, clear);
 
 		if (ret) {
 			if (ret != -EOPNOTSUPP)
@@ -1338,7 +1345,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 	return ret;
 }
 
-static int do_discard_extent(struct btrfs_discard_stripe *stripe, u64 *bytes)
+static int do_discard_extent(struct btrfs_discard_stripe *stripe, u64 *bytes,
+			     enum btrfs_clear_op_type clear)
 {
 	struct btrfs_device *dev = stripe->dev;
 	struct btrfs_fs_info *fs_info = dev->fs_info;
@@ -1367,7 +1375,7 @@ static int do_discard_extent(struct btrfs_discard_stripe *stripe, u64 *bytes)
 					       &discarded);
 		discarded += src_disc;
 	} else if (bdev_max_discard_sectors(stripe->dev->bdev)) {
-		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded);
+		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded, clear);
 	} else {
 		ret = 0;
 		*bytes = 0;
@@ -1379,7 +1387,8 @@ static int do_discard_extent(struct btrfs_discard_stripe *stripe, u64 *bytes)
 }
 
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
-			 u64 num_bytes, u64 *actual_bytes)
+			 u64 num_bytes, u64 *actual_bytes,
+			 enum btrfs_clear_op_type clear)
 {
 	int ret = 0;
 	u64 discarded_bytes = 0;
@@ -1418,7 +1427,7 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 					&stripe->dev->dev_state))
 				continue;
 
-			ret = do_discard_extent(stripe, &bytes);
+			ret = do_discard_extent(stripe, &bytes, clear);
 			if (ret) {
 				/*
 				 * Keep going if discard is not supported by the
@@ -2837,7 +2846,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 
 		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
 			ret = btrfs_discard_extent(fs_info, start,
-						   end + 1 - start, NULL);
+						   end + 1 - start, NULL,
+						   BTRFS_CLEAR_OP_DISCARD);
 
 		clear_extent_dirty(unpin, start, end, &cached_state);
 		ret = unpin_extent_range(fs_info, start, end, true);
@@ -2866,7 +2876,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 			ret = btrfs_discard_extent(fs_info,
 						   block_group->start,
 						   block_group->length,
-						   &trimmed);
+						   &trimmed,
+						   BTRFS_CLEAR_OP_DISCARD);
 
 		/*
 		 * Not strictly necessary to lock, as the block_group should be
@@ -6368,7 +6379,8 @@ void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u6
  * it while performing the free space search since we have already
  * held back allocations.
  */
-static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
+static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed,
+				   enum btrfs_clear_op_type clear)
 {
 	u64 start = BTRFS_DEVICE_RANGE_RESERVED, len = 0, end = 0;
 	int ret;
@@ -6433,8 +6445,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
 			break;
 		}
 
-		ret = btrfs_issue_discard(device->bdev, start, len,
-					  &bytes);
+		ret = btrfs_issue_discard(device->bdev, start, len, &bytes, clear);
 		if (!ret)
 			set_extent_bit(&device->alloc_state, start,
 				       start + bytes - 1, CHUNK_TRIMMED, NULL);
@@ -6516,7 +6527,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 						     &group_trimmed,
 						     start,
 						     end,
-						     range->minlen);
+						     range->minlen,
+						     BTRFS_CLEAR_OP_DISCARD);
 
 			trimmed += group_trimmed;
 			if (ret) {
@@ -6537,7 +6549,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 		if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
 			continue;
 
-		ret = btrfs_trim_free_extents(device, &group_trimmed);
+		ret = btrfs_trim_free_extents(device, &group_trimmed,
+					      BTRFS_CLEAR_OP_DISCARD);
 
 		trimmed += group_trimmed;
 		if (ret) {
diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
index 0ed682d9ed7b..c8e1a30309ab 100644
--- a/fs/btrfs/extent-tree.h
+++ b/fs/btrfs/extent-tree.h
@@ -163,7 +163,8 @@ int btrfs_drop_subtree(struct btrfs_trans_handle *trans,
 		       struct extent_buffer *parent);
 void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u64 end);
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
-			 u64 num_bytes, u64 *actual_bytes);
+			 u64 num_bytes, u64 *actual_bytes,
+			 enum btrfs_clear_op_type clear);
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
 
 #endif
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 05e173311c1a..05066cf485d0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3652,7 +3652,8 @@ static int do_trimming(struct btrfs_block_group *block_group,
 		       u64 *total_trimmed, u64 start, u64 bytes,
 		       u64 reserved_start, u64 reserved_bytes,
 		       enum btrfs_trim_state reserved_trim_state,
-		       struct btrfs_trim_range *trim_entry)
+		       struct btrfs_trim_range *trim_entry,
+		       enum btrfs_clear_op_type clear)
 {
 	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
@@ -3674,7 +3675,7 @@ static int do_trimming(struct btrfs_block_group *block_group,
 	spin_unlock(&block_group->lock);
 	spin_unlock(&space_info->lock);
 
-	ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed);
+	ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed, clear);
 	if (!ret) {
 		*total_trimmed += trimmed;
 		trim_state = BTRFS_TRIM_STATE_TRIMMED;
@@ -3711,7 +3712,7 @@ static int do_trimming(struct btrfs_block_group *block_group,
  */
 static int trim_no_bitmap(struct btrfs_block_group *block_group,
 			  u64 *total_trimmed, u64 start, u64 end, u64 minlen,
-			  bool async)
+			  bool async, enum btrfs_clear_op_type clear)
 {
 	struct btrfs_discard_ctl *discard_ctl =
 					&block_group->fs_info->discard_ctl;
@@ -3800,7 +3801,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 
 		ret = do_trimming(block_group, total_trimmed, start, bytes,
 				  extent_start, extent_bytes, extent_trim_state,
-				  &trim_entry);
+				  &trim_entry, clear);
 		if (ret) {
 			block_group->discard_cursor = start + bytes;
 			break;
@@ -3877,7 +3878,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl,
  */
 static int trim_bitmaps(struct btrfs_block_group *block_group,
 			u64 *total_trimmed, u64 start, u64 end, u64 minlen,
-			u64 maxlen, bool async)
+			u64 maxlen, bool async, enum btrfs_clear_op_type clear)
 {
 	struct btrfs_discard_ctl *discard_ctl =
 					&block_group->fs_info->discard_ctl;
@@ -3986,7 +3987,7 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 		mutex_unlock(&ctl->cache_writeout_mutex);
 
 		ret = do_trimming(block_group, total_trimmed, start, bytes,
-				  start, bytes, 0, &trim_entry);
+				  start, bytes, 0, &trim_entry, clear);
 		if (ret) {
 			reset_trimming_bitmap(ctl, offset);
 			block_group->discard_cursor =
@@ -4020,7 +4021,8 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 }
 
 int btrfs_trim_block_group(struct btrfs_block_group *block_group,
-			   u64 *trimmed, u64 start, u64 end, u64 minlen)
+			   u64 *trimmed, u64 start, u64 end, u64 minlen,
+			   enum btrfs_clear_op_type clear)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	int ret;
@@ -4038,11 +4040,11 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
 	btrfs_freeze_block_group(block_group);
 	spin_unlock(&block_group->lock);
 
-	ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, false);
+	ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, false, clear);
 	if (ret)
 		goto out;
 
-	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false);
+	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false, clear);
 	div64_u64_rem(end, BITS_PER_BITMAP * ctl->unit, &rem);
 	/* If we ended in the middle of a bitmap, reset the trimming flag */
 	if (rem)
@@ -4054,7 +4056,7 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
 
 int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async)
+				   bool async, enum btrfs_clear_op_type clear)
 {
 	int ret;
 
@@ -4068,7 +4070,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 	btrfs_freeze_block_group(block_group);
 	spin_unlock(&block_group->lock);
 
-	ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, async);
+	ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, async, clear);
 	btrfs_unfreeze_block_group(block_group);
 
 	return ret;
@@ -4076,7 +4078,8 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   u64 maxlen, bool async)
+				   u64 maxlen, bool async,
+				   enum btrfs_clear_op_type clear)
 {
 	int ret;
 
@@ -4091,7 +4094,7 @@ int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 	spin_unlock(&block_group->lock);
 
 	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, maxlen,
-			   async);
+			   async, clear);
 
 	btrfs_unfreeze_block_group(block_group);
 
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 9f1dbfdee8ca..c4c2e5571355 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -159,13 +159,15 @@ void btrfs_return_cluster_to_free_space(
 			       struct btrfs_block_group *block_group,
 			       struct btrfs_free_cluster *cluster);
 int btrfs_trim_block_group(struct btrfs_block_group *block_group,
-			   u64 *trimmed, u64 start, u64 end, u64 minlen);
+			   u64 *trimmed, u64 start, u64 end, u64 minlen,
+			   enum btrfs_clear_op_type clear);
 int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async);
+				   bool async, enum btrfs_clear_op_type clear);
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   u64 maxlen, bool async);
+				   u64 maxlen, bool async,
+				   enum btrfs_clear_op_type clear);
 
 bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info);
 int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info, bool active);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b4efd0c00f21..4c368e1516b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3307,7 +3307,7 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
 				btrfs_discard_extent(fs_info,
 						ordered_extent->disk_bytenr,
 						ordered_extent->disk_num_bytes,
-						NULL);
+						NULL, BTRFS_CLEAR_OP_DISCARD);
 			btrfs_free_reserved_extent(fs_info,
 					ordered_extent->disk_bytenr,
 					ordered_extent->disk_num_bytes, 1);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e6761ccd8187..f1b1d7446b20 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3534,7 +3534,8 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	 * filesystem's point of view.
 	 */
 	if (btrfs_is_zoned(fs_info)) {
-		ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
+		ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL,
+					   BTRFS_CLEAR_OP_DISCARD);
 		if (ret)
 			btrfs_info(fs_info,
 				"failed to reset zone %llu after relocation",
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index dd02160015b2..aab7fac56d32 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1096,6 +1096,14 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
 };
 
+/*
+ * Type of operation that will be used to clear unused blocks.
+ */
+enum btrfs_clear_op_type {
+	BTRFS_CLEAR_OP_DISCARD,
+	BTRFS_NR_CLEAR_OP_TYPES,
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
--
2.47.1
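
The multiplexer pattern can be modelled outside the kernel. The following is a userspace sketch only: a range is split into pieces bounded by a maximum request size and each piece is dispatched by operation type, with unknown types rejected by -EOPNOTSUPP. MAX_CHUNK stands in for BTRFS_MAX_DISCARD_CHUNK_SIZE (the 1 GiB value is an assumption), and struct stats, issue_clear_op() and clear_range() are hypothetical names that only count what would be sent to the device.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed stand-in for BTRFS_MAX_DISCARD_CHUNK_SIZE. */
#define MAX_CHUNK (1ULL << 30)

enum clear_op { OP_DISCARD, NR_OPS };

/* Records the requests instead of touching a block device. */
struct stats {
	unsigned int calls;
	uint64_t bytes;
	uint64_t largest;
};

/* Dispatch one request by operation type, unknown types are rejected. */
static int issue_clear_op(struct stats *st, uint64_t start, uint64_t size,
			  enum clear_op op)
{
	(void)start;
	switch (op) {
	case OP_DISCARD:
		st->calls++;
		st->bytes += size;
		if (size > st->largest)
			st->largest = size;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/* Split [start, start + len) so that no request exceeds MAX_CHUNK. */
static int clear_range(struct stats *st, uint64_t start, uint64_t len,
		       enum clear_op op)
{
	uint64_t left = len;

	while (left) {
		uint64_t chunk = left < MAX_CHUNK ? left : MAX_CHUNK;
		int ret = issue_clear_op(st, start, chunk, op);

		if (ret)
			return ret;
		start += chunk;
		left -= chunk;
	}
	return 0;
}
```

The point of the model is that the size handed to the multiplexer is the per-iteration piece, not the remaining length, so the accounting and the request size stay in step.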
* [PATCH v2 2/6] btrfs: add new ioctl CLEAR_FREE
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

Add a new ioctl that is an extensible version of FITRIM. It currently
does only the trim/discard and will be extended by other modes like
zeroing or block unmapping.

We need a new ioctl for that because struct fstrim_range does not
provide any existing or reserved member for extensions. The new ioctl
also supports TRIM as the operation type.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-tree.c     | 92 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent-tree.h     |  2 +
 fs/btrfs/ioctl.c           | 49 ++++++++++++++++++++
 include/uapi/linux/btrfs.h | 20 +++++++++
 4 files changed, 163 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dcc16ca91f11..942584b9018a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6570,3 +6570,95 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 		return bg_ret;
 	return dev_ret;
 }
+
+int btrfs_clear_free_space(struct btrfs_fs_info *fs_info,
+			   struct btrfs_ioctl_clear_free_args *args)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	struct btrfs_block_group *cache = NULL;
+	u64 group_cleared;
+	u64 range_end = U64_MAX;
+	u64 start;
+	u64 end;
+	u64 cleared = 0;
+	u64 bg_failed = 0;
+	u64 dev_failed = 0;
+	int bg_ret = 0;
+	int dev_ret = 0;
+	int ret = 0;
+
+	if (args->start == U64_MAX)
+		return -EINVAL;
+
+	/*
+	 * Check range overflow if args->length is set.  The default args->length
+	 * is U64_MAX.
+	 */
+	if (args->length != U64_MAX &&
+	    check_add_overflow(args->start, args->length, &range_end))
+		return -EINVAL;
+
+	cache = btrfs_lookup_first_block_group(fs_info, args->start);
+	for (; cache; cache = btrfs_next_block_group(cache)) {
+		if (cache->start >= range_end) {
+			btrfs_put_block_group(cache);
+			break;
+		}
+
+		start = max(args->start, cache->start);
+		end = min(range_end, cache->start + cache->length);
+
+		if (end - start >= args->minlen) {
+			if (!btrfs_block_group_done(cache)) {
+				ret = btrfs_cache_block_group(cache, true);
+				if (ret) {
+					bg_failed++;
+					bg_ret = ret;
+					continue;
+				}
+			}
+			ret = btrfs_trim_block_group(cache, &group_cleared,
+						     start, end, args->minlen,
+						     args->type);
+
+			cleared += group_cleared;
+			if (ret) {
+				bg_failed++;
+				bg_ret = ret;
+				continue;
+			}
+		}
+	}
+
+	if (bg_failed)
+		btrfs_warn(fs_info,
+			"failed to clear %llu block group(s), last error %d",
+			bg_failed, bg_ret);
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
+			continue;
+
+		ret = btrfs_trim_free_extents(device, &group_cleared, args->type);
+		if (ret) {
+			dev_failed++;
+			dev_ret = ret;
+			break;
+		}
+
+		cleared += group_cleared;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	if (dev_failed)
+		btrfs_warn(fs_info,
+			"failed to trim %llu device(s), last error %d",
+			dev_failed, dev_ret);
+	args->length = cleared;
+	if (bg_ret)
+		return bg_ret;
+
+	return dev_ret;
+}
diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
index c8e1a30309ab..e0702b276825 100644
--- a/fs/btrfs/extent-tree.h
+++ b/fs/btrfs/extent-tree.h
@@ -166,5 +166,7 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 			 u64 num_bytes, u64 *actual_bytes,
 			 enum btrfs_clear_op_type clear);
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
+int btrfs_clear_free_space(struct btrfs_fs_info *fs_info,
+			   struct btrfs_ioctl_clear_free_args *args);
 
 #endif
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a13d81bb56a0..e84db3929763 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5211,6 +5211,53 @@ static int btrfs_ioctl_subvol_sync(struct btrfs_fs_info *fs_info, void __user *a
 	return 0;
 }
 
+static int btrfs_ioctl_clear_free(struct file *file, void __user *arg)
+{
+	struct btrfs_fs_info *fs_info = inode_to_fs_info(file_inode(file));
+	struct btrfs_ioctl_clear_free_args args;
+	u64 total_bytes;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/*
+	 * This can be relaxed to support conventional zones or zones that can
+	 * be reset. Otherwise the assumptions of write pointer are not
+	 * compatible with zeroout or trim.
+	 */
+	if (btrfs_is_zoned(fs_info))
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	if (args.type >= BTRFS_NR_CLEAR_OP_TYPES)
+		return -EOPNOTSUPP;
+
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+
+	total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+	if (args.start > total_bytes) {
+		ret = -EINVAL;
+		goto out_drop_write;
+	}
+
+	ret = btrfs_clear_free_space(fs_info, &args);
+	if (ret < 0)
+		goto out_drop_write;
+
+	if (copy_to_user(arg, &args, sizeof(args)))
+		ret = -EFAULT;
+
+out_drop_write:
+	mnt_drop_write_file(file);
+
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -5366,6 +5413,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 #endif
 	case BTRFS_IOC_SUBVOL_SYNC_WAIT:
 		return btrfs_ioctl_subvol_sync(fs_info, argp);
+	case BTRFS_IOC_CLEAR_FREE:
+		return btrfs_ioctl_clear_free(file, argp);
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index aab7fac56d32..cfa1136815f1 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1104,6 +1104,24 @@ enum btrfs_clear_op_type {
 	BTRFS_NR_CLEAR_OP_TYPES,
 };
 
+struct btrfs_ioctl_clear_free_args {
+	/* In, type of clearing operation, enumerated in btrfs_clear_op_type. */
+	__u32 type;
+	/* Reserved, must be zero. */
+	__u32 reserved1;
+	/*
+	 * In. Starting offset to clear from in the logical address space (same
+	 * as fstrim_range::start).
+	 */
+	__u64 start;	/* in */
+	/* In, out. Length from the start to clear (same as fstrim_range::length). */
+	__u64 length;
+	/* In. Minimal length to clear (same as fstrim_range::minlen). */
+	__u64 minlen;
+	/* Reserved, must be zero. */
+	__u64 reserved2[4];
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -1224,6 +1242,8 @@ enum btrfs_clear_op_type {
 				    struct btrfs_ioctl_encoded_io_args)
 #define BTRFS_IOC_SUBVOL_SYNC_WAIT _IOW(BTRFS_IOCTL_MAGIC, 65, \
 					struct btrfs_ioctl_subvol_wait)
+#define BTRFS_IOC_CLEAR_FREE _IOWR(BTRFS_IOCTL_MAGIC, 66, \
+				struct btrfs_ioctl_clear_free_args)
 
 #ifdef __cplusplus
 }
--
2.47.1
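
The range handling at the top of btrfs_clear_free_space() can be tried in isolation. This is a userspace sketch under assumptions: __builtin_add_overflow() (GCC/Clang) stands in for the kernel's check_add_overflow(), and clear_range_end() and clamp_to_group() are hypothetical names that mirror the argument check and the per block group clamping respectively.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the exclusive end of the requested range. A length of
 * UINT64_MAX means "to the end of the filesystem" and skips the
 * overflow check, any other length must not wrap around.
 */
static int clear_range_end(uint64_t start, uint64_t length, uint64_t *range_end)
{
	*range_end = UINT64_MAX;

	if (start == UINT64_MAX)
		return -EINVAL;
	if (length != UINT64_MAX &&
	    __builtin_add_overflow(start, length, range_end))
		return -EINVAL;
	return 0;
}

/*
 * Clamp the request to one block group [bg_start, bg_start + bg_len).
 * Returns the number of bytes of the group that fall into the request.
 */
static uint64_t clamp_to_group(uint64_t req_start, uint64_t range_end,
			       uint64_t bg_start, uint64_t bg_len)
{
	uint64_t start = req_start > bg_start ? req_start : bg_start;
	uint64_t end = range_end < bg_start + bg_len ? range_end : bg_start + bg_len;

	return end > start ? end - start : 0;
}
```

Only block groups whose clamped length is at least minlen are passed down to the trim code in the patch.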
* [PATCH v2 3/6] btrfs: add zeroout mode to CLEAR_FREE ioctl
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

Add a new type of clearing that will write zeros to the unused space
(similar to what trim/discard would do). The mode is implemented by
blkdev_issue_zeroout(), which writes zeros to the blocks explicitly
unless the hardware implements the UNMAP command, in which case the
blocks are unmapped and effectively appear as zeroed. This is handled
transparently.

As a special case, a thin provisioning device usually handles UNMAP and
can free the underlying space.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-tree.c     | 6 ++++++
 include/uapi/linux/btrfs.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 942584b9018a..35bef44f069d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1250,10 +1250,16 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 static int btrfs_issue_clear_op(struct block_device *bdev, u64 start, u64 size,
 				enum btrfs_clear_op_type clear)
 {
+	unsigned int flags = BLKDEV_ZERO_KILLABLE;
+
 	switch (clear) {
 	case BTRFS_CLEAR_OP_DISCARD:
 		return blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					    size >> SECTOR_SHIFT, GFP_NOFS);
+	case BTRFS_CLEAR_OP_ZERO:
+		return blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+					    size >> SECTOR_SHIFT, GFP_NOFS,
+					    flags);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index cfa1136815f1..7529bc0c6efa 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1101,6 +1101,12 @@ enum btrfs_err_code {
  */
 enum btrfs_clear_op_type {
 	BTRFS_CLEAR_OP_DISCARD,
+	/*
+	 * Write zeros to the range, either overwrite or with hardware offload
+	 * that can unmap the blocks internally.
+	 * (Same as blkdev_issue_zeroout() with 0 flags).
+	 */
+	BTRFS_CLEAR_OP_ZERO,
 	BTRFS_NR_CLEAR_OP_TYPES,
 };
 
--
2.47.1
* [PATCH v2 4/6] btrfs: add secure erase mode to CLEAR_FREE ioctl
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

Add another type of clearing that will do secure erase on the unused
space. This requires hardware support and works as a regular discard
while also deleting any copied or cached blocks. Same as "blkdiscard
--secure".

The unused space ranges may not be aligned to the secure erase block or
be of a sufficient length, so the exact result depends on the device.
Some blocks may still contain valid data even after this ioctl.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-tree.c     | 3 +++
 include/uapi/linux/btrfs.h | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 35bef44f069d..1e2fe403ee89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1253,6 +1253,9 @@ static int btrfs_issue_clear_op(struct block_device *bdev, u64 start, u64 size,
 	unsigned int flags = BLKDEV_ZERO_KILLABLE;
 
 	switch (clear) {
+	case BTRFS_CLEAR_OP_SECURE_ERASE:
+		return blkdev_issue_secure_erase(bdev, start >> SECTOR_SHIFT,
+						 size >> SECTOR_SHIFT, GFP_NOFS);
 	case BTRFS_CLEAR_OP_DISCARD:
 		return blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					    size >> SECTOR_SHIFT, GFP_NOFS);
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 7529bc0c6efa..229a07843965 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1107,6 +1107,13 @@ enum btrfs_clear_op_type {
 	 * (Same as blkdev_issue_zeroout() with 0 flags).
 	 */
 	BTRFS_CLEAR_OP_ZERO,
+	/*
+	 * Do a secure erase operation on the range. If supported by the
+	 * underlying hardware, this works as regular discard except that all
+	 * copies of the discarded blocks that were possibly created by
+	 * garbage collection must also be erased.
+	 */
+	BTRFS_CLEAR_OP_SECURE_ERASE,
 	BTRFS_NR_CLEAR_OP_TYPES,
 };
 
--
2.47.1
* [PATCH v2 5/6] btrfs: add more zeroout modes to CLEAR_FREE ioctl
From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

The zeroing mode BTRFS_CLEAR_OP_ZERO is safe for use regardless of the
underlying device capabilities: either zeros are written or the device
will unmap the blocks. This is a safe behaviour.

In case it's desired to do one or the other, add modes that can enforce
that or fail when unsupported:

- CLEAR_OP_ZERO_NOUNMAP - overwrite by zero blocks, forbid unmapping
  blocks by the device

- CLEAR_OP_ZERO_NOFALLBACK - unmap the blocks by device and do not fall
  back to overwriting by zeros

Implemented by __blkdev_issue_zeroout() and also documented there.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-tree.c     | 11 +++++++++--
 include/uapi/linux/btrfs.h |  5 +++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e2fe403ee89..f287184ae663 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1259,10 +1259,17 @@ static int btrfs_issue_clear_op(struct block_device *bdev, u64 start, u64 size,
 	case BTRFS_CLEAR_OP_DISCARD:
 		return blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
 					    size >> SECTOR_SHIFT, GFP_NOFS);
+	case BTRFS_CLEAR_OP_ZERO_NOUNMAP:
+		flags |= BLKDEV_ZERO_NOUNMAP;
+		return blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+					    size >> SECTOR_SHIFT, GFP_NOFS, flags);
+	case BTRFS_CLEAR_OP_ZERO_NOFALLBACK:
+		flags |= BLKDEV_ZERO_NOFALLBACK;
+		return blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+					    size >> SECTOR_SHIFT, GFP_NOFS, flags);
 	case BTRFS_CLEAR_OP_ZERO:
 		return blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
-					    size >> SECTOR_SHIFT, GFP_NOFS,
-					    flags);
+					    size >> SECTOR_SHIFT, GFP_NOFS, flags);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 229a07843965..e2f16733c53f 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1114,6 +1114,11 @@ enum btrfs_clear_op_type {
 	 * garbage collection must also be erased.
 	 */
 	BTRFS_CLEAR_OP_SECURE_ERASE,
+
+	/* Overwrite by zeros, do not try to unmap blocks. */
+	BTRFS_CLEAR_OP_ZERO_NOUNMAP,
+	/* Request unmapping the blocks and don't fall back to writing zeros. */
+	BTRFS_CLEAR_OP_ZERO_NOFALLBACK,
 	BTRFS_NR_CLEAR_OP_TYPES,
 };
 
--
2.47.1
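
The relation between the three zeroing modes and the block layer flags can be summarized as a small table function. This is a userspace sketch only: the ZERO_* constants mirror the kernel's BLKDEV_ZERO_NOUNMAP, BLKDEV_ZERO_NOFALLBACK and BLKDEV_ZERO_KILLABLE bits as I understand them from include/linux/blkdev.h (treat the numeric values as an assumption), the enum is a local copy of the order in this series, and zeroout_flags() is a hypothetical name.

```c
#include <errno.h>

/* Assumed to mirror the BLKDEV_ZERO_* bits of the kernel. */
#define ZERO_NOUNMAP	(1u << 0)	/* always write zeros, never unmap */
#define ZERO_NOFALLBACK	(1u << 1)	/* offload only, never write zeros */
#define ZERO_KILLABLE	(1u << 2)	/* a fatal signal may interrupt */

/* Local copy of the operation types as ordered after patch 5. */
enum clear_op {
	OP_DISCARD,
	OP_ZERO,
	OP_SECURE_ERASE,
	OP_ZERO_NOUNMAP,
	OP_ZERO_NOFALLBACK,
	NR_OPS,
};

/*
 * Return the zeroout flags for a zeroing mode in *flags, or -EINVAL if
 * the operation is not a zeroing mode at all.
 */
static int zeroout_flags(enum clear_op op, unsigned int *flags)
{
	unsigned int f = ZERO_KILLABLE;

	switch (op) {
	case OP_ZERO:
		break;
	case OP_ZERO_NOUNMAP:
		f |= ZERO_NOUNMAP;
		break;
	case OP_ZERO_NOFALLBACK:
		f |= ZERO_NOFALLBACK;
		break;
	default:
		return -EINVAL;
	}
	*flags = f;
	return 0;
}
```

All three modes keep the killable bit from patch 3 and differ only in which fallback direction is forbidden.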
* [PATCH v2 6/6] btrfs: add mode to clear chunk map status to CLEAR_FREE ioctl 2025-03-12 11:12 [PATCH v2 0/6] Ioctl to clear unused space in various ways David Sterba ` (4 preceding siblings ...) 2025-03-12 11:12 ` [PATCH v2 5/6] btrfs: add more zeroout modes " David Sterba @ 2025-03-12 11:12 ` David Sterba 2025-03-17 20:11 ` [PATCH v2 0/6] Ioctl to clear unused space in various ways David Sterba 6 siblings, 0 replies; 8+ messages in thread From: David Sterba @ 2025-03-12 11:12 UTC (permalink / raw) To: linux-btrfs; +Cc: David Sterba The trim status is tracked for each chunk in the fs_info::mapping_tree and updated as trim is called either manually by 'fstrim' or automatically when discard=async is enabled. With the new modes it's necessary to allow clearing the cache otherwise on a fully or partially trimmed filesystem the ioctl won't work as expected. Add separate clear free operation to reset just the trim status bits from all chunks. This should be called namely when the clearing operation is *not* trim (e.g. zeroout or secure erase). 
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c           | 13 +++++++++++++
 fs/btrfs/volumes.c         |  5 +++++
 fs/btrfs/volumes.h         |  1 +
 include/uapi/linux/btrfs.h |  7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e84db3929763..f965f7fc1fa8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5235,6 +5235,19 @@ static int btrfs_ioctl_clear_free(struct file *file, void __user *arg)
 	if (args.type >= BTRFS_NR_CLEAR_OP_TYPES)
 		return -EOPNOTSUPP;
 
+	if (args.type == BTRFS_CLEAR_OP_RESET_CHUNK_STATUS_CACHE) {
+		write_lock(&fs_info->mapping_tree_lock);
+		for (struct rb_node *node = rb_first_cached(&fs_info->mapping_tree);
+		     node; node = rb_next(node)) {
+			struct btrfs_chunk_map *map;
+
+			map = rb_entry(node, struct btrfs_chunk_map, rb_node);
+			btrfs_chunk_map_clear_bits(map, CHUNK_TRIMMED);
+		}
+		write_unlock(&fs_info->mapping_tree_lock);
+		return 0;
+	}
+
 	ret = mnt_want_write_file(file);
 	if (ret)
 		return ret;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1b1d7446b20..786b93c18a22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -8079,6 +8079,11 @@ static int verify_chunk_dev_extent_mapping(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+void btrfs_chunk_map_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
+{
+	chunk_map_device_clear_bits(map, bits);
+}
+
 /*
  * Ensure that all dev extents are mapped to correct chunk, otherwise
  * later chunk allocation/free would cause unexpected behavior.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e247d551da67..0e793b9776d6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -785,6 +785,7 @@ struct btrfs_chunk_map *btrfs_find_chunk_map_nolock(struct btrfs_fs_info *fs_inf
 							    u64 logical, u64 length);
 struct btrfs_chunk_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
+void btrfs_chunk_map_clear_bits(struct btrfs_chunk_map *map, unsigned int bits);
 void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *map);
 void btrfs_release_disk_super(struct btrfs_super_block *super);
 
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index e2f16733c53f..605108ab21f3 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -1119,6 +1119,13 @@ enum btrfs_clear_op_type {
 	BTRFS_CLEAR_OP_ZERO_NOUNMAP,
 	/* Request unmapping the blocks and don't fall back to writing zeros. */
 	BTRFS_CLEAR_OP_ZERO_NOFALLBACK,
+
+	/*
+	 * Only reset status of previously cleared (by any operation) chunks,
+	 * tracked in memory since the last mount. Without that repeated calls
+	 * to clear will skip already processed chunks.
+	 */
+	BTRFS_CLEAR_OP_RESET_CHUNK_STATUS_CACHE,
 	BTRFS_NR_CLEAR_OP_TYPES,
 };
 
-- 
2.47.1
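The effect of the reset operation from patch 6/6 can be illustrated with a small user-space model. It is a sketch under assumptions, not the kernel code: struct model_chunk and MODEL_CHUNK_TRIMMED are hypothetical stand-ins for struct btrfs_chunk_map and CHUNK_TRIMMED, a plain array replaces the rb-tree, and locking is left out. The model assumes that a clearing pass skips any chunk whose status bit is already set, as the UAPI comment describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status bit, standing in for CHUNK_TRIMMED. */
#define MODEL_CHUNK_TRIMMED	(1U << 0)

/* Hypothetical, much simplified stand-in for struct btrfs_chunk_map. */
struct model_chunk {
	unsigned int bits;
};

/*
 * Model of one clearing pass: process every chunk that is not marked yet,
 * mark it, and return how many chunks were actually processed.
 */
static size_t model_clear_pass(struct model_chunk *chunks, size_t nr)
{
	size_t processed = 0;

	assert(chunks != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		/* Already processed since the last mount, skip it. */
		if (chunks[i].bits & MODEL_CHUNK_TRIMMED)
			continue;
		chunks[i].bits |= MODEL_CHUNK_TRIMMED;
		processed++;
	}
	return processed;
}

/* Model of the reset operation: drop the status bit from all chunks. */
static void model_reset_status(struct model_chunk *chunks, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		chunks[i].bits &= ~MODEL_CHUNK_TRIMMED;
}
```

In this model a second pass over the same chunks processes nothing until model_reset_status() is called, which corresponds to running a trim and then wanting to zero the same free space.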
* Re: [PATCH v2 0/6] Ioctl to clear unused space in various ways
@ 2025-03-17 20:11 David Sterba
From: David Sterba @ 2025-03-17 20:11 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs

On Wed, Mar 12, 2025 at 12:12:10PM +0100, David Sterba wrote:
> Add ioctl that is similar to FITRIM and in addition to trim can do also
> zeroing (either plain overwrite, or unmap the blocks if the device
> supports it) and secure erase.
>
> This can be used to zero the unused space in e.g. VM images (when run
> from inside the guest, if fstrim is not supported) or free space on
> thin-provisioned devices.
>
> The secure erase is provided by blkdiscard command but I'm not aware of
> equivalent that can be run on a filesystem, so this is for parity.

I've found more things to add to the API to support more use cases, so
this will be moved to another major release. Examples are physical
ranges rather than logical ones (a current FITRIM deficiency wrt btrfs)
and per-device clearing. The cache dropping could also be limited to a
device or a range.