linux-btrfs.vger.kernel.org archive mirror
* [PATCH v4 0/3] btrfs: zoned: automatic BG reclaim
@ 2021-04-15 13:58 Johannes Thumshirn
  2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-15 13:58 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

When a file gets deleted on a zoned file system, the space freed is not
returned to the block group's free space, but is migrated to
zone_unusable.

As this zone_unusable space is behind the current write pointer it is not
possible to use it for new allocations. In the current implementation a
zone is reset once all of the block group's space is accounted as zone
unusable.

This behaviour can lead to premature ENOSPC errors on a busy file system.

Instead of only reclaiming a zone once it is completely unusable, kick
off a reclaim job once the amount of unusable bytes exceeds a
user-configurable threshold between 51% and 100%. The threshold can be
set per mounted filesystem via the sysfs tunable bg_reclaim_threshold,
which defaults to 75%.

Similar to reclaiming unused block groups, these dirty block groups are
added to a to_reclaim list. On transaction commit the reclaim process is
then triggered, but only after unused block groups have been deleted,
which frees up space for the relocation process.
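
The reclaim decision described above can be sketched as follows. This is
a simplified illustration, not the in-kernel code: div_factor_fine() is
a local stand-in mirroring btrfs' helper of the same name, and the
function signature is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for btrfs' div_factor_fine(): num * factor / 100. */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	return num * factor / 100;
}

/*
 * Simplified sketch of the reclaim decision: a block group whose
 * zone_unusable bytes meet the threshold is queued for reclaim, while
 * a completely unusable one goes through the existing unused-bg path.
 */
static int should_mark_for_reclaim(uint64_t length, uint64_t zone_unusable,
				   int bg_reclaim_threshold)
{
	if (zone_unusable == length)
		return 0;	/* handled by btrfs_mark_bg_unused() instead */
	return zone_unusable >= div_factor_fine(length, bg_reclaim_threshold);
}
```

With the default 75% threshold, a block group whose zone_unusable bytes
reach three quarters of its length is queued, but a fully unusable one
is left to the unused-bg deletion path that runs first.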

Zones that are 100% full and zone unusable already get reclaimed automatically
on transaction commit. Another improvement on the garbage collection side of
zoned btrfs would be to also reclaim block groups that have used, pinned and
reserved = 0 but zone_unusable > 0. This is not yet included as it needs
further research and testing.

Changes to v3:
- Special case "discarding" after relocation (Filipe)

Changes to v2:
- Fix locking in multiple ways (Filipe)
- Offload reclaim into workqueue (Josef)
- Add patch discarding/zone-resetting after successful relocation (Anand)

Changes to v1:
- Document sysfs parameter (David)
- Add info print for reclaim (Josef)
- Rename delete_unused_bgs_mutex to reclaim_bgs_lock (Filipe)
- Remove list_is_singular check (Filipe)
- Document use of space_info->groups_sem (Filipe)

Johannes Thumshirn (3):
  btrfs: zoned: reset zones of relocated block groups
  btrfs: rename delete_unused_bgs_mutex
  btrfs: zoned: automatically reclaim zones

 fs/btrfs/block-group.c       | 102 +++++++++++++++++++++++++++++++++--
 fs/btrfs/block-group.h       |   3 ++
 fs/btrfs/ctree.h             |   8 ++-
 fs/btrfs/disk-io.c           |  19 +++++--
 fs/btrfs/free-space-cache.c  |   9 +++-
 fs/btrfs/sysfs.c             |  35 ++++++++++++
 fs/btrfs/volumes.c           |  69 ++++++++++++++----------
 fs/btrfs/volumes.h           |   1 +
 include/trace/events/btrfs.h |  12 +++++
 9 files changed, 222 insertions(+), 36 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-15 13:58 [PATCH v4 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
@ 2021-04-15 13:58 ` Johannes Thumshirn
  2021-04-15 18:26   ` Josef Bacik
  2021-04-16  9:11   ` Filipe Manana
  2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
  2 siblings, 2 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-15 13:58 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

When relocating a block group, the freed-up space is not discarded in
one big chunk; instead, each extent is discarded on its own with
-o discard=sync.

For a zoned filesystem we need to discard the whole block group at once,
so btrfs_discard_extent() will translate the discard into a
REQ_OP_ZONE_RESET operation, which then resets the device's zone.
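
A minimal sketch of why a whole-block-group discard can be translated
into a zone reset: the range must start and end on zone boundaries. The
helper below and the zone sizes used are illustrative assumptions for
the example, not btrfs code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative check: a discard range can be turned into zone resets
 * only if it starts on a zone boundary and spans whole zones.
 * zone_size is the device zone size in bytes (assumed here).
 */
static int range_covers_whole_zones(uint64_t start, uint64_t len,
				    uint64_t zone_size)
{
	return len > 0 && (start % zone_size == 0) && (len % zone_size == 0);
}
```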

Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d9b2369f17a..b1bab75ec12a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3103,6 +3103,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	struct btrfs_root *root = fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_block_group *block_group;
+	u64 length;
 	int ret;
 
 	/*
@@ -3130,8 +3131,24 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	if (!block_group)
 		return -ENOENT;
 	btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
+	length = block_group->length;
 	btrfs_put_block_group(block_group);
 
+	/*
+	 * Step two, delete the device extents and the chunk tree entries.
+	 *
+	 * On a zoned file system, discard the whole block group, this will
+	 * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
+	 * resetting the zone fails, don't treat it as a fatal problem from the
+	 * filesystem's point of view.
+	 */
+	if (btrfs_is_zoned(fs_info)) {
+		ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
+		if (ret)
+			btrfs_info(fs_info, "failed to reset zone %llu",
+				   chunk_offset);
+	}
+
 	trans = btrfs_start_trans_remove_block_group(root->fs_info,
 						     chunk_offset);
 	if (IS_ERR(trans)) {
@@ -3140,10 +3157,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 		return ret;
 	}
 
-	/*
-	 * step two, delete the device extents and the
-	 * chunk tree entries
-	 */
 	ret = btrfs_remove_chunk(trans, chunk_offset);
 	btrfs_end_transaction(trans);
 	return ret;
-- 
2.30.0



* [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex
  2021-04-15 13:58 [PATCH v4 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
  2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
@ 2021-04-15 13:58 ` Johannes Thumshirn
  2021-04-15 18:26   ` Josef Bacik
                     ` (2 more replies)
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
  2 siblings, 3 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-15 13:58 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

As a preparation for another user, rename the unused_bgs_mutex into
reclaim_bgs_lock.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-group.c |  6 +++---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/disk-io.c     |  6 +++---
 fs/btrfs/volumes.c     | 46 +++++++++++++++++++++---------------------
 4 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 293f3169be80..bbb5a6e170c7 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1289,7 +1289,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 	 * Long running balances can keep us blocked here for eternity, so
 	 * simply skip deletion if we're unable to get the mutex.
 	 */
-	if (!mutex_trylock(&fs_info->delete_unused_bgs_mutex))
+	if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
 		return;
 
 	spin_lock(&fs_info->unused_bgs_lock);
@@ -1462,12 +1462,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);
-	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	return;
 
 flip_async:
 	btrfs_end_transaction(trans);
-	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	btrfs_put_block_group(block_group);
 	btrfs_discard_punt_unused_bgs_list(fs_info);
 }
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c858d5349c8..c80302564e6b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -957,7 +957,7 @@ struct btrfs_fs_info {
 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
 	struct mutex unused_bg_unpin_mutex;
-	struct mutex delete_unused_bgs_mutex;
+	struct mutex reclaim_bgs_lock;
 
 	/* Cached block sizes */
 	u32 nodesize;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0a1182694f48..e52b89ad0a61 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1890,10 +1890,10 @@ static int cleaner_kthread(void *arg)
 		btrfs_run_defrag_inodes(fs_info);
 
 		/*
-		 * Acquires fs_info->delete_unused_bgs_mutex to avoid racing
+		 * Acquires fs_info->reclaim_bgs_lock to avoid racing
 		 * with relocation (btrfs_relocate_chunk) and relocation
 		 * acquires fs_info->cleaner_mutex (btrfs_relocate_block_group)
-		 * after acquiring fs_info->delete_unused_bgs_mutex. So we
+		 * after acquiring fs_info->reclaim_bgs_lock. So we
 		 * can't hold, nor need to, fs_info->cleaner_mutex when deleting
 		 * unused block groups.
 		 */
@@ -2876,7 +2876,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	spin_lock_init(&fs_info->treelog_bg_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
-	mutex_init(&fs_info->delete_unused_bgs_mutex);
+	mutex_init(&fs_info->reclaim_bgs_lock);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	mutex_init(&fs_info->zoned_meta_io_lock);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b1bab75ec12a..a2a7f5ab0a3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3118,7 +3118,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	 * we release the path used to search the chunk/dev tree and before
 	 * the current task acquires this mutex and calls us.
 	 */
-	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
+	lockdep_assert_held(&fs_info->reclaim_bgs_lock);
 
 	/* step one, relocate all the extents inside this chunk */
 	btrfs_scrub_pause(fs_info);
@@ -3185,10 +3185,10 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
 	while (1) {
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		mutex_lock(&fs_info->reclaim_bgs_lock);
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto error;
 		}
 		BUG_ON(ret == 0); /* Corruption */
@@ -3196,7 +3196,7 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
 		ret = btrfs_previous_item(chunk_root, path, key.objectid,
 					  key.type);
 		if (ret)
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 		if (ret < 0)
 			goto error;
 		if (ret > 0)
@@ -3217,7 +3217,7 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
 			else
 				BUG_ON(ret);
 		}
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
 
 		if (found_key.offset == 0)
 			break;
@@ -3757,10 +3757,10 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 			goto error;
 		}
 
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		mutex_lock(&fs_info->reclaim_bgs_lock);
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto error;
 		}
 
@@ -3774,7 +3774,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 		ret = btrfs_previous_item(chunk_root, path, 0,
 					  BTRFS_CHUNK_ITEM_KEY);
 		if (ret) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			ret = 0;
 			break;
 		}
@@ -3784,7 +3784,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 		btrfs_item_key_to_cpu(leaf, &found_key, slot);
 
 		if (found_key.objectid != key.objectid) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			break;
 		}
 
@@ -3801,12 +3801,12 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 
 		btrfs_release_path(path);
 		if (!ret) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto loop;
 		}
 
 		if (counting) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			spin_lock(&fs_info->balance_lock);
 			bctl->stat.expected++;
 			spin_unlock(&fs_info->balance_lock);
@@ -3831,7 +3831,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 					count_meta < bctl->meta.limit_min)
 				|| ((chunk_type & BTRFS_BLOCK_GROUP_SYSTEM) &&
 					count_sys < bctl->sys.limit_min)) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto loop;
 		}
 
@@ -3845,7 +3845,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 			ret = btrfs_may_alloc_data_chunk(fs_info,
 							 found_key.offset);
 			if (ret < 0) {
-				mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+				mutex_unlock(&fs_info->reclaim_bgs_lock);
 				goto error;
 			} else if (ret == 1) {
 				chunk_reserved = 1;
@@ -3853,7 +3853,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 		}
 
 		ret = btrfs_relocate_chunk(fs_info, found_key.offset);
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
 		if (ret == -ENOSPC) {
 			enospc_errors++;
 		} else if (ret == -ETXTBSY) {
@@ -4738,16 +4738,16 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	key.type = BTRFS_DEV_EXTENT_KEY;
 
 	do {
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		mutex_lock(&fs_info->reclaim_bgs_lock);
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto done;
 		}
 
 		ret = btrfs_previous_item(root, path, 0, key.type);
 		if (ret) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			if (ret < 0)
 				goto done;
 			ret = 0;
@@ -4760,7 +4760,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		btrfs_item_key_to_cpu(l, &key, path->slots[0]);
 
 		if (key.objectid != device->devid) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			btrfs_release_path(path);
 			break;
 		}
@@ -4769,7 +4769,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		length = btrfs_dev_extent_length(l, dev_extent);
 
 		if (key.offset + length <= new_size) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			btrfs_release_path(path);
 			break;
 		}
@@ -4785,12 +4785,12 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		 */
 		ret = btrfs_may_alloc_data_chunk(fs_info, chunk_offset);
 		if (ret < 0) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			mutex_unlock(&fs_info->reclaim_bgs_lock);
 			goto done;
 		}
 
 		ret = btrfs_relocate_chunk(fs_info, chunk_offset);
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
 		if (ret == -ENOSPC) {
 			failed++;
 		} else if (ret) {
@@ -8016,7 +8016,7 @@ static int relocating_repair_kthread(void *data)
 		return -EBUSY;
 	}
 
-	mutex_lock(&fs_info->delete_unused_bgs_mutex);
+	mutex_lock(&fs_info->reclaim_bgs_lock);
 
 	/* Ensure block group still exists */
 	cache = btrfs_lookup_block_group(fs_info, target);
@@ -8038,7 +8038,7 @@ static int relocating_repair_kthread(void *data)
 out:
 	if (cache)
 		btrfs_put_block_group(cache);
-	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	btrfs_exclop_finish(fs_info);
 
 	return ret;
-- 
2.30.0



* [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 13:58 [PATCH v4 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
  2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
  2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
@ 2021-04-15 13:58 ` Johannes Thumshirn
  2021-04-15 18:36   ` Josef Bacik
                     ` (3 more replies)
  2 siblings, 4 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-15 13:58 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

When a file gets deleted on a zoned file system, the space freed is not
returned to the block group's free space, but is migrated to
zone_unusable.

As this zone_unusable space is behind the current write pointer it is not
possible to use it for new allocations. In the current implementation a
zone is reset once all of the block group's space is accounted as zone
unusable.

This behaviour can lead to premature ENOSPC errors on a busy file system.

Instead of only reclaiming a zone once it is completely unusable, kick
off a reclaim job once the amount of unusable bytes exceeds a
user-configurable threshold between 51% and 100%. The threshold can be
set per mounted filesystem via the sysfs tunable bg_reclaim_threshold,
which defaults to 75%.

Similar to reclaiming unused block groups, these dirty block groups are
added to a to_reclaim list. On transaction commit the reclaim process is
then triggered, but only after unused block groups have been deleted,
which frees up space for the relocation process.
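
The range check performed by the sysfs store handler in this patch can
be sketched on its own; this standalone version mirrors the check in
btrfs_bg_reclaim_threshold_store() (only values in (50, 100] are
accepted) but is an illustration, not the kernel code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Mirror of the validation in the bg_reclaim_threshold store handler:
 * a threshold of 50% or less, or above 100%, is rejected with -EINVAL.
 */
static int validate_bg_reclaim_threshold(int thresh)
{
	if (thresh <= 50 || thresh > 100)
		return -EINVAL;
	return 0;
}
```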

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-group.c       | 96 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/block-group.h       |  3 ++
 fs/btrfs/ctree.h             |  6 +++
 fs/btrfs/disk-io.c           | 13 +++++
 fs/btrfs/free-space-cache.c  |  9 +++-
 fs/btrfs/sysfs.c             | 35 +++++++++++++
 fs/btrfs/volumes.c           |  2 +-
 fs/btrfs/volumes.h           |  1 +
 include/trace/events/btrfs.h | 12 +++++
 9 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bbb5a6e170c7..3f06ea42c013 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1485,6 +1485,92 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
 	spin_unlock(&fs_info->unused_bgs_lock);
 }
 
+void btrfs_reclaim_bgs_work(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info =
+		container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
+	struct btrfs_block_group *bg;
+	struct btrfs_space_info *space_info;
+	int ret = 0;
+
+	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
+		return;
+
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
+		return;
+
+	mutex_lock(&fs_info->reclaim_bgs_lock);
+	spin_lock(&fs_info->unused_bgs_lock);
+	while (!list_empty(&fs_info->reclaim_bgs)) {
+		bg = list_first_entry(&fs_info->reclaim_bgs,
+				      struct btrfs_block_group,
+				      bg_list);
+		list_del_init(&bg->bg_list);
+
+		space_info = bg->space_info;
+		spin_unlock(&fs_info->unused_bgs_lock);
+
+		/* Don't want to race with allocators so take the groups_sem */
+		down_write(&space_info->groups_sem);
+
+		spin_lock(&bg->lock);
+		if (bg->reserved || bg->pinned || bg->ro) {
+			/*
+			 * We want to bail if we made new allocations or have
+			 * outstanding allocations in this block group.  We do
+			 * the ro check in case balance is currently acting on
+			 * this block group.
+			 */
+			spin_unlock(&bg->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
+		}
+		spin_unlock(&bg->lock);
+
+		ret = inc_block_group_ro(bg, 0);
+		up_write(&space_info->groups_sem);
+		if (ret < 0) {
+			ret = 0;
+			goto next;
+		}
+
+		btrfs_info(fs_info, "reclaiming chunk %llu", bg->start);
+		trace_btrfs_reclaim_block_group(bg);
+		ret = btrfs_relocate_chunk(fs_info, bg->start);
+		if (ret)
+			btrfs_err(fs_info, "error relocating chunk %llu",
+				  bg->start);
+
+next:
+		btrfs_put_block_group(bg);
+		spin_lock(&fs_info->unused_bgs_lock);
+	}
+	spin_unlock(&fs_info->unused_bgs_lock);
+	mutex_unlock(&fs_info->reclaim_bgs_lock);
+	btrfs_exclop_finish(fs_info);
+}
+
+void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
+{
+	spin_lock(&fs_info->unused_bgs_lock);
+	if (!list_empty(&fs_info->reclaim_bgs))
+		queue_work(system_unbound_wq, &fs_info->reclaim_bgs_work);
+	spin_unlock(&fs_info->unused_bgs_lock);
+}
+
+void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg)
+{
+	struct btrfs_fs_info *fs_info = bg->fs_info;
+
+	spin_lock(&fs_info->unused_bgs_lock);
+	if (list_empty(&bg->bg_list)) {
+		btrfs_get_block_group(bg);
+		trace_btrfs_add_reclaim_block_group(bg);
+		list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
+	}
+	spin_unlock(&fs_info->unused_bgs_lock);
+}
+
 static int read_bg_from_eb(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 			   struct btrfs_path *path)
 {
@@ -3446,6 +3532,16 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	}
 	spin_unlock(&info->unused_bgs_lock);
 
+	spin_lock(&info->unused_bgs_lock);
+	while (!list_empty(&info->reclaim_bgs)) {
+		block_group = list_first_entry(&info->reclaim_bgs,
+					       struct btrfs_block_group,
+					       bg_list);
+		list_del_init(&block_group->bg_list);
+		btrfs_put_block_group(block_group);
+	}
+	spin_unlock(&info->unused_bgs_lock);
+
 	spin_lock(&info->block_group_cache_lock);
 	while ((n = rb_last(&info->block_group_cache_tree)) != NULL) {
 		block_group = rb_entry(n, struct btrfs_block_group,
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 3ecc3372a5ce..7b927425dc71 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -264,6 +264,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			     u64 group_start, struct extent_map *em);
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_mark_bg_unused(struct btrfs_block_group *bg);
+void btrfs_reclaim_bgs_work(struct work_struct *work);
+void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info);
+void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg);
 int btrfs_read_block_groups(struct btrfs_fs_info *info);
 int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 			   u64 type, u64 chunk_offset, u64 size);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c80302564e6b..88531c1fbcdf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -954,10 +954,14 @@ struct btrfs_fs_info {
 	struct work_struct async_data_reclaim_work;
 	struct work_struct preempt_reclaim_work;
 
+	/* Used to reclaim data space in the background */
+	struct work_struct reclaim_bgs_work;
+
 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
 	struct mutex unused_bg_unpin_mutex;
 	struct mutex reclaim_bgs_lock;
+	struct list_head reclaim_bgs;
 
 	/* Cached block sizes */
 	u32 nodesize;
@@ -998,6 +1002,8 @@ struct btrfs_fs_info {
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
 
+	int bg_reclaim_threshold;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e52b89ad0a61..942d894ec175 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1898,6 +1898,13 @@ static int cleaner_kthread(void *arg)
 		 * unused block groups.
 		 */
 		btrfs_delete_unused_bgs(fs_info);
+
+		/*
+		 * Reclaim block groups in the reclaim_bgs list after we deleted
+		 * all unused block_groups. This possibly gives us some more free
+		 * space.
+		 */
+		btrfs_reclaim_bgs(fs_info);
 sleep:
 		clear_and_wake_up_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
 		if (kthread_should_park())
@@ -2886,6 +2893,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&fs_info->space_info);
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
 	INIT_LIST_HEAD(&fs_info->unused_bgs);
+	INIT_LIST_HEAD(&fs_info->reclaim_bgs);
 #ifdef CONFIG_BTRFS_DEBUG
 	INIT_LIST_HEAD(&fs_info->allocated_roots);
 	INIT_LIST_HEAD(&fs_info->allocated_ebs);
@@ -2974,6 +2982,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->swapfile_pins = RB_ROOT;
 
 	fs_info->send_in_progress = 0;
+
+	fs_info->bg_reclaim_threshold = 75;
+	INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
 }
 
 static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
@@ -4332,6 +4343,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	cancel_work_sync(&fs_info->async_data_reclaim_work);
 	cancel_work_sync(&fs_info->preempt_reclaim_work);
 
+	cancel_work_sync(&fs_info->reclaim_bgs_work);
+
 	/* Cancel or finish ongoing discard work */
 	btrfs_discard_cleanup(fs_info);
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 9988decd5717..e54466fc101f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -11,6 +11,7 @@
 #include <linux/ratelimit.h>
 #include <linux/error-injection.h>
 #include <linux/sched/mm.h>
+#include "misc.h"
 #include "ctree.h"
 #include "free-space-cache.h"
 #include "transaction.h"
@@ -2539,6 +2540,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 					u64 bytenr, u64 size, bool used)
 {
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 offset = bytenr - block_group->start;
 	u64 to_free, to_unusable;
@@ -2569,8 +2571,13 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 	}
 
 	/* All the region is now unusable. Mark it as unused and reclaim */
-	if (block_group->zone_unusable == block_group->length)
+	if (block_group->zone_unusable == block_group->length) {
 		btrfs_mark_bg_unused(block_group);
+	} else if (block_group->zone_unusable >=
+		   div_factor_fine(block_group->length,
+				   fs_info->bg_reclaim_threshold)) {
+		btrfs_mark_bg_to_reclaim(block_group);
+	}
 
 	return 0;
 }
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index a99d1f415a7f..436ac7b4b334 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -980,6 +980,40 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
 }
 BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
 
+static ssize_t btrfs_bg_reclaim_threshold_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	ssize_t ret;
+
+	ret = scnprintf(buf, PAGE_SIZE, "%d\n", fs_info->bg_reclaim_threshold);
+
+	return ret;
+}
+
+static ssize_t btrfs_bg_reclaim_threshold_store(struct kobject *kobj,
+						struct kobj_attribute *a,
+						const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	int thresh;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &thresh);
+	if (ret)
+		return ret;
+
+	if (thresh <= 50 || thresh > 100)
+		return -EINVAL;
+
+	fs_info->bg_reclaim_threshold = thresh;
+
+	return len;
+}
+BTRFS_ATTR_RW(, bg_reclaim_threshold, btrfs_bg_reclaim_threshold_show,
+	      btrfs_bg_reclaim_threshold_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -991,6 +1025,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, exclusive_operation),
 	BTRFS_ATTR_PTR(, generation),
 	BTRFS_ATTR_PTR(, read_policy),
+	BTRFS_ATTR_PTR(, bg_reclaim_threshold),
 	NULL,
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a2a7f5ab0a3e..527324154e3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3098,7 +3098,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
 	return ret;
 }
 
-static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
+int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 {
 	struct btrfs_root *root = fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d4c3e0dd32b8..9c0d84e5ec06 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -484,6 +484,7 @@ void btrfs_describe_block_groups(u64 flags, char *buf, u32 size_buf);
 int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
 int btrfs_recover_balance(struct btrfs_fs_info *fs_info);
 int btrfs_pause_balance(struct btrfs_fs_info *fs_info);
+int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset);
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
 int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_uuid_scan_kthread(void *data);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0551ea65374f..a41dd8a0c730 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1903,6 +1903,18 @@ DEFINE_EVENT(btrfs__block_group, btrfs_add_unused_block_group,
 	TP_ARGS(bg_cache)
 );
 
+DEFINE_EVENT(btrfs__block_group, btrfs_add_reclaim_block_group,
+	TP_PROTO(const struct btrfs_block_group *bg_cache),
+
+	TP_ARGS(bg_cache)
+);
+
+DEFINE_EVENT(btrfs__block_group, btrfs_reclaim_block_group,
+	TP_PROTO(const struct btrfs_block_group *bg_cache),
+
+	TP_ARGS(bg_cache)
+);
+
 DEFINE_EVENT(btrfs__block_group, btrfs_skip_unused_block_group,
 	TP_PROTO(const struct btrfs_block_group *bg_cache),
 
-- 
2.30.0



* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
@ 2021-04-15 18:26   ` Josef Bacik
  2021-04-16  5:50     ` Johannes Thumshirn
  2021-04-16  9:11   ` Filipe Manana
  1 sibling, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2021-04-15 18:26 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: linux-btrfs, Naohiro Aota, Filipe Manana, Anand Jain

On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
> When relocating a block group, the freed-up space is not discarded in
> one big chunk; instead, each extent is discarded on its own with
> -o discard=sync.
> 
> For a zoned filesystem we need to discard the whole block group at once,
> so btrfs_discard_extent() will translate the discard into a
> REQ_OP_ZONE_RESET operation, which then resets the device's zone.
> 
> Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

What would be cool is if we could disable discard per bg so we don't discard at 
all during the relocation, and then discard the whole block group no matter if 
we have zoned or not.  However not really something you need to do, just 
thinking out loud

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex
  2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
@ 2021-04-15 18:26   ` Josef Bacik
  2021-04-16  9:15   ` Filipe Manana
  2021-04-16 16:36   ` David Sterba
  2 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2021-04-15 18:26 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: linux-btrfs, Naohiro Aota, Filipe Manana, Anand Jain

On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
> As a preparation for another user, rename the unused_bgs_mutex into
> reclaim_bgs_lock.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
@ 2021-04-15 18:36   ` Josef Bacik
  2021-04-16  5:50     ` Johannes Thumshirn
  2021-04-16  9:28   ` Filipe Manana
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2021-04-15 18:36 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba
  Cc: linux-btrfs, Naohiro Aota, Filipe Manana, Anand Jain

On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
> When a file gets deleted on a zoned file system, the space freed is not
> returned to the block group's free space, but is migrated to
> zone_unusable.
> 
> As this zone_unusable space is behind the current write pointer it is not
> possible to use it for new allocations. In the current implementation a
> zone is reset once all of the block group's space is accounted as zone
> unusable.
> 
> This behaviour can lead to premature ENOSPC errors on a busy file system.
> 
> Instead of only reclaiming the zone once it is completely unusable,
> kick off a reclaim job once the amount of unusable bytes exceeds a
> user-configurable threshold between 51% and 100%. It can be set per mounted
> filesystem via the sysfs tunable bg_reclaim_threshold, which defaults to 75%.
> 
> Similar to reclaiming unused block groups, these dirty block groups are
> added to a to_reclaim list and then, on a transaction commit, the reclaim
> process is triggered after we have deleted unused block groups, which will
> free space for the relocation process.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>   fs/btrfs/block-group.c       | 96 ++++++++++++++++++++++++++++++++++++
>   fs/btrfs/block-group.h       |  3 ++
>   fs/btrfs/ctree.h             |  6 +++
>   fs/btrfs/disk-io.c           | 13 +++++
>   fs/btrfs/free-space-cache.c  |  9 +++-
>   fs/btrfs/sysfs.c             | 35 +++++++++++++
>   fs/btrfs/volumes.c           |  2 +-
>   fs/btrfs/volumes.h           |  1 +
>   include/trace/events/btrfs.h | 12 +++++
>   9 files changed, 175 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index bbb5a6e170c7..3f06ea42c013 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1485,6 +1485,92 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>   	spin_unlock(&fs_info->unused_bgs_lock);
>   }
>   
> +void btrfs_reclaim_bgs_work(struct work_struct *work)
> +{
> +	struct btrfs_fs_info *fs_info =
> +		container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
> +	struct btrfs_block_group *bg;
> +	struct btrfs_space_info *space_info;
> +	int ret = 0;
> +
> +	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
> +		return;
> +
> +	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> +		return;
> +
> +	mutex_lock(&fs_info->reclaim_bgs_lock);
> +	spin_lock(&fs_info->unused_bgs_lock);
> +	while (!list_empty(&fs_info->reclaim_bgs)) {
> +		bg = list_first_entry(&fs_info->reclaim_bgs,
> +				      struct btrfs_block_group,
> +				      bg_list);
> +		list_del_init(&bg->bg_list);
> +
> +		space_info = bg->space_info;
> +		spin_unlock(&fs_info->unused_bgs_lock);
> +
> +		/* Don't want to race with allocators so take the groups_sem */
> +		down_write(&space_info->groups_sem);
> +
> +		spin_lock(&bg->lock);
> +		if (bg->reserved || bg->pinned || bg->ro) {
> +			/*
> +			 * We want to bail if we made new allocations or have
> +			 * outstanding allocations in this block group.  We do
> +			 * the ro check in case balance is currently acting on
> +			 * this block group.
> +			 */
> +			spin_unlock(&bg->lock);
> +			up_write(&space_info->groups_sem);
> +			goto next;
> +		}
> +		spin_unlock(&bg->lock);
> +

Here I think we want a

if (btrfs_fs_closing(fs_info))
	goto next;

so we don't block out a umount for all eternity.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-15 18:26   ` Josef Bacik
@ 2021-04-16  5:50     ` Johannes Thumshirn
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-16  5:50 UTC (permalink / raw)
  To: Josef Bacik, David Sterba
  Cc: linux-btrfs@vger.kernel.org, Naohiro Aota, Filipe Manana,
	Anand Jain

On 15/04/2021 20:26, Josef Bacik wrote:
> On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
>> When relocating a block group, the freed up space is not discarded in one
>> big block, but each extent is discarded on its own with -o discard=sync.
>>
>> For a zoned filesystem we need to discard the whole block group at once,
>> so btrfs_discard_extent() will translate the discard into a
>> REQ_OP_ZONE_RESET operation, which then resets the device's zone.
>>
>> Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> What would be cool is if we could disable discard per bg so we don't discard at 
> all during the relocation, and then discard the whole block group no matter if 
> we have zoned or not.  However, that's not really something you need to do, just 
> thinking out loud.

I could say I'll add it to my queue, but that's already so long, I have no
idea when that's gonna happen.


> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 18:36   ` Josef Bacik
@ 2021-04-16  5:50     ` Johannes Thumshirn
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-16  5:50 UTC (permalink / raw)
  To: Josef Bacik, David Sterba
  Cc: linux-btrfs@vger.kernel.org, Naohiro Aota, Filipe Manana,
	Anand Jain

On 15/04/2021 20:37, Josef Bacik wrote:
> On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
>> When a file gets deleted on a zoned file system, the space freed is not
>> returned to the block group's free space, but is migrated to
>> zone_unusable.
>>
>> As this zone_unusable space is behind the current write pointer it is not
>> possible to use it for new allocations. In the current implementation a
>> zone is reset once all of the block group's space is accounted as zone
>> unusable.
>>
>> This behaviour can lead to premature ENOSPC errors on a busy file system.
>>
>> Instead of only reclaiming the zone once it is completely unusable,
>> kick off a reclaim job once the amount of unusable bytes exceeds a
>> user-configurable threshold between 51% and 100%. It can be set per mounted
>> filesystem via the sysfs tunable bg_reclaim_threshold, which defaults to 75%.
>>
>> Similar to reclaiming unused block groups, these dirty block groups are
>> added to a to_reclaim list and then, on a transaction commit, the reclaim
>> process is triggered after we have deleted unused block groups, which will
>> free space for the relocation process.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>   fs/btrfs/block-group.c       | 96 ++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/block-group.h       |  3 ++
>>   fs/btrfs/ctree.h             |  6 +++
>>   fs/btrfs/disk-io.c           | 13 +++++
>>   fs/btrfs/free-space-cache.c  |  9 +++-
>>   fs/btrfs/sysfs.c             | 35 +++++++++++++
>>   fs/btrfs/volumes.c           |  2 +-
>>   fs/btrfs/volumes.h           |  1 +
>>   include/trace/events/btrfs.h | 12 +++++
>>   9 files changed, 175 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index bbb5a6e170c7..3f06ea42c013 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -1485,6 +1485,92 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>>   	spin_unlock(&fs_info->unused_bgs_lock);
>>   }
>>   
>> +void btrfs_reclaim_bgs_work(struct work_struct *work)
>> +{
>> +	struct btrfs_fs_info *fs_info =
>> +		container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
>> +	struct btrfs_block_group *bg;
>> +	struct btrfs_space_info *space_info;
>> +	int ret = 0;
>> +
>> +	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
>> +		return;
>> +
>> +	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
>> +		return;
>> +
>> +	mutex_lock(&fs_info->reclaim_bgs_lock);
>> +	spin_lock(&fs_info->unused_bgs_lock);
>> +	while (!list_empty(&fs_info->reclaim_bgs)) {
>> +		bg = list_first_entry(&fs_info->reclaim_bgs,
>> +				      struct btrfs_block_group,
>> +				      bg_list);
>> +		list_del_init(&bg->bg_list);
>> +
>> +		space_info = bg->space_info;
>> +		spin_unlock(&fs_info->unused_bgs_lock);
>> +
>> +		/* Don't want to race with allocators so take the groups_sem */
>> +		down_write(&space_info->groups_sem);
>> +
>> +		spin_lock(&bg->lock);
>> +		if (bg->reserved || bg->pinned || bg->ro) {
>> +			/*
>> +			 * We want to bail if we made new allocations or have
>> +			 * outstanding allocations in this block group.  We do
>> +			 * the ro check in case balance is currently acting on
>> +			 * this block group.
>> +			 */
>> +			spin_unlock(&bg->lock);
>> +			up_write(&space_info->groups_sem);
>> +			goto next;
>> +		}
>> +		spin_unlock(&bg->lock);
>> +
> 
> Here I think we want a
> 
> if (btrfs_fs_closing(fs_info))
> 	goto next;
> 
> so we don't block out a umount for all eternity.  Thanks,

Right, will add.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
  2021-04-15 18:26   ` Josef Bacik
@ 2021-04-16  9:11   ` Filipe Manana
  2021-04-16  9:14     ` Johannes Thumshirn
  1 sibling, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2021-04-16  9:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 3:00 PM Johannes Thumshirn
<johannes.thumshirn@wdc.com> wrote:
>
> When relocating a block group, the freed up space is not discarded in one
> big block, but each extent is discarded on its own with -o discard=sync.
>
> For a zoned filesystem we need to discard the whole block group at once,
> so btrfs_discard_extent() will translate the discard into a
> REQ_OP_ZONE_RESET operation, which then resets the device's zone.
>
> Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/volumes.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6d9b2369f17a..b1bab75ec12a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3103,6 +3103,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         struct btrfs_root *root = fs_info->chunk_root;
>         struct btrfs_trans_handle *trans;
>         struct btrfs_block_group *block_group;
> +       u64 length;
>         int ret;
>
>         /*
> @@ -3130,8 +3131,24 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         if (!block_group)
>                 return -ENOENT;
>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> +       length = block_group->length;
>         btrfs_put_block_group(block_group);
>
> +       /*
> +        * Step two, delete the device extents and the chunk tree entries.
> +        *
> +        * On a zoned file system, discard the whole block group, this will
> +        * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
> +        * resetting the zone fails, don't treat it as a fatal problem from the
> +        * filesystem's point of view.
> +        */
> +       if (btrfs_is_zoned(fs_info)) {
> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
> +               if (ret)
> +                       btrfs_info(fs_info, "failed to reset zone %llu",
> +                                  chunk_offset);
> +       }
> +
>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
>                                                      chunk_offset);
>         if (IS_ERR(trans)) {
> @@ -3140,10 +3157,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>                 return ret;
>         }
>
> -       /*
> -        * step two, delete the device extents and the
> -        * chunk tree entries
> -        */

This hunk seems unrelated and unintentional.
Not that the comment adds any value, but if the removal was
intentional, I would suggest a separate patch for that.

Other than that, it looks good, thanks.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

>         ret = btrfs_remove_chunk(trans, chunk_offset);
>         btrfs_end_transaction(trans);
>         return ret;
> --
> 2.30.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-16  9:11   ` Filipe Manana
@ 2021-04-16  9:14     ` Johannes Thumshirn
  2021-04-16  9:30       ` Filipe Manana
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-16  9:14 UTC (permalink / raw)
  To: fdmanana@gmail.com
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On 16/04/2021 11:12, Filipe Manana wrote:
> On Thu, Apr 15, 2021 at 3:00 PM Johannes Thumshirn
> <johannes.thumshirn@wdc.com> wrote:
>>
>> When relocating a block group, the freed up space is not discarded in one
>> big block, but each extent is discarded on its own with -o discard=sync.
>>
>> For a zoned filesystem we need to discard the whole block group at once,
>> so btrfs_discard_extent() will translate the discard into a
>> REQ_OP_ZONE_RESET operation, which then resets the device's zone.
>>
>> Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  fs/btrfs/volumes.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 6d9b2369f17a..b1bab75ec12a 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3103,6 +3103,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>         struct btrfs_root *root = fs_info->chunk_root;
>>         struct btrfs_trans_handle *trans;
>>         struct btrfs_block_group *block_group;
>> +       u64 length;
>>         int ret;
>>
>>         /*
>> @@ -3130,8 +3131,24 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>         if (!block_group)
>>                 return -ENOENT;
>>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
>> +       length = block_group->length;
>>         btrfs_put_block_group(block_group);
>>
>> +       /*
>> +        * Step two, delete the device extents and the chunk tree entries.
>> +        *
>> +        * On a zoned file system, discard the whole block group, this will
>> +        * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
>> +        * resetting the zone fails, don't treat it as a fatal problem from the
>> +        * filesystem's point of view.
>> +        */
>> +       if (btrfs_is_zoned(fs_info)) {
>> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
>> +               if (ret)
>> +                       btrfs_info(fs_info, "failed to reset zone %llu",
>> +                                  chunk_offset);
>> +       }
>> +
>>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
>>                                                      chunk_offset);
>>         if (IS_ERR(trans)) {
>> @@ -3140,10 +3157,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>                 return ret;
>>         }
>>
>> -       /*
>> -        * step two, delete the device extents and the
>> -        * chunk tree entries
>> -        */
> 
> This hunk seems unrelated and unintentional.
> Not that the comment adds any value, but if the removal was
> intentional, I would suggest a separate patch for that.

It's moved upwards

+       /*
+        * Step two, delete the device extents and the chunk tree entries.
+        *
+        * On a zoned file system, discard the whole block group, this will
+        * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
+        * resetting the zone fails, don't treat it as a fatal problem from the
+        * filesystem's point of view.
+        */

because technically the "delete extents" step now starts with the discard. But I
can leave it where it was.

> Other than that, it looks good, thanks.
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex
  2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
  2021-04-15 18:26   ` Josef Bacik
@ 2021-04-16  9:15   ` Filipe Manana
  2021-04-16 16:36   ` David Sterba
  2 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2021-04-16  9:15 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 3:00 PM Johannes Thumshirn
<johannes.thumshirn@wdc.com> wrote:
>
> As a preparation for another user, rename the unused_bgs_mutex into
> reclaim_bgs_lock.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/block-group.c |  6 +++---
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/disk-io.c     |  6 +++---
>  fs/btrfs/volumes.c     | 46 +++++++++++++++++++++---------------------
>  4 files changed, 30 insertions(+), 30 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 293f3169be80..bbb5a6e170c7 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1289,7 +1289,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>          * Long running balances can keep us blocked here for eternity, so
>          * simply skip deletion if we're unable to get the mutex.
>          */
> -       if (!mutex_trylock(&fs_info->delete_unused_bgs_mutex))
> +       if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
>                 return;
>
>         spin_lock(&fs_info->unused_bgs_lock);
> @@ -1462,12 +1462,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>                 spin_lock(&fs_info->unused_bgs_lock);
>         }
>         spin_unlock(&fs_info->unused_bgs_lock);
> -       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +       mutex_unlock(&fs_info->reclaim_bgs_lock);
>         return;
>
>  flip_async:
>         btrfs_end_transaction(trans);
> -       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +       mutex_unlock(&fs_info->reclaim_bgs_lock);
>         btrfs_put_block_group(block_group);
>         btrfs_discard_punt_unused_bgs_list(fs_info);
>  }
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2c858d5349c8..c80302564e6b 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -957,7 +957,7 @@ struct btrfs_fs_info {
>         spinlock_t unused_bgs_lock;
>         struct list_head unused_bgs;
>         struct mutex unused_bg_unpin_mutex;
> -       struct mutex delete_unused_bgs_mutex;
> +       struct mutex reclaim_bgs_lock;
>
>         /* Cached block sizes */
>         u32 nodesize;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 0a1182694f48..e52b89ad0a61 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1890,10 +1890,10 @@ static int cleaner_kthread(void *arg)
>                 btrfs_run_defrag_inodes(fs_info);
>
>                 /*
> -                * Acquires fs_info->delete_unused_bgs_mutex to avoid racing
> +                * Acquires fs_info->reclaim_bgs_lock to avoid racing
>                  * with relocation (btrfs_relocate_chunk) and relocation
>                  * acquires fs_info->cleaner_mutex (btrfs_relocate_block_group)
> -                * after acquiring fs_info->delete_unused_bgs_mutex. So we
> +                * after acquiring fs_info->reclaim_bgs_lock. So we
>                  * can't hold, nor need to, fs_info->cleaner_mutex when deleting
>                  * unused block groups.
>                  */
> @@ -2876,7 +2876,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>         spin_lock_init(&fs_info->treelog_bg_lock);
>         rwlock_init(&fs_info->tree_mod_log_lock);
>         mutex_init(&fs_info->unused_bg_unpin_mutex);
> -       mutex_init(&fs_info->delete_unused_bgs_mutex);
> +       mutex_init(&fs_info->reclaim_bgs_lock);
>         mutex_init(&fs_info->reloc_mutex);
>         mutex_init(&fs_info->delalloc_root_mutex);
>         mutex_init(&fs_info->zoned_meta_io_lock);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b1bab75ec12a..a2a7f5ab0a3e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3118,7 +3118,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>          * we release the path used to search the chunk/dev tree and before
>          * the current task acquires this mutex and calls us.
>          */
> -       lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
> +       lockdep_assert_held(&fs_info->reclaim_bgs_lock);
>
>         /* step one, relocate all the extents inside this chunk */
>         btrfs_scrub_pause(fs_info);
> @@ -3185,10 +3185,10 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
>         key.type = BTRFS_CHUNK_ITEM_KEY;
>
>         while (1) {
> -               mutex_lock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_lock(&fs_info->reclaim_bgs_lock);
>                 ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
>                 if (ret < 0) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto error;
>                 }
>                 BUG_ON(ret == 0); /* Corruption */
> @@ -3196,7 +3196,7 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
>                 ret = btrfs_previous_item(chunk_root, path, key.objectid,
>                                           key.type);
>                 if (ret)
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 if (ret < 0)
>                         goto error;
>                 if (ret > 0)
> @@ -3217,7 +3217,7 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
>                         else
>                                 BUG_ON(ret);
>                 }
> -               mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_unlock(&fs_info->reclaim_bgs_lock);
>
>                 if (found_key.offset == 0)
>                         break;
> @@ -3757,10 +3757,10 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                         goto error;
>                 }
>
> -               mutex_lock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_lock(&fs_info->reclaim_bgs_lock);
>                 ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
>                 if (ret < 0) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto error;
>                 }
>
> @@ -3774,7 +3774,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                 ret = btrfs_previous_item(chunk_root, path, 0,
>                                           BTRFS_CHUNK_ITEM_KEY);
>                 if (ret) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         ret = 0;
>                         break;
>                 }
> @@ -3784,7 +3784,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                 btrfs_item_key_to_cpu(leaf, &found_key, slot);
>
>                 if (found_key.objectid != key.objectid) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         break;
>                 }
>
> @@ -3801,12 +3801,12 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>
>                 btrfs_release_path(path);
>                 if (!ret) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto loop;
>                 }
>
>                 if (counting) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         spin_lock(&fs_info->balance_lock);
>                         bctl->stat.expected++;
>                         spin_unlock(&fs_info->balance_lock);
> @@ -3831,7 +3831,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                                         count_meta < bctl->meta.limit_min)
>                                 || ((chunk_type & BTRFS_BLOCK_GROUP_SYSTEM) &&
>                                         count_sys < bctl->sys.limit_min)) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto loop;
>                 }
>
> @@ -3845,7 +3845,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                         ret = btrfs_may_alloc_data_chunk(fs_info,
>                                                          found_key.offset);
>                         if (ret < 0) {
> -                               mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                               mutex_unlock(&fs_info->reclaim_bgs_lock);
>                                 goto error;
>                         } else if (ret == 1) {
>                                 chunk_reserved = 1;
> @@ -3853,7 +3853,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                 }
>
>                 ret = btrfs_relocate_chunk(fs_info, found_key.offset);
> -               mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 if (ret == -ENOSPC) {
>                         enospc_errors++;
>                 } else if (ret == -ETXTBSY) {
> @@ -4738,16 +4738,16 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>         key.type = BTRFS_DEV_EXTENT_KEY;
>
>         do {
> -               mutex_lock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_lock(&fs_info->reclaim_bgs_lock);
>                 ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>                 if (ret < 0) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto done;
>                 }
>
>                 ret = btrfs_previous_item(root, path, 0, key.type);
>                 if (ret) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         if (ret < 0)
>                                 goto done;
>                         ret = 0;
> @@ -4760,7 +4760,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>                 btrfs_item_key_to_cpu(l, &key, path->slots[0]);
>
>                 if (key.objectid != device->devid) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         btrfs_release_path(path);
>                         break;
>                 }
> @@ -4769,7 +4769,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>                 length = btrfs_dev_extent_length(l, dev_extent);
>
>                 if (key.offset + length <= new_size) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         btrfs_release_path(path);
>                         break;
>                 }
> @@ -4785,12 +4785,12 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>                  */
>                 ret = btrfs_may_alloc_data_chunk(fs_info, chunk_offset);
>                 if (ret < 0) {
> -                       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +                       mutex_unlock(&fs_info->reclaim_bgs_lock);
>                         goto done;
>                 }
>
>                 ret = btrfs_relocate_chunk(fs_info, chunk_offset);
> -               mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +               mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 if (ret == -ENOSPC) {
>                         failed++;
>                 } else if (ret) {
> @@ -8016,7 +8016,7 @@ static int relocating_repair_kthread(void *data)
>                 return -EBUSY;
>         }
>
> -       mutex_lock(&fs_info->delete_unused_bgs_mutex);
> +       mutex_lock(&fs_info->reclaim_bgs_lock);
>
>         /* Ensure block group still exists */
>         cache = btrfs_lookup_block_group(fs_info, target);
> @@ -8038,7 +8038,7 @@ static int relocating_repair_kthread(void *data)
>  out:
>         if (cache)
>                 btrfs_put_block_group(cache);
> -       mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> +       mutex_unlock(&fs_info->reclaim_bgs_lock);
>         btrfs_exclop_finish(fs_info);
>
>         return ret;
> --
> 2.30.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
  2021-04-15 18:36   ` Josef Bacik
@ 2021-04-16  9:28   ` Filipe Manana
  2021-04-16 16:38   ` David Sterba
  2021-04-16 16:41   ` David Sterba
  3 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2021-04-16  9:28 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 3:00 PM Johannes Thumshirn
<johannes.thumshirn@wdc.com> wrote:
>
> When a file gets deleted on a zoned file system, the space freed is not
> returned to the block group's free space, but is migrated to
> zone_unusable.
>
> As this zone_unusable space is behind the current write pointer it is not
> possible to use it for new allocations. In the current implementation a
> zone is reset once all of the block group's space is accounted as zone
> unusable.
>
> This behaviour can lead to premature ENOSPC errors on a busy file system.
>
> Instead of only reclaiming the zone once it is completely unusable,
> kick off a reclaim job once the amount of unusable bytes exceeds a
> user-configurable threshold between 51% and 100%. It can be set per mounted
> filesystem via the sysfs tunable bg_reclaim_threshold, which defaults to 75%.
>
> Similar to reclaiming unused block groups, these dirty block groups are
> added to a to_reclaim list and then on a transaction commit, the reclaim
> process is triggered but after we deleted unused block groups, which will
> free space for the relocation process.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/block-group.c       | 96 ++++++++++++++++++++++++++++++++++++
>  fs/btrfs/block-group.h       |  3 ++
>  fs/btrfs/ctree.h             |  6 +++
>  fs/btrfs/disk-io.c           | 13 +++++
>  fs/btrfs/free-space-cache.c  |  9 +++-
>  fs/btrfs/sysfs.c             | 35 +++++++++++++
>  fs/btrfs/volumes.c           |  2 +-
>  fs/btrfs/volumes.h           |  1 +
>  include/trace/events/btrfs.h | 12 +++++
>  9 files changed, 175 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index bbb5a6e170c7..3f06ea42c013 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1485,6 +1485,92 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>         spin_unlock(&fs_info->unused_bgs_lock);
>  }
>
> +void btrfs_reclaim_bgs_work(struct work_struct *work)
> +{
> +       struct btrfs_fs_info *fs_info =
> +               container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
> +       struct btrfs_block_group *bg;
> +       struct btrfs_space_info *space_info;
> +       int ret = 0;
> +
> +       if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
> +               return;
> +
> +       if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> +               return;
> +
> +       mutex_lock(&fs_info->reclaim_bgs_lock);
> +       spin_lock(&fs_info->unused_bgs_lock);
> +       while (!list_empty(&fs_info->reclaim_bgs)) {
> +               bg = list_first_entry(&fs_info->reclaim_bgs,
> +                                     struct btrfs_block_group,
> +                                     bg_list);
> +               list_del_init(&bg->bg_list);
> +
> +               space_info = bg->space_info;
> +               spin_unlock(&fs_info->unused_bgs_lock);
> +
> +               /* Don't want to race with allocators so take the groups_sem */
> +               down_write(&space_info->groups_sem);
> +
> +               spin_lock(&bg->lock);
> +               if (bg->reserved || bg->pinned || bg->ro) {
> +                       /*
> +                        * We want to bail if we made new allocations or have
> +                        * outstanding allocations in this block group.  We do
> +                        * the ro check in case balance is currently acting on
> +                        * this block group.
> +                        */
> +                       spin_unlock(&bg->lock);
> +                       up_write(&space_info->groups_sem);
> +                       goto next;
> +               }
> +               spin_unlock(&bg->lock);
> +
> +               ret = inc_block_group_ro(bg, 0);
> +               up_write(&space_info->groups_sem);
> +               if (ret < 0) {
> +                       ret = 0;

It's pointless to set ret to 0.

> +                       goto next;
> +               }
> +
> +               btrfs_info(fs_info, "reclaiming chunk %llu", bg->start);
> +               trace_btrfs_reclaim_block_group(bg);
> +               ret = btrfs_relocate_chunk(fs_info, bg->start);
> +               if (ret)
> +                       btrfs_err(fs_info, "error relocating chunk %llu",
> +                                 bg->start);
> +
> +next:
> +               btrfs_put_block_group(bg);
> +               spin_lock(&fs_info->unused_bgs_lock);
> +       }
> +       spin_unlock(&fs_info->unused_bgs_lock);
> +       mutex_unlock(&fs_info->reclaim_bgs_lock);
> +       btrfs_exclop_finish(fs_info);
> +}
> +
> +void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
> +{
> +       spin_lock(&fs_info->unused_bgs_lock);
> +       if (!list_empty(&fs_info->reclaim_bgs))
> +               queue_work(system_unbound_wq, &fs_info->reclaim_bgs_work);
> +       spin_unlock(&fs_info->unused_bgs_lock);
> +}
> +
> +void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg)
> +{
> +       struct btrfs_fs_info *fs_info = bg->fs_info;
> +
> +       spin_lock(&fs_info->unused_bgs_lock);
> +       if (list_empty(&bg->bg_list)) {
> +               btrfs_get_block_group(bg);
> +               trace_btrfs_add_reclaim_block_group(bg);
> +               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> +       }
> +       spin_unlock(&fs_info->unused_bgs_lock);
> +}
> +
>  static int read_bg_from_eb(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>                            struct btrfs_path *path)
>  {
> @@ -3446,6 +3532,16 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>         }
>         spin_unlock(&info->unused_bgs_lock);
>
> +       spin_lock(&info->unused_bgs_lock);
> +       while (!list_empty(&info->reclaim_bgs)) {
> +               block_group = list_first_entry(&info->reclaim_bgs,
> +                                              struct btrfs_block_group,
> +                                              bg_list);
> +               list_del_init(&block_group->bg_list);
> +               btrfs_put_block_group(block_group);
> +       }
> +       spin_unlock(&info->unused_bgs_lock);
> +
>         spin_lock(&info->block_group_cache_lock);
>         while ((n = rb_last(&info->block_group_cache_tree)) != NULL) {
>                 block_group = rb_entry(n, struct btrfs_block_group,
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 3ecc3372a5ce..7b927425dc71 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -264,6 +264,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>                              u64 group_start, struct extent_map *em);
>  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
>  void btrfs_mark_bg_unused(struct btrfs_block_group *bg);
> +void btrfs_reclaim_bgs_work(struct work_struct *work);
> +void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info);
> +void btrfs_mark_bg_to_reclaim(struct btrfs_block_group *bg);
>  int btrfs_read_block_groups(struct btrfs_fs_info *info);
>  int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>                            u64 type, u64 chunk_offset, u64 size);
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index c80302564e6b..88531c1fbcdf 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -954,10 +954,14 @@ struct btrfs_fs_info {
>         struct work_struct async_data_reclaim_work;
>         struct work_struct preempt_reclaim_work;
>
> +       /* Used to reclaim data space in the background */

Not just data, metadata block groups too.
Perhaps saying it's used to reclaim partially filled block groups is
more meaningful.

Other than that it looks good.
With Josef's suggestion to not slow down unmount too much, you can add

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       struct work_struct reclaim_bgs_work;
> +
>         spinlock_t unused_bgs_lock;
>         struct list_head unused_bgs;
>         struct mutex unused_bg_unpin_mutex;
>         struct mutex reclaim_bgs_lock;
> +       struct list_head reclaim_bgs;
>
>         /* Cached block sizes */
>         u32 nodesize;
> @@ -998,6 +1002,8 @@ struct btrfs_fs_info {
>         spinlock_t treelog_bg_lock;
>         u64 treelog_bg;
>
> +       int bg_reclaim_threshold;
> +
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>         spinlock_t ref_verify_lock;
>         struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index e52b89ad0a61..942d894ec175 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1898,6 +1898,13 @@ static int cleaner_kthread(void *arg)
>                  * unused block groups.
>                  */
>                 btrfs_delete_unused_bgs(fs_info);
> +
> +               /*
> +                * Reclaim block groups in the reclaim_bgs list after we deleted
> +                * all unused block_groups. This possibly gives us some more free
> +                * space.
> +                */
> +               btrfs_reclaim_bgs(fs_info);
>  sleep:
>                 clear_and_wake_up_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
>                 if (kthread_should_park())
> @@ -2886,6 +2893,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>         INIT_LIST_HEAD(&fs_info->space_info);
>         INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
>         INIT_LIST_HEAD(&fs_info->unused_bgs);
> +       INIT_LIST_HEAD(&fs_info->reclaim_bgs);
>  #ifdef CONFIG_BTRFS_DEBUG
>         INIT_LIST_HEAD(&fs_info->allocated_roots);
>         INIT_LIST_HEAD(&fs_info->allocated_ebs);
> @@ -2974,6 +2982,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>         fs_info->swapfile_pins = RB_ROOT;
>
>         fs_info->send_in_progress = 0;
> +
> +       fs_info->bg_reclaim_threshold = 75;
> +       INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
>  }
>
>  static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
> @@ -4332,6 +4343,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>         cancel_work_sync(&fs_info->async_data_reclaim_work);
>         cancel_work_sync(&fs_info->preempt_reclaim_work);
>
> +       cancel_work_sync(&fs_info->reclaim_bgs_work);
> +
>         /* Cancel or finish ongoing discard work */
>         btrfs_discard_cleanup(fs_info);
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 9988decd5717..e54466fc101f 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -11,6 +11,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/error-injection.h>
>  #include <linux/sched/mm.h>
> +#include "misc.h"
>  #include "ctree.h"
>  #include "free-space-cache.h"
>  #include "transaction.h"
> @@ -2539,6 +2540,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
>  static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>                                         u64 bytenr, u64 size, bool used)
>  {
> +       struct btrfs_fs_info *fs_info = block_group->fs_info;
>         struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
>         u64 offset = bytenr - block_group->start;
>         u64 to_free, to_unusable;
> @@ -2569,8 +2571,13 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>         }
>
>         /* All the region is now unusable. Mark it as unused and reclaim */
> -       if (block_group->zone_unusable == block_group->length)
> +       if (block_group->zone_unusable == block_group->length) {
>                 btrfs_mark_bg_unused(block_group);
> +       } else if (block_group->zone_unusable >=
> +                  div_factor_fine(block_group->length,
> +                                  fs_info->bg_reclaim_threshold)) {
> +               btrfs_mark_bg_to_reclaim(block_group);
> +       }
>
>         return 0;
>  }
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index a99d1f415a7f..436ac7b4b334 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -980,6 +980,40 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>  }
>  BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
>
> +static ssize_t btrfs_bg_reclaim_threshold_show(struct kobject *kobj,
> +                                              struct kobj_attribute *a,
> +                                              char *buf)
> +{
> +       struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +       ssize_t ret;
> +
> +       ret = scnprintf(buf, PAGE_SIZE, "%d\n", fs_info->bg_reclaim_threshold);
> +
> +       return ret;
> +}
> +
> +static ssize_t btrfs_bg_reclaim_threshold_store(struct kobject *kobj,
> +                                               struct kobj_attribute *a,
> +                                               const char *buf, size_t len)
> +{
> +       struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +       int thresh;
> +       int ret;
> +
> +       ret = kstrtoint(buf, 10, &thresh);
> +       if (ret)
> +               return ret;
> +
> +       if (thresh <= 50 || thresh > 100)
> +               return -EINVAL;
> +
> +       fs_info->bg_reclaim_threshold = thresh;
> +
> +       return len;
> +}
> +BTRFS_ATTR_RW(, bg_reclaim_threshold, btrfs_bg_reclaim_threshold_show,
> +             btrfs_bg_reclaim_threshold_store);
> +
>  static const struct attribute *btrfs_attrs[] = {
>         BTRFS_ATTR_PTR(, label),
>         BTRFS_ATTR_PTR(, nodesize),
> @@ -991,6 +1025,7 @@ static const struct attribute *btrfs_attrs[] = {
>         BTRFS_ATTR_PTR(, exclusive_operation),
>         BTRFS_ATTR_PTR(, generation),
>         BTRFS_ATTR_PTR(, read_policy),
> +       BTRFS_ATTR_PTR(, bg_reclaim_threshold),
>         NULL,
>  };
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a2a7f5ab0a3e..527324154e3e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3098,7 +3098,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>         return ret;
>  }
>
> -static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> +int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>  {
>         struct btrfs_root *root = fs_info->chunk_root;
>         struct btrfs_trans_handle *trans;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index d4c3e0dd32b8..9c0d84e5ec06 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -484,6 +484,7 @@ void btrfs_describe_block_groups(u64 flags, char *buf, u32 size_buf);
>  int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
>  int btrfs_recover_balance(struct btrfs_fs_info *fs_info);
>  int btrfs_pause_balance(struct btrfs_fs_info *fs_info);
> +int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset);
>  int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
>  int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info);
>  int btrfs_uuid_scan_kthread(void *data);
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index 0551ea65374f..a41dd8a0c730 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -1903,6 +1903,18 @@ DEFINE_EVENT(btrfs__block_group, btrfs_add_unused_block_group,
>         TP_ARGS(bg_cache)
>  );
>
> +DEFINE_EVENT(btrfs__block_group, btrfs_add_reclaim_block_group,
> +       TP_PROTO(const struct btrfs_block_group *bg_cache),
> +
> +       TP_ARGS(bg_cache)
> +);
> +
> +DEFINE_EVENT(btrfs__block_group, btrfs_reclaim_block_group,
> +       TP_PROTO(const struct btrfs_block_group *bg_cache),
> +
> +       TP_ARGS(bg_cache)
> +);
> +
>  DEFINE_EVENT(btrfs__block_group, btrfs_skip_unused_block_group,
>         TP_PROTO(const struct btrfs_block_group *bg_cache),
>
> --
> 2.30.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-16  9:14     ` Johannes Thumshirn
@ 2021-04-16  9:30       ` Filipe Manana
  2021-04-16  9:32         ` Johannes Thumshirn
  0 siblings, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2021-04-16  9:30 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Fri, Apr 16, 2021 at 10:14 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 16/04/2021 11:12, Filipe Manana wrote:
> > On Thu, Apr 15, 2021 at 3:00 PM Johannes Thumshirn
> > <johannes.thumshirn@wdc.com> wrote:
> >>
> >> When relocating a block group the freed up space is not discarded in one
> >> big block, but each extent is discarded on its own with -o discard=sync.
> >>
> >> For a zoned filesystem we need to discard the whole block group at once,
> >> so btrfs_discard_extent() will translate the discard into a
> >> REQ_OP_ZONE_RESET operation, which then resets the device's zone.
> >>
> >> Link: https://lore.kernel.org/linux-btrfs/459e2932c48e12e883dcfd3dda828d9da251d5b5.1617962110.git.johannes.thumshirn@wdc.com
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>  fs/btrfs/volumes.c | 21 +++++++++++++++++----
> >>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 6d9b2369f17a..b1bab75ec12a 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -3103,6 +3103,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>         struct btrfs_root *root = fs_info->chunk_root;
> >>         struct btrfs_trans_handle *trans;
> >>         struct btrfs_block_group *block_group;
> >> +       u64 length;
> >>         int ret;
> >>
> >>         /*
> >> @@ -3130,8 +3131,24 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>         if (!block_group)
> >>                 return -ENOENT;
> >>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> >> +       length = block_group->length;
> >>         btrfs_put_block_group(block_group);
> >>
> >> +       /*
> >> +        * Step two, delete the device extents and the chunk tree entries.
> >> +        *
> >> +        * On a zoned file system, discard the whole block group, this will
> >> +        * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
> >> +        * resetting the zone fails, don't treat it as a fatal problem from the
> >> +        * filesystem's point of view.
> >> +        */
> >> +       if (btrfs_is_zoned(fs_info)) {
> >> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
> >> +               if (ret)
> >> +                       btrfs_info(fs_info, "failed to reset zone %llu",
> >> +                                  chunk_offset);
> >> +       }
> >> +
> >>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
> >>                                                      chunk_offset);
> >>         if (IS_ERR(trans)) {
> >> @@ -3140,10 +3157,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>                 return ret;
> >>         }
> >>
> >> -       /*
> >> -        * step two, delete the device extents and the
> >> -        * chunk tree entries
> >> -        */
> >
> > This hunk seems unrelated and unintentional.
> > Not that the comment adds any value, but if the removal was
> > intentional, I would suggest a separate patch for that.
>
> It's moved upwards
>
> +       /*
> +        * Step two, delete the device extents and the chunk tree entries.
> +        *
> +        * On a zoned file system, discard the whole block group, this will
> +        * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
> +        * resetting the zone fails, don't treat it as a fatal problem from the
> +        * filesystem's point of view.
> +        */
>
> because technically the "delete extents" step now starts with the discard. But I
> can leave it where it was.

The comment refers to deleting the device extent items from the device
tree, not really extents on disk.
I.e. it's all about the items from the block group in the chunk and
device trees.

>
> > Other than that, it looks good, thanks.
> >
> > Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >
>
> Thanks



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups
  2021-04-16  9:30       ` Filipe Manana
@ 2021-04-16  9:32         ` Johannes Thumshirn
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2021-04-16  9:32 UTC (permalink / raw)
  To: fdmanana@gmail.com
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On 16/04/2021 11:30, Filipe Manana wrote:
> The comment refers to deleting the device extent items from the device
> tree, not really extents on disk.
> I.e. it's all about the items from the block group in the chunk and
> device trees.

OK I'll move it back where it was.


* Re: [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex
  2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
  2021-04-15 18:26   ` Josef Bacik
  2021-04-16  9:15   ` Filipe Manana
@ 2021-04-16 16:36   ` David Sterba
  2 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2021-04-16 16:36 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 10:58:34PM +0900, Johannes Thumshirn wrote:
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -957,7 +957,7 @@ struct btrfs_fs_info {
>  	spinlock_t unused_bgs_lock;
>  	struct list_head unused_bgs;
>  	struct mutex unused_bg_unpin_mutex;
> -	struct mutex delete_unused_bgs_mutex;
> +	struct mutex reclaim_bgs_lock;

Please write a comment what the mutex protects.


* Re: [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
  2021-04-15 18:36   ` Josef Bacik
  2021-04-16  9:28   ` Filipe Manana
@ 2021-04-16 16:38   ` David Sterba
  2021-04-16 16:41   ` David Sterba
  3 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2021-04-16 16:38 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 10:58:35PM +0900, Johannes Thumshirn wrote:
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -954,10 +954,14 @@ struct btrfs_fs_info {
>  	struct work_struct async_data_reclaim_work;
>  	struct work_struct preempt_reclaim_work;
>  
> +	/* Used to reclaim data space in the background */
> +	struct work_struct reclaim_bgs_work;
> +
>  	spinlock_t unused_bgs_lock;
>  	struct list_head unused_bgs;
>  	struct mutex unused_bg_unpin_mutex;
>  	struct mutex reclaim_bgs_lock;
> +	struct list_head reclaim_bgs;
>  
>  	/* Cached block sizes */
>  	u32 nodesize;
> @@ -998,6 +1002,8 @@ struct btrfs_fs_info {
>  	spinlock_t treelog_bg_lock;
>  	u64 treelog_bg;
>  
> +	int bg_reclaim_threshold;

That's spreading related members over too many lines, please put them
together with comments what they mean if that's not obvious (like for
the work struct).


* Re: [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones
  2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
                     ` (2 preceding siblings ...)
  2021-04-16 16:38   ` David Sterba
@ 2021-04-16 16:41   ` David Sterba
  3 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2021-04-16 16:41 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, linux-btrfs, Josef Bacik, Naohiro Aota,
	Filipe Manana, Anand Jain

On Thu, Apr 15, 2021 at 10:58:35PM +0900, Johannes Thumshirn wrote:
> @@ -2974,6 +2982,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>  	fs_info->swapfile_pins = RB_ROOT;
>  
>  	fs_info->send_in_progress = 0;
> +
> +	fs_info->bg_reclaim_threshold = 75;

The value should be a named constant visible defined in zoned.h with a
comment what it means.


end of thread, other threads:[~2021-04-16 16:43 UTC | newest]

Thread overview: 18+ messages
2021-04-15 13:58 [PATCH v4 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
2021-04-15 13:58 ` [PATCH v4 1/3] btrfs: zoned: reset zones of relocated block groups Johannes Thumshirn
2021-04-15 18:26   ` Josef Bacik
2021-04-16  5:50     ` Johannes Thumshirn
2021-04-16  9:11   ` Filipe Manana
2021-04-16  9:14     ` Johannes Thumshirn
2021-04-16  9:30       ` Filipe Manana
2021-04-16  9:32         ` Johannes Thumshirn
2021-04-15 13:58 ` [PATCH v4 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
2021-04-15 18:26   ` Josef Bacik
2021-04-16  9:15   ` Filipe Manana
2021-04-16 16:36   ` David Sterba
2021-04-15 13:58 ` [PATCH v4 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
2021-04-15 18:36   ` Josef Bacik
2021-04-16  5:50     ` Johannes Thumshirn
2021-04-16  9:28   ` Filipe Manana
2021-04-16 16:38   ` David Sterba
2021-04-16 16:41   ` David Sterba
