[PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling
@ 2023-07-26 15:56 fdmanana
  2023-07-26 15:56 ` [PATCH 01/17] btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART fdmanana
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

A few fixes, improvements, cleanups around space flushing, enospc handling,
debugging. These came out while debugging an image of a fs that fails to
mount due to being out of unallocated space and having only about 1.6M of
free metadata space, which is not enough to commit any transaction triggered
during mount while doing orphan cleanup, resulting in transactions aborts
either when running delayed refs or somewhere in the critical section of a
transaction commit. These patches do not prevent getting into that situation,
but that will be attempted by other patches in a separate patchset.

Filipe Manana (17):
  btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
  btrfs: update comment for btrfs_join_transaction_nostart()
  btrfs: print target number of bytes when dumping free space
  btrfs: print block group super and delalloc bytes when dumping space info
  btrfs: print available space for a block group when dumping a space info
  btrfs: print available space across all block groups when dumping space info
  btrfs: don't steal space from global rsv after a transaction abort
  btrfs: store the error that turned the fs into error state
  btrfs: return real error when orphan cleanup fails due to a transaction abort
  btrfs: fail priority metadata ticket with real fs error
  btrfs: make btrfs_cleanup_fs_roots() static
  btrfs: make find_free_dev_extent() static
  btrfs: merge find_free_dev_extent() and find_free_dev_extent_start()
  btrfs: avoid starting new transaction when flushing delayed items and refs
  btrfs: avoid starting and committing empty transaction when flushing space
  btrfs: avoid start and commit empty transaction when starting qgroup rescan
  btrfs: avoid start and commit empty transaction when flushing qgroups

 fs/btrfs/disk-io.c          | 102 ++++++++++++++++++------------------
 fs/btrfs/disk-io.h          |   1 -
 fs/btrfs/free-space-cache.c |   3 +-
 fs/btrfs/fs.h               |  12 +++--
 fs/btrfs/inode.c            |   9 +++-
 fs/btrfs/messages.c         |  10 ++--
 fs/btrfs/qgroup.c           |  19 ++++---
 fs/btrfs/space-info.c       |  51 ++++++++++++++----
 fs/btrfs/transaction.c      |  12 +++--
 fs/btrfs/volumes.c          |  21 +++-----
 fs/btrfs/volumes.h          |   2 -
 11 files changed, 144 insertions(+), 98 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 01/17] btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
@ 2023-07-26 15:56 ` fdmanana
  2023-07-26 15:56 ` [PATCH 02/17] btrfs: update comment for btrfs_join_transaction_nostart() fdmanana
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When joining a transaction with TRANS_JOIN_NOSTART, if we don't find a
running transaction we end up creating one. This goes against the purpose
of TRANS_JOIN_NOSTART which is to join a running transaction if its state
is at or below the state TRANS_STATE_COMMIT_START, otherwise return an
-ENOENT error and don't start a new transaction. So fix this to not create
a new transaction if there's no running transaction at or below that
state.

Fixes: a6d155d2e363 ("Btrfs: fix deadlock between fiemap and transaction commits")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/transaction.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 815f61d6b506..6a2a12593183 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -292,10 +292,11 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_unlock(&fs_info->trans_lock);
 
 	/*
-	 * If we are ATTACH, we just want to catch the current transaction,
-	 * and commit it. If there is no transaction, just return ENOENT.
+	 * If we are ATTACH or TRANS_JOIN_NOSTART, we just want to catch the
+	 * current transaction, and commit it. If there is no transaction, just
+	 * return ENOENT.
 	 */
-	if (type == TRANS_ATTACH)
+	if (type == TRANS_ATTACH || type == TRANS_JOIN_NOSTART)
 		return -ENOENT;
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 02/17] btrfs: update comment for btrfs_join_transaction_nostart()
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
  2023-07-26 15:56 ` [PATCH 01/17] btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART fdmanana
@ 2023-07-26 15:56 ` fdmanana
  2023-07-26 15:56 ` [PATCH 03/17] btrfs: print target number of bytes when dumping free space fdmanana
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Update the comment for btrfs_join_transaction_nostart() to be more clear
about how it works and how it's different from btrfs_attach_transaction().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/transaction.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 6a2a12593183..ab09542f2170 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -799,7 +799,10 @@ struct btrfs_trans_handle *btrfs_join_transaction_spacecache(struct btrfs_root *
 
 /*
  * Similar to regular join but it never starts a transaction when none is
- * running or after waiting for the current one to finish.
+ * running or when there's a running one at a state >= TRANS_STATE_UNBLOCKED.
+ * This is similar to btrfs_attach_transaction() but it allows the join to
+ * happen if the transaction commit already started but it's not yet in the
+ * "doing" phase (the state is < TRANS_STATE_COMMIT_DOING).
  */
 struct btrfs_trans_handle *btrfs_join_transaction_nostart(struct btrfs_root *root)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 03/17] btrfs: print target number of bytes when dumping free space
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
  2023-07-26 15:56 ` [PATCH 01/17] btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART fdmanana
  2023-07-26 15:56 ` [PATCH 02/17] btrfs: update comment for btrfs_join_transaction_nostart() fdmanana
@ 2023-07-26 15:56 ` fdmanana
  2023-07-26 15:57 ` [PATCH 04/17] btrfs: print block group super and delalloc bytes when dumping space info fdmanana
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When dumping free space, with btrfs_dump_free_space(), we pass a bytes
argument in order to count how many free space entries in the block group
have a size greater than or equal to that number of bytes. We then print
how many suitable entries we found, but we don't print the target number
of bytes, we just say "bytes". Change the message to actually print the
number of bytes, which makes debugging -ENOSPC issues a bit easier.

Also sligthly change the odd grammar and terminology: the sentence is
ending with 'is', which doesn't make sense, and the term 'blocks' is
confusing as we are referring to free space entries within the block
group's free space cache.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/free-space-cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index bfc01352351c..cd5bfda2c259 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2943,7 +2943,8 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
 	btrfs_info(fs_info, "block group has cluster?: %s",
 	       list_empty(&block_group->cluster_list) ? "no" : "yes");
 	btrfs_info(fs_info,
-		   "%d blocks of free space at or bigger than bytes is", count);
+		   "%d free space entries at or bigger than %llu bytes",
+		   count, bytes);
 }
 
 void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 04/17] btrfs: print block group super and delalloc bytes when dumping space info
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (2 preceding siblings ...)
  2023-07-26 15:56 ` [PATCH 03/17] btrfs: print target number of bytes when dumping free space fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 05/17] btrfs: print available space for a block group when dumping a " fdmanana
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When dumping a space info's block groups, also print the number of bytes
used for super blocks and delalloc. This is often useful for debugging
-ENOSPC problems.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 75e7fa337e66..ae12a8a9cd12 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -525,10 +525,11 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 	list_for_each_entry(cache, &info->block_groups[index], list) {
 		spin_lock(&cache->lock);
 		btrfs_info(fs_info,
-			"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s",
-			cache->start, cache->length, cache->used, cache->pinned,
-			cache->reserved, cache->zone_unusable,
-			cache->ro ? "[readonly]" : "");
+"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu delalloc %llu super %llu zone_unusable %s",
+			   cache->start, cache->length, cache->used, cache->pinned,
+			   cache->reserved, cache->delalloc_bytes,
+			   cache->bytes_super, cache->zone_unusable,
+			   cache->ro ? "[readonly]" : "");
 		spin_unlock(&cache->lock);
 		btrfs_dump_free_space(cache, bytes);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 05/17] btrfs: print available space for a block group when dumping a space info
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (3 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 04/17] btrfs: print block group super and delalloc bytes when dumping space info fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 06/17] btrfs: print available space across all block groups when dumping " fdmanana
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When dumping a space info, we iterate over all its block groups and then
print their size and the amounts of bytes used, reserved, pinned, etc.
When debugging -ENOSPC problems it's also useful to know how much space
is available (free), so calculate that and print it as well.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index ae12a8a9cd12..54d78e839c01 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -523,13 +523,18 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 	down_read(&info->groups_sem);
 again:
 	list_for_each_entry(cache, &info->block_groups[index], list) {
+		u64 avail;
+
 		spin_lock(&cache->lock);
+		avail = cache->length - cache->used - cache->pinned -
+			cache->reserved - cache->delalloc_bytes -
+			cache->bytes_super - cache->zone_unusable;
 		btrfs_info(fs_info,
-"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu delalloc %llu super %llu zone_unusable %s",
+"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu delalloc %llu super %llu zone_unusable (%llu bytes available) %s",
 			   cache->start, cache->length, cache->used, cache->pinned,
 			   cache->reserved, cache->delalloc_bytes,
 			   cache->bytes_super, cache->zone_unusable,
-			   cache->ro ? "[readonly]" : "");
+			   avail, cache->ro ? "[readonly]" : "");
 		spin_unlock(&cache->lock);
 		btrfs_dump_free_space(cache, bytes);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 06/17] btrfs: print available space across all block groups when dumping space info
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (4 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 05/17] btrfs: print available space for a block group when dumping a " fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 07/17] btrfs: don't steal space from global rsv after a transaction abort fdmanana
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When dumping a space info also sum the available space for all block
groups and then print it. This often useful for debugging -ENOSPC
related problems.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 54d78e839c01..de0044283e29 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -510,6 +510,7 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 			   int dump_block_groups)
 {
 	struct btrfs_block_group *cache;
+	u64 total_avail = 0;
 	int index = 0;
 
 	spin_lock(&info->lock);
@@ -537,10 +538,13 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 			   avail, cache->ro ? "[readonly]" : "");
 		spin_unlock(&cache->lock);
 		btrfs_dump_free_space(cache, bytes);
+		total_avail += avail;
 	}
 	if (++index < BTRFS_NR_RAID_TYPES)
 		goto again;
 	up_read(&info->groups_sem);
+
+	btrfs_info(fs_info, "%llu bytes available across all block groups", total_avail);
 }
 
 static inline u64 calc_reclaim_items_nr(const struct btrfs_fs_info *fs_info,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 07/17] btrfs: don't steal space from global rsv after a transaction abort
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (5 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 06/17] btrfs: print available space across all block groups when dumping " fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 08/17] btrfs: store the error that turned the fs into error state fdmanana
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When doing a priority metadata space reclaim, while we are going through
the flush states and running their respective operations, it's possible
that a transaction abort happened, for example when running delayed refs
we hit -ENOSPC or in the critical section of transaction commit we failed
with -ENOSPC or some other error. In these cases a transaction was aborted
and the fs turned into error state. If that happened, then it makes no
sense to steal from the global block reserve and return success to the
caller if the stealing was successful - the caller will later get an
error when attempting to modify the fs. Instead make the ticket fail if
we have the fs in error state and don't attempt to steal from the global
rsv, as it's not only it's pointless, it also simplifies debugging some
-ENOSPC problems.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index de0044283e29..5b1b71e029ad 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1418,8 +1418,18 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 		}
 	}
 
-	/* Attempt to steal from the global rsv if we can. */
-	if (!steal_from_global_rsv(fs_info, space_info, ticket)) {
+	/*
+	 * Attempt to steal from the global rsv if we can, except if the fs was
+	 * turned into error mode due to a transaction abort when flushing space
+	 * above, in that case fail with -EROFS instead of returning success to
+	 * the caller if we can steal from the global rsv - this is just to have
+	 * caller fail immeditelly instead of later when trying to modify the
+	 * fs, making it easier to debug -ENOSPC problems.
+	 */
+	if (BTRFS_FS_ERROR(fs_info)) {
+		ticket->error = -EROFS;
+		remove_ticket(space_info, ticket);
+	} else if (!steal_from_global_rsv(fs_info, space_info, ticket)) {
 		ticket->error = -ENOSPC;
 		remove_ticket(space_info, ticket);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 08/17] btrfs: store the error that turned the fs into error state
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (6 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 07/17] btrfs: don't steal space from global rsv after a transaction abort fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 09/17] btrfs: return real error when orphan cleanup fails due to a transaction abort fdmanana
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Currently when we turn the fs into an error state, typically after a
transaction abort, we don't store the error anywhere, we just set a bit
(BTRFS_FS_STATE_ERROR) at struct btrfs_fs_info::fs_state to signal the
error state.

There are cases where it would be useful to have access to the specific
error in order to provide a more meaningful error to users/applications.
This change adds a member to struct btrfs_fs_info to store the error and
removes the BTRFS_FS_STATE_ERROR bit. When there's no error, the new
member (fs_error) has a value of 0, otherwise its value is a negative
errno value.

Followup changes will make use of this new member.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c  |  2 +-
 fs/btrfs/fs.h       | 12 ++++++++----
 fs/btrfs/messages.c | 10 +++++++---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b4495d4c1533..cad79d4eecdf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3211,7 +3211,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	/* check FS state, whether FS is broken. */
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
-		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
+		WRITE_ONCE(fs_info->fs_error, -EUCLEAN);
 
 	/*
 	 * In the long term, we'll store the compression type in the super
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..ef07c6c252d8 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -46,8 +46,6 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
  * Runtime (in-memory) states of filesystem
  */
 enum {
-	/* Global indicator of serious filesystem errors */
-	BTRFS_FS_STATE_ERROR,
 	/*
 	 * Filesystem is being remounted, allow to skip some operations, like
 	 * defrag
@@ -686,6 +684,12 @@ struct btrfs_fs_info {
 	bool qgroup_rescan_running;
 	u8 qgroup_drop_subtree_thres;
 
+	/*
+	 * If this is not 0, then it indicates a serious filesystem error has
+	 * happened and it contains that error (negative errno value).
+	 */
+	int fs_error;
+
 	/* Filesystem state */
 	unsigned long fs_state;
 
@@ -962,8 +966,8 @@ static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
 	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
 }
 
-#define BTRFS_FS_ERROR(fs_info)	(unlikely(test_bit(BTRFS_FS_STATE_ERROR, \
-						   &(fs_info)->fs_state)))
+#define BTRFS_FS_ERROR(fs_info)	(READ_ONCE((fs_info)->fs_error))
+
 #define BTRFS_FS_LOG_CLEANUP_ERROR(fs_info)				\
 	(unlikely(test_bit(BTRFS_FS_STATE_LOG_CLEANUP_ERROR,		\
 			   &(fs_info)->fs_state)))
diff --git a/fs/btrfs/messages.c b/fs/btrfs/messages.c
index 23fc11af498a..e3c9d2706341 100644
--- a/fs/btrfs/messages.c
+++ b/fs/btrfs/messages.c
@@ -10,14 +10,13 @@
 #ifdef CONFIG_PRINTK
 
 #define STATE_STRING_PREFACE	": state "
-#define STATE_STRING_BUF_LEN	(sizeof(STATE_STRING_PREFACE) + BTRFS_FS_STATE_COUNT)
+#define STATE_STRING_BUF_LEN	(sizeof(STATE_STRING_PREFACE) + BTRFS_FS_STATE_COUNT + 1)
 
 /*
  * Characters to print to indicate error conditions or uncommon filesystem state.
  * RO is not an error.
  */
 static const char fs_state_chars[] = {
-	[BTRFS_FS_STATE_ERROR]			= 'E',
 	[BTRFS_FS_STATE_REMOUNTING]		= 'M',
 	[BTRFS_FS_STATE_RO]			= 0,
 	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
@@ -37,6 +36,11 @@ static void btrfs_state_to_string(const struct btrfs_fs_info *info, char *buf)
 	memcpy(curr, STATE_STRING_PREFACE, sizeof(STATE_STRING_PREFACE));
 	curr += sizeof(STATE_STRING_PREFACE) - 1;
 
+	if (BTRFS_FS_ERROR(info)) {
+		*curr++ = 'E';
+		states_printed = true;
+	}
+
 	for_each_set_bit(bit, &fs_state, sizeof(fs_state)) {
 		WARN_ON_ONCE(bit >= BTRFS_FS_STATE_COUNT);
 		if ((bit < BTRFS_FS_STATE_COUNT) && fs_state_chars[bit]) {
@@ -155,7 +159,7 @@ void __btrfs_handle_fs_error(struct btrfs_fs_info *fs_info, const char *function
 	 * Today we only save the error info to memory.  Long term we'll also
 	 * send it down to the disk.
 	 */
-	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
+	WRITE_ONCE(fs_info->fs_error, errno);
 
 	/* Don't go through full error handling during mount. */
 	if (!(sb->s_flags & SB_BORN))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 09/17] btrfs: return real error when orphan cleanup fails due to a transaction abort
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (7 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 08/17] btrfs: store the error that turned the fs into error state fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 10/17] btrfs: fail priority metadata ticket with real fs error fdmanana
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

During mount we will call btrfs_orphan_cleanup() to remove any inodes that
were previously deleted (have a link count of 0) but for which we were not
able before to remove their items from the subvolume tree. The removal of
the items will happen by triggering eviction, when we do the final iput()
on them at btrfs_orphan_cleanup(), which will end in the loop at
btrfs_evict_inode() that truncates inode items.

In a dire situation we may have a transaction abort due to -ENOSPC when
attempting to truncate the inode items, and in that case the orphan item
(key type BTRFS_ORPHAN_ITEM_KEY) will remain in the subvolume tree and
when we hit the next iteration of the while loop at btrfs_orphan_cleanup()
we will find the same orphan item as before, and then we will return
-EINVAL from btrfs_orphan_cleanup() through the following if statement:

    if (found_key.offset == last_objectid) {
       btrfs_err(fs_info,
                 "Error removing orphan entry, stopping orphan cleanup");
       ret = -EINVAL;
       goto out;
    }

This makes the mount operation fail with -EINVAL, when it should have been
-ENOSPC. This is confusing because -EINVAL might lead a user into thinking
it provided invalid mount options for example.

An example where this happens:

   $ mount test.img /mnt
   mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

   $ dmesg
   [ 2542.356934] BTRFS: device fsid 977fff75-1181-4d2b-a739-384fa710d16e devid 1 transid 47409973 /dev/loop0 scanned by mount (4459)
   [ 2542.357451] BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
   [ 2542.357461] BTRFS info (device loop0): disk space caching is enabled
   [ 2542.742287] BTRFS info (device loop0): auto enabling async discard
   [ 2542.764554] BTRFS info (device loop0): checking UUID tree
   [ 2551.743065] ------------[ cut here ]------------
   [ 2551.743068] BTRFS: Transaction aborted (error -28)
   [ 2551.743149] WARNING: CPU: 7 PID: 215 at fs/btrfs/block-group.c:3494 btrfs_write_dirty_block_groups+0x397/0x3d0 [btrfs]
   [ 2551.743311] Modules linked in: btrfs blake2b_generic (...)
   [ 2551.743353] CPU: 7 PID: 215 Comm: kworker/u24:5 Not tainted 6.4.0-rc6-btrfs-next-134+ #1
   [ 2551.743356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
   [ 2551.743357] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
   [ 2551.743405] RIP: 0010:btrfs_write_dirty_block_groups+0x397/0x3d0 [btrfs]
   [ 2551.743449] Code: 8b 43 0c (...)
   [ 2551.743451] RSP: 0018:ffff982c005a7c40 EFLAGS: 00010286
   [ 2551.743452] RAX: 0000000000000000 RBX: ffff88fc6e44b400 RCX: 0000000000000000
   [ 2551.743453] RDX: 0000000000000002 RSI: ffffffff8dff0878 RDI: 00000000ffffffff
   [ 2551.743454] RBP: ffff88fc51817208 R08: 0000000000000000 R09: ffff982c005a7ae0
   [ 2551.743455] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88fc43d2e570
   [ 2551.743456] R13: ffff88fc43d2e400 R14: ffff88fc8fb08ee0 R15: ffff88fc6e44b530
   [ 2551.743457] FS:  0000000000000000(0000) GS:ffff89035fbc0000(0000) knlGS:0000000000000000
   [ 2551.743458] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [ 2551.743459] CR2: 00007fa8cdf2f6f4 CR3: 0000000124850003 CR4: 0000000000370ee0
   [ 2551.743462] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [ 2551.743463] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [ 2551.743464] Call Trace:
   [ 2551.743472]  <TASK>
   [ 2551.743474]  ? __warn+0x80/0x130
   [ 2551.743478]  ? btrfs_write_dirty_block_groups+0x397/0x3d0 [btrfs]
   [ 2551.743520]  ? report_bug+0x1f4/0x200
   [ 2551.743523]  ? handle_bug+0x42/0x70
   [ 2551.743526]  ? exc_invalid_op+0x14/0x70
   [ 2551.743528]  ? asm_exc_invalid_op+0x16/0x20
   [ 2551.743532]  ? btrfs_write_dirty_block_groups+0x397/0x3d0 [btrfs]
   [ 2551.743574]  ? _raw_spin_unlock+0x15/0x30
   [ 2551.743576]  ? btrfs_run_delayed_refs+0x1bd/0x200 [btrfs]
   [ 2551.743609]  commit_cowonly_roots+0x1e9/0x260 [btrfs]
   [ 2551.743652]  btrfs_commit_transaction+0x42e/0xfa0 [btrfs]
   [ 2551.743693]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 2551.743697]  flush_space+0xf1/0x5d0 [btrfs]
   [ 2551.743743]  ? _raw_spin_unlock+0x15/0x30
   [ 2551.743745]  ? finish_task_switch+0x91/0x2a0
   [ 2551.743748]  ? _raw_spin_unlock+0x15/0x30
   [ 2551.743750]  ? btrfs_get_alloc_profile+0xc9/0x1f0 [btrfs]
   [ 2551.743793]  btrfs_async_reclaim_metadata_space+0xe1/0x230 [btrfs]
   [ 2551.743837]  process_one_work+0x1d9/0x3e0
   [ 2551.743844]  worker_thread+0x4a/0x3b0
   [ 2551.743847]  ? __pfx_worker_thread+0x10/0x10
   [ 2551.743849]  kthread+0xee/0x120
   [ 2551.743852]  ? __pfx_kthread+0x10/0x10
   [ 2551.743854]  ret_from_fork+0x29/0x50
   [ 2551.743860]  </TASK>
   [ 2551.743861] ---[ end trace 0000000000000000 ]---
   [ 2551.743863] BTRFS info (device loop0: state A): dumping space info:
   [ 2551.743866] BTRFS info (device loop0: state A): space_info DATA has 126976 free, is full
   [ 2551.743868] BTRFS info (device loop0: state A): space_info total=13458472960, used=13458137088, pinned=143360, reserved=0, may_use=0, readonly=65536 zone_unusable=0
   [ 2551.743870] BTRFS info (device loop0: state A): space_info METADATA has -51625984 free, is full
   [ 2551.743872] BTRFS info (device loop0: state A): space_info total=771751936, used=770146304, pinned=1605632, reserved=0, may_use=51625984, readonly=0 zone_unusable=0
   [ 2551.743874] BTRFS info (device loop0: state A): space_info SYSTEM has 14663680 free, is not full
   [ 2551.743875] BTRFS info (device loop0: state A): space_info total=14680064, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
   [ 2551.743877] BTRFS info (device loop0: state A): global_block_rsv: size 53231616 reserved 51544064
   [ 2551.743878] BTRFS info (device loop0: state A): trans_block_rsv: size 0 reserved 0
   [ 2551.743879] BTRFS info (device loop0: state A): chunk_block_rsv: size 0 reserved 0
   [ 2551.743880] BTRFS info (device loop0: state A): delayed_block_rsv: size 0 reserved 0
   [ 2551.743881] BTRFS info (device loop0: state A): delayed_refs_rsv: size 786432 reserved 0
   [ 2551.743886] BTRFS: error (device loop0: state A) in btrfs_write_dirty_block_groups:3494: errno=-28 No space left
   [ 2551.743911] BTRFS info (device loop0: state EA): forced readonly
   [ 2551.743951] BTRFS warning (device loop0: state EA): could not allocate space for delete; will truncate on mount
   [ 2551.743962] BTRFS error (device loop0: state EA): Error removing orphan entry, stopping orphan cleanup
   [ 2551.743973] BTRFS warning (device loop0: state EA): Skipping commit of aborted transaction.
   [ 2551.743989] BTRFS error (device loop0: state EA): could not do orphan cleanup -22

So make the btrfs_orphan_cleanup() return the value of BTRFS_FS_ERROR(),
if it's set, and -EINVAL otherwise.

For that same example, after this change, the mount operation fails with
-ENOSPC:

   $ mount test.img /mnt
   mount: /mnt: mount(2) system call failed: No space left on device.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/inode.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6daaa4fd69f2..c268c5861a24 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3668,9 +3668,16 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		 */
 
 		if (found_key.offset == last_objectid) {
+			/*
+			 * We found the same inode as before. This means we were
+			 * not able to remove its items via eviction triggered
+			 * by an iput(). A transaction abort may have happened,
+			 * due to -ENOSPC for example, so try to grab the error
+			 * that lead to a transaction abort, if any.
+			 */
 			btrfs_err(fs_info,
 				  "Error removing orphan entry, stopping orphan cleanup");
-			ret = -EINVAL;
+			ret = BTRFS_FS_ERROR(fs_info) ?: -EINVAL;
 			goto out;
 		}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/17] btrfs: fail priority metadata ticket with real fs error
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (8 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 09/17] btrfs: return real error when orphan cleanup fails due to a transaction abort fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 11/17] btrfs: make btrfs_cleanup_fs_roots() static fdmanana
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At priority_reclaim_metadata_space(), if we were not able to satisfy the
the ticket after going through the various flushing states and we notice
the fs went into an error state, likely due to a transaction abort during
the flushing, set the ticket's error to the error that caused the
transaction abort instead of an unconditional -EROFS.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 5b1b71e029ad..be5ce209b918 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1421,13 +1421,13 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 	/*
 	 * Attempt to steal from the global rsv if we can, except if the fs was
 	 * turned into error mode due to a transaction abort when flushing space
-	 * above, in that case fail with -EROFS instead of returning success to
-	 * the caller if we can steal from the global rsv - this is just to have
-	 * caller fail immeditelly instead of later when trying to modify the
-	 * fs, making it easier to debug -ENOSPC problems.
+	 * above, in that case fail with the abort error instead of returning
+	 * success to the caller if we can steal from the global rsv - this is
+	 * just to have caller fail immeditelly instead of later when trying to
+	 * modify the fs, making it easier to debug -ENOSPC problems.
 	 */
 	if (BTRFS_FS_ERROR(fs_info)) {
-		ticket->error = -EROFS;
+		ticket->error = BTRFS_FS_ERROR(fs_info);
 		remove_ticket(space_info, ticket);
 	} else if (!steal_from_global_rsv(fs_info, space_info, ticket)) {
 		ticket->error = -ENOSPC;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/17] btrfs: make btrfs_cleanup_fs_roots() static
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (9 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 10/17] btrfs: fail priority metadata ticket with real fs error fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 12/17] btrfs: make find_free_dev_extent() static fdmanana
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

btrfs_cleanup_fs_roots() is not used outside disk-io.c, so make it static,
remove its prototype from disk-io.h and move its definition above the
where it's used in disk-io.c

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c | 100 ++++++++++++++++++++++-----------------------
 fs/btrfs/disk-io.h |   1 -
 2 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cad79d4eecdf..da51e5750443 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2858,6 +2858,56 @@ static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info)
+{
+	u64 root_objectid = 0;
+	struct btrfs_root *gang[8];
+	int i = 0;
+	int err = 0;
+	unsigned int ret = 0;
+
+	while (1) {
+		spin_lock(&fs_info->fs_roots_radix_lock);
+		ret = radix_tree_gang_lookup(&fs_info->fs_roots_radix,
+					     (void **)gang, root_objectid,
+					     ARRAY_SIZE(gang));
+		if (!ret) {
+			spin_unlock(&fs_info->fs_roots_radix_lock);
+			break;
+		}
+		root_objectid = gang[ret - 1]->root_key.objectid + 1;
+
+		for (i = 0; i < ret; i++) {
+			/* Avoid to grab roots in dead_roots. */
+			if (btrfs_root_refs(&gang[i]->root_item) == 0) {
+				gang[i] = NULL;
+				continue;
+			}
+			/* Grab all the search result for later use. */
+			gang[i] = btrfs_grab_root(gang[i]);
+		}
+		spin_unlock(&fs_info->fs_roots_radix_lock);
+
+		for (i = 0; i < ret; i++) {
+			if (!gang[i])
+				continue;
+			root_objectid = gang[i]->root_key.objectid;
+			err = btrfs_orphan_cleanup(gang[i]);
+			if (err)
+				goto out;
+			btrfs_put_root(gang[i]);
+		}
+		root_objectid++;
+	}
+out:
+	/* Release the uncleaned roots due to error. */
+	for (; i < ret; i++) {
+		if (gang[i])
+			btrfs_put_root(gang[i]);
+	}
+	return err;
+}
+
 /*
  * Some options only have meaning at mount time and shouldn't persist across
  * remounts, or be displayed. Clear these at the end of mount and remount
@@ -4125,56 +4175,6 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
 		btrfs_put_root(root);
 }
 
-int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info)
-{
-	u64 root_objectid = 0;
-	struct btrfs_root *gang[8];
-	int i = 0;
-	int err = 0;
-	unsigned int ret = 0;
-
-	while (1) {
-		spin_lock(&fs_info->fs_roots_radix_lock);
-		ret = radix_tree_gang_lookup(&fs_info->fs_roots_radix,
-					     (void **)gang, root_objectid,
-					     ARRAY_SIZE(gang));
-		if (!ret) {
-			spin_unlock(&fs_info->fs_roots_radix_lock);
-			break;
-		}
-		root_objectid = gang[ret - 1]->root_key.objectid + 1;
-
-		for (i = 0; i < ret; i++) {
-			/* Avoid to grab roots in dead_roots */
-			if (btrfs_root_refs(&gang[i]->root_item) == 0) {
-				gang[i] = NULL;
-				continue;
-			}
-			/* grab all the search result for later use */
-			gang[i] = btrfs_grab_root(gang[i]);
-		}
-		spin_unlock(&fs_info->fs_roots_radix_lock);
-
-		for (i = 0; i < ret; i++) {
-			if (!gang[i])
-				continue;
-			root_objectid = gang[i]->root_key.objectid;
-			err = btrfs_orphan_cleanup(gang[i]);
-			if (err)
-				goto out;
-			btrfs_put_root(gang[i]);
-		}
-		root_objectid++;
-	}
-out:
-	/* release the uncleaned roots due to error */
-	for (; i < ret; i++) {
-		if (gang[i])
-			btrfs_put_root(gang[i]);
-	}
-	return err;
-}
-
 int btrfs_commit_super(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *root = fs_info->tree_root;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index b03767f4d7ed..02b645744a82 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -77,7 +77,6 @@ struct btrfs_root *btrfs_extent_root(struct btrfs_fs_info *fs_info, u64 bytenr);
 struct btrfs_root *btrfs_block_group_root(struct btrfs_fs_info *fs_info);
 
 void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
-int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
 void btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info);
 void btrfs_btree_balance_dirty_nodelay(struct btrfs_fs_info *fs_info);
 void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 12/17] btrfs: make find_free_dev_extent() static
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (10 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 11/17] btrfs: make btrfs_cleanup_fs_roots() static fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 13/17] btrfs: merge find_free_dev_extent() and find_free_dev_extent_start() fdmanana
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

The function find_free_dev_extent() is only used within volumes.c, so make
it static and remove its prototype from volumes.h.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/volumes.c | 4 ++--
 fs/btrfs/volumes.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9df526c57c3a..e15d02f8520d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1725,8 +1725,8 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	return ret;
 }
 
-int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
-			 u64 *start, u64 *len)
+static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
+				u64 *start, u64 *len)
 {
 	/* FIXME use last free of some kind */
 	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b8c51f16ba86..a59898d51e9e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -650,8 +650,6 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
 int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_uuid_scan_kthread(void *data);
 bool btrfs_chunk_writeable(struct btrfs_fs_info *fs_info, u64 chunk_offset);
-int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
-			 u64 *start, u64 *max_avail);
 void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index);
 int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
 			struct btrfs_ioctl_get_dev_stats *stats);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 13/17] btrfs: merge find_free_dev_extent() and find_free_dev_extent_start()
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (11 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 12/17] btrfs: make find_free_dev_extent() static fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 14/17] btrfs: avoid starting new transaction when flushing delayed items and refs fdmanana
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There is no point in having find_free_dev_extent() because it's just a
simple wrapper around find_free_dev_extent_start() which always passes a
value of 0 for the search_start argument. Since there are no other callers
of find_free_dev_extent_start(), remove find_free_dev_extent() and rename
find_free_dev_extent_start() to find_free_dev_extent(), removing its
search_start argument because it's always 0.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/volumes.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e15d02f8520d..6b044d61ef13 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1438,18 +1438,18 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 	return false;
 }
 
-static u64 dev_extent_search_start(struct btrfs_device *device, u64 start)
+static u64 dev_extent_search_start(struct btrfs_device *device)
 {
 	switch (device->fs_devices->chunk_alloc_policy) {
 	case BTRFS_CHUNK_ALLOC_REGULAR:
-		return max_t(u64, start, BTRFS_DEVICE_RANGE_RESERVED);
+		return BTRFS_DEVICE_RANGE_RESERVED;
 	case BTRFS_CHUNK_ALLOC_ZONED:
 		/*
 		 * We don't care about the starting region like regular
 		 * allocator, because we anyway use/reserve the first two zones
 		 * for superblock logging.
 		 */
-		return ALIGN(start, device->zone_info->zone_size);
+		return 0;
 	default:
 		BUG();
 	}
@@ -1581,15 +1581,15 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
  * correct usable device space, as device extent freed in current transaction
  * is not reported as available.
  */
-static int find_free_dev_extent_start(struct btrfs_device *device,
-				u64 num_bytes, u64 search_start, u64 *start,
-				u64 *len)
+static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
+				u64 *start, u64 *len)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct btrfs_root *root = fs_info->dev_root;
 	struct btrfs_key key;
 	struct btrfs_dev_extent *dev_extent;
 	struct btrfs_path *path;
+	u64 search_start;
 	u64 hole_size;
 	u64 max_hole_start;
 	u64 max_hole_size;
@@ -1599,7 +1599,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	int slot;
 	struct extent_buffer *l;
 
-	search_start = dev_extent_search_start(device, search_start);
+	search_start = dev_extent_search_start(device);
 
 	WARN_ON(device->zone_info &&
 		!IS_ALIGNED(num_bytes, device->zone_info->zone_size));
@@ -1725,13 +1725,6 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 	return ret;
 }
 
-static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
-				u64 *start, u64 *len)
-{
-	/* FIXME use last free of some kind */
-	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
-}
-
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
 			  struct btrfs_device *device,
 			  u64 start, u64 *dev_extent_len)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 14/17] btrfs: avoid starting new transaction when flushing delayed items and refs
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (12 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 13/17] btrfs: merge find_free_dev_extent() and find_free_dev_extent_start() fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 15/17] btrfs: avoid starting and committing empty transaction when flushing space fdmanana
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When flushing space we join a transaction to flush delayed items and
delayed references, in order to try to release space. However using
btrfs_join_transaction() not only joins an existing transaction as well
as it starts a new transaction if there is none open. If there is no
transaction open, we don't have neither delayed items nor delayed
references, so creating a new transaction is a waste of time, IO and
creates an unnecessary rotation of the backup roots without gaining any
benefits (including releasing space).

So use btrfs_join_transaction_nostart() when attempting to flush delayed
items and references.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index be5ce209b918..2db92a201697 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -725,9 +725,11 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		else
 			nr = -1;
 
-		trans = btrfs_join_transaction(root);
+		trans = btrfs_join_transaction_nostart(root);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
+			if (ret == -ENOENT)
+				ret = 0;
 			break;
 		}
 		ret = btrfs_run_delayed_items_nr(trans, nr);
@@ -743,9 +745,11 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case FLUSH_DELAYED_REFS_NR:
 	case FLUSH_DELAYED_REFS:
-		trans = btrfs_join_transaction(root);
+		trans = btrfs_join_transaction_nostart(root);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
+			if (ret == -ENOENT)
+				ret = 0;
 			break;
 		}
 		if (state == FLUSH_DELAYED_REFS_NR)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 15/17] btrfs: avoid starting and committing empty transaction when flushing space
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (13 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 14/17] btrfs: avoid starting new transaction when flushing delayed items and refs fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 16/17] btrfs: avoid start and commit empty transaction when starting qgroup rescan fdmanana
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When flushing space and we are in the COMMIT_TRANS state, we join a
transaction with btrfs_join_transaction() and then commit the returned
transaction. However btrfs_join_transaction() starts a new transaction if
there is none currently open, which is pointless since comitting a new,
empty transaction, doesn't achieve anything, it only wastes time, IO and
creates an unnecessary rotation of the backup roots.

So use btrfs_attach_transaction_barrier() to avoid starting a new
transaction. This also waits for any ongoing transaction that is
committing (state >= TRANS_STATE_COMMIT_DOING) to fully complete, and
therefore wait for all the extents that were pinned during the
transaction's lifetime to be unpinned.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/space-info.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2db92a201697..17c86db7b1b1 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -814,9 +814,18 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case COMMIT_TRANS:
 		ASSERT(current->journal_info == NULL);
-		trans = btrfs_join_transaction(root);
+		/*
+		 * We don't want to start a new transaction, just attach to the
+		 * current one or wait it fully commits in case its commit is
+		 * happening at the moment. Note: we don't use a nostart join
+		 * because that does not wait for a transaction to fully commit
+		 * (only for it to be unblocked, state TRANS_STATE_UNBLOCKED).
+		 */
+		trans = btrfs_attach_transaction_barrier(root);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
+			if (ret == -ENOENT)
+				ret = 0;
 			break;
 		}
 		ret = btrfs_commit_transaction(trans);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 16/17] btrfs: avoid start and commit empty transaction when starting qgroup rescan
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (14 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 15/17] btrfs: avoid starting and committing empty transaction when flushing space fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-07-26 15:57 ` [PATCH 17/17] btrfs: avoid start and commit empty transaction when flushing qgroups fdmanana
  2023-08-17 12:58 ` [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling David Sterba
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When starting a qgroup rescan, we try to join a running transaction, with
btrfs_join_transaction(), and then commit the transaction. However using
btrfs_join_transaction() will result in creating a new transaction in case
there isn't any running or if there's an existing one already committing.
This is pointless as we only need to attach to an existing one that is
not committing and in case there's an existing one committing, wait for
its commit to complete. Creating and committing an empty transaction is
wasteful, pointless IO and unnecessary rotation of the backup roots.

So use btrfs_attach_transaction_barrier() instead, to avoid creating and
committing empty transactions.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 2637d6b157ff..1ef60ea8f0bc 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3590,15 +3590,16 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 	 * going to clear all tracking information for a clean start.
 	 */
 
-	trans = btrfs_join_transaction(fs_info->fs_root);
-	if (IS_ERR(trans)) {
+	trans = btrfs_attach_transaction_barrier(fs_info->fs_root);
+	if (IS_ERR(trans) && trans != ERR_PTR(-ENOENT)) {
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
 		return PTR_ERR(trans);
-	}
-	ret = btrfs_commit_transaction(trans);
-	if (ret) {
-		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
-		return ret;
+	} else if (trans != ERR_PTR(-ENOENT)) {
+		ret = btrfs_commit_transaction(trans);
+		if (ret) {
+			fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+			return ret;
+		}
 	}
 
 	qgroup_rescan_zero_tracking(fs_info);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 17/17] btrfs: avoid start and commit empty transaction when flushing qgroups
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (15 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 16/17] btrfs: avoid start and commit empty transaction when starting qgroup rescan fdmanana
@ 2023-07-26 15:57 ` fdmanana
  2023-08-17 12:58 ` [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling David Sterba
  17 siblings, 0 replies; 19+ messages in thread
From: fdmanana @ 2023-07-26 15:57 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When flushing qgroups, we try to join a running transaction, with
btrfs_join_transaction(), and then commit the transaction. However using
btrfs_join_transaction() will result in creating a new transaction in case
there isn't any running or if there's an existing one already committing.
This is pointless as we only need to attach to an existing one that is
not committing and in case there's an existing one committing, wait for
its commit to complete. Creating and committing an empty transaction is
wasteful, pointless IO and unnecessary rotation of the backup roots.

So use btrfs_attach_transaction_barrier() instead, to avoid creating and
committing empty transactions.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1ef60ea8f0bc..b99230db3c82 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3758,9 +3758,11 @@ static int try_flush_qgroup(struct btrfs_root *root)
 		goto out;
 	btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);

-	trans = btrfs_join_transaction(root);
+	trans = btrfs_attach_transaction_barrier(root);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
+		if (ret == -ENOENT)
+			ret = 0;
 		goto out;
 	}

-- 
2.34.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling
  2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
                   ` (16 preceding siblings ...)
  2023-07-26 15:57 ` [PATCH 17/17] btrfs: avoid start and commit empty transaction when flushing qgroups fdmanana
@ 2023-08-17 12:58 ` David Sterba
  17 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2023-08-17 12:58 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Jul 26, 2023 at 04:56:56PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> A few fixes, improvements, cleanups around space flushing, enospc handling,
> debugging. These came out while debugging an image of a fs that fails to
> mount due to being out of unallocated space and having only about 1.6M of
> free metadata space, which is not enough to commit any transaction triggered
> during mount while doing orphan cleanup, resulting in transactions aborts
> either when running delayed refs or somewhere in the critical section of a
> transaction commit. These patches do not prevent getting into that situation,
> but that will be attempted by other patches in a separate patchset.
> 
> Filipe Manana (17):
>   btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
>   btrfs: update comment for btrfs_join_transaction_nostart()
>   btrfs: print target number of bytes when dumping free space
>   btrfs: print block group super and delalloc bytes when dumping space info
>   btrfs: print available space for a block group when dumping a space info
>   btrfs: print available space across all block groups when dumping space info
>   btrfs: don't steal space from global rsv after a transaction abort
>   btrfs: store the error that turned the fs into error state
>   btrfs: return real error when orphan cleanup fails due to a transaction abort
>   btrfs: fail priority metadata ticket with real fs error
>   btrfs: make btrfs_cleanup_fs_roots() static
>   btrfs: make find_free_dev_extent() static
>   btrfs: merge find_free_dev_extent() and find_free_dev_extent_start()
>   btrfs: avoid starting new transaction when flushing delayed items and refs
>   btrfs: avoid starting and committing empty transaction when flushing space
>   btrfs: avoid start and commit empty transaction when starting qgroup rescan
>   btrfs: avoid start and commit empty transaction when flushing qgroups

Looks like I forgot to reply, the patchset has been in misc-next for
some time. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-08-17 13:07 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-26 15:56 [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling fdmanana
2023-07-26 15:56 ` [PATCH 01/17] btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART fdmanana
2023-07-26 15:56 ` [PATCH 02/17] btrfs: update comment for btrfs_join_transaction_nostart() fdmanana
2023-07-26 15:56 ` [PATCH 03/17] btrfs: print target number of bytes when dumping free space fdmanana
2023-07-26 15:57 ` [PATCH 04/17] btrfs: print block group super and delalloc bytes when dumping space info fdmanana
2023-07-26 15:57 ` [PATCH 05/17] btrfs: print available space for a block group when dumping a " fdmanana
2023-07-26 15:57 ` [PATCH 06/17] btrfs: print available space across all block groups when dumping " fdmanana
2023-07-26 15:57 ` [PATCH 07/17] btrfs: don't steal space from global rsv after a transaction abort fdmanana
2023-07-26 15:57 ` [PATCH 08/17] btrfs: store the error that turned the fs into error state fdmanana
2023-07-26 15:57 ` [PATCH 09/17] btrfs: return real error when orphan cleanup fails due to a transaction abort fdmanana
2023-07-26 15:57 ` [PATCH 10/17] btrfs: fail priority metadata ticket with real fs error fdmanana
2023-07-26 15:57 ` [PATCH 11/17] btrfs: make btrfs_cleanup_fs_roots() static fdmanana
2023-07-26 15:57 ` [PATCH 12/17] btrfs: make find_free_dev_extent() static fdmanana
2023-07-26 15:57 ` [PATCH 13/17] btrfs: merge find_free_dev_extent() and find_free_dev_extent_start() fdmanana
2023-07-26 15:57 ` [PATCH 14/17] btrfs: avoid starting new transaction when flushing delayed items and refs fdmanana
2023-07-26 15:57 ` [PATCH 15/17] btrfs: avoid starting and committing empty transaction when flushing space fdmanana
2023-07-26 15:57 ` [PATCH 16/17] btrfs: avoid start and commit empty transaction when starting qgroup rescan fdmanana
2023-07-26 15:57 ` [PATCH 17/17] btrfs: avoid start and commit empty transaction when flushing qgroups fdmanana
2023-08-17 12:58 ` [PATCH 00/17] btrfs: some misc stuff around space flushing and enospc handling David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).