* [PATCH 01/22] btrfs: remove duplicate error check at btrfs_clear_extent_bit_changeset()
2025-04-23 14:19 [PATCH 00/22] btrfs: some io tree optimizations and cleanups fdmanana
@ 2025-04-23 14:19 ` fdmanana
2025-04-23 14:19 ` [PATCH 02/22] btrfs: exit after state split error " fdmanana
` (21 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need to check twice whether split_state() returned an error.
Instead, unify the two checks into a single if statement after setting
'prealloc' to NULL, because on error split_state() frees the 'prealloc'
extent state record.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index cae3980f4291..dc8cd908a383 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -691,12 +691,11 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
if (!prealloc)
goto search_again;
err = split_state(tree, state, prealloc, start);
- if (err)
- extent_io_tree_panic(tree, state, "split", err);
-
prealloc = NULL;
- if (err)
+ if (err) {
+ extent_io_tree_panic(tree, state, "split", err);
goto out;
+ }
if (state->end <= end) {
state = clear_state_bit(tree, state, bits, wake, changeset);
goto next;
--
2.47.2
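The shape of the fix can be illustrated outside the kernel. Below is a minimal standalone sketch, where the hypothetical do_split() helper mirrors split_state()'s contract as described in the changelog (the callee consumes the preallocated record), and caller() shows the unified-check pattern the patch introduces:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a preallocated extent state record. */
struct state {
	int dummy;
};

/*
 * Hypothetical helper mirroring split_state()'s contract as described
 * above: the preallocated record is consumed (here simply freed) both
 * on success and on failure.
 */
static int do_split(struct state *prealloc, int fail)
{
	free(prealloc); /* consumed whether we succeed or fail */
	return fail ? -1 : 0;
}

/*
 * The caller pattern after the patch: clear the local pointer first,
 * then do one unified error check, so no path can double free or
 * reuse the consumed record.
 */
static int caller(int fail)
{
	struct state *prealloc = malloc(sizeof(*prealloc));
	int ret;

	if (!prealloc)
		return -1;
	ret = do_split(prealloc, fail);
	prealloc = NULL; /* consumed either way */
	if (ret)
		return ret; /* single check instead of two */
	return 0;
}
```

Since the pointer is cleared before the check, the error path and the success path agree on the record's ownership, which is what makes the single if statement safe.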
* [PATCH 02/22] btrfs: exit after state split error at btrfs_clear_extent_bit_changeset()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If split_state() returned an error we call extent_io_tree_panic(), which
will trigger a BUG() call. However if CONFIG_BUG is disabled, which is an
uncommon and exotic scenario, we fall through and hit a use-after-free
when calling clear_state_bit(), since the extent state record which the
local variable 'prealloc' points to was freed by split_state().
So jump to the label 'out' after calling extent_io_tree_panic() and set
the 'prealloc' pointer to NULL, since split_state() has already freed it
when it hit an error.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index dc8cd908a383..b3811bbbd53b 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -712,8 +712,11 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
if (!prealloc)
goto search_again;
err = split_state(tree, state, prealloc, end + 1);
- if (err)
+ if (err) {
extent_io_tree_panic(tree, state, "split", err);
+ prealloc = NULL;
+ goto out;
+ }
if (wake)
wake_up(&state->wq);
--
2.47.2
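The control-flow hazard here is generic: a panic helper that can compile out to a no-op must not be treated as if it never returns. A minimal sketch, with a hypothetical pseudo_panic() standing in for extent_io_tree_panic() under CONFIG_BUG=n (the return values merely flag whether the dangerous path would be reached):

```c
/* With CONFIG_BUG=n, BUG() can expand to nothing and execution continues. */
static void pseudo_panic(void)
{
	/* no-op, like a compiled-out BUG() */
}

/* Returns 1 if code after the "panic" would run on the error path,
 * i.e. would touch state already freed by the callee. */
static int flow_before_patch(int err)
{
	if (err)
		pseudo_panic(); /* does not stop execution here */
	/* falls through: would operate on the freed extent state */
	return err ? 1 : 0;
}

static int flow_after_patch(int err)
{
	if (err) {
		pseudo_panic();
		return 0; /* explicit exit: freed state is never touched */
	}
	return 0;
}
```

With the explicit exit, correctness no longer depends on whether the panic helper actually halts the machine.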
* [PATCH 03/22] btrfs: add missing error return to btrfs_clear_extent_bit_changeset()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We have a couple of error branches where an error is stored in the 'err'
variable and we then jump to the 'out' label, however we don't return that
error, we just return 0. Normally this is not a problem since those error
branches call extent_io_tree_panic() which triggers a BUG() call, however
it's possible to have a rather exotic kernel config with CONFIG_BUG disabled,
in which case the BUG() call does nothing and we fall through. So make sure
to return the error, not just to fix that exotic case but also to make the
code less confusing. While at it also rename the 'err' variable to 'ret'
since this is the style we prefer and use more widely.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index b3811bbbd53b..35c8824add34 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -604,7 +604,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
struct extent_state *cached;
struct extent_state *prealloc = NULL;
u64 last_end;
- int err;
+ int ret = 0;
int clear = 0;
int wake;
int delete = (bits & EXTENT_CLEAR_ALL_BITS);
@@ -690,10 +690,10 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
prealloc = alloc_extent_state_atomic(prealloc);
if (!prealloc)
goto search_again;
- err = split_state(tree, state, prealloc, start);
+ ret = split_state(tree, state, prealloc, start);
prealloc = NULL;
- if (err) {
- extent_io_tree_panic(tree, state, "split", err);
+ if (ret) {
+ extent_io_tree_panic(tree, state, "split", ret);
goto out;
}
if (state->end <= end) {
@@ -711,9 +711,9 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
prealloc = alloc_extent_state_atomic(prealloc);
if (!prealloc)
goto search_again;
- err = split_state(tree, state, prealloc, end + 1);
- if (err) {
- extent_io_tree_panic(tree, state, "split", err);
+ ret = split_state(tree, state, prealloc, end + 1);
+ if (ret) {
+ extent_io_tree_panic(tree, state, "split", ret);
prealloc = NULL;
goto out;
}
@@ -748,7 +748,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
if (prealloc)
btrfs_free_extent_state(prealloc);
- return 0;
+ return ret;
}
--
2.47.2
* [PATCH 04/22] btrfs: use bools for local variables at btrfs_clear_extent_bit_changeset()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Several variables are defined as integers but used as booleans, and the
'delete' variable can be made const since it's not changed after being
declared. So change them to proper booleans and simplify setting their
values.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 35c8824add34..b56046c55097 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -605,9 +605,9 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
struct extent_state *prealloc = NULL;
u64 last_end;
int ret = 0;
- int clear = 0;
- int wake;
- int delete = (bits & EXTENT_CLEAR_ALL_BITS);
+ bool clear;
+ bool wake;
+ const bool delete = (bits & EXTENT_CLEAR_ALL_BITS);
gfp_t mask;
set_gfp_mask_from_bits(&bits, &mask);
@@ -620,9 +620,8 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
if (bits & EXTENT_DELALLOC)
bits |= EXTENT_NORESERVE;
- wake = ((bits & EXTENT_LOCK_BITS) ? 1 : 0);
- if (bits & (EXTENT_LOCK_BITS | EXTENT_BOUNDARY))
- clear = 1;
+ wake = (bits & EXTENT_LOCK_BITS);
+ clear = (bits & (EXTENT_LOCK_BITS | EXTENT_BOUNDARY));
again:
if (!prealloc) {
/*
--
2.47.2
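The simplification leans on C's bool conversion rule: assigning any scalar to a bool collapses every nonzero value to true, so the explicit `? 1 : 0` and the two-step `if`/assignment are unnecessary. A standalone sketch (the bit values below are illustrative only, not the kernel's actual definitions):

```c
#include <stdbool.h>

/* Illustrative values only, not the real btrfs bit definitions. */
#define EXTENT_LOCK_BITS (1u << 3)
#define EXTENT_BOUNDARY  (1u << 6)

static bool wake_from_bits(unsigned int bits)
{
	/* Any nonzero mask result collapses to true on assignment. */
	return bits & EXTENT_LOCK_BITS;
}

static bool clear_from_bits(unsigned int bits)
{
	return bits & (EXTENT_LOCK_BITS | EXTENT_BOUNDARY);
}
```

This is why `wake = (bits & EXTENT_LOCK_BITS);` in the patch is safe even though the mask result is not 0 or 1: the conversion to bool normalizes it.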
* [PATCH 05/22] btrfs: avoid extra tree search at btrfs_clear_extent_bit_changeset()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When we find an extent state that starts before our range's start we
split it and jump to the 'search_again' label with our start offset
unchanged, which sends us to the 'again' label to search again for an
extent state that contains the 'start' offset. This time the search
finds the same extent state, but with its start offset set to our
range's start offset (due to the split). We do this because we have
consumed the preallocated extent state record and may need to split
again, and by jumping to 'again' we release the spinlock, allocate a
new prealloc state and restart the search.
However we may not need to restart and allocate a new extent state:
there may be no extent states crossing our end offset, in which case
no further splits are needed, or we may be able to do an atomic
allocation (which is quick even if it fails). In these cases
restarting the search is a waste.
So change the behaviour to do the restart only if we need to reschedule,
otherwise fall through - if we need to allocate an extent state for split
operations, we will try an atomic allocation and if that fails we will do
the restart as before.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index b56046c55097..590682490763 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -699,7 +699,13 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
state = clear_state_bit(tree, state, bits, wake, changeset);
goto next;
}
- goto search_again;
+ if (need_resched())
+ goto search_again;
+ /*
+ * Fallthrough and try atomic extent state allocation if needed.
+ * If it fails we'll jump to 'search_again', retry the allocation
+ * in non-atomic mode and redo the search.
+ */
}
/*
* | ---- desired range ---- |
--
2.47.2
* [PATCH 06/22] btrfs: simplify last record detection at btrfs_clear_extent_bit_changeset()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Instead of checking for an end offset of (u64)-1 (U64_MAX) for the current
extent state's end, and then checking, after updating the current start
offset, if it's now beyond the range's end offset, we can simply stop if
the current extent state's end is greater than or equal to our range's
end offset. This removes one comparison under the 'next' label and
allows removing the if statement that checks if the start offset is
greater than the end offset under the 'search_again' label.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 590682490763..5484e2b80cfe 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -734,15 +734,13 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
state = clear_state_bit(tree, state, bits, wake, changeset);
next:
- if (last_end == (u64)-1)
+ if (last_end >= end)
goto out;
start = last_end + 1;
- if (start <= end && state && !need_resched())
+ if (state && !need_resched())
goto hit_next;
search_again:
- if (start > end)
- goto out;
spin_unlock(&tree->lock);
if (gfpflags_allow_blocking(mask))
cond_resched();
--
2.47.2
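The two exit conditions are equivalent, including at the wrap-around boundary: since 'start' is advanced to 'last_end + 1', the old pair of tests (special-case last_end == (u64)-1, then compare the advanced start against 'end') answers the same question as the single 'last_end >= end' test. A quick standalone check of the equivalence:

```c
#include <stdbool.h>
#include <stdint.h>

/* Old logic: special-case the maximal end offset first (since
 * last_end + 1 would wrap to 0), then compare the advanced start
 * offset against the range's end. */
static bool done_old(uint64_t last_end, uint64_t end)
{
	if (last_end == UINT64_MAX) /* (u64)-1 */
		return true;
	return (last_end + 1) > end;
}

/* New logic from the patch: one comparison, no overflow special case. */
static bool done_new(uint64_t last_end, uint64_t end)
{
	return last_end >= end;
}
```

For last_end < UINT64_MAX, `(last_end + 1) > end` is exactly `last_end >= end`, and for last_end == UINT64_MAX both paths report done, so the single comparison loses nothing.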
* [PATCH 07/22] btrfs: remove duplicate error check at btrfs_convert_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need to check twice whether split_state() returned an error.
Instead, unify the two checks into a single if statement after setting
'prealloc' to NULL, because on error split_state() frees the 'prealloc'
extent state record.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 5484e2b80cfe..8b765ae9085f 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1389,11 +1389,11 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
goto out;
}
ret = split_state(tree, state, prealloc, start);
- if (ret)
- extent_io_tree_panic(tree, state, "split", ret);
prealloc = NULL;
- if (ret)
+ if (ret) {
+ extent_io_tree_panic(tree, state, "split", ret);
goto out;
+ }
if (state->end <= end) {
set_state_bits(tree, state, bits, NULL);
cache_state(state, cached_state);
--
2.47.2
* [PATCH 08/22] btrfs: exit after state split error at btrfs_convert_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If split_state() returned an error we call extent_io_tree_panic(), which
will trigger a BUG() call. However if CONFIG_BUG is disabled, which is an
uncommon and exotic scenario, we fall through and hit a use-after-free
when calling set_state_bits(), since the extent state record which the
local variable 'prealloc' points to was freed by split_state().
So jump to the label 'out' after calling extent_io_tree_panic() and set
the 'prealloc' pointer to NULL, since split_state() has already freed it
when it hit an error.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 8b765ae9085f..80de8f96b08d 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1460,8 +1460,11 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
}
ret = split_state(tree, state, prealloc, end + 1);
- if (ret)
+ if (ret) {
extent_io_tree_panic(tree, state, "split", ret);
+ prealloc = NULL;
+ goto out;
+ }
set_state_bits(tree, prealloc, bits, NULL);
cache_state(prealloc, cached_state);
--
2.47.2
* [PATCH 09/22] btrfs: exit after state insertion failure at btrfs_convert_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If insert_state() fails it returns an error pointer and we call
extent_io_tree_panic() which will trigger a BUG() call. However if
CONFIG_BUG is disabled, which is an uncommon and exotic scenario, then
we fall through and call cache_state() which will dereference the error
pointer, resulting in an invalid memory access.
So jump to the 'out' label after calling extent_io_tree_panic(). This
also makes the code clearer, besides dealing with the exotic scenario
where CONFIG_BUG is disabled.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 80de8f96b08d..591111879aec 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1439,6 +1439,7 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (IS_ERR(inserted_state)) {
ret = PTR_ERR(inserted_state);
extent_io_tree_panic(tree, prealloc, "insert", ret);
+ goto out;
}
cache_state(inserted_state, cached_state);
if (inserted_state == prealloc)
--
2.47.2
* [PATCH 10/22] btrfs: avoid unnecessary next node searches when clearing bits from extent range
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When clearing bits for a range in an io tree, at clear_state_bit(), we
always go search for the next node (through next_state() -> rb_next()) and
return it. However if the current extent state record ends at or after the
target range passed to btrfs_clear_extent_bit_changeset() or
btrfs_convert_extent_bit(), we are just wasting time finding that next
node since we won't use it in those functions.
Improve on this by skipping the next node search if the current node ends
at or after the target range.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 41 ++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 591111879aec..bb829c5703f2 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -544,12 +544,9 @@ static int split_state(struct extent_io_tree *tree, struct extent_state *orig,
* If no bits are set on the state struct after clearing things, the
* struct is freed and removed from the tree
*/
-static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
- struct extent_state *state,
- u32 bits, int wake,
- struct extent_changeset *changeset)
+static void clear_state_bit(struct extent_io_tree *tree, struct extent_state *state,
+ u32 bits, int wake, struct extent_changeset *changeset)
{
- struct extent_state *next;
u32 bits_to_clear = bits & ~EXTENT_CTLBITS;
int ret;
@@ -562,7 +559,6 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
if (wake)
wake_up(&state->wq);
if (state->state == 0) {
- next = next_state(state);
if (extent_state_in_tree(state)) {
rb_erase(&state->rb_node, &tree->state);
RB_CLEAR_NODE(&state->rb_node);
@@ -572,9 +568,7 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
}
} else {
merge_state(tree, state);
- next = next_state(state);
}
- return next;
}
/*
@@ -587,6 +581,18 @@ static void set_gfp_mask_from_bits(u32 *bits, gfp_t *mask)
*bits &= EXTENT_NOWAIT - 1;
}
+/*
+ * Use this during tree iteration to avoid doing next node searches when it's
+ * not needed (the current record ends at or after the target range's end).
+ */
+static inline struct extent_state *next_search_state(struct extent_state *state, u64 end)
+{
+ if (state->end < end)
+ return next_state(state);
+
+ return NULL;
+}
+
/*
* Clear some bits on a range in the tree. This may require splitting or
* inserting elements in the tree, so the gfp mask is used to indicate which
@@ -601,6 +607,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
struct extent_changeset *changeset)
{
struct extent_state *state;
+ struct extent_state *next;
struct extent_state *cached;
struct extent_state *prealloc = NULL;
u64 last_end;
@@ -666,7 +673,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
/* The state doesn't have the wanted bits, go ahead. */
if (!(state->state & bits)) {
- state = next_state(state);
+ next = next_search_state(state, end);
goto next;
}
@@ -696,7 +703,8 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
goto out;
}
if (state->end <= end) {
- state = clear_state_bit(tree, state, bits, wake, changeset);
+ next = next_search_state(state, end);
+ clear_state_bit(tree, state, bits, wake, changeset);
goto next;
}
if (need_resched())
@@ -732,11 +740,13 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
goto out;
}
- state = clear_state_bit(tree, state, bits, wake, changeset);
+ next = next_search_state(state, end);
+ clear_state_bit(tree, state, bits, wake, changeset);
next:
if (last_end >= end)
goto out;
start = last_end + 1;
+ state = next;
if (state && !need_resched())
goto hit_next;
@@ -1292,6 +1302,7 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
struct extent_state **cached_state)
{
struct extent_state *state;
+ struct extent_state *next;
struct extent_state *prealloc = NULL;
struct rb_node **p = NULL;
struct rb_node *parent = NULL;
@@ -1357,10 +1368,12 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (state->start == start && state->end <= end) {
set_state_bits(tree, state, bits, NULL);
cache_state(state, cached_state);
- state = clear_state_bit(tree, state, clear_bits, 0, NULL);
+ next = next_search_state(state, end);
+ clear_state_bit(tree, state, clear_bits, 0, NULL);
if (last_end == (u64)-1)
goto out;
start = last_end + 1;
+ state = next;
if (start < end && state && state->start == start &&
!need_resched())
goto hit_next;
@@ -1397,10 +1410,12 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (state->end <= end) {
set_state_bits(tree, state, bits, NULL);
cache_state(state, cached_state);
- state = clear_state_bit(tree, state, clear_bits, 0, NULL);
+ next = next_search_state(state, end);
+ clear_state_bit(tree, state, clear_bits, 0, NULL);
if (last_end == (u64)-1)
goto out;
start = last_end + 1;
+ state = next;
if (start < end && state && state->start == start &&
!need_resched())
goto hit_next;
--
2.47.2
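The helper's effect can be shown with a plain linked list standing in for the rbtree iteration (next_state()/rb_next() in the real code): the next record is only fetched while the current one ends before the target range's end. A simplified sketch, with the struct reduced to the fields the logic needs:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified extent state: a range plus a successor pointer standing
 * in for the rbtree's next node. */
struct state {
	uint64_t start;
	uint64_t end;
	struct state *next;
};

/* Mirrors the patch's next_search_state(): skip the (potentially
 * costly) next-node lookup once the current record already ends at
 * or after the end of the target range. */
static struct state *next_search_state(struct state *state, uint64_t end)
{
	if (state->end < end)
		return state->next;
	return NULL;
}
```

Callers that previously relied on clear_state_bit() returning the next state now capture it via this helper before clearing, which is why clear_state_bit() could be changed to return void.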
* [PATCH 11/22] btrfs: avoid repeated extent state processing when converting extent bits
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When converting bits for an extent range, if we find an extent state with
its start offset greater than current start offset, we insert a new extent
state to cover the gap, with its end offset computed and stored in the
@this_end local variable, and after the insertion we update the current
start offset to @this_end + 1. However if the insert_state() call resulted
in an extent state merge then the end offset of the merged extent may be
greater than @this_end and if that's the case, since we jump to the
'search_again' label, we'll do a full tree search that will leave us in
the same extent state - this is harmless but wastes time by doing a
pointless tree search and extent state processing.
So improve on this by updating the current start offset to the end offset
of the inserted state plus 1. This also removes the use of the @this_end
variable and directly sets the value in the prealloc extent state, to
avoid any confusion and misuse in the future.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index bb829c5703f2..a898a15a7854 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1430,14 +1430,8 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
* extent we found.
*/
if (state->start > start) {
- u64 this_end;
struct extent_state *inserted_state;
- if (end < last_start)
- this_end = end;
- else
- this_end = last_start - 1;
-
prealloc = alloc_extent_state_atomic(prealloc);
if (!prealloc) {
ret = -ENOMEM;
@@ -1449,7 +1443,11 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
* extent.
*/
prealloc->start = start;
- prealloc->end = this_end;
+ if (end < last_start)
+ prealloc->end = end;
+ else
+ prealloc->end = last_start - 1;
+
inserted_state = insert_state(tree, prealloc, bits, NULL);
if (IS_ERR(inserted_state)) {
ret = PTR_ERR(inserted_state);
@@ -1459,7 +1457,7 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
cache_state(inserted_state, cached_state);
if (inserted_state == prealloc)
prealloc = NULL;
- start = this_end + 1;
+ start = inserted_state->end + 1;
goto search_again;
}
/*
--
2.47.2
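The gap-filling end offset boils down to a clamp: the new record ends at the target range's end or one byte before the next existing state, whichever comes first. After a merge inside insert_state() the inserted record's actual end may be larger, which is why the patch advances 'start' from inserted_state->end rather than the computed value. A minimal sketch of the clamp (the function name is illustrative):

```c
#include <stdint.h>

/* End offset for a gap-filling extent state: either the target
 * range's end or one byte before the start of the next existing
 * extent state ('last_start'), whichever is smaller. Equivalent to
 * the removed @this_end computation. */
static uint64_t gap_end(uint64_t range_end, uint64_t last_start)
{
	return (range_end < last_start) ? range_end : last_start - 1;
}
```

Advancing by the inserted record's real end (instead of this clamped value) is what lets a post-merge iteration skip the extent states the merge already absorbed.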
* [PATCH 12/22] btrfs: avoid researching tree when converting bits in an extent range
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When converting bits for an extent range (btrfs_convert_extent_bit()), if
the current extent state record starts after the target range, we always
do a jump to the 'search_again' label, which will cause us to do a full
tree search for the next state if the current state ends before the target
range. Unless we need to reschedule, we can just grab the next state and
process it, avoiding a full tree search, even if that next state is not
contiguous, as we'll allocate and insert a new prealloc state if needed.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index a898a15a7854..65665c0843e6 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1458,6 +1458,22 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (inserted_state == prealloc)
prealloc = NULL;
start = inserted_state->end + 1;
+
+ /* Beyond target range, stop. */
+ if (start > end)
+ goto out;
+
+ if (need_resched())
+ goto search_again;
+
+ state = next_search_state(inserted_state, end);
+ /*
+ * If there's a next state, whether contiguous or not, we don't
+ * need to unlock and redo the search. If it's not contiguous we will
+ * end up here and try to allocate a prealloc state and insert.
+ */
+ if (state)
+ goto hit_next;
goto search_again;
}
/*
--
2.47.2
* [PATCH 13/22] btrfs: simplify last record detection at btrfs_convert_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need to compare the current extent state's end offset to
(u64)-1 to check if we have the last possible record, nor to check as
well whether, after updating the start offset to the end offset of the
current record plus one, we are still inside the target range.
Instead we can simplify and exit if the current extent state ends at or
after the target range, and then remove the check for (u64)-1 as well
as the check to see if the updated start offset (last_end + 1) is still
inside the target range.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 65665c0843e6..453312c0ea70 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1370,12 +1370,11 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
cache_state(state, cached_state);
next = next_search_state(state, end);
clear_state_bit(tree, state, clear_bits, 0, NULL);
- if (last_end == (u64)-1)
+ if (last_end >= end)
goto out;
start = last_end + 1;
state = next;
- if (start < end && state && state->start == start &&
- !need_resched())
+ if (state && state->start == start && !need_resched())
goto hit_next;
goto search_again;
}
@@ -1412,12 +1411,11 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
cache_state(state, cached_state);
next = next_search_state(state, end);
clear_state_bit(tree, state, clear_bits, 0, NULL);
- if (last_end == (u64)-1)
+ if (last_end >= end)
goto out;
start = last_end + 1;
state = next;
- if (start < end && state && state->start == start &&
- !need_resched())
+ if (state && state->start == start && !need_resched())
goto hit_next;
}
goto search_again;
--
2.47.2
* [PATCH 14/22] btrfs: exit after state insertion failure at set_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If insert_state() failed it returns an error pointer and we call
extent_io_tree_panic() which will trigger a BUG() call. However if
CONFIG_BUG is disabled, which is an uncommon and exotic scenario, then
we fall through and call cache_state() which will dereference the error
pointer, resulting in an invalid memory access.
So jump to the 'out' label after calling extent_io_tree_panic(), which
also makes the code clearer besides dealing with the exotic scenario
where CONFIG_BUG is disabled.
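The hazard comes from the kernel's error-pointer convention. A small
userspace mimic of that convention (helper names copied from the
kernel; the insert/use functions are hypothetical) shows why jumping to
'out' matters:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace mimic of the kernel's error-pointer helpers from <linux/err.h>. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical insertion helper that fails with -ENOMEM on demand. */
static void *toy_insert(int fail)
{
    static int slot;

    return fail ? ERR_PTR(-ENOMEM) : (void *)&slot;
}

/*
 * The fixed control flow: if the panic helper compiles to a no-op
 * (CONFIG_BUG disabled), we must jump to 'out' rather than fall
 * through and dereference the error pointer.
 */
int toy_insert_and_use(int fail)
{
    int ret = 0;
    void *state = toy_insert(fail);

    if (IS_ERR(state)) {
        ret = (int)PTR_ERR(state);
        /* extent_io_tree_panic() would go here; it may not stop us. */
        goto out;
    }
    *(int *)state = 1;      /* only reached for valid pointers */
out:
    return ret;
}
```

Without the `goto out`, the error pointer (an address in the top 4095
bytes of the address space) would be dereferenced as if it were valid.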
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 453312c0ea70..2f92b7f697ad 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1220,6 +1220,7 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (IS_ERR(inserted_state)) {
ret = PTR_ERR(inserted_state);
extent_io_tree_panic(tree, prealloc, "insert", ret);
+ goto out;
}
cache_state(inserted_state, cached_state);
--
2.47.2
* [PATCH 15/22] btrfs: exit after state split error at set_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If split_state() returned an error we call extent_io_tree_panic() which
will trigger a BUG() call. However if CONFIG_BUG is disabled, which is an
uncommon and exotic scenario, then we fall through and hit a use-after-free
when calling set_state_bits(), since the extent state record that the
local variable 'prealloc' points to was freed by split_state().
So jump to the label 'out' after calling extent_io_tree_panic() and set
the 'prealloc' pointer to NULL since split_state() has already freed it
when it hit an error.
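The ownership rule the fix relies on can be sketched in plain C (toy
functions, not the btrfs ones): the callee frees the preallocated
object on error, so the caller must NULL its pointer before taking the
common exit path:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy stand-in for split_state()'s contract: on failure the callee
 * frees the preallocated buffer it was given.
 */
static int toy_split(int *prealloc, int fail)
{
    if (fail) {
        free(prealloc);         /* callee consumes the buffer on error */
        return -1;
    }
    *prealloc = 42;
    return 0;
}

/*
 * The caller must forget its pointer before taking the common exit
 * path, otherwise the final free would be a double free, and any use
 * of 'prealloc' after the error would be a use-after-free.
 */
int toy_caller(int fail)
{
    int ret;
    int *prealloc = malloc(sizeof(*prealloc));

    if (!prealloc)
        return -1;
    ret = toy_split(prealloc, fail);
    if (ret) {
        prealloc = NULL;        /* already freed by toy_split() */
        goto out;
    }
    ret = *prealloc;            /* only reached on success */
out:
    free(prealloc);             /* free(NULL) is a no-op */
    return ret;
}
```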
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 2f92b7f697ad..1460a287c0df 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1247,8 +1247,11 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (!prealloc)
goto search_again;
ret = split_state(tree, state, prealloc, end + 1);
- if (ret)
+ if (ret) {
extent_io_tree_panic(tree, state, "split", ret);
+ prealloc = NULL;
+ goto out;
+ }
set_state_bits(tree, prealloc, bits, changeset);
cache_state(prealloc, cached_state);
--
2.47.2
* [PATCH 16/22] btrfs: simplify last record detection at set_extent_bit()
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need to compare the current extent state's end offset to
(u64)-1 to check if we have the last possible record and to check as well
if, after updating the start offset to the end offset of the current
record plus one, we are still inside the target range.
Instead we can simplify and exit if the current extent state ends at or
after the target range, and then remove the check for the (u64)-1 as well
as the check to see if the updated start offset (to last_end + 1) is still
inside the target range. Besides the simplification, this also avoids
searching for the next extent state record (through next_state()) when the
current extent state record ends at the same offset as our target range,
which is pointless and only wastes time iterating through the rb tree.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 1460a287c0df..1f3bc12430aa 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1123,12 +1123,11 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
set_state_bits(tree, state, bits, changeset);
cache_state(state, cached_state);
merge_state(tree, state);
- if (last_end == (u64)-1)
+ if (last_end >= end)
goto out;
start = last_end + 1;
state = next_state(state);
- if (start < end && state && state->start == start &&
- !need_resched())
+ if (state && state->start == start && !need_resched())
goto hit_next;
goto search_again;
}
@@ -1180,12 +1179,11 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
set_state_bits(tree, state, bits, changeset);
cache_state(state, cached_state);
merge_state(tree, state);
- if (last_end == (u64)-1)
+ if (last_end >= end)
goto out;
start = last_end + 1;
state = next_state(state);
- if (start < end && state && state->start == start &&
- !need_resched())
+ if (state && state->start == start && !need_resched())
goto hit_next;
}
goto search_again;
--
2.47.2
* [PATCH 17/22] btrfs: avoid repeated extent state processing when setting extent bits
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When setting bits for an extent range, if we find an extent state with
its start offset greater than current start offset, we insert a new extent
state to cover the gap, with its end offset computed and stored in the
@this_end local variable, and after the insertion we update the current
start offset to @this_end + 1. However if the insert_state() call resulted
in an extent state merge then the end offset of the merged extent may be
greater than @this_end and if that's the case, since we jump to the
'search_again' label, we'll do a full tree search that will leave us in
the same extent state - this is harmless but wastes time by doing a
pointless tree search and extent state processing.
So improve on this by updating the current start offset to the end offset
of the inserted state plus 1. This also removes the use of the @this_end
variable and directly sets the value in the prealloc extent state, to avoid
any confusion and misuse in the future.
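A toy model of the merge behaviour (illustrative types only, not the
btrfs structures) shows why continuing from the inserted state's end
is safe and can skip further ahead than @this_end:

```c
#include <assert.h>
#include <stdint.h>

struct toy_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Toy model of insert_state() with merging: if the inserted range is
 * contiguous with the following one, the result covers both.
 */
struct toy_range toy_insert_merge(struct toy_range ins, struct toy_range next)
{
    if (ins.end + 1 == next.start)
        ins.end = next.end;     /* merged with the following range */
    return ins;
}

/*
 * Continuing the search from the inserted (possibly merged) state's
 * end + 1 can skip past the pre-merge end, avoiding a pointless
 * re-visit of the already-covered area on the next iteration.
 */
uint64_t next_start_after_insert(struct toy_range ins, struct toy_range next)
{
    return toy_insert_merge(ins, next).end + 1;
}
```

When a merge happens, the next start jumps past the merged neighbour;
without one, it is exactly the old @this_end + 1.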
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 1f3bc12430aa..ea974bde11dd 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1196,14 +1196,8 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
* extent we found.
*/
if (state->start > start) {
- u64 this_end;
struct extent_state *inserted_state;
- if (end < last_start)
- this_end = end;
- else
- this_end = last_start - 1;
-
prealloc = alloc_extent_state_atomic(prealloc);
if (!prealloc)
goto search_again;
@@ -1213,7 +1207,11 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
* extent.
*/
prealloc->start = start;
- prealloc->end = this_end;
+ if (end < last_start)
+ prealloc->end = end;
+ else
+ prealloc->end = last_start - 1;
+
inserted_state = insert_state(tree, prealloc, bits, changeset);
if (IS_ERR(inserted_state)) {
ret = PTR_ERR(inserted_state);
@@ -1224,7 +1222,7 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
cache_state(inserted_state, cached_state);
if (inserted_state == prealloc)
prealloc = NULL;
- start = this_end + 1;
+ start = inserted_state->end + 1;
goto search_again;
}
/*
--
2.47.2
* [PATCH 18/22] btrfs: avoid researching tree when setting bits in an extent range
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When setting bits for an extent range (set_extent_bit()), if the current
extent state record starts after the target range, we always do a jump to
the 'search_again' label, which will cause us to do a full tree search for
the next state if the current state ends before the target range. Unless
we need to reschedule, we can just grab the next state and process it,
avoiding a full tree search, even if that next state is not contiguous, as
we'll allocate and insert a new prealloc state if needed.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index ea974bde11dd..7271d6356c32 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1223,6 +1223,22 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (inserted_state == prealloc)
prealloc = NULL;
start = inserted_state->end + 1;
+
+ /* Beyond target range, stop. */
+ if (start > end)
+ goto out;
+
+ if (need_resched())
+ goto search_again;
+
+ state = next_search_state(inserted_state, end);
+ /*
+ * If there's a next state, whether contiguous or not, we don't
+ * need to unlock and research. If it's not contiguous we will
+ * end up here and try to allocate a prealloc state and insert.
+ */
+ if (state)
+ goto hit_next;
goto search_again;
}
/*
--
2.47.2
* [PATCH 19/22] btrfs: remove unnecessary NULL checks before freeing extent state
From: fdmanana @ 2025-04-23 14:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When manipulating extent bits for an io tree range, we have this pattern:
if (prealloc)
btrfs_free_extent_state(prealloc);
but this is not needed nowadays since btrfs_free_extent_state() ignores
a NULL pointer argument, following the common pattern of kernel and btrfs
freeing functions, as well as libc and other user space libraries.
So remove the NULL checks, reducing source code and object size.
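The convention being relied on can be sketched as follows (toy names,
with a counter added only to make the behaviour observable):

```c
#include <assert.h>
#include <stdlib.h>

struct toy_extent_state {
    int dummy;
};

static int toy_frees;   /* counter only, so the behaviour is observable */

/*
 * Like btrfs_free_extent_state(), kfree() and free(), accept NULL so
 * callers can free unconditionally on their exit paths.
 */
void toy_free_state(struct toy_extent_state *state)
{
    if (!state)
        return;         /* the NULL check lives here, once */
    toy_frees++;
    free(state);
}
```

With the check inside the free function, every caller's
`if (prealloc) free(prealloc);` collapses to a single call.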
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 7271d6356c32..f7fe7d23b40a 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -758,8 +758,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
out:
spin_unlock(&tree->lock);
- if (prealloc)
- btrfs_free_extent_state(prealloc);
+ btrfs_free_extent_state(prealloc);
return ret;
@@ -1282,8 +1281,7 @@ static int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
out:
spin_unlock(&tree->lock);
- if (prealloc)
- btrfs_free_extent_state(prealloc);
+ btrfs_free_extent_state(prealloc);
return ret;
@@ -1527,8 +1525,7 @@ int btrfs_convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
out:
spin_unlock(&tree->lock);
- if (prealloc)
- btrfs_free_extent_state(prealloc);
+ btrfs_free_extent_state(prealloc);
return ret;
}
--
2.47.2
* [PATCH 20/22] btrfs: don't BUG_ON() when unpinning extents during transaction commit
From: fdmanana @ 2025-04-23 14:20 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
In the final phase of a transaction commit, when unpinning extents at
btrfs_finish_extent_commit(), there's no need to BUG_ON() if we fail to
unpin an extent range. All that can happen is that we fail to return the
extent range to the in-memory free space cache, meaning no future space
allocations can reuse that extent range while the fs is mounted.
So instead return the error to the caller and make it abort the
transaction, so that the error is noticed and we prevent mysteriously
leaking space. We keep track of the first error we get while unpinning an extent
range and keep trying to unpin all the following extent ranges, so that
we attempt to do all discards. The transaction abort will deal with all
resource cleanups.
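The error-handling shape used here is the common "record the first
error, keep going" loop, sketched below with an array of return codes
standing in for successive unpin_extent_range() calls:

```c
#include <assert.h>

/*
 * Every range is still processed (so all discards are attempted), but
 * the first failure is what gets reported to the caller. results[]
 * stands in for the return values of unpin_extent_range() on
 * successive pinned extent ranges.
 */
int toy_unpin_all(const int *results, int n)
{
    int first_error = 0;

    for (int i = 0; i < n; i++) {
        int ret = results[i];

        if (ret && !first_error)
            first_error = ret;  /* remember only the first failure */
    }
    return first_error;
}
```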
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-tree.c | 20 ++++++++++++++++++--
fs/btrfs/transaction.c | 4 +++-
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a68a8a07caff..190bde099ee8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2821,6 +2821,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
struct extent_io_tree *unpin;
u64 start;
u64 end;
+ int unpin_error = 0;
int ret;
unpin = &trans->transaction->pinned_extents;
@@ -2841,7 +2842,22 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
btrfs_clear_extent_dirty(unpin, start, end, &cached_state);
ret = unpin_extent_range(fs_info, start, end, true);
- BUG_ON(ret);
+ /*
+ * If we get an error unpinning an extent range, store the first
+ * error to return later after trying to unpin all ranges and do
+ * the sync discards. Our caller will abort the transaction
+ * (which already wrote new superblocks) and on the next mount
+ * the space will be available as it was pinned by in-memory
+ * only structures in this phase.
+ */
+ if (ret) {
+ btrfs_err_rl(fs_info,
+"failed to unpin extent range [%llu, %llu] when committing transaction %llu: %s (%d)",
+ start, end, trans->transid,
+ btrfs_decode_error(ret), ret);
+ if (!unpin_error)
+ unpin_error = ret;
+ }
mutex_unlock(&fs_info->unused_bg_unpin_mutex);
btrfs_free_extent_state(cached_state);
cond_resched();
@@ -2888,7 +2904,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
}
}
- return 0;
+ return unpin_error;
}
/*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 39e48bf610a1..3c00666871da 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2555,7 +2555,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
wake_up(&cur_trans->commit_wait);
btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
- btrfs_finish_extent_commit(trans);
+ ret = btrfs_finish_extent_commit(trans);
+ if (ret)
+ goto scrub_continue;
if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &cur_trans->flags))
btrfs_clear_space_info_full(fs_info);
--
2.47.2
* [PATCH 21/22] btrfs: remove variable to track trimmed bytes at btrfs_finish_extent_commit()
From: fdmanana @ 2025-04-23 14:20 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We don't need to keep track of discarded (trimmed) bytes at
btrfs_finish_extent_commit() but we are declaring a local variable for
that and passing a reference to the btrfs_discard_extent() calls when we
are processing deleted block groups. So instead pass NULL to
btrfs_discard_extent() and remove that variable.
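The pattern is an optional out-parameter, sketched here (the
correspondence to btrfs_discard_extent()'s trimmed-bytes argument is
assumed for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Optional out-parameter: callers that don't care about the number of
 * trimmed bytes pass NULL instead of declaring a throwaway variable.
 */
int toy_discard(uint64_t start, uint64_t len, uint64_t *trimmed)
{
    (void)start;            /* a real implementation would use it */
    if (trimmed)
        *trimmed = len;     /* report only if the caller asked */
    return 0;
}
```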
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-tree.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 190bde099ee8..36a29739eac2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2875,14 +2875,10 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
*/
deleted_bgs = &trans->transaction->deleted_bgs;
list_for_each_entry_safe(block_group, tmp, deleted_bgs, bg_list) {
- u64 trimmed = 0;
-
ret = -EROFS;
if (!TRANS_ABORTED(trans))
- ret = btrfs_discard_extent(fs_info,
- block_group->start,
- block_group->length,
- &trimmed);
+ ret = btrfs_discard_extent(fs_info, block_group->start,
+ block_group->length, NULL);
/*
* Not strictly necessary to lock, as the block_group should be
--
2.47.2
* [PATCH 22/22] btrfs: make extent unpinning more efficient when committing transaction
From: fdmanana @ 2025-04-23 14:20 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At btrfs_finish_extent_commit() we have this loop that keeps finding an
extent range to unpin in the transaction's pinned_extents io tree, caches
the extent state and then passes that cached extent state to
btrfs_clear_extent_dirty(), which will free that extent state since we
clear the only bit it can have set. So on each loop iteration we do a
full io tree search and the cached state is used only to avoid having
a tree search done by btrfs_clear_extent_dirty().
During the lifetime of a transaction we can pin many thousands of extents,
resulting in a large and deep rb tree that backs the io tree. For example,
for the following fs_mark run on a 12 core box:
$ cat test.sh
#!/bin/bash
DEV=/dev/nullb0
MNT=/mnt/nullb0
FILES=100000
THREADS=$(nproc --all)
echo "performance" | \
tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
mkfs.btrfs -f $DEV
mount $DEV $MNT
OPTS="-S 0 -L 8 -n $FILES -s 0 -t $THREADS -k"
for ((i = 1; i <= $THREADS; i++)); do
OPTS="$OPTS -d $MNT/d$i"
done
fs_mark $OPTS
umount $MNT
a histogram for the number of ranges (elements) in the pinned extents
io tree of a transaction was the following:
Count: 76
Range: 5440.000 - 51088.000; Mean: 27354.368; Median: 28312.000; Stddev: 9800.767
Percentiles: 90th: 40486.000; 95th: 43322.000; 99th: 51088.000
5440.000 - 6805.809: 1 ###
6805.809 - 10652.034: 1 ###
10652.034 - 13326.178: 3 ########
13326.178 - 16671.590: 8 ######################
16671.590 - 20856.773: 7 ####################
20856.773 - 26092.528: 13 ####################################
26092.528 - 32642.571: 19 #####################################################
32642.571 - 40836.818: 17 ###############################################
40836.818 - 51088.000: 7 ####################
We can improve on this by grabbing the next state before calling
btrfs_clear_extent_dirty(), avoiding a full tree search on the next
iteration, which always has O(log n) complexity, while grabbing the next
element (the rb_next() rbtree operation) is in the worst case O(log n)
too, but very often much less than that, making it more efficient.
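A userspace sketch of the iteration pattern (a singly linked list
standing in for the rb tree, with simplified refcounting; toy names,
not the btrfs code):

```c
#include <assert.h>
#include <stdlib.h>

struct toy_state {
    unsigned long start, end;
    struct toy_state *next;
    int refs;               /* each node starts with the tree's ref */
};

/*
 * Walk a sorted list of pinned ranges the way the patch does: take a
 * reference on the successor before the current node is released, so
 * each step is a pointer chase instead of a fresh search from the
 * root. Returns the total number of bytes covered.
 */
unsigned long toy_unpin_walk(struct toy_state *first)
{
    unsigned long total = 0;
    struct toy_state *state = first;

    if (state)
        state->refs++;              /* our cached reference */
    while (state) {
        struct toy_state *next = state->next;

        if (next)
            next->refs++;           /* pin the successor first */
        total += state->end - state->start + 1;
        state->refs--;              /* clearing drops the tree's ref */
        if (--state->refs == 0)     /* and we drop our cached ref */
            free(state);
        state = next;
    }
    return total;
}
```

Pinning the successor before the current node is freed is what lets
the loop avoid restarting the lookup on every iteration.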
Here follow histograms for the execution times, in nanoseconds, of
btrfs_finish_extent_commit() before and after applying this patch and all
the other patches in the same patchset.
Before patchset:
Count: 32
Range: 3925691.000 - 269990635.000; Mean: 133814526.906; Median: 122758052.000; Stddev: 65776550.375
Percentiles: 90th: 228672087.000; 95th: 265187000.000; 99th: 269990635.000
3925691.000 - 5993208.660: 1 ####
5993208.660 - 75878537.656: 4 ##################
75878537.656 - 115840974.514: 12 #####################################################
115840974.514 - 176850157.761: 6 ###########################
176850157.761 - 269990635.000: 9 ########################################
After patchset:
Count: 32
Range: 1849393.000 - 231491064.000; Mean: 126978584.625; Median: 123732897.000; Stddev: 58007821.806
Percentiles: 90th: 203055491.000; 95th: 219952699.000; 99th: 231491064.000
1849393.000 - 2997642.092: 1 ####
2997642.092 - 88111637.071: 9 #####################################
88111637.071 - 142818264.414: 9 #####################################
142818264.414 - 231491064.000: 13 #####################################################
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/extent-io-tree.c | 20 ++++++++++++++++++++
fs/btrfs/extent-io-tree.h | 3 +++
fs/btrfs/extent-tree.c | 40 ++++++++++++++++++++++++++-------------
3 files changed, 50 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index f7fe7d23b40a..15652dcd66a1 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1923,6 +1923,26 @@ int btrfs_lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, u32
return err;
}
+/*
+ * Get the extent state that follows the given extent state.
+ * This is meant to be used in a context where we know no other tasks can
+ * concurrently modify the tree.
+ */
+struct extent_state *btrfs_next_extent_state(struct extent_io_tree *tree,
+ struct extent_state *state)
+{
+ struct extent_state *next;
+
+ spin_lock(&tree->lock);
+ ASSERT(extent_state_in_tree(state));
+ next = next_state(state);
+ if (next)
+ refcount_inc(&next->refs);
+ spin_unlock(&tree->lock);
+
+ return next;
+}
+
void __cold btrfs_extent_state_free_cachep(void)
{
btrfs_extent_state_leak_debug_check();
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index d8c01d857667..0a18ca9c59c3 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -243,4 +243,7 @@ static inline int btrfs_unlock_dio_extent(struct extent_io_tree *tree, u64 start
cached, NULL);
}
+struct extent_state *btrfs_next_extent_state(struct extent_io_tree *tree,
+ struct extent_state *state);
+
#endif /* BTRFS_EXTENT_IO_TREE_H */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 36a29739eac2..711a4f2410dc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2818,28 +2818,25 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_block_group *block_group, *tmp;
struct list_head *deleted_bgs;
- struct extent_io_tree *unpin;
+ struct extent_io_tree *unpin = &trans->transaction->pinned_extents;
+ struct extent_state *cached_state = NULL;
u64 start;
u64 end;
int unpin_error = 0;
int ret;
- unpin = &trans->transaction->pinned_extents;
+ mutex_lock(&fs_info->unused_bg_unpin_mutex);
+ btrfs_find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY,
+ &cached_state);
- while (!TRANS_ABORTED(trans)) {
- struct extent_state *cached_state = NULL;
-
- mutex_lock(&fs_info->unused_bg_unpin_mutex);
- if (!btrfs_find_first_extent_bit(unpin, 0, &start, &end,
- EXTENT_DIRTY, &cached_state)) {
- mutex_unlock(&fs_info->unused_bg_unpin_mutex);
- break;
- }
+ while (!TRANS_ABORTED(trans) && cached_state) {
+ struct extent_state *next_state;
if (btrfs_test_opt(fs_info, DISCARD_SYNC))
ret = btrfs_discard_extent(fs_info, start,
end + 1 - start, NULL);
+ next_state = btrfs_next_extent_state(unpin, cached_state);
btrfs_clear_extent_dirty(unpin, start, end, &cached_state);
ret = unpin_extent_range(fs_info, start, end, true);
/*
@@ -2858,10 +2855,27 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
if (!unpin_error)
unpin_error = ret;
}
- mutex_unlock(&fs_info->unused_bg_unpin_mutex);
+
btrfs_free_extent_state(cached_state);
- cond_resched();
+
+ if (need_resched()) {
+ btrfs_free_extent_state(next_state);
+ mutex_unlock(&fs_info->unused_bg_unpin_mutex);
+ cond_resched();
+ cached_state = NULL;
+ mutex_lock(&fs_info->unused_bg_unpin_mutex);
+ btrfs_find_first_extent_bit(unpin, 0, &start, &end,
+ EXTENT_DIRTY, &cached_state);
+ } else {
+ cached_state = next_state;
+ if (cached_state) {
+ start = cached_state->start;
+ end = cached_state->end;
+ }
+ }
}
+ mutex_unlock(&fs_info->unused_bg_unpin_mutex);
+ btrfs_free_extent_state(cached_state);
if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
btrfs_discard_calc_delay(&fs_info->discard_ctl);
--
2.47.2
* Re: [PATCH 00/22] btrfs: some io tree optimizations and cleanups
From: David Sterba @ 2025-04-28 14:53 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Wed, Apr 23, 2025 at 03:19:40PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> An assorted set of optimizations and cleanups related to io trees.
> Details in the change logs.
>
> Filipe Manana (22):
> btrfs: remove duplicate error check at btrfs_clear_extent_bit_changeset()
> btrfs: exit after state split error at btrfs_clear_extent_bit_changeset()
> btrfs: add missing error return to btrfs_clear_extent_bit_changeset()
> btrfs: use bools for local variables at btrfs_clear_extent_bit_changeset()
> btrfs: avoid extra tree search at btrfs_clear_extent_bit_changeset()
> btrfs: simplify last record detection at btrfs_clear_extent_bit_changeset()
> btrfs: remove duplicate error check at btrfs_convert_extent_bit()
> btrfs: exit after state split error at btrfs_convert_extent_bit()
> btrfs: exit after state insertion failure at btrfs_convert_extent_bit()
> btrfs: avoid unnecessary next node searches when clearing bits from extent range
> btrfs: avoid repeated extent state processing when converting extent bits
> btrfs: avoid researching tree when converting bits in an extent range
> btrfs: simplify last record detection at btrfs_convert_extent_bit()
> btrfs: exit after state insertion failure at set_extent_bit()
> btrfs: exit after state split error at set_extent_bit()
> btrfs: simplify last record detection at set_extent_bit()
> btrfs: avoid repeated extent state processing when setting extent bits
> btrfs: avoid researching tree when setting bits in an extent range
> btrfs: remove unnecessary NULL checks before freeing extent state
> btrfs: don't BUG_ON() when unpinning extents during transaction commit
> btrfs: remove variable to track trimmed bytes at btrfs_finish_extent_commit()
> btrfs: make extent unpinning more efficient when committing transaction
Reviewed-by: David Sterba <dsterba@suse.com>