* [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated
@ 2025-01-13 9:42 Qu Wenruo
2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs
Currently ordered extents cleanup is delayed, e.g.:

cow_file_range() and run_delalloc_nocow() can both allocate ordered
extents by themselves.
But the ordered extents cleanup does not happen in those functions;
it is done by the caller, btrfs_run_delalloc_range(), instead.

This is not the common practice, and has already caused various ordered
extent double accounting bugs (fixed by the recent error handling
patchset).
So this series addresses the problem by:

- Refactoring run_delalloc_nocow() to extract the NOCOW ordered extents
  creation into a helper
  This makes the later error handling a little simpler.

- Moving ordered extents cleanup to where the extents got created
  There are 3 call sites:

  - cow_file_range()
    This is the simplest one, as the recent fix makes it pretty
    straightforward.

  - nocow_one_range()
    The new helper introduced to create ordered extents and extent maps
    for NOCOW writes.
    This is also pretty straightforward.

  - run_delalloc_nocow()
    There are 3 different error cases that need to adjust the ordered
    extents cleanup range.
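The ownership pattern the series moves to can be sketched with a
hypothetical, self-contained model (plain C, not btrfs code;
alloc_and_work() and caller() are invented names): the function that
allocates a resource also undoes it on its own failure path, so the
caller no longer needs a catch-all error cleanup.

```c
#include <assert.h>

/* Hypothetical model of the pattern this series adopts: whoever
 * allocates a resource cleans it up on its own failure path, so the
 * caller never has to guess how far the callee got before failing. */
struct extent { int live; };

static int alloc_and_work(struct extent *e, int fail)
{
	e->live = 1;              /* "allocate" the ordered extent */
	if (fail) {
		e->live = 0;      /* clean up right here, not in the caller */
		return -1;
	}
	return 0;
}

static int caller(struct extent *e, int fail)
{
	/* no error-path cleanup needed here any more */
	return alloc_and_work(e, fail);
}
```

With the old delayed-cleanup scheme, the caller would instead have to
undo a partially advanced callee, which is exactly what led to the
double accounting bugs mentioned above.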
Qu Wenruo (2):
btrfs: extract the nocow ordered extent and extent map generation into
a helper
btrfs: move ordered extent cleanup to where they are allocated
fs/btrfs/inode.c | 188 +++++++++++++++++++++++++++--------------------
1 file changed, 107 insertions(+), 81 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
2025-02-06 0:39 ` Boris Burkov
2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs
Currently we're doing all the ordered extent and extent map generation
inside a while() loop of run_delalloc_nocow().

This makes it pretty hard to read, and hard to do proper error handling.

So move that part of the code into a helper, nocow_one_range().

This should not change anything, but there is a tiny timing change:
btrfs_dec_nocow_writers() is now only called after the nocow_one_range()
helper exits.

This timing change is small, and makes error handling easier, thus
should be fine.
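The timing change can be illustrated with a toy model (hypothetical
code, not btrfs itself; nocow_one_range_model() and run_one_iteration()
are invented names): the nocow-writers count is now held across the
whole helper and dropped exactly once after it returns, on the success
and the error path alike.

```c
#include <assert.h>

/* Toy model (not btrfs code) of the timing change: the nocow-writers
 * count used to be dropped in the middle of the per-range work; after
 * the refactor it is dropped only once the helper has returned. */
static int nocow_writers;

static int nocow_one_range_model(int fail)
{
	/* all OE / extent-map work happens while the count is held */
	assert(nocow_writers > 0);
	return fail ? -1 : 0;
}

static int run_one_iteration(int fail)
{
	int ret;

	nocow_writers++;          /* btrfs_inc_nocow_writers() */
	ret = nocow_one_range_model(fail);
	nocow_writers--;          /* btrfs_dec_nocow_writers(): always
				   * after the helper exits, on success
				   * and on error alike */
	return ret;
}
```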
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 130 +++++++++++++++++++++++++----------------------
1 file changed, 69 insertions(+), 61 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 130f0490b14f..42f67f8a4a33 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1974,6 +1974,71 @@ static void cleanup_dirty_folios(struct btrfs_inode *inode,
mapping_set_error(mapping, error);
}
+static int nocow_one_range(struct btrfs_inode *inode,
+ struct folio *locked_folio,
+ struct extent_state **cached,
+ struct can_nocow_file_extent_args *nocow_args,
+ u64 file_pos, bool is_prealloc)
+{
+ struct btrfs_ordered_extent *ordered;
+ u64 len = nocow_args->file_extent.num_bytes;
+ u64 end = file_pos + len - 1;
+ int ret = 0;
+
+ lock_extent(&inode->io_tree, file_pos, end, cached);
+
+ if (is_prealloc) {
+ struct extent_map *em;
+
+ em = btrfs_create_io_em(inode, file_pos,
+ &nocow_args->file_extent,
+ BTRFS_ORDERED_PREALLOC);
+ if (IS_ERR(em)) {
+ unlock_extent(&inode->io_tree, file_pos,
+ end, cached);
+ return PTR_ERR(em);
+ }
+ free_extent_map(em);
+ }
+
+ ordered = btrfs_alloc_ordered_extent(inode, file_pos,
+ &nocow_args->file_extent,
+ is_prealloc
+ ? (1 << BTRFS_ORDERED_PREALLOC)
+ : (1 << BTRFS_ORDERED_NOCOW));
+ if (IS_ERR(ordered)) {
+ if (is_prealloc) {
+ btrfs_drop_extent_map_range(inode, file_pos,
+ end, false);
+ }
+ unlock_extent(&inode->io_tree, file_pos,
+ end, cached);
+ return PTR_ERR(ordered);
+ }
+
+ if (btrfs_is_data_reloc_root(inode->root))
+ /*
+ * Error handled later, as we must prevent
+ * extent_clear_unlock_delalloc() in error handler
+ * from freeing metadata of created ordered extent.
+ */
+ ret = btrfs_reloc_clone_csums(ordered);
+ btrfs_put_ordered_extent(ordered);
+
+ extent_clear_unlock_delalloc(inode, file_pos, end,
+ locked_folio, cached,
+ EXTENT_LOCKED | EXTENT_DELALLOC |
+ EXTENT_CLEAR_DATA_RESV,
+ PAGE_UNLOCK | PAGE_SET_ORDERED);
+
+ /*
+ * btrfs_reloc_clone_csums() error, now we're OK to call error
+ * handler, as metadata for created ordered extent will only
+ * be freed by btrfs_finish_ordered_io().
+ */
+ return ret;
+}
+
/*
* when nowcow writeback call back. This checks for snapshots or COW copies
* of the extents that exist in the file, and COWs the file as required.
@@ -2018,15 +2083,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
while (cur_offset <= end) {
struct btrfs_block_group *nocow_bg = NULL;
- struct btrfs_ordered_extent *ordered;
struct btrfs_key found_key;
struct btrfs_file_extent_item *fi;
struct extent_buffer *leaf;
struct extent_state *cached_state = NULL;
u64 extent_end;
- u64 nocow_end;
int extent_type;
- bool is_prealloc;
ret = btrfs_lookup_file_extent(NULL, root, path, ino,
cur_offset, 0);
@@ -2160,67 +2222,13 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
}
}
- nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
- lock_extent(&inode->io_tree, cur_offset, nocow_end, &cached_state);
-
- is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
- if (is_prealloc) {
- struct extent_map *em;
-
- em = btrfs_create_io_em(inode, cur_offset,
- &nocow_args.file_extent,
- BTRFS_ORDERED_PREALLOC);
- if (IS_ERR(em)) {
- unlock_extent(&inode->io_tree, cur_offset,
- nocow_end, &cached_state);
- btrfs_dec_nocow_writers(nocow_bg);
- ret = PTR_ERR(em);
- goto error;
- }
- free_extent_map(em);
- }
-
- ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
- &nocow_args.file_extent,
- is_prealloc
- ? (1 << BTRFS_ORDERED_PREALLOC)
- : (1 << BTRFS_ORDERED_NOCOW));
+ ret = nocow_one_range(inode, locked_folio, &cached_state,
+ &nocow_args, cur_offset,
+ extent_type == BTRFS_FILE_EXTENT_PREALLOC);
btrfs_dec_nocow_writers(nocow_bg);
- if (IS_ERR(ordered)) {
- if (is_prealloc) {
- btrfs_drop_extent_map_range(inode, cur_offset,
- nocow_end, false);
- }
- unlock_extent(&inode->io_tree, cur_offset,
- nocow_end, &cached_state);
- ret = PTR_ERR(ordered);
+ if (ret < 0)
goto error;
- }
-
- if (btrfs_is_data_reloc_root(root))
- /*
- * Error handled later, as we must prevent
- * extent_clear_unlock_delalloc() in error handler
- * from freeing metadata of created ordered extent.
- */
- ret = btrfs_reloc_clone_csums(ordered);
- btrfs_put_ordered_extent(ordered);
-
- extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
- locked_folio, &cached_state,
- EXTENT_LOCKED | EXTENT_DELALLOC |
- EXTENT_CLEAR_DATA_RESV,
- PAGE_UNLOCK | PAGE_SET_ORDERED);
-
cur_offset = extent_end;
-
- /*
- * btrfs_reloc_clone_csums() error, now we're OK to call error
- * handler, as metadata for created ordered extent will only
- * be freed by btrfs_finish_ordered_io().
- */
- if (ret)
- goto error;
}
btrfs_release_path(path);
--
2.47.1
* [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
2025-02-06 0:39 ` Boris Burkov
1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs
The ordered extent cleanup is hard to grasp because it doesn't follow
the common cleanup-asap pattern.
E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
ordered extents, but if any error is hit, the cleanup is done later inside
btrfs_run_delalloc_range().
To change the existing delayed cleanup:

- Update the comment on error handling of run_delalloc_nocow()

  There are in fact 3 different cases, rather than 2, if we are doing
  ordered extents cleanup inside run_delalloc_nocow():

  1) @cow_start and @cow_end not set
     No fallback to COW at all.
     Before @cur_offset we need to clean up the OE and page dirty flags.
     After @cur_offset just clear all involved page and extent flags.

  2) @cow_start set but @cow_end not set
     This means we failed before even calling fallback_to_cow().
     It's just a variant of case 1), where it's @cow_start splitting
     the two parts (and we should just ignore @cur_offset since it's
     advanced without any new ordered extent).

  3) @cow_start and @cow_end both set
     This means fallback_to_cow() failed, meaning [start, cow_start)
     needs the regular OE and dirty folio cleanup, range
     [cow_start, cow_end) should be skipped as cow_file_range() has
     already done the cleanup there, and finally the [cow_end, end)
     range gets cleaned up.

- Only reset @cow_start after fallback_to_cow() succeeded

  As the above cases 2) and 3) both rely on @cow_start to determine the
  cleanup range.

- Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(),
  cow_file_range() and nocow_one_range()

  For cow_file_range() it's pretty straightforward and easy.

  For run_delalloc_nocow() refer to the above 3 different error cases.

  For nocow_one_range(), if we hit an error we need to clean up the
  ordered extents by ourselves.
  Then it falls back to case 1); since @cur_offset is not yet advanced,
  the existing cleanup cooperates with nocow_one_range() well.

- Remove the btrfs_cleanup_ordered_extents() inside
  submit_uncompressed_range()

  As a failed cow_file_range() will do all the proper cleanup now.
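The range selection described by the three cases above can be condensed
into a small standalone sketch (hypothetical model, not the actual
btrfs code; cleanup_end() is an invented name): if @cow_start is set
(cases 2 and 3) the ordered extents only cover up to @cow_start,
otherwise (case 1) they cover up to @cur_offset.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
#define U64_MAX ((u64)-1)

/* Hypothetical model: return the exclusive end of the range whose
 * ordered extents must be cleaned up on error.  @cow_start == U64_MAX
 * means "not set" (case 1); otherwise cases 2) and 3) apply and the
 * created OEs stop at @cow_start, no matter how far @cur_offset has
 * advanced.  Cleanup always begins at @start. */
static u64 cleanup_end(u64 start, u64 cur_offset, u64 cow_start)
{
	(void)start;
	if (cow_start != U64_MAX)
		cur_offset = cow_start;
	return cur_offset;
}
```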
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 42f67f8a4a33..8e8b08412d35 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1090,7 +1090,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
&wbc, false);
wbc_detach_inode(&wbc);
if (ret < 0) {
- btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
if (locked_folio)
btrfs_folio_end_lock(inode->root->fs_info, locked_folio,
start, async_extent->ram_size);
@@ -1272,10 +1271,7 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
* - Else all pages except for @locked_folio are unlocked.
*
* When a failure happens in the second or later iteration of the
- * while-loop, the ordered extents created in previous iterations are kept
- * intact. So, the caller must clean them up by calling
- * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
- * example.
+ * while-loop, the ordered extents created in previous iterations are cleaned up.
*/
static noinline int cow_file_range(struct btrfs_inode *inode,
struct folio *locked_folio, u64 start,
@@ -1488,11 +1484,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
/*
* For the range (1). We have already instantiated the ordered extents
- * for this region. They are cleaned up by
- * btrfs_cleanup_ordered_extents() in e.g,
- * btrfs_run_delalloc_range().
+ * for this region, thus we need to cleanup those ordered extents.
* EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
- * are also handled by the cleanup function.
+ * are also handled by the ordered extents cleanup.
*
* So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag,
* and finish the writeback of the involved folios, which will be
@@ -1504,6 +1498,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
if (!locked_folio)
mapping_set_error(inode->vfs_inode.i_mapping, ret);
+
+ btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
extent_clear_unlock_delalloc(inode, orig_start, start - 1,
locked_folio, NULL, clear_bits, page_ops);
}
@@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
EXTENT_LOCKED | EXTENT_DELALLOC |
EXTENT_CLEAR_DATA_RESV,
PAGE_UNLOCK | PAGE_SET_ORDERED);
-
/*
- * btrfs_reloc_clone_csums() error, now we're OK to call error
- * handler, as metadata for created ordered extent will only
- * be freed by btrfs_finish_ordered_io().
+ * On error, we need to cleanup the ordered extents we created.
+ *
+ * We also need to clear the folio Dirty flags for the range,
+ * but it's not something touched by us, it will be cleared
+ * by the caller (with cleanup_dirty_folios()).
*/
+ if (ret < 0)
+ btrfs_cleanup_ordered_extents(inode, file_pos, end);
return ret;
}
@@ -2214,12 +2213,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
if (cow_start != (u64)-1) {
ret = fallback_to_cow(inode, locked_folio, cow_start,
found_key.offset - 1);
- cow_start = (u64)-1;
if (ret) {
cow_end = found_key.offset - 1;
btrfs_dec_nocow_writers(nocow_bg);
goto error;
}
+ cow_start = (u64)-1;
}
ret = nocow_one_range(inode, locked_folio, &cached_state,
@@ -2237,11 +2236,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
if (cow_start != (u64)-1) {
ret = fallback_to_cow(inode, locked_folio, cow_start, end);
- cow_start = (u64)-1;
if (ret) {
cow_end = end;
goto error;
}
+ cow_start = (u64)-1;
}
btrfs_free_path(path);
@@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
* start cur_offset end
* |/////////////| |
*
+ * In this case, cow_start should be (u64)-1.
+ *
* For range [start, cur_offset) the folios are already unlocked (except
* @locked_folio), EXTENT_DELALLOC already removed.
* Only need to clear the dirty flag as they will never be submitted.
* Ordered extent and extent maps are handled by
* btrfs_mark_ordered_io_finished() inside run_delalloc_range().
*
- * 2) Failed with error from fallback_to_cow()
- * start cur_offset cow_end end
+ * 2) Failed with error before calling fallback_to_cow()
+ *
+ * start cow_start end
+ * |/////////////| |
+ *
+ * In this case, only @cow_start is set, @cur_offset is between
+ * [cow_start, end)
+ *
+ * It's mostly the same as case 1), just replace @cur_offset with
+ * @cow_start.
+ *
+ * 3) Failed with error from fallback_to_cow()
+ *
+ * start cow_start cow_end end
* |/////////////|-----------| |
*
+ * In this case, both @cow_start and @cow_end is set.
+ *
* For range [start, cur_offset) it's the same as case 1).
* But for range [cur_offset, cow_end), the folios have dirty flag
* cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
@@ -2272,10 +2287,17 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
* Thus we should not call extent_clear_unlock_delalloc() on range
* [cur_offset, cow_end), as the folios are already unlocked.
*
- * So clear the folio dirty flags for [start, cur_offset) first.
+ *
+ * So for all above cases, if @cow_start is set, cleanup ordered extents
+ * for range [start, @cow_start), other wise cleanup range [start, @cur_offset).
*/
- if (cur_offset > start)
+ if (cow_start != (u64)-1)
+ cur_offset = cow_start;
+
+ if (cur_offset > start) {
+ btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+ }
/*
* If an error happened while a COW region is outstanding, cur_offset
@@ -2340,7 +2362,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
if (should_nocow(inode, start, end)) {
ret = run_delalloc_nocow(inode, locked_folio, start, end);
- goto out;
+ return ret;
}
if (btrfs_inode_can_compress(inode) &&
@@ -2354,10 +2376,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
else
ret = cow_file_range(inode, locked_folio, start, end, NULL,
false, false);
-
-out:
- if (ret < 0)
- btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
return ret;
}
--
2.47.1
* Re: [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
@ 2025-02-06 0:39 ` Boris Burkov
0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Jan 13, 2025 at 08:12:13PM +1030, Qu Wenruo wrote:
> The ordered extent cleanup is hard to grasp because it doesn't follow
> the common cleanup-asap pattern.
>
> E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
> ordered extents, but if any error is hit, the cleanup is done later inside
> btrfs_run_delalloc_range().
>
> To change the existing delayed cleanup:
>
> - Update the comment on error handling of run_delalloc_nocow()
> There are in fact 3 different cases other than 2 if we are doing
> ordered extents cleanup inside run_delalloc_nocow():
>
> 1) @cow_start and @cow_end not set
> No fallback to COW at all.
> Before @cur_offset we need to cleanup the OE and page dirty.
> After @cur_offset just clear all involved page and extent flags.
>
> 2) @cow_start set but @cow_end not set.
> This means we failed before even calling fallback_to_cow().
> It's just a variant of case 1), where it's @cow_start splitting
> the two parts (and we should just ignore @cur_offset since it's
> advanced without any new ordered extent).
>
> 3) @cow_start and @cow_end both set
> This means fallback_to_cow() failed, meaning [start, cow_start)
> needs the regular OE and dirty folio cleanup, and skip range
> [cow_start, cow_end) as cow_file_range() has done the cleanup,
> and eventually cleanup [cow_end, end) range.
>
> - Only reset @cow_start after fallback_to_cow() succeeded
> As above case 2) and 3) are both relying on @cow_start to determine
> cleanup range.
>
> - Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(),
> cow_file_range() and nocow_one_range()
>
> For cow_file_range() it's pretty straightforward and easy.
>
> For run_delalloc_nocow() refer to the above 3 different error cases.
>
> For nocow_one_range() if we hit an error, we need to cleanup the
> ordered extents by ourselves.
> And then it falls back to case 1); since @cur_offset is not yet
> advanced, the existing cleanup will co-operate with nocow_one_range()
> well.
>
> - Remove the btrfs_cleanup_ordered_extents() inside
> submit_uncompressed_range()
> As failed cow_file_range() will do all the proper cleanup now.
>
LGTM, thanks for all the extra explanations in the commit and comments.
If you fix the IMO serious comment error I pointed out inline, please
add
Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
> 1 file changed, 42 insertions(+), 24 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 42f67f8a4a33..8e8b08412d35 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1090,7 +1090,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
> &wbc, false);
> wbc_detach_inode(&wbc);
> if (ret < 0) {
> - btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
> if (locked_folio)
> btrfs_folio_end_lock(inode->root->fs_info, locked_folio,
> start, async_extent->ram_size);
> @@ -1272,10 +1271,7 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
> * - Else all pages except for @locked_folio are unlocked.
> *
> * When a failure happens in the second or later iteration of the
> - * while-loop, the ordered extents created in previous iterations are kept
> - * intact. So, the caller must clean them up by calling
> - * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
> - * example.
> + * while-loop, the ordered extents created in previous iterations are cleaned up.
> */
> static noinline int cow_file_range(struct btrfs_inode *inode,
> struct folio *locked_folio, u64 start,
> @@ -1488,11 +1484,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>
> /*
> * For the range (1). We have already instantiated the ordered extents
> - * for this region. They are cleaned up by
> - * btrfs_cleanup_ordered_extents() in e.g,
> - * btrfs_run_delalloc_range().
> + * for this region, thus we need to cleanup those ordered extents.
> * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
> - * are also handled by the cleanup function.
> + * are also handled by the ordered extents cleanup.
> *
> * So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag,
> * and finish the writeback of the involved folios, which will be
> @@ -1504,6 +1498,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>
> if (!locked_folio)
> mapping_set_error(inode->vfs_inode.i_mapping, ret);
> +
> + btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
> extent_clear_unlock_delalloc(inode, orig_start, start - 1,
> locked_folio, NULL, clear_bits, page_ops);
> }
> @@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
> EXTENT_LOCKED | EXTENT_DELALLOC |
> EXTENT_CLEAR_DATA_RESV,
> PAGE_UNLOCK | PAGE_SET_ORDERED);
> -
> /*
> - * btrfs_reloc_clone_csums() error, now we're OK to call error
> - * handler, as metadata for created ordered extent will only
> - * be freed by btrfs_finish_ordered_io().
> + * On error, we need to cleanup the ordered extents we created.
> + *
> + * We also need to clear the folio Dirty flags for the range,
> + * but it's not something touched by us, it will be cleared
> + * by the caller (with cleanup_dirty_folios()).
I don't love this phrasing about the Dirty flags for some reason. Not a
deal breaker, though.
How about:
"We do not clear the folio Dirty flags because they are set and cleared
by the caller"
or something like that?
> */
> + if (ret < 0)
> + btrfs_cleanup_ordered_extents(inode, file_pos, end);
> return ret;
> }
>
> @@ -2214,12 +2213,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> if (cow_start != (u64)-1) {
> ret = fallback_to_cow(inode, locked_folio, cow_start,
> found_key.offset - 1);
> - cow_start = (u64)-1;
> if (ret) {
> cow_end = found_key.offset - 1;
> btrfs_dec_nocow_writers(nocow_bg);
> goto error;
> }
> + cow_start = (u64)-1;
> }
>
> ret = nocow_one_range(inode, locked_folio, &cached_state,
> @@ -2237,11 +2236,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>
> if (cow_start != (u64)-1) {
> ret = fallback_to_cow(inode, locked_folio, cow_start, end);
> - cow_start = (u64)-1;
> if (ret) {
> cow_end = end;
> goto error;
> }
> + cow_start = (u64)-1;
> }
>
> btrfs_free_path(path);
> @@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> * start cur_offset end
> * |/////////////| |
> *
> + * In this case, cow_start should be (u64)-1.
> + *
> * For range [start, cur_offset) the folios are already unlocked (except
> * @locked_folio), EXTENT_DELALLOC already removed.
> * Only need to clear the dirty flag as they will never be submitted.
> * Ordered extent and extent maps are handled by
> * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
I believe this comment is quite wrong now, and does not represent the
new logic where we cleanup the ordered extents up to cur_offset (or
cow_start in case 2) here rather than in run_delalloc_range
> *
> - * 2) Failed with error from fallback_to_cow()
> - * start cur_offset cow_end end
> + * 2) Failed with error before calling fallback_to_cow()
> + *
> + * start cow_start end
> + * |/////////////| |
> + *
> + * In this case, only @cow_start is set, @cur_offset is between
> + * [cow_start, end)
> + *
> + * It's mostly the same as case 1), just replace @cur_offset with
> + * @cow_start.
> + *
> + * 3) Failed with error from fallback_to_cow()
> + *
> + * start cow_start cow_end end
> * |/////////////|-----------| |
> *
> + * In this case, both @cow_start and @cow_end is set.
> + *
> * For range [start, cur_offset) it's the same as case 1).
> * But for range [cur_offset, cow_end), the folios have dirty flag
> * cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
> @@ -2272,10 +2287,17 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> * Thus we should not call extent_clear_unlock_delalloc() on range
> * [cur_offset, cow_end), as the folios are already unlocked.
> *
> - * So clear the folio dirty flags for [start, cur_offset) first.
> + *
> + * So for all above cases, if @cow_start is set, cleanup ordered extents
> + * for range [start, @cow_start), other wise cleanup range [start, @cur_offset).
> */
> - if (cur_offset > start)
> + if (cow_start != (u64)-1)
> + cur_offset = cow_start;
> +
> + if (cur_offset > start) {
> + btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
> cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
> + }
>
> /*
> * If an error happened while a COW region is outstanding, cur_offset
> @@ -2340,7 +2362,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
>
> if (should_nocow(inode, start, end)) {
> ret = run_delalloc_nocow(inode, locked_folio, start, end);
> - goto out;
> + return ret;
> }
>
> if (btrfs_inode_can_compress(inode) &&
> @@ -2354,10 +2376,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
> else
> ret = cow_file_range(inode, locked_folio, start, end, NULL,
> false, false);
> -
> -out:
> - if (ret < 0)
> - btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
> return ret;
> }
>
> --
> 2.47.1
>
* Re: [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-02-06 0:39 ` Boris Burkov
0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Jan 13, 2025 at 08:12:12PM +1030, Qu Wenruo wrote:
> Currently we're doing all the ordered extent and extent map generation
> inside a while() loop of run_delalloc_nocow().
>
> This makes it pretty hard to read, and hard to do proper error handling.
>
> So move that part of code into a helper, nocow_one_range().
>
> This should not change anything, but there is a tiny timing change where
> btrfs_dec_nocow_writers() is only called after nocow_one_range() helper
> exits.
>
> This timing change is small, and makes error handling easier, thus
> should be fine.
>
Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Qu Wenruo <wqu@suse.com>