* [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated
@ 2025-01-13 9:42 Qu Wenruo
2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs
Currently ordered extent cleanup is delayed. E.g.
cow_file_range() and run_delalloc_nocow() can both allocate ordered
extents by themselves.

But the ordered extent cleanup does not happen in those functions; it
happens at the caller, btrfs_run_delalloc_range().

This does not follow common practice, and has already caused various
ordered extent double accounting bugs (fixed by the recent error
handling patchset).
So this series will address the problem by:
- Refactor run_delalloc_nocow() to extract the NOCOW ordered extent
  creation

  This makes the later error handling a little simpler.

- Move ordered extent cleanup to where the extents are created

  There are 3 call sites:

  - cow_file_range()

    This is the simplest one, as the recent fix makes it pretty
    straightforward.

  - nocow_one_range()

    The new helper introduced to create ordered extents and extent maps
    for NOCOW writes.
    This is also pretty straightforward.

  - run_delalloc_nocow()

    There are 3 different error cases that need to adjust the ordered
    extent cleanup range.
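As a side note for readers following along, the move is to the usual
cleanup-asap pattern: whoever allocates the ordered extents undoes its own
allocations on error, so the caller never has to reverse-engineer how far the
callee got. Here is a tiny userspace sketch of that pattern — not btrfs code;
`toy_ordered_extent`, `alloc_one` and `run_range` are made-up names for
illustration only:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Illustrative userspace model only -- not btrfs code.  It shows the
 * cleanup-where-allocated pattern: the function that allocates also
 * frees on error, so the caller sees either full success or no
 * leftover state (the guessing about leftover state is what led to
 * the double-accounting bugs mentioned above).
 */

static int live_extents;	/* counts currently allocated toy extents */

struct toy_ordered_extent {
	unsigned long long start;
	unsigned long long len;
};

static struct toy_ordered_extent *alloc_one(unsigned long long start,
					    unsigned long long len,
					    int simulate_error)
{
	struct toy_ordered_extent *oe;

	if (simulate_error)
		return NULL;	/* models an allocation failure mid-range */
	oe = malloc(sizeof(*oe));
	if (!oe)
		return NULL;
	oe->start = start;
	oe->len = len;
	live_extents++;
	return oe;
}

static void free_one(struct toy_ordered_extent *oe)
{
	free(oe);
	live_extents--;
}

/* Allocates one extent per step; on error, frees what *it* allocated. */
static int run_range(int steps, int fail_at)
{
	struct toy_ordered_extent *created[16];
	int i;

	for (i = 0; i < steps; i++) {
		created[i] = alloc_one((unsigned long long)i * 4096, 4096,
				       i == fail_at);
		if (!created[i]) {
			/* cleanup-asap: undo our own allocations here */
			while (i--)
				free_one(created[i]);
			return -1;
		}
	}
	for (i = 0; i < steps; i++)	/* success path: hand off / finish */
		free_one(created[i]);
	return 0;
}
```

With this shape, the caller's error path needs no knowledge of the callee's
progress: whether `run_range()` returns 0 or -1, there is nothing left for the
caller to free.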
Qu Wenruo (2):
btrfs: extract the nocow ordered extent and extent map generation into
a helper
btrfs: move ordered extent cleanup to where they are allocated
fs/btrfs/inode.c | 188 +++++++++++++++++++++++++++--------------------
1 file changed, 107 insertions(+), 81 deletions(-)
--
2.47.1
* [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
  2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
  2025-02-06 0:39   ` Boris Burkov
  2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs

Currently we're doing all the ordered extent and extent map generation
inside a while() loop of run_delalloc_nocow().

This makes it pretty hard to read, and makes proper error handling
difficult.

So move that part of the code into a helper, nocow_one_range().

This should not change anything, but there is a tiny timing change where
btrfs_dec_nocow_writers() is only called after the nocow_one_range()
helper exits.

This timing change is small, and makes error handling easier, thus
should be fine.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 130 +++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 130f0490b14f..42f67f8a4a33 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1974,6 +1974,71 @@ static void cleanup_dirty_folios(struct btrfs_inode *inode,
 	mapping_set_error(mapping, error);
 }
 
+static int nocow_one_range(struct btrfs_inode *inode,
+			   struct folio *locked_folio,
+			   struct extent_state **cached,
+			   struct can_nocow_file_extent_args *nocow_args,
+			   u64 file_pos, bool is_prealloc)
+{
+	struct btrfs_ordered_extent *ordered;
+	u64 len = nocow_args->file_extent.num_bytes;
+	u64 end = file_pos + len - 1;
+	int ret = 0;
+
+	lock_extent(&inode->io_tree, file_pos, end, cached);
+
+	if (is_prealloc) {
+		struct extent_map *em;
+
+		em = btrfs_create_io_em(inode, file_pos,
+					&nocow_args->file_extent,
+					BTRFS_ORDERED_PREALLOC);
+		if (IS_ERR(em)) {
+			unlock_extent(&inode->io_tree, file_pos,
+				      end, cached);
+			return PTR_ERR(em);
+		}
+		free_extent_map(em);
+	}
+
+	ordered = btrfs_alloc_ordered_extent(inode, file_pos,
+					     &nocow_args->file_extent,
+					     is_prealloc
+					     ? (1 << BTRFS_ORDERED_PREALLOC)
+					     : (1 << BTRFS_ORDERED_NOCOW));
+	if (IS_ERR(ordered)) {
+		if (is_prealloc) {
+			btrfs_drop_extent_map_range(inode, file_pos,
+						    end, false);
+		}
+		unlock_extent(&inode->io_tree, file_pos,
+			      end, cached);
+		return PTR_ERR(ordered);
+	}
+
+	if (btrfs_is_data_reloc_root(inode->root))
+		/*
+		 * Error handled later, as we must prevent
+		 * extent_clear_unlock_delalloc() in error handler
+		 * from freeing metadata of created ordered extent.
+		 */
+		ret = btrfs_reloc_clone_csums(ordered);
+	btrfs_put_ordered_extent(ordered);
+
+	extent_clear_unlock_delalloc(inode, file_pos, end,
+				     locked_folio, cached,
+				     EXTENT_LOCKED | EXTENT_DELALLOC |
+				     EXTENT_CLEAR_DATA_RESV,
+				     PAGE_UNLOCK | PAGE_SET_ORDERED);
+
+	/*
+	 * btrfs_reloc_clone_csums() error, now we're OK to call error
+	 * handler, as metadata for created ordered extent will only
+	 * be freed by btrfs_finish_ordered_io().
+	 */
+	return ret;
+}
+
 /*
  * when nowcow writeback call back. This checks for snapshots or COW copies
  * of the extents that exist in the file, and COWs the file as required.
@@ -2018,15 +2083,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	while (cur_offset <= end) {
 		struct btrfs_block_group *nocow_bg = NULL;
-		struct btrfs_ordered_extent *ordered;
 		struct btrfs_key found_key;
 		struct btrfs_file_extent_item *fi;
 		struct extent_buffer *leaf;
 		struct extent_state *cached_state = NULL;
 		u64 extent_end;
-		u64 nocow_end;
 		int extent_type;
-		bool is_prealloc;
 
 		ret = btrfs_lookup_file_extent(NULL, root, path, ino,
 					       cur_offset, 0);
@@ -2160,67 +2222,13 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			}
 		}
 
-		nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
-		lock_extent(&inode->io_tree, cur_offset, nocow_end, &cached_state);
-
-		is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
-		if (is_prealloc) {
-			struct extent_map *em;
-
-			em = btrfs_create_io_em(inode, cur_offset,
-						&nocow_args.file_extent,
-						BTRFS_ORDERED_PREALLOC);
-			if (IS_ERR(em)) {
-				unlock_extent(&inode->io_tree, cur_offset,
-					      nocow_end, &cached_state);
-				btrfs_dec_nocow_writers(nocow_bg);
-				ret = PTR_ERR(em);
-				goto error;
-			}
-			free_extent_map(em);
-		}
-
-		ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
-						     &nocow_args.file_extent,
-						     is_prealloc
-						     ? (1 << BTRFS_ORDERED_PREALLOC)
-						     : (1 << BTRFS_ORDERED_NOCOW));
+		ret = nocow_one_range(inode, locked_folio, &cached_state,
+				      &nocow_args, cur_offset,
+				      extent_type == BTRFS_FILE_EXTENT_PREALLOC);
 		btrfs_dec_nocow_writers(nocow_bg);
-		if (IS_ERR(ordered)) {
-			if (is_prealloc) {
-				btrfs_drop_extent_map_range(inode, cur_offset,
-							    nocow_end, false);
-			}
-			unlock_extent(&inode->io_tree, cur_offset,
-				      nocow_end, &cached_state);
-			ret = PTR_ERR(ordered);
+		if (ret < 0)
 			goto error;
-		}
-
-		if (btrfs_is_data_reloc_root(root))
-			/*
-			 * Error handled later, as we must prevent
-			 * extent_clear_unlock_delalloc() in error handler
-			 * from freeing metadata of created ordered extent.
-			 */
-			ret = btrfs_reloc_clone_csums(ordered);
-		btrfs_put_ordered_extent(ordered);
-
-		extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
-					     locked_folio, &cached_state,
-					     EXTENT_LOCKED | EXTENT_DELALLOC |
-					     EXTENT_CLEAR_DATA_RESV,
-					     PAGE_UNLOCK | PAGE_SET_ORDERED);
-
 		cur_offset = extent_end;
-
-		/*
-		 * btrfs_reloc_clone_csums() error, now we're OK to call error
-		 * handler, as metadata for created ordered extent will only
-		 * be freed by btrfs_finish_ordered_io().
-		 */
-		if (ret)
-			goto error;
 	}
 	btrfs_release_path(path);

-- 
2.47.1
* Re: [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
  2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-02-06 0:39   ` Boris Burkov
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 13, 2025 at 08:12:12PM +1030, Qu Wenruo wrote:
> Currently we're doing all the ordered extent and extent map generation
> inside a while() loop of run_delalloc_nocow().
>
> This makes it pretty hard to read, nor do proper error handling.
>
> So move that part of code into a helper, nocow_one_range().
>
> This should not change anything, but there is a tiny timing change where
> btrfs_dec_nocow_writers() is only called after nocow_one_range() helper
> exits.
>
> This timing change is small, and makes error handling easier, thus
> should be fine.
>

Reviewed-by: Boris Burkov <boris@bur.io>

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 130 +++++++++++++++++++++++++----------------------
>  1 file changed, 69 insertions(+), 61 deletions(-)

[...]

> --
> 2.47.1
>
* [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
  2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
  2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
  2025-02-06 0:39   ` Boris Burkov
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs

The ordered extent cleanup is hard to grasp because it doesn't follow
the common cleanup-asap pattern.

E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
ordered extents, but if any error is hit, the cleanup is done later
inside btrfs_run_delalloc_range().

To change the existing delayed cleanup:

- Update the comment on error handling of run_delalloc_nocow()

  There are in fact 3 different error cases, not 2, if we are doing
  ordered extent cleanup inside run_delalloc_nocow():

  1) @cow_start and @cow_end not set
     No fallback to COW at all.
     Before @cur_offset we need to clean up the OE and page dirty.
     After @cur_offset just clear all involved page and extent flags.

  2) @cow_start set but @cow_end not set
     This means we failed before even calling fallback_to_cow().
     It's just a variant of case 1), where it's @cow_start splitting
     the two parts (and we should just ignore @cur_offset since it's
     advanced without any new ordered extent).

  3) @cow_start and @cow_end both set
     This means fallback_to_cow() failed, meaning [start, cow_start)
     needs the regular OE and dirty folio cleanup, and skip range
     [cow_start, cow_end) as cow_file_range() has done the cleanup,
     and eventually clean up the [cow_end, end) range.

- Only reset @cow_start after fallback_to_cow() succeeded

  As above, cases 2) and 3) both rely on @cow_start to determine the
  cleanup range.

- Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(),
  cow_file_range() and nocow_one_range()

  For cow_file_range() it's pretty straightforward and easy.

  For run_delalloc_nocow() refer to the above 3 different error cases.

  For nocow_one_range(), if we hit an error, we need to clean up the
  ordered extents ourselves.
  Then it falls back to case 1); since @cur_offset is not yet
  advanced, the existing cleanup co-operates with nocow_one_range()
  well.

- Remove the btrfs_cleanup_ordered_extents() inside
  submit_uncompressed_range()

  As failed cow_file_range() will do all the proper cleanup now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 42f67f8a4a33..8e8b08412d35 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1090,7 +1090,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
 					  &wbc, false);
 	wbc_detach_inode(&wbc);
 	if (ret < 0) {
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
 		if (locked_folio)
 			btrfs_folio_end_lock(inode->root->fs_info, locked_folio,
 					     start, async_extent->ram_size);
@@ -1272,10 +1271,7 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
  * - Else all pages except for @locked_folio are unlocked.
  *
  * When a failure happens in the second or later iteration of the
- * while-loop, the ordered extents created in previous iterations are kept
- * intact. So, the caller must clean them up by calling
- * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
- * example.
+ * while-loop, the ordered extents created in previous iterations are cleaned up.
  */
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct folio *locked_folio, u64 start,
@@ -1488,11 +1484,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	/*
 	 * For the range (1). We have already instantiated the ordered extents
-	 * for this region. They are cleaned up by
-	 * btrfs_cleanup_ordered_extents() in e.g,
-	 * btrfs_run_delalloc_range().
+	 * for this region, thus we need to cleanup those ordered extents.
 	 * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
-	 * are also handled by the cleanup function.
+	 * are also handled by the ordered extents cleanup.
 	 *
 	 * So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag,
 	 * and finish the writeback of the involved folios, which will be
@@ -1504,6 +1498,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	if (!locked_folio)
 		mapping_set_error(inode->vfs_inode.i_mapping, ret);
+
+	btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
 	extent_clear_unlock_delalloc(inode, orig_start, start - 1,
 				     locked_folio, NULL, clear_bits, page_ops);
 }
@@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
 				     EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_CLEAR_DATA_RESV,
 				     PAGE_UNLOCK | PAGE_SET_ORDERED);
-
 	/*
-	 * btrfs_reloc_clone_csums() error, now we're OK to call error
-	 * handler, as metadata for created ordered extent will only
-	 * be freed by btrfs_finish_ordered_io().
+	 * On error, we need to cleanup the ordered extents we created.
+	 *
+	 * We also need to clear the folio Dirty flags for the range,
+	 * but it's not something touched by us, it will be cleared
+	 * by the caller (with cleanup_dirty_folios()).
 	 */
+	if (ret < 0)
+		btrfs_cleanup_ordered_extents(inode, file_pos, end);
 	return ret;
 }
 
@@ -2214,12 +2213,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		if (cow_start != (u64)-1) {
 			ret = fallback_to_cow(inode, locked_folio, cow_start,
 					      found_key.offset - 1);
-			cow_start = (u64)-1;
 			if (ret) {
 				cow_end = found_key.offset - 1;
 				btrfs_dec_nocow_writers(nocow_bg);
 				goto error;
 			}
+			cow_start = (u64)-1;
 		}
 
 		ret = nocow_one_range(inode, locked_folio, &cached_state,
@@ -2237,11 +2236,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	if (cow_start != (u64)-1) {
 		ret = fallback_to_cow(inode, locked_folio, cow_start, end);
-		cow_start = (u64)-1;
 		if (ret) {
 			cow_end = end;
 			goto error;
 		}
+		cow_start = (u64)-1;
 	}
 
 	btrfs_free_path(path);
@@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 *    start          cur_offset             end
 	 *    |/////////////|                        |
 	 *
+	 *    In this case, cow_start should be (u64)-1.
+	 *
 	 * For range [start, cur_offset) the folios are already unlocked (except
 	 * @locked_folio), EXTENT_DELALLOC already removed.
 	 * Only need to clear the dirty flag as they will never be submitted.
 	 * Ordered extent and extent maps are handled by
 	 * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
 	 *
-	 * 2) Failed with error from fallback_to_cow()
-	 *    start          cur_offset  cow_end  end
+	 * 2) Failed with error before calling fallback_to_cow()
+	 *
+	 *    start          cow_start              end
+	 *    |/////////////|                        |
+	 *
+	 *    In this case, only @cow_start is set, @cur_offset is between
+	 *    [cow_start, end)
+	 *
+	 *    It's mostly the same as case 1), just replace @cur_offset with
+	 *    @cow_start.
+	 *
+	 * 3) Failed with error from fallback_to_cow()
+	 *
+	 *    start          cow_start   cow_end   end
 	 *    |/////////////|-----------|          |
 	 *
+	 *    In this case, both @cow_start and @cow_end is set.
+	 *
 	 *    For range [start, cur_offset) it's the same as case 1).
@@ -2272,10 +2287,17 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 * Thus we should not call extent_clear_unlock_delalloc() on range
 	 * [cur_offset, cow_end), as the folios are already unlocked.
 	 *
-	 * So clear the folio dirty flags for [start, cur_offset) first.
+	 *
+	 * So for all above cases, if @cow_start is set, cleanup ordered extents
+	 * for range [start, @cow_start), other wise cleanup range [start, @cur_offset).
 	 */
-	if (cur_offset > start)
+	if (cow_start != (u64)-1)
+		cur_offset = cow_start;
+
+	if (cur_offset > start) {
+		btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
 		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+	}
 
 	/*
 	 * If an error happened while a COW region is outstanding, cur_offset
@@ -2340,7 +2362,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 
 	if (should_nocow(inode, start, end)) {
 		ret = run_delalloc_nocow(inode, locked_folio, start, end);
-		goto out;
+		return ret;
 	}
 
 	if (btrfs_inode_can_compress(inode) &&
@@ -2354,10 +2376,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 	else
 		ret = cow_file_range(inode, locked_folio, start, end, NULL,
 				     false, false);
-
-out:
-	if (ret < 0)
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
 	return ret;
 }

-- 
2.47.1
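The three error cases in the commit message above condense into one rule for
the ordered extent cleanup cutoff. Here is a hedged illustration of that rule
as a userspace model — not kernel code; `cleanup_cutoff` is a made-up helper
name and `U64_MAX` stands in for the kernel's `(u64)-1` sentinel:

```c
#include <assert.h>

/*
 * Userspace model of the cleanup-range rule from the patch above:
 * if @cow_start is still set when we reach the error label, ordered
 * extents exist only in [start, cow_start) (cases 2 and 3);
 * otherwise they exist in [start, cur_offset) (case 1).
 */
#define U64_MAX 0xffffffffffffffffULL

typedef unsigned long long u64;

static u64 cleanup_cutoff(u64 cur_offset, u64 cow_start)
{
	/* cases 2) and 3): a COW fallback region was still pending */
	if (cow_start != U64_MAX)
		return cow_start;
	/* case 1): no fallback to COW at all */
	return cur_offset;
}
```

This is exactly what the `if (cow_start != (u64)-1) cur_offset = cow_start;`
hunk implements before calling btrfs_cleanup_ordered_extents().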
* Re: [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
  2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
@ 2025-02-06 0:39   ` Boris Burkov
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 13, 2025 at 08:12:13PM +1030, Qu Wenruo wrote:
> The ordered extent cleanup is hard to grasp because it doesn't follow
> the common cleanup-asap pattern.
>
> E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
> ordered extents, but if any error is hit, the cleanup is done later inside
> btrfs_run_delalloc_range().
>

[...]

> - Remove the btrfs_cleanup_ordered_extents() inside
>   submit_uncompressed_range()
>   As failed cow_file_range() will do all the proper cleanup now.
>

LGTM, thanks for all the extra explanations in the commit and comments.
If you fix the IMO serious comment error I pointed out inline, please
add

Reviewed-by: Boris Burkov <boris@bur.io>

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 42 insertions(+), 24 deletions(-)

[...]

> @@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
> 	/*
> -	 * btrfs_reloc_clone_csums() error, now we're OK to call error
> -	 * handler, as metadata for created ordered extent will only
> -	 * be freed by btrfs_finish_ordered_io().
> +	 * On error, we need to cleanup the ordered extents we created.
> +	 *
> +	 * We also need to clear the folio Dirty flags for the range,
> +	 * but it's not something touched by us, it will be cleared
> +	 * by the caller (with cleanup_dirty_folios()).

I don't love this phrasing about the Dirty flags for some reason. Not a
deal breaker, though.
How about: "We do not clear the folio Dirty flags because they are set
and cleared by the caller", or something like that?

[...]

> @@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> +	 *    In this case, cow_start should be (u64)-1.
> +	 *
> 	 * For range [start, cur_offset) the folios are already unlocked (except
> 	 * @locked_folio), EXTENT_DELALLOC already removed.
> 	 * Only need to clear the dirty flag as they will never be submitted.
> 	 * Ordered extent and extent maps are handled by
> 	 * btrfs_mark_ordered_io_finished() inside run_delalloc_range().

I believe this comment is quite wrong now: it does not represent the new
logic, where we clean up the ordered extents up to cur_offset (or
cow_start in case 2) here, rather than in run_delalloc_range().

[...]

> --
> 2.47.1
>