* [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated
@ 2025-01-13 9:42 Qu Wenruo
2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs
Currently ordered extent cleanup is delayed. E.g.
cow_file_range() and run_delalloc_nocow() can both allocate ordered
extents by themselves.

But the ordered extent cleanup does not happen in those functions; it
happens at the caller, btrfs_run_delalloc_range().

This does not follow common practice, and has already caused various
ordered extent double accounting bugs (fixed by the recent error
handling patchset).
So this series will address the problem by:
- Refactor run_delalloc_nocow() to extract the NOCOW ordered extent
  creation

  This makes the later error handling a little simpler.

- Move ordered extent cleanup to where the extents are created

  There are 3 call sites:

  - cow_file_range()

    This is the simplest one, as the recent fix makes it pretty
    straightforward.

  - nocow_one_range()

    The new helper introduced to create ordered extents and extent maps
    for NOCOW writes.
    This is also pretty straightforward.

  - run_delalloc_nocow()

    There are 3 different error cases that need to adjust the ordered
    extent cleanup range.
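As a side note for readers following along, the move is to the usual
cleanup-asap pattern: whoever allocates the ordered extents undoes its own
allocations on error, so the caller never has to reverse-engineer how far the
callee got. Here is a tiny userspace sketch of that pattern — not btrfs code;
`toy_ordered_extent`, `alloc_one` and `run_range` are made-up names for
illustration only:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Illustrative userspace model only -- not btrfs code.  It shows the
 * cleanup-where-allocated pattern: the function that allocates also
 * frees on error, so the caller sees either full success or no
 * leftover state (the guessing about leftover state is what led to
 * the double-accounting bugs mentioned above).
 */

static int live_extents;	/* counts currently allocated toy extents */

struct toy_ordered_extent {
	unsigned long long start;
	unsigned long long len;
};

static struct toy_ordered_extent *alloc_one(unsigned long long start,
					    unsigned long long len,
					    int simulate_error)
{
	struct toy_ordered_extent *oe;

	if (simulate_error)
		return NULL;	/* models an allocation failure mid-range */
	oe = malloc(sizeof(*oe));
	if (!oe)
		return NULL;
	oe->start = start;
	oe->len = len;
	live_extents++;
	return oe;
}

static void free_one(struct toy_ordered_extent *oe)
{
	free(oe);
	live_extents--;
}

/* Allocates one extent per step; on error, frees what *it* allocated. */
static int run_range(int steps, int fail_at)
{
	struct toy_ordered_extent *created[16];
	int i;

	for (i = 0; i < steps; i++) {
		created[i] = alloc_one((unsigned long long)i * 4096, 4096,
				       i == fail_at);
		if (!created[i]) {
			/* cleanup-asap: undo our own allocations here */
			while (i--)
				free_one(created[i]);
			return -1;
		}
	}
	for (i = 0; i < steps; i++)	/* success path: hand off / finish */
		free_one(created[i]);
	return 0;
}
```

With this shape, the caller's error path needs no knowledge of the callee's
progress: whether `run_range()` returns 0 or -1, there is nothing left for the
caller to free.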
Qu Wenruo (2):
btrfs: extract the nocow ordered extent and extent map generation into
a helper
btrfs: move ordered extent cleanup to where they are allocated
fs/btrfs/inode.c | 188 +++++++++++++++++++++++++++--------------------
1 file changed, 107 insertions(+), 81 deletions(-)
--
2.47.1
* [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
  2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
  2025-02-06 0:39   ` Boris Burkov
  2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs

Currently we're doing all the ordered extent and extent map generation
inside a while() loop of run_delalloc_nocow().

This makes it pretty hard to read, and makes proper error handling
difficult.

So move that part of the code into a helper, nocow_one_range().

This should not change anything, but there is a tiny timing change where
btrfs_dec_nocow_writers() is only called after the nocow_one_range()
helper exits.

This timing change is small, and makes error handling easier, thus
should be fine.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 130 +++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 130f0490b14f..42f67f8a4a33 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1974,6 +1974,71 @@ static void cleanup_dirty_folios(struct btrfs_inode *inode,
 	mapping_set_error(mapping, error);
 }
 
+static int nocow_one_range(struct btrfs_inode *inode,
+			   struct folio *locked_folio,
+			   struct extent_state **cached,
+			   struct can_nocow_file_extent_args *nocow_args,
+			   u64 file_pos, bool is_prealloc)
+{
+	struct btrfs_ordered_extent *ordered;
+	u64 len = nocow_args->file_extent.num_bytes;
+	u64 end = file_pos + len - 1;
+	int ret = 0;
+
+	lock_extent(&inode->io_tree, file_pos, end, cached);
+
+	if (is_prealloc) {
+		struct extent_map *em;
+
+		em = btrfs_create_io_em(inode, file_pos,
+					&nocow_args->file_extent,
+					BTRFS_ORDERED_PREALLOC);
+		if (IS_ERR(em)) {
+			unlock_extent(&inode->io_tree, file_pos,
+				      end, cached);
+			return PTR_ERR(em);
+		}
+		free_extent_map(em);
+	}
+
+	ordered = btrfs_alloc_ordered_extent(inode, file_pos,
+					     &nocow_args->file_extent,
+					     is_prealloc
+					     ? (1 << BTRFS_ORDERED_PREALLOC)
+					     : (1 << BTRFS_ORDERED_NOCOW));
+	if (IS_ERR(ordered)) {
+		if (is_prealloc) {
+			btrfs_drop_extent_map_range(inode, file_pos,
+						    end, false);
+		}
+		unlock_extent(&inode->io_tree, file_pos,
+			      end, cached);
+		return PTR_ERR(ordered);
+	}
+
+	if (btrfs_is_data_reloc_root(inode->root))
+		/*
+		 * Error handled later, as we must prevent
+		 * extent_clear_unlock_delalloc() in error handler
+		 * from freeing metadata of created ordered extent.
+		 */
+		ret = btrfs_reloc_clone_csums(ordered);
+	btrfs_put_ordered_extent(ordered);
+
+	extent_clear_unlock_delalloc(inode, file_pos, end,
+				     locked_folio, cached,
+				     EXTENT_LOCKED | EXTENT_DELALLOC |
+				     EXTENT_CLEAR_DATA_RESV,
+				     PAGE_UNLOCK | PAGE_SET_ORDERED);
+
+	/*
+	 * btrfs_reloc_clone_csums() error, now we're OK to call error
+	 * handler, as metadata for created ordered extent will only
+	 * be freed by btrfs_finish_ordered_io().
+	 */
+	return ret;
+}
+
 /*
  * when nowcow writeback call back. This checks for snapshots or COW copies
  * of the extents that exist in the file, and COWs the file as required.
@@ -2018,15 +2083,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	while (cur_offset <= end) {
 		struct btrfs_block_group *nocow_bg = NULL;
-		struct btrfs_ordered_extent *ordered;
 		struct btrfs_key found_key;
 		struct btrfs_file_extent_item *fi;
 		struct extent_buffer *leaf;
 		struct extent_state *cached_state = NULL;
 		u64 extent_end;
-		u64 nocow_end;
 		int extent_type;
-		bool is_prealloc;
 
 		ret = btrfs_lookup_file_extent(NULL, root, path, ino,
 					       cur_offset, 0);
@@ -2160,67 +2222,13 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			}
 		}
 
-		nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
-		lock_extent(&inode->io_tree, cur_offset, nocow_end, &cached_state);
-
-		is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
-		if (is_prealloc) {
-			struct extent_map *em;
-
-			em = btrfs_create_io_em(inode, cur_offset,
-						&nocow_args.file_extent,
-						BTRFS_ORDERED_PREALLOC);
-			if (IS_ERR(em)) {
-				unlock_extent(&inode->io_tree, cur_offset,
-					      nocow_end, &cached_state);
-				btrfs_dec_nocow_writers(nocow_bg);
-				ret = PTR_ERR(em);
-				goto error;
-			}
-			free_extent_map(em);
-		}
-
-		ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
-						     &nocow_args.file_extent,
-						     is_prealloc
-						     ? (1 << BTRFS_ORDERED_PREALLOC)
-						     : (1 << BTRFS_ORDERED_NOCOW));
+		ret = nocow_one_range(inode, locked_folio, &cached_state,
+				      &nocow_args, cur_offset,
+				      extent_type == BTRFS_FILE_EXTENT_PREALLOC);
 		btrfs_dec_nocow_writers(nocow_bg);
-		if (IS_ERR(ordered)) {
-			if (is_prealloc) {
-				btrfs_drop_extent_map_range(inode, cur_offset,
-							    nocow_end, false);
-			}
-			unlock_extent(&inode->io_tree, cur_offset,
-				      nocow_end, &cached_state);
-			ret = PTR_ERR(ordered);
+		if (ret < 0)
 			goto error;
-		}
-
-		if (btrfs_is_data_reloc_root(root))
-			/*
-			 * Error handled later, as we must prevent
-			 * extent_clear_unlock_delalloc() in error handler
-			 * from freeing metadata of created ordered extent.
-			 */
-			ret = btrfs_reloc_clone_csums(ordered);
-		btrfs_put_ordered_extent(ordered);
-
-		extent_clear_unlock_delalloc(inode, cur_offset, nocow_end,
-					     locked_folio, &cached_state,
-					     EXTENT_LOCKED | EXTENT_DELALLOC |
-					     EXTENT_CLEAR_DATA_RESV,
-					     PAGE_UNLOCK | PAGE_SET_ORDERED);
-
 		cur_offset = extent_end;
-
-		/*
-		 * btrfs_reloc_clone_csums() error, now we're OK to call error
-		 * handler, as metadata for created ordered extent will only
-		 * be freed by btrfs_finish_ordered_io().
-		 */
-		if (ret)
-			goto error;
 	}
 	btrfs_release_path(path);

-- 
2.47.1
* Re: [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper
  2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-02-06 0:39   ` Boris Burkov
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 13, 2025 at 08:12:12PM +1030, Qu Wenruo wrote:
> Currently we're doing all the ordered extent and extent map generation
> inside a while() loop of run_delalloc_nocow().
>
> This makes it pretty hard to read, nor do proper error handling.
>
> So move that part of code into a helper, nocow_one_range().
>
> This should not change anything, but there is a tiny timing change where
> btrfs_dec_nocow_writers() is only called after nocow_one_range() helper
> exits.
>
> This timing change is small, and makes error handling easier, thus
> should be fine.
>

Reviewed-by: Boris Burkov <boris@bur.io>

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 130 +++++++++++++++++++++++++----------------------
>  1 file changed, 69 insertions(+), 61 deletions(-)

[...]

> --
> 2.47.1
>
* [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
  2025-01-13 9:42 [PATCH 0/2] btrfs: move ordered extents cleanup to where they got allocated Qu Wenruo
  2025-01-13 9:42 ` [PATCH 1/2] btrfs: extract the nocow ordered extent and extent map generation into a helper Qu Wenruo
@ 2025-01-13 9:42 ` Qu Wenruo
  2025-02-06 0:39   ` Boris Burkov
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2025-01-13 9:42 UTC (permalink / raw)
To: linux-btrfs

The ordered extent cleanup is hard to grasp because it doesn't follow
the common cleanup-asap pattern.

E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
ordered extents, but if any error is hit, the cleanup is done later
inside btrfs_run_delalloc_range().

To change the existing delayed cleanup:

- Update the comment on error handling of run_delalloc_nocow()

  There are in fact 3 different error cases, not 2, if we are doing
  ordered extent cleanup inside run_delalloc_nocow():

  1) @cow_start and @cow_end not set
     No fallback to COW at all.
     Before @cur_offset we need to clean up the OE and page dirty.
     After @cur_offset just clear all involved page and extent flags.

  2) @cow_start set but @cow_end not set
     This means we failed before even calling fallback_to_cow().
     It's just a variant of case 1), where it's @cow_start splitting
     the two parts (and we should just ignore @cur_offset since it's
     advanced without any new ordered extent).

  3) @cow_start and @cow_end both set
     This means fallback_to_cow() failed, meaning [start, cow_start)
     needs the regular OE and dirty folio cleanup, and skip range
     [cow_start, cow_end) as cow_file_range() has done the cleanup,
     and eventually clean up the [cow_end, end) range.

- Only reset @cow_start after fallback_to_cow() succeeded

  As above, cases 2) and 3) both rely on @cow_start to determine the
  cleanup range.

- Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(),
  cow_file_range() and nocow_one_range()

  For cow_file_range() it's pretty straightforward and easy.

  For run_delalloc_nocow() refer to the above 3 different error cases.

  For nocow_one_range(), if we hit an error, we need to clean up the
  ordered extents ourselves.
  Then it falls back to case 1); since @cur_offset is not yet
  advanced, the existing cleanup co-operates with nocow_one_range()
  well.

- Remove the btrfs_cleanup_ordered_extents() inside
  submit_uncompressed_range()

  As failed cow_file_range() will do all the proper cleanup now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 42f67f8a4a33..8e8b08412d35 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1090,7 +1090,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode,
 					  &wbc, false);
 	wbc_detach_inode(&wbc);
 	if (ret < 0) {
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
 		if (locked_folio)
 			btrfs_folio_end_lock(inode->root->fs_info, locked_folio,
 					     start, async_extent->ram_size);
@@ -1272,10 +1271,7 @@ u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
  * - Else all pages except for @locked_folio are unlocked.
  *
  * When a failure happens in the second or later iteration of the
- * while-loop, the ordered extents created in previous iterations are kept
- * intact. So, the caller must clean them up by calling
- * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
- * example.
+ * while-loop, the ordered extents created in previous iterations are cleaned up.
  */
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct folio *locked_folio, u64 start,
@@ -1488,11 +1484,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	/*
 	 * For the range (1). We have already instantiated the ordered extents
-	 * for this region. They are cleaned up by
-	 * btrfs_cleanup_ordered_extents() in e.g,
-	 * btrfs_run_delalloc_range().
+	 * for this region, thus we need to cleanup those ordered extents.
 	 * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
-	 * are also handled by the cleanup function.
+	 * are also handled by the ordered extents cleanup.
 	 *
 	 * So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag,
 	 * and finish the writeback of the involved folios, which will be
@@ -1504,6 +1498,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	if (!locked_folio)
 		mapping_set_error(inode->vfs_inode.i_mapping, ret);
+
+	btrfs_cleanup_ordered_extents(inode, orig_start, start - orig_start);
 	extent_clear_unlock_delalloc(inode, orig_start, start - 1,
 				     locked_folio, NULL, clear_bits, page_ops);
 }
@@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
 				     EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_CLEAR_DATA_RESV,
 				     PAGE_UNLOCK | PAGE_SET_ORDERED);
-
 	/*
-	 * btrfs_reloc_clone_csums() error, now we're OK to call error
-	 * handler, as metadata for created ordered extent will only
-	 * be freed by btrfs_finish_ordered_io().
+	 * On error, we need to cleanup the ordered extents we created.
+	 *
+	 * We also need to clear the folio Dirty flags for the range,
+	 * but it's not something touched by us, it will be cleared
+	 * by the caller (with cleanup_dirty_folios()).
 	 */
+	if (ret < 0)
+		btrfs_cleanup_ordered_extents(inode, file_pos, end);
 	return ret;
 }
 
@@ -2214,12 +2213,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		if (cow_start != (u64)-1) {
 			ret = fallback_to_cow(inode, locked_folio, cow_start,
 					      found_key.offset - 1);
-			cow_start = (u64)-1;
 			if (ret) {
 				cow_end = found_key.offset - 1;
 				btrfs_dec_nocow_writers(nocow_bg);
 				goto error;
 			}
+			cow_start = (u64)-1;
 		}
 
 		ret = nocow_one_range(inode, locked_folio, &cached_state,
@@ -2237,11 +2236,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 	if (cow_start != (u64)-1) {
 		ret = fallback_to_cow(inode, locked_folio, cow_start, end);
-		cow_start = (u64)-1;
 		if (ret) {
 			cow_end = end;
 			goto error;
 		}
+		cow_start = (u64)-1;
 	}
 
 	btrfs_free_path(path);
@@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 *    start          cur_offset             end
 	 *    |/////////////|                        |
 	 *
+	 *    In this case, cow_start should be (u64)-1.
+	 *
 	 * For range [start, cur_offset) the folios are already unlocked (except
 	 * @locked_folio), EXTENT_DELALLOC already removed.
 	 * Only need to clear the dirty flag as they will never be submitted.
 	 * Ordered extent and extent maps are handled by
 	 * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
 	 *
-	 * 2) Failed with error from fallback_to_cow()
-	 *    start          cur_offset  cow_end  end
+	 * 2) Failed with error before calling fallback_to_cow()
+	 *
+	 *    start          cow_start              end
+	 *    |/////////////|                        |
+	 *
+	 *    In this case, only @cow_start is set, @cur_offset is between
+	 *    [cow_start, end)
+	 *
+	 *    It's mostly the same as case 1), just replace @cur_offset with
+	 *    @cow_start.
+	 *
+	 * 3) Failed with error from fallback_to_cow()
+	 *
+	 *    start          cow_start   cow_end   end
 	 *    |/////////////|-----------|          |
 	 *
+	 *    In this case, both @cow_start and @cow_end is set.
+	 *
 	 *    For range [start, cur_offset) it's the same as case 1).
@@ -2272,10 +2287,17 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 	 * Thus we should not call extent_clear_unlock_delalloc() on range
 	 * [cur_offset, cow_end), as the folios are already unlocked.
 	 *
-	 * So clear the folio dirty flags for [start, cur_offset) first.
+	 *
+	 * So for all above cases, if @cow_start is set, cleanup ordered extents
+	 * for range [start, @cow_start), other wise cleanup range [start, @cur_offset).
 	 */
-	if (cur_offset > start)
+	if (cow_start != (u64)-1)
+		cur_offset = cow_start;
+
+	if (cur_offset > start) {
+		btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
 		cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+	}
 
 	/*
 	 * If an error happened while a COW region is outstanding, cur_offset
@@ -2340,7 +2362,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 
 	if (should_nocow(inode, start, end)) {
 		ret = run_delalloc_nocow(inode, locked_folio, start, end);
-		goto out;
+		return ret;
 	}
 
 	if (btrfs_inode_can_compress(inode) &&
@@ -2354,10 +2376,6 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
 	else
 		ret = cow_file_range(inode, locked_folio, start, end, NULL,
 				     false, false);
-
-out:
-	if (ret < 0)
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
 	return ret;
 }

-- 
2.47.1
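The three error cases in the commit message above condense into one rule for
the ordered extent cleanup cutoff. Here is a hedged illustration of that rule
as a userspace model — not kernel code; `cleanup_cutoff` is a made-up helper
name and `U64_MAX` stands in for the kernel's `(u64)-1` sentinel:

```c
#include <assert.h>

/*
 * Userspace model of the cleanup-range rule from the patch above:
 * if @cow_start is still set when we reach the error label, ordered
 * extents exist only in [start, cow_start) (cases 2 and 3);
 * otherwise they exist in [start, cur_offset) (case 1).
 */
#define U64_MAX 0xffffffffffffffffULL

typedef unsigned long long u64;

static u64 cleanup_cutoff(u64 cur_offset, u64 cow_start)
{
	/* cases 2) and 3): a COW fallback region was still pending */
	if (cow_start != U64_MAX)
		return cow_start;
	/* case 1): no fallback to COW at all */
	return cur_offset;
}
```

This is exactly what the `if (cow_start != (u64)-1) cur_offset = cow_start;`
hunk implements before calling btrfs_cleanup_ordered_extents().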
* Re: [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated
  2025-01-13 9:42 ` [PATCH 2/2] btrfs: move ordered extent cleanup to where they are allocated Qu Wenruo
@ 2025-02-06 0:39   ` Boris Burkov
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Burkov @ 2025-02-06 0:39 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 13, 2025 at 08:12:13PM +1030, Qu Wenruo wrote:
> The ordered extent cleanup is hard to grasp because it doesn't follow
> the common cleanup-asap pattern.
>
> E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
> ordered extents, but if any error is hit, the cleanup is done later inside
> btrfs_run_delalloc_range().
>

[...]

> - Remove the btrfs_cleanup_ordered_extents() inside
>   submit_uncompressed_range()
>   As failed cow_file_range() will do all the proper cleanup now.
>

LGTM, thanks for all the extra explanations in the commit and comments.
If you fix the IMO serious comment error I pointed out inline, please
add

Reviewed-by: Boris Burkov <boris@bur.io>

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 66 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 42 insertions(+), 24 deletions(-)

[...]

> @@ -2030,12 +2026,15 @@ static int nocow_one_range(struct btrfs_inode *inode,
> 	/*
> -	 * btrfs_reloc_clone_csums() error, now we're OK to call error
> -	 * handler, as metadata for created ordered extent will only
> -	 * be freed by btrfs_finish_ordered_io().
> +	 * On error, we need to cleanup the ordered extents we created.
> +	 *
> +	 * We also need to clear the folio Dirty flags for the range,
> +	 * but it's not something touched by us, it will be cleared
> +	 * by the caller (with cleanup_dirty_folios()).

I don't love this phrasing about the Dirty flags for some reason. Not a
deal breaker, though.
How about: "We do not clear the folio Dirty flags because they are set
and cleared by the caller", or something like that?

[...]

> @@ -2255,16 +2254,32 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> +	 *    In this case, cow_start should be (u64)-1.
> +	 *
> 	 * For range [start, cur_offset) the folios are already unlocked (except
> 	 * @locked_folio), EXTENT_DELALLOC already removed.
> 	 * Only need to clear the dirty flag as they will never be submitted.
> 	 * Ordered extent and extent maps are handled by
> 	 * btrfs_mark_ordered_io_finished() inside run_delalloc_range().

I believe this comment is quite wrong now: it does not represent the new
logic, where we clean up the ordered extents up to cur_offset (or
cow_start in case 2) here, rather than in run_delalloc_range().

[...]

> --
> 2.47.1
>