* [PATCH 1/3] btrfs: enhance error messages for delalloc range failure
2025-07-23 11:21 [PATCH 0/3] btrfs: fix possible race between error handling and writeback Qu Wenruo
@ 2025-07-23 11:21 ` Qu Wenruo
2025-07-25 18:37 ` Boris Burkov
2025-07-23 11:21 ` [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error Qu Wenruo
2025-07-23 11:21 ` [PATCH 3/3] btrfs: keep folios locked inside run_delalloc_nocow() Qu Wenruo
2 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2025-07-23 11:21 UTC (permalink / raw)
To: linux-btrfs
When running emulated write error tests like generic/475, we can hit
error messages like this:
BTRFS error (device dm-12 state EA): run_delalloc_nocow failed, root=596 inode=264 start=1605632 len=73728: -5
BTRFS error (device dm-12 state EA): failed to run delalloc range, root=596 ino=264 folio=1605632 submit_bitmap=0-7 start=1605632 len=73728: -5
Which is normally buried by direct IO error messages.
However above error messages are not enough to determine which is the
real range that caused the error.
Considering we can have multiple different extents in one delalloc
range (e.g. some COW extents along with some NOCOW extents), just
outputting the error at the end of run_delalloc_nocow() is not enough.
To enhance the error messages:
- Remove the rate limit on the existing error messages
In the generic/475 example, most error messages are from direct IO,
not really from the delalloc range.
Considering how useful the delalloc range error messages are, we don't
want they to be rate limited.
- Add extra @cur_offset output for cow_file_range()
- Add extra @cow_start and @cow_end output for run_delalloc_nocow()
This is especially important for run_delalloc_nocow(), as there
are extra error paths where we can hit error without into
nocow_one_range() nor fallback_to_cow().
- Add an error message for nocow_one_range()
That's the missing part.
For fallback_to_cow(), we have error message from cow_file_range()
already.
- Constify the @len and @end local variables for nocow_one_range()
This makes it much easier to make sure @len and @end are not modified
at runtime.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6d9a8d8bea4c..55d42f2b4a86 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1534,10 +1534,11 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size,
end - start - cur_alloc_size + 1, NULL);
}
- btrfs_err_rl(fs_info,
- "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d",
- __func__, btrfs_root_id(inode->root),
- btrfs_ino(inode), orig_start, end + 1 - orig_start, ret);
+ btrfs_err(fs_info,
+ "%s failed, root=%llu inode=%llu start=%llu len=%llu cur_offset=%llu cur_alloc_size=%llu: %d",
+ __func__, btrfs_root_id(inode->root),
+ btrfs_ino(inode), orig_start, end + 1 - orig_start,
+ start, cur_alloc_size, ret);
return ret;
}
@@ -1969,8 +1970,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
u64 file_pos, bool is_prealloc)
{
struct btrfs_ordered_extent *ordered;
- u64 len = nocow_args->file_extent.num_bytes;
- u64 end = file_pos + len - 1;
+ const u64 len = nocow_args->file_extent.num_bytes;
+ const u64 end = file_pos + len - 1;
int ret = 0;
btrfs_lock_extent(&inode->io_tree, file_pos, end, cached);
@@ -2017,8 +2018,13 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
* We do not clear the folio Dirty flags because they are set and
* cleaered by the caller.
*/
- if (ret < 0)
+ if (ret < 0) {
btrfs_cleanup_ordered_extents(inode, file_pos, len);
+ btrfs_err(inode->root->fs_info,
+ "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
+ __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
+ file_pos, len, ret);
+ }
return ret;
}
@@ -2304,10 +2310,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
btrfs_qgroup_free_data(inode, NULL, cur_offset, end - cur_offset + 1, NULL);
}
btrfs_free_path(path);
- btrfs_err_rl(fs_info,
- "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d",
- __func__, btrfs_root_id(inode->root),
- btrfs_ino(inode), start, end + 1 - start, ret);
+ btrfs_err(fs_info,
+ "%s failed, root=%llu inode=%llu start=%llu len=%llu cow_start=%lld cow_end=%llu: %d",
+ __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
+ start, end + 1 - start, cow_start, cow_end, ret);
return ret;
}
--
2.50.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH 1/3] btrfs: enhance error messages for delalloc range failure
2025-07-23 11:21 ` [PATCH 1/3] btrfs: enhance error messages for delalloc range failure Qu Wenruo
@ 2025-07-25 18:37 ` Boris Burkov
0 siblings, 0 replies; 7+ messages in thread
From: Boris Burkov @ 2025-07-25 18:37 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Jul 23, 2025 at 08:51:22PM +0930, Qu Wenruo wrote:
> When running emulated write error tests like generic/475, we can hit
> error messages like this:
>
> BTRFS error (device dm-12 state EA): run_delalloc_nocow failed, root=596 inode=264 start=1605632 len=73728: -5
> BTRFS error (device dm-12 state EA): failed to run delalloc range, root=596 ino=264 folio=1605632 submit_bitmap=0-7 start=1605632 len=73728: -5
>
> Which is normally buried by direct IO error messages.
>
> However above error messages are not enough to determine which is the
> real range that caused the error.
> Considering we can have multiple different extents in one delalloc
> range (e.g. some COW extents along with some NOCOW extents), just
> outputting the error at the end of run_delalloc_nocow() is not enough.
>
> To enhance the error messages:
>
> - Remove the rate limit on the existing error messages
> In the generic/475 example, most error messages are from direct IO,
> not really from the delalloc range.
> Considering how useful the delalloc range error messages are, we don't
> want they to be rate limited.
>
> - Add extra @cur_offset output for cow_file_range()
> - Add extra @cow_start and @cow_end output for run_delalloc_nocow()
> This is especially important for run_delalloc_nocow(), as there
> are extra error paths where we can hit error without into
> nocow_one_range() nor fallback_to_cow().
>
> - Add an error message for nocow_one_range()
> That's the missing part.
> For fallback_to_cow(), we have error message from cow_file_range()
> already.
>
> - Constify the @len and @end local variables for nocow_one_range()
> This makes it much easier to make sure @len and @end are not modified
> at runtime.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/inode.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 6d9a8d8bea4c..55d42f2b4a86 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1534,10 +1534,11 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
> btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size,
> end - start - cur_alloc_size + 1, NULL);
> }
> - btrfs_err_rl(fs_info,
> - "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d",
> - __func__, btrfs_root_id(inode->root),
> - btrfs_ino(inode), orig_start, end + 1 - orig_start, ret);
> + btrfs_err(fs_info,
> + "%s failed, root=%llu inode=%llu start=%llu len=%llu cur_offset=%llu cur_alloc_size=%llu: %d",
> + __func__, btrfs_root_id(inode->root),
> + btrfs_ino(inode), orig_start, end + 1 - orig_start,
> + start, cur_alloc_size, ret);
> return ret;
> }
>
> @@ -1969,8 +1970,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
> u64 file_pos, bool is_prealloc)
> {
> struct btrfs_ordered_extent *ordered;
> - u64 len = nocow_args->file_extent.num_bytes;
> - u64 end = file_pos + len - 1;
> + const u64 len = nocow_args->file_extent.num_bytes;
> + const u64 end = file_pos + len - 1;
> int ret = 0;
>
> btrfs_lock_extent(&inode->io_tree, file_pos, end, cached);
> @@ -2017,8 +2018,13 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
> * We do not clear the folio Dirty flags because they are set and
> * cleaered by the caller.
> */
> - if (ret < 0)
> + if (ret < 0) {
> btrfs_cleanup_ordered_extents(inode, file_pos, len);
> + btrfs_err(inode->root->fs_info,
> + "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
> + __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
> + file_pos, len, ret);
> + }
> return ret;
> }
>
> @@ -2304,10 +2310,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> btrfs_qgroup_free_data(inode, NULL, cur_offset, end - cur_offset + 1, NULL);
> }
> btrfs_free_path(path);
> - btrfs_err_rl(fs_info,
> - "%s failed, root=%llu inode=%llu start=%llu len=%llu: %d",
> - __func__, btrfs_root_id(inode->root),
> - btrfs_ino(inode), start, end + 1 - start, ret);
> + btrfs_err(fs_info,
> + "%s failed, root=%llu inode=%llu start=%llu len=%llu cow_start=%lld cow_end=%llu: %d",
> + __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
> + start, end + 1 - start, cow_start, cow_end, ret);
> return ret;
> }
>
> --
> 2.50.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error
2025-07-23 11:21 [PATCH 0/3] btrfs: fix possible race between error handling and writeback Qu Wenruo
2025-07-23 11:21 ` [PATCH 1/3] btrfs: enhance error messages for delalloc range failure Qu Wenruo
@ 2025-07-23 11:21 ` Qu Wenruo
2025-07-25 18:37 ` Boris Burkov
2025-07-23 11:21 ` [PATCH 3/3] btrfs: keep folios locked inside run_delalloc_nocow() Qu Wenruo
2 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2025-07-23 11:21 UTC (permalink / raw)
To: linux-btrfs
Currently if we hit an error inside nocow_one_range(), we do not clear
the page dirty, and let the caller to handle it.
This is very different compared to fallback_to_cow(), when that function
failed, everything will be cleaned up by cow_file_range().
Enhance the situation by:
- Use a common error handling for nocow_one_range()
If we failed anything, use the same btrfs_cleanup_ordered_extents()
and extent_clear_unlock_delalloc().
btrfs_cleanup_ordered_extents() is safe even if we haven't created new
ordered extent, in that case there should be no OE and that function
will do nothing.
The same applies to extent_clear_unlock_delalloc(), and since we're
passing PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK, it
will also clear folio dirty flag during error handling.
- Avoid touching the failed range of nocow_one_range()
As the failed range will be cleaned up and unlocked by that function.
Here we introduce a new variable @nocow_end to record the failed range,
so that we can skip it during the error handling of run_delalloc_nocow().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 45 +++++++++++++++++++++++++++------------------
1 file changed, 27 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 55d42f2b4a86..3f2f3c6024ba 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1982,8 +1982,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
em = btrfs_create_io_em(inode, file_pos, &nocow_args->file_extent,
BTRFS_ORDERED_PREALLOC);
if (IS_ERR(em)) {
- btrfs_unlock_extent(&inode->io_tree, file_pos, end, cached);
- return PTR_ERR(em);
+ ret = PTR_ERR(em);
+ goto error;
}
btrfs_free_extent_map(em);
}
@@ -1995,8 +1995,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
if (IS_ERR(ordered)) {
if (is_prealloc)
btrfs_drop_extent_map_range(inode, file_pos, end, false);
- btrfs_unlock_extent(&inode->io_tree, file_pos, end, cached);
- return PTR_ERR(ordered);
+ ret = PTR_ERR(ordered);
+ goto error;
}
if (btrfs_is_data_reloc_root(inode->root))
@@ -2008,23 +2008,24 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
ret = btrfs_reloc_clone_csums(ordered);
btrfs_put_ordered_extent(ordered);
+ if (ret < 0)
+ goto error;
extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
EXTENT_LOCKED | EXTENT_DELALLOC |
EXTENT_CLEAR_DATA_RESV,
PAGE_UNLOCK | PAGE_SET_ORDERED);
- /*
- * On error, we need to cleanup the ordered extents we created.
- *
- * We do not clear the folio Dirty flags because they are set and
- * cleaered by the caller.
- */
- if (ret < 0) {
- btrfs_cleanup_ordered_extents(inode, file_pos, len);
- btrfs_err(inode->root->fs_info,
- "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
- __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
- file_pos, len, ret);
- }
+ return ret;
+error:
+ btrfs_cleanup_ordered_extents(inode, file_pos, len);
+ extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
+ EXTENT_LOCKED | EXTENT_DELALLOC |
+ EXTENT_CLEAR_DATA_RESV,
+ PAGE_UNLOCK | PAGE_START_WRITEBACK |
+ PAGE_END_WRITEBACK);
+ btrfs_err(inode->root->fs_info,
+ "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
+ __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
+ file_pos, len, ret);
return ret;
}
@@ -2046,8 +2047,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
/*
* If not 0, represents the inclusive end of the last fallback_to_cow()
* range. Only for error handling.
+ *
+ * The same for nocow_end, it's to avoid double cleaning up the range
+ * already cleaned by nocow_one_range().
*/
u64 cow_end = 0;
+ u64 nocow_end = 0;
u64 cur_offset = start;
int ret;
bool check_prev = true;
@@ -2216,8 +2221,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
&nocow_args, cur_offset,
extent_type == BTRFS_FILE_EXTENT_PREALLOC);
btrfs_dec_nocow_writers(nocow_bg);
- if (ret < 0)
+ if (ret < 0) {
+ nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
goto error;
+ }
cur_offset = extent_end;
}
btrfs_release_path(path);
@@ -2291,6 +2298,8 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
*/
if (cow_end)
cur_offset = cow_end + 1;
+ else if (nocow_end)
+ cur_offset = nocow_end + 1;
/*
* We need to lock the extent here because we're clearing DELALLOC and
--
2.50.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error
2025-07-23 11:21 ` [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error Qu Wenruo
@ 2025-07-25 18:37 ` Boris Burkov
2025-07-25 21:20 ` Qu Wenruo
0 siblings, 1 reply; 7+ messages in thread
From: Boris Burkov @ 2025-07-25 18:37 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Jul 23, 2025 at 08:51:23PM +0930, Qu Wenruo wrote:
> Currently if we hit an error inside nocow_one_range(), we do not clear
> the page dirty, and let the caller to handle it.
>
> This is very different compared to fallback_to_cow(), when that function
> failed, everything will be cleaned up by cow_file_range().
>
> Enhance the situation by:
>
> - Use a common error handling for nocow_one_range()
> If we failed anything, use the same btrfs_cleanup_ordered_extents()
> and extent_clear_unlock_delalloc().
>
> btrfs_cleanup_ordered_extents() is safe even if we haven't created new
> ordered extent, in that case there should be no OE and that function
> will do nothing.
>
> The same applies to extent_clear_unlock_delalloc(), and since we're
> passing PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK, it
> will also clear folio dirty flag during error handling.
>
> - Avoid touching the failed range of nocow_one_range()
> As the failed range will be cleaned up and unlocked by that function.
>
> Here we introduce a new variable @nocow_end to record the failed range,
> so that we can skip it during the error handling of run_delalloc_nocow().
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/inode.c | 45 +++++++++++++++++++++++++++------------------
> 1 file changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 55d42f2b4a86..3f2f3c6024ba 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1982,8 +1982,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
> em = btrfs_create_io_em(inode, file_pos, &nocow_args->file_extent,
> BTRFS_ORDERED_PREALLOC);
> if (IS_ERR(em)) {
> - btrfs_unlock_extent(&inode->io_tree, file_pos, end, cached);
> - return PTR_ERR(em);
> + ret = PTR_ERR(em);
> + goto error;
> }
> btrfs_free_extent_map(em);
> }
> @@ -1995,8 +1995,8 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
> if (IS_ERR(ordered)) {
> if (is_prealloc)
> btrfs_drop_extent_map_range(inode, file_pos, end, false);
> - btrfs_unlock_extent(&inode->io_tree, file_pos, end, cached);
> - return PTR_ERR(ordered);
> + ret = PTR_ERR(ordered);
> + goto error;
> }
>
> if (btrfs_is_data_reloc_root(inode->root))
> @@ -2008,23 +2008,24 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
> ret = btrfs_reloc_clone_csums(ordered);
> btrfs_put_ordered_extent(ordered);
>
> + if (ret < 0)
> + goto error;
> extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
> EXTENT_LOCKED | EXTENT_DELALLOC |
> EXTENT_CLEAR_DATA_RESV,
> PAGE_UNLOCK | PAGE_SET_ORDERED);
> - /*
> - * On error, we need to cleanup the ordered extents we created.
> - *
> - * We do not clear the folio Dirty flags because they are set and
> - * cleaered by the caller.
> - */
> - if (ret < 0) {
> - btrfs_cleanup_ordered_extents(inode, file_pos, len);
> - btrfs_err(inode->root->fs_info,
> - "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
> - __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
> - file_pos, len, ret);
> - }
> + return ret;
> +error:
> + btrfs_cleanup_ordered_extents(inode, file_pos, len);
> + extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
> + EXTENT_LOCKED | EXTENT_DELALLOC |
> + EXTENT_CLEAR_DATA_RESV,
> + PAGE_UNLOCK | PAGE_START_WRITEBACK |
> + PAGE_END_WRITEBACK);
> + btrfs_err(inode->root->fs_info,
> + "%s failed, root=%lld inode=%llu start=%llu len=%llu: %d",
> + __func__, btrfs_root_id(inode->root), btrfs_ino(inode),
> + file_pos, len, ret);
> return ret;
> }
>
> @@ -2046,8 +2047,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> /*
> * If not 0, represents the inclusive end of the last fallback_to_cow()
> * range. Only for error handling.
> + *
> + * The same for nocow_end, it's to avoid double cleaning up the range
> + * already cleaned by nocow_one_range().
> */
> u64 cow_end = 0;
> + u64 nocow_end = 0;
> u64 cur_offset = start;
> int ret;
> bool check_prev = true;
> @@ -2216,8 +2221,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> &nocow_args, cur_offset,
> extent_type == BTRFS_FILE_EXTENT_PREALLOC);
> btrfs_dec_nocow_writers(nocow_bg);
> - if (ret < 0)
> + if (ret < 0) {
> + nocow_end = cur_offset + nocow_args.file_extent.num_bytes - 1;
> goto error;
> + }
> cur_offset = extent_end;
> }
> btrfs_release_path(path);
> @@ -2291,6 +2298,8 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> */
> if (cow_end)
> cur_offset = cow_end + 1;
The cow_end case has a comment explaining how cow_file_range does the
cleanup. Now that you made nocow_one_range match cow_file_range, can you
update the comment(s) to make that clear? I think the logic is the same
so one shared comment (rather than a separate one for this else-if
should do.
I also wonder if some of the descriptions of the 3 cases still make
perfect sense. What ordered extent are we ever finishing in case 1,
if nocow_one_range already did it, for example?
> + else if (nocow_end)
> + cur_offset = nocow_end + 1;
>
> /*
> * We need to lock the extent here because we're clearing DELALLOC and
> --
> 2.50.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error
2025-07-25 18:37 ` Boris Burkov
@ 2025-07-25 21:20 ` Qu Wenruo
0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2025-07-25 21:20 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs
在 2025/7/26 04:07, Boris Burkov 写道:
> On Wed, Jul 23, 2025 at 08:51:23PM +0930, Qu Wenruo wrote:
[...]
>> @@ -2291,6 +2298,8 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>> */
>> if (cow_end)
>> cur_offset = cow_end + 1;
>
> The cow_end case has a comment explaining how cow_file_range does the
> cleanup. Now that you made nocow_one_range match cow_file_range, can you
> update the comment(s) to make that clear? I think the logic is the same
> so one shared comment (rather than a separate one for this else-if
> should do.
Sure.
>
> I also wonder if some of the descriptions of the 3 cases still make
> perfect sense. What ordered extent are we ever finishing in case 1,
The case 1 is a little more complex.
It implies have ordered extents created before our @cur_offset, but
without new @cow_start.
E.g. we created one cow OE and updated @cur_offset. Then before we are
able to determine if we can do NOCOW or not, some error happened and we
have to error out.
In that case, we hit case 1.
But to be honest, I'd prefer to merge case 1 with case 2, so that we can
focus on the range [start, @cur_offset) only.
1.1) @cow_start not set.
Then [start, cur_offset) is purely some range covered by new OE.
1.2) @cow_start set
Then [start, cow_start) is new OE.
[cow_start, cur_offset) has not been touched.
But the problem is, @cur_offset is touched too many times during error
handling.
Maybe it's time to get rid of the @cur_offset modification during error
handling?
Thanks,
Qu
> if nocow_one_range already did it, for example?
>
>> + else if (nocow_end)
>> + cur_offset = nocow_end + 1;
>>
>> /*
>> * We need to lock the extent here because we're clearing DELALLOC and
>> --
>> 2.50.0
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/3] btrfs: keep folios locked inside run_delalloc_nocow()
2025-07-23 11:21 [PATCH 0/3] btrfs: fix possible race between error handling and writeback Qu Wenruo
2025-07-23 11:21 ` [PATCH 1/3] btrfs: enhance error messages for delalloc range failure Qu Wenruo
2025-07-23 11:21 ` [PATCH 2/3] btrfs: make nocow_one_range() to do cleanup on error Qu Wenruo
@ 2025-07-23 11:21 ` Qu Wenruo
2 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2025-07-23 11:21 UTC (permalink / raw)
To: linux-btrfs
[BUG]
There is a very low chance that DEBUG_WARN() inside
btrfs_writepage_cow_fixup() can be triggered when
CONFIG_BTRFS_EXPERIMENTAL is enabled.
This only happens after run_delalloc_nocow() failed.
Unfortunately I haven't hit it for a while thus no real world dmesg for
now.
[CAUSE]
There is a race window where after run_delalloc_nocow() failed, error
handling can race with writeback thread.
Before we hit run_delalloc_nocow(), there is an inode with the following
dirty pages: (4K page size, 4K block size, no large folio)
0 4K 8K 12K 16K
|/////////|///////////|///////////|////////////|
The inode also have NODATACOW flag, and the above dirty range will go
through different extents during run_delalloc_range():
0 4K 8K 12K 16K
| NOCOW | COW | COW | NOCOW |
The race happen like this:
writeback thread A | writeback thread B
----------------------------------+--------------------------------------
Writeback for folio 0 |
run_delalloc_nocow() |
|- nocow_one_range() |
| For range [0, 4K), ret = 0 |
| |
|- fallback_to_cow() |
| For range [4K, 8K), ret = 0 |
| Folio 4K *UNLOCKED* |
| | Writeback for folio 4K
|- fallback_to_cow() | extent_writepage()
| For range [8K, 12K), failure | |- writepage_delalloc()
| | |
|- btrfs_cleanup_ordered_extents()| |
|- btrfs_folio_clear_ordered() | |
| Folio 0 still locked, safe | |
| | | Ordered extent already allocated.
| | | Nothing to do.
| | |- extent_writepage_io()
| | |- btrfs_writepage_cow_fixup()
|- btrfs_folio_clear_ordered() | |
Folio 4K hold by thread B, | |
UNSAFE! | |- btrfs_test_ordered()
| | Cleared by thread A,
| |
| |- DEBUG_WARN();
This is only possible after run_delalloc_nocow() failure, as
cow_file_range() will keep all folios and io tree range locked, until
everything is finished or after error handling.
The root cause is we allow fallback_to_cow() and nocow_one_range() to
unlock the folios after a successful run, so that during error handling
we're no longer safe to use btrfs_cleanup_ordered_extents() as the
folios are already unlocked.
[FIX]
- Make fallback_to_cow() and nocow_one_range() to keep folios locked
after a successful run
For fallback_to_cow() we can just pass COW_FILE_RANGE_KEEP_LOCKED flag
for cow_file_range().
For nocow_one_range() we have need to remove the PAGE_UNLOCK flag.
- Unlock folios if everything is fine in run_delalloc_nocow()
- Use extent_clear_unlock_delalloc() to handle range [@start,
@cur_offset) inside run_delalloc_nocow()
Since folios are still locked, we do not need
cleanup_dirty_folios() to do the cleanup.
extent_clear_unlock_delalloc() with "PAGE_START_WRIBACK |
PAGE_END_WRITEBACK" will clear the dirty flags.
- Remove cleanup_dirty_folios()
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 70 ++++++++++++++----------------------------------
1 file changed, 20 insertions(+), 50 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3f2f3c6024ba..cbbfda5682a8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1772,9 +1772,15 @@ static int fallback_to_cow(struct btrfs_inode *inode,
* Don't try to create inline extents, as a mix of inline extent that
* is written out and unlocked directly and a normal NOCOW extent
* doesn't work.
+ *
+ * And here we do not unlock the folio after a successful run.
+ * The folios will be unlocked after everything is finished, or by error handling.
+ *
+ * This is to ensure error handling won't need to clear dirty/ordered flags without
+ * a locked folio, which can race with writeback.
*/
ret = cow_file_range(inode, locked_folio, start, end, NULL,
- COW_FILE_RANGE_NO_INLINE);
+ COW_FILE_RANGE_NO_INLINE | COW_FILE_RANGE_KEEP_LOCKED);
ASSERT(ret != 1);
return ret;
}
@@ -1917,53 +1923,6 @@ static int can_nocow_file_extent(struct btrfs_path *path,
return ret < 0 ? ret : can_nocow;
}
-/*
- * Cleanup the dirty folios which will never be submitted due to error.
- *
- * When running a delalloc range, we may need to split the ranges (due to
- * fragmentation or NOCOW). If we hit an error in the later part, we will error
- * out and previously successfully executed range will never be submitted, thus
- * we have to cleanup those folios by clearing their dirty flag, starting and
- * finishing the writeback.
- */
-static void cleanup_dirty_folios(struct btrfs_inode *inode,
- struct folio *locked_folio,
- u64 start, u64 end, int error)
-{
- struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct address_space *mapping = inode->vfs_inode.i_mapping;
- pgoff_t start_index = start >> PAGE_SHIFT;
- pgoff_t end_index = end >> PAGE_SHIFT;
- u32 len;
-
- ASSERT(end + 1 - start < U32_MAX);
- ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
- IS_ALIGNED(end + 1, fs_info->sectorsize));
- len = end + 1 - start;
-
- /*
- * Handle the locked folio first.
- * The btrfs_folio_clamp_*() helpers can handle range out of the folio case.
- */
- btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
-
- for (pgoff_t index = start_index; index <= end_index; index++) {
- struct folio *folio;
-
- /* Already handled at the beginning. */
- if (index == locked_folio->index)
- continue;
- folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
- /* Cache already dropped, no need to do any cleanup. */
- if (IS_ERR(folio))
- continue;
- btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
- folio_unlock(folio);
- folio_put(folio);
- }
- mapping_set_error(mapping, error);
-}
-
static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio,
struct extent_state **cached,
struct can_nocow_file_extent_args *nocow_args,
@@ -2013,7 +1972,7 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
EXTENT_LOCKED | EXTENT_DELALLOC |
EXTENT_CLEAR_DATA_RESV,
- PAGE_UNLOCK | PAGE_SET_ORDERED);
+ PAGE_SET_ORDERED);
return ret;
error:
btrfs_cleanup_ordered_extents(inode, file_pos, len);
@@ -2241,6 +2200,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
cow_start = (u64)-1;
}
+ /*
+ * Everything is finished without an error, can unlock the folios now.
+ *
+ * No need to touch the io tree range nor set folio ordered flag, as
+ * fallback_to_cow() and nocow_one_range() have already handled them.
+ */
+ extent_clear_unlock_delalloc(inode, start, end, locked_folio, NULL, 0, PAGE_UNLOCK);
+
btrfs_free_path(path);
return 0;
@@ -2288,7 +2255,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
if (cur_offset > start) {
btrfs_cleanup_ordered_extents(inode, start, cur_offset - start);
- cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+ extent_clear_unlock_delalloc(inode, start, cur_offset - 1, locked_folio, NULL,
+ EXTENT_LOCKED | EXTENT_DELALLOC,
+ PAGE_UNLOCK | PAGE_START_WRITEBACK |
+ PAGE_END_WRITEBACK);
}
/*
--
2.50.0
^ permalink raw reply related [flat|nested] 7+ messages in thread