* [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() @ 2022-12-26 1:00 Qu Wenruo 2022-12-26 1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo 2022-12-26 1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo 0 siblings, 2 replies; 6+ messages in thread From: Qu Wenruo @ 2022-12-26 1:00 UTC (permalink / raw) To: linux-btrfs There is a bug report that one can cause error in run_one_delayed_ref() and make the fs RO. But there is only two lines of dmesg: BTRFS: error (device nvme0n1p3: state A) in btrfs_run_delayed_refs:2147: errno=-5 IO failure BTRFS info (device nvme0n1p3: state EA): forced readonly Which is not really helpful. This patch will add two error reports which should help us to debug: - Add the missing level check errors in validate_extent_buffer() Failed level checks didn't cause any error message, which is very different compared to the other checks. - Add error report for failed run_one_delayed_ref() This will include extra info for each node. The remaining branches are either calling btrfs_abort_transaction(), which should provide function and line number, or have extra error message like alloc_reserved_extent(), or should trigger error messages from validate_extent_buffer() when failed, like btrfs_insert_empty_item(). Thus this two patches should cover most of the error cases. Qu Wenruo (2): btrfs: add extra error message for tree block level mismatch btrfs: always report error for run_one_delayed_ref() fs/btrfs/disk-io.c | 4 ++++ fs/btrfs/extent-tree.c | 7 +++++-- 2 files changed, 9 insertions(+), 2 deletions(-) -- 2.39.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] btrfs: add extra error message for tree block level mismatch 2022-12-26 1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo @ 2022-12-26 1:00 ` Qu Wenruo 2023-01-03 8:36 ` Anand Jain 2022-12-26 1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo 1 sibling, 1 reply; 6+ messages in thread From: Qu Wenruo @ 2022-12-26 1:00 UTC (permalink / raw) To: linux-btrfs There is a bug report that commit 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()") caused some fs to go RO under heavy write workload. The full dmesg provided very little useful info, but surprisingly if there is something wrong with the tree block, we should got some error message in validate_extent_buffer(). However this is not always true, as there is one missing error message for validate_extent_buffer(), tree level check. So this patch will add the missing error message for validate_extent_buffer() to make later debug much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f8b5955f003f..3421f06eade3 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb, } if (found_level != check->level) { + btrfs_err_rl(eb->fs_info, + "level verify failed on logical %llu mirror %u wanted %u found %u", + eb->start, eb->read_mirror, check->level, + found_level); ret = -EIO; goto out; } -- 2.39.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: add extra error message for tree block level mismatch 2022-12-26 1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo @ 2023-01-03 8:36 ` Anand Jain 0 siblings, 0 replies; 6+ messages in thread From: Anand Jain @ 2023-01-03 8:36 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 12/26/22 09:00, Qu Wenruo wrote: > There is a bug report that commit 947a629988f1 ("btrfs: move tree block > parentness check into validate_extent_buffer()") caused some fs to go RO > under heavy write workload. > > The full dmesg provided very little useful info, but surprisingly if > there is something wrong with the tree block, we should got some error > message in validate_extent_buffer(). > > However this is not always true, as there is one missing error message > for validate_extent_buffer(), tree level check. > > So this patch will add the missing error message for > validate_extent_buffer() to make later debug much easier. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/disk-io.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index f8b5955f003f..3421f06eade3 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb, > } > > if (found_level != check->level) { > + btrfs_err_rl(eb->fs_info, > + "level verify failed on logical %llu mirror %u wanted %u found %u", > + eb->start, eb->read_mirror, check->level, > + found_level); > ret = -EIO; > goto out; > } ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() 2022-12-26 1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo 2022-12-26 1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo @ 2022-12-26 1:00 ` Qu Wenruo 2023-01-02 16:45 ` David Sterba 2023-01-03 8:48 ` Anand Jain 1 sibling, 2 replies; 6+ messages in thread From: Qu Wenruo @ 2022-12-26 1:00 UTC (permalink / raw) To: linux-btrfs Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but if end users hit such problem, there will be no chance that btrfs_debug() is enabled. This can lead to very little useful info for debug. This patch will: - Add extra info for error reporting Including: * logical bytenr * num_bytes * type * action * ref_mod - Replace the btrfs_debug() with btrfs_err() - Move the error reporting into run_one_delayed_ref() This is to avoid use-after-free, the @node can be freed in the caller. This error should only be triggered at most once. As if run_one_delayed_ref() failed, we trigger the error message, then causing the call chain to error out: btrfs_run_delayed_refs() `- btrfs_run_delayed_refs() `- btrfs_run_delayed_refs_for_head() `- run_one_delayed_ref() And we will abort the current transaction in btrfs_run_delayed_refs(). If we have to run delayed refs for the abort transaction, run_one_delayed_ref() will just cleanup the refs and do nothing, thus no new error messages would be output. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent-tree.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index eaa1fb2850d7..d1a4e51f8fbc 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1713,6 +1713,11 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, BUG(); if (ret && insert_reserved) btrfs_pin_extent(trans, node->bytenr, node->num_bytes, 1); + if (ret < 0) + btrfs_err(trans->fs_info, +"failed to run delayed ref for logical %llu num_bytes %llu type %u action %u ref_mod %d: %d", + node->bytenr, node->num_bytes, node->type, + node->action, node->ref_mod, ret); return ret; } @@ -1954,8 +1959,6 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, if (ret) { unselect_delayed_ref_head(delayed_refs, locked_ref); btrfs_put_delayed_ref(ref); - btrfs_debug(fs_info, "run_one_delayed_ref returned %d", - ret); return ret; } -- 2.39.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() 2022-12-26 1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo @ 2023-01-02 16:45 ` David Sterba 2023-01-03 8:48 ` Anand Jain 1 sibling, 0 replies; 6+ messages in thread From: David Sterba @ 2023-01-02 16:45 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Mon, Dec 26, 2022 at 09:00:40AM +0800, Qu Wenruo wrote: > Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but > if end users hit such problem, there will be no chance that > btrfs_debug() is enabled. > > This can lead to very little useful info for debug. > > This patch will: > > - Add extra info for error reporting > Including: > * logical bytenr > * num_bytes > * type > * action > * ref_mod > > - Replace the btrfs_debug() with btrfs_err() > > - Move the error reporting into run_one_delayed_ref() > This is to avoid use-after-free, the @node can be freed in the caller. > > This error should only be triggered at most once. > > As if run_one_delayed_ref() failed, we trigger the error message, then > causing the call chain to error out: > > btrfs_run_delayed_refs() > `- btrfs_run_delayed_refs() > `- btrfs_run_delayed_refs_for_head() > `- run_one_delayed_ref() > > And we will abort the current transaction in btrfs_run_delayed_refs(). > If we have to run delayed refs for the abort transaction, > run_one_delayed_ref() will just cleanup the refs and do nothing, thus no > new error messages would be output. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Added to misc-next, thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() 2022-12-26 1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo 2023-01-02 16:45 ` David Sterba @ 2023-01-03 8:48 ` Anand Jain 1 sibling, 0 replies; 6+ messages in thread From: Anand Jain @ 2023-01-03 8:48 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 12/26/22 09:00, Qu Wenruo wrote: > Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but > if end users hit such problem, there will be no chance that > btrfs_debug() is enabled. > > This can lead to very little useful info for debug. > > This patch will: > > - Add extra info for error reporting > Including: > * logical bytenr > * num_bytes > * type > * action > * ref_mod > > - Replace the btrfs_debug() with btrfs_err() > > - Move the error reporting into run_one_delayed_ref() > This is to avoid use-after-free, the @node can be freed in the caller. > > This error should only be triggered at most once. > > As if run_one_delayed_ref() failed, we trigger the error message, then > causing the call chain to error out: > > btrfs_run_delayed_refs() > `- btrfs_run_delayed_refs() > `- btrfs_run_delayed_refs_for_head() > `- run_one_delayed_ref() > > And we will abort the current transaction in btrfs_run_delayed_refs(). > If we have to run delayed refs for the abort transaction, > run_one_delayed_ref() will just cleanup the refs and do nothing, thus no > new error messages would be output. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-01-03 8:48 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-12-26 1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo 2022-12-26 1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo 2023-01-03 8:36 ` Anand Jain 2022-12-26 1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo 2023-01-02 16:45 ` David Sterba 2023-01-03 8:48 ` Anand Jain
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox