From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Qu Wenruo <wqu@suse.com>, David Sterba <dsterba@suse.com>,
Sasha Levin <sashal@kernel.org>,
clm@fb.com, josef@toxicpanda.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.1 14/21] btrfs: scrub: improve tree block error reporting
Date: Sat, 25 Feb 2023 22:42:49 -0500 [thread overview]
Message-ID: <20230226034256.771769-14-sashal@kernel.org> (raw)
In-Reply-To: <20230226034256.771769-1-sashal@kernel.org>
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 28232909ba43561887508a6ef46d7f33a648f375 ]
[BUG]
When debugging a scrub related metadata error, it turns out that our
metadata error reporting is not ideal.
The only 3 error messages are:
- BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
Showing we have metadata generation mismatch errors.
- BTRFS error (device dm-2): unable to fixup (regular) error at logical 7110656 on dev /dev/mapper/test-scratch1
Showing which tree blocks are corrupted.
- BTRFS warning (device dm-2): checksum/header error at logical 24772608 on dev /dev/mapper/test-scratch2, physical 3801088: metadata node (level 1) in tree 5
Showing which physical range the corrupted metadata is at.
We have to combine the above 3 to know we have a corrupted metadata with
generation mismatch.
And this is already the better case, if we have other problems, like
fsid mismatch, we can not even know the cause.
[CAUSE]
The problem is caused by the fact that, scrub_checksum_tree_block()
never outputs any error message.
It just return two bits for scrub: sblock->header_error, and
sblock->generation_error.
And later we report error in scrub_print_warning(), but unfortunately we
only have two bits, there is not really much thing we can done to print
any detailed errors.
[FIX]
This patch will do the following to enhance the error reporting of
metadata scrub:
- Add extra warning (ratelimited) for every error we hit
This can help us to distinguish the different types of errors.
Some errors can help us to know what's going wrong immediately,
like bytenr mismatch.
- Re-order the checks
Currently we check bytenr first, then immediately generation.
This can lead to false generation mismatch reports, while the fsid
mismatches.
Here is the new output for the bug I'm debugging (we forgot to
writeback tree blocks for commit roots):
BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
Now we can immediately know it's some tree blocks didn't even get written
back, other than the original confusing generation mismatch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/scrub.c | 49 +++++++++++++++++++++++++++++++++++++++---------
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 196c4c6ed1ed8..c5d8dc112fd58 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2036,20 +2036,33 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
* a) don't have an extent buffer and
* b) the page is already kmapped
*/
- if (sblock->logical != btrfs_stack_header_bytenr(h))
+ if (sblock->logical != btrfs_stack_header_bytenr(h)) {
sblock->header_error = 1;
-
- if (sector->generation != btrfs_stack_header_generation(h)) {
- sblock->header_error = 1;
- sblock->generation_error = 1;
+ btrfs_warn_rl(fs_info,
+ "tree block %llu mirror %u has bad bytenr, has %llu want %llu",
+ sblock->logical, sblock->mirror_num,
+ btrfs_stack_header_bytenr(h),
+ sblock->logical);
+ goto out;
}
- if (!scrub_check_fsid(h->fsid, sector))
+ if (!scrub_check_fsid(h->fsid, sector)) {
sblock->header_error = 1;
+ btrfs_warn_rl(fs_info,
+ "tree block %llu mirror %u has bad fsid, has %pU want %pU",
+ sblock->logical, sblock->mirror_num,
+ h->fsid, sblock->dev->fs_devices->fsid);
+ goto out;
+ }
- if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
- BTRFS_UUID_SIZE))
+ if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid, BTRFS_UUID_SIZE)) {
sblock->header_error = 1;
+ btrfs_warn_rl(fs_info,
+ "tree block %llu mirror %u has bad chunk tree uuid, has %pU want %pU",
+ sblock->logical, sblock->mirror_num,
+ h->chunk_tree_uuid, fs_info->chunk_tree_uuid);
+ goto out;
+ }
shash->tfm = fs_info->csum_shash;
crypto_shash_init(shash);
@@ -2062,9 +2075,27 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
}
crypto_shash_final(shash, calculated_csum);
- if (memcmp(calculated_csum, on_disk_csum, sctx->fs_info->csum_size))
+ if (memcmp(calculated_csum, on_disk_csum, sctx->fs_info->csum_size)) {
sblock->checksum_error = 1;
+ btrfs_warn_rl(fs_info,
+ "tree block %llu mirror %u has bad csum, has " CSUM_FMT " want " CSUM_FMT,
+ sblock->logical, sblock->mirror_num,
+ CSUM_FMT_VALUE(fs_info->csum_size, on_disk_csum),
+ CSUM_FMT_VALUE(fs_info->csum_size, calculated_csum));
+ goto out;
+ }
+
+ if (sector->generation != btrfs_stack_header_generation(h)) {
+ sblock->header_error = 1;
+ sblock->generation_error = 1;
+ btrfs_warn_rl(fs_info,
+ "tree block %llu mirror %u has bad generation, has %llu want %llu",
+ sblock->logical, sblock->mirror_num,
+ btrfs_stack_header_generation(h),
+ sector->generation);
+ }
+out:
return sblock->header_error || sblock->checksum_error;
}
--
2.39.0
next prev parent reply other threads:[~2023-02-26 3:44 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-26 3:42 [PATCH AUTOSEL 6.1 01/21] ARM: OMAP2+: omap4-common: Fix refcount leak bug Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 02/21] arm64: dts: qcom: msm8996: Add additional A2NoC clocks Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 03/21] udf: Define EFSCORRUPTED error code Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 04/21] context_tracking: Fix noinstr vs KASAN Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 05/21] exit: Detect and fix irq disabled state in oops Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 06/21] ARM: dts: exynos: Use Exynos5420 compatible for the MIPI video phy Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 07/21] fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 08/21] blk-iocost: fix divide by 0 error in calc_lcoefs() Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 09/21] blk-cgroup: dropping parent refcount after pd_free_fn() is done Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 10/21] blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy() Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 11/21] trace/blktrace: fix memory leak with using debugfs_lookup() Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super() Sasha Levin
2023-02-26 4:07 ` Eric Biggers
2023-02-26 5:30 ` Eric Biggers
2023-02-26 19:24 ` Eric Biggers
2023-02-26 19:33 ` Slade Watkins
2023-02-27 14:18 ` Sasha Levin
2023-02-27 17:47 ` AUTOSEL process Eric Biggers
2023-02-27 18:06 ` Eric Biggers
2023-02-27 20:39 ` Sasha Levin
2023-02-27 21:38 ` Eric Biggers
2023-02-27 22:35 ` Sasha Levin
2023-02-27 22:59 ` Matthew Wilcox
2023-02-28 0:52 ` Sasha Levin
2023-02-28 1:25 ` Eric Biggers
2023-02-28 4:25 ` Willy Tarreau
2023-03-30 0:08 ` Eric Biggers
2023-03-30 14:05 ` Sasha Levin
2023-03-30 17:22 ` Eric Biggers
2023-03-30 17:50 ` Sasha Levin
2023-02-28 0:32 ` Eric Biggers
2023-02-28 1:53 ` Sasha Levin
2023-02-28 3:41 ` Eric Biggers
2023-02-28 10:41 ` Amir Goldstein
2023-02-28 11:28 ` Greg KH
2023-03-01 2:05 ` Slade Watkins
2023-03-01 5:13 ` Eric Biggers
2023-03-01 6:09 ` Greg KH
2023-03-01 7:22 ` Eric Biggers
2023-03-01 7:40 ` Willy Tarreau
2023-03-01 8:31 ` Eric Biggers
2023-03-01 8:43 ` Greg KH
2023-03-01 6:06 ` Greg KH
2023-03-01 7:05 ` Eric Biggers
2023-03-01 10:31 ` Thorsten Leemhuis
2023-03-01 13:26 ` Mark Brown
2023-02-28 17:03 ` Sasha Levin
2023-03-10 23:07 ` Eric Biggers
2023-03-11 13:41 ` Sasha Levin
2023-03-11 15:54 ` James Bottomley
2023-03-11 18:07 ` Sasha Levin
2023-03-12 19:03 ` Theodore Ts'o
2023-03-07 21:18 ` Pavel Machek
2023-03-07 21:45 ` Eric Biggers
2023-03-11 6:25 ` Matthew Wilcox
2023-03-11 8:11 ` Willy Tarreau
2023-03-11 11:45 ` Pavel Machek
2023-03-11 12:29 ` Greg KH
2023-03-21 12:41 ` Maciej W. Rozycki
2023-03-11 14:06 ` Sasha Levin
2023-03-11 16:16 ` Theodore Ts'o
2023-03-11 17:48 ` Eric Biggers
2023-03-11 18:26 ` Sasha Levin
2023-03-11 18:54 ` Eric Biggers
2023-03-11 19:01 ` Eric Biggers
2023-03-11 21:14 ` Sasha Levin
2023-03-12 8:04 ` Amir Goldstein
2023-03-12 16:00 ` Sasha Levin
2023-03-13 17:41 ` Greg KH
2023-03-13 18:54 ` Eric Biggers
2023-03-14 18:26 ` Greg KH
2023-03-11 20:17 ` Eric Biggers
2023-03-11 21:02 ` Sasha Levin
2023-03-12 4:23 ` Willy Tarreau
2023-03-11 18:33 ` Willy Tarreau
2023-03-11 19:24 ` Eric Biggers
2023-03-11 19:46 ` Eric Biggers
2023-03-11 20:19 ` Willy Tarreau
2023-03-11 20:59 ` Sasha Levin
2023-03-11 20:11 ` Willy Tarreau
2023-03-11 20:53 ` Eric Biggers
2023-03-12 4:32 ` Willy Tarreau
2023-03-12 5:21 ` Eric Biggers
2023-03-12 5:48 ` Willy Tarreau
2023-03-12 7:42 ` Amir Goldstein
2023-03-12 13:34 ` Mark Brown
2023-03-12 15:57 ` Sasha Levin
2023-03-12 13:55 ` Mark Brown
2023-03-11 22:38 ` David Laight
2023-03-12 4:41 ` Willy Tarreau
2023-03-12 5:09 ` Theodore Ts'o
2023-03-14 14:12 ` Jan Kara
2023-03-13 3:37 ` Bagas Sanjaya
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 13/21] sched/fair: sanitize vruntime of entity being placed Sasha Levin
2023-02-26 3:42 ` Sasha Levin [this message]
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 15/21] arm64: zynqmp: Enable hs termination flag for USB dwc3 controller Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 16/21] cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 17/21] entry, kasan, x86: Disallow overriding mem*() functions Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 18/21] x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 19/21] cpuidle: drivers: firmware: psci: Dont instrument suspend code Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 20/21] cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG Sasha Levin
2023-02-26 3:42 ` [PATCH AUTOSEL 6.1 21/21] perf/x86/intel/uncore: Add Meteor Lake support Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230226034256.771769-14-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).