* [PATCH v4 0/3] btrfs: Add write time super block validation
@ 2018-05-25 4:43 Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 1/3] btrfs: Move btrfs_check_super_valid() to avoid forward declaration Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Qu Wenruo @ 2018-05-25 4:43 UTC (permalink / raw)
To: linux-btrfs
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_sb_check
We have 2 reports about corrupted btrfs super block, which has some garbage
in its super block, but otherwise it's completely fine and its csum even
matches.
This means we develop memory corruption during btrfs mount time.
It's not clear whether it's caused by btrfs or some other kernel module,
but at least let's do write time verification to catch such corruption
early.
Current design is to do 2 different checks at mount time and super write
time.
And for write time check, it only checks the template super block
(fs_info->super_to_commit) other than each super blocks to be written to
disk, mostly to avoid duplicated checks.
Changelog:
v2:
Rename btrfs_check_super_valid() to btrfs_validate_super() suggested
by Nikolay and David.
v3:
Add a new patch to move btrfs_check_super_valid() to avoid forward
declaration.
Refactor btrfs_check_super_valid() to provide better naming and
function reusablity.
Code style and comment update.
Use 2 different functions, btrfs_validate_mount_super() and
btrfs_validate_write_super(), for mount and write time super check.
v4:
Change the timing of btrfs_validate_write_super() to handle seed
sprout case, where the original superblock can be from seed device,
which has a different fsid. Thanks Anand for exposing this bug.
Only the last patch is affected.
Qu Wenruo (3):
btrfs: Move btrfs_check_super_valid() to avoid forward declaration
btrfs: Refactor btrfs_check_super_valid()
btrfs: Do super block verification before writing it to disk
fs/btrfs/disk-io.c | 365 ++++++++++++++++++++++++++-------------------
1 file changed, 214 insertions(+), 151 deletions(-)
--
2.17.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 1/3] btrfs: Move btrfs_check_super_valid() to avoid forward declaration
2018-05-25 4:43 [PATCH v4 0/3] btrfs: Add write time super block validation Qu Wenruo
@ 2018-05-25 4:43 ` Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid() Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk Qu Wenruo
2 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2018-05-25 4:43 UTC (permalink / raw)
To: linux-btrfs
Just move btrfs_check_super_valid() before its single caller to avoid
forward declaration.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 299 ++++++++++++++++++++++-----------------------
1 file changed, 149 insertions(+), 150 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 932ed61d9554..13c5f90995aa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,6 @@
static const struct extent_io_ops btree_extent_io_ops;
static void end_workqueue_fn(struct btrfs_work *work);
static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info);
static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
struct btrfs_fs_info *fs_info);
@@ -2441,6 +2440,155 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
return ret;
}
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_super_block *sb = fs_info->super_copy;
+ u64 nodesize = btrfs_super_nodesize(sb);
+ u64 sectorsize = btrfs_super_sectorsize(sb);
+ int ret = 0;
+
+ if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
+ btrfs_err(fs_info, "no valid FS found");
+ ret = -EINVAL;
+ }
+ if (btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP) {
+ btrfs_err(fs_info, "unrecognized or unsupported super flag: %llu",
+ btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP);
+ ret = -EINVAL;
+ }
+ if (btrfs_super_root_level(sb) >= BTRFS_MAX_LEVEL) {
+ btrfs_err(fs_info, "tree_root level too big: %d >= %d",
+ btrfs_super_root_level(sb), BTRFS_MAX_LEVEL);
+ ret = -EINVAL;
+ }
+ if (btrfs_super_chunk_root_level(sb) >= BTRFS_MAX_LEVEL) {
+ btrfs_err(fs_info, "chunk_root level too big: %d >= %d",
+ btrfs_super_chunk_root_level(sb), BTRFS_MAX_LEVEL);
+ ret = -EINVAL;
+ }
+ if (btrfs_super_log_root_level(sb) >= BTRFS_MAX_LEVEL) {
+ btrfs_err(fs_info, "log_root level too big: %d >= %d",
+ btrfs_super_log_root_level(sb), BTRFS_MAX_LEVEL);
+ ret = -EINVAL;
+ }
+
+ /*
+ * Check sectorsize and nodesize first, other check will need it.
+ * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
+ */
+ if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
+ sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
+ btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
+ ret = -EINVAL;
+ }
+ /* Only PAGE SIZE is supported yet */
+ if (sectorsize != PAGE_SIZE) {
+ btrfs_err(fs_info,
+ "sectorsize %llu not supported yet, only support %lu",
+ sectorsize, PAGE_SIZE);
+ ret = -EINVAL;
+ }
+ if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
+ nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
+ btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
+ ret = -EINVAL;
+ }
+ if (nodesize != le32_to_cpu(sb->__unused_leafsize)) {
+ btrfs_err(fs_info, "invalid leafsize %u, should be %llu",
+ le32_to_cpu(sb->__unused_leafsize), nodesize);
+ ret = -EINVAL;
+ }
+
+ /* Root alignment check */
+ if (!IS_ALIGNED(btrfs_super_root(sb), sectorsize)) {
+ btrfs_warn(fs_info, "tree_root block unaligned: %llu",
+ btrfs_super_root(sb));
+ ret = -EINVAL;
+ }
+ if (!IS_ALIGNED(btrfs_super_chunk_root(sb), sectorsize)) {
+ btrfs_warn(fs_info, "chunk_root block unaligned: %llu",
+ btrfs_super_chunk_root(sb));
+ ret = -EINVAL;
+ }
+ if (!IS_ALIGNED(btrfs_super_log_root(sb), sectorsize)) {
+ btrfs_warn(fs_info, "log_root block unaligned: %llu",
+ btrfs_super_log_root(sb));
+ ret = -EINVAL;
+ }
+
+ if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+ btrfs_err(fs_info,
+ "dev_item UUID does not match fsid: %pU != %pU",
+ fs_info->fsid, sb->dev_item.fsid);
+ ret = -EINVAL;
+ }
+
+ /*
+ * Hint to catch really bogus numbers, bitflips or so, more exact checks are
+ * done later
+ */
+ if (btrfs_super_bytes_used(sb) < 6 * btrfs_super_nodesize(sb)) {
+ btrfs_err(fs_info, "bytes_used is too small %llu",
+ btrfs_super_bytes_used(sb));
+ ret = -EINVAL;
+ }
+ if (!is_power_of_2(btrfs_super_stripesize(sb))) {
+ btrfs_err(fs_info, "invalid stripesize %u",
+ btrfs_super_stripesize(sb));
+ ret = -EINVAL;
+ }
+ if (btrfs_super_num_devices(sb) > (1UL << 31))
+ btrfs_warn(fs_info, "suspicious number of devices: %llu",
+ btrfs_super_num_devices(sb));
+ if (btrfs_super_num_devices(sb) == 0) {
+ btrfs_err(fs_info, "number of devices is 0");
+ ret = -EINVAL;
+ }
+
+ if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
+ btrfs_err(fs_info, "super offset mismatch %llu != %u",
+ btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
+ ret = -EINVAL;
+ }
+
+ /*
+ * Obvious sys_chunk_array corruptions, it must hold at least one key
+ * and one chunk
+ */
+ if (btrfs_super_sys_array_size(sb) > BTRFS_SYSTEM_CHUNK_ARRAY_SIZE) {
+ btrfs_err(fs_info, "system chunk array too big %u > %u",
+ btrfs_super_sys_array_size(sb),
+ BTRFS_SYSTEM_CHUNK_ARRAY_SIZE);
+ ret = -EINVAL;
+ }
+ if (btrfs_super_sys_array_size(sb) < sizeof(struct btrfs_disk_key)
+ + sizeof(struct btrfs_chunk)) {
+ btrfs_err(fs_info, "system chunk array too small %u < %zu",
+ btrfs_super_sys_array_size(sb),
+ sizeof(struct btrfs_disk_key)
+ + sizeof(struct btrfs_chunk));
+ ret = -EINVAL;
+ }
+
+ /*
+ * The generation is a global counter, we'll trust it more than the others
+ * but it's still possible that it's the one that's wrong.
+ */
+ if (btrfs_super_generation(sb) < btrfs_super_chunk_root_generation(sb))
+ btrfs_warn(fs_info,
+ "suspicious: generation < chunk_root_generation: %llu < %llu",
+ btrfs_super_generation(sb),
+ btrfs_super_chunk_root_generation(sb));
+ if (btrfs_super_generation(sb) < btrfs_super_cache_generation(sb)
+ && btrfs_super_cache_generation(sb) != (u64)-1)
+ btrfs_warn(fs_info,
+ "suspicious: generation < cache_generation: %llu < %llu",
+ btrfs_super_generation(sb),
+ btrfs_super_cache_generation(sb));
+
+ return ret;
+}
+
int open_ctree(struct super_block *sb,
struct btrfs_fs_devices *fs_devices,
char *options)
@@ -3972,155 +4120,6 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
level, first_key);
}
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
-{
- struct btrfs_super_block *sb = fs_info->super_copy;
- u64 nodesize = btrfs_super_nodesize(sb);
- u64 sectorsize = btrfs_super_sectorsize(sb);
- int ret = 0;
-
- if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
- btrfs_err(fs_info, "no valid FS found");
- ret = -EINVAL;
- }
- if (btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP) {
- btrfs_err(fs_info, "unrecognized or unsupported super flag: %llu",
- btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP);
- ret = -EINVAL;
- }
- if (btrfs_super_root_level(sb) >= BTRFS_MAX_LEVEL) {
- btrfs_err(fs_info, "tree_root level too big: %d >= %d",
- btrfs_super_root_level(sb), BTRFS_MAX_LEVEL);
- ret = -EINVAL;
- }
- if (btrfs_super_chunk_root_level(sb) >= BTRFS_MAX_LEVEL) {
- btrfs_err(fs_info, "chunk_root level too big: %d >= %d",
- btrfs_super_chunk_root_level(sb), BTRFS_MAX_LEVEL);
- ret = -EINVAL;
- }
- if (btrfs_super_log_root_level(sb) >= BTRFS_MAX_LEVEL) {
- btrfs_err(fs_info, "log_root level too big: %d >= %d",
- btrfs_super_log_root_level(sb), BTRFS_MAX_LEVEL);
- ret = -EINVAL;
- }
-
- /*
- * Check sectorsize and nodesize first, other check will need it.
- * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
- */
- if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
- sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
- btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
- ret = -EINVAL;
- }
- /* Only PAGE SIZE is supported yet */
- if (sectorsize != PAGE_SIZE) {
- btrfs_err(fs_info,
- "sectorsize %llu not supported yet, only support %lu",
- sectorsize, PAGE_SIZE);
- ret = -EINVAL;
- }
- if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
- nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
- btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
- ret = -EINVAL;
- }
- if (nodesize != le32_to_cpu(sb->__unused_leafsize)) {
- btrfs_err(fs_info, "invalid leafsize %u, should be %llu",
- le32_to_cpu(sb->__unused_leafsize), nodesize);
- ret = -EINVAL;
- }
-
- /* Root alignment check */
- if (!IS_ALIGNED(btrfs_super_root(sb), sectorsize)) {
- btrfs_warn(fs_info, "tree_root block unaligned: %llu",
- btrfs_super_root(sb));
- ret = -EINVAL;
- }
- if (!IS_ALIGNED(btrfs_super_chunk_root(sb), sectorsize)) {
- btrfs_warn(fs_info, "chunk_root block unaligned: %llu",
- btrfs_super_chunk_root(sb));
- ret = -EINVAL;
- }
- if (!IS_ALIGNED(btrfs_super_log_root(sb), sectorsize)) {
- btrfs_warn(fs_info, "log_root block unaligned: %llu",
- btrfs_super_log_root(sb));
- ret = -EINVAL;
- }
-
- if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
- btrfs_err(fs_info,
- "dev_item UUID does not match fsid: %pU != %pU",
- fs_info->fsid, sb->dev_item.fsid);
- ret = -EINVAL;
- }
-
- /*
- * Hint to catch really bogus numbers, bitflips or so, more exact checks are
- * done later
- */
- if (btrfs_super_bytes_used(sb) < 6 * btrfs_super_nodesize(sb)) {
- btrfs_err(fs_info, "bytes_used is too small %llu",
- btrfs_super_bytes_used(sb));
- ret = -EINVAL;
- }
- if (!is_power_of_2(btrfs_super_stripesize(sb))) {
- btrfs_err(fs_info, "invalid stripesize %u",
- btrfs_super_stripesize(sb));
- ret = -EINVAL;
- }
- if (btrfs_super_num_devices(sb) > (1UL << 31))
- btrfs_warn(fs_info, "suspicious number of devices: %llu",
- btrfs_super_num_devices(sb));
- if (btrfs_super_num_devices(sb) == 0) {
- btrfs_err(fs_info, "number of devices is 0");
- ret = -EINVAL;
- }
-
- if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
- btrfs_err(fs_info, "super offset mismatch %llu != %u",
- btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
- ret = -EINVAL;
- }
-
- /*
- * Obvious sys_chunk_array corruptions, it must hold at least one key
- * and one chunk
- */
- if (btrfs_super_sys_array_size(sb) > BTRFS_SYSTEM_CHUNK_ARRAY_SIZE) {
- btrfs_err(fs_info, "system chunk array too big %u > %u",
- btrfs_super_sys_array_size(sb),
- BTRFS_SYSTEM_CHUNK_ARRAY_SIZE);
- ret = -EINVAL;
- }
- if (btrfs_super_sys_array_size(sb) < sizeof(struct btrfs_disk_key)
- + sizeof(struct btrfs_chunk)) {
- btrfs_err(fs_info, "system chunk array too small %u < %zu",
- btrfs_super_sys_array_size(sb),
- sizeof(struct btrfs_disk_key)
- + sizeof(struct btrfs_chunk));
- ret = -EINVAL;
- }
-
- /*
- * The generation is a global counter, we'll trust it more than the others
- * but it's still possible that it's the one that's wrong.
- */
- if (btrfs_super_generation(sb) < btrfs_super_chunk_root_generation(sb))
- btrfs_warn(fs_info,
- "suspicious: generation < chunk_root_generation: %llu < %llu",
- btrfs_super_generation(sb),
- btrfs_super_chunk_root_generation(sb));
- if (btrfs_super_generation(sb) < btrfs_super_cache_generation(sb)
- && btrfs_super_cache_generation(sb) != (u64)-1)
- btrfs_warn(fs_info,
- "suspicious: generation < cache_generation: %llu < %llu",
- btrfs_super_generation(sb),
- btrfs_super_cache_generation(sb));
-
- return ret;
-}
-
static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
{
mutex_lock(&fs_info->cleaner_mutex);
--
2.17.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid()
2018-05-25 4:43 [PATCH v4 0/3] btrfs: Add write time super block validation Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 1/3] btrfs: Move btrfs_check_super_valid() to avoid forward declaration Qu Wenruo
@ 2018-05-25 4:43 ` Qu Wenruo
2018-05-25 5:15 ` Su Yue
2018-05-25 4:43 ` [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk Qu Wenruo
2 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2018-05-25 4:43 UTC (permalink / raw)
To: linux-btrfs
Refactor btrfs_check_super_valid() by the ways:
1) Rename it to btrfs_validate_mount_super()
Now it's more obvious when the function should be called.
2) Extract core check routine into __validate_super()
So later write time check can reuse it, and if needed, we could also
use __validate_super() to check each super block.
3) Add more comment about btrfs_validate_mount_super()
Mostly of what it doesn't check and when it should be called.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 13c5f90995aa..b981ecc4b6f9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2440,9 +2440,19 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
return ret;
}
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+/*
+ * Real super block validation check
+ * NOTE: super csum type and incompat features will not be checked here.
+ *
+ * @sb: super block to check
+ * @mirror_num: the super block number to check its bytenr:
+ * 0 means the primary (1st) sb,
+ * 1 and 2 means 2nd and 3rd backup copy
+ * -1 means to skip bytenr check
+ */
+static int __validate_super(struct btrfs_fs_info *fs_info,
+ struct btrfs_super_block *sb, int mirror_num)
{
- struct btrfs_super_block *sb = fs_info->super_copy;
u64 nodesize = btrfs_super_nodesize(sb);
u64 sectorsize = btrfs_super_sectorsize(sb);
int ret = 0;
@@ -2545,7 +2555,8 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
ret = -EINVAL;
}
- if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
+ if (mirror_num >= 0 &&
+ btrfs_super_bytenr(sb) != btrfs_sb_offset(mirror_num)) {
btrfs_err(fs_info, "super offset mismatch %llu != %u",
btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
ret = -EINVAL;
@@ -2589,6 +2600,16 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
return ret;
}
+/*
+ * Check the validation of super block at mount time.
+ * Some checks already done by early mount, like csum type and incompat flags
+ * will be skipped.
+ */
+static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
+{
+ return __validate_super(fs_info, fs_info->super_copy, 0);
+}
+
int open_ctree(struct super_block *sb,
struct btrfs_fs_devices *fs_devices,
char *options)
@@ -2814,7 +2835,7 @@ int open_ctree(struct super_block *sb,
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
- ret = btrfs_check_super_valid(fs_info);
+ ret = btrfs_validate_mount_super(fs_info);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
--
2.17.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk
2018-05-25 4:43 [PATCH v4 0/3] btrfs: Add write time super block validation Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 1/3] btrfs: Move btrfs_check_super_valid() to avoid forward declaration Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid() Qu Wenruo
@ 2018-05-25 4:43 ` Qu Wenruo
2018-05-25 6:33 ` Nikolay Borisov
2018-05-25 15:31 ` David Sterba
2 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2018-05-25 4:43 UTC (permalink / raw)
To: linux-btrfs
There are already 2 reports about strangely corrupted super blocks,
where csum still matches but extra garbage gets slipped into super block.
The corruption would looks like:
------
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum_type 41700 (INVALID)
csum 0x3b252d3a [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
...
incompat_flags 0x5b22400000000169
( MIXED_BACKREF |
COMPRESS_LZO |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA |
unknown flag: 0x5b22400000000000 )
...
------
Or
------
superblock: bytenr=65536, device=/dev/mapper/x
---------------------------------------------------------
csum_type 35355 (INVALID)
csum_size 32
csum 0xf0dbeddd [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
...
incompat_flags 0x176d200000000169
( MIXED_BACKREF |
COMPRESS_LZO |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA |
unknown flag: 0x176d200000000000 )
------
Obviously, csum_type and incompat_flags get some garbage, but its csum
still matches, which means kernel calculates the csum based on corrupted
super block memory.
And after manually fixing these values, the filesystem is completely
healthy without any problem exposed by btrfs check.
Although the cause is still unknown, at least detect it and prevent further
corruption.
Reported-by: Ken Swenson <flat@imo.uto.moe>
Reported-by: Ben Parsons <9parsonsb@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b981ecc4b6f9..d6c0cee627d9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2610,6 +2610,41 @@ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
return __validate_super(fs_info, fs_info->super_copy, 0);
}
+/*
+ * Check the validation of super block at write time.
+ * Some checks like bytenr check will be skipped as their values will be
+ * overwritten soon.
+ * Extra checks like csum type and incompact flags will be executed here.
+ */
+static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info,
+ struct btrfs_super_block *sb)
+{
+ int ret;
+
+ ret = __validate_super(fs_info, sb, -1);
+ if (ret < 0)
+ goto out;
+ if (btrfs_super_csum_type(sb) != BTRFS_CSUM_TYPE_CRC32) {
+ ret = -EUCLEAN;
+ btrfs_err(fs_info, "invalid csum type, has %u want %u",
+ btrfs_super_csum_type(sb), BTRFS_CSUM_TYPE_CRC32);
+ goto out;
+ }
+ if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
+ ret = -EUCLEAN;
+ btrfs_err(fs_info,
+ "invalid incompact flags, has 0x%llu valid mask 0x%llu",
+ btrfs_super_incompat_flags(sb),
+ BTRFS_FEATURE_INCOMPAT_SUPP);
+ goto out;
+ }
+out:
+ if (ret < 0)
+ btrfs_err(fs_info,
+ "super block corruption detected before writing it to disk");
+ return ret;
+}
+
int open_ctree(struct super_block *sb,
struct btrfs_fs_devices *fs_devices,
char *options)
@@ -3770,6 +3805,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
flags = btrfs_super_flags(sb);
btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
+ ret = btrfs_validate_write_super(fs_info, sb);
+ if (ret < 0) {
+ mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+ btrfs_handle_fs_error(fs_info, -EUCLEAN,
+ "unexpected superblock corruption detected");
+ return -EUCLEAN;
+ }
+
ret = write_dev_supers(dev, sb, max_mirrors);
if (ret)
total_errors++;
--
2.17.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid()
2018-05-25 4:43 ` [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid() Qu Wenruo
@ 2018-05-25 5:15 ` Su Yue
0 siblings, 0 replies; 8+ messages in thread
From: Su Yue @ 2018-05-25 5:15 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3091 bytes --]
On 05/25/2018 12:43 PM, Qu Wenruo wrote:
> Refactor btrfs_check_super_valid() by the ways:
>
> 1) Rename it to btrfs_validate_mount_super()
> Now it's more obvious when the function should be called.
>
> 2) Extract core check routine into __validate_super()
> So later write time check can reuse it, and if needed, we could also
> use __validate_super() to check each super block.
>
> 3) Add more comment about btrfs_validate_mount_super()
> Mostly of what it doesn't check and when it should be called.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/disk-io.c | 29 +++++++++++++++++++++++++----
> 1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 13c5f90995aa..b981ecc4b6f9 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2440,9 +2440,19 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
> return ret;
> }
>
> -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
> +/*
> + * Real super block validation check
> + * NOTE: super csum type and incompat features will not be checked here.
> + *
> + * @sb: super block to check
> + * @mirror_num: the super block number to check its bytenr:
> + * 0 means the primary (1st) sb,
> + * 1 and 2 means 2nd and 3rd backup copy
> + * -1 means to skip bytenr check
> + */
> +static int __validate_super(struct btrfs_fs_info *fs_info,
> + struct btrfs_super_block *sb, int mirror_num)
> {
> - struct btrfs_super_block *sb = fs_info->super_copy;
> u64 nodesize = btrfs_super_nodesize(sb);
> u64 sectorsize = btrfs_super_sectorsize(sb);
> int ret = 0;
> @@ -2545,7 +2555,8 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
> ret = -EINVAL;
> }
>
> - if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
> + if (mirror_num >= 0 &&
> + btrfs_super_bytenr(sb) != btrfs_sb_offset(mirror_num)) {
> btrfs_err(fs_info, "super offset mismatch %llu != %u",
> btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
> ret = -EINVAL;
> @@ -2589,6 +2600,16 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
> return ret;
> }
>
> +/*
> + * Check the validation of super block at mount time.
> + * Some checks already done by early mount, like csum type and incompat flags
> + * will be skipped.
> + */
> +static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
> +{
> + return __validate_super(fs_info, fs_info->super_copy, 0);
> +}
> +
> int open_ctree(struct super_block *sb,
> struct btrfs_fs_devices *fs_devices,
> char *options)
> @@ -2814,7 +2835,7 @@ int open_ctree(struct super_block *sb,
>
> memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
>
> - ret = btrfs_check_super_valid(fs_info);
> + ret = btrfs_validate_mount_super(fs_info);
> if (ret) {
> btrfs_err(fs_info, "superblock contains fatal errors");
> err = -EINVAL;
>
[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 1791 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk
2018-05-25 4:43 ` [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk Qu Wenruo
@ 2018-05-25 6:33 ` Nikolay Borisov
2018-05-25 7:46 ` Qu Wenruo
2018-05-25 15:31 ` David Sterba
1 sibling, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2018-05-25 6:33 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 25.05.2018 07:43, Qu Wenruo wrote:
> There are already 2 reports about strangely corrupted super blocks,
> where csum still matches but extra garbage gets slipped into super block.
>
> The corruption would looks like:
> ------
> superblock: bytenr=65536, device=/dev/sdc1
> ---------------------------------------------------------
> csum_type 41700 (INVALID)
> csum 0x3b252d3a [match]
> bytenr 65536
> flags 0x1
> ( WRITTEN )
> magic _BHRfS_M [match]
> ...
> incompat_flags 0x5b22400000000169
> ( MIXED_BACKREF |
> COMPRESS_LZO |
> BIG_METADATA |
> EXTENDED_IREF |
> SKINNY_METADATA |
> unknown flag: 0x5b22400000000000 )
> ...
> ------
> Or
> ------
> superblock: bytenr=65536, device=/dev/mapper/x
> ---------------------------------------------------------
> csum_type 35355 (INVALID)
> csum_size 32
> csum 0xf0dbeddd [match]
> bytenr 65536
> flags 0x1
> ( WRITTEN )
> magic _BHRfS_M [match]
> ...
> incompat_flags 0x176d200000000169
> ( MIXED_BACKREF |
> COMPRESS_LZO |
> BIG_METADATA |
> EXTENDED_IREF |
> SKINNY_METADATA |
> unknown flag: 0x176d200000000000 )
> ------
>
> Obviously, csum_type and incompat_flags get some garbage, but its csum
> still matches, which means kernel calculates the csum based on corrupted
> super block memory.
> And after manually fixing these values, the filesystem is completely
> healthy without any problem exposed by btrfs check.
>
> Although the cause is still unknown, at least detect it and prevent further
> corruption.
>
> Reported-by: Ken Swenson <flat@imo.uto.moe>
> Reported-by: Ben Parsons <9parsonsb@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com> , however 1 question
see below
> ---
> fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b981ecc4b6f9..d6c0cee627d9 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2610,6 +2610,41 @@ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
> return __validate_super(fs_info, fs_info->super_copy, 0);
> }
>
> +/*
> + * Check the validation of super block at write time.
> + * Some checks like bytenr check will be skipped as their values will be
> + * overwritten soon.
> + * Extra checks like csum type and incompact flags will be executed here.
> + */
Is this comment correct though, since this function is called right
before write_dev_supers which is performing the actual write and doesn't
really modify anything on the superblock ? Why would they be
overwritten? Perhaps it's somewhat stale since I see in the old version
(the one which is in misc-next now) we call btrfs_validate_write_super
before going into the loop which writes the super on each dev.
> +static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info,
> + struct btrfs_super_block *sb)
> +{
> + int ret;
> +
> + ret = __validate_super(fs_info, sb, -1);
> + if (ret < 0)
> + goto out;
> + if (btrfs_super_csum_type(sb) != BTRFS_CSUM_TYPE_CRC32) {
> + ret = -EUCLEAN;
> + btrfs_err(fs_info, "invalid csum type, has %u want %u",
> + btrfs_super_csum_type(sb), BTRFS_CSUM_TYPE_CRC32);
> + goto out;
> + }
> + if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
> + ret = -EUCLEAN;
> + btrfs_err(fs_info,
> + "invalid incompact flags, has 0x%llu valid mask 0x%llu",
> + btrfs_super_incompat_flags(sb),
> + BTRFS_FEATURE_INCOMPAT_SUPP);
> + goto out;
> + }
> +out:
> + if (ret < 0)
> + btrfs_err(fs_info,
> + "super block corruption detected before writing it to disk");
> + return ret;
> +}
> +
> int open_ctree(struct super_block *sb,
> struct btrfs_fs_devices *fs_devices,
> char *options)
> @@ -3770,6 +3805,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
> flags = btrfs_super_flags(sb);
> btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
>
> + ret = btrfs_validate_write_super(fs_info, sb);
> + if (ret < 0) {
> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> + btrfs_handle_fs_error(fs_info, -EUCLEAN,
> + "unexpected superblock corruption detected");
> + return -EUCLEAN;
> + }
> +
> ret = write_dev_supers(dev, sb, max_mirrors);
> if (ret)
> total_errors++;
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk
2018-05-25 6:33 ` Nikolay Borisov
@ 2018-05-25 7:46 ` Qu Wenruo
0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2018-05-25 7:46 UTC (permalink / raw)
To: Nikolay Borisov, Qu Wenruo, linux-btrfs
On 2018年05月25日 14:33, Nikolay Borisov wrote:
>
>
> On 25.05.2018 07:43, Qu Wenruo wrote:
>> There are already 2 reports about strangely corrupted super blocks,
>> where csum still matches but extra garbage gets slipped into super block.
>>
>> The corruption would looks like:
>> ------
>> superblock: bytenr=65536, device=/dev/sdc1
>> ---------------------------------------------------------
>> csum_type 41700 (INVALID)
>> csum 0x3b252d3a [match]
>> bytenr 65536
>> flags 0x1
>> ( WRITTEN )
>> magic _BHRfS_M [match]
>> ...
>> incompat_flags 0x5b22400000000169
>> ( MIXED_BACKREF |
>> COMPRESS_LZO |
>> BIG_METADATA |
>> EXTENDED_IREF |
>> SKINNY_METADATA |
>> unknown flag: 0x5b22400000000000 )
>> ...
>> ------
>> Or
>> ------
>> superblock: bytenr=65536, device=/dev/mapper/x
>> ---------------------------------------------------------
>> csum_type 35355 (INVALID)
>> csum_size 32
>> csum 0xf0dbeddd [match]
>> bytenr 65536
>> flags 0x1
>> ( WRITTEN )
>> magic _BHRfS_M [match]
>> ...
>> incompat_flags 0x176d200000000169
>> ( MIXED_BACKREF |
>> COMPRESS_LZO |
>> BIG_METADATA |
>> EXTENDED_IREF |
>> SKINNY_METADATA |
>> unknown flag: 0x176d200000000000 )
>> ------
>>
>> Obviously, csum_type and incompat_flags get some garbage, but its csum
>> still matches, which means kernel calculates the csum based on corrupted
>> super block memory.
>> And after manually fixing these values, the filesystem is completely
>> healthy without any problem exposed by btrfs check.
>>
>> Although the cause is still unknown, at least detect it and prevent further
>> corruption.
>>
>> Reported-by: Ken Swenson <flat@imo.uto.moe>
>> Reported-by: Ben Parsons <9parsonsb@gmail.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com> , however 1 question
> see below
>
>> ---
>> fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 43 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b981ecc4b6f9..d6c0cee627d9 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2610,6 +2610,41 @@ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
>> return __validate_super(fs_info, fs_info->super_copy, 0);
>> }
>>
>> +/*
>> + * Check the validation of super block at write time.
>> + * Some checks like bytenr check will be skipped as their values will be
>> + * overwritten soon.
>> + * Extra checks like csum type and incompact flags will be executed here.
>> + */
>
> Is this comment correct though, since this function is called right
> before write_dev_supers which is performing the actual write and doesn't
> really modify anything on the superblock ?
It still does extra modification on bytenr and csum.
> Why would they be
> overwritten? Perhaps it's somewhat stale since I see in the old version
> (the one which is in misc-next now) we call btrfs_validate_write_super
> before going into the loop which writes the super on each dev.
Sorry I forgot to mention in this patch (only mentioned in cover letter).
There is one case that we didn't cover before.
For seed sprout case (mount seed device, and then add a new device), the
fs_info->super_copy is still the old seed device one, thus fsid won't
match with newly generated fsid.
So we still need to do the check later on, other than just using super_copy.
Thanks,
Qu
>
>> +static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info,
>> + struct btrfs_super_block *sb)
>> +{
>> + int ret;
>> +
>> + ret = __validate_super(fs_info, sb, -1);
>> + if (ret < 0)
>> + goto out;
>> + if (btrfs_super_csum_type(sb) != BTRFS_CSUM_TYPE_CRC32) {
>> + ret = -EUCLEAN;
>> + btrfs_err(fs_info, "invalid csum type, has %u want %u",
>> + btrfs_super_csum_type(sb), BTRFS_CSUM_TYPE_CRC32);
>> + goto out;
>> + }
>> + if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
>> + ret = -EUCLEAN;
>> + btrfs_err(fs_info,
>> + "invalid incompact flags, has 0x%llu valid mask 0x%llu",
>> + btrfs_super_incompat_flags(sb),
>> + BTRFS_FEATURE_INCOMPAT_SUPP);
>> + goto out;
>> + }
>> +out:
>> + if (ret < 0)
>> + btrfs_err(fs_info,
>> + "super block corruption detected before writing it to disk");
>> + return ret;
>> +}
>> +
>> int open_ctree(struct super_block *sb,
>> struct btrfs_fs_devices *fs_devices,
>> char *options)
>> @@ -3770,6 +3805,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
>> flags = btrfs_super_flags(sb);
>> btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
>>
>> + ret = btrfs_validate_write_super(fs_info, sb);
>> + if (ret < 0) {
>> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>> + btrfs_handle_fs_error(fs_info, -EUCLEAN,
>> + "unexpected superblock corruption detected");
>> + return -EUCLEAN;
>> + }
>> +
>> ret = write_dev_supers(dev, sb, max_mirrors);
>> if (ret)
>> total_errors++;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk
2018-05-25 4:43 ` [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk Qu Wenruo
2018-05-25 6:33 ` Nikolay Borisov
@ 2018-05-25 15:31 ` David Sterba
1 sibling, 0 replies; 8+ messages in thread
From: David Sterba @ 2018-05-25 15:31 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, May 25, 2018 at 12:43:25PM +0800, Qu Wenruo wrote:
> Reported-by: Ken Swenson <flat@imo.uto.moe>
> Reported-by: Ben Parsons <9parsonsb@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b981ecc4b6f9..d6c0cee627d9 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2610,6 +2610,41 @@ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info)
> return __validate_super(fs_info, fs_info->super_copy, 0);
> }
>
> +/*
> + * Check the validation of super block at write time.
> + * Some checks like bytenr check will be skipped as their values will be
> + * overwritten soon.
> + * Extra checks like csum type and incompact flags will be executed here.
> + */
> +static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info,
> + struct btrfs_super_block *sb)
> +{
> + int ret;
> +
> + ret = __validate_super(fs_info, sb, -1);
Please note that the patches have been slightly updated in the committed
version. Updates to patches that come that late after the initial
iterations are more convenient as incremental updates.
> + if (ret < 0)
> + goto out;
> + if (btrfs_super_csum_type(sb) != BTRFS_CSUM_TYPE_CRC32) {
> + ret = -EUCLEAN;
> + btrfs_err(fs_info, "invalid csum type, has %u want %u",
> + btrfs_super_csum_type(sb), BTRFS_CSUM_TYPE_CRC32);
> + goto out;
> + }
> + if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
> + ret = -EUCLEAN;
> + btrfs_err(fs_info,
> + "invalid incompact flags, has 0x%llu valid mask 0x%llu",
> + btrfs_super_incompat_flags(sb),
> + BTRFS_FEATURE_INCOMPAT_SUPP);
> + goto out;
> + }
> +out:
> + if (ret < 0)
> + btrfs_err(fs_info,
> + "super block corruption detected before writing it to disk");
> + return ret;
> +}
> +
> int open_ctree(struct super_block *sb,
> struct btrfs_fs_devices *fs_devices,
> char *options)
> @@ -3770,6 +3805,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
> flags = btrfs_super_flags(sb);
> btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
>
> + ret = btrfs_validate_write_super(fs_info, sb);
> + if (ret < 0) {
> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> + btrfs_handle_fs_error(fs_info, -EUCLEAN,
> + "unexpected superblock corruption detected");
> + return -EUCLEAN;
The shortcut from here makes sense but we still have to wait for any
previous superblock writes that could have happened (write_dev_supers).
The sequence would be:
at device 1
check sb
write
at device 2
check sb
failed write
now the bh from device 1 write would have one more reference.
As the change is only in the 3rd patch, please send only that one and
based on the patch in misc-next. Thanks.
> + }
> +
> ret = write_dev_supers(dev, sb, max_mirrors);
(Here the reference gets incremented.)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-05-25 15:34 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-25 4:43 [PATCH v4 0/3] btrfs: Add write time super block validation Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 1/3] btrfs: Move btrfs_check_super_valid() to avoid forward declaration Qu Wenruo
2018-05-25 4:43 ` [PATCH v4 2/3] btrfs: Refactor btrfs_check_super_valid() Qu Wenruo
2018-05-25 5:15 ` Su Yue
2018-05-25 4:43 ` [PATCH v4 3/3] btrfs: Do super block verification before writing it to disk Qu Wenruo
2018-05-25 6:33 ` Nikolay Borisov
2018-05-25 7:46 ` Qu Wenruo
2018-05-25 15:31 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).