* [PATCH] btrfs: tree-checker: add btrfs dev extent checks
@ 2024-08-11 5:50 Qu Wenruo
2024-08-13 23:11 ` David Sterba
2024-08-15 5:17 ` Anand Jain
0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2024-08-11 5:50 UTC (permalink / raw)
To: linux-btrfs
[REPORT]
There is a corruption report that btrfs refuse to mount a fs that has
overlapping dev extents:
BTRFS error (device sdc): dev extent devid 4 physical offset
14263979671552 overlap with previous dev extent end 14263980982272
BTRFS error (device sdc): failed to verify dev extents against chunks: -117
BTRFS error (device sdc): open_ctree failed
[CAUSE]
The cause is very obvious, there is a bad dev extent item with incorrect
length.
Although we are not 100% sure of the cause before getting the dev tree
dump, I'm already surprised that we do not have any checks on dev tree.
Currently we only do the dev-extent verification at mount time, but if the
corruption is caused by memory bitflip, we really want to catch it before
writing the corruption to the storage.
Furthermore the dev extent items has the following key definition:
(<device id> DEV_EXTENT <physical offset>)
Thus we can not just rely on the generic key order check to make sure
there is no overlapping.
[ENHANCEMENT]
Introduce dedicated dev extent checks, including:
- Fixed member checks
* chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
* chunk_objectid should always be
BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
- Alignment checks
* chunk_offset should be aligned to sectorsize
* length should be aligned to sectorsize
* key.offset should be aligned to sectorsize
- Overlap checks
If the previous key is also a dev-extent item, with the same
device id, make sure we do not overlap with the previous dev extent.
Reported: Stefan N <stefannnau@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/tree-checker.c | 69 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..3e8fbedfe04d 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1763,6 +1763,72 @@ static int check_raid_stripe_extent(const struct extent_buffer *leaf,
return 0;
}
+static int check_dev_extent_item(const struct extent_buffer *leaf,
+ const struct btrfs_key *key,
+ int slot,
+ struct btrfs_key *prev_key)
+{
+ struct btrfs_dev_extent *de;
+ const u32 sectorsize = leaf->fs_info->sectorsize;
+
+ de = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
+ /* Basic fixed member checks. */
+ if (unlikely(btrfs_dev_extent_chunk_tree(leaf, de) !=
+ BTRFS_CHUNK_TREE_OBJECTID)) {
+ generic_err(leaf, slot,
+ "invalid dev extent chunk tree id, has %llu expect %llu",
+ btrfs_dev_extent_chunk_tree(leaf, de),
+ BTRFS_CHUNK_TREE_OBJECTID);
+ return -EUCLEAN;
+ }
+ if (unlikely(btrfs_dev_extent_chunk_objectid(leaf, de) !=
+ BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
+ generic_err(leaf, slot,
+ "invalid dev extent chunk objectid, has %llu expect %llu",
+ btrfs_dev_extent_chunk_objectid(leaf, de),
+ BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+ return -EUCLEAN;
+ }
+ /* Alignment check. */
+ if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) {
+ generic_err(leaf, slot,
+ "invalid dev extent key.offset, has %llu not aligned to %u",
+ key->offset, sectorsize);
+ return -EUCLEAN;
+ }
+ if (unlikely(!IS_ALIGNED(btrfs_dev_extent_chunk_offset(leaf, de),
+ sectorsize))) {
+ generic_err(leaf, slot,
+ "invalid dev extent chunk offset, has %llu not aligned to %u",
+ btrfs_dev_extent_chunk_objectid(leaf, de),
+ sectorsize);
+ return -EUCLEAN;
+ }
+ if (unlikely(!IS_ALIGNED(btrfs_dev_extent_length(leaf, de),
+ sectorsize))) {
+ generic_err(leaf, slot,
+ "invalid dev extent length, has %llu not aligned to %u",
+ btrfs_dev_extent_length(leaf, de), sectorsize);
+ return -EUCLEAN;
+ }
+ /* Overlap check with previous dev extent. */
+ if (slot && prev_key->objectid == key->objectid &&
+ prev_key->type == key->type) {
+ struct btrfs_dev_extent *prev_de;
+ u64 prev_len;
+
+ prev_de = btrfs_item_ptr(leaf, slot - 1, struct btrfs_dev_extent);
+ prev_len = btrfs_dev_extent_length(leaf, prev_de);
+ if (unlikely(prev_key->offset + prev_len > key->offset)) {
+ generic_err(leaf, slot,
+ "dev extent overlap, prev offset %llu len %llu current offset %llu",
+ prev_key->objectid, prev_len, key->offset);
+ return -EUCLEAN;
+ }
+ }
+ return 0;
+}
+
/*
* Common point to switch the item-specific validation.
*/
@@ -1799,6 +1865,9 @@ static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
case BTRFS_DEV_ITEM_KEY:
ret = check_dev_item(leaf, key, slot);
break;
+ case BTRFS_DEV_EXTENT_KEY:
+ ret = check_dev_extent_item(leaf, key, slot, prev_key);
+ break;
case BTRFS_INODE_ITEM_KEY:
ret = check_inode_item(leaf, key, slot);
break;
--
2.45.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] btrfs: tree-checker: add btrfs dev extent checks
2024-08-11 5:50 [PATCH] btrfs: tree-checker: add btrfs dev extent checks Qu Wenruo
@ 2024-08-13 23:11 ` David Sterba
2024-08-13 23:17 ` Qu Wenruo
2024-08-15 5:17 ` Anand Jain
1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2024-08-13 23:11 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
> [REPORT]
> There is a corruption report that btrfs refuse to mount a fs that has
> overlapping dev extents:
>
> BTRFS error (device sdc): dev extent devid 4 physical offset
> 14263979671552 overlap with previous dev extent end 14263980982272
> BTRFS error (device sdc): failed to verify dev extents against chunks: -117
> BTRFS error (device sdc): open_ctree failed
>
> [CAUSE]
> The cause is very obvious, there is a bad dev extent item with incorrect
> length.
> Although we are not 100% sure of the cause before getting the dev tree
> dump, I'm already surprised that we do not have any checks on dev tree.
>
> Currently we only do the dev-extent verification at mount time, but if the
> corruption is caused by memory bitflip, we really want to catch it before
> writing the corruption to the storage.
>
> Furthermore the dev extent items has the following key definition:
>
> (<device id> DEV_EXTENT <physical offset>)
>
> Thus we can not just rely on the generic key order check to make sure
> there is no overlapping.
>
> [ENHANCEMENT]
> Introduce dedicated dev extent checks, including:
>
> - Fixed member checks
> * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
> * chunk_objectid should always be
> BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
>
> - Alignment checks
> * chunk_offset should be aligned to sectorsize
> * length should be aligned to sectorsize
> * key.offset should be aligned to sectorsize
>
> - Overlap checks
> If the previous key is also a dev-extent item, with the same
> device id, make sure we do not overlap with the previous dev extent.
>
> Reported: Stefan N <stefannnau@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Looks like we missed some simple tree item checks indeed.
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs: tree-checker: add btrfs dev extent checks
2024-08-13 23:11 ` David Sterba
@ 2024-08-13 23:17 ` Qu Wenruo
2024-08-13 23:32 ` David Sterba
0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2024-08-13 23:17 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs
在 2024/8/14 08:41, David Sterba 写道:
> On Sun, Aug 11, 2024 at 03:20:08PM +0930, Qu Wenruo wrote:
>> [REPORT]
>> There is a corruption report that btrfs refuse to mount a fs that has
>> overlapping dev extents:
>>
>> BTRFS error (device sdc): dev extent devid 4 physical offset
>> 14263979671552 overlap with previous dev extent end 14263980982272
>> BTRFS error (device sdc): failed to verify dev extents against chunks: -117
>> BTRFS error (device sdc): open_ctree failed
>>
>> [CAUSE]
>> The cause is very obvious, there is a bad dev extent item with incorrect
>> length.
>> Although we are not 100% sure of the cause before getting the dev tree
>> dump, I'm already surprised that we do not have any checks on dev tree.
>>
>> Currently we only do the dev-extent verification at mount time, but if the
>> corruption is caused by memory bitflip, we really want to catch it before
>> writing the corruption to the storage.
>>
>> Furthermore the dev extent items has the following key definition:
>>
>> (<device id> DEV_EXTENT <physical offset>)
>>
>> Thus we can not just rely on the generic key order check to make sure
>> there is no overlapping.
>>
>> [ENHANCEMENT]
>> Introduce dedicated dev extent checks, including:
>>
>> - Fixed member checks
>> * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
>> * chunk_objectid should always be
>> BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
>>
>> - Alignment checks
>> * chunk_offset should be aligned to sectorsize
>> * length should be aligned to sectorsize
>> * key.offset should be aligned to sectorsize
>>
>> - Overlap checks
>> If the previous key is also a dev-extent item, with the same
>> device id, make sure we do not overlap with the previous dev extent.
>>
>> Reported: Stefan N <stefannnau@gmail.com>
>> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Looks like we missed some simple tree item checks indeed.
The original idea is, we have btrfs_verify_dev_extents() at mount time,
thus it's enough to reject bad dev extents, and no need for tree-checker
for dev-extents.
But this method doesn't prevent bitflip from sneaking in during runtime.
So in the long run, our sanity checks should:
- Do cross-checks at mount time for critical infrastructure
To prevent corruption sneaking in undetected.
- Do in-leaf checks at tree-checker
To prevent corruption reach storage.
- Do extra read-time cross-checks
Just like the dir item checks we did.
Thanks,
Qu
>
> Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs: tree-checker: add btrfs dev extent checks
2024-08-13 23:17 ` Qu Wenruo
@ 2024-08-13 23:32 ` David Sterba
0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2024-08-13 23:32 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Aug 14, 2024 at 08:47:40AM +0930, Qu Wenruo wrote:
> > Looks like we missed some simple tree item checks indeed.
>
> The original idea is, we have btrfs_verify_dev_extents() at mount time,
> thus it's enough to reject bad dev extents, and no need for tree-checker
> for dev-extents.
>
> But this method doesn't prevent bitflip from sneaking in during runtime.
Yeah, the tree-checker is run before commit too, so the mount checks
won't catch that.
> So in the long run, our sanity checks should:
>
> - Do cross-checks at mount time for critical infrastructure
> To prevent corruption sneaking in undetected.
At mount we can afford to do more than the tree-checker can do in the
single leaf, like matching the dev extents and block groups.
> - Do in-leaf checks at tree-checker
> To prevent corruption reach storage.
>
> - Do extra read-time cross-checks
> Just like the dir item checks we did.
Some values have a clear range, or a constraint like alignment to a
sectorsize. Beyond that I think it's quite limited, we can't read other
leaves but maybe other checks against existing in-memory structures can
be done. An example, a block group item is consistent but overlaps with
one in the tree. Similar to what you do in this patch, but the bg is
already handled because of how the data is tracked. The tree-checker
coverage is pretty good so it's hard to find what else to cross-check.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs: tree-checker: add btrfs dev extent checks
2024-08-11 5:50 [PATCH] btrfs: tree-checker: add btrfs dev extent checks Qu Wenruo
2024-08-13 23:11 ` David Sterba
@ 2024-08-15 5:17 ` Anand Jain
2024-08-15 12:25 ` David Sterba
1 sibling, 1 reply; 6+ messages in thread
From: Anand Jain @ 2024-08-15 5:17 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 11/8/24 1:50 pm, Qu Wenruo wrote:
> [REPORT]
> There is a corruption report that btrfs refuse to mount a fs that has
> overlapping dev extents:
>
> BTRFS error (device sdc): dev extent devid 4 physical offset
> 14263979671552 overlap with previous dev extent end 14263980982272
> BTRFS error (device sdc): failed to verify dev extents against chunks: -117
> BTRFS error (device sdc): open_ctree failed
>
> [CAUSE]
> The cause is very obvious, there is a bad dev extent item with incorrect
> length.
> Although we are not 100% sure of the cause before getting the dev tree
> dump, I'm already surprised that we do not have any checks on dev tree.
>
> Currently we only do the dev-extent verification at mount time, but if the
> corruption is caused by memory bitflip, we really want to catch it before
> writing the corruption to the storage.
>
> Furthermore the dev extent items has the following key definition:
>
> (<device id> DEV_EXTENT <physical offset>)
>
> Thus we can not just rely on the generic key order check to make sure
> there is no overlapping.
>
> [ENHANCEMENT]
> Introduce dedicated dev extent checks, including:
>
> - Fixed member checks
> * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
> * chunk_objectid should always be
> BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
>
> - Alignment checks
> * chunk_offset should be aligned to sectorsize
> * length should be aligned to sectorsize
> * key.offset should be aligned to sectorsize
>
> - Overlap checks
> If the previous key is also a dev-extent item, with the same
> device id, make sure we do not overlap with the previous dev extent.
>
> Reported: Stefan N <stefannnau@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Looks good.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Thx.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs: tree-checker: add btrfs dev extent checks
2024-08-15 5:17 ` Anand Jain
@ 2024-08-15 12:25 ` David Sterba
0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2024-08-15 12:25 UTC (permalink / raw)
To: Anand Jain; +Cc: Qu Wenruo, linux-btrfs
On Thu, Aug 15, 2024 at 01:17:00PM +0800, Anand Jain wrote:
> On 11/8/24 1:50 pm, Qu Wenruo wrote:
> > [REPORT]
> > There is a corruption report that btrfs refuse to mount a fs that has
> > overlapping dev extents:
> >
> > BTRFS error (device sdc): dev extent devid 4 physical offset
> > 14263979671552 overlap with previous dev extent end 14263980982272
> > BTRFS error (device sdc): failed to verify dev extents against chunks: -117
> > BTRFS error (device sdc): open_ctree failed
> >
> > [CAUSE]
> > The cause is very obvious, there is a bad dev extent item with incorrect
> > length.
> > Although we are not 100% sure of the cause before getting the dev tree
> > dump, I'm already surprised that we do not have any checks on dev tree.
> >
> > Currently we only do the dev-extent verification at mount time, but if the
> > corruption is caused by memory bitflip, we really want to catch it before
> > writing the corruption to the storage.
> >
> > Furthermore the dev extent items has the following key definition:
> >
> > (<device id> DEV_EXTENT <physical offset>)
> >
> > Thus we can not just rely on the generic key order check to make sure
> > there is no overlapping.
> >
> > [ENHANCEMENT]
> > Introduce dedicated dev extent checks, including:
> >
> > - Fixed member checks
> > * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
> > * chunk_objectid should always be
> > BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
> >
> > - Alignment checks
> > * chunk_offset should be aligned to sectorsize
> > * length should be aligned to sectorsize
> > * key.offset should be aligned to sectorsize
> >
> > - Overlap checks
> > If the previous key is also a dev-extent item, with the same
> > device id, make sure we do not overlap with the previous dev extent.
> >
> > Reported: Stefan N <stefannnau@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Looks good.
>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
I'm doing some updaets to for-next so I've added the tag.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-08-15 12:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-11 5:50 [PATCH] btrfs: tree-checker: add btrfs dev extent checks Qu Wenruo
2024-08-13 23:11 ` David Sterba
2024-08-13 23:17 ` Qu Wenruo
2024-08-13 23:32 ` David Sterba
2024-08-15 5:17 ` Anand Jain
2024-08-15 12:25 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox