From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Kai Krakow <hurikhan77@gmail.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs crashes during routine btrfs-balance-least-used
Date: Tue, 16 Jul 2024 18:39:32 +0930 [thread overview]
Message-ID: <71256c6e-c584-48e8-bce6-c04aff0d0496@gmx.com> (raw)
In-Reply-To: <CAMthOuNqaKoJGYWNDdV33hWkG1oa77vuEt0gxxYVbW+KNrtnaw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]
在 2024/7/16 16:21, Kai Krakow 写道:
> Hello Qu!
>
>[...]
>>
>> And we're also enhancing our handling on bad hardwares (except when they
>> cheat on FLUSH), the biggest example is tree-checker.
>>
>> But I really run out of ideas for such a huge corruption.
>> Your setup really rules out almost everything, from out-of-tree (Nv*dia
>> drivers) to bad hot memory migration implementation.
>>
>> I'll let you know when the new tree-checker patch come out, and
>> hopefully it can catch the root cause.
>
> I've got your patch from linux-btrfs git (which you've sent to the
> list yesterday). It looks like this cannot be applied to 6.6 LTS. Is
> it part of a larger patch series and will be backported?
>
> Is it safe to just cherry pick all patches saying "tree-checker" to
> the 6.6 tree?
The conflicts are caused by the missing commit 1645c283a87c ("btrfs:
tree-checker: add type and sequence check for inline backrefs") and a
cleanup patch.
So I manually backported those two patches, just give them a try.
Meanwhile the missing commit looks like a good candidate for stable 6.6
branch, I can definitely send it to stable later.
Thanks,
Qu
>
> Thanks,
> Kai
[-- Attachment #2: 0002-btrfs-tree-checker-validate-dref-root-and-objectid.patch --]
[-- Type: text/x-patch, Size: 5451 bytes --]
From dbd8f12a0f31489eed4e061116041376ef7ec627 Mon Sep 17 00:00:00 2001
Message-ID: <dbd8f12a0f31489eed4e061116041376ef7ec627.1721120832.git.wqu@suse.com>
In-Reply-To: <cover.1721120832.git.wqu@suse.com>
References: <cover.1721120832.git.wqu@suse.com>
From: Qu Wenruo <wqu@suse.com>
Date: Mon, 15 Jul 2024 16:07:07 +0930
Subject: [PATCH 2/2] btrfs: tree-checker: validate dref root and objectid
Not yet upstreamed.
[CORRUPTION]
There is a bug report that btrfs flips RO due to a corruption in the
extent tree, the involved dumps looks like this:
item 188 key (402811572224 168 4096) itemoff 14598 itemsize 79
extent refs 3 gen 3678544 flags 1
ref#0: extent data backref root 13835058055282163977 objectid 281473384125923 offset 81432576 count 1
ref#1: shared data backref parent 1947073626112 count 1
ref#2: shared data backref parent 1156030103552 count 1
BTRFS critical (device vdc1: state EA): unable to find ref byte nr 402811572224 parent 0 root 265 owner 28703026 offset 81432576 slot 189
BTRFS error (device vdc1: state EA): failed to run delayed ref for logical 402811572224 num_bytes 4096 type 178 action 2 ref_mod 1: -2
[CAUSE]
The corrupted entry is ref#0 of item 188.
The root number 13835058055282163977 is beyond the upper limit for root
items (the current limit is 1 << 48), and the objectid also looks
suspicious.
Only the offset and count is correct.
[ENHANCEMENT]
Although it's still unknown why we have such many bytes corrupted
randomly, we can still enhance the tree-checker for data backrefs by:
- Validate the root value
For now there should only be 3 types of roots can have data backref:
* subvolume trees
* data reloc trees
* root tree
Only for v1 space cache
- validate the objectid value
The objectid should be a valid inode number.
Hopefully we can catch such problem in the future with the new checkers.
Reported-by: Kai Krakow <hurikhan77@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAMthOuPjg5RDT-G_LXeBBUUtzt3cq=JywF+D1_h+JYxe=WKp-Q@mail.gmail.com/#t
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/tree-checker.c | 47 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 5d6cfa618dc4..f14825f3d4e8 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1265,6 +1265,19 @@ static void extent_err(const struct extent_buffer *eb, int slot,
va_end(args);
}
+static bool is_valid_dref_root(u64 rootid)
+{
+ /*
+ * The following tree root objectids are allowed to have a data backref:
+ * - subvolume trees
+ * - data reloc tree
+ * - tree root
+ * For v1 space cache
+ */
+ return is_fstree(rootid) || rootid == BTRFS_DATA_RELOC_TREE_OBJECTID ||
+ rootid == BTRFS_ROOT_TREE_OBJECTID;
+}
+
static int check_extent_item(struct extent_buffer *leaf,
struct btrfs_key *key, int slot,
struct btrfs_key *prev_key)
@@ -1417,6 +1430,8 @@ static int check_extent_item(struct extent_buffer *leaf,
struct btrfs_extent_data_ref *dref;
struct btrfs_shared_data_ref *sref;
u64 seq;
+ u64 dref_root;
+ u64 dref_objectid;
u64 dref_offset;
u64 inline_offset;
u8 inline_type;
@@ -1460,11 +1475,26 @@ static int check_extent_item(struct extent_buffer *leaf,
*/
case BTRFS_EXTENT_DATA_REF_KEY:
dref = (struct btrfs_extent_data_ref *)(&iref->offset);
+ dref_root = btrfs_extent_data_ref_root(leaf, dref);
+ dref_objectid = btrfs_extent_data_ref_objectid(leaf, dref);
dref_offset = btrfs_extent_data_ref_offset(leaf, dref);
seq = hash_extent_data_ref(
btrfs_extent_data_ref_root(leaf, dref),
btrfs_extent_data_ref_objectid(leaf, dref),
btrfs_extent_data_ref_offset(leaf, dref));
+ if (unlikely(!is_valid_dref_root(dref_root))) {
+ extent_err(leaf, slot,
+ "invalid data ref root value %llu",
+ dref_root);
+ return -EUCLEAN;
+ }
+ if (unlikely(dref_objectid < BTRFS_FIRST_FREE_OBJECTID ||
+ dref_objectid > BTRFS_LAST_FREE_OBJECTID)) {
+ extent_err(leaf, slot,
+ "invalid data ref objectid value %llu",
+ dref_root);
+ return -EUCLEAN;
+ }
if (unlikely(!IS_ALIGNED(dref_offset,
fs_info->sectorsize))) {
extent_err(leaf, slot,
@@ -1600,6 +1630,8 @@ static int check_extent_data_ref(struct extent_buffer *leaf,
return -EUCLEAN;
}
for (; ptr < end; ptr += sizeof(*dref)) {
+ u64 root;
+ u64 objectid;
u64 offset;
/*
@@ -1607,7 +1639,22 @@ static int check_extent_data_ref(struct extent_buffer *leaf,
* overflow from the leaf due to hash collisions.
*/
dref = (struct btrfs_extent_data_ref *)ptr;
+ root = btrfs_extent_data_ref_root(leaf, dref);
+ objectid = btrfs_extent_data_ref_objectid(leaf, dref);
offset = btrfs_extent_data_ref_offset(leaf, dref);
+ if (unlikely(!is_valid_dref_root(root))) {
+ extent_err(leaf, slot,
+ "invalid extent data backref root value %llu",
+ root);
+ return -EUCLEAN;
+ }
+ if (unlikely(objectid < BTRFS_FIRST_FREE_OBJECTID ||
+ objectid > BTRFS_LAST_FREE_OBJECTID)) {
+ extent_err(leaf, slot,
+ "invalid extent data backref objectid value %llu",
+ root);
+ return -EUCLEAN;
+ }
if (unlikely(!IS_ALIGNED(offset, leaf->fs_info->sectorsize))) {
extent_err(leaf, slot,
"invalid extent data backref offset, have %llu expect aligned to %u",
--
2.45.2
[-- Attachment #3: 0001-btrfs-tree-checker-add-type-and-sequence-check-for-i.patch --]
[-- Type: text/x-patch, Size: 5404 bytes --]
From 63189f5d922db2bc525f5251be46fe857e00a2d6 Mon Sep 17 00:00:00 2001
Message-ID: <63189f5d922db2bc525f5251be46fe857e00a2d6.1721120832.git.wqu@suse.com>
In-Reply-To: <cover.1721120832.git.wqu@suse.com>
References: <cover.1721120832.git.wqu@suse.com>
From: Qu Wenruo <wqu@suse.com>
Date: Tue, 24 Oct 2023 12:41:11 +1030
Subject: [PATCH 1/2] btrfs: tree-checker: add type and sequence check for
inline backrefs
commit 1645c283a87c61f84b2bffd81f50724df959b11a upstream.
[BUG]
There is a bug report that ntfs2btrfs had a bug that it can lead to
transaction abort and the filesystem flips to read-only.
[CAUSE]
For inline backref items, kernel has a strict requirement for their
ordered, they must follow the following rules:
- All btrfs_extent_inline_ref::type should be in an ascending order
- Within the same type, the items should follow a descending order by
their sequence number
For EXTENT_DATA_REF type, the sequence number is result from
hash_extent_data_ref().
For other types, their sequence numbers are
btrfs_extent_inline_ref::offset.
Thus if there is any code not following above rules, the resulted
inline backrefs can prevent the kernel to locate the needed inline
backref and lead to transaction abort.
[FIX]
Ntrfs2btrfs has already fixed the problem, and btrfs-progs has added the
ability to detect such problems.
For kernel, let's be more noisy and be more specific about the order, so
that the next time kernel hits such problem we would reject it in the
first place, without leading to transaction abort.
Link: https://github.com/kdave/btrfs-progs/pull/622
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
[ Fix a conflict due to header cleanup. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/tree-checker.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index cc6bc5985120..5d6cfa618dc4 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -29,6 +29,7 @@
#include "accessors.h"
#include "file-item.h"
#include "inode-item.h"
+#include "extent-tree.h"
/*
* Error message should follow the following format:
@@ -1274,6 +1275,8 @@ static int check_extent_item(struct extent_buffer *leaf,
unsigned long ptr; /* Current pointer inside inline refs */
unsigned long end; /* Extent item end */
const u32 item_size = btrfs_item_size(leaf, slot);
+ u8 last_type = 0;
+ u64 last_seq = U64_MAX;
u64 flags;
u64 generation;
u64 total_refs; /* Total refs in btrfs_extent_item */
@@ -1320,6 +1323,18 @@ static int check_extent_item(struct extent_buffer *leaf,
* 2.2) Ref type specific data
* Either using btrfs_extent_inline_ref::offset, or specific
* data structure.
+ *
+ * All above inline items should follow the order:
+ *
+ * - All btrfs_extent_inline_ref::type should be in an ascending
+ * order
+ *
+ * - Within the same type, the items should follow a descending
+ * order by their sequence number. The sequence number is
+ * determined by:
+ * * btrfs_extent_inline_ref::offset for all types other than
+ * EXTENT_DATA_REF
+ * * hash_extent_data_ref() for EXTENT_DATA_REF
*/
if (unlikely(item_size < sizeof(*ei))) {
extent_err(leaf, slot,
@@ -1401,6 +1416,7 @@ static int check_extent_item(struct extent_buffer *leaf,
struct btrfs_extent_inline_ref *iref;
struct btrfs_extent_data_ref *dref;
struct btrfs_shared_data_ref *sref;
+ u64 seq;
u64 dref_offset;
u64 inline_offset;
u8 inline_type;
@@ -1414,6 +1430,7 @@ static int check_extent_item(struct extent_buffer *leaf,
iref = (struct btrfs_extent_inline_ref *)ptr;
inline_type = btrfs_extent_inline_ref_type(leaf, iref);
inline_offset = btrfs_extent_inline_ref_offset(leaf, iref);
+ seq = inline_offset;
if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) {
extent_err(leaf, slot,
"inline ref item overflows extent item, ptr %lu iref size %u end %lu",
@@ -1444,6 +1461,10 @@ static int check_extent_item(struct extent_buffer *leaf,
case BTRFS_EXTENT_DATA_REF_KEY:
dref = (struct btrfs_extent_data_ref *)(&iref->offset);
dref_offset = btrfs_extent_data_ref_offset(leaf, dref);
+ seq = hash_extent_data_ref(
+ btrfs_extent_data_ref_root(leaf, dref),
+ btrfs_extent_data_ref_objectid(leaf, dref),
+ btrfs_extent_data_ref_offset(leaf, dref));
if (unlikely(!IS_ALIGNED(dref_offset,
fs_info->sectorsize))) {
extent_err(leaf, slot,
@@ -1470,6 +1491,24 @@ static int check_extent_item(struct extent_buffer *leaf,
inline_type);
return -EUCLEAN;
}
+ if (inline_type < last_type) {
+ extent_err(leaf, slot,
+ "inline ref out-of-order: has type %u, prev type %u",
+ inline_type, last_type);
+ return -EUCLEAN;
+ }
+ /* Type changed, allow the sequence starts from U64_MAX again. */
+ if (inline_type > last_type)
+ last_seq = U64_MAX;
+ if (seq > last_seq) {
+ extent_err(leaf, slot,
+"inline ref out-of-order: has type %u offset %llu seq 0x%llx, prev type %u seq 0x%llx",
+ inline_type, inline_offset, seq,
+ last_type, last_seq);
+ return -EUCLEAN;
+ }
+ last_type = inline_type;
+ last_seq = seq;
ptr += btrfs_extent_inline_ref_size(inline_type);
}
/* No padding is allowed */
--
2.45.2
next prev parent reply other threads:[~2024-07-16 9:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-14 16:13 btrfs crashes during routine btrfs-balance-least-used Kai Krakow
2024-07-14 21:53 ` Qu Wenruo
2024-07-15 4:29 ` Kai Krakow
2024-07-15 5:00 ` Qu Wenruo
2024-07-15 5:31 ` Kai Krakow
2024-07-15 5:50 ` Qu Wenruo
2024-07-16 6:51 ` Kai Krakow
2024-07-16 9:09 ` Qu Wenruo [this message]
2024-07-16 13:25 ` Kai Krakow
2024-07-16 22:18 ` Qu Wenruo
2024-07-17 8:09 ` Kai Krakow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=71256c6e-c584-48e8-bce6-c04aff0d0496@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=hurikhan77@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox