From: Teng Liu <27rabbitlt@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: Teng Liu <27rabbitlt@gmail.com>,
dsterba@suse.com, clm@fb.com, wqu@suse.com,
linux-kernel@vger.kernel.org,
syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Subject: [PATCH v4] btrfs: validate data reloc tree file extent item members
Date: Wed, 13 May 2026 13:35:44 +0200
Message-ID: <20260513113553.213959-1-27rabbitlt@gmail.com>
In-Reply-To: <20260427202822.278326-1-27rabbitlt@gmail.com>
get_new_location() uses BUG_ON() to crash the kernel if the file extent
item it looks up has a non-zero offset, compression, encryption, or
other_encoding field. The data reloc inode is only written by
relocation's own paths, and the kernel always writes these fields as 0
(offset only at the cluster-boundary keys get_new_location() searches,
as explained below):
- insert_prealloc_file_extent() memsets the stack item to zero and
only fills in type, disk_bytenr, disk_num_bytes and num_bytes, so
offset/compression/encryption/other_encoding stay 0.
- insert_ordered_extent_file_extent() copies oe->compress_type into
the file extent's compression field, but the data reloc inode is
created with BTRFS_INODE_NOCOMPRESS, so compress_type is always 0.
- encryption and other_encoding are reserved-and-zero in btrfs.
A non-zero value here means the leaf decoded from disk does not match
what the kernel wrote, i.e. on-disk corruption. A malformed image
reaches this code via balance and panics the kernel.
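As an illustration of this argument, here is a userspace model (not
kernel code) of the two writers listed above. struct model_fe, struct
model_ordered and the model_* functions are hypothetical simplifications
that keep only the fields discussed here; that the ordered-extent path
starts from a zeroed item is an assumption of the sketch. Under those
assumptions neither writer can leave the three encoding fields non-zero,
so a non-zero value has to come from what was read back from disk.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct btrfs_file_extent_item. */
struct model_fe {
	uint8_t type;
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t num_bytes;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
};

enum { MODEL_REG = 1, MODEL_PREALLOC = 2 };

/* Hypothetical ordered extent; compress_type is 0 for a NOCOMPRESS inode. */
struct model_ordered {
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t num_bytes;
	uint8_t compress_type;
};

/* Models insert_prealloc_file_extent() as described: zero, fill four fields. */
static void model_write_prealloc(struct model_fe *fe, uint64_t start,
				 uint64_t len)
{
	memset(fe, 0, sizeof(*fe));
	fe->type = MODEL_PREALLOC;
	fe->disk_bytenr = start;
	fe->disk_num_bytes = len;
	fe->num_bytes = len;
}

/*
 * Models insert_ordered_extent_file_extent(): compression is copied from
 * the ordered extent. Assumption: the item starts zeroed, so encryption
 * and other_encoding are never set.
 */
static void model_write_from_ordered(struct model_fe *fe,
				     const struct model_ordered *oe)
{
	memset(fe, 0, sizeof(*fe));
	fe->type = MODEL_REG;
	fe->disk_bytenr = oe->disk_bytenr;
	fe->disk_num_bytes = oe->disk_num_bytes;
	fe->offset = oe->offset;
	fe->num_bytes = oe->num_bytes;
	fe->compression = oe->compress_type;
}

/* True if all three encoding fields are 0. */
static bool model_encoding_fields_zero(const struct model_fe *fe)
{
	return fe->compression == 0 && fe->encryption == 0 &&
	       fe->other_encoding == 0;
}
```

In this model, only changing an encoding byte after the write (as a
corrupted leaf would) makes model_encoding_fields_zero() return false.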
A previous attempt to enforce all four constraints in tree-checker's
check_extent_data_item() was merged as commit 7d0ee95979e9 ("btrfs:
validate data reloc tree file extent item members in tree-checker")
and then reverted by commit 1c034697fcaa after btrfs/061 produced
false positives on arm64 with 64K pages. The cause: relocation
writeback legitimately produces REG file_extent_items with offset != 0
in the data reloc tree. When an ordered extent covers only the back
portion of an underlying PREALLOC extent (num_bytes < ram_bytes on the
input file_extent), insert_ordered_extent_file_extent() inserts a REG
item with
  offset    = oe->offset
  num_bytes = oe->num_bytes
  ram_bytes = preserved from the original PREALLOC
This item can reach disk if a transaction commit happens while it is
still present in the leaf.
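The geometry described above can be sketched with a simplified
userspace model. struct model_item and model_split_back_write() are
hypothetical; the sketch assumes the unwritten front portion keeps the
original key and offset, and that the back portion's offset (standing
in for oe->offset) equals the original offset plus the front length.

```c
#include <stdint.h>

/* Hypothetical, simplified file extent item plus its key offset. */
struct model_item {
	uint64_t key_offset; /* file offset the item is keyed at */
	int type;            /* MODEL_REG or MODEL_PREALLOC */
	uint64_t offset;     /* offset into the on-disk extent */
	uint64_t num_bytes;
	uint64_t ram_bytes;
};

enum { MODEL_REG = 1, MODEL_PREALLOC = 2 };

/*
 * Simplified model of an ordered extent writing only the back portion of
 * a PREALLOC item, starting front_len bytes in. Returns 0 on success, or
 * -1 if front_len does not split the item.
 */
static int model_split_back_write(const struct model_item *pre,
				  uint64_t front_len,
				  struct model_item *front,
				  struct model_item *back)
{
	if (front_len == 0 || front_len >= pre->num_bytes)
		return -1;

	/* Unwritten front: same key, type and offset as before. */
	*front = *pre;
	front->num_bytes = front_len;

	/* Written back portion: REG at a non-boundary key, offset != 0. */
	back->key_offset = pre->key_offset + front_len;
	back->type = MODEL_REG;
	back->offset = pre->offset + front_len; /* stands in for oe->offset */
	back->num_bytes = pre->num_bytes - front_len;
	back->ram_bytes = pre->ram_bytes;       /* preserved */
	return 0;
}
```

For a 256K PREALLOC at key 1M split after 64K, the model leaves the
front item at key 1M with offset 0, while the back REG item lands at
key 1M+64K with offset 64K and num_bytes (192K) below ram_bytes (256K).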
The four fields belong in different layers:
- compression, encryption and other_encoding are universal
invariants for every item in the data reloc tree, regardless of
cluster geometry. Enforce them in tree-checker's
check_extent_data_item() so a corrupt leaf is rejected at read
time.
- offset is only an invariant at the cluster-boundary keys that
get_new_location() searches (the key is computed as
src_disk_bytenr - reloc_block_group_start). Partial-PREALLOC
writebacks legitimately place REG items at non-boundary keys with
offset != 0, so tree-checker must not reject these. The
cluster-boundary item is always written either by
insert_prealloc_file_extent() (offset=0 by memset) or by the
front portion of a partial writeback (offset=0 by construction),
so a non-zero offset there is corruption.
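A rough sketch of the two layers, using a hypothetical simplified item
(struct model_fe) and helper names that do not exist in btrfs: the
tree-level predicate applies to every item in the data reloc tree,
while the offset predicate is only meant for the item found at the key
get_new_location() computes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified file extent item as seen by both checks. */
struct model_fe {
	uint64_t key_offset;
	uint64_t offset;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
};

/* Layer 1 (tree-checker): must hold for every data reloc tree item. */
static bool model_tree_level_ok(const struct model_fe *fe)
{
	return fe->compression == 0 && fe->encryption == 0 &&
	       fe->other_encoding == 0;
}

/* The cluster-boundary key get_new_location() searches, as described. */
static uint64_t model_boundary_key(uint64_t src_disk_bytenr,
				   uint64_t reloc_block_group_start)
{
	return src_disk_bytenr - reloc_block_group_start;
}

/* Layer 2 (call site): only applied to the item at the boundary key. */
static bool model_boundary_ok(const struct model_fe *fe)
{
	return fe->offset == 0;
}
```

Folding model_boundary_ok() into model_tree_level_ok() would reject a
legitimate back-portion item (for example one at key + 64K with offset
64K), which is the kind of false positive that led to the revert.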
Enforce the universal invariants in check_extent_data_item() with a
file_extent_err() rejection. Convert the BUG_ON() in
get_new_location() to a -EUCLEAN return paired with btrfs_print_leaf()
and btrfs_err() so the offending leaf is logged. The caller in
replace_file_extents() already handles non-zero returns from
get_new_location() by breaking out of the loop without aborting the
transaction.
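The resulting error flow can be modeled in userspace as below.
model_get_new_location() and model_replace_file_extents() are
hypothetical stand-ins that keep only the offset and disk_num_bytes
checks and the break-on-error loop; the fallback EUCLEAN value is an
assumption for C libraries that do not define it.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* x86/arm Linux value; fallback is an assumption */
#endif

/* Hypothetical item: only the fields the two checks look at. */
struct model_fe {
	uint64_t offset;
	uint64_t disk_num_bytes;
};

/* Models the converted check: report corruption instead of BUG_ON(). */
static int model_get_new_location(const struct model_fe *fe,
				  uint64_t num_bytes)
{
	if (fe->offset != 0)
		return -EUCLEAN;
	if (num_bytes != fe->disk_num_bytes)
		return -EINVAL;
	return 0;
}

/*
 * Models the caller loop: stop at the first error and return it,
 * leaving already-processed items as they are.
 */
static int model_replace_file_extents(const struct model_fe *items, int n,
				      uint64_t num_bytes, int *processed)
{
	int ret = 0;

	*processed = 0;
	for (int i = 0; i < n; i++) {
		ret = model_get_new_location(&items[i], num_bytes);
		if (ret)
			break;
		(*processed)++;
	}
	return ret;
}
```

Given three items where only the third has a non-zero offset, the
modeled loop processes two and returns -EUCLEAN rather than crashing.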
Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: David Sterba <dsterba@suse.com>
Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v4:
- Split the check by the layer at which each invariant holds. Reject
compression/encryption/other_encoding != 0 in tree-checker (true
on-disk invariant for the entire data reloc tree). Keep the offset
check at the call site in get_new_location() (true only at the
cluster-boundary keys it searches; partial-PREALLOC writeback
legitimately produces non-zero offset at non-boundary keys, which
is why the v3 single-rule approach was reverted).
- Suggested by Qu Wenruo in reply to v3:
https://lore.kernel.org/linux-btrfs/20260427202822.278326-1-27rabbitlt@gmail.com/
Changes in v3:
- Moved the entire four-field check from get_new_location() into
tree-checker's check_extent_data_item(). Replaced BUG_ON() with
ASSERT() in get_new_location(). Merged as 7d0ee95979e9 and
reverted by 1c034697fcaa due to false positives in btrfs/061 on
arm64 64K pages.
Changes in v2:
- Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err()
so the offending leaf is dumped to dmesg, per Qu's v1 review:
https://lore.kernel.org/linux-btrfs/6c54901d-5e07-4c46-9553-997b28c93b86@suse.com/
- Expand the changelog to argue why non-zero compression/encryption/
other_encoding values in the data reloc inode imply on-disk corruption
rather than a kernel bug.
 fs/btrfs/relocation.c   | 22 ++++++++++++++++++----
 fs/btrfs/tree-checker.c | 27 +++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..01977fa282db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -814,6 +814,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 			    u64 bytenr, u64 num_bytes)
 {
 	struct btrfs_root *root = BTRFS_I(reloc_inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	BTRFS_PATH_AUTO_FREE(path);
 	struct btrfs_file_extent_item *fi;
 	struct extent_buffer *leaf;
@@ -835,10 +836,23 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	/*
+	 * The cluster-boundary key searched above is always written by
+	 * relocation with offset 0: either by insert_prealloc_file_extent()
+	 * (memsets the stack item to 0) or by the front portion of a partial
+	 * writeback (offset=0 by construction). A non-zero value here means
+	 * the on-disk leaf does not match what relocation wrote, i.e.
+	 * corruption. The other encoding fields are caught earlier by
+	 * tree-checker's check_extent_data_item().
+	 */
+	if (unlikely(btrfs_file_extent_offset(leaf, fi))) {
+		btrfs_print_leaf(leaf);
+		btrfs_err(fs_info,
+"unexpected non-zero offset in file extent item for data reloc inode %llu key offset %llu offset %llu",
+			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr,
+			  btrfs_file_extent_offset(leaf, fi));
+		return -EUCLEAN;
+	}
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1f15d0793a9c..8fc919dc08d0 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -296,6 +296,33 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 		return 0;
 	}
 
+	/*
+	 * For the data reloc tree, file extent items are written by
+	 * relocation's own paths. The data reloc inode is created with
+	 * BTRFS_INODE_NOCOMPRESS, so insert_ordered_extent_file_extent()
+	 * always leaves the compression field at 0. Encryption and
+	 * other_encoding are reserved-and-zero in btrfs. A non-zero value
+	 * for any of these means the leaf decoded from disk does not match
+	 * what the kernel wrote, i.e. on-disk corruption.
+	 *
+	 * The file_extent_item's offset field is NOT a universal invariant
+	 * here: partial-PREALLOC writebacks legitimately produce REG items
+	 * with non-zero offset at non-boundary keys. The offset check is
+	 * performed at the call site in get_new_location(), which only
+	 * inspects cluster-boundary keys where offset is always 0.
+	 */
+	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		     (btrfs_file_extent_compression(leaf, fi) ||
+		      btrfs_file_extent_encryption(leaf, fi) ||
+		      btrfs_file_extent_other_encoding(leaf, fi)))) {
+		file_extent_err(leaf, slot,
+"invalid encoding fields for data reloc tree, compression=%u encryption=%u other_encoding=%u",
+				btrfs_file_extent_compression(leaf, fi),
+				btrfs_file_extent_encryption(leaf, fi),
+				btrfs_file_extent_other_encoding(leaf, fi));
+		return -EUCLEAN;
+	}
+
 	/* Regular or preallocated extent has fixed item size */
 	if (unlikely(item_size != sizeof(*fi))) {
 		file_extent_err(leaf, slot,
base-commit: 6bf684b8823552b99c86bf791b22f622934ee771
--
2.54.0
Thread overview: 13+ messages
2026-04-25 6:10 [PATCH] btrfs: replace BUG_ON() with error return in get_new_location() Teng Liu
2026-04-25 8:06 ` Qu Wenruo
2026-04-26 20:16 ` [PATCH v2] " Teng Liu
2026-04-27 1:19 ` Qu Wenruo
2026-04-27 13:50 ` David Sterba
2026-04-27 20:24 ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
2026-04-27 22:15 ` Qu Wenruo
2026-04-28 0:44 ` Qu Wenruo
2026-04-28 15:29 ` David Sterba
2026-04-28 9:03 ` Johannes Thumshirn
2026-05-03 15:35 ` Teng Liu
2026-05-03 22:36 ` Qu Wenruo
2026-05-13 11:35 ` Teng Liu [this message]