Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH] btrfs: replace BUG_ON() with error return in get_new_location()
@ 2026-04-25  6:10 Teng Liu
  2026-04-25  8:06 ` Qu Wenruo
  2026-04-26 20:16 ` [PATCH v2] " Teng Liu
  0 siblings, 2 replies; 13+ messages in thread
From: Teng Liu @ 2026-04-25  6:10 UTC (permalink / raw)
  To: linux-btrfs
  Cc: dsterba, clm, linux-kernel, Teng Liu, syzbot+3e20d8f3d41bac5dc9a2

In get_new_location(), BUG_ON() crashes the kernel if the looked up
file extent item has any of offset, compression, encryption, or other
encoding set. While entries created by the relocation code itself are
not expected to have these fields set, the values come from on-disk
data and a malformed file system can reach this code with non-zero
values, panicking the kernel during a balance operation.

Replace the BUG_ON() with a return of -EUCLEAN, the established error
code in fs/btrfs/relocation.c for filesystem corruption. The caller in
replace_file_extents() already handles errors from get_new_location()
by breaking out of the loop without aborting the transaction so no
caller changes are needed.

Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
 fs/btrfs/relocation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..ce751c35945f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -835,10 +835,11 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	if (unlikely(btrfs_file_extent_offset(leaf, fi) ||
+		     btrfs_file_extent_compression(leaf, fi) ||
+		     btrfs_file_extent_encryption(leaf, fi) ||
+		     btrfs_file_extent_other_encoding(leaf, fi)))
+		return -EUCLEAN;
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] btrfs: replace BUG_ON() with error return in get_new_location()
  2026-04-25  6:10 [PATCH] btrfs: replace BUG_ON() with error return in get_new_location() Teng Liu
@ 2026-04-25  8:06 ` Qu Wenruo
  2026-04-26 20:16 ` [PATCH v2] " Teng Liu
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2026-04-25  8:06 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs
  Cc: dsterba, clm, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2



在 2026/4/25 15:40, Teng Liu 写道:
> In get_new_location(), BUG_ON() crashes the kernel if the looked up
> file extent item has any of offset, compression, encryption, or other
> encoding set. While entries created by the relocation code itself are
> not expected to have these fields set, the values come from on-disk
> data and a malformed file system can reach this code with non-zero
> values, panicking the kernel during a balance operation.
> 
> Replace the BUG_ON() with a return of -EUCLEAN, the established error
> code in fs/btrfs/relocation.c for filesystem corruption. The caller in
> replace_file_extents() already handles errors from get_new_location()
> by breaking out of the loop without aborting the transaction so no
> caller changes are needed.
> 
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> ---
>   fs/btrfs/relocation.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 1c42c5180bdd..ce751c35945f 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -835,10 +835,11 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   	fi = btrfs_item_ptr(leaf, path->slots[0],
>   			    struct btrfs_file_extent_item);
>   
> -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> -	       btrfs_file_extent_compression(leaf, fi) ||
> -	       btrfs_file_extent_encryption(leaf, fi) ||
> -	       btrfs_file_extent_other_encoding(leaf, fi));
> +	if (unlikely(btrfs_file_extent_offset(leaf, fi) ||
> +		     btrfs_file_extent_compression(leaf, fi) ||
> +		     btrfs_file_extent_encryption(leaf, fi) ||
> +		     btrfs_file_extent_other_encoding(leaf, fi)))
> +		return -EUCLEAN;

Every EUCLEAN should be paired with an error message.

And in this particular case, it's better to provide the tree dump, to 
really be sure that such data is corrupted other than caused by some bugs.

>   
>   	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
>   		return -EINVAL;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] btrfs: replace BUG_ON() with error return in get_new_location()
  2026-04-25  6:10 [PATCH] btrfs: replace BUG_ON() with error return in get_new_location() Teng Liu
  2026-04-25  8:06 ` Qu Wenruo
@ 2026-04-26 20:16 ` Teng Liu
  2026-04-27  1:19   ` Qu Wenruo
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
  1 sibling, 2 replies; 13+ messages in thread
From: Teng Liu @ 2026-04-26 20:16 UTC (permalink / raw)
  To: linux-btrfs
  Cc: dsterba, clm, wqu, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2,
	Teng Liu, wqu, syzbot+3e20d8f3d41bac5dc9a2

In get_new_location(), BUG_ON() crashes the kernel if the looked up
file extent item has any of offset, compression, encryption, or other
encoding set. The data reloc inode this lookup targets is populated
solely by relocation's own prealloc and writeback paths:

  - insert_prealloc_file_extent() memsets the stack item to 0 and only
    sets type/disk_bytenr/disk_num_bytes/num_bytes, so the four checked
    fields stay 0.
  - insert_ordered_extent_file_extent() copies oe->compress_type into
    file_extent compression, but the data reloc inode is created with
    BTRFS_INODE_NOCOMPRESS, so compress_type is always 0; encryption
    and other_encoding are reserved-and-zero per the comment in that
    function.

So observing non-zero compression, encryption or other_encoding here
should on-disk corruption. A malformed image can reach this code
through a balance operation and panic the kernel.

Replace the BUG_ON() with a return of -EUCLEAN, the established error
code in fs/btrfs/relocation.c for filesystem corruption, and pair it
with btrfs_print_leaf() and btrfs_err() as suggested by Qu allowing dmesg
to log the necessary information. The caller in replace_file_extents() 
already handles errors from get_new_location() by breaking out of the loop 
without aborting the transaction, so no caller changes are needed.

Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v2:
 - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
   the offending leaf is dumped to dmesg, per Qu's review:
   https://lore.kernel.org/linux-btrfs/6c54901d-5e07-4c46-9553-997b28c93b86@suse.com/
 - Expand the changelog to argue why non-zero compression/encryption/
   other_encoding in the data reloc inode imply on-disk corruption
   rather than a kernel bug.

 fs/btrfs/relocation.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..bba28866df1c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -814,6 +814,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 			    u64 bytenr, u64 num_bytes)
 {
 	struct btrfs_root *root = BTRFS_I(reloc_inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	BTRFS_PATH_AUTO_FREE(path);
 	struct btrfs_file_extent_item *fi;
 	struct extent_buffer *leaf;
@@ -835,10 +836,16 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	if (unlikely(btrfs_file_extent_offset(leaf, fi) ||
+		     btrfs_file_extent_compression(leaf, fi) ||
+		     btrfs_file_extent_encryption(leaf, fi) ||
+		     btrfs_file_extent_other_encoding(leaf, fi))) {
+		btrfs_print_leaf(leaf);
+		btrfs_err(fs_info,
+"unexpected non-zero fields in file extent item for data reloc inode %llu offset %llu (offset/compression/encryption/other_encoding must all be 0)",
+			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr);
+		return -EUCLEAN;
+	}
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] btrfs: replace BUG_ON() with error return in get_new_location()
  2026-04-26 20:16 ` [PATCH v2] " Teng Liu
@ 2026-04-27  1:19   ` Qu Wenruo
  2026-04-27 13:50     ` David Sterba
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-04-27  1:19 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs
  Cc: dsterba, clm, wqu, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2,
	syzbot+3e20d8f3d41bac5dc9a2



在 2026/4/27 05:46, Teng Liu 写道:
> In get_new_location(), BUG_ON() crashes the kernel if the looked up
> file extent item has any of offset, compression, encryption, or other
> encoding set. The data reloc inode this lookup targets is populated
> solely by relocation's own prealloc and writeback paths:
> 
>    - insert_prealloc_file_extent() memsets the stack item to 0 and only
>      sets type/disk_bytenr/disk_num_bytes/num_bytes, so the four checked
>      fields stay 0.
>    - insert_ordered_extent_file_extent() copies oe->compress_type into
>      file_extent compression, but the data reloc inode is created with
>      BTRFS_INODE_NOCOMPRESS, so compress_type is always 0; encryption
>      and other_encoding are reserved-and-zero per the comment in that
>      function.
> 
> So observing non-zero compression, encryption or other_encoding here
> should on-disk corruption. A malformed image can reach this code
> through a balance operation and panic the kernel.
> 
> Replace the BUG_ON() with a return of -EUCLEAN, the established error
> code in fs/btrfs/relocation.c for filesystem corruption, and pair it
> with btrfs_print_leaf() and btrfs_err() as suggested by Qu allowing dmesg
> to log the necessary information. The caller in replace_file_extents()
> already handles errors from get_new_location() by breaking out of the loop
> without aborting the transaction, so no caller changes are needed.
> 
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> ---
> Changes in v2:
>   - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
>     the offending leaf is dumped to dmesg, per Qu's review:
>     https://lore.kernel.org/linux-btrfs/6c54901d-5e07-4c46-9553-997b28c93b86@suse.com/
>   - Expand the changelog to argue why non-zero compression/encryption/
>     other_encoding in the data reloc inode imply on-disk corruption
>     rather than a kernel bug.
> 
>   fs/btrfs/relocation.c | 15 +++++++++++----
>   1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 1c42c5180bdd..bba28866df1c 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -814,6 +814,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   			    u64 bytenr, u64 num_bytes)
>   {
>   	struct btrfs_root *root = BTRFS_I(reloc_inode)->root;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
>   	BTRFS_PATH_AUTO_FREE(path);
>   	struct btrfs_file_extent_item *fi;
>   	struct extent_buffer *leaf;
> @@ -835,10 +836,16 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   	fi = btrfs_item_ptr(leaf, path->slots[0],
>   			    struct btrfs_file_extent_item);
>   
> -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> -	       btrfs_file_extent_compression(leaf, fi) ||
> -	       btrfs_file_extent_encryption(leaf, fi) ||
> -	       btrfs_file_extent_other_encoding(leaf, fi));
> +	if (unlikely(btrfs_file_extent_offset(leaf, fi) ||
> +		     btrfs_file_extent_compression(leaf, fi) ||
> +		     btrfs_file_extent_encryption(leaf, fi) ||
> +		     btrfs_file_extent_other_encoding(leaf, fi))) {
> +		btrfs_print_leaf(leaf);
> +		btrfs_err(fs_info,
> +"unexpected non-zero fields in file extent item for data reloc inode %llu offset %llu (offset/compression/encryption/other_encoding must all be 0)",
> +			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr);
> +		return -EUCLEAN;

This looks like exactly what we did in tree-checker.

And we have enough info to do this in tree-checker, the data reloc tree 
has a fixed id and owner in extent buffer header.

What about moving this to tree-checker where we already have 
check_extent_data_item().

Furthermore, if you move this check into tree-checker, it's better to 
print the offset/compress/encryption/other_encoding values.
And remove the "must all be 0", just something like:

file_extent_err(leaf, slot,
		"invalid members for data reloc tree, offset=%llu compress=%llu 
encryption=%llu other_encoding=%llu",
		...);


> +	}
>   
>   	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
>   		return -EINVAL;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] btrfs: replace BUG_ON() with error return in get_new_location()
  2026-04-27  1:19   ` Qu Wenruo
@ 2026-04-27 13:50     ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2026-04-27 13:50 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Teng Liu, linux-btrfs, dsterba, clm, wqu, linux-kernel,
	syzbot+3e20d8f3d41bac5dc9a2, syzbot+3e20d8f3d41bac5dc9a2

On Mon, Apr 27, 2026 at 10:49:26AM +0930, Qu Wenruo wrote:
> > @@ -835,10 +836,16 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
> >   	fi = btrfs_item_ptr(leaf, path->slots[0],
> >   			    struct btrfs_file_extent_item);
> >   
> > -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> > -	       btrfs_file_extent_compression(leaf, fi) ||
> > -	       btrfs_file_extent_encryption(leaf, fi) ||
> > -	       btrfs_file_extent_other_encoding(leaf, fi));
> > +	if (unlikely(btrfs_file_extent_offset(leaf, fi) ||
> > +		     btrfs_file_extent_compression(leaf, fi) ||
> > +		     btrfs_file_extent_encryption(leaf, fi) ||
> > +		     btrfs_file_extent_other_encoding(leaf, fi))) {
> > +		btrfs_print_leaf(leaf);
> > +		btrfs_err(fs_info,
> > +"unexpected non-zero fields in file extent item for data reloc inode %llu offset %llu (offset/compression/encryption/other_encoding must all be 0)",
> > +			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr);
> > +		return -EUCLEAN;
> 
> This looks like exactly what we did in tree-checker.
> 
> And we have enough info to do this in tree-checker, the data reloc tree 
> has a fixed id and owner in extent buffer header.
> 
> What about moving this to tree-checker where we already have 
> check_extent_data_item().

I think it would be better in the tree-checker, as this verifies the
item as it's read and not after the relocatikon is started.

The BUG_ON can be turned to an assertion though it duplicates some of
the checks, but this is otherwise harmless and cheap.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-26 20:16 ` [PATCH v2] " Teng Liu
  2026-04-27  1:19   ` Qu Wenruo
@ 2026-04-27 20:24   ` Teng Liu
  2026-04-27 22:15     ` Qu Wenruo
                       ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Teng Liu @ 2026-04-27 20:24 UTC (permalink / raw)
  To: linux-btrfs
  Cc: dsterba, clm, wqu, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2,
	Teng Liu

get_new_location() uses BUG_ON() to crash the kernel if the file extent
item it looks up has any of offset, compression, encryption, or
other_encoding set. The data reloc inode is only written by relocation's
own paths -- insert_prealloc_file_extent() and
insert_ordered_extent_file_extent() -- which always leave those four
fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
and encryption/other_encoding are reserved-and-zero). Observing a
non-zero value therefore means the leaf decoded from disk does not match
what the kernel wrote, i.e. on-disk corruption. A malformed image can
reach this code via balance and panic the kernel.

Move the validation into tree-checker's check_extent_data_item(), where
the constraint is enforced when the leaf is read off disk rather than
after relocation has already started. The data reloc tree has a fixed
root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
header, so check_extent_data_item() has all the information it needs to
apply this check on its own. Report violations via file_extent_err() and
print the four offending values.

In get_new_location() replace the BUG_ON() with an ASSERT().
The caller in replace_file_extents() already handles non-zero returns from
get_new_location() by breaking out of the loop without aborting the
transaction, so no caller changes are needed.

Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: David Sterba <dsterba@suse.com>
Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v3:
 - Move the corruption check from relocation.c into tree-checker's
   check_extent_data_item(), per Qu and David. The data reloc tree's
   fixed objectid is recorded in the extent buffer header, so the
   check has all the context it needs at read time.
 - Use file_extent_err() and print offset/compression/encryption/
   other_encoding values, per Qu.
 - Replace the BUG_ON in get_new_location() with ASSERT() rather than
   -EUCLEAN, per David.

Changes in v2:
 - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
   the offending leaf is dumped to dmesg, per Qu's review of v1.
 - Expand the changelog to argue why non-zero
   compression/encryption/other_encoding in the data reloc inode imply
   on-disk corruption.

 fs/btrfs/relocation.c   |  8 ++++----
 fs/btrfs/tree-checker.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..527d4dbfe31c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -835,10 +835,10 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
+	       !btrfs_file_extent_compression(leaf, fi) &&
+	       !btrfs_file_extent_encryption(leaf, fi) &&
+	       !btrfs_file_extent_other_encoding(leaf, fi));
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1f15d0793a9c..e4864f7a471e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -296,6 +296,27 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 		return 0;
 	}
 
+	/*
+	 * For the data reloc tree, file extent items are written by
+	 * relocation's own paths, which always leave offset, compression,
+	 * encryption and other_encoding as 0. Any non-zero value here means
+	 * the leaf decoded from disk does not match what the kernel wrote,
+	 * i.e. on-disk corruption.
+	 */
+	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		     (btrfs_file_extent_offset(leaf, fi) ||
+		      btrfs_file_extent_compression(leaf, fi) ||
+		      btrfs_file_extent_encryption(leaf, fi) ||
+		      btrfs_file_extent_other_encoding(leaf, fi)))) {
+		file_extent_err(leaf, slot,
+"invalid members for data reloc tree, offset=%llu compress=%u encryption=%u other_encoding=%u",
+				btrfs_file_extent_offset(leaf, fi),
+				btrfs_file_extent_compression(leaf, fi),
+				btrfs_file_extent_encryption(leaf, fi),
+				btrfs_file_extent_other_encoding(leaf, fi));
+		return -EUCLEAN;
+	}
+
 	/* Regular or preallocated extent has fixed item size */
 	if (unlikely(item_size != sizeof(*fi))) {
 		file_extent_err(leaf, slot,
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
@ 2026-04-27 22:15     ` Qu Wenruo
  2026-04-28  0:44       ` Qu Wenruo
  2026-04-28  9:03     ` Johannes Thumshirn
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-04-27 22:15 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs
  Cc: dsterba, clm, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2



在 2026/4/28 05:54, Teng Liu 写道:
> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> item it looks up has any of offset, compression, encryption, or
> other_encoding set. The data reloc inode is only written by relocation's
> own paths -- insert_prealloc_file_extent() and
> insert_ordered_extent_file_extent() -- which always leave those four
> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
> and encryption/other_encoding are reserved-and-zero). Observing a
> non-zero value therefore means the leaf decoded from disk does not match
> what the kernel wrote, i.e. on-disk corruption. A malformed image can
> reach this code via balance and panic the kernel.
> 
> Move the validation into tree-checker's check_extent_data_item(), where
> the constraint is enforced when the leaf is read off disk rather than
> after relocation has already started. The data reloc tree has a fixed
> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
> header, so check_extent_data_item() has all the information it needs to
> apply this check on its own. Report violations via file_extent_err() and
> print the four offending values.
> 
> In get_new_location() replace the BUG_ON() with an ASSERT().
> The caller in replace_file_extents() already handles non-zero returns from
> get_new_location() by breaking out of the loop without aborting the
> transaction, so no caller changes are needed.
> 
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Suggested-by: David Sterba <dsterba@suse.com>
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

And merged.

Thanks,
Qu

> ---
> Changes in v3:
>   - Move the corruption check from relocation.c into tree-checker's
>     check_extent_data_item(), per Qu and David. The data reloc tree's
>     fixed objectid is recorded in the extent buffer header, so the
>     check has all the context it needs at read time.
>   - Use file_extent_err() and print offset/compression/encryption/
>     other_encoding values, per Qu.
>   - Replace the BUG_ON in get_new_location() with ASSERT() rather than
>     -EUCLEAN, per David.
> 
> Changes in v2:
>   - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
>     the offending leaf is dumped to dmesg, per Qu's review of v1.
>   - Expand the changelog to argue why non-zero
>     compression/encryption/other_encoding in the data reloc inode imply
>     on-disk corruption.
> 
>   fs/btrfs/relocation.c   |  8 ++++----
>   fs/btrfs/tree-checker.c | 21 +++++++++++++++++++++
>   2 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 1c42c5180bdd..527d4dbfe31c 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -835,10 +835,10 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   	fi = btrfs_item_ptr(leaf, path->slots[0],
>   			    struct btrfs_file_extent_item);
>   
> -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> -	       btrfs_file_extent_compression(leaf, fi) ||
> -	       btrfs_file_extent_encryption(leaf, fi) ||
> -	       btrfs_file_extent_other_encoding(leaf, fi));
> +	ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
> +	       !btrfs_file_extent_compression(leaf, fi) &&
> +	       !btrfs_file_extent_encryption(leaf, fi) &&
> +	       !btrfs_file_extent_other_encoding(leaf, fi));
>   
>   	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
>   		return -EINVAL;
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1f15d0793a9c..e4864f7a471e 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -296,6 +296,27 @@ static int check_extent_data_item(struct extent_buffer *leaf,
>   		return 0;
>   	}
>   
> +	/*
> +	 * For the data reloc tree, file extent items are written by
> +	 * relocation's own paths, which always leave offset, compression,
> +	 * encryption and other_encoding as 0. Any non-zero value here means
> +	 * the leaf decoded from disk does not match what the kernel wrote,
> +	 * i.e. on-disk corruption.
> +	 */
> +	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
> +		     (btrfs_file_extent_offset(leaf, fi) ||
> +		      btrfs_file_extent_compression(leaf, fi) ||
> +		      btrfs_file_extent_encryption(leaf, fi) ||
> +		      btrfs_file_extent_other_encoding(leaf, fi)))) {
> +		file_extent_err(leaf, slot,
> +"invalid members for data reloc tree, offset=%llu compress=%u encryption=%u other_encoding=%u",
> +				btrfs_file_extent_offset(leaf, fi),
> +				btrfs_file_extent_compression(leaf, fi),
> +				btrfs_file_extent_encryption(leaf, fi),
> +				btrfs_file_extent_other_encoding(leaf, fi));
> +		return -EUCLEAN;
> +	}
> +
>   	/* Regular or preallocated extent has fixed item size */
>   	if (unlikely(item_size != sizeof(*fi))) {
>   		file_extent_err(leaf, slot,


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-27 22:15     ` Qu Wenruo
@ 2026-04-28  0:44       ` Qu Wenruo
  2026-04-28 15:29         ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-04-28  0:44 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs
  Cc: dsterba, clm, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2



在 2026/4/28 07:45, Qu Wenruo 写道:
> 
> 
> 在 2026/4/28 05:54, Teng Liu 写道:
>> get_new_location() uses BUG_ON() to crash the kernel if the file extent
>> item it looks up has any of offset, compression, encryption, or
>> other_encoding set. The data reloc inode is only written by relocation's
>> own paths -- insert_prealloc_file_extent() and
>> insert_ordered_extent_file_extent() -- which always leave those four
>> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
>> and encryption/other_encoding are reserved-and-zero). Observing a
>> non-zero value therefore means the leaf decoded from disk does not match
>> what the kernel wrote, i.e. on-disk corruption. A malformed image can
>> reach this code via balance and panic the kernel.
>>
>> Move the validation into tree-checker's check_extent_data_item(), where
>> the constraint is enforced when the leaf is read off disk rather than
>> after relocation has already started. The data reloc tree has a fixed
>> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
>> header, so check_extent_data_item() has all the information it needs to
>> apply this check on its own. Report violations via file_extent_err() and
>> print the four offending values.
>>
>> In get_new_location() replace the BUG_ON() with an ASSERT().
>> The caller in replace_file_extents() already handles non-zero returns 
>> from
>> get_new_location() by breaking out of the loop without aborting the
>> transaction, so no caller changes are needed.
>>
>> Suggested-by: Qu Wenruo <wqu@suse.com>
>> Suggested-by: David Sterba <dsterba@suse.com>
>> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
>> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> And merged.

Unfortunately this tree-checker got triggered during btrfs/061 runs at 
write-time tree-checker, with arm64 64K page size.

The offending file extent is as the following:

[  536.885066] 	item 69 key (258 EXTENT_DATA 4063232) itemoff 12400 
itemsize 53
[  536.885067] 		generation 28 type 1
[  536.885067] 		extent data disk bytenr 10512723968 nr 36864
[  536.885068] 		extent data offset 24576 nr 12288 ram 36864
[  536.885069] 		extent compression 0

Note the offset is not zero, and the type is 1 which means it's a 
regular file extent.

So the check is causing false alerts.

Now the patch is reverted, and I'll spend more time digging into the case.

Thanks,
Qu

> 
> Thanks,
> Qu
> 
>> ---
>> Changes in v3:
>>   - Move the corruption check from relocation.c into tree-checker's
>>     check_extent_data_item(), per Qu and David. The data reloc tree's
>>     fixed objectid is recorded in the extent buffer header, so the
>>     check has all the context it needs at read time.
>>   - Use file_extent_err() and print offset/compression/encryption/
>>     other_encoding values, per Qu.
>>   - Replace the BUG_ON in get_new_location() with ASSERT() rather than
>>     -EUCLEAN, per David.
>>
>> Changes in v2:
>>   - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err() so
>>     the offending leaf is dumped to dmesg, per Qu's review of v1.
>>   - Expand the changelog to argue why non-zero
>>     compression/encryption/other_encoding in the data reloc inode imply
>>     on-disk corruption.
>>
>>   fs/btrfs/relocation.c   |  8 ++++----
>>   fs/btrfs/tree-checker.c | 21 +++++++++++++++++++++
>>   2 files changed, 25 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 1c42c5180bdd..527d4dbfe31c 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -835,10 +835,10 @@ static int get_new_location(struct inode 
>> *reloc_inode, u64 *new_bytenr,
>>       fi = btrfs_item_ptr(leaf, path->slots[0],
>>                   struct btrfs_file_extent_item);
>> -    BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
>> -           btrfs_file_extent_compression(leaf, fi) ||
>> -           btrfs_file_extent_encryption(leaf, fi) ||
>> -           btrfs_file_extent_other_encoding(leaf, fi));
>> +    ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
>> +           !btrfs_file_extent_compression(leaf, fi) &&
>> +           !btrfs_file_extent_encryption(leaf, fi) &&
>> +           !btrfs_file_extent_other_encoding(leaf, fi));
>>       if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
>>           return -EINVAL;
>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index 1f15d0793a9c..e4864f7a471e 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -296,6 +296,27 @@ static int check_extent_data_item(struct 
>> extent_buffer *leaf,
>>           return 0;
>>       }
>> +    /*
>> +     * For the data reloc tree, file extent items are written by
>> +     * relocation's own paths, which always leave offset, compression,
>> +     * encryption and other_encoding as 0. Any non-zero value here means
>> +     * the leaf decoded from disk does not match what the kernel wrote,
>> +     * i.e. on-disk corruption.
>> +     */
>> +    if (unlikely(btrfs_header_owner(leaf) == 
>> BTRFS_DATA_RELOC_TREE_OBJECTID &&
>> +             (btrfs_file_extent_offset(leaf, fi) ||
>> +              btrfs_file_extent_compression(leaf, fi) ||
>> +              btrfs_file_extent_encryption(leaf, fi) ||
>> +              btrfs_file_extent_other_encoding(leaf, fi)))) {
>> +        file_extent_err(leaf, slot,
>> +"invalid members for data reloc tree, offset=%llu compress=%u 
>> encryption=%u other_encoding=%u",
>> +                btrfs_file_extent_offset(leaf, fi),
>> +                btrfs_file_extent_compression(leaf, fi),
>> +                btrfs_file_extent_encryption(leaf, fi),
>> +                btrfs_file_extent_other_encoding(leaf, fi));
>> +        return -EUCLEAN;
>> +    }
>> +
>>       /* Regular or preallocated extent has fixed item size */
>>       if (unlikely(item_size != sizeof(*fi))) {
>>           file_extent_err(leaf, slot,
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
  2026-04-27 22:15     ` Qu Wenruo
@ 2026-04-28  9:03     ` Johannes Thumshirn
  2026-05-03 15:35     ` Teng Liu
  2026-05-13 11:35     ` [PATCH v4] btrfs: validate data reloc tree file extent item members Teng Liu
  3 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2026-04-28  9:03 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs
  Cc: dsterba, clm, wqu, linux-kernel, syzbot+3e20d8f3d41bac5dc9a2

On 4/27/26 10:24 PM, Teng Liu wrote:
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 1c42c5180bdd..527d4dbfe31c 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -835,10 +835,10 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
>   	fi = btrfs_item_ptr(leaf, path->slots[0],
>   			    struct btrfs_file_extent_item);
>   
> -	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
> -	       btrfs_file_extent_compression(leaf, fi) ||
> -	       btrfs_file_extent_encryption(leaf, fi) ||
> -	       btrfs_file_extent_other_encoding(leaf, fi));
> +	ASSERT(!btrfs_file_extent_offset(leaf, fi) &&
> +	       !btrfs_file_extent_compression(leaf, fi) &&
> +	       !btrfs_file_extent_encryption(leaf, fi) &&
> +	       !btrfs_file_extent_other_encoding(leaf, fi));

Can you split that into multiple ASSERT()s? So we quickly see which one 
actually triggered.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-28  0:44       ` Qu Wenruo
@ 2026-04-28 15:29         ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2026-04-28 15:29 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Teng Liu, linux-btrfs, dsterba, clm, linux-kernel,
	syzbot+3e20d8f3d41bac5dc9a2

On Tue, Apr 28, 2026 at 10:14:40AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2026/4/28 07:45, Qu Wenruo 写道:
> > 
> > 
> > 在 2026/4/28 05:54, Teng Liu 写道:
> >> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> >> item it looks up has any of offset, compression, encryption, or
> >> other_encoding set. The data reloc inode is only written by relocation's
> >> own paths -- insert_prealloc_file_extent() and
> >> insert_ordered_extent_file_extent() -- which always leave those four
> >> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
> >> and encryption/other_encoding are reserved-and-zero). Observing a
> >> non-zero value therefore means the leaf decoded from disk does not match
> >> what the kernel wrote, i.e. on-disk corruption. A malformed image can
> >> reach this code via balance and panic the kernel.
> >>
> >> Move the validation into tree-checker's check_extent_data_item(), where
> >> the constraint is enforced when the leaf is read off disk rather than
> >> after relocation has already started. The data reloc tree has a fixed
> >> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
> >> header, so check_extent_data_item() has all the information it needs to
> >> apply this check on its own. Report violations via file_extent_err() and
> >> print the four offending values.
> >>
> >> In get_new_location() replace the BUG_ON() with an ASSERT().
> >> The caller in replace_file_extents() already handles non-zero returns 
> >> from
> >> get_new_location() by breaking out of the loop without aborting the
> >> transaction, so no caller changes are needed.
> >>
> >> Suggested-by: Qu Wenruo <wqu@suse.com>
> >> Suggested-by: David Sterba <dsterba@suse.com>
> >> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> >> Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
> > 
> > Reviewed-by: Qu Wenruo <wqu@suse.com>
> > 
> > And merged.
> 
> Unfortunately this tree-checker got triggered during btrfs/061 runs at 
> write-time tree-checker, with arm64 64K page size.
> 
> The offending file extent is as the following:
> 
> [  536.885066] 	item 69 key (258 EXTENT_DATA 4063232) itemoff 12400 
> itemsize 53
> [  536.885067] 		generation 28 type 1
> [  536.885067] 		extent data disk bytenr 10512723968 nr 36864
> [  536.885068] 		extent data offset 24576 nr 12288 ram 36864
> [  536.885069] 		extent compression 0
> 
> Note the offset is not zero, and the type is 1 which means it's a 
> regular file extent.
> 
> So the check is causing false alerts.

I maybe have an idea.  The difference from the BUG_ON and the
tree-checker is the context where it's called. In relocation it's
somewere in the middle and there are actions fixing up the offset. OTOH
when this is done in tree-checker the constraints are different.

  get_new_location() - verifies offset, compression, ...

The offset corresponds to 'bytenr' and is returned via *new_bytenr to
replace_file_extents() and then updated in the leaf

  btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr);

This eventually ends up in in the pre-write check.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
  2026-04-27 22:15     ` Qu Wenruo
  2026-04-28  9:03     ` Johannes Thumshirn
@ 2026-05-03 15:35     ` Teng Liu
  2026-05-03 22:36       ` Qu Wenruo
  2026-05-13 11:35     ` [PATCH v4] btrfs: validate data reloc tree file extent item members Teng Liu
  3 siblings, 1 reply; 13+ messages in thread
From: Teng Liu @ 2026-05-03 15:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: wqu, dsterba

On 2026-04-27 22:24, Teng Liu wrote:
> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> item it looks up has any of offset, compression, encryption, or
> other_encoding set. The data reloc inode is only written by relocation's
> own paths -- insert_prealloc_file_extent() and
> insert_ordered_extent_file_extent() -- which always leave those four
> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
> and encryption/other_encoding are reserved-and-zero). Observing a
> non-zero value therefore means the leaf decoded from disk does not match
> what the kernel wrote, i.e. on-disk corruption. A malformed image can
> reach this code via balance and panic the kernel.
> 
> Move the validation into tree-checker's check_extent_data_item(), where
> the constraint is enforced when the leaf is read off disk rather than
> after relocation has already started. The data reloc tree has a fixed
> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
> header, so check_extent_data_item() has all the information it needs to
> apply this check on its own. Report violations via file_extent_err() and
> print the four offending values.
> 
> In get_new_location() replace the BUG_ON() with an ASSERT().
> The caller in replace_file_extents() already handles non-zero returns from
> get_new_location() by breaking out of the loop without aborting the
> transaction, so no caller changes are needed.
> 

Hi Qu, David, and all,

I spent some time trying to understand why the tree-checker saw non-zero
offset while BUG_ON never hit. Share some observation here in case it's
helpful and also ask for further advice about how to make the patch
correct.

Short answer: I noticed that get_new_location() and the tree-checker
look at different itmes in the data reloc tree. The tree-chcker
validates every item pre-write; while get_new_location() searches only
exact cluster boundary key, so the offset is always 0.

== How I tested ==

I instrumented both check_extent_data_item() (tree-checker) and
get_new_location() with pr_info() to log ino+key_offset+offset for every
item. Then ran btrfs/061 on qemu. 

== Some Evidence ==

Tree-checker hits (items with non-zero fields in the data reloc tree):
[   63.531400] tree-checker HIT: ino=258 key_off=13303808 type=1 offset=102400
              compress=0 encrypt=0 other=0

Printing out the leaf items (when tree-checker hit an item with non-zero
offset) and we found:
[   63.540541]  item 114 key (258 EXTENT_DATA 13201408) itemoff 10188 itemsize 53
[   63.540549]          generation 22 type 2
[   63.540554]          extent data disk bytenr 4566134784 nr 110592
[   63.540558]          extent data offset 0 nr 102400 ram 110592
[   63.540613]          extent compression 0
[   63.540619]  item 115 key (258 EXTENT_DATA 13303808) itemoff 10135 itemsize 53
[   63.540644]          generation 22 type 1
[   63.540649]          extent data disk bytenr 4566134784 nr 110592
[   63.540744]          extent data offset 102400 nr 8192 ram 110592
[   63.540750]          extent compression 0

Looking for the two 13201408 and 13303808 items in get_new_location:

get_new_location() search keys:
  ino=258 key_off=13201408 offset=0 type=1
  ...

We didn't see 13303808 item at all during get_new_location.
get_new_location searched for key_off=13201408 (the cluster boundary) and
found offset=0.  It never searched for 13303808.

=== Why this happens ===

During the MOVE_DATA_EXTENTS phase:

1. prealloc_file_extent_cluster() inserts one PREALLOC per cluster
   boundary (via insert_prealloc_file_extent), all with offset=0.

2. relocate_one_folio() reads source data and marks pages dirty.  On
   writeback, ordered extents are created.  When an ordered extent
   completes, insert_ordered_extent_file_extent() calls
   insert_reserved_file_extent(), which calls btrfs_drop_extents() to
   carve out the written range from the PREALLOC, splitting it.  The
   right-hand piece (REG) gets the ordered extent's offset into the
   shared disk_bytenr -- which can be non-zero.

3. The pre-write tree-checker fires when the modified leaf is written
   to disk, seeing the REG item with offset!=0.  This is the false
   positive.

4. btrfs_wait_ordered_range() waits for all ordered extents to finish,
   then the stage switches to UPDATE_DATA_PTRS.  replace_file_extents()
   calls get_new_location(src_disk_bytenr) for each source extent.
   The search key is:

       bytenr -= reloc_block_group_start;
       btrfs_lookup_file_extent(..., ino, bytenr, 0);

   This is the cluster boundary -- the start of an extent, where offset
   is always 0.  get_new_location never searches the mid-range keys
   where the split pieces live.

== Advice Needed ==

If my analysis is correct, what do you think is the best way to fix this
problem? Or am I missing something somewhere?

Thanks in advance,
Teng Liu


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker
  2026-05-03 15:35     ` Teng Liu
@ 2026-05-03 22:36       ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2026-05-03 22:36 UTC (permalink / raw)
  To: Teng Liu, linux-btrfs; +Cc: dsterba



在 2026/5/4 01:05, Teng Liu 写道:
> On 2026-04-27 22:24, Teng Liu wrote:
>> get_new_location() uses BUG_ON() to crash the kernel if the file extent
>> item it looks up has any of offset, compression, encryption, or
>> other_encoding set. The data reloc inode is only written by relocation's
>> own paths -- insert_prealloc_file_extent() and
>> insert_ordered_extent_file_extent() -- which always leave those four
>> fields at 0 (the data reloc inode is created with BTRFS_INODE_NOCOMPRESS,
>> and encryption/other_encoding are reserved-and-zero). Observing a
>> non-zero value therefore means the leaf decoded from disk does not match
>> what the kernel wrote, i.e. on-disk corruption. A malformed image can
>> reach this code via balance and panic the kernel.
>>
>> Move the validation into tree-checker's check_extent_data_item(), where
>> the constraint is enforced when the leaf is read off disk rather than
>> after relocation has already started. The data reloc tree has a fixed
>> root id (BTRFS_DATA_RELOC_TREE_OBJECTID) recorded in the extent buffer
>> header, so check_extent_data_item() has all the information it needs to
>> apply this check on its own. Report violations via file_extent_err() and
>> print the four offending values.
>>
>> In get_new_location() replace the BUG_ON() with an ASSERT().
>> The caller in replace_file_extents() already handles non-zero returns from
>> get_new_location() by breaking out of the loop without aborting the
>> transaction, so no caller changes are needed.
>>
> 
> Hi Qu, David, and all,
> 
> I spent some time trying to understand why the tree-checker saw non-zero
> offset while BUG_ON never hit. Share some observation here in case it's
> helpful and also ask for further advice about how to make the patch
> correct.
> 
> Short answer: I noticed that get_new_location() and the tree-checker
> look at different itmes in the data reloc tree. The tree-chcker
> validates every item pre-write; while get_new_location() searches only
> exact cluster boundary key, so the offset is always 0.
> 
> == How I tested ==
> 
> I instrumented both check_extent_data_item() (tree-checker) and
> get_new_location() with pr_info() to log ino+key_offset+offset for every
> item. Then ran btrfs/061 on qemu.
> 
> == Some Evidence ==
> 
> Tree-checker hits (items with non-zero fields in the data reloc tree):
> [   63.531400] tree-checker HIT: ino=258 key_off=13303808 type=1 offset=102400
>                compress=0 encrypt=0 other=0
> 
> Printing out the leaf items (when tree-checker hit an item with non-zero
> offset) and we found:
> [   63.540541]  item 114 key (258 EXTENT_DATA 13201408) itemoff 10188 itemsize 53
> [   63.540549]          generation 22 type 2
> [   63.540554]          extent data disk bytenr 4566134784 nr 110592
> [   63.540558]          extent data offset 0 nr 102400 ram 110592
> [   63.540613]          extent compression 0
> [   63.540619]  item 115 key (258 EXTENT_DATA 13303808) itemoff 10135 itemsize 53
> [   63.540644]          generation 22 type 1
> [   63.540649]          extent data disk bytenr 4566134784 nr 110592
> [   63.540744]          extent data offset 102400 nr 8192 ram 110592
> [   63.540750]          extent compression 0
> 
> Looking for the two 13201408 and 13303808 items in get_new_location:
> 
> get_new_location() search keys:
>    ino=258 key_off=13201408 offset=0 type=1
>    ...
> 
> We didn't see 13303808 item at all during get_new_location.
> get_new_location searched for key_off=13201408 (the cluster boundary) and
> found offset=0.  It never searched for 13303808.
> 
> === Why this happens ===
> 
> During the MOVE_DATA_EXTENTS phase:
> 
> 1. prealloc_file_extent_cluster() inserts one PREALLOC per cluster
>     boundary (via insert_prealloc_file_extent), all with offset=0.
> 
> 2. relocate_one_folio() reads source data and marks pages dirty.  On
>     writeback, ordered extents are created.  When an ordered extent
>     completes, insert_ordered_extent_file_extent() calls
>     insert_reserved_file_extent(), which calls btrfs_drop_extents() to
>     carve out the written range from the PREALLOC, splitting it.  The
>     right-hand piece (REG) gets the ordered extent's offset into the
>     shared disk_bytenr -- which can be non-zero.
> 
> 3. The pre-write tree-checker fires when the modified leaf is written
>     to disk, seeing the REG item with offset!=0.  This is the false
>     positive.
> 
> 4. btrfs_wait_ordered_range() waits for all ordered extents to finish,
>     then the stage switches to UPDATE_DATA_PTRS.  replace_file_extents()
>     calls get_new_location(src_disk_bytenr) for each source extent.
>     The search key is:
> 
>         bytenr -= reloc_block_group_start;
>         btrfs_lookup_file_extent(..., ino, bytenr, 0);
> 
>     This is the cluster boundary -- the start of an extent, where offset
>     is always 0.  get_new_location never searches the mid-range keys
>     where the split pieces live.
> 
> == Advice Needed ==
> 
> If my analysis is correct, what do you think is the best way to fix this
> problem? Or am I missing something somewhere?

Make the tree-checker to reject compress/encoded extents in data reloc tree.

And add proper handling with leaf dump for relocation when an non-zero 
offset is hit.

> 
> Thanks in advance,
> Teng Liu
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4] btrfs: validate data reloc tree file extent item members
  2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
                       ` (2 preceding siblings ...)
  2026-05-03 15:35     ` Teng Liu
@ 2026-05-13 11:35     ` Teng Liu
  3 siblings, 0 replies; 13+ messages in thread
From: Teng Liu @ 2026-05-13 11:35 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Teng Liu, dsterba, clm, wqu, linux-kernel,
	syzbot+3e20d8f3d41bac5dc9a2

get_new_location() uses BUG_ON() to crash the kernel if the file extent
item it looks up has any of offset, compression, encryption, or
other_encoding set non-zero. The data reloc inode is only written by
relocation's own paths and the four fields are always 0 in what the
kernel writes:

  - insert_prealloc_file_extent() memsets the stack item to zero and
    only fills in type, disk_bytenr, disk_num_bytes and num_bytes, so
    offset/compression/encryption/other_encoding stay 0.
  - insert_ordered_extent_file_extent() copies oe->compress_type into
    the file extent's compression field, but the data reloc inode is
    created with BTRFS_INODE_NOCOMPRESS so compress_type is always 0;
    encryption and other_encoding are reserved-and-zero in btrfs.

A non-zero value here means the leaf decoded from disk does not match
what the kernel wrote, i.e. on-disk corruption. A malformed image
reaches this code via balance and panics the kernel.

A previous attempt to enforce all four constraints in tree-checker's
check_extent_data_item() was merged as commit 7d0ee95979e9 ("btrfs:
validate data reloc tree file extent item members in tree-checker")
and then reverted by commit 1c034697fcaa after btrfs/061 produced
false positives on arm64 with 64K pages. The reason: relocation
writeback legitimately produces REG file_extent_items with offset != 0
in the data reloc tree. When an ordered extent covers only the back
portion of an underlying PREALLOC (num_bytes < ram_bytes on the input
file_extent), insert_ordered_extent_file_extent() inserts a REG with

  offset    = oe->offset
  num_bytes = oe->num_bytes
  ram_bytes preserved from the original PREALLOC,

and this item can reach disk if a transaction commit fires while it
is present in the leaf.

The four fields belong in different layers:

  - compression, encryption and other_encoding are universal
    invariants for every item in the data reloc tree, regardless of
    cluster geometry. Enforce them in tree-checker's
    check_extent_data_item() so a corrupt leaf is rejected at read
    time.

  - offset is only an invariant at the cluster-boundary keys that
    get_new_location() searches (the key is computed as
    src_disk_bytenr - reloc_block_group_start). Partial-PREALLOC
    writebacks legitimately place REG items at non-boundary keys with
    offset != 0; tree-checker cannot reject these. The cluster-
    boundary item is always written by either
    insert_prealloc_file_extent() (offset=0 by memset) or by the
    front portion of a partial writeback (offset=0 by construction),
    so a non-zero offset there is corruption.

Enforce the universal invariants in check_extent_data_item() with a
file_extent_err() rejection. Convert the BUG_ON() in
get_new_location() to a -EUCLEAN return paired with btrfs_print_leaf()
and btrfs_err() so the offending leaf is logged. The caller in
replace_file_extents() already handles non-zero returns from
get_new_location() by breaking out of the loop without aborting the
transaction.

Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: David Sterba <dsterba@suse.com>
Reported-by: syzbot+3e20d8f3d41bac5dc9a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
Signed-off-by: Teng Liu <27rabbitlt@gmail.com>
---
Changes in v4:
 - Split the check by which layer the invariant holds in. Reject
   compression/encryption/other_encoding != 0 in tree-checker (true
   on-disk invariant for the entire data reloc tree). Keep the offset
   check at the call site in get_new_location() (true only at the
   cluster-boundary keys it searches; partial-PREALLOC writeback
   legitimately produces non-zero offset at non-boundary keys, which
   is why the v3 single-rule approach was reverted).
 - Suggested by Qu Wenruo in reply to v3:
   https://lore.kernel.org/linux-btrfs/20260427202822.278326-1-27rabbitlt@gmail.com/

Changes in v3:
 - Moved the entire four-field check from get_new_location() into
   tree-checker's check_extent_data_item(). Replaced BUG_ON() with
   ASSERT() in get_new_location(). Merged as 7d0ee95979e9 and
   reverted by 1c034697fcaa due to false positives in btrfs/061 on
   arm64 64K pages.

Changes in v2:
 - Pair the -EUCLEAN return with btrfs_print_leaf() and btrfs_err()
   so the offending leaf is dumped to dmesg, per Qu's v1 review:
   https://lore.kernel.org/linux-btrfs/6c54901d-5e07-4c46-9553-997b28c93b86@suse.com/
 - Expand the changelog to argue why non-zero compression/encryption/
   other_encoding in the data reloc inode imply on-disk corruption
   rather than a kernel bug.

 fs/btrfs/relocation.c   | 22 ++++++++++++++++++----
 fs/btrfs/tree-checker.c | 27 +++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c42c5180bdd..01977fa282db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -814,6 +814,7 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 			    u64 bytenr, u64 num_bytes)
 {
 	struct btrfs_root *root = BTRFS_I(reloc_inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	BTRFS_PATH_AUTO_FREE(path);
 	struct btrfs_file_extent_item *fi;
 	struct extent_buffer *leaf;
@@ -835,10 +836,23 @@ static int get_new_location(struct inode *reloc_inode, u64 *new_bytenr,
 	fi = btrfs_item_ptr(leaf, path->slots[0],
 			    struct btrfs_file_extent_item);
 
-	BUG_ON(btrfs_file_extent_offset(leaf, fi) ||
-	       btrfs_file_extent_compression(leaf, fi) ||
-	       btrfs_file_extent_encryption(leaf, fi) ||
-	       btrfs_file_extent_other_encoding(leaf, fi));
+	/*
+	 * The cluster-boundary key searched above is always written by
+	 * relocation with offset 0: either by insert_prealloc_file_extent()
+	 * (memsets the stack item to 0) or by the front portion of a partial
+	 * writeback (offset=0 by construction). A non-zero value here means
+	 * the on-disk leaf does not match what relocation wrote, i.e.
+	 * corruption. The other encoding fields are caught earlier by
+	 * tree-checker's check_extent_data_item().
+	 */
+	if (unlikely(btrfs_file_extent_offset(leaf, fi))) {
+		btrfs_print_leaf(leaf);
+		btrfs_err(fs_info,
+"unexpected non-zero offset in file extent item for data reloc inode %llu key offset %llu offset %llu",
+			  btrfs_ino(BTRFS_I(reloc_inode)), bytenr,
+			  btrfs_file_extent_offset(leaf, fi));
+		return -EUCLEAN;
+	}
 
 	if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi))
 		return -EINVAL;
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1f15d0793a9c..8fc919dc08d0 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -296,6 +296,33 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 		return 0;
 	}
 
+	/*
+	 * For the data reloc tree, file extent items are written by
+	 * relocation's own paths. The data reloc inode is created with
+	 * BTRFS_INODE_NOCOMPRESS, so insert_ordered_extent_file_extent()
+	 * always leaves the compression field at 0. Encryption and
+	 * other_encoding are reserved-and-zero in btrfs. A non-zero value
+	 * for any of these means the leaf decoded from disk does not match
+	 * what the kernel wrote, i.e. on-disk corruption.
+	 *
+	 * The file_extent_item's offset field is NOT a universal invariant
+	 * here: partial-PREALLOC writebacks legitimately produce REG items
+	 * with non-zero offset at non-boundary keys. The offset check is
+	 * performed at the call site in get_new_location(), which only
+	 * inspects cluster-boundary keys where offset is always 0.
+	 */
+	if (unlikely(btrfs_header_owner(leaf) == BTRFS_DATA_RELOC_TREE_OBJECTID &&
+		     (btrfs_file_extent_compression(leaf, fi) ||
+		      btrfs_file_extent_encryption(leaf, fi) ||
+		      btrfs_file_extent_other_encoding(leaf, fi)))) {
+		file_extent_err(leaf, slot,
+"invalid encoding fields for data reloc tree, compression=%u encryption=%u other_encoding=%u",
+				btrfs_file_extent_compression(leaf, fi),
+				btrfs_file_extent_encryption(leaf, fi),
+				btrfs_file_extent_other_encoding(leaf, fi));
+		return -EUCLEAN;
+	}
+
 	/* Regular or preallocated extent has fixed item size */
 	if (unlikely(item_size != sizeof(*fi))) {
 		file_extent_err(leaf, slot,

base-commit: 6bf684b8823552b99c86bf791b22f622934ee771
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-13 11:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-25  6:10 [PATCH] btrfs: replace BUG_ON() with error return in get_new_location() Teng Liu
2026-04-25  8:06 ` Qu Wenruo
2026-04-26 20:16 ` [PATCH v2] " Teng Liu
2026-04-27  1:19   ` Qu Wenruo
2026-04-27 13:50     ` David Sterba
2026-04-27 20:24   ` [PATCH v3] btrfs: validate data reloc tree file extent item members in tree-checker Teng Liu
2026-04-27 22:15     ` Qu Wenruo
2026-04-28  0:44       ` Qu Wenruo
2026-04-28 15:29         ` David Sterba
2026-04-28  9:03     ` Johannes Thumshirn
2026-05-03 15:35     ` Teng Liu
2026-05-03 22:36       ` Qu Wenruo
2026-05-13 11:35     ` [PATCH v4] btrfs: validate data reloc tree file extent item members Teng Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox