[PATCH v2] btrfs: reject root items with drop_progress and zero drop

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
@ 2026-03-12  0:14 ZhengYuan Huang
  2026-03-12  0:23 ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: ZhengYuan Huang @ 2026-03-12  0:14 UTC (permalink / raw)
  To: dsterba, clm, wqu
  Cc: linux-btrfs, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
	ZhengYuan Huang, stable

[BUG]
When recovering relocation at mount time, merge_reloc_root() and
btrfs_drop_snapshot() both use BUG_ON(level == 0) to guard against
an impossible state: a non-zero drop_progress combined with a zero
drop_level in a root_item, which can be triggered:

------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:1545!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 283 ... Tainted: 6.18.0+ #16 PREEMPT(voluntary) 
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2
RIP: 0010:merge_reloc_root+0x1266/0x1650 fs/btrfs/relocation.c:1545
Code: ffff0000 00004589 d7e9acfa ffffe8a1 79bafebe 02000000
Call Trace:
 merge_reloc_roots+0x295/0x890 fs/btrfs/relocation.c:1861
 btrfs_recover_relocation+0xd6e/0x11d0 fs/btrfs/relocation.c:4195
 btrfs_start_pre_rw_mount+0xa4d/0x1810 fs/btrfs/disk-io.c:3130
 open_ctree+0x5824/0x5fe0 fs/btrfs/disk-io.c:3640
 btrfs_fill_super fs/btrfs/super.c:987 [inline]
 btrfs_get_tree_super fs/btrfs/super.c:1951 [inline]
 btrfs_get_tree_subvol fs/btrfs/super.c:2094 [inline]
 btrfs_get_tree+0x111c/0x2190 fs/btrfs/super.c:2128
 vfs_get_tree+0x9a/0x370 fs/super.c:1758
 fc_mount fs/namespace.c:1199 [inline]
 do_new_mount_fc fs/namespace.c:3642 [inline]
 do_new_mount fs/namespace.c:3718 [inline]
 path_mount+0x5b8/0x1ea0 fs/namespace.c:4028
 do_mount fs/namespace.c:4041 [inline]
 __do_sys_mount fs/namespace.c:4229 [inline]
 __se_sys_mount fs/namespace.c:4206 [inline]
 __x64_sys_mount+0x282/0x320 fs/namespace.c:4206
 ...
RIP: 0033:0x7f969c9a8fde
Code: 0f1f4000 48c7c2b0 fffffff7 d8648902 b8ffffff ffc3660f
---[ end trace 0000000000000000 ]---

[CAUSE]
A non-zero drop_progress.objectid means an interrupted
btrfs_drop_snapshot() left a resume point on disk, and in that case
drop_level must be greater than 0 because the checkpoint is only
saved at internal node levels.

Although this invariant is enforced when the kernel writes the root
item, it is not validated when the root item is read back from disk.
That allows on-disk corruption to provide an invalid state with
drop_progress.objectid != 0 and drop_level == 0.

When relocation recovery later processes such a root item,
merge_reloc_root() reads drop_level and hits BUG_ON(level == 0). The
same invalid metadata can also trigger the corresponding BUG_ON() in
btrfs_drop_snapshot().

[FIX]
Fix this by validating the root_item invariant in tree-checker when
reading root items from disk: if drop_progress.objectid is non-zero,
drop_level must also be non-zero. Reject such malformed metadata with
-EUCLEAN before it reaches merge_reloc_root() or btrfs_drop_snapshot()
and triggers the BUG_ON.

The bug is reproducible on 7.0.0-rc2-next-20260310 with our dynamic
metadata fuzzing tool that corrupts btrfs metadata at runtime. After
the fix, the same corruption is correctly rejected by tree-checker
and the BUG_ON is no longer triggered.

Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check")
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
---
[CHANGELOG]
v2:
- Split out the error message fix from the previous patch, as requested
  during review.
---
 fs/btrfs/tree-checker.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index dd274f67ad7f..1e052c3303b3 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1260,6 +1260,23 @@ static int check_root_item(struct extent_buffer *leaf, struct btrfs_key *key,
 			    btrfs_root_drop_level(&ri), BTRFS_MAX_LEVEL - 1);
 		return -EUCLEAN;
 	}
+	/*
+	 * If drop_progress.objectid is non-zero, a btrfs_drop_snapshot() was
+	 * interrupted and the resume point was recorded in drop_progress and
+	 * drop_level.  In that case drop_level must be >= 1: level 0 is the
+	 * leaf level and drop_snapshot never saves a checkpoint there (it
+	 * only records checkpoints at internal node levels in DROP_REFERENCE
+	 * stage).  A zero drop_level combined with a non-zero drop_progress
+	 * objectid indicates on-disk corruption and would cause a BUG_ON in
+	 * merge_reloc_root() and btrfs_drop_snapshot() at mount time.
+	 */
+	if (unlikely(btrfs_disk_key_objectid(&ri.drop_progress) != 0 &&
+		     btrfs_root_drop_level(&ri) == 0)) {
+		generic_err(leaf, slot,
+			    "invalid root drop_level 0 with non-zero drop_progress objectid %llu",
+			    btrfs_disk_key_objectid(&ri.drop_progress));
+		return -EUCLEAN;
+	}
 
 	/* Flags check */
 	if (unlikely(btrfs_root_flags(&ri) & ~valid_root_flags)) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
  2026-03-12  0:14 [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level ZhengYuan Huang
@ 2026-03-12  0:23 ` Qu Wenruo
  2026-03-12  1:07   ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2026-03-12  0:23 UTC (permalink / raw)
  To: ZhengYuan Huang, dsterba, clm
  Cc: linux-btrfs, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
	stable



在 2026/3/12 10:44, ZhengYuan Huang 写道:
> [BUG]
> When recovering relocation at mount time, merge_reloc_root() and
> btrfs_drop_snapshot() both use BUG_ON(level == 0) to guard against
> an impossible state: a non-zero drop_progress combined with a zero
> drop_level in a root_item, which can be triggered:
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/relocation.c:1545!
> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> CPU: 1 UID: 0 PID: 283 ... Tainted: 6.18.0+ #16 PREEMPT(voluntary)
> Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2
> RIP: 0010:merge_reloc_root+0x1266/0x1650 fs/btrfs/relocation.c:1545
> Code: ffff0000 00004589 d7e9acfa ffffe8a1 79bafebe 02000000
> Call Trace:
>   merge_reloc_roots+0x295/0x890 fs/btrfs/relocation.c:1861
>   btrfs_recover_relocation+0xd6e/0x11d0 fs/btrfs/relocation.c:4195
>   btrfs_start_pre_rw_mount+0xa4d/0x1810 fs/btrfs/disk-io.c:3130
>   open_ctree+0x5824/0x5fe0 fs/btrfs/disk-io.c:3640
>   btrfs_fill_super fs/btrfs/super.c:987 [inline]
>   btrfs_get_tree_super fs/btrfs/super.c:1951 [inline]
>   btrfs_get_tree_subvol fs/btrfs/super.c:2094 [inline]
>   btrfs_get_tree+0x111c/0x2190 fs/btrfs/super.c:2128
>   vfs_get_tree+0x9a/0x370 fs/super.c:1758
>   fc_mount fs/namespace.c:1199 [inline]
>   do_new_mount_fc fs/namespace.c:3642 [inline]
>   do_new_mount fs/namespace.c:3718 [inline]
>   path_mount+0x5b8/0x1ea0 fs/namespace.c:4028
>   do_mount fs/namespace.c:4041 [inline]
>   __do_sys_mount fs/namespace.c:4229 [inline]
>   __se_sys_mount fs/namespace.c:4206 [inline]
>   __x64_sys_mount+0x282/0x320 fs/namespace.c:4206
>   ...
> RIP: 0033:0x7f969c9a8fde
> Code: 0f1f4000 48c7c2b0 fffffff7 d8648902 b8ffffff ffc3660f
> ---[ end trace 0000000000000000 ]---
> 
> [CAUSE]
> A non-zero drop_progress.objectid means an interrupted
> btrfs_drop_snapshot() left a resume point on disk, and in that case
> drop_level must be greater than 0 because the checkpoint is only
> saved at internal node levels.
> 
> Although this invariant is enforced when the kernel writes the root
> item, it is not validated when the root item is read back from disk.
> That allows on-disk corruption to provide an invalid state with
> drop_progress.objectid != 0 and drop_level == 0.
> 
> When relocation recovery later processes such a root item,
> merge_reloc_root() reads drop_level and hits BUG_ON(level == 0). The
> same invalid metadata can also trigger the corresponding BUG_ON() in
> btrfs_drop_snapshot().
> 
> [FIX]
> Fix this by validating the root_item invariant in tree-checker when
> reading root items from disk: if drop_progress.objectid is non-zero,
> drop_level must also be non-zero. Reject such malformed metadata with
> -EUCLEAN before it reaches merge_reloc_root() or btrfs_drop_snapshot()
> and triggers the BUG_ON.
> 
> The bug is reproducible on 7.0.0-rc2-next-20260310 with our dynamic
> metadata fuzzing tool that corrupts btrfs metadata at runtime.

I'll move this part into the BUG section as it's about how the bug is 
triggered.

> After
> the fix, the same corruption is correctly rejected by tree-checker
> and the BUG_ON is no longer triggered.
> 
> Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check")

Again, it's not a bug fix. You're just adding a new check.

This fixes tag should only go with the error message fix.

You don't need to send a new update, I'll do all the update at merge time.

Otherwise looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> Cc: stable@vger.kernel.org # 5.3+
> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
> ---
> [CHANGELOG]
> v2:
> - Split out the error message fix from the previous patch, as requested
>    during review.
> ---
>   fs/btrfs/tree-checker.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index dd274f67ad7f..1e052c3303b3 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -1260,6 +1260,23 @@ static int check_root_item(struct extent_buffer *leaf, struct btrfs_key *key,
>   			    btrfs_root_drop_level(&ri), BTRFS_MAX_LEVEL - 1);
>   		return -EUCLEAN;
>   	}
> +	/*
> +	 * If drop_progress.objectid is non-zero, a btrfs_drop_snapshot() was
> +	 * interrupted and the resume point was recorded in drop_progress and
> +	 * drop_level.  In that case drop_level must be >= 1: level 0 is the
> +	 * leaf level and drop_snapshot never saves a checkpoint there (it
> +	 * only records checkpoints at internal node levels in DROP_REFERENCE
> +	 * stage).  A zero drop_level combined with a non-zero drop_progress
> +	 * objectid indicates on-disk corruption and would cause a BUG_ON in
> +	 * merge_reloc_root() and btrfs_drop_snapshot() at mount time.
> +	 */
> +	if (unlikely(btrfs_disk_key_objectid(&ri.drop_progress) != 0 &&
> +		     btrfs_root_drop_level(&ri) == 0)) {
> +		generic_err(leaf, slot,
> +			    "invalid root drop_level 0 with non-zero drop_progress objectid %llu",
> +			    btrfs_disk_key_objectid(&ri.drop_progress));
> +		return -EUCLEAN;
> +	}
>   
>   	/* Flags check */
>   	if (unlikely(btrfs_root_flags(&ri) & ~valid_root_flags)) {


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
  2026-03-12  0:23 ` Qu Wenruo
@ 2026-03-12  1:07   ` Qu Wenruo
  2026-03-12  2:10     ` ZhengYuan Huang
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2026-03-12  1:07 UTC (permalink / raw)
  To: ZhengYuan Huang, dsterba, clm
  Cc: linux-btrfs, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
	stable



在 2026/3/12 10:53, Qu Wenruo 写道:
[...]
>>
>> Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check")
> 
> Again, it's not a bug fix. You're just adding a new check.

More info about when to use fixes tag:

https://www.kernel.org/doc/html/v6.19/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

It's more common to use it for a regression, aka, before that commit 
everything works, but at that commit something is broken.

And in your case, the commit is not causing the BUG_ON() thus it's 
incorrect.
Furthermore you'd better not put the commit introducing the BUG_ON() as 
the fixes target.


In your case, you're just enhancing the tree-checker to address a fuzzed 
image (which I guess you'll continue submitting such patches), thus 
getting the fixes tag done correctly will save everyone time.

Thanks,
Qu

> 
> This fixes tag should only go with the error message fix.
> 
> You don't need to send a new update, I'll do all the update at merge time.
> 
> Otherwise looks good to me.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> Thanks,
> Qu
> 
>> Cc: stable@vger.kernel.org # 5.3+
>> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
>> ---
>> [CHANGELOG]
>> v2:
>> - Split out the error message fix from the previous patch, as requested
>>    during review.
>> ---
>>   fs/btrfs/tree-checker.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index dd274f67ad7f..1e052c3303b3 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -1260,6 +1260,23 @@ static int check_root_item(struct extent_buffer 
>> *leaf, struct btrfs_key *key,
>>                   btrfs_root_drop_level(&ri), BTRFS_MAX_LEVEL - 1);
>>           return -EUCLEAN;
>>       }
>> +    /*
>> +     * If drop_progress.objectid is non-zero, a btrfs_drop_snapshot() 
>> was
>> +     * interrupted and the resume point was recorded in drop_progress 
>> and
>> +     * drop_level.  In that case drop_level must be >= 1: level 0 is the
>> +     * leaf level and drop_snapshot never saves a checkpoint there (it
>> +     * only records checkpoints at internal node levels in 
>> DROP_REFERENCE
>> +     * stage).  A zero drop_level combined with a non-zero drop_progress
>> +     * objectid indicates on-disk corruption and would cause a BUG_ON in
>> +     * merge_reloc_root() and btrfs_drop_snapshot() at mount time.
>> +     */
>> +    if (unlikely(btrfs_disk_key_objectid(&ri.drop_progress) != 0 &&
>> +             btrfs_root_drop_level(&ri) == 0)) {
>> +        generic_err(leaf, slot,
>> +                "invalid root drop_level 0 with non-zero 
>> drop_progress objectid %llu",
>> +                btrfs_disk_key_objectid(&ri.drop_progress));
>> +        return -EUCLEAN;
>> +    }
>>       /* Flags check */
>>       if (unlikely(btrfs_root_flags(&ri) & ~valid_root_flags)) {
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
  2026-03-12  1:07   ` Qu Wenruo
@ 2026-03-12  2:10     ` ZhengYuan Huang
  2026-03-12  4:16       ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: ZhengYuan Huang @ 2026-03-12  2:10 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: dsterba, clm, linux-btrfs, linux-kernel, baijiaju1990, r33s3n6,
	zzzccc427, stable

Thanks a lot for the explanation. I think I now understand the
intended use of the
Fixes: tag.

I still have one question, though. In this case, scripts/checkpatch.pl
warns that a
Fixes: tag should be added, which seems inconsistent with the submission
guidelines you pointed me to.

My understanding had been that patches should generally be sent only
after passing
checkpatch.pl cleanly, so I wanted to ask: is this kind of warning acceptable in
practice, or does it mean my local checkpatch.pl is outdated? If it is outdated,
should I generally use the latest checkpatch.pl when checking patches
before submission?

Thanks again,
ZhengYuan Huang

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
  2026-03-12  2:10     ` ZhengYuan Huang
@ 2026-03-12  4:16       ` Qu Wenruo
  2026-03-12  4:27         ` ZhengYuan Huang
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2026-03-12  4:16 UTC (permalink / raw)
  To: ZhengYuan Huang
  Cc: dsterba, clm, linux-btrfs, linux-kernel, baijiaju1990, r33s3n6,
	zzzccc427, stable

在 2026/3/12 12:40, ZhengYuan Huang 写道:
> Thanks a lot for the explanation. I think I now understand the
> intended use of the
> Fixes: tag.
> 
> I still have one question, though. In this case, scripts/checkpatch.pl
> warns that a
> Fixes: tag should be added, which seems inconsistent with the submission
> guidelines you pointed me to.

If you check the checkpatch.pl itself, the logic in it is pretty simple 
and can give false alerts.

It just checks if the commit message has BUG: or KASAN/UBSAN lines.

So it's false alert prune.

> 
> My understanding had been that patches should generally be sent only
> after passing
> checkpatch.pl cleanly,

No, that is only a script which has its limits.
Checkpatch is good for its code style checks, but not always correct on 
other suggestions.

Sometimes even its code style checks may conflict with the rules inside 
each subsystem.

> so I wanted to ask: is this kind of warning acceptable in
> practice,

Yes, unless you believe a simple perl script can be as good as human 
common sense.

> or does it mean my local checkpatch.pl is outdated?

Since you're already using the latest rc kernel, I believe you're 
already using the latest checkpatch.

Thanks,
Qu

> If it is outdated,
> should I generally use the latest checkpatch.pl when checking patches
> before submission?
> 
> Thanks again,
> ZhengYuan Huang

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level
  2026-03-12  4:16       ` Qu Wenruo
@ 2026-03-12  4:27         ` ZhengYuan Huang
  0 siblings, 0 replies; 6+ messages in thread
From: ZhengYuan Huang @ 2026-03-12  4:27 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: dsterba, clm, linux-btrfs, linux-kernel, baijiaju1990, r33s3n6,
	zzzccc427, stable

On Thu, Mar 12, 2026 at 12:16 PM Qu Wenruo <wqu@suse.com> wrote:
> > My understanding had been that patches should generally be sent only
> > after passing
> > checkpatch.pl cleanly,
>
> No, that is only a script which has its limits.
> Checkpatch is good for its code style checks, but not always correct on
> other suggestions.
>
> Sometimes even its code style checks may conflict with the rules inside
> each subsystem.

Thanks for the clarification.

Understood, that makes sense. I had been treating checkpatch.pl
warnings too strictly. I'll follow the documented guidelines and
subsystem expectations instead.

Thanks again,
ZhengYuan Huang

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-12  4:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12  0:14 [PATCH v2] btrfs: reject root items with drop_progress and zero drop_level ZhengYuan Huang
2026-03-12  0:23 ` Qu Wenruo
2026-03-12  1:07   ` Qu Wenruo
2026-03-12  2:10     ` ZhengYuan Huang
2026-03-12  4:16       ` Qu Wenruo
2026-03-12  4:27         ` ZhengYuan Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox