public inbox for linux-btrfs@vger.kernel.org
* corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
@ 2024-11-26 16:11 Brett Dikeman
  2024-11-26 21:30 ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Brett Dikeman @ 2024-11-26 16:11 UTC (permalink / raw)
  To: linux-btrfs


Greetings,

I have a filesystem that re-mounted read-only very shortly after I
started a btrfs defrag with zstd compression enabled (which is not to
say I think this was the cause). The volume resides on a Debian
Bookworm system and is very simple configuration/feature-wise; it does
not use quotas, snapshots, or sub-volumes. In the few hours before
running the defrag command, I deleted a large number of files that
totaled about 100GB of space. Before that, the filesystem hadn't
seen changes in months; even atimes are disabled.

btrfs check completes with no errors in dmesg or the terminal, and it
takes what seems like a reasonable amount of time, with little
interruption in disk activity. A scrub progresses at expected speeds
but suddenly stops with a status of "success" after a few GB. There
are no signs of drive failure in the SMART parameters, and no kernel
messages that would suggest drive failure, such as timeouts or SATA
errors. However, I am currently running a non-destructive read-write
badblocks test to rule this out more thoroughly - both drives have
made it to 10% so far, with no errors in dmesg or from badblocks.
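
For reference, the badblocks invocation was along these lines (exact
flags are from memory, so treat them as approximate):

```shell
# -n: non-destructive read-write test (reads, writes a pattern,
#     restores the original data); safe only on an unmounted device.
# -s: show progress; -v: verbose error reporting.
badblocks -nsv /dev/sdd
badblocks -nsv /dev/sdc
```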

What I have tried:

- upgraded btrfs-progs to the bookworm-backports version, because
bookworm's btrfs-progs is old enough that it lacks several rescue
commands
- zeroing the log
- clearing the inode cache
- clearing the space cache

It mounted OK until around the time I updated the tools package and
ran some of the above commands. During one attempt to run a scrub,
there was dmesg output I unfortunately did not capture, but I remember
something that looked similar to what I've seen with an md array that
ended up with different-aged metadata after one drive was booted
out of a 4-drive array; I had to force md to ignore its timestamp.

Any recommendations on how to proceed would be greatly appreciated.
dmesg output is included as an attachment.

uname -a output:
6.11.5+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.11.5-1~bpo12+1
(2024-11-11) x86_64 GNU/Linux

btrfs version:
btrfs-progs v6.6.3

#  btrfs fi show
Label: none  uuid: b1e76acd-525d-46f2-b2a6-b0403dcdc135
Total devices 2 FS bytes used 1.37TiB
devid    1 size 1.82TiB used 1.44TiB path /dev/sdd
devid    2 size 1.82TiB used 1.44TiB path /dev/sdc

[-- Attachment #2: dmesg-11-25-2024.txt --]
[-- Type: text/plain, Size: 2617 bytes --]

[Mon Nov 25 16:22:06 2024] BTRFS info (device sdd): first mount of filesystem b1e76acd-525d-46f2-b2a6-b0403dcdc135
[Mon Nov 25 16:22:06 2024] BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm
[Mon Nov 25 16:22:06 2024] BTRFS info (device sdd): using free-space-tree
[Mon Nov 25 16:22:10 2024] BTRFS info (device sdd): creating free space tree
[Mon Nov 25 16:22:19 2024] page: refcount:4 mapcount:0 mapping:000000009ee5aade index:0x2d25109b pfn:0x3b322f
[Mon Nov 25 16:22:19 2024] memcg:ffff9d6daeb20800
[Mon Nov 25 16:22:19 2024] aops:btree_aops [btrfs] ino:1
[Mon Nov 25 16:22:19 2024] flags: 0x17ffffc0004000(private|node=0|zone=2|lastcpupid=0x1fffff)
[Mon Nov 25 16:22:19 2024] raw: 0017ffffc0004000 0000000000000000 dead000000000122 ffff9d6f58a94ef8
[Mon Nov 25 16:22:19 2024] raw: 000000002d25109b ffff9d6f1c1114a0 00000004ffffffff ffff9d6daeb20800
[Mon Nov 25 16:22:19 2024] page dumped because: eb page dump
[Mon Nov 25 16:22:19 2024] BTRFS critical (device sdd): corrupt leaf: block=3102325977088 slot=12 extent bytenr=6781779968 len=139264 invalid data ref objectid value 18446744073709551604
[Mon Nov 25 16:22:19 2024] BTRFS error (device sdd): read time tree block corruption detected on logical 3102325977088 mirror 2
[Mon Nov 25 16:22:19 2024] page: refcount:3 mapcount:0 mapping:000000009ee5aade index:0x2d25109b pfn:0x3b322f
[Mon Nov 25 16:22:19 2024] memcg:ffff9d6daeb20800
[Mon Nov 25 16:22:19 2024] aops:btree_aops [btrfs] ino:1
[Mon Nov 25 16:22:19 2024] flags: 0x17ffffd0004020(lru|private|node=0|zone=2|lastcpupid=0x1fffff)
[Mon Nov 25 16:22:19 2024] raw: 0017ffffd0004020 ffffe6488d169c08 ffff9d6daeb21210 ffff9d6f58a94ef8
[Mon Nov 25 16:22:19 2024] raw: 000000002d25109b ffff9d6f1c1114a0 00000003ffffffff ffff9d6daeb20800
[Mon Nov 25 16:22:19 2024] page dumped because: eb page dump
[Mon Nov 25 16:22:19 2024] BTRFS critical (device sdd): corrupt leaf: block=3102325977088 slot=12 extent bytenr=6781779968 len=139264 invalid data ref objectid value 18446744073709551604
[Mon Nov 25 16:22:19 2024] BTRFS error (device sdd): read time tree block corruption detected on logical 3102325977088 mirror 1
[Mon Nov 25 16:22:19 2024] BTRFS error (device sdd state A): Transaction aborted (error -5)
[Mon Nov 25 16:22:19 2024] BTRFS: error (device sdd state A) in btrfs_create_free_space_tree:1197: errno=-5 IO failure
[Mon Nov 25 16:22:19 2024] BTRFS warning (device sdd state EA): failed to create free space tree: -5
[Mon Nov 25 16:22:19 2024] BTRFS error (device sdd state EA): commit super ret -30
[Mon Nov 25 16:22:19 2024] BTRFS error (device sdd state EA): open_ctree failed

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-26 16:11 corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox Brett Dikeman
@ 2024-11-26 21:30 ` Qu Wenruo
  2024-11-26 23:09   ` Brett Dikeman
  0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2024-11-26 21:30 UTC (permalink / raw)
  To: Brett Dikeman, linux-btrfs



On 2024/11/27 02:41, Brett Dikeman wrote:
> Greetings,
>
> I have a filesystem that re-mounted read-only very shortly after I
> started a btrfs defrag with zst compression enabled (which is not to
> say I think this was the cause.) The volume  resides on a Debian
> Bookworm system and is very simple configuration/feature-wise; it does
> not use quotas, snapshots, or sub-volumes. In the few hours prior to
> running the defrag command, I deleted a large number of files that
> totaled about 100GB of space. Prior to that, the filesystem hasn't
> seen changes in months; even atimes are disabled.
>
> btrfs check completes with no errors generated in dmesg or the
> terminal during the check, it takes what seems like a reasonable
> amount of time with not much interruption in disk activity. A scrub
> progresses at expected speeds but suddenly stops with a status of
> "success" after a few GB.). There are no signs of drive failure from
> SMART parameters, and no kernel messages that would suggest drive
> failure, such as timeouts or SATA errors. However, I am currently
> running a nondestructive-write badblocks test to address this
> possibility a bit more - both drives have made it  in to 10% so far,
> with no errors in dmesg or badblocks.
>
> What I have tried:
>
> - upgraded btrfsprogs to bookworm-backports because bookworm's
> btrfsprogs is old enough that it doesn't include several rescue
> commands.
> - clearing the zero log
> - clearing the inode cache

The inode cache is what you need to clear.

But you need much newer progs, at least v6.11, to fully clear the
inode cache.
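
If the fix looks like what I'd expect from recent btrfs-progs, it
would be roughly the following (subcommand name as in recent
releases; double-check `btrfs rescue --help` on your version first):

```shell
# With the filesystem unmounted, clear the deprecated v1 inode
# cache so the extent tree no longer carries its leftover items:
btrfs rescue clear-ino-cache /dev/sdd
```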

> - clearing the space cache.
>
> It was mounting OK until around when I updated the tools package and
> ran some of the above commands. During one attempt to run a scrub,
> there was dmesg output I unfortunately did not catch, but I remember
> something that looked similar to what I've seen when I had md array
> that ended up with different-aged metadata when one drive was booted
> out of a 4-drive array; I had to force md to ignore its timestamp.
>
> Any recommendations on how to proceed would be greatly appreciated.
> dmesg output is included as an attachment.
>
> uname -a output:
> 6.11.5+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.11.5-1~bpo12+1
> (2024-11-11) x86_64 GNU/Linux
>
> btrfs version:
> btrfs-progs v6.6.3

Your progs is too old; it will not fully clear the inode cache.

After using v6.11 to fully clear the inode cache, your fs should be
totally fine.

Thanks,
Qu
>
> #  btrfs fi show
> Label: none  uuid: b1e76acd-525d-46f2-b2a6-b0403dcdc135
> Total devices 2 FS bytes used 1.37TiB
> devid    1 size 1.82TiB used 1.44TiB path /dev/sdd
> devid    2 size 1.82TiB used 1.44TiB path /dev/sdc



* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-26 21:30 ` Qu Wenruo
@ 2024-11-26 23:09   ` Brett Dikeman
  2024-11-27  3:17     ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Brett Dikeman @ 2024-11-26 23:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Nov 26, 2024 at 4:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> Inode cache is what you need to clear.

Brilliant! Thank you, Qu. It did mount cleanly. Shame Debian's
toolchain is so out-of-date. I pinged the maintainer asking if they
could pull 6.11.

Is there anything I could have done that would have caused the
corrupted inode cache?

'btrfs check' thought the second drive was busy after unmounting, but
a reboot cured that.

check found the following; is the fix for this to clear the free space cache?

[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space cache
block group 6471811072 has wrong amount of free space, free space
cache has 42901504 block group has 43040768
failed to load free space cache for block group 6471811072
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 1504346918912 bytes used, no error found
total csum bytes: 1466184264
total tree bytes: 2501361664
total fs tree bytes: 718209024
total extent tree bytes: 106008576
btree space waste bytes: 375584870
file data blocks allocated: 1512284389376
 referenced 1380845105152


* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-26 23:09   ` Brett Dikeman
@ 2024-11-27  3:17     ` Qu Wenruo
  2024-11-30  1:11       ` Nicholas D Steeves
  2024-12-27 22:25       ` Nicholas D Steeves
  0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-11-27  3:17 UTC (permalink / raw)
  To: Brett Dikeman; +Cc: linux-btrfs



On 2024/11/27 09:39, Brett Dikeman wrote:
> On Tue, Nov 26, 2024 at 4:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>> Inode cache is what you need to clear.
>
> Brilliant! Thank you, Qu. It did mount cleanly. Shame Debian's
> toolchain is so out-of-date. I pinged the maintainer asking if they
> could pull 6.11.
>
> Is there anything I could have done that would have caused the
> corrupted inode cache?

It's not corruption; rather, we deprecated that feature quite some time
ago, and a recently enhanced sanity check now treats the long-deprecated
feature as an error, thus rejecting those tree blocks.

>
> 'btrfs check' thought the second drive was busy after unmounting, but
> a reboot cured that.
>
> check found the following; Is the fix for this to clear the free space cache?

I'd recommend going directly to the v2 cache.

You may not want to hear this, but we're going to deprecate the v1
cache too.

The v2 cache has much better crash handling; I have not yet seen a
corrupted v2 cache.
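
The usual migration (per btrfs(5); mount point here is hypothetical)
is a one-time mount with these options:

```shell
# One-time mount that drops the v1 cache and builds the v2 free
# space tree; subsequent mounts keep using v2 automatically:
mount -o clear_cache,space_cache=v2 /dev/sdd /mnt

# To confirm the feature is now set on the filesystem:
umount /mnt
btrfs inspect-internal dump-super /dev/sdd | grep -i free_space
```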

Thanks,
Qu
>
> [1/8] checking log skipped (none written)
> [2/8] checking root items
> [3/8] checking extents
> [4/8] checking free space cache
> block group 6471811072 has wrong amount of free space, free space
> cache has 42901504 block group has 43040768
> failed to load free space cache for block group 6471811072
> [5/8] checking fs roots
> [6/8] checking only csums items (without verifying data)
> [7/8] checking root refs
> [8/8] checking quota groups skipped (not enabled on this FS)
> found 1504346918912 bytes used, no error found
> total csum bytes: 1466184264
> total tree bytes: 2501361664
> total fs tree bytes: 718209024
> total extent tree bytes: 106008576
> btree space waste bytes: 375584870
> file data blocks allocated: 1512284389376
>   referenced 1380845105152



* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-27  3:17     ` Qu Wenruo
@ 2024-11-30  1:11       ` Nicholas D Steeves
  2024-11-30  4:37         ` Qu Wenruo
  2024-12-27 22:25       ` Nicholas D Steeves
  1 sibling, 1 reply; 7+ messages in thread
From: Nicholas D Steeves @ 2024-11-30  1:11 UTC (permalink / raw)
  To: Qu Wenruo, Brett Dikeman; +Cc: linux-btrfs


Qu Wenruo <quwenruo.btrfs@gmx.com> writes:

> On 2024/11/27 09:39, Brett Dikeman wrote:
>> On Tue, Nov 26, 2024 at 4:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>> Inode cache is what you need to clear.
>>
>> Brilliant! Thank you, Qu. It did mount cleanly. Shame Debian's
>> toolchain is so out-of-date. I pinged the maintainer asking if they
>> could pull 6.11.
>>
>> Is there anything I could have done that would have caused the
>> corrupted inode cache?
>
> It's not corrupted, but us deprecating that feature quite some time ago.
> And then a recently enhanced sanity check just treat such long
> deprecated feature as an error, thus rejecting those tree blocks.
>
>>
>> 'btrfs check' thought the second drive was busy after unmounting, but
>> a reboot cured that.
>>
>> check found the following; Is the fix for this to clear the free space cache?
>
> I'd recommend to go v2 cache directly.
>
> You may not want to hear, but we're going to deprecate v1 cache too.
>
> V2 cache has way better crash handling, I have not yet seen a corrupted
> v2 cache yet.

Let's avoid a "btrfs is the only fs that ate my data" storm.  Would it
be sufficient to provide free space cache v2 migration instructions in
our release notes?  If not, would you be willing to write something?
Alternatively, should this be automatic, and in-kernel?  Debian users
tend to have long-running installations and don't tend to reinstall.  I
think you'll agree we ought to get ahead of the thousands of users who
have v1 cache and who ran mkfs with btrfs-progs as early as 4.4 or 4.9.

Most users will migrate from a recent 6.1.x (close to kernel.org's)
directly to 6.12.x (what seems to be the most likely LTS).  Many have
also tracked the stable mainline or have recently upgraded to 6.11.

Beyond that, I'm not aware of any format changes (other than some new
experimental raid56 stuff), but it might also be nice to know if there's
a threshold where it would be better to reformat an "aged" filesystem.


Kind regards,
Nicholas

P.S. We'll fix the ancient btrfs-progs problem before the new year.  The
current issue is that "strong package ownership" processes slow things
down.



* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-30  1:11       ` Nicholas D Steeves
@ 2024-11-30  4:37         ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-11-30  4:37 UTC (permalink / raw)
  To: Nicholas D Steeves, Qu Wenruo, Brett Dikeman; +Cc: linux-btrfs



On 2024/11/30 11:41, Nicholas D Steeves wrote:
> Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
> 
>> On 2024/11/27 09:39, Brett Dikeman wrote:
>>> On Tue, Nov 26, 2024 at 4:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>> Inode cache is what you need to clear.
>>>
>>> Brilliant! Thank you, Qu. It did mount cleanly. Shame Debian's
>>> toolchain is so out-of-date. I pinged the maintainer asking if they
>>> could pull 6.11.
>>>
>>> Is there anything I could have done that would have caused the
>>> corrupted inode cache?
>>
>> It's not corrupted, but us deprecating that feature quite some time ago.
>> And then a recently enhanced sanity check just treat such long
>> deprecated feature as an error, thus rejecting those tree blocks.
>>
>>>
>>> 'btrfs check' thought the second drive was busy after unmounting, but
>>> a reboot cured that.
>>>
>>> check found the following; Is the fix for this to clear the free space cache?
>>
>> I'd recommend to go v2 cache directly.
>>
>> You may not want to hear, but we're going to deprecate v1 cache too.
>>
>> V2 cache has way better crash handling, I have not yet seen a corrupted
>> v2 cache yet.
> 
> Let's avoid a "btrfs is the only fs that ate my data" storm.  Would it
> be sufficient to provide free space cache v2 migration instructions in
> our release notes?

So far, the plan is that deprecating the v1 cache should require no
user intervention.

We will silently delete the v1 cache and rebuild the v2 cache on the
next rw mount; at least that's my plan.

Unlike the offending inode cache feature, the v1 cache is a very common
feature, so I believe we won't hit the same situation again until the
v1 cache has been fully deprecated for years, if not decades.

>  If not, would you be willing to write something?

I believe the current man page covers it well enough:

   If v2 is enabled, and v1 space cache will be cleared (at the first
   mount) and kernels without v2 support will only be able to mount the
   filesystem in read-only mode.

And since the v2 cache has been supported since 4.5, we don't need to
worry about any compatibility problems either.

> Alternatively, should this be automatic, and in-kernel?  Debian users
> tend to have long-running installations and don't tend to reinstall.  I
> think you'll agree we ought to get ahead of the thousands of users who
> have v1 cache and who ran mkfs with btrfs-progs as early as 4.4 or 4.9.
> 
> Most users will migrate from a recent 6.1.x (close to kernel.org's)
> directly to 6.12.x (what seems to be the most likely LTS).  Many have
> also tracked the stable mainline or have recently upgraded to 6.11.
> 
> Beyond that, I'm not aware of any format chances (other than some new
> experimental raid56 stuff), but it might also be nice to know if there's
> a threshold where it would be better to reformat an "aged" filesystem.

There is no recommendation to reformat an "aged" fs at all.

All our new features (v2 cache, no_holes, block-group-tree, even new
checksums) have a way to migrate an existing fs to the new format.

The only difference is that some are done automatically by the kernel
(v2 cache), while some need to be done on an unmounted fs with the
btrfstune tool (bg tree, new checksum format).

The offline migration is not ideal for end users, but at least that's
what we have for now.
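
As a sketch of the offline path (flag name as in recent btrfs-progs;
double-check `btrfstune --help` on your installed version):

```shell
# Filesystem must be unmounted and should pass 'btrfs check' first.

# Convert the extent tree to the block-group-tree format, which
# greatly speeds up mounts of large filesystems:
btrfstune --convert-to-block-group-tree /dev/sdd
```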

> 
> 
> Kind regards,
> Nicholas
> 
> P.S. We'll fix the ancient btrfs-progs problem before the new year.  The
> current issue is where "strong package ownership" processes slow things
> down.

I'd recommend providing something like a PPA repo with a newer/latest 
static build of btrfs-progs, for end users who do not want to build 
btrfs-progs themselves but still want the latest progs regardless of 
the Debian version.

But the inode cache cleanup code is really an exception, where the 
feature is so rare that we do not even have proper test coverage for it 
at all.

Until we hit the situation where inode cache cleanup is definitely needed...

Thanks,
Qu


* Re: corrupt leaf, invalid data ref objectid value, read time tree block corruption detected Inbox
  2024-11-27  3:17     ` Qu Wenruo
  2024-11-30  1:11       ` Nicholas D Steeves
@ 2024-12-27 22:25       ` Nicholas D Steeves
  1 sibling, 0 replies; 7+ messages in thread
From: Nicholas D Steeves @ 2024-12-27 22:25 UTC (permalink / raw)
  To: Qu Wenruo, Brett Dikeman; +Cc: linux-btrfs


Qu Wenruo <quwenruo.btrfs@gmx.com> writes:

> On 2024/11/27 09:39, Brett Dikeman wrote:
>> On Tue, Nov 26, 2024 at 4:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>> Inode cache is what you need to clear.
>>
>> Brilliant! Thank you, Qu. It did mount cleanly. Shame Debian's
>> toolchain is so out-of-date. I pinged the maintainer asking if they
>> could pull 6.11.

I've salvaged Debian's btrfs-progs package, so it will now be kept
significantly more up-to-date than it was in the past.  6.12 was just
accepted into unstable/sid, and it will migrate to testing/trixie after
Debian 13's "alpha1" installer is released.

Regards,
Nicholas



