From: Nicolas Gnyra <nicolas.gnyra@gmail.com>
To: Qu Wenruo <wqu@suse.com>, Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Errors found in extent allocation tree or chunk allocation
Date: Thu, 30 Jan 2025 00:21:36 -0500
Message-ID: <c1bc1160-22ac-4104-bbb6-b976dc4e26ed@gmail.com>
In-Reply-To: <3f0d8fe5-e631-4c20-a62f-31c22f169324@suse.com>
On 2025-01-29 at 23:19, Qu Wenruo wrote:
>
>
> On 2025/1/30 14:19, Nicolas Gnyra wrote:
>> On 2025-01-29 at 18:35, Qu Wenruo wrote:
>>>
>>>
>>> On 2025/1/30 06:03, Nicolas Gnyra wrote:
>>>> On 2024-12-03 at 21:50, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2024/12/4 10:32, Nicolas Gnyra wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I seem to have messed up my btrfs filesystem after adding a new (3rd)
>>>>>> drive and running `btrfs balance start -dconvert=raid5
>>>>>> -mconvert=raid1c3 /path/to/mount`. It ran for a while and I thought it
>>>>>> had finished successfully but after a reboot it's stuck mounting as
>>>>>> read-only. I seemingly am able to mount it as read/write if I add `-o
>>>>>> skip_balance` but if I try to write to it, it locks up again. I managed
>>>>>> to run a scrub in this state but it found no errors.
>>>>>>
>>>>>> Kernel logs: https://pastebin.com/Cs06sNnr (drives sdb, sdc, and sdd,
>>>>>> UUID dfa2779b-b7d1-4658-89f7-dabe494e67c8)
>>>>>
>>>>> The dmesg shows the problem very clearly:
>>>>>
>>>>> item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
>>>>>     extent refs 1 gen 84178 flags 1
>>>>>     ref#0: shared data backref parent 32399126528000 count 0 <<<
>>>>>     ref#1: shared data backref parent 31808973717504 count 1
>>>>>
>>>>> Notice the count: it should never be 0, because a ref whose count
>>>>> drops to zero should be removed from the extent item.
>>>>>
>>>>> I believe the correct value should just be 1, and the single-bit
>>>>> difference between 0 and 1 is also possibly an indicator of a hardware
>>>>> runtime bitflip.
>>>>>
>>>>> This is a new corner case we have never seen, so I'll send a new patch
>>>>> to handle such a case in tree-checker.
>>>>>
>>>>>> `btrfs check`: https://pastebin.com/7SJZS3Yv
>>>>>> `btrfs check --repair` (run after a discussion in Libera Chat, failed):
>>>>>> https://pastebin.com/BGLSx6GM
>>>>>
>>>>> In theory, btrfs should be able to repair this particular error,
>>>>> but the error message seems to indicate ENOSPC, meaning there is not
>>>>> enough space for metadata at least.
>>>>
>>>> I finally had some time to try out a version of the kernel with your fix
>>>> (built locally from commit 0afd22092df4d3473569c197e317f91face7e51b) and
>>>> I can now see the modified error message (see new dmesg contents:
>>>> https://pastebin.com/t7J5TJ0Z). Unfortunately, apart from that,
>>>> behaviour seems to be identical to before. `btrfs check --repair` still
>>>> fails in the exact same way. Is this expected? For some reason I had
>>>> assumed your change would fix it, but I had forgotten this mention of
>>>> ENOSPC so is there any chance of getting back into a writable state or
>>>> should I just reformat the drives?
>>>
>>> For the ENOSPC problem, please provide `btrfs fi usage` output for the
>>> mounted fs.
>>>
>>> I believe with the ENOSPC problem resolved, we can let btrfs check
>>> --repair fix the problem.
>>>
>>> Thanks,
>>> Qu
>>
>> Thanks for the quick reply! Here's the output of `btrfs fi usage`:
>>
>> Overall:
>>     Device size:           21.83TiB
>>     Device allocated:      12.50TiB
>>     Device unallocated:    9.33TiB
>>     Device missing:        0.00B
>>     Device slack:          0.00B
>>     Used:                  11.35TiB
>>     Free (estimated):      6.89TiB (min: 3.85TiB)
>>     Free (statfs, df):     6.78TiB
>>     Data ratio:            1.52
>>     Metadata ratio:        2.88
>>     Global reserve:        512.00MiB (used: 0.00B)
>>     Multiple profiles:     yes (data, metadata, system)
>>
>> Data,RAID1: Size:324.00GiB, Used:299.59GiB (92.47%)
>>    /dev/sdd  324.00GiB
>>    /dev/sde  324.00GiB
>>
>> Data,RAID5: Size:7.88TiB, Used:7.16TiB (90.84%)
>>    /dev/sdd  3.94TiB
>>    /dev/sde  3.94TiB
>>    /dev/sdf  3.94TiB
>>
>> Metadata,RAID1: Size:2.00GiB, Used:73.25MiB (3.58%)
>>    /dev/sdd  2.00GiB
>>    /dev/sde  2.00GiB
>
> The mixed metadata profile may be the problem.
>
> Have you tried to convert the remaining 2GiB RAID1 metadata into RAID1C3?
>
> Or is the problem you're hitting preventing the full conversion to RAID1C3?
>
>
> Anyway, it also looks like a bug in btrfs-progs, I'll need to dig deeper
> to fix it.
>
> Thanks,
> Qu
Just to make sure, you mean running `btrfs balance start
-mconvert=raid1c3,soft`, right? If so, unfortunately it just triggers
those same "invalid shared data ref count, should have non-zero value"
errors and then forces the filesystem into read-only mode, so I can't get
it to run.
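
For anyone following along, the condition behind that error message can be
sketched roughly as below. This is only an illustration in Python, not the
kernel's tree-checker code: the `SharedDataRef` class and both helper
functions are hypothetical names made up for this sketch, and the two refs
are the ones from the dmesg excerpt quoted earlier in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDataRef:
    """Hypothetical stand-in for one 'shared data backref' line in the dump."""
    parent: int  # bytenr of the parent tree block
    count: int   # number of references held through that parent


def zero_count_refs(refs):
    """Return the refs whose count is 0; such a ref should have been removed."""
    return [r for r in refs if r.count == 0]


def is_single_bitflip(a, b):
    """True when the two integers differ in exactly one bit."""
    return bin(a ^ b).count("1") == 1


# Values copied from the dmesg excerpt (item 166).
refs = [
    SharedDataRef(parent=32399126528000, count=0),
    SharedDataRef(parent=31808973717504, count=1),
]

bad = zero_count_refs(refs)
for r in bad:
    print(f"invalid shared data ref count for parent {r.parent}: {r.count}")

# The value found (0) and the value assumed correct (1) differ by one bit,
# which is why a hardware bitflip is a plausible suspect.
print("single bit apart:", is_single_bitflip(0, 1))
```

Only ref#0 is flagged here, which matches the line marked with `<<<` in the
dump.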
>>
>> Metadata,RAID1C3: Size:14.00GiB, Used:8.69GiB (62.08%)
>>    /dev/sdd  14.00GiB
>>    /dev/sde  14.00GiB
>>    /dev/sdf  14.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:48.00KiB (0.15%)
>>    /dev/sdd  32.00MiB
>>    /dev/sde  32.00MiB
>>
>> System,RAID1C3: Size:32.00MiB, Used:736.00KiB (2.25%)
>>    /dev/sdd  32.00MiB
>>    /dev/sde  32.00MiB
>>    /dev/sdf  32.00MiB
>>
>> Unallocated:
>>    /dev/sdd  3.00TiB
>>    /dev/sde  3.00TiB
>>    /dev/sdf  3.32TiB
>>
>> Thanks,
>> Nicolas
>>
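
As a sanity check on the `btrfs fi usage` output quoted above, the ratios
can be recomputed by hand from the per-device allocations. The sketch below
is plain arithmetic in Python and makes a few assumptions: the sizes are the
rounded values printed in the output, a ratio is taken to be raw bytes
allocated divided by logical chunk size, and system chunks are left out of
the metadata ratio. It is not a description of how btrfs-progs actually
computes these numbers.

```python
GIB_PER_TIB = 1024.0
MIB_PER_GIB = 1024.0

# Each entry: (logical size in GiB, per-device allocation in GiB, devices),
# copied from the usage output quoted above.
data = [
    (324.00, 324.00, 2),                          # Data,RAID1
    (7.88 * GIB_PER_TIB, 3.94 * GIB_PER_TIB, 3),  # Data,RAID5
]
metadata = [
    (2.00, 2.00, 2),    # Metadata,RAID1 (not yet converted)
    (14.00, 14.00, 3),  # Metadata,RAID1C3
]
system = [
    (32.00 / MIB_PER_GIB, 32.00 / MIB_PER_GIB, 2),  # System,RAID1
    (32.00 / MIB_PER_GIB, 32.00 / MIB_PER_GIB, 3),  # System,RAID1C3
]


def raw_gib(chunks):
    """Raw space consumed across all devices, in GiB."""
    return sum(per_device * devices for _, per_device, devices in chunks)


def logical_gib(chunks):
    """Logical (usable) chunk size, in GiB."""
    return sum(logical for logical, _, _ in chunks)


def ratio(chunks):
    """Raw bytes allocated per logical byte."""
    return raw_gib(chunks) / logical_gib(chunks)


data_ratio = ratio(data)
metadata_ratio = ratio(metadata)
allocated_tib = (raw_gib(data) + raw_gib(metadata) + raw_gib(system)) / GIB_PER_TIB

print(f"data ratio:       {data_ratio:.4f}")
print(f"metadata ratio:   {metadata_ratio:.4f}")
print(f"device allocated: {allocated_tib:.4f} TiB")
```

With the rounded inputs this gives roughly 1.519 for data, 2.875 for
metadata, and about 12.498TiB allocated, in line with the reported 1.52,
2.88, and 12.50TiB. The metadata ratio sitting below 3 reflects the 2GiB of
metadata that is still RAID1 rather than RAID1C3.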
>>>>>> I'm currently running btrfs-progs v6.12 but the balance was originally
>>>>>> run on v5.10.1. Is there any way to recover from this or should I just
>>>>>> nuke the filesystem and restart from scratch? There's nothing super
>>>>>> important on there, it's just going to be annoying to restore from a
>>>>>> backup, and I thought it'd be interesting to try to figure out what
>>>>>> happened here.
>>>>>
>>>>> I recommend running a full memtest before doing anything, just to
>>>>> verify whether it's really a hardware bitflip.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>