From: Nicolas Gnyra <nicolas.gnyra@gmail.com>
To: Qu Wenruo <wqu@suse.com>, Qu Wenruo <quwenruo.btrfs@gmx.com>,
linux-btrfs@vger.kernel.org
Subject: Re: Errors found in extent allocation tree or chunk allocation
Date: Thu, 30 Jan 2025 00:21:36 -0500
Message-ID: <c1bc1160-22ac-4104-bbb6-b976dc4e26ed@gmail.com>
In-Reply-To: <3f0d8fe5-e631-4c20-a62f-31c22f169324@suse.com>
On 2025-01-29 at 23:19, Qu Wenruo wrote:
>
>
> On 2025/1/30 14:19, Nicolas Gnyra wrote:
>> On 2025-01-29 at 18:35, Qu Wenruo wrote:
>>>
>>>
>>> On 2025/1/30 06:03, Nicolas Gnyra wrote:
>>>> On 2024-12-03 at 21:50, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2024/12/4 10:32, Nicolas Gnyra wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I seem to have messed up my btrfs filesystem after adding a new (3rd)
>>>>>> drive and running `btrfs balance start -dconvert=raid5
>>>>>> -mconvert=raid1c3 /path/to/mount`. It ran for a while and I thought it
>>>>>> had finished successfully but after a reboot it's stuck mounting as
>>>>>> read-only. I seemingly am able to mount it as read/write if I add `-o
>>>>>> skip_balance` but if I try to write to it, it locks up again. I managed
>>>>>> to run a scrub in this state but it found no errors.
>>>>>>
>>>>>> Kernel logs: https://pastebin.com/Cs06sNnr (drives sdb, sdc, and sdd,
>>>>>> UUID dfa2779b-b7d1-4658-89f7-dabe494e67c8)
>>>>>
>>>>> The dmesg shows the problem quite clearly:
>>>>>
>>>>> item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
>>>>> extent refs 1 gen 84178 flags 1
>>>>> ref#0: shared data backref parent 32399126528000 count 0 <<<
>>>>> ref#1: shared data backref parent 31808973717504 count 1
>>>>>
>>>>> Notice the count: it should never be 0, because a ref whose count
>>>>> drops to zero should be removed from the extent item.
>>>>>
>>>>> I believe the correct value should just be 1, and since 0 and 1 differ
>>>>> by a single bit, this is also possibly an indicator of a hardware
>>>>> runtime bitflip.
>>>>>
>>>>> This is a new corner case we have never seen before, so I'll send a
>>>>> patch to handle such cases in tree-checker.
>>>>>
>>>>>> `btrfs check`: https://pastebin.com/7SJZS3Yv
>>>>>> `btrfs check --repair` (ran after a discussion in Libera Chat, failed):
>>>>>> https://pastebin.com/BGLSx6GM
>>>>>
>>>>> In theory, btrfs should be able to repair this particular error,
>>>>> but the error message seems to indicate ENOSPC, meaning there is not
>>>>> enough space for metadata at least.
>>>>
>>>> I finally had some time to try out a version of the kernel with your fix
>>>> (built locally from commit 0afd22092df4d3473569c197e317f91face7e51b) and
>>>> I can now see the modified error message (see new dmesg contents:
>>>> https://pastebin.com/t7J5TJ0Z). Unfortunately, apart from that,
>>>> behaviour seems to be identical to before. `btrfs check --repair` still
>>>> fails in the exact same way. Is this expected? For some reason I had
>>>> assumed your change would fix it, but I had forgotten this mention of
>>>> ENOSPC so is there any chance of getting back into a writable state or
>>>> should I just reformat the drives?
>>>
>>> For the ENOSPC problem, please provide `btrfs fi usage` output for the
>>> mounted fs.
>>>
>>> I believe with the ENOSPC problem resolved, we can let `btrfs check
>>> --repair` fix the problem.
>>>
>>> Thanks,
>>> Qu
>>
>> Thanks for the quick reply! Here's the output of `btrfs fi usage`:
>>
>> Overall:
>>     Device size:         21.83TiB
>>     Device allocated:    12.50TiB
>>     Device unallocated:  9.33TiB
>>     Device missing:      0.00B
>>     Device slack:        0.00B
>>     Used:                11.35TiB
>>     Free (estimated):    6.89TiB (min: 3.85TiB)
>>     Free (statfs, df):   6.78TiB
>>     Data ratio:          1.52
>>     Metadata ratio:      2.88
>>     Global reserve:      512.00MiB (used: 0.00B)
>>     Multiple profiles:   yes (data, metadata, system)
>>
>> Data,RAID1: Size:324.00GiB, Used:299.59GiB (92.47%)
>>     /dev/sdd    324.00GiB
>>     /dev/sde    324.00GiB
>>
>> Data,RAID5: Size:7.88TiB, Used:7.16TiB (90.84%)
>>     /dev/sdd    3.94TiB
>>     /dev/sde    3.94TiB
>>     /dev/sdf    3.94TiB
>>
>> Metadata,RAID1: Size:2.00GiB, Used:73.25MiB (3.58%)
>>     /dev/sdd    2.00GiB
>>     /dev/sde    2.00GiB
>
> The mixed metadata profile may be the problem.
>
> Have you tried to convert the remaining 2GiB RAID1 metadata into RAID1C3?
>
> Or is the problem you're hitting preventing the full conversion to RAID1C3?
>
>
> Anyway, it also looks like a bug in btrfs-progs, I'll need to dig deeper
> to fix it.
>
> Thanks,
> Qu
Just to make sure, you mean running `btrfs balance start
-mconvert=raid1c3,soft`, right? If so, unfortunately it just triggers
the same "invalid shared data ref count, should have non-zero value"
errors and then forces the filesystem into read-only mode, so I can't get
it to run.
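
For reference, the condition behind those errors can be sketched in a few
lines of Python. This is only an illustration of the rule Qu described
earlier in the thread (a shared data backref must never have a count of 0),
not the kernel's tree-checker code. The function name `find_zero_count_refs`
and the regular expression are made up for this sketch, and it assumes the
dump lines look like the ones quoted from dmesg.

```python
import re

# Matches dump lines such as:
#   ref#0: shared data backref parent 32399126528000 count 0
# The pattern is an assumption based on the dmesg excerpt quoted in this thread.
SHARED_REF = re.compile(
    r"ref#(?P<index>\d+): shared data backref "
    r"parent (?P<parent>\d+) count (?P<count>\d+)"
)


def find_zero_count_refs(dump_text):
    """Return (ref index, parent bytenr) for every shared data backref with count 0."""
    bad = []
    for line in dump_text.splitlines():
        match = SHARED_REF.search(line)
        if match and int(match.group("count")) == 0:
            bad.append((int(match.group("index")), int(match.group("parent"))))
    return bad


# The extent item quoted from the kernel log earlier in the thread.
SAMPLE = """\
item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
extent refs 1 gen 84178 flags 1
ref#0: shared data backref parent 32399126528000 count 0 <<<
ref#1: shared data backref parent 31808973717504 count 1
"""

if __name__ == "__main__":
    for index, parent in find_zero_count_refs(SAMPLE):
        print(f"ref#{index}: parent {parent} has count 0, expected non-zero")
```

On the quoted item this flags only ref#0; ref#1 has a count of 1 and passes.
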
>>
>> Metadata,RAID1C3: Size:14.00GiB, Used:8.69GiB (62.08%)
>>     /dev/sdd    14.00GiB
>>     /dev/sde    14.00GiB
>>     /dev/sdf    14.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:48.00KiB (0.15%)
>>     /dev/sdd    32.00MiB
>>     /dev/sde    32.00MiB
>>
>> System,RAID1C3: Size:32.00MiB, Used:736.00KiB (2.25%)
>>     /dev/sdd    32.00MiB
>>     /dev/sde    32.00MiB
>>     /dev/sdf    32.00MiB
>>
>> Unallocated:
>>     /dev/sdd    3.00TiB
>>     /dev/sde    3.00TiB
>>     /dev/sdf    3.32TiB
>>
>> Thanks,
>> Nicolas
>>
>>>>>> I'm currently running btrfs-progs v6.12 but the balance was originally
>>>>>> run on v5.10.1. Is there any way to recover from this or should I just
>>>>>> nuke the filesystem and restart from scratch? There's nothing super
>>>>>> important on there, it's just going to be annoying to restore from a
>>>>>> backup, and I thought it'd be interesting to try to figure out what
>>>>>> happened here.
>>>>>
>>>>> I recommend running a full memtest before doing anything else, just to
>>>>> verify whether it's really a hardware bitflip.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
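
As a rough cross-check of the `btrfs fi usage` output quoted above, the
non-integer "Data ratio" and "Metadata ratio" values follow from the mix of
old and new profiles left by the interrupted conversion. The Python sketch
below recomputes them from the per-profile sizes. The helper names
(`raw_factor`, `ratio`) are made up for illustration, and the RAID5 factor
assumes each stripe spans all three devices with one device's worth of
parity, which is consistent with the 3.94TiB-per-device figures for a
7.88TiB block group.

```python
# Raw copies stored per logical byte for the mirrored profiles.
COPIES = {"RAID1": 2, "RAID1C3": 3}

# (profile, logical size in GiB, number of devices), from the quoted output.
DATA_GROUPS = [("RAID1", 324.0, 2), ("RAID5", 7.88 * 1024, 3)]
METADATA_GROUPS = [("RAID1", 2.0, 2), ("RAID1C3", 14.0, 3)]


def raw_factor(profile, num_devices):
    """Raw bytes consumed per logical byte for one profile."""
    if profile == "RAID5":
        # Assumption: full-width stripes, one device's share holds parity.
        return num_devices / (num_devices - 1)
    return COPIES[profile]


def ratio(groups):
    """Total raw allocation divided by total logical size."""
    logical = sum(size for _, size, _ in groups)
    raw = sum(size * raw_factor(profile, n) for profile, size, n in groups)
    return raw / logical


if __name__ == "__main__":
    print(f"data ratio:     {ratio(DATA_GROUPS):.2f}")
    print(f"metadata ratio: {ratio(METADATA_GROUPS):.2f}")
```

The results should land on the reported 1.52 and 2.88 after rounding; a fully
converted filesystem would instead show 1.50 for data (three-device RAID5)
and 3.00 for metadata (RAID1C3).
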