corruption after block-group-tree conversion, how to recover? (data is readable)

Linux Btrfs filesystem development

* corruption after block-group-tree conversion, how to recover? (data is readable)
@ 2026-04-26 21:23 Thomas Debesse
  2026-04-26 22:18 ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Debesse @ 2026-04-26 21:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Thomas DEBESSE

Hi, I just did a `btrfstune --convert-to-block-group-tree` conversion
operation on a btrfs filesystem and this corrupted the volume:

```
[  786.278571] BTRFS info (device dm-9): first mount of filesystem <uuid>
[  786.278595] BTRFS info (device dm-9): using crc32c (crc32c-intel)
checksum algorithm
[  786.442781] BTRFS error (device dm-9): parent transid verify failed
on logical 9820282388480 mirror 2 wanted 5715946 found 5717327
[  786.448020] BTRFS error (device dm-9): parent transid verify failed
on logical 9820282388480 mirror 1 wanted 5715946 found 5717327
[  786.449543] BTRFS error (device dm-9): failed to read block groups: -5
[  786.481199] BTRFS error (device dm-9): open_ctree failed: -5
```
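For anyone comparing such logs, here is a minimal Python sketch (not part of btrfs-progs; the function name `parse_transid_errors` is made up for illustration) that pulls the numbers out of the `parent transid verify failed` lines above. It collapses whitespace first because the mail client wrapped the kernel lines. On the log above, both mirrors report the same logical address and a found generation 1381 newer than the wanted one.

```python
import re

# Matches the kernel message exactly as it appears in the log above.
TRANSID_RE = re.compile(
    r"parent transid verify failed on logical (?P<logical>\d+) "
    r"mirror (?P<mirror>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_errors(log_text):
    """Return one dict of integers per transid error found in log_text."""
    # Long kernel lines get wrapped in mail, so join everything with single spaces.
    flat = " ".join(log_text.split())
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in TRANSID_RE.finditer(flat)
    ]


# Two lines copied from the failed mount log in this message.
SAMPLE_LOG = """
[  786.442781] BTRFS error (device dm-9): parent transid verify failed
on logical 9820282388480 mirror 2 wanted 5715946 found 5717327
[  786.448020] BTRFS error (device dm-9): parent transid verify failed
on logical 9820282388480 mirror 1 wanted 5715946 found 5717327
"""

for err in parse_transid_errors(SAMPLE_LOG):
    # Print the mirror number and how many generations ahead the block is.
    print(err["mirror"], err["found"] - err["wanted"])
```

Since both copies of the block carry the same unexpected generation, this at least does not look like a single mirror going stale, though that is only an inference from the numbers.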

I can mount the btrfs volume with `mount -oro,rescue=all`, I can
browse the filesystem, and all the file data I checked looked fine.

Here is the log when mounting with `-oro,rescue=all`:

```
[ 2523.344235] BTRFS info (device dm-9): first mount of filesystem <uuid>
[ 2523.344260] BTRFS info (device dm-9): using crc32c (crc32c-intel)
checksum algorithm
[ 2523.612830] BTRFS error (device dm-9: state C): parent transid
verify failed on logical 9820282388480 mirror 2 wanted 5715946 found
5717327
[ 2523.613372] BTRFS error (device dm-9: state C): parent transid
verify failed on logical 9820282388480 mirror 1 wanted 5715946 found
5717327
[ 2523.639263] BTRFS info (device dm-9: state C): enabling ssd optimizations
[ 2523.639268] BTRFS info (device dm-9: state C): disabling log replay
at mount time
[ 2523.639271] BTRFS info (device dm-9: state C): turning on async discard
[ 2523.639273] BTRFS info (device dm-9: state C): enabling free space tree
[ 2523.639276] BTRFS info (device dm-9: state C): ignoring bad roots
[ 2523.639278] BTRFS info (device dm-9: state C): ignoring data csums
```

The log reports SSD optimization to be enabled because those drives
are bcache backing devices. Bcache was disabled throughout the
operation.

I expect that recovering from backup would take 10 days, so I would
like to avoid that if possible. If there are ways to recompute the
broken metadata for example or other repair operation, I would prefer
doing that.

I asked on the #btrfs channel and, after helping me successfully mount the
volume, multicore recommended that I send an email to this mailing
list, so here we are.

What do you think about it? Is there a way to repair the filesystem?

Context: This is a 4 HDD btrfs raid10 volume. Drives are large (16TB
each, Seagate IronWolf Pro).
24 days ago two drives started logging some errors, so I replaced
them. Those drives are bcache backing devices but bcache has been kept
disabled for the whole operation that follows.

I first did a backup of the files in their last state. Then I replaced
the first drive with `btrfs replace`, waited for the replacement to
complete, then replaced the second drive with `btrfs replace` and waited
for it to complete again. Once this was done I shut down and removed
the replaced drives, keeping only the two healthy original
drives and the two healthy replacement drives plugged in. I then rebooted,
successfully mounted the btrfs volume, and ran a scrub. The scrub finished
without error.

This btrfs volume was created in September 2023,
when I was running Ubuntu Lunar with Linux 6.2, and it
lacked the block group tree feature. Mounting was very slow, which
annoyed me, so in February 2025 I asked on the #btrfs channel whether
anything could be done to speed things up, and I was told
that enabling block group tree would help. Out of caution, I
didn't do it at the time, waiting for a better moment. Since I had just
replaced two drives and made a backup, I thought this was the best
moment to do it, and I did the block group tree conversion, after a
year of patience. It looks like it was the right time to do it, but
that I should not have done it at all.

I also use btrfs deduplication and compression
extensively, and I know that once decompressed and with
deduplication undone the data is larger than the disk storage. So
not only will decompressing and copying from borg be slow (and take
days…), but I'll also have to wait for btrfs to recompress everything,
and I'll have to run bees while the copy is going on to deduplicate on
the fly so that the restoration fits. This will be slow as hell.
That's why I expect the restoration from backup to take 10 days. Since
I have already been in a degraded workflow for 24 days, I'm looking for faster
options if they exist.

So if there is any recovery or repair operation available I want to
try it first. Since I can read the data, can the corrupted metadata be
recomputed?

The system is running Ubuntu 24.04.4 with Linux 6.8.0-110-generic and
btrfs-progs 6.6.3-1.1build2. I can run rescue systems on removable
drives if newer tools are required to repair.

Best regards,

-- 
Thomas “illwieckz” Debesse

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-26 21:23 corruption after block-group-tree conversion, how to recover? (data is readable) Thomas Debesse
@ 2026-04-26 22:18 ` Qu Wenruo
  2026-04-27  0:08   ` Ulli Horlacher
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-04-26 22:18 UTC (permalink / raw)
  To: Thomas Debesse, linux-btrfs



On 2026/4/27 06:53, Thomas Debesse wrote:
> Hi, I just did a `btrfstune --convert-to-block-group-tree` conversion
> operation on a btrfs filesystem and this corrupted the volume:
> 
> ```
> [  786.278571] BTRFS info (device dm-9): first mount of filesystem <uuid>
> [  786.278595] BTRFS info (device dm-9): using crc32c (crc32c-intel)
> checksum algorithm
> [  786.442781] BTRFS error (device dm-9): parent transid verify failed
> on logical 9820282388480 mirror 2 wanted 5715946 found 5717327
> [  786.448020] BTRFS error (device dm-9): parent transid verify failed
> on logical 9820282388480 mirror 1 wanted 5715946 found 5717327

Unfortunately, transid mismatch is not reliably repairable.

> [  786.449543] BTRFS error (device dm-9): failed to read block groups: -5
> [  786.481199] BTRFS error (device dm-9): open_ctree failed: -5
> ```
> 
> I can mount the btrfs volume with `mount -oro,rescue=all`, I can
> browse the filesystem, and all the file data I checked looked fine.
> 
> Here is the log when mounting with `-oro,rescue=all`:
> 
> ```
> [ 2523.344235] BTRFS info (device dm-9): first mount of filesystem <uuid>
> [ 2523.344260] BTRFS info (device dm-9): using crc32c (crc32c-intel)
> checksum algorithm
> [ 2523.612830] BTRFS error (device dm-9: state C): parent transid
> verify failed on logical 9820282388480 mirror 2 wanted 5715946 found
> 5717327
> [ 2523.613372] BTRFS error (device dm-9: state C): parent transid
> verify failed on logical 9820282388480 mirror 1 wanted 5715946 found
> 5717327
> [ 2523.639263] BTRFS info (device dm-9: state C): enabling ssd optimizations
> [ 2523.639268] BTRFS info (device dm-9: state C): disabling log replay
> at mount time
> [ 2523.639271] BTRFS info (device dm-9: state C): turning on async discard
> [ 2523.639273] BTRFS info (device dm-9: state C): enabling free space tree
> [ 2523.639276] BTRFS info (device dm-9: state C): ignoring bad roots
> [ 2523.639278] BTRFS info (device dm-9: state C): ignoring data csums
> ```
> 
> The log reports SSD optimization to be enabled because those drives
> are bcache backing devices. Bcache was disabled throughout the
> operation.
> 
> I expect that recovering from backup would take 10 days, so I would
> like to avoid that if possible. If there are ways to recompute the
> broken metadata for example or other repair operation, I would prefer
> doing that.

I'm afraid restoring from backup is the only reliable solution.

> 
> The system is running Ubuntu 24.04.4 with Linux 6.8.0-110-generic and
> btrfs-progs 6.6.3-1.1build2.

Unfortunately the btrfs-progs is too old, and possibly the root cause.

There are some fixes in progs v6.15, thus it's strongly recommended to 
use progs newer than v6.15.

Thanks,
Qu

> I can run rescue systems on removable
> drives if newer tools are required to repair.
> 
> Best regards,
> 



* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-26 22:18 ` Qu Wenruo
@ 2026-04-27  0:08   ` Ulli Horlacher
  2026-04-27  2:24   ` Christoph Anton Mitterer
  2026-04-27  2:43   ` Thomas Debesse
  2 siblings, 0 replies; 7+ messages in thread
From: Ulli Horlacher @ 2026-04-27  0:08 UTC (permalink / raw)
  To: linux-btrfs

On Mon 2026-04-27 (07:48), Qu Wenruo wrote:

> On 2026/4/27 06:53, Thomas Debesse wrote:
> 
> > Hi, I just did a `btrfstune --convert-to-block-group-tree` conversion

Why should one execute this command?
For what...?


> > The system is running Ubuntu 24.04.4 with Linux 6.8.0-110-generic and
> > btrfs-progs 6.6.3-1.1build2.
> 
> Unfortunately the btrfs-progs is too old, and possibly the root cause.
> 
> There are some fixes in progs v6.15, thus it's strongly recommended to
> use progs newer than v6.15.

I also run Ubuntu 24.04.4 on many hosts and btrfs-progs 6.6.3-1.1build2 is
the newest Ubuntu package for it.

Is it advisable to put btrfs-progs v6.19.1 from
https://github.com/kdave/btrfs-progs in a PATH element ahead of /usr/bin?

I have now:

root@fex:~# type -a btrfs
btrfs is /opt/btrfs-tools/bin/btrfs
btrfs is /usr/bin/btrfs

root@fex:~# /opt/btrfs-tools/bin/btrfs version
btrfs-progs v6.19.1
-EXPERIMENTAL -INJECT +STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin

root@fex:~# /usr/bin/btrfs version
btrfs-progs v6.6.3

root@fex:~# uname -a
Linux fex 6.8.0-110-generic #110-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 15:09:20 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
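
A small Python sketch of the version comparison, in case it helps others script the same check. The helper names `progs_version` and `newer_than` are made up here, and the threshold takes "newer than v6.15" literally, which is an assumption about what was meant. The point is that versions must be compared as numbers: as plain strings, "6.6.3" would sort after "6.15".

```python
import re


def progs_version(version_output):
    """Extract (major, minor, patch) from `btrfs version` output."""
    match = re.search(r"btrfs-progs v(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("no btrfs-progs version found")
    # A missing patch component (e.g. "v6.15") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def newer_than(version_output, minimum=(6, 15, 0)):
    """True if the reported version is strictly newer than `minimum`.

    The default reads "newer than v6.15" literally; use >= instead if
    v6.15 itself is considered good enough.
    """
    return progs_version(version_output) > minimum


# The two version strings shown in the session above.
print(newer_than("btrfs-progs v6.19.1"))
print(newer_than("btrfs-progs v6.6.3"))
```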


And an add-on question:

  Static binaries

  The btrfs.box is an all-in-one tool in the busybox style, the
  functionality is determined by the binary names (either symlink,
  hardlink or a file copy).

I found no list of which names are useful. By trial and error I have:

root@fex:/opt/btrfs-tools/bin# ls -li btrfs btrfstune mkfs.btrfs
3214654 -rwxr-xr-x 4 framstag users 3653384 Mar 18 18:34 btrfs
3214654 -rwxr-xr-x 4 framstag users 3653384 Mar 18 18:34 btrfstune
3214654 -rwxr-xr-x 4 framstag users 3653384 Mar 18 18:34 mkfs.btrfs


-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    https://www.tik.uni-stuttgart.de/
REF:<07db83f8-5769-4aaf-9f54-2711ddad9eea@suse.com>


* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-26 22:18 ` Qu Wenruo
  2026-04-27  0:08   ` Ulli Horlacher
@ 2026-04-27  2:24   ` Christoph Anton Mitterer
  2026-04-27  2:43   ` Thomas Debesse
  2 siblings, 0 replies; 7+ messages in thread
From: Christoph Anton Mitterer @ 2026-04-27  2:24 UTC (permalink / raw)
  To: Qu Wenruo, Thomas Debesse, linux-btrfs

On Mon, 2026-04-27 at 07:48 +0930, Qu Wenruo wrote:
> 
> Unfortunately the btrfs-progs is too old, and possibly the root
> cause.
> There are some fixes in progs v6.15, thus it's strongly recommended
> to 
> use progs newer than v6.15.

Is there a way to verify whether his issue was actually caused by
what's fixed in 6.15?

I mean, now that BGT is the default on fs creation, more people will
probably want to convert their existing fs to it... but if that is
still unstable, there might be quite some breakage.


Cheers,
Chris.


* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-26 22:18 ` Qu Wenruo
  2026-04-27  0:08   ` Ulli Horlacher
  2026-04-27  2:24   ` Christoph Anton Mitterer
@ 2026-04-27  2:43   ` Thomas Debesse
  2026-04-27  5:26     ` Qu Wenruo
  2 siblings, 1 reply; 7+ messages in thread
From: Thomas Debesse @ 2026-04-27  2:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Thomas DEBESSE

On Mon, 27 Apr 2026 at 00:18, Qu Wenruo <wqu@suse.com> wrote:
> Unfortunately, transid mismatch is not reliably repairable.
> I'm afraid restoring from backup is the only reliable solution.

Thank you for your answers. Can the data from the “badly converted”
volume be assumed safe?

The idea is that, instead of overwriting the “badly converted” volume
with a newly created, empty btrfs volume and copying the data
from backup, I could create the new btrfs volume
on brand new devices I may source. I would then have the option to
copy the data from either the backup or the “badly converted” volume.
If the data can be copied safely from the “badly converted” volume (once
mounted with recovery options), I expect copying from the “badly
converted” volume to be way faster than copying from backup.

Another question: if copying from the “badly
converted” volume can be assumed to be safe, would `btrfs send` work,
or would I still have to do it the btrfs-agnostic way, for example by
using rsync?

Thanks in advance for your answers, that helps a lot.

-- 
Thomas “illwieckz” Debesse


* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-27  2:43   ` Thomas Debesse
@ 2026-04-27  5:26     ` Qu Wenruo
  2026-04-27 12:56       ` Thomas Debesse
  0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2026-04-27  5:26 UTC (permalink / raw)
  To: Thomas Debesse, Qu Wenruo; +Cc: linux-btrfs

On 2026/4/27 12:13, Thomas Debesse wrote:
> On Mon, 27 Apr 2026 at 00:18, Qu Wenruo <wqu@suse.com> wrote:
>> Unfortunately, transid mismatch is not reliably repairable.
>> I'm afraid restoring from backup is the only reliable solution.
> 
> Thank you for your answers. Can the data from the “badly converted”
> volume be assumed safe?

If you really want to know that, only a full "btrfs check" can give you 
some clue.

The transid mismatch means the block group tree block is being 
over-written by some other tree blocks.

It can be minor, e.g. over-written by a new bg tree block, or it can 
cause data loss, e.g. over-written by a subvolume tree.

And it can even be worse, if there are multiple tree blocks being 
overwritten.

Your "rescue=all" mount working is a good indication that at least most 
of your subvolume trees are good, but it also disables data checksum 
verification, so there is still a chance that the csum tree is corrupted.

> 
> The idea is that, instead of overwriting the “badly converted” volume
> with a brand newly created and empty btrfs volume and copying the data
> from backup, I may consider the option to create the new btrfs volume
> on brand new devices I may source. I would have then the option to
> copy the data from either the backup or the “badly converted” volume.
> If the data is safely copyable from the “badly converted” volume (once
> mounted with recovery options), I expect copying from the “badly
> converted” volume to be way faster than copying from backup.

In that case, you may want to try this mount option instead:

"-o ro,rescue=ibadroots,nologreplay", which will allow btrfs to skip the 
bad block group tree, but still enables data checksum verification.
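
With data checksum verification left on, a read that hits data with a bad checksum is expected to fail with an I/O error instead of returning wrong bytes. Under that assumption, a rough Python sketch like the one below could be used to triage a read-only mount before copying from it. The function name `find_unreadable_files` is made up, and this is only an illustration, not a replacement for "btrfs check".

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root and collect those that fail.

    Returns a list of (path, errno) pairs. On a mount with checksum
    verification enabled, a failed read is assumed to indicate damaged
    data or metadata for that file.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read to the end in chunks, discarding the data.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failed.append((path, exc.errno))
    return failed
```

Calling `find_unreadable_files("/mnt/rescue")` (a placeholder mount point) reads the whole volume once, so on a volume of this size it will take a long time. An empty result only says that every file could be read, not that the metadata is consistent.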

> 
> Another question: if copying from the “badly
> converted” volume can be assumed to be safe, would `btrfs send` work,
> or would I still have to do it the btrfs-agnostic way, for example by
> using rsync?

I think 'btrfs send' may work, but I haven't verified it yet.

> 
> Thanks in advance for your answers, that helps a lot.
> 

And I forgot to ask, when did the corruption happen?
Immediately after the conversion finished, or the first mount worked, 
but corruption happened later?

If immediately after a finished conversion, then it's very possible the 
conversion is the cause.

If not, then it may be something else, like the older non-upstream LTS 
kernel from Ubuntu.

The reason I'm asking is, although there are some fixes in progs v6.15, 
they are all addressing problems related to resuming interrupted conversion.
Thus I'm wondering what is the root cause of the transid mismatch.

Thanks,
Qu


* Re: corruption after block-group-tree conversion, how to recover? (data is readable)
  2026-04-27  5:26     ` Qu Wenruo
@ 2026-04-27 12:56       ` Thomas Debesse
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Debesse @ 2026-04-27 12:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, Thomas DEBESSE

On Mon, 27 Apr 2026 at 07:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2026/4/27 12:13, Thomas Debesse wrote:
> > Thank you for your answers. Can the data from the “badly converted”
> > volume be assumed safe?
>
> If you really want to know that, only a full "btrfs check" can give you
> some clue.

OK

> In that case, you may want to try this mount option instead:
>
> "-o ro,rescue=ibadroots,nologreplay", which will allow btrfs to skip the
> bad block group tree, but still enables data checksum verification.

I confirm that `-o ro,rescue=ibadroots,rescue=nologreplay` can still
mount the volume, thanks.
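
For scripting this, a small Python sketch that only builds the command line and does not run anything. The device and mount point are placeholders, and `rescue_mount_argv` is a made-up helper name. The option string it produces is the one confirmed above.

```python
import shlex


def rescue_mount_argv(device, mountpoint, rescue_modes=("ibadroots", "nologreplay")):
    """Build the argv for a read-only rescue mount, without executing it."""
    # "ro" first, then one rescue=<mode> entry per requested mode.
    options = ["ro"] + [f"rescue={mode}" for mode in rescue_modes]
    return ["mount", "-o", ",".join(options), device, mountpoint]


# Placeholder device and mount point, not the ones from this thread.
argv = rescue_mount_argv("/dev/mapper/example", "/mnt/rescue")
print(shlex.join(argv))
```

Keeping it as an argument list makes it easy to review the exact options before passing them to something like `subprocess.run`.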

> And I forgot to ask, when did the corruption happen?
> Immediately after the conversion finished, or the first mount worked,
> but corruption happened later?

It happened immediately after the conversion. I converted, rebooted,
and first mount did not work.

> If immediately after a finished conversion, then it's very possible the
> conversion is the cause.

My own immediate thought was the conversion is the cause, because it
happened right after it.

> The reason I'm asking is, although there are some fixes in progs v6.15,
> they are all addressing problems related to resuming interrupted conversion.
> Thus I'm wondering what is the root cause of the transid mismatch.

On IRC some people suggested that this is likely because the mkfs.btrfs used
for creating the volume (btrfs-progs 5.10.1 from Ubuntu 23.04 in
September 2023) didn't enable the no-holes feature, which was supposedly
made the default in btrfs-progs 5.15, and the btrfstune used for
converting the volume (btrfs-progs 6.6.3 from Ubuntu 24.04) didn't
check that the no-holes feature was present before converting, although it
would have been required to be there first… That missing check (and
the tool doing the job when it should not have) would have been the
bug that hit me.

It was also said that free-space-tree was required, but I had already
successfully converted to free-space-tree months ago, so the culprit
is very likely the missing no-holes feature…
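
To make that kind of pre-check concrete, here is a hedged Python sketch that looks for the two features in superblock dump text of the kind printed by `btrfs inspect-internal dump-super`. The sample text is hypothetical (it is not from my volume), the layout it assumes for the flag lines may differ between btrfs-progs versions, and the helper names are made up. Whether these two features really are the prerequisites is what was said on IRC, not something I have verified.

```python
import re


def superblock_flags(dump_super_text, field):
    """Return the set of flag names listed in parentheses after `field`."""
    match = re.search(
        rf"^{re.escape(field)}\s+0x[0-9a-fA-F]+\s*\(([^)]*)\)",
        dump_super_text,
        re.MULTILINE,
    )
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split("|") if name.strip()}


def missing_prerequisites(dump_super_text):
    """List the features (as named on IRC) that the dump does not show."""
    incompat = superblock_flags(dump_super_text, "incompat_flags")
    compat_ro = superblock_flags(dump_super_text, "compat_ro_flags")
    missing = []
    if "NO_HOLES" not in incompat:
        missing.append("no-holes")
    if "FREE_SPACE_TREE" not in compat_ro:
        missing.append("free-space-tree")
    return missing


# Hypothetical dump excerpt for illustration only: free-space-tree is set,
# NO_HOLES is absent.
SAMPLE_DUMP = """\
compat_ro_flags         0x3
                        ( FREE_SPACE_TREE |
                          FREE_SPACE_TREE_VALID )
incompat_flags          0x161
                        ( MIXED_BACKREF |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
"""

print(missing_prerequisites(SAMPLE_DUMP))
```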

Some people also said that they experienced problems with conversion
but don't experience them when zeroing the log before conversion.
In my case, though, the umount and shutdown before conversion were supposedly
done correctly, so there should have been no log.

Thanks for your answers.

-- 
Thomas “illwieckz” Debesse
