Btrfs keeps getting corrupted

* Btrfs keeps getting corrupted
From: Roman Mamedov @ 2024-09-15 19:45 UTC
  To: linux-btrfs

Hello,

I have a Btrfs filesystem that keeps getting corrupted for some reason.

The setup is a 4-disk external enclosure connected via USB3. It is powered-on
and off as needed.

In it there's one 12TB Seagate Exos HDD that admittedly has a small number of
reallocated sectors, but there are no I/O errors at any time during
operation (so those sectors do not seem to be hit and there are no new ones).

On the HDD there's a LUKS partition, and inside LUKS there's a Btrfs
filesystem.

The workflow is power-on, luks-open, mount, rsync, unmount, luks-close,
power-off.

On the previous attempt to use this, the FS was starting to go read-only on
accessing some recently-copied files, and there were "transid verify failed"
errors in dmesg. I wrote that off as perhaps not syncing, unmounting and
closing everything off correctly before power-off.

Modified my scripts to do a "sync" before every step in the power-off
sequence. Reformatted from scratch, copied all data again, and turned it off.

Next time, a few weeks later, I try to do another rsync, and this time it
doesn't even mount:

[248942.223437] BTRFS: device label sea12.k4e devid 1 transid 725 /dev/dm-26 scanned by (udev-worker) (5328)
[248942.267427] BTRFS info (device dm-26): first mount of filesystem 4071aeab-ccab-4b36-901f-38fd38e4ef41
[248942.267441] BTRFS info (device dm-26): using crc32c (crc32c-intel) checksum algorithm
[248942.267446] BTRFS info (device dm-26): use zstd compression, level 3
[248942.267448] BTRFS info (device dm-26): using free space tree
[248942.358145] BTRFS error (device dm-26): level verify failed on logical 1053650288640 mirror 1 wanted 3 found 0
[248942.388148] BTRFS error (device dm-26): level verify failed on logical 1053650288640 mirror 2 wanted 3 found 0
[248942.396897] BTRFS error (device dm-26: state C): failed to load root csum
[248942.408461] BTRFS error (device dm-26: state C): open_ctree failed

btrfsck:

Opening filesystem to check...
parent transid verify failed on 1053650288640 wanted 723 found 110
parent transid verify failed on 1053650288640 wanted 723 found 110
parent transid verify failed on 1053650288640 wanted 723 found 110
Ignoring transid failure
ERROR: root [7 0] level 0 does not match 3

ERROR: could not setup csum tree
ERROR: cannot open file system

===

Such a high disparity in transid mismatch, flush is not working somewhere? But
I specifically do "sync" even multiple times now, before unmounting and after.
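
For reference, the size of that disparity can be pulled straight out of the
btrfsck output. The following is only a sketch: the function name
`transid_gaps` and the regular expression are illustrative and not part of
btrfs-progs. It reports, per logical address, the generation the parent
expected, the generation actually found on disk, and the difference.

```python
import re

# Matches lines such as:
#   parent transid verify failed on 1053650288640 wanted 723 found 110
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_gaps(log_text):
    """Return {logical_address: (wanted, found, gap)} for each distinct block."""
    gaps = {}
    for match in TRANSID_RE.finditer(log_text):
        logical, wanted, found = (int(group) for group in match.groups())
        gaps[logical] = (wanted, found, wanted - found)
    return gaps


sample = """\
parent transid verify failed on 1053650288640 wanted 723 found 110
parent transid verify failed on 1053650288640 wanted 723 found 110
parent transid verify failed on 1053650288640 wanted 723 found 110
"""

print(transid_gaps(sample))
# → {1053650288640: (723, 110, 613)}
```

A gap of 613 generations means the block found at that address was last
written long before the transaction that the parent refers to.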

How can I figure out what is to blame here, is it the enclosure, is it USB,
LUKS, Btrfs, or some fundamental bug involving a combination of these?

Or maybe the drive is faulty in some mysterious way and storing/returning old
data instead of IO errors or sector reallocation.

-- 
With respect,
Roman


* Re: Btrfs keeps getting corrupted
From: Qu Wenruo @ 2024-09-15 21:29 UTC
  To: Roman Mamedov, linux-btrfs



On 2024/9/16 05:15, Roman Mamedov wrote:
> Hello,
>
> I have a Btrfs filesystem that keeps getting corrupted for some reason.
>
> The setup is a 4-disk external enclosure connected via USB3. It is powered-on
> and off as needed.
>
> In it there's one 12TB Seagate Exos HDD that admittedly has a small number of
> reallocated sectors, but there are no I/O errors at any time during
> operation (so those sectors do not seem to be hit and there are no new ones).
>
> On the HDD there's a LUKS partition, and inside LUKS there's a Btrfs
> filesystem.
>
> The workflow is power-on, luks-open, mount, rsync, unmount, luks-close,
> power-off.
>
> On the previous attempt to use this, the FS was starting to go read-only on
> accessing some recently-copied files, and there were "transid verify failed"
> errors in dmesg. I wrote that off as perhaps not syncing, unmounting and
> closing everything off correctly before power-off.
>
> Modified my scripts to do a "sync" before every step in the power-off
> sequence. Reformatted from scratch, copied all data again, and turned it off.
>
> Next time, a few weeks later, I try to do another rsync, and this time it
> doesn't even mount:
>
> [248942.223437] BTRFS: device label sea12.k4e devid 1 transid 725 /dev/dm-26 scanned by (udev-worker) (5328)
> [248942.267427] BTRFS info (device dm-26): first mount of filesystem 4071aeab-ccab-4b36-901f-38fd38e4ef41
> [248942.267441] BTRFS info (device dm-26): using crc32c (crc32c-intel) checksum algorithm
> [248942.267446] BTRFS info (device dm-26): use zstd compression, level 3
> [248942.267448] BTRFS info (device dm-26): using free space tree
> [248942.358145] BTRFS error (device dm-26): level verify failed on logical 1053650288640 mirror 1 wanted 3 found 0
> [248942.388148] BTRFS error (device dm-26): level verify failed on logical 1053650288640 mirror 2 wanted 3 found 0

This is definitely something wrong with the flush behavior.

The level check happens after bytenr check, so at least it's not full
garbage.

> [248942.396897] BTRFS error (device dm-26: state C): failed to load root csum
> [248942.408461] BTRFS error (device dm-26: state C): open_ctree failed
>
> btrfsck:
>
> Opening filesystem to check...
> parent transid verify failed on 1053650288640 wanted 723 found 110
> parent transid verify failed on 1053650288640 wanted 723 found 110
> parent transid verify failed on 1053650288640 wanted 723 found 110
> Ignoring transid failure
> ERROR: root [7 0] level 0 does not match 3
>
> ERROR: could not setup csum tree
> ERROR: cannot open file system
>
> ===
>
> Such a high disparity in transid mismatch, flush is not working somewhere? But
> I specifically do "sync" even multiple times now, before unmounting and after.

A manual sync still relies on FLUSH. If FLUSH is not working somewhere in the
lower storage stack (anywhere from LUKS down to your SSD/HDD firmware), sync
won't save you.

>
> How can I figure out what is to blame here, is it the enclosure, is it USB,
> LUKS, Btrfs, or some fundamental bug involving a combination of these?

In that case, you may want to provide your kernel version first (to rule
out known bugs or too old kernels), then reduce the depth of the storage
stack, aka, running btrfs directly on that device as a test.

I do not believe it's the LUKS/device mapper layer, but just in case.

If btrfs on the raw device works fine, then you may focus on the
LUKS/device-mapper layer.

Thanks,
Qu

>
> Or maybe the drive is faulty in some mysterious way and storing/returning old
> data instead of IO errors or sector reallocation.
>


* Re: Btrfs keeps getting corrupted
From: Roman Mamedov @ 2024-09-15 22:31 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, 16 Sep 2024 06:59:54 +0930
Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> > Such a high disparity in transid mismatch, flush is not working somewhere? But
> > I specifically do "sync" even multiple times now, before unmounting and after.
> 
> A manual sync still relies on FLUSH. If FLUSH is not working somewhere in the
> lower storage stack (anywhere from LUKS down to your SSD/HDD firmware), sync
> won't save you.

I always kept in mind that 'sync' was the key to ensure all data has been
moved from RAM to the HDD. But I now realize that I missed that there's also a
buffer in the HDD, which also needs to be flushed to disk. It could be that I
power-off the device before it manages to do that. Also it is an SMR HDD, so
it might need to do housekeeping with the data on-disk as well.

One idea I have is to send the drive to sleep (hdparm -Y) before powering off
from now on. Hopefully that makes it flush before sleeping.
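
The resulting power-off sequence could be sketched as below. This is a
dry-run builder only: it prints the commands rather than running them. The
mount point, mapper name and device path are hypothetical placeholders, and
whether `hdparm -Y` actually reaches the drive through a USB bridge, or forces
the drive to flush its cache, is an assumption that would need to be tested on
the real enclosure.

```python
import shlex


def power_off_sequence(mount_point, mapper_name, disk):
    """Build (but do not run) the commands for the unmount-to-sleep sequence.

    A "sync" is placed before every step, as described in the thread.
    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        ["sync"],
        ["umount", mount_point],
        ["sync"],
        ["cryptsetup", "close", mapper_name],
        ["sync"],
        ["hdparm", "-Y", disk],  # put the drive to sleep before cutting power
    ]


commands = power_off_sequence("/mnt/backup", "backup_crypt", "/dev/sdX")
for command in commands:
    print(shlex.join(command))
```

Keeping the sequence as data makes it easy to log each step and to stop at the
first failure before the enclosure is powered off.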

> > How can I figure out what is to blame here, is it the enclosure, is it USB,
> > LUKS, Btrfs, or some fundamental bug involving a combination of these?
> 
> In that case, you may want to provide your kernel version first (to rule
> out known bugs or too old kernels), then reduce the depth of the storage
> stack, aka, running btrfs directly on that device as a test.

The kernel is from the 6.6 series. I cannot say exactly which release, but it
was at most 5-10 point releases older than the latest both times the issue
occurred.

-- 
With respect,
Roman


* Re: Btrfs keeps getting corrupted
From: Qu Wenruo @ 2024-09-15 23:38 UTC
  To: Roman Mamedov, Qu Wenruo; +Cc: linux-btrfs

On 2024/9/16 08:01, Roman Mamedov wrote:
> On Mon, 16 Sep 2024 06:59:54 +0930
> Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
>>> Such a high disparity in transid mismatch, flush is not working somewhere? But
>>> I specifically do "sync" even multiple times now, before unmounting and after.
>>
>> A manual sync still relies on FLUSH. If FLUSH is not working somewhere in the
>> lower storage stack (anywhere from LUKS down to your SSD/HDD firmware), sync
>> won't save you.
> 
> I always kept in mind that 'sync' was the key to ensure all data has been
> moved from RAM to the HDD. But I now realize that I missed that there's also a
> buffer in the HDD, which also needs to be flushed to disk. It could be that I
> power-off the device before it manages to do that. Also it is an SMR HDD, so
> it might need to do housekeeping with the data on-disk as well.

The point of the FLUSH command is that, for any correctly behaving firmware,
the device should only report the command as done AFTER all data has been
written to non-volatile storage (which can be the spinning platters of an HDD,
NAND, or even battery-backed RAM).

So if the device reports FLUSH as done when in fact it is not, then that is a
big problem somewhere in the stack: the firmware of the disk, the USB-to-ATA
converter, the device-mapper layer (remember, each dm device also needs to
handle the FLUSH command), etc.

Cheating on FLUSH is not in itself that big a deal, though. I rely on that
behavior all day for all my testing VMs, and AFAIK VMware and VBox also
cheat like this by default.

The problem only shows up when a power loss or crash happens: the cache is
dropped (breaking the required write sequence) and the btrfs filesystem ends
up corrupted, since btrfs relies heavily on correct FLUSH behavior to
implement CoW and protect its metadata.
(I still remember a lot of users reporting btrfs corruption with VMware/VBox.)
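
To illustrate why the ordering matters, here is a toy model, not btrfs code.
`ToyDisk`, `cow_commit` and `is_consistent` are made-up names, and the commit
is heavily simplified: write the new tree block, FLUSH, write a superblock
pointing at it, FLUSH again. The disk is allowed to write cached blocks back
in any order it likes, which is what a real write cache may do.

```python
class ToyDisk:
    """A disk with a volatile write cache. Purely illustrative."""

    def __init__(self, honest_flush):
        self.honest_flush = honest_flush
        self.media = {}  # non-volatile storage
        self.cache = {}  # volatile cache, lost on power cut

    def write(self, block, data):
        self.cache[block] = data

    def flush(self):
        # An honest device persists its cache before acknowledging FLUSH.
        # A cheating device acknowledges without persisting anything.
        if self.honest_flush:
            self.media.update(self.cache)
            self.cache.clear()

    def background_writeback(self, block):
        # The device may persist cached blocks on its own, in any order.
        if block in self.cache:
            self.media[block] = self.cache.pop(block)

    def power_cut(self):
        self.cache.clear()


def cow_commit(disk, generation):
    """Simplified CoW commit: tree block, FLUSH, superblock, FLUSH."""
    tree_block = f"tree@{generation}"
    disk.write(tree_block, {"generation": generation})
    disk.flush()  # the tree must be durable before the superblock points at it
    disk.write("super", {"root": tree_block, "generation": generation})
    disk.flush()


def is_consistent(disk):
    """True if the on-media superblock points at a tree block that exists."""
    superblock = disk.media.get("super")
    if superblock is None:
        return True  # nothing was ever committed
    root = disk.media.get(superblock["root"])
    return root is not None and root["generation"] == superblock["generation"]


def survives_power_cut(honest_flush):
    disk = ToyDisk(honest_flush)
    cow_commit(disk, generation=1)
    disk.background_writeback("super")  # superblock happens to land first
    disk.power_cut()
    return is_consistent(disk)


print(survives_power_cut(honest_flush=True), survives_power_cut(honest_flush=False))
# → True False
```

With an honest FLUSH the superblock can never reach the media before the tree
it references. With a cheating FLUSH nothing enforces that order, so a power
cut can leave a superblock pointing at a block that was never written.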

> 
> One idea I got is sending drive to sleep (hdparm -Y) before calling power-off
> now. Hopefully that makes it flush before sleep.
> 
>>> How can I figure out what is to blame here, is it the enclosure, is it USB,
>>> LUKS, Btrfs, or some fundamental bug involving a combination of these?
>>
>> In that case, you may want to provide your kernel version first (to rule
>> out known bugs or too old kernels), then reduce the depth of the storage
>> stack, aka, running btrfs directly on that device as a test.
> 
> The kernel is 6.6 series, cannot say exact, but was at most 5-10 point
> releases older than latest, both of the times the issue occurred.
> 

Then I do not think it's the dm layer.

Thanks,
Qu
