public inbox for linux-btrfs@vger.kernel.org
* mounting causes errors after power loss
From: Kyle Smith @ 2024-02-15 20:04 UTC
  To: linux-btrfs

Hello,

I have noticed the occasional btrfs error after a hard power cycle and
wanted to get a better understanding of the issue. These errors only
happen after the btrfs partition is mounted, and running "btrfs check"
before mounting does not find any errors.

I am using btrfs on Linux 5.10.176 on an encrypted LUKS2 partition on
an eMMC device. The LUKS2 partition is configured to allow-discards
and btrfs is mounted with "-o acl,noatime,nodiratime,compress=lzo".

# uname -a
Linux (none) 5.10.176 #0 SMP PREEMPT Thu Apr 27 20:28:15 2023 aarch64 GNU/Linux

# btrfs --version
btrfs-progs v6.0.1

# btrfs fi show
Label: none  uuid: d90b7698-7ef5-4c1e-8365-b7631a6eafba
    Total devices 1 FS bytes used 92.16MiB
    devid    1 size 2.53GiB used 808.00MiB path /dev/mapper/luks-part

# mount -t btrfs -o acl,noatime,nodiratime,compress=lzo /dev/mapper/luks-part /mnt/btrfs
[  185.443505] BTRFS: device fsid d90b7698-7ef5-4c1e-8365-b7631a6eafba devid 1 transid 17201265 /dev/mapper/luks-part scanned by mount (2976)
[  185.455314] BTRFS info (device dm-0): flagging fs with big metadata feature
[  185.461689] BTRFS info (device dm-0): use lzo compression, level 0
[  185.467924] BTRFS info (device dm-0): using free space tree
[  185.473563] BTRFS info (device dm-0): has skinny extents
[  185.486490] BTRFS info (device dm-0): enabling ssd optimizations

# btrfs fi df /mnt/btrfs
Data, single: total=280.00MiB, used=91.46MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=704.00KiB
GlobalReserve, single: total=3.25MiB, used=0.00B

Here is an example of the errors found by "btrfs check" after
mounting. These errors don't happen often but they are reproducible
and persistent.

# btrfs check --mode=lowmem --readonly -p /dev/mapper/luks-part
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-part
UUID: d90b7698-7ef5-4c1e-8365-b7631a6eafba
[1/7] checking root items                      (0:00:00 elapsed, 1456 items checked)
[2/7] checking extents                         (0:00:01 elapsed, 42 items checked)
[3/7] checking free space tree                 (0:00:00 elapsed, 5 items checked)
ERROR: root 5 INODE_ITEM[27535265] index 55000957 name .sharedContents filetype 1 missing
ERROR: root 5 INODE_ITEM[27535266] index 55000959 name .sharedContents filetype 1 missing
ERROR: root 5 DIR INODE [256] size 668 not equal to 698
[4/7] checking fs roots                        (0:00:00 elapsed, 15
items checked)
ERROR: errors found in fs roots
found 96636928 bytes used, error(s) found
total csum bytes: 93652
total tree bytes: 737280
total fs tree bytes: 376832
total extent tree bytes: 147456
btree space waste bytes: 231395
file data blocks allocated: 95899648
 referenced 92807168
Command exited with non-zero status 1

# btrfs check --readonly -p /dev/mapper/luks-part
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-part
UUID: d90b7698-7ef5-4c1e-8365-b7631a6eafba
[1/7] checking root items                      (0:00:00 elapsed, 1456 items checked)
[2/7] checking extents                         (0:00:00 elapsed, 54 items checked)
[3/7] checking free space tree                 (0:00:00 elapsed, 5 items checked)
root 5 inode 256 errors 200, dir isize wrong   (0:00:00 elapsed, 1 items checked)
root 5 inode 27535265 errors 1, no inode item
    unresolved ref dir 256 index 55000957 namelen 15 name .sharedContents filetype 1 errors 5, no dir item, no inode ref
root 5 inode 27535266 errors 1, no inode item
    unresolved ref dir 256 index 55000959 namelen 15 name .sharedContents filetype 1 errors 5, no dir item, no inode ref
[4/7] checking fs roots                        (0:00:00 elapsed, 22 items checked)
ERROR: errors found in fs roots
found 96636928 bytes used, error(s) found
total csum bytes: 93652
total tree bytes: 737280
total fs tree bytes: 376832
total extent tree bytes: 147456
btree space waste bytes: 231395
file data blocks allocated: 95899648
 referenced 92807168
Command exited with non-zero status 1
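
For anyone who wants to script this, here is a minimal sketch that pulls the
affected inode numbers out of saved lowmem-mode output. The helper name
missing_inodes is hypothetical, and the pattern assumes the
"ERROR: root N INODE_ITEM[...] index ... name ..." line format shown above;
other btrfs-progs versions may word it differently.

```shell
# Hypothetical helper: list the inode numbers that
# "btrfs check --mode=lowmem" reports as missing.
# Reads saved check output on stdin, prints one inode number per line.
missing_inodes() {
    sed -n 's/^ERROR: root [0-9]* INODE_ITEM\[\([0-9]*\)\] index [0-9]* name .*/\1/p' |
        sort -u
}

# Example (device path as used in this thread):
#   btrfs check --mode=lowmem --readonly /dev/mapper/luks-part 2>&1 | missing_inodes
```

The resulting numbers are the ones searched for in the dump-tree output.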

Here is the "btrfs ins dump-tree" output of the above inodes.

# btrfs ins dump-tree -t 5 /dev/mapper/luks-part | grep -A5 "(27535265 "
        location key (27535265 INODE_ITEM 0) type FILE
        transid 17119099 data_len 0 name_len 15
        name: .sharedContents
    item 62 key (256 DIR_INDEX 55000959) itemoff 13593 itemsize 45
        location key (27535266 INODE_ITEM 0) type FILE
        transid 17119099 data_len 0 name_len 15
# btrfs ins dump-tree -t 5 /dev/mapper/luks-part | grep -A5 "(27535266 "
        location key (27535266 INODE_ITEM 0) type FILE
        transid 17119099 data_len 0 name_len 15
        name: .sharedContents
    item 63 key (256 DIR_INDEX 55415388) itemoff 13545 itemsize 48
        location key (27743503 INODE_ITEM 0) type FILE
        transid 17188721 data_len 0 name_len 18

Is this a known issue with btrfs and power loss? Running "btrfs check
--repair" can fix this issue but I would like to prevent it in the
first place. This issue looks similar to the one in a previous message
on this list, "Filesystem inconsistency on power cycle" [0].


Thank you,
Kyle

[0]: https://lore.kernel.org/linux-btrfs/CA+XNQ=ixcfB1_CXHf5azsB4gX87vvdmei+fxv5dj4K_4=H1=ag@mail.gmail.com/


* Re: mounting causes errors after power loss
From: Qu Wenruo @ 2024-02-15 23:23 UTC
  To: Kyle Smith, linux-btrfs



On 2024/2/16 06:34, Kyle Smith wrote:
[...]
> ERROR: root 5 INODE_ITEM[27535265] index 55000957 name .sharedContents
> filetype 1 missing
> ERROR: root 5 INODE_ITEM[27535266] index 55000959 name .sharedContents
> filetype 1 missing
> ERROR: root 5 DIR INODE [256] size 668 not equal to 698

Those are all fixable by the latest btrfs-progs, so no big deal.

Furthermore, this is not caused by the power loss itself; it looks more
like an older btrfs bug, or possibly even a memory bitflip (that needs
extra debugging to confirm).

In any case, it's recommended to use a kernel newer than v5.11, which in
practice means going to at least 5.15.
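
A minimal sketch of checking a device against that recommendation is below.
The helper name kernel_at_least is hypothetical, and it only compares the
major and minor numbers of a release string such as the one printed by
"uname -r".

```shell
# Hypothetical helper: succeed if a kernel release string is at least
# MAJOR.MINOR.
#   $1 = release string, e.g. "5.10.176"
#   $2 = wanted major, $3 = wanted minor
kernel_at_least() {
    kal_major=${1%%.*}              # text before the first dot
    kal_rest=${1#*.}                # text after the first dot
    kal_minor=${kal_rest%%[!0-9]*}  # leading digits of the remainder
    [ "$kal_major" -gt "$2" ] ||
        { [ "$kal_major" -eq "$2" ] && [ "$kal_minor" -ge "$3" ]; }
}

# Example:
#   kernel_at_least "$(uname -r)" 5 15 || echo "kernel is older than 5.15"
```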

[...]
> Here is the "btrfs ins dump-tree" output of the above inodes.
>
> # btrfs ins dump-tree -t 5 /dev/mapper/luks-part | grep -A5 "(27535265 "
>          location key (27535265 INODE_ITEM 0) type FILE
>          transid 17119099 data_len 0 name_len 15
>          name: .sharedContents
>      item 62 key (256 DIR_INDEX 55000959) itemoff 13593 itemsize 45
>          location key (27535266 INODE_ITEM 0) type FILE
>          transid 17119099 data_len 0 name_len 15
> # btrfs ins dump-tree -t 5 /dev/mapper/luks-part | grep -A5 "(27535266 "
>          location key (27535266 INODE_ITEM 0) type FILE
>          transid 17119099 data_len 0 name_len 15
>          name: .sharedContents
>      item 63 key (256 DIR_INDEX 55415388) itemoff 13545 itemsize 48
>          location key (27743503 INODE_ITEM 0) type FILE
>          transid 17188721 data_len 0 name_len 18

Unfortunately the dump is not enough to confirm anything.

Please try the following ones:

# btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535265
DIR_INDEX 55000957)"

# btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535266
DIR_INDEX 55000959)"

After the direct match, there would be a line like:

	location key (XXXX INODE_ITEM 0) type XXX

Use that key to do such search again.

>
> Is this a known issue with btrfs and power loss? Running "btrfs check
> --repair" can fix this issue but I would like to prevent it in the
> first place. This issue looks similar to the one in a previous message
> on this list, "Filesystem inconsistency on power cycle" [0].

A power loss only causes problems if your storage does not handle flush
properly (VBox and VMware seem to have that problem).
And if any layer of your storage (from the LUKS layer down to the disk
firmware) does not flush correctly, the result is a transid mismatch,
which is not the symptom you are seeing.
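
To look for that symptom, a kernel log can be scanned roughly as sketched
here. The helper name has_transid_errors is hypothetical, and the matched
wording ("transid verify failed") is an assumption about how the kernel
usually phrases the message, so adjust the pattern if your kernel words it
differently.

```shell
# Hypothetical helper: succeed if a kernel log read from stdin contains
# btrfs transid verification failures.
# The message wording matched here is an assumption, not taken from this thread.
has_transid_errors() {
    grep -q -i -E 'BTRFS.*transid verify failed'
}

# Example:
#   dmesg | has_transid_errors && echo "transid errors were logged"
```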

Your case looks completely unrelated to that, but I'd like more dumps to
make sure it's not some weird memory bitflip.

Thanks,
Qu



* Re: mounting causes errors after power loss
From: Kyle Smith @ 2024-02-16  0:21 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Feb 15, 2024 at 3:23 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>
> Those are all fixable by the latest btrfs-progs, so no big deal.
>
> Furthermore, this is not caused by the power loss itself; it looks more
> like an older btrfs bug, or possibly even a memory bitflip (that needs
> extra debugging to confirm).
>
> In any case, it's recommended to use a kernel newer than v5.11, which in
> practice means going to at least 5.15.

I'm currently using OpenWrt 22.03.5 which uses the 5.10 kernel, and I
am eventually going to move to OpenWrt 23.05 with the 5.15 kernel. In
the meantime, are there any btrfs patches that I should backport to
the 5.10 kernel? Is there any problem upgrading the kernel from 5.10
to 5.15 while btrfs has these errors? Would upgrading alone be enough
to fix these errors or is a "btrfs check --repair" required?

OpenWrt also provides btrfs-progs 6.0.1. Is this version new enough to
safely and reliably fix these errors? "btrfs check --repair" has been
successful every time.

Please provide the debug steps to check for memory bitflips. The
system has been very stable so while I don't think this is a memory
issue it would be good to rule it out.

[...]
>
> Unfortunately the dump is not enough to confirm anything.
>
> Please try the following ones:
>
> # btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535265
> DIR_INDEX 55000957)"
>
> # btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535266
> DIR_INDEX 55000959)"
>
> After the direct match, there would be a line like:
>
>         location key (XXXX INODE_ITEM 0) type XXX
>
> Use that key to do such search again.

I wasn't able to find the "(27535265 DIR_INDEX 55000957)" or
"(27535266 DIR_INDEX 55000959)" strings in the dump. Here are the
lines matching any of those values. I get the same output with "-t 5"
or with the option removed. "-t" alone was throwing an error.

# btrfs ins dump-tree /dev/mapper/luks-part | grep -A3 -E "27535265|55000957|27535266|55000959"
    item 61 key (256 DIR_INDEX 55000957) itemoff 13638 itemsize 45
        location key (27535265 INODE_ITEM 0) type FILE
        transid 17119099 data_len 0 name_len 15
        name: .sharedContents
    item 62 key (256 DIR_INDEX 55000959) itemoff 13593 itemsize 45
        location key (27535266 INODE_ITEM 0) type FILE
        transid 17119099 data_len 0 name_len 15
        name: .sharedContents
    item 63 key (256 DIR_INDEX 55415388) itemoff 13545 itemsize 48

I see these two "location key" lines but no new key values to search
for. Should I be looking for something else?

        location key (27535265 INODE_ITEM 0) type FILE
        location key (27535266 INODE_ITEM 0) type FILE

> >
> > Is this a known issue with btrfs and power loss? Running "btrfs check
> > --repair" can fix this issue but I would like to prevent it in the
> > first place. This issue looks similar to the one in a previous message
> > on this list, "Filesystem inconsistency on power cycle" [0].
>
> A power loss only causes problems if your storage does not handle flush
> properly (VBox and VMware seem to have that problem).
> And if any layer of your storage (from the LUKS layer down to the disk
> firmware) does not flush correctly, the result is a transid mismatch,
> which is not the symptom you are seeing.
>
> Your case looks completely unrelated to that, but I'd like more dumps to
> make sure it's not some weird memory bitflip.

This is good to know. Can I rule out the lower LUKS layer and the disk
firmware since I'm not seeing a transid mismatch? These btrfs errors
are the only problems I've had with LUKS2 on eMMC.

Please let me know about backporting any relevant btrfs patches or
debugging a possible memory bitflip.

Thank you for your quick help.


Kyle



* Re: mounting causes errors after power loss
From: Qu Wenruo @ 2024-02-16  0:49 UTC
  To: Kyle Smith, fdmanana; +Cc: linux-btrfs



On 2024/2/16 10:51, Kyle Smith wrote:
> On Thu, Feb 15, 2024 at 3:23 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
[...]
>>
>> Those are all fixable by the latest btrfs-progs, so no big deal.
>>
>> Furthermore, this is not caused by the power loss itself; it looks more
>> like an older btrfs bug, or possibly even a memory bitflip (that needs
>> extra debugging to confirm).
>>
>> In any case, it's recommended to use a kernel newer than v5.11, which in
>> practice means going to at least 5.15.
>
> I'm currently using OpenWrt 22.03.5 which uses the 5.10 kernel, and I
> am eventually going to move to OpenWrt 23.05 with the 5.15 kernel. In
> the meantime, are there any btrfs patches that I should backport to
> the 5.10 kernel?

I don't think so, unless you want to backport all the tree-checker code
to the 5.10 kernel.

> Is there any problem upgrading the kernel from 5.10
> to 5.15 while btrfs has these errors?

Still hard to say. See my reply about the dump below.

> Would upgrading alone be enough
> to fix these errors or is a "btrfs check --repair" required?

--repair is required. The corruption is already on disk.
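
A rough sketch of that sequence is below, not a tested procedure. The
function names run and repair_fs are hypothetical, the device and mount
point are the ones quoted earlier in this thread, and by default it only
prints the commands (set DRY_RUN=0 to actually execute them).

```shell
# Sketch only: DEV and MNT default to the paths used earlier in this thread.
DEV=${DEV:-/dev/mapper/luks-part}
MNT=${MNT:-/mnt/btrfs}

# Print each command; execute it only when DRY_RUN is set to something
# other than 1 (the default is a dry run).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

repair_fs() {
    run umount "$MNT" || return 1              # btrfs check needs an unmounted fs
    run btrfs check --readonly "$DEV"          # record the errors (non-zero exit expected)
    run btrfs check --repair "$DEV" || return 1
    run btrfs check --readonly "$DEV"          # confirm the errors are gone
}

# Example:
#   repair_fs               # dry run, prints the four commands
#   DRY_RUN=0 repair_fs     # really unmount, repair and re-check
```

Keeping the read-only check before and after the repair gives a record of
what was fixed.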

>
> OpenWrt also provides btrfs-progs 6.0.1. Is this version new enough to
> safely and reliably fix these errors? "btrfs check --repair" has been
> successful everytime.

I believe it should be fine.

>
> Please provide the debug steps to check for memory bitflips. The
> system has been very stable so while I don't think this is a memory
> issue it would be good to rule it out.

If it's x86_64 based, you can try a UEFI payload like memtest86+.

If not, you can use the memtester program.
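
As a rough sketch for the memtester route: the helper below (hypothetical
name memtest_size_mib) picks half of MemAvailable from /proc/meminfo as the
amount to test, which is only an assumption about a reasonable size on a
small device.

```shell
# Hypothetical helper: print half of MemAvailable in MiB, read from a
# /proc/meminfo-style file (default: /proc/meminfo, values in kB).
memtest_size_mib() {
    awk '/^MemAvailable:/ { print int($2 / 2048) }' "${1:-/proc/meminfo}"
}

# Example: one pass over that much memory (run as root so the memory can
# be locked):
#   memtester "$(memtest_size_mib)M" 1
```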

>
[...]
>>
>> Unfortunately the dump is not enough to confirm anything.
>>
>> Please try the following ones:
>>
>> # btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535265
>> DIR_INDEX 55000957)"
>>
>> # btrfs ins dump-tree -t /dev/mapper/luks-part | grep -A5 "(27535266
>> DIR_INDEX 55000959)"
>>
>> After the direct match, there would be a line like:
>>
>>          location key (XXXX INODE_ITEM 0) type XXX
>>
>> Use that key to do such search again.
>
> I wasn't able to find the  "(27535265 DIR_INDEX 55000957)" or
> "(27535266 DIR_INDEX 55000959)"  strings in the dump. Here are the
> lines matching any of those values. I get the same output with "-t 5"
> or just removing the option. "-t" alone was throwing an error.
>
> # btrfs ins dump-tree /dev/mapper/luks-part | grep -A3 -E
> "27535265|55000957|27535266|55000959"
>      item 61 key (256 DIR_INDEX 55000957) itemoff 13638 itemsize 45
>          location key (27535265 INODE_ITEM 0) type FILE
>          transid 17119099 data_len 0 name_len 15
>          name: .sharedContents
>      item 62 key (256 DIR_INDEX 55000959) itemoff 13593 itemsize 45
>          location key (27535266 INODE_ITEM 0) type FILE
>          transid 17119099 data_len 0 name_len 15
>          name: .sharedContents
>      item 63 key (256 DIR_INDEX 55415388) itemoff 13545 itemsize 48

So it really means that inodes 27535265 and 27535266 are gone.

It may be related to the transaction split in older kernels, as the
deletion of the inode item and of those dir items should happen in the
same transaction.

But it's a pretty old kernel, so I'm not sure it's possible to pin
down the fixing/offending commit.

In that case, there is no obvious memory bitflip.
But since the damage is already done, a --repair is required.
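
For reference, the lookup that leads to this conclusion can be sketched
against a saved dump as below. The helper names are hypothetical, and the
patterns assume the "item N key (...)" and "location key (...)" line
formats shown in the quoted dump.

```shell
# Hypothetical helper: given a saved "btrfs ins dump-tree -t 5" dump ($1)
# and a directory index ($2), print the inode number that the DIR_INDEX
# item points at.
dir_index_target() {
    grep -A1 "DIR_INDEX $2)" "$1" |
        sed -n 's/.*location key (\([0-9]*\) INODE_ITEM 0).*/\1/p'
}

# Succeed if the dump ($1) contains an actual INODE_ITEM item for inode $2,
# i.e. an "item N key (...)" line and not just a "location key" reference.
inode_item_present() {
    grep -q "item [0-9]* key ($2 INODE_ITEM 0)" "$1"
}

# Example:
#   btrfs ins dump-tree -t 5 /dev/mapper/luks-part > /tmp/fs-tree.txt
#   ino=$(dir_index_target /tmp/fs-tree.txt 55000957)
#   inode_item_present /tmp/fs-tree.txt "$ino" || echo "inode $ino has no INODE_ITEM"
```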
>
> I see these two "location key" lines but no new key values to search
> for. Should I be looking for something else?
>
>          location key (27535265 INODE_ITEM 0) type FILE
>          location key (27535266 INODE_ITEM 0) type FILE

Considering that's the only error, it should really be that those two
inode items are missing.
Or it means the dir index items were not properly deleted.

[...]
>>
>> Your case looks completely unrelated to that, but I'd like more dumps to
>> make sure it's not some weird memory bitflip.
>
> This is good to know. Can I rule out the lower LUKS layer and the disk
> firmware since I'm not seeing a transid mismatch? These btrfs errors
> are the only problems I've had with LUKS2 on eMMC.

The problem is that you can only find out something is wrong with flush
once you have already hit a transid error.

So forget about flush-related problems for now.

>
> Please let me know about backporting any relevant btrfs patches or
> debugging a possible memory bitflip.

I don't have any good idea of how this happened.

Adding Filipe, as he may know which commit is the cause/fix.

Thanks,
Qu

