Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: ellie <el@horse64.org>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs corruption issue on Pine64 PinePhone
Date: Mon, 19 Aug 2024 14:59:31 +0930
Message-ID: <33f0ecec-585d-4a02-a8a5-319759401e5f@gmx.com>
In-Reply-To: <bf24d64d-9bf7-48ad-9a36-7ae7d262a6b8@horse64.org>



On 2024/8/19 13:28, ellie wrote:
> Is there something else I could provide to help track this down? I
> assume that just because the file contents happen to be fine doesn't mean
> there wasn't corruption, for example in the metadata. My apologies
> for taking up your time.

This means that, somehow, the data checksum is incorrect: the data itself
matches the original, yet it fails checksum verification.
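
To illustrate the first step of that reasoning: the scrub warning in the
quoted message points at offset 102400, length 4096, i.e. the 4 KiB block
with zero-based index 25 in the file. Below is a minimal Python sketch,
assuming two readable copies of the file (the paths are placeholders), that
reports which 4 KiB blocks differ between them. An empty result, like the
empty diff output quoted below, means the data on the SD card is intact,
which leaves the stored checksum as the suspect.

```python
import sys

BLOCK = 4096  # assumed data checksum granularity: one 4 KiB sector


def differing_blocks(good_path: str, bad_path: str, block: int = BLOCK) -> list[int]:
    """Return the byte offsets of every block that differs between two files."""
    offsets = []
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        offset = 0
        while True:
            a = good.read(block)
            b = bad.read(block)
            if not a and not b:  # both files exhausted
                break
            if a != b:  # also catches one file being longer than the other
                offsets.append(offset)
            offset += block
    return offsets


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 blockdiff.py original.mp3 copy-from-sdcard.mp3
    for off in differing_blocks(sys.argv[1], sys.argv[2]):
        print(f"block at offset {off} (index {off // BLOCK}) differs")
```

Comparing whole blocks rather than bytes keeps the output aligned with the
offset and length fields that scrub prints.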

This doesn't sound sane to me, so I can only come up with two possible reasons:

1. The checksum implementation on the platform is broken
    IIRC the SoC is pretty mature (which also means old), so this
    doesn't sound likely to me.

2. The memory hardware is faulty
    This would cause bit flips in the data checksums.

Unfortunately, I cannot come up with any reasons other than the two
above.
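
One way to tell the two reasons apart is to recompute the checksum of the
affected block on another machine and compare it with the checksum btrfs
stored for it. The sketch below assumes the filesystem uses the default
crc32c checksum type, computed as standard CRC-32C over each 4 KiB block;
obtaining the stored value from the filesystem is not shown, and
block_crc() and flipped_bits() are hypothetical helper names. If the stored
and recomputed values differ in exactly one bit, a bit flip in the stored
checksum (reason 2) looks more likely. This is only a heuristic: a transient
flip while the checksum was being computed would instead produce an
unrelated-looking value.

```python
POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC-32C


def _make_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-wise CRC-32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


def flipped_bits(a: int, b: int) -> int:
    """Count the bit positions in which two 32-bit checksums differ."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def block_crc(path: str, offset: int, length: int = 4096) -> int:
    """Hypothetical helper: CRC-32C of one block of a file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return crc32c(f.read(length))
```

For the block from the quoted warning, that would mean computing
block_crc("06 City Gates.mp3", 102400) and passing it, together with the
stored value, to flipped_bits().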

Thanks,
Qu

>
> Regards,
>
> Ellie
>
> On 8/8/24 13:31, ellie wrote:
>> On 8/6/24 23:55, Qu Wenruo wrote:
>>>
>>>
>>> On 2024/8/7 01:32, ellie wrote:
>>>>
>>>>
>>>> On 8/5/24 08:34, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2024/8/5 15:50, ellie wrote:
>>>>>>
>>>>>>
>>>>>> On 8/5/24 08:10, Qu Wenruo wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2024/8/5 15:25, ellie wrote:
>>>>>>>> On 8/5/24 07:39, ellie wrote:
>>>>>>>>> Dear kernel list,
>>>>>>>>>
>>>>>>>>> I'm hoping this is the right place to send this. But there seems to be
>>>>>>>>> a btrfs corruption issue on the Pine64 PinePhone:
>>>>>>>>>
>>>>>>>>> https://gitlab.com/postmarketOS/pmaports/-/issues/3058
>>>>>>>>>
>>>>>>>>> The kernel is 6.9.10; I don't know which additional patches, if any,
>>>>>>>>> postmarketOS (which is based on Alpine) applies. The device is the
>>>>>>>>> PinePhone revision 1.2a or newer
>>>>>>>>> (https://wiki.pine64.org/wiki/PinePhone#Hardware_revisions). Sadly,
>>>>>>>>> there doesn't seem to be a way to check in software whether it's 1.2a
>>>>>>>>> or 1.2b, and I don't remember which it is.
>>>>>>>>>
>>>>>>>>> This is on an SD Card, so an inherently rather unreliable storage
>>>>>>>>> medium. However, I tried two cards from what I believe to be two
>>>>>>>>> different vendors, Lexar and SanDisk, and I'm seeing this with
>>>>>>>>> both.
>>>>>>>>>
>>>>>>>>> The PinePhone had various chipset instability issues before, like
>>>>>>>>> https://gitlab.com/postmarketOS/pmaports/-/issues/805, which I believe
>>>>>>>>> has since been fixed. I have no idea if that's relevant, I'm just
>>>>>>>>> pointing it out. I also don't know if other filesystems, like the ext4
>>>>>>>>> I used before, might have also had corruption and just didn't detect
>>>>>>>>> it. Not that I ever noticed anything, but I'm not sure I necessarily
>>>>>>>>> ever would have.
>>>>>>>
>>>>>>> In the detailed report in the pmOS issue, you mentioned it's a video
>>>>>>> file.
>>>>>>>
>>>>>>> I'm wondering if all the corruptions you see are in video files,
>>>>>>> especially if the video files were all recorded on the phone.
>>>>>>>
>>>>>>> If that's the case, it may be related to the IO pattern, especially if
>>>>>>> the recording tool is using direct IO and doesn't properly wait for
>>>>>>> writeback of that direct IO.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Qu
>>>>>>>
>>>>>>
>>>>>> Thanks so much for the quick input!
>>>>>>
>>>>>> All the files I mentioned in bug reports were written by syncthing, so
>>>>>> there wasn't any on-device video recording involved. I once saw Nheko's
>>>>>> database file get corrupted, however, so it's apparently not limited to
>>>>>> syncthing. I'm guessing video files are affected so often simply due to
>>>>>> their large size.
>>>>>
>>>>> I did a quick clone and search of syncthing.
>>>>>
>>>>> There is no usage of O_DIRECT directly, so I guess it's not the known
>>>>> csum mismatch caused by bad sync of direct IO writeback.
>>>>>
>>>>> In that case, since the corrupted file is syncthing synchronized, can
>>>>> you do a diff of the binary data?
>>>>>
>>>>> To avoid the EIO from btrfs, you can use "-o rescue=all,ro" to mount the
>>>>> SD card on another system, then compare the binaries.
>>>>> (e.g. "xxd file.good > good.xxd; xxd file.bad > bad.xxd; diff *.xxd")
>>>>>
>>>>> At this stage, we need to find out what's really causing the problem:
>>>>> btrfs itself or something lower level.
>>>>> (I strongly hope it's not btrfs, but either way it's not going to end up
>>>>> well)
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>> Thanks for your detailed instructions! I was about to do as you said and
>>>> ran the sync for a few hours, stopped it, and planned to run btrfs scrub
>>>> this evening. However, I then ran into a hard shutdown due to what might
>>>> be an upower bug (won't lie, was very annoyed at that point):
>>>>
>>>> https://gitlab.com/postmarketOS/pmaports/-/issues/3073
>>>>
>>>> Should I still attach a diff for an affected file I find now? Or are the
>>>> results going to be worthless if there was a hard shutdown in between,
>>>> and I need to first fix the filesystem, repeat the sync test, and repeat
>>>> finding a new corruption error to diff?
>>>
>>> As long as you didn't touch those files, and scrub still reports errors
>>> on that file, the diff is still very helpful to provide some clue.
>>>
>>
>> I finally had a new corrupted file pop up; this was actually after any
>> unintended sudden shutdown, so there shouldn't be any interference from
>> that:
>>
>> [128958.860335] BTRFS error (device dm-0): unable to fixup (regular) error at logical 133906497536 on dev /dev/mapper/root physical 135089684480
>> [128958.862548] BTRFS warning (device dm-0): checksum error at logical 133906497536 on dev /dev/mapper/root, physical 135089684480, root 257, inode 331715, offset 102400, length 4096, links 1 (path: ellie/Music/Baldur's Gate (2) II Shadows of Amn (2000)/06 City Gates.mp3)
>>
>> However, when manually mounting the SD card on the computer the file
>> originates from, where the undamaged original file is:
>>
>> /mnt # mount -t btrfs -o rescue=all,ro,subvol=/@home,defaults /dev/mapper/blamap p64
>> /mnt # ls p64/
>> ellie
>> /mnt # cp p64/ellie/Music/Baldur\'s\ Gate\ \(2\)\ II\ Shadows\ of\ Amn\ \(2000\)/06\ City\ Gates.mp3 ./
>> /mnt # diff 06\ City\ Gates.mp3 /home/ellie/Music/Baldur\'s\ Gate\ \(2\)\ II\ Shadows\ of\ Amn\ \(2000\)/06\ City\ Gates.mp3
>> /mnt # diff 06\ City\ Gates.mp3 /home/ellie/Music/Baldur\'s\ Gate\ \(2\)\ II\ Shadows\ of\ Amn\ \(2000\)/06\ City\ Gates.mp3
>> /mnt #
>>
>> It seems like the file is exactly the same, which I assume isn't meant
>> to happen.
>>
>> I'm not sure what that implies, but I hope it's helpful info!
>>
>> Regards,
>>
>> Ellie
>>
>


Thread overview: 14+ messages
2024-08-05  5:39 btrfs corruption issue on Pine64 PinePhone ellie
2024-08-05  5:55 ` ellie
2024-08-05  6:10   ` Qu Wenruo
2024-08-05  6:20     ` ellie
2024-08-05  6:34       ` Qu Wenruo
2024-08-06 16:02         ` ellie
2024-08-06 21:55           ` Qu Wenruo
2024-08-08 11:31             ` ellie
2024-08-19  3:58               ` ellie
2024-08-19  5:29                 ` Qu Wenruo [this message]
2024-08-19  8:16                   ` ellie
2024-10-17 20:17                   ` Ellie
2024-10-02  7:20 ` ellie
2024-12-16 22:53   ` BTRFS hangs and causes semi-freezes on PinePhone Ellie
