Linux Btrfs filesystem development
From: ellie <el@horse64.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs corruption issue on Pine64 PinePhone
Date: Tue, 6 Aug 2024 18:02:12 +0200
Message-ID: <ad8a9333-8732-4d78-a86b-22dea00aabbe@horse64.org>
In-Reply-To: <1e96ef22-b51d-488a-ab90-84fd85c981ea@gmx.com>



On 8/5/24 08:34, Qu Wenruo wrote:
> 
> 
> On 2024/8/5 15:50, ellie wrote:
>>
>>
>> On 8/5/24 08:10, Qu Wenruo wrote:
>>>
>>>
>>> On 2024/8/5 15:25, ellie wrote:
>>>> On 8/5/24 07:39, ellie wrote:
>>>>> Dear kernel list,
>>>>>
>>>>> I'm hoping this is the right place to send this, but there seems to be
>>>>> a btrfs corruption issue on the Pine64 PinePhone:
>>>>>
>>>>> https://gitlab.com/postmarketOS/pmaports/-/issues/3058
>>>>>
>>>>> The kernel is 6.9.10; I don't know exactly which additional patches
>>>>> postmarketOS (which is based on Alpine) may apply. The device is
>>>>> the PinePhone revision 1.2a or newer:
>>>>> https://wiki.pine64.org/wiki/PinePhone#Hardware_revisions
>>>>> Sadly, there doesn't seem to be a way to check in software whether
>>>>> it's 1.2a or 1.2b, and I don't remember which it is.
>>>>>
>>>>> This is on an SD Card, so an inherently rather unreliable storage
>>>>> medium. However, I tried two cards from what I believe to be two
>>>>> different vendors, Lexar and SanDisk, and I'm seeing this with both.
>>>>>
>>>>> The PinePhone had various chipset instability issues before, like
>>>>> https://gitlab.com/postmarketOS/pmaports/-/issues/805 which I believe
>>>>> has however been fixed since. I have no idea if that's relevant, I'm
>>>>> just pointing it out. I also don't know if other filesystems, like
>>>>> ext4 that I used before, might have also had corruption and just
>>>>> didn't detect it. Not that I ever noticed anything, but I'm not sure I
>>>>> necessarily ever would have.
>>>
>>> In the detailed report in pmOS issue, you mentioned it's a video file.
>>>
>>> I'm wondering if all the corruptions you see are in video files,
>>> especially if the video files were all recorded on the device.
>>>
>>> If that's the case, it may be related to the IO pattern, especially if
>>> the recording tool uses direct IO and doesn't properly wait for
>>> writeback of that direct IO.
>>>
>>> Thanks,
>>> Qu
>>>
>>
>> Thanks so much for the quick input!
>>
>> All the files I mentioned in bug reports were written by syncthing, so
>> there wasn't any on-device video recording involved. I once saw Nheko's
>> database file corrupt however, so it's apparently not limited to
>> syncthing. I'm guessing video files are affected so often simply due to
>> their large size.
> 
> I did a quick clone and search of syncthing.
> 
> There is no usage of O_DIRECT directly, so I guess it's not the known
> csum mismatch caused by bad sync of direct IO writeback.
> 
> In that case, since the corrupted file is syncthing synchronized, can
> you do a diff of the binary data?
> 
> To avoid the EIO from btrfs, you can use "-o rescue=all,ro" to mount the
> sdcard on another system, then compare the binary.
> (e.g. "xxd file.good > good.xxd; xxd file.bad > bad.xxd; diff *.xxd")
> 
> At this stage, we need to find out what's really causing the problem:
> btrfs itself or something at a lower level.
> (I strongly hope it's not btrfs, but either way it's not going to end
> well)
> 
> Thanks,
> Qu
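
The binary comparison suggested above can also be summarized per block
rather than read from a full xxd dump. The following is only a minimal
sketch: the helper name differing_blocks and the 4 KiB block size are
assumptions made for illustration (4 KiB is a common btrfs sector size,
but the actual value on this filesystem is not stated in the thread).
It reads both copies in fixed-size blocks and records the offset of
every block that differs.

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB blocks; adjust to the real sector size


def differing_blocks(good_path, bad_path, block_size=BLOCK_SIZE):
    """Return (offset, length) for every block whose bytes differ.

    Hypothetical helper for illustration; it compares a known-good copy
    with the copy read back from the suspect filesystem.
    """
    mismatches = []
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        offset = 0
        while True:
            a = good.read(block_size)
            b = bad.read(block_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                # A length mismatch at the tail also counts as a difference.
                mismatches.append((offset, max(len(a), len(b))))
            offset += block_size
    return mismatches
```

Calling differing_blocks("file.good", "file.bad") on the two copies
yields a list of offsets. Whether the damaged ranges line up with block
boundaries, and how large they are, may give a hint about where the
corruption originates, though that interpretation is itself an
assumption and not something the thread confirms.
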
Thanks for your detailed instructions! I was about to do as you said and 
ran the sync for a few hours, stopped it, and planned to run btrfs scrub 
this evening. However, I then ran into a hard shutdown due to what might 
be an upower bug (won't lie, was very annoyed at that point):

https://gitlab.com/postmarketOS/pmaports/-/issues/3073

Should I still attach a diff for an affected file I find now? Or are the 
results going to be worthless if there was a hard shutdown in between, 
and I need to first fix the filesystem, repeat the sync test, and repeat 
finding a new corruption error to diff?

Regards,

Ellie

