From: Calvin Owens <calvin@wbinvd.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-block@vger.kernel.org, linux-raid@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [QUESTION] Debugging some file data corruption
Date: Fri, 14 Nov 2025 11:39:21 -0800
Message-ID: <aReFaSpMe3yxoBMA@mozart.vkv.me>
In-Reply-To: <cd54e3a7-d676-46fe-8922-bb97d4e775cc@gmx.com>
On Wednesday 11/12 at 07:32 +1030, Qu Wenruo wrote:
> With LUKS in the middle, any corruption pattern becomes very hard for a
> human to read.
>
> I guess it's not really feasible to try to reproduce the problem again,
> since it involves 10TiB of data?
I can try again. It only takes 10 minutes of my time to get it started;
even if it takes a few days to run, I've got spare machines for it.
> But if you can spend a lot of time waiting for the data copy, would you
> mind trying the following combination(s)?
>
> - btrfs on mdraid1
> - btrfs RAID1 on raw two HDDs
Will do.
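For the btrfs RAID1 on raw HDDs case I'm planning roughly the following
(device names are placeholders, not the exact invocation):

  mkfs.btrfs -m raid1 -d raid1 --csum blake2 /dev/sdX /dev/sdY
  mount -o compress=zstd:1 /dev/sdX /mnt/test

i.e. the same checksum and compression settings as the current setup,
just without mdraid and LUKS underneath.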
> Considering there are no bad/bad combinations, I strongly doubt that
> mdraid1 itself is causing the problems.
>
> Does the mdraid1 have something like a write-behind feature enabled?
No, nothing special. I'm creating the array using:

  mdadm --create /dev/md0 -l 1 -n 2 --write-zeroes /dev/sda /dev/sdb
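The rest of the current stack on top of that is roughly (from memory, so
not the exact invocations):

  cryptsetup luksFormat /dev/md0
  cryptsetup open /dev/md0 md0_crypt
  mkfs.btrfs --csum blake2 /dev/mapper/md0_crypt
  mount -o compress=zstd:1 /dev/mapper/md0_crypt /mnt

which matches the blake2b checksums and zstd level 1 in the logs below.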
> > Then, I re-ran the offline scrubs: drive A now shows all the errors
> > originally seen across both drives, and drive B is now clean.
> >
> > Finally, I ran userspace checksums of the full set of files on the
> > newly clean drive B: they perfectly match an older copy in my backups.
> >
> > This proves that:
> >
> > 1) RAID mismatches and btrfs checksum failures are strictly 1:1.
> > 2) For every RAID mismatch, strictly one mirror was corrupted.
> > 3) No silent corruption occurred, btrfs caught everything.
> >
> > The hard drives are brand new, so that is my current suspicion.
>
> I wouldn't suspect the HDDs as the first culprit. Since there was no
> powerloss, there are no FLUSH/FUA bugs involved, and all corruptions are
> in data rather than metadata; if it were really the HDDs I'd expect at
> least one or two metadata corruptions too.
I should have mentioned this: there were a few metadata corruptions, but
the first online scrub fixed them (DUP), so I didn't get a chance to see
what their contents were. Here's the full log:
[Oct 5 00:45] [ T307779] BTRFS: device fsid 3bd1727b-c8ae-4876-96b2-9318c1f9556f devid 1 transid 121 /dev/mapper/md0_crypt (253:1) scanned by mount (307779)
[ +0.000875] [ T307779] BTRFS info (device dm-1): first mount of filesystem 3bd1727b-c8ae-4876-96b2-9318c1f9556f
[ +0.000054] [ T307779] BTRFS info (device dm-1): using blake2b (blake2b-256-generic) checksum algorithm
[ +3.510409] [ T307779] BTRFS info (device dm-1): enabling ssd optimizations
[ +0.000052] [ T307779] BTRFS info (device dm-1): enabling free space tree
[ +0.000015] [ T307779] BTRFS info (device dm-1): use zstd compression, level 1
[ +9.793525] [ T307802] BTRFS info (device dm-1): scrub: started on devid 1
[Oct 5 01:30] [ T307812] BTRFS error (device dm-1): scrub: fixed up error at logical 316151431168 on dev /dev/mapper/md0_crypt physical 310791110656
[ +0.000070] [ T307812] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[Oct 5 05:16] [ T308052] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 1850055917568 on dev /dev/mapper/md0_crypt physical 1858654240768
[ +0.082724] [ T308052] BTRFS warning (device dm-1): scrub: checksum error at logical 1850055917568 on dev /dev/mapper/md0_crypt, physical 1858654240768 root 5 inode 387 offset 40929132544 length 4096 links 1 (path: REDACTED)
[ +0.000077] [ T308052] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[Oct 5 06:51] [ T312207] BTRFS warning (device dm-1): scrub: tree block 2493640359936 mirror 2 has bad csum, has 0x4086e4014eeb997db83ae7255c333697ed4d740338405795861d6f3d0c7848af want 0xbdbcdc764674915f8899d9c164916bd7aad34693e78e9e9ba479b2339e12ef1c
[ +0.077111] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000065] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000026] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000016] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[Oct 5 07:47] [ T312650] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 2878389944320 on dev /dev/mapper/md0_crypt physical 2896685498368
[ +0.079176] [ T312650] BTRFS warning (device dm-1): scrub: checksum error at logical 2878389944320 on dev /dev/mapper/md0_crypt, physical 2896685498368 root 5 inode 431 offset 2979594240 length 4096 links 1 (path: REDACTED)
[ +0.000079] [ T312650] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[Oct 5 08:42] [ T312757] BTRFS error (device dm-1): scrub: fixed up error at logical 3255455711232 on dev /dev/mapper/md0_crypt physical 3273751265280
[ +0.000066] [ T312757] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[Oct 5 12:31] [ T316045] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 4804342972416 on dev /dev/mapper/md0_crypt physical 4836597170176
[ +0.106254] [ T316045] BTRFS warning (device dm-1): scrub: checksum error at logical 4804342972416 on dev /dev/mapper/md0_crypt, physical 4836597170176 root 5 inode 12231 offset 626524160 length 4096 links 1 (path: REDACTED)
[ +0.000082] [ T316045] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[Oct 5 12:40] [ T316489] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 4864481624064 on dev /dev/mapper/md0_crypt physical 4896735821824
[ +0.018418] [ T316489] BTRFS warning (device dm-1): scrub: checksum error at logical 4864481624064 on dev /dev/mapper/md0_crypt, physical 4896735821824 root 5 inode 12268 offset 635633664 length 4096 links 1 (path: REDACTED)
[ +0.000067] [ T316489] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[Oct 5 15:18] [ T317014] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 5960080556032 on dev /dev/mapper/md0_crypt physical 5996629721088
[ +0.088397] [ T317014] BTRFS warning (device dm-1): scrub: checksum error at logical 5960080556032 on dev /dev/mapper/md0_crypt, physical 5996629721088 root 5 inode 15079 offset 43859968 length 4096 links 1 (path: REDACTED)
[ +0.000078] [ T317014] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ +11.379069] [ T319380] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 5961402220544 on dev /dev/mapper/md0_crypt physical 5997951385600
[ +0.000245] [ T319380] BTRFS warning (device dm-1): scrub: checksum error at logical 5961402220544 on dev /dev/mapper/md0_crypt, physical 5997951385600 root 5 inode 15079 offset 612937728 length 4096 links 1 (path: REDACTED)
[ +0.000038] [ T319380] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[Oct 5 16:39] [ T320461] BTRFS error (device dm-1): scrub: fixed up error at logical 6508309970944 on dev /dev/mapper/md0_crypt physical 6551301586944
[ +0.000066] [ T320461] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[Oct 5 16:44] [ T319972] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 6542744813568 on dev /dev/mapper/md0_crypt physical 6585736429568
[ +0.072550] [ T319972] BTRFS warning (device dm-1): scrub: checksum error at logical 6542744813568 on dev /dev/mapper/md0_crypt, physical 6585736429568 root 5 inode 16518 offset 103481344 length 4096 links 1 (path: REDACTED)
[ +0.000079] [ T319972] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[Oct 5 20:28] [ T320144] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 8062253727744 on dev /dev/mapper/md0_crypt physical 8112761536512
[ +0.059861] [ T320144] BTRFS warning (device dm-1): scrub: checksum error at logical 8062253727744 on dev /dev/mapper/md0_crypt, physical 8112761536512 root 5 inode 349790 offset 16443854848 length 4096 links 1 (path: REDACTED)
[ +0.000069] [ T320144] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
[Oct 5 21:01] [ T307802] BTRFS info (device dm-1): scrub: finished on devid 1 with status: 0

UUID:             3bd1727b-c8ae-4876-96b2-9318c1f9556f
Scrub started:    Sun Oct 5 00:45:20 2025
Status:           finished
Duration:         20:15:56
Total to scrub:   7.54TiB
Rate:             108.38MiB/s
Error summary:    verify=4 csum=11
  Corrected:      7
  Uncorrectable:  8
  Unverified:     0
That scrub missed about half the data errors, which makes sense to me:
the mdraid1 "randomly" reads a given block from one or the other
underlying drive, so a single pass through the md device only sees
whichever copy it happens to read.
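For what it's worth, an md-level check pass compares both copies
directly, so it sees every mismatch regardless of which side a normal
read happens to hit:

  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

That's how I'll count the raid-side errors on the next run, before
repairing anything.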
But the metadata doesn't make sense to me: the one scrub appears to have
seen and fixed *all* the metadata errors, because my later offline
scrubs of each individual drive saw no metadata errors. Seems unlikely.
I'll make sure to only run offline scrubs next time, so I can inspect
the metadata corruptions too (and for the btrfs-raid1, so I can inspect
the corruptions at all before they're fixed up).
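A read-only scrub should let me capture those before anything gets
repaired, something like:

  btrfs scrub start -Bdr /mnt

with -r so the scrub reports errors without fixing them up.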
> > I've
> > used the same two-drive USB enclosure extensively with older HDDs and
> > never seen a problem. I'm running this FIO job to test them:
> >
> > [global]
> > numjobs=1
> > loops=20
> > ioengine=io_uring
> > rw=randrw
> > percentage_random=5%
> > rwmixwrite=95
> > iodepth=32
> > direct=1
> > size=5%
> > blocksize_range=1k-32m
> > sync=none
> > refill_buffers=1
> > random_distribution=random
> > random_generator=tausworthe64
> > verify=xxhash
> > verify_fatal=1
> > verify_dump=1
> > do_verify=1
> > verify_async=$ncpus
> > [hdd-sdb-test]
> > filename=/dev/sdb
> > [hdd-sdc-test]
> > filename=/dev/sdc
> >
> > ...but no luck hitting anything after about 18 hours.
>
> I didn't have a good experience using fio to find corruption, so if you
> hit the problem by simply copying data (I guess through 'cp'?), then
> maybe stick to the working reproducer?
>
> Copying 10TiB onto HDDs will take over 40 hours though, much longer than
> your fio workload ran.
Makes sense. I'll stick to making copies like the original workload,
and report back if I can trigger it again.
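For verification I'll again compare against a checksum manifest from the
backups, roughly (paths are placeholders):

  (cd /backup && find . -type f -print0 | xargs -0 sha256sum) > /tmp/manifest
  (cd /mnt/copy && sha256sum --quiet -c /tmp/manifest)

so anything the scrubs miss should still show up there.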
Thanks,
Calvin