From: Calvin Owens <calvin@wbinvd.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-block@vger.kernel.org, linux-raid@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [QUESTION] Debugging some file data corruption
Date: Fri, 14 Nov 2025 11:39:21 -0800
Message-ID: <aReFaSpMe3yxoBMA@mozart.vkv.me>
In-Reply-To: <cd54e3a7-d676-46fe-8922-bb97d4e775cc@gmx.com>
On Wednesday 11/12 at 07:32 +1030, Qu Wenruo wrote:
> With LUKS in the middle, any corruption pattern becomes very hard for a
> human to read.
>
> I guess it's not really feasible to try to reproduce the problem again,
> since it involves 10TiB of data?
I can try again. It only takes 10 minutes of my time to get it started;
even if it takes a few days to run, I've got spare machines for it.
> But if you can spend a lot of time waiting for the data copy, would you
> mind trying the following combination(s)?
>
> - btrfs on mdraid1
> - btrfs RAID1 on raw two HDDs
Will do.
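For the btrfs RAID1 on raw HDDs case I'm planning roughly the following
(device names are placeholders, not the exact invocation):

  mkfs.btrfs -m raid1 -d raid1 --csum blake2 /dev/sdX /dev/sdY
  mount -o compress=zstd:1 /dev/sdX /mnt/test

i.e. the same checksum and compression settings as the current setup,
just without mdraid and LUKS underneath.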
> Considering there are no bad/bad combinations, I strongly doubt that
> mdraid1 itself is causing the problems.
>
> Does the mdraid1 have something like a write-behind feature enabled?
No, nothing special. I'm creating the array using:

  mdadm --create /dev/md0 -l 1 -n 2 --write-zeroes /dev/sda /dev/sdb
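The rest of the current stack on top of that is roughly (from memory, so
not the exact invocations):

  cryptsetup luksFormat /dev/md0
  cryptsetup open /dev/md0 md0_crypt
  mkfs.btrfs --csum blake2 /dev/mapper/md0_crypt
  mount -o compress=zstd:1 /dev/mapper/md0_crypt /mnt

which matches the blake2b checksums and zstd level 1 in the logs below.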
> > Then, I re-ran the offline scrubs: drive A now shows all the errors
> > originally seen across both drives, and drive B is now clean.
> >
> > Finally, I ran userspace checksums of the full set of files on the
> > newly clean drive B: they perfectly match an older copy in my backups.
> >
> > This proves that:
> >
> > 1) RAID mismatches and btrfs checksum failures are strictly 1:1.
> > 2) For every RAID mismatch, strictly one mirror was corrupted.
> > 3) No silent corruption occurred, btrfs caught everything.
> >
> > The hard drives are brand new, so that is my current suspicion.
>
> I wouldn't suspect the HDDs as the first culprit. Since there was no
> powerloss, there are no FLUSH/FUA bugs involved, and all corruptions are
> in data rather than metadata; if it were really the HDDs I'd expect at
> least one or two metadata corruptions too.
I should have mentioned this: there were a few metadata corruptions, but
the first online scrub fixed them (DUP), so I didn't get a chance to see
what their contents were. Here's the full log:
[Oct 5 00:45] [ T307779] BTRFS: device fsid 3bd1727b-c8ae-4876-96b2-9318c1f9556f devid 1 transid 121 /dev/mapper/md0_crypt (253:1) scanned by mount (307779)
[ +0.000875] [ T307779] BTRFS info (device dm-1): first mount of filesystem 3bd1727b-c8ae-4876-96b2-9318c1f9556f
[ +0.000054] [ T307779] BTRFS info (device dm-1): using blake2b (blake2b-256-generic) checksum algorithm
[ +3.510409] [ T307779] BTRFS info (device dm-1): enabling ssd optimizations
[ +0.000052] [ T307779] BTRFS info (device dm-1): enabling free space tree
[ +0.000015] [ T307779] BTRFS info (device dm-1): use zstd compression, level 1
[ +9.793525] [ T307802] BTRFS info (device dm-1): scrub: started on devid 1
[Oct 5 01:30] [ T307812] BTRFS error (device dm-1): scrub: fixed up error at logical 316151431168 on dev /dev/mapper/md0_crypt physical 310791110656
[ +0.000070] [ T307812] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[Oct 5 05:16] [ T308052] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 1850055917568 on dev /dev/mapper/md0_crypt physical 1858654240768
[ +0.082724] [ T308052] BTRFS warning (device dm-1): scrub: checksum error at logical 1850055917568 on dev /dev/mapper/md0_crypt, physical 1858654240768 root 5 inode 387 offset 40929132544 length 4096 links 1 (path: REDACTED)
[ +0.000077] [ T308052] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[Oct 5 06:51] [ T312207] BTRFS warning (device dm-1): scrub: tree block 2493640359936 mirror 2 has bad csum, has 0x4086e4014eeb997db83ae7255c333697ed4d740338405795861d6f3d0c7848af want 0xbdbcdc764674915f8899d9c164916bd7aad34693e78e9e9ba479b2339e12ef1c
[ +0.077111] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000065] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000026] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[ +0.000016] [ T312207] BTRFS error (device dm-1): scrub: fixed up error at logical 2493640343552 on dev /dev/mapper/md0_crypt physical 2510828601344
[Oct 5 07:47] [ T312650] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 2878389944320 on dev /dev/mapper/md0_crypt physical 2896685498368
[ +0.079176] [ T312650] BTRFS warning (device dm-1): scrub: checksum error at logical 2878389944320 on dev /dev/mapper/md0_crypt, physical 2896685498368 root 5 inode 431 offset 2979594240 length 4096 links 1 (path: REDACTED)
[ +0.000079] [ T312650] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[Oct 5 08:42] [ T312757] BTRFS error (device dm-1): scrub: fixed up error at logical 3255455711232 on dev /dev/mapper/md0_crypt physical 3273751265280
[ +0.000066] [ T312757] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[Oct 5 12:31] [ T316045] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 4804342972416 on dev /dev/mapper/md0_crypt physical 4836597170176
[ +0.106254] [ T316045] BTRFS warning (device dm-1): scrub: checksum error at logical 4804342972416 on dev /dev/mapper/md0_crypt, physical 4836597170176 root 5 inode 12231 offset 626524160 length 4096 links 1 (path: REDACTED)
[ +0.000082] [ T316045] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[Oct 5 12:40] [ T316489] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 4864481624064 on dev /dev/mapper/md0_crypt physical 4896735821824
[ +0.018418] [ T316489] BTRFS warning (device dm-1): scrub: checksum error at logical 4864481624064 on dev /dev/mapper/md0_crypt, physical 4896735821824 root 5 inode 12268 offset 635633664 length 4096 links 1 (path: REDACTED)
[ +0.000067] [ T316489] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[Oct 5 15:18] [ T317014] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 5960080556032 on dev /dev/mapper/md0_crypt physical 5996629721088
[ +0.088397] [ T317014] BTRFS warning (device dm-1): scrub: checksum error at logical 5960080556032 on dev /dev/mapper/md0_crypt, physical 5996629721088 root 5 inode 15079 offset 43859968 length 4096 links 1 (path: REDACTED)
[ +0.000078] [ T317014] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ +11.379069] [ T319380] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 5961402220544 on dev /dev/mapper/md0_crypt physical 5997951385600
[ +0.000245] [ T319380] BTRFS warning (device dm-1): scrub: checksum error at logical 5961402220544 on dev /dev/mapper/md0_crypt, physical 5997951385600 root 5 inode 15079 offset 612937728 length 4096 links 1 (path: REDACTED)
[ +0.000038] [ T319380] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[Oct 5 16:39] [ T320461] BTRFS error (device dm-1): scrub: fixed up error at logical 6508309970944 on dev /dev/mapper/md0_crypt physical 6551301586944
[ +0.000066] [ T320461] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[Oct 5 16:44] [ T319972] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 6542744813568 on dev /dev/mapper/md0_crypt physical 6585736429568
[ +0.072550] [ T319972] BTRFS warning (device dm-1): scrub: checksum error at logical 6542744813568 on dev /dev/mapper/md0_crypt, physical 6585736429568 root 5 inode 16518 offset 103481344 length 4096 links 1 (path: REDACTED)
[ +0.000079] [ T319972] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[Oct 5 20:28] [ T320144] BTRFS error (device dm-1): scrub: unable to fixup (regular) error at logical 8062253727744 on dev /dev/mapper/md0_crypt physical 8112761536512
[ +0.059861] [ T320144] BTRFS warning (device dm-1): scrub: checksum error at logical 8062253727744 on dev /dev/mapper/md0_crypt, physical 8112761536512 root 5 inode 349790 offset 16443854848 length 4096 links 1 (path: REDACTED)
[ +0.000069] [ T320144] BTRFS error (device dm-1): bdev /dev/mapper/md0_crypt errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
[Oct 5 21:01] [ T307802] BTRFS info (device dm-1): scrub: finished on devid 1 with status: 0

UUID:             3bd1727b-c8ae-4876-96b2-9318c1f9556f
Scrub started:    Sun Oct 5 00:45:20 2025
Status:           finished
Duration:         20:15:56
Total to scrub:   7.54TiB
Rate:             108.38MiB/s
Error summary:    verify=4 csum=11
  Corrected:      7
  Uncorrectable:  8
  Unverified:     0
That scrub missed about half the data errors, which makes sense to me:
the mdraid1 "randomly" reads a given block from one or the other
underlying drive, so a single pass through the md device only sees
whichever copy it happens to read.
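For what it's worth, an md-level check pass compares both copies
directly, so it sees every mismatch regardless of which side a normal
read happens to hit:

  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

That's how I'll count the raid-side errors on the next run, before
repairing anything.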
But the metadata doesn't make sense to me: the one scrub appears to have
seen and fixed *all* the metadata errors, because my later offline
scrubs of each individual drive saw no metadata errors. Seems unlikely.
I'll make sure to only run offline scrubs next time, so I can inspect
the metadata corruptions too (and for the btrfs-raid1, so I can inspect
the corruptions at all before they're fixed up).
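A read-only scrub should let me capture those before anything gets
repaired, something like:

  btrfs scrub start -Bdr /mnt

with -r so the scrub reports errors without fixing them up.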
> > I've
> > used the same two-drive USB enclosure extensively with older HDDs and
> > never seen a problem. I'm running this FIO job to test them:
> >
> > [global]
> > numjobs=1
> > loops=20
> > ioengine=io_uring
> > rw=randrw
> > percentage_random=5%
> > rwmixwrite=95
> > iodepth=32
> > direct=1
> > size=5%
> > blocksize_range=1k-32m
> > sync=none
> > refill_buffers=1
> > random_distribution=random
> > random_generator=tausworthe64
> > verify=xxhash
> > verify_fatal=1
> > verify_dump=1
> > do_verify=1
> > verify_async=$ncpus
> > [hdd-sdb-test]
> > filename=/dev/sdb
> > [hdd-sdc-test]
> > filename=/dev/sdc
> >
> > ...but no luck hitting anything after about 18 hours.
>
> I didn't have a good experience using fio to find corruption, so if you
> hit the problem by simply copying data (I guess through 'cp'?), then
> maybe stick to the working reproducer?
>
> Copying 10TiB onto HDDs will take over 40 hours though, much longer than
> your fio workload ran.
Makes sense. I'll stick to making copies like the original workload,
and report back if I can trigger it again.
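For verification I'll again compare against a checksum manifest from the
backups, roughly (paths are placeholders):

  (cd /backup && find . -type f -print0 | xargs -0 sha256sum) > /tmp/manifest
  (cd /mnt/copy && sha256sum --quiet -c /tmp/manifest)

so anything the scrubs miss should still show up there.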
Thanks,
Calvin