RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted

public inbox for linux-btrfs@vger.kernel.org

From: ein <ein.net@gmail.com>
To: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted
Date: Mon, 10 Jun 2024 16:56:00 +0200
Message-ID: <6ae187b3-7770-4b64-aa65-43fff3120213@gmail.com>

Dear devs and users,

I have been using Btrfs for a few months in RAID1 mode on Debian 12 (kernel 6.1.0-21-amd64 #1 SMP
PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux, btrfs-progs v6.2).
I created my filesystem by issuing:
mkfs.btrfs -d raid1 -m raid1 /dev/sda1 /dev/sde1 /dev/sdf1

The disks are 2TB WD Reds, a mix of CMR and SMR models, with good S.M.A.R.T. stats.
I am using RAM modules with on-die ECC.
I have never run a balance or replaced any device.
I had a couple of unexpected hangs caused by NVMe power management, which made my root fs
unavailable, but hopefully that has been fixed by installing new firmware for the WD Black NVMe.

How is it possible that btrfs kept both copies of the same data on the same physical device?

Jun 02 23:27:54 node0 kernel: BTRFS warning (device sdf1): csum failed root 256 ino 259 off 140290392064 csum 0x1315675d expected csum 0x49271c1b mirror 2
Jun 02 23:27:54 node0 kernel: BTRFS warning (device sdf1): csum failed root 256 ino 259 off 140290392064 csum 0x1315675d expected csum 0x49271c1b mirror 1
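
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that groups "csum failed" messages by root, inode and offset and lists which mirrors failed. It assumes each kernel message arrives as a single line, as `journalctl -k` prints it. The function name `summarize_csum_failures` is hypothetical and not part of btrfs-progs.

```shell
#!/bin/sh
# Summarize btrfs "csum failed" kernel messages read from stdin.
# Prints one line per (root, inode, offset) with the mirrors that failed.
# NOTE: summarize_csum_failures is a hypothetical helper, not a btrfs tool.
summarize_csum_failures() {
    awk '
    /csum failed/ {
        root = ino = off = mirror = ""
        # Each keyword is followed by its value as the next field.
        for (i = 1; i < NF; i++) {
            if ($i == "root")   root = $(i + 1)
            if ($i == "ino")    ino = $(i + 1)
            if ($i == "off")    off = $(i + 1)
            if ($i == "mirror") mirror = $(i + 1)
        }
        key = root " " ino " " off
        if (key in seen) {
            seen[key] = seen[key] "," mirror
        } else {
            seen[key] = mirror
            order[++n] = key      # remember first-seen order
        }
    }
    END {
        for (j = 1; j <= n; j++) {
            split(order[j], f, " ")
            printf "root=%s ino=%s off=%s failed_mirrors=%s\n", f[1], f[2], f[3], seen[order[j]]
        }
    }'
}
```

Typical use would be `journalctl -k | summarize_csum_failures`. An offset that lists every mirror, as in the two messages above, is one where no copy passed the checksum.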

The corrupted file is a qcow2 image with Windows 7 on it, and I think I am able to corrupt this
file, and only this file, on a daily basis.
I restored my filesystem from backup by:
1. wiping all signatures from my HDDs with wipefs,
2. recreating the fs from scratch,
3. copying the data back over.

A few days later I see the same issues:

Jun 10 07:22:14 node0 kernel: BTRFS info (device dm-10): read error corrected: ino 259 off 39193079808 (dev /dev/mapper/vg1-lv1 sector 670864280)
Jun 10 07:22:14 node0 kernel: BTRFS info (device dm-10): read error corrected: ino 259 off 33579532288 (dev /dev/mapper/vg1-lv1 sector 199031056)
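
To see whether the repairs always land on the same device and inode, the "read error corrected" messages can be tallied in a similar way. This is only a sketch: it assumes one kernel message per line and the message layout shown above, and `count_corrected_reads` is a hypothetical helper name.

```shell
#!/bin/sh
# Count btrfs "read error corrected" kernel messages per device and inode.
# Reads log lines from stdin.
# NOTE: count_corrected_reads is a hypothetical helper, not a btrfs tool.
count_corrected_reads() {
    awk '
    /read error corrected/ {
        dev = ino = ""
        for (i = 1; i < NF; i++) {
            if ($i == "ino")  ino = $(i + 1)
            if ($i == "(dev") dev = $(i + 1)   # repaired device follows "(dev"
        }
        key = dev " " ino
        if (!(key in count)) order[++n] = key  # remember first-seen order
        count[key]++
    }
    END {
        for (j = 1; j <= n; j++) {
            split(order[j], f, " ")
            printf "dev=%s ino=%s corrected=%s\n", f[1], f[2], count[order[j]]
        }
    }'
}
```

Typical use would be `journalctl -k | count_corrected_reads`.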

I don't think it's RAM-related, because:
- the HW is new, the RAM is good quality, and I ran a memory check a couple of months ago,
- it affects only one file; I have other, much busier VMs, while this one mostly stays idle,
- other OS operations seem to have been working perfectly for months.

Sincerely,
