Re: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted

public inbox for linux-btrfs@vger.kernel.org

From: ein <ein.net@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted
Date: Mon, 13 Jan 2025 16:54:31 +0100
Message-ID: <501eb99a-dee6-4e84-93cb-ae49d48dcab6@gmail.com>
In-Reply-To: <a00a0c80-85fa-4484-9076-d4a2f50e177e@gmx.com>

On 29.07.2024 12:05, Qu Wenruo wrote:
> On 10.06.2024 16:56, ein wrote:
>>> [...]
>>> I don't think it's RAM-related because:
>>> - the HW is new, the RAM is good quality, and I ran a memory check a couple of months ago,
>>> - it affects only one file, I have other much busier VMs, that one
>>> mostly stays idle,
>>> - other OS operations seem to have been working perfectly for months.
>>
>> [...]
>>
>> After spotting this:
>> https://www.reddit.com/r/GlobalOffensive/comments/1eb00pg/intel_processors_are_causing_significant/
>>
>> I decided to move from:
>> cpupower frequency-set -g performance
>> to:
>> cpupower frequency-set -g powersave
>>
>> I have:
>>
>> ~# lscpu
>> Architecture:             x86_64
>>   CPU op-mode(s):         32-bit, 64-bit
>>   Address sizes:          46 bits physical, 48 bits virtual
>>   Byte Order:             Little Endian
>> CPU(s):                   32
>>   On-line CPU(s) list:    0-31
>> Vendor ID:                GenuineIntel
>>   BIOS Vendor ID:         Intel(R) Corporation
>>   Model name:             13th Gen Intel(R) Core(TM) i9-13900K
>>     BIOS Model name:      13th Gen Intel(R) Core(TM) i9-13900K To Be Filled By O.E.M. CPU @ 5.3GHz
>>
>> One week without corruptions.
Hi Qu, thanks for the answer.
> Normally we only suspect the hardware when we have enough evidence
> (e.g. proof of a bitflip),
> even if the hardware is known to have problems.
I think I have that evidence. (1)
> In your case, I still do not believe it's a hardware problem.
>
> > - it affects only one file, I have other much busier VMs, that one
> > mostly stays idle,
>
> Due to its datacsum behavior, btrfs is very sensitive to page content
> changes during writeback.
>
> Normally this should not happen for buffered writes as btrfs has locked
> the page cache.
>
> But for Direct IO it's still very possible that one process submitted a
> direct IO, and when the IO was still under way, the user space changed
> the contents of that page.
>
> In that case, the btrfs csum is calculated from the old contents, but the
> on-disk data is the new contents, causing a csum mismatch.
>
> So I'm wondering what's the workload inside the VM?
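The race Qu describes (checksum computed from the buffer, buffer changed while the direct IO is in flight, device writes the changed buffer) can be illustrated with a small simulation. This is a minimal sketch, not btrfs code: `direct_io_write` and `read_verifies` are hypothetical names, and `zlib.crc32` stands in for btrfs's default crc32c, which is not in the Python standard library.

```python
import zlib

SECTOR = 4096  # one 4 KiB data sector


def csum(block: bytes) -> int:
    # Stand-in checksum: zlib.crc32 instead of btrfs's default crc32c.
    return zlib.crc32(block)


def direct_io_write(buf: bytearray, modify_during_io: bool):
    """Simulate one sector written via direct IO straight from a user buffer."""
    # 1. The filesystem computes the data checksum from the buffer as it is now.
    stored_csum = csum(bytes(buf))
    # 2. While the IO is still in flight, user space may change the buffer.
    if modify_during_io:
        buf[0] ^= 0xFF
    # 3. The device then writes whatever the buffer contains at that moment.
    on_disk = bytes(buf)
    return on_disk, stored_csum


def read_verifies(on_disk: bytes, stored_csum: int) -> bool:
    # On read, the checksum of the on-disk data must match the stored csum.
    return csum(on_disk) == stored_csum


stable = direct_io_write(bytearray(SECTOR), modify_during_io=False)
racy = direct_io_write(bytearray(SECTOR), modify_during_io=True)
print(read_verifies(*stable))  # True
print(read_verifies(*racy))    # False: csum mismatch without any hardware fault
```

The point of the second case is that the on-disk data is internally consistent; only the checksum, taken a moment too early, disagrees with it.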

As far as I know, there is no writeback in this configuration:

<disk type="file" device="disk">
   <driver name="qemu" type="qcow2" cache="none" discard="unmap"/>
   <source file="/var/lib/libvirt/images-red-btrfs/dell.qcow2" index="2"/>
   <backingStore/>
   <target dev="vda" bus="virtio"/>
   <alias name="virtio-disk0"/>
   <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
</disk>
[...]
<controller type="pci" index="0" model="pci-root">
   <alias name="pci.0"/>
</controller>

This is a mostly empty Win7 virtual machine running a network monitoring tool with a very small 
SQLite database (100-500 MiB).

(1)
It took almost a year; I spent hundreds of hours and thousands of dollars chasing this issue:
- tried 4 different new SATA controllers, from cheap ASM106X-series cards to data-center-grade HBAs such as LSI,
- replaced all SATA cables multiple times,
- replaced the WD Red HDDs (a mix of CMR/SMR) with WD Red SA500 SSDs.
That part changed nothing. I experienced a lot of PCIe link issues, such as disappearing SATA drives, 
disappearing NVMe drives (sometimes both at once), USB link problems, etc.
But I don't think the link issues were related: the corruption also happened without them (i.e. 
without any link-reset messages in dmesg).

- RMA'd the CPU, replacing the i9-13900K with an i9-14900K,
- tried every available Intel CPU microcode update packaged as a BIOS update by the mainboard vendor.
This part made the situation better, but I could still reproduce the corruption errors. As time went 
on, when running in "performance" mode, the issues appeared more often and were more severe. Every 
time, switching from performance to powersave (lower voltage) made the CPU more stable.
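Whether the governor switch actually applied to every core can be checked through the Linux cpufreq sysfs interface. A minimal sketch, assuming the usual `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` layout; `governors` and `all_using` are hypothetical helper names, and the root path is a parameter so the functions can be pointed at a copy of the tree.

```python
from pathlib import Path

# Usual location of per-CPU cpufreq files on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governors(root: Path = CPU_ROOT) -> dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu0') to its current scaling governor."""
    result = {}
    # 'cpu[0-9]*' skips sibling directories such as 'cpufreq' and 'cpuidle'.
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu = gov_file.parent.parent.name
        result[cpu] = gov_file.read_text().strip()
    return result


def all_using(governor: str, root: Path = CPU_ROOT) -> bool:
    """True only if at least one CPU was found and every CPU uses `governor`."""
    govs = governors(root)
    return bool(govs) and all(g == governor for g in govs.values())


if __name__ == "__main__":
    print(governors())
    print("all powersave:", all_using("powersave"))
```

Running it after `cpupower frequency-set -g powersave` should report the same governor for all 32 logical CPUs if the change took effect everywhere.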

The reproduction procedure was as follows:
- shut the VM off,
- defragment the filesystem (btrfs filesystem defragment),
- turn the VM on,
- run defrag/chkdsk inside the VM.
The errors appeared almost immediately. How often they appeared correlated with fragmentation: if 
the VM image was heavily fragmented on btrfs, the probability of corruption was lower.
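The host-side part of this procedure, plus a fragmentation measurement before and after defragmenting, could be scripted roughly as follows. This is a sketch under stated assumptions: the libvirt domain name is not given in the thread, so `DOMAIN` is a hypothetical placeholder; the image path comes from the `<source file=...>` element above; the helper names are hypothetical. It uses only the existing `virsh shutdown/domstate/start`, `btrfs filesystem defragment` and `filefrag` commands, and must run as root.

```python
import re
import subprocess
import time

# Hypothetical: the real libvirt domain name is not given in the thread.
DOMAIN = "win7-monitor"
# Image path taken from the <source file=...> element of the VM definition.
IMAGE = "/var/lib/libvirt/images-red-btrfs/dell.qcow2"


def extent_count(filefrag_output: str) -> int:
    """Parse `filefrag FILE` output such as '/path: 1234 extents found'."""
    m = re.search(r":\s+(\d+) extents? found", filefrag_output)
    if m is None:
        raise ValueError(f"unexpected filefrag output: {filefrag_output!r}")
    return int(m.group(1))


def run(*argv: str) -> str:
    """Run a command, raise on a non-zero exit status, return its stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def wait_shut_off(domain: str, timeout: float = 300.0) -> None:
    """Poll `virsh domstate` until the guest reports 'shut off'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if run("virsh", "domstate", domain).strip() == "shut off":
            return
        time.sleep(2)
    raise TimeoutError(f"{domain} did not shut off within {timeout} s")


def reproduction_cycle(domain: str = DOMAIN, image: str = IMAGE) -> None:
    """Host-side steps only; the in-guest defrag/chkdsk is still done by hand."""
    print("extents before:", extent_count(run("filefrag", image)))
    run("virsh", "shutdown", domain)  # graceful shutdown request, returns at once
    wait_shut_off(domain)
    run("btrfs", "filesystem", "defragment", image)
    print("extents after:", extent_count(run("filefrag", image)))
    run("virsh", "start", domain)
```

Logging the extent count on each cycle would make the observed correlation (fewer extents, more corruption) visible as numbers rather than an impression.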

Three months after the RMA, the i9-14900K started to have threading issues and to leave zombie 
processes behind in performance mode. Powersave mode fixed that as well, and the system ran stably.

Finally, I replaced my mainboard (an X13SAE-F) with an Intel Z890 board and a latest-generation CPU, 
leaving the whole IO stack intact (same chassis, cables, controllers and disks).
I ran scrub and balance; this VM had one small 4096-byte unrecoverable error in a bluescreen memory 
dump file, and everything has worked fine for a couple of days since. I can no longer reproduce the 
issue with the method above.
I used ddrescue to reread everything I could from btrfs (the one file used by the VM mentioned above) 
and simply replaced the file once ddrescue was done.
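The salvage step works because btrfs returns EIO for a data sector whose checksum fails on every available copy, while the rest of the file stays readable. The following is a much simpler Python sketch of the same idea, not how ddrescue itself works (ddrescue does multiple passes, retries, and keeps a mapfile). `salvage_copy` is a hypothetical name, and the 4096-byte block size is an assumption chosen to match the usual btrfs data sector size.

```python
import errno
import os

# Assumption: 4 KiB matches the btrfs data sector size on typical x86_64 setups.
BLOCK = 4096


def salvage_copy(src: str, dst: str, block: int = BLOCK) -> list[int]:
    """Copy src to dst block by block, zero-filling blocks that fail with EIO.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad = []
    size = os.path.getsize(src)
    fd = os.open(src, os.O_RDONLY)
    try:
        with open(dst, "wb") as out:
            for offset in range(0, size, block):
                length = min(block, size - offset)
                try:
                    data = os.pread(fd, length, offset)
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise  # only I/O errors are treated as bad blocks
                    bad.append(offset)
                    data = b""
                # Pad failed (or unexpectedly short) reads with zeros so every
                # later block keeps its original offset in dst.
                out.write(data.ljust(length, b"\0"))
    finally:
        os.close(fd)
    return bad
```

With the VM shut off, the returned list would name exactly which 4 KiB blocks were lost; in the case above that would presumably be the single block inside the memory dump file.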

Last Friday I asked Intel for a refund.

I am pleasantly surprised by how much pain this btrfs filesystem (RAID10 for data and metadata) has 
handled over the last year. Great job, devs, keep it up!

Sincerely,
e.


Thread overview: 6+ messages
2024-06-10 14:56 RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted ein
2024-07-29  8:43 ` ein
2024-07-29 10:05   ` Qu Wenruo
2025-01-13 15:54     ` ein [this message]
2025-01-13 20:39       ` Qu Wenruo
2025-01-16 14:55         ` ein
