public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: ein <ein.net@gmail.com>
Cc: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted
Date: Tue, 14 Jan 2025 07:09:28 +1030	[thread overview]
Message-ID: <3749cb72-a99f-4f4e-9682-e2cbf7604227@gmx.com> (raw)
In-Reply-To: <501eb99a-dee6-4e84-93cb-ae49d48dcab6@gmail.com>



On 2025/1/14 02:24, ein wrote:
> On 29.07.2024 12:05, Qu Wenruo wrote:
>> On 10.06.2024 16:56, ein wrote:
>>>> [...]
>>>> I don't think that it's RAM related because,
>>>> - HW is new, RAM is good quality and I did mem. check couple months
>>>> ago,
>>>> - it affects only one file, I have other much busier VMs, that one
>>>> mostly stays idle,
>>>> - other OS operations seem to have been working perfectly for months.
>>>
>>> [...]
>>>
>>> after spotting this:
>>> https://www.reddit.com/r/GlobalOffensive/comments/1eb00pg/
>>> intel_processors_are_causing_significant/
>>>
>>> I decided to move from:
>>> cpupower frequency-set -g performance
>>> to:
>>> cpupower frequency-set -g powersave
>>>
>>> I have got:
>>>
>>> ~# lscpu
>>> Architecture:             x86_64
>>>   CPU op-mode(s):         32-bit, 64-bit
>>>   Address sizes:          46 bits physical, 48 bits virtual
>>>   Byte Order:             Little Endian
>>> CPU(s):                   32
>>>   On-line CPU(s) list:    0-31
>>> Vendor ID:                GenuineIntel
>>>   BIOS Vendor ID:         Intel(R) Corporation
>>>   Model name:             13th Gen Intel(R) Core(TM) i9-13900K
>>>     BIOS Model name:      13th Gen Intel(R) Core(TM) i9-13900K To Be
>>> Filled By O.E.M. CPU @ 5.3GHz
>>>
>>> One week without corruptions.
> Hi Qu, thanks for the answer.
>> Normally we only suspect the hardware when we have enough evidence.
>> (e.g. proof of bitflip etc)
>> Even if the hardware is known to have problems.
> I think I have such proof. (1)
>> In your case, I still do not believe it's a hardware problem.
>>
>> > - it affects only one file, I have other much busier VMs, that one
>> mostly stays idle,
>>
>> Due to btrfs' datacsum behavior, it's very sensitive to page content
>> change during writeback.
>>
>> Normally this should not happen for buffered writes as btrfs has locked
>> the page cache.
>>
>> But for Direct IO it's still very possible that one process submitted a
>> direct IO, and when the IO was still under way, the user space changed
>> the contents of that page.
>>
>> In that case, btrfs csum is calculated using that old contents, but the
>> on-disk data is the new contents, causing the csum mismatch.
>>
>> So I'm wondering what's the workload inside the VM?
>
> As far as I know, in such a configuration there's no writeback:
>
> <disk type="file" device="disk">
>    <driver name="qemu" type="qcow2" cache="none" discard="unmap"/>

cache="none" means direct IO.

This is exactly the problem I mentioned: direct IO with the data changed
while the write is in flight.
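The race can be sketched in user-space terms. Here is a minimal,
deterministic Python illustration (crc32 standing in for the btrfs csum,
a bytearray standing in for the page handed to direct IO — an
illustration of the mechanism, not actual btrfs code):

```python
import zlib

# A page-sized buffer a guest/userspace process hands to direct IO.
buf = bytearray(b"A" * 4096)

# The filesystem computes the checksum from the buffer at submit time.
csum_at_submit = zlib.crc32(bytes(buf))

# While the IO is still in flight, user space rewrites the same page
# (with cache="none" there is no page-cache copy protecting it).
buf[:] = b"B" * 4096

# The device DMAs whatever the page holds *now*; that is what lands on disk.
on_disk = bytes(buf)

# On the next read the filesystem recomputes the csum and compares:
mismatch = zlib.crc32(on_disk) != csum_at_submit
print("csum mismatch:", mismatch)  # -> csum mismatch: True
```

The data on disk is internally consistent; only the checksum describes
contents that no longer exist, which is why the error looks like
corruption but nothing is actually lost.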

You can change it to cache="writeback"; that should resolve the false
csum mismatch alerts.
(Or simply set NODATASUM on the disk image file.)
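For reference, the changed driver line in the libvirt domain XML would
look like this (based on the <disk> element quoted in this thread; only
the cache attribute changes):

```xml
<disk type="file" device="disk">
  <!-- cache="writeback" routes guest writes through the host page cache,
       avoiding direct IO and the csum race described above -->
  <driver name="qemu" type="qcow2" cache="writeback" discard="unmap"/>
</disk>
```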

Thanks,
Qu

>    <source file="/var/lib/libvirt/images-red-btrfs/dell.qcow2" index="2"/>
>    <backingStore/>
>    <target dev="vda" bus="virtio"/>
>    <alias name="virtio-disk0"/>
>    <address type="pci" domain="0x0000" bus="0x00" slot="0x04"
> function="0x0"/>
> </disk>
> [...]
> <controller type="pci" index="0" model="pci-root">
>    <alias name="pci.0"/>
> </controller>
>
> This is a mostly empty Win7 virtual machine with a very small SQLite
> database (100-500 MiB) used by a network monitoring tool.
>
> (1)
> It took almost a year; I spent hundreds of hours and thousands of
> dollars chasing this issue:
> - tried 4 different new SATA controllers, from the cheap ASM106X series
> to DC-grade HBAs like LSI,
> - replaced all SATA cables multiple times,
> - replaced the WD Red HDDs (a mix of CMR/SMR) with WD Red SA500 SSDs.
> That part changed nothing. I experienced a lot of PCIe link issues:
> disappearing SATA drives, disappearing NVMe drives (sometimes both),
> USB link problems, etc.
> But I don't think the link issues were related - the corruption happened
> without them (no indication of a link reset in dmesg).
>
> - RMA'd the CPU, going from an i9-13900K to an i9-14900K,
> - tried every available Intel CPU microcode update packaged as a BIOS
> update by the mainboard vendor.
> This part made the situation better, but I could still recreate the
> corruption errors. As time went on, when running in "performance" mode
> the issues appeared more often and were more severe. Every time,
> switching from performance to powersave (lower voltage) made the CPU
> more stable.
>
> The process of recreation looked as follows:
> - shut the VM off,
> - defragment the filesystem (btrfs filesystem defragment),
> - turn the VM on,
> - run defrag/chkdsk inside the VM.
> The errors appeared almost immediately, and there was a correlation
> with fragmentation: if the VM image was very fragmented in btrfs, the
> probability of corruption was lower.
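The host-side part of that loop could be scripted roughly as follows (a
sketch only; the domain name "dell" is assumed from the image filename,
and the guest-side defrag/chkdsk step still has to be run inside the VM):

```shell
#!/bin/sh
set -e
# Stop the guest so the image file is quiescent before defragmenting.
virsh shutdown dell
# Defragment the image file (rewrites extents on the btrfs side).
btrfs filesystem defragment -v /var/lib/libvirt/images-red-btrfs/dell.qcow2
# Bring the guest back up; then run defrag/chkdsk inside Windows.
virsh start dell
```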
>
> Three months after the RMA, the i9-14900K started to have threading
> issues and to leave zombie processes in performance mode. Powersave
> mode fixed that as well, and it ran stably.
>
> Finally, I replaced my mainboard (it was an X13SAE-F) with an Intel
> Z890 board and the latest CPU generation, leaving the whole IO stack
> intact (same chassis, cables, controllers and disks).
> I ran scrub and balance; this VM had one small 4096-byte unrecoverable
> error on a bluescreen memory dump file, and everything has worked fine
> for a couple of days now. I can't reproduce the corruption with the
> above method anymore.
> I used ddrescue to reread everything I could from btrfs (the one file
> used by the mentioned VM) and simply replaced the file once ddrescue
> was done.
>
> On Friday last week I asked Intel for refund.
>
> I am positively surprised by how much pain this btrfs filesystem
> (RAID10 for data and metadata) has handled over the last year. Great
> job, devs - keep it up!
>
> Sincerely,
> e.
>
>



Thread overview: 6+ messages
2024-06-10 14:56 RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted ein
2024-07-29  8:43 ` ein
2024-07-29 10:05   ` Qu Wenruo
2025-01-13 15:54     ` ein
2025-01-13 20:39       ` Qu Wenruo [this message]
2025-01-16 14:55         ` ein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3749cb72-a99f-4f4e-9682-e2cbf7604227@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=ein.net@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
