From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: ein <ein.net@gmail.com>
Cc: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted
Date: Tue, 14 Jan 2025 07:09:28 +1030
Message-ID: <3749cb72-a99f-4f4e-9682-e2cbf7604227@gmx.com>
In-Reply-To: <501eb99a-dee6-4e84-93cb-ae49d48dcab6@gmail.com>
On 2025/1/14 02:24, ein wrote:
> On 29.07.2024 12:05, Qu Wenruo wrote:
>> On 10.06.2024 16:56, ein wrote:
>>>> [...]
>>>> I don't think that it's RAM related because,
>>>> - HW is new, RAM is good quality and I did mem. check couple months
>>>> ago,
>>>> - it affects only one file, I have other much busier VMs, that one
>>>> mostly stays idle,
>>>> - other OS operations seem to have been working perfectly for months.
>>>
>>> [...]
>>>
>>> after spotting this:
>>> https://www.reddit.com/r/GlobalOffensive/comments/1eb00pg/intel_processors_are_causing_significant/
>>>
>>> I decided to move from:
>>> cpupower frequency-set -g performance
>>> to:
>>> cpupower frequency-set -g powersave
>>>
>>> I have got:
>>>
>>> ~# lscpu
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 46 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 32
>>> On-line CPU(s) list: 0-31
>>> Vendor ID: GenuineIntel
>>> BIOS Vendor ID: Intel(R) Corporation
>>> Model name: 13th Gen Intel(R) Core(TM) i9-13900K
>>> BIOS Model name: 13th Gen Intel(R) Core(TM) i9-13900K To Be Filled By O.E.M. CPU @ 5.3GHz
>>>
>>> One week without corruptions.
> Hi Qu, thanks for the answer.
>> Normally we only suspect the hardware when we have enough evidence.
>> (e.g. proof of bitflip etc)
>> Even if the hardware is known to have problems.
> I think I have that proof. (1)
>> In your case, I still do not believe it's a hardware problem.
>>
>> > - it affects only one file, I have other much busier VMs, that one
>> > mostly stays idle,
>>
>> Due to btrfs' datacsum behavior, it's very sensitive to page content
>> change during writeback.
>>
>> Normally this should not happen for buffered writes as btrfs has locked
>> the page cache.
>>
>> But for Direct IO it's still very possible that one process submitted a
>> direct IO, and when the IO was still under way, the user space changed
>> the contents of that page.
>>
>> In that case, the btrfs csum is calculated using the old contents, but
>> the on-disk data is the new contents, causing the csum mismatch.
>>
>> So I'm wondering what's the workload inside the VM?
>
> As far as I know, in such a configuration there's no writeback:
>
> <disk type="file" device="disk">
> <driver name="qemu" type="qcow2" cache="none" discard="unmap"/>
cache="none" means direct IO.
Exactly the problem I mentioned, direct IO with data changed during
writeback.
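
A minimal sketch of that race, in case it helps to see it spelled out.
This is a plain user-space simulation, not btrfs code: checksum(),
direct_io_write() and read_verifies() are hypothetical names made up for
the illustration, and zlib.crc32 is only a stand-in for the real data
checksum (btrfs defaults to crc32c, which uses a different polynomial).

```python
import zlib

BLOCK_SIZE = 4096  # one 4 KiB data block


def checksum(block: bytes) -> int:
    # Stand-in for the filesystem's data checksum; illustration only.
    return zlib.crc32(block)


def direct_io_write(buf: bytearray, modify_in_flight: bool):
    """Simulate one direct IO write; return (stored_csum, on_disk_block)."""
    # Step 1: the checksum is computed from the buffer as submitted.
    stored_csum = checksum(bytes(buf))
    # Step 2: with direct IO the buffer is still user memory, so the
    # writer (here: the guest) can change it while the IO is under way.
    if modify_in_flight:
        buf[0] ^= 0xFF
    # Step 3: the device persists whatever the buffer holds at that point.
    on_disk = bytes(buf)
    return stored_csum, on_disk


def read_verifies(stored_csum: int, on_disk: bytes) -> bool:
    # A later read recomputes the checksum and compares it to the stored one.
    return checksum(on_disk) == stored_csum


stable = direct_io_write(bytearray(BLOCK_SIZE), modify_in_flight=False)
racy = direct_io_write(bytearray(BLOCK_SIZE), modify_in_flight=True)
print(read_verifies(*stable))  # True: buffer untouched during the write
print(read_verifies(*racy))    # False: csum is of the old contents
```

In the racy case nothing on the storage side went wrong; the checksum
and the data simply describe two different moments of the same buffer.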
You can change it to cache="writeback", which should resolve the
false-alarm csum mismatches.
(Or simply set the disk image file to NODATASUM.)
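
As a rough sketch of the cache change, the snippet below rewrites the
cache attribute on a trimmed copy of the quoted <disk> element using
only the Python standard library. set_cache_mode() is a hypothetical
helper written for this example, not part of libvirt, and the snippet
only transforms the XML text; it does not talk to libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <disk> element quoted in this thread.
DISK_XML = """
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2" cache="none" discard="unmap"/>
  <source file="/var/lib/libvirt/images-red-btrfs/dell.qcow2" index="2"/>
  <target dev="vda" bus="virtio"/>
</disk>
"""


def set_cache_mode(disk_xml: str, mode: str) -> str:
    """Return disk_xml with the <driver> cache attribute set to mode."""
    disk = ET.fromstring(disk_xml)
    driver = disk.find("driver")
    if driver is None:
        raise ValueError("disk element has no <driver> child")
    # Only the cache attribute changes; discard="unmap" etc. are kept.
    driver.set("cache", mode)
    return ET.tostring(disk, encoding="unicode")


updated = set_cache_mode(DISK_XML, "writeback")
print(updated)
```

In practice the same one-attribute change is usually made by hand with
"virsh edit" on the domain, followed by a full stop and start of the VM.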
Thanks,
Qu
> <source file="/var/lib/libvirt/images-red-btrfs/dell.qcow2" index="2"/>
> <backingStore/>
> <target dev="vda" bus="virtio"/>
> <alias name="virtio-disk0"/>
> <address type="pci" domain="0x0000" bus="0x00" slot="0x04"
> function="0x0"/>
> </disk>
> [...]
> <controller type="pci" index="0" model="pci-root">
> <alias name="pci.0"/>
> </controller>
>
> This is a mostly empty Win7 virtual machine with a very small SQLite
> database (100-500 MiB) and some network monitoring tool.
>
> (1)
> It took almost a year; I spent hundreds of hours and thousands of $
> chasing this issue:
> - tried 4 different new SATA controllers, from the cheap ASM106X series
> to DC-grade HBAs like LSI,
> - replaced all SATA cables multiple times,
> - replaced the WD Red HDDs (mix of CMR/SMR) with WD Red SA500 SSDs.
> That part changed nothing. I experienced a lot of PCI-E link issues,
> like disappearing SATA drives and disappearing NVMe drives (sometimes
> both of them), USB link problems etc.
> But I don't think the link issues were related: the corruption happens
> without them (that is, with no indication of a link reset in dmesg).
>
> - RMA'd the CPU, going from an i9-13900k to an i9-14900k,
> - tried every available Intel CPU microcode update packaged as a BIOS
> update by the mainboard vendor.
> This part made the situation better, but I could still recreate the
> corruption errors. As time went on, when running in the "performance"
> mode, the issues appeared more often and were more severe. Every time,
> switching from performance mode to powersave (lower voltage) made the
> CPU more stable.
>
> The process of recreation looked as follows.
> - shut the VM off,
> - defrag the filesystem (btrfs filesystem defragment),
> - turn the VM on,
> - defrag/chkdsk on VM.
> The errors appeared almost immediately. How often they happened
> correlated with fragmentation: if the VM image was very fragmented in
> btrfs, the probability of corruption was lower.
>
> The i9-14900k, 3 months after the RMA, started to have threading issues
> and to leave zombie processes in performance mode. Powersave mode fixed
> that as well and it worked stably.
>
> Finally, I replaced my mainboard (it was an X13SAE-F) with an Intel Z890
> mobo and the latest CPU generation, leaving the whole IO stack intact
> (same chassis, cables, controllers and disks).
> I ran scrub and balance; this VM had one small 4096-byte unrecoverable
> error in a bluescreen memory dump file, and everything has worked fine
> for a couple of days. I can't reproduce it with the above method anymore.
> I used ddrescue to reread everything I could from btrfs (the one file
> used by the mentioned VM) and just replaced the file after ddrescue was
> done.
>
> On Friday last week I asked Intel for a refund.
>
> I am pleasantly surprised by how much pain this btrfs filesystem (RAID10
> for data and metadata) handled over the last year. Great job devs, keep
> it up!
>
> Sincerely,
> e.
>
>
Thread overview: 6+ messages
2024-06-10 14:56 RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted ein
2024-07-29 8:43 ` ein
2024-07-29 10:05 ` Qu Wenruo
2025-01-13 15:54 ` ein
2025-01-13 20:39 ` Qu Wenruo [this message]
2025-01-16 14:55 ` ein