public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: ein <ein.net@gmail.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted
Date: Mon, 29 Jul 2024 19:35:04 +0930
Message-ID: <a00a0c80-85fa-4484-9076-d4a2f50e177e@gmx.com>
In-Reply-To: <37cfd270-4b64-4415-8fee-fa732575d3a9@gmail.com>



On 2024/7/29 18:13, ein wrote:
> On 10.06.2024 16:56, ein wrote:
>> [...]
>>
>> I don't think that it's RAM related, because:
>> - the HW is new, the RAM is good quality, and I did a memory check a
>> couple of months ago,
>> - it affects only one file; I have other much busier VMs, while that one
>> mostly stays idle,
>> - other OS operations seem to have been working perfectly for months.
>>
>> Sincerely,
>
> Hi,
>
> after spotting this:
> https://www.reddit.com/r/GlobalOffensive/comments/1eb00pg/intel_processors_are_causing_significant/
>
> I decided to move from:
> cpupower frequency-set -g performance
> to:
> cpupower frequency-set -g powersave
>
> I have got:
>
> ~# lscpu
> Architecture:             x86_64
>   CPU op-mode(s):         32-bit, 64-bit
>   Address sizes:          46 bits physical, 48 bits virtual
>   Byte Order:             Little Endian
> CPU(s):                   32
>   On-line CPU(s) list:    0-31
> Vendor ID:                GenuineIntel
>   BIOS Vendor ID:         Intel(R) Corporation
>   Model name:             13th Gen Intel(R) Core(TM) i9-13900K
>     BIOS Model name:      13th Gen Intel(R) Core(TM) i9-13900K To Be Filled By O.E.M. CPU @ 5.3GHz
>
> One week without corruptions.

Normally we only suspect the hardware when we have enough evidence
(e.g. proof of a bitflip), even if the hardware is known to have
problems.

In your case, I still do not believe it's a hardware problem.

> - it affects only one file; I have other much busier VMs, while that one
> mostly stays idle,

Because of how btrfs data checksums (datacsum) work, btrfs is very
sensitive to page contents changing during writeback.

Normally this should not happen for buffered writes, as btrfs keeps the
page cache locked while writing it back.

But for Direct IO, it is still quite possible that a process submits a
direct IO and, while that IO is still in flight, user space changes the
contents of the page.

In that case, the btrfs csum is calculated from the old contents, but
the new contents are what end up on disk, causing a csum mismatch.

So I'm wondering: what is the workload inside the VM?

Thanks,
Qu
>
> Sincerely,
>
>

Thread overview: 6+ messages
2024-06-10 14:56 RAID1 two chunks of the same data on the same physical disk, one file keeps being corrupted ein
2024-07-29  8:43 ` ein
2024-07-29 10:05   ` Qu Wenruo [this message]
2025-01-13 15:54     ` ein
2025-01-13 20:39       ` Qu Wenruo
2025-01-16 14:55         ` ein
