From: Christoph Hellwig <hch@lst.de>
To: Bruno Gravato <bgravato@gmail.com>
Cc: Stefan <linux-kernel@simg.de>,
"Dr. David Alan Gilbert" <linux@treblig.org>,
Christoph Hellwig <hch@lst.de>,
Thorsten Leemhuis <linux@leemhuis.info>,
Mario Limonciello <mario.limonciello@amd.com>,
Keith Busch <kbusch@kernel.org>,
Adrian Huang <ahuang12@lenovo.com>,
Linux kernel regressions list <regressions@lists.linux.dev>,
linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
Date: Tue, 4 Feb 2025 07:12:08 +0100
Message-ID: <20250204061208.GA29300@lst.de>
In-Reply-To: <CAOBLbT8V5DkuXzzgp2dPOZkGomqiLa_cdC=e=uWBtcWpP7L=Ww@mail.gmail.com>
On Sun, Feb 02, 2025 at 08:32:31AM +0000, Bruno Gravato wrote:
> In my tests I was using real data: a backup of my files.
>
> On one such test I copied over 300K files of various sizes and types
> totalling about 60GB. A bit over 20 files got corrupted.
> I tried copying the files over the network (ethernet) using rsync/ssh.
> I also tried restoring the files using restic (over ssh as well). And
> I also tried copying the files locally from a SATA disk. In all cases
> I got similar results with some files being corrupted.
> The destination nvme disk was using btrfs and running btrfs scrub
> after the copy detects quite a few checksum errors.
So you used various different data sources, and the destination was
always the nvme device in the suspect slot.
> I analyzed some of those corrupted files and one of them happened to
> be a text file (linux kernel source code).
> A big portion of the text was replaced with text from another file in
> the same directory (being text made it easy to find where it came
> from).
> So this was a contiguous block of text that was overwritten with a
> contiguous block of text from another file.
> If I remember correctly the other file was not corrupted (so the
> blocks weren't swapped). It looked like a certain block of text was
> written twice: on the correct file and on another file in the same
> directory.
That's a very interesting pattern.
> I also got some jpeg images corrupted. I was able to open and view
> (partially) those images and it looked like a portion of the image was
> repeated in a different part of it, so blocks of the same file were
> probably duplicated and overwritten within itself.
>
> The blocks being overwritten seemed to be different sizes on different files.
This does sound like a fairly common pattern for SSD FTL (flash
translation layer) issues, but I still don't want to rule out swiotlb.
Its bucketing could maybe also lead to corruption like this, although
I can't really see how. And the fact that the affected systems seem to
be using swiotlb despite having no good reason to do so still leaves
me puzzled.
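For anyone who wants to check their own data for the duplication
pattern described above (the same block of data ending up in two
places), here is a minimal sketch. It is not from this thread and
rests on assumptions: the function name is made up for illustration,
and it presumes the duplicated regions are aligned to 4096-byte
blocks, which nobody here has confirmed.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed granularity; the thread does not state one


def find_duplicate_blocks(directory, block_size=BLOCK_SIZE):
    """Return {digest: [(path, offset), ...]} for blocks seen more than once.

    Hypothetical helper: scans the regular files directly inside
    `directory` and records every full, block-aligned chunk by its
    SHA-256 digest.
    """
    seen = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if len(block) < block_size:
                    break  # ignore the short tail block (or EOF)
                # All-zero blocks repeat legitimately (padding, sparse files).
                if block.count(0) != block_size:
                    digest = hashlib.sha256(block).hexdigest()
                    seen[digest].append((path, offset))
                offset += block_size
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Calling find_duplicate_blocks() on a directory of the copied data lists
every block that shows up at more than one (path, offset) location,
both across files and within a single file. A hit is only a hint:
files can share identical blocks for legitimate reasons, so each match
still needs to be compared against the original source data.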
Thread overview: 31+ messages
2025-01-08 14:38 [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Thorsten Leemhuis
2025-01-08 15:07 ` Keith Busch
2025-01-09 8:28 ` Christoph Hellwig
2025-01-09 8:52 ` Thorsten Leemhuis
2025-01-09 15:44 ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
2025-01-10 11:17 ` Bruno Gravato
2025-01-15 6:37 ` Bruno Gravato
2025-01-15 8:40 ` Thorsten Leemhuis
2025-01-16 17:29 ` Thorsten Leemhuis
2025-01-17 8:05 ` Christoph Hellwig
2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 9:55 ` Christoph Hellwig
2025-01-17 10:30 ` Thorsten Leemhuis
2025-02-04 6:26 ` Christoph Hellwig
2025-01-17 13:36 ` Bruno Gravato
2025-01-20 14:31 ` Thorsten Leemhuis
2025-01-28 7:41 ` Christoph Hellwig
2025-01-28 12:00 ` Stefan
2025-01-28 12:52 ` Dr. David Alan Gilbert
2025-01-28 14:24 ` Stefan
2025-02-02 8:32 ` Bruno Gravato
2025-02-04 6:12 ` Christoph Hellwig [this message]
2025-02-04 9:12 ` Bruno Gravato
2025-02-03 18:48 ` Stefan
2025-02-06 15:58 ` Stefan
2025-01-17 21:31 ` Stefan
2025-01-18 1:03 ` Keith Busch
2025-01-15 10:47 ` Stefan
2025-01-15 13:14 ` Bruno Gravato
2025-01-15 16:26 ` Stefan
2025-01-10 0:10 ` [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Keith Busch