* [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
@ 2025-01-08 14:38 Thorsten Leemhuis
2025-01-08 15:07 ` Keith Busch
0 siblings, 1 reply; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-08 14:38 UTC (permalink / raw)
To: Adrian Huang, Christoph Hellwig
Cc: Linux kernel regressions list, Keith Busch, linux-nvme,
Jens Axboe, iommu@lists.linux.dev, Linux kernel regressions list,
LKML
[side note TWIMC: regression tracking is sadly kinda dormant temporarily
(hopefully this will change again soon), but this was brought to my
attention and looked kinda important]
Hi, Thorsten here, the Linux kernel's regression tracker.
Adrian, Christoph, I noticed a report about a regression in
bugzilla.kernel.org that appears to be caused by a change you two
handled a while ago -- or it exposed an earlier problem:
3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
limitation") [v6.4-rc3]
As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to write this mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=219609 :
> Bug 219609 - File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
>
> there are one or two bugs which were originally reported at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 . For details
> (logs, etc.), see there. Here, I will post a summary and try to point
> out the most relevant observations:
>
> Bug 1: Write errors with Lexar NM790 NVME
>
> * Occur since Debian kernel 6.5, but reproduced with upstream kernel
> 6.11.5 (the only upstream kernel I tested)
> * Only occur in 1st M.2 socket (not in the 2nd one on rear side)
> * Easiest way to reproduce them is to use f3 (
> https://fight-flash-fraud.readthedocs.io/en/latest/usage.html ). f3
> reports overwritten sectors
> * The errors seem not to occur in the last files of 500-file (=500 GB)
> test runs, and I never detected file system corruption (just defective
> files; I produced probably more than a thousand of them). The reason for
> the latter observation is maybe that file system information is written
> last. (See message 113 in the Debian bug report)
>
> (Possible) Bug 2: Read errors with Kingston FURY Renegade
>
> * Only occur in 1st M.2 socket (did not test the rear socket, because
> the warranty seal would have to be broken in order to remove the heat sink)
> * Almost impossible to reproduce; only detected it in a Debian kernel
> based on 6.1.112
> * 1st occurrence: I detected it in an SSD-intensive computation (the SSD
> served as data cache) which produced wrong results after a few days (but
> not in the first days). The error could be reproduced with f3: the
> corruptions were massive and different files were affected in subsequent
> f3read runs (==> read errors). Unfortunately I did not store the f3 logs.
> (I still have the corrupt computation results, so it was real.)
> * 2nd occurrence: A single defect sector (read error) in a multi-day
> attempt to reproduce the error with the same kernel (Debian 6.1.112),
> see message 113 in the Debian bug report
>
> Consideration / Notes:
> * These serial links (PCIe) need to be calibrated. Calibration issues
> would explain why the errors (dis)appear under certain conditions. But
> errors like this should be detected (nothing could be found in the
> kernel logs). Is the error correction possibly inactive? However, this
> still does not explain why f3 reports overwritten sectors, unless the
> signal errors occur during command / address transmission.
> * Testing is difficult, because the machine is installed remotely and in
> use. ATM, until about the end of January, I can run tests for bug 1.
> * On the AsRock X600M-STX mainboard (without chipset), the CPU (Ryzen
> 8700G) runs in SoC (system on chip) mode. Maybe someone did not test
> this properly ...
>
[...]
> With the help of TJ from the Debian kernel team ( https://
> bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 ), at least a
> workaround could be found.
>
> The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> based on DMA optimized limitation" (see https://lore.kernel.org/linux-
> iommu/20230503161759.GA1614@lst.de/ ) introduced in 6.3.7
>
> To examine the situation, I added this debug info (all files are
> located in `drivers/nvme/host`):
>
>> --- core.c.orig 2025-01-03 14:27:38.220428482 +0100
>> +++ core.c 2025-01-03 12:56:34.503259774 +0100
>> @@ -3306,6 +3306,7 @@
>> max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
>> else
>> max_hw_sectors = UINT_MAX;
>> + dev_warn(ctrl->device, "id->mdts=%d, max_hw_sectors=%d,
>> ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
>> ctrl->max_hw_sectors =
>> min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
>
> 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
>
>> [ 127.196212] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=16384
>> [ 127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
>
> 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
>
>> [ 46.436384] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=256
>> [ 46.443562] nvme nvme0: allocated 40 MiB host memory buffer.
>
> After I reverted the mentioned patch (
>
>> --- pci.c.orig 2025-01-03 14:28:05.944819822 +0100
>> +++ pci.c 2025-01-03 12:54:37.014579093 +0100
>> @@ -3042,7 +3042,8 @@
>> * over a single page.
>> */
>> dev->ctrl.max_hw_sectors = min_t(u32,
>> - NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> +// NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> + NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
>> dev->ctrl.max_segments = NVME_MAX_SEGS;
>>
>> /*
>
> ), 6.11.5 (used this version because the sources were lying around) works and says:
>
>> [ 1.251370] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=16384
>> [ 1.261168] nvme nvme0: allocated 40 MiB host memory buffer.
>
> Thus, the corruption occurs if `ctrl->max_hw_sectors` is set to a smaller value than that defined by `id->mdts`.
>
> If this is supposed to be allowed, the mentioned patch is not the (root) cause, but reverting it is at least a workaround.
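The arithmetic behind the logged values above can be sketched in a few lines (a back-of-the-envelope illustration, not driver code; the 4 KiB minimum page size, the 8 MiB NVME_MAX_KB_SZ driver cap, and the 128 KiB swiotlb-derived dma_opt_mapping_size() are assumptions inferred from the logged numbers):

```python
# Reproduce the three dmesg values above from first principles.
def nvme_mps_to_sectors(mdts: int, mpsmin_bytes: int = 4096) -> int:
    """MDTS is 2^mdts units of the controller's minimum page size (assumed 4 KiB)."""
    return (mpsmin_bytes << mdts) // 512          # in 512-byte sectors

NVME_MAX_KB_SZ = 8192                             # driver-side cap (8 MiB), inferred from the 6.3.6 log
mdts_limit = nvme_mps_to_sectors(7)               # id->mdts=7 -> 1024 sectors (512 KiB)

# 6.3.6: ctrl->max_hw_sectors pre-set from the driver cap (16384 sectors)
print(min(NVME_MAX_KB_SZ << 1, mdts_limit))       # 1024, matching the 6.3.6 log
# 6.3.7: pre-set from dma_opt_mapping_size(), here assumed to be the 128 KiB swiotlb segment
print(min((128 * 1024) >> 9, mdts_limit))         # 256, matching the 6.3.7 log
```

In other words, the patch did not change what MDTS allows; it lowered the driver-side clamp below the MDTS-derived value, which is what the reporter observes.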
See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] cannot CC them on mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
2025-01-08 14:38 [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Thorsten Leemhuis
@ 2025-01-08 15:07 ` Keith Busch
2025-01-09 8:28 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2025-01-08 15:07 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Adrian Huang, Christoph Hellwig, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML
On Wed, Jan 08, 2025 at 03:38:53PM +0100, Thorsten Leemhuis wrote:
> [side note TWIMC: regression tracking is sadly kinda dormant temporarily
> (hopefully this will change again soon), but this was brought to my
> attention and looked kinda important]
>
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> Adrian, Christoph, I noticed a report about a regression in
> bugzilla.kernel.org that appears to be caused by a change you two
> handled a while ago -- or it exposed an earlier problem:
>
> 3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
> limitation") [v6.4-rc3]
...
> > The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> > based on DMA optimized limitation" (see https://lore.kernel.org/linux-
> > iommu/20230503161759.GA1614@lst.de/ ) introduced in 6.3.7
> >
> > To examine the situation, I added this debug info (all files are
> > located in `drivers/nvme/host`):
> >
> >> --- core.c.orig 2025-01-03 14:27:38.220428482 +0100
> >> +++ core.c 2025-01-03 12:56:34.503259774 +0100
> >> @@ -3306,6 +3306,7 @@
> >> max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
> >> else
> >> max_hw_sectors = UINT_MAX;
> >> + dev_warn(ctrl->device, "id->mdts=%d, max_hw_sectors=%d,
> >> ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
> >> ctrl->max_hw_sectors =
> >> min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
> >
> > 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
> >
> >> [ 127.196212] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
> >> ctrl->max_hw_sectors=16384
> >> [ 127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
> >
> > 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
> >
> >> [ 46.436384] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
> >> ctrl->max_hw_sectors=256
> >> [ 46.443562] nvme nvme0: allocated 40 MiB host memory buffer.
It should always be okay to do smaller transfers as long as everything
stays aligned to the logical block size. I'm guessing the dma opt change
has exposed some other flaw in the nvme controller. For example, two
consecutive smaller writes are hitting some controller-side caching bug
that a single larger transfer would have handled correctly. The host
could have sent such a sequence even without the patch reverted, but
happens not to be doing that in this particular test.
* Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
2025-01-08 15:07 ` Keith Busch
@ 2025-01-09 8:28 ` Christoph Hellwig
2025-01-09 8:52 ` Thorsten Leemhuis
2025-01-10 0:10 ` [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Keith Busch
0 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2025-01-09 8:28 UTC (permalink / raw)
To: Keith Busch
Cc: Thorsten Leemhuis, Adrian Huang, Christoph Hellwig,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
> It should always be okay to do smaller transfers as long as everything
> stays aligned to the logical block size. I'm guessing the dma opt change
> has exposed some other flaw in the nvme controller. For example, two
> consecutive smaller writes are hitting some controller-side caching bug
> that a single larger transfer would have handled correctly. The host
> could have sent such a sequence even without the patch reverted, but
> happens not to be doing that in this particular test.
Yes. This somehow reminds me of the bug with an Intel SSD that got
really upset with quickly following writes to different LBAs inside the
same indirection unit. But as the new smaller size is nicely aligned
that seems unlikely. Maybe the higher number of commands simply overloads
the buggy firmware?
Of course the real question is why we're even seeing the limitation.
The value suggests it's the swiotlb one. Does the system use AMD SEV
(memory encryption)?
* Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
2025-01-09 8:28 ` Christoph Hellwig
@ 2025-01-09 8:52 ` Thorsten Leemhuis
2025-01-09 15:44 ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
2025-01-10 0:10 ` [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Keith Busch
1 sibling, 1 reply; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-09 8:52 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch
Cc: Adrian Huang, Linux kernel regressions list, linux-nvme,
Jens Axboe, iommu@lists.linux.dev, LKML, linux-kernel, bgravato
[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]
[CCing the people from
https://bugzilla.kernel.org/show_bug.cgi?id=219609, as they permitted that.
Stefan, Bruno, reminder: some developers might not follow the ticket or
might be unwilling to go to a web-based bug tracker; so any answers to questions
that are raised here via email might not be seen if you only provide
them in the bug tracker; yes, that sucks, but that's how it is for now;
hopefully things on that front will improve soon.]
On 09.01.25 09:28, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
>> It should always be okay to do smaller transfers as long as everything
>> stays aligned to the logical block size. I'm guessing the dma opt change
>> has exposed some other flaw in the nvme controller. For example, two
>> consecutive smaller writes are hitting some controller-side caching bug
>> that a single larger transfer would have handled correctly. The host
>> could have sent such a sequence even without the patch reverted, but
>> happens not to be doing that in this particular test.
>
> Yes. This somehow reminds me of the bug with an Intel SSD that got
> really upset with quickly following writes to different LBAs inside the
> same indirection unit. But as the new smaller size is nicely aligned
> that seems unlikely. Maybe the higher number of commands simply overloads
> the buggy firmware?
Thx for the assessment. FWIW, I bought such a machine myself recently
and it's still in a state where I could abandon the install. I haven't
checked yet if mine is affected, too.
> Of course the real question is why we're even seeing the limitation.
> The value suggests it's the swiotlb one. Does the system use AMD SEV
> (memory encryption)?
In case it is helpful to anyone: there are some logs buried deep in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 I'm attaching
one of the kernel logs I found there (there were multiple ones; hope I
picked an appropriate one) for easier access.
Ciao, Thorsten
[-- Attachment #2: kern.log-6.11.5 --]
[-- Type: application/x-troff-man, Size: 118168 bytes --]
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-09 8:52 ` Thorsten Leemhuis
@ 2025-01-09 15:44 ` Stefan
2025-01-10 11:17 ` Bruno Gravato
2025-01-15 6:37 ` Bruno Gravato
0 siblings, 2 replies; 31+ messages in thread
From: Stefan @ 2025-01-09 15:44 UTC (permalink / raw)
To: Keith Busch, bugzilla-daemon, bgravato
Cc: Adrian Huang, Linux kernel regressions list, linux-nvme,
Jens Axboe, iommu@lists.linux.dev, LKML, linux-kernel,
Thorsten Leemhuis, Christoph Hellwig
Hi,
Due to Thorsten's hints, I'm trying to reply to both the bug tracker and
the mailing list.
> --- Comment #13 from Keith Busch (kbusch@kernel.org) ---
> If I'm summarizing correctly, we're seeing corruption on Lexar, Kingston,
> and now Samsung NVMe's?
The Kingston read errors may be something different. They are described
in detail in messages #108 and #113 of the Debian Bug Tracker
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372
With the Kingston, I never saw the write errors that occur with Lexar and
Samsung on newer kernels (and which are easy to reproduce).
(ATM I cannot provide test results from the Kingston SSD because the
Lexar is installed, and the PC is installed remotely and in use. Thus I
can't swap the SSDs that often.)
> # cat /sys/block/nvme0n1/queue/fua
Returns "1"
> --- Comment #15 from Keith Busch (kbusch@kernel.org) --- as a test,
> could you turn off the volatile write cache?
>
> # sudo nvme set-feature /dev/nvme0n1 -f 6 -v 0
Had to modify that a little bit:
$ nvme get-feature /dev/nvme0n1 -f 6
get-feature:0x06 (Volatile Write Cache), Current value:0x00000001
$ nvme set-feature /dev/nvme0 -f 6 /dev/nvme0n1 -v 0
set-feature:0x06 (Volatile Write Cache), value:00000000,
cdw12:00000000, save:0
$ nvme get-feature /dev/nvme0n1 -f 6
get-feature:0x06 (Volatile Write Cache), Current value:00000000
Corruptions disappear (under 6.13.0-rc6) if volatile write cache is
disabled (and appear again if I turn it on with "-v 1").
But, lspci says I have a
Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD
(DRAM-less) (rev 01) (prog-if 02 [NVM Express])
Note the "DRAM-less". This is confirmed by
https://www.techpowerup.com/ssd-specs/lexar-nm790-4-tb.d1591. Instead,
the SSD has a (*non-*volatile) SLC write cache, and it uses a 40 MB
Host Memory Buffer (HMB).
Might there be an issue with HMB allocation/usage?
Is the mainboard firmware involved in HMB allocation/usage? That
would explain why volatile write caching via HMB works in the 2nd M.2
socket.
BTW, controller is MaxioTech MAP1602A, which is different from the
Samsung controllers.
> --- Comment #14 from Bruno Gravato (bgravato@gmail.com) --- The only
> difference in the specs between the two M.2 slots is that one is
> gen5x4 (the main one, which is the one with problems) and the other
> is gen4x4 (this works fine, no errors).
AFAIK this primary M.2 socket is connected to dedicated PCIe lanes of
the CPU. On my PC, it runs in Gen4 mode (limited by SSD).
The secondary M.2 socket on the rear side is probably connected to PCIe
lanes which are usually used by a chipset -- but that socket works.
Regards Stefan
* Re: [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
2025-01-09 8:28 ` Christoph Hellwig
2025-01-09 8:52 ` Thorsten Leemhuis
@ 2025-01-10 0:10 ` Keith Busch
1 sibling, 0 replies; 31+ messages in thread
From: Keith Busch @ 2025-01-10 0:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Thorsten Leemhuis, Adrian Huang, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML
On Thu, Jan 09, 2025 at 09:28:49AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 08:07:28AM -0700, Keith Busch wrote:
> > It should always be okay to do smaller transfers as long as everything
> > stays aligned to the logical block size. I'm guessing the dma opt change
> > has exposed some other flaw in the nvme controller. For example, two
> > consecutive smaller writes are hitting some controller-side caching bug
> > that a single larger transfer would have handled correctly. The host
> > could have sent such a sequence even without the patch reverted, but
> > happens not to be doing that in this particular test.
>
> Yes. This somehow reminds me of the bug with an Intel SSD that got
> really upset with quickly following writes to different LBAs inside the
> same indirection unit.
Good old https://bugzilla.redhat.com/show_bug.cgi?id=1402533 ...
> But as the new smaller size is nicely aligned
> that seems unlikely. Maybe the higher number of commands simply overloads
> the buggy firmware?
Maybe the higher size creates different splits that better straddle some
unreported internal boundary we don't know about. This all just points
to some probabilistic scenario that somehow happens more often with
a lower transfer limit.
The bugzilla reports that disabling VWC makes the problem go away. That
may be a timing thing or a caching thing, but it suggests a kernel bug is
less likely (yay!?); not easy to tell so far. It's just concerning that
multiple vendor devices are reporting a similar observation, so maybe
these are not even the same root problem.
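The splitting effect discussed above can be illustrated with a toy model (the numbers are purely hypothetical; the helper mimics how a per-command transfer limit turns one I/O into several commands, it is not driver code):

```python
def split(start_sector: int, nr_sectors: int, limit: int) -> list:
    """Split an I/O into commands of at most `limit` sectors each."""
    cmds = []
    while nr_sectors:
        n = min(nr_sectors, limit)
        cmds.append((start_sector, n))
        start_sector += n
        nr_sectors -= n
    return cmds

# A hypothetical 1 MiB write (2048 sectors) starting at sector 256.
io = (256, 2048)

big   = split(*io, limit=1024)  # 512 KiB limit (patch reverted): 2 commands
small = split(*io, limit=256)   # 128 KiB limit (patch applied): 8 commands

print(len(big), len(small))     # 2 8
```

The same data reaches the device either way, but with the lower limit the controller sees four times as many commands with different start/length patterns, which is the kind of shift that could tickle a firmware bug probabilistically.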
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-09 15:44 ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
@ 2025-01-10 11:17 ` Bruno Gravato
2025-01-15 6:37 ` Bruno Gravato
1 sibling, 0 replies; 31+ messages in thread
From: Bruno Gravato @ 2025-01-10 11:17 UTC (permalink / raw)
To: Stefan
Cc: Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Thorsten Leemhuis, Christoph Hellwig
Hi,
(resending in text-only mode, because mailing lists don't like HTML
emails... sorry to those getting this twice)
I can reply via email, that's not a problem.
I'll try to run some more tests when I get the chance (it's been a
very busy week, sorry).
Besides the volatile write cache test, any other test I should try?
Regarding the M.2 slots. I believe this motherboard has no chipset. So
both slots should be connected directly to the CPU (mine is Ryzen
8600G), although they might be connecting to different parts of the
CPU, right? I guess that can make a difference.
My disks are gen4 as well.
Bruno
On Thu, 9 Jan 2025 at 15:44, Stefan <linux-kernel@simg.de> wrote:
>
> Hi,
>
> Due to Thorsten's hints, I'm trying to reply to both the bug tracker and
> the mailing list.
>
> > --- Comment #13 from Keith Busch (kbusch@kernel.org) ---
> > If I'm summarizing correctly, we're seeing corruption on Lexar, Kingston,
> > and now Samsung NVMe's?
>
> The Kingston read errors may be something different. They are described
> in detail in messages #108 and #113 of the Debian Bug Tracker
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372
>
> With the Kingston, I never saw the write errors that occur with Lexar and
> Samsung on newer kernels (and which are easy to reproduce).
>
> (ATM I cannot provide test results from the Kingston SSD because the
> Lexar is installed, and the PC is installed remotely and in use. Thus I
> can't swap the SSDs that often.)
>
> > # cat /sys/block/nvme0n1/queue/fua
>
> Returns "1"
>
> > --- Comment #15 from Keith Busch (kbusch@kernel.org) --- as a test,
> > could you turn off the volatile write cache?
> >
> > # sudo nvme set-feature /dev/nvme0n1 -f 6 -v 0
> Had to modify that a little bit:
>
> $ nvme get-feature /dev/nvme0n1 -f 6
> get-feature:0x06 (Volatile Write Cache), Current value:0x00000001
> $ nvme set-feature /dev/nvme0 -f 6 /dev/nvme0n1 -v 0
> set-feature:0x06 (Volatile Write Cache), value:00000000,
> cdw12:00000000, save:0
> $ nvme get-feature /dev/nvme0n1 -f 6
> get-feature:0x06 (Volatile Write Cache), Current value:00000000
>
> Corruptions disappear (under 6.13.0-rc6) if volatile write cache is
> disabled (and appear again if I turn it on with "-v 1").
>
> But, lspci says I have a
>
> Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD
> (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
>
> Note the "DRAM-less". This is confirmed by
> https://www.techpowerup.com/ssd-specs/lexar-nm790-4-tb.d1591. Instead,
> the SSD has a (*non-*volatile) SLC write cache, and it uses a 40 MB
> Host Memory Buffer (HMB).
>
> Might there be an issue with HMB allocation/usage?
>
> Is the mainboard firmware involved in HMB allocation/usage? That
> would explain why volatile write caching via HMB works in the 2nd M.2
> socket.
>
> BTW, controller is MaxioTech MAP1602A, which is different from the
> Samsung controllers.
>
> > --- Comment #14 from Bruno Gravato (bgravato@gmail.com) --- The only
> > difference in the specs between the two M.2 slots is that one is
> > gen5x4 (the main one, which is the one with problems) and the other
> > is gen4x4 (this works fine, no errors).
>
> AFAIK this primary M.2 socket is connected to dedicated PCIe lanes of
> the CPU. On my PC, it runs in Gen4 mode (limited by SSD).
>
> The secondary M.2 socket on the rear side is probably connected to PCIe
> lanes which are usually used by a chipset -- but that socket works.
>
> Regards Stefan
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-09 15:44 ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
2025-01-10 11:17 ` Bruno Gravato
@ 2025-01-15 6:37 ` Bruno Gravato
2025-01-15 8:40 ` Thorsten Leemhuis
2025-01-15 10:47 ` Stefan
1 sibling, 2 replies; 31+ messages in thread
From: Bruno Gravato @ 2025-01-15 6:37 UTC (permalink / raw)
To: Stefan
Cc: Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Thorsten Leemhuis, Christoph Hellwig
I finally got the chance to run some more tests with some interesting
and unexpected results...
I put another disk (WD Black SN750) in the main M.2 slot (the
problematic one), but kept my main disk (Solidigm P44 Pro) in the
secondary M.2 slot (where it doesn't have any issues).
I rerun my test: step 1) copy a large number of files to the WD disk
(main slot), step 2) run btrfs scrub on it and expect some checksum
errors
To my surprise there were no errors!
I tried it twice with different kernels (6.2.6 and 6.11.5) and booting
from either disk (I have linux installations on both).
Still no errors.
I then removed the Solidigm disk from the secondary and kept the WD
disk in the main M.2 slot.
Rerun my tests (on kernel 6.11.5) and bang! btrfs scrub now detected
quite a few checksum errors!
I then tried disabling volatile write cache with "nvme set-feature
/dev/nvme0 -f 6 -v 0"
"nvme get-feature /dev/nvme0 -f 6" confirmed it was disabled, but
/sys/block/nvme0n1/queue/fua still showed 1... Was that supposed to
turn into 0?
I re-run my test, but I still got checksum errors on btrfs scrub. So
disabling volatile write cache (assuming I did it correctly) didn't
make a difference in my case.
I put the Solidigm disk back into the secondary slot, booted and rerun
the test on the WD disk (main slot) just to be triple sure and still
no errors.
So it looks like the corruption only happens if only the main M.2 slot
is occupied and the secondary M.2 slot is free.
With two nvme disks (one on each M.2 slot), there were no errors at all.
Stefan, did you ever try running your tests with 2 nvme disks
installed on both slots? Or did you use only one slot at a time?
Bruno
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 6:37 ` Bruno Gravato
@ 2025-01-15 8:40 ` Thorsten Leemhuis
2025-01-16 17:29 ` Thorsten Leemhuis
2025-01-17 8:05 ` Christoph Hellwig
2025-01-15 10:47 ` Stefan
1 sibling, 2 replies; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-15 8:40 UTC (permalink / raw)
To: Bruno Gravato, Stefan
Cc: Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Christoph Hellwig
On 15.01.25 07:37, Bruno Gravato wrote:
> I finally got the chance to run some more tests with some interesting
> and unexpected results...
FWIW, I briefly looked into the issue in between as well and can
reproduce it[1] locally with my Samsung SSD 990 EVO Plus 4TB in the main
M.2 slot of my DeskMini X600 using btrfs on a mainline kernel with a
config from Fedora rawhide.
So what can those of us affected by the problem do to narrow it down?
What does it mean that disabling the NVMe device's write cache often
but apparently not always helps? Is it just reducing the chance of the
problem occurring, or accidentally working around it?
hch initially brought up that swiotlb seems to be used. Are there any
BIOS setup settings we should try? I tried a few changes yesterday, but
I still get the "PCI-DMA: Using software bounce buffering for IO
(SWIOTLB)" message in the log and not a single line mentioning DMAR.
Ciao, Thorsten
[1] see start of this thread and/or
https://bugzilla.kernel.org/show_bug.cgi?id=219609 for details
> I put another disk (WD Black SN750) in the main M.2 slot (the
> problematic one), but kept my main disk (Solidigm P44 Pro) in the
> secondary M.2 slot (where it doesn't have any issues).
> I rerun my test: step 1) copy a large number of files to the WD disk
> (main slot), step 2) run btrfs scrub on it and expect some checksum
> errors
> To my surprise there were no errors!
> I tried it twice with different kernels (6.2.6 and 6.11.5) and booting
> from either disk (I have linux installations on both).
> Still no errors.
>
> I then removed the Solidigm disk from the secondary and kept the WD
> disk in the main M.2 slot.
> Rerun my tests (on kernel 6.11.5) and bang! btrfs scrub now detected
> quite a few checksum errors!
>
> I then tried disabling volatile write cache with "nvme set-feature
> /dev/nvme0 -f 6 -v 0"
> "nvme get-feature /dev/nvme0 -f 6" confirmed it was disabled, but
> /sys/block/nvme0n1/queue/fua still showed 1... Was that supposed to
> turn into 0?
>
> I re-run my test, but I still got checksum errors on btrfs scrub. So
> disabling volatile write cache (assuming I did it correctly) didn't
> make a difference in my case.
>
> I put the Solidigm disk back into the secondary slot, booted and rerun
> the test on the WD disk (main slot) just to be triple sure and still
> no errors.
>
> So it looks like the corruption only happens if only the main M.2 slot
> is occupied and the secondary M.2 slot is free.
> With two nvme disks (one on each M.2 slot), there were no errors at all.
>
> Stefan, did you ever try running your tests with 2 nvme disks
> installed on both slots? Or did you use only one slot at a time?
$ journalctl -k | grep -i -e DMAR -e IOMMU -e AMD-Vi -e SWIOTLB
AMD-Vi: Using global IVHD EFR:0x246577efa2254afa, EFR2:0x0
iommu: Default domain type: Translated
iommu: DMA domain TLB invalidation policy: lazy mode
pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
pci 0000:00:01.0: Adding to iommu group 0
pci 0000:00:01.3: Adding to iommu group 1
pci 0000:00:02.0: Adding to iommu group 2
pci 0000:00:02.3: Adding to iommu group 3
pci 0000:00:03.0: Adding to iommu group 4
pci 0000:00:04.0: Adding to iommu group 5
pci 0000:00:08.0: Adding to iommu group 6
pci 0000:00:08.1: Adding to iommu group 7
pci 0000:00:08.2: Adding to iommu group 8
pci 0000:00:08.3: Adding to iommu group 9
pci 0000:00:14.0: Adding to iommu group 10
pci 0000:00:14.3: Adding to iommu group 10
pci 0000:00:18.0: Adding to iommu group 11
pci 0000:00:18.1: Adding to iommu group 11
pci 0000:00:18.2: Adding to iommu group 11
pci 0000:00:18.3: Adding to iommu group 11
pci 0000:00:18.4: Adding to iommu group 11
pci 0000:00:18.5: Adding to iommu group 11
pci 0000:00:18.6: Adding to iommu group 11
pci 0000:00:18.7: Adding to iommu group 11
pci 0000:01:00.0: Adding to iommu group 12
pci 0000:02:00.0: Adding to iommu group 13
pci 0000:03:00.0: Adding to iommu group 14
pci 0000:03:00.1: Adding to iommu group 15
pci 0000:03:00.2: Adding to iommu group 16
pci 0000:03:00.3: Adding to iommu group 17
pci 0000:03:00.4: Adding to iommu group 18
pci 0000:03:00.6: Adding to iommu group 19
pci 0000:04:00.0: Adding to iommu group 20
pci 0000:04:00.1: Adding to iommu group 21
pci 0000:05:00.0: Adding to iommu group 22
AMD-Vi: Extended features (0x246577efa2254afa, 0x0): PPR NX GT [5] IA GA
PC GA_vAPIC
AMD-Vi: Interrupt remapping enabled
AMD-Vi: Virtual APIC enabled
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 6:37 ` Bruno Gravato
2025-01-15 8:40 ` Thorsten Leemhuis
@ 2025-01-15 10:47 ` Stefan
2025-01-15 13:14 ` Bruno Gravato
1 sibling, 1 reply; 31+ messages in thread
From: Stefan @ 2025-01-15 10:47 UTC (permalink / raw)
To: Bruno Gravato, bugzilla-daemon
Cc: Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Thorsten Leemhuis, Christoph Hellwig
Hi,
(replying to both the mailing list and the kernel bug tracker)
On 15.01.25 07:37, Bruno Gravato wrote:
> I then removed the Solidigm disk from the secondary and kept the WD
> disk in the main M.2 slot. Rerun my tests (on kernel 6.11.5) and
> bang! btrfs scrub now detected quite a few checksum errors!
>
> I then tried disabling volatile write cache with "nvme set-feature
> /dev/nvme0 -f 6 -v 0" "nvme get-feature /dev/nvme0 -f 6" confirmed it
> was disabled, but /sys/block/nvme0n1/queue/fua still showed 1... Was
> that supposed to turn into 0?
You can check this using `nvme get-feature /dev/nvme0n1 -f 6`
> So it looks like the corruption only happens if only the main M.2
> slot is occupied and the secondary M.2 slot is free. With two nvme
> disks (one on each M.2 slot), there were no errors at all.
>
> Stefan, did you ever try running your tests with 2 nvme disks
> installed on both slots? Or did you use only one slot at a time?
No, I only tested these configurations:
1. 1st M.2: Lexar; 2nd M.2: empty
(Easy to reproduce write errors)
2. 1st M.2: Kingston; 2nd M.2: Lexar
(Difficult to reproduce read errors with the 6.1 kernel, but no issues
with newer ones within several months of intense use)
I'll swap the SSDs soon. Then I will also test other configurations and
try out a third SSD. If I get corruption with other SSDs, I will
check which modifications help.
Note that I need both SSDs (configuration 2) in about one week and
cannot change this for about 3 months (I already announced this in December).
Thus, if there is anything I should test with configuration 1, please
let me know quickly.
Just as a reminder (for those who did not read the two bug trackers):
I tested with `f3` (a utility used to detect scam disks) on ext4.
`f3` reports overwritten sectors. In configuration 1 these are write
errors (they appear when I read the data back).
(If no other SSD-intensive jobs are running,) the corruptions do not occur
in the last files, and I never noticed file system corruption, only
file contents are corrupt. (This is probably luck, but also has something
to do with the journal and the time when file system information is
written.)
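For those unfamiliar with `f3`, its core idea can be sketched in a few lines (purely illustrative, not f3's actual code): fill files with data derived from each 512-byte sector's index, read them back, and report sectors whose contents no longer match.

```python
# Illustrative sketch of f3's write-then-verify approach: every 512-byte
# sector gets a deterministic payload derived from its index, so any
# overwritten sector is detectable on read-back.
import hashlib
import os

SECTOR = 512

def sector_payload(file_no: int, sector_no: int) -> bytes:
    # Deterministic per-sector pattern so any overwrite is detectable.
    seed = f"{file_no}:{sector_no}".encode()
    block = hashlib.sha256(seed).digest()
    return (block * (SECTOR // len(block) + 1))[:SECTOR]

def write_test_file(path: str, file_no: int, sectors: int) -> None:
    with open(path, "wb") as f:
        for s in range(sectors):
            f.write(sector_payload(file_no, s))

def verify_test_file(path: str, file_no: int) -> list[int]:
    # Returns the indices of sectors whose contents were corrupted.
    bad = []
    with open(path, "rb") as f:
        s = 0
        while chunk := f.read(SECTOR):
            if chunk != sector_payload(file_no, s):
                bad.append(s)
            s += 1
    return bad

if __name__ == "__main__":
    write_test_file("f3-demo.bin", 1, 64)
    print("corrupt sectors:", verify_test_file("f3-demo.bin", 1))
    os.remove("f3-demo.bin")
```

On an affected machine, pointing such a write/verify cycle (or f3 itself) at the suspect filesystem is what surfaces the silently overwritten sectors.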
Am 13.01.25 um 22:01 schrieb bugzilla-daemon@kernel.org:
> https://bugzilla.kernel.org/show_bug.cgi?id=219609
>
> --- Comment #21 from mbe ---
> Hi,
>
> I did some more tests. At first I retrieved the following values
> under Debian
>
>> Debian 12, Kernel 6.1.119, no corruption
>> cat /sys/class/block/nvme0n1/queue/max_hw_sectors_kb
>> 2048
>>
>> cat /sys/class/block/nvme0n1/queue/max_sectors_kb
>> 1280
>>
>> cat /sys/class/block/nvme0n1/queue/max_segments
>> 127
>>
>> cat /sys/class/block/nvme0n1/queue/max_segment_size
>> 4294967295
>
> To achieve the same values on kernel 6.11.0-13, I had to make the
> following changes to drivers/nvme/host/pci.c
>
>> --- pci.c.org 2024-09-15 16:57:56.000000000 +0200
>> +++ pci.c 2025-01-13 21:18:54.475903619 +0100
>> @@ -41,8 +41,8 @@
>> * These can be higher, but we need to ensure that any command doesn't
>> * require an sg allocation that needs more than a page of data.
>> */
>> -#define NVME_MAX_KB_SZ 8192
>> -#define NVME_MAX_SEGS 128
>> +#define NVME_MAX_KB_SZ 4096
>> +#define NVME_MAX_SEGS 127
>> #define NVME_MAX_NR_ALLOCATIONS 5
>>
>> static int use_threaded_interrupts;
>> @@ -3048,8 +3048,8 @@
>> * Limit the max command size to prevent iod->sg allocations going
>> * over a single page.
>> */
>> - dev->ctrl.max_hw_sectors = min_t(u32,
>> - NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> + //dev->ctrl.max_hw_sectors = min_t(u32,
>> + // NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> dev->ctrl.max_segments = NVME_MAX_SEGS;
>>
>> /*
>
> So basically, dev->ctrl.max_hw_sectors stays zero, so that in core.c it
> is set to the value of nvme_mps_to_sectors(ctrl, id->mdts) (=> 4096 in
> my case)
This has the same effect as setting it to `dma_max_mapping_size(...)`
>> if (id->mdts)
>> max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
>> else
>> max_hw_sectors = UINT_MAX;
>> ctrl->max_hw_sectors =
>> min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
>
> But that alone was not enough:
> Tests with ctrl->max_hw_sectors=4096 and NVME_MAX_SEGS = 128 still
> resulted in corruptions.
> They only went away after reverting this value back to 127 (the value
> from kernel 6.1).
That change was introduced in 6.3-rc1 by the patch "nvme-pci: place
descriptor addresses in iod" (
https://github.com/torvalds/linux/commit/7846c1b5a5db8bb8475603069df7c7af034fd081
)
This patch has no effect for me, i.e. unmodified kernels work up to 6.3.6.
The patch that triggers the corruptions is the one introduced in 6.3.7
which replaces `dma_max_mapping_size(...)` by
`dma_opt_mapping_size(...)`. If I apply this change to 6.1, the
corruptions also occur in that kernel.
Matthias, did you check what happens if you only modify NVME_MAX_SEGS
(and leave the `dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ <<
1, dma_opt_mapping_size(&pdev->dev) >> 9);` line unchanged)?
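For reference, the interplay between the pci.c clamp and the MDTS-based fallback in core.c can be modeled in a few lines (purely illustrative, using the values Matthias reported above; the kernel of course does this in C):

```python
# Illustrative model of the two limits discussed above, using Matthias's
# reported values (all "sectors" are 512-byte units).

def min_not_zero(a: int, b: int) -> int:
    # Mimics the kernel's min_not_zero(): smaller value, ignoring zeroes.
    nonzero = [x for x in (a, b) if x]
    return min(nonzero) if nonzero else 0

NVME_MAX_KB_SZ = 8192      # driver-side cap in KiB (pci.c)
mdts_sectors = 4096        # nvme_mps_to_sectors(ctrl, id->mdts) on this SSD
dma_opt_sectors = 256      # dma_opt_mapping_size(&pdev->dev) >> 9, as logged

# Since 6.3.7, pci.c clamps max_hw_sectors against dma_opt_mapping_size():
pci_limit = min(NVME_MAX_KB_SZ << 1, dma_opt_sectors)     # 256 sectors
max_hw_sectors = min_not_zero(pci_limit, mdts_sectors)    # 256 -> 128 KiB

# With the clamp commented out, the field stays 0 and core.c falls back to
# the MDTS-derived value alone:
max_hw_sectors_patched = min_not_zero(0, mdts_sectors)    # 4096 -> 2048 KiB

print(max_hw_sectors >> 1, "KiB vs", max_hw_sectors_patched >> 1, "KiB")
```

The second result (2048 KiB) matches the max_hw_sectors_kb value Matthias saw on Debian's 6.1 kernel; the first (128 KiB) is what the clamp produces on 6.3.7 and later.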
> Additional logging to get the values of the following statements
>> (dma_opt_mapping_size(&pdev->dev) >> 9) = 256
>> (dma_max_mapping_size(&pdev->dev) >> 9) = 36028797018963967 [sic!]
>
> @Stefan, can you check which value NVME_MAX_SEGS had in your tests?
> It also seems to have an influence.
"128", see above.
Regards Stefan
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 10:47 ` Stefan
@ 2025-01-15 13:14 ` Bruno Gravato
2025-01-15 16:26 ` Stefan
0 siblings, 1 reply; 31+ messages in thread
From: Bruno Gravato @ 2025-01-15 13:14 UTC (permalink / raw)
To: Stefan
Cc: bugzilla-daemon, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Thorsten Leemhuis, Christoph Hellwig
On Wed, 15 Jan 2025 at 10:48, Stefan <linux-kernel@simg.de> wrote:
> > Stefan, did you ever try running your tests with 2 nvme disks
> > installed on both slots? Or did you use only one slot at a time?
>
> No, I only tested these configurations:
>
> 1. 1st M.2: Lexar; 2nd M.2: empty
> (Easy to reproduce write errors)
> 2. 1st M.2: Kingsten; 2nd M.2: Lexar
> (Difficult to reproduce read errors with 6.1 Kernel, but no issues
> with a newer ones within several month of intense use)
>
> I'll swap the SSD's soon. Then I will also test other configurations and
> will try out a third SSD. If I get corruption with other SSD's, I will
> check which modifications help.
So it may be that the reason you no longer had errors in config 2 is
not because you put a different SSD in the 1st slot, but because you
now have the 2nd slot also occupied, like me.
If yours behaves like mine, I'd expect that if you swap the disks in
config 2, that you won't have any errors as well...
I'm very curious to see the result of that test!
Just to recap the results of my tests:
Setup 1
Main slot: Solidigm
Secondary slot: (empty)
Result: BAD - corruption happens
Setup 2
Main slot: (empty)
Secondary slot: Solidigm
Result: GOOD - no corruption
Setup 3
Main slot: WD
Secondary slot: (empty)
Result: BAD - corruption happens
Setup 4
Main slot: WD
Secondary slot: Solidigm
Result: GOOD - no corruption (on either disk)
So, in my case, it looks like the corruption only happens if I have
only 1 disk installed in the main slot and the secondary slot is
empty.
If I have the two slots occupied or only the secondary slot occupied,
there are no more errors.
Bruno
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 13:14 ` Bruno Gravato
@ 2025-01-15 16:26 ` Stefan
0 siblings, 0 replies; 31+ messages in thread
From: Stefan @ 2025-01-15 16:26 UTC (permalink / raw)
To: Bruno Gravato, bugzilla-daemon
Cc: bugzilla-daemon, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Thorsten Leemhuis, Christoph Hellwig
Hi,
Am 15.01.25 um 14:14 schrieb Bruno Gravato:
> If yours behaves like mine, I'd expect that if you swap the disks in
> config 2, that you won't have any errors as well...
yeah, I would just need to plug something into the 2nd M.2 socket. But
that can't be done remotely. I will do that on the weekend or next week.
BTW, is there a kernel parameter to ignore an NVMe/PCI device? If the
corruptions appear again after disabling the 2nd SSD, it is more likely
that it is a kernel problem, e.g. a driver writing to memory reserved
for some other driver/component. Such a bug may only occur under rare
conditions. AFAIU, the patch "nvme-pci: place descriptor addresses in
iod" from 6.3-rc1 attempts to use some space which is otherwise unused.
Unfortunately I was not able to revert that patch because later changes
depend on it.
So, for now I only tried out whether just `NVME_MAX_SEGS 127` helps (see
the message from Matthias). The answer is no. This only seems to be an
upper limit, because `/sys/class/block/nvme0n1/queue/max_segments`
reports 33 with unmodified kernels >= 6.3.7. With older kernels, or
kernels with the reverted patch "nvme-pci: clamp max_hw_sectors based on
DMA optimized limitation" (introduced in 6.3.7), this value is 127 and
the corruptions disappear.
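One plausible explanation for the 33 (an assumption of mine, not verified against the block layer sources): with max_hw_sectors clamped to 128 KiB, a worst-case transfer touches 32 full 4 KiB pages plus one extra page when the buffer does not start page-aligned, and the advertised max_segments appears to shrink to match.

```python
# Back-of-the-envelope check (illustrative assumption, not verified against
# the block layer code): one DMA segment per 4 KiB page touched, with a
# maximally unaligned buffer needing one extra page.
PAGE_SIZE = 4096

def worst_case_segments(transfer_bytes: int) -> int:
    return transfer_bytes // PAGE_SIZE + 1

# 128 KiB (the clamped max_hw_sectors on kernels >= 6.3.7) -> 33 segments,
# matching the sysfs value reported above.
print(worst_case_segments(128 * 1024))
```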
I guess this value somehow has to be 127. In my case it is sufficient to
revert the patch from 6.3.7. In Matthias's case the value then becomes
128 and has to be limited additionally using `NVME_MAX_SEGS 127`.
Regards Stefan
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 8:40 ` Thorsten Leemhuis
@ 2025-01-16 17:29 ` Thorsten Leemhuis
2025-01-17 8:05 ` Christoph Hellwig
1 sibling, 0 replies; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-16 17:29 UTC (permalink / raw)
To: iommu@lists.linux.dev, linux-nvme
Cc: Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, Jens Axboe, LKML,
Christoph Hellwig, Stefan, Bruno Gravato
On 15.01.25 09:40, Thorsten Leemhuis wrote:
> On 15.01.25 07:37, Bruno Gravato wrote:
>> I finally got the chance to run some more tests with some interesting
>> and unexpected results...
>
> FWIW, I briefly looked into the issue in between as well and can
> reproduce it[1] locally with my Samsung SSD 990 EVO Plus 4TB in the main
> M.2 slot of my DeskMini X600 using btrfs on a mainline kernel with a
> config from Fedora rawhide.
>
> So what can we that are affected by the problem do to narrow it down?
>
> What does it mean that disabling the NVMe device's write cache often
> but apparently not always helps? Is it just reducing the chance of the
> problem occurring or accidentally working around it?
>
> hch initially brought up that swiotlb seems to be used. Are there any
> BIOS setup settings we should try? I tried a few changes yesterday, but
> I still get the "PCI-DMA: Using software bounce buffering for IO
> (SWIOTLB)" message in the log and not a single line mentioning DMAR.
FWIW, I meanwhile became aware that it is normal that there are no lines
with DMAR when it comes to AMD's IOMMU. Sorry for the noise.
But there is a new development:
I noticed earlier today that disabling the IOMMU in the BIOS Setup seems
to prevent the corruption from occurring. Another user in the bugzilla
ticket just confirmed this.
Ciao, Thorsten
> [1] see start of this thread and/or
> https://bugzilla.kernel.org/show_bug.cgi?id=219609 for details
>
>> I put another disk (WD Black SN750) in the main M.2 slot (the
>> problematic one), but kept my main disk (Solidigm P44 Pro) in the
>> secondary M.2 slot (where it doesn't have any issues).
>> I rerun my test: step 1) copy a large number of files to the WD disk
>> (main slot), step 2) run btrfs scrub on it and expect some checksum
>> errors
>> To my surprise there were no errors!
>> I tried it twice with different kernels (6.2.6 and 6.11.5) and booting
>> from either disk (I have linux installations on both).
>> Still no errors.
>>
>> I then removed the Solidigm disk from the secondary and kept the WD
>> disk in the main M.2 slot.
>> Rerun my tests (on kernel 6.11.5) and bang! btrfs scrub now detected
>> quite a few checksum errors!
>>
>> I then tried disabling volatile write cache with "nvme set-feature
>> /dev/nvme0 -f 6 -v 0"
>> "nvme get-feature /dev/nvme0 -f 6" confirmed it was disabled, but
>> /sys/block/nvme0n1/queue/fua still showed 1... Was that supposed to
>> turn into 0?
>>
>> I re-run my test, but I still got checksum errors on btrfs scrub. So
>> disabling volatile write cache (assuming I did it correctly) didn't
>> make a difference in my case.
>>
>> I put the Solidigm disk back into the secondary slot, booted and rerun
>> the test on the WD disk (main slot) just to be triple sure and still
>> no errors.
>>
>> So it looks like the corruption only happens if only the main M.2 slot
>> is occupied and the secondary M.2 slot is free.
>> With two nvme disks (one on each M.2 slot), there were no errors at all.
>>
>> Stefan, did you ever try running your tests with 2 nvme disks
>> installed on both slots? Or did you use only one slot at a time?
>
> $ journalctl -k | grep -i -e DMAR -e IOMMU -e AMD-Vi -e SWIOTLB
> AMD-Vi: Using global IVHD EFR:0x246577efa2254afa, EFR2:0x0
> iommu: Default domain type: Translated
> iommu: DMA domain TLB invalidation policy: lazy mode
> pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
> pci 0000:00:01.0: Adding to iommu group 0
> pci 0000:00:01.3: Adding to iommu group 1
> pci 0000:00:02.0: Adding to iommu group 2
> pci 0000:00:02.3: Adding to iommu group 3
> pci 0000:00:03.0: Adding to iommu group 4
> pci 0000:00:04.0: Adding to iommu group 5
> pci 0000:00:08.0: Adding to iommu group 6
> pci 0000:00:08.1: Adding to iommu group 7
> pci 0000:00:08.2: Adding to iommu group 8
> pci 0000:00:08.3: Adding to iommu group 9
> pci 0000:00:14.0: Adding to iommu group 10
> pci 0000:00:14.3: Adding to iommu group 10
> pci 0000:00:18.0: Adding to iommu group 11
> pci 0000:00:18.1: Adding to iommu group 11
> pci 0000:00:18.2: Adding to iommu group 11
> pci 0000:00:18.3: Adding to iommu group 11
> pci 0000:00:18.4: Adding to iommu group 11
> pci 0000:00:18.5: Adding to iommu group 11
> pci 0000:00:18.6: Adding to iommu group 11
> pci 0000:00:18.7: Adding to iommu group 11
> pci 0000:01:00.0: Adding to iommu group 12
> pci 0000:02:00.0: Adding to iommu group 13
> pci 0000:03:00.0: Adding to iommu group 14
> pci 0000:03:00.1: Adding to iommu group 15
> pci 0000:03:00.2: Adding to iommu group 16
> pci 0000:03:00.3: Adding to iommu group 17
> pci 0000:03:00.4: Adding to iommu group 18
> pci 0000:03:00.6: Adding to iommu group 19
> pci 0000:04:00.0: Adding to iommu group 20
> pci 0000:04:00.1: Adding to iommu group 21
> pci 0000:05:00.0: Adding to iommu group 22
> AMD-Vi: Extended features (0x246577efa2254afa, 0x0): PPR NX GT [5] IA GA
> PC GA_vAPIC
> AMD-Vi: Interrupt remapping enabled
> AMD-Vi: Virtual APIC enabled
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-15 8:40 ` Thorsten Leemhuis
2025-01-16 17:29 ` Thorsten Leemhuis
@ 2025-01-17 8:05 ` Christoph Hellwig
2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 21:31 ` Stefan
1 sibling, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2025-01-17 8:05 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Bruno Gravato, Stefan, Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Christoph Hellwig
On Wed, Jan 15, 2025 at 09:40:04AM +0100, Thorsten Leemhuis wrote:
> What does it mean that disabling the NVMe device's write cache often
> but apparently not always helps? Is it just reducing the chance of the
> problem occurring or accidentally working around it?
For consumer NAND devices you basically can't disable the volatile
write cache. If you do disable it, that just means it gets flushed
after every write, meaning you have to write the entire NAND
(super)block for every write, causing a huge slowdown (and a lot of
media wear). This will change timings a lot, obviously. If it doesn't
change the timing, the drive just fakes it, which reputable vendors
shouldn't be doing, but I would not be entirely surprised about it
for noname devices.
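The slowdown being described can be roughly quantified (hypothetical numbers, purely to illustrate the order of magnitude):

```python
# Rough illustration (hypothetical sizes): if disabling the volatile write
# cache forces a flush of a whole NAND (super)block for every host write,
# the write amplification is simply the ratio of the two sizes.
SUPERBLOCK = 16 * 1024 * 1024   # assumed NAND superblock size: 16 MiB
HOST_WRITE = 4 * 1024           # a single 4 KiB host write

amplification = SUPERBLOCK // HOST_WRITE
print(amplification)  # each small write rewrites thousands of times its size
```

With numbers like these, a drive that really honored the setting would be dramatically slower, which is why unchanged timings suggest the setting is being faked.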
> hch initially brought up that swiotlb seems to be used. Are there any
> BIOS setup settings we should try? I tried a few changes yesterday, but
> I still get the "PCI-DMA: Using software bounce buffering for IO
> (SWIOTLB)" message in the log and not a single line mentioning DMAR.
The real question would be to figure out why it is used.
Do you see the
pci_dbg(dev, "marking as untrusted\n");
message in the kernel log when enabling the pci debug output?
(I thought we had a sysfs file for that, but I can't find it.)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 8:05 ` Christoph Hellwig
@ 2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 9:55 ` Christoph Hellwig
` (2 more replies)
2025-01-17 21:31 ` Stefan
1 sibling, 3 replies; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-17 9:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Bruno Gravato, Stefan, Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Mario Limonciello
On 17.01.25 09:05, Christoph Hellwig wrote:
> On Wed, Jan 15, 2025 at 09:40:04AM +0100, Thorsten Leemhuis wrote:
>
>> hch initially brought up that swiotlb seems to be used. Are there any
>> BIOS setup settings we should try? I tried a few changes yesterday, but
>> I still get the "PCI-DMA: Using software bounce buffering for IO
>> (SWIOTLB)" message in the log and not a single line mentioning DMAR.
>
> The real question would be to figure out why it is used.
>
> Do you see the
>
> pci_dbg(dev, "marking as untrusted\n");
>
> message in the kernel log when enabling the pci debug output?
By booting with 'ignore_loglevel dyndbg="file drivers/pci/* +p"' I
suppose? No, that is not printed (but other debug lines from the pci
code are).
Side note: that "PCI-DMA: Using software bounce buffering for IO
(SWIOTLB)" message does show up on two other AMD machines I own as
well. One also has a Ryzen 8000, the other a much older one.
And BTW a few bits of the latest development in the bugzilla ticket
(https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
* iommu=pt and amd_iommu=off seems to work around the problem (in
addition to disabling the iommu in the BIOS setup).
* Not totally sure, but it seems most if not everyone affected is using a
Ryzen 8000 CPU -- and now one user showed up who mentioned that a DeskMini
X600 with a Ryzen 7000 CPU is not affected (see ticket for details). But
that might be due to other aspects. A former colleague of mine who can
reproduce the problem will later test whether a different CPU line really
makes a difference.
Ciao, Thorsten
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 9:51 ` Thorsten Leemhuis
@ 2025-01-17 9:55 ` Christoph Hellwig
2025-01-17 10:30 ` Thorsten Leemhuis
2025-01-17 13:36 ` Bruno Gravato
2025-01-20 14:31 ` Thorsten Leemhuis
2 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2025-01-17 9:55 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Christoph Hellwig, Bruno Gravato, Stefan, Keith Busch,
bugzilla-daemon, Adrian Huang, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML,
Mario Limonciello
On Fri, Jan 17, 2025 at 10:51:09AM +0100, Thorsten Leemhuis wrote:
> By booting with 'ignore_loglevel dyndbg="file drivers/pci/* +p"' I
> suppose? No, that is not printed (but other debug lines from the pci
> code are).
>
> Side note: that "PCI-DMA: Using software bounce buffering for IO
> (SWIOTLB)" message does show up on two other AMD machines I own as
> well. One also has a Ryzen 8000, the other a much older one.
>
> And BTW a few bits of the latest development in the bugzilla ticket
> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
>
> * iommu=pt and amd_iommu=off seems to work around the problem (in
> addition to disabling the iommu in the BIOS setup).
That suggests the problem is related to the dma-iommu code, and
my strong suspicion is the swiotlb bounce buffering for untrusted
devices. If you feel adventurous, can you try building a kernel
where dev_use_swiotlb() in drivers/iommu/dma-iommu.c is hacked
to always return false?
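For anyone wanting to try this, a minimal sketch of such a hack (illustrative only; the exact signature and body of dev_use_swiotlb() vary between kernel versions, so adapt the early return to whatever the function looks like in your tree):

```diff
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ static bool dev_use_swiotlb(struct device *dev, size_t size,
 			    enum dma_data_direction dir)
 {
+	/* debug hack: pretend no device ever needs swiotlb bouncing */
+	return false;
```

This is a debug experiment, not a fix: it disables the bounce-buffer path for all devices, including genuinely untrusted ones.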
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 9:55 ` Christoph Hellwig
@ 2025-01-17 10:30 ` Thorsten Leemhuis
2025-02-04 6:26 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-17 10:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Bruno Gravato, Stefan, Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Mario Limonciello
On 17.01.25 10:55, Christoph Hellwig wrote:
> On Fri, Jan 17, 2025 at 10:51:09AM +0100, Thorsten Leemhuis wrote:
>> By booting with 'ignore_loglevel dyndbg="file drivers/pci/* +p"' I
>> suppose? No, that is not printed (but other debug lines from the pci
>> code are).
>>
>> Side note: that "PCI-DMA: Using software bounce buffering for IO
>> (SWIOTLB)" message does show up on two other AMD machines I own as
>> well. One also has a Ryzen 8000, the other a much older one.
>>
>> And BTW a few bits of the latest development in the bugzilla ticket
>> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
>>
>> * iommu=pt and amd_iommu=off seems to work around the problem (in
>> addition to disabling the iommu in the BIOS setup).
>
> That suggests the problem is related to the dma-iommu code, and
> my strong suspicion is the swiotlb bounce buffering for untrusted
> devices. If you feel adventurous, can you try building a kernel
> where dev_use_swiotlb() in drivers/iommu/dma-iommu.c is hacked
> to always return false?
Tried that, did not help: I still get corrupted data.
Ciao, Thorsten
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 9:55 ` Christoph Hellwig
@ 2025-01-17 13:36 ` Bruno Gravato
2025-01-20 14:31 ` Thorsten Leemhuis
2 siblings, 0 replies; 31+ messages in thread
From: Bruno Gravato @ 2025-01-17 13:36 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Christoph Hellwig, Stefan, Keith Busch, bugzilla-daemon,
Adrian Huang, Linux kernel regressions list, linux-nvme,
Jens Axboe, iommu@lists.linux.dev, LKML, Mario Limonciello
On Fri, 17 Jan 2025 at 09:51, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
> * Not totally sure, but it seems most if not everyone affected is using a
> Ryzen 8000 CPU -- and now one user showed up who mentioned that a DeskMini
> X600 with a Ryzen 7000 CPU is not affected (see ticket for details). But
> that might be due to other aspects. A former colleague of mine who can
> reproduce the problem will later test whether a different CPU line really
> makes a difference.
One other different aspect for that user, besides the 7000-series CPU,
is that he's using a wifi card as well (it sits in an M.2 wifi slot
just below the main M.2 disk slot), so I wonder if that may play a
role? I think most of us have no wifi card installed. I think I have an
M.2 wifi card in my former NUC; I'll see if it's compatible with the
DeskMini and try it out.
The other reason could be that some disk models aren't affected... I think
Stefan reported no issues on a FireCuda 520.
I ordered a Crucial T500 1TB yesterday. It's for another machine, but
I will try it on the deskmini x600 before deploying on the other
machine. I should receive it in a week or so.
Bruno
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 8:05 ` Christoph Hellwig
2025-01-17 9:51 ` Thorsten Leemhuis
@ 2025-01-17 21:31 ` Stefan
2025-01-18 1:03 ` Keith Busch
1 sibling, 1 reply; 31+ messages in thread
From: Stefan @ 2025-01-17 21:31 UTC (permalink / raw)
To: Christoph Hellwig, Thorsten Leemhuis, bugzilla-daemon
Cc: Bruno Gravato, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
Hi,
>> What does it mean that disabling the NVMe device's write cache
>> often but apparently not always helps? Is it just reducing the
>> chance of the problem occurring or accidentally working around it?
>
> For consumer NAND devices you basically can't disable the volatile
> write cache. If you do disable it, that just means it gets flushed
> after every write, meaning you have to write the entire NAND
> (super)block for every write, causing a huge slowdown (and a lot of
> media wear). This will change timings a lot, obviously. If it
> doesn't change the timing, the drive just fakes it, which reputable
> vendors shouldn't be doing, but I would not be entirely surprised
> about it for noname devices.
As already mentioned, my SSD has no DRAM and uses HMB (Host Memory
Buffer). (It has a non-volatile SLC cache.) Disabling the volatile write
cache has no significant effect on read/write performance of large
files, because the HMB size is only 40 MB. But things like file
deletions may be slower.
AFAIS the corruptions occur with both kinds of SSDs, the ones that have
their own DRAM and the ones that use HMB.
> --- Comment #49 from Bruno Gravato ---
>> * Not totally sure, but it seems most if not everyone affected is
>> using a Ryzen 8000 CPU -- and now one user showed up who mentioned
>> that a DeskMini X600 with a Ryzen 7000 CPU is not affected (see
>> ticket for details). But that might be due to other aspects. A former
>> colleague of mine who can reproduce the problem will later test
>> whether a different CPU line really makes a difference.
>
> One other different aspect for that user besides the 7000 series CPU
> is that he's using a wifi card as well (that sits in a M.2 wifi slot
> just below the main M.2 disk slot), so I wonder if that may play a
> role? I think most of us have no wifi card installed. I think I have
> a M.2 wifi card on my former NUC, I'll see if it's compatible with
> the deskmini and try it out.
>
> The other reason could be some disk models aren't affected... I think
> Stefan reported no issues on a Firecuda 520.
Correct. To verify that the two other CPU series are not affected,
someone who can reproduce this error and has another CPU lying around
must swap them.
> --- Comment #51 from Ralph Gerstman ---
> A missing network might prevent the failure during install - at least
> in Ubuntu 22.10 - but can happen anyway. Enabling network seems to
> raise the chance.
I had to disable it in BIOS. Just not connecting it has no effect
because drivers and firmware are still loaded.
Just for the record (I already mentioned it): I'm using the latest BIOS
version 4.08 with AGESA PI 1.2.0.2a (according to the AsRock page) and
firmware blobs version 20241210 from
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
and I can confirm that the corruptions also occur with older versions of
the BIOS/firmware.
Regards Stefan
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 21:31 ` Stefan
@ 2025-01-18 1:03 ` Keith Busch
0 siblings, 0 replies; 31+ messages in thread
From: Keith Busch @ 2025-01-18 1:03 UTC (permalink / raw)
To: Stefan
Cc: Christoph Hellwig, Thorsten Leemhuis, bugzilla-daemon,
Bruno Gravato, Adrian Huang, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML
On Fri, Jan 17, 2025 at 10:31:55PM +0100, Stefan wrote:
> As already mentioned, my SSD has no DRAM and uses HMB (Host memory
> buffer).
HMB and volatile write caches are not necessarily intertwined. A device
can have both. Generally speaking, you'd expect the HMB to have SSD
metadata, not user data, where a VWC usually just has user data. The
spec also requires the device maintain data integrity even with an
unexpected sudden loss of access to the HMB, but that isn't the case
with a VWC.
> (It has non-volatile SLC cache.) Disabling volatile write cache
> has no significant effect on read/write performance of large files,
Devices are free to have whatever hierarchy of non-volatile caches they
want without advertising that to the host, but if they're calling those
"volatile" then I think something has been misinterpreted.
> because the HMB size in only 40MB. But things like file deletions may be
> slower.
>
> AFAIS the corruptions occur with both kinds of SSDs, the ones that have
> their own DRAM and the ones that use HMB.
Yeah, that was the point of the experiment. If corruption happens when
it's off, then that helps rule out host buffer size/alignment (which is
where this bz started) as a triggering condition. Disabling VWC is not a
"fix", it's just a debug data point. If corruption goes away with it
off, though, then we can't really conclude anything for this issue.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 9:55 ` Christoph Hellwig
2025-01-17 13:36 ` Bruno Gravato
@ 2025-01-20 14:31 ` Thorsten Leemhuis
2025-01-28 7:41 ` Christoph Hellwig
2 siblings, 1 reply; 31+ messages in thread
From: Thorsten Leemhuis @ 2025-01-20 14:31 UTC (permalink / raw)
To: Mario Limonciello
Cc: Bruno Gravato, Stefan, Keith Busch, bugzilla-daemon, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, Christoph Hellwig
On 17.01.25 10:51, Thorsten Leemhuis wrote:
> On 17.01.25 09:05, Christoph Hellwig wrote:
>> On Wed, Jan 15, 2025 at 09:40:04AM +0100, Thorsten Leemhuis wrote:
> And BTW a few bits of the latest development in the bugzilla ticket
> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
>
> * iommu=pt and amd_iommu=off seems to work around the problem (in
> addition to disabling the iommu in the BIOS setup).
>
> * Not totally sure, but it seems most if not everyone affected is using a
> Ryzen 8000 CPU -- and now one user showed up who mentioned that a DeskMini
> X600 with a Ryzen 7000 CPU is not affected (see ticket for details). But
> that might be due to other aspects. A former colleague of mine who can
> reproduce the problem will later test whether a different CPU line really
> makes a difference.
My former colleague Christian Hirsch[1] (not CCed) can reproduce the
problem reliably. He today switched the CPU to a Ryzen 7 7700 and later
to some Ryzen 9600X -- and with those, things worked just fine, i.e. no
corruptions. But they came back after putting the 8600G back in.
Ralph, can you please add this detail to the Asrock support ticket?
Ciao, Thorsten
[1] he described building an X600 machine in the c't magazine, which is
the reason why I and a few others affected and CCed built their X600 systems
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-20 14:31 ` Thorsten Leemhuis
@ 2025-01-28 7:41 ` Christoph Hellwig
2025-01-28 12:00 ` Stefan
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2025-01-28 7:41 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Mario Limonciello, Bruno Gravato, Stefan, Keith Busch,
bugzilla-daemon, Adrian Huang, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML,
Christoph Hellwig
On Mon, Jan 20, 2025 at 03:31:28PM +0100, Thorsten Leemhuis wrote:
> My former colleague Christian Hirsch (not CCed) can reproduce the
> problem reliably. He today switched the CPU to a Ryzen 7 7700 and later
> to some Ryzen 9600X -- and with those, things worked just fine, i.e. no
> corruptions. But they came back after putting the 8600G back in.
So basically you need a specific board and a specific CPU, and only
one M.2 SSD in the two slots to reproduce it? Puh. I'm kinda lost on
what we could do about this on the Linux side.
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-28 7:41 ` Christoph Hellwig
@ 2025-01-28 12:00 ` Stefan
2025-01-28 12:52 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 31+ messages in thread
From: Stefan @ 2025-01-28 12:00 UTC (permalink / raw)
To: Christoph Hellwig, Thorsten Leemhuis, bugzilla-daemon
Cc: Mario Limonciello, Bruno Gravato, Keith Busch, bugzilla-daemon,
Adrian Huang, Linux kernel regressions list, linux-nvme,
Jens Axboe, iommu@lists.linux.dev, LKML
Hi,
Am 28.01.25 um 08:41 schrieb Christoph Hellwig:
> So basically you need a specific board and a specific CPU, and only
> one M.2 SSD in the two slots to reproduce it?
more generally, it depends on which PCIe devices are used. On my PC
corruptions also disappear if I disable the ethernet controller in the BIOS.
Furthermore it depends on transaction sizes (that's why older kernels
work), IOMMU, sometimes on volatile write cache and partially on SSD
type (which may have something to do with the former things).
> Puh. I'm kinda lost on what we could do about this on the Linux
> side.
Because it also depends on the CPU series, a firmware or hardware issue
seems to be more likely than a Linux bug.
ATM ASRock is still trying to reproduce the issue. (I'm in contact with
them too. But they have Chinese New Year holidays in Taiwan this week.)
If they can't reproduce it, they have to provide an explanation why the
issues are seen by so many users.
Regards Stefan
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-28 12:00 ` Stefan
@ 2025-01-28 12:52 ` Dr. David Alan Gilbert
2025-01-28 14:24 ` Stefan
0 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2025-01-28 12:52 UTC (permalink / raw)
To: Stefan
Cc: Christoph Hellwig, Thorsten Leemhuis, bugzilla-daemon,
Mario Limonciello, Bruno Gravato, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
* Stefan (linux-kernel@simg.de) wrote:
> Hi,
>
> Am 28.01.25 um 08:41 schrieb Christoph Hellwig:
> > So basically you need a specific board and a specific CPU, and only
> > one M.2 SSD in the two slots to reproduce it?
>
> more generally, it depends on which PCIe devices are used. On my PC
> corruptions also disappear if I disable the ethernet controller in the BIOS.
>
> Furthermore it depends on transaction sizes (that's why older kernels
> work), IOMMU, sometimes on volatile write cache and partially on SSD
> type (which may have something to do with the former things).
Is there any characterisation of the corrupted data? Last time I looked at the
bz there wasn't.
I mean, is it reliably any of:
a) What's the size of the corruption?
block, cache line, word, bit???
b) Position?
e.g. last word in a block or something?
c) Data?
pile of zero's/ff's junk/etc?
d) Is it a missed write, old data, or partially written block?
Dave
> > Puh. I'm kinda lost on what we could do about this on the Linux
> > side.
>
> Because it also depends on the CPU series, a firmware or hardware issue
> seems to be more likely than a Linux bug.
>
> ATM ASRock is still trying to reproduce the issue. (I'm in contact with
> them too. But they have Chinese New Year holidays in Taiwan this week.)
>
> If they can't reproduce it, they have to provide an explanation why the
> issues are seen by so many users.
>
> Regards Stefan
>
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-28 12:52 ` Dr. David Alan Gilbert
@ 2025-01-28 14:24 ` Stefan
2025-02-02 8:32 ` Bruno Gravato
2025-02-03 18:48 ` Stefan
0 siblings, 2 replies; 31+ messages in thread
From: Stefan @ 2025-01-28 14:24 UTC (permalink / raw)
To: Dr. David Alan Gilbert, bugzilla-daemon
Cc: Christoph Hellwig, Thorsten Leemhuis, bugzilla-daemon,
Mario Limonciello, Bruno Gravato, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
Hi,
Am 28.01.25 um 13:52 schrieb Dr. David Alan Gilbert:
> Is there any characterisation of the corrupted data; last time I
> looked at the bz there wasn't.
Yes, there is. (And I already reported it at least on the Debian bug
tracker, see links in the initial message.)
f3 reports overwritten sectors, i.e. it looks like the pseudo-random
test pattern is written to the wrong position. These corruptions occur in
clusters whose size is an integer multiple of 2^17 bytes in most cases
(about 80%) and 2^15 in all cases.
The frequency of these corruptions is roughly 1 cluster per 50 GB written.
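That size observation can be checked mechanically. A minimal sketch, where the (offset, length) pairs are made-up illustration data, not numbers from the actual f3 logs:

```python
# Hypothetical corruption clusters as (offset, length) pairs.
# These values are illustration data, NOT taken from the real f3 logs.
clusters = [(0x1200000, 0x20000), (0x5a00000, 0x60000), (0x9f00000, 0x8000)]
total_written_gb = 150  # assumed amount of data written during the test run

# Observation: every cluster length is a multiple of 2^15 bytes,
# and most (~80% in the real logs) are also multiples of 2^17.
assert all(length % 2**15 == 0 for _, length in clusters)
frac_128k = sum(length % 2**17 == 0 for _, length in clusters) / len(clusters)
rate = total_written_gb / len(clusters)  # GB written per corrupt cluster
print(f"{frac_128k:.0%} of clusters are 2^17 multiples; 1 cluster per {rate:.0f} GB")
```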
Can others confirm this or do they observe a different characteristic?
Regards Stefan
> I mean, is it reliably any of:
> a) What's the size of the corruption?
> block, cache line, word, bit???
> b) Position?
> e.g. last word in a block or something?
> c) Data?
> pile of zero's/ff's junk/etc?
>
> d) Is it a missed write, old data, or partially written block?
>
> Dave
>
>>> Puh. I'm kinda lost on what we could do about this on the Linux
>>> side.
>>
>> Because it also depends on the CPU series, a firmware or hardware issue
>> seems to be more likely than a Linux bug.
>>
>> ATM ASRock is still trying to reproduce the issue. (I'm in contact with
>> them too. But they have Chinese New Year holidays in Taiwan this week.)
>>
>> If they can't reproduce it, they have to provide an explanation why the
>> issues are seen by so many users.
>>
>> Regards Stefan
>>
>>
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-28 14:24 ` Stefan
@ 2025-02-02 8:32 ` Bruno Gravato
2025-02-04 6:12 ` Christoph Hellwig
2025-02-03 18:48 ` Stefan
1 sibling, 1 reply; 31+ messages in thread
From: Bruno Gravato @ 2025-02-02 8:32 UTC (permalink / raw)
To: Stefan
Cc: Dr. David Alan Gilbert, Christoph Hellwig, Thorsten Leemhuis,
Mario Limonciello, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
I just realized I replied only to the bugzilla list. Sorry about that.
So I'm forwarding my reply to everyone else who was in CC and may not
be getting the bugzilla emails.
> > Is there any characterisation of the corrupted data; last time I
> > looked at the bz there wasn't.
>
> Yes, there is. (And I already reported it at least on the Debian bug
> tracker, see links in the initial message.)
>
> f3 reports overwritten sectors, i.e. it looks like the pseudo-random
> test pattern is written to the wrong position. These corruptions occur in
> clusters whose size is an integer multiple of 2^17 bytes in most cases
> (about 80%) and 2^15 in all cases.
>
> The frequency of these corruptions is roughly 1 cluster per 50 GB written.
>
> Can others confirm this or do they observe a different characteristic?
In my tests I was using real data: a backup of my files.
On one such test I copied over 300K files of variable sizes and types
totalling about 60GB. A bit over 20 files got corrupted.
I tried copying the files over the network (ethernet) using rsync/ssh.
I also tried restoring the files using restic (over ssh as well). And
I also tried copying the files locally from a SATA disk. In all cases
I got similar results with some files being corrupted.
The destination nvme disk was using btrfs and running btrfs scrub
after the copy detects quite a few checksum errors.
I analyzed some of those corrupted files and one of them happened to
be a text file (linux kernel source code).
A big portion of the text was replaced with text from another file in
the same directory (being text made it easy to find where it came
from).
So this was a contiguous block of text that was overwritten with a
contiguous block of text from another file.
If I remember correctly the other file was not corrupted (so the
blocks weren't swapped). It looked like a certain block of text was
written twice: on the correct file and on another file in the same
directory.
I also got some jpeg images corrupted. I was able to open and view
(partially) those images and it looked like a portion of the image was
repeated in a different part of it), so blocks of the same file were
probably duplicated and overwritten within itself.
The blocks being overwritten seemed to be different sizes on different files.
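One way to locate such an overwritten run mechanically, sketched here with purely synthetic buffers (the 16-byte overwrite is fabricated for the demo; difflib's SequenceMatcher does the matching):

```python
from difflib import SequenceMatcher

def longest_shared_run(a: bytes, b: bytes):
    """Return (offset_in_a, offset_in_b, length) of the longest byte run
    common to both buffers -- a candidate for the overwritten block."""
    m = SequenceMatcher(None, a, b, autojunk=False)
    match = m.find_longest_match(0, len(a), 0, len(b))
    return match.a, match.b, match.size

# Synthetic demo: 'bad' has 16 bytes overwritten with data from 'donor'.
donor = bytes(range(64))
bad = bytearray(b"x" * 64)
bad[20:36] = donor[8:24]  # the fabricated corruption

off_bad, off_donor, size = longest_shared_run(bytes(bad), donor)
print(off_bad, off_donor, size)  # -> 20 8 16
```

Running this over a corrupted file and its suspected donor would give the offsets and length of the duplicated block.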
Bruno
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-28 14:24 ` Stefan
2025-02-02 8:32 ` Bruno Gravato
@ 2025-02-03 18:48 ` Stefan
2025-02-06 15:58 ` Stefan
1 sibling, 1 reply; 31+ messages in thread
From: Stefan @ 2025-02-03 18:48 UTC (permalink / raw)
To: Dr. David Alan Gilbert, bugzilla-daemon
Cc: Christoph Hellwig, Thorsten Leemhuis, Mario Limonciello,
Bruno Gravato, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
Hi,
just got feedback from ASRock. They asked me to make a video of the
corruptions occurring on my remote (and headless) system.
Maybe I should make a video of printing out the logs that can be found on
the Linux and Debian bug trackers ...
Seems that ASRock is unwilling to solve the problem.
Regards Stefan
Am 28.01.25 um 15:24 schrieb Stefan:
> Hi,
>
> Am 28.01.25 um 13:52 schrieb Dr. David Alan Gilbert:
>> Is there any characterisation of the corrupted data; last time I
>> looked at the bz there wasn't.
>
> Yes, there is. (And I already reported it at least on the Debian bug
> tracker, see links in the initial message.)
>
> f3 reports overwritten sectors, i.e. it looks like the pseudo-random
> test pattern is written to the wrong position. These corruptions occur in
> clusters whose size is an integer multiple of 2^17 bytes in most cases
> (about 80%) and 2^15 in all cases.
>
> The frequency of these corruptions is roughly 1 cluster per 50 GB written.
>
> Can others confirm this or do they observe a different characteristic?
>
> Regards Stefan
>
>
>> I mean, is it reliably any of:
>> a) What's the size of the corruption?
>> block, cache line, word, bit???
>> b) Position?
>> e.g. last word in a block or something?
>> c) Data?
>> pile of zero's/ff's junk/etc?
>>
>> d) Is it a missed write, old data, or partially written block?
>>
>> Dave
>>
>>>> Puh. I'm kinda lost on what we could do about this on the Linux
>>>> side.
>>>
>>> Because it also depends on the CPU series, a firmware or hardware issue
>>> seems to be more likely than a Linux bug.
>>>
>>> ATM ASRock is still trying to reproduce the issue. (I'm in contact with
>>> them too. But they have Chinese New Year holidays in Taiwan this week.)
>>>
>>> If they can't reproduce it, they have to provide an explanation why the
>>> issues are seen by so many users.
>>>
>>> Regards Stefan
>>>
>>>
>
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-02-02 8:32 ` Bruno Gravato
@ 2025-02-04 6:12 ` Christoph Hellwig
2025-02-04 9:12 ` Bruno Gravato
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2025-02-04 6:12 UTC (permalink / raw)
To: Bruno Gravato
Cc: Stefan, Dr. David Alan Gilbert, Christoph Hellwig,
Thorsten Leemhuis, Mario Limonciello, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
On Sun, Feb 02, 2025 at 08:32:31AM +0000, Bruno Gravato wrote:
> In my tests I was using real data: a backup of my files.
>
> > On one such test I copied over 300K files of variable sizes and types
> totalling about 60GB. A bit over 20 files got corrupted.
> I tried copying the files over the network (ethernet) using rsync/ssh.
> I also tried restoring the files using restic (over ssh as well). And
> I also tried copying the files locally from a SATA disk. In all cases
> I got similar results with some files being corrupted.
> The destination nvme disk was using btrfs and running btrfs scrub
> after the copy detects quite a few checksum errors.
So you used various different data sources, and the destination was
always the nvme device in the suspect slot.
> I analyzed some of those corrupted files and one of them happened to
> be a text file (linux kernel source code).
> A big portion of the text was replaced with text from another file in
> the same directory (being text made it easy to find where it came
> from).
> So this was a contiguous block of text that was overwritten with a
> contiguous block of text from another file.
> If I remember correctly the other file was not corrupted (so the
> blocks weren't swapped). It looked like a certain block of text was
> written twice: on the correct file and on another file in the same
> directory.
That's a very interesting pattern.
> I also got some jpeg images corrupted. I was able to open and view
> (partially) those images and it looked like a portion of the image was
> repeated in a different part of it, so blocks of the same file were
> probably duplicated and overwritten within itself.
>
> The blocks being overwritten seemed to be different sizes on different files.
This does sound like a fairly common pattern due to SSD FTL issues,
but I still don't want to rule out swiotlb, which due to the bucketing
could maybe also lead to these, but I can't really see how. But the
fact that the affected systems seem to be using swiotlb despite no
good reason for them to do so still leaves me puzzled.
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-01-17 10:30 ` Thorsten Leemhuis
@ 2025-02-04 6:26 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2025-02-04 6:26 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Christoph Hellwig, Bruno Gravato, Stefan, Keith Busch,
bugzilla-daemon, Adrian Huang, Linux kernel regressions list,
linux-nvme, Jens Axboe, iommu@lists.linux.dev, LKML,
Mario Limonciello
On Fri, Jan 17, 2025 at 11:30:47AM +0100, Thorsten Leemhuis wrote:
> >> Side note: that "PCI-DMA: Using software bounce buffering for IO
> >>>> (SWIOTLB)" message does show up on two other AMD machines I own as
> >> well. One also has a Ryzen 8000, the other one a much older one.
The message will always show with > 4G of memory. It only implies swiotlb
is initialized, not that any device actually uses it.
> >> And BTW a few bits of the latest development in the bugzilla ticket
> >> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
> >>
> >> * iommu=pt and amd_iommu=off seems to work around the problem (in
> >> addition to disabling the iommu in the BIOS setup).
iommu=pt calls iommu_set_default_passthrough, which sets
iommu_def_domain_type to IOMMU_DOMAIN_IDENTITY. I.e. the hardware
IOMMU is left on, but treated as a 1:1 mapping by Linux.
amd_iommu=off sets amd_iommu_disabled, which calls disable_iommus,
which from a quick read disables the hardware IOMMU.
In either case we'll end up using dma-direct instead of dma-iommu.
> >
> > That suggests the problem is related to the dma-iommu code, and
> > my strong suspect is the swiotlb bounce buffering for untrusted
> > device. If you feel adventurous, can you try building a kernel
> > where dev_use_swiotlb() in drivers/iommu/dma-iommu.c is hacked
> > to always return false?
>
> Tried that, did not help: I still get corrupted data.
.. which together with this implies that the problem only happens
when using the dma-iommu code (with or without swiotlb buffering
for unaligned / untrusted data), and does not happen with
dma-direct.
If we assume it also is related to the optimal dma size, which
the original report suggests, the values for that might be
interesting. For dma-iommu this is:
PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
where IOVA_RANGE_CACHE_MAX_SIZE is 6, i.e.
PAGE_SIZE << 5 or 131072 for x86_64.
for dma-direct it falls back to dma_max_mapping_size, which is
SIZE_MAX without swiotlb, or swiotlb_max_mapping_size, which
is a bit complicated due to minimum alignment, but in this case
should evaluate to: 258048, which is almost twice as big.
And all this unfortunately leaves me really confused. If someone is
interested in playing around with it, at the risk of data corruption, it would
be interesting to hack hardcoded values into dma_opt_mapping_size, e.g.
plug in the 131072 used by dma-iommu while using dma-direct with the
above iommu disable options.
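For reference, both limits can be reproduced numerically. The IO_TLB_* constants below are the upstream swiotlb defaults (2 KiB slots, 128 slots per segment) and are an assumption here, not values read off the affected systems:

```python
PAGE_SIZE = 4096  # x86_64

# dma-iommu: PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1)
IOVA_RANGE_CACHE_MAX_SIZE = 6
dma_iommu_opt = PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1)

# dma-direct + swiotlb: segment size minus worst-case minimum alignment
IO_TLB_SIZE = 2048      # bytes per swiotlb slot (assumed upstream default)
IO_TLB_SEGSIZE = 128    # slots per segment (assumed upstream default)
swiotlb_max = IO_TLB_SEGSIZE * IO_TLB_SIZE - PAGE_SIZE

print(dma_iommu_opt, swiotlb_max)  # -> 131072 258048
```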
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-02-04 6:12 ` Christoph Hellwig
@ 2025-02-04 9:12 ` Bruno Gravato
0 siblings, 0 replies; 31+ messages in thread
From: Bruno Gravato @ 2025-02-04 9:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stefan, Dr. David Alan Gilbert, Thorsten Leemhuis,
Mario Limonciello, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML, bugzilla-daemon
On Tue, 4 Feb 2025 at 06:12, Christoph Hellwig wrote:
>
> On Sun, Feb 02, 2025 at 08:32:31AM +0000, Bruno Gravato wrote:
> > In my tests I was using real data: a backup of my files.
> >
> > On one such test I copied over 300K files of variable sizes and types
> > totalling about 60GB. A bit over 20 files got corrupted.
> > I tried copying the files over the network (ethernet) using rsync/ssh.
> > I also tried restoring the files using restic (over ssh as well). And
> > I also tried copying the files locally from a SATA disk. In all cases
> > I got similar results with some files being corrupted.
> > The destination nvme disk was using btrfs and running btrfs scrub
> > after the copy detects quite a few checksum errors.
>
> So you used various different data sources, and the desintation was
> always the nvme device in the suspect slot.
>
Yes, regardless of the data source, the destination was always a
single nvme disk on the main M.2 nvme slot, with the secondary M.2
nvme slot empty.
I tried 3 different disks (WD, Crucial and Solidigm) with similar results.
If I put any of those disks on the secondary M.2 slot (with the main
slot empty) the problem doesn't occur.
What intrigues me most is that if I put 2 nvme disks in, occupying
both M.2 slots, the problem doesn't occur either.
The secondary slot must be empty for the issue to happen.
I didn't try using the main M.2 slot as source instead of target, to
see if the problem also occurs on reading as well.
I could try that if you think it's worth testing.
> > I analyzed some of those corrupted files and one of them happened to
> > be a text file (linux kernel source code).
> > A big portion of the text was replaced with text from another file in
> > the same directory (being text made it easy to find where it came
> > from).
> > So this was a contiguous block of text that was overwritten with a
> > contiguous block of text from another file.
> > If I remember correctly the other file was not corrupted (so the
> > blocks weren't swapped). It looked like a certain block of text was
> > written twice: on the correct file and on another file in the same
> > directory.
>
> That's a very interesting pattern.
>
> > I also got some jpeg images corrupted. I was able to open and view
> > (partially) those images and it looked like a portion of the image was
> > repeated in a different part of it, so blocks of the same file were
> > probably duplicated and overwritten within itself.
> >
> > The blocks being overwritten seemed to be different sizes on different files.
>
> This does sound like a fairly common pattern due to SSD FTL issues,
> but I still don't want to rule out swiotlb, which due to the bucketing
> could maybe also lead to these, but I can't really see how. But the
> fact that the affected systems seem to be using swiotlb despite no
> good reason for them to do so still leaves me puzzled.
>
* Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
2025-02-03 18:48 ` Stefan
@ 2025-02-06 15:58 ` Stefan
0 siblings, 0 replies; 31+ messages in thread
From: Stefan @ 2025-02-06 15:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert, bugzilla-daemon
Cc: Christoph Hellwig, Thorsten Leemhuis, Mario Limonciello,
Bruno Gravato, Keith Busch, Adrian Huang,
Linux kernel regressions list, linux-nvme, Jens Axboe,
iommu@lists.linux.dev, LKML
Hi,
after Matthias was kind enough (kinder than me) to make a video (!) for
the ASRock support, and after I once again referred to this thread and
the many users who have the same problem, ASRock is now able to
reproduce the issue.
Ralph, all tests in comment #40 (including the network issue) were run
twice, because I did not collect logs and lspci outputs the first time.
(The corruptions seem to depend on which PCIe devices / lanes (?) are
used. That's why I also included the lspci outputs.)
(As announced in the initial message, I cannot run tests ATM and for a while.)
Regards Stefan
Am 03.02.25 um 19:48 schrieb Stefan:
> Hi,
>
> just got feedback from ASRock. They asked me to make a video from the
> corruptions occurring on my remotely (and headless) running system.
> Maybe I should make a video of printing out the logs that can be found on
> the Linux and Debian bug trackers ...
>
> Seems that ASRock is unwilling to solve the problem.
>
> Regards Stefan
>
>
> Am 28.01.25 um 15:24 schrieb Stefan:
>> Hi,
>>
>> Am 28.01.25 um 13:52 schrieb Dr. David Alan Gilbert:
>>> Is there any characterisation of the corrupted data; last time I
>>> looked at the bz there wasn't.
>>
>> Yes, there is. (And I already reported it at least on the Debian bug
>> tracker, see links in the initial message.)
>>
>> f3 reports overwritten sectors, i.e. it looks like the pseudo-random
>> test pattern is written to wrong position. These corruptions occur in
>> clusters whose size is an integer multiple of 2^17 bytes in most cases
>> (about 80%) and 2^15 in all cases.
>>
>> The frequency of these corruptions is roughly 1 cluster per 50 GB
>> written.
>>
>> Can others confirm this or do they observe a different characteristic?
>>
>> Regards Stefan
>>
>>
>>> I mean, is it reliably any of:
>>> a) What's the size of the corruption?
>>> block, cache line, word, bit???
>>> b) Position?
>>> e.g. last word in a block or something?
>>> c) Data?
>>> pile of zero's/ff's junk/etc?
>>>
>>> d) Is it a missed write, old data, or partially written block?
>>>
>>> Dave
>>>
>>>>> Puh. I'm kinda lost on what we could do about this on the Linux
>>>>> side.
>>>>
>>>> Because it also depends on the CPU series, a firmware or hardware issue
>>>> seems to be more likely than a Linux bug.
>>>>
>>>> ATM ASRock is still trying to reproduce the issue. (I'm in contact with
>>>> them too. But they have Chinese New Year holidays in Taiwan this week.)
>>>>
>>>> If they can't reproduce it, they have to provide an explanation why the
>>>> issues are seen by so many users.
>>>>
>>>> Regards Stefan
>>>>
>>>>
>>
>
end of thread, other threads: [~2025-02-06 15:58 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-08 14:38 [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Thorsten Leemhuis
2025-01-08 15:07 ` Keith Busch
2025-01-09 8:28 ` Christoph Hellwig
2025-01-09 8:52 ` Thorsten Leemhuis
2025-01-09 15:44 ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
2025-01-10 11:17 ` Bruno Gravato
2025-01-15 6:37 ` Bruno Gravato
2025-01-15 8:40 ` Thorsten Leemhuis
2025-01-16 17:29 ` Thorsten Leemhuis
2025-01-17 8:05 ` Christoph Hellwig
2025-01-17 9:51 ` Thorsten Leemhuis
2025-01-17 9:55 ` Christoph Hellwig
2025-01-17 10:30 ` Thorsten Leemhuis
2025-02-04 6:26 ` Christoph Hellwig
2025-01-17 13:36 ` Bruno Gravato
2025-01-20 14:31 ` Thorsten Leemhuis
2025-01-28 7:41 ` Christoph Hellwig
2025-01-28 12:00 ` Stefan
2025-01-28 12:52 ` Dr. David Alan Gilbert
2025-01-28 14:24 ` Stefan
2025-02-02 8:32 ` Bruno Gravato
2025-02-04 6:12 ` Christoph Hellwig
2025-02-04 9:12 ` Bruno Gravato
2025-02-03 18:48 ` Stefan
2025-02-06 15:58 ` Stefan
2025-01-17 21:31 ` Stefan
2025-01-18 1:03 ` Keith Busch
2025-01-15 10:47 ` Stefan
2025-01-15 13:14 ` Bruno Gravato
2025-01-15 16:26 ` Stefan
2025-01-10 0:10 ` [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Keith Busch