public inbox for linux-nvme@lists.infradead.org
From: Stefan <linux-kernel@simg.de>
To: Bruno Gravato <bgravato@gmail.com>, bugzilla-daemon@kernel.org
Cc: Keith Busch <kbusch@kernel.org>,
	bugzilla-daemon@kernel.org, Adrian Huang <ahuang12@lenovo.com>,
	Linux kernel regressions list <regressions@lists.linux.dev>,
	linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	LKML <linux-kernel@vger.kernel.org>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
Date: Wed, 15 Jan 2025 11:47:28 +0100	[thread overview]
Message-ID: <6c2a34ac-d158-4109-a166-e6d06cafa360@simg.de> (raw)
In-Reply-To: <CAOBLbT8103fAyoFNF8=YcEM1sM6HodcUe=Ee5NsE2hUkfCYv7g@mail.gmail.com>

Hi,

(replying to both, the mailing list and the kernel bug tracker)

Am 15.01.25 um 07:37 schrieb Bruno Gravato:
> I then removed the Solidigm disk from the secondary and kept the WD
> disk in the main M.2 slot. Rerun my tests (on kernel 6.11.5) and
> bang! btrfs scrub now detected quite a few checksum errors!
>
> I then tried disabling volatile write cache with "nvme set-feature
> /dev/nvme0 -f 6 -v 0" "nvme get-feature /dev/nvme0 -f 6" confirmed it
> was disabled, but /sys/block/nvme0n1/queue/fua still showed 1... Was
> that supposed to turn into 0?

You can check this using `nvme get-feature /dev/nvme0n1 -f 6`

> So it looks like the corruption only happens if only the main M.2
> slot is occupied and the secondary M.2 slot is free. With two nvme
> disks (one on each M.2 slot), there were no errors at all.
>
> Stefan, did you ever try running your tests with 2 nvme disks
> installed on both slots? Or did you use only one slot at a time?

No, I only tested these configurations:

1. 1st M.2: Lexar;    2nd M.2: empty
    (Easy to reproduce write errors)
2. 1st M.2: Kingston; 2nd M.2: Lexar
    (Difficult to reproduce read errors with the 6.1 kernel, but no issues
    with newer ones within several months of intense use)

I'll swap the SSDs soon. Then I will also test other configurations and
try out a third SSD. If I get corruption with other SSDs, I will
check which modifications help.

Note that I need both SSDs (configuration 2) in about one week and
cannot change this for about 3 months (I already announced this in December).

Thus, if there is anything I should test with configuration 1, please
let me know quickly.

Just as a reminder (for those who did not read the two bug trackers):
I tested with `f3` (a utility used to detect counterfeit disks) on ext4.
`f3` reports overwritten sectors. In configuration 1 these are write
errors (they appear when I read the data back).

(If no other SSD-intensive jobs are running,) the corruptions do not occur
in the most recently written files, and I never noticed file system
corruption, only file contents are corrupt. (This is probably luck, but
also has something to do with the journal and the time at which file
system metadata is written.)


Am 13.01.25 um 22:01 schrieb bugzilla-daemon@kernel.org:
 > https://bugzilla.kernel.org/show_bug.cgi?id=219609
 >
 > --- Comment #21 from mbe ---
 > Hi,
 >
 > I did some more tests. At first I retrieved the following values
 > under Debian
 >
 >> Debian 12, Kernel 6.1.119, no corruption
 >> cat /sys/class/block/nvme0n1/queue/max_hw_sectors_kb
 >> 2048
 >>
 >> cat /sys/class/block/nvme0n1/queue/max_sectors_kb
 >> 1280
 >>
 >> cat /sys/class/block/nvme0n1/queue/max_segments
 >> 127
 >>
 >> cat /sys/class/block/nvme0n1/queue/max_segment_size
 >> 4294967295
 >
 > To achieve the same values on kernel 6.11.0-13, I had to make the
 > following changes to drivers/nvme/host/pci.c
 >
 >> --- pci.c.org 2024-09-15 16:57:56.000000000 +0200
 >> +++ pci.c     2025-01-13 21:18:54.475903619 +0100
 >> @@ -41,8 +41,8 @@
 >>    * These can be higher, but we need to ensure that any command doesn't
 >>    * require an sg allocation that needs more than a page of data.
 >>    */
 >> -#define NVME_MAX_KB_SZ       8192
 >> -#define NVME_MAX_SEGS        128
 >> +#define NVME_MAX_KB_SZ       4096
 >> +#define NVME_MAX_SEGS        127
 >>   #define NVME_MAX_NR_ALLOCATIONS      5
 >>
 >>   static int use_threaded_interrupts;
 >> @@ -3048,8 +3048,8 @@
 >>        * Limit the max command size to prevent iod->sg allocations going
 >>        * over a single page.
 >>        */
 >> -     dev->ctrl.max_hw_sectors = min_t(u32,
 >> -             NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
 >> +     //dev->ctrl.max_hw_sectors = min_t(u32,
 >> +     //      NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
 >>        dev->ctrl.max_segments = NVME_MAX_SEGS;
 >>
 >>        /*
 >
 > So basically, dev->ctrl.max_hw_sectors stays zero, so that in core.c it
 > is set to the value of nvme_mps_to_sectors(ctrl, id->mdts) (=> 4096 in my case)

This has the same effect as setting it to `dma_max_mapping_size(...)`.

 >> if (id->mdts)
 >>    max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
 >> else
 >>    max_hw_sectors = UINT_MAX;
 >> ctrl->max_hw_sectors =
 >>    min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
 >
 > But that alone was not enough:
 > Tests with ctrl->max_hw_sectors=4096 and NVME_MAX_SEGS = 128 still
 > resulted in corruptions.
 > They only went away after reverting this value back to 127 (the
 > value from kernel 6.1).

That change was introduced in 6.3-rc1 by the patch "nvme-pci: place
descriptor addresses in iod" (
https://github.com/torvalds/linux/commit/7846c1b5a5db8bb8475603069df7c7af034fd081
)

That patch has no effect for me, i.e. unmodified kernels work up to 6.3.6.

The patch that triggers the corruptions is the one introduced in 6.3.7,
which replaces `dma_max_mapping_size(...)` with
`dma_opt_mapping_size(...)`. If I apply this change to 6.1, the
corruptions also occur in that kernel.

Matthias, did you check what happens if you only modify NVME_MAX_SEGS
(and leave the `dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ <<
1, dma_opt_mapping_size(&pdev->dev) >> 9);` line unchanged)?

 > Additional logging to get the values of the following statements
 >> (dma_opt_mapping_size(&pdev->dev) >> 9) = 256
 >> (dma_max_mapping_size(&pdev->dev) >> 9) = 36028797018963967 [sic!]
 >
 > @Stefan, can you check which value NVME_MAX_SEGS had in your tests?
 > It also seems to have an influence.

"128", see above.

Regards Stefan



Thread overview: 31+ messages
2025-01-08 14:38 [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Thorsten Leemhuis
2025-01-08 15:07 ` Keith Busch
2025-01-09  8:28   ` Christoph Hellwig
2025-01-09  8:52     ` Thorsten Leemhuis
2025-01-09 15:44       ` [Bug 219609] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G Stefan
2025-01-10 11:17         ` Bruno Gravato
2025-01-15  6:37         ` Bruno Gravato
2025-01-15  8:40           ` Thorsten Leemhuis
2025-01-16 17:29             ` Thorsten Leemhuis
2025-01-17  8:05             ` Christoph Hellwig
2025-01-17  9:51               ` Thorsten Leemhuis
2025-01-17  9:55                 ` Christoph Hellwig
2025-01-17 10:30                   ` Thorsten Leemhuis
2025-02-04  6:26                     ` Christoph Hellwig
2025-01-17 13:36                 ` Bruno Gravato
2025-01-20 14:31                 ` Thorsten Leemhuis
2025-01-28  7:41                   ` Christoph Hellwig
2025-01-28 12:00                     ` Stefan
2025-01-28 12:52                       ` Dr. David Alan Gilbert
2025-01-28 14:24                         ` Stefan
2025-02-02  8:32                           ` Bruno Gravato
2025-02-04  6:12                             ` Christoph Hellwig
2025-02-04  9:12                               ` Bruno Gravato
2025-02-03 18:48                           ` Stefan
2025-02-06 15:58                             ` Stefan
2025-01-17 21:31               ` Stefan
2025-01-18  1:03                 ` Keith Busch
2025-01-15 10:47           ` Stefan [this message]
2025-01-15 13:14             ` Bruno Gravato
2025-01-15 16:26               ` Stefan
2025-01-10  0:10     ` [Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX Keith Busch
