From: Keith Busch <kbusch@kernel.org>
To: "Pawel Anikiel" <panikiel@google.com>
Cc: bob.beckett@collabora.com, axboe@kernel.dk, hch@lst.de,
	kernel@collabora.com, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, sagi@grimberg.me
Subject: Re: [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk
Date: Thu, 14 Nov 2024 08:46:41 -0700	[thread overview]
Message-ID: <ZzYbYSTiMddjuVjF@kbusch-mbp> (raw)
In-Reply-To: <20241114113803.3571128-1-panikiel@google.com>

On Thu, Nov 14, 2024 at 11:38:03AM +0000, Pawel Anikiel wrote:
> I've been tracking down an issue that seems to be related (identical?) to
> this one, and I would like to propose a different fix.
> 
> I have a device with the aforementioned NVMe-eMMC bridge, and I was
> experiencing nvme read timeouts after updating the kernel from 5.15 to
> 6.6. Doing a kernel bisect, I arrived at the same dma pool commit as
> Robert in the original thread.
> 
> After trying out some changes in the nvme-pci driver, I came up with the
> same fix as in this thread: change the alignment of the small pool to
> 512. However, I wanted to get a deeper understanding of what's going on.
> 
> After a lot of analysis, I found out why the nvme timeouts were happening:
> The bridge incorrectly implements PRP list chaining.
> 
> When doing a read of exactly 264 sectors, and allocating a PRP list with
> offset 0xf00, the last PRP entry in that list lies right before a page
> boundary.  The bridge incorrectly (?) assumes that it's a pointer to a
> chained PRP list, tries to do a DMA to address 0x0, gets a bus error,
> and crashes.
> 
> When doing a write of 264 sectors with PRP list offset of 0xf00,
> the bridge treats data as a pointer, and writes incorrect data to
> the drive. This might be why Robert is experiencing fs corruption.

This sounds very plausible, great analysis. Curious, though: even without
the dma pool optimizations, you could still allocate a PRP list at that
offset. I wonder why the problem only showed up once we optimized the
pool allocator.
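
To make the quoted scenario concrete, here is a rough back-of-the-envelope
sketch (assuming 4 KiB pages, 512-byte sectors, a page-aligned data buffer,
and the 256-byte small-pool PRP list landing at page offset 0xf00; this is
just the arithmetic, not the driver's actual code):

#include <stdio.h>

#define PAGE_SIZE	4096u
#define SECTOR_SIZE	512u
#define PRP_ENTRY_SIZE	8u

int main(void)
{
	unsigned int sectors = 264;
	unsigned int bytes = sectors * SECTOR_SIZE;	/* 135168 */
	unsigned int pages = bytes / PAGE_SIZE;		/* 33 */
	/* PRP1 covers the first page; the remaining pages need a PRP list. */
	unsigned int list_entries = pages - 1;		/* 32 */
	unsigned int list_offset = 0xf00;		/* from the 256 B pool */
	unsigned int last_entry_off =
		list_offset + (list_entries - 1) * PRP_ENTRY_SIZE;

	printf("transfer: %u bytes (%u pages), PRP list entries: %u\n",
	       bytes, pages, list_entries);
	printf("last entry at page offset 0x%x, page ends at 0x%x\n",
	       last_entry_off, PAGE_SIZE);
	/*
	 * 0xff8 is the final 8-byte slot of the page, exactly where a
	 * chain pointer would sit if the list were full.  A bridge that
	 * unconditionally treats that slot as a chain pointer follows a
	 * data address instead of reading the last data pointer from it.
	 */
	return 0;
}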
 
> So if my findings are right, the correct quirk would be "don't allocate PRP
> lists that end on a page boundary".

Coincidentally enough, the quirk in this patch achieves that. But it's
great to understand why it was successful.
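
As a sanity check on why the 512-byte alignment gives that property, a tiny
sketch (same assumptions as above, with the small pool handing out 256-byte
PRP lists within a 4 KiB page):

#include <stdio.h>

#define PAGE_SIZE	4096u
#define PRP_LIST_SIZE	256u	/* small-pool block size */

static void check(unsigned int align)
{
	unsigned int off;

	for (off = 0; off + PRP_LIST_SIZE <= PAGE_SIZE; off += align)
		if (off + PRP_LIST_SIZE == PAGE_SIZE)
			printf("align %4u: list at offset 0x%x ends on the page boundary\n",
			       align, off);
}

int main(void)
{
	check(256);	/* reports offset 0xf00 */
	check(512);	/* reports nothing: the last block ends at 0xf00 */
	return 0;
}

With 512-byte alignment the blocks start at 0x0, 0x200, ..., 0xe00, so the
last one ends at 0xf00 and a PRP list never reaches the final slot of its
page.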

> Changing the small dma pool alignment to 512 happens to fix the issue
> because it never allocates a PRP list with offset 0xf00. Theoretically,
> the issue could still happen with the page pool, but this bridge has
> a max transfer size of 64 pages, which is not enough to fill an entire
> page-sized PRP list.

Thanks, this answers my question in the other thread: MDTS is too small
to hit the same bug with the large pool.
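
(Spelling that out, assuming the usual 4 KiB pages and 8-byte PRP entries:
a page-sized PRP list has 4096 / 8 = 512 slots, while a 64-page transfer
needs at most around 64 list entries once PRP1 covers the first page, so a
list from the large pool never comes anywhere near the final slot of its
page.)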


Thread overview: 24+ messages
2024-11-12 19:50 [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk Bob Beckett
2024-11-12 20:47 ` Keith Busch
2024-11-13  4:31 ` Christoph Hellwig
2024-11-13 18:05   ` Keith Busch
2024-11-13 20:08     ` Robert Beckett
2024-11-14  5:55     ` Christoph Hellwig
2024-11-14 13:10       ` Robert Beckett
2024-11-14 11:38 ` Paweł Anikiel
2024-11-14 12:17   ` Christoph Hellwig
2024-11-14 15:37     ` Keith Busch
2024-11-14 13:24   ` Robert Beckett
2024-11-14 14:13     ` Paweł Anikiel
2024-11-14 16:28       ` Robert Beckett
2024-11-22 19:36         ` Keith Busch
2024-12-09 12:32           ` Robert Beckett
2024-12-09 15:33             ` Paweł Anikiel
2024-12-10 21:36               ` Keith Busch
2024-12-11 10:55                 ` Robert Beckett
2024-12-17 11:18                   ` Paweł Anikiel
2024-11-14 15:46   ` Keith Busch [this message]
2024-11-14 16:47     ` Robert Beckett
2024-11-14 18:00 ` Keith Busch
2024-11-14 18:01   ` Jens Axboe
2024-11-25 11:01 ` Niklas Cassel
