All of lore.kernel.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: "Pawel Anikiel" <panikiel@google.com>
Cc: bob.beckett@collabora.com, axboe@kernel.dk, hch@lst.de,
	kernel@collabora.com, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, sagi@grimberg.me
Subject: Re: [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk
Date: Thu, 14 Nov 2024 08:46:41 -0700	[thread overview]
Message-ID: <ZzYbYSTiMddjuVjF@kbusch-mbp> (raw)
In-Reply-To: <20241114113803.3571128-1-panikiel@google.com>

On Thu, Nov 14, 2024 at 11:38:03AM +0000, Pawel Anikiel wrote:
> I've been tracking down an issue that seems to be related (identical?) to
> this one, and I would like to propose a different fix.
> 
> I have a device with the aforementioned NVMe-eMMC bridge, and I was
> experiencing nvme read timeouts after updating the kernel from 5.15 to
> 6.6. Doing a kernel bisect, I arrived at the same dma pool commit as
> Robert in the original thread.
> 
> After trying out some changes in the nvme-pci driver, I came up with the
> same fix as in this thread: change the alignment of the small pool to
> 512. However, I wanted to get a deeper understanding of what's going on.
> 
> After a lot of analysis, I found out why the nvme timeouts were happening:
> The bridge incorrectly implements PRP list chaining.
> 
> When doing a read of exactly 264 sectors, and allocating a PRP list with
> offset 0xf00, the last PRP entry in that list lies right before a page
> boundary.  The bridge incorrectly (?) assumes that it's a pointer to a
> chained PRP list, tries to do a DMA to address 0x0, gets a bus error,
> and crashes.
> 
> When doing a write of 264 sectors with PRP list offset of 0xf00,
> the bridge treats data as a pointer, and writes incorrect data to
> the drive. This might be why Robert is experiencing fs corruption.

This sounds very plausible, great analysis. Curious though, even without
the dma pool optimizations, you could still allocate a PRP list at that
offset. I wonder why the problem only showed up once we optimized the
pool allocator.
 
> So if my findings are right, the correct quirk would be "don't make PRP
> lists ending on a page boundary".

Coincidently enough, the quirk in this patch achieves that. But it's
great to understand why it was successful.

> Changing the small dma pool alignment to 512 happens to fix the issue
> because it never allocates a PRP list with offset 0xf00. Theoretically,
> the issue could still happen with the page pool, but this bridge has
> a max transfer size of 64 pages, which is not enough to fill an entire
> page-sized PRP list.

Thanks, this answers my question in the other thread: MDTS is too small
to hit the same bug with the large pool.


  parent reply	other threads:[~2024-11-14 15:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-12 19:50 [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk Bob Beckett
2024-11-12 20:47 ` Keith Busch
2024-11-13  4:31 ` Christoph Hellwig
2024-11-13 18:05   ` Keith Busch
2024-11-13 20:08     ` Robert Beckett
2024-11-14  5:55     ` Christoph Hellwig
2024-11-14 13:10       ` Robert Beckett
2024-11-14 11:38 ` Paweł Anikiel
2024-11-14 12:17   ` Christoph Hellwig
2024-11-14 15:37     ` Keith Busch
2024-11-14 13:24   ` Robert Beckett
2024-11-14 14:13     ` Paweł Anikiel
2024-11-14 16:28       ` Robert Beckett
2024-11-22 19:36         ` Keith Busch
2024-12-09 12:32           ` Robert Beckett
2024-12-09 15:33             ` Paweł Anikiel
2024-12-10 21:36               ` Keith Busch
2024-12-11 10:55                 ` Robert Beckett
2024-12-17 11:18                   ` Paweł Anikiel
2024-11-14 15:46   ` Keith Busch [this message]
2024-11-14 16:47     ` Robert Beckett
2024-11-14 18:00 ` Keith Busch
2024-11-14 18:01   ` Jens Axboe
2024-11-25 11:01 ` Niklas Cassel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzYbYSTiMddjuVjF@kbusch-mbp \
    --to=kbusch@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bob.beckett@collabora.com \
    --cc=hch@lst.de \
    --cc=kernel@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=panikiel@google.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.