linux-nvme.lists.infradead.org archive mirror
From: Keith Busch <kbusch@meta.com>
To: <linux-block@vger.kernel.org>, <linux-nvme@lists.infradead.org>
Cc: <hch@lst.de>, <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>
Subject: [PATCHv3 0/2] block+nvme: reducing virtual boundary mask reliance
Date: Thu, 21 Aug 2025 13:44:18 -0700
Message-ID: <20250821204420.2267923-1-kbusch@meta.com>

From: Keith Busch <kbusch@kernel.org>

Previous version is here:

  https://lore.kernel.org/linux-nvme/20250805195608.2379107-1-kbusch@meta.com/

This patch set depends on this unmerged series for flexible direct-io:

  https://lore.kernel.org/linux-block/20250819164922.640964-1-kbusch@meta.com/

The purpose of this is to allow optimization decisions to happen per IO.
The virtual boundary that NVMe reports provides specific guarantees
about the data alignment, but that alignment may not be large enough
for some CPU architectures to take advantage of, even when an
application uses data buffers that are aligned well enough to benefit.

At the same time, the virtual boundary prevents the driver from directly
using memory in ways the hardware may be capable of accessing. This
This needlessly forces applications to double buffer their data into
the more restrictive virtually contiguous format.

This patch series provides an efficient way to track page boundary gaps
per IO so that these optimizations can be decided per IO. This lets each
device be used to the full extent of its capabilities, rather than only
what the virtual boundary mask can express.

Note, abuse of this capability may result in worse performance compared
to the bounce buffer solutions. Sending a bunch of tiny vectors for one
IO incurs significant protocol overhead, so while this patch set allows
you to do that, I recommend that you don't. We can't enforce a minimum
size though because vectors may straddle pages with only a few words in
the first and/or last pages, which we do need to support.

Changes from v2:

  - We only need to know about the lowest set bit in any bio vector page
    gap. Use that to avoid increasing the bio size.

  - Fixed back merges; the previous version could miss one of the
    bio's gaps.

  - Use pointers instead of relying on inlining to generate good code.

  - Trivial name changes

  - Comments explaining the new bio field, and the nvme usage for deciding
    on SGL vs PRP DMA.

Keith Busch (2):
  block: accumulate segment page gaps per bio
  nvme: remove virtual boundary for sgl capable devices

 block/bio.c               |  1 +
 block/blk-merge.c         | 39 ++++++++++++++++++++++++++++++++++++---
 block/blk-mq-dma.c        |  3 +--
 block/blk-mq.c            | 10 ++++++++++
 drivers/nvme/host/core.c  | 21 ++++++++++++++++-----
 drivers/nvme/host/pci.c   | 16 +++++++++++++---
 include/linux/blk-mq.h    |  2 ++
 include/linux/blk_types.h |  8 ++++++++
 8 files changed, 87 insertions(+), 13 deletions(-)

-- 
2.47.3

Thread overview: 13+ messages
2025-08-21 20:44 Keith Busch [this message]
2025-08-21 20:44 ` [PATCHv3 1/2] block: accumulate segment page gaps per bio Keith Busch
2025-08-25 13:46   ` Christoph Hellwig
2025-08-25 14:10     ` Keith Busch
2025-08-26 13:03       ` Christoph Hellwig
2025-08-26 13:47         ` Keith Busch
2025-08-26 13:57           ` Christoph Hellwig
2025-08-26 22:33             ` Keith Busch
2025-08-27  7:37               ` Christoph Hellwig
2025-08-30  1:47                 ` Keith Busch
2025-09-02  5:36                   ` Christoph Hellwig
2025-08-21 20:44 ` [PATCHv3 2/2] nvme: remove virtual boundary for sgl capable devices Keith Busch
2025-08-25 13:49   ` Christoph Hellwig