Linux-NVME Archive on lore.kernel.org
* [PATCHv4 0/2] block+nvme: removing virtual boundary mask reliance
@ 2025-10-07 17:52 Keith Busch
  2025-10-07 17:52 ` [PATCHv4 1/2] block: accumulate memory segment gaps per bio Keith Busch
  2025-10-07 17:52 ` [PATCHv4 2/2] nvme: remove virtual boundary for sgl capable devices Keith Busch
  0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2025-10-07 17:52 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, axboe, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Previous version here:

  https://lore.kernel.org/linux-nvme/20250821204420.2267923-1-kbusch@meta.com/

The purpose is to allow optimization decisions to happen per IO, and
flexibility to utilize unaligned buffers for hardware that supports it.

The virtual boundary that NVMe uses provides specific guarantees about
data alignment, but those guarantees may not be strong enough for some
CPU architectures to take advantage of, even when an application
submits aligned data buffers that could benefit.

At the same time, the virtual boundary prevents the driver from directly
using memory in ways the hardware may be capable of accessing. This
unnecessarily forces applications to double-buffer their data into a
more restrictive, virtually contiguous format.

This patch series provides an efficient way to track segment boundary
gaps per IO so that optimization decisions can be made per IO. This
gives the flexibility to use each device to its full capabilities,
beyond what the virtual boundary mask can express.

Note, abuse of this capability may result in worse performance compared
to the bounce buffer solutions. Sending a bunch of tiny vectors in one
IO incurs significant protocol overhead, so while this patch set allows
you to do that, I recommend that you don't. We can't enforce a minimum
size, though, because vectors may straddle pages with only a few words
in the first and/or last pages, which we do need to support.

Changes from v3:

 - More comments explaining what the new fields are for

 - A bit of refactoring to reuse the bvec gap code

 - Also count gaps for passthrough commands, as it's possible to send
   vectored IO through that interface too.

 - On the nvme side, all the transport ops specify a callback to get
   the desired virtual boundary. PCI supports no boundary for
   SGL-capable devices, while TCP and FC never needed it. RDMA and
   Apple continue to use the current virtual boundary mask, as it's not
   clear whether it's safe to remove it for those.

Keith Busch (2):
  block: accumulate memory segment gaps per bio
  nvme: remove virtual boundary for sgl capable devices

 block/bio.c                 |  1 +
 block/blk-map.c             |  3 +++
 block/blk-merge.c           | 39 ++++++++++++++++++++++++++++++++++---
 block/blk-mq-dma.c          |  3 +--
 block/blk-mq.c              | 10 ++++++++++
 block/blk.h                 |  9 +++++++--
 drivers/nvme/host/apple.c   |  1 +
 drivers/nvme/host/core.c    | 10 +++++-----
 drivers/nvme/host/fabrics.h |  6 ++++++
 drivers/nvme/host/fc.c      |  1 +
 drivers/nvme/host/nvme.h    |  7 +++++++
 drivers/nvme/host/pci.c     | 28 +++++++++++++++++++++++---
 drivers/nvme/host/rdma.c    |  1 +
 drivers/nvme/host/tcp.c     |  1 +
 drivers/nvme/target/loop.c  |  1 +
 include/linux/bio.h         |  2 ++
 include/linux/blk-mq.h      |  8 ++++++++
 include/linux/blk_types.h   | 12 ++++++++++++
 18 files changed, 128 insertions(+), 15 deletions(-)

-- 
2.47.3





Thread overview: 6+ messages
2025-10-07 17:52 [PATCHv4 0/2] block+nvme: removing virtual boundary mask reliance Keith Busch
2025-10-07 17:52 ` [PATCHv4 1/2] block: accumulate memory segment gaps per bio Keith Busch
2025-10-10  5:34   ` Christoph Hellwig
2025-10-13 21:33     ` Keith Busch
2025-10-07 17:52 ` [PATCHv4 2/2] nvme: remove virtual boundary for sgl capable devices Keith Busch
2025-10-10  5:34   ` Christoph Hellwig
