[LSF/MM/BPF TOPIC] DMA mapping API in complex scenarios

From: Leon Romanovsky <leon@kernel.org>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	"Matthew Wilcox" <willy@infradead.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"David Howells" <dhowells@redhat.com>,
	"Chaitanya Kulkarni" <chaitanyak@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Simona Vetter" <simona.vetter@ffwll.ch>,
	"RDMA mailing list" <linux-rdma@vger.kernel.org>
Subject: [LSF/MM/BPF TOPIC] DMA mapping API in complex scenarios
Date: Wed, 22 Jan 2025 09:16:00 +0200
Message-ID: <20250122071600.GC10702@unreal>

Currently the only efficient way to map a complex memory description through
the DMA API is by using the scatterlist APIs. The SG APIs are unique in that
they efficiently combine the two fundamental operations of sizing and allocating
a large IOVA window from the IOMMU and processing all the per-address
swiotlb/flushing/p2p/map details.

This uniqueness has been a long-standing pain point, as the scatterlist API
is mandatory but expensive to use. It prevents any kind of optimization or
feature improvement (such as avoiding struct page for P2P), because the
scatterlist itself cannot realistically be improved.

Several approaches have been explored to expand the DMA API with additional
scatterlist-like structures (BIO, rlist). Instead, this proposal splits up the
DMA API to allow callers to bring their own data structure [1].

The API is split up into parts:
 - Allocate IOVA space:
    Do any pre-allocation required. This is done based on the caller
    supplying some details about how much IOMMU address space it would need
    in the worst case.
 - Map and unmap relevant structures to pre-allocated IOVA space:
    Perform the actual mapping into the pre-allocated IOVA. This is very
    similar to dma_map_page().
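
To make the split concrete, here is a minimal user-space model of the two
steps. It is only a sketch: every name in it (toy_iova_state, toy_iova_alloc,
toy_iova_link, toy_iova_translate, toy_iova_unlink, toy_iova_free, toy_range)
is hypothetical and is not the interface proposed in [1]. The 4 KiB page size,
the fixed 16-slot window and the array standing in for IOMMU page tables are
assumptions made purely for illustration. The caller keeps its ranges in its
own structure (toy_range) rather than in a scatterlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PAGES 16u

/* Hypothetical stand-in for a pre-allocated IOVA window (not a kernel type). */
struct toy_iova_state {
	uint64_t base;                /* first IOVA of the window */
	size_t size;                  /* window size in bytes, 0 if unallocated */
	bool used[TOY_MAX_PAGES];     /* slot currently mapped? */
	uint64_t phys[TOY_MAX_PAGES]; /* toy "page table": one entry per slot */
};

/* The caller's own description of memory; no scatterlist involved. */
struct toy_range {
	uint64_t phys;
	size_t len;
};

/* Step 1: reserve a window sized for the caller's worst case. */
static int toy_iova_alloc(struct toy_iova_state *st, uint64_t base,
			  size_t worst_case)
{
	size_t pages = (worst_case + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	if (pages == 0 || pages > TOY_MAX_PAGES)
		return -1;
	st->base = base;
	st->size = pages * TOY_PAGE_SIZE;
	for (size_t i = 0; i < TOY_MAX_PAGES; i++)
		st->used[i] = false;
	return 0;
}

/* Step 2: map one page-aligned physical range at an offset in the window. */
static int toy_iova_link(struct toy_iova_state *st, size_t offset,
			 uint64_t phys, size_t len, uint64_t *iova_out)
{
	if (offset % TOY_PAGE_SIZE || phys % TOY_PAGE_SIZE ||
	    len == 0 || len % TOY_PAGE_SIZE)
		return -1;
	if (offset > st->size || len > st->size - offset)
		return -1;
	for (size_t i = 0; i < len / TOY_PAGE_SIZE; i++) {
		size_t slot = offset / TOY_PAGE_SIZE + i;

		st->used[slot] = true;
		st->phys[slot] = phys + i * TOY_PAGE_SIZE;
	}
	*iova_out = st->base + offset;
	return 0;
}

/* What the device would see: resolve an IOVA back to a physical address. */
static int toy_iova_translate(const struct toy_iova_state *st, uint64_t iova,
			      uint64_t *phys_out)
{
	if (iova < st->base || iova - st->base >= st->size)
		return -1;
	size_t slot = (size_t)((iova - st->base) / TOY_PAGE_SIZE);

	if (!st->used[slot])
		return -1;
	*phys_out = st->phys[slot] + (iova - st->base) % TOY_PAGE_SIZE;
	return 0;
}

/* Undo step 2 for a range, then release the window from step 1. */
static void toy_iova_unlink(struct toy_iova_state *st, size_t offset, size_t len)
{
	for (size_t i = 0; i < len / TOY_PAGE_SIZE; i++)
		st->used[offset / TOY_PAGE_SIZE + i] = false;
}

static void toy_iova_free(struct toy_iova_state *st)
{
	st->size = 0;
}

/* Walk the caller's own array: size once, allocate once, link per range. */
int two_step_demo(void)
{
	const struct toy_range ranges[] = {
		{ 0x10000, 2 * TOY_PAGE_SIZE },
		{ 0x80000, TOY_PAGE_SIZE },
		{ 0x42000, TOY_PAGE_SIZE },
	};
	const size_t n = sizeof(ranges) / sizeof(ranges[0]);
	struct toy_iova_state st;
	size_t total = 0, off = 0;
	uint64_t iova, phys;
	int ret = -1;

	for (size_t i = 0; i < n; i++)
		total += ranges[i].len;
	if (toy_iova_alloc(&st, 0x100000000ull, total))
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (toy_iova_link(&st, off, ranges[i].phys, ranges[i].len, &iova))
			goto out;
		off += ranges[i].len;
	}
	/* The third page of the contiguous window is the second range. */
	if (!toy_iova_translate(&st, st.base + 2 * TOY_PAGE_SIZE, &phys) &&
	    phys == 0x80000)
		ret = 0;
out:
	toy_iova_unlink(&st, 0, off);
	toy_iova_free(&st);
	return ret;
}
```

The point of the sketch is only the shape of the flow: the sizing and the
single window allocation happen once up front, while the per-range work is a
plain loop over whatever structure the caller already has. Calling
two_step_demo() from a main() is enough to exercise it.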

In this topic, I would like to present existing DMA API abuses and a path
forward with the help of the new DMA API.

I will also briefly present the new API and have a forward-looking
discussion about how such a significant change is expected to impact
the kernel.

In particular, we will look at how this API fits with Matthew's phyr sketch,
and where we might see this go in the storage layer (David's proposal for
iter [2]). In addition, we will discuss the roadmap for converting DMABUF and
RDMA to an SG-free world (Jason's vision [3]):

 1) The new DMA API lands
 2) We improve the new DMA API to be fully struct page free, including
    setting up P2P
 3) VFIO provides a dmabuf exporter using the new DMA API's P2P
    support. We'd have to continue with the scatterlist hacks for now.
    VFIO would be a move_notify exporter. This should work with RDMA
 4) RDMA works to drop scatterlist from its internal flows and use the
    new DMA API instead.
 5) VFIO/RDMA implement a new non-scatterlist DMABUF op to
    demonstrate the post-scatterlist world and deprecate the scatterlist
    hacks.
 6) We add revoke semantics to dmabuf, and VFIO/RDMA implement them
 7) iommufd can import a pinnable revokable dmabuf using CPU pfns
    through the non-scatterlist op.
 8) Relevant GPU drivers implement the non-scatterlist op and RDMA
    removes support for the deprecated scatterlist hacks.

[1] https://lore.kernel.org/all/cover.1737106761.git.leon@kernel.org
[2] https://lore.kernel.org/all/886959.1737148612@warthog.procyon.org.uk/
[3] https://lore.kernel.org/all/20250120175901.GP5556@nvidia.com

----------------------------------------------------------------------------
 LWN coverage:
Dancing the DMA two-step - https://lwn.net/Articles/997563/
----------------------------------------------------------------------------

Thanks
