From: Jason Gunthorpe <jgg@ziepe.ca>
To: Christoph Hellwig <hch@lst.de>
Cc: "Leon Romanovsky" <leon@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
"Marek Szyprowski" <m.szyprowski@samsung.com>,
"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
"Chaitanya Kulkarni" <chaitanyak@nvidia.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Jens Axboe" <axboe@kernel.dk>, "Keith Busch" <kbusch@kernel.org>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
kvm@vger.kernel.org, linux-mm@kvack.org,
"Bart Van Assche" <bvanassche@acm.org>,
"Damien Le Moal" <damien.lemoal@opensource.wdc.com>,
"Amir Goldstein" <amir73il@gmail.com>,
"josef@toxicpanda.com" <josef@toxicpanda.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"Dan Williams" <dan.j.williams@intel.com>,
"jack@suse.com" <jack@suse.com>,
"Zhu Yanjun" <zyjzyj2000@gmail.com>
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Date: Wed, 6 Mar 2024 13:44:56 -0400 [thread overview]
Message-ID: <20240306174456.GO9225@ziepe.ca> (raw)
In-Reply-To: <20240306162022.GB28427@lst.de>
On Wed, Mar 06, 2024 at 05:20:22PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 06, 2024 at 11:43:28AM -0400, Jason Gunthorpe wrote:
> > I don't think they are so fundamentally different, at least in our
> > past conversations I never came out with the idea we should burden the
> > driver with two different flows based on what kind of alignment the
> > transfer happens to have.
>
> Then we talked past each other..
Well, we never talked in such detail.
> > > So if we want to efficiently be able to handle these cases we need
> > > two APIs in the driver and a good framework to switch between them.
> >
> > But, what does the non-page-aligned version look like? Doesn't it
> > still look basically like this?
>
> I'd just rather have the non-aligned case for those who really need
> it be the loop over map single region that is needed for the direct
> mapping anyway.
There is a list of interesting cases this has to cover:
1. Direct map. No dma_addr_t at unmap, multiple HW SGLs.
2. IOMMU aligned map, no P2P. Only IOVA range at unmap, single HW SGL.
3. IOMMU aligned map, P2P. Only IOVA range at unmap, multiple HW SGLs.
4. swiotlb single range. Only IOVA range at unmap, single HW SGL.
5. swiotlb multi-range. All dma_addr_t's at unmap, multiple HW SGLs.
6. Unaligned IOMMU. Only IOVA range at unmap, multiple HW SGLs.
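As a rough illustration only (these types are hypothetical stand-ins, not any real kernel API), the unmap-time storage each numbered case implies could be modelled like this:

```c
/* Illustrative only: what each numbered case above has to remember so
 * unmap can work. Hypothetical types, not kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

enum unmap_kind {
    UNMAP_NONE,       /* #1: direct map, nothing stored for unmap */
    UNMAP_IOVA_RANGE, /* #2, #3, #4, #6: one IOVA range suffices */
    UNMAP_PER_ADDR,   /* #5: every dma_addr_t must be kept */
};

struct unmap_state {
    enum unmap_kind kind;
    union {
        struct { dma_addr_t iova; size_t len; } range;     /* IOVA_RANGE */
        struct { dma_addr_t *addrs; size_t nr; } per_addr; /* PER_ADDR */
    };
};

/* Map a case number (1..6) to its unmap storage requirement. */
static enum unmap_kind unmap_kind_for_case(int c)
{
    if (c == 1)
        return UNMAP_NONE;
    if (c == 5)
        return UNMAP_PER_ADDR;
    return UNMAP_IOVA_RANGE;
}
```

The point of the taxonomy is visible in the union: only case 5 forces a per-range dma_addr_t array; everything else collapses to a single (iova, len) pair or nothing at all.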
I think we agree that 1 and 2 should be optimized highly, as they are
the common case. That mainly means no dma_addr_t storage in either case.
5 is the slowest and has the most overhead.
4 is basically the same as 2 from the driver's viewpoint.
3 is quite similar to 1, but it has the IOVA range at unmap.
6 doesn't have to be optimal; from the driver perspective it can be
treated like 5.
That is three basic driver flows: 1/3, 2/4 and 5/6.
So are you thinking something more like a driver flow of:
   .. extent IO and get # aligned pages and know if there is P2P ..

   dma_init_io(state, num_pages, p2p_flag)
   if (dma_io_single_range(state)) {
        // #2, #4
        for each io()
             dma_link_aligned_pages(state, io range)
        hw_sgl = (state->iova, state->len)
   } else {
        // #1, #3, #5, #6
        hw_sgls = alloc_hw_sgls(num_ios)
        if (dma_io_needs_dma_addr_unmap(state))
             dma_addr_storage = alloc_num_ios(); // #5 only
        for each io() {
             hw_sgl[i] = dma_map_single(state, io range)
             if (dma_addr_storage)
                  dma_addr_storage[i] = hw_sgl[i]; // #5 only
        }
   }

?
This is not quite what you said: we split the driver flow based on
needing one HW SGL vs needing many HW SGLs.
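To make that branching concrete, here is a self-contained userspace sketch of the flow above. Every name in it (dma_state, dma_map_one, map_transfer) is a hypothetical stand-in for the proposed API, stubbed out (identity mapping, no real IOMMU) purely so the control flow is runnable:

```c
/* Hypothetical sketch of the two-branch driver flow discussed above.
 * None of these names are real kernel APIs; they model the proposal:
 * one contiguous-IOVA fast path (cases 2/4) and one HW-SGL loop
 * (cases 1/3/5/6, with extra dma_addr_t storage only for case 5).
 * Stubbed in userspace so the control flow is testable. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;

struct io_range { uint64_t addr; size_t len; };
struct hw_sgl   { dma_addr_t addr; size_t len; };

struct dma_state {
    dma_addr_t iova;       /* base IOVA when one contiguous range exists */
    size_t     len;        /* total mapped length */
    bool       single;     /* cases 2/4: whole transfer is one IOVA range */
    bool       addr_unmap; /* case 5: every dma_addr_t kept for unmap */
};

/* Stub of the hypothetical per-range map helper: identity-map here. */
static dma_addr_t dma_map_one(struct dma_state *st, struct io_range *io)
{
    st->len += io->len;
    return io->addr;
}

/* Returns the number of HW SGL entries produced. */
static size_t map_transfer(struct dma_state *st, struct io_range *ios,
                           size_t nios, struct hw_sgl *sgls,
                           dma_addr_t **unmap_store)
{
    if (st->single) {
        /* #2/#4: link every aligned range, emit a single HW SGL. */
        for (size_t i = 0; i < nios; i++)
            dma_map_one(st, &ios[i]);
        sgls[0].addr = st->iova;
        sgls[0].len  = st->len;
        return 1;
    }
    /* #1/#3/#5/#6: one HW SGL per range. */
    if (st->addr_unmap)                        /* #5 only */
        *unmap_store = calloc(nios, sizeof(dma_addr_t));
    for (size_t i = 0; i < nios; i++) {
        sgls[i].addr = dma_map_one(st, &ios[i]);
        sgls[i].len  = ios[i].len;
        if (st->addr_unmap)
            (*unmap_store)[i] = sgls[i].addr;  /* #5 only */
    }
    return nios;
}
```

Note how the fast path never touches per-range dma_addr_t storage at all: the driver keeps only (iova, len), which is the optimization being argued for.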
> > So are they really so different to want different APIs? That strikes
> > me as a big driver cost.
>
> To not have to store a dma_address range per CPU range that doesn't
> actually get used at all.
Right, that is a nice optimization we should reach for.
Jason