From: Jason Gunthorpe <jgg@ziepe.ca>
To: Christoph Hellwig <hch@lst.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
linux-rdma@vger.kernel.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe <logang@deltatee.com>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH 2/5] RDMA/core: remove use of dma_virt_ops
Date: Wed, 4 Nov 2020 14:09:12 -0400 [thread overview]
Message-ID: <20201104180912.GS36674@ziepe.ca> (raw)
In-Reply-To: <20201104163135.GA15840@lst.de>
On Wed, Nov 04, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 04, 2020 at 11:52:55AM -0400, Jason Gunthorpe wrote:
> > It could work, I think a resonable ULP API would be to have some
> >
> > rdma_fill_ib_sge_from_sgl()
> > rdma_map_sge_single()
> > etc etc
> >
> > ie instead of wrappering the DMA API as-is we have a new API that
> > directly builds the ib_sge. It always fills the local_dma_lkey from
> > the pd, so it knows it is doing DMA from local kernel memory.
>
> Yeah.
>
> > Logically SW devices then have a local_dma_lkey MR that has an IOVA of
> > the CPU physical address space, not the DMA address space as HW
> > devices have. The ib_sge builders can know this detail and fill in
> > addr from either a cpu phyical or a dma map.
>
> I don't think the builders are the right place to do it - it really
> should to be in the low-level drivers for a bunch of reasons:
At this point we have little choice, the ULP is responsible for
map/unmap because the ULP owns the CQ and (batch) unmap is triggered
by some CQE.
Reworking all drivers to somehow keep track of unmaps a CQEs triggers
feels so hard at this point as to be impossible. It is why the
rdma_rw_ctx basically exists.
So we have to keep the current arrangment, when the ib_sge is built
the dma map must be conditionally done.
> 1) this avoids doing the dma_map when no DMA is performed, e.g. for
> mlx5 when send data is in the extended WQE
At least in the kernel, the ULP has to be involved today in
IB_SEND_INLINE. Userspace does an auto-copy, but that is because it
doesn't have dma map/unmap.
Without unmap tracking as above the caller must supply a specially
formed ib_sge when using IB_SEND_INLINE that specifies the CPU vaddr
so memcpy works.
But this all looks like dead code, no ULP sets IB_SEND_INLINE ??
> 2) to deal with the fact that dma mapping reduces the number of SGEs.
> When the system uses a modern IOMMU we'll always end up with a
> single IOVA range no matter how many pages were mapped originally.
> This means any MR process can actually be consolidated to use
> a single SGE with the local lkey.
Any place like rdma_rw_ctx_init() that decides dynamically between SGE
and MR becomes a mess. It would be manageable if rdma_rw_ctx_init()
was the only place doing this..
I haven't looked lately, but I wonder if it is feasible that all the
MR users would use this API?
Jason
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Christoph Hellwig <hch@lst.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
iommu@lists.linux-foundation.org
Subject: Re: [PATCH 2/5] RDMA/core: remove use of dma_virt_ops
Date: Wed, 4 Nov 2020 14:09:12 -0400 [thread overview]
Message-ID: <20201104180912.GS36674@ziepe.ca> (raw)
In-Reply-To: <20201104163135.GA15840@lst.de>
On Wed, Nov 04, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 04, 2020 at 11:52:55AM -0400, Jason Gunthorpe wrote:
> > It could work, I think a resonable ULP API would be to have some
> >
> > rdma_fill_ib_sge_from_sgl()
> > rdma_map_sge_single()
> > etc etc
> >
> > ie instead of wrappering the DMA API as-is we have a new API that
> > directly builds the ib_sge. It always fills the local_dma_lkey from
> > the pd, so it knows it is doing DMA from local kernel memory.
>
> Yeah.
>
> > Logically SW devices then have a local_dma_lkey MR that has an IOVA of
> > the CPU physical address space, not the DMA address space as HW
> > devices have. The ib_sge builders can know this detail and fill in
> > addr from either a cpu phyical or a dma map.
>
> I don't think the builders are the right place to do it - it really
> should to be in the low-level drivers for a bunch of reasons:
At this point we have little choice, the ULP is responsible for
map/unmap because the ULP owns the CQ and (batch) unmap is triggered
by some CQE.
Reworking all drivers to somehow keep track of unmaps a CQEs triggers
feels so hard at this point as to be impossible. It is why the
rdma_rw_ctx basically exists.
So we have to keep the current arrangment, when the ib_sge is built
the dma map must be conditionally done.
> 1) this avoids doing the dma_map when no DMA is performed, e.g. for
> mlx5 when send data is in the extended WQE
At least in the kernel, the ULP has to be involved today in
IB_SEND_INLINE. Userspace does an auto-copy, but that is because it
doesn't have dma map/unmap.
Without unmap tracking as above the caller must supply a specially
formed ib_sge when using IB_SEND_INLINE that specifies the CPU vaddr
so memcpy works.
But this all looks like dead code, no ULP sets IB_SEND_INLINE ??
> 2) to deal with the fact that dma mapping reduces the number of SGEs.
> When the system uses a modern IOMMU we'll always end up with a
> single IOVA range no matter how many pages were mapped originally.
> This means any MR process can actually be consolidated to use
> a single SGE with the local lkey.
Any place like rdma_rw_ctx_init() that decides dynamically between SGE
and MR becomes a mess. It would be manageable if rdma_rw_ctx_init()
was the only place doing this..
I haven't looked lately, but I wonder if it is feasible that all the
MR users would use this API?
Jason
next prev parent reply other threads:[~2020-11-04 18:09 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-04 9:50 remove dma_virt_ops Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
2020-11-04 9:50 ` [PATCH 1/5] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
2020-11-04 9:50 ` [PATCH 2/5] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
2020-11-04 13:42 ` Jason Gunthorpe
2020-11-04 13:42 ` Jason Gunthorpe
2020-11-04 14:01 ` Christoph Hellwig
2020-11-04 14:01 ` Christoph Hellwig
2020-11-04 15:09 ` Bernard Metzler
2020-11-04 15:09 ` Bernard Metzler
2020-11-04 18:10 ` Jason Gunthorpe
2020-11-04 18:10 ` Jason Gunthorpe
2020-11-04 15:52 ` Jason Gunthorpe
2020-11-04 15:52 ` Jason Gunthorpe
2020-11-04 16:31 ` Christoph Hellwig
2020-11-04 16:31 ` Christoph Hellwig
2020-11-04 18:09 ` Jason Gunthorpe [this message]
2020-11-04 18:09 ` Jason Gunthorpe
2020-11-04 9:50 ` [PATCH 3/5] PCI/p2p: remove the DMA_VIRT_OPS hacks Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
2020-11-04 16:53 ` Bjorn Helgaas
2020-11-04 16:53 ` Bjorn Helgaas
2020-11-04 17:00 ` Logan Gunthorpe
2020-11-04 17:00 ` Logan Gunthorpe
2020-11-04 9:50 ` [PATCH 4/5] PCI/p2p: cleanup up __pci_p2pdma_map_sg a bit Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
2020-11-04 16:54 ` Bjorn Helgaas
2020-11-04 16:54 ` Bjorn Helgaas
2020-11-04 9:50 ` [PATCH 5/5] dma-mapping: remove dma_virt_ops Christoph Hellwig
2020-11-04 9:50 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201104180912.GS36674@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=bhelgaas@google.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux-foundation.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=logang@deltatee.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.