From: Christoph Hellwig <hch@lst.de>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: "Christoph Hellwig" <hch@lst.de>,
"Logan Gunthorpe" <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
"Stephen Bates" <sbates@raithlin.com>,
"Jens Axboe" <axboe@kernel.dk>,
"Keith Busch" <keith.busch@intel.com>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Max Gurtovoy" <maxg@mellanox.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>
Subject: Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()
Date: Mon, 8 Jan 2018 19:34:34 +0100 [thread overview]
Message-ID: <20180108183434.GA15549@lst.de> (raw)
In-Reply-To: <20180108180917.GF11348@ziepe.ca>
On Mon, Jan 08, 2018 at 11:09:17AM -0700, Jason Gunthorpe wrote:
> > As usual we implement what actually has a consumer. On top of that the
> > R/W API is the only core RDMA API that actually does DMA mapping for the
> > ULP at the moment.
>
> Well again the same can be said for dma_map_page vs dma_map_sg...
I don't understand this comment.
>
> > For SENDs and everything else dma maps are done by the ULP (I'd like
> > to eventually change that, though - e.g. sends through that are
> > inline to the workqueue don't need a dma map to start with).
>
>
> > That's because the initial design was to let the ULPs do the DMA
> > mappings, which fundamentally is wrong. I've fixed it for the R/W
> > API when adding it, but no one has started work on SENDs and atomics.
>
> Well, you know why it is like this, and it is very complicated to
> unwind - the HW driver does not have enough information during CQ
> processing to properly do any unmaps, let alone serious error tear
> down unmaps, so we'd need a bunch of new APIs developed first, like RW
> did. :\
Yes, if it was trivial we would have done it already.
> > > And on that topic, does this scheme work with HFI?
> >
> > No, and I guess we need an opt-out. HFI generally seems to be
> > extremely weird.
>
> This series needs some kind of fix so HFI, QIB, rxe, etc don't get
> broken, and it shouldn't be 'fixed' at the RDMA level.
I don't think rxe is a problem as it won't show up a pci device.
HFI and QIB do show as PCI devices, and could be used for P2P transfers
from the PCI point of view. It's just that they have a layer of
software indirection between their hardware and what is exposed at
the RDMA layer.
So I very much disagree about where to place that workaround - the
RDMA code is exactly the right place.
> > > This is why P2P must fit in to the common DMA framework somehow, we
> > > rely on these abstractions to work properly and fully in RDMA.
> >
> > Moving P2P up to common RDMA code isn't going to fix this. For that
> > we need to stop preting that something that isn't DMA can abuse the
> > dma mapping framework, and until then opt them out of behavior that
> > assumes actual DMA like P2P.
>
> It could, if we had a DMA op for p2p then the drivers that provide
> their own ops can implement it appropriately or not at all.
>
> Eg the correct implementation for rxe to support p2p memory is
> probably somewhat straightfoward.
But P2P is _not_ a factor of the dma_ops implementation at all,
it is something that happens behind the dma_map implementation.
Think about what the dma mapping routines do:
(a) translate from host address to bus addresses
and
(b) flush caches (in non-coherent architectures)
Both are obviously not needed for P2P transfers, as they never reach
the host.
> Very long term the IOMMUs under the ops will need to care about this,
> so the wrapper is not an optimal place to put it - but I wouldn't
> object if it gets it out of RDMA :)
Unless you have an IOMMU on your PCIe switch and not before/inside
the root complex that is not correct.
next prev parent reply other threads:[~2018-01-08 18:34 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-04 19:01 [PATCH 00/11] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 01/12] pci-p2p: Support peer to peer memory Logan Gunthorpe
2018-01-04 21:40 ` Bjorn Helgaas
2018-01-04 23:06 ` Logan Gunthorpe
2018-01-04 21:59 ` Bjorn Helgaas
2018-01-05 0:20 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 02/12] pci-p2p: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-01-04 21:50 ` Bjorn Helgaas
2018-01-04 22:25 ` Jason Gunthorpe
2018-01-04 23:13 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 03/12] pci-p2p: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 04/12] pci-p2p: Clear ACS P2P flags for all client devices Logan Gunthorpe
2018-01-04 21:57 ` Bjorn Helgaas
2018-01-04 22:35 ` Alex Williamson
2018-01-05 0:00 ` Logan Gunthorpe
2018-01-05 1:09 ` Logan Gunthorpe
2018-01-05 3:33 ` Alex Williamson
2018-01-05 6:47 ` Jerome Glisse
2018-01-05 15:41 ` Alex Williamson
2018-01-05 17:10 ` Logan Gunthorpe
2018-01-05 17:18 ` Alex Williamson
2018-01-04 19:01 ` [PATCH 05/12] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-01-04 19:22 ` Jason Gunthorpe
2018-01-04 19:52 ` Logan Gunthorpe
2018-01-04 22:13 ` Jason Gunthorpe
2018-01-04 23:44 ` Logan Gunthorpe
2018-01-05 4:50 ` Jason Gunthorpe
2018-01-08 14:59 ` Christoph Hellwig
2018-01-08 18:09 ` Jason Gunthorpe
2018-01-08 18:17 ` Logan Gunthorpe
2018-01-08 18:29 ` Jason Gunthorpe
2018-01-08 18:34 ` Christoph Hellwig [this message]
2018-01-08 18:44 ` Logan Gunthorpe
2018-01-08 18:57 ` Christoph Hellwig
2018-01-08 19:05 ` Logan Gunthorpe
2018-01-09 16:47 ` Christoph Hellwig
2018-01-08 19:49 ` Jason Gunthorpe
2018-01-09 16:46 ` Christoph Hellwig
2018-01-09 17:10 ` Jason Gunthorpe
2018-01-08 19:01 ` Jason Gunthorpe
2018-01-09 16:55 ` Christoph Hellwig
2018-01-04 19:01 ` [PATCH 07/12] nvme-pci: clean up CMB initialization Logan Gunthorpe
2018-01-04 19:08 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 08/12] nvme-pci: clean up SMBSZ bit definitions Logan Gunthorpe
2018-01-04 19:08 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 09/12] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-01-05 15:30 ` Marta Rybczynska
2018-01-05 18:14 ` Logan Gunthorpe
2018-01-05 18:11 ` Keith Busch
2018-01-05 18:19 ` Logan Gunthorpe
2018-01-05 19:01 ` Keith Busch
2018-01-05 19:04 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 10/12] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 11/12] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 12/12] nvmet: Optionally use PCI P2P memory Logan Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180108183434.GA15549@lst.de \
--to=hch@lst.de \
--cc=axboe@kernel.dk \
--cc=benh@kernel.crashing.org \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=jgg@ziepe.ca \
--cc=jglisse@redhat.com \
--cc=keith.busch@intel.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvdimm@lists.01.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=maxg@mellanox.com \
--cc=sagi@grimberg.me \
--cc=sbates@raithlin.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox