From: Christoph Hellwig <hch@lst.de>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@ziepe.ca>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
linux-rdma@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Bjorn Helgaas <bhelgaas@google.com>,
Dan Williams <dan.j.williams@intel.com>,
Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
Stephen Bates <sbates@raithlin.com>
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA
Date: Fri, 28 Jun 2019 15:38:37 +0200 [thread overview]
Message-ID: <20190628133837.GA3801@lst.de> (raw)
In-Reply-To: <e63d0259-e17f-effe-b76d-43dbfda8ae3a@deltatee.com>
On Thu, Jun 27, 2019 at 12:00:35PM -0600, Logan Gunthorpe wrote:
> > It is not. (c) is fundamentally very different as it is not actually
> > an operation that ever goes out to the wire at all, and which is why the
> > actual physical address on the wire does not matter at all.
> > Some interfaces like NVMe have designed it in a way that it the commands
> > used to do this internal transfer look like (b2), but that is just their
> > (IMHO very questionable) interface design choice, that produces a whole
> > chain of problems.
>
> >From the mapping device's driver's perspective yes, but from the
> perspective of a submitting driver they would be the same.
With your dma_addr_t scheme it won't be the same, as you'd need
a magic way to generate the internal addressing and stuff it into
the dma_addr_t. With a phys_addr_t based scheme they should basically
be all the same.
> Yes, you did suggest them. But what I'm trying to suggest is we don't
> *necessarily* need the lookup. For demonstration purposes only, a
> submitting driver could very roughly potentially do:
>
> struct bio_vec vec;
> dist = pci_p2pdma_dist(provider_pdev, mapping_pdev);
> if (dist < 0) {
> /* use regular memory */
> vec.bv_addr = virt_to_phys(kmalloc(...));
> vec.bv_flags = 0;
> } else if (dist & PCI_P2PDMA_THRU_HOST_BRIDGE) {
> vec.bv_addr = pci_p2pmem_alloc_phys(provider_pdev, ...);
> vec.bv_flags = BVEC_MAP_RESOURCE;
> } else {
> vec.bv_addr = pci_p2pmem_alloc_bus_addr(provider_pdev, ...);
> vec.bv_flags = BVEC_MAP_BUS_ADDR;
> }
That doesn't look too bad, except..
> -- And a mapping driver would roughly just do:
>
> dma_addr_t dma_addr;
> if (vec.bv_flags & BVEC_MAP_BUS_ADDR) {
> if (pci_bus_addr_in_bar(mapping_pdev, vec.bv_addr, &bar, &off)) {
> /* case (c) */
> /* program the DMA engine with bar and off */
Why bother with that here if we could also let the caller handle
that? pci_p2pdma_dist() should be able to trivially find that out
based on provider_dev == mapping_dev.
> The real difficulty here is that you'd really want all the above handled
> by a dma_map_bvec() so it can combine every vector hitting the IOMMU
> into a single continuous IOVA -- but it's hard to fit case (c) into that
> equation. So it might be that a dma_map_bvec() handles cases (a), (b1)
> and (b2) and the mapping driver has to then check each resulting DMA
> vector for pci_bus_addr_in_bar() while it is programming the DMA engine
> to deal with case (c).
I'd do it the other way around. pci_p2pdma_dist is used to find
the p2p type. The p2p type is stuff into the bio_vec, and we then:
(1) manually check for case (c) in driver for drivers that want to
treat it different from (b)
(2) we then have a dma mapping wrapper that checks the p2p type
and does the right thing for the rest.
WARNING: multiple messages have this Message-ID (diff)
From: hch@lst.de (Christoph Hellwig)
Subject: [RFC PATCH 00/28] Removing struct page from P2PDMA
Date: Fri, 28 Jun 2019 15:38:37 +0200 [thread overview]
Message-ID: <20190628133837.GA3801@lst.de> (raw)
In-Reply-To: <e63d0259-e17f-effe-b76d-43dbfda8ae3a@deltatee.com>
On Thu, Jun 27, 2019@12:00:35PM -0600, Logan Gunthorpe wrote:
> > It is not. (c) is fundamentally very different as it is not actually
> > an operation that ever goes out to the wire at all, and which is why the
> > actual physical address on the wire does not matter at all.
> > Some interfaces like NVMe have designed it in a way that it the commands
> > used to do this internal transfer look like (b2), but that is just their
> > (IMHO very questionable) interface design choice, that produces a whole
> > chain of problems.
>
> >From the mapping device's driver's perspective yes, but from the
> perspective of a submitting driver they would be the same.
With your dma_addr_t scheme it won't be the same, as you'd need
a magic way to generate the internal addressing and stuff it into
the dma_addr_t. With a phys_addr_t based scheme they should basically
be all the same.
> Yes, you did suggest them. But what I'm trying to suggest is we don't
> *necessarily* need the lookup. For demonstration purposes only, a
> submitting driver could very roughly potentially do:
>
> struct bio_vec vec;
> dist = pci_p2pdma_dist(provider_pdev, mapping_pdev);
> if (dist < 0) {
> /* use regular memory */
> vec.bv_addr = virt_to_phys(kmalloc(...));
> vec.bv_flags = 0;
> } else if (dist & PCI_P2PDMA_THRU_HOST_BRIDGE) {
> vec.bv_addr = pci_p2pmem_alloc_phys(provider_pdev, ...);
> vec.bv_flags = BVEC_MAP_RESOURCE;
> } else {
> vec.bv_addr = pci_p2pmem_alloc_bus_addr(provider_pdev, ...);
> vec.bv_flags = BVEC_MAP_BUS_ADDR;
> }
That doesn't look too bad, except..
> -- And a mapping driver would roughly just do:
>
> dma_addr_t dma_addr;
> if (vec.bv_flags & BVEC_MAP_BUS_ADDR) {
> if (pci_bus_addr_in_bar(mapping_pdev, vec.bv_addr, &bar, &off)) {
> /* case (c) */
> /* program the DMA engine with bar and off */
Why bother with that here if we could also let the caller handle
that? pci_p2pdma_dist() should be able to trivially find that out
based on provider_dev == mapping_dev.
> The real difficulty here is that you'd really want all the above handled
> by a dma_map_bvec() so it can combine every vector hitting the IOMMU
> into a single continuous IOVA -- but it's hard to fit case (c) into that
> equation. So it might be that a dma_map_bvec() handles cases (a), (b1)
> and (b2) and the mapping driver has to then check each resulting DMA
> vector for pci_bus_addr_in_bar() while it is programming the DMA engine
> to deal with case (c).
I'd do it the other way around. pci_p2pdma_dist is used to find
the p2p type. The p2p type is stuff into the bio_vec, and we then:
(1) manually check for case (c) in driver for drivers that want to
treat it different from (b)
(2) we then have a dma mapping wrapper that checks the p2p type
and does the right thing for the rest.
next prev parent reply other threads:[~2019-06-28 13:38 UTC|newest]
Thread overview: 179+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-20 16:12 [RFC PATCH 00/28] Removing struct page from P2PDMA Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 01/28] block: Introduce DMA direct request type Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 02/28] block: Add dma_vec structure Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 03/28] block: Warn on mis-use of dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 04/28] block: Never bounce " Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 17:23 ` Jason Gunthorpe
2019-06-20 17:23 ` Jason Gunthorpe
2019-06-20 18:38 ` Logan Gunthorpe
2019-06-20 18:38 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 05/28] block: Skip dma-direct bios in bio_integrity_prep() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 06/28] block: Support dma-direct bios in bio_advance_iter() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 07/28] block: Use dma_vec length in bio_cur_bytes() for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 08/28] block: Introduce dmavec_phys_mergeable() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 09/28] block: Introduce vec_gap_to_prev() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 10/28] block: Create generic vec_split_segs() from bvec_split_segs() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 11/28] block: Create blk_segment_split_ctx Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 12/28] block: Create helper for bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 13/28] block: Generalize bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 14/28] block: Support splitting dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 15/28] block: Support counting dma-direct bio segments Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 16/28] block: Implement mapping dma-direct requests to SGs in blk_rq_map_sg() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 17/28] block: Introduce queue flag to indicate support for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 18/28] block: Introduce bio_add_dma_addr() Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 19/28] nvme-pci: Support dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 20/28] IB/core: Introduce API for initializing a RW ctx from a DMA address Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:49 ` Jason Gunthorpe
2019-06-20 16:49 ` Jason Gunthorpe
2019-06-20 16:59 ` Logan Gunthorpe
2019-06-20 16:59 ` Logan Gunthorpe
2019-06-20 17:11 ` Jason Gunthorpe
2019-06-20 17:11 ` Jason Gunthorpe
2019-06-20 18:24 ` Logan Gunthorpe
2019-06-20 18:24 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 21/28] nvmet: Split nvmet_bdev_execute_rw() into a helper function Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 22/28] nvmet: Use DMA addresses instead of struct pages for P2P Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 23/28] nvme-pci: Remove support for PCI_P2PDMA requests Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 24/28] block: Remove PCI_P2PDMA queue flag Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 25/28] IB/core: Remove P2PDMA mapping support in rdma_rw_ctx Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 26/28] PCI/P2PDMA: Remove SGL helpers Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 27/28] PCI/P2PDMA: Remove struct pages that back P2PDMA memory Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 28/28] memremap: Remove PCI P2PDMA page memory type Logan Gunthorpe
2019-06-20 16:12 ` Logan Gunthorpe
2019-06-20 18:45 ` [RFC PATCH 00/28] Removing struct page from P2PDMA Dan Williams
2019-06-20 18:45 ` Dan Williams
2019-06-20 19:33 ` Jason Gunthorpe
2019-06-20 19:33 ` Jason Gunthorpe
2019-06-20 20:18 ` Dan Williams
2019-06-20 20:18 ` Dan Williams
2019-06-20 20:51 ` Logan Gunthorpe
2019-06-20 20:51 ` Logan Gunthorpe
2019-06-21 17:47 ` Jason Gunthorpe
2019-06-21 17:47 ` Jason Gunthorpe
2019-06-21 17:54 ` Dan Williams
2019-06-21 17:54 ` Dan Williams
2019-06-24 7:31 ` Christoph Hellwig
2019-06-24 7:31 ` Christoph Hellwig
2019-06-24 13:46 ` Jason Gunthorpe
2019-06-24 13:46 ` Jason Gunthorpe
2019-06-24 13:50 ` Christoph Hellwig
2019-06-24 13:50 ` Christoph Hellwig
2019-06-24 13:55 ` Jason Gunthorpe
2019-06-24 13:55 ` Jason Gunthorpe
2019-06-24 16:53 ` Logan Gunthorpe
2019-06-24 16:53 ` Logan Gunthorpe
2019-06-24 18:16 ` Jason Gunthorpe
2019-06-24 18:16 ` Jason Gunthorpe
2019-06-24 18:28 ` Logan Gunthorpe
2019-06-24 18:28 ` Logan Gunthorpe
2019-06-24 18:54 ` Jason Gunthorpe
2019-06-24 18:54 ` Jason Gunthorpe
2019-06-24 19:37 ` Logan Gunthorpe
2019-06-24 19:37 ` Logan Gunthorpe
2019-06-24 16:10 ` Logan Gunthorpe
2019-06-24 16:10 ` Logan Gunthorpe
2019-06-25 7:18 ` Christoph Hellwig
2019-06-25 7:18 ` Christoph Hellwig
2019-06-20 19:34 ` Logan Gunthorpe
2019-06-20 19:34 ` Logan Gunthorpe
2019-06-20 23:40 ` Dan Williams
2019-06-20 23:40 ` Dan Williams
2019-06-20 23:42 ` Logan Gunthorpe
2019-06-20 23:42 ` Logan Gunthorpe
2019-06-24 7:27 ` Christoph Hellwig
2019-06-24 7:27 ` Christoph Hellwig
2019-06-24 16:07 ` Logan Gunthorpe
2019-06-24 16:07 ` Logan Gunthorpe
2019-06-25 7:20 ` Christoph Hellwig
2019-06-25 7:20 ` Christoph Hellwig
2019-06-25 15:57 ` Logan Gunthorpe
2019-06-25 15:57 ` Logan Gunthorpe
2019-06-25 17:01 ` Christoph Hellwig
2019-06-25 17:01 ` Christoph Hellwig
2019-06-25 19:54 ` Logan Gunthorpe
2019-06-25 19:54 ` Logan Gunthorpe
2019-06-26 6:57 ` Christoph Hellwig
2019-06-26 6:57 ` Christoph Hellwig
2019-06-26 18:31 ` Logan Gunthorpe
2019-06-26 18:31 ` Logan Gunthorpe
2019-06-26 20:21 ` Jason Gunthorpe
2019-06-26 20:21 ` Jason Gunthorpe
2019-06-26 20:39 ` Dan Williams
2019-06-26 20:39 ` Dan Williams
2019-06-26 20:54 ` Jason Gunthorpe
2019-06-26 20:54 ` Jason Gunthorpe
2019-06-26 20:55 ` Logan Gunthorpe
2019-06-26 20:55 ` Logan Gunthorpe
2019-06-26 20:45 ` Logan Gunthorpe
2019-06-26 20:45 ` Logan Gunthorpe
2019-06-26 21:00 ` Jason Gunthorpe
2019-06-26 21:00 ` Jason Gunthorpe
2019-06-26 21:18 ` Logan Gunthorpe
2019-06-26 21:18 ` Logan Gunthorpe
2019-06-27 6:32 ` Jason Gunthorpe
2019-06-27 6:32 ` Jason Gunthorpe
2019-06-27 16:09 ` Logan Gunthorpe
2019-06-27 16:09 ` Logan Gunthorpe
2019-06-27 16:35 ` Jason Gunthorpe
2019-06-27 16:35 ` Jason Gunthorpe
2019-06-27 16:49 ` Logan Gunthorpe
2019-06-27 16:49 ` Logan Gunthorpe
2019-06-28 4:57 ` Jason Gunthorpe
2019-06-28 4:57 ` Jason Gunthorpe
2019-06-28 16:22 ` Logan Gunthorpe
2019-06-28 16:22 ` Logan Gunthorpe
2019-06-28 17:29 ` Jason Gunthorpe
2019-06-28 17:29 ` Jason Gunthorpe
2019-06-28 18:29 ` Logan Gunthorpe
2019-06-28 18:29 ` Logan Gunthorpe
2019-06-28 19:09 ` Jason Gunthorpe
2019-06-28 19:09 ` Jason Gunthorpe
2019-06-28 19:35 ` Logan Gunthorpe
2019-06-28 19:35 ` Logan Gunthorpe
2019-07-02 22:45 ` Jason Gunthorpe
2019-07-02 22:45 ` Jason Gunthorpe
2019-07-02 22:52 ` Logan Gunthorpe
2019-07-02 22:52 ` Logan Gunthorpe
2019-06-27 9:08 ` Christoph Hellwig
2019-06-27 9:08 ` Christoph Hellwig
2019-06-27 16:30 ` Logan Gunthorpe
2019-06-27 16:30 ` Logan Gunthorpe
2019-06-27 17:00 ` Christoph Hellwig
2019-06-27 17:00 ` Christoph Hellwig
2019-06-27 18:00 ` Logan Gunthorpe
2019-06-27 18:00 ` Logan Gunthorpe
2019-06-27 18:00 ` Logan Gunthorpe
2019-06-28 13:38 ` Christoph Hellwig [this message]
2019-06-28 13:38 ` Christoph Hellwig
2019-06-28 15:54 ` Logan Gunthorpe
2019-06-28 15:54 ` Logan Gunthorpe
2019-06-27 9:01 ` Christoph Hellwig
2019-06-27 9:01 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190628133837.GA3801@lst.de \
--to=hch@lst.de \
--cc=axboe@kernel.dk \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=jgg@ziepe.ca \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=sagi@grimberg.me \
--cc=sbates@raithlin.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.