linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-rdma@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Stephen Bates <sbates@raithlin.com>
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA
Date: Mon, 24 Jun 2019 09:27:52 +0200	[thread overview]
Message-ID: <20190624072752.GA3954@lst.de> (raw)
In-Reply-To: <20190620161240.22738-1-logang@deltatee.com>

This is not going to fly.

For one passing a dma_addr_t through the block layer is a layering
violation, and one that I think will also bite us in practice.
The host physical to PCIe bus address mapping can have offsets, and
those offsets absolutely can be different for differnet root ports.
So with your caller generated dma_addr_t everything works fine with
a switched setup as the one you are probably testing on, but on a
sufficiently complicated setup with multiple root ports it can break.

Also duplicating the whole block I/O stack, including hooks all over
the fast path is pretty much a no-go.

I've been pondering for a while if we wouldn't be better off just
passing a phys_addr_t + len instead of the page, offset, len tuple
in the bio_vec, though.  If you look at the normal I/O path here
is what we normally do:

 - we get a page as input, either because we have it at hand (e.g.
   from the page cache) or from get_user_pages (which actually caculates
   it from a pfn in the page tables)
 - once in the bio all the merging decisions are based on the physical
   address, so we have to convert it to the physical address there,
   potentially multiple times
 - then dma mapping all works off the physical address, which it gets
   from the page at the start
 - then only the dma address is used for the I/O
 - on I/O completion we often but not always need the page again.  In
   the direct I/O case for reference counting and dirty status, in the
   file system also for things like marking the page uptodate

So if we move to a phys_addr_t we'd need to go back to the page at least
once.  But because of how the merging works we really only need to do
it once per segment, as we can just do pointer arithmerics do get the
following pages.  As we generally go at least once from a physical
address to a page in the merging code even a relatively expensive vmem_map
looks shouldn't be too bad.  Even more so given that the super hot path
(small blkdev direct I/O) can actually trivially cache the affected pages
as well.

Linus kinda hates the pfn approach, but part of that was really that
it was proposed for file system data, which we all found out really
can't work as-is without pages the hard way.  Another part probably
was potential performance issue, but between the few page lookups, and
the fact that using a single phys_addr_t instead of pfn/page + offset
should avoid quite a few calculations performance should not actually
be affected, although we'll have to be careful to actually verify that.

  parent reply	other threads:[~2019-06-24  7:28 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-20 16:12 [RFC PATCH 00/28] Removing struct page from P2PDMA Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 01/28] block: Introduce DMA direct request type Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 02/28] block: Add dma_vec structure Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 03/28] block: Warn on mis-use of dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 04/28] block: Never bounce " Logan Gunthorpe
2019-06-20 17:23   ` Jason Gunthorpe
2019-06-20 18:38     ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 05/28] block: Skip dma-direct bios in bio_integrity_prep() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 06/28] block: Support dma-direct bios in bio_advance_iter() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 07/28] block: Use dma_vec length in bio_cur_bytes() for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 08/28] block: Introduce dmavec_phys_mergeable() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 09/28] block: Introduce vec_gap_to_prev() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 10/28] block: Create generic vec_split_segs() from bvec_split_segs() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 11/28] block: Create blk_segment_split_ctx Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 12/28] block: Create helper for bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 13/28] block: Generalize bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 14/28] block: Support splitting dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 15/28] block: Support counting dma-direct bio segments Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 16/28] block: Implement mapping dma-direct requests to SGs in blk_rq_map_sg() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 17/28] block: Introduce queue flag to indicate support for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 18/28] block: Introduce bio_add_dma_addr() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 19/28] nvme-pci: Support dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 20/28] IB/core: Introduce API for initializing a RW ctx from a DMA address Logan Gunthorpe
2019-06-20 16:49   ` Jason Gunthorpe
2019-06-20 16:59     ` Logan Gunthorpe
2019-06-20 17:11       ` Jason Gunthorpe
2019-06-20 18:24         ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 21/28] nvmet: Split nvmet_bdev_execute_rw() into a helper function Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 22/28] nvmet: Use DMA addresses instead of struct pages for P2P Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 23/28] nvme-pci: Remove support for PCI_P2PDMA requests Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 24/28] block: Remove PCI_P2PDMA queue flag Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 25/28] IB/core: Remove P2PDMA mapping support in rdma_rw_ctx Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 26/28] PCI/P2PDMA: Remove SGL helpers Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 27/28] PCI/P2PDMA: Remove struct pages that back P2PDMA memory Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 28/28] memremap: Remove PCI P2PDMA page memory type Logan Gunthorpe
2019-06-20 18:45 ` [RFC PATCH 00/28] Removing struct page from P2PDMA Dan Williams
2019-06-20 19:33   ` Jason Gunthorpe
2019-06-20 20:18     ` Dan Williams
2019-06-20 20:51       ` Logan Gunthorpe
2019-06-21 17:47       ` Jason Gunthorpe
2019-06-21 17:54         ` Dan Williams
2019-06-24  7:31     ` Christoph Hellwig
2019-06-24 13:46       ` Jason Gunthorpe
2019-06-24 13:50         ` Christoph Hellwig
2019-06-24 13:55           ` Jason Gunthorpe
2019-06-24 16:53             ` Logan Gunthorpe
2019-06-24 18:16               ` Jason Gunthorpe
2019-06-24 18:28                 ` Logan Gunthorpe
2019-06-24 18:54                   ` Jason Gunthorpe
2019-06-24 19:37                     ` Logan Gunthorpe
2019-06-24 16:10         ` Logan Gunthorpe
2019-06-25  7:18           ` Christoph Hellwig
2019-06-20 19:34   ` Logan Gunthorpe
2019-06-20 23:40     ` Dan Williams
2019-06-20 23:42       ` Logan Gunthorpe
2019-06-24  7:27 ` Christoph Hellwig [this message]
2019-06-24 16:07   ` Logan Gunthorpe
2019-06-25  7:20     ` Christoph Hellwig
2019-06-25 15:57       ` Logan Gunthorpe
2019-06-25 17:01         ` Christoph Hellwig
2019-06-25 19:54           ` Logan Gunthorpe
2019-06-26  6:57             ` Christoph Hellwig
2019-06-26 18:31               ` Logan Gunthorpe
2019-06-26 20:21                 ` Jason Gunthorpe
2019-06-26 20:39                   ` Dan Williams
2019-06-26 20:54                     ` Jason Gunthorpe
2019-06-26 20:55                     ` Logan Gunthorpe
2019-06-26 20:45                   ` Logan Gunthorpe
2019-06-26 21:00                     ` Jason Gunthorpe
2019-06-26 21:18                       ` Logan Gunthorpe
2019-06-27  6:32                         ` Jason Gunthorpe
2019-06-27 16:09                           ` Logan Gunthorpe
2019-06-27 16:35                             ` Jason Gunthorpe
2019-06-27 16:49                               ` Logan Gunthorpe
2019-06-28  4:57                                 ` Jason Gunthorpe
2019-06-28 16:22                                   ` Logan Gunthorpe
2019-06-28 17:29                                     ` Jason Gunthorpe
2019-06-28 18:29                                       ` Logan Gunthorpe
2019-06-28 19:09                                         ` Jason Gunthorpe
2019-06-28 19:35                                           ` Logan Gunthorpe
2019-07-02 22:45                                             ` Jason Gunthorpe
2019-07-02 22:52                                               ` Logan Gunthorpe
2019-06-27  9:08                     ` Christoph Hellwig
2019-06-27 16:30                       ` Logan Gunthorpe
2019-06-27 17:00                         ` Christoph Hellwig
2019-06-27 18:00                           ` Logan Gunthorpe
2019-06-28 13:38                             ` Christoph Hellwig
2019-06-28 15:54                               ` Logan Gunthorpe
2019-06-27  9:01                 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190624072752.GA3954@lst.de \
    --to=hch@lst.de \
    --cc=axboe@kernel.dk \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).