From: Jerome Glisse <jglisse@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Logan Gunthorpe <logang@deltatee.com>,
Jason Gunthorpe <jgg@mellanox.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Christian Koenig <christian.koenig@amd.com>,
Felix Kuehling <Felix.Kuehling@amd.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
Joerg Roedel <jroedel@suse.de>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Date: Thu, 31 Jan 2019 10:37:38 -0500
Message-ID: <20190131153737.GD4619@redhat.com>
In-Reply-To: <20190131081355.GC26495@lst.de>

On Thu, Jan 31, 2019 at 09:13:55AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 03:52:13PM -0700, Logan Gunthorpe wrote:
> > > *shrug* so what if the special GUP called a VMA op instead of
> > > traversing the VMA PTEs today? Why does it really matter? It could
> > > easily change to a struct page flow tomorrow..
> >
> > Well it's so that it's composable. We want the SGL->DMA side to work for
> > APIs from kernel space and not have to run a completely different flow
> > for kernel drivers than from userspace memory.
>
> Yes, I think that is the important point.
>
> All the other struct page discussion is not about any one of us wanting
> struct page - heck it is a pain to deal with, but then again it is
> there for a reason.
>
> In the typical GUP flows we have three uses of a struct page:
We do not want GUP. Yes, some RDMA drivers and others use GUP, but they
should only use GUP on regular VMAs, not on special VMAs (i.e. mmap of
a device file). Allowing GUP on those is insane. It is better to
special-case the peer-to-peer mapping because it _is_ special: nothing
inside those VMAs is managed by core mm, and drivers can deal with them
in unusual ways (GPUs certainly do, for very good reasons without which
they would perform badly).
>
> (1) to carry a physical address. This is mostly through
> struct scatterlist and struct bio_vec. We could just store
> a magic PFN-like value that encodes the physical address
> and allow looking up a page if it exists, and we had at least
> two attempts at it. In some way I think that would actually
> make the interfaces cleaner, but Linus has NACKed it in the
> past, so we'll have to convince him first that this is the
> way forward
Spending 64 bytes of struct page just to carry an address is a waste
for everyone.
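
For context, the "magic PFN-like value" described above could be
sketched like this (all names here are hypothetical, not an existing
kernel interface; the real pfn_t in <linux/pfn_t.h> plays a similar
bit-stealing trick):

#include <linux/mm.h>
#include <linux/pfn.h>

/*
 * Sketch only: carry a physical address plus a "backed by struct
 * page" flag in a single word. Page-aligned addresses leave the low
 * bits free for the flag.
 */
typedef struct { unsigned long val; } phys_vec_t;

#define PHYS_VEC_HAS_PAGE       0x1UL

static inline phys_vec_t phys_vec_from_page(struct page *page)
{
        return (phys_vec_t){ PFN_PHYS(page_to_pfn(page)) |
                             PHYS_VEC_HAS_PAGE };
}

static inline phys_vec_t phys_vec_from_phys(phys_addr_t phys)
{
        return (phys_vec_t){ phys };    /* e.g. a page-aligned BAR address */
}

static inline struct page *phys_vec_page(phys_vec_t pv)
{
        /* PHYS_PFN() discards the flag bit along with the page offset. */
        return (pv.val & PHYS_VEC_HAS_PAGE) ?
                pfn_to_page(PHYS_PFN(pv.val)) : NULL;
}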
> (2) to keep a reference to the memory so that it doesn't go away
> under us due to swapping, process exit, unmapping, etc.
> No idea how we want to solve this, but I guess you have
> some smart ideas?
The DMA API has _never_ dealt with page refcounts, and it has always
been up to the user of the DMA API to ascertain that it is safe to
map/unmap the pages/resources they are handing to it. The lifetime
management of a page or resource provided to the DMA API should remain
the caller's problem, not something the DMA API cares one bit about.
> (3) to make the PTEs dirty after writing to them. Again, not sure
>     what our preferred interface here would be
Again, the DMA API has never dealt with that, nor should it. What does
a dirty PTE even mean for a special mapping (mmap of a device file)?
There is no single common definition; most drivers do not care about
it, and it simply gets ignored.
>
> If we solve all of the above problems I'd be more than happy to
> go with a non-struct page based interface for BAR P2P. But we'll
> have to solve these issues in a generic way first.
None of the above are problems the DMA API needs to solve. The DMA API
is about mapping some memory resource to a device. For regular main
memory it is easy on most architectures (anything with a sane IOMMU).
For I/O resources it is not as straightforward, because the behavior
was often left undefined in the architecture/platform documentation or
the interconnect standard. AFAIK, mapping a BAR from one PCIe device
to another through the IOMMU works well on recent Intel and AMD
platforms. We will probably need a whitelist, as I am not sure this is
something Intel or AMD guarantee, though I believe they want to start
guaranteeing it.

So having one DMA API for regular memory and one for I/O memory aka
resources (dma_map_resource()) sounds like the only sane approach
here. It is fundamentally different memory, and we should not muddy
the water by pushing both through a single common API. There is no
benefit to that beyond saving a couple hundred lines of code in some
drivers, and those couple hundred lines can be moved to a common
helper.
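
For what it is worth, the I/O side already has a usable shape. A
minimal sketch of an importing driver mapping a window of a peer's BAR
with the existing dma_map_resource()/dma_unmap_resource() (bar_phys
and bar_len stand in for whatever the exporting driver hands out):

#include <linux/dma-mapping.h>

/* Map a window of a peer device's BAR for DMA by "dev". */
static int peer_bar_map(struct device *dev, phys_addr_t bar_phys,
                        size_t bar_len, dma_addr_t *dma)
{
        *dma = dma_map_resource(dev, bar_phys, bar_len,
                                DMA_BIDIRECTIONAL, 0);
        return dma_mapping_error(dev, *dma) ? -ENOMEM : 0;
}

static void peer_bar_unmap(struct device *dev, dma_addr_t dma,
                           size_t bar_len)
{
        dma_unmap_resource(dev, dma, bar_len, DMA_BIDIRECTIONAL, 0);
}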
So to me it is a lot saner to provide a helper that deals with the
different VMA types on behalf of the device than to force struct page
down everyone's throat. Something like:

    vma_dma_map_range(vma, device, start, end, flags, pa[])
    vma_dma_unmap_range(vma, device, start, end, flags, pa[])

    VMA_DMA_MAP_FLAG_WRITE
    VMA_DMA_MAP_FLAG_PIN

This would use GUP on behalf of the calling device for regular VMAs,
or a special p2p code path for special VMAs (a rough sketch follows
below). A device that needs pinning sets the flag, and it is up to the
exporting device to accept it or not. Pinning when using GUP is the
obvious case.
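
To make the dispatch concrete, a rough sketch (purely illustrative:
the vm_ops->p2p_map hook and gup_and_dma_map() are made-up names, and
testing VM_IO|VM_PFNMAP is just one plausible way to detect a special
VMA):

long vma_dma_map_range(struct vm_area_struct *vma, struct device *device,
                       unsigned long start, unsigned long end,
                       unsigned long flags, dma_addr_t *pa)
{
        if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
                /*
                 * Special VMA: only the exporting driver knows how
                 * to map it for a peer, and it is free to refuse
                 * (for instance VMA_DMA_MAP_FLAG_PIN).
                 */
                if (!vma->vm_ops || !vma->vm_ops->p2p_map)
                        return -EINVAL;
                return vma->vm_ops->p2p_map(vma, device, start, end,
                                            flags, pa);
        }
        /*
         * Regular VMA: GUP the pages then DMA-map them for "device";
         * GUP's page reference provides the pinning.
         */
        return gup_and_dma_map(vma, device, start, end, flags, pa);
}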
When the VMA goes away, the importing device must update its device
page table to point at some dummy page, or otherwise do something
sane, because keeping things mapped past that point makes no sense
anymore: the device is no longer operating on a range of virtual
addresses that means anything.
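
Schematically the teardown contract could look like this (again, the
structures and ops are made up; the only requirement is that every
importer stops using the old addresses before the exporter's VMA is
gone):

/* Called by the exporting driver from its VMA close/invalidate path. */
static void p2p_invalidate_importers(struct p2p_export *exp)
{
        struct p2p_import *imp;

        list_for_each_entry(imp, &exp->importers, node) {
                /*
                 * Repoint the importing device's page table at a
                 * dummy page so in-flight accesses stop reaching
                 * the vanishing BAR.
                 */
                imp->ops->remap_to_dummy(imp);
                /* Then drop the now-stale DMA mappings. */
                imp->ops->unmap(imp);
        }
}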
So instead of pushing p2p handling into GUP just to avoid disrupting
existing driver workflows, it is better to provide a helper that
handles all the gory details for the device driver. It does not change
things for the driver, and it allows proper special-casing.
Cheers,
Jérôme