linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <j.glisse@gmail.com>
To: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Cc: "'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'linux-rdma@vger.kernel.org'" <linux-rdma@vger.kernel.org>,
	"'linux-nvdimm@lists.01.org'" <linux-nvdimm@lists.01.org>,
	"'Linux-media@vger.kernel.org'" <Linux-media@vger.kernel.org>,
	"'dri-devel@lists.freedesktop.org'"
	<dri-devel@lists.freedesktop.org>,
	"'linux-pci@vger.kernel.org'" <linux-pci@vger.kernel.org>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Sagalovitch, Serguei" <Serguei.Sagalovitch@amd.com>,
	"Blinzer, Paul" <Paul.Blinzer@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>,
	"Sander, Ben" <ben.sander@amd.com>,
	hch@infradead.org, jgunthorpe@obsidianresearch.com,
	david1.zhou@amd.com, qiang.yu@amd.com
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Thu, 5 Jan 2017 13:39:29 -0500	[thread overview]
Message-ID: <20170105183927.GA5324@gmail.com> (raw)
In-Reply-To: <MWHPR12MB169484839282E2D56124FA02F7B50@MWHPR12MB1694.namprd12.prod.outlook.com>

Sorry to revive this thread but it fells through my filters and i
miss it. I have been going through it and i think the discussion
has been hinder by the fact that distinct problems were merge while
they should be address separately.

First for peer-to-peer we need to be clear on how this happens. Two
cases here :
  1) peer-to-peer because of userspace specific API like NVidia GPU
    direct (AMD is pushing its own similar API i just can't remember
    marketing name). This does not happen through a vma, this happens
    through specific device driver call going through device specific
    ioctl on both side (GPU and RDMA). So both kernel driver are aware
    of each others.
  2) peer-to-peer because RDMA/device is trying to access a regular
    vma (ie non special either private anonymous or share memory or
    mmap of a regular file not a device file).

For 1) there is no need to over complicate thing. Device driver must
have a back-channel between them and must be able to invalidate their
respective mapping (ie GPU must be able to ask RDMA device to kill/
stop its MR).

So remaining issue for 1) is how to enable effective peer-to-peer
mapping given that it might not work reliably on all platform. Here
Alex was listing existing proposal:
  A P2P DMA DMA-API/PCI map_peer_resource support for peer-to-peer
    http://www.spinics.net/lists/linux-pci/msg44560.html
  B ZONE_DEVICE IO irect I/O and DMA for persistent memory
    https://lwn.net/Articles/672457/
  C DMA-BUF RDMA subsystem DMA-BUF support
    http://www.spinics.net/lists/linux-rdma/msg38748.html
  D iopmem iopmem : A block device for PCIe memory
    https://lwn.net/Articles/703895/
  E HMM (not interesting for case 1)
  F Something new

Of the above D is ill suited for for GPU as we do not want to pin
GPU memory and D is design with long live object that do not move.
Also i do not think that exposing device PCIe bar through a new
/dev/somefilename is a good idea for GPU. So i think this should
be discarded.

HMM should be discard in respect of case 1 too. It is useful for
case 2. I don't think dma-buf is the right path either.

So we i think there is only A and B that make sense. Now for use
case 1 i think A is the best solution. No need to have struct page
and it require explicit knowlegde for device driver that it is
mapping another device memory which is a given in usecase 1.


If we look at case 2 the situation is bit more complex. Here RDMA
is just trying to access a regular VMA but it might happens that
some memory inside that VMA reside inside a device memory. When
that happens we would like to avoid to move that memory back to
system memory assuming that a peer mapping is doable.

Usecase 2 assume that the GPU is either on platform with CAPI or
CCTX (or something similar) in which case it is easy as device
memory will have struct page and is always accessible by CPU and
transparent from device to device access (AFAICT).

So we left with platform that do not have proper support for
device memory (ie CPU can not access it the same as DDR or as
limited access). Which apply to x86 for the foreseeable future.

This is the problem HMM address, allowing to transparently use
device memory inside a process even if direct CPU access are not
permited. I have plan to support peer-to-peer with HMM because
it is an important usecase. The idea is to have the device driver
fault against ZONE_DEVICE page and communicate through common API
to establish mapping. HMM will only handle keeping track of device
to device mapping and allowing to invalidate such mapping at any
time to allow memory to be migrated.

I do not intend to solve the IOMMU side of the problem or even
the PCI hierarchy issue where you can't peer-to-peer between device
accross some PCI bridge. I believe this is an orthogonal problem
and that it is best solve inside the DMA API ie with solution A.


I do not think we should try to solve all the problems with a
common solutions. They are too disparate from capabilities (what
the hardware can and can't do).

>From my point of view there is few take aways:
  - device should only access regular vma
  - device should never try to access vma that point to another
    device (mmap of any file in /dev)
  - peer to peer access through dedicated userspace API must
    involve dedicated API between kernel driver taking part into
    the peer to peer access
  - peer to peer of regular vma must involve a common API for
    drivers to interact so no driver can block the other


So i think the DMA-API proposal is the one to pursue and others
problem relating to handling GPU memory and how to use it is a
different kind of problem. One with either an hardware solution
(CAPI, CCTX, ...) or a software solution (HMM so far).

I don't think we should conflict the 2 problems into one. Anyway
i think this should be something worth discussing face to face
with interested party to flesh out a solution (can be at LSF/MM
or in another forum).

Cheers,
Jérôme

  parent reply	other threads:[~2017-01-05 18:39 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-21 20:36 Enabling peer to peer device transactions for PCIe devices Deucher, Alexander
2016-11-22 18:11 ` Dan Williams
     [not found]   ` <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>
2016-11-22 20:01     ` Dan Williams
2016-11-22 20:10       ` Daniel Vetter
2016-11-22 20:24         ` Dan Williams
2016-11-22 20:35         ` Serguei Sagalovitch
2016-11-22 21:03           ` Daniel Vetter
2016-11-22 21:21             ` Dan Williams
2016-11-22 22:21               ` Sagalovitch, Serguei
2016-11-23  7:49               ` Daniel Vetter
2016-11-23  8:51                 ` Christian König
2016-11-23 19:27                   ` Serguei Sagalovitch
2016-11-23 17:03                 ` Dave Hansen
2016-11-23 17:13     ` Logan Gunthorpe
2016-11-23 17:27       ` Bart Van Assche
2016-11-23 18:40         ` Dan Williams
2016-11-23 19:12           ` Jason Gunthorpe
2016-11-23 19:24             ` Serguei Sagalovitch
2016-11-23 19:06         ` Serguei Sagalovitch
2016-11-23 19:05       ` Jason Gunthorpe
2016-11-23 19:14         ` Serguei Sagalovitch
2016-11-23 19:32           ` Jason Gunthorpe
     [not found]             ` <c2c88376-5ba7-37d1-4d3e-592383ebb00a@amd.com>
2016-11-23 20:33               ` Jason Gunthorpe
2016-11-23 21:11                 ` Logan Gunthorpe
2016-11-23 21:55                   ` Jason Gunthorpe
2016-11-23 22:42                     ` Dan Williams
2016-11-23 23:25                       ` Jason Gunthorpe
2016-11-24  9:45                         ` Christian König
2016-11-24 16:26                           ` Jason Gunthorpe
2016-11-24 17:00                             ` Serguei Sagalovitch
2016-11-24 17:55                           ` Logan Gunthorpe
2016-11-25 13:06                             ` Christian König
2016-11-25 16:45                               ` Logan Gunthorpe
2016-11-25 17:20                                 ` Serguei Sagalovitch
2016-11-25 20:26                                   ` Felix Kuehling
2016-11-25 20:48                                     ` Serguei Sagalovitch
2016-11-24  0:40                     ` Sagalovitch, Serguei
2016-11-24 16:24                       ` Jason Gunthorpe
2016-11-24  1:25                     ` Logan Gunthorpe
2016-11-24 16:42                       ` Jason Gunthorpe
2016-11-24 18:11                         ` Logan Gunthorpe
2016-11-25  7:58                           ` Christoph Hellwig
2016-11-25 19:41                             ` Jason Gunthorpe
2016-11-25 17:59                           ` Serguei Sagalovitch
2016-11-25 13:22                         ` Christian König
2016-11-25 17:16                           ` Serguei Sagalovitch
2016-11-25 19:34                             ` Jason Gunthorpe
2016-11-25 19:49                               ` Serguei Sagalovitch
2016-11-25 20:19                                 ` Jason Gunthorpe
2016-11-25 23:41                               ` Alex Deucher
2016-11-25 19:32                           ` Jason Gunthorpe
2016-11-25 20:40                             ` Christian König
2016-11-25 20:51                               ` Felix Kuehling
2016-11-25 21:18                               ` Jason Gunthorpe
2016-11-27  8:16                             ` Haggai Eran
2016-11-27 14:02                             ` Haggai Eran
2016-11-27 14:07                               ` Christian König
2016-11-28  5:31                                 ` zhoucm1
2016-11-28 14:48                               ` Serguei Sagalovitch
2016-11-28 18:36                                 ` Haggai Eran
2016-11-28 16:57                               ` Jason Gunthorpe
2016-11-28 18:19                                 ` Haggai Eran
2016-11-28 19:02                                   ` Jason Gunthorpe
2016-11-30 10:45                                     ` Haggai Eran
2016-11-30 16:23                                       ` Jason Gunthorpe
2016-11-30 17:28                                         ` Serguei Sagalovitch
2016-12-04  7:33                                           ` Haggai Eran
2016-11-30 18:01                                         ` Logan Gunthorpe
2016-12-04  7:42                                           ` Haggai Eran
2016-12-04 13:06                                             ` Stephen Bates
2016-12-04 13:23                                             ` Stephen Bates
2016-12-05 17:18                                               ` Jason Gunthorpe
2016-12-05 17:40                                                 ` Dan Williams
2016-12-05 18:02                                                   ` Jason Gunthorpe
2016-12-05 18:08                                                     ` Dan Williams
2016-12-05 18:39                                                       ` Logan Gunthorpe
2016-12-05 18:48                                                         ` Dan Williams
2016-12-05 19:14                                                           ` Jason Gunthorpe
2016-12-05 19:27                                                             ` Logan Gunthorpe
2016-12-05 19:46                                                               ` Jason Gunthorpe
2016-12-05 19:59                                                                 ` Logan Gunthorpe
2016-12-05 20:06                                                                 ` Christoph Hellwig
2016-12-06  8:06                                                           ` Stephen Bates
2016-12-06 16:38                                                             ` Jason Gunthorpe
2016-12-06 16:51                                                               ` Logan Gunthorpe
2016-12-06 17:28                                                                 ` Jason Gunthorpe
2016-12-06 21:47                                                                   ` Logan Gunthorpe
2016-12-06 22:02                                                                     ` Dan Williams
2016-12-06 17:12                                                               ` Christoph Hellwig
2016-12-04  7:53                                         ` Haggai Eran
2016-11-30 17:10                                       ` Deucher, Alexander
2016-11-28 18:20                                 ` Logan Gunthorpe
2016-11-28 19:35                                   ` Serguei Sagalovitch
2016-11-28 21:36                                     ` Logan Gunthorpe
2016-11-28 21:55                                       ` Serguei Sagalovitch
2016-11-28 22:24                                         ` Jason Gunthorpe
2017-01-05 18:39 ` Jerome Glisse [this message]
2017-01-05 19:01   ` Jason Gunthorpe
2017-01-05 19:54     ` Jerome Glisse
2017-01-05 20:07       ` Jason Gunthorpe
2017-01-05 20:19         ` Jerome Glisse
2017-01-05 22:42           ` Jason Gunthorpe
2017-01-05 23:23             ` Jerome Glisse
2017-01-06  0:30               ` Jason Gunthorpe
2017-01-06  0:41                 ` Serguei Sagalovitch
2017-01-06  1:58                 ` Jerome Glisse
2017-01-06 16:56                   ` Serguei Sagalovitch
2017-01-06 17:37                     ` Jerome Glisse
2017-01-06 18:26                       ` Jason Gunthorpe
2017-01-06 19:12                         ` Deucher, Alexander
2017-01-06 22:10                         ` Logan Gunthorpe
2017-01-12  4:54                           ` Stephen Bates
2017-01-12 15:11                             ` Jerome Glisse
2017-01-12 17:17                               ` Jason Gunthorpe
2017-01-13 13:04                               ` Christian König
2017-01-12 22:35                             ` Logan Gunthorpe
2017-01-06 15:08     ` Henrique Almeida
2017-10-20 12:36 ` Ludwig Petrosyan
2017-10-20 15:48   ` Logan Gunthorpe
2017-10-22  6:13     ` Petrosyan, Ludwig
2017-10-22 17:19       ` Logan Gunthorpe
2017-10-23 16:08       ` David Laight
2017-10-23 22:04         ` Logan Gunthorpe
2017-10-24  5:58           ` Petrosyan, Ludwig
2017-10-24 14:58             ` David Laight
2017-10-26 13:28               ` Petrosyan, Ludwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170105183927.GA5324@gmail.com \
    --to=j.glisse@gmail.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=Linux-media@vger.kernel.org \
    --cc=Paul.Blinzer@amd.com \
    --cc=Serguei.Sagalovitch@amd.com \
    --cc=Suravee.Suthikulpanit@amd.com \
    --cc=ben.sander@amd.com \
    --cc=david1.zhou@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=jgunthorpe@obsidianresearch.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=qiang.yu@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).