From: Jerome Glisse
Date: Thu, 5 Jan 2017 15:19:36 -0500
To: Jason Gunthorpe
Cc: Jerome Glisse, "Deucher, Alexander", 'linux-kernel@vger.kernel.org',
    'linux-rdma@vger.kernel.org', 'linux-nvdimm@lists.01.org',
    'Linux-media@vger.kernel.org', 'dri-devel@lists.freedesktop.org',
    'linux-pci@vger.kernel.org', "Kuehling, Felix", "Sagalovitch, Serguei",
    "Blinzer, Paul", "Koenig, Christian", "Suthikulpanit, Suravee",
    "Sander, Ben", hch@infradead.org, david1.zhou@amd.com, qiang.yu@amd.com
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Message-ID: <20170105201935.GC2166@redhat.com>
References: <20170105183927.GA5324@gmail.com>
    <20170105190113.GA12587@obsidianresearch.com>
    <20170105195424.GB2166@redhat.com>
    <20170105200719.GB31047@obsidianresearch.com>
In-Reply-To: <20170105200719.GB31047@obsidianresearch.com>

On Thu, Jan 05, 2017 at 01:07:19PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2017 at 02:54:24PM -0500, Jerome Glisse wrote:
>
> > Mellanox and NVidia support peer to peer with what they market as
> > GPUDirect. It only works without an IOMMU. It is probably not upstream:
> >
> > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg21402.html
> >
> > I thought it was, but it seems it requires an out of tree driver to work.
>
> Right, it is out of tree and not under consideration for mainline.
>
> > Whether there is a VMA or not isn't important to the issue anyway.
> > If you want to enforce a VMA rule for RDMA, that is an RDMA-specific
> > discussion in which I don't want to be involved; it is not my turf :)
>
> Always having a VMA changes the discussion - the question is how to
> create a VMA that represents IO device memory, and how do DMA
> consumers extract the correct information from that VMA to pass to the
> kernel DMA API so it can set up peer-peer DMA.

Well, my point is that it can't be done that way. In the HMM case, a
single VMA can have one page in GPU memory at address A and the next
page in regular memory at A+4k, so handling this at the VMA level does
not make sense. Instead, you would get the device from the struct page
and query a common API to determine whether you can do peer to peer.
If not, that would trigger migration back to regular memory. If yes,
you still have to solve the IOMMU issue, hence the DMA API changes that
were proposed.

In the GPUDirect case, the idea is that you have a specific device VMA
that you map for peer to peer. Here things can be handled at the VMA
level rather than at the page level. The expectation is that the GPU
userspace exposes a special API that lets RDMA happen directly on GPU
objects allocated through a GPU-specific API (i.e. memory that is not
regular memory and is not accessible by the CPU).

The two cases are disjoint, but both need to solve the IOMMU issue,
which seems best solved at the DMA API level.

> > What matters is the back channel API between peer-to-peer devices.
> > As the above patchset points out, for GPUs we need to be able to
> > invalidate a mapping at any point in time. Pinning is not something
> > we want to live with.
>
> We have MMU notifiers to handle this today in RDMA. Async RDMA MR
> Invalidate like you see in the above out of tree patches is totally
> crazy and shouldn't be in mainline. Use ODP capable RDMA hardware.

Well, there is still a large base of hardware that does not have such
a feature, and some people would like to keep using it.
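To make the per-page decision in the HMM case concrete, here is a
minimal userspace sketch in C. It only models the decision logic, under
loudly stated assumptions: `struct sim_page`, `sim_p2p_possible()`,
`decide()`, `decide_range()` and the device topology they encode are
hypothetical stand-ins for struct page and for the common peer-to-peer
query API, which does not exist yet; the IOMMU/DMA API mapping step is
not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a page lives either in regular (system) memory or
 * in the memory of a device identified by a small integer id. */
#define SYSTEM_RAM (-1)

struct sim_page {
	int owner; /* SYSTEM_RAM, or the id of the device backing the page */
};

enum p2p_action {
	ACTION_MAP_SYSTEM, /* regular memory: ordinary DMA mapping */
	ACTION_MAP_PEER,   /* reachable device memory: peer-to-peer mapping,
			    * which still has to go through an IOMMU-aware
			    * DMA API (not modeled here) */
	ACTION_MIGRATE,    /* unreachable device memory: migrate the page
			    * back to regular memory before mapping it */
};

/* Hypothetical stand-in for the common query API: can device `importer`
 * reach the memory of device `owner` peer to peer? Assumption for this
 * sketch: devices 0 and 1 can reach each other, device 2 reaches nobody
 * (e.g. it sits behind a bridge that blocks peer-to-peer traffic). */
static bool sim_p2p_possible(int owner, int importer)
{
	return owner != 2 && importer != 2;
}

/* Decide what to do with one page on behalf of device `importer`. */
static enum p2p_action decide(const struct sim_page *pg, int importer)
{
	assert(pg != NULL);
	if (pg->owner == SYSTEM_RAM)
		return ACTION_MAP_SYSTEM;
	if (sim_p2p_possible(pg->owner, importer))
		return ACTION_MAP_PEER;
	return ACTION_MIGRATE;
}

/* Walk a range page by page, because a single VMA may mix device pages
 * and system pages. Fills out[i] for every page and returns how many
 * pages would have to be migrated back to regular memory. */
static size_t decide_range(const struct sim_page *pages, size_t n,
			   int importer, enum p2p_action *out)
{
	size_t migrations = 0;

	for (size_t i = 0; i < n; i++) {
		out[i] = decide(&pages[i], importer);
		if (out[i] == ACTION_MIGRATE)
			migrations++;
	}
	return migrations;
}
```

For example, for a three-page VMA whose pages at A, A+4k and A+8k live
in device 0 memory, system memory and device 2 memory, calling
decide_range() on behalf of an importing device with id 1 yields
map-peer, map-system and migrate, and reports one page to migrate. The
point of the sketch is only that the decision is taken per page, not
per VMA.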
I believe allowing direct access to GPU objects that are otherwise
hidden from regular kernel memory management is still meaningful.

Cheers,
Jérôme
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm