Date: Thu, 5 Jan 2017 14:54:24 -0500
From: Jerome Glisse
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Message-ID: <20170105195424.GB2166@redhat.com>
References: <20170105183927.GA5324@gmail.com> <20170105190113.GA12587@obsidianresearch.com>
In-Reply-To: <20170105190113.GA12587@obsidianresearch.com>
To: Jason Gunthorpe
Cc: Jerome Glisse, "Deucher, Alexander", linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, Linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-pci@vger.kernel.org, "Kuehling, Felix", "Sagalovitch, Serguei", "Blinzer, Paul", "Koenig, Christian", "Suthikulpanit, Suravee", "Sander, Ben", hch@infradead.org, david1.zhou@amd.com, qiang.yu@amd.com

On Thu, Jan 05, 2017 at 12:01:13PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2017 at 01:39:29PM -0500, Jerome Glisse wrote:
>
> >   1) peer-to-peer because of a userspace-specific API like NVidia
> >     GPUDirect (AMD is pushing its own similar API; I just can't
> >     remember the marketing name). This does not happen through a VMA;
> >     it happens through a specific device driver call going through a
> >     device-specific ioctl on both sides (GPU and RDMA). So both kernel
> >     drivers are aware of each other.
>
> Today you can only do user-initiated RDMA operations in conjunction
> with a VMA.
>
> We'd need a really big and strong reason to create an entirely new
> non-VMA based memory handle scheme for RDMA.
>
> So my inclination is to just completely push back on this idea. You
> need a VMA to do RDMA.
>
> GPUs need to create VMAs for the memory they want to RDMA from, even
> if the VMA handle just causes SIGBUS for any CPU access.

Mellanox and NVidia support peer-to-peer with what they market as
GPUDirect. It only works without an IOMMU. It is probably not upstream:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg21402.html

I thought it was, but it seems it requires an out-of-tree driver to work.

Whether there is a VMA or not isn't important to the issue anyway. If
you want to enforce a VMA rule for RDMA, that is an RDMA-specific
discussion in which I don't want to be involved; it is not my turf :)

What matters is the back-channel API between peer-to-peer devices. As
the above patchset points out, for GPUs we need to be able to invalidate
a mapping at any point in time. Pinning is not something we want to
live with.

So the VMA consideration does not change what I was saying; there are
2 cases:
  1) device VMA (might be restricted to a specific userspace API)
  2) regular VMA (!VM_MIXED and no special PTE)

For 1) you need a back channel. It can be per device driver, or we can
agree on some common API that can be added to vm_operations_struct.

For 2) the expectation is that you will have a valid struct page, but
you still need special handling at the DMA API level.

In 1) the peer-to-peer mapping is tracked at the VMA level and mediated
there. In 2) it is tracked per page and mediated at that level.

In both cases, once you have set up the mapping, you need to handle the
IOMMU and PCI bridge restrictions that might apply, and I believe the
DMA API is the place where we want to solve that second side of the
problem.
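
Case 1 hinges on a back channel that lets the exporting device revoke a
peer mapping whenever it needs to, instead of pinning the memory. What
follows is a minimal userspace model of that contract, not kernel code:
struct p2p_importer_ops, p2p_map() and p2p_invalidate_range() are
hypothetical names made up for illustration, and a fixed-size array
stands in for whatever tracking structure a real driver would use. The
importer (say, an RDMA NIC) registers a range together with an
invalidate callback; the exporter (say, a GPU) later revokes every
overlapping mapping at a time of its choosing and notifies the importer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical back channel between an exporting device (e.g. a GPU that
 * owns the memory) and an importing peer (e.g. an RDMA NIC). These names
 * are illustrative only; they are not existing kernel APIs. */

struct p2p_mapping;

struct p2p_importer_ops {
	/* Called by the exporter when the importer must stop using the
	 * range (e.g. because the GPU evicts or migrates the memory). */
	void (*invalidate)(struct p2p_mapping *map, void *priv);
};

struct p2p_mapping {
	unsigned long start, end;	/* [start, end) in the exporter's space */
	bool valid;
	const struct p2p_importer_ops *ops;
	void *priv;			/* importer-private cookie */
};

#define P2P_MAX_MAPPINGS 8

struct p2p_exporter {
	struct p2p_mapping maps[P2P_MAX_MAPPINGS];
	size_t nr;
};

/* The importer registers a mapping together with its invalidate callback.
 * The exporter only records whom to call back; nothing is pinned. */
static struct p2p_mapping *p2p_map(struct p2p_exporter *exp,
				   unsigned long start, unsigned long end,
				   const struct p2p_importer_ops *ops,
				   void *priv)
{
	struct p2p_mapping *m;

	assert(ops != NULL && ops->invalidate != NULL && start < end);
	if (exp->nr == P2P_MAX_MAPPINGS)
		return NULL;
	m = &exp->maps[exp->nr++];
	m->start = start;
	m->end = end;
	m->valid = true;
	m->ops = ops;
	m->priv = priv;
	return m;
}

/* The exporter revokes every live mapping overlapping [start, end) and
 * returns how many mappings it revoked. */
static size_t p2p_invalidate_range(struct p2p_exporter *exp,
				   unsigned long start, unsigned long end)
{
	size_t revoked = 0;

	for (size_t i = 0; i < exp->nr; i++) {
		struct p2p_mapping *m = &exp->maps[i];

		if (m->valid && m->start < end && start < m->end) {
			m->valid = false;
			m->ops->invalidate(m, m->priv);
			revoked++;
		}
	}
	return revoked;
}

/* Example importer: counts how many times it was told to stop DMA. */
static void nic_invalidate(struct p2p_mapping *map, void *priv)
{
	(void)map;
	++*(int *)priv;
}

static const struct p2p_importer_ops nic_ops = {
	.invalidate = nic_invalidate,
};
```

In this sketch the GPU driver would call p2p_invalidate_range() from its
eviction or migration path, and the NIC is expected to quiesce DMA in
its callback and ask for a fresh mapping with p2p_map() later. Tracking
by range is the case 1 flavour; for case 2 the same bookkeeping would be
keyed per page instead.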
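
The IOMMU and PCI bridge restrictions from the last paragraph amount to
a routing decision that would sit behind the DMA API. Below is a toy
model of such a check, under stated assumptions rather than a
description of any existing kernel code: each hypothetical
struct toy_pci_dev lists the IDs of its upstream bridges from the root
complex down, and the policy in the hypothetical p2p_pick_route() is
assumed. Peer traffic stays below a shared bridge only when no IOMMU is
in the way (echoing the GPUDirect restriction mentioned above);
otherwise it has to cross the root complex, which is allowed only if
the platform is known to forward peer-to-peer transactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy topology: each device lists the IDs of the bridges between the
 * root complex and itself, nearest-to-root first. Illustrative only; a
 * real implementation would walk the parents of struct pci_dev. */
struct toy_pci_dev {
	const int *bridges;
	size_t nr_bridges;
};

enum p2p_route {
	P2P_ROUTE_SWITCH,	/* stays below a common upstream bridge */
	P2P_ROUTE_ROOT_COMPLEX,	/* must cross the root complex */
	P2P_ROUTE_NONE,		/* the assumed policy refuses the pair */
};

/* Number of upstream bridges the two devices have in common. */
static size_t shared_bridges(const struct toy_pci_dev *a,
			     const struct toy_pci_dev *b)
{
	size_t n = 0;

	assert(a != NULL && b != NULL);
	while (n < a->nr_bridges && n < b->nr_bridges &&
	       a->bridges[n] == b->bridges[n])
		n++;
	return n;
}

/* Assumed policy: with an IOMMU enabled, send peer traffic through the
 * root complex so it can be translated; use the root complex only when
 * the platform is known to forward peer-to-peer transactions. */
static enum p2p_route p2p_pick_route(const struct toy_pci_dev *a,
				     const struct toy_pci_dev *b,
				     bool iommu_enabled,
				     bool rc_forwards_p2p)
{
	if (!iommu_enabled && shared_bridges(a, b) > 0)
		return P2P_ROUTE_SWITCH;
	return rc_forwards_p2p ? P2P_ROUTE_ROOT_COMPLEX : P2P_ROUTE_NONE;
}
```

For example, a GPU and a NIC behind the same switch share that switch's
bridge IDs, so with the IOMMU off the model picks P2P_ROUTE_SWITCH; with
the IOMMU on, the same pair falls back to the root-complex rule.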

Cheers,
Jérôme
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm