From: Christian König
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Fri, 13 Jan 2017 14:04:30 +0100
Message-ID: <0733efde-1a6e-5237-0868-e9fb31f237c7@amd.com>
In-Reply-To: <20170112151129.GA10942@redhat.com>
References: <20170105201935.GC2166@redhat.com>
 <20170105224215.GA3855@obsidianresearch.com>
 <20170105232352.GB6426@redhat.com>
 <20170106003034.GB4670@obsidianresearch.com>
 <20170106015831.GA2226@gmail.com>
 <20170106173722.GB3804@redhat.com>
 <20170106182625.GB5724@obsidianresearch.com>
 <2a148b6e-86bc-4c4d-2f22-d733e2cc94cc@deltatee.com>
 <20170112151129.GA10942@redhat.com>
To: Jerome Glisse, Stephen Bates
Cc: david1.zhou@amd.com, hch@lst.de, linux-rdma@vger.kernel.org,
 linux-nvdimm@lists.01.org, "Kuehling, Felix", Serguei Sagalovitch,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 "Blinzer, Paul", Jason Gunthorpe, qiang.yu@amd.com,
 "Suthikulpanit, Suravee", linux-pci@vger.kernel.org,
 "Deucher, Alexander", Jerome Glisse, "Sander, Ben",
 linux-media@vger.kernel.org
List-Id: linux-rdma@vger.kernel.org

On 12.01.2017 16:11, Jerome Glisse wrote:
> On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote:
>> On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote:
>>>
>>> On 06/01/17 11:26 AM, Jason Gunthorpe wrote:
>>>
>>>> Make a generic API for all of this and you'd have my vote..
>>>>
>>>> IMHO, you must support basic pinning semantics - that is necessary
>>>> to support generic short-lived DMA (eg filesystem, etc). That
>>>> hardware can clearly do that if it can support ODP.
>>>
>>> I agree completely.
>>>
>>> What we want is for RDMA, O_DIRECT, etc to just work with special
>>> VMAs (ie. at least those backed with ZONE_DEVICE memory). Then
>>> GPU/NVME/DAX/whatever drivers can just hand these VMAs to userspace
>>> (using whatever interface is most appropriate) and userspace can do
>>> what it pleases with them. This makes _so_ much sense and actually
>>> largely already works today (as demonstrated by iopmem).
>>
>> +1 for iopmem ;-)
>>
>> I feel like we are going around and around on this topic. I would like
>> to see something upstream that enables P2P, even if it is only the
>> minimum viable useful functionality to begin with. I think aiming for
>> the moon (which is what HMM and things like it are doing) is simply
>> going to take more time, if they ever get there.
>>
>> There is a use case for in-kernel P2P PCIe transfers between two NVMe
>> devices and between an NVMe device and an RDMA NIC (using NVMe CMBs or
>> BARs on the NIC). I am even seeing users who now want to move data P2P
>> between FPGAs and NVMe SSDs, and the upstream kernel should be able to
>> support these users or they will look elsewhere.
>>
>> The iopmem patchset addressed all the use cases above, and while it is
>> not an in-kernel API it could have been modified to be one reasonably
>> easily. As Logan states, the driver can then choose to pass the VMAs
>> to user-space in a manner that makes sense.
>>
>> Earlier in the thread someone mentioned LSF/MM. There is already a
>> proposal to discuss this topic, so if you are interested please
>> respond to the email letting the committee know this topic is of
>> interest to you [1].
>>
>> Also earlier in the thread someone discussed the issues around the
>> IOMMU. Given the known issues around P2P transfers in certain CPU root
>> complexes [2], it might just be a case of only allowing P2P when a
>> PCIe switch connects the two EPs. Another option is just to use
>> CONFIG_EXPERT and make sure people are aware of the pitfalls if they
>> invoke the P2P option.
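To make the switch-only policy Stephen mentions concrete, here is a
minimal sketch of such a check, assuming both devices are PCIe and using
the existing pci_upstream_bridge()/pci_pcie_type() helpers; p2p_allowed()
is a hypothetical name, not an existing kernel function:

    #include <linux/pci.h>

    /*
     * Sketch: allow P2P only if the path between the two endpoints
     * stays inside a PCIe switch, i.e. their lowest common ancestor
     * is a switch port rather than a root port (which would route
     * TLPs through the root complex, where the known problems live).
     */
    static bool p2p_allowed(struct pci_dev *a, struct pci_dev *b)
    {
            struct pci_dev *up_a, *up_b;

            /* Walk up from 'a'; the first ancestor that is also an
             * ancestor of 'b' is the lowest common ancestor. */
            for (up_a = pci_upstream_bridge(a); up_a;
                 up_a = pci_upstream_bridge(up_a)) {
                    for (up_b = pci_upstream_bridge(b); up_b;
                         up_b = pci_upstream_bridge(up_b)) {
                            if (up_a != up_b)
                                    continue;
                            return pci_pcie_type(up_a) ==
                                           PCI_EXP_TYPE_UPSTREAM ||
                                   pci_pcie_type(up_a) ==
                                           PCI_EXP_TYPE_DOWNSTREAM;
                    }
            }
            /* No common ancestor below the root complex. */
            return false;
    }

If the lowest common ancestor is a root port, or the two endpoints hang
off different root ports, the transfer would have to cross the root
complex and is conservatively refused.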
> iopmem is not applicable to GPUs. What I propose is to split the issue
> in two, so that everyone can reuse the part that needs to be common,
> namely the DMA API part where you have to create an IOMMU mapping for
> one device to point to the other device's memory.
>
> We can have a DMA API that is agnostic to how the device memory is
> managed (so it does not matter whether the device memory has struct
> pages or not). This is what I have been arguing in this thread. To make
> progress on this issue we need to stop conflating different use cases.
>
> So I say let's solve the IOMMU issue first and let everyone use it in
> their own way with their device. I do not think we can share much more
> than that.

Yeah, exactly what I said from the very beginning as well. Just hacking
together quick solutions doesn't really solve the problem in the long
term.

What we need is to properly adjust the DMA API towards handling P2P and
then build solutions for the different use cases on top of that.

We should also avoid falling into the trap of trying to just handle the
existing get_user_pages and co. interfaces so that the existing code
doesn't need to change. P2P needs to be validated for each use case
individually, not implemented in workarounds with fingers crossed,
hoping for the best.

Regards,
Christian.

> Cheers,
> Jérôme
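As an illustration of the DMA-API direction Jérôme and Christian
describe above: a mapping of one device's BAR into another device's DMA
address space could be built on the dma_map_resource() interface that
was merged in 4.9, which takes a raw physical address and so does not
care whether the memory behind it has struct pages. This is only a
sketch under those assumptions; p2p_map_bar() is a hypothetical helper,
not an existing API:

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    /*
     * Map BAR 'bar' of 'provider' so that 'client' can reach it with
     * DMA. This creates the IOMMU mapping for one device to point at
     * the other device's memory; the caller must check the result
     * with dma_mapping_error() and undo it with dma_unmap_resource().
     */
    static dma_addr_t p2p_map_bar(struct pci_dev *client,
                                  struct pci_dev *provider, int bar)
    {
            phys_addr_t phys = pci_resource_start(provider, bar);
            size_t size = pci_resource_len(provider, bar);

            return dma_map_resource(&client->dev, phys, size,
                                    DMA_BIDIRECTIONAL, 0);
    }

Where such a helper should live, and how it refuses or reroutes
transfers on root complexes that cannot handle them, is exactly the
common part that would have to be designed once and reused by GPU, NVMe
and RDMA alike.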