From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=3.0
	tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id CF189C169C4
	for ; Tue, 29 Jan 2019 20:44:07 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id A479E20989
	for ; Tue, 29 Jan 2019 20:44:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1727075AbfA2UoG (ORCPT ); Tue, 29 Jan 2019 15:44:06 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46970 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726945AbfA2UoG (ORCPT ); Tue, 29 Jan 2019 15:44:06 -0500
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 0BCDF80503;
	Tue, 29 Jan 2019 20:44:05 +0000 (UTC)
Received: from redhat.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 598275F7C2;
	Tue, 29 Jan 2019 20:44:02 +0000 (UTC)
Date: Tue, 29 Jan 2019 15:44:00 -0500
From: Jerome Glisse
To: Jason Gunthorpe
Cc: Logan Gunthorpe, "linux-mm@kvack.org",
	"linux-kernel@vger.kernel.org", Greg Kroah-Hartman,
	"Rafael J . Wysocki", Bjorn Helgaas, Christian Koenig,
	Felix Kuehling, "linux-pci@vger.kernel.org",
	"dri-devel@lists.freedesktop.org", Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, Joerg Roedel,
	"iommu@lists.linux-foundation.org"
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Message-ID: <20190129204359.GM3176@redhat.com>
References: <20190129174728.6430-1-jglisse@redhat.com>
	<20190129174728.6430-4-jglisse@redhat.com>
	<20190129191120.GE3176@redhat.com>
	<20190129193250.GK10108@mellanox.com>
	<20190129195055.GH3176@redhat.com>
	<20190129202429.GL10108@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20190129202429.GL10108@mellanox.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.27]); Tue, 29 Jan 2019 20:44:06 +0000 (UTC)
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

On Tue, Jan 29, 2019 at 08:24:36PM +0000, Jason Gunthorpe wrote:
> On Tue, Jan 29, 2019 at 02:50:55PM -0500, Jerome Glisse wrote:
>
> > GPU drivers do want more control :) GPU drivers are moving things around
> > all the time and they have more memory than BAR space (on newer platforms
> > AMD GPUs do resize the BAR but it is not the rule for all GPUs). So
> > GPU drivers do actually manage their BAR address space and they map and
> > unmap things there. They cannot allow someone to just pin stuff there
> > randomly or this would disrupt their regular workflow. Hence they need
> > control and they might implement a threshold: for instance, if they have
> > more than N pages of BAR space mapped for peer to peer then they can
> > decide to fall back to main memory for any new peer mapping.
>
> But this API doesn't seem to offer any control - I thought that
> control was all coming from the mm/hmm notifiers triggering p2p_unmaps?

The control is within the driver implementation of those callbacks. So
a driver implementation can refuse to map by returning an error from
p2p_map, or it can decide to use main memory by migrating its object to
main memory and populating the DMA address array with dma_map_page() of
the main memory pages. Drivers such as GPU drivers can have policy on top
of that: for instance, they will only allow p2p_map to succeed for objects
that have been tagged by userspace in some way, i.e. the userspace
application is in control of what can be mapped to a peer device. This is
needed for GPU drivers as we do want userspace involvement in which
objects are allowed to have p2p access, and also so that we can report to
userspace when we are running out of BAR addresses for this to work as
intended (i.e. without falling back to main memory), so that the
application can take appropriate action (like deciding what to
prioritize).

For moving things around after a successful p2p_map: yes, the exporting
device has to call, for instance, zap_vma_ptes() or something similar.
This will trigger the notifier call and the importing device will
invalidate its mapping. Once it is invalidated, the exporting device can
point a new call of p2p_map (for the same range) to new memory (obviously
the exporting device has to synchronize any concurrent call to p2p_map
with the invalidation).

>
> I would think that the importing driver can assume the BAR page is
> kept alive until it calls unmap (presumably triggered by notifiers)?
>
> ie the exporting driver sees the BAR page as pinned until unmap.

The intention with this patchset is that it is not pinned, i.e. the
importing device _must_ abide by all mmu notifier invalidations, and they
can happen at any time. The importing device can however call p2p_map on
the same range again after an invalidation.
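
To make the map / invalidate / re-map flow above a bit more concrete, here
is a rough userspace model of it. This is only a sketch and not code from
the patchset: every name in it (struct exporter, exporter_p2p_map(),
P2P_BAR_LIMIT, move_and_remap(), ...) is made up for illustration, the real
callbacks have different signatures, and the locking needed to synchronize
p2p_map against an invalidation is left out entirely.

```c
/*
 * Userspace model only, NOT kernel code. All names are hypothetical and
 * exist purely to illustrate the policy and the invalidate/re-map flow.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define P2P_BAR_LIMIT 4	/* assumed policy: max BAR pages mapped for peers */

enum backing { BACKING_NONE, BACKING_BAR, BACKING_MAIN_MEMORY };

struct object {
	bool p2p_allowed;	/* tagged by userspace as eligible for p2p */
	size_t npages;
};

struct exporter {
	size_t bar_pages_mapped;	/* BAR pages currently mapped for peers */
	bool allow_fallback;		/* fall back to main memory, or fail */
};

struct peer_mapping {
	enum backing backing;
	bool valid;
};

/* Exporter policy: refuse, use BAR space, or fall back to main memory. */
int exporter_p2p_map(struct exporter *exp, const struct object *obj,
		     struct peer_mapping *map)
{
	if (!obj->p2p_allowed)
		return -EPERM;	/* userspace never tagged this object */

	if (exp->bar_pages_mapped + obj->npages > P2P_BAR_LIMIT) {
		if (!exp->allow_fallback)
			return -ENOSPC;	/* let the caller see BAR exhaustion */
		map->backing = BACKING_MAIN_MEMORY;
	} else {
		exp->bar_pages_mapped += obj->npages;
		map->backing = BACKING_BAR;
	}
	map->valid = true;
	return 0;
}

/* Release a peer mapping and give back any BAR pages it was using. */
void exporter_p2p_unmap(struct exporter *exp, const struct object *obj,
			struct peer_mapping *map)
{
	if (map->valid && map->backing == BACKING_BAR)
		exp->bar_pages_mapped -= obj->npages;
	map->backing = BACKING_NONE;
	map->valid = false;
}

/* Importer notifier callback: it must drop the mapping whenever asked. */
void importer_invalidate(struct exporter *exp, const struct object *obj,
			 struct peer_mapping *map)
{
	exporter_p2p_unmap(exp, obj, map);
}

/*
 * Exporter moves an object: the invalidation (a stand-in for
 * zap_vma_ptes() plus the notifier call) comes first, and only then does
 * the importer map the same range again, possibly to different memory.
 */
int move_and_remap(struct exporter *exp, const struct object *obj,
		   struct peer_mapping *map)
{
	importer_invalidate(exp, obj, map);
	assert(!map->valid);	/* nothing is pinned across the move */
	return exporter_p2p_map(exp, obj, map);
}
```

The point of the model is that a mapping is never pinned: the exporter can
always reclaim BAR space through an invalidation, and whether the next
p2p_map lands in BAR space, in main memory, or fails is purely exporter
policy.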

I would like to restrict this to importers that can invalidate for now,
because I believe all the first devices to use this can support
invalidation.

Also, when using HMM private device memory we _can not_ pin a virtual
address to device memory, as otherwise CPU access would have to SIGBUS
or SEGFAULT and we do not want that. So this was also a motivation to
keep things consistent for the importer in both cases.

Cheers,
Jérôme