From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jerome Glisse
Subject: Re: [PATCH v4 0/9] mmu notifier provide context informations
Date: Wed, 23 Jan 2019 18:04:47 -0500
Message-ID: <20190123230447.GC1257@redhat.com>
References: <20190123222315.1122-1-jglisse@redhat.com>
To: Dan Williams
Cc: Ralph Campbell, Jan Kara, Arnd Bergmann, KVM list, Matthew Wilcox,
 linux-rdma, John Hubbard, Felix Kuehling, Radim Krčmář,
 Linux Kernel Mailing List, Maling list - DRI developers, Michal Hocko,
 Linux MM, Jason Gunthorpe, Ross Zwisler, linux-fsdevel, Paolo Bonzini,
 Andrew Morton, Christian König
List-Id: linux-rdma@vger.kernel.org

On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
> >
> > From: Jérôme Glisse <jglisse@redhat.com>
> >
> > Hi Andrew, I see that you still have my event patch in your queue [1].
> > This patchset replaces that single patch and is broken down into
> > further steps so that it is easier to review and ascertain that no
> > mistakes were made during mechanical changes. Here are the steps:
> >
> >     Patch 1 - add the enum values
> >     Patch 2 - coccinelle semantic patch to convert all call sites of
> >               mmu_notifier_range_init to the default enum value and
> >               also to pass down the vma when it is available
> >     Patch 3 - update many call sites to more accurate enum values
> >     Patch 4 - add the information to the mmu_notifier_range struct
> >     Patch 5 - helper to test if a range is updated to read only
> >
> > All the remaining patches are updates to various drivers to
> > demonstrate how this new information gets used by device drivers.
> > I build tested with make all and make all minus everything that
> > enables mmu notifier, i.e. building with MMU_NOTIFIER=no. Also tested
> > with some radeon, amd gpu and intel gpu.
> >
> > If there are no objections I believe the best plan would be to merge
> > the first 5 patches (all mm changes) through your queue for 5.1 and
> > then to delay driver updates to each individual driver tree for 5.2.
> > This will allow each individual device driver maintainer time to
> > test this more thoroughly than my own testing.
> >
> > Note that I also intend to use this feature further in nouveau and
> > HMM down the road. I also expect that other users like KVM might be
> > interested in leveraging this new information to optimize some of
> > their secondary page table invalidation.
>
> "Down the road" users should introduce the functionality they want to
> consume. The common concern with preemptively including
> forward-looking infrastructure is realizing later that the
> infrastructure is not needed, or needs changing. If it has no current
> consumer, leave it out.

This patchset already shows that this is useful, what more can I do?
I know I will use this information: in nouveau, for memory policy, we
allocate our own structure for every vma the GPU ever accessed or that
userspace hinted we should set a policy for. Right now, with the
existing mmu notifier, I _must_ free those structures because I do not
know if the invalidation is an munmap or something else. So I am losing
important information and unnecessarily freeing structs that I will
have to re-allocate just a couple of jiffies later. That's one way I am
using this. The other way is to optimize GPU page table updates, just
like I am doing with all the patches to RDMA/ODP and various GPU
drivers.


> > Here is an explanation of the rationale for this patchset:
> >
> >
> > CPU page table updates can happen for many reasons, not only as a
> > result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...)
> > but also as a result of kernel activities (memory compression,
> > reclaim, migration, ...).
> >
> > This patch introduces a set of enums that can be associated with
> > each of the events triggering an mmu notifier. Later patches take
> > advantage of those enum values.
> >
> > - UNMAP: munmap() or mremap()
> > - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> > - PROTECTION_VMA: change in access protections for the range
> > - PROTECTION_PAGE: change in access protections for page in the range
> > - SOFT_DIRTY: soft dirtiness tracking
> >
> > Being able to distinguish munmap() and mremap() from other reasons
> > why the page table is cleared is important to allow users of mmu
> > notifier to update their own internal tracking structures
> > accordingly (on munmap or mremap it is no longer necessary to track
> > the range of virtual addresses as it becomes invalid).
>
> The only context information consumed in this patch set is
> MMU_NOTIFY_PROTECTION_VMA.
>
> What is the practical benefit of these "optimize out the case when a
> range is updated to read only" optimizations? Any numbers to show this
> is worth the code thrash?

It depends on the workload. For instance, if you map a file to RDMA
read only, like a log file for export, all write back that would
disrupt the RDMA mapping can be optimized out.

See above for more reasons why it is beneficial (knowing when it is
an munmap/mremap versus something else).

I would not have thought that passing down information was something
that controversial. I hope this helps you see the benefit of this.

Cheers,
Jérôme
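
To make the two uses discussed in this thread concrete, here is a
minimal sketch of how a driver-side callback might act on the event.
It is plain userspace C, not kernel code. Only the name
MMU_NOTIFY_PROTECTION_VMA appears in the thread; the other MMU_NOTIFY_*
spellings, every sketch_* name, the struct layout and the vma_readable
field are assumptions made for illustration, not the patchset's API.

```c
#include <stdbool.h>

/* Event kinds listed in the cover letter. The MMU_NOTIFY_ prefix on
 * everything except PROTECTION_VMA is an assumption. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,       /* munmap() or mremap() */
	MMU_NOTIFY_CLEAR,           /* migration, compaction, reclaim, ... */
	MMU_NOTIFY_PROTECTION_VMA,  /* protection change for the range */
	MMU_NOTIFY_PROTECTION_PAGE, /* protection change for a page */
	MMU_NOTIFY_SOFT_DIRTY,      /* soft dirtiness tracking */
};

/* Hypothetical stand-in for the range handed to a notifier callback.
 * vma_readable models "the vma was passed down and is still readable". */
struct sketch_range {
	unsigned long start;
	unsigned long end;
	enum mmu_notifier_event event;
	bool vma_readable;
};

/* What the hypothetical driver decides to do with its own state. */
enum sketch_action {
	SKETCH_SKIP,                /* device mapping stays valid */
	SKETCH_INVALIDATE,          /* drop device mapping, keep tracking */
	SKETCH_INVALIDATE_AND_FREE, /* range is gone, free tracking too */
};

/* In the spirit of patch 5: true when the range is only being
 * downgraded to read only. */
bool sketch_range_update_to_read_only(const struct sketch_range *range)
{
	return range->event == MMU_NOTIFY_PROTECTION_VMA &&
	       range->vma_readable;
}

enum sketch_action sketch_on_invalidate(const struct sketch_range *range,
					bool device_maps_read_only)
{
	/* Only a real unmap makes the virtual range invalid, so only
	 * then is the per-vma tracking structure freed. */
	if (range->event == MMU_NOTIFY_UNMAP)
		return SKETCH_INVALIDATE_AND_FREE;

	/* A read-only device mapping is not disturbed when the CPU
	 * side is merely made read only. */
	if (device_maps_read_only &&
	    sketch_range_update_to_read_only(range))
		return SKETCH_SKIP;

	return SKETCH_INVALIDATE;
}
```

Without the event, the callback could not tell the first case from the
last one, which is why the tracking structure has to be freed on every
invalidation today.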