From: Ryan Roberts
Date: Thu, 11 Apr 2024 10:45:38 +0100
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64
To: David Hildenbrand, Mark Rutland, Catalin Marinas, Will Deacon,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Andrew Morton,
 Muchun Song
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Message-ID: <81aa23ca-18b1-4430-9ad1-00a2c5af8fc2@arm.com>
In-Reply-To: <5a23518b-7974-4b03-bd6e-80ecf6c39484@redhat.com>
References: <20240215121756.2734131-1-ryan.roberts@arm.com>
 <0ae22147-e1a1-4bcb-8a4c-f900f3f8c39e@redhat.com>
 <374d8500-4625-4bff-a934-77b5f34cf2ec@arm.com>
 <8bd9e136-8575-4c40-bae2-9b015d823916@redhat.com>
 <86680856-2532-495b-951a-ea7b2b93872f@arm.com>
 <35236bbf-3d9a-40e9-84b5-e10e10295c0c@redhat.com>
 <4fba71aa-8a63-4a27-8eaf-92a69b2cff0d@arm.com>
 <5a23518b-7974-4b03-bd6e-80ecf6c39484@redhat.com>

On 10/04/2024 21:09, David Hildenbrand wrote:
> [...]
>
> Skipping the ptdesc stuff we discussed offline, to not get distracted. :)
>
> This stuff is killing me, sorry for the lengthy reply ...

No problem - thanks for taking the time to think it through and reply with such
clarity. :)

>
>>
>> So I've been looking at all this again, and getting myself even more confused.
>>
>> I believe there are 2 classes of ptep_get_lockless() caller:
>>
>> 1) vmf->orig_pte = ptep_get_lockless(vmf->pte); in handle_pte_fault()
>> 2) everyone else
>
> Likely only completely lockless page table walkers where we *cannot* recheck
> under PTL are special. Essentially where we disable interrupts and rely on TLB
> flushes to sync against concurrent changes.

Yes agreed - 2 types; "lockless walkers that later recheck under PTL" and
"lockless walkers that never take the PTL".

Detail: the part about disabling interrupts and TLB flush syncing is
arch-specific. That's not how arm64 does it (the hw broadcasts the TLBIs). But
you make that clear further down.

>
> Let's take a look where ptep_get_lockless() comes from:
>
> commit 2a4a06da8a4b93dd189171eed7a99fffd38f42f3
> Author: Peter Zijlstra
> Date:   Fri Nov 13 11:41:40 2020 +0100
>
>     mm/gup: Provide gup_get_pte() more generic
>
>     In order to write another lockless page-table walker, we need
>     gup_get_pte() exposed. While doing that, rename it to
>     ptep_get_lockless() to match the existing ptep_get() naming.
>
>
> GUP-fast, when we were still relying on TLB flushes to sync against GUP-fast.
>
> "With get_user_pages_fast(), we walk down the pagetables without taking any
> locks.  For this we would like to load the pointers atomically, but sometimes
> that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE).  What we
> do have is the guarantee that a PTE will only either go from not present to
> present, or present to not present or both -- it will not switch to a completely
> different present page without a TLB flush in between; something that we are
> blocking by holding interrupts off."
>
> Later, we added support for GUP-fast that introduced the !TLB variant:
>
> commit 2667f50e8b81457fcb4a3dbe6aff3e81ea009e13
> Author: Steve Capper
> Date:   Thu Oct 9 15:29:14 2014 -0700
>
>     mm: introduce a general RCU get_user_pages_fast()
>
> With the pattern
>
> /*
>  * In the line below we are assuming that the pte can be read
>  * atomically. If this is not the case for your architecture,
>  * please wrap this in a helper function!
>  *
>  * for an example see gup_get_pte in arch/x86/mm/gup.c
>  */
> pte_t pte = ACCESS_ONCE(*ptep);
> ...
> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> ...
>
>
> Whereby the mentioned arch/x86/mm/gup.c code did a straight pte_t pte =
> gup_get_pte(ptep) without any re-reading of PTEs. The PTE that was read was
> required to be sane, thus the lengthy comment above ptep_get_lockless() that
> talks about TLB flushes.
>
> The comment above ptep_get_lockless() for CONFIG_GUP_GET_PXX_LOW_HIGH is still
> full of details about TLB flushes syncing against GUP-fast. But as you note, we
> use it even in contexts where we don't disable interrupts and the TLB flush
> can't help.
>
> We don't disable interrupts during page faults ... so most of the things
> described in ptep_get_lockless() don't really apply.
>
> That's also the reason why ...
>>
>>                  vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
>> -               vmf->orig_pte = *vmf->pte;
>> +               vmf->orig_pte = ptep_get_lockless(vmf->pte);
>>                  vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>>
>> -               /*
>> -                * some architectures can have larger ptes than wordsize,
>> -                * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
>> -                * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
>> -                * accesses.  The code below just needs a consistent view
>> -                * for the ifs and we later double check anyway with the
>> -                * ptl lock held. So here a barrier will do.
>> -                */
>> -               barrier();
>>                  if (pte_none(vmf->orig_pte)) {
>
> ... that code was in place. We would possibly read garbage PTEs, but would
> recheck *under PTL* (where the PTE can no longer change) that what we read
> wasn't garbage and didn't change.

Agreed.

>
>>
>> --
>>
>> (2) All the other users require that a subset of the pte fields are
>> self-consistent; specifically they don't care about access, dirty, uffd-wp or
>> soft-dirty. arm64 can guarantee that all the other bits are self-consistent
>> just by calling ptep_get().
>
> Let's focus on access+dirty for now ;)
>
>>
>> --
>>
>> So, I'm making the bold claim that it was never necessary to specialize
>> ptep_get_lockless() on arm64, and I *think* we could just delete it so that
>> ptep_get_lockless() resolves to ptep_get() on arm64. That solves the original
>> aim without needing to introduce "norecency" variants.
>>
>> Additionally I propose documenting ptep_get_lockless() to describe the set of
>> fields that are guaranteed to be self-consistent and the remaining fields which
>> are self-consistent only with best-effort.
>>
>> Could it be this easy? My head is hurting...
>
> I think what has to happen is:
>
> (1) ptep_get_lockless() must return the same value as ptep_get() as long as there
> are no races. No removal/addition of access/dirty bits etc.

Today's arm64 ptep_get() guarantees this.

>
> (2) Lockless page table walkers that later verify under the PTL can handle
> serious "garbage PTEs". This is our page fault handler.

This isn't really a property of ptep_get_lockless(); it's a statement about a
class of users. I agree with the statement.

>
> (3) Lockless page table walkers that cannot verify under PTL cannot handle
> arbitrary garbage PTEs. This is GUP-fast. Two options:
>
> (3a) ptep_get_lockless() can atomically read the PTE: We re-check later if the
> atomically-read PTE is still unchanged (without PTL). No IPI for TLB flushes
> required. This is the common case. HW might concurrently set access/dirty bits,
> so we can race with that. But we don't read garbage.

Today's arm64 ptep_get() cannot guarantee that the access/dirty bits are
consistent for contpte ptes. That's the bit that complicates the current
ptep_get_lockless() implementation.

But the point I was trying to make is that GUP-fast does not actually care about
*all* the fields being consistent (e.g. access/dirty). So we could spec
ptep_get_lockless() to say that "all fields in the returned pte are guaranteed
to be self-consistent except for access and dirty information, which may be
inconsistent if a racing modification occurred".

This could mean that the access/dirty state *does* change for a given page while
GUP-fast is walking it, but GUP-fast *doesn't* detect that change. I *think*
that failing to detect this is benign.

Aside: GUP-fast currently rechecks the pte originally obtained with
ptep_get_lockless(), using ptep_get(). Is that correct? ptep_get() must conform
to (1), so either it returns the same pte or it returns a different pte or
garbage. But that garbage could just happen to be the same as the originally
obtained pte. So in that case, it would have a false match. I think this needs
to be changed to ptep_get_lockless()?

>
> (3b) ptep_get_lockless() cannot atomically read the PTE: We need special magic to
> read somewhat-sane PTEs and need IPIs during TLB flushes to sync against serious
> PTE changes (e.g., present -> present). This is weird x86-PAE.
>
>
> If ptep_get() on arm64 can do (1), (2) and (3a), we might be good.
>
> My head is hurting ...
>
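
To make case (3a) and the proposed relaxed spec concrete, here is a rough
userspace sketch of a lockless walker that never takes the PTL: it snapshots
the entry with one atomic load, does its speculative work, then re-reads and
compares while ignoring access/dirty. Everything here is an assumption for
illustration only: the toy_* names, the bit layout and the relaxed C11 atomics
are made up and are not the kernel's API, the arm64 PTE encoding, or the
barriers the real GUP-fast code relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for a toy 64-bit PTE; NOT the arm64 encoding. */
#define TOY_PTE_VALID        (UINT64_C(1) << 0)
#define TOY_PTE_YOUNG        (UINT64_C(1) << 1)  /* "access" bit, hw may set it */
#define TOY_PTE_DIRTY        (UINT64_C(1) << 2)  /* hw may set it */
#define TOY_PTE_RECENCY_MASK (TOY_PTE_YOUNG | TOY_PTE_DIRTY)

typedef _Atomic(uint64_t) toy_pte_slot;

/* Case (3a): the whole entry is read in a single atomic load, so no tearing. */
static uint64_t toy_ptep_get_lockless(toy_pte_slot *ptep)
{
    assert(ptep != NULL);
    return atomic_load_explicit(ptep, memory_order_relaxed);
}

/* Compare two snapshots, treating access/dirty differences as "same". */
static bool toy_pte_same_norecency(uint64_t a, uint64_t b)
{
    return ((a ^ b) & ~TOY_PTE_RECENCY_MASK) == 0;
}

/*
 * Walker that cannot recheck under a lock: snapshot, run the speculative
 * step, re-read, and back off if anything other than access/dirty changed.
 * On success the original snapshot is stored in *out.
 */
static bool toy_walk_recheck(toy_pte_slot *ptep,
                             void (*speculative_step)(toy_pte_slot *),
                             uint64_t *out)
{
    uint64_t pte = toy_ptep_get_lockless(ptep);

    if (!(pte & TOY_PTE_VALID))
        return false;

    /* Stands in for e.g. taking a page reference; racers may run here. */
    if (speculative_step)
        speculative_step(ptep);

    if (!toy_pte_same_norecency(pte, toy_ptep_get_lockless(ptep)))
        return false;

    *out = pte;
    return true;
}

/* Simulated racer: hardware sets the dirty bit during the walk. */
static void toy_race_set_dirty(toy_pte_slot *ptep)
{
    atomic_fetch_or_explicit(ptep, TOY_PTE_DIRTY, memory_order_relaxed);
}

/* Simulated racer: the entry is switched to a different (toy) frame. */
static void toy_race_remap(toy_pte_slot *ptep)
{
    atomic_store_explicit(ptep, UINT64_C(0x2000) | TOY_PTE_VALID,
                          memory_order_relaxed);
}
```

With these helpers, a walk that races with toy_race_set_dirty() still
succeeds (the undetected access/dirty change is the case argued to be benign
above), while a walk that races with toy_race_remap() is rejected by the
recheck.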