From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0649EB64DC for ; Mon, 17 Jul 2023 14:48:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E72DD8D0002; Mon, 17 Jul 2023 10:48:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E221E8D0001; Mon, 17 Jul 2023 10:48:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D10E48D0002; Mon, 17 Jul 2023 10:48:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id C29848D0001 for ; Mon, 17 Jul 2023 10:48:02 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 12F34A03D0 for ; Mon, 17 Jul 2023 14:48:02 +0000 (UTC) X-FDA: 81021383604.19.FD6D85A Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf19.hostedemail.com (Postfix) with ESMTP id 91C9A1A001A for ; Mon, 17 Jul 2023 14:47:58 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1689605278; 
h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B3UXGVmZUR9CDbDqdweYtFKavUS7ikLJ78/8r7SZsEA=; b=CODRAl1j1ndPZIQaUMpZySre8tWkbb8TrbiInj94B8yJOUI9ZffbdEOnoNvdqFXxQ+GKkk C/jhoicH3z7S8elXjJkStwykSTRRsUWVxgINGT6KdDqy3koVqYfWKyAeaRBbGJTTw5FCzD kCNjN4ncnPkk4d6K/n9Fuog3NQOd/nc= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689605278; a=rsa-sha256; cv=none; b=htqJFFUro84Ptu5YqT7vKwS8JPRp7hu/Ilz4vQTEKChzxZbAH7nD6LozfW4W1NrZpkJbF3 1w3bARI/iVAmHZoriGVfyvg+D0SsJcRSoy18J16YeeLiNfZTE9k8nDanczDT7ty4hnHUb8 iRKLyesOqnWLf4vDqOrRA3lBpfUADJc= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C241B13D5; Mon, 17 Jul 2023 07:48:40 -0700 (PDT) Received: from [10.57.76.30] (unknown [10.57.76.30]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D7D833F738; Mon, 17 Jul 2023 07:47:54 -0700 (PDT) Message-ID: <4f89d7bf-2fe2-fa53-c7ca-e4f152ca0edf@arm.com> Date: Mon, 17 Jul 2023 15:47:52 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance To: David Hildenbrand , Yu Zhao Cc: Andrew Morton , Matthew Wilcox , "Kirill A. 
Shutemov" , Yin Fengwei , Catalin Marinas , Will Deacon , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20230714160407.4142030-1-ryan.roberts@arm.com> <20230714161733.4144503-3-ryan.roberts@arm.com> <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com> <2c4b2a41-1c98-0782-ac30-80e65bdb2b0c@arm.com> <2e7d5692-8ba7-1e56-a03f-449f1671b100@redhat.com> From: Ryan Roberts In-Reply-To: <2e7d5692-8ba7-1e56-a03f-449f1671b100@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 91C9A1A001A X-Stat-Signature: thfid3pb8yowtno1biwjyihywpo5eggi X-Rspam-User: X-HE-Tag: 1689605278-524657 X-HE-Meta: U2FsdGVkX1+9vwwqlb2NqTCG2IVXmXi8LqMijkjzGZGT79+MuwWWESR/xkXfmrr56PvI4usf7GeZgxyonH6fY2COTOrDWxLhSEPpKNmw+5vyzcW+u2hy4X1NSkmk/+noJSqkvJ/LPdczKFBuaPC/ZSLKSYx066RWBHJOiczmiQKQPz2OYByQGN85kYlaijI2RxDVsRETOiU2GJ0Sggmqs8CxGxM0IhRXVMCROOdWx4RXQgfOXKcZpa1AagRbDZAhLNDdE/2GIJ787s2nLLBmjinrqZ8twJ1b6KxiF3l4bCM3mTYlmW1IcO+QkKqBnSg5CTC5SllpgJxDZ0yHZNspAcZ5EzDRMnsJsIKBcKO5sdtawiwc2iVGMIGWtaIFZ3rZix1btkVgcX9b0A7WazJ6aFxwrG6MBTZvjjkqT+e9XWTQtV8nXf69BVaaBfjlCkd8evc4zQ/n+KpnlBKAIVuQpSzMpmyHVPoqzVExxXtSGvwqk5AxSlrvUvXqzH7xxlKywq9TF60VVy87D/+MJHydPXcxjS4XSOZ2QZ65wB4zt532PvLGGfQ7ErirgHVLSJV0OMv2eGVWN/ime9Xt594q44qzE8fR54UXy6+NNVGN1p45iYzYLuckwW0OiHkX2WgjQsKa+fUXIeFrQhut4guFoPg88tYua7dK21/0GuWpEh4TOW76YVKydZpnLGuUt5JCiVUfWrBKA2IM84onXQfhJq9AVemXiBPgYK/QHkRGyXR6vvZu3GmjR3P9yB6M8c9kqB7umjPJPXHDeP65FLCbPUkII2sRKJYk52y/VIbjaFiE0eoqy9kJvaAlgcoOzV2FyCtVDGKxl4dLNvQ/Fe0+FMDj0y0gKAepDk/u7Or24z1rLEFOf3qMroP7WQJ5E32fxAuWdMx+yHlTJyMiBbjPqBPR++RpHpoVxG5PTm0zLc1WNjJ1/YWSmZPLilHcy9+EfBIxIfHl1LlzEdvt3Se qD1BcUj1 ZQPg/4+ciplKDUu/YLsVv25s/L2/64T5ZekShBuh/O7rpoOaHcxjT8SyyrnxrSkxusEO4O9dw6tiCO9Cyg+uMLUP1tWzMr/F98kceOMXRdEEP4/keJaiUopjaM2No97gwTkZrCVF6SYtqN9kTUpESZDzkbW/3KZE5TqdKgvl7jO6dbEH1egtSoFHwgx1bECmV9bmyzp8arYCiJ9gm1U4vF8HORg== 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

On 17/07/2023 14:56, David Hildenbrand wrote:
> On 17.07.23 15:20, Ryan Roberts wrote:
>> On 17/07/2023 14:06, David Hildenbrand wrote:
>>> On 14.07.23 19:17, Yu Zhao wrote:
>>>> On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts wrote:
>>>>>
>>>>> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
>>>>> allocated in large folios of a determined order. All pages of the large
>>>>> folio are pte-mapped during the same page fault, significantly reducing
>>>>> the number of page faults. The number of per-page operations (e.g. ref
>>>>> counting, rmap management, lru list management) is also significantly
>>>>> reduced since those ops now become per-folio.
>>>>>
>>>>> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
>>>>> defaults to disabled for now; the long term aim is for this to default to
>>>>> enabled, but there are some risks around internal fragmentation that
>>>>> need to be better understood first.
>>>>>
>>>>> When enabled, the folio order is determined as such: For a vma, process
>>>>> or system that has explicitly disabled THP, we continue to allocate
>>>>> order-0. THP is most likely disabled to avoid any possible internal
>>>>> fragmentation so we honour that request.
>>>>>
>>>>> Otherwise, the return value of arch_wants_pte_order() is used. For vmas
>>>>> that have not explicitly opted-in to use transparent hugepages (e.g.
>>>>> where thp=madvise and the vma does not have MADV_HUGEPAGE), then
>>>>> arch_wants_pte_order() is limited by the new cmdline parameter,
>>>>> `flexthp_unhinted_max`. This allows for a performance boost without
>>>>> requiring any explicit opt-in from the workload while allowing the
>>>>> sysadmin to tune between performance and internal fragmentation.
>>>>>
>>>>> arch_wants_pte_order() can be overridden by the architecture if desired.
>>>>> Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
>>>>> set of ptes map physically contiguous, naturally aligned memory, so this
>>>>> mechanism allows the architecture to optimize as required.
>>>>>
>>>>> If the preferred order can't be used (e.g. because the folio would
>>>>> breach the bounds of the vma, or because ptes in the region are already
>>>>> mapped) then we fall back to a suitable lower order; first
>>>>> PAGE_ALLOC_COSTLY_ORDER, then order-0.
>>>>>
>>>>> Signed-off-by: Ryan Roberts
>>>>> ---
>>>>>    .../admin-guide/kernel-parameters.txt         |  10 +
>>>>>    mm/Kconfig                                    |  10 +
>>>>>    mm/memory.c                                   | 187 ++++++++++++++++--
>>>>>    3 files changed, 190 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>>> index a1457995fd41..405d624e2191 100644
>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>>> @@ -1497,6 +1497,16 @@
>>>>>                           See Documentation/admin-guide/sysctl/net.rst for
>>>>>                           fb_tunnels_only_for_init_ns
>>>>>
>>>>> +       flexthp_unhinted_max=
>>>>> +                       [KNL] Requires CONFIG_FLEXIBLE_THP enabled. The maximum
>>>>> +                       folio size that will be allocated for an anonymous vma
>>>>> +                       that has neither explicitly opted in nor out of using
>>>>> +                       transparent hugepages. The size must be a power-of-2 in
>>>>> +                       the range [PAGE_SIZE, PMD_SIZE). A larger size improves
>>>>> +                       performance by reducing page faults, while a smaller
>>>>> +                       size reduces internal fragmentation. Default: max(64K,
>>>>> +                       PAGE_SIZE). Format: size[KMG].
>>>>> +
>>>>
>>>> Let's split this parameter into a separate patch.
>>>>
>>>
>>> Just a general comment after stumbling over patch #2, let's not start splitting
>>> patches into things that don't make any sense on their own; that just makes
>>> review a lot harder.
>>
>> ACK
>>
>>>
>>> For this case here, I'd suggest first adding the general infrastructure and then
>>> adding tunables we want to have on top.
>>
>> OK, so 1 patch for the main infrastructure, then a patch to disable for
>> MADV_NOHUGEPAGE and friends, then a further patch to set flexthp_unhinted_max
>> via a sysctl?
>
> MADV_NOHUGEPAGE handling for me falls under the category "required for
> correctness to not break existing workloads" and has to be there initially.
>
> Anything that is rather a performance tunable (e.g., a sysctl to optimize) can
> be added on top and discussed separately.
>
> At least IMHO :)
>
>>
>>>
>>> I agree that toggling that at runtime (for example via sysfs as raised by me
>>> previously) would be nicer.
>>
>> OK, I clearly misunderstood, I thought you were requesting a boot parameter.
>
> Oh, sorry about that. I wanted to actually express
> "/sys/kernel/mm/transparent_hugepage/" sysctls where we can toggle that later at
> runtime as well.
>
>> What's the ABI compat guarantee for sysctls? I assumed that for a boot
>> parameter it would be easier to remove in future if we wanted, but for sysctl,
>> it's there forever?
>
> sysctl are hard/impossible to remove, yes. So we better make sure what we add
> has clear semantics.
>
> If we ever want some real auto-tunable mode (and can actually implement it
> without harming performance; and I am skeptical), we might want to allow for
> setting such a parameter to "auto", for example.
>
>>
>> Also, how do you feel about the naming and behavior of the parameter?
>
> Very good question. "flexthp_unhinted_max" naming is a bit suboptimal.
>
> For example, I'm not so sure if we should expose the feature to user space as
> "flexthp" at all. I think we should find a clearer feature name to begin with.
>
> ... maybe we can initially get away with dropping that parameter and default to
> something reasonably small (i.e., 64k as you have above)?

That would certainly get my vote. But it was you who was arguing for a tunable
previously ;-). I propose we use the following as the "unhinted ceiling" for
now, then we can add a tunable if/when we find a use case that doesn't work
optimally with this value:

	static int flexthp_unhinted_max_order =
		ilog2(SZ_64K > PAGE_SIZE ? SZ_64K : PAGE_SIZE) - PAGE_SHIFT;

(Using PAGE_SIZE when it's greater than 64K to cover the ppc case that looks
like it can support 256K pages. Open-coding the max because max() can't be used
outside a function.)

>
> /sys/kernel/mm/transparent_hugepage/enabled=never and simply not get any thp.

Yes, that should work with the patch as it is today.

>
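
For reference, the arithmetic behind the proposed "unhinted ceiling" can be
checked in userspace. This is only a sketch under stated assumptions:
sketch_ilog2(), SKETCH_SZ_64K and sketch_unhinted_max_order() are hypothetical
stand-ins for the kernel's ilog2(), SZ_64K and the flexthp_unhinted_max_order
initializer, and the page shift is passed in as an argument instead of coming
from PAGE_SHIFT.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's SZ_64K. */
#define SKETCH_SZ_64K (64UL * 1024UL)

/* Floor of log2(v), standing in for the kernel's ilog2(); v must be non-zero. */
static unsigned int sketch_ilog2(unsigned long v)
{
	unsigned int log = 0;

	assert(v != 0);
	while (v >>= 1)
		log++;
	return log;
}

/*
 * Folio order of the unhinted ceiling for a given page shift:
 * ilog2(max(64K, PAGE_SIZE)) - PAGE_SHIFT, with the max open-coded
 * in the same way as the proposed initializer.
 */
static unsigned int sketch_unhinted_max_order(unsigned int page_shift)
{
	unsigned long page_size = 1UL << page_shift;
	unsigned long ceiling = SKETCH_SZ_64K > page_size ? SKETCH_SZ_64K : page_size;

	return sketch_ilog2(ceiling) - page_shift;
}
```

Under these assumptions a 4K page size (shift 12) gives order 4, 16K (shift 14)
gives order 2, and 64K or 256K pages (shift 16 or 18) give order 0, so the
result never goes negative on configurations whose pages are larger than 64K.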