From mboxrd@z Thu Jan 1 00:00:00 1970 From: toshi.kani@hpe.com (Kani, Toshi) Date: Fri, 27 Apr 2018 14:31:51 +0000 Subject: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces In-Reply-To: <20180427073719.GT15462@8bytes.org> References: <20180314180155.19492-1-toshi.kani@hpe.com> <20180314180155.19492-3-toshi.kani@hpe.com> <20180426141926.GN15462@8bytes.org> <1524759629.2693.465.camel@hpe.com> <20180426172327.GQ15462@8bytes.org> <1524764948.2693.478.camel@hpe.com> <20180426200737.GS15462@8bytes.org> <1524781764.2693.503.camel@hpe.com> <20180427073719.GT15462@8bytes.org> Message-ID: <1524839460.2693.531.camel@hpe.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Fri, 2018-04-27 at 09:37 +0200, joro at 8bytes.org wrote: > On Thu, Apr 26, 2018 at 10:30:14PM +0000, Kani, Toshi wrote: > > Thanks for the clarification. After reading through SDM one more time, I > > agree that we need a TLB purge here. Here is my current understanding. > > > > - INVLPG purges both TLB and paging-structure caches. So, PMD cache was > > purged once. > > - However, processor may cache this PMD entry later in speculation > > since it has p-bit set. (This is where my misunderstanding was. > > Speculation is not allowed to access a target address, but it may still > > cache this PMD entry.) > > - A single INVLPG on each processor purges this PMD cache. It does not > > need a range purge (which was already done). > > > > Does it sound right to you? > > The right fix is to first synchronize the changes when the PMD/PUD is > cleared and then flush the TLB system-wide. After that is done you can > free the page. Agreed. This can be done on top of this patch. > But doing all that in the pud/pmd_free_pmd/pte_page() functions is too > expensive, as the TLB flush requires to send IPIs to all cores in the > system, and that every time the function is called. > > So what needs to be done is to fix this from high-level ioremap code to > first unmap all required PTE/PMD pages and collect them in a list. When > that is done you can synchronize the changes with the other page-tables > in the system and do one system-wide TLB flush. When that is complete > you can free the pages on the list that were collected while unmapping. > > Then the new mappings can be established and again synchronized with the > other page-tables in the system. Yes, and this patch was designed to work in such way. Please note that this patch added pud_free_pmd_page() and pmd_free_pte_page() to the ioremap() path when and only when it creates a pud or pmd map. This assures the following preconditions are met without overhead. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been done for a target pud/pmd address range. So, we can add the step 2 on top of this patch. 1. Clear pud/pmd entry. 2. System wide TLB flush <-- TO BE ADDED BY NEW PATCH 3. Free its underlining pmd/pte page. > > As for the BUG_ON issue, are you able to reproduce this issue? If so, > > would you be able to test the fix? > > Yes, I can reproduce the BUG_ON with my PTI patches and a fedora-i386 > VM. Great! > I already ran into the issue before your patches were merged upstream, > but my "fix" is different because it just prevents huge-mappings when > there were smaller mappings before. See > > e3e288121408 x86/pgtable: Don't set huge PUD/PMD on non-leaf entries > > for details. This patch does not fix the base-problem, but hides it > again, as the real fix needs some more work across architectures. Right. Patch 1/2 of this series made the same fix as well. See: b6bdb7517c3d mm/vmalloc: add interfaces to free unmapped page table > Your patch actually makes the problem worse, without it the PTE/PMD pages > were just leaked, so that they could not be reused. But with your patch > the pages can be used again and the page-walker might establish TLB > entries based on random content the new owner writes to it. This can > lead to all kinds of random and very hard to debug data corruption > issues. > > So until we make the generic ioremap code in lib/ioremap.c smarter about > unmapping/remapping ranges the best solution is making my fix work again > by reverting your patch. We do not need to revert this patch. We can make the above change I mentioned. Thanks, -Toshi From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot0-f197.google.com (mail-ot0-f197.google.com [74.125.82.197]) by kanga.kvack.org (Postfix) with ESMTP id 71D026B0003 for ; Fri, 27 Apr 2018 10:32:21 -0400 (EDT) Received: by mail-ot0-f197.google.com with SMTP id r104-v6so1315625ota.19 for ; Fri, 27 Apr 2018 07:32:21 -0700 (PDT) Received: from g4t3426.houston.hpe.com (g4t3426.houston.hpe.com. [15.241.140.75]) by mx.google.com with ESMTPS id x142-v6si487743oia.376.2018.04.27.07.32.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 27 Apr 2018 07:32:20 -0700 (PDT) From: "Kani, Toshi" Subject: Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces Date: Fri, 27 Apr 2018 14:31:51 +0000 Message-ID: <1524839460.2693.531.camel@hpe.com> References: <20180314180155.19492-1-toshi.kani@hpe.com> <20180314180155.19492-3-toshi.kani@hpe.com> <20180426141926.GN15462@8bytes.org> <1524759629.2693.465.camel@hpe.com> <20180426172327.GQ15462@8bytes.org> <1524764948.2693.478.camel@hpe.com> <20180426200737.GS15462@8bytes.org> <1524781764.2693.503.camel@hpe.com> <20180427073719.GT15462@8bytes.org> In-Reply-To: <20180427073719.GT15462@8bytes.org> Content-Language: en-US Content-Type: text/plain; charset="utf-8" Content-ID: <76CF738D819A5A4C9117DC257825724D@NAMPRD84.PROD.OUTLOOK.COM> Content-Transfer-Encoding: base64 MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: "joro@8bytes.org" Cc: "linux-kernel@vger.kernel.org" , "bp@suse.de" , "tglx@linutronix.de" , "linux-mm@kvack.org" , "guohanjun@huawei.com" , "wxf.wang@hisilicon.com" , "stable@vger.kernel.org" , "x86@kernel.org" , "akpm@linux-foundation.org" , "willy@infradead.org" , "hpa@zytor.com" , "catalin.marinas@arm.com" , "mingo@redhat.com" , "will.deacon@arm.com" , "Hocko, Michal" , "cpandya@codeaurora.org" , "linux-arm-kernel@lists.infradead.org" T24gRnJpLCAyMDE4LTA0LTI3IGF0IDA5OjM3ICswMjAwLCBqb3JvQDhieXRlcy5vcmcgd3JvdGU6 DQo+IE9uIFRodSwgQXByIDI2LCAyMDE4IGF0IDEwOjMwOjE0UE0gKzAwMDAsIEthbmksIFRvc2hp IHdyb3RlOg0KPiA+IFRoYW5rcyBmb3IgdGhlIGNsYXJpZmljYXRpb24uIEFmdGVyIHJlYWRpbmcg dGhyb3VnaCBTRE0gb25lIG1vcmUgdGltZSwgSQ0KPiA+IGFncmVlIHRoYXQgd2UgbmVlZCBhIFRM QiBwdXJnZSBoZXJlLiBIZXJlIGlzIG15IGN1cnJlbnQgdW5kZXJzdGFuZGluZy4gDQo+ID4gDQo+ ID4gIC0gSU5WTFBHIHB1cmdlcyBib3RoIFRMQiBhbmQgcGFnaW5nLXN0cnVjdHVyZSBjYWNoZXMu IFNvLCBQTUQgY2FjaGUgd2FzDQo+ID4gcHVyZ2VkIG9uY2UuDQo+ID4gIC0gSG93ZXZlciwgcHJv Y2Vzc29yIG1heSBjYWNoZSB0aGlzIFBNRCBlbnRyeSBsYXRlciBpbiBzcGVjdWxhdGlvbg0KPiA+ IHNpbmNlIGl0IGhhcyBwLWJpdCBzZXQuIChUaGlzIGlzIHdoZXJlIG15IG1pc3VuZGVyc3RhbmRp bmcgd2FzLg0KPiA+IFNwZWN1bGF0aW9uIGlzIG5vdCBhbGxvd2VkIHRvIGFjY2VzcyBhIHRhcmdl dCBhZGRyZXNzLCBidXQgaXQgbWF5IHN0aWxsDQo+ID4gY2FjaGUgdGhpcyBQTUQgZW50cnkuKQ0K PiA+ICAtIEEgc2luZ2xlIElOVkxQRyBvbiBlYWNoIHByb2Nlc3NvciBwdXJnZXMgdGhpcyBQTUQg Y2FjaGUuIEl0IGRvZXMgbm90DQo+ID4gbmVlZCBhIHJhbmdlIHB1cmdlICh3aGljaCB3YXMgYWxy ZWFkeSBkb25lKS4NCj4gPiANCj4gPiBEb2VzIGl0IHNvdW5kIHJpZ2h0IHRvIHlvdT8NCj4gDQo+ IFRoZSByaWdodCBmaXggaXMgdG8gZmlyc3Qgc3luY2hyb25pemUgdGhlIGNoYW5nZXMgd2hlbiB0 aGUgUE1EL1BVRCBpcw0KPiBjbGVhcmVkIGFuZCB0aGVuIGZsdXNoIHRoZSBUTEIgc3lzdGVtLXdp ZGUuIEFmdGVyIHRoYXQgaXMgZG9uZSB5b3UgY2FuDQo+IGZyZWUgdGhlIHBhZ2UuDQoNCkFncmVl ZC4gVGhpcyBjYW4gYmUgZG9uZSBvbiB0b3Agb2YgdGhpcyBwYXRjaC4NCg0KPiBCdXQgZG9pbmcg YWxsIHRoYXQgaW4gdGhlIHB1ZC9wbWRfZnJlZV9wbWQvcHRlX3BhZ2UoKSBmdW5jdGlvbnMgaXMg dG9vDQo+IGV4cGVuc2l2ZSwgYXMgdGhlIFRMQiBmbHVzaCByZXF1aXJlcyB0byBzZW5kIElQSXMg dG8gYWxsIGNvcmVzIGluIHRoZQ0KPiBzeXN0ZW0sIGFuZCB0aGF0IGV2ZXJ5IHRpbWUgdGhlIGZ1 bmN0aW9uIGlzIGNhbGxlZC4NCj4NCj4gU28gd2hhdCBuZWVkcyB0byBiZSBkb25lIGlzIHRvIGZp eCB0aGlzIGZyb20gaGlnaC1sZXZlbCBpb3JlbWFwIGNvZGUgdG8NCj4gZmlyc3QgdW5tYXAgYWxs IHJlcXVpcmVkIFBURS9QTUQgcGFnZXMgYW5kIGNvbGxlY3QgdGhlbSBpbiBhIGxpc3QuIFdoZW4N Cj4gdGhhdCBpcyBkb25lIHlvdSBjYW4gc3luY2hyb25pemUgdGhlIGNoYW5nZXMgd2l0aCB0aGUg b3RoZXIgcGFnZS10YWJsZXMNCj4gaW4gdGhlIHN5c3RlbSBhbmQgZG8gb25lIHN5c3RlbS13aWRl IFRMQiBmbHVzaC4gV2hlbiB0aGF0IGlzIGNvbXBsZXRlDQo+IHlvdSBjYW4gZnJlZSB0aGUgcGFn ZXMgb24gdGhlIGxpc3QgdGhhdCB3ZXJlIGNvbGxlY3RlZCB3aGlsZSB1bm1hcHBpbmcuDQo+DQo+ IFRoZW4gdGhlIG5ldyBtYXBwaW5ncyBjYW4gYmUgZXN0YWJsaXNoZWQgYW5kIGFnYWluIHN5bmNo cm9uaXplZCB3aXRoIHRoZQ0KPiBvdGhlciBwYWdlLXRhYmxlcyBpbiB0aGUgc3lzdGVtLg0KDQpZ ZXMsIGFuZCB0aGlzIHBhdGNoIHdhcyBkZXNpZ25lZCB0byB3b3JrIGluIHN1Y2ggd2F5LiAgUGxl YXNlIG5vdGUgdGhhdA0KdGhpcyBwYXRjaCBhZGRlZCBwdWRfZnJlZV9wbWRfcGFnZSgpIGFuZCBw bWRfZnJlZV9wdGVfcGFnZSgpIHRvIHRoZQ0KaW9yZW1hcCgpIHBhdGggd2hlbiBhbmQgb25seSB3 aGVuIGl0IGNyZWF0ZXMgYSBwdWQgb3IgcG1kIG1hcC4gIFRoaXMNCmFzc3VyZXMgdGhlIGZvbGxv d2luZyBwcmVjb25kaXRpb25zIGFyZSBtZXQgd2l0aG91dCBvdmVyaGVhZC4NCiAtIEFsbCBwdGUg ZW50cmllcyBmb3IgYSB0YXJnZXQgcHVkL3BtZCBhZGRyZXNzIHJhbmdlIGhhdmUgYmVlbiBjbGVh cmVkLg0KIC0gU3lzdGVtLXdpZGUgVExCIHB1cmdlcyBoYXZlIGJlZW4gZG9uZSBmb3IgYSB0YXJn ZXQgcHVkL3BtZCBhZGRyZXNzDQpyYW5nZS4NCg0KU28sIHdlIGNhbiBhZGQgdGhlIHN0ZXAgMiBv biB0b3Agb2YgdGhpcyBwYXRjaC4NCiAxLiBDbGVhciBwdWQvcG1kIGVudHJ5Lg0KIDIuIFN5c3Rl bSB3aWRlIFRMQiBmbHVzaCA8LS0gVE8gQkUgQURERUQgQlkgTkVXIFBBVENIDQogMy4gRnJlZSBp dHMgdW5kZXJsaW5pbmcgcG1kL3B0ZSBwYWdlLg0KDQo+ID4gQXMgZm9yIHRoZSBCVUdfT04gaXNz dWUsIGFyZSB5b3UgYWJsZSB0byByZXByb2R1Y2UgdGhpcyBpc3N1ZT8gIElmIHNvLA0KPiA+IHdv dWxkIHlvdSBiZSBhYmxlIHRvIHRlc3QgdGhlIGZpeD8NCj4gDQo+IFllcywgSSBjYW4gcmVwcm9k dWNlIHRoZSBCVUdfT04gd2l0aCBteSBQVEkgcGF0Y2hlcyBhbmQgYSBmZWRvcmEtaTM4Ng0KPiBW TS4NCg0KR3JlYXQhDQoNCj4gSSBhbHJlYWR5IHJhbiBpbnRvIHRoZSBpc3N1ZSBiZWZvcmUgeW91 ciBwYXRjaGVzIHdlcmUgbWVyZ2VkIHVwc3RyZWFtLA0KPiBidXQgbXkgImZpeCIgaXMgZGlmZmVy ZW50IGJlY2F1c2UgaXQganVzdCBwcmV2ZW50cyBodWdlLW1hcHBpbmdzIHdoZW4NCj4gdGhlcmUg d2VyZSBzbWFsbGVyIG1hcHBpbmdzIGJlZm9yZS4gU2VlDQo+IA0KPiAJZTNlMjg4MTIxNDA4IHg4 Ni9wZ3RhYmxlOiBEb24ndCBzZXQgaHVnZSBQVUQvUE1EIG9uIG5vbi1sZWFmIGVudHJpZXMNCj4g DQo+IGZvciBkZXRhaWxzLiBUaGlzIHBhdGNoIGRvZXMgbm90IGZpeCB0aGUgYmFzZS1wcm9ibGVt LCBidXQgaGlkZXMgaXQNCj4gYWdhaW4sIGFzIHRoZSByZWFsIGZpeCBuZWVkcyBzb21lIG1vcmUg d29yayBhY3Jvc3MgYXJjaGl0ZWN0dXJlcy4NCg0KUmlnaHQuICBQYXRjaCAxLzIgb2YgdGhpcyBz ZXJpZXMgbWFkZSB0aGUgc2FtZSBmaXggYXMgd2VsbC4gIFNlZToNCg0KYjZiZGI3NTE3YzNkICBt bS92bWFsbG9jOiBhZGQgaW50ZXJmYWNlcyB0byBmcmVlIHVubWFwcGVkIHBhZ2UgdGFibGUNCg0K PiBZb3VyIHBhdGNoIGFjdHVhbGx5IG1ha2VzIHRoZSBwcm9ibGVtIHdvcnNlLCB3aXRob3V0IGl0 IHRoZSBQVEUvUE1EIHBhZ2VzDQo+IHdlcmUganVzdCBsZWFrZWQsIHNvIHRoYXQgdGhleSBjb3Vs ZCBub3QgYmUgcmV1c2VkLiBCdXQgd2l0aCB5b3VyIHBhdGNoDQo+IHRoZSBwYWdlcyBjYW4gYmUg dXNlZCBhZ2FpbiBhbmQgdGhlIHBhZ2Utd2Fsa2VyIG1pZ2h0IGVzdGFibGlzaCBUTEINCj4gZW50 cmllcyBiYXNlZCBvbiByYW5kb20gY29udGVudCB0aGUgbmV3IG93bmVyIHdyaXRlcyB0byBpdC4g VGhpcyBjYW4NCj4gbGVhZCB0byBhbGwga2luZHMgb2YgcmFuZG9tIGFuZCB2ZXJ5IGhhcmQgdG8g ZGVidWcgZGF0YSBjb3JydXB0aW9uDQo+IGlzc3Vlcy4NCj4gDQo+IFNvIHVudGlsIHdlIG1ha2Ug dGhlIGdlbmVyaWMgaW9yZW1hcCBjb2RlIGluIGxpYi9pb3JlbWFwLmMgc21hcnRlciBhYm91dA0K PiB1bm1hcHBpbmcvcmVtYXBwaW5nIHJhbmdlcyB0aGUgYmVzdCBzb2x1dGlvbiBpcyBtYWtpbmcg bXkgZml4IHdvcmsgYWdhaW4NCj4gYnkgcmV2ZXJ0aW5nIHlvdXIgcGF0Y2guDQoNCldlIGRvIG5v dCBuZWVkIHRvIHJldmVydCB0aGlzIHBhdGNoLiAgV2UgY2FuIG1ha2UgdGhlIGFib3ZlIGNoYW5n ZSBJDQptZW50aW9uZWQuDQoNClRoYW5rcywNCi1Ub3NoaQ0K