Date: Tue, 24 Aug 2021 16:02:45 +0300
From: Mike Rapoport
To: "Edgecombe, Rick P"
Cc: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "peterz@infradead.org", "keescook@chromium.org", "Weiny, Ira", "dave.hansen@linux.intel.com", "vbabka@suse.cz", "x86@kernel.org", "akpm@linux-foundation.org", "rppt@linux.ibm.com", "Lutomirski, Andy"
Subject: Re: [RFC PATCH 3/4] mm/page_alloc: introduce __GFP_PTE_MAPPED flag to allocate pte-mapped pages
References: <20210823132513.15836-1-rppt@kernel.org> <20210823132513.15836-4-rppt@kernel.org> <889bdfef8b4acbe840668f27782c3d39a987c368.camel@intel.com>
In-Reply-To: <889bdfef8b4acbe840668f27782c3d39a987c368.camel@intel.com>

On Mon, Aug 23, 
2021 at 08:29:49PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2021-08-23 at 16:25 +0300, Mike Rapoport wrote:
> > From: Mike Rapoport
> >
> > When __GFP_PTE_MAPPED flag is passed to an allocation request of order 0,
> > the allocated page will be mapped at PTE level in the direct map.
> >
> > To reduce the direct map fragmentation, maintain a cache of 4K pages that
> > are already mapped at PTE level in the direct map. Whenever the cache
> > should be replenished, try to allocate 2M page and split it to 4K pages
> > to localize the shattering of the direct map. If the allocation of 2M
> > page fails, fall back to a single page allocation at the expense of the
> > direct map fragmentation.
> >
> > The cache registers a shrinker that releases free pages from the cache to
> > the page allocator.
> >
> > The __GFP_PTE_MAPPED and caching of 4K pages are enabled only if an
> > architecture selects ARCH_WANTS_PTE_MAPPED_CACHE in its Kconfig.
> >
> > [
> > cache management is mostly copied from
> > https://lore.kernel.org/lkml/20210505003032.489164-4-rick.p.edgecombe@intel.com/
> > ]
> >
> > Signed-off-by: Mike Rapoport
> > ---
> >  arch/Kconfig                    |   8 +
> >  arch/x86/Kconfig                |   1 +
> >  include/linux/gfp.h             |  11 +-
> >  include/linux/mm.h              |   2 +
> >  include/linux/pageblock-flags.h |  26 ++++
> >  init/main.c                     |   1 +
> >  mm/internal.h                   |   3 +-
> >  mm/page_alloc.c                 | 261 +++++++++++++++++++++++++++++++-
> >  8 files changed, 309 insertions(+), 4 deletions(-)

...

> > +static void pte_mapped_cache_add_neighbour_pages(struct page *page)
> > +{
> > +#if 0
> > +	/*
> > +	 * TODO: if pte_mapped_cache_replenish() had to fallback to order-0
> > +	 * allocation, the large page in the direct map will be split
> > +	 * anyway and if there are free pages in the same pageblock they
> > +	 * can be added to pte_mapped cache.
> > +	 */
> > +	unsigned int order = (1 << HUGETLB_PAGE_ORDER);
> > +	unsigned int nr_pages = (1 << order);
> > +	unsigned long pfn = page_to_pfn(page);
> > +	struct page *page_head = page - (pfn & (order - 1));
> > +
> > +	for (i = 0; i < nr_pages; i++) {
> > +		page = page_head + i;
> > +		if (is_free_buddy_page(page)) {
> > +			take_page_off_buddy(page);
> > +			pte_mapped_cache_add(&pte_mapped_cache, page);
> > +		}
> > +	}
> > +#endif
> > +}
> >
> This seems a nice benefit of doing this sort of stuff in the page
> allocator if it can work.

I didn't try to enable it yet, but I don't see a fundamental reason why
this won't work.

> > +static struct page *alloc_page_pte_mapped(gfp_t gfp)
> >
> I'm a little disappointed building into the page allocator didn't
> automatically make higher order allocations easy. It seems this mostly
> bolts the grouped pages code on to the page allocator and splits out of
> the allocation/free paths to call into it?
>
> I was thinking the main benefit of handling direct map permissions in
> the page allocator would be re-using the buddy part to support high
> order pages, etc. Did you try to build it in like that? If we can't get
> that, what is the benefit to doing permission stuff in the page allocator?

The way I added grouped pages to the page allocator is somewhat of an
intermediate solution between keeping such a cache entirely separate from
the page allocator and making it really tightly integrated, e.g. using a
new migratetype or doing more intrusive changes to the page allocator.

One of the reasons I did it this way is to present various trade-offs
because, tbh, I'm not yet sure what's the best way to move forward.
[The other reason being my laziness: dropping your grouped pages code into
the page allocator was the simplest thing to do ;-)].

The immediate benefit of having this code close to the page allocator is
the simplification of the free path.
Otherwise we'd need a cache-specific free method or some information in
struct page about how to free a grouped page.
Besides, it is possible to put pages mapped as 4k into such a cache at boot
time when the page allocator is initialized.

Also, keeping a central cache for multiple users will improve memory
utilization and I believe it would require fewer splits of the direct map.
OTOH, keeping such caches per-user allows managing access policy per cache,
which could be better from the security POV.

I'm also going to explore the possibilities of using a new migratetype or
SL*B as Dave suggested.

> > +{
> > +	struct pte_mapped_cache *cache = &pte_mapped_cache;
> > +	struct page *page;
> > +
> > +	page = pte_mapped_cache_get(cache);
> > +	if (page) {
> > +		prep_new_page(page, 0, gfp, 0);
> > +		goto out;
> > +	}
> > +
> > +	page = pte_mapped_cache_replenish(cache, gfp);
> > +
> > +out:
> > +	return page;
> > +}
> > +
> We probably want to exclude GFP_ATOMIC before calling into CPA unless
> debug page alloc is on, because it may need to split and sleep for the
> allocation.

There is a page table allocation with GFP_ATOMIC passed actually. Looking
at the callers of alloc_low_pages() it seems that GFP_ATOMIC there is
stale...

> In my next series of this I added support for GFP_ATOMIC to this code,
> but that solution should only work for permission changing grouped page
> allocators in the protected page tables case where the direct map
> tables are handled differently. As a general solution though (that's
> the long term intention right?), GFP_ATOMIC might deserve some
> consideration.

... but for the general solution GFP_ATOMIC indeed deserves some
consideration.

> The other thing is we probably don't want to clean out the atomic
> reserves and add them to a cache just for one page. I opted to just
> convert one page in the GFP_ATOMIC case.

Do you mean to allocate one page in the GFP_ATOMIC case and bypass the high
order allocation? But the CPA split is still necessary here, isn't it?

--
Sincerely yours,
Mike.
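
For readers following the thread, the replenish policy discussed above can be
modelled in user space. This is only a rough sketch under stated assumptions,
not the code from the patch: every toy_* name is hypothetical, a 2M block is
assumed to hold 512 4K pages, and "atomic" simply stands in for the GFP_ATOMIC
case where only one page is converted. It also uses the page count directly
for the block-head arithmetic, since the quoted "#if 0" block appears to shift
by HUGETLB_PAGE_ORDER twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PMD_ORDER   9                       /* assumption: 2M block of 4K pages */
#define TOY_BLOCK_PAGES (1UL << TOY_PMD_ORDER)  /* 512 pages per block */

/* Hypothetical stand-in for the pte_mapped cache; holds page frame numbers. */
struct toy_cache {
	unsigned long pfns[TOY_BLOCK_PAGES]; /* free pages already mapped at PTE level */
	size_t count;                        /* number of valid entries in pfns[] */
	unsigned long next_block;            /* head pfn of the next untouched 2M block */
	unsigned long splits;                /* 2M direct-map mappings split so far */
};

/* Head pfn of the 2M-aligned block that contains pfn. */
static unsigned long toy_block_head(unsigned long pfn)
{
	return pfn & ~(TOY_BLOCK_PAGES - 1);
}

/*
 * Return one "PTE-mapped" pfn. A cache hit costs nothing. On a miss, one 2M
 * mapping is split either way; if a large allocation is possible and the
 * request is not atomic, the other 511 pages of that block are cached so
 * later requests do not split further mappings.
 */
static unsigned long toy_alloc_pte_mapped(struct toy_cache *c, bool large_ok,
					  bool atomic)
{
	unsigned long head, i;

	if (c->count > 0)
		return c->pfns[--c->count];

	/* Worst-case model: every miss takes its page from a fresh 2M block. */
	head = c->next_block;
	c->next_block += TOY_BLOCK_PAGES;
	c->splits++;

	if (large_ok && !atomic) {
		assert(c->count + TOY_BLOCK_PAGES - 1 <= TOY_BLOCK_PAGES);
		/* Cache pages 1..511; page 0 is handed to the caller. */
		for (i = TOY_BLOCK_PAGES - 1; i >= 1; i--)
			c->pfns[c->count++] = head + i;
	}
	return head;
}
```

In this model, 512 allocations with the large-block path available split a
single 2M mapping, while the same 512 allocations through the order-0 (or
atomic) path split 512 of them, which is the fragmentation cost the commit
message refers to.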