From: Mike Rapoport <rppt@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"keescook@chromium.org" <keescook@chromium.org>,
"Weiny, Ira" <ira.weiny@intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"x86@kernel.org" <x86@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"rppt@linux.ibm.com" <rppt@linux.ibm.com>,
"Lutomirski, Andy" <luto@kernel.org>
Subject: Re: [RFC PATCH 3/4] mm/page_alloc: introduce __GFP_PTE_MAPPED flag to allocate pte-mapped pages
Date: Tue, 24 Aug 2021 16:02:45 +0300 [thread overview]
Message-ID: <YSTt9XEDfbPOpab4@kernel.org> (raw)
In-Reply-To: <889bdfef8b4acbe840668f27782c3d39a987c368.camel@intel.com>
On Mon, Aug 23, 2021 at 08:29:49PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2021-08-23 at 16:25 +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > When __GFP_PTE_MAPPED flag is passed to an allocation request of
> > order 0,
> > the allocated page will be mapped at PTE level in the direct map.
> >
> > To reduce the direct map fragmentation, maintain a cache of 4K pages
> > that
> > are already mapped at PTE level in the direct map. Whenever the cache
> > should be replenished, try to allocate 2M page and split it to 4K
> > pages
> > to localize shutter of the direct map. If the allocation of 2M page
> > fails,
> > fallback to a single page allocation at expense of the direct map
> > fragmentation.
> >
> > The cache registers a shrinker that releases free pages from the
> > cache to
> > the page allocator.
> >
> > The __GFP_PTE_MAPPED and caching of 4K pages are enabled only if an
> > architecture selects ARCH_WANTS_PTE_MAPPED_CACHE in its Kconfig.
> >
> > [
> > cache management are mostly copied from
> >
> https://lore.kernel.org/lkml/20210505003032.489164-4-rick.p.edgecombe@intel.com/
> > ]
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> > arch/Kconfig | 8 +
> > arch/x86/Kconfig | 1 +
> > include/linux/gfp.h | 11 +-
> > include/linux/mm.h | 2 +
> > include/linux/pageblock-flags.h | 26 ++++
> > init/main.c | 1 +
> > mm/internal.h | 3 +-
> > mm/page_alloc.c | 261 +++++++++++++++++++++++++++++++-
> > 8 files changed, 309 insertions(+), 4 deletions(-)
...
> > +static void pte_mapped_cache_add_neighbour_pages(struct page *page)
> > +{
> > +#if 0
> > + /*
> > + * TODO: if pte_mapped_cache_replenish() had to fallback to
> > order-0
> > + * allocation, the large page in the direct map will be split
> > + * anyway and if there are free pages in the same pageblock
> > they
> > + * can be added to pte_mapped cache.
> > + */
> > + unsigned int order = (1 << HUGETLB_PAGE_ORDER);
> > + unsigned int nr_pages = (1 << order);
> > + unsigned long pfn = page_to_pfn(page);
> > + struct page *page_head = page - (pfn & (order - 1));
> > +
> > + for (i = 0; i < nr_pages; i++) {
> > + page = page_head + i;
> > + if (is_free_buddy_page(page)) {
> > + take_page_off_buddy(page);
> > + pte_mapped_cache_add(&pte_mapped_cache, page);
> > + }
> > + }
> > +#endif
> > +}
> >
> This seems a nice benefit of doing this sort of stuff in the page
> allocator if it can work.
I didn't try enable it yet, but I don't see a fundamental reason why this
won't work.
> > +static struct page *alloc_page_pte_mapped(gfp_t gfp)
> >
> I'm a little disappointed building into the page allocator didn't
> automatically make higher order allocations easy. It seems this mostly
> bolts the grouped pages code on to the page allocator and splits out of
> the allocation/free paths to call into it?
>
> I was thinking the main benefit of handling direct map permissions in
> the page allocator would be re-using the buddy part to support high
> order pages, etc. Did you try to build it in like that? If we can't get
> that, what is the benefit to doing permission stuff in the pageallocator?
The addition of grouped pages to page allocator the way I did is somewhat
intermediate solution between keeping such cache entirely separate from
page allocator vs making it really tightly integrated, e.g. using a new
migratetype or doing more intrusive changes to page allocator. One of the
reasons I did it this way is to present various trade-offs because, tbh,
I'm not yet sure what's the best way to move forward. [The other reason
being my laziness, dropping your grouped pages code into the page allocator
was the simplest thing to do ;-)].
The immediate benefit of having this code close to the page allocator is
the simplification of the free path. Otherwise we'd need a cache-specific
free method or some information in struct page about how to free a grouped
page. Besides, it is possible to put pages mapped as 4k into such cache at
boot time when page allocator is initialized.
Also, keeping a central cache for multiple users will improve memory
utilization and I believe it would require less splits of the direct map.
OTOH, keeping such caches per-user allows managing access policy per cache
which could be better from the security POV.
I'm also going to explore the possibilities of using a new migratetype or
SL*B as Dave suggested.
> > +{
> > + struct pte_mapped_cache *cache = &pte_mapped_cache;
> > + struct page *page;
> > +
> > + page = pte_mapped_cache_get(cache);
> > + if (page) {
> > + prep_new_page(page, 0, gfp, 0);
> > + goto out;
> > + }
> > +
> > + page = pte_mapped_cache_replenish(cache, gfp);
> > +
> > +out:
> > + return page;
> > +}
> > +
> We probably want to exclude GFP_ATOMIC before calling into CPA unless
> debug page alloc is on, because it may need to split and sleep for the
> allocation. There is a page table allocation with GFP_ATOMIC passed actually.
Looking at the callers of alloc_low_pages() it seems that GFP_ATOMIC there
is stale...
> In my next series of this I added support for GFP_ATOMIC to this code,
> but that solution should only work for permission changing grouped page
> allocators in the protected page tables case where the direct map
> tables are handled differently. As a general solution though (that's
> the long term intention right?), GFP_ATOMIC might deserve some
> consideration.
... but for the general solution GFP_ATOMIC indeed deserves some
consideration.
> The other thing is we probably don't want to clean out the atomic
> reserves and add them to a cache just for one page. I opted to just
> convert one page in the GFP_ATOMIC case.
Do you mean to allocate one page in GFP_ATOMIC case and bypass high order
allocation?
But the CPA split is still necessary here, isn't it?
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2021-08-24 13:03 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-23 13:25 [RFC PATCH 0/4] mm/page_alloc: cache pte-mapped allocations Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 1/4] list: Support getting most recent element in list_lru Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 2/4] list: Support list head not in object for list_lru Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 3/4] mm/page_alloc: introduce __GFP_PTE_MAPPED flag to allocate pte-mapped pages Mike Rapoport
2021-08-23 20:29 ` Edgecombe, Rick P
2021-08-24 13:02 ` Mike Rapoport [this message]
2021-08-24 16:38 ` Edgecombe, Rick P
2021-08-24 16:54 ` Mike Rapoport
2021-08-24 17:23 ` Edgecombe, Rick P
2021-08-24 17:37 ` Mike Rapoport
2021-08-24 16:12 ` Vlastimil Babka
2021-08-25 8:43 ` David Hildenbrand
2021-08-23 13:25 ` [RFC PATCH 4/4] x86/mm: write protect (most) page tables Mike Rapoport
2021-08-23 20:08 ` Edgecombe, Rick P
2021-08-23 23:50 ` Dave Hansen
2021-08-24 3:34 ` Andy Lutomirski
2021-08-25 14:59 ` Dave Hansen
2021-08-24 13:32 ` Mike Rapoport
2021-08-25 8:38 ` David Hildenbrand
2021-08-26 8:02 ` Mike Rapoport
2021-08-26 9:01 ` Vlastimil Babka
2021-08-24 5:32 ` Nadav Amit
2021-08-24 5:34 ` Nadav Amit
2021-08-24 13:36 ` Mike Rapoport
2021-08-23 20:02 ` [RFC PATCH 0/4] mm/page_alloc: cache pte-mapped allocations Edgecombe, Rick P
2021-08-24 13:03 ` Mike Rapoport
2021-08-24 16:09 ` Vlastimil Babka
2021-08-29 7:06 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YSTt9XEDfbPOpab4@kernel.org \
--to=rppt@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=ira.weiny@intel.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=peterz@infradead.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rppt@linux.ibm.com \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.