From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
Aaron Lu <aaron.lu@intel.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [LSF/MM/BPF TOPIC] reducing direct map fragmentation
Date: Sun, 19 Feb 2023 08:07:59 +0000
Message-ID: <Y/HY3y4toae8/nmQ@localhost>
In-Reply-To: <Y9qqLZz3bFsgE0Kn@kernel.org>
On Wed, Feb 01, 2023 at 08:06:37PM +0200, Mike Rapoport wrote:
> Hi all,
Hi Mike, I'm interested in this topic and hope to discuss this with you
at LSF/MM/BPF.
> There are use-cases that need to remove pages from the direct map or at least
> map them at PTE level. These use-cases include vfree, module loading, ftrace,
> kprobe, BPF, secretmem and generally any caller of set_memory/set_direct_map
> APIs.
>
> Remapping pages at PTE level causes split of the PUD and PMD sized mappings
> in the direct map which leads to performance degradation.
>
> To reduce the performance hit caused by the fragmentation of the direct
> map, it makes sense to group and/or cache the base pages removed from the
> direct map so that the most of base pages created during a split of a large
> page will be consumed by users requiring PTE level mappings.
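For other readers' context, my understanding of the pattern described above
is roughly the following sketch (loosely modelled on what secretmem does;
the APIs are real, but error handling is trimmed and it is illustrative only):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <asm/tlbflush.h>

static struct page *grab_unmapped_page(gfp_t gfp)
{
        struct page *page = alloc_page(gfp);
        unsigned long addr;

        if (!page)
                return NULL;

        addr = (unsigned long)page_address(page);

        /*
         * Dropping even one 4K page from the direct map forces the
         * covering 2M (or 1G) mapping to be split down to PTEs.
         */
        if (set_direct_map_invalid_noflush(page)) {
                __free_page(page);
                return NULL;
        }
        flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

        return page;
}

static void put_unmapped_page(struct page *page)
{
        /* restore the direct map entry before handing the page back */
        set_direct_map_default_noflush(page);
        __free_page(page);
}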
How much of a performance difference did you see in your tests when the
direct map was fragmented, and is there a way to measure that difference?
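(For what it's worth, on x86 the current split state of the direct map is
visible in the DirectMap counters of /proc/meminfo:

  $ grep DirectMap /proc/meminfo
  DirectMap4k:      503740 kB
  DirectMap2M:    12065792 kB
  DirectMap1G:     5242880 kB

The numbers above are made up and only show the fields; the counters tell
us how fragmented the map is, but not what that costs.)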
> Last year the proposal to use a new migrate type for such cache received
> strong pushback and the suggested alternative was to try to use slab
> instead.
>
> I've been thinking about it (yeah, it took me a while) and I believe slab
> is not appropriate because use cases require at least page size allocations
> and some would really benefit from higher order allocations, and in the
> most cases the code that allocates memory excluded from the direct map
> needs the struct page/folio.
>
> For example, caching allocations of text in 2M pages would benefit from
> reduced iTLB pressure and doing kmalloc() from vmalloc() will be way more
> intrusive than using some variant of __alloc_pages().
>
> Secretmem and potentially PKS protected page tables also need struct
> page/folio.
>
> My current proposal is to have a cache of 2M pages close to the page
> allocator and use a GFP flag to make allocation request use that cache. On
> the free() path, the pages that are mapped at PTE level will be put into
> that cache.
I would like to discuss not only having a cache layer of pages but also how
the direct map could be merged back correctly and efficiently.
I vaguely recall that Aaron Lu sent an RFC series about this, and Kirill A.
Shutemov's feedback was to batch the merge operations. [1]
Also, a CPA API called by the cache layer to merge fragmented mappings
would work for merging 4K mappings into 2M ones [2], but not for merging
2M mappings into 1G ones. A rough sketch of the batching I have in mind
follows the links below.
I didn't follow the later discussions at the time (e.g. execmem_alloc()),
so maybe I'm missing some points.
[1] https://lore.kernel.org/linux-mm/20220809100408.rm6ofiewtty6rvcl@box
[2] https://lore.kernel.org/linux-mm/YvfLxuflw2ctHFWF@kernel.org
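To make the batching idea concrete, something along these lines is what I
have in mind. All names here are invented, and try_collapse_pmd_mapping()
only stands in for whatever CPA-level helper would verify that all 512 PTEs
in a 2M range carry identical protections and rewrite the PMD; no such
helper exists today:

#include <linux/align.h>
#include <linux/list.h>
#include <linux/pgtable.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct collapse_candidate {
        struct list_head list;
        unsigned long addr;             /* PMD-aligned start of the 2M range */
};

static LIST_HEAD(collapse_list);
static DEFINE_SPINLOCK(collapse_lock);

/* hypothetical CPA-level helper, see above; stubbed out here */
static void try_collapse_pmd_mapping(unsigned long pmd_aligned_addr)
{
        /* placeholder for the real page-table walk and PMD rewrite */
}

/* called when a 4K page in the direct map gets its default protections back */
static void note_collapse_candidate(unsigned long addr)
{
        struct collapse_candidate *c = kmalloc(sizeof(*c), GFP_ATOMIC);

        if (!c)
                return;

        c->addr = ALIGN_DOWN(addr, PMD_SIZE);

        spin_lock(&collapse_lock);
        list_add(&c->list, &collapse_list);
        spin_unlock(&collapse_lock);
}

/* run later from a worker: merge the candidates back in one batch */
static void collapse_worker(struct work_struct *work)
{
        struct collapse_candidate *c, *tmp;

        spin_lock(&collapse_lock);
        list_for_each_entry_safe(c, tmp, &collapse_list, list) {
                try_collapse_pmd_mapping(c->addr);
                list_del(&c->list);
                kfree(c);
        }
        spin_unlock(&collapse_lock);
}

The same structure would be needed one level up for merging 2M mappings
back into 1G ones, which is the part I don't see covered by [2].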
> The cache is internally implemented as a buddy allocator so it can satisfy
> high order allocations, and there will be a shrinker to release free pages
> from that cache to the page allocator.
>
> I hope to have a first prototype posted Really Soon.
Looking forward to that!
I wonder what shape it will take; maybe something like the sketch below?
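Here is how I imagine the caller side could look. __GFP_UNMAPPED is an
invented name for the proposed flag (defined as 0 below only so the sketch
stands alone); everything else is the existing page allocator API:

#include <linux/gfp.h>
#include <linux/mm.h>

/* invented placeholder for the proposed flag, not an existing GFP bit */
#define __GFP_UNMAPPED  ((gfp_t)0)

static void *alloc_from_unmapped_cache(size_t size)
{
        unsigned int order = get_order(size);
        struct page *page;

        /*
         * The flag would route the request to the cache of 2M pages that
         * are already mapped at PTE level, instead of splitting yet
         * another large mapping in the direct map.
         */
        page = alloc_pages(GFP_KERNEL | __GFP_UNMAPPED, order);
        if (!page)
                return NULL;

        return page_address(page);
}

static void free_to_unmapped_cache(void *addr, size_t size)
{
        /*
         * On the free path the pages would go back into the cache rather
         * than straight to the buddy allocator, so the shrinker can later
         * decide whether a whole 2M page can be merged and released.
         */
        __free_pages(virt_to_page(addr), get_order(size));
}

If the interface is roughly like this, the GFP flag plus the buddy-like
cache keeps callers close to __alloc_pages(), which matches your point
about kmalloc()-from-vmalloc() being too intrusive.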
>
> --
> Sincerely yours,
> Mike.