Re: [LSF/MM/BPF TOPIC] reducing direct map fragmentation


From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Aaron Lu <aaron.lu@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [LSF/MM/BPF TOPIC] reducing direct map fragmentation
Date: Sun, 19 Feb 2023 08:07:59 +0000
Message-ID: <Y/HY3y4toae8/nmQ@localhost>
In-Reply-To: <Y9qqLZz3bFsgE0Kn@kernel.org>

On Wed, Feb 01, 2023 at 08:06:37PM +0200, Mike Rapoport wrote:
> Hi all,

Hi Mike, I'm interested in this topic and hope to discuss this with you
at LSF/MM/BPF.
 
> There are use-cases that need to remove pages from the direct map or at least
> map them at PTE level. These use-cases include vfree, module loading, ftrace,
> kprobe, BPF, secretmem and generally any caller of set_memory/set_direct_map
> APIs.
> 
> Remapping pages at PTE level causes split of the PUD and PMD sized mappings
> in the direct map which leads to performance degradation.
>
> To reduce the performance hit caused by the fragmentation of the direct
> map, it makes sense to group and/or cache the base pages removed from the
> direct map so that most of the base pages created during a split of a large
> page will be consumed by users requiring PTE level mappings.

How much of a performance difference did you see in your tests when the
direct map was fragmented? Is there a way to measure that difference?
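
As a partial answer to the "is there a way to check" part: on x86,
/proc/meminfo reports how much of the direct map is covered by each
mapping size (DirectMap4k, DirectMap2M, DirectMap1G). This shows the degree
of fragmentation only, not its performance cost. Below is a minimal Python
sketch; the helper names and the SAMPLE numbers are made up for
illustration.

```python
import re

# Made-up numbers, used only when /proc/meminfo is unavailable.
SAMPLE = """\
MemTotal:       16303428 kB
DirectMap4k:      500000 kB
DirectMap2M:     7500000 kB
DirectMap1G:     8000000 kB
"""


def parse_direct_map(meminfo_text):
    """Return {'4k': kB, '2M': kB, ...} for every DirectMap* line found."""
    sizes = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"DirectMap(\w+):\s+(\d+)\s*kB", line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes


def small_page_share(sizes):
    """Fraction of the direct map that is mapped with 4K pages."""
    total = sum(sizes.values())
    return sizes.get("4k", 0) / total if total else 0.0


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the made-up sample
    sizes = parse_direct_map(text)
    print(sizes)
    print("share mapped at 4K: {:.2%}".format(small_page_share(sizes)))
```

Sampling these counters before and after a workload (e.g. module loading
or BPF program loading) gives a rough picture of how many large mappings
were split in between.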

> Last year the proposal to use a new migrate type for such cache received
> strong pushback and the suggested alternative was to try to use slab
> instead.
> 
> I've been thinking about it (yeah, it took me a while) and I believe slab
> is not appropriate because use cases require at least page size allocations
> and some would really benefit from higher order allocations, and in
> most cases the code that allocates memory excluded from the direct map
> needs the struct page/folio.
>
> For example, caching allocations of text in 2M pages would benefit from
> reduced iTLB pressure and doing kmalloc() from vmalloc() will be way more
> intrusive than using some variant of __alloc_pages().
>
> Secretmem and potentially PKS protected page tables also need struct
> page/folio.
> 
> My current proposal is to have a cache of 2M pages close to the page
> allocator and use a GFP flag to make allocation request use that cache. On
> the free() path, the pages that are mapped at PTE level will be put into
> that cache.

I would like to discuss not only having a cache layer of pages but also how
the direct map could be merged correctly and efficiently.

I vaguely recall that Aaron Lu sent an RFC series about this and that
Kirill A. Shutemov's feedback was to batch the merge operations. [1]

Also, a CPA (change_page_attr) API called by the cache layer that could
merge fragmented mappings would work for merging 4K pages to 2M [2], but
won't work for merging 2M mappings to 1G mappings.
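
To make "merge" concrete, here is a toy userspace model in Python of the
condition a full page table has to meet before it can be replaced by one
larger mapping. It is only a sketch and not the kernel's CPA code: the
function name, the (pfn, prot) tuples and the assumption that every entry
is a present leaf mapping are all hypothetical simplifications.

```python
ENTRIES_PER_TABLE = 512  # x86-64: 512 entries per page-table level


def can_collapse(entries, pages_per_entry):
    """Check whether one full table could become a single larger mapping.

    entries: list of (pfn, prot) tuples, or None for a not-present entry.
             Every tuple is assumed to be a leaf mapping (toy assumption).
    pages_per_entry: 4K pages covered by each entry, i.e. 1 for a PTE
             table (4K -> 2M) and 512 for a PMD table (2M -> 1G).
    """
    if len(entries) != ENTRIES_PER_TABLE:
        return False
    if any(e is None for e in entries):
        return False  # a hole (page removed from the direct map) blocks it
    first_pfn, first_prot = entries[0]
    span = ENTRIES_PER_TABLE * pages_per_entry
    if first_pfn % span != 0:
        return False  # the large mapping must be naturally aligned
    for i, (pfn, prot) in enumerate(entries):
        if pfn != first_pfn + i * pages_per_entry or prot != first_prot:
            return False  # not physically contiguous or mixed protections
    return True


if __name__ == "__main__":
    # 512 contiguous 4K entries with identical protections: mergeable to 2M.
    ptes = [(512 + i, "rw") for i in range(512)]
    print(can_collapse(ptes, 1))

    # One page switched to read-only keeps the whole 2M range split.
    ptes[7] = (512 + 7, "ro")
    print(can_collapse(ptes, 1))

    # The same check one level up: 512 contiguous 2M entries -> 1G.
    pmds = [(i * 512, "rw") for i in range(512)]
    print(can_collapse(pmds, 512))
```

In this model the 2M -> 1G case is the same check with a larger stride,
but it needs all 512 of the 2M mappings under one PUD to be uniform, which
is a much wider range than a single 2M block.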

I didn't follow the later discussions at the time (e.g. execmem_alloc()),
so maybe I'm missing some points.

[1] https://lore.kernel.org/linux-mm/20220809100408.rm6ofiewtty6rvcl@box

[2] https://lore.kernel.org/linux-mm/YvfLxuflw2ctHFWF@kernel.org
 
> The cache is internally implemented as a buddy allocator so it can satisfy
> high order allocations, and there will be a shrinker to release free pages
> from that cache to the page allocator.
> 
> I hope to have a first prototype posted Really Soon.

Looking forward to that!
I wonder what shape it will take.

> 
> -- 
> Sincerely yours,
> Mike.
