Linux-mm Archive on lore.kernel.org
From: Matthew Wilcox <willy@infradead.org>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Usama Arif <usama.arif@linux.dev>
Subject: Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86
Date: Wed, 29 Apr 2026 15:39:18 +0100
Message-ID: <afIYFtL6KrBs38rT@casper.infradead.org>
In-Reply-To: <aZcmlIF4bmG0twkp@thinkstation>

On Thu, Feb 19, 2026 at 03:08:51PM +0000, Kiryl Shutsemau wrote:
> No, there's no new hardware (that I know of). I want to explore what page size
> means.
> 
> The kernel uses the same value - PAGE_SIZE - for two things:
> 
>   - the order-0 buddy allocation size;
> 
>   - the granularity of virtual address space mapping;
> 
> I think we can benefit from separating these two meanings and allowing
> order-0 allocations to be larger than the virtual address space covered by a
> PTE entry.

I actually want to go in the other direction.  I once came up with a
name -- POTAM -- which stands for Power Of Two Allocator with Metadata.
The use case was something like XFS's buffer cache where we want a
filesystem block size of data (so 0.5KiB to 64KiB) with some metadata
attached (xfs_buf is 664 bytes with debugging enabled!)

I set this aside to work on folios, but folios offer a back door to
unifying this with the buddy allocator.  It's a long road, but here's
a sketch:

First, we separate memdescs from pages.  I believe this lets us shrink
struct page down to 8 bytes (previously presented at various LSFMMs).
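
To put a rough number on that, here is a small user-space sketch of the
memmap cost per gigabyte.  It assumes today's struct page is 64 bytes
(typical on 64-bit builds, but config dependent) and a 4KiB base page;
memmap_bytes() is a name made up for this illustration, not a kernel
function.

```c
#include <stdint.h>

/*
 * Bytes of per-page descriptors needed to cover mem_bytes of RAM,
 * given one descriptor of desc_size bytes per page_size bytes.
 * Hypothetical helper for illustration only.
 */
static uint64_t memmap_bytes(uint64_t mem_bytes, uint64_t page_size,
			     uint64_t desc_size)
{
	return (mem_bytes / page_size) * desc_size;
}
```

Under those assumptions, 1GiB of RAM needs 16MiB of memmap with a
64-byte struct page and 2MiB with an 8-byte one.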

Second, we get rid of 'page' in things like sglist and bvec.  This is
already in progress for various other good reasons.

Third (this bit is new), we replace memmap with something like a maple
tree.  That lets us look up memdescs by physical address (typically
a memdesc will contain either the physical or virtual address of the
memory it controls).

Fourth, we change the unit of the lookup in the maple tree from being
a PFN to being address / 512 (or whatever size we want to use as our
minimum).
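
As a rough user-space illustration of steps three and four (not kernel
code, and not the maple tree API), the lookup could be pictured like
this.  Everything here is hypothetical: struct memdesc_sketch,
MEMDESC_UNIT_SHIFT, phys_to_unit() and memdesc_lookup() are names
invented for the sketch, and a sorted array with binary search stands
in for the maple tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed minimum unit of 512 bytes, so the key is address / 512. */
#define MEMDESC_UNIT_SHIFT	9

/* Hypothetical descriptor covering an inclusive range of units. */
struct memdesc_sketch {
	uint64_t first;		/* first 512-byte unit covered */
	uint64_t last;		/* last 512-byte unit covered */
};

/* Convert a physical address to the lookup key. */
static inline uint64_t phys_to_unit(uint64_t phys)
{
	return phys >> MEMDESC_UNIT_SHIFT;
}

/*
 * Find the descriptor covering a physical address.  The table must be
 * sorted by 'first' and hold non-overlapping ranges.  Returns NULL if
 * no descriptor covers the address.
 */
static const struct memdesc_sketch *
memdesc_lookup(const struct memdesc_sketch *tab, size_t n, uint64_t phys)
{
	uint64_t unit = phys_to_unit(phys);
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (unit < tab[mid].first)
			hi = mid;
		else if (unit > tab[mid].last)
			lo = mid + 1;
		else
			return &tab[mid];
	}
	return NULL;
}
```

With a table holding the unit ranges 0-0, 8-15 and 128-255, physical
address 4096 maps to unit 8 and finds the second entry, while address
512 (unit 1) is not covered and returns NULL.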

Now we can have memdescs for an arbitrary power of two which means we
can ditch all the awful code from ppc/s390 page table handling where
they try to share one memdesc between several different page tables.

It's going to be "fun" avoiding allocation deadlocks where we want to
rebalance the maple tree containing the memdescs ... that's a five year
away problem.

