* [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
@ 2026-02-19 17:51 Brendan Jackman
2026-02-23 11:28 ` Mike Rapoport
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Brendan Jackman @ 2026-02-19 17:51 UTC
To: lsf-pc; +Cc: linux-mm, rppt
As work on Address Space Isolation [0] trudges slowly along (next series coming
soon™... I promise... some details of the plan are in [0]) I've been running
into a common issue whenever I try to do new stuff with the kernel address
space: We have too many sets of pagetable manipulation routines, and yet we
don't have one that suits ASI's needs.
Similarly, I'm currently working on support for efficiently unmapping
guest_memfd pages from the physmap (an extension to [1]) - in this case I've run
into very much the same issues as with ASI.
Here are some areas of the kernel that manipulate pagetables:
1. The collection of APIs that are specific to userspace pagetables: mmu_gather,
mm/pagewalk.c, some vm_fault logic, all that good stuff.
2. The set_memory_* and set_direct_map_* APIs. (Which are implemented per-arch).
3. Some non-userspace-specific APIs in mm/memory.c, such as
apply_to_page_range().
4. mm/vmalloc.c
5. Highmem logic such as kmap_local_*
6. Boot and memory-hotplug support code (your architecture's version of
arch/x86/mm/init_64.c).
7. x86's KPTI
8. x86's LDT logic
(At LPC I started enumerating these off the top of my head and multiple people
spoke out with more examples I hadn't thought of - please join in if you can see
more!)
By and large, these components are designed completely independently from one
another. This is made possible by the smart design of the low-level helper API
(pte_present() and friends), and it does lead to nice explicit coding style.
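
To make the "explicit coding style" concrete, the following is a minimal
userspace sketch of an open-coded walk over a toy two-level table. It is not
kernel code and not taken from any of the components listed above: every toy_*
name is hypothetical, and the geometry (two levels, 16 entries per table,
4 KiB pages) is invented for illustration. Only the general shape, index
helpers plus none/present predicates that each caller strings together by
hand, is meant to resemble pte_present() and friends.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model, NOT kernel code: 2 levels, 16 entries each, 4 KiB pages. */
#define TOY_SHIFT   12
#define TOY_BITS    4
#define TOY_ENTRIES (1u << TOY_BITS)
#define TOY_PRESENT 0x1ull

typedef struct { uint64_t val; } toy_pte_t;      /* leaf: pfn << 12 | flags */
typedef struct { toy_pte_t *table; } toy_pmd_t;  /* upper: pointer to leaf table */

struct toy_pgd { toy_pmd_t pmd[TOY_ENTRIES]; };

/* Low-level predicates and accessors, in the spirit of pte_present() etc. */
static inline int toy_pmd_none(toy_pmd_t pmd) { return pmd.table == NULL; }
static inline int toy_pte_present(toy_pte_t pte) { return (pte.val & TOY_PRESENT) != 0; }
static inline uint64_t toy_pte_pfn(toy_pte_t pte) { return pte.val >> TOY_SHIFT; }

static inline unsigned toy_pmd_index(uint64_t addr)
{
	return (addr >> (TOY_SHIFT + TOY_BITS)) & (TOY_ENTRIES - 1);
}

static inline unsigned toy_pte_index(uint64_t addr)
{
	return (addr >> TOY_SHIFT) & (TOY_ENTRIES - 1);
}

/* Map one page, allocating the leaf table on demand. Returns 0 or -1. */
int toy_map(struct toy_pgd *pgd, uint64_t addr, uint64_t pfn)
{
	toy_pmd_t *pmd;

	assert(pgd != NULL);
	pmd = &pgd->pmd[toy_pmd_index(addr)];
	if (toy_pmd_none(*pmd)) {
		pmd->table = calloc(TOY_ENTRIES, sizeof(toy_pte_t));
		if (!pmd->table)
			return -1;
	}
	pmd->table[toy_pte_index(addr)].val = (pfn << TOY_SHIFT) | TOY_PRESENT;
	return 0;
}

/* Open-coded lookup: returns 1 and sets *pfn if addr is mapped, else 0. */
int toy_lookup(const struct toy_pgd *pgd, uint64_t addr, uint64_t *pfn)
{
	toy_pmd_t pmd = pgd->pmd[toy_pmd_index(addr)];
	toy_pte_t pte;

	if (toy_pmd_none(pmd))
		return 0;
	pte = pmd.table[toy_pte_index(addr)];
	if (!toy_pte_present(pte))
		return 0;
	*pfn = toy_pte_pfn(pte);
	return 1;
}
```

Each caller of helpers like these ends up with its own copy of the descent
logic, which is easy to read in isolation but is also how several independent
walkers accumulate.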
Here are some "new" things I've wanted to do with pagetables, which are not
currently supported by any library:
- Have a second kernel pagetable (for ASI's "nonsensitive address space")
- Modify pagetables safely from a context where allocation is not possible
- Modify the kernel's pagetables while accounting pagetable allocations to the
current process
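
As a rough illustration of the list above, one possible shape (an assumption
made for this sketch, not the design being proposed) is to give the walker an
allocator hook instead of letting it allocate directly. Tables can then be
preallocated before entering a context where allocation is not possible, each
table handed out can be counted against an account, and a second root is just
another root structure passed to the same function. This is a userspace toy:
all sk_* names, the pool size and the "charged" counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy geometry, NOT kernel code: 2 levels, 16 entries each, 4 KiB pages. */
#define SK_SHIFT    12
#define SK_BITS     4
#define SK_ENTRIES  (1u << SK_BITS)
#define SK_POOL_MAX 4

struct sk_table { uint64_t pte[SK_ENTRIES]; };
struct sk_pgd { struct sk_table *tables[SK_ENTRIES]; };

/* Hypothetical allocator hook: the walker never calls calloc() itself. */
struct sk_alloc_ops {
	struct sk_table *(*alloc)(void *ctx);
	void *ctx;
};

/* Pool filled ahead of time; "charged" stands in for per-process accounting. */
struct sk_pool {
	struct sk_table *free[SK_POOL_MAX];
	size_t nr;
	size_t charged;
};

/* Called where allocation is allowed: ensure at least "want" tables. */
int sk_pool_topup(struct sk_pool *pool, size_t want)
{
	while (pool->nr < want && pool->nr < SK_POOL_MAX) {
		struct sk_table *t = calloc(1, sizeof(*t));

		if (!t)
			return -1;
		pool->free[pool->nr++] = t;
	}
	return pool->nr >= want ? 0 : -1;
}

/* Called where allocation is NOT allowed: only pops from the pool. */
struct sk_table *sk_pool_alloc(void *ctx)
{
	struct sk_pool *pool = ctx;

	if (pool->nr == 0)
		return NULL;
	pool->charged++;
	return pool->free[--pool->nr];
}

/* Map one page into the given root, taking any new table from ops. */
int sk_map(struct sk_pgd *pgd, const struct sk_alloc_ops *ops,
	   uint64_t addr, uint64_t pfn)
{
	unsigned hi = (addr >> (SK_SHIFT + SK_BITS)) & (SK_ENTRIES - 1);
	unsigned lo = (addr >> SK_SHIFT) & (SK_ENTRIES - 1);

	assert(pgd && ops && ops->alloc);
	if (!pgd->tables[hi]) {
		struct sk_table *t = ops->alloc(ops->ctx);

		if (!t)
			return -1; /* pool exhausted: fail rather than allocate */
		pgd->tables[hi] = t;
	}
	pgd->tables[hi]->pte[lo] = (pfn << SK_SHIFT) | 1;
	return 0;
}
```

A caller would run sk_pool_topup() while allocation is still allowed, then
call sk_map() with an ops structure pointing at sk_pool_alloc() from the
restricted context. The trade-off is that the caller has to know the
worst-case number of tables up front.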
I think it's time to discuss if there's a way to scope out a "library" that:
a) Reduces the overall amount of code in the kernel, while
b) Serving the needs of the incoming guest_memfd and ASI features.
In this session I'd first like to do a quick survey of the pagetable
manipulation systems already in the kernel (that I know about), what purposes
they serve and what capabilities they have. Then I'd like to discuss some ideas
for the scope of a new "library" and which of these components it might replace.
Mike Rapoport has shared a prototype that he wrote for a generic higher-level
PGD abstraction, so I will be using that as inspiration.
This is mostly about looking for feedback and input from maintainers and
experts: what opportunities for refactoring might I be missing? What challenges
might I be forgetting about for sharing code?
[0] https://lpc.events/event/19/contributions/2029/
[1] https://lore.kernel.org/all/20260126164445.11867-1-kalyazin@amazon.com/
* Re: [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
2026-02-19 17:51 [LSF/MM/BPF TOPIC] A pagetable library for the kernel? Brendan Jackman
@ 2026-02-23 11:28 ` Mike Rapoport
2026-02-25 17:06 ` Brendan Jackman
2026-02-26 23:57 ` Isaac Manjarres
` (2 subsequent siblings)
3 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2026-02-23 11:28 UTC
To: Brendan Jackman; +Cc: lsf-pc, linux-mm
On Thu, Feb 19, 2026 at 05:51:09PM +0000, Brendan Jackman wrote:
> As work on Address Space Isolation [0] trudges slowly along (next series coming
> soon™... I promise... some details of the plan are in [0]) I've been running
> into a common issue whenever I try to do new stuff with the kernel address
> space: We have too many sets of pagetable manipulation routines, and yet we
> don't have one that suits ASI's needs.
>
> Similarly, I'm currently working on support for efficiently unmapping
> guest_memfd pages from the physmap (an extension to [1]) - in this case I've run
> into very much the same issues as with ASI.
>
> Here are some areas of the kernel that manipulate pagetables:
>
> 1. The collection of APIs that are specific to userspace pagetables: mmu_gather,
> mm/pagewalk.c, some vm_fault logic, all that good stuff.
>
> 2. The set_memory_* and set_direct_map_* APIs. (Which are implemented per-arch).
>
> 3. Some non-userspace-specific APIs in mm/memory.c, such as
> apply_to_page_range().
>
> 4. mm/vmalloc.c
>
> 5. Highmem logic such as kmap_local_*
>
> 6. Boot and memory-hotplug support code (your architecture's version of
> arch/x86/mm/init_64.c).
>
> 7. x86's KPTI
>
> 8. x86's LDT logic
>
> (At LPC I started enumerating these off the top of my head and multiple people
> spoke out with more examples I hadn't thought of - please join in if you can see
> more!)
>
> By and large, these components are designed completely independently from one
> another. This is made possible by the smart design of the low-level helper API
> (pte_present() and friends), and it does lead to nice explicit coding style.
By and large, lots of functionality that deals with kernel page tables was
added ad hoc, e.g. adopting set_memory(), designed for DEBUG_PAGEALLOC,
for protecting kernel and module code.
I think it's a good idea to have a generic abstraction that can deal with
kernel page tables, like the one we have for the userspace page tables.
--
Sincerely yours,
Mike.
* Re: [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
2026-02-23 11:28 ` Mike Rapoport
@ 2026-02-25 17:06 ` Brendan Jackman
0 siblings, 0 replies; 6+ messages in thread
From: Brendan Jackman @ 2026-02-25 17:06 UTC
To: Mike Rapoport, Brendan Jackman; +Cc: lsf-pc, linux-mm, owner-linux-mm
On Mon Feb 23, 2026 at 11:28 AM UTC, Mike Rapoport wrote:
> On Thu, Feb 19, 2026 at 05:51:09PM +0000, Brendan Jackman wrote:
>> As work on Address Space Isolation [0] trudges slowly along (next series coming
>> soon™... I promise... some details of the plan are in [0]) I've been running
>> into a common issue whenever I try to do new stuff with the kernel address
>> space: We have too many sets of pagetable manipulation routines, and yet we
>> don't have one that suits ASI's needs.
>>
>> Similarly, I'm currently working on support for efficiently unmapping
>> guest_memfd pages from the physmap (an extension to [1]) - in this case I've run
>> into very much the same issues as with ASI.
>>
>> Here are some areas of the kernel that manipulate pagetables:
>>
>> 1. The collection of APIs that are specific to userspace pagetables: mmu_gather,
>> mm/pagewalk.c, some vm_fault logic, all that good stuff.
>>
>> 2. The set_memory_* and set_direct_map_* APIs. (Which are implemented per-arch).
>>
>> 3. Some non-userspace-specific APIs in mm/memory.c, such as
>> apply_to_page_range().
>>
>> 4. mm/vmalloc.c
>>
>> 5. Highmem logic such as kmap_local_*
>>
>> 6. Boot and memory-hotplug support code (your architecture's version of
>> arch/x86/mm/init_64.c).
>>
>> 7. x86's KPTI
>>
>> 8. x86's LDT logic
>>
>> (At LPC I started enumerating these off the top of my head and multiple people
>> spoke out with more examples I hadn't thought of - please join in if you can see
>> more!)
>>
>> By and large, these components are designed completely independently from one
>> another. This is made possible by the smart design of the low-level helper API
>> (pte_present() and friends), and it does lead to nice explicit coding style.
>
> By and large, lots of functionality that deals with kernel page tables was
> added ad hoc, e.g. adopting set_memory(), designed for DEBUG_PAGEALLOC,
> for protecting kernel and module code.
That makes sense.
I've also just posted an RFC that does more awkward ad-hoc manipulation:
https://lore.kernel.org/all/20260225-page_alloc-unmapped-v1-4-e8808a03cd66@google.com/
This might help illustrate the kinda thing that we could benefit from
with a more general library, besides just deduplicating code.
* Re: [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
2026-02-19 17:51 [LSF/MM/BPF TOPIC] A pagetable library for the kernel? Brendan Jackman
2026-02-23 11:28 ` Mike Rapoport
@ 2026-02-26 23:57 ` Isaac Manjarres
2026-03-26 21:49 ` Jason Gunthorpe
2026-05-11 11:54 ` Brendan Jackman
3 siblings, 0 replies; 6+ messages in thread
From: Isaac Manjarres @ 2026-02-26 23:57 UTC
To: Brendan Jackman; +Cc: lsf-pc, linux-mm, rppt
On Thu, Feb 19, 2026 at 05:51:09PM +0000, Brendan Jackman wrote:
> As work on Address Space Isolation [0] trudges slowly along (next series coming
> soon™... I promise... some details of the plan are in [0]) I've been running
> into a common issue whenever I try to do new stuff with the kernel address
> space: We have too many sets of pagetable manipulation routines, and yet we
> don't have one that suits ASI's needs.
>
> Similarly, I'm currently working on support for efficiently unmapping
> guest_memfd pages from the physmap (an extension to [1]) - in this case I've run
> into very much the same issues as with ASI.
>
> Here are some areas of the kernel that manipulate pagetables:
>
> 1. The collection of APIs that are specific to userspace pagetables: mmu_gather,
> mm/pagewalk.c, some vm_fault logic, all that good stuff.
>
> 2. The set_memory_* and set_direct_map_* APIs. (Which are implemented per-arch).
>
> 3. Some non-userspace-specific APIs in mm/memory.c, such as
> apply_to_page_range().
>
> 4. mm/vmalloc.c
>
> 5. Highmem logic such as kmap_local_*
>
> 6. Boot and memory-hotplug support code (your architecture's version of
> arch/x86/mm/init_64.c).
>
> 7. x86's KPTI
>
> 8. x86's LDT logic
>
> (At LPC I started enumerating these off the top of my head and multiple people
> spoke out with more examples I hadn't thought of - please join in if you can see
> more!)
>
> By and large, these components are designed completely independently from one
> another. This is made possible by the smart design of the low-level helper API
> (pte_present() and friends), and it does lead to nice explicit coding style.
>
> Here are some "new" things I've wanted to do with pagetables, which are not
> currently supported by any library:
>
> - Have a second kernel pagetable (for ASI's "nonsensitive address space")
>
> - Modify pagetables safely from a context where allocation is not possible
>
> - Modify the kernel's pagetables while accounting pagetable allocations to the
> current process
>
> I think it's time to discuss if there's a way to scope out a "library" that:
>
> a) Reduces the overall amount of code in the kernel, while
>
> b) Serving the needs of the incoming guest_memfd and ASI features.
>
> In this session I'd first like to do a quick survey of the pagetable
> manipulation systems already in the kernel (that I know about), what purposes
> they serve and what capabilities they have. Then I'd like to discuss some ideas
> for the scope of a new "library" and which of these components it might replace.
>
> Mike Rapoport has shared a prototype that he wrote for a generic higher-level
> PGD abstraction, so I will be using that as inspiration.
>
> This is mostly about looking for feedback and input from maintainers and
> experts: what opportunities for refactoring might I be missing? What challenges
> might I be forgetting about for sharing code?
>
> [0] https://lpc.events/event/19/contributions/2029/
> [1] https://lore.kernel.org/all/20260126164445.11867-1-kalyazin@amazon.com/
Hello Brendan,
Thanks for sharing this! I think it's a great idea to introduce a library
like this for the kernel page tables. I'm interested in participating
in this discussion as well.
Thanks,
Isaac
* Re: [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
2026-02-19 17:51 [LSF/MM/BPF TOPIC] A pagetable library for the kernel? Brendan Jackman
2026-02-23 11:28 ` Mike Rapoport
2026-02-26 23:57 ` Isaac Manjarres
@ 2026-03-26 21:49 ` Jason Gunthorpe
2026-05-11 11:54 ` Brendan Jackman
3 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2026-03-26 21:49 UTC
To: Brendan Jackman; +Cc: lsf-pc, linux-mm, rppt
On Thu, Feb 19, 2026 at 05:51:09PM +0000, Brendan Jackman wrote:
> 1. The collection of APIs that are specific to userspace pagetables: mmu_gather,
> mm/pagewalk.c, some vm_fault logic, all that good stuff.
I have done something for iommu that has a very general part that may
very well be a useful component of this:
https://docs.kernel.org/driver-api/generic_pt.html
https://github.com/torvalds/linux/tree/master/drivers/iommu/generic_pt
There are lots of things that need various page table algorithms.
Jason
* Re: [LSF/MM/BPF TOPIC] A pagetable library for the kernel?
2026-02-19 17:51 [LSF/MM/BPF TOPIC] A pagetable library for the kernel? Brendan Jackman
` (2 preceding siblings ...)
2026-03-26 21:49 ` Jason Gunthorpe
@ 2026-05-11 11:54 ` Brendan Jackman
3 siblings, 0 replies; 6+ messages in thread
From: Brendan Jackman @ 2026-05-11 11:54 UTC
To: Brendan Jackman, lsf-pc; +Cc: linux-mm, rppt, owner-linux-mm
Slides: https://docs.google.com/presentation/d/1zcqEqRrpPR_K8LwcNIH8XpL7mk7mMcI_N1jJ67OWpYA/edit?slide=id.p#slide=id.p
Recap: We did not actually get onto the topic described in the proposal
at all! This was partly because, while preparing for the session, I began
to realise that a grand vision is not sensible here. Instead, I
think we just want to take incremental steps to share code in a small
number of reasonably obvious cases. Therefore, I allowed side
discussions to run on as long as they needed to, since I think they were
actually more interesting than the "real" topic.
I think these were the most important points of discussion:
1. This has a certain amount of similarity with Gregory Price's proposal
[1] formerly known as "private memory".
I think the most important overlap was that both of our proposals
involve adding GFP flags, which is quite unpopular.
2. I proposed that in order to avoid __GFP_UNMAPPED, I would explore
trying to make the unmapped allocation feature "private" to the page
cache, giving it some sort of "side channel" to the allocator
without exposing a dangerous API or consuming a GFP bit.
People didn't seem too offended by that idea so I do plan to pursue
this, although it's still quite a vague proposition.
3. Although there are issues with the interfaces, the core page_alloc.c
code is in a reviewable shape and I would LOVE to get some review of
that. The relevant patch is here: [2]
https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-19-28bf1bd54f41@google.com/
(also the subsequent patch to add __GFP_ZERO support).
Also, from the hallway track: I am sorry that "mermap" looks like
"mremap"! I only noticed this after I had already sent proposals using
the new name. I'm not gonna propose a _third_ name for this thing
(I originally called it "ephmap" but this is extremely confusing when
vocalised), but I'm happy to return to the subject when all the
technical challenges are settled and we're talking about actually merging
commits.
[1] https://lore.kernel.org/all/af9i7dkNvGGxPHzu@gourry-fedora-PF4VCD3F/