From: Brendan Jackman <jackmanb@google.com>
To: Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Lorenzo Stoakes <ljs@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
rppt@kernel.org, Sumit Garg <sumit.garg@oss.qualcomm.com>,
derkling@google.com, reijiw@google.com,
Will Deacon <will@kernel.org>,
rientjes@google.com, "Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
patrick.roy@linux.dev, "Itazuri, Takahiro" <itazur@amazon.co.uk>,
Andy Lutomirski <luto@kernel.org>,
David Kaplan <david.kaplan@amd.com>,
Thomas Gleixner <tglx@kernel.org>,
Brendan Jackman <jackmanb@google.com>,
Yosry Ahmed <yosry@kernel.org>
Subject: [PATCH v2 02/22] x86/mm: Generalize LDT remap into "mm-local region"
Date: Fri, 20 Mar 2026 18:23:26 +0000
Message-ID: <20260320-page_alloc-unmapped-v2-2-28bf1bd54f41@google.com>
In-Reply-To: <20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com>

Various security features benefit from having process-local address
mappings. Examples include no-direct-map guest_memfd [2] and significant
optimizations for ASI [1].
As pointed out by Andy in [0], x86 already has a PGD entry that is local
to the mm, which is used for the LDT.
So, simply redefine that entry's region as "the mm-local region" and
then redefine the LDT region as a sub-region of that.
With the currently-envisaged usecases, there will be many situations
where almost no processes have any need for the mm-local region.
Therefore, avoid its overhead (memory cost of pagetables, alloc/free
overhead during fork/exit) for processes that don't use it by requiring
its users to explicitly initialize it via the new mm_local_* API.
This means that the LDT remap code can be simplified:
1. map_ldt_struct_to_user() and free_ldt_pgtables() are no longer
required as the mm_local core code handles that automatically.
2. The sanity-check logic is unified: in both cases just walk the
pagetables via a generic mechanism. This slightly relaxes the
sanity-checking since lookup_address_in_pgd() is more flexible than
pgd_to_pmd_walk(), but this seems to be worth it for the simplified
code.
On 64-bit, the mm-local region gets a whole PGD entry. On 32-bit, it gets
just one PMD, i.e. it is completely consumed by the LDT remap - no
investigation has been done into whether it's feasible to expand the
region on 32-bit. Most likely there is no strong usecase for that
anyway.
In both cases, to combine on-demand initialisation with transparent
propagation of mappings to userspace under KPTI, the user and kernel
pagetables are shared at the highest level possible. For PAE that means
the PTE table is shared and for 64-bit the P4D/PUD. This is implemented
by pre-allocating the first shared table when the mm-local region is
first initialised.
The PAE implementation of mm_local_map_to_user() does not allocate
pagetables, it assumes the PMD has been preallocated. To make that
assumption safer, expose PREALLOCATED_PMDS in the arch headers so that
mm_local_map_to_user() can have a BUILD_BUG_ON().
[0] https://lore.kernel.org/linux-mm/CALCETrXHbS9VXfZ80kOjiTrreM2EbapYeGp68mvJPbosUtorYA@mail.gmail.com/
[1] https://linuxasi.dev/
[2] https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Documentation/arch/x86/x86_64/mm.rst | 4 +-
arch/x86/Kconfig | 2 +
arch/x86/include/asm/mmu_context.h | 119 ++++++++++++++++++++++++++++-
arch/x86/include/asm/page.h | 32 ++++++++
arch/x86/include/asm/pgtable_32_areas.h | 9 ++-
arch/x86/include/asm/pgtable_64_types.h | 12 ++-
arch/x86/kernel/ldt.c | 130 +++++---------------------------
arch/x86/mm/pgtable.c | 32 +-------
include/linux/mm.h | 13 ++++
include/linux/mm_types.h | 2 +
kernel/fork.c | 1 +
mm/Kconfig | 11 +++
12 files changed, 217 insertions(+), 150 deletions(-)
diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst
index a6cf05d51bd8c..fa2bb7bab6a42 100644
--- a/Documentation/arch/x86/x86_64/mm.rst
+++ b/Documentation/arch/x86/x86_64/mm.rst
@@ -53,7 +53,7 @@ Complete virtual memory map with 4-level page tables
____________________________________________________________|___________________________________________________________
| | | |
ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
- ffff880000000000 | -120 TB | ffff887fffffffff | 0.5 TB | LDT remap for PTI
+ ffff880000000000 | -120 TB | ffff887fffffffff | 0.5 TB | MM-local kernel data. Includes LDT remap for PTI
ffff888000000000 | -119.5 TB | ffffc87fffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
ffffc88000000000 | -55.5 TB | ffffc8ffffffffff | 0.5 TB | ... unused hole
ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
@@ -123,7 +123,7 @@ Complete virtual memory map with 5-level page tables
____________________________________________________________|___________________________________________________________
| | | |
ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
- ff10000000000000 | -60 PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
+ ff10000000000000 | -60 PB | ff10ffffffffffff | 0.25 PB | MM-local kernel data. Includes LDT remap for PTI
ff11000000000000 | -59.75 PB | ff90ffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base)
ff91000000000000 | -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8038b26ae99e0..d7073b6077c62 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -133,6 +133,7 @@ config X86
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
+ select ARCH_SUPPORTS_MM_LOCAL_REGION if X86_64 || X86_PAE
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if X86_CX8
select ARCH_USE_MEMTEST
@@ -2323,6 +2324,7 @@ config CMDLINE_OVERRIDE
config MODIFY_LDT_SYSCALL
bool "Enable the LDT (local descriptor table)" if EXPERT
default y
+ select MM_LOCAL_REGION if MITIGATION_PAGE_TABLE_ISOLATION || X86_PAE
help
Linux can allow user programs to install a per-process x86
Local Descriptor Table (LDT) using the modify_ldt(2) system
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index ef5b507de34e2..14f75d1d7e28f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -8,8 +8,10 @@
#include <trace/events/tlb.h>
+#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
+#include <asm/pgalloc.h>
#include <asm/debugreg.h>
#include <asm/gsseg.h>
#include <asm/desc.h>
@@ -59,7 +61,6 @@ static inline void init_new_context_ldt(struct mm_struct *mm)
}
int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
void destroy_context_ldt(struct mm_struct *mm);
-void ldt_arch_exit_mmap(struct mm_struct *mm);
#else /* CONFIG_MODIFY_LDT_SYSCALL */
static inline void init_new_context_ldt(struct mm_struct *mm) { }
static inline int ldt_dup_context(struct mm_struct *oldmm,
@@ -68,7 +69,6 @@ static inline int ldt_dup_context(struct mm_struct *oldmm,
return 0;
}
static inline void destroy_context_ldt(struct mm_struct *mm) { }
-static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { }
#endif
#ifdef CONFIG_MODIFY_LDT_SYSCALL
@@ -223,10 +223,123 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
return ldt_dup_context(oldmm, mm);
}
+#ifdef CONFIG_MM_LOCAL_REGION
+static inline void mm_local_region_free(struct mm_struct *mm)
+{
+ if (mm_local_region_used(mm)) {
+ struct mmu_gather tlb;
+ unsigned long start = MM_LOCAL_BASE_ADDR;
+ unsigned long end = MM_LOCAL_END_ADDR;
+
+ /*
+ * Although free_pgd_range() is intended for freeing user
+ * page-tables, it also works out for kernel mappings on x86.
+ * We use tlb_gather_mmu_fullmm() to avoid confusing the
+ * range-tracking logic in __tlb_adjust_range().
+ */
+ tlb_gather_mmu_fullmm(&tlb, mm);
+ free_pgd_range(&tlb, start, end, start, end);
+ tlb_finish_mmu(&tlb);
+
+ mm_flags_clear(MMF_LOCAL_REGION_USED, mm);
+ }
+}
+
+#if defined(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) && defined(CONFIG_X86_PAE)
+static inline pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
+{
+ p4d_t *p4d;
+ pud_t *pud;
+
+ if (pgd->pgd == 0)
+ return NULL;
+
+ p4d = p4d_offset(pgd, va);
+ if (p4d_none(*p4d))
+ return NULL;
+
+ pud = pud_offset(p4d, va);
+ if (pud_none(*pud))
+ return NULL;
+
+ return pmd_offset(pud, va);
+}
+
+static inline int mm_local_map_to_user(struct mm_struct *mm)
+{
+ BUILD_BUG_ON(!PREALLOCATED_PMDS);
+ pgd_t *k_pgd = pgd_offset(mm, MM_LOCAL_BASE_ADDR);
+ pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+ pmd_t *k_pmd, *u_pmd;
+ int err;
+
+ k_pmd = pgd_to_pmd_walk(k_pgd, MM_LOCAL_BASE_ADDR);
+ u_pmd = pgd_to_pmd_walk(u_pgd, MM_LOCAL_BASE_ADDR);
+
+ BUILD_BUG_ON(MM_LOCAL_END_ADDR - MM_LOCAL_BASE_ADDR > PMD_SIZE);
+
+ /* Preallocate the PTE table so it can be shared. */
+ err = pte_alloc(mm, k_pmd);
+ if (err)
+ return err;
+
+ /* Point the userspace PMD at the same PTE as the kernel PMD. */
+ set_pmd(u_pmd, *k_pmd);
+ return 0;
+}
+#elif defined(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)
+static inline int mm_local_map_to_user(struct mm_struct *mm)
+{
+ pgd_t *pgd;
+ int err;
+
+ err = preallocate_sub_pgd(mm, MM_LOCAL_BASE_ADDR);
+ if (err)
+ return err;
+
+ pgd = pgd_offset(mm, MM_LOCAL_BASE_ADDR);
+ set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+ return 0;
+}
+#else
+static inline int mm_local_map_to_user(struct mm_struct *mm)
+{
+ WARN_ONCE(1, "mm_local_map_to_user() not implemented");
+ return -EINVAL;
+}
+#endif
+
+/*
+ * Do initial setup of the mm-local region. Call from process context.
+ *
+ * Under PTI, userspace shares the pagetables for the mm-local region with the
+ * kernel (anything mapped here is immediately mapped into userspace too), as
+ * the LDT remap requires. This assumes nothing gets mapped in here that needs
+ * to be protected from Meltdown-type attacks by the current process.
+ */
+static inline int mm_local_region_init(struct mm_struct *mm)
+{
+ int err;
+
+ if (boot_cpu_has(X86_FEATURE_PTI)) {
+ err = mm_local_map_to_user(mm);
+ if (err)
+ return err;
+ }
+
+ mm_flags_set(MMF_LOCAL_REGION_USED, mm);
+
+ return 0;
+}
+
+#else
+static inline void mm_local_region_free(struct mm_struct *mm) { }
+#endif /* CONFIG_MM_LOCAL_REGION */
+
static inline void arch_exit_mmap(struct mm_struct *mm)
{
paravirt_arch_exit_mmap(mm);
- ldt_arch_exit_mmap(mm);
+ mm_local_region_free(mm);
}
#ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 416dc88e35c15..4de4715c3b40f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -78,6 +78,38 @@ static __always_inline u64 __is_canonical_address(u64 vaddr, u8 vaddr_bits)
return __canonical_address(vaddr, vaddr_bits) == vaddr;
}
+#ifdef CONFIG_X86_PAE
+
+/*
+ * In PAE mode, we need to do a cr3 reload (=tlb flush) when
+ * updating the top-level pagetable entries to guarantee the
+ * processor notices the update. Since this is expensive, and
+ * all 4 top-level entries are used almost immediately in a
+ * new process's life, we just pre-populate them here.
+ */
+#define PREALLOCATED_PMDS PTRS_PER_PGD
+/*
+ * "USER_PMDS" are the PMDs for the user copy of the page tables when
+ * PTI is enabled. They do not exist when PTI is disabled. Note that
+ * this is distinct from the user _portion_ of the kernel page tables
+ * which always exists.
+ *
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS (boot_cpu_has(X86_FEATURE_PTI) ? KERNEL_PGD_PTRS : 0)
+#define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
+
+#else /* !CONFIG_X86_PAE */
+
+/* No need to prepopulate any pagetable entries in non-PAE modes. */
+#define PREALLOCATED_PMDS 0
+#define PREALLOCATED_USER_PMDS 0
+#define MAX_PREALLOCATED_USER_PMDS 0
+
+#endif /* CONFIG_X86_PAE */
+
#endif /* __ASSEMBLER__ */
#include <asm-generic/memory_model.h>
diff --git a/arch/x86/include/asm/pgtable_32_areas.h b/arch/x86/include/asm/pgtable_32_areas.h
index 921148b429676..7fccb887f8b33 100644
--- a/arch/x86/include/asm/pgtable_32_areas.h
+++ b/arch/x86/include/asm/pgtable_32_areas.h
@@ -30,9 +30,14 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
#define CPU_ENTRY_AREA_BASE \
((FIXADDR_TOT_START - PAGE_SIZE*(CPU_ENTRY_AREA_PAGES+1)) & PMD_MASK)
-#define LDT_BASE_ADDR \
- ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
+/*
+ * On 32-bit the mm-local region is currently completely consumed by the LDT
+ * remap.
+ */
+#define MM_LOCAL_BASE_ADDR ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
+#define MM_LOCAL_END_ADDR (MM_LOCAL_BASE_ADDR + PMD_SIZE)
+#define LDT_BASE_ADDR MM_LOCAL_BASE_ADDR
#define LDT_END_ADDR (LDT_BASE_ADDR + PMD_SIZE)
#define PKMAP_BASE \
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 7eb61ef6a185f..1181565966405 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,8 +5,11 @@
#include <asm/sparsemem.h>
#ifndef __ASSEMBLER__
+#include <linux/build_bug.h>
#include <linux/types.h>
#include <asm/kaslr.h>
+#include <asm/page_types.h>
+#include <uapi/asm/ldt.h>
/*
* These are used to make use of C type-checking..
@@ -100,9 +103,12 @@ extern unsigned int ptrs_per_p4d;
#define GUARD_HOLE_BASE_ADDR (GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT)
#define GUARD_HOLE_END_ADDR (GUARD_HOLE_BASE_ADDR + GUARD_HOLE_SIZE)
-#define LDT_PGD_ENTRY -240UL
-#define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
-#define LDT_END_ADDR (LDT_BASE_ADDR + PGDIR_SIZE)
+#define MM_LOCAL_PGD_ENTRY -240UL
+#define MM_LOCAL_BASE_ADDR (MM_LOCAL_PGD_ENTRY << PGDIR_SHIFT)
+#define MM_LOCAL_END_ADDR ((MM_LOCAL_PGD_ENTRY + 1) << PGDIR_SHIFT)
+
+#define LDT_BASE_ADDR MM_LOCAL_BASE_ADDR
+#define LDT_END_ADDR (LDT_BASE_ADDR + PMD_SIZE)
#define __VMALLOC_BASE_L4 0xffffc90000000000UL
#define __VMALLOC_BASE_L5 0xffa0000000000000UL
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 40c5bf97dd5cc..fb2a1914539f8 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -31,6 +31,8 @@
#include <xen/xen.h>
+/* LDTs are double-buffered, the buffers are called slots. */
+#define LDT_NUM_SLOTS 2
/* This is a multiple of PAGE_SIZE. */
#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
@@ -186,100 +188,36 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
-static void do_sanity_check(struct mm_struct *mm,
- bool had_kernel_mapping,
- bool had_user_mapping)
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
{
+ pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+ unsigned int k_level, u_level;
+ bool had_kernel, had_user;
+
+ had_kernel = lookup_address_in_pgd(k_pgd, LDT_BASE_ADDR, &k_level);
+ had_user = lookup_address_in_pgd(u_pgd, LDT_BASE_ADDR, &u_level);
+
if (mm->context.ldt) {
/*
* We already had an LDT. The top-level entry should already
* have been allocated and synchronized with the usermode
* tables.
*/
- WARN_ON(!had_kernel_mapping);
+ WARN_ON(!had_kernel);
if (boot_cpu_has(X86_FEATURE_PTI))
- WARN_ON(!had_user_mapping);
+ WARN_ON(!had_user);
} else {
/*
* This is the first time we're mapping an LDT for this process.
* Sync the pgd to the usermode tables.
*/
- WARN_ON(had_kernel_mapping);
+ WARN_ON(had_kernel);
if (boot_cpu_has(X86_FEATURE_PTI))
- WARN_ON(had_user_mapping);
+ WARN_ON(had_user);
}
}
-#ifdef CONFIG_X86_PAE
-
-static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
-{
- p4d_t *p4d;
- pud_t *pud;
-
- if (pgd->pgd == 0)
- return NULL;
-
- p4d = p4d_offset(pgd, va);
- if (p4d_none(*p4d))
- return NULL;
-
- pud = pud_offset(p4d, va);
- if (pud_none(*pud))
- return NULL;
-
- return pmd_offset(pud, va);
-}
-
-static void map_ldt_struct_to_user(struct mm_struct *mm)
-{
- pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
- pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
- pmd_t *k_pmd, *u_pmd;
-
- k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
- u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
-
- if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
- set_pmd(u_pmd, *k_pmd);
-}
-
-static void sanity_check_ldt_mapping(struct mm_struct *mm)
-{
- pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
- pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
- bool had_kernel, had_user;
- pmd_t *k_pmd, *u_pmd;
-
- k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
- u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
- had_kernel = (k_pmd->pmd != 0);
- had_user = (u_pmd->pmd != 0);
-
- do_sanity_check(mm, had_kernel, had_user);
-}
-
-#else /* !CONFIG_X86_PAE */
-
-static void map_ldt_struct_to_user(struct mm_struct *mm)
-{
- pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
-
- if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
- set_pgd(kernel_to_user_pgdp(pgd), *pgd);
-}
-
-static void sanity_check_ldt_mapping(struct mm_struct *mm)
-{
- pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
- bool had_kernel = (pgd->pgd != 0);
- bool had_user = (kernel_to_user_pgdp(pgd)->pgd != 0);
-
- do_sanity_check(mm, had_kernel, had_user);
-}
-
-#endif /* CONFIG_X86_PAE */
-
/*
* If PTI is enabled, this maps the LDT into the kernelmode and
* usermode tables for the given mm.
@@ -295,6 +233,8 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
if (!boot_cpu_has(X86_FEATURE_PTI))
return 0;
+ mm_local_region_init(mm);
+
/*
* Any given ldt_struct should have map_ldt_struct() called at most
* once.
@@ -339,9 +279,6 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
pte_unmap_unlock(ptep, ptl);
}
- /* Propagate LDT mapping to the user page-table */
- map_ldt_struct_to_user(mm);
-
ldt->slot = slot;
return 0;
}
@@ -390,28 +327,6 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
}
#endif /* CONFIG_MITIGATION_PAGE_TABLE_ISOLATION */
-static void free_ldt_pgtables(struct mm_struct *mm)
-{
-#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
- struct mmu_gather tlb;
- unsigned long start = LDT_BASE_ADDR;
- unsigned long end = LDT_END_ADDR;
-
- if (!boot_cpu_has(X86_FEATURE_PTI))
- return;
-
- /*
- * Although free_pgd_range() is intended for freeing user
- * page-tables, it also works out for kernel mappings on x86.
- * We use tlb_gather_mmu_fullmm() to avoid confusing the
- * range-tracking logic in __tlb_adjust_range().
- */
- tlb_gather_mmu_fullmm(&tlb, mm);
- free_pgd_range(&tlb, start, end, start, end);
- tlb_finish_mmu(&tlb);
-#endif
-}
-
/* After calling this, the LDT is immutable. */
static void finalize_ldt_struct(struct ldt_struct *ldt)
{
@@ -472,7 +387,6 @@ int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
retval = map_ldt_struct(mm, new_ldt, 0);
if (retval) {
- free_ldt_pgtables(mm);
free_ldt_struct(new_ldt);
goto out_unlock;
}
@@ -494,11 +408,6 @@ void destroy_context_ldt(struct mm_struct *mm)
mm->context.ldt = NULL;
}
-void ldt_arch_exit_mmap(struct mm_struct *mm)
-{
- free_ldt_pgtables(mm);
-}
-
static int read_ldt(void __user *ptr, unsigned long bytecount)
{
struct mm_struct *mm = current->mm;
@@ -645,10 +554,9 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
/*
* This only can fail for the first LDT setup. If an LDT is
* already installed then the PTE page is already
- * populated. Mop up a half populated page table.
+ * populated.
*/
- if (!WARN_ON_ONCE(old_ldt))
- free_ldt_pgtables(mm);
+ WARN_ON_ONCE(!old_ldt);
free_ldt_struct(new_ldt);
goto out_unlock;
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 2e5ecfdce73c3..e4132696c9ef2 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -111,29 +111,6 @@ static void pgd_dtor(pgd_t *pgd)
*/
#ifdef CONFIG_X86_PAE
-/*
- * In PAE mode, we need to do a cr3 reload (=tlb flush) when
- * updating the top-level pagetable entries to guarantee the
- * processor notices the update. Since this is expensive, and
- * all 4 top-level entries are used almost immediately in a
- * new process's life, we just pre-populate them here.
- */
-#define PREALLOCATED_PMDS PTRS_PER_PGD
-
-/*
- * "USER_PMDS" are the PMDs for the user copy of the page tables when
- * PTI is enabled. They do not exist when PTI is disabled. Note that
- * this is distinct from the user _portion_ of the kernel page tables
- * which always exists.
- *
- * We allocate separate PMDs for the kernel part of the user page-table
- * when PTI is enabled. We need them to map the per-process LDT into the
- * user-space page-table.
- */
-#define PREALLOCATED_USER_PMDS (boot_cpu_has(X86_FEATURE_PTI) ? \
- KERNEL_PGD_PTRS : 0)
-#define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
-
void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
{
paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -150,12 +127,6 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
*/
flush_tlb_mm(mm);
}
-#else /* !CONFIG_X86_PAE */
-
-/* No need to prepopulate any pagetable entries in non-PAE modes. */
-#define PREALLOCATED_PMDS 0
-#define PREALLOCATED_USER_PMDS 0
-#define MAX_PREALLOCATED_USER_PMDS 0
#endif /* CONFIG_X86_PAE */
static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
@@ -375,6 +346,9 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
void pgd_free(struct mm_struct *mm, pgd_t *pgd)
{
+ /* Should be cleaned up in mmap exit path. */
+ VM_WARN_ON_ONCE(mm_local_region_used(mm));
+
pgd_mop_up_pmds(mm, pgd);
pgd_dtor(pgd);
paravirt_pgd_free(mm, pgd);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 70747b53c7da9..413dc707cff9b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -906,6 +906,19 @@ static inline void mm_flags_clear_all(struct mm_struct *mm)
bitmap_zero(ACCESS_PRIVATE(&mm->flags, __mm_flags), NUM_MM_FLAG_BITS);
}
+#ifdef CONFIG_MM_LOCAL_REGION
+static inline bool mm_local_region_used(struct mm_struct *mm)
+{
+ return mm_flags_test(MMF_LOCAL_REGION_USED, mm);
+}
+#else
+static inline bool mm_local_region_used(struct mm_struct *mm)
+{
+ VM_WARN_ON_ONCE(mm_flags_test(MMF_LOCAL_REGION_USED, mm));
+ return false;
+}
+#endif
+
extern const struct vm_operations_struct vma_dummy_vm_ops;
static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cee934c6e78ec..0ca7cb7da918f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1944,6 +1944,8 @@ enum {
#define MMF_USER_HWCAP 32 /* user-defined HWCAPs */
+#define MMF_LOCAL_REGION_USED 33
+
#define MMF_INIT_LEGACY_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
diff --git a/kernel/fork.c b/kernel/fork.c
index 68cf0109dde3c..ff075c74333fe 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1153,6 +1153,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
fail_nocontext:
mm_free_id(mm);
fail_noid:
+ WARN_ON_ONCE(mm_local_region_used(mm));
mm_free_pgd(mm);
fail_nopgd:
futex_hash_free(mm);
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687e..2813059df9c1c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1319,6 +1319,10 @@ config SECRETMEM
default y
bool "Enable memfd_secret() system call" if EXPERT
depends on ARCH_HAS_SET_DIRECT_MAP
+ # Soft dependency, for optimisation.
+ imply MM_LOCAL_REGION
+ imply MERMAP
+ imply PAGE_ALLOC_UNMAPPED
help
Enable the memfd_secret() system call with the ability to create
memory areas visible only in the context of the owning process and
@@ -1471,6 +1475,13 @@ config LAZY_MMU_MODE_KUNIT_TEST
If unsure, say N.
+config ARCH_SUPPORTS_MM_LOCAL_REGION
+ def_bool n
+
+config MM_LOCAL_REGION
+ bool
+ depends on ARCH_SUPPORTS_MM_LOCAL_REGION
+
source "mm/damon/Kconfig"
endmenu
--
2.51.2