* [PATCH 00/21] mm: ASI direct map management
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
As per [0] I think ASI is ready to start merging. This is the first
step. The scope of this series is: everything needed to set up the
direct map in the restricted address spaces.
.:: Scope
Why is this the scope of the first series? The objective here is to
reach an MVP of ASI that people can actually run, as soon as possible.
Very broadly, this requires a) a restricted address space to exist and
b) a bunch of logic for transitioning in and out of it. An MVP of
ASI doesn't require too much flexibility w.r.t. the contents of the
restricted address space, but at least being able to omit user data from
the direct map seems like a good starting point. The rest of the address
space can be constructed trivially by just cloning the unrestricted
address space as illustrated in [1] (a commit from the branch published
in [0]), but that isn't included in this series; this series covers only
the direct map.
So this series focuses on part a). The alternative would be to focus on
part b) first, by instead just trivially creating the entire restricted
address space as a clone of the unrestricted one (i.e. starting from an
ASI that protects nothing).
.:: Design
Whether or not memory will be mapped into the restricted address space
("sensitivity") is determined at allocation time. This is encoded in a
new GFP flag called __GFP_SENSITIVE, which is added to GFP_USER. Some
early discussions questioned whether this GFP flag is really needed or
if we could instead determine sensitivity by some contextual hint. I'm
not aware of something that could provide this hint at the moment, but
if one exists I'd be happy to use it here. However, in the long term
it should be assumed that a GFP flag will need to appear eventually,
since we'll need to be able to annotate the sensitivity of pretty much
arbitrary memory.
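To make this concrete, here is a minimal sketch (not code from this
series, and only meaningful once nonsensitive allocation is actually
wired up - patch 6 temporarily forces __GFP_SENSITIVE on every
allocation) of how an allocation site would express sensitivity:

#include <linux/gfp.h>

/* Illustrative sketch only, not part of the series. */
static struct page *asi_sensitivity_example(void)
{
	/*
	 * GFP_USER now carries __GFP_SENSITIVE, so user data stays out of
	 * the ASI restricted address space.
	 */
	struct page *user_data = alloc_page(GFP_USER);

	/*
	 * A plain GFP_KERNEL allocation carries no __GFP_SENSITIVE and
	 * remains mapped under ASI; a caller holding secrets can opt in
	 * explicitly.
	 */
	struct page *secret = alloc_page(GFP_KERNEL | __GFP_SENSITIVE);

	if (user_data)
		__free_page(user_data);
	return secret;
}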
So, the important thing we end up needing to design here is what
the allocator does with __GFP_SENSITIVE. This was discussed in [2] and
at LSF/MM/BPF 2024 [3]. The allocator needs to be able to map and unmap
pages into the restricted address space. Problems with this are:
1. Changing mappings might require allocating pagetables (allocating
while allocating).
2. Unmapping pages requires a TLB shootdown, which is slow and anyway
can't be done with IRQs off.
3. Mapping pages into the restricted address space, in the general case,
requires zeroing them in case they contain leftover data that was
previously sensitive.
The simple solution for point 1 is to just set a minimum granularity at
which sensitivity can change, and pre-allocate direct map pagetables
down to that granularity. This suggests that pages need to be physically
grouped by sensitivity. The last two points illustrate that changing
sensitivity is highly undesirable from a performance point of view. All
of this adds up to needing to be able to index free pages by
sensitivity, leading to the conclusion that we want separate freelists
for sensitive and nonsensitive pages.
The page allocator already has a mechanism to physically group, and to
index pages, by a property, namely migratetype. So the approach taken
here is to extend this concept to additionally encode sensitivity. Thus,
when ASI is enabled, we basically double the number of free-page lists,
and add a pageblock flag that can be used to check a page's sensitivity
without needing to walk pagetables.
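To illustrate, one plausible way of folding the sensitivity bit into the
free-list index looks like the sketch below; the real implementation is
spread over "mm: introduce freetype_t" and "mm/asi: encode sensitivity
in freetypes and pageblocks" and may differ in detail:

/* Hypothetical sketch: pack (migratetype, sensitive) into one index. */
typedef struct {
	int type;
} freetype_t;

static inline freetype_t migrate_to_freetype(enum migratetype mt, bool sensitive)
{
	freetype_t ft = { .type = mt + (sensitive ? MIGRATE_TYPES : 0) };

	return ft;
}

static inline enum migratetype free_to_migratetype(freetype_t ft)
{
	return ft.type % MIGRATE_TYPES;
}

static inline bool freetype_sensitive(freetype_t ft)
{
	return ft.type >= MIGRATE_TYPES;
}

With an encoding like this, each zone carries MIGRATE_TYPES * 2 free
lists per order when ASI is enabled, and the new pageblock bit records
which half a given block currently belongs to.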
.:: Structure of the series
Some generic boilerplate for ASI:
x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
x86/mm/asi: add X86_FEATURE_ASI and asi=
Minimal ASI setup specifically for direct map management:
x86/mm: factor out phys_pgd_init()
x86/mm/asi: set up asi_nonsensitive_pgd
x86/mm/pat: mirror direct map changes to ASI
mm/page_alloc: add __GFP_SENSITIVE and always set it
Misc preparatory patches for easier review:
mm: introduce for_each_free_list()
mm: rejig pageblock mask definitions
mm/page_alloc: Invert is_check_pages_enabled() check
mm/page_alloc: remove ifdefs from pindex helpers
One very big annoying preparatory patch, separated to try and mitigate
review pain (sorry, I don't love this, but I think it's the best way):
mm: introduce freetype_t
The interesting bit where the actual functionality gets added:
mm/asi: encode sensitivity in freetypes and pageblocks
mm/page_alloc_test: unit test pindex helpers
x86/mm/pat: introduce cpa_fault option
mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER
mm/page_alloc: introduce ALLOC_NOBLOCK
mm/slub: defer application of gfp_allowed_mask
mm/asi: support changing pageblock sensitivity
Misc other stuff that feels just related enough to go in this series:
mm/asi: bad_page() when ASI mappings are wrong
x86/mm/asi: don't use global pages when ASI enabled
mm: asi_test: smoke test for [non]sensitive page allocs
.:: Testing
Google is running ASI in production, but that implementation is totally
different (the way we manage the direct map internally there is not
good; things are working nicely so far, but as we expand its footprint
we expect to run into an unfixable performance issue sooner or later).
Aside from the KUnit tests I've just tested this in a VM by running
these tests from run_vmtests.sh:
compaction, cow, migration, mmap, hugetlb
thp fails, but this also happens without these patches - I think it's a
bug with ksft_set_plan(); I'll try to investigate when I can.
Anyway, if anyone has more tests they'd like me to run, please let me
know. In particular, I don't think anything on the list above will
exercise CMA or memory hotplug, but I don't know a good way to do that.
Also note that, aside from the KUnit tests (which do a super minimal
check), nothing here verifies the actual validity of the restricted
address space; the testing is just to try and catch cases where ASI
breaks non-ASI logic.
If people are interested, I can start a kind of "asi-next" branch that
contains everything from this patchset plus all the remaining prototype
logic to actually run ASI. Let me know if that seems useful to you
(I will have to do it sooner or later for benchmarking anyway).
[0] [Discuss] First steps for ASI (ASI is fast again)
https://lore.kernel.org/all/20250812173109.295750-1-jackmanb@google.com/
[1] mm: asi: Share most of the kernel address space with unrestricted
https://github.com/bjackman/linux/commit/04fd7a0b0098a
[2] [PATCH RFC 00/11] mm: ASI integration for the page allocator
https://lore.kernel.org/lkml/20250313-asi-page-alloc-v1-0-04972e046cea@google.com/
[3] LSF/MM/BPF 2025 slides
https://docs.google.com/presentation/d/1waibhMBXhfJ2qVEz8KtXop9MZ6UyjlWmK71i0WIH7CY/edit?slide=id.p#slide=id.p
CP:
https://lore.kernel.org/all/20250129124034.2612562-1-jackmanb@google.com/
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Brendan Jackman (21):
x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
x86/mm/asi: add X86_FEATURE_ASI and asi=
x86/mm: factor out phys_pgd_init()
x86/mm/asi: set up asi_nonsensitive_pgd
x86/mm/pat: mirror direct map changes to ASI
mm/page_alloc: add __GFP_SENSITIVE and always set it
mm: introduce for_each_free_list()
mm: rejig pageblock mask definitions
mm/page_alloc: Invert is_check_pages_enabled() check
mm/page_alloc: remove ifdefs from pindex helpers
mm: introduce freetype_t
mm/asi: encode sensitivity in freetypes and pageblocks
mm/page_alloc_test: unit test pindex helpers
x86/mm/pat: introduce cpa_fault option
mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER
mm/page_alloc: introduce ALLOC_NOBLOCK
mm/slub: defer application of gfp_allowed_mask
mm/asi: support changing pageblock sensitivity
mm/asi: bad_page() when ASI mappings are wrong
x86/mm/asi: don't use global pages when ASI enabled
mm: asi_test: smoke test for [non]sensitive page allocs
Documentation/admin-guide/kernel-parameters.txt | 8 +
arch/Kconfig | 13 +
arch/x86/.kunitconfig | 7 +
arch/x86/Kconfig | 8 +
arch/x86/include/asm/asi.h | 19 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/set_memory.h | 13 +
arch/x86/mm/Makefile | 3 +
arch/x86/mm/asi.c | 47 ++
arch/x86/mm/asi_test.c | 145 ++++++
arch/x86/mm/init.c | 10 +-
arch/x86/mm/init_64.c | 54 +-
arch/x86/mm/pat/set_memory.c | 118 ++++-
include/linux/asi.h | 19 +
include/linux/gfp.h | 16 +-
include/linux/gfp_types.h | 15 +-
include/linux/mmzone.h | 98 +++-
include/linux/pageblock-flags.h | 24 +-
include/linux/set_memory.h | 8 +
include/trace/events/mmflags.h | 1 +
init/main.c | 1 +
kernel/panic.c | 2 +
kernel/power/snapshot.c | 7 +-
mm/Kconfig | 5 +
mm/Makefile | 1 +
mm/compaction.c | 32 +-
mm/init-mm.c | 3 +
mm/internal.h | 44 +-
mm/mm_init.c | 11 +-
mm/page_alloc.c | 664 +++++++++++++++++-------
mm/page_alloc_test.c | 70 +++
mm/page_isolation.c | 2 +-
mm/page_owner.c | 7 +-
mm/page_reporting.c | 4 +-
mm/show_mem.c | 2 +-
mm/slub.c | 4 +-
36 files changed, 1205 insertions(+), 281 deletions(-)
---
base-commit: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
change-id: 20250923-b4-asi-page-alloc-74b5383a72fc
Best regards,
--
Brendan Jackman <jackmanb@google.com>
* [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
This long awkward name is for consistency with
CONFIG_MITIGATION_PAGE_TABLE_ISOLATION.
In the short term, there isn't much arch code. In the medium term, it
will mostly be x86 code. So, put the code where it will need to go
instead of just having to move it soon.
In the long term, it should probably include other archs too, so
things should be as arch-specific as necessary, but not more so.
Follow the proposal by Mike Rapoport[0]: a generic header includes
NOP stubs for ASI definitions. If
CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION is defined then the asm/ tree
must have asi.h, and that gets included instead of the stubs.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/Kconfig | 13 +++++++++++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/asi.h | 5 +++++
include/linux/asi.h | 10 ++++++++++
4 files changed, 29 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index bae871976d36f7b6b2af0be40a067ca2b3fd3d14..ad99637630406e5a484173f5207bbd5a64b2bf1f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -17,6 +17,19 @@ config CPU_MITIGATIONS
def_bool y
endif
+config ARCH_HAS_MITIGATION_ADDRESS_SPACE_ISOLATION
+ bool
+
+config MITIGATION_ADDRESS_SPACE_ISOLATION
+ bool "Allow code to run with a reduced kernel address space"
+ default n
+ depends on ARCH_HAS_MITIGATION_ADDRESS_SPACE_ISOLATION
+ help
+ This feature provides the ability to run some kernel code
+ with a reduced kernel address space. This can be used to
+ mitigate some speculative execution attacks.
+
+ ASI is not yet ready for use.
#
# Selected by architectures that need custom DMA operations for e.g. legacy
# IOMMUs not handled by dma-iommu. Drivers must never select this symbol.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1fd698311bc1dba134a8e14dd551d2390e752cda..cb874c3857cf443c6235e05bc3f070b0ea2686f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -38,6 +38,7 @@ config X86_64
select ZONE_DMA32
select EXECMEM if DYNAMIC_FTRACE
select ACPI_MRRM if ACPI
+ select ARCH_HAS_MITIGATION_ADDRESS_SPACE_ISOLATION
config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
new file mode 100644
index 0000000000000000000000000000000000000000..53acdf22fe33efc6ccedbae52b262a904868459a
--- /dev/null
+++ b/arch/x86/include/asm/asi.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ASI_H
+#define _ASM_X86_ASI_H
+
+#endif /* _ASM_X86_ASI_H */
diff --git a/include/linux/asi.h b/include/linux/asi.h
new file mode 100644
index 0000000000000000000000000000000000000000..ef640c8e79369a9ada2881067f0c1d78093293f7
--- /dev/null
+++ b/include/linux/asi.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _INCLUDE_ASI_H
+#define _INCLUDE_ASI_H
+
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#include <asm/asi.h>
+#else
+
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+#endif /* _INCLUDE_ASI_H */
--
2.50.1
* [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi=
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
Add a CPU feature to enable ASI, and a command-line flag to enable that
feature. At present, the feature doesn't do anything, but adding it
early helps to avoid unnecessary code churn later.
The cmdline arg will eventually need an "auto" behaviour, but since this
would be equivalent to "off", don't define it yet. Just define what's
necessary to be able to test the code.
Co-developed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Junaid Shahid <junaids@google.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Documentation/admin-guide/kernel-parameters.txt | 8 +++++++
arch/x86/include/asm/asi.h | 10 +++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/mm/Makefile | 1 +
arch/x86/mm/asi.c | 28 +++++++++++++++++++++++++
arch/x86/mm/init.c | 3 +++
include/linux/asi.h | 5 +++++
7 files changed, 56 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..9b8330fc1fe31721af39b08b58b729ced78ba803 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5324,6 +5324,14 @@
Not specifying this option is equivalent to pti=auto.
+ asi= [X86-64] Control Address Space Isolation (ASI), a
+ technology for mitigating CPU vulnerabilities. ASI is
+ not yet ready to provide security guarantees but can be
+ enabled for evaluation.
+
+ on - unconditionally enable
+ off - unconditionally disable
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 53acdf22fe33efc6ccedbae52b262a904868459a..32a4c04c4be0f6f425c7cbcff4c58f1827a4b4c4 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -2,4 +2,14 @@
#ifndef _ASM_X86_ASI_H
#define _ASM_X86_ASI_H
+#include <asm/cpufeature.h>
+
+void asi_check_boottime_disable(void);
+
+/* Helper for generic code. Arch code just uses cpu_feature_enabled(). */
+static inline bool asi_enabled_static(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_ASI);
+}
+
#endif /* _ASM_X86_ASI_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3eee24a4cabf3b2131c34596236d8bc8eec05b3b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -499,6 +499,7 @@
#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
#define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
#define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
+#define X86_FEATURE_ASI (21*32+17) /* Kernel Address Space Isolation */
/*
* BUG word(s)
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5b9908f13dcfd092897f3778ee56ea4d45bdb868..5ecbff70964f61a903ac96cec3736a7cec1221fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) += pti.o
+obj-$(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION) += asi.o
obj-$(CONFIG_X86_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
new file mode 100644
index 0000000000000000000000000000000000000000..8c907f3c84f43f66e412ecbfa99e67390d31a66f
--- /dev/null
+++ b/arch/x86/mm/asi.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/asi.h>
+#include <linux/init.h>
+#include <linux/string.h>
+
+#include <asm/cmdline.h>
+#include <asm/cpufeature.h>
+
+void __init asi_check_boottime_disable(void)
+{
+ bool enabled = false;
+ char arg[4];
+ int ret;
+
+ ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg));
+ if (ret == 3 && !strncmp(arg, "off", 3)) {
+ enabled = false;
+ pr_info("ASI explicitly disabled by kernel cmdline.\n");
+ } else if (ret == 2 && !strncmp(arg, "on", 2)) {
+ enabled = true;
+ pr_info("ASI enabled.\n");
+ } else if (ret) {
+ pr_err("Unknown asi= flag '%s', try 'off' or 'on'\n", arg);
+ }
+
+ if (enabled)
+ setup_force_cpu_cap(X86_FEATURE_ASI);
+}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8bf6ad4b9400e7a04e9dc4e341e20a4a67ddb7ab..b877a41fc291284eb271ebe764a52730d51da3fc 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/asi.h>
#include <linux/gfp.h>
#include <linux/initrd.h>
#include <linux/ioport.h>
@@ -761,6 +763,7 @@ void __init init_mem_mapping(void)
unsigned long end;
pti_check_boottime_disable();
+ asi_check_boottime_disable();
probe_page_size_mask();
setup_pcid();
diff --git a/include/linux/asi.h b/include/linux/asi.h
index ef640c8e79369a9ada2881067f0c1d78093293f7..1832feb1b14d63f05bbfa3f87dd07753338ed70b 100644
--- a/include/linux/asi.h
+++ b/include/linux/asi.h
@@ -6,5 +6,10 @@
#include <asm/asi.h>
#else
+#include <linux/types.h>
+
+static inline void asi_check_boottime_disable(void) { }
+static inline bool asi_enabled_static(void) { return false; }
+
#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
#endif /* _INCLUDE_ASI_H */
--
2.50.1
* [PATCH 03/21] x86/mm: factor out phys_pgd_init()
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
__kernel_physical_mapping_init() will soon need to work on multiple
PGDs, so factor out something similar to phys_p4d_init() and friends,
which takes the base of the PGD as an argument.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/init_64.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0e4270e20fadb578c7fd6bf5c5e4762027c36c45..e98e85cf15f42db669696ba8195d8fc633351b26 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -741,21 +741,20 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
}
static unsigned long __meminit
-__kernel_physical_mapping_init(unsigned long paddr_start,
- unsigned long paddr_end,
- unsigned long page_size_mask,
- pgprot_t prot, bool init)
+phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_end,
+ unsigned long page_size_mask, pgprot_t prot, bool init, bool *pgd_changed)
{
- bool pgd_changed = false;
unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+ *pgd_changed = false;
+
paddr_last = paddr_end;
vaddr = (unsigned long)__va(paddr_start);
vaddr_end = (unsigned long)__va(paddr_end);
vaddr_start = vaddr;
for (; vaddr < vaddr_end; vaddr = vaddr_next) {
- pgd_t *pgd = pgd_offset_k(vaddr);
+ pgd_t *pgd = pgd_offset_pgd(pgd_page, vaddr);
p4d_t *p4d;
vaddr_next = (vaddr & PGDIR_MASK) + PGDIR_SIZE;
@@ -781,15 +780,29 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
(pud_t *) p4d, init);
spin_unlock(&init_mm.page_table_lock);
- pgd_changed = true;
+ *pgd_changed = true;
}
- if (pgd_changed)
- sync_global_pgds(vaddr_start, vaddr_end - 1);
-
return paddr_last;
}
+static unsigned long __meminit
+__kernel_physical_mapping_init(unsigned long paddr_start,
+ unsigned long paddr_end,
+ unsigned long page_size_mask,
+ pgprot_t prot, bool init)
+{
+ bool pgd_changed;
+ unsigned long paddr_last;
+
+ paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
+ prot, init, &pgd_changed);
+ if (pgd_changed)
+ sync_global_pgds((unsigned long)__va(paddr_start),
+ (unsigned long)__va(paddr_end) - 1);
+
+ return paddr_last;
+}
/*
* Create page table mapping for the physical memory for specific physical
--
2.50.1
* [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Create the initial shared pagetable to hold all the mappings that will
be shared among ASI domains.
Mirror the physmap into the ASI pagetables, but with a maximum
granularity that's guaranteed to allow changing pageblock sensitivity
without having to allocate pagetables, and with everything as
non-present.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/include/asm/asi.h | 4 ++++
arch/x86/mm/asi.c | 19 +++++++++++++++++++
arch/x86/mm/init.c | 2 ++
arch/x86/mm/init_64.c | 25 +++++++++++++++++++++++--
include/linux/asi.h | 4 ++++
init/main.c | 1 +
6 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 32a4c04c4be0f6f425c7cbcff4c58f1827a4b4c4..85062f2a23e127c736a92bb0d49e54f6fdcc2a5b 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -12,4 +12,8 @@ static inline bool asi_enabled_static(void)
return cpu_feature_enabled(X86_FEATURE_ASI);
}
+void asi_init(void);
+
+extern pgd_t *asi_nonsensitive_pgd;
+
#endif /* _ASM_X86_ASI_H */
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 8c907f3c84f43f66e412ecbfa99e67390d31a66f..7225f6aec936eedf98cd263d791dd62263d62575 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -1,11 +1,20 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/asi.h>
#include <linux/init.h>
+#include <linux/memblock.h>
#include <linux/string.h>
#include <asm/cmdline.h>
#include <asm/cpufeature.h>
+#include "mm_internal.h"
+
+/*
+ * This is a bit like init_mm.pgd, it holds mappings shared among all ASI
+ * domains.
+ */
+pgd_t *asi_nonsensitive_pgd;
+
void __init asi_check_boottime_disable(void)
{
bool enabled = false;
@@ -26,3 +35,13 @@ void __init asi_check_boottime_disable(void)
if (enabled)
setup_force_cpu_cap(X86_FEATURE_ASI);
}
+
+void __init asi_init(void)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_ASI))
+ return;
+
+ asi_nonsensitive_pgd = alloc_low_page();
+ if (WARN_ON(!asi_nonsensitive_pgd))
+ setup_clear_cpu_cap(X86_FEATURE_ASI);
+}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index b877a41fc291284eb271ebe764a52730d51da3fc..8fd34475af7ccd49d0124e13a87342d3bfef3e05 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -773,6 +773,8 @@ void __init init_mem_mapping(void)
end = max_low_pfn << PAGE_SHIFT;
#endif
+ asi_init();
+
/* the ISA range is always mapped regardless of memory holes */
init_memory_mapping(0, ISA_END_ADDRESS, PAGE_KERNEL);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e98e85cf15f42db669696ba8195d8fc633351b26..7e0471d46767c63ceade479ae0d1bf738f14904a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -7,6 +7,7 @@
* Copyright (C) 2002,2003 Andi Kleen <ak@suse.de>
*/
+#include <linux/asi.h>
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/kernel.h>
@@ -746,7 +747,8 @@ phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_en
{
unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
- *pgd_changed = false;
+ if (pgd_changed)
+ *pgd_changed = false;
paddr_last = paddr_end;
vaddr = (unsigned long)__va(paddr_start);
@@ -780,7 +782,8 @@ phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_en
(pud_t *) p4d, init);
spin_unlock(&init_mm.page_table_lock);
- *pgd_changed = true;
+ if (pgd_changed)
+ *pgd_changed = true;
}
return paddr_last;
@@ -797,6 +800,24 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
prot, init, &pgd_changed);
+
+ /*
+ * Set up ASI's unrestricted physmap. This needs to be mapped at 2M
+ * granularity or finer, so that regions can be mapped and unmapped at pageblock
+ * granularity without requiring allocations.
+ */
+ if (asi_nonsensitive_pgd) {
+ /*
+ * Since most memory is expected to end up sensitive, start with
+ * everything unmapped in this pagetable.
+ */
+ pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
+
+ VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
+ phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
+ prot_np, init, NULL);
+ }
+
if (pgd_changed)
sync_global_pgds((unsigned long)__va(paddr_start),
(unsigned long)__va(paddr_end) - 1);
diff --git a/include/linux/asi.h b/include/linux/asi.h
index 1832feb1b14d63f05bbfa3f87dd07753338ed70b..cc4bc957274dbf92ce5bf6185a418d0a8d1b7748 100644
--- a/include/linux/asi.h
+++ b/include/linux/asi.h
@@ -11,5 +11,9 @@
static inline void asi_check_boottime_disable(void) { }
static inline bool asi_enabled_static(void) { return false; }
+#define asi_nonsensitive_pgd NULL
+
+static inline void asi_init(void) { };
+
#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
#endif /* _INCLUDE_ASI_H */
diff --git a/init/main.c b/init/main.c
index 07a3116811c5d72cbab48410493b3d0f89d1f1b2..0ec230ba123613c89c4dfbede27e0441207b2f88 100644
--- a/init/main.c
+++ b/init/main.c
@@ -12,6 +12,7 @@
#define DEBUG /* Enable initcall_debug */
+#include <linux/asi.h>
#include <linux/types.h>
#include <linux/export.h>
#include <linux/extable.h>
--
2.50.1
* [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
ASI has a separate PGD for the physmap, which needs to be kept in sync
with the unrestricted physmap with respect to permissions.
Since only the direct map is currently populated in that address space,
just ignore everything else. Handling of holes in that map is left
behaving the same as the unrestricted pagetables.
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/pat/set_memory.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index d2d54b8c4dbb04cf276d074ddee3ffde2f48e381..53c3ac0ba55d6b6992db6f6761ffdfbd52bf3688 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -3,6 +3,7 @@
* Copyright 2002 Andi Kleen, SuSE Labs.
* Thanks to Ben LaHaise for precious feedback.
*/
+#include <linux/asi.h>
#include <linux/highmem.h>
#include <linux/memblock.h>
#include <linux/sched.h>
@@ -1780,6 +1781,11 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
cpa->numpages = ret;
return 0;
}
+static inline bool is_direct_map(unsigned long vaddr)
+{
+ return within(vaddr, PAGE_OFFSET,
+ PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT));
+}
static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
int primary)
@@ -1808,8 +1814,7 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
* one virtual address page and its pfn. TBD: numpages can be set based
* on the initial value and the level returned by lookup_address().
*/
- if (within(vaddr, PAGE_OFFSET,
- PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
+ if (is_direct_map(vaddr)) {
cpa->numpages = 1;
cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
return 0;
@@ -1981,6 +1986,27 @@ static int cpa_process_alias(struct cpa_data *cpa)
return 0;
}
+/*
+ * Having updated the unrestricted PGD, reflect this change in the ASI
+ * restricted address space too.
+ */
+static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
+{
+ struct cpa_data asi_cpa = *cpa;
+
+ if (!asi_enabled_static())
+ return 0;
+
+ /* Only need to do this for the real unrestricted direct map. */
+ if ((cpa->pgd && cpa->pgd != init_mm.pgd) || !is_direct_map(*cpa->vaddr))
+ return 0;
+ VM_WARN_ON_ONCE(!is_direct_map(*cpa->vaddr + (cpa->numpages * PAGE_SIZE)));
+
+ asi_cpa.pgd = asi_nonsensitive_pgd;
+ asi_cpa.curpage = 0;
+ return __change_page_attr(&asi_cpa, primary);
+}
+
static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
{
unsigned long numpages = cpa->numpages;
@@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
if (!debug_pagealloc_enabled())
spin_lock(&cpa_lock);
ret = __change_page_attr(cpa, primary);
+ if (!ret)
+ ret = mirror_asi_direct_map(cpa, primary);
if (!debug_pagealloc_enabled())
spin_unlock(&cpa_lock);
if (ret)
--
2.50.1
* [PATCH 06/21] mm/page_alloc: add __GFP_SENSITIVE and always set it
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
__GFP_SENSITIVE indicates that a page should not be mapped into the
ASI restricted address space.
This is added as a GFP flag instead of via some contextual hint, because
its presence is not ultimately expected to correspond to any such
existing context. If necessary, it should be possible to instead achieve
this optionality with something like __alloc_pages_sensitive(), but
this would be much more invasive to the overall kernel.
On startup, all pages are sensitive. Since there is currently no way to
create nonsensitive pages, temporarily set the flag unconditionally at
the top of the allocator.
__GFP_SENSITIVE is also added to GFP_USER since that's the most
important data that ASI needs to protect.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp_types.h | 15 ++++++++++++++-
include/trace/events/mmflags.h | 1 +
mm/page_alloc.c | 7 +++++++
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 65db9349f9053c701e24bdcf1dfe6afbf1278a2d..5147dbd53eafdccc32cfd506569b04d5c082d1b2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -58,6 +58,7 @@ enum {
#ifdef CONFIG_SLAB_OBJ_EXT
___GFP_NO_OBJ_EXT_BIT,
#endif
+ ___GFP_SENSITIVE_BIT,
___GFP_LAST_BIT
};
@@ -103,6 +104,11 @@ enum {
#else
#define ___GFP_NO_OBJ_EXT 0
#endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ___GFP_SENSITIVE BIT(___GFP_SENSITIVE_BIT)
+#else
+#define ___GFP_SENSITIVE 0
+#endif
/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -299,6 +305,12 @@ enum {
/* Disable lockdep for GFP context tracking */
#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
+/*
+ * Allocate sensitive memory, i.e. do not map it into ASI's restricted address
+ * space.
+ */
+#define __GFP_SENSITIVE ((__force gfp_t)___GFP_SENSITIVE)
+
/* Room for N __GFP_FOO bits */
#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -380,7 +392,8 @@ enum {
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
#define GFP_NOIO (__GFP_RECLAIM)
#define GFP_NOFS (__GFP_RECLAIM | __GFP_IO)
-#define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
+#define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | \
+ __GFP_HARDWALL | __GFP_SENSITIVE)
#define GFP_DMA __GFP_DMA
#define GFP_DMA32 __GFP_DMA32
#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index aa441f593e9a6b537d02189add91eb77bebc6a97..425385b7f073d05e9d660ad19cb7497f045adfb7 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -100,6 +100,7 @@ TRACE_DEFINE_ENUM(___GFP_LAST_BIT);
gfpflag_string(GFP_DMA), \
gfpflag_string(GFP_DMA32), \
gfpflag_string(__GFP_RECLAIM), \
+ gfpflag_string(__GFP_SENSITIVE), \
TRACE_GFP_FLAGS \
{ 0, NULL }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23d75fdd4aec118e34f3f49d3de2e0..0d1c28decd57b4a5e250acc0efc41669b7f67f5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5152,6 +5152,13 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = { };
+ /*
+ * Temporary hack: Allocation of nonsensitive pages is not possible yet,
+ * allocate everything sensitive. The restricted address space is never
+ * actually entered yet so this is fine.
+ */
+ gfp |= __GFP_SENSITIVE;
+
/*
* There are several places where we assume that the order value is sane
* so bail out early if the request is out of bound.
--
2.50.1
* [PATCH 07/21] mm: introduce for_each_free_list()
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Later patches will rearrange the free areas, but there are a couple of
places that iterate over them with the assumption that they have the
current structure.
Ideally, code outside of mm should not be directly aware of struct
free_area in the first place, but that awareness seems relatively
harmless, so just make the minimal change.
Instead of letting users manually iterate over the free lists, provide a
macro to do that, and adopt that macro in a couple of places.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/mmzone.h | 9 ++++++---
kernel/power/snapshot.c | 7 +++----
mm/mm_init.c | 11 +++++++----
3 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c57250782a464a9583c6ea4867f4ffdab..02f5e8cc40c78ac8b81bb5c6f9af8718b1ffb316 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -123,9 +123,12 @@ static inline bool migratetype_is_mergeable(int mt)
return mt < MIGRATE_PCPTYPES;
}
-#define for_each_migratetype_order(order, type) \
- for (order = 0; order < NR_PAGE_ORDERS; order++) \
- for (type = 0; type < MIGRATE_TYPES; type++)
+#define for_each_free_list(list, zone) \
+ for (unsigned int order = 0; order < NR_PAGE_ORDERS; order++) \
+ for (unsigned int type = 0; \
+ list = &zone->free_area[order].free_list[type], \
+ type < MIGRATE_TYPES; \
+ type++) \
extern int page_group_by_mobility_disabled;
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 645f42e404789286ffa751f083e97e52a4e4cf7e..40a7064eb6b247f47ca02211f8347cbd605af590 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1244,8 +1244,8 @@ unsigned int snapshot_additional_pages(struct zone *zone)
static void mark_free_pages(struct zone *zone)
{
unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT;
+ struct list_head *free_list;
unsigned long flags;
- unsigned int order, t;
struct page *page;
if (zone_is_empty(zone))
@@ -1269,9 +1269,8 @@ static void mark_free_pages(struct zone *zone)
swsusp_unset_page_free(page);
}
- for_each_migratetype_order(order, t) {
- list_for_each_entry(page,
- &zone->free_area[order].free_list[t], buddy_list) {
+ for_each_free_list(free_list, zone) {
+ list_for_each_entry(page, free_list, buddy_list) {
unsigned long i;
pfn = page_to_pfn(page);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3db2dea7db4c57c81f3fc3b71f0867025edda655..9554b79d0946a4a1a2ac5c934c1f80d2dc91b087 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1435,11 +1435,14 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
static void __meminit zone_init_free_lists(struct zone *zone)
{
- unsigned int order, t;
- for_each_migratetype_order(order, t) {
- INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
+ struct list_head *list;
+ unsigned int order;
+
+ for_each_free_list(list, zone)
+ INIT_LIST_HEAD(list);
+
+ for (order = 0; order < NR_PAGE_ORDERS; order++)
zone->free_area[order].nr_free = 0;
- }
#ifdef CONFIG_UNACCEPTED_MEMORY
INIT_LIST_HEAD(&zone->unaccepted_pages);
--
2.50.1
* [PATCH 08/21] mm: rejig pageblock mask definitions
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
A later patch will complicate the definition of these masks; this is a
preparatory patch to make that one easier to review.
- More masks will be needed, so add a PAGEBLOCK_ prefix to the names
to avoid polluting the "global namespace" too much.
- Move the CONFIG_MEMORY_ISOLATION ifdeffery into a separate block; this
allows the various conditionally-defined masks to be combined cleanly.
- This makes MIGRATETYPE_AND_ISO_MASK start to look pretty long. Well,
that global mask only exists for quite a specific purpose so just drop
it and take advantage of the newly-defined PAGEBLOCK_ISO_MASK.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/pageblock-flags.h | 6 +++---
mm/page_alloc.c | 18 +++++++++---------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index e046278a01fa8c37d898df94114d088933b6747f..13457e920e892c1c5083e0dc63e2ecfbed88f60e 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -36,12 +36,12 @@ enum pageblock_bits {
#define NR_PAGEBLOCK_BITS (roundup_pow_of_two(__NR_PAGEBLOCK_BITS))
-#define MIGRATETYPE_MASK (BIT(PB_migrate_0)|BIT(PB_migrate_1)|BIT(PB_migrate_2))
+#define PAGEBLOCK_MIGRATETYPE_MASK (BIT(PB_migrate_0)|BIT(PB_migrate_1)|BIT(PB_migrate_2))
#ifdef CONFIG_MEMORY_ISOLATION
-#define MIGRATETYPE_AND_ISO_MASK (MIGRATETYPE_MASK | BIT(PB_migrate_isolate))
+#define PAGEBLOCK_ISO_MASK BIT(PB_migrate_isolate)
#else
-#define MIGRATETYPE_AND_ISO_MASK MIGRATETYPE_MASK
+#define PAGEBLOCK_ISO_MASK 0
#endif
#if defined(CONFIG_HUGETLB_PAGE)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0d1c28decd57b4a5e250acc0efc41669b7f67f5b..a1db87488296a6d2d91a1be8d4d202f1841c4dfd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -370,7 +370,7 @@ get_pfnblock_bitmap_bitidx(const struct page *page, unsigned long pfn,
#else
BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
#endif
- BUILD_BUG_ON(__MIGRATE_TYPE_END > MIGRATETYPE_MASK);
+ BUILD_BUG_ON(__MIGRATE_TYPE_END > PAGEBLOCK_MIGRATETYPE_MASK);
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
bitmap = get_pageblock_bitmap(page, pfn);
@@ -443,7 +443,7 @@ bool get_pfnblock_bit(const struct page *page, unsigned long pfn,
__always_inline enum migratetype
get_pfnblock_migratetype(const struct page *page, unsigned long pfn)
{
- unsigned long mask = MIGRATETYPE_AND_ISO_MASK;
+ unsigned long mask = PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK;
unsigned long flags;
flags = __get_pfnblock_flags_mask(page, pfn, mask);
@@ -452,7 +452,7 @@ get_pfnblock_migratetype(const struct page *page, unsigned long pfn)
if (flags & BIT(PB_migrate_isolate))
return MIGRATE_ISOLATE;
#endif
- return flags & MIGRATETYPE_MASK;
+ return flags & PAGEBLOCK_MIGRATETYPE_MASK;
}
/**
@@ -540,11 +540,11 @@ static void set_pageblock_migratetype(struct page *page,
}
VM_WARN_ONCE(get_pageblock_isolate(page),
"Use clear_pageblock_isolate() to unisolate pageblock");
- /* MIGRATETYPE_AND_ISO_MASK clears PB_migrate_isolate if it is set */
+ /* PAGEBLOCK_ISO_MASK clears PB_migrate_isolate if it is set */
#endif
__set_pfnblock_flags_mask(page, page_to_pfn(page),
(unsigned long)migratetype,
- MIGRATETYPE_AND_ISO_MASK);
+ PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
}
void __meminit init_pageblock_migratetype(struct page *page,
@@ -570,7 +570,7 @@ void __meminit init_pageblock_migratetype(struct page *page,
flags |= BIT(PB_migrate_isolate);
#endif
__set_pfnblock_flags_mask(page, page_to_pfn(page), flags,
- MIGRATETYPE_AND_ISO_MASK);
+ PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
}
#ifdef CONFIG_DEBUG_VM
@@ -2122,15 +2122,15 @@ static bool __move_freepages_block_isolate(struct zone *zone,
}
move:
- /* Use MIGRATETYPE_MASK to get non-isolate migratetype */
+ /* Use PAGEBLOCK_MIGRATETYPE_MASK to get non-isolate migratetype */
if (isolate) {
from_mt = __get_pfnblock_flags_mask(page, page_to_pfn(page),
- MIGRATETYPE_MASK);
+ PAGEBLOCK_MIGRATETYPE_MASK);
to_mt = MIGRATE_ISOLATE;
} else {
from_mt = MIGRATE_ISOLATE;
to_mt = __get_pfnblock_flags_mask(page, page_to_pfn(page),
- MIGRATETYPE_MASK);
+ PAGEBLOCK_MIGRATETYPE_MASK);
}
__move_freepages_block(zone, start_pfn, from_mt, to_mt);
--
2.50.1
* [PATCH 09/21] mm/page_alloc: Invert is_check_pages_enabled() check
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
A later patch will expand this function, making it ugly that the whole
body sits inside a conditional.
In preparation, invert it to de-indent the main logic. Separate commit
to make review easier.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a1db87488296a6d2d91a1be8d4d202f1841c4dfd..10757410da2127b0488c99c5933422fc649f9a1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1752,13 +1752,14 @@ static bool check_new_page(struct page *page)
static inline bool check_new_pages(struct page *page, unsigned int order)
{
- if (is_check_pages_enabled()) {
- for (int i = 0; i < (1 << order); i++) {
- struct page *p = page + i;
+ if (!is_check_pages_enabled())
+ return false;
- if (check_new_page(p))
- return true;
- }
+ for (int i = 0; i < (1 << order); i++) {
+ struct page *p = page + i;
+
+ if (check_new_page(p))
+ return true;
}
return false;
--
2.50.1
* [PATCH 10/21] mm/page_alloc: remove ifdefs from pindex helpers
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
The ifdefs are not technically needed here; everything used here is
always defined.
They aren't doing much harm right now but a following patch will
complicate these functions. Switching to IS_ENABLED() makes the code a
bit less tiresome to read.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10757410da2127b0488c99c5933422fc649f9a1d..08e0faab992fcf3c426d4783da041f930075d903 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -656,19 +656,17 @@ static void bad_page(struct page *page, const char *reason)
static inline unsigned int order_to_pindex(int migratetype, int order)
{
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ bool movable = migratetype == MIGRATE_MOVABLE;
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- bool movable;
- if (order > PAGE_ALLOC_COSTLY_ORDER) {
- VM_BUG_ON(order != HPAGE_PMD_ORDER);
+ if (order > PAGE_ALLOC_COSTLY_ORDER) {
+ VM_BUG_ON(order != HPAGE_PMD_ORDER);
- movable = migratetype == MIGRATE_MOVABLE;
-
- return NR_LOWORDER_PCP_LISTS + movable;
+ return NR_LOWORDER_PCP_LISTS + movable;
+ }
+ } else {
+ VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
}
-#else
- VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
-#endif
return (MIGRATE_PCPTYPES * order) + migratetype;
}
@@ -677,12 +675,12 @@ static inline int pindex_to_order(unsigned int pindex)
{
int order = pindex / MIGRATE_PCPTYPES;
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (pindex >= NR_LOWORDER_PCP_LISTS)
- order = HPAGE_PMD_ORDER;
-#else
- VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
-#endif
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ if (pindex >= NR_LOWORDER_PCP_LISTS)
+ order = HPAGE_PMD_ORDER;
+ } else {
+ VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
+ }
return order;
}
--
2.50.1
* [PATCH 11/21] mm: introduce freetype_t
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
This is preparation for making the page allocator aware of ASI
sensitivity. To do that, certain properties will be highly desirable:
- A "pool" of pages with each sensitivity is usually available, so that
pages can be provided with the correct sensitivity without zeroing/TLB
flushing.
- Pages are physically grouped by sensitivity, so that large
allocations rarely have to alter the pagetables due to ASI.
- ASI sensitivity varies only at a certain fixed address granularity, so
that the pagetables can all be pre-allocated. This is desirable
because the page allocator will be changing mappings: pre-allocation
is a straightforward way to avoid recursive allocations (of
pagetables).
It seems that the existing infrastructure for grouping pages by
mobility, i.e. pageblocks and migratetypes, serves this purpose pretty
nicely.
Early prototypes of this approach took advantage of it by just
overloading enum migratetype to encode not only mobility but also
sensitivity. This overloading is OK if you can apply constraints like
"only movable pages are ever sensitive", but in the real world such
constraints don't actually exist, so overloading migratetype gets rather
ugly.
Therefore, introduce a new higher-level concept, called "freetype"
(because it is used to index "free"lists), that can encode sensitivity
orthogonally to mobility. These are the "more invasive changes" mentioned
in [0].
Since freetypes and migratetypes would be very easy to mix up, freetypes
are (at least for now) stored in a struct typedef similar to atomic_t.
This provides type-safety, but comes at the expense of being pretty
annoying to code with. For instance, freetype_t cannot be compared with
the == operator. Once this code matures, if the freetype/migratetype
distinction gets less confusing, it might be wise to drop this
struct and just use ints.
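As a usage sketch (assuming only the helpers added below, not code taken
from the series), comparisons have to go through a helper:

/* Hypothetical example, not part of the patch. */
static bool example_block_is_movable_nonsensitive(struct page *page)
{
	freetype_t want = migrate_to_freetype(MIGRATE_MOVABLE, false);

	/* "want == get_pageblock_freetype(page)" would not compile. */
	return freetypes_equal(want, get_pageblock_freetype(page));
}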
To try and reduce review pain for such a churny patch, first introduce
freetypes as nothing but an indirection over migratetypes. The helpers
concerned with sensitivity are defined, but only as stubs. Convert
everything over to using freetypes wherever they are needed to index
freelists, but maintain references to migratetypes in code that really
only cares specifically about mobility.
A later patch will add the encoding of sensitivity into freetype_t. For
this patch, no functional change is intended.
[0]: https://lore.kernel.org/lkml/20250313-asi-page-alloc-v1-9-04972e046cea@google.com/
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp.h | 16 ++-
include/linux/mmzone.h | 68 +++++++++-
mm/compaction.c | 32 ++---
mm/internal.h | 20 ++-
mm/page_alloc.c | 346 ++++++++++++++++++++++++++++++-------------------
mm/page_isolation.c | 2 +-
mm/page_owner.c | 7 +-
mm/page_reporting.c | 4 +-
mm/show_mem.c | 2 +-
9 files changed, 330 insertions(+), 167 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0ceb4e09306c4a7098d5a61645396e3b82a1ca30..a275171c5a6aecafd7783e57ce7d4316c5e56655 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -16,8 +16,10 @@ struct mempolicy;
#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
#define GFP_MOVABLE_SHIFT 3
-static inline int gfp_migratetype(const gfp_t gfp_flags)
+static inline freetype_t gfp_freetype(const gfp_t gfp_flags)
{
+ int migratetype;
+
VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
@@ -25,11 +27,15 @@ static inline int gfp_migratetype(const gfp_t gfp_flags)
BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
- if (unlikely(page_group_by_mobility_disabled))
- return MIGRATE_UNMOVABLE;
+ if (unlikely(page_group_by_mobility_disabled)) {
+ migratetype = MIGRATE_UNMOVABLE;
+ } else {
+ /* Group based on mobility */
+ migratetype = (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK)
+ >> GFP_MOVABLE_SHIFT;
+ }
- /* Group based on mobility */
- return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
+ return migrate_to_freetype(migratetype, false);
}
#undef GFP_MOVABLE_MASK
#undef GFP_MOVABLE_SHIFT
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02f5e8cc40c78ac8b81bb5c6f9af8718b1ffb316..56310722f38b788154ee15845b6877ed7e70d6b7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -123,6 +123,55 @@ static inline bool migratetype_is_mergeable(int mt)
return mt < MIGRATE_PCPTYPES;
}
+#define NR_SENSITIVITIES 1
+
+/*
+ * A freetype is the index used to identify free lists (free area lists and
+ * pcplists). On non-ASI this is the same thing as a migratetype, on ASI it also
+ * encodes sensitivity. To avoid accidentally mixing the two identifiers,
+ * freetypes are a struct in the style of atomic_t.
+ */
+typedef struct {
+ int type;
+} freetype_t;
+
+#define NR_FREETYPES (MIGRATE_TYPES * NR_SENSITIVITIES)
+
+static inline freetype_t migrate_to_freetype(enum migratetype mt, bool sensitive)
+{
+ freetype_t freetype;
+
+ freetype.type = mt;
+ return freetype;
+}
+
+static inline enum migratetype free_to_migratetype(freetype_t freetype)
+{
+ return freetype.type;
+}
+
+static inline bool freetype_sensitive(freetype_t freetype)
+{
+ return false;
+}
+
+/* Convenience helper, return the freetype modified to have the migratetype. */
+static inline freetype_t freetype_with_migrate(freetype_t freetype,
+ enum migratetype migratetype)
+{
+ return migrate_to_freetype(migratetype, freetype_sensitive(freetype));
+}
+
+static inline bool freetypes_equal(freetype_t a, freetype_t b)
+{
+ return a.type == b.type;
+}
+
+#define for_each_sensitivity(sensitive) \
+ for (int _s = 0; \
+ sensitive = (bool)_s, _s < NR_SENSITIVITIES; \
+ _s++)
+
#define for_each_free_list(list, zone) \
for (unsigned int order = 0; order < NR_PAGE_ORDERS; order++) \
for (unsigned int type = 0; \
@@ -132,17 +181,30 @@ static inline bool migratetype_is_mergeable(int mt)
extern int page_group_by_mobility_disabled;
+freetype_t get_pfnblock_freetype(const struct page *page, unsigned long pfn);
+
#define get_pageblock_migratetype(page) \
get_pfnblock_migratetype(page, page_to_pfn(page))
+#define get_pageblock_freetype(page) \
+ get_pfnblock_freetype(page, page_to_pfn(page))
+
#define folio_migratetype(folio) \
get_pageblock_migratetype(&folio->page)
struct free_area {
- struct list_head free_list[MIGRATE_TYPES];
+ struct list_head free_list[NR_FREETYPES];
unsigned long nr_free;
};
+static inline
+struct list_head *free_area_list(struct free_area *area, freetype_t type)
+{
+ VM_BUG_ON(type.type < 0 || type.type >= ARRAY_SIZE(area->free_list));
+ VM_BUG_ON(!area);
+ return &area->free_list[type.type];
+}
+
struct pglist_data;
#ifdef CONFIG_NUMA
@@ -726,8 +788,10 @@ enum zone_watermarks {
#else
#define NR_PCP_THP 0
#endif
+/* Note this is the number per sensitivity. */
#define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
-#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
+#define NR_PCP_LISTS_PER_SENSITIVITY (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
+#define NR_PCP_LISTS (NR_PCP_LISTS_PER_SENSITIVITY * NR_SENSITIVITIES)
/*
* Flags used in pcp->flags field.
diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6844c27682677a0a9ea552316828..64a2c88a66a92f5c87169fbc11f87e8ae822af99 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1359,7 +1359,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
static bool suitable_migration_source(struct compact_control *cc,
struct page *page)
{
- int block_mt;
+ freetype_t block_ft;
if (pageblock_skip_persistent(page))
return false;
@@ -1367,12 +1367,12 @@ static bool suitable_migration_source(struct compact_control *cc,
if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
return true;
- block_mt = get_pageblock_migratetype(page);
+ block_ft = get_pageblock_freetype(page);
- if (cc->migratetype == MIGRATE_MOVABLE)
- return is_migrate_movable(block_mt);
+ if (free_to_migratetype(cc->freetype) == MIGRATE_MOVABLE)
+ return is_migrate_movable(free_to_migratetype(block_ft));
else
- return block_mt == cc->migratetype;
+ return freetypes_equal(block_ft, cc->freetype);
}
/* Returns true if the page is within a block suitable for migration to */
@@ -1963,7 +1963,8 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
* reduces the risk that a large movable pageblock is freed for
* an unmovable/reclaimable small allocation.
*/
- if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
+ if (cc->direct_compaction &&
+ free_to_migratetype(cc->freetype) != MIGRATE_MOVABLE)
return pfn;
/*
@@ -2234,7 +2235,7 @@ static bool should_proactive_compact_node(pg_data_t *pgdat)
static enum compact_result __compact_finished(struct compact_control *cc)
{
unsigned int order;
- const int migratetype = cc->migratetype;
+ const freetype_t freetype = cc->freetype;
int ret;
/* Compaction run completes if the migrate and free scanner meet */
@@ -2309,24 +2310,25 @@ static enum compact_result __compact_finished(struct compact_control *cc)
for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
struct free_area *area = &cc->zone->free_area[order];
- /* Job done if page is free of the right migratetype */
- if (!free_area_empty(area, migratetype))
+ /* Job done if page is free of the right freetype */
+ if (!free_area_empty(area, freetype))
return COMPACT_SUCCESS;
#ifdef CONFIG_CMA
/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
if (migratetype == MIGRATE_MOVABLE &&
- !free_area_empty(area, MIGRATE_CMA))
+ !free_areas_empty(area, MIGRATE_CMA))
return COMPACT_SUCCESS;
#endif
/*
* Job done if allocation would steal freepages from
- * other migratetype buddy lists.
+ * other freetype buddy lists.
*/
- if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+ if (find_suitable_fallback(area, order, freetype, true) >= 0)
/*
- * Movable pages are OK in any pageblock. If we are
- * stealing for a non-movable allocation, make sure
+ * Movable pages are OK in any pageblock of the right
+ * sensitivity. If we are stealing for a
+ * non-movable allocation, make sure
* we finish compacting the current pageblock first
* (which is assured by the above migrate_pfn align
* check) so it is as free as possible and we won't
@@ -2531,7 +2533,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
INIT_LIST_HEAD(&cc->freepages[order]);
INIT_LIST_HEAD(&cc->migratepages);
- cc->migratetype = gfp_migratetype(cc->gfp_mask);
+ cc->freetype = gfp_freetype(cc->gfp_mask);
if (!is_via_compact_memory(cc->order)) {
ret = compaction_suit_allocation_order(cc->zone, cc->order,
diff --git a/mm/internal.h b/mm/internal.h
index c8d6998bd6b223c62405fd54419282453fc40b9e..50ff6671f19d38a59c9f07e66d347baf85ddf085 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -10,6 +10,7 @@
#include <linux/fs.h>
#include <linux/khugepaged.h>
#include <linux/mm.h>
+#include <linux/mmzone.h>
#include <linux/mm_inline.h>
#include <linux/pagemap.h>
#include <linux/pagewalk.h>
@@ -589,7 +590,7 @@ struct alloc_context {
struct zonelist *zonelist;
nodemask_t *nodemask;
struct zoneref *preferred_zoneref;
- int migratetype;
+ freetype_t freetype;
/*
* highest_zoneidx represents highest usable zone index of
@@ -740,8 +741,8 @@ static inline void clear_zone_contiguous(struct zone *zone)
}
extern int __isolate_free_page(struct page *page, unsigned int order);
-extern void __putback_isolated_page(struct page *page, unsigned int order,
- int mt);
+void __putback_isolated_page(struct page *page, unsigned int order,
+ freetype_t freetype);
extern void memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order);
extern void __free_pages_core(struct page *page, unsigned int order,
@@ -893,7 +894,7 @@ struct compact_control {
short search_order; /* order to start a fast search at */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
int order; /* order a direct compactor needs */
- int migratetype; /* migratetype of direct compactor */
+ freetype_t freetype; /* freetype of direct compactor */
const unsigned int alloc_flags; /* alloc flags of a direct compactor */
const int highest_zoneidx; /* zone index of a direct compactor */
enum migrate_mode mode; /* Async or sync migration mode */
@@ -950,11 +951,16 @@ static inline void init_cma_pageblock(struct page *page)
int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable);
+ freetype_t freetype, bool claimable);
-static inline bool free_area_empty(struct free_area *area, int migratetype)
+static inline bool free_area_empty(struct free_area *area, freetype_t freetype)
{
- return list_empty(&area->free_list[migratetype]);
+ return list_empty(free_area_list(area, freetype));
+}
+
+static inline bool free_areas_empty(struct free_area *area, int migratetype)
+{
+ return free_area_empty(area, migrate_to_freetype(migratetype, false));
}
/* mm/util.c */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 08e0faab992fcf3c426d4783da041f930075d903..4ce81f8d4e59966b7c0c2902e24aa2f4639a0e59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -430,6 +430,37 @@ bool get_pfnblock_bit(const struct page *page, unsigned long pfn,
return test_bit(bitidx + pb_bit, bitmap_word);
}
+/**
+ * __get_pfnblock_freetype - Return the freetype of a pageblock, optionally
+ * ignoring the fact that it's currently isolated.
+ * @page: The page within the block of interest
+ * @pfn: The target page frame number
+ * @ignore_iso: If isolated, return the migratetype that the block had before
+ * isolation.
+ */
+__always_inline freetype_t
+__get_pfnblock_freetype(const struct page *page, unsigned long pfn,
+ bool ignore_iso)
+{
+ int mt = get_pfnblock_migratetype(page, pfn);
+
+ return migrate_to_freetype(mt, false);
+}
+
+/**
+ * get_pfnblock_migratetype - Return the freetype of a pageblock
+ * @page: The page within the block of interest
+ * @pfn: The target page frame number
+ *
+ * Return: The freetype of the pageblock
+ */
+__always_inline freetype_t
+get_pfnblock_freetype(const struct page *page, unsigned long pfn)
+{
+ return __get_pfnblock_freetype(page, pfn, false);
+}
+
+
/**
* get_pfnblock_migratetype - Return the migratetype of a pageblock
* @page: The page within the block of interest
@@ -739,8 +770,11 @@ static inline struct capture_control *task_capc(struct zone *zone)
static inline bool
compaction_capture(struct capture_control *capc, struct page *page,
- int order, int migratetype)
+ int order, freetype_t freetype)
{
+ enum migratetype migratetype = free_to_migratetype(freetype);
+ enum migratetype capc_mt;
+
if (!capc || order != capc->cc->order)
return false;
@@ -749,6 +783,8 @@ compaction_capture(struct capture_control *capc, struct page *page,
is_migrate_isolate(migratetype))
return false;
+ capc_mt = free_to_migratetype(capc->cc->freetype);
+
/*
* Do not let lower order allocations pollute a movable pageblock
* unless compaction is also requesting movable pages.
@@ -757,12 +793,12 @@ compaction_capture(struct capture_control *capc, struct page *page,
* have trouble finding a high-order free page.
*/
if (order < pageblock_order && migratetype == MIGRATE_MOVABLE &&
- capc->cc->migratetype != MIGRATE_MOVABLE)
+ capc_mt != MIGRATE_MOVABLE)
return false;
- if (migratetype != capc->cc->migratetype)
+ if (migratetype != capc_mt)
trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
- capc->cc->migratetype, migratetype);
+ capc_mt, migratetype);
capc->page = page;
return true;
@@ -776,7 +812,7 @@ static inline struct capture_control *task_capc(struct zone *zone)
static inline bool
compaction_capture(struct capture_control *capc, struct page *page,
- int order, int migratetype)
+ int order, freetype_t freetype)
{
return false;
}
@@ -801,23 +837,23 @@ static inline void account_freepages(struct zone *zone, int nr_pages,
/* Used for pages not on another list */
static inline void __add_to_free_list(struct page *page, struct zone *zone,
- unsigned int order, int migratetype,
+ unsigned int order, freetype_t freetype,
bool tail)
{
struct free_area *area = &zone->free_area[order];
int nr_pages = 1 << order;
- VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
+ VM_WARN_ONCE(!freetypes_equal(get_pageblock_freetype(page), freetype),
"page type is %d, passed migratetype is %d (nr=%d)\n",
- get_pageblock_migratetype(page), migratetype, nr_pages);
+ get_pageblock_freetype(page).type, freetype.type, nr_pages);
if (tail)
- list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
+ list_add_tail(&page->buddy_list, free_area_list(area, freetype));
else
- list_add(&page->buddy_list, &area->free_list[migratetype]);
+ list_add(&page->buddy_list, free_area_list(area, freetype));
area->nr_free++;
- if (order >= pageblock_order && !is_migrate_isolate(migratetype))
+ if (order >= pageblock_order && !is_migrate_isolate(free_to_migratetype(freetype)))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
}
@@ -827,17 +863,20 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
* allocation again (e.g., optimization for memory onlining).
*/
static inline void move_to_free_list(struct page *page, struct zone *zone,
- unsigned int order, int old_mt, int new_mt)
+ unsigned int order,
+ freetype_t old_ft, freetype_t new_ft)
{
struct free_area *area = &zone->free_area[order];
+ int old_mt = free_to_migratetype(old_ft);
+ int new_mt = free_to_migratetype(new_ft);
int nr_pages = 1 << order;
/* Free page moving can fail, so it happens before the type update */
- VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt,
- "page type is %d, passed migratetype is %d (nr=%d)\n",
- get_pageblock_migratetype(page), old_mt, nr_pages);
+ VM_WARN_ONCE(!freetypes_equal(get_pageblock_freetype(page), old_ft),
+ "page type is %d, passed freetype is %d (nr=%d)\n",
+ get_pageblock_freetype(page).type, old_ft.type, nr_pages);
- list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
+ list_move_tail(&page->buddy_list, free_area_list(area, new_ft));
account_freepages(zone, -nr_pages, old_mt);
account_freepages(zone, nr_pages, new_mt);
@@ -880,9 +919,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
}
static inline struct page *get_page_from_free_area(struct free_area *area,
- int migratetype)
+ freetype_t freetype)
{
- return list_first_entry_or_null(&area->free_list[migratetype],
+ return list_first_entry_or_null(free_area_list(area, freetype),
struct page, buddy_list);
}
@@ -938,9 +977,10 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
static inline void __free_one_page(struct page *page,
unsigned long pfn,
struct zone *zone, unsigned int order,
- int migratetype, fpi_t fpi_flags)
+ freetype_t freetype, fpi_t fpi_flags)
{
struct capture_control *capc = task_capc(zone);
+ int migratetype = free_to_migratetype(freetype);
unsigned long buddy_pfn = 0;
unsigned long combined_pfn;
struct page *buddy;
@@ -949,16 +989,17 @@ static inline void __free_one_page(struct page *page,
VM_BUG_ON(!zone_is_initialized(zone));
VM_BUG_ON_PAGE(page->flags.f & PAGE_FLAGS_CHECK_AT_PREP, page);
- VM_BUG_ON(migratetype == -1);
+ VM_BUG_ON(freetype.type == -1);
VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
VM_BUG_ON_PAGE(bad_range(zone, page), page);
account_freepages(zone, 1 << order, migratetype);
while (order < MAX_PAGE_ORDER) {
- int buddy_mt = migratetype;
+ freetype_t buddy_ft = freetype;
+ enum migratetype buddy_mt = free_to_migratetype(buddy_ft);
- if (compaction_capture(capc, page, order, migratetype)) {
+ if (compaction_capture(capc, page, order, freetype)) {
account_freepages(zone, -(1 << order), migratetype);
return;
}
@@ -974,7 +1015,8 @@ static inline void __free_one_page(struct page *page,
* pageblock isolation could cause incorrect freepage or CMA
* accounting or HIGHATOMIC accounting.
*/
- buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn);
+ buddy_ft = get_pfnblock_freetype(buddy, buddy_pfn);
+ buddy_mt = free_to_migratetype(buddy_ft);
if (migratetype != buddy_mt &&
(!migratetype_is_mergeable(migratetype) ||
@@ -1016,7 +1058,7 @@ static inline void __free_one_page(struct page *page,
else
to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order);
- __add_to_free_list(page, zone, order, migratetype, to_tail);
+ __add_to_free_list(page, zone, order, freetype, to_tail);
/* Notify page reporting subsystem of freed page */
if (!(fpi_flags & FPI_SKIP_REPORT_NOTIFY))
@@ -1471,19 +1513,20 @@ static void free_pcppages_bulk(struct zone *zone, int count,
nr_pages = 1 << order;
do {
unsigned long pfn;
- int mt;
+ freetype_t ft;
page = list_last_entry(list, struct page, pcp_list);
pfn = page_to_pfn(page);
- mt = get_pfnblock_migratetype(page, pfn);
+ ft = get_pfnblock_freetype(page, pfn);
/* must delete to avoid corrupting pcp list */
list_del(&page->pcp_list);
count -= nr_pages;
pcp->count -= nr_pages;
- __free_one_page(page, pfn, zone, order, mt, FPI_NONE);
- trace_mm_page_pcpu_drain(page, order, mt);
+ __free_one_page(page, pfn, zone, order, ft, FPI_NONE);
+ trace_mm_page_pcpu_drain(page, order,
+ free_to_migratetype(ft));
} while (count > 0 && !list_empty(list));
}
@@ -1504,9 +1547,9 @@ static void split_large_buddy(struct zone *zone, struct page *page,
order = pageblock_order;
do {
- int mt = get_pfnblock_migratetype(page, pfn);
+ freetype_t ft = get_pfnblock_freetype(page, pfn);
- __free_one_page(page, pfn, zone, order, mt, fpi);
+ __free_one_page(page, pfn, zone, order, ft, fpi);
pfn += 1 << order;
if (pfn == end)
break;
@@ -1684,7 +1727,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
* -- nyc
*/
static inline unsigned int expand(struct zone *zone, struct page *page, int low,
- int high, int migratetype)
+ int high, freetype_t freetype)
{
unsigned int size = 1 << high;
unsigned int nr_added = 0;
@@ -1703,7 +1746,7 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
if (set_page_guard(zone, &page[size], high))
continue;
- __add_to_free_list(&page[size], zone, high, migratetype, false);
+ __add_to_free_list(&page[size], zone, high, freetype, false);
set_buddy_order(&page[size], high);
nr_added += size;
}
@@ -1713,12 +1756,13 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
static __always_inline void page_del_and_expand(struct zone *zone,
struct page *page, int low,
- int high, int migratetype)
+ int high, freetype_t freetype)
{
+ enum migratetype migratetype = free_to_migratetype(freetype);
int nr_pages = 1 << high;
__del_page_from_free_list(page, zone, high, migratetype);
- nr_pages -= expand(zone, page, low, high, migratetype);
+ nr_pages -= expand(zone, page, low, high, freetype);
account_freepages(zone, -nr_pages, migratetype);
}
@@ -1877,7 +1921,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
*/
static __always_inline
struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
- int migratetype)
+ freetype_t freetype)
{
unsigned int current_order;
struct free_area *area;
@@ -1885,13 +1929,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
/* Find a page of the appropriate size in the preferred list */
for (current_order = order; current_order < NR_PAGE_ORDERS; ++current_order) {
+ enum migratetype migratetype = free_to_migratetype(freetype);
+
area = &(zone->free_area[current_order]);
- page = get_page_from_free_area(area, migratetype);
+ page = get_page_from_free_area(area, freetype);
if (!page)
continue;
page_del_and_expand(zone, page, order, current_order,
- migratetype);
+ freetype);
trace_mm_page_alloc_zone_locked(page, order, migratetype,
pcp_allowed_order(order) &&
migratetype < MIGRATE_PCPTYPES);
@@ -1916,13 +1962,18 @@ static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] = {
#ifdef CONFIG_CMA
static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
- unsigned int order)
+ unsigned int order, bool sensitive)
{
- return __rmqueue_smallest(zone, order, MIGRATE_CMA);
+ freetype_t freetype = migrate_to_freetype(MIGRATE_CMA, sensitive);
+
+ return __rmqueue_smallest(zone, order, freetype);
}
#else
static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
- unsigned int order) { return NULL; }
+ unsigned int order, bool sensitive)
+{
+ return NULL;
+}
#endif
/*
@@ -1930,7 +1981,7 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
* change the block type.
*/
static int __move_freepages_block(struct zone *zone, unsigned long start_pfn,
- int old_mt, int new_mt)
+ freetype_t old_ft, freetype_t new_ft)
{
struct page *page;
unsigned long pfn, end_pfn;
@@ -1953,7 +2004,7 @@ static int __move_freepages_block(struct zone *zone, unsigned long start_pfn,
order = buddy_order(page);
- move_to_free_list(page, zone, order, old_mt, new_mt);
+ move_to_free_list(page, zone, order, old_ft, new_ft);
pfn += 1 << order;
pages_moved += 1 << order;
@@ -2013,7 +2064,7 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
}
static int move_freepages_block(struct zone *zone, struct page *page,
- int old_mt, int new_mt)
+ freetype_t old_ft, freetype_t new_ft)
{
unsigned long start_pfn;
int res;
@@ -2021,8 +2072,11 @@ static int move_freepages_block(struct zone *zone, struct page *page,
if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL))
return -1;
- res = __move_freepages_block(zone, start_pfn, old_mt, new_mt);
- set_pageblock_migratetype(pfn_to_page(start_pfn), new_mt);
+ VM_BUG_ON(freetype_sensitive(old_ft) != freetype_sensitive(new_ft));
+
+ res = __move_freepages_block(zone, start_pfn, old_ft, new_ft);
+ set_pageblock_migratetype(pfn_to_page(start_pfn),
+ free_to_migratetype(new_ft));
return res;
@@ -2090,8 +2144,7 @@ static bool __move_freepages_block_isolate(struct zone *zone,
struct page *page, bool isolate)
{
unsigned long start_pfn, buddy_pfn;
- int from_mt;
- int to_mt;
+ freetype_t block_ft, from_ft, to_ft;
struct page *buddy;
if (isolate == get_pageblock_isolate(page)) {
@@ -2247,14 +2300,15 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
/*
* Check whether there is a suitable fallback freepage with requested order.
- * If claimable is true, this function returns fallback_mt only if
+ * If claimable is true, this function returns a fallback only if
* we would do this whole-block claiming. This would help to reduce
* fragmentation due to mixed migratetype pages in one pageblock.
*/
int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable)
+ freetype_t freetype, bool claimable)
{
int i;
+ enum migratetype migratetype = free_to_migratetype(freetype);
if (claimable && !should_try_claim_block(order, migratetype))
return -2;
@@ -2264,9 +2318,11 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
int fallback_mt = fallbacks[migratetype][i];
+ freetype_t fallback_ft = migrate_to_freetype(fallback_mt,
+ freetype_sensitive(freetype));
- if (!free_area_empty(area, fallback_mt))
- return fallback_mt;
+ if (!free_area_empty(area, fallback_ft))
+ return fallback_ft.type;
}
return -1;
@@ -2281,20 +2337,22 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
*/
static struct page *
try_to_claim_block(struct zone *zone, struct page *page,
- int current_order, int order, int start_type,
- int block_type, unsigned int alloc_flags)
+ int current_order, int order, freetype_t start_type,
+ freetype_t block_type, unsigned int alloc_flags)
{
int free_pages, movable_pages, alike_pages;
+ int block_mt = free_to_migratetype(block_type);
+ int start_mt = free_to_migratetype(start_type);
unsigned long start_pfn;
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
unsigned int nr_added;
- del_page_from_free_list(page, zone, current_order, block_type);
- change_pageblock_range(page, current_order, start_type);
+ del_page_from_free_list(page, zone, current_order, block_mt);
+ change_pageblock_range(page, current_order, start_mt);
nr_added = expand(zone, page, order, current_order, start_type);
- account_freepages(zone, nr_added, start_type);
+ account_freepages(zone, nr_added, start_mt);
return page;
}
@@ -2316,7 +2374,7 @@ try_to_claim_block(struct zone *zone, struct page *page,
* For movable allocation, it's the number of movable pages which
* we just obtained. For other types it's a bit more tricky.
*/
- if (start_type == MIGRATE_MOVABLE) {
+ if (start_mt == MIGRATE_MOVABLE) {
alike_pages = movable_pages;
} else {
/*
@@ -2326,7 +2384,7 @@ try_to_claim_block(struct zone *zone, struct page *page,
* vice versa, be conservative since we can't distinguish the
* exact migratetype of non-movable pages.
*/
- if (block_type == MIGRATE_MOVABLE)
+ if (block_mt == MIGRATE_MOVABLE)
alike_pages = pageblock_nr_pages
- (free_pages + movable_pages);
else
@@ -2339,7 +2397,7 @@ try_to_claim_block(struct zone *zone, struct page *page,
if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
page_group_by_mobility_disabled) {
__move_freepages_block(zone, start_pfn, block_type, start_type);
- set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
+ set_pageblock_migratetype(pfn_to_page(start_pfn), start_mt);
return __rmqueue_smallest(zone, order, start_type);
}
@@ -2355,14 +2413,14 @@ try_to_claim_block(struct zone *zone, struct page *page,
* condition simpler.
*/
static __always_inline struct page *
-__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, freetype_t start_freetype,
unsigned int alloc_flags)
{
struct free_area *area;
int current_order;
int min_order = order;
struct page *page;
- int fallback_mt;
+ int fallback;
/*
* Do not steal pages from freelists belonging to other pageblocks
@@ -2379,25 +2437,29 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
*/
for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
--current_order) {
+ int start_mt = free_to_migratetype(start_freetype);
+ freetype_t fallback_ft;
+
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, true);
+ fallback = find_suitable_fallback(area, current_order,
+ start_freetype, true);
/* No block in that order */
- if (fallback_mt == -1)
+ if (fallback == -1)
continue;
/* Advanced into orders too low to claim, abort */
- if (fallback_mt == -2)
+ if (fallback == -2)
break;
- page = get_page_from_free_area(area, fallback_mt);
+ fallback_ft.type = fallback;
+ page = get_page_from_free_area(area, fallback_ft);
page = try_to_claim_block(zone, page, current_order, order,
- start_migratetype, fallback_mt,
+ start_freetype, fallback_ft,
alloc_flags);
if (page) {
trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, fallback_mt);
+ start_mt, free_to_migratetype(fallback_ft));
return page;
}
}
@@ -2410,24 +2472,26 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
* the block as its current migratetype, potentially causing fragmentation.
*/
static __always_inline struct page *
-__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+__rmqueue_steal(struct zone *zone, int order, freetype_t start_freetype)
{
struct free_area *area;
int current_order;
struct page *page;
- int fallback_mt;
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
+ freetype_t fallback_ft;
+
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, false);
- if (fallback_mt == -1)
+ fallback_ft.type = find_suitable_fallback(area, current_order,
+ start_freetype, false);
+ if (fallback_ft.type == -1)
continue;
- page = get_page_from_free_area(area, fallback_mt);
- page_del_and_expand(zone, page, order, current_order, fallback_mt);
+ page = get_page_from_free_area(area, fallback_ft);
+ page_del_and_expand(zone, page, order, current_order, fallback_ft);
trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, fallback_mt);
+ free_to_migratetype(start_freetype),
+ free_to_migratetype(fallback_ft));
return page;
}
@@ -2446,7 +2510,7 @@ enum rmqueue_mode {
* Call me with the zone->lock already held.
*/
static __always_inline struct page *
-__rmqueue(struct zone *zone, unsigned int order, int migratetype,
+__rmqueue(struct zone *zone, unsigned int order, freetype_t freetype,
unsigned int alloc_flags, enum rmqueue_mode *mode)
{
struct page *page;
@@ -2460,7 +2524,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
- page = __rmqueue_cma_fallback(zone, order);
+ page = __rmqueue_cma_fallback(zone, order,
+ freetype_sensitive(freetype));
if (page)
return page;
}
@@ -2477,13 +2542,14 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
*/
switch (*mode) {
case RMQUEUE_NORMAL:
- page = __rmqueue_smallest(zone, order, migratetype);
+ page = __rmqueue_smallest(zone, order, freetype);
if (page)
return page;
fallthrough;
case RMQUEUE_CMA:
if (alloc_flags & ALLOC_CMA) {
- page = __rmqueue_cma_fallback(zone, order);
+ page = __rmqueue_cma_fallback(zone, order,
+ freetype_sensitive(freetype));
if (page) {
*mode = RMQUEUE_CMA;
return page;
@@ -2491,7 +2557,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
fallthrough;
case RMQUEUE_CLAIM:
- page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+ page = __rmqueue_claim(zone, order, freetype, alloc_flags);
if (page) {
/* Replenished preferred freelist, back to normal mode. */
*mode = RMQUEUE_NORMAL;
@@ -2500,7 +2566,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
fallthrough;
case RMQUEUE_STEAL:
if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
- page = __rmqueue_steal(zone, order, migratetype);
+ page = __rmqueue_steal(zone, order, freetype);
if (page) {
*mode = RMQUEUE_STEAL;
return page;
@@ -2517,7 +2583,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
*/
static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
- int migratetype, unsigned int alloc_flags)
+ freetype_t freetype, unsigned int alloc_flags)
{
enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
unsigned long flags;
@@ -2530,7 +2596,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
spin_lock_irqsave(&zone->lock, flags);
}
for (i = 0; i < count; ++i) {
- struct page *page = __rmqueue(zone, order, migratetype,
+ struct page *page = __rmqueue(zone, order, freetype,
alloc_flags, &rmqm);
if (unlikely(page == NULL))
break;
@@ -2815,8 +2881,8 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
}
static void free_frozen_page_commit(struct zone *zone,
- struct per_cpu_pages *pcp, struct page *page, int migratetype,
- unsigned int order, fpi_t fpi_flags)
+ struct per_cpu_pages *pcp, struct page *page,
+ freetype_t freetype, unsigned int order, fpi_t fpi_flags)
{
int high, batch;
int pindex;
@@ -2829,7 +2895,7 @@ static void free_frozen_page_commit(struct zone *zone,
*/
pcp->alloc_factor >>= 1;
__count_vm_events(PGFREE, 1 << order);
- pindex = order_to_pindex(migratetype, order);
+ pindex = order_to_pindex(free_to_migratetype(freetype), order);
list_add(&page->pcp_list, &pcp->lists[pindex]);
pcp->count += 1 << order;
@@ -2896,6 +2962,7 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
struct zone *zone;
unsigned long pfn = page_to_pfn(page);
int migratetype;
+ freetype_t freetype;
if (!pcp_allowed_order(order)) {
__free_pages_ok(page, order, fpi_flags);
@@ -2913,13 +2980,14 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
* excessively into the page allocator
*/
zone = page_zone(page);
- migratetype = get_pfnblock_migratetype(page, pfn);
+ freetype = get_pfnblock_freetype(page, pfn);
+ migratetype = free_to_migratetype(freetype);
if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
if (unlikely(is_migrate_isolate(migratetype))) {
free_one_page(zone, page, pfn, order, fpi_flags);
return;
}
- migratetype = MIGRATE_MOVABLE;
+ freetype = freetype_with_migrate(freetype, MIGRATE_MOVABLE);
}
if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT)
@@ -2930,7 +2998,7 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
- free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
+ free_frozen_page_commit(zone, pcp, page, freetype, order, fpi_flags);
pcp_spin_unlock(pcp);
} else {
free_one_page(zone, page, pfn, order, fpi_flags);
@@ -2982,10 +3050,12 @@ void free_unref_folios(struct folio_batch *folios)
struct zone *zone = folio_zone(folio);
unsigned long pfn = folio_pfn(folio);
unsigned int order = (unsigned long)folio->private;
+ freetype_t freetype;
int migratetype;
folio->private = NULL;
- migratetype = get_pfnblock_migratetype(&folio->page, pfn);
+ freetype = get_pfnblock_freetype(&folio->page, pfn);
+ migratetype = free_to_migratetype(freetype);
/* Different zone requires a different pcp lock */
if (zone != locked_zone ||
@@ -3027,10 +3097,11 @@ void free_unref_folios(struct folio_batch *folios)
* to the MIGRATE_MOVABLE pcp list.
*/
if (unlikely(migratetype >= MIGRATE_PCPTYPES))
- migratetype = MIGRATE_MOVABLE;
+ freetype = freetype_with_migrate(freetype,
+ MIGRATE_MOVABLE);
trace_mm_page_free_batched(&folio->page);
- free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
+ free_frozen_page_commit(zone, pcp, &folio->page, freetype,
order, FPI_NONE);
}
@@ -3091,14 +3162,16 @@ int __isolate_free_page(struct page *page, unsigned int order)
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
- int mt = get_pageblock_migratetype(page);
+ freetype_t old_ft = get_pageblock_freetype(page);
+ freetype_t new_ft = freetype_with_migrate(old_ft,
+ MIGRATE_MOVABLE);
+
/*
* Only change normal pageblocks (i.e., they can merge
* with others)
*/
if (migratetype_is_mergeable(mt))
- move_freepages_block(zone, page, mt,
- MIGRATE_MOVABLE);
+ move_freepages_block(zone, page, old_ft, new_ft);
}
}
@@ -3114,7 +3187,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
* This function is meant to return a page pulled from the free lists via
* __isolate_free_page back to the free lists they were pulled from.
*/
-void __putback_isolated_page(struct page *page, unsigned int order, int mt)
+void __putback_isolated_page(struct page *page, unsigned int order,
+ freetype_t freetype)
{
struct zone *zone = page_zone(page);
@@ -3122,7 +3196,7 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt)
lockdep_assert_held(&zone->lock);
/* Return isolated page to tail of freelist. */
- __free_one_page(page, page_to_pfn(page), zone, order, mt,
+ __free_one_page(page, page_to_pfn(page), zone, order, freetype,
FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL);
}
@@ -3155,10 +3229,12 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
static __always_inline
struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
unsigned int order, unsigned int alloc_flags,
- int migratetype)
+ freetype_t freetype)
{
struct page *page;
unsigned long flags;
+ freetype_t ft_high = freetype_with_migrate(freetype,
+ MIGRATE_HIGHATOMIC);
do {
page = NULL;
@@ -3169,11 +3245,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
spin_lock_irqsave(&zone->lock, flags);
}
if (alloc_flags & ALLOC_HIGHATOMIC)
- page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+ page = __rmqueue_smallest(zone, order, ft_high);
if (!page) {
enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
- page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
+ page = __rmqueue(zone, order, freetype, alloc_flags, &rmqm);
/*
* If the allocation fails, allow OOM handling and
@@ -3182,7 +3258,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
* high-order atomic allocation in the future.
*/
if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK)))
- page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+ page = __rmqueue_smallest(zone, order, ft_high);
if (!page) {
spin_unlock_irqrestore(&zone->lock, flags);
@@ -3251,7 +3327,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
/* Remove page from the per-cpu list, caller must protect the list */
static inline
struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
- int migratetype,
+ freetype_t freetype,
unsigned int alloc_flags,
struct per_cpu_pages *pcp,
struct list_head *list)
@@ -3265,7 +3341,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
alloced = rmqueue_bulk(zone, order,
batch, list,
- migratetype, alloc_flags);
+ freetype, alloc_flags);
pcp->count += alloced << order;
if (unlikely(list_empty(list)))
@@ -3283,7 +3359,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
/* Lock and remove page from the per-cpu list */
static struct page *rmqueue_pcplist(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
- int migratetype, unsigned int alloc_flags)
+ freetype_t freetype, unsigned int alloc_flags)
{
struct per_cpu_pages *pcp;
struct list_head *list;
@@ -3304,8 +3380,8 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
* frees.
*/
pcp->free_count >>= 1;
- list = &pcp->lists[order_to_pindex(migratetype, order)];
- page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+ list = &pcp->lists[order_to_pindex(free_to_migratetype(freetype), order)];
+ page = __rmqueue_pcplist(zone, order, freetype, alloc_flags, pcp, list);
pcp_spin_unlock(pcp);
pcp_trylock_finish(UP_flags);
if (page) {
@@ -3331,19 +3407,19 @@ static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
gfp_t gfp_flags, unsigned int alloc_flags,
- int migratetype)
+ freetype_t freetype)
{
struct page *page;
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
- migratetype, alloc_flags);
+ freetype, alloc_flags);
if (likely(page))
goto out;
}
page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
- migratetype);
+ freetype);
out:
/* Separate test+clear to avoid unnecessary atomics */
@@ -3365,7 +3441,7 @@ struct page *rmqueue(struct zone *preferred_zone,
static void reserve_highatomic_pageblock(struct page *page, int order,
struct zone *zone)
{
- int mt;
+ freetype_t ft, ft_high;
unsigned long max_managed, flags;
/*
@@ -3387,13 +3463,14 @@ static void reserve_highatomic_pageblock(struct page *page, int order,
goto out_unlock;
/* Yoink! */
- mt = get_pageblock_migratetype(page);
+ ft = get_pageblock_freetype(page);
/* Only reserve normal pageblocks (i.e., they can merge with others) */
- if (!migratetype_is_mergeable(mt))
+ if (!migratetype_is_mergeable(free_to_migratetype(ft)))
goto out_unlock;
+ ft_high = freetype_with_migrate(ft, MIGRATE_HIGHATOMIC);
if (order < pageblock_order) {
- if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) == -1)
+ if (move_freepages_block(zone, page, ft, ft_high) == -1)
goto out_unlock;
zone->nr_reserved_highatomic += pageblock_nr_pages;
} else {
@@ -3438,9 +3515,11 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
spin_lock_irqsave(&zone->lock, flags);
for (order = 0; order < NR_PAGE_ORDERS; order++) {
struct free_area *area = &(zone->free_area[order]);
+ freetype_t ft_high = freetype_with_migrate(ac->freetype,
+ MIGRATE_HIGHATOMIC);
unsigned long size;
- page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC);
+ page = get_page_from_free_area(area, ft_high);
if (!page)
continue;
@@ -3467,14 +3546,14 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
*/
if (order < pageblock_order)
ret = move_freepages_block(zone, page,
- MIGRATE_HIGHATOMIC,
- ac->migratetype);
+ ft_high,
+ ac->freetype);
else {
move_to_free_list(page, zone, order,
- MIGRATE_HIGHATOMIC,
- ac->migratetype);
+ ft_high,
+ ac->freetype);
change_pageblock_range(page, order,
- ac->migratetype);
+ free_to_migratetype(ac->freetype));
ret = 1;
}
/*
@@ -3580,18 +3659,18 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
continue;
for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
- if (!free_area_empty(area, mt))
+ if (!free_areas_empty(area, mt))
return true;
}
#ifdef CONFIG_CMA
if ((alloc_flags & ALLOC_CMA) &&
- !free_area_empty(area, MIGRATE_CMA)) {
+ !free_areas_empty(area, MIGRATE_CMA)) {
return true;
}
#endif
if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
- !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
+ !free_areas_empty(area, MIGRATE_HIGHATOMIC)) {
return true;
}
}
@@ -3715,7 +3794,7 @@ static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
unsigned int alloc_flags)
{
#ifdef CONFIG_CMA
- if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+ if (free_to_migratetype(gfp_freetype(gfp_mask)) == MIGRATE_MOVABLE)
alloc_flags |= ALLOC_CMA;
#endif
return alloc_flags;
@@ -3878,7 +3957,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
try_this_zone:
page = rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order,
- gfp_mask, alloc_flags, ac->migratetype);
+ gfp_mask, alloc_flags, ac->freetype);
if (page) {
prep_new_page(page, order, gfp_mask, alloc_flags);
@@ -4644,6 +4723,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int cpuset_mems_cookie;
unsigned int zonelist_iter_cookie;
int reserve_flags;
+ enum migratetype migratetype;
if (unlikely(nofail)) {
/*
@@ -4714,6 +4794,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ migratetype = free_to_migratetype(ac->freetype);
+
/*
* For costly allocations, try direct compaction first, as it's likely
* that we have enough base pages and don't need to reclaim. For non-
@@ -4725,7 +4807,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
*/
if (can_direct_reclaim && can_compact &&
(costly_order ||
- (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+ (order > 0 && migratetype != MIGRATE_MOVABLE))
&& !gfp_pfmemalloc_allowed(gfp_mask)) {
page = __alloc_pages_direct_compact(gfp_mask, order,
alloc_flags, ac,
@@ -4933,7 +5015,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
ac->highest_zoneidx = gfp_zone(gfp_mask);
ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
ac->nodemask = nodemask;
- ac->migratetype = gfp_migratetype(gfp_mask);
+ ac->freetype = gfp_freetype(gfp_mask);
if (cpusets_enabled()) {
*alloc_gfp |= __GFP_HARDWALL;
@@ -5094,7 +5176,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
goto failed_irq;
/* Attempt the batch allocation */
- pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
+ pcp_list = &pcp->lists[order_to_pindex(free_to_migratetype(ac.freetype), 0)];
while (nr_populated < nr_pages) {
/* Skip existing pages */
@@ -5103,7 +5185,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
continue;
}
- page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
+ page = __rmqueue_pcplist(zone, 0, ac.freetype, alloc_flags,
pcp, pcp_list);
if (unlikely(!page)) {
/* Try and allocate at least one page */
@@ -5208,7 +5290,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
page = NULL;
}
- trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ trace_mm_page_alloc(page, order, alloc_gfp,
+ free_to_migratetype(ac.freetype));
kmsan_alloc_page(page, order, alloc_gfp);
return page;
@@ -7607,7 +7690,8 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
__free_frozen_pages(page, order, FPI_TRYLOCK);
page = NULL;
}
- trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ trace_mm_page_alloc(page, order, alloc_gfp,
+ free_to_migratetype(ac.freetype));
kmsan_alloc_page(page, order, alloc_gfp);
return page;
}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f72b6cd38b958be97edcea9ce20154ff43131a4a..572128767a34d87cfc7ba856e78860e06706730d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -265,7 +265,7 @@ static void unset_migratetype_isolate(struct page *page)
WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page));
} else {
clear_pageblock_isolate(page);
- __putback_isolated_page(page, order, get_pageblock_migratetype(page));
+ __putback_isolated_page(page, order, get_pageblock_freetype(page));
}
zone->nr_isolate_pageblock--;
out:
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c3ca21132c2c18e77cd4b6b3edb586fc1ac3cba7..355012309cea1373759e81aa1e45220160250801 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -469,7 +469,8 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
goto ext_put_continue;
page_owner = get_page_owner(page_ext);
- page_mt = gfp_migratetype(page_owner->gfp_mask);
+ page_mt = free_to_migratetype(
+ gfp_freetype(page_owner->gfp_mask));
if (pageblock_mt != page_mt) {
if (is_migrate_cma(pageblock_mt))
count[MIGRATE_MOVABLE]++;
@@ -554,7 +555,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
/* Print information relevant to grouping pages by mobility */
pageblock_mt = get_pageblock_migratetype(page);
- page_mt = gfp_migratetype(page_owner->gfp_mask);
+ page_mt = free_to_migratetype(gfp_freetype(page_owner->gfp_mask));
ret += scnprintf(kbuf + ret, count - ret,
"PFN 0x%lx type %s Block %lu type %s Flags %pGp\n",
pfn,
@@ -605,7 +606,7 @@ void __dump_page_owner(const struct page *page)
page_owner = get_page_owner(page_ext);
gfp_mask = page_owner->gfp_mask;
- mt = gfp_migratetype(gfp_mask);
+ mt = free_to_migratetype(gfp_freetype(gfp_mask));
if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
pr_alert("page_owner info is not present (never set?)\n");
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index e4c428e61d8c1765ae00ee22818cfadfd27f324c..faf0347e795cabf8a52ba3a35b5483ea0a8e1934 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -113,10 +113,10 @@ page_reporting_drain(struct page_reporting_dev_info *prdev,
*/
do {
struct page *page = sg_page(sg);
- int mt = get_pageblock_migratetype(page);
+ freetype_t ft = get_pageblock_freetype(page);
unsigned int order = get_order(sg->length);
- __putback_isolated_page(page, order, mt);
+ __putback_isolated_page(page, order, ft);
/* If the pages were not reported due to error skip flagging */
if (!reported)
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 3a4b5207635da8e5224ace454badb99ca017170d..f0d9cbd37b0c0297cfb57349ce23df7d52b98d97 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -374,7 +374,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
types[order] = 0;
for (type = 0; type < MIGRATE_TYPES; type++) {
- if (!free_area_empty(area, type))
+ if (!free_areas_empty(area, type))
types[order] |= 1 << type;
}
}
--
2.50.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* [PATCH 12/21] mm/asi: encode sensitivity in freetypes and pageblocks
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (10 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 11/21] mm: introduce freetype_t Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers Brendan Jackman
` (12 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Now that there is a higher-level concept for encoding the "type" of a
collection of freelists, use this to encode sensitivity, i.e. whether
pages are currently mapped into ASI restricted address spaces.
Just like with migratetypes, the sensitivity of a page needs to be
looked up from the pageblock flags when it is freed, so add a bit for
that too.
Increase the number of freelists and update the pcplist index-mapping
logic to encode sensitivity as another dimension. Then bump
NR_SENSITIVITIES and update the code that iterates over freelists so
that it covers the new set of lists.
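For reference, both encodings are linear (this mirrors the arithmetic
in the hunks below; the concrete values depend on the kernel config):

  freetype.type = (MIGRATE_TYPES * sensitive) + migratetype;
  pindex        = (NR_PCP_LISTS_PER_SENSITIVITY * sensitive) + pindex_ns;

So, for example, in a config where MIGRATE_TYPES is 6, a sensitive
MIGRATE_MOVABLE freelist gets freetype.type == 7, while the
nonsensitive one keeps freetype.type == 1.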
Blocks of differing sensitivity cannot be merged, so update
__free_one_page() to reflect that.
Finally, update __move_freepages_block_isolate() to be aware of which
sensitivity's freelists it needs to manipulate.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
include/linux/gfp.h | 2 +-
include/linux/mmzone.h | 31 ++++++++++--
include/linux/pageblock-flags.h | 18 +++++++
mm/internal.h | 10 +++-
mm/page_alloc.c | 104 ++++++++++++++++++++++++++--------------
5 files changed, 123 insertions(+), 42 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a275171c5a6aecafd7783e57ce7d4316c5e56655..a186a932f19e7c450e6e6b9a5f6e592f6e8f2bed 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -35,7 +35,7 @@ static inline freetype_t gfp_freetype(const gfp_t gfp_flags)
>> GFP_MOVABLE_SHIFT;
}
- return migrate_to_freetype(migratetype, false);
+ return migrate_to_freetype(migratetype, gfp_flags & __GFP_SENSITIVE);
}
#undef GFP_MOVABLE_MASK
#undef GFP_MOVABLE_SHIFT
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 56310722f38b788154ee15845b6877ed7e70d6b7..c16e2c1581c8ec0cb241ab340f8e8f65717b0cdb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -5,6 +5,7 @@
#ifndef __ASSEMBLY__
#ifndef __GENERATING_BOUNDS_H
+#include <linux/asi.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/list_nulls.h>
@@ -123,7 +124,11 @@ static inline bool migratetype_is_mergeable(int mt)
return mt < MIGRATE_PCPTYPES;
}
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define NR_SENSITIVITIES 2
+#else
#define NR_SENSITIVITIES 1
+#endif
/*
* A freetype is the index used to identify free lists (free area lists and
@@ -141,18 +146,30 @@ static inline freetype_t migrate_to_freetype(enum migratetype mt, bool sensitive
{
freetype_t freetype;
- freetype.type = mt;
+ /*
+ * When ASI is off, .sensitive is meaningless. Set it to false so that
+ * freetype values are the same when asi=off as when ASI is
+ * compiled out.
+ */
+ if (!asi_enabled_static())
+ sensitive = false;
+
+ freetype.type = (MIGRATE_TYPES * sensitive) + mt;
return freetype;
}
static inline enum migratetype free_to_migratetype(freetype_t freetype)
{
- return freetype.type;
+ VM_WARN_ON_ONCE(!asi_enabled_static() && freetype.type >= MIGRATE_TYPES);
+ return freetype.type % MIGRATE_TYPES;
}
static inline bool freetype_sensitive(freetype_t freetype)
{
- return false;
+ bool sensitive = freetype.type / MIGRATE_TYPES;
+
+ VM_WARN_ON_ONCE(!asi_enabled_static() && sensitive);
+ return sensitive;
}
/* Convenience helper, return the freetype modified to have the migratetype. */
@@ -174,10 +191,14 @@ static inline bool freetypes_equal(freetype_t a, freetype_t b)
#define for_each_free_list(list, zone) \
for (unsigned int order = 0; order < NR_PAGE_ORDERS; order++) \
- for (unsigned int type = 0; \
- list = &zone->free_area[order].free_list[type], \
+ for (unsigned int type = 0;\
type < MIGRATE_TYPES; \
type++) \
+ for (int sensitive = 0; \
+ list = free_area_list(&zone->free_area[order], \
+ migrate_to_freetype(type, sensitive)), \
+ sensitive < NR_SENSITIVITIES; \
+ sensitive++)
extern int page_group_by_mobility_disabled;
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 13457e920e892c1c5083e0dc63e2ecfbed88f60e..289542ce027ca937cbad8dfed37cd2b35e5f3ab5 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -18,6 +18,14 @@ enum pageblock_bits {
PB_migrate_0,
PB_migrate_1,
PB_migrate_2,
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+ /*
+ * Block is mapped into restricted address spaces. Having a
+ * "nonsensitive" flag instead of a "sensitive" flag is convenient
+ * so that the initial value of 0 is correct at boot.
+ */
+ PB_nonsensitive,
+#endif
PB_compact_skip,/* If set the block is skipped by compaction */
#ifdef CONFIG_MEMORY_ISOLATION
@@ -44,6 +52,16 @@ enum pageblock_bits {
#define PAGEBLOCK_ISO_MASK 0
#endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define PAGEBLOCK_NONSENSITIVE_MASK BIT(PB_nonsensitive)
+#else
+#define PAGEBLOCK_NONSENSITIVE_MASK 0
+#endif
+
+#define PAGEBLOCK_FREETYPE_MASK (PAGEBLOCK_MIGRATETYPE_MASK | \
+ PAGEBLOCK_ISO_MASK | \
+ PAGEBLOCK_NONSENSITIVE_MASK)
+
#if defined(CONFIG_HUGETLB_PAGE)
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
diff --git a/mm/internal.h b/mm/internal.h
index 50ff6671f19d38a59c9f07e66d347baf85ddf085..0401412220a76a233e14a7ee7d64c1194fc3759d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -960,7 +960,15 @@ static inline bool free_area_empty(struct free_area *area, freetype_t freetype)
static inline bool free_areas_empty(struct free_area *area, int migratetype)
{
- return free_area_empty(area, migrate_to_freetype(migratetype, false));
+ bool sensitive;
+
+ for_each_sensitivity(sensitive) {
+ freetype_t ft = migrate_to_freetype(migratetype, sensitive);
+
+ if (!free_area_empty(area, ft))
+ return false;
+ }
+ return true;
}
/* mm/util.c */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4ce81f8d4e59966b7c0c2902e24aa2f4639a0e59..5943b821089b72fd148bd93ee035c0e70e45ec91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -365,11 +365,8 @@ get_pfnblock_bitmap_bitidx(const struct page *page, unsigned long pfn,
unsigned long *bitmap;
unsigned long word_bitidx;
-#ifdef CONFIG_MEMORY_ISOLATION
- BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 8);
-#else
- BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
-#endif
+ /* NR_PAGEBLOCK_BITS must divide word size. */
+ BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4 && NR_PAGEBLOCK_BITS != 8);
BUILD_BUG_ON(__MIGRATE_TYPE_END > PAGEBLOCK_MIGRATETYPE_MASK);
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
@@ -442,9 +439,18 @@ __always_inline freetype_t
__get_pfnblock_freetype(const struct page *page, unsigned long pfn,
bool ignore_iso)
{
- int mt = get_pfnblock_migratetype(page, pfn);
+ unsigned long mask = PAGEBLOCK_FREETYPE_MASK;
+ enum migratetype migratetype;
+ unsigned long flags;
- return migrate_to_freetype(mt, false);
+ flags = __get_pfnblock_flags_mask(page, pfn, mask);
+
+ migratetype = flags & PAGEBLOCK_MIGRATETYPE_MASK;
+#ifdef CONFIG_MEMORY_ISOLATION
+ if (!ignore_iso && flags & BIT(PB_migrate_isolate))
+ migratetype = MIGRATE_ISOLATE;
+#endif
+ return migrate_to_freetype(migratetype, !(flags & PAGEBLOCK_NONSENSITIVE_MASK));
}
/**
@@ -601,7 +607,7 @@ void __meminit init_pageblock_migratetype(struct page *page,
flags |= BIT(PB_migrate_isolate);
#endif
__set_pfnblock_flags_mask(page, page_to_pfn(page), flags,
- PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
+ PAGEBLOCK_FREETYPE_MASK);
}
#ifdef CONFIG_DEBUG_VM
@@ -685,29 +691,39 @@ static void bad_page(struct page *page, const char *reason)
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}
-static inline unsigned int order_to_pindex(int migratetype, int order)
+static inline unsigned int order_to_pindex(freetype_t freetype, int order)
{
- if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ int migratetype = free_to_migratetype(freetype);
+ /* pindex if the freetype is nonsensitive */
+ int pindex_ns;
+
+ VM_BUG_ON(!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ order > PAGE_ALLOC_COSTLY_ORDER);
+
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ order > PAGE_ALLOC_COSTLY_ORDER) {
bool movable = migratetype == MIGRATE_MOVABLE;
- if (order > PAGE_ALLOC_COSTLY_ORDER) {
- VM_BUG_ON(order != HPAGE_PMD_ORDER);
-
- return NR_LOWORDER_PCP_LISTS + movable;
- }
+ VM_BUG_ON(order != HPAGE_PMD_ORDER);
+ pindex_ns = NR_LOWORDER_PCP_LISTS + movable;
} else {
- VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
+ pindex_ns = (MIGRATE_PCPTYPES * order) + migratetype;
}
- return (MIGRATE_PCPTYPES * order) + migratetype;
+ return (NR_PCP_LISTS_PER_SENSITIVITY * freetype_sensitive(freetype))
+ + pindex_ns;
}
-static inline int pindex_to_order(unsigned int pindex)
+inline int pindex_to_order(unsigned int pindex)
{
- int order = pindex / MIGRATE_PCPTYPES;
+ /* pindex if the freetype is nonsensitive */
+ int pindex_ns = (pindex % NR_PCP_LISTS_PER_SENSITIVITY);
+ int order = pindex_ns / MIGRATE_PCPTYPES;
+
+ VM_BUG_ON(pindex >= NR_PCP_LISTS);
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
- if (pindex >= NR_LOWORDER_PCP_LISTS)
+ if (pindex_ns >= NR_LOWORDER_PCP_LISTS)
order = HPAGE_PMD_ORDER;
} else {
VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -950,6 +966,26 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
NULL) != NULL;
}
+/*
+ * Can pages of these two freetypes be combined into a single higher-order free
+ * page?
+ */
+static inline bool can_merge_freetypes(freetype_t a, freetype_t b)
+{
+ if (freetypes_equal(a, b))
+ return true;
+
+ if (!migratetype_is_mergeable(free_to_migratetype(a)) ||
+ !migratetype_is_mergeable(free_to_migratetype(b)))
+ return false;
+
+ /*
+ * Mustn't merge differing sensitivities, changing the sensitivity
+ * requires changing pagetables.
+ */
+ return freetype_sensitive(a) == freetype_sensitive(b);
+}
+
/*
* Freeing function for a buddy system allocator.
*
@@ -1018,9 +1054,7 @@ static inline void __free_one_page(struct page *page,
buddy_ft = get_pfnblock_freetype(buddy, buddy_pfn);
buddy_mt = free_to_migratetype(buddy_ft);
- if (migratetype != buddy_mt &&
- (!migratetype_is_mergeable(migratetype) ||
- !migratetype_is_mergeable(buddy_mt)))
+ if (!can_merge_freetypes(freetype, buddy_ft))
goto done_merging;
}
@@ -1037,7 +1071,9 @@ static inline void __free_one_page(struct page *page,
/*
* Match buddy type. This ensures that an
* expand() down the line puts the sub-blocks
- * on the right freelists.
+ * on the right freelists. Sensitivity is
+ * already set correctly because of
+ * can_merge_freetypes().
*/
set_pageblock_migratetype(buddy, migratetype);
}
@@ -2174,18 +2210,16 @@ static bool __move_freepages_block_isolate(struct zone *zone,
}
move:
- /* Use PAGEBLOCK_MIGRATETYPE_MASK to get non-isolate migratetype */
+ block_ft = __get_pfnblock_freetype(page, page_to_pfn(page), true);
if (isolate) {
- from_mt = __get_pfnblock_flags_mask(page, page_to_pfn(page),
- PAGEBLOCK_MIGRATETYPE_MASK);
- to_mt = MIGRATE_ISOLATE;
+ from_ft = block_ft;
+ to_ft = freetype_with_migrate(block_ft, MIGRATE_ISOLATE);
} else {
- from_mt = MIGRATE_ISOLATE;
- to_mt = __get_pfnblock_flags_mask(page, page_to_pfn(page),
- PAGEBLOCK_MIGRATETYPE_MASK);
+ from_ft = freetype_with_migrate(block_ft, MIGRATE_ISOLATE);
+ to_ft = block_ft;
}
- __move_freepages_block(zone, start_pfn, from_mt, to_mt);
+ __move_freepages_block(zone, start_pfn, from_ft, to_ft);
toggle_pageblock_isolate(pfn_to_page(start_pfn), isolate);
return true;
@@ -2895,7 +2929,7 @@ static void free_frozen_page_commit(struct zone *zone,
*/
pcp->alloc_factor >>= 1;
__count_vm_events(PGFREE, 1 << order);
- pindex = order_to_pindex(free_to_migratetype(freetype), order);
+ pindex = order_to_pindex(freetype, order);
list_add(&page->pcp_list, &pcp->lists[pindex]);
pcp->count += 1 << order;
@@ -3380,7 +3414,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
* frees.
*/
pcp->free_count >>= 1;
- list = &pcp->lists[order_to_pindex(free_to_migratetype(freetype), order)];
+ list = &pcp->lists[order_to_pindex(freetype, order)];
page = __rmqueue_pcplist(zone, order, freetype, alloc_flags, pcp, list);
pcp_spin_unlock(pcp);
pcp_trylock_finish(UP_flags);
@@ -5176,7 +5210,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
goto failed_irq;
/* Attempt the batch allocation */
- pcp_list = &pcp->lists[order_to_pindex(free_to_migratetype(ac.freetype), 0)];
+ pcp_list = &pcp->lists[order_to_pindex(ac.freetype, 0)];
while (nr_populated < nr_pages) {
/* Skip existing pages */
--
2.50.1
* [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (11 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 12/21] mm/asi: encode sensitivity in freetypes and pageblocks Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-25 13:36 ` kernel test robot
2025-09-24 14:59 ` [PATCH 14/21] x86/mm/pat: introduce cpa_fault option Brendan Jackman
` (11 subsequent siblings)
24 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
The author struggles with really basic arithmetic. This test checks for
errors in the helpers that are used to map to and from pcplist indices.
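As a worked sketch of the arithmetic under test (illustrative values only;
the real constants are config-dependent, e.g. MIGRATE_PCPTYPES == 3 and
MIGRATE_MOVABLE == 1 on a typical build):

  order_to_pindex(migrate_to_freetype(MIGRATE_MOVABLE, true), 2):
    pindex_ns = MIGRATE_PCPTYPES * order + migratetype = 3 * 2 + 1 = 7
    pindex    = NR_PCP_LISTS_PER_SENSITIVITY * 1 + pindex_ns

  pindex_to_order(pindex) reverses it:
    pindex_ns = pindex % NR_PCP_LISTS_PER_SENSITIVITY = 7
    order     = pindex_ns / MIGRATE_PCPTYPES           = 2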
This can be run via a basic kunit.py invocation:
tools/testing/kunit/kunit.py run "page_alloc.*"
That will run it via UML which means no THP or ASI. If you want to test
with those enabled you can set the --arch flag to run it via QEMU:
tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_TRANSPARENT_HUGEPAGE=y "page_alloc.*"
tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION=y "page_alloc.*"
tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION=y \
--kconfig_add CONFIG_TRANSPARENT_HUGEPAGE=y \
"page_alloc.*"
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/Kconfig | 5 ++++
mm/Makefile | 1 +
mm/internal.h | 6 +++++
mm/page_alloc.c | 10 +++++---
mm/page_alloc_test.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 89 insertions(+), 3 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 034a1662d8c1af320b2262ebcb0cb51d4622e6b0..e25451c1adbd6e079f2d00e3eb8a28affcedab7e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1375,4 +1375,9 @@ config FIND_NORMAL_PAGE
source "mm/damon/Kconfig"
+config PAGE_ALLOC_KUNIT_TEST
+ tristate "KUnit Tests for page_alloc code" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS
+
endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550153a7a477640e4fa6dc6df327541..c6ce46a2abf144f2e62df96ec7f606f90affc5f0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o
memory-hotplug-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
obj-y += page-alloc.o
+obj-$(CONFIG_PAGE_ALLOC_KUNIT_TEST) += page_alloc_test.o
obj-y += page_frag_cache.o
obj-y += init-mm.o
obj-y += memblock.o
diff --git a/mm/internal.h b/mm/internal.h
index 0401412220a76a233e14a7ee7d64c1194fc3759d..6006cfb2b9c7e771a0c647c471901dc7fcdad242 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1693,4 +1693,10 @@ static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
return remap_pfn_range_complete(vma, addr, pfn, size, prot);
}
+#ifdef CONFIG_KUNIT
+unsigned int order_to_pindex(freetype_t freetype, int order);
+int pindex_to_order(unsigned int pindex);
+bool pcp_allowed_order(unsigned int order);
+#endif /* CONFIG_KUNIT */
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5943b821089b72fd148bd93ee035c0e70e45ec91..0b205aefd27e188c492c32754db08a4488317bd8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -56,6 +56,7 @@
#include <linux/cacheinfo.h>
#include <linux/pgalloc_tag.h>
#include <asm/div64.h>
+#include <kunit/visibility.h>
#include "internal.h"
#include "shuffle.h"
#include "page_reporting.h"
@@ -691,7 +692,7 @@ static void bad_page(struct page *page, const char *reason)
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}
-static inline unsigned int order_to_pindex(freetype_t freetype, int order)
+VISIBLE_IF_KUNIT inline unsigned int order_to_pindex(freetype_t freetype, int order)
{
int migratetype = free_to_migratetype(freetype);
/* pindex if the freetype is nonsensitive */
@@ -713,8 +714,9 @@ static inline unsigned int order_to_pindex(freetype_t freetype, int order)
return (NR_PCP_LISTS_PER_SENSITIVITY * freetype_sensitive(freetype))
+ pindex_ns;
}
+EXPORT_SYMBOL_IF_KUNIT(order_to_pindex);
-inline int pindex_to_order(unsigned int pindex)
+VISIBLE_IF_KUNIT inline int pindex_to_order(unsigned int pindex)
{
/* pindex if the freetype is nonsensitive */
int pindex_ns = (pindex % NR_PCP_LISTS_PER_SENSITIVITY);
@@ -731,8 +733,9 @@ inline int pindex_to_order(unsigned int pindex)
return order;
}
+EXPORT_SYMBOL_IF_KUNIT(pindex_to_order);
-static inline bool pcp_allowed_order(unsigned int order)
+VISIBLE_IF_KUNIT inline bool pcp_allowed_order(unsigned int order)
{
if (order <= PAGE_ALLOC_COSTLY_ORDER)
return true;
@@ -742,6 +745,7 @@ static inline bool pcp_allowed_order(unsigned int order)
#endif
return false;
}
+EXPORT_SYMBOL_IF_KUNIT(pcp_allowed_order);
/*
* Higher-order pages are called "compound pages". They are structured thusly:
diff --git a/mm/page_alloc_test.c b/mm/page_alloc_test.c
new file mode 100644
index 0000000000000000000000000000000000000000..1cc615ce90d95c47ecae206a87f2af3fab3a5581
--- /dev/null
+++ b/mm/page_alloc_test.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bitmap.h>
+
+#include <kunit/test.h>
+
+#include "internal.h"
+
+/* This just checks for basic arithmetic errors. */
+static void test_pindex_helpers(struct kunit *test)
+{
+ unsigned long bitmap[bitmap_size(NR_PCP_LISTS)];
+
+ /* Bit means "pindex not yet used". */
+ bitmap_fill(bitmap, NR_PCP_LISTS);
+
+ for (unsigned int order = 0; order < NR_PAGE_ORDERS; order++) {
+ for (unsigned int mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
+ if (!pcp_allowed_order(order))
+ continue;
+
+ for (int sensitive = 0; sensitive < NR_SENSITIVITIES; sensitive++) {
+ freetype_t ft = migrate_to_freetype(mt, sensitive);
+ unsigned int pindex = order_to_pindex(ft, order);
+ int got_order;
+
+ KUNIT_ASSERT_LT_MSG(test, pindex, NR_PCP_LISTS,
+ "invalid pindex %d (order %d mt %d sensitive %d)",
+ pindex, order, mt, sensitive);
+ KUNIT_EXPECT_TRUE_MSG(test, test_bit(pindex, bitmap),
+ "pindex %d reused (order %d mt %d sensitive %d)",
+ pindex, order, mt, sensitive);
+
+ /*
+ * For THP, two migratetypes map to the
+ * same pindex, just manually exclude one
+ * of those cases.
+ */
+ if (!(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+ order == HPAGE_PMD_ORDER &&
+ mt == min(MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE)))
+ clear_bit(pindex, bitmap);
+
+ got_order = pindex_to_order(pindex);
+ KUNIT_EXPECT_EQ_MSG(test, order, got_order,
+ "roundtrip failed, got %d want %d (pindex %d mt %d sensitive %d)",
+ got_order, order, pindex, mt, sensitive);
+
+ }
+ }
+ }
+
+ KUNIT_EXPECT_TRUE_MSG(test, bitmap_empty(bitmap, NR_PCP_LISTS),
+ "unused pindices: %*pbl", NR_PCP_LISTS, bitmap);
+}
+
+static struct kunit_case page_alloc_test_cases[] = {
+ KUNIT_CASE(test_pindex_helpers),
+ {}
+};
+
+static struct kunit_suite page_alloc_test_suite = {
+ .name = "page_alloc",
+ .test_cases = page_alloc_test_cases,
+};
+
+kunit_test_suite(page_alloc_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
--
2.50.1
* [PATCH 14/21] x86/mm/pat: introduce cpa_fault option
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (12 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 15/21] mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER Brendan Jackman
` (10 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Different usecases for the CPA code have different needs for the
behaviour when encountering an unmapped address.
Currently this is encoded by using the presence of the .pgd as a
side-channel. Subsequent ASI changes won't get the correct behaviour
based on this side-channel, so add an explicit enum to request the
different behaviours that might be needed.
Note this is now making explicit a couple of cases that populate the
pagetables when encountering holes; until now this was implicit:
1. kernel_unmap_pages_in_pgd()
Calling this function without a corresponding
kernel_map_pages_in_pgd() seems like a bug, so the "correct"
behaviour here might actually be CPA_FAULT_ERROR.
2. Ditto for __set_memory_enc_pgtable().
It seems the comment in __cpa_process_fault() (deleted in this patch)
may have been stale with regard to the coco usecases here (including
point 2).
Anyway, if these need to be updated that will be a separate patch; no
functional change is intended with this one.
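For illustration, the later "support changing pageblock sensitivity" patch
in this series requests the strict behaviour roughly like this (names as
they appear there):

  struct cpa_data cpa = { .vaddr     = &tempaddr,
                          .pgd       = asi_nonsensitive_pgd,
                          .numpages  = num_pageblocks << pageblock_order,
                          .flags     = CPA_NO_CHECK_ALIAS,
                          .on_fault  = CPA_FAULT_ERROR, };

whereas the existing callers updated below (__set_memory_enc_pgtable() and
the kernel_{map,unmap}_pages_in_pgd() pair) keep their populate behaviour
by setting CPA_FAULT_POPULATE explicitly.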
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/pat/set_memory.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 53c3ac0ba55d6b6992db6f6761ffdfbd52bf3688..2a50844515e81913fed32d5b6d1ec19e8e249533 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -36,6 +36,16 @@
#include "../mm_internal.h"
+/* What should CPA do if encountering an unmapped address? */
+enum cpa_fault {
+ /* Default depending on address. */
+ CPA_FAULT_DEFAULT = 0,
+ /* Populate cpa_data.pgd using cpa_data.pfn. */
+ CPA_FAULT_POPULATE,
+ /* Warn and return an error. */
+ CPA_FAULT_ERROR,
+};
+
/*
* The current flushing context - we pass it instead of 5 arguments:
*/
@@ -51,6 +61,7 @@ struct cpa_data {
unsigned int force_split : 1,
force_static_prot : 1,
force_flush_all : 1;
+ enum cpa_fault on_fault : 2;
struct page **pages;
};
@@ -1790,14 +1801,13 @@ static inline bool is_direct_map(unsigned long vaddr)
static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
int primary)
{
- if (cpa->pgd) {
- /*
- * Right now, we only execute this code path when mapping
- * the EFI virtual memory map regions, no other users
- * provide a ->pgd value. This may change in the future.
- */
+ if (cpa->on_fault == CPA_FAULT_POPULATE)
return populate_pgd(cpa, vaddr);
- }
+
+ if (WARN_ON(cpa->on_fault == CPA_FAULT_ERROR))
+ return -EFAULT;
+
+ /* CPA_FAULT_DEFAULT: */
/*
* Ignore all non primary paths.
@@ -2417,6 +2427,7 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
cpa.mask_set = enc ? pgprot_encrypted(empty) : pgprot_decrypted(empty);
cpa.mask_clr = enc ? pgprot_decrypted(empty) : pgprot_encrypted(empty);
cpa.pgd = init_mm.pgd;
+ cpa.on_fault = CPA_FAULT_POPULATE;
/* Must avoid aliasing mappings in the highmem code */
kmap_flush_unused();
@@ -2743,6 +2754,7 @@ int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
.mask_set = __pgprot(0),
.mask_clr = __pgprot(~page_flags & (_PAGE_NX|_PAGE_RW|_PAGE_DIRTY)),
.flags = CPA_NO_CHECK_ALIAS,
+ .on_fault = CPA_FAULT_POPULATE,
};
WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
@@ -2786,6 +2798,7 @@ int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
.mask_set = __pgprot(0),
.mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY),
.flags = CPA_NO_CHECK_ALIAS,
+ .on_fault = CPA_FAULT_POPULATE,
};
WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
--
2.50.1
* [PATCH 15/21] mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (13 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 14/21] x86/mm/pat: introduce cpa_fault option Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 16/21] mm/page_alloc: introduce ALLOC_NOBLOCK Brendan Jackman
` (9 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Commit 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH
non-blocking allocations accesses reserves") renamed ALLOC_HARDER to
ALLOC_NON_BLOCK because the former is "a vague description".
However, vagueness is accurate here: this is a vague flag. It is not set
for __GFP_NOMEMALLOC. It doesn't really mean "allocate without blocking"
but rather "allow dipping into atomic reserves, _because_ of the need
not to block".
A later commit will need an alloc flag that really means "don't block
here", so go back to the flag's old name and update the commentary
to try and give it a slightly clearer meaning.
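For example, sketching the arithmetic in __zone_watermark_ok() with a min
watermark of 1024 pages and a GFP_ATOMIC caller (which gets both
ALLOC_MIN_RESERVE and ALLOC_HARDER):

  min -= min / 2;  /* __GFP_HIGH:   1024 -> 512, access to 50% of the reserve   */
  min -= min / 4;  /* ALLOC_HARDER:  512 -> 384, access to 62.5% of the reserve */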
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/internal.h | 9 +++++----
mm/page_alloc.c | 8 ++++----
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 6006cfb2b9c7e771a0c647c471901dc7fcdad242..513aba6c00bed813c9e38464aec5a15e65edaa58 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1297,9 +1297,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_OOM ALLOC_NO_WATERMARKS
#endif
-#define ALLOC_NON_BLOCK 0x10 /* Caller cannot block. Allow access
- * to 25% of the min watermark or
- * 62.5% if __GFP_HIGH is set.
+#define ALLOC_HARDER 0x10 /* Because the caller cannot block,
+ * allow access to 25% of the min
+ * watermark or 62.5% if __GFP_HIGH is
+ * set.
*/
#define ALLOC_MIN_RESERVE 0x20 /* __GFP_HIGH set. Allow access to 50%
* of the min watermark.
@@ -1316,7 +1317,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
/* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
enum ttu_flags;
struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b205aefd27e188c492c32754db08a4488317bd8..cd47cfaae820ce696d2e6e0c47436e00d3feef60 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3295,7 +3295,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
* reserves as failing now is worse than failing a
* high-order atomic allocation in the future.
*/
- if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK)))
+ if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_HARDER)))
page = __rmqueue_smallest(zone, order, ft_high);
if (!page) {
@@ -3662,7 +3662,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
* or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
* access to the min reserve.
*/
- if (alloc_flags & ALLOC_NON_BLOCK)
+ if (alloc_flags & ALLOC_HARDER)
min -= min / 4;
}
@@ -4546,7 +4546,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
* The caller may dip into page reserves a bit more if the caller
* cannot run direct reclaim, or if the caller has realtime scheduling
* policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
- * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
+ * set both ALLOC_HARDER and ALLOC_MIN_RESERVE(__GFP_HIGH).
*/
alloc_flags |= (__force int)
(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
@@ -4557,7 +4557,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
* if it can't schedule.
*/
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
- alloc_flags |= ALLOC_NON_BLOCK;
+ alloc_flags |= ALLOC_HARDER;
if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
alloc_flags |= ALLOC_HIGHATOMIC;
--
2.50.1
* [PATCH 16/21] mm/page_alloc: introduce ALLOC_NOBLOCK
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (14 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 15/21] mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 17/21] mm/slub: defer application of gfp_allowed_mask Brendan Jackman
` (8 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
This flag is set unless we can be sure the caller isn't in an atomic
context.
The allocator will soon start needing to call set_direct_map_* APIs
which cannot be called with IRQs off. It will need to do this even
before direct reclaim is possible.
Despite the fact that, in principle, ALLOC_NOBLOCK is distinct from
__GFP_DIRECT_RECLAIM, in order to avoid introducing a GFP flag, just
infer the former based on whether the caller set the latter. This means
that, in practice, ALLOC_NOBLOCK is just !__GFP_DIRECT_RECLAIM, except
that it is not influenced by gfp_allowed_mask.
Call it ALLOC_NOBLOCK in order to try and mitigate confusion vs the
recently-removed ALLOC_NON_BLOCK, which meant something different.
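For example, during early boot gfp_allowed_mask clears __GFP_DIRECT_RECLAIM,
so a GFP_KERNEL caller cannot enter direct reclaim; but since it did pass
__GFP_DIRECT_RECLAIM it is known not to be atomic, so ALLOC_NOBLOCK stays
clear and blocking work (such as the direct map changes added later in this
series) remains allowed. A rough sketch of the two determinations:

  /* Based on the caller's flags, before gfp_allowed_mask is applied: */
  alloc_flags = init_alloc_flags(gfp, ALLOC_WMARK_LOW);  /* GFP_KERNEL: ALLOC_NOBLOCK not set */

  /* Reclaim decisions use the masked flags and may still be denied: */
  gfp &= gfp_allowed_mask;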
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/internal.h | 1 +
mm/page_alloc.c | 29 ++++++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 513aba6c00bed813c9e38464aec5a15e65edaa58..c697ed35a8ca3376445d1e4249e9ce03097f15b8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1315,6 +1315,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
#define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_NOBLOCK 0x1000 /* Caller may be atomic */
/* Flags that allow allocations below the min watermark. */
#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd47cfaae820ce696d2e6e0c47436e00d3feef60..b0aeb97baa13af038fff0edae33affbbf49e825c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4552,6 +4552,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+ alloc_flags |= ALLOC_NOBLOCK;
+
/*
* Not worth trying to allocate harder for __GFP_NOMEMALLOC even
* if it can't schedule.
@@ -4745,14 +4747,13 @@ check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac)
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
- struct alloc_context *ac)
+ struct alloc_context *ac, unsigned int alloc_flags)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
bool can_compact = gfp_compaction_allowed(gfp_mask);
bool nofail = gfp_mask & __GFP_NOFAIL;
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
- unsigned int alloc_flags;
unsigned long did_some_progress;
enum compact_priority compact_priority;
enum compact_result compact_result;
@@ -4795,7 +4796,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* kswapd needs to be woken up, and to avoid the cost of setting up
* alloc_flags precisely. So we do that now.
*/
- alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
+ alloc_flags |= gfp_to_alloc_flags(gfp_mask, order);
/*
* We need to recalculate the starting point for the zonelist iterator
@@ -5045,6 +5046,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
return page;
}
+static inline unsigned int init_alloc_flags(gfp_t gfp_mask, unsigned int flags)
+{
+ /*
+ * If the caller allowed __GFP_DIRECT_RECLAIM, they can't be atomic.
+ * Note this is a separate determination from whether direct
+ * reclaim is actually allowed, it must happen before applying
+ * gfp_allowed_mask.
+ */
+ if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
+ flags |= ALLOC_NOBLOCK;
+ return flags;
+}
+
static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
int preferred_nid, nodemask_t *nodemask,
struct alloc_context *ac, gfp_t *alloc_gfp,
@@ -5121,7 +5135,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
struct list_head *pcp_list;
struct alloc_context ac;
gfp_t alloc_gfp;
- unsigned int alloc_flags = ALLOC_WMARK_LOW;
+ unsigned int alloc_flags = init_alloc_flags(gfp, ALLOC_WMARK_LOW);
int nr_populated = 0, nr_account = 0;
/*
@@ -5267,7 +5281,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
int preferred_nid, nodemask_t *nodemask)
{
struct page *page;
- unsigned int alloc_flags = ALLOC_WMARK_LOW;
+ unsigned int alloc_flags = init_alloc_flags(gfp, ALLOC_WMARK_LOW);
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = { };
@@ -5319,7 +5333,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
*/
ac.nodemask = nodemask;
- page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
+ page = __alloc_pages_slowpath(alloc_gfp, order, &ac, alloc_flags);
out:
if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
@@ -7684,10 +7698,11 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
*/
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC | __GFP_COMP
| gfp_flags;
- unsigned int alloc_flags = ALLOC_TRYLOCK;
+ unsigned int alloc_flags = init_alloc_flags(gfp_flags, ALLOC_TRYLOCK);
struct alloc_context ac = { };
struct page *page;
+ VM_WARN_ON_ONCE(!(alloc_flags & ALLOC_NOBLOCK));
VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
/*
* In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
--
2.50.1
* [PATCH 17/21] mm/slub: defer application of gfp_allowed_mask
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (15 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 16/21] mm/page_alloc: introduce ALLOC_NOBLOCK Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 18/21] mm/asi: support changing pageblock sensitivity Brendan Jackman
` (7 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
The page allocator will take care of applying gfp_allowed_mask, so SLUB
doesn't need to do it before calling into it.
The page allocator will soon start using the GFP bits as a proxy to
infer if it can do blocking stuff (separately from whether it can do
actual reclaim), hence SLUB will benefit from leaving
__GFP_DIRECT_RECLAIM set even when it's forbidden by gfp_allowed_mask.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/slub.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 867a07260acf9e3c0f92de66e2d25f081ae51dcb..0f8724af4ce63f6e2a32e889f6490be7a25823eb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3177,8 +3177,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
int idx;
bool shuffle;
- flags &= gfp_allowed_mask;
-
flags |= s->allocflags;
/*
@@ -3212,7 +3210,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
slab->frozen = 0;
init_slab_obj_exts(slab);
- account_slab(slab, oo_order(oo), s, flags);
+ account_slab(slab, oo_order(oo), s, flags & gfp_allowed_mask);
slab->slab_cache = s;
--
2.50.1
* [PATCH 18/21] mm/asi: support changing pageblock sensitivity
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (16 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 17/21] mm/slub: defer application of gfp_allowed_mask Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 19/21] mm/asi: bad_page() when ASI mappings are wrong Brendan Jackman
` (6 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Currently all pages are sensitive and there's no way to change that.
This patch introduces one. Changing sensitivity has some requirements:
- It can only be done at pageblock granularity. This means that
there's never a need to allocate pagetables to do it (since the
restricted direct map is always pre-allocated down to pageblock
granularity).
- Flipping pages from nonsensitive to sensitive (unmapping) requires a
TLB shootdown, meaning IRQs must be enabled.
- Flipping from sensitive to nonsensitive requires zeroing pages, which
seems like an undesirable thing to do with a spinlock held.
This makes allocations that need to change sensitivity _somewhat_
similar to those that need to fall back to a different migratetype. But
the locking requirements mean that this can't just be squashed into the
existing "fallback" allocator logic; instead a new allocator path just
for this purpose is needed.
The new path is assumed to be much cheaper than the really heavyweight
stuff like compaction and reclaim. But at present it is treated as less
desirable than the mobility-related "fallback" and "stealing" logic.
This might turn out to need revision (in particular, maybe it's a
problem that __rmqueue_steal(), which causes fragmentation, happens
before __rmqueue_asi()), but that should be treated as a subsequent
optimisation project.
Now that !__GFP_SENSITIVE allocations are no longer doomed to fail, stop
hard-coding it at the top of the allocator.
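As a caller-level sketch of what this enables (mirroring the smoke test
added at the end of this series):

  struct page *mapped   = alloc_pages(GFP_KERNEL, 0);
  struct page *unmapped = alloc_pages(GFP_KERNEL | __GFP_SENSITIVE, 0);

The first stays mapped in the restricted address space; the second may have
to flip (and unmap) a whole pageblock via the new path.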
Note that this does loads of unnecessary TLB flushes and IPIs. The
design goal here is that the transitions are rare enough that this
doesn't matter a huge amount, but it should still be addressed in later
patches.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/include/asm/set_memory.h | 10 ++++
arch/x86/mm/pat/set_memory.c | 28 ++++++++++
include/linux/set_memory.h | 6 +++
mm/page_alloc.c | 106 +++++++++++++++++++++++++++++++++-----
4 files changed, 138 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 61f56cdaccb5af18e36790677b635b4ab6f5e24d..396580693e7d1317537148c0c219296e2b7c13fd 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -92,6 +92,16 @@ int set_direct_map_default_noflush(struct page *page);
int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
bool kernel_page_present(struct page *page);
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensitive);
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+static inline
+int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensitive)
+{
+ return 0;
+}
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
extern int kernel_set_to_readonly;
#endif /* _ASM_X86_SET_MEMORY_H */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 2a50844515e81913fed32d5b6d1ec19e8e249533..88fb65574d4fa0089fa31a9a06fe096c408991e6 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -4,6 +4,7 @@
* Thanks to Ben LaHaise for precious feedback.
*/
#include <linux/asi.h>
+#include <linux/align.h>
#include <linux/highmem.h>
#include <linux/memblock.h>
#include <linux/sched.h>
@@ -2695,6 +2696,33 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
return __set_pages_np(page, nr);
}
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+/*
+ * Map/unmap a set of contiguous pageblocks into all ASI restricted address
+ * spaces. All pagetables are pre-allocated so this can be called anywhere.
+ * This should not be called on pages that may be mapped elsewhere.
+ */
+int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensitive)
+{
+ if (WARN_ON_ONCE(!IS_ALIGNED(page_to_pfn(page), 1 << pageblock_order)))
+ return -EINVAL;
+
+ unsigned long tempaddr = (unsigned long)page_address(page);
+ struct cpa_data cpa = { .vaddr = &tempaddr,
+ .pgd = asi_nonsensitive_pgd,
+ .numpages = num_pageblocks << pageblock_order,
+ .flags = CPA_NO_CHECK_ALIAS,
+ .on_fault = CPA_FAULT_ERROR, };
+
+ if (sensitive)
+ cpa.mask_clr = __pgprot(_PAGE_PRESENT);
+ else
+ cpa.mask_set = __pgprot(_PAGE_PRESENT);
+
+ return __change_page_attr_set_clr(&cpa, 1);
+}
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 3030d9245f5ac8a35b27e249c6d8b9539f148635..db4225c046c47c114293af8b504886b103dc94ce 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -44,6 +44,12 @@ static inline bool kernel_page_present(struct page *page)
{
return true;
}
+
+static inline int set_direct_map_sensitive(struct page *page,
+ int num_pageblocks, bool sensitive) {
+ return 0;
+}
+
#else /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
/*
* Some architectures, e.g. ARM64 can disable direct map modifications at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b0aeb97baa13af038fff0edae33affbbf49e825c..a8e3556643b0ff2fe1d35a678937270356006d34 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -44,6 +44,7 @@
#include <linux/mm_inline.h>
#include <linux/mmu_notifier.h>
#include <linux/migrate.h>
+#include <linux/set_memory.h>
#include <linux/sched/mm.h>
#include <linux/page_owner.h>
#include <linux/page_table_check.h>
@@ -585,6 +586,13 @@ static void set_pageblock_migratetype(struct page *page,
PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
}
+static inline void set_pageblock_sensitive(struct page *page, bool sensitive)
+{
+ __set_pfnblock_flags_mask(page, page_to_pfn(page),
+ sensitive ? 0 : PAGEBLOCK_NONSENSITIVE_MASK,
+ PAGEBLOCK_NONSENSITIVE_MASK);
+}
+
void __meminit init_pageblock_migratetype(struct page *page,
enum migratetype migratetype,
bool isolate)
@@ -3264,6 +3272,85 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
#endif
}
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+static inline struct page *__rmqueue_asi(struct zone *zone, unsigned int request_order,
+ unsigned int alloc_flags, freetype_t freetype)
+{
+ freetype_t freetype_other = migrate_to_freetype(
+ free_to_migratetype(freetype), !freetype_sensitive(freetype));
+ unsigned long flags;
+ struct page *page;
+ int alloc_order;
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+ int nr_pageblocks;
+
+ if (!asi_enabled_static())
+ return NULL;
+
+ /*
+ * Might need a TLB shootdown. Even if IRQs are on this isn't
+ * safe if the caller holds a lock (in case the other CPUs need that
+ * lock to handle the shootdown IPI).
+ */
+ if (alloc_flags & ALLOC_NOBLOCK)
+ return NULL;
+ lockdep_assert(!irqs_disabled() || unlikely(early_boot_irqs_disabled));
+
+ /*
+ * Need to [un]map a whole pageblock (otherwise it might require
+ * allocating pagetables). First allocate it.
+ */
+ alloc_order = max(request_order, pageblock_order);
+ nr_pageblocks = 1 << (alloc_order - pageblock_order);
+ spin_lock_irqsave(&zone->lock, flags);
+ page = __rmqueue(zone, alloc_order, freetype_other, alloc_flags, &rmqm);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ if (!page)
+ return NULL;
+
+ if (!freetype_sensitive(freetype)) {
+ /*
+ * These pages were formerly sensitive so we need to clear them
+ * out before exposing them to CPU attacks. Doing this with the
+ * zone lock held would have been undesirable.
+ */
+ kernel_init_pages(page, 1 << alloc_order);
+ }
+
+ /*
+ * Now that IRQs are on it's safe to do a TLB shootdown, so change
+ * mapping and update pageblock flags.
+ */
+ set_direct_map_sensitive(page, nr_pageblocks, freetype_sensitive(freetype));
+ for (int i = 0; i < nr_pageblocks; i++) {
+ struct page *block_page = page + (pageblock_nr_pages * i);
+
+ set_pageblock_sensitive(block_page, freetype_sensitive(freetype));
+ }
+
+ if (request_order >= alloc_order)
+ return page;
+
+ /* Free any remaining pages in the block. */
+ spin_lock_irqsave(&zone->lock, flags);
+ for (unsigned int i = request_order; i < alloc_order; i++) {
+ struct page *page_to_free = page + (1 << i);
+
+ __free_one_page(page_to_free, page_to_pfn(page_to_free), zone,
+ i, freetype, FPI_SKIP_REPORT_NOTIFY);
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+
+ return page;
+}
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+static inline struct page *__rmqueue_asi(struct zone *zone, unsigned int request_order,
+ unsigned int alloc_flags, freetype_t freetype)
+{
+ return NULL;
+}
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
static __always_inline
struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
unsigned int order, unsigned int alloc_flags,
@@ -3297,13 +3384,15 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
*/
if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_HARDER)))
page = __rmqueue_smallest(zone, order, ft_high);
-
- if (!page) {
- spin_unlock_irqrestore(&zone->lock, flags);
- return NULL;
- }
}
spin_unlock_irqrestore(&zone->lock, flags);
+
+ /* Try changing sensitivity, now we've released the zone lock */
+ if (!page)
+ page = __rmqueue_asi(zone, order, alloc_flags, freetype);
+ if (!page)
+ return NULL;
+
} while (check_new_pages(page, order));
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -5285,13 +5374,6 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
struct alloc_context ac = { };
- /*
- * Temporary hack: Allocation of nonsensitive pages is not possible yet,
- * allocate everything sensitive. The restricted address space is never
- * actually entered yet so this is fine.
- */
- gfp |= __GFP_SENSITIVE;
-
/*
* There are several places where we assume that the order value is sane
* so bail out early if the request is out of bound.
--
2.50.1
* [PATCH 19/21] mm/asi: bad_page() when ASI mappings are wrong
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (17 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 18/21] mm/asi: support changing pageblock sensitivity Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 20/21] x86/mm/asi: don't use global pages when ASI enabled Brendan Jackman
` (5 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Add bad_page() checks that fire when the page allocator thinks a page is
mapped/unmapped in ASI restricted address space, but the pagetables
disagree.
This requires adding an accessor for set_memory.c to walk the page
tables and report the state.
This is implemented with the assumption that the mapping is at pageblock
granularity. That means it doesn't need to be repeated for each order-0
page. As a result of this special order-awareness, it can't go into
free_page_is_bad() and needs to be separately integrated into
free_pages_prepare(). The alloc side is easier - there it just goes into
check_new_pages().
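For a concrete sense of the granularity (assuming the typical x86-64
pageblock_order of 9, i.e. pageblock_nr_pages == 512): freeing an order-10
page spans two pageblocks, so page_asi_mapping_bad() walks the pagetables
only twice, at page and page + 512, rather than once per base page.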
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/include/asm/set_memory.h | 3 +++
arch/x86/mm/pat/set_memory.c | 31 +++++++++++++++++++++++++++
include/linux/set_memory.h | 2 ++
mm/page_alloc.c | 45 ++++++++++++++++++++++++++++++++++-----
4 files changed, 76 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 396580693e7d1317537148c0c219296e2b7c13fd..3870fa8cf51c0ece0dedf4d7876c4d14111deffd 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -94,12 +94,15 @@ bool kernel_page_present(struct page *page);
#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensitive);
+bool direct_map_sensitive(struct page *page);
#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
static inline
int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensitive)
{
return 0;
}
+
+static inline bool direct_map_sensitive(struct page *page) { return false; }
#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 88fb65574d4fa0089fa31a9a06fe096c408991e6..d4c3219374f889f9a60c459f0559e5ffb472073d 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2721,6 +2721,37 @@ int set_direct_map_sensitive(struct page *page, int num_pageblocks, bool sensiti
return __change_page_attr_set_clr(&cpa, 1);
}
+
+/*
+ * Walk the pagetable to check if the page is mapped into all ASI restricted
+ * address spaces.
+ */
+bool direct_map_sensitive(struct page *page)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgd_t *pgd = pgd_offset_pgd(asi_nonsensitive_pgd, addr);
+ unsigned int level;
+ bool nx, rw;
+ pte_t *pte = lookup_address_in_pgd_attr(pgd, addr, &level, &nx, &rw);
+
+ switch (level) {
+ case PG_LEVEL_4K:
+ /*
+ * lookup_address_in_pgd_attr() still returns the PTE for
+ * non-present 4K pages.
+ */
+ return !pte_present(*pte);
+ case PG_LEVEL_2M:
+ /*
+ * pmd_present() checks PSE to deal with some hugetlb
+ * logic. That's not relevant for the direct map so just
+ * explicitly check the real P bit.
+ */
+ return !(pmd_flags(*(pmd_t *)pte) & _PAGE_PRESENT);
+ default:
+ return !pte;
+ }
+}
#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
#ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index db4225c046c47c114293af8b504886b103dc94ce..6f42d6a35feceeae4623c2da50cfac54e3533228 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -50,6 +50,8 @@ static inline int set_direct_map_sensitive(struct page *page,
return 0;
}
+static inline bool direct_map_sensitive(struct page *page) { return false; }
+
#else /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
/*
* Some architectures, e.g. ARM64 can disable direct map modifications at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8e3556643b0ff2fe1d35a678937270356006d34..68bc3cc5ed7e7f1adb8dda90edc2e001f9a1c3c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -15,6 +15,7 @@
* (lots of bits borrowed from Ingo Molnar & Andrew Morton)
*/
+#include <linux/asi.h>
#include <linux/stddef.h>
#include <linux/mm.h>
#include <linux/highmem.h>
@@ -1161,6 +1162,33 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
return bad_reason;
}
+static bool page_asi_mapping_bad(struct page *page, unsigned int order, bool sensitive)
+{
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+ if (asi_enabled_static()) {
+ struct page *block_page = page;
+
+ /*
+ * ASI mappings are at pageblock granularity. Check they match
+ * the requested sensitivity.
+ */
+ while (block_page < page + (1 << order)) {
+ if (direct_map_sensitive(block_page) != sensitive) {
+ bad_page(page,
+ sensitive ?
+ "page unexpectedly nonsensitive" :
+ "page unexpectedly sensitive");
+ return true;
+ }
+
+ block_page += pageblock_nr_pages;
+ }
+ }
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+ return false;
+}
+
static inline bool free_page_is_bad(struct page *page)
{
if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
@@ -1471,8 +1499,14 @@ __always_inline bool free_pages_prepare(struct page *page,
page->page_type = UINT_MAX;
if (is_check_pages_enabled()) {
+ freetype_t ft = get_pageblock_freetype(page);
+
if (free_page_is_bad(page))
bad++;
+
+ if (!bad)
+ bad += page_asi_mapping_bad(page, order,
+ freetype_sensitive(ft));
if (bad)
return false;
}
@@ -1840,7 +1874,8 @@ static bool check_new_page(struct page *page)
return true;
}
-static inline bool check_new_pages(struct page *page, unsigned int order)
+static inline bool check_new_pages(struct page *page, unsigned int order,
+ bool sensitive)
{
if (!is_check_pages_enabled())
return false;
@@ -1852,7 +1887,7 @@ static inline bool check_new_pages(struct page *page, unsigned int order)
return true;
}
- return false;
+ return page_asi_mapping_bad(page, order, sensitive);
}
static inline bool should_skip_kasan_unpoison(gfp_t flags)
@@ -3393,7 +3428,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (!page)
return NULL;
- } while (check_new_pages(page, order));
+ } while (check_new_pages(page, order, freetype_sensitive(freetype)));
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
zone_statistics(preferred_zone, zone, 1);
@@ -3478,7 +3513,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
page = list_first_entry(list, struct page, pcp_list);
list_del(&page->pcp_list);
pcp->count -= 1 << order;
- } while (check_new_pages(page, order));
+ } while (check_new_pages(page, order, freetype_sensitive(freetype)));
return page;
}
@@ -7231,7 +7266,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
struct page *head = pfn_to_page(start);
- check_new_pages(head, order);
+ check_new_pages(head, order, gfp_mask & __GFP_SENSITIVE);
prep_new_page(head, order, gfp_mask, 0);
set_page_refcounted(head);
} else {
--
2.50.1
* [PATCH 20/21] x86/mm/asi: don't use global pages when ASI enabled
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (18 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 19/21] mm/asi: bad_page() when ASI mappings are wrong Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 21/21] mm: asi_test: smoke test for [non]sensitive page allocs Brendan Jackman
` (4 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
For the same reason as with PTI, ASI means kernel pages can't be global:
TLB entries from the unrestricted address space must not carry over into
the restricted address space.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/mm/init.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8fd34475af7ccd49d0124e13a87342d3bfef3e05..45bbe764ca4e0abc13d41590dc4ff466180cca31 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -249,8 +249,9 @@ static void __init probe_page_size_mask(void)
/* By the default is everything supported: */
__default_kernel_pte_mask = __supported_pte_mask;
- /* Except when with PTI where the kernel is mostly non-Global: */
- if (cpu_feature_enabled(X86_FEATURE_PTI))
+ /* Except when with PTI/ASI where the kernel is mostly non-Global: */
+ if (cpu_feature_enabled(X86_FEATURE_PTI) ||
+ cpu_feature_enabled(X86_FEATURE_ASI))
__default_kernel_pte_mask &= ~_PAGE_GLOBAL;
/* Enable 1 GB linear kernel mappings if available: */
--
2.50.1
* [PATCH 21/21] mm: asi_test: smoke test for [non]sensitive page allocs
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (19 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 20/21] x86/mm/asi: don't use global pages when ASI enabled Brendan Jackman
@ 2025-09-24 14:59 ` Brendan Jackman
2025-09-25 17:51 ` [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (3 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-24 14:59 UTC (permalink / raw)
To: jackmanb, Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
Add a simple smoke test for allocating pages of different sensitivities.
Since KUnit doesn't (yet) have infrastructure for detecting WARNs, add
custom .init and .exit hooks to our kunit_suite that do this by checking
the kernel's taint flags before and after running the tests.
Since ASI is disabled by default, whatever command people currently use
to run KUnit tests probably won't run these tests. Therefore add a new
.kunitconfig file for the x86 tree that explicitly enables ASI. It
should be possible to delete this again when ASI is the default.
So the most straightforward way to run this test is:
tools/testing/kunit/kunit.py run --arch=x86_64 \
--kunitconfig=arch/x86/.kunitconfig --kernel_args asi=on
The more long-winded way, which lets you customize the kernel config,
is:
mkdir -p .kunit
cp arch/x86/.kunitconfig .kunit
tools/testing/kunit/kunit.py config --arch=x86_64
make O=.kunit menuconfig # Or debug.config or whatever
tools/testing/kunit/kunit.py run --arch=x86_64 --kernel_args asi=on
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
arch/x86/.kunitconfig | 7 +++
arch/x86/Kconfig | 7 +++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/asi_test.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/panic.c | 2 +
mm/init-mm.c | 3 +
6 files changed, 166 insertions(+)
diff --git a/arch/x86/.kunitconfig b/arch/x86/.kunitconfig
new file mode 100644
index 0000000000000000000000000000000000000000..83219e6ecca8d2064aba71fab1f15d57161fa2e4
--- /dev/null
+++ b/arch/x86/.kunitconfig
@@ -0,0 +1,7 @@
+CONFIG_PCI=y
+CONFIG_MMU=y
+CONFIG_64BIT=y
+CONFIG_X86_64=y
+CONFIG_KUNIT=y
+CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION=y
+CONFIG_ASI_KUNIT_TESTS=y
\ No newline at end of file
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cb874c3857cf443c6235e05bc3f070b0ea2686f0..a7b5658ecb1203458e06a0a065bcc7aa7dca8538 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2458,6 +2458,13 @@ config MITIGATION_PAGE_TABLE_ISOLATION
See Documentation/arch/x86/pti.rst for more details.
+config ASI_KUNIT_TESTS
+ tristate "KUnit tests for ASI" if !KUNIT_ALL_TESTS
+ depends on MITIGATION_ADDRESS_SPACE_ISOLATION && KUNIT
+ default KUNIT_ALL_TESTS
+ help
+ Builds the KUnit tests for ASI.
+
config MITIGATION_RETPOLINE
bool "Avoid speculative indirect branches in kernel"
select OBJTOOL if HAVE_OBJTOOL
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5ecbff70964f61a903ac96cec3736a7cec1221fd..7c36ec7f24ebb285fcfc010004206a57536fc990 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -58,3 +58,5 @@ obj-$(CONFIG_X86_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
+
+obj-$(CONFIG_ASI_KUNIT_TESTS) += asi_test.o
diff --git a/arch/x86/mm/asi_test.c b/arch/x86/mm/asi_test.c
new file mode 100644
index 0000000000000000000000000000000000000000..6076a61980ed9daea63113a30e990eb02a7b08d5
--- /dev/null
+++ b/arch/x86/mm/asi_test.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/mm_types.h>
+#include <linux/mm.h>
+#include <linux/pgtable.h>
+#include <linux/set_memory.h>
+#include <linux/sched/mm.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include <kunit/resource.h>
+#include <kunit/test.h>
+
+#include <asm/asi.h>
+
+struct free_pages_ctx {
+ unsigned int order;
+ struct list_head pages;
+};
+
+static void action_many__free_pages(void *context)
+{
+ struct free_pages_ctx *ctx = context;
+ struct page *page, *tmp;
+
+ list_for_each_entry_safe(page, tmp, &ctx->pages, lru)
+ __free_pages(page, ctx->order);
+}
+
+/*
+ * Allocate a bunch of pages with the same order and GFP flags, transparently
+ * take care of error handling and cleanup. Does this all via a single KUnit
+ * resource, i.e. has a fixed memory overhead.
+ */
+static struct free_pages_ctx *do_many_alloc_pages(struct kunit *test, gfp_t gfp,
+ unsigned int order, unsigned int count)
+{
+ struct free_pages_ctx *ctx = kunit_kzalloc(
+ test, sizeof(struct free_pages_ctx), GFP_KERNEL);
+
+ KUNIT_ASSERT_NOT_NULL(test, ctx);
+ INIT_LIST_HEAD(&ctx->pages);
+ ctx->order = order;
+
+ for (int i = 0; i < count; i++) {
+ struct page *page = alloc_pages(gfp, order);
+
+ if (!page) {
+ struct page *page, *tmp;
+
+ list_for_each_entry_safe(page, tmp, &ctx->pages, lru)
+ __free_pages(page, order);
+
+ KUNIT_FAIL_AND_ABORT(test,
+ "Failed to alloc order %d page (GFP *%pG) iter %d",
+ order, &gfp, i);
+ }
+ list_add(&page->lru, &ctx->pages);
+ }
+
+ KUNIT_ASSERT_EQ(test,
+ kunit_add_action_or_reset(test, action_many__free_pages, ctx), 0);
+ return ctx;
+}
+
+/*
+ * Do some allocations that force the allocator to change the sensitivity of
+ * some blocks.
+ */
+static void test_alloc_sensitive_nonsensitive(struct kunit *test)
+{
+ unsigned long page_majority;
+ struct free_pages_ctx *ctx;
+ gfp_t gfp = GFP_KERNEL | __GFP_THISNODE;
+ struct page *page;
+
+ if (!cpu_feature_enabled(X86_FEATURE_ASI))
+ kunit_skip(test, "ASI off. Set asi=on in kernel cmdline\n");
+
+ /* No cleanup here - assuming kthread "belongs" to this test. */
+ set_cpus_allowed_ptr(current, cpumask_of_node(numa_node_id()));
+
+ /*
+ * First allocate more than half of the memory in the node as
+ * nonsensitive. Assuming the memory starts out unmapped, this should
+ * exercise the sensitive->nonsensitive flip already.
+ */
+ page_majority = (node_present_pages(numa_node_id()) / 2) + 1;
+ ctx = do_many_alloc_pages(test, gfp, 0, page_majority);
+
+ /* Check pages are mapped */
+ list_for_each_entry(page, &ctx->pages, lru) {
+ /*
+ * Logically it should be an EXPECT, but that would cause heavy
+ * log spam on failure so use ASSERT for concision.
+ */
+ KUNIT_ASSERT_FALSE(test, direct_map_sensitive(page));
+ }
+
+ /*
+ * Now free them again and allocate the same amount as sensitive.
+ * This will exercise the nonsensitive->sensitive flip.
+ */
+ kunit_release_action(test, action_many__free_pages, ctx);
+ gfp |= __GFP_SENSITIVE;
+ ctx = do_many_alloc_pages(test, gfp, 0, page_majority);
+
+ /* Check pages are unmapped */
+ list_for_each_entry(page, &ctx->pages, lru)
+ KUNIT_ASSERT_TRUE(test, direct_map_sensitive(page));
+}
+
+static struct kunit_case asi_test_cases[] = {
+ KUNIT_CASE(test_alloc_sensitive_nonsensitive),
+ {}
+};
+
+static unsigned long taint_pre;
+
+static int store_taint_pre(struct kunit *test)
+{
+ taint_pre = get_taint();
+ return 0;
+}
+
+static void check_taint_post(struct kunit *test)
+{
+ unsigned long new_taint = get_taint() & ~taint_pre;
+
+ KUNIT_EXPECT_EQ_MSG(test, new_taint, 0,
+ "Kernel newly tainted after test. Maybe a WARN?");
+}
+
+static struct kunit_suite asi_test_suite = {
+ .name = "asi",
+ .init = store_taint_pre,
+ .exit = check_taint_post,
+ .test_cases = asi_test_cases,
+};
+
+kunit_test_suite(asi_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
diff --git a/kernel/panic.c b/kernel/panic.c
index d9c7cd09aeb9fe22f05e0b05d26555e20e502d2f..6aa79c5192520af55cd473912d2ac802de687304 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -39,6 +39,7 @@
#include <linux/sys_info.h>
#include <trace/events/error_report.h>
#include <asm/sections.h>
+#include <kunit/visibility.h>
#define PANIC_TIMER_STEP 100
#define PANIC_BLINK_SPD 18
@@ -737,6 +738,7 @@ unsigned long get_taint(void)
{
return tainted_mask;
}
+EXPORT_SYMBOL_IF_KUNIT(get_taint);
/**
* add_taint: add a taint flag if not already set.
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 4600e7605cab43b4bce24b85ec1667db8b92dc80..456b8f7d2ab3bd7963a51908dff76713a4e65ab5 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -13,6 +13,8 @@
#include <linux/iommu.h>
#include <asm/mmu.h>
+#include <kunit/visibility.h>
+
#ifndef INIT_MM_CONTEXT
#define INIT_MM_CONTEXT(name)
#endif
@@ -47,6 +49,7 @@ struct mm_struct init_mm = {
.cpu_bitmap = CPU_BITS_NONE,
INIT_MM_CONTEXT(init_mm)
};
+EXPORT_SYMBOL_IF_KUNIT(init_mm);
void setup_initial_init_mm(void *start_code, void *end_code,
void *end_data, void *brk)
--
2.50.1
* Re: [PATCH 11/21] mm: introduce freetype_t
2025-09-24 14:59 ` [PATCH 11/21] mm: introduce freetype_t Brendan Jackman
@ 2025-09-25 13:15 ` kernel test robot
2025-10-01 21:20 ` Dave Hansen
1 sibling, 0 replies; 65+ messages in thread
From: kernel test robot @ 2025-09-25 13:15 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: llvm, oe-kbuild-all, peterz, bp, dave.hansen, mingo, tglx, akpm,
david, derkling, junaids, linux-kernel, linux-mm, reijiw,
rientjes, rppt, vbabka, x86, yosry.ahmed
Hi Brendan,
kernel test robot noticed the following build errors:
[auto build test ERROR on bf2602a3cb2381fb1a04bf1c39a290518d2538d1]
url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/x86-mm-asi-Add-CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION/20250924-230633
base: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
patch link: https://lore.kernel.org/r/20250924-b4-asi-page-alloc-v1-11-2d861768041f%40google.com
patch subject: [PATCH 11/21] mm: introduce freetype_t
config: i386-buildonly-randconfig-001-20250925 (https://download.01.org/0day-ci/archive/20250925/202509252109.X3oqS16b-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250925/202509252109.X3oqS16b-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509252109.X3oqS16b-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/compaction.c:2319:7: error: use of undeclared identifier 'migratetype'; did you mean 'migrate_pfn'?
2319 | if (migratetype == MIGRATE_MOVABLE &&
| ^~~~~~~~~~~
| migrate_pfn
include/linux/migrate.h:137:29: note: 'migrate_pfn' declared here
137 | static inline unsigned long migrate_pfn(unsigned long pfn)
| ^
1 error generated.
vim +2319 mm/compaction.c
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2234
40cacbcb3240362 Mel Gorman 2019-03-05 2235 static enum compact_result __compact_finished(struct compact_control *cc)
748446bb6b5a939 Mel Gorman 2010-05-24 2236 {
8fb74b9fb2b182d Mel Gorman 2013-01-11 2237 unsigned int order;
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2238 const freetype_t freetype = cc->freetype;
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2239 int ret;
748446bb6b5a939 Mel Gorman 2010-05-24 2240
753341a4b85ff33 Mel Gorman 2012-10-08 2241 /* Compaction run completes if the migrate and free scanner meet */
f2849aa09d4fbc4 Vlastimil Babka 2015-09-08 2242 if (compact_scanners_met(cc)) {
55b7c4c99f6a448 Vlastimil Babka 2014-01-21 2243 /* Let the next compaction start anew. */
40cacbcb3240362 Mel Gorman 2019-03-05 2244 reset_cached_positions(cc->zone);
55b7c4c99f6a448 Vlastimil Babka 2014-01-21 2245
62997027ca5b3d4 Mel Gorman 2012-10-08 2246 /*
62997027ca5b3d4 Mel Gorman 2012-10-08 2247 * Mark that the PG_migrate_skip information should be cleared
accf62422b3a67f Vlastimil Babka 2016-03-17 2248 * by kswapd when it goes to sleep. kcompactd does not set the
62997027ca5b3d4 Mel Gorman 2012-10-08 2249 * flag itself as the decision to be clear should be directly
62997027ca5b3d4 Mel Gorman 2012-10-08 2250 * based on an allocation request.
62997027ca5b3d4 Mel Gorman 2012-10-08 2251 */
accf62422b3a67f Vlastimil Babka 2016-03-17 2252 if (cc->direct_compaction)
40cacbcb3240362 Mel Gorman 2019-03-05 2253 cc->zone->compact_blockskip_flush = true;
62997027ca5b3d4 Mel Gorman 2012-10-08 2254
c8f7de0bfae36e8 Michal Hocko 2016-05-20 2255 if (cc->whole_zone)
748446bb6b5a939 Mel Gorman 2010-05-24 2256 return COMPACT_COMPLETE;
c8f7de0bfae36e8 Michal Hocko 2016-05-20 2257 else
c8f7de0bfae36e8 Michal Hocko 2016-05-20 2258 return COMPACT_PARTIAL_SKIPPED;
bb13ffeb9f6bfeb Mel Gorman 2012-10-08 2259 }
748446bb6b5a939 Mel Gorman 2010-05-24 2260
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2261 if (cc->proactive_compaction) {
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2262 int score, wmark_low;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2263 pg_data_t *pgdat;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2264
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2265 pgdat = cc->zone->zone_pgdat;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2266 if (kswapd_is_running(pgdat))
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2267 return COMPACT_PARTIAL_SKIPPED;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2268
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2269 score = fragmentation_score_zone(cc->zone);
8fbb92bd10be26d Kemeng Shi 2023-08-09 2270 wmark_low = fragmentation_score_wmark(true);
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2271
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2272 if (score > wmark_low)
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2273 ret = COMPACT_CONTINUE;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2274 else
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2275 ret = COMPACT_SUCCESS;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2276
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2277 goto out;
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2278 }
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2279
21c527a3cba07f9 Yaowei Bai 2015-11-05 2280 if (is_via_compact_memory(cc->order))
56de7263fcf3eb1 Mel Gorman 2010-05-24 2281 return COMPACT_CONTINUE;
56de7263fcf3eb1 Mel Gorman 2010-05-24 2282
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2283 /*
efe771c7603bc52 Mel Gorman 2019-03-05 2284 * Always finish scanning a pageblock to reduce the possibility of
efe771c7603bc52 Mel Gorman 2019-03-05 2285 * fallbacks in the future. This is particularly important when
efe771c7603bc52 Mel Gorman 2019-03-05 2286 * migration source is unmovable/reclaimable but it's not worth
efe771c7603bc52 Mel Gorman 2019-03-05 2287 * special casing.
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2288 */
ee0913c47196102 Kefeng Wang 2022-09-07 2289 if (!pageblock_aligned(cc->migrate_pfn))
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2290 return COMPACT_CONTINUE;
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2291
a211c6550efcc87 Johannes Weiner 2025-03-13 2292 /*
a211c6550efcc87 Johannes Weiner 2025-03-13 2293 * When defrag_mode is enabled, make kcompactd target
a211c6550efcc87 Johannes Weiner 2025-03-13 2294 * watermarks in whole pageblocks. Because they can be stolen
a211c6550efcc87 Johannes Weiner 2025-03-13 2295 * without polluting, no further fallback checks are needed.
a211c6550efcc87 Johannes Weiner 2025-03-13 2296 */
a211c6550efcc87 Johannes Weiner 2025-03-13 2297 if (defrag_mode && !cc->direct_compaction) {
a211c6550efcc87 Johannes Weiner 2025-03-13 2298 if (__zone_watermark_ok(cc->zone, cc->order,
a211c6550efcc87 Johannes Weiner 2025-03-13 2299 high_wmark_pages(cc->zone),
a211c6550efcc87 Johannes Weiner 2025-03-13 2300 cc->highest_zoneidx, cc->alloc_flags,
a211c6550efcc87 Johannes Weiner 2025-03-13 2301 zone_page_state(cc->zone,
a211c6550efcc87 Johannes Weiner 2025-03-13 2302 NR_FREE_PAGES_BLOCKS)))
a211c6550efcc87 Johannes Weiner 2025-03-13 2303 return COMPACT_SUCCESS;
a211c6550efcc87 Johannes Weiner 2025-03-13 2304
a211c6550efcc87 Johannes Weiner 2025-03-13 2305 return COMPACT_CONTINUE;
a211c6550efcc87 Johannes Weiner 2025-03-13 2306 }
a211c6550efcc87 Johannes Weiner 2025-03-13 2307
56de7263fcf3eb1 Mel Gorman 2010-05-24 2308 /* Direct compactor: Is a suitable page free? */
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2309 ret = COMPACT_NO_SUITABLE_PAGE;
fd37721803c6e73 Kirill A. Shutemov 2023-12-28 2310 for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
40cacbcb3240362 Mel Gorman 2019-03-05 2311 struct free_area *area = &cc->zone->free_area[order];
8fb74b9fb2b182d Mel Gorman 2013-01-11 2312
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2313 /* Job done if page is free of the right freetype */
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2314 if (!free_area_empty(area, freetype))
cf378319d335663 Vlastimil Babka 2016-10-07 2315 return COMPACT_SUCCESS;
56de7263fcf3eb1 Mel Gorman 2010-05-24 2316
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2317 #ifdef CONFIG_CMA
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2318 /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 @2319 if (migratetype == MIGRATE_MOVABLE &&
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2320 !free_areas_empty(area, MIGRATE_CMA))
cf378319d335663 Vlastimil Babka 2016-10-07 2321 return COMPACT_SUCCESS;
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2322 #endif
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2323 /*
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2324 * Job done if allocation would steal freepages from
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2325 * other freetype buddy lists.
2149cdaef6c0eb5 Joonsoo Kim 2015-04-14 2326 */
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2327 if (find_suitable_fallback(area, order, freetype, true) >= 0)
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2328 /*
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2329 * Movable pages are OK in any pageblock of the right
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2330 * sensitivity. If we are * stealing for a
fb432f7fe43d9fb Brendan Jackman 2025-09-24 2331 * non-movable allocation, make sure
fa599c44987df43 Miaohe Lin 2022-04-28 2332 * we finish compacting the current pageblock first
fa599c44987df43 Miaohe Lin 2022-04-28 2333 * (which is assured by the above migrate_pfn align
fa599c44987df43 Miaohe Lin 2022-04-28 2334 * check) so it is as free as possible and we won't
fa599c44987df43 Miaohe Lin 2022-04-28 2335 * have to steal another one soon.
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2336 */
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2337 return COMPACT_SUCCESS;
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2338 }
baf6a9a1db5a40e Vlastimil Babka 2017-05-08 2339
facdaa917c4d5a3 Nitin Gupta 2020-08-11 2340 out:
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2341 if (cc->contended || fatal_signal_pending(current))
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2342 ret = COMPACT_CONTENDED;
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2343
cb2dcaf023c2cf1 Mel Gorman 2019-03-05 2344 return ret;
837d026d560c5ef Joonsoo Kim 2015-02-11 2345 }
837d026d560c5ef Joonsoo Kim 2015-02-11 2346
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
2025-09-24 14:59 ` [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI Brendan Jackman
@ 2025-09-25 13:36 ` kernel test robot
2025-10-01 20:50 ` Dave Hansen
1 sibling, 0 replies; 65+ messages in thread
From: kernel test robot @ 2025-09-25 13:36 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: oe-kbuild-all, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86, Yosry Ahmed
Hi Brendan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bf2602a3cb2381fb1a04bf1c39a290518d2538d1]
url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/x86-mm-asi-Add-CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION/20250924-230633
base: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
patch link: https://lore.kernel.org/r/20250924-b4-asi-page-alloc-v1-5-2d861768041f%40google.com
patch subject: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
config: x86_64-buildonly-randconfig-002-20250925 (https://download.01.org/0day-ci/archive/20250925/202509252153.JhjsdZ6c-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250925/202509252153.JhjsdZ6c-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509252153.JhjsdZ6c-lkp@intel.com/
All warnings (new ones prefixed by >>):
arch/x86/mm/pat/set_memory.c: In function 'mirror_asi_direct_map':
>> arch/x86/mm/pat/set_memory.c:1995:25: warning: variable 'asi_cpa' set but not used [-Wunused-but-set-variable]
1995 | struct cpa_data asi_cpa = *cpa;
| ^~~~~~~
vim +/asi_cpa +1995 arch/x86/mm/pat/set_memory.c
1988
1989 /*
1990 * Having updated the unrestricted PGD, reflect this change in the ASI
1991 * restricted address space too.
1992 */
1993 static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
1994 {
> 1995 struct cpa_data asi_cpa = *cpa;
1996
1997 if (!asi_enabled_static())
1998 return 0;
1999
2000 /* Only need to do this for the real unrestricted direct map. */
2001 if ((cpa->pgd && cpa->pgd != init_mm.pgd) || !is_direct_map(*cpa->vaddr))
2002 return 0;
2003 VM_WARN_ON_ONCE(!is_direct_map(*cpa->vaddr + (cpa->numpages * PAGE_SIZE)));
2004
2005 asi_cpa.pgd = asi_nonsensitive_pgd;
2006 asi_cpa.curpage = 0;
2007 return __change_page_attr(cpa, primary);
2008 }
2009
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers
2025-09-24 14:59 ` [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers Brendan Jackman
@ 2025-09-25 13:36 ` kernel test robot
0 siblings, 0 replies; 65+ messages in thread
From: kernel test robot @ 2025-09-25 13:36 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: oe-kbuild-all, peterz, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86, yosry.ahmed
Hi Brendan,
kernel test robot noticed the following build errors:
[auto build test ERROR on bf2602a3cb2381fb1a04bf1c39a290518d2538d1]
url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/x86-mm-asi-Add-CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION/20250924-230633
base: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
patch link: https://lore.kernel.org/r/20250924-b4-asi-page-alloc-v1-13-2d861768041f%40google.com
patch subject: [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers
config: x86_64-buildonly-randconfig-006-20250925 (https://download.01.org/0day-ci/archive/20250925/202509252146.WmdVQlgy-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250925/202509252146.WmdVQlgy-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509252146.WmdVQlgy-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/page_alloc_test.c: In function 'test_pindex_helpers':
>> mm/page_alloc_test.c:19:30: error: implicit declaration of function 'pcp_allowed_order' [-Wimplicit-function-declaration]
19 | if (!pcp_allowed_order(order))
| ^~~~~~~~~~~~~~~~~
>> mm/page_alloc_test.c:24:55: error: implicit declaration of function 'order_to_pindex' [-Wimplicit-function-declaration]
24 | unsigned int pindex = order_to_pindex(ft, order);
| ^~~~~~~~~~~~~~~
>> mm/page_alloc_test.c:44:45: error: implicit declaration of function 'pindex_to_order'; did you mean 'next_order'? [-Wimplicit-function-declaration]
44 | got_order = pindex_to_order(pindex);
| ^~~~~~~~~~~~~~~
| next_order
vim +/pcp_allowed_order +19 mm/page_alloc_test.c
8
9 /* This just checks for basic arithmetic errors. */
10 static void test_pindex_helpers(struct kunit *test)
11 {
12 unsigned long bitmap[bitmap_size(NR_PCP_LISTS)];
13
14 /* Bit means "pindex not yet used". */
15 bitmap_fill(bitmap, NR_PCP_LISTS);
16
17 for (unsigned int order = 0; order < NR_PAGE_ORDERS; order++) {
18 for (unsigned int mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
> 19 if (!pcp_allowed_order(order))
20 continue;
21
22 for (int sensitive = 0; sensitive < NR_SENSITIVITIES; sensitive++) {
23 freetype_t ft = migrate_to_freetype(mt, sensitive);
> 24 unsigned int pindex = order_to_pindex(ft, order);
25 int got_order;
26
27 KUNIT_ASSERT_LT_MSG(test, pindex, NR_PCP_LISTS,
28 "invalid pindex %d (order %d mt %d sensitive %d)",
29 pindex, order, mt, sensitive);
30 KUNIT_EXPECT_TRUE_MSG(test, test_bit(pindex, bitmap),
31 "pindex %d reused (order %d mt %d sensitive %d)",
32 pindex, order, mt, sensitive);
33
34 /*
35 * For THP, two migratetypes map to the
36 * same pindex, just manually exclude one
37 * of those cases.
38 */
39 if (!(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
40 order == HPAGE_PMD_ORDER &&
41 mt == min(MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE)))
42 clear_bit(pindex, bitmap);
43
> 44 got_order = pindex_to_order(pindex);
45 KUNIT_EXPECT_EQ_MSG(test, order, got_order,
46 "roundtrip failed, got %d want %d (pindex %d mt %d sensitive %d)",
47 got_order, order, pindex, mt, sensitive);
48
49 }
50 }
51 }
52
53 KUNIT_EXPECT_TRUE_MSG(test, bitmap_empty(bitmap, NR_PCP_LISTS),
54 "unused pindices: %*pbl", NR_PCP_LISTS, bitmap);
55 }
56
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (20 preceding siblings ...)
2025-09-24 14:59 ` [PATCH 21/21] mm: asi_test: smoke test for [non]sensitive page allocs Brendan Jackman
@ 2025-09-25 17:51 ` Brendan Jackman
2025-09-30 19:51 ` Konrad Rzeszutek Wilk
` (2 subsequent siblings)
24 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-09-25 17:51 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed, owner-linux-mm
On Wed Sep 24, 2025 at 2:59 PM UTC, Brendan Jackman wrote:
> base-commit: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
I forgot to mention that this is based on linux-next from 2025-09-22. I
have pushed this series here:
https://github.com/bjackman/linux/tree/asi/direct-map-v1
And I'll be keeping this branch up-to-date between [PATCH] revisions as
I respond to feedback (I've already pushed fixes for the build failures
identified by the bot):
https://github.com/bjackman/linux/tree/asi/direct-map
Also, someone pointed out that this post doesn't explain what ASI
actually is. This information is all online if you chase my references,
but so people don't have to do that, I will add something to
Documentation/ for v2.
For the benefit of anyone reading this version who isn't already
familiar with ASI, I'm pasting my draft below. Let me know if I can
clarify anything here.
Cheers,
Brendan
---
=============================
Address Space Isolation (ASI)
=============================
.. Warning::
ASI is incomplete. It is available to enable for testing but doesn't offer
security guarantees. See the "Status" section for details.
Introduction
============
ASI is a mechanism to mitigate a broad class of CPU vulnerabilities. While the
precise scope of these vulnerabilities is complex, ASI, when appropriately
configured, mitigates most well-known CPU exploits.
This class of vulnerabilities could be mitigated by the following *blanket
mitigation*:
1. Remove all potentially secret data from the attacker's address space (i.e.
enable PTI).
2. Disable SMT.
3. Whenever transitioning from an untrusted domain (i.e. a userspace process or
a KVM guest) into a potential victim domain (in this case, the kernel), clear
all state from the branch predictor.
4. Whenever transitioning from the victim domain into an untrusted domain, clear
all microarchitectural state that might be exploited to leak data via a
side channel (e.g. L1D$, load and store buffers, etc.).
The performance overhead of this mitigation is unacceptable for most use-cases. In the
abstract, ASI works by doing these things, but only *selectively*.
What ASI does
=============
Memory is divided into *sensitive* and *nonsensitive* memory. Sensitive memory
refers to memory that might contain data the kernel is obliged to protect from
an attacker. Specifically, this includes any memory that might contain user data
or could be indirectly used to steal user data (such as keys). All other memory
is nonsensitive.
A new address space, called the *restricted address space*, is introduced, where
sensitive memory is not mapped. The "normal" address space where everything is
mapped (equivalent to the address space used by the kernel when ASI is disabled)
is called the *unrestricted address space*. When the CPU enters the kernel from
an untrusted domain, it does so in the restricted address space (no sensitive
memory mapped).
If the kernel accesses sensitive memory, it triggers a page fault. In this page
fault handler, the kernel transitions from the restricted to the unrestricted
address space. At this point, a security boundary is crossed: just before the
transition, the kernel flushes branch predictor state as it would in point
3 of the blanket mitigation above. Furthermore, SMT is disabled (the sibling
hyperthread is paused).
.. Note::
Because the restricted -> unrestricted transition is triggered by a page
fault, it is totally automatic and transparent to the rest of the kernel.
Kernel code is not generally aware of memory sensitivity.
Before returning to the untrusted domain, the kernel transitions back to the
restricted address space. Immediately afterwards, it flushes any potential
side-channels, like in step 4 of the blanket mitigation above. At this point SMT
is also re-enabled.
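To make the sequence above concrete, here is the lifecycle in rough pseudocode
(every name below is an illustrative placeholder, not the real ASI API)::

    /* Entering the kernel from an untrusted domain (process or KVM guest): */
    switch_to_restricted_address_space();
    /* ... kernel code runs; most entries never touch sensitive memory ... */

    /* Page fault on an access to unmapped (i.e. sensitive) memory: */
    flush_branch_predictor_state();         /* step 3 of the blanket mitigation */
    pause_sibling_hyperthread();            /* "SMT disabled" */
    switch_to_unrestricted_address_space();
    /* ... the faulting access is retried and now succeeds ... */

    /* Just before returning to the untrusted domain: */
    switch_to_restricted_address_space();
    flush_microarchitectural_state();       /* step 4: L1D$, buffers, ... */
    resume_sibling_hyperthread();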
Why it works
============
In terms of security, this is equivalent to the blanket mitigation. However,
instead of doing these expensive things on every transition into and out of the
kernel, ASI does them only on transitions between its address spaces. Most
entries to the kernel do not require access to any sensitive data. This means
that a roundtrip can be performed without doing any of the flushes mentioned
above.
This selectivity means that much more aggressive mitigation techniques are
available for a dramatically reduced performance cost. In turn, these more
aggressive techniques tend to be more generic. For example, instead of needing
to develop new microarchitecture-specific techniques to efficiently eliminate
attacker "mistraining", ASI makes it viable to just use generic flush operations
like IBPB.
Status
======
ASI is currently still in active development. None of the features described
above actually work yet.
Prototypes only exist for ASI on x86, and in its initial development it will
remain x86-specific. This is not fundamental to its design; it could eventually
be extended to other architectures as needed.
Resources
=========
* Presentation at LSF/MM/BPF 2024, introducing ASI: https://www.youtube.com/watch?v=DxaN6X_fdlI
* RFCs on LKML:
* `Junaid Shahid, 2022 <https://lore.kernel.org/all/20220223052223.1202152-1-junaids@google.com/>`__
* `Brendan Jackman, 2025 <https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>`__
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-09-24 14:59 ` [PATCH 03/21] x86/mm: factor out phys_pgd_init() Brendan Jackman
@ 2025-09-27 19:29 ` kernel test robot
2025-10-01 12:26 ` Brendan Jackman
2025-10-25 11:48 ` Borislav Petkov
1 sibling, 1 reply; 65+ messages in thread
From: kernel test robot @ 2025-09-27 19:29 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: llvm, oe-kbuild-all, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86, yosry.ahmed
Hi Brendan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bf2602a3cb2381fb1a04bf1c39a290518d2538d1]
url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/x86-mm-asi-Add-CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION/20250924-230633
base: bf2602a3cb2381fb1a04bf1c39a290518d2538d1
patch link: https://lore.kernel.org/r/20250924-b4-asi-page-alloc-v1-3-2d861768041f%40google.com
patch subject: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20250927/202509272136.N4ELb64u-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250927/202509272136.N4ELb64u-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509272136.N4ELb64u-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/x86/mm/init_64.c:747:23: warning: variable 'vaddr_start' set but not used [-Wunused-but-set-variable]
747 | unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
| ^
1 warning generated.
vim +/vaddr_start +747 arch/x86/mm/init_64.c
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 742
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 743 static unsigned long __meminit
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 744 phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_end,
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 745 unsigned long page_size_mask, pgprot_t prot, bool init, bool *pgd_changed)
^1da177e4c3f41 arch/x86_64/mm/init.c Linus Torvalds 2005-04-16 746 {
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 @747 unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
^1da177e4c3f41 arch/x86_64/mm/init.c Linus Torvalds 2005-04-16 748
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 749 *pgd_changed = false;
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 750
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 751 paddr_last = paddr_end;
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 752 vaddr = (unsigned long)__va(paddr_start);
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 753 vaddr_end = (unsigned long)__va(paddr_end);
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 754 vaddr_start = vaddr;
^1da177e4c3f41 arch/x86_64/mm/init.c Linus Torvalds 2005-04-16 755
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 756 for (; vaddr < vaddr_end; vaddr = vaddr_next) {
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 757 pgd_t *pgd = pgd_offset_pgd(pgd_page, vaddr);
f2a6a7050109e0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-03-17 758 p4d_t *p4d;
44df75e629106e arch/x86_64/mm/init.c Matt Tolentino 2006-01-17 759
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 760 vaddr_next = (vaddr & PGDIR_MASK) + PGDIR_SIZE;
4f9c11dd49fb73 arch/x86/mm/init_64.c Jeremy Fitzhardinge 2008-06-25 761
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 762 if (pgd_val(*pgd)) {
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 763 p4d = (p4d_t *)pgd_page_vaddr(*pgd);
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 764 paddr_last = phys_p4d_init(p4d, __pa(vaddr),
59b3d0206d74a7 arch/x86/mm/init_64.c Thomas Garnier 2016-06-21 765 __pa(vaddr_end),
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 766 page_size_mask,
c164fbb40c43f8 arch/x86/mm/init_64.c Logan Gunthorpe 2020-04-10 767 prot, init);
4f9c11dd49fb73 arch/x86/mm/init_64.c Jeremy Fitzhardinge 2008-06-25 768 continue;
4f9c11dd49fb73 arch/x86/mm/init_64.c Jeremy Fitzhardinge 2008-06-25 769 }
4f9c11dd49fb73 arch/x86/mm/init_64.c Jeremy Fitzhardinge 2008-06-25 770
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 771 p4d = alloc_low_page();
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 772 paddr_last = phys_p4d_init(p4d, __pa(vaddr), __pa(vaddr_end),
c164fbb40c43f8 arch/x86/mm/init_64.c Logan Gunthorpe 2020-04-10 773 page_size_mask, prot, init);
8ae3a5a8dff2c9 arch/x86/mm/init_64.c Jan Beulich 2008-08-21 774
8ae3a5a8dff2c9 arch/x86/mm/init_64.c Jan Beulich 2008-08-21 775 spin_lock(&init_mm.page_table_lock);
ed7588d5dc6f5e arch/x86/mm/init_64.c Kirill A. Shutemov 2018-05-18 776 if (pgtable_l5_enabled())
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 777 pgd_populate_init(&init_mm, pgd, p4d, init);
7e82ea946ae4d0 arch/x86/mm/init_64.c Kirill A. Shutemov 2017-06-06 778 else
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 779 p4d_populate_init(&init_mm, p4d_offset(pgd, vaddr),
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 780 (pud_t *) p4d, init);
eccd906484d1cd arch/x86/mm/init_64.c Brijesh Singh 2019-04-17 781
8ae3a5a8dff2c9 arch/x86/mm/init_64.c Jan Beulich 2008-08-21 782 spin_unlock(&init_mm.page_table_lock);
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 783 *pgd_changed = true;
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 784 }
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 785
46b7f8ebabd0a2 arch/x86/mm/init_64.c Brendan Jackman 2025-09-24 786 return paddr_last;
^1da177e4c3f41 arch/x86_64/mm/init.c Linus Torvalds 2005-04-16 787 }
9b861528a8012e arch/x86/mm/init_64.c Haicheng Li 2010-08-20 788
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (21 preceding siblings ...)
2025-09-25 17:51 ` [PATCH 00/21] mm: ASI direct map management Brendan Jackman
@ 2025-09-30 19:51 ` Konrad Rzeszutek Wilk
2025-10-01 7:12 ` Brendan Jackman
2025-10-01 19:54 ` Dave Hansen
2025-10-01 20:59 ` Dave Hansen
24 siblings, 1 reply; 65+ messages in thread
From: Konrad Rzeszutek Wilk @ 2025-09-30 19:51 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, bp,
dave.hansen, mingo, tglx, akpm, david, derkling, junaids,
linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka, x86,
Yosry Ahmed
On Wed, Sep 24, 2025 at 02:59:35PM +0000, Brendan Jackman wrote:
> As per [0] I think ASI is ready to start merging. This is the first
> step. The scope of this series is: everything needed to set up the
> direct map in the restricted address spaces.
There looks to be a different approach taken by other folks to
yank the guest pages from the hypervisor:
https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk/
That looks to have a very similar end result with fewer changes?
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-09-30 19:51 ` Konrad Rzeszutek Wilk
@ 2025-10-01 7:12 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-01 7:12 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, bp,
dave.hansen, mingo, tglx, akpm, david, derkling, junaids,
linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka, x86,
Yosry Ahmed
On Tue, 30 Sept 2025 at 21:51, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>
> On Wed, Sep 24, 2025 at 02:59:35PM +0000, Brendan Jackman wrote:
> > As per [0] I think ASI is ready to start merging. This is the first
> > step. The scope of this series is: everything needed to set up the
> > direct map in the restricted address spaces.
>
> There looks to be a different approach taken by other folks to
> yank the guest pages from the hypervisor:
>
> https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk/
>
> That looks to have a very similar end result with fewer changes?
Hey Konrad,
Yeah if you only care about the security boundary around VM guests,
and you're able to rework your hypervisor stack appropriately (I don't
know too much about this but presumably it's just a subset of what's
needed to support confidential computing usecases?), that approach
seems good to me.
But that isn't true for most of Linux's users. We still need to
support systems where there is a meaningful security boundary around
native processes. Also, unless I'm mistaken Patrick's approach will
always require changes to the VMM, I don't think the kernel can just
tell all users to go and make those changes.
Basically: I support that approach, it's a good idea. It just solves a
different set of problems. (I haven't thought about it carefully but I
guess it solves some problems that ASI doesn't, since I guess it
prevents some set of software exploits too, while ASI only helps with
HW vulns).
Cheers,
Brendan
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-09-27 19:29 ` kernel test robot
@ 2025-10-01 12:26 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-01 12:26 UTC (permalink / raw)
To: kernel test robot, Brendan Jackman, Andy Lutomirski,
Lorenzo Stoakes, Liam R. Howlett, Suren Baghdasaryan,
Michal Hocko, Johannes Weiner, Zi Yan, Axel Rasmussen,
Yuanchu Xie, Roman Gushchin
Cc: llvm, oe-kbuild-all, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86, yosry.ahmed
On Sat Sep 27, 2025 at 7:29 PM UTC, kernel test robot wrote:
> Hi Brendan,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on bf2602a3cb2381fb1a04bf1c39a290518d2538d1]
I've fixed this and the others in my WIP branch but I will wait a bit
longer before sending a v2..
They're all real issues - one of them confirms I have not exercised the
CMA with this code (by demonstrating that I did not even compile with
CONFIG_CMA=y). This one is just a benign bug that only shows up with
W=1.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (22 preceding siblings ...)
2025-09-30 19:51 ` Konrad Rzeszutek Wilk
@ 2025-10-01 19:54 ` Dave Hansen
2025-10-01 20:22 ` Yosry Ahmed
2025-10-01 20:59 ` Dave Hansen
24 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 19:54 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> As per [0] I think ASI is ready to start merging. This is the first
> step. The scope of this series is: everything needed to set up the
> direct map in the restricted address spaces.
Brendan!
Generally, we ask that patches get review tags before we consider them
for being merged. Is there a reason this series doesn't need reviews
before it gets merged?
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-01 19:54 ` Dave Hansen
@ 2025-10-01 20:22 ` Yosry Ahmed
2025-10-01 20:30 ` Dave Hansen
0 siblings, 1 reply; 65+ messages in thread
From: Yosry Ahmed @ 2025-10-01 20:22 UTC (permalink / raw)
To: Dave Hansen
Cc: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin, peterz, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86
On Wed, Oct 01, 2025 at 12:54:42PM -0700, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
> > As per [0] I think ASI is ready to start merging. This is the first
> > step. The scope of this series is: everything needed to set up the
> > direct map in the restricted address spaces.
>
> Brendan!
>
> Generally, we ask that patches get review tags before we consider them
> for being merged. Is there a reason this series doesn't need reviews
> before it gets merged?
I think Brendan just meant that this is not an RFC aimed at prompting
discussion anymore, these are fully functional patches aimed at being
merged after they are reviewed and iterated on accordingly.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-09-24 14:59 ` [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd Brendan Jackman
@ 2025-10-01 20:28 ` Dave Hansen
2025-10-02 14:05 ` Brendan Jackman
2025-11-11 14:55 ` Borislav Petkov
1 sibling, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 20:28 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> Create the initial shared pagetable to hold all the mappings that will
> be shared among ASI domains.
>
> Mirror the physmap into the ASI pagetables, but with a maximum
> granularity that's guaranteed to allow changing pageblock sensitivity
> without having to allocate pagetables, and with everything as
> non-present.
Could you also talk about what this granularity _actually_ is and why it
has the property of never requiring page table allocations?
...
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index e98e85cf15f42db669696ba8195d8fc633351b26..7e0471d46767c63ceade479ae0d1bf738f14904a 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -7,6 +7,7 @@
> * Copyright (C) 2002,2003 Andi Kleen <ak@suse.de>
> */
>
> +#include <linux/asi.h>
> #include <linux/signal.h>
> #include <linux/sched.h>
> #include <linux/kernel.h>
> @@ -746,7 +747,8 @@ phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_en
> {
> unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
>
> - *pgd_changed = false;
> + if (pgd_changed)
> + *pgd_changed = false;
This 'pgd_changed' hunk isn't mentioned in the changelog.
...
> @@ -797,6 +800,24 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
>
> paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
> prot, init, &pgd_changed);
> +
> + /*
> + * Set up ASI's unrestricted physmap. This needs to mapped at minimum 2M
> + * size so that regions can be mapped and unmapped at pageblock
> + * granularity without requiring allocations.
> + */
This took me a minute to wrap my head around.
Here, I think you're trying to convey that:
1. There's a higher-level design decision that all sensitivity will be
done at a 2M granularity. A 2MB physical region is either sensitive
or not.
2. Because of #1, 1GB mappings are not cool because splitting a 1GB
mapping into 2MB needs to allocate a page table page.
3. 4k mappings are OK because they can also have their permissions
changed at a 2MB granularity. It's just more laborious.
The "minimum 2M size" comment really threw me off because that, to me,
also includes 1G which is a no-no here.
I also can't help but wonder if it would have been easier and more
straightforward to just start this whole exercise at 4k: force all the
ASI tables to be 4k. Then, later, add the 2MB support and tie it to
pageblocks.
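For concreteness, my arithmetic (assuming the usual x86-64 defaults with THP
enabled; none of this is spelled out in the patch):

	pageblock_order = HPAGE_PMD_ORDER = 9
	pageblock size  = 2^9 pages * 4 KiB = 2 MiB  (exactly one PMD entry)

	1G -> 2M split: needs a freshly allocated PMD page  (allocation: not OK)
	2M or 4k flip:  toggles existing PMD/PTE entries     (no allocation: OK)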
> + if (asi_nonsensitive_pgd) {
> + /*
> + * Since most memory is expected to end up sensitive, start with
> + * everything unmapped in this pagetable.
> + */
> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
> +
> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
> + prot_np, init, NULL);
> + }
I'm also kinda wondering what the purpose is of having a whole page
table full of !_PAGE_PRESENT entries. It would be nice to know how this
eventually gets turned into something useful.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-01 20:22 ` Yosry Ahmed
@ 2025-10-01 20:30 ` Dave Hansen
2025-10-02 11:05 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 20:30 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin, peterz, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86
On 10/1/25 13:22, Yosry Ahmed wrote:
> On Wed, Oct 01, 2025 at 12:54:42PM -0700, Dave Hansen wrote:
>> On 9/24/25 07:59, Brendan Jackman wrote:
>>> As per [0] I think ASI is ready to start merging. This is the first
>>> step. The scope of this series is: everything needed to set up the
>>> direct map in the restricted address spaces.
>> Brendan!
>>
>> Generally, we ask that patches get review tags before we consider them
>> for being merged. Is there a reason this series doesn't need reviews
>> before it gets merged?
> I think Brendan just meant that this is not an RFC aimed at prompting
> discussion anymore, these are fully functional patches aimed at being
> merged after they are reviewed and iterated on accordingly.
Just setting expectations ... I think Brendan has probably rewritten
this two or three times. I suggest he's about halfway done; only two or
three rewrites left. ;)
But, seriously, this _is_ a big deal. It's not going to be something
that gets a few tags slapped on it and gets merged. At least that's not
how I expect it to go.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
2025-09-24 14:59 ` [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI Brendan Jackman
2025-09-25 13:36 ` kernel test robot
@ 2025-10-01 20:50 ` Dave Hansen
2025-10-02 14:31 ` Brendan Jackman
1 sibling, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 20:50 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> ASI has a separate PGD for the physmap, which needs to be kept in sync
> with the unrestricted physmap with respect to permissions.
So that leads to another thing... What about vmalloc()? Why doesn't it
need to be in the ASI pgd?
> +static inline bool is_direct_map(unsigned long vaddr)
> +{
> + return within(vaddr, PAGE_OFFSET,
> + PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT));
> +}
>
> static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
> int primary)
> @@ -1808,8 +1814,7 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
> * one virtual address page and its pfn. TBD: numpages can be set based
> * on the initial value and the level returned by lookup_address().
> */
> - if (within(vaddr, PAGE_OFFSET,
> - PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
> + if (is_direct_map(vaddr)) {
> cpa->numpages = 1;
> cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
> return 0;
> @@ -1981,6 +1986,27 @@ static int cpa_process_alias(struct cpa_data *cpa)
> return 0;
> }
>
> +/*
> + * Having updated the unrestricted PGD, reflect this change in the ASI
> + * restricted address space too.
> + */
> +static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
> +{
> + struct cpa_data asi_cpa = *cpa;
> +
> + if (!asi_enabled_static())
> + return 0;
> +
> + /* Only need to do this for the real unrestricted direct map. */
> + if ((cpa->pgd && cpa->pgd != init_mm.pgd) || !is_direct_map(*cpa->vaddr))
> + return 0;
> + VM_WARN_ON_ONCE(!is_direct_map(*cpa->vaddr + (cpa->numpages * PAGE_SIZE)));
> +
> + asi_cpa.pgd = asi_nonsensitive_pgd;
> + asi_cpa.curpage = 0;
Please document what functionality this curpage=0 has. It's not clear.
> + return __change_page_attr(cpa, primary);
> +}
But let's say someone is doing something silly like:
set_memory_np(addr, size);
set_memory_p(addr, size);
Won't that end up in here and make the "unrestricted PGD" have
_PAGE_PRESENT==1 entries?
Also, could we try and make the nomenclature consistent? We've got
"unrestricted direct map" and "asi_nonsensitive_pgd" being used (at
least). Could the terminology be made more consistent?
One subtle thing here is that it's OK to allocate memory here when
mirroring changes into 'asi_nonsensitive_pgd'. It's just not OK when
flipping sensitivity. That seems worth a comment.
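Maybe something along these lines (wording is just a suggestion):

	/*
	 * Unlike a sensitivity flip in the page allocator, mirroring a
	 * permission change here is allowed to allocate pagetables;
	 * __change_page_attr() may already do that anyway when it has to
	 * split a large mapping.
	 */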
> static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
> {
> unsigned long numpages = cpa->numpages;
> @@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
> if (!debug_pagealloc_enabled())
> spin_lock(&cpa_lock);
> ret = __change_page_attr(cpa, primary);
> + if (!ret)
> + ret = mirror_asi_direct_map(cpa, primary);
> if (!debug_pagealloc_enabled())
> spin_unlock(&cpa_lock);
> if (ret)
>
Does cpa->pgd ever have any value other than NULL or init_mm->pgd? I
didn't see anything in a quick grep.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
` (23 preceding siblings ...)
2025-10-01 19:54 ` Dave Hansen
@ 2025-10-01 20:59 ` Dave Hansen
2025-10-02 7:34 ` David Hildenbrand
2025-10-02 11:23 ` Brendan Jackman
24 siblings, 2 replies; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 20:59 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> Why is this the scope of the first series? The objective here is to
> reach a MVP of ASI that people can actually run, as soon as possible.
I had to ask ChatGPT what you meant by MVP. Minimum Viable Product?
So this series just creates a new address space and then ensures that
sensitive data is not mapped there? To me, that's a proof-of-concept,
not a bit of valuable functionality that can be merged upstream.
I'm curious how far the first bit of functionality that would be useful
to end users is from the end of this series.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 06/21] mm/page_alloc: add __GFP_SENSITIVE and always set it
2025-09-24 14:59 ` [PATCH 06/21] mm/page_alloc: add __GFP_SENSITIVE and always set it Brendan Jackman
@ 2025-10-01 21:18 ` Dave Hansen
2025-10-02 14:34 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 21:18 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
> +#define ___GFP_SENSITIVE BIT(___GFP_SENSITIVE_BIT)
> +#else
> +#define ___GFP_SENSITIVE 0
> +#endif
This is clearly one of the inflection points of the series.
To go any farther with this approach, I think it's critical to get a few
acks on this hunk specifically. Well, maybe not formal acked-by's, but
at least _clear_ agreement from at least one of:
MEMORY MANAGEMENT - PAGE ALLOCATOR
M: Andrew Morton <akpm@linux-foundation.org>
M: Vlastimil Babka <vbabka@suse.cz>
... or this approach is dead in the water.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 11/21] mm: introduce freetype_t
2025-09-24 14:59 ` [PATCH 11/21] mm: introduce freetype_t Brendan Jackman
2025-09-25 13:15 ` kernel test robot
@ 2025-10-01 21:20 ` Dave Hansen
2025-10-02 14:39 ` Brendan Jackman
1 sibling, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-01 21:20 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On 9/24/25 07:59, Brendan Jackman wrote:
> @@ -2234,7 +2235,7 @@ static bool should_proactive_compact_node(pg_data_t *pgdat)
> static enum compact_result __compact_finished(struct compact_control *cc)
> {
> unsigned int order;
> - const int migratetype = cc->migratetype;
> + const freetype_t freetype = cc->freetype;
Just as I'm scanning this series at a high level, this patch looks too
big to me. There is too much mixing of mechanical changes like this
s/int/freetype_t/ and s/migratetype/freetype/ with new functionality.
I'd be looking for ways to split this up a lot more.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-01 20:59 ` Dave Hansen
@ 2025-10-02 7:34 ` David Hildenbrand
2025-10-02 11:23 ` Brendan Jackman
1 sibling, 0 replies; 65+ messages in thread
From: David Hildenbrand @ 2025-10-02 7:34 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, derkling, junaids,
linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka, x86,
Yosry Ahmed
On 01.10.25 22:59, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> Why is this the scope of the first series? The objective here is to
>> reach a MVP of ASI that people can actually run, as soon as possible.
>
> I had to ask ChatGPT what you meant by MVP. Minimum Viable Product?
>
> So this series just creates a new address space and then ensures that
> sensitive data is not mapped there? To me, that's a proof-of-concept,
> not a bit of valuable functionality that can be merged upstream.
>
> I'm curious how far the first bit of functionality that would be useful
> to end users is from the end of this series.
There was this mail "[Discuss] First steps for ASI (ASI is fast
again)"[1] that I also didn't get to fully digest yet, where there was a
question at the very end
"
Once we have some x86 maintainers saying "yep, it looks like this can
work and it's something we want", I can start turning my page_alloc RFC
[3] into a proper patchset (or maybe multiple if I can find a way to
break things down further).
...
So, x86 folks: Does this feel like "line of sight" to you? If not, what
would that look like, what experiments should I run?
"
Unless I am missing something, no x86 maintainer replied to that one so
far, and I assume this patch set here is the revival of the above-mentioned
RFC, so it might be reasonable to reply there.
[1]
https://lore.kernel.org/all/20250812173109.295750-1-jackmanb@google.com/T/#u
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-01 20:30 ` Dave Hansen
@ 2025-10-02 11:05 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 11:05 UTC (permalink / raw)
To: Dave Hansen, Yosry Ahmed
Cc: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin, peterz, bp, dave.hansen, mingo, tglx, akpm, david,
derkling, junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt,
vbabka, x86
On Wed Oct 1, 2025 at 8:30 PM UTC, Dave Hansen wrote:
> On 10/1/25 13:22, Yosry Ahmed wrote:
>> On Wed, Oct 01, 2025 at 12:54:42PM -0700, Dave Hansen wrote:
>>> On 9/24/25 07:59, Brendan Jackman wrote:
>>>> As per [0] I think ASI is ready to start merging. This is the first
>>>> step. The scope of this series is: everything needed to set up the
>>>> direct map in the restricted address spaces.
>>> Brendan!
>>>
>>> Generally, we ask that patches get review tags before we consider them
>>> for being merged. Is there a reason this series doesn't need reviews
>>> before it gets merged?
>> I think Brendan just meant that this is not an RFC aimed at prompting
>> discussion anymore, these are fully functional patches aimed at being
>> merged after they are reviewed and iterated on accordingly.
>
> Just setting expectations ... I think Brendan has probably rewritten
> this two or three times. I suggest he's about halfway done; only two or
> three rewrites left. ;)
Yeah, I'd love to say "... and we have become exceedingly efficient at
it" [0], but no, debugging my idiotic freelist and pagetable corruptions
was just as hard this time as the first and second times...
[0] https://www.youtube.com/watch?v=r51EomcIqA0
> But, seriously, this _is_ a big deal. It's not going to be something
> that gets a few tags slapped on it and gets merged. At least that's not
> how I expect it to go.
Yeah, sorry if this was poorly worded, I'm DEFINITELY not asking anyone
to merge this without the requisite acks - "ready for merge" just means
"please review this as real grown-up code, I no longer consider this a
PoC". And I'm not expecting this to get merged in v2 either :)
Maybe worth noting here: there are two broad parties of important
reviewers - mm folks and x86 folks. I think we're at risk of a
chicken-and-egg problem where party A is thinking "no point in reviewing
this too carefully, it's not yet clear that party B is ever gonna accept
ASI even in theory". Meanwhile party B says "yeah ASI seems desirable,
but I'll keep my nose out until party A has ironed out the details on
their side".
So, if you can do anything to help develop a consensus on whether we
actually want this thing, that would help a lot. Maybe the best way to
do that is just to dig into the details anyway, I'm not sure.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-01 20:59 ` Dave Hansen
2025-10-02 7:34 ` David Hildenbrand
@ 2025-10-02 11:23 ` Brendan Jackman
2025-10-02 17:01 ` Dave Hansen
1 sibling, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 11:23 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On Wed Oct 1, 2025 at 8:59 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> Why is this the scope of the first series? The objective here is to
>> reach a MVP of ASI that people can actually run, as soon as possible.
>
> I had to ask ChatGPT what you meant by MVP. Minimum Viable Product?
Yeah exactly, sorry I am leaking corporate jargon.
> So this series just creates a new address space and then ensures that
> sensitive data is not mapped there? To me, that's a proof-of-concept,
> not a bit of valuable functionality that can be merged upstream.
>
> I'm curious how far the first bit of functionality that would be useful
> to end users is from the end of this series.
I think this series is about halfway there. With 2 main series:
1. The bit to get the pagetables set up (this series)
2. The bit to switch in and out of the address space
We already have something that delivers security value. It would only
perform well for a certain set of usecases, but there are users for whom
its still a win - it's already strictly cheaper than IBPB-on-VMExit.
[Well, I'm assuming there that we include the actual security flushes in
series 2, maybe that would be more like "2b"...]
Getting to the more interesting cases, where it's faster than the
current default, is I think not that far away for KVM usecases. I think
the branch I posted in my [Discuss] thread[0] gets competitive with
existing KVM usecases well before it devolves into the really hacky
prototype stuff.
Getting to the actual goal, where ASI can become the global default
(i.e. it's still fast when you sandbox native tasks as well as KVM
guests), is further away, since we need to figure out the details of
something like what I called the "ephmap" in [0].
There are competing tensions here - we would prefer not to merge code
that "doesn't do anything", but on the other hand I don't think anyone
wants to find themselves receiving [PATCH v34 19/40] next July... so
I've tried to strike a balance here. Something like:
1. Develop a consensus that "we probably want ASI and it's worth trying"
2. Start working towards it in-tree, by breaking it down into smaller
chunks.
Do you think it would help if I started also maintaining an asi-next
branch with the next few things all queued up and benchmarked, so we can
get a look at the "goal state" while also keeping an eye on the here and
now? Or do you have other suggestions for the strategy here?
[0] https://lore.kernel.org/all/20250812173109.295750-1-jackmanb@google.com/
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-10-01 20:28 ` Dave Hansen
@ 2025-10-02 14:05 ` Brendan Jackman
2025-10-02 16:14 ` Dave Hansen
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 14:05 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On Wed Oct 1, 2025 at 8:28 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> Create the initial shared pagetable to hold all the mappings that will
>> be shared among ASI domains.
>>
>> Mirror the physmap into the ASI pagetables, but with a maximum
>> granularity that's guaranteed to allow changing pageblock sensitivity
>> without having to allocate pagetables, and with everything as
>> non-present.
>
> Could you also talk about what this granularity _actually_ is and why it
> has the property of never requiring page table alloc
Ack, will expand on this (I think from your other comments that you
understand it now, and you're just asking me to improve the commit
message, let me know if I misread that).
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index e98e85cf15f42db669696ba8195d8fc633351b26..7e0471d46767c63ceade479ae0d1bf738f14904a 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -7,6 +7,7 @@
>> * Copyright (C) 2002,2003 Andi Kleen <ak@suse.de>
>> */
>>
>> +#include <linux/asi.h>
>> #include <linux/signal.h>
>> #include <linux/sched.h>
>> #include <linux/kernel.h>
>> @@ -746,7 +747,8 @@ phys_pgd_init(pgd_t *pgd_page, unsigned long paddr_start, unsigned long paddr_en
>> {
>> unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
>>
>> - *pgd_changed = false;
>> + if (pgd_changed)
>> + *pgd_changed = false;
>
> This 'pgd_changed' hunk isn't mentioned in the changelog.
Oops, will add a note about that. The alternative would just be to
squash this into the commit that introduces phys_pgd_init(), let me know
if you have a preference.
>> @@ -797,6 +800,24 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
>>
>> paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
>> prot, init, &pgd_changed);
>> +
>> + /*
>> + * Set up ASI's unrestricted physmap. This needs to mapped at minimum 2M
>> + * size so that regions can be mapped and unmapped at pageblock
>> + * granularity without requiring allocations.
>> + */
>
> This took me a minute to wrap my head around.
>
> Here, I think you're trying to convey that:
>
> 1. There's a higher-level design decision that all sensitivity will be
> done at a 2M granularity. A 2MB physical region is either sensitive
> or not.
> 2. Because of #1, 1GB mappings are not cool because splitting a 1GB
> mapping into 2MB needs to allocate a page table page.
> 3. 4k mappings are OK because they can also have their permissions
> changed at a 2MB granularity. It's just more laborious.
>
> The "minimum 2M size" comment really threw me off because that, to me,
> also includes 1G which is a no-no here.
Er yeah sorry that's just wrong, it should say "maximum size".
> I also can't help but wonder if it would have been easier and more
> straightforward to just start this whole exercise at 4k: force all the
> ASI tables to be 4k. Then, later, add the 2MB support and tie to
> pageblocks on after.
This would lead to a much smaller patchset, but I think it creates some
pretty yucky technical debt and complexity of its own. If you're
imagining a world where we just leave most of the allocator as-is, and
just inject "map into ASI" or "unmap from ASI" at the right moments...
I think to make this work you have to do one of:
- Say all free pages are unmapped from the restricted address space, we
map them on-demand at allocation (if !__GFP_SENSITIVE), and unmap them
again when they are freed. Because you can't flush the TLB
synchronously in the free path, you need an async worker to take care
of that for you (roughly as sketched after this list).
This is what we did in the Google implementation (where "don't change
the page allocator more than you have to" kinda trumps everything) and
it's pretty nasty. We have lots of knobs we can turn to try and make
it perform well but in the end it's eventually gonna block deployment
to some environment or other.
- Say free pages are mapped into the restricted address space. So if you
get a __GFP_SENSITIVE alloc you unmap the pages and do the TLB flush
synchronously there, unless we think the caller might be atomic, in
which case.... I guess we'd have to have a sort of special atomic
reserve for this? Which... seems like a weaker and more awkward
version of the thing I'm proposing in this patchset.
Then when you free the page you need to map it back again, which means
you need to zero it.
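Concretely, the first option ends up looking something like this
(totally untested sketch; asi_set_mapped() is a made-up name for "flip
_PAGE_PRESENT in the restricted pagetables for this physical range"):
/* Option 1 sketch: unmap nonsensitive pages from the restricted
 * address space as they are freed. The TLB flush can't happen
 * synchronously in the free path, so it gets punted to a worker; until
 * that runs, the "unmapped" pages are still reachable via stale TLB
 * entries. */
static void asi_deferred_flush_fn(struct work_struct *work)
{
        /* A real implementation would batch and track dirty ranges. */
        flush_tlb_kernel_range(PAGE_OFFSET,
                               PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT));
}
static DECLARE_WORK(asi_deferred_flush_work, asi_deferred_flush_fn);
static void asi_unmap_freed_pages(struct page *page, unsigned int order)
{
        asi_set_mapped(page_to_phys(page), 1 << order, false);
        schedule_work(&asi_deferred_flush_work);
}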
I might have some tunnel-vision on this so please challenge me if it
sounds like I'm missing something.
>> + if (asi_nonsensitive_pgd) {
>> + /*
>> + * Since most memory is expected to end up sensitive, start with
>> + * everything unmapped in this pagetable.
>> + */
>> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
>> +
>> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
>> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
>> + prot_np, init, NULL);
>> + }
>
> I'm also kinda wondering what the purpose is of having a whole page
> table full of !_PAGE_PRESENT entries. It would be nice to know how this
> eventually gets turned into something useful.
If you are thinking of the fact that just clearing P doesn't really do
anything for Meltdown/L1TF.. yeah that's true! We'll actually need to
munge the PFN or something too, but here I wanted to just focus on the
broad strokes of integration without worrying too much about individual
CPU mitigations. Flipping _PAGE_PRESENT is already supported by
set_memory.c and IIRC it's good enough for everything newer than
Skylake.
Other than that, these pages being unmapped is the whole point.. later
on, the subset of memory that we don't need to protect will get flipped
to being present. Everything else will trigger a pagefault if touched
and we'll switch address spaces, do the flushing etc.
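(Purely for illustration - none of these helpers exist in this series -
the eventual fault-path logic is roughly:)
/* Hypothetical sketch of the later fault handling: a kernel access
 * that faults because the address is unmapped in the restricted
 * address space just means "this access needs the full address space",
 * so we exit ASI (switch CR3 back to the unrestricted pgd, do the
 * security flushes) and let the access retry. */
if (asi_active() && fault_in_kernel_space(address)) {
        asi_exit();
        return;
}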
Sorry if I'm missing your point here...
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
2025-10-01 20:50 ` Dave Hansen
@ 2025-10-02 14:31 ` Brendan Jackman
2025-10-02 16:40 ` Dave Hansen
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 14:31 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On Wed Oct 1, 2025 at 8:50 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> ASI has a separate PGD for the physmap, which needs to be kept in sync
>> with the unrestricted physmap with respect to permissions.
>
> So that leads to another thing... What about vmalloc()? Why doesn't it
> need to be in the ASI pgd?
Oh yeah it does. For the "actually entering the restricted address space"
patchset, I'll include logic that just shares that region between the
unrestricted and restricted address space, something like this:
https://github.com/torvalds/linux/commit/04fd7a0b0098af48f2f8d9c0343b1edd12987681#diff-ecb3536ec179c07d4b4b387e58e62a9a6e553069cfed22a73448eb2ce5b82aa6R637-R669
Later, we'll want to be able to protect subsets of the vmalloc area
(i.e. unmap them from the restricted address space) too, but that's
something we can think about later I think. Unless I'm mistaken it's
much simpler than for the direct map. Junaid had a minimal solution for
that in his 2022 RFC [0]:
[0] https://lore.kernel.org/all/20220223052223.1202152-12-junaids@google.com/
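(Sharing a whole region like that basically just means aliasing the
top-level pagetable entries - roughly like this, with a made-up helper
name:)
/* Sketch: alias a kernel VA range into the restricted pagetable by
 * copying the top-level entries, so both address spaces share the
 * lower-level tables (and hence all future updates) for that range. */
static void asi_clone_pgd_range(pgd_t *dst, pgd_t *src,
                                unsigned long start, unsigned long end)
{
        unsigned long addr;
        for (addr = start; addr < end; addr = pgd_addr_end(addr, end))
                set_pgd(dst + pgd_index(addr), *pgd_offset_pgd(src, addr));
}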
>> +static inline bool is_direct_map(unsigned long vaddr)
>> +{
>> + return within(vaddr, PAGE_OFFSET,
>> + PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT));
>> +}
>>
>> static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
>> int primary)
>> @@ -1808,8 +1814,7 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
>> * one virtual address page and its pfn. TBD: numpages can be set based
>> * on the initial value and the level returned by lookup_address().
>> */
>> - if (within(vaddr, PAGE_OFFSET,
>> - PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
>> + if (is_direct_map(vaddr)) {
>> cpa->numpages = 1;
>> cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
>> return 0;
>> @@ -1981,6 +1986,27 @@ static int cpa_process_alias(struct cpa_data *cpa)
>> return 0;
>> }
>>
>> +/*
>> + * Having updated the unrestricted PGD, reflect this change in the ASI
>> + * restricted address space too.
>> + */
>> +static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
>> +{
>> + struct cpa_data asi_cpa = *cpa;
>> +
>> + if (!asi_enabled_static())
>> + return 0;
>> +
>> + /* Only need to do this for the real unrestricted direct map. */
>> + if ((cpa->pgd && cpa->pgd != init_mm.pgd) || !is_direct_map(*cpa->vaddr))
>> + return 0;
>> + VM_WARN_ON_ONCE(!is_direct_map(*cpa->vaddr + (cpa->numpages * PAGE_SIZE)));
>> +
>> + asi_cpa.pgd = asi_nonsensitive_pgd;
>> + asi_cpa.curpage = 0;
>
> Please document what functionality this curpage=0 has. It's not clear.
Ack, I'll add some commentary.
>> + return __change_page_attr(cpa, primary);
>> +}
>
> But let's say someone is doing something silly like:
>
> set_memory_np(addr, size);
> set_memory_p(addr, size);
>
> Won't that end up in here and make the "unrestricted PGD" have
> _PAGE_PRESENT==1 entries?
Er, yes, that's a bug, thanks for pointing this out. I guess this is
actually broken under debug_pagealloc or something? I should check that.
This code should only mirror the bits that are irrelevant to ASI.
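Roughly, I'm picturing something like this (untested sketch) in
mirror_asi_direct_map():
        /* ASI owns _PAGE_PRESENT in the nonsensitive pagetables, so a
         * set_memory_p()-style caller must not mirror it back in.
         * (Other ASI-owned bits would need the same treatment.) */
        asi_cpa.mask_set = __pgprot(pgprot_val(cpa->mask_set) & ~_PAGE_PRESENT);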
> Also, could we try and make the nomenclature consistent? We've got
> "unrestricted direct map" and "asi_nonsensitive_pgd" being used (at
> least). Could the terminology be made more consistent?
Hm. It is actually consistent: "unrestricted" is a property of the
address space / execution context. "nonsensitive" is a property of the
memory. Nonsensitive memory is mapped into the unrestricted address
space. asi_nonsensitive_pgd isn't an address space we enter, it's just a
holding area (like if we never actually pointed CR3 at init_mm.pgd but
just used it as a source to clone from).
However.. just because it's consistent doesn't mean it's not confusing.
Do you think we should just squash these two words and call the whole
thing "nonsensitive"? I don't know if "nonsensitive address space" makes
much sense... Is it possible I can fix this by just adding more
comments?
> One subtle thing here is that it's OK to allocate memory here when
> mirroring changes into 'asi_nonsensitive_pgd'. It's just not OK when
> flipping sensitivity. That seems worth a comment.
Ack, will add that.
>> static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>> {
>> unsigned long numpages = cpa->numpages;
>> @@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>> if (!debug_pagealloc_enabled())
>> spin_lock(&cpa_lock);
>> ret = __change_page_attr(cpa, primary);
>> + if (!ret)
>> + ret = mirror_asi_direct_map(cpa, primary);
>> if (!debug_pagealloc_enabled())
>> spin_unlock(&cpa_lock);
>> if (ret)
>>
>
> Does cpa->pgd ever have any values other than NULL or init_mm->pgd? I
> didn't see anything in a quick grep.
It can also be efi_mm.pgd via sev_es_efi_map_ghcbs_cas().
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 06/21] mm/page_alloc: add __GFP_SENSITIVE and always set it
2025-10-01 21:18 ` Dave Hansen
@ 2025-10-02 14:34 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 14:34 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On Wed Oct 1, 2025 at 9:18 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
>> +#define ___GFP_SENSITIVE BIT(___GFP_SENSITIVE_BIT)
>> +#else
>> +#define ___GFP_SENSITIVE 0
>> +#endif
>
> This is clearly one of the inflection points of the series.
>
> To go any farther with this approach, I think it's critical to get a few
> acks on this hunk specifically. Well, maybe not formal acked-by's, but
> at least _clear_ agreement from at least one of:
>
> MEMORY MANAGEMENT - PAGE ALLOCATOR
> M: Andrew Morton <akpm@linux-foundation.org>
> M: Vlastimil Babka <vbabka@suse.cz>
>
> ... or this approach is dead in the water.
Yep, I agree. This is where the chicken-and-egg thing I mentioned in [0]
comes into play though...
[0] https://lore.kernel.org/all/DD7SCRK2OJI9.1EJ9GSEH9FHW2@google.com/
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 11/21] mm: introduce freetype_t
2025-10-01 21:20 ` Dave Hansen
@ 2025-10-02 14:39 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 14:39 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On Wed Oct 1, 2025 at 9:20 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> @@ -2234,7 +2235,7 @@ static bool should_proactive_compact_node(pg_data_t *pgdat)
>> static enum compact_result __compact_finished(struct compact_control *cc)
>> {
>> unsigned int order;
>> - const int migratetype = cc->migratetype;
>> + const freetype_t freetype = cc->freetype;
>
> Just as I'm scanning this series at a high level, this patch looks too
> big to me. There is too much mixing of mechanical changes like this
> s/int/freetype_t/ and s/migratetype/freetype/ with new functionality.
>
> I'd be looking for ways to split this up a lot more.
Ack. One avenue I didn't fully explore would be to break it into:
1. Introduce freetype_t as nothing else than an annoying wrapper around
migratetype.
2. Add the sensitive field when ASI is compiled in.
The reason I shied away from this is that part 1 will look kinda weird,
because in some places I'll be switching the code to freetype while
others will still use migratetype (code that will never care about
sensitivity), and the distinction might not be obvious without first
reading part 2. I'll just have to try and write a good commit message I
suppose.
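For step 1 I'm picturing something like this (just a sketch, the exact
layout may differ in the series):
/* Step 1: freetype_t is nothing more than a wrapper around the
 * migratetype; the ASI sensitivity bit only gets added in step 2. */
typedef struct {
        unsigned char migratetype;
} freetype_t;
static inline freetype_t migratetype_to_freetype(int migratetype)
{
        return (freetype_t){ .migratetype = migratetype };
}
static inline int freetype_to_migratetype(freetype_t ft)
{
        return ft.migratetype;
}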
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-10-02 14:05 ` Brendan Jackman
@ 2025-10-02 16:14 ` Dave Hansen
2025-10-02 17:19 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-02 16:14 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On 10/2/25 07:05, Brendan Jackman wrote:
> On Wed Oct 1, 2025 at 8:28 PM UTC, Dave Hansen wrote:
...>> I also can't help but wonder if it would have been easier and more
>> straightforward to just start this whole exercise at 4k: force all the
>> ASI tables to be 4k. Then, later, add the 2MB support and tie to
>> pageblocks on after.
>
> This would lead to a much smaller patchset, but I think it creates some
> pretty yucky technical debt and complexity of its own. If you're
> imagining a world where we just leave most of the allocator as-is, and
> just inject "map into ASI" or "unmap from ASI" at the right moments...
...
I'm trying to separate out the two problems:
1. Have a set of page tables that never require allocations in order to
map or unmap sensitive data.
2. Manage each pageblock as either all sensitive or all not sensitive
There is a nonzero set of dependencies to make sure that the pageblock
size is compatible with the page table mapping size... unless you just
make the mapping size 4k.
If the mapping size is 4k, the pageblock size can be anything. There's
no dependency to satisfy.
So I'm not saying to make the sensitive/nonsensitive boundary 4k. Just
to make the _mapping_ size 4k. Then, come back later, and move the
mapping size over to 2MB as an optimization.
>>> + if (asi_nonsensitive_pgd) {
>>> + /*
>>> + * Since most memory is expected to end up sensitive, start with
>>> + * everything unmapped in this pagetable.
>>> + */
>>> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
>>> +
>>> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
>>> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
>>> + prot_np, init, NULL);
>>> + }
>>
>> I'm also kinda wondering what the purpose is of having a whole page
>> table full of !_PAGE_PRESENT entries. It would be nice to know how this
>> eventually gets turned into something useful.
>
> If you are thinking of the fact that just clearing P doesn't really do
> anything for Meltdown/L1TF.. yeah that's true! We'll actually need to
> munge the PFN or something too, but here I wanted to just focus on the
> broad strokes of integration without worrying too much about individual
> CPU mitigations. Flipping _PAGE_PRESENT is already supported by
> set_memory.c and IIRC it's good enough for everything newer than
> Skylake.
>
> Other than that, these pages being unmapped is the whole point.. later
> on, the subset of memory that we don't need to protect will get flipped
> to being present. Everything else will trigger a pagefault if touched
> and we'll switch address spaces, do the flushing etc.
>
> Sorry if I'm missing your point here...
What is the point of having a pgd if you can't put it in CR3? If you:
write_cr3(asi_nonsensitive_pgd);
you'll just triple fault because all kernel text is !_PAGE_PRESENT.
The critical point is when 'asi_nonsensitive_pgd' is functional enough
that it can be loaded into CR3 and handle a switch to the normal
init_mm->pgd.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
2025-10-02 14:31 ` Brendan Jackman
@ 2025-10-02 16:40 ` Dave Hansen
2025-10-02 17:08 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-02 16:40 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On 10/2/25 07:31, Brendan Jackman wrote:
> On Wed Oct 1, 2025 at 8:50 PM UTC, Dave Hansen wrote:
...
>> But let's say someone is doing something silly like:
>>
>> set_memory_np(addr, size);
>> set_memory_p(addr, size);
>>
>> Won't that end up in here and make the "unrestricted PGD" have
>> _PAGE_PRESENT==1 entries?
>
> Er, yes, that's a bug, thanks for pointing this out. I guess this is
> actually broken under debug_pagealloc or something? I should check that.
>
> This code should only mirror the bits that are irrelevant to ASI.
It's actually anything that has _PAGE_PRESENT in cpa->mask_set. There
are a number of those. Some of them are irrelevant like the execmem
code, but there are quite a few more that look troublesome outside of
debugging environments.
>> Also, could we try and make the nomenclature consistent? We've got
>> "unrestricted direct map" and "asi_nonsensitive_pgd" being used (at
>> least). Could the terminology be made more consistent?
>
> Hm. It is actually consistent: "unrestricted" is a property of the
> address space / execution context. "nonsensitive" is a property of the
> memory. Nonsensitive memory is mapped into the unrestricted address
> space. asi_nonsensitive_pgd isn't an address space we enter, it's just a
> holding area (like if we never actually pointed CR3 at init_mm.pgd but
> just used it as a source to clone from).
>
> However.. just because it's consistent doesn't mean it's not confusing.
> Do you think we should just squash these two words and call the whole
> thing "nonsensitive"? I don't know if "nonsensitive address space" makes
> much sense... Is it possible I can fix this by just adding more
> comments?
It makes sense to me that a "nonsensitive address space" would not map
any sensitive data and that a "asi_nonsensitive_pgd" is the root of that
address space.
>>> static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>> {
>>> unsigned long numpages = cpa->numpages;
>>> @@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>> if (!debug_pagealloc_enabled())
>>> spin_lock(&cpa_lock);
>>> ret = __change_page_attr(cpa, primary);
>>> + if (!ret)
>>> + ret = mirror_asi_direct_map(cpa, primary);
>>> if (!debug_pagealloc_enabled())
>>> spin_unlock(&cpa_lock);
>>> if (ret)
>>>
>>
>> Does cpa->pgd ever have any values other than NULL or init_mm->pgd? I
>> didn't see anything in a quick grep.
>
> It can also be efi_mm.pgd via sev_es_efi_map_ghcbs_cas().
It would be _nice_ if the ASI exclusion wasn't so magic.
Like, instead of hooking in to __change_page_attr_set_clr() and
filtering on init_mm if we had the callers declare explicitly whether
their changes get reflected into the ASI nonsensitive PGD.
Maybe that looks like a new flag: CPA_DIRECT_MAP or something. Once you
pass that flag in, the cpa code knows that you're working on init_mm.pgd
and mirror_asi_direct_map() can look for *that* instead of init_mm.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-02 11:23 ` Brendan Jackman
@ 2025-10-02 17:01 ` Dave Hansen
2025-10-02 19:19 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Dave Hansen @ 2025-10-02 17:01 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On 10/2/25 04:23, Brendan Jackman wrote:
...
> [Well, I'm assuming there that we include the actual security flushes in
> series 2, maybe that would be more like "2b"...]
>
> To get to the more interesting cases where it's faster than the current
> default, I think is not that far away for KVM usecases. I think the
> branch I posted in my [Discuss] thread[0] gets competitive with existing
> KVM usecases well before it devolves into the really hacky prototype
> stuff.
>
> To get to the actual goal, where ASI can become the global default (i.e.
> it's still fast when you sandbox native tasks as well as KVM guests), is
> further since we need to figure out the details on something like what I
> called the "ephmap" in [0].
>
> There are competing tensions here - we would prefer not to merge code
> that "doesn't do anything", but on the other hand I don't think anyone
> wants to find themselves receiving [PATCH v34 19/40] next July... so
> I've tried to strike a balance here. Something like:
>
> 1. Develop a consensus that "we probably want ASI and it's worth trying"
>
> 2. Start working towards it in-tree, by breaking it down into smaller
> chunks.
Just to be clear: we don't merge code that doesn't do anything
functional. The bar for inclusion is that it has to do something
practical and useful for end users. It can't be purely infrastructure or
preparatory.
Protection keys is a good example. It was a big, gnarly series that
could be roughly divided into two pieces: one that did all the page
table gunk, and all the new ABI bits around exposing pkeys to apps. But
we found a way to do all the page table gunk with no new ABI and that
also gave security folks something they wanted: execute_only_pkey().
So we merged all the page table and internal gunk first, and then the
new ABI a release or two later.
But the important part was that it had _some_ functionality from day one
when it was merged. It wasn't purely infrastructure.
> Do you think it would help if I started also maintaining an asi-next
> branch with the next few things all queued up and benchmarked, so we can
> get a look at the "goal state" while also keeping an eye on the here and
> now? Or do you have other suggestions for the strategy here?
Yes, I think that would be useful.
For instance, imagine you'd had that series sitting around:
6.16-asi-next. Then, all of a sudden you see the vmscape series[1] show
up. Ideally, you'd take your 6.16-asi-next branch and show us how much
simpler and faster it is to mitigate vmscape with ASI instead of the
IBPB silliness that we ended up with.
Basically, use your asi-next branch to bludgeon us each time we _should_
have been using it.
It's also not too late. You could still go back and do that analysis for
vmscape. It's fresh enough in our minds to matter.
1.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=223ba8ee0a3986718c874b66ed24e7f87f6b8124
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI
2025-10-02 16:40 ` Dave Hansen
@ 2025-10-02 17:08 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 17:08 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On Thu Oct 2, 2025 at 4:40 PM UTC, Dave Hansen wrote:
> On 10/2/25 07:31, Brendan Jackman wrote:
> It's actually anything that has _PAGE_PRESENT in cpa->mask_set. There
> are a number of those. Some of them are irrelevant like the execmem
> code, but there are quite a few more that look troublesome outside of
> debugging environments.
>
>>> Also, could we try and make the nomenclature consistent? We've got
>>> "unrestricted direct map" and "asi_nonsensitive_pgd" being used (at
>>> least). Could the terminology be made more consistent?
>>
>> Hm. It is actually consistent: "unrestricted" is a property of the
>> address space / execution context. "nonsensitive" is a property of the
>> memory. Nonsensitive memory is mapped into the unrestricted address
>> space. asi_nonsensitive_pgd isn't an address space we enter, it's just a
>> holding area (like if we never actually pointed CR3 at init_mm.pgd but
>> just used it as a source to clone from).
>>
>> However.. just because it's consistent doesn't mean it's not confusing.
>> Do you think we should just squash these two words and call the whole
>> thing "nonsensitive"? I don't know if "nonsensitive address space" makes
>> much sense... Is it possible I can fix this by just adding more
>> comments?
>
> It makes sense to me that a "nonsensitive address space" would not map
> any sensitive data and that a "asi_nonsensitive_pgd" is the root of that
> address space.
OK, then it probably just sounds wrong to me because I'm steeped in the
current jargon. For v2 I'll try just dropping "[un]restricted".
>>>> static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>>> {
>>>> unsigned long numpages = cpa->numpages;
>>>> @@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>>> if (!debug_pagealloc_enabled())
>>>> spin_lock(&cpa_lock);
>>>> ret = __change_page_attr(cpa, primary);
>>>> + if (!ret)
>>>> + ret = mirror_asi_direct_map(cpa, primary);
>>>> if (!debug_pagealloc_enabled())
>>>> spin_unlock(&cpa_lock);
>>>> if (ret)
>>>>
>>>
>>> Does cpa->pgd ever have any values other than NULL or init_mm->pgd? I
>>> didn't see anything in a quick grep.
>>
>> It can also be efi_mm.pgd via sev_es_efi_map_ghcbs_cas().
>
> It would be _nice_ if the ASI exclusion wasn't so magic.
>
> Like, instead of hooking in to __change_page_attr_set_clr() and
> filtering on init_mm if we had the callers declare explicitly whether
> their changes get reflected into the ASI nonsensitive PGD.
>
> Maybe that looks like a new flag: CPA_DIRECT_MAP or something. Once you
> pass that flag in, the cpa code knows that you're working on init_mm.pgd
> and mirror_asi_direct_map() can look for *that* instead of init_mm.
Sounds good to me. If "The Direct Map" is gonna be a special thing
then having it be a flag instead of a certain magic pgd_t * makes sense
to me. I'll try this out.
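Something roughly like this, I guess (the flag name/value are just
illustrative):
/* Hypothetical flag alongside CPA_FLUSHTLB etc. in set_memory.c */
#define CPA_DIRECT_MAP  32
static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
{
        struct cpa_data asi_cpa = *cpa;
        if (!asi_enabled_static())
                return 0;
        /* Callers declare explicitly that they are editing the direct
         * map, instead of us guessing from cpa->pgd. */
        if (!(cpa->flags & CPA_DIRECT_MAP))
                return 0;
        asi_cpa.pgd = asi_nonsensitive_pgd;
        asi_cpa.curpage = 0;
        return __change_page_attr(&asi_cpa, primary);
}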
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-10-02 16:14 ` Dave Hansen
@ 2025-10-02 17:19 ` Brendan Jackman
2025-11-12 19:39 ` Dave Hansen
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 17:19 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On Thu Oct 2, 2025 at 4:14 PM UTC, Dave Hansen wrote:
> On 10/2/25 07:05, Brendan Jackman wrote:
>> On Wed Oct 1, 2025 at 8:28 PM UTC, Dave Hansen wrote:
> ...>> I also can't help but wonder if it would have been easier and more
>>> straightforward to just start this whole exercise at 4k: force all the
>>> ASI tables to be 4k. Then, later, add the 2MB support and tie to
>>> pageblocks on after.
>>
>> This would lead to a much smaller patchset, but I think it creates some
>> pretty yucky technical debt and complexity of its own. If you're
>> imagining a world where we just leave most of the allocator as-is, and
>> just inject "map into ASI" or "unmap from ASI" at the right moments...
> ...
>
> I'm trying to separate out the two problems:
>
> 1. Have a set of page tables that never require allocations in order to
> map or unmap sensitive data.
> 2. Manage each pageblock as either all sensitive or all not sensitive
>
> There is a nonzero set of dependencies to make sure that the pageblock
> size is compatible with the page table mapping size... unless you just
> make the mapping size 4k.
>
> If the mapping size is 4k, the pageblock size can be anything. There's
> no dependency to satisfy.
>
> So I'm not saying to make the sensitive/nonsensitive boundary 4k. Just
> to make the _mapping_ size 4k. Then, come back later, and move the
> mapping size over to 2MB as an optimization.
Ahh thanks, I get your point now. And yep I'm sold, I'll go to 4k for
v2.
>>>> + if (asi_nonsensitive_pgd) {
>>>> + /*
>>>> + * Since most memory is expected to end up sensitive, start with
>>>> + * everything unmapped in this pagetable.
>>>> + */
>>>> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
>>>> +
>>>> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
>>>> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
>>>> + prot_np, init, NULL);
>>>> + }
>>>
>>> I'm also kinda wondering what the purpose is of having a whole page
>>> table full of !_PAGE_PRESENT entries. It would be nice to know how this
>>> eventually gets turned into something useful.
>>
>> If you are thinking of the fact that just clearing P doesn't really do
>> anything for Meltdown/L1TF.. yeah that's true! We'll actually need to
>> munge the PFN or something too, but here I wanted to just focus on the
>> broad strokes of integration without worrying too much about individual
>> CPU mitigations. Flipping _PAGE_PRESENT is already supported by
>> set_memory.c and IIRC it's good enough for everything newer than
>> Skylake.
>>
>> Other than that, these pages being unmapped is the whole point.. later
>> on, the subset of memory that we don't need to protect will get flipped
>> to being present. Everything else will trigger a pagefault if touched
>> and we'll switch address spaces, do the flushing etc.
>>
>> Sorry if I'm missing your point here...
>
> What is the point of having a pgd if you can't put it in CR3? If you:
>
> write_cr3(asi_nonsensitive_pgd);
>
> you'll just triple fault because all kernel text is !_PAGE_PRESENT.
>
> The critical point is when 'asi_nonsensitive_pgd' is functional enough
> that it can be loaded into CR3 and handle a switch to the normal
> init_mm->pgd.
Hm, are you saying that I should expand the scope of the patchset from
"set up the direct map" to "set up an ASI address space"? If so, yeah I
can do that, I don't think the patchset would get that much bigger. I
only left the other bits out because it feels weird to set up a whole
address space but never actually switch into it. Setting up the logic to
switch into it would make the patchset really big though.
Like I said in the cover letter, I could also always change tack:
we could instead start with all the address-space switching logic, but
just have the two address spaces be clones of each other. Then we could
come back and start poking holes in the ASI one for the second series. I
don't have a really strong opinion about the best place to start, but
I'll stick to my current course unless someone else does have a strong
opinion.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 00/21] mm: ASI direct map management
2025-10-02 17:01 ` Dave Hansen
@ 2025-10-02 19:19 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-10-02 19:19 UTC (permalink / raw)
To: Dave Hansen, Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, Yosry Ahmed
On Thu Oct 2, 2025 at 5:01 PM UTC, Dave Hansen wrote:
> On 10/2/25 04:23, Brendan Jackman wrote:
> ...
>> [Well, I'm assuming there that we include the actual security flushes in
>> series 2, maybe that would be more like "2b"...]
>>
>> To get to the more interesting cases where it's faster than the current
>> default, I think is not that far away for KVM usecases. I think the
>> branch I posted in my [Discuss] thread[0] gets competitive with existing
>> KVM usecases well before it devolves into the really hacky prototype
>> stuff.
>>
>> To get to the actual goal, where ASI can become the global default (i.e.
>> it's still fast when you sandbox native tasks as well as KVM guests), is
>> further since we need to figure out the details on something like what I
>> called the "ephmap" in [0].
>>
>> There are competing tensions here - we would prefer not to merge code
>> that "doesn't do anything", but on the other hand I don't think anyone
>> wants to find themselves receiving [PATCH v34 19/40] next July... so
>> I've tried to strike a balance here. Something like:
>>
>> 1. Develop a consensus that "we probably want ASI and it's worth trying"
>>
>> 2. Start working towards it in-tree, by breaking it down into smaller
>> chunks.
>
> Just to be clear: we don't merge code that doesn't do anything
> functional. The bar for inclusion is that it has to do something
> practical and useful for end users. It can't be purely infrastructure or
> preparatory.
>
> Protection keys is a good example. It was a big, gnarly series that
> could be roughly divided into two pieces: one that did all the page
> table gunk, and all the new ABI bits around exposing pkeys to apps. But
> we found a way to do all the page table gunk with no new ABI and that
> also gave security folks something they wanted: execute_only_pkey().
>
> So we merged all the page table and internal gunk first, and then the
> new ABI a release or two later.
>
> But the important part was that it had _some_ functionality from day one
> when it was merged. It wasn't purely infrastructure.
OK thanks, after our IRC chat I understand this now. So in the case of
pkeys I guess the internal gunk didn't "do anything" per se but it was a
clear improvement in the code in its own right. So I'll look for a way
to split out the preparatory stuff to be more like that. And then I'll
try to get a single patchset that goes from "no ASI" to "ASI that does
_something_ useful". I think it's inevitable that this will still be
rather on the large side but I'll do my best.
>> Do you think it would help if I started also maintaining an asi-next
>> branch with the next few things all queued up and benchmarked, so we can
>> get a look at the "goal state" while also keeping an eye on the here and
>> now? Or do you have other suggestions for the strategy here?
>
> Yes, I think that would be useful.
>
> For instance, imagine you'd had that series sitting around:
> 6.16-asi-next. Then, all of a sudden you see the vmscape series[1] show
> up. Ideally, you'd take your 6.16-asi-next branch and show us how much
> simpler and faster it is to mitigate vmscape with ASI instead of the
> IBPB silliness that we ended up with.
>
> Basically, use your asi-next branch to bludgeon us each time we _should_
> have been using it.
>
> It's also not too late. You could still go back and do that analysis for
> vmscape. It's fresh enough in our minds to matter.
>
> 1.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=223ba8ee0a3986718c874b66ed24e7f87f6b8124
And yep, I'll take a look at this too. Thanks very much for taking a
look and for all of the valuable pointers.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
2025-09-24 14:59 ` [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION Brendan Jackman
@ 2025-10-24 22:37 ` Borislav Petkov
2025-10-24 23:32 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-10-24 22:37 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Wed, Sep 24, 2025 at 02:59:36PM +0000, Brendan Jackman wrote:
> This long awkward name is for consistency with
> CONFIG_MITIGATION_PAGE_TABLE_ISOLATION.
But why?
I bet you someone will get confused and mean
CONFIG_MITIGATION_PAGE_TABLE_ISOLATION when she means
CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION or vice versa due to the
conglomerate of similar words.
Now compare that to CONFIG_ASI! Wonderfully short and clear.
Especially when the namespace already is "asi_" ...
The only problem with ASI is it doesn't tell you what it is but you can look
it up with simple grepping...
I'd say.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
2025-10-24 22:37 ` Borislav Petkov
@ 2025-10-24 23:32 ` Brendan Jackman
2025-10-25 9:57 ` Borislav Petkov
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-24 23:32 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Fri Oct 24, 2025 at 10:37 PM UTC, Borislav Petkov wrote:
> On Wed, Sep 24, 2025 at 02:59:36PM +0000, Brendan Jackman wrote:
>> This long awkward name is for consistency with
>> CONFIG_MITIGATION_PAGE_TABLE_ISOLATION.
>
> But why?
>
> I bet you someone will get confused and mean
> CONFIG_MITIGATION_PAGE_TABLE_ISOLATION when she means
> CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION or vice versa due to the
> conglomerate of similar words.
>
> Now compare that to CONFIG_ASI! Wonderfully short and clear.
>
> Especially when the namespace already is "asi_" ...
>
> The only problem with ASI is it doesn't tell you what it is but you can look
> it up with simple grepping...
>
> I'd say.
Sure, CONFIG_ASI sounds great to me, if it sounds good to you :)
And yeah if someone doesn't know what ASI is, they probably don't know
what ADDRESS_SPACE_ISOLATION is either to be honest. The Kconfig file
has a nice place to document it.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
2025-10-24 23:32 ` Brendan Jackman
@ 2025-10-25 9:57 ` Borislav Petkov
0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2025-10-25 9:57 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Fri, Oct 24, 2025 at 11:32:30PM +0000, Brendan Jackman wrote:
> Sure, CONFIG_ASI sounds great to me, if it sounds good to you :)
Meh, I'm just being practical. :)
> And yeah if someone doesn't know what ASI is, they probably don't know
> what ADDRESS_SPACE_ISOLATION is either to be honest. The Kconfig file
> has a nice place to document it.
Right.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi=
2025-09-24 14:59 ` [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi= Brendan Jackman
@ 2025-10-25 10:06 ` Borislav Petkov
2025-10-26 22:24 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-10-25 10:06 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, Yosry Ahmed
On Wed, Sep 24, 2025 at 02:59:37PM +0000, Brendan Jackman wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..9b8330fc1fe31721af39b08b58b729ced78ba803 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5324,6 +5324,14 @@
>
> Not specifying this option is equivalent to pti=auto.
>
> + asi= [X86-64] Control Address Space Isolation (ASI), a
> + technology for mitigating CPU vulnerabilities.
> ASI is
> + not yet ready to provide security guarantees but can be
> + enabled for evaluation.
Yeah, no need for such "temporary" statements in the help text since you're
going to have to touch it again once it becomes a full-fledged feature.
> + on - unconditionally enable
> + off - unconditionally disable
"unconditionally" as opposed to some other setting which is conditional?
> +
> pty.legacy_count=
> [KNL] Number of legacy pty's. Overwrites compiled-in
> default number.
> diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
> index 53acdf22fe33efc6ccedbae52b262a904868459a..32a4c04c4be0f6f425c7cbcff4c58f1827a4b4c4 100644
> --- a/arch/x86/include/asm/asi.h
> +++ b/arch/x86/include/asm/asi.h
> @@ -2,4 +2,14 @@
> #ifndef _ASM_X86_ASI_H
> #define _ASM_X86_ASI_H
>
> +#include <asm/cpufeature.h>
> +
> +void asi_check_boottime_disable(void);
> +
> +/* Helper for generic code. Arch code just uses cpu_feature_enabled(). */
> +static inline bool asi_enabled_static(void)
"static" because? There will be a dynamic one too?
> +{
> + return cpu_feature_enabled(X86_FEATURE_ASI);
> +}
> +
> #endif /* _ASM_X86_ASI_H */
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3eee24a4cabf3b2131c34596236d8bc8eec05b3b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -499,6 +499,7 @@
> #define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
> #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
> #define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
> +#define X86_FEATURE_ASI (21*32+17) /* Kernel Address Space Isolation */
I think we really will need to show this in /proc/cpuinfo as it is a real, big
feature which gets proper kernel glue vs some silly CPUID bit.
IOW,
#define X86_FEATURE_ASI (21*32+17) /* "asi" Kernel Address Space Isolation */
^^^^
Not sure, though, when we should make it an ABI - perhaps once the whole pile
has landed...
> /*
> * BUG word(s)
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 5b9908f13dcfd092897f3778ee56ea4d45bdb868..5ecbff70964f61a903ac96cec3736a7cec1221fd 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o
> obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
> obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
> obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) += pti.o
> +obj-$(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION) += asi.o
>
> obj-$(CONFIG_X86_MEM_ENCRYPT) += mem_encrypt.o
> obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o
> diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8c907f3c84f43f66e412ecbfa99e67390d31a66f
> --- /dev/null
> +++ b/arch/x86/mm/asi.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/asi.h>
> +#include <linux/init.h>
> +#include <linux/string.h>
> +
> +#include <asm/cmdline.h>
> +#include <asm/cpufeature.h>
> +
> +void __init asi_check_boottime_disable(void)
> +{
> + bool enabled = false;
> + char arg[4];
> + int ret;
> +
> + ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg));
> + if (ret == 3 && !strncmp(arg, "off", 3)) {
> + enabled = false;
> + pr_info("ASI explicitly disabled by kernel cmdline.\n");
> + } else if (ret == 2 && !strncmp(arg, "on", 2)) {
> + enabled = true;
> + pr_info("ASI enabled.\n");
I'm not sure about those pr_info()s. When it is disabled, you can clear
X86_FEATURE_ASI so you won't see it in /proc/cpuinfo and then it is disabled.
And the same when it is enabled.
> + } else if (ret) {
> + pr_err("Unknown asi= flag '%s', try 'off' or 'on'\n", arg);
> + }
> +
> + if (enabled)
> + setup_force_cpu_cap(X86_FEATURE_ASI);
> +}
Not an early_param() ?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-09-24 14:59 ` [PATCH 03/21] x86/mm: factor out phys_pgd_init() Brendan Jackman
2025-09-27 19:29 ` kernel test robot
@ 2025-10-25 11:48 ` Borislav Petkov
2025-10-26 22:29 ` Brendan Jackman
1 sibling, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-10-25 11:48 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Wed, Sep 24, 2025 at 02:59:38PM +0000, Brendan Jackman wrote:
> +static unsigned long __meminit
> +__kernel_physical_mapping_init(unsigned long paddr_start,
> + unsigned long paddr_end,
> + unsigned long page_size_mask,
> + pgprot_t prot, bool init)
> +{
> + bool pgd_changed;
I have to say, that pgd_changed is yuck but I don't have a better idea and
this has happened a long time ago anyway.
How about you have the caller pass in false:
bool pgd_changed = false;
and then callee sets it to true when it does so?
> + unsigned long paddr_last;
The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::
struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;
The above is faster to parse than the reverse ordering::
int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;
And even more so than random ordering::
unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;
> +
> + paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
> + prot, init, &pgd_changed);
> + if (pgd_changed)
> + sync_global_pgds((unsigned long)__va(paddr_start),
> + (unsigned long)__va(paddr_end) - 1);
> +
> + return paddr_last;
> +}
>
> /*
> * Create page table mapping for the physical memory for specific physical
>
> --
> 2.50.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi=
2025-10-25 10:06 ` Borislav Petkov
@ 2025-10-26 22:24 ` Brendan Jackman
2025-11-10 11:26 ` Borislav Petkov
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-26 22:24 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, Yosry Ahmed
On Sat Oct 25, 2025 at 10:06 AM UTC, Borislav Petkov wrote:
> On Wed, Sep 24, 2025 at 02:59:37PM +0000, Brendan Jackman wrote:
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..9b8330fc1fe31721af39b08b58b729ced78ba803 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -5324,6 +5324,14 @@
>>
>> Not specifying this option is equivalent to pti=auto.
>>
>> + asi= [X86-64] Control Address Space Isolation (ASI), a
>> + technology for mitigating CPU vulnerabilities.
>
>> ASI is
>> + not yet ready to provide security guarantees but can be
>> + enabled for evaluation.
>
> Yeah, no need for such "temporary" statements in the help text since you're
> going to have to touch it again once it becomes a full-fledged feature.
Sure. Per Dave's feedback it is anyway gonna have to be merged in a
state where it already does something useful, so I guess this statement
will need to be updated regardless.
>> + on - unconditionally enable
>> + off - unconditionally disable
>
> "unconditionally" as opposed to some other setting which is conditional?
My assumption here is that eventually we'll end up with an "auto"
setting or something, where the big startup configuration dance in
bugs.c decides whether to enable ASI based on your CPU/attack vector
controls/individual mitigation configs.
>> +
>> pty.legacy_count=
>> [KNL] Number of legacy pty's. Overwrites compiled-in
>> default number.
>> diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
>> index 53acdf22fe33efc6ccedbae52b262a904868459a..32a4c04c4be0f6f425c7cbcff4c58f1827a4b4c4 100644
>> --- a/arch/x86/include/asm/asi.h
>> +++ b/arch/x86/include/asm/asi.h
>> @@ -2,4 +2,14 @@
>> #ifndef _ASM_X86_ASI_H
>> #define _ASM_X86_ASI_H
>>
>> +#include <asm/cpufeature.h>
>> +
>> +void asi_check_boottime_disable(void);
>> +
>> +/* Helper for generic code. Arch code just uses cpu_feature_enabled(). */
>> +static inline bool asi_enabled_static(void)
>
> "static" because? There will be a dynamic one too?
Actually... no, probably not - should drop this from the name, thanks.
(At some point I think I was worried about spamming the kernel image
with too many static branches, but this seems like a bridge to cross if
we ever actually come to it).
>> +{
>> + return cpu_feature_enabled(X86_FEATURE_ASI);
>> +}
>> +
>> #endif /* _ASM_X86_ASI_H */
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3eee24a4cabf3b2131c34596236d8bc8eec05b3b 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -499,6 +499,7 @@
>> #define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
>> #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
>> #define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
>> +#define X86_FEATURE_ASI (21*32+17) /* Kernel Address Space Isolation */
>
> I think we really will need to show this in /proc/cpuinfo as it is a real, big
> feature which gets proper kernel glue vs some silly CPUID bit.
>
> IOW,
>
> #define X86_FEATURE_ASI (21*32+17) /* "asi" Kernel Address Space Isolation */
> ^^^^
>
> Not sure, though, when we should make it an ABI - perhaps once the whole pile
> has landed...
Hm yeah, I actually also thought I had some direct feedback from one of
the x86 maintainers saying not to expose it here. I can no longer find
that feedback on Lore, so I think I must be misremembering; the flag
was already hidden back in [0].
[0] https://lore.kernel.org/linux-mm/20240712-asi-rfc-24-v1-5-144b319a40d8@google.com/
If that feedback indeed doesn't exist then personally I'd lean towards
exposing it right away; I don't see that much downside in terms of ABI,
since ASI kinda "doesn't do anything" - from a SW point of view it's just
a very weird and complicated NOP. It's hard for me to see how userspace
could grow a functional dependency on this flag. Whereas for general
monitoring it's handy.
Still, there's definitely no hill I'd die on here.
>> /*
>> * BUG word(s)
>> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
>> index 5b9908f13dcfd092897f3778ee56ea4d45bdb868..5ecbff70964f61a903ac96cec3736a7cec1221fd 100644
>> --- a/arch/x86/mm/Makefile
>> +++ b/arch/x86/mm/Makefile
>> @@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o
>> obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>> obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
>> obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) += pti.o
>> +obj-$(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION) += asi.o
>>
>> obj-$(CONFIG_X86_MEM_ENCRYPT) += mem_encrypt.o
>> obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o
>> diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8c907f3c84f43f66e412ecbfa99e67390d31a66f
>> --- /dev/null
>> +++ b/arch/x86/mm/asi.c
>> @@ -0,0 +1,28 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/asi.h>
>> +#include <linux/init.h>
>> +#include <linux/string.h>
>> +
>> +#include <asm/cmdline.h>
>> +#include <asm/cpufeature.h>
>> +
>> +void __init asi_check_boottime_disable(void)
>> +{
>> + bool enabled = false;
>> + char arg[4];
>> + int ret;
>> +
>> + ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg));
>> + if (ret == 3 && !strncmp(arg, "off", 3)) {
>> + enabled = false;
>> + pr_info("ASI explicitly disabled by kernel cmdline.\n");
>> + } else if (ret == 2 && !strncmp(arg, "on", 2)) {
>> + enabled = true;
>> + pr_info("ASI enabled.\n");
>
> I'm not sure about those pr_info()s. When it is disabled, you can clear
> X86_FEATURE_ASI so you won't see it in /proc/cpuinfo and then it is disabled.
> And the same when it is enabled.
Agreed, if we expose the CPU flag we don't need the printks.
>> + } else if (ret) {
>> + pr_err("Unknown asi= flag '%s', try 'off' or 'on'\n", arg);
>> + }
>> +
>> + if (enabled)
>> + setup_force_cpu_cap(X86_FEATURE_ASI);
>> +}
>
> Not an early_param() ?
Oh this is just for consistency with pti_check_boottime_disable(). But,
I think that function actually exists because of init ordering issues
that aren't relevant here, so early_param() seems fine to me (or, if I
find some reason why it doesn't work, I'll add a comment in v2 to
explain why we don't use it).
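Roughly what I have in mind (untested sketch, name and error handling up for
grabs, may not survive contact with the actual init ordering):

	/* Sketch only: early_param() replacement for asi_check_boottime_disable() */
	static int __init asi_parse_cmdline(char *str)
	{
		if (!str)
			return -EINVAL;

		if (!strcmp(str, "on"))
			setup_force_cpu_cap(X86_FEATURE_ASI);
		else if (strcmp(str, "off"))
			pr_err("Unknown asi= flag '%s', try 'off' or 'on'\n", str);

		return 0;
	}
	early_param("asi", asi_parse_cmdline);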
Thanks for taking a look :)
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-10-25 11:48 ` Borislav Petkov
@ 2025-10-26 22:29 ` Brendan Jackman
2025-11-10 11:38 ` Borislav Petkov
0 siblings, 1 reply; 65+ messages in thread
From: Brendan Jackman @ 2025-10-26 22:29 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Sat Oct 25, 2025 at 11:48 AM UTC, Borislav Petkov wrote:
> On Wed, Sep 24, 2025 at 02:59:38PM +0000, Brendan Jackman wrote:
>> +static unsigned long __meminit
>> +__kernel_physical_mapping_init(unsigned long paddr_start,
>> + unsigned long paddr_end,
>> + unsigned long page_size_mask,
>> + pgprot_t prot, bool init)
>> +{
>> + bool pgd_changed;
>
> I have to say, that pgd_changed is yuck but I don't have a better idea and
> this has happened a long time ago anyway.
>
> How about you have the caller pass in false:
>
> bool pgd_changed = false;
>
> and then callee sets it to true when it does so?
Sure.
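I.e., to spell out what I understand you to mean (sketch against the hunk
quoted below):

	unsigned long paddr_last;
	bool pgd_changed = false;

	/* phys_pgd_init() only ever writes true to *pgd_changed, never false */
	paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end,
				   page_size_mask, prot, init, &pgd_changed);
	if (pgd_changed)
		sync_global_pgds((unsigned long)__va(paddr_start),
				 (unsigned long)__va(paddr_end) - 1);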
Per Dave's feedback I am still slightly hopeful I can find a way to
come in and refactor this code so that it gets cleaner for you guys
and then ASI becomes a natural addition. So far I haven't come up with
anything in init_64.c, but I'm still planning to stare at set_memory.c a
while longer and see if anything comes to mind. So maybe we'll be able
to reduce the yuck factor a bit.
>> + unsigned long paddr_last;
>
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
>
> struct long_struct_name *descriptive_name;
> unsigned long foo, bar;
> unsigned int tmp;
> int ret;
>
> The above is faster to parse than the reverse ordering::
>
> int ret;
> unsigned int tmp;
> unsigned long foo, bar;
> struct long_struct_name *descriptive_name;
>
> And even more so than random ordering::
>
> unsigned long foo, bar;
> int ret;
> struct long_struct_name *descriptive_name;
> unsigned int tmp;
Ack
>> +
>> + paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
>> + prot, init, &pgd_changed);
>> + if (pgd_changed)
>> + sync_global_pgds((unsigned long)__va(paddr_start),
>> + (unsigned long)__va(paddr_end) - 1);
>> +
>> + return paddr_last;
>> +}
>>
>> /*
>> * Create page table mapping for the physical memory for specific physical
>>
>> --
>> 2.50.1
>>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi=
2025-10-26 22:24 ` Brendan Jackman
@ 2025-11-10 11:26 ` Borislav Petkov
2025-11-10 12:15 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-11-10 11:26 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, Yosry Ahmed
On Sun, Oct 26, 2025 at 10:24:35PM +0000, Brendan Jackman wrote:
> Hm yeah, I actually also thought I had some direct feedback from one of
> the x86 maintainers saying not to expose it here. I can no longer find
> that feedback on Lore so I think I must be misremembering, the flag
> was already hidden back in [0].
>
> [0] https://lore.kernel.org/linux-mm/20240712-asi-rfc-24-v1-5-144b319a40d8@google.com/
>
> If that feedback indeed doesn't exist
Just ignore everything whoever might've told you or not - we override all
previous statements! :-P
From Documentation/arch/x86/cpuinfo.rst
"So, the current use of /proc/cpuinfo is to show features which the
kernel has *enabled* and *supports*. As in: the CPUID feature flag is
there, there's an additional setup which the kernel has done while
booting and the functionality is ready to use. A perfect example for
that is "user_shstk" where additional code enablement is present in the
kernel to support shadow stack for user programs."
So it is all written down now and is the law! :-P
> then personally I'd lean towards exposing it right away, I don't see that
> much downside in terms of ABI, since ASI kinda "doesn't do anything", from
> a SW point of view it's just a very weird and complicated NOP. It's hard for
> me to see how userspace could grow a functional dependency on this flag.
> Whereas for general monitoring it's handy.
The point is: once all the ASI code lands, we should show it in cpuinfo. As
in: "this kernel supports ASI" and not "there's asi in cpuinfo but well,
that's not the whole deal."
Makes sense?
> > Not an early_param() ?
>
> Oh this is just for consistency with pti_check_boottime_disable(). But,
> I think that function actually exists because of init ordering issues
> that aren't relevant here, so early_param() seems fine to me (or, if I
> find some reason why it doesn't work, I'll add a comment in v2 to
> explain why we don't use it).
Ack.
> Thanks for taking a look :)
Sure, np.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-10-26 22:29 ` Brendan Jackman
@ 2025-11-10 11:38 ` Borislav Petkov
2025-11-10 12:36 ` Brendan Jackman
0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-11-10 11:38 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Sun, Oct 26, 2025 at 10:29:23PM +0000, Brendan Jackman wrote:
> Per Dave's feedback I am still slightly hopeful I can find a way to
> come in and refactor this code so that it gets cleaner for you guys
> and then ASI becomes a natural addition. So far I haven't come up with
> anything in init_64.c, but I'm still planning to stare at set_memory.c a
> while longer and see if anything comes to mind. So maybe we'll be able
> to reduce the yuck factor a bit.
Cleanups like that are always more than welcome!
:-)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi=
2025-11-10 11:26 ` Borislav Petkov
@ 2025-11-10 12:15 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-11-10 12:15 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, Yosry Ahmed
On Mon Nov 10, 2025 at 11:26 AM UTC, Borislav Petkov wrote:
> On Sun, Oct 26, 2025 at 10:24:35PM +0000, Brendan Jackman wrote:
>> Hm yeah, I actually also thought I had some direct feedback from one of
>> the x86 maintainers saying not to expose it here. I can no longer find
>> that feedback on Lore so I think I must be misremembering, the flag
>> was already hidden back in [0].
>>
>> [0] https://lore.kernel.org/linux-mm/20240712-asi-rfc-24-v1-5-144b319a40d8@google.com/
>>
>> If that feedback indeed doesn't exist
>
> Just ignore everything whoever might've told you or not - we override all
> previous statements! :-P
>
> From Documentation/arch/x86/cpuinfo.rst
>
> "So, the current use of /proc/cpuinfo is to show features which the
> kernel has *enabled* and *supports*. As in: the CPUID feature flag is
> there, there's an additional setup which the kernel has done while
> booting and the functionality is ready to use. A perfect example for
> that is "user_shstk" where additional code enablement is present in the
> kernel to support shadow stack for user programs."
>
> So it is all written down now and is the law! :-P
>
>> then personally I'd lean towards exposing it right away, I don't see that
>> much downside in terms of ABI, since ASI kinda "doesn't do anything", from
>> a SW point of view it's just a very weird and complicated NOP. It's hard for
>> me to see how userspace could grow a functional dependency on this flag.
>> Whereas for general monitoring it's handy.
>
> The point is: once all the ASI code lands, we should show it in cpuinfo. As
> in: "this kernel supports ASI" and not "there's asi in cpuinfo but well,
> that's not the whole deal."
>
> Makes sense?
Sure, sounds good to me.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/21] x86/mm: factor out phys_pgd_init()
2025-11-10 11:38 ` Borislav Petkov
@ 2025-11-10 12:36 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-11-10 12:36 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Mon Nov 10, 2025 at 11:38 AM UTC, Borislav Petkov wrote:
> On Sun, Oct 26, 2025 at 10:29:23PM +0000, Brendan Jackman wrote:
>> Per Dave's feedback I am still slightly hopeful I can find a way to
>> come in and refactor this code so that it gets cleaner for you guys
>> and then ASI becomes a natural addition. So far I haven't come up with
>> anything in init_64.c, but I'm still planning to stare at set_memory.c a
>> while longer and see if anything comes to mind. So maybe we'll be able
>> to reduce the yuck factor a bit.
>
> Cleanups like that are always more than welcome!
>
> :-)
In that case, I will advertise this (less ambitious) cleanup which is
awaiting review:
https://lore.kernel.org/all/20251003-x86-init-cleanup-v1-4-f2b7994c2ad6@google.com/
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-09-24 14:59 ` [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd Brendan Jackman
2025-10-01 20:28 ` Dave Hansen
@ 2025-11-11 14:55 ` Borislav Petkov
2025-11-11 17:53 ` Brendan Jackman
1 sibling, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2025-11-11 14:55 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Wed, Sep 24, 2025 at 02:59:39PM +0000, Brendan Jackman wrote:
> @@ -797,6 +800,24 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
>
> paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
> prot, init, &pgd_changed);
> +
> + /*
> + * Set up ASI's unrestricted physmap. This needs to be mapped at minimum 2M
> + * size so that regions can be mapped and unmapped at pageblock
> + * granularity without requiring allocations.
> + */
> + if (asi_nonsensitive_pgd) {
> + /*
> + * Since most memory is expected to end up sensitive, start with
> + * everything unmapped in this pagetable.
> + */
> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
> +
> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
> + prot_np, init, NULL);
> + }
This looks weird: so you have some other function - asi_init() - which *must*
run before this one so that the pgd is allocated. But then you check it here
and in order to do such a "distributed" init, you export it too.
Instead, I'd simply add a function call here - asi_init_physmap() or whatever
- which is defined in asi.c and gets *only* called from here. And that
function returns the pgd or NULL. And then you use phys_pgd_init() on it.
Also, looking at kernel_map_pages_in_pgd() - and you mentioned set_memory.c
already - and if I squint my eyes hard enough, it does look like a bunch of
redundancy between there and init_64.c. But that's nasty code so unifying that
would be a hard task.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-11-11 14:55 ` Borislav Petkov
@ 2025-11-11 17:53 ` Brendan Jackman
0 siblings, 0 replies; 65+ messages in thread
From: Brendan Jackman @ 2025-11-11 17:53 UTC (permalink / raw)
To: Borislav Petkov, Brendan Jackman
Cc: Andy Lutomirski, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Michal Hocko, Johannes Weiner, Zi Yan,
Axel Rasmussen, Yuanchu Xie, Roman Gushchin, peterz, dave.hansen,
mingo, tglx, akpm, david, derkling, junaids, linux-kernel,
linux-mm, reijiw, rientjes, rppt, vbabka, x86, yosry.ahmed
On Tue Nov 11, 2025 at 2:55 PM UTC, Borislav Petkov wrote:
> On Wed, Sep 24, 2025 at 02:59:39PM +0000, Brendan Jackman wrote:
>> @@ -797,6 +800,24 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
>>
>> paddr_last = phys_pgd_init(init_mm.pgd, paddr_start, paddr_end, page_size_mask,
>> prot, init, &pgd_changed);
>> +
>> + /*
>> + * Set up ASI's unrestricted physmap. This needs to be mapped at minimum 2M
>> + * size so that regions can be mapped and unmapped at pageblock
>> + * granularity without requiring allocations.
>> + */
>> + if (asi_nonsensitive_pgd) {
>> + /*
>> + * Since most memory is expected to end up sensitive, start with
>> + * everything unmapped in this pagetable.
>> + */
>> + pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
>> +
>> + VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
>> + phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
>> + prot_np, init, NULL);
>> + }
>
> This looks weird: so you have some other function - asi_init() - which *must*
> run before this one so that the pgd is allocated. But then you check it here
> and in order to do such a "distributed" init, you export it too.
>
> Instead, I'd simply add a function call here - asi_init_physmap() or whatever
> - which is defined in asi.c and gets *only* called from here. And that
> function returns the pgd or NULL. And then you use phys_pgd_init() on it.
Well, this isn't the only place that refers to asi_nonsensitive_pgd in
this patchset - it's also used as a global from set_memory.c for the
later updates.
Still, you're right about the janky distributed init / setup ordering
issues. So yeah what you suggested with asi_init_physmap() (or whatever
we call it) still makes sense to me; it's just that we'd still have to
export the pgd itself to set_memory.c.
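At the init_64.c end that would be something like (sketch only; how
asi_init_physmap() allocates and hands back the pgd is still hand-waving):

	pgd_t *asi_pgd = asi_init_physmap();	/* hypothetical helper: pgd, or NULL if ASI is off */

	if (asi_pgd) {
		pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);

		phys_pgd_init(asi_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
			      prot_np, init, NULL);
	}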
> Also, looking at kernel_map_pages_in_pgd() - and you mentioned set_memory.c
> already - and if I squint my eyes hard enough, it does look like a bunch of
> redundancy between there and init_64.c. But that's nasty code so unifying that
> would be a hard task.
Yeah :/ Some folks pointed out to me that all this logic is kinda
separated between the upper levels of pagetables which are preallocated,
and the lower level ones which are more complicated. So I am still
planning to see if I can come up with some sort of refactoring that only
affects the upper levels.
However, in the meantime I have switched tracks since David H pointed
out an opportunity for me to help out with the guest_memfd stuff [0].
That lets me start getting an interesting subset of this series merged
without needing any changes to the x86 code just yet.
[0] https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de/
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd
2025-10-02 17:19 ` Brendan Jackman
@ 2025-11-12 19:39 ` Dave Hansen
0 siblings, 0 replies; 65+ messages in thread
From: Dave Hansen @ 2025-11-12 19:39 UTC (permalink / raw)
To: Brendan Jackman, Andy Lutomirski, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Michal Hocko,
Johannes Weiner, Zi Yan, Axel Rasmussen, Yuanchu Xie,
Roman Gushchin
Cc: peterz, bp, dave.hansen, mingo, tglx, akpm, david, derkling,
junaids, linux-kernel, linux-mm, reijiw, rientjes, rppt, vbabka,
x86, yosry.ahmed
On 10/2/25 10:19, Brendan Jackman wrote:
> On Thu Oct 2, 2025 at 4:14 PM UTC, Dave Hansen wrote:
...>> What is the point of having a pgd if you can't put it in CR3? If you:
>>
>> write_cr3(asi_nonsensitive_pgd);
>>
>> you'll just triple fault because all kernel text is !_PAGE_PRESENT.
>>
>> The critical point is when 'asi_nonsensitive_pgd' is functional enough
>> that it can be loaded into CR3 and handle a switch to the normal
>> init_mm->pgd.
>
> Hm, are you saying that I should expand the scope of the patchset from
> "set up the direct map" to "set up an ASI address space"? If so, yeah I
> can do that, I don't think the patchset would get that much bigger. I
> only left the other bits out because it feels weird to set up a whole
> address space but never actually switch into it. Setting up the logic to
> switch into it would make the patchset really big though.
The patch set has to _do_ something, though. It's fine for a patch
series to add code that then gets turned on at the end of the series.
But, at the end of the series, it has to have something to show for it.
If the series is small *and* useful, all the better. But, if I have to
choose between small or useful, it's always going to be useful.
> Like I said in the cover letter, I could also always change tack:
> we could instead start with all the address-space switching logic, but
> just have the two address spaces be clones of each other. Then we could
> come back and start poking holes in the ASI one for the second series. I
> don't have a really strong opinion about the best place to start, but
> I'll stick to my current course unless someone else does have a strong
> opinion.
Yeah, but the end of the series has to have holes poked that are
marginally useful for *SOMETHING*, at least if anyone wants it applied.
^ permalink raw reply [flat|nested] 65+ messages in thread
end of thread, other threads:[~2025-11-12 19:41 UTC | newest]
Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-24 14:59 [PATCH 00/21] mm: ASI direct map management Brendan Jackman
2025-09-24 14:59 ` [PATCH 01/21] x86/mm/asi: Add CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION Brendan Jackman
2025-10-24 22:37 ` Borislav Petkov
2025-10-24 23:32 ` Brendan Jackman
2025-10-25 9:57 ` Borislav Petkov
2025-09-24 14:59 ` [PATCH 02/21] x86/mm/asi: add X86_FEATURE_ASI and asi= Brendan Jackman
2025-10-25 10:06 ` Borislav Petkov
2025-10-26 22:24 ` Brendan Jackman
2025-11-10 11:26 ` Borislav Petkov
2025-11-10 12:15 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 03/21] x86/mm: factor out phys_pgd_init() Brendan Jackman
2025-09-27 19:29 ` kernel test robot
2025-10-01 12:26 ` Brendan Jackman
2025-10-25 11:48 ` Borislav Petkov
2025-10-26 22:29 ` Brendan Jackman
2025-11-10 11:38 ` Borislav Petkov
2025-11-10 12:36 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd Brendan Jackman
2025-10-01 20:28 ` Dave Hansen
2025-10-02 14:05 ` Brendan Jackman
2025-10-02 16:14 ` Dave Hansen
2025-10-02 17:19 ` Brendan Jackman
2025-11-12 19:39 ` Dave Hansen
2025-11-11 14:55 ` Borislav Petkov
2025-11-11 17:53 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI Brendan Jackman
2025-09-25 13:36 ` kernel test robot
2025-10-01 20:50 ` Dave Hansen
2025-10-02 14:31 ` Brendan Jackman
2025-10-02 16:40 ` Dave Hansen
2025-10-02 17:08 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 06/21] mm/page_alloc: add __GFP_SENSITIVE and always set it Brendan Jackman
2025-10-01 21:18 ` Dave Hansen
2025-10-02 14:34 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 07/21] mm: introduce for_each_free_list() Brendan Jackman
2025-09-24 14:59 ` [PATCH 08/21] mm: rejig pageblock mask definitions Brendan Jackman
2025-09-24 14:59 ` [PATCH 09/21] mm/page_alloc: Invert is_check_pages_enabled() check Brendan Jackman
2025-09-24 14:59 ` [PATCH 10/21] mm/page_alloc: remove ifdefs from pindex helpers Brendan Jackman
2025-09-24 14:59 ` [PATCH 11/21] mm: introduce freetype_t Brendan Jackman
2025-09-25 13:15 ` kernel test robot
2025-10-01 21:20 ` Dave Hansen
2025-10-02 14:39 ` Brendan Jackman
2025-09-24 14:59 ` [PATCH 12/21] mm/asi: encode sensitivity in freetypes and pageblocks Brendan Jackman
2025-09-24 14:59 ` [PATCH 13/21] mm/page_alloc_test: unit test pindex helpers Brendan Jackman
2025-09-25 13:36 ` kernel test robot
2025-09-24 14:59 ` [PATCH 14/21] x86/mm/pat: introduce cpa_fault option Brendan Jackman
2025-09-24 14:59 ` [PATCH 15/21] mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER Brendan Jackman
2025-09-24 14:59 ` [PATCH 16/21] mm/page_alloc: introduce ALLOC_NOBLOCK Brendan Jackman
2025-09-24 14:59 ` [PATCH 17/21] mm/slub: defer application of gfp_allowed_mask Brendan Jackman
2025-09-24 14:59 ` [PATCH 18/21] mm/asi: support changing pageblock sensitivity Brendan Jackman
2025-09-24 14:59 ` [PATCH 19/21] mm/asi: bad_page() when ASI mappings are wrong Brendan Jackman
2025-09-24 14:59 ` [PATCH 20/21] x86/mm/asi: don't use global pages when ASI enabled Brendan Jackman
2025-09-24 14:59 ` [PATCH 21/21] mm: asi_test: smoke test for [non]sensitive page allocs Brendan Jackman
2025-09-25 17:51 ` [PATCH 00/21] mm: ASI direct map management Brendan Jackman
2025-09-30 19:51 ` Konrad Rzeszutek Wilk
2025-10-01 7:12 ` Brendan Jackman
2025-10-01 19:54 ` Dave Hansen
2025-10-01 20:22 ` Yosry Ahmed
2025-10-01 20:30 ` Dave Hansen
2025-10-02 11:05 ` Brendan Jackman
2025-10-01 20:59 ` Dave Hansen
2025-10-02 7:34 ` David Hildenbrand
2025-10-02 11:23 ` Brendan Jackman
2025-10-02 17:01 ` Dave Hansen
2025-10-02 19:19 ` Brendan Jackman