* [PATCH 0/3] arm64: Assorted GCS fixes
@ 2026-02-20 14:05 Catalin Marinas
2026-02-20 14:05 ` [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled Catalin Marinas
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Catalin Marinas @ 2026-02-20 14:05 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Mark Brown, Will Deacon, David Hildenbrand, Emanuele Rocca,
Mark Rutland
The first patch fixes a kernel panic when LPA2 is enabled together with
GCS: the PTE_SHARED bits in _PAGE_GCS{,_RO} occupy PTE bits 8-9, which
FEAT_LPA2 reuses for the output address bits OA[51:50].
The second patch allows PROT_NONE mappings even with GCS, useful for
NUMA balancing.
The third patch is the same optimisation already applied to normal
stacks (when mapped with MAP_STACK): it does not make sense to use a
THP for a small GCS mapping. This patch also updates the do_mmap() call
in alloc_gcs() to use PROT_WRITE instead of VM_WRITE for consistency.
Thanks.
Catalin Marinas (3):
arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is
enabled
arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing
arm64: gcs: Do not map the guarded control stack as THP
arch/arm64/include/asm/pgtable-prot.h | 4 ++--
arch/arm64/mm/gcs.c | 8 ++++++--
arch/arm64/mm/mmap.c | 10 +++++++++-
3 files changed, 17 insertions(+), 5 deletions(-)
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled 2026-02-20 14:05 [PATCH 0/3] arm64: Assorted GCS fixes Catalin Marinas @ 2026-02-20 14:05 ` Catalin Marinas 2026-02-20 15:56 ` David Hildenbrand (Arm) 2026-02-20 14:05 ` [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing Catalin Marinas 2026-02-20 14:05 ` [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP Catalin Marinas 2 siblings, 1 reply; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 14:05 UTC (permalink / raw) To: linux-arm-kernel Cc: Mark Brown, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the shareability attribute with bits 50-51 of the output address. The _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 and they match the other user _PAGE_* prot macros. However, the difference is that all the classic prot values are accessed via protection_map[] and have the PTE_SHARED bits removed when LPA2 is enabled. Ensure that PAGE_GCS{,RO} use the dynamic PTE_MAYBE_SHARED instead of the static PTE_SHARED. 
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Reported-by: Emanuele Rocca <emanuele.rocca@arm.com> Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> --- arch/arm64/include/asm/pgtable-prot.h | 4 ++-- arch/arm64/mm/mmap.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 161e8660eddd..a65f2c50e9ca 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -164,8 +164,8 @@ static inline bool __pure lpa2_is_enabled(void) #define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER) #define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER) -#define PAGE_GCS __pgprot(_PAGE_GCS) -#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO) +#define PAGE_GCS __pgprot((_PAGE_GCS & ~PTE_SHARED) | PTE_MAYBE_SHARED) +#define PAGE_GCS_RO __pgprot((_PAGE_GCS_RO & ~PTE_SHARED) | PTE_MAYBE_SHARED) #define PIE_E0 ( \ PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS), PIE_GCS) | \ diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c index 08ee177432c2..2e404441063b 100644 --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -87,7 +87,7 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) /* Short circuit GCS to avoid bloating the table. */ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { - prot = _PAGE_GCS_RO; + prot = pgprot_val(PAGE_GCS_RO); } else { prot = pgprot_val(protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled 2026-02-20 14:05 ` [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled Catalin Marinas @ 2026-02-20 15:56 ` David Hildenbrand (Arm) 2026-02-20 16:45 ` Catalin Marinas 0 siblings, 1 reply; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-20 15:56 UTC (permalink / raw) To: Catalin Marinas, linux-arm-kernel Cc: Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On 2/20/26 15:05, Catalin Marinas wrote: > When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the > shareability attribute with bits 50-51 of the output address. The > _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 and they > match the other user _PAGE_* prot macros. I assume that comes from _PAGE_DEFAULT -> _PROT_DEFAULT > However, the difference is > that all the classic prot values are accessed via protection_map[] and > have the PTE_SHARED bits removed when LPA2 is enabled. > > Ensure that PAGE_GCS{,RO} use the dynamic PTE_MAYBE_SHARED instead of > the static PTE_SHARED. I expected here a quick description of the symptom: "Leaving PTE_SHARED set results in kernel panics." etc. 
:) > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") > Reported-by: Emanuele Rocca <emanuele.rocca@arm.com> > Cc: <stable@vger.kernel.org> > Cc: Mark Brown <broonie@kernel.org> > Cc: Will Deacon <will@kernel.org> > --- > arch/arm64/include/asm/pgtable-prot.h | 4 ++-- > arch/arm64/mm/mmap.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h > index 161e8660eddd..a65f2c50e9ca 100644 > --- a/arch/arm64/include/asm/pgtable-prot.h > +++ b/arch/arm64/include/asm/pgtable-prot.h > @@ -164,8 +164,8 @@ static inline bool __pure lpa2_is_enabled(void) > #define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER) > #define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER) > > -#define PAGE_GCS __pgprot(_PAGE_GCS) > -#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO) > +#define PAGE_GCS __pgprot((_PAGE_GCS & ~PTE_SHARED) | PTE_MAYBE_SHARED) > +#define PAGE_GCS_RO __pgprot((_PAGE_GCS_RO & ~PTE_SHARED) | PTE_MAYBE_SHARED) > > #define PIE_E0 ( \ > PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS), PIE_GCS) | \ > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c > index 08ee177432c2..2e404441063b 100644 > --- a/arch/arm64/mm/mmap.c > +++ b/arch/arm64/mm/mmap.c > @@ -87,7 +87,7 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) > > /* Short circuit GCS to avoid bloating the table. */ > if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { > - prot = _PAGE_GCS_RO; > + prot = pgprot_val(PAGE_GCS_RO); > } else { > prot = pgprot_val(protection_map[vm_flags & > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); The only confusion I have is why we don't update _PAGE_GCS/_PAGE_GCS_RO, consequently leaving PTE_SHARED set for the other users of _PAGE_GCS/_PAGE_GCS_RO in arch/arm64/include/asm/pgtable-prot.h. Staring at pte_pi_index() (and the definitions of PTE_PI_IDX_0), I assume it doesn't matter. 
Just curious why we don't fixup _PAGE_GCS / _PAGE_GCS_RO instead. Sorry for the probably stupid questions, still learning all these arch details :) -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled 2026-02-20 15:56 ` David Hildenbrand (Arm) @ 2026-02-20 16:45 ` Catalin Marinas 2026-02-20 16:47 ` Catalin Marinas 0 siblings, 1 reply; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 16:45 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: linux-arm-kernel, Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On Fri, Feb 20, 2026 at 04:56:26PM +0100, David Hildenbrand wrote: > On 2/20/26 15:05, Catalin Marinas wrote: > > When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the > > shareability attribute with bits 50-51 of the output address. The > > _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 and they > > match the other user _PAGE_* prot macros. > > I assume that comes from _PAGE_DEFAULT -> _PROT_DEFAULT Yes. > > However, the difference is > > that all the classic prot values are accessed via protection_map[] and > > have the PTE_SHARED bits removed when LPA2 is enabled. > > > > Ensure that PAGE_GCS{,RO} use the dynamic PTE_MAYBE_SHARED instead of > > the static PTE_SHARED. > > I expected here a quick description of the symptom: "Leaving PTE_SHARED set > results in kernel panics." etc. :) Ah, yes, I forgot to give the details of the fault - a lot worse with THP, unhandled page fault, or bad page warning with small pages. I'll respin with some better comment. 
> > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h > > index 161e8660eddd..a65f2c50e9ca 100644 > > --- a/arch/arm64/include/asm/pgtable-prot.h > > +++ b/arch/arm64/include/asm/pgtable-prot.h > > @@ -164,8 +164,8 @@ static inline bool __pure lpa2_is_enabled(void) > > #define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER) > > #define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER) > > -#define PAGE_GCS __pgprot(_PAGE_GCS) > > -#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO) > > +#define PAGE_GCS __pgprot((_PAGE_GCS & ~PTE_SHARED) | PTE_MAYBE_SHARED) > > +#define PAGE_GCS_RO __pgprot((_PAGE_GCS_RO & ~PTE_SHARED) | PTE_MAYBE_SHARED) > > #define PIE_E0 ( \ > > PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS), PIE_GCS) | \ > > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c > > index 08ee177432c2..2e404441063b 100644 > > --- a/arch/arm64/mm/mmap.c > > +++ b/arch/arm64/mm/mmap.c > > @@ -87,7 +87,7 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) > > /* Short circuit GCS to avoid bloating the table. */ > > if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { > > - prot = _PAGE_GCS_RO; > > + prot = pgprot_val(PAGE_GCS_RO); > > } else { > > prot = pgprot_val(protection_map[vm_flags & > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > > The only confusion I have is why we don't update _PAGE_GCS/_PAGE_GCS_RO, > consequently leaving PTE_SHARED set for the other users of > _PAGE_GCS/_PAGE_GCS_RO in arch/arm64/include/asm/pgtable-prot.h. > > Staring at pte_pi_index() (and the definitions of PTE_PI_IDX_0), I assume it > doesn't matter. > > Just curious why we don't fixup _PAGE_GCS / _PAGE_GCS_RO instead. _PAGE_GCS needs to be constant as it ends up in asm, so we can't add the dynamic PTE_MAYBE_SHARED. There are other ways to solve this but it is somewhat more consistent with the other _PAGE_* definitions which all have PTE_SHARED. Well, that's for a quick fix that can be easily backported. 
We could overhaul these macros to make them clearer. -- Catalin ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled 2026-02-20 16:45 ` Catalin Marinas @ 2026-02-20 16:47 ` Catalin Marinas 0 siblings, 0 replies; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 16:47 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: linux-arm-kernel, Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On Fri, Feb 20, 2026 at 04:45:11PM +0000, Catalin Marinas wrote: > On Fri, Feb 20, 2026 at 04:56:26PM +0100, David Hildenbrand wrote: > > On 2/20/26 15:05, Catalin Marinas wrote: > > > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h > > > index 161e8660eddd..a65f2c50e9ca 100644 > > > --- a/arch/arm64/include/asm/pgtable-prot.h > > > +++ b/arch/arm64/include/asm/pgtable-prot.h > > > @@ -164,8 +164,8 @@ static inline bool __pure lpa2_is_enabled(void) > > > #define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER) > > > #define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER) > > > -#define PAGE_GCS __pgprot(_PAGE_GCS) > > > -#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO) > > > +#define PAGE_GCS __pgprot((_PAGE_GCS & ~PTE_SHARED) | PTE_MAYBE_SHARED) > > > +#define PAGE_GCS_RO __pgprot((_PAGE_GCS_RO & ~PTE_SHARED) | PTE_MAYBE_SHARED) > > > #define PIE_E0 ( \ > > > PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS), PIE_GCS) | \ > > > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c > > > index 08ee177432c2..2e404441063b 100644 > > > --- a/arch/arm64/mm/mmap.c > > > +++ b/arch/arm64/mm/mmap.c > > > @@ -87,7 +87,7 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) > > > /* Short circuit GCS to avoid bloating the table. 
*/ > > > if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { > > > - prot = _PAGE_GCS_RO; > > > + prot = pgprot_val(PAGE_GCS_RO); > > > } else { > > > prot = pgprot_val(protection_map[vm_flags & > > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > > > > The only confusion I have is why we don't update _PAGE_GCS/_PAGE_GCS_RO, > > consequently leaving PTE_SHARED set for the other users of > > _PAGE_GCS/_PAGE_GCS_RO in arch/arm64/include/asm/pgtable-prot.h. > > > > Staring at pte_pi_index() (and the definitions of PTE_PI_IDX_0), I assume it > > doesn't matter. > > > > Just curious why we don't fixup _PAGE_GCS / _PAGE_GCS_RO instead. > > _PAGE_GCS needs to be constant as it ends up in asm, so we can't add > the dynamic PTE_MAYBE_SHARED. There are other ways to solve this but it > is somewhat more consistent with the other _PAGE_* definitions which all > have PTE_SHARED. Hmm, it's only in asm-offsets and it looks like the compiler didn't complain. I'll check the generated code. -- Catalin ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing 2026-02-20 14:05 [PATCH 0/3] arm64: Assorted GCS fixes Catalin Marinas 2026-02-20 14:05 ` [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled Catalin Marinas @ 2026-02-20 14:05 ` Catalin Marinas 2026-02-20 16:16 ` David Hildenbrand (Arm) 2026-02-20 14:05 ` [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP Catalin Marinas 2 siblings, 1 reply; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 14:05 UTC (permalink / raw) To: linux-arm-kernel Cc: Mark Brown, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland vm_get_page_prot() short-circuits the protection_map[] lookup for a VM_SHADOW_STACK mapping since its permissions are special. However, it also ignores PAGE_NONE mappings used for NUMA balancing by creating an accessible PTE. Special-case the VM_NONE permission to create an invalid PTE even if it is a GCS mapping. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@kernel.org> --- arch/arm64/mm/mmap.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c index 2e404441063b..f8993e3fa5d1 100644 --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -87,7 +87,15 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) /* Short circuit GCS to avoid bloating the table. */ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { - prot = pgprot_val(PAGE_GCS_RO); + /* + * Allow PAGE_NONE for NUMA balancing, otherwise use + * PAGE_GCS_RO. The permission will be made writeable + * (PAGE_GCS) on a GCS fault. 
+ */ + if (vm_flags & (VM_READ | VM_WRITE)) + prot = pgprot_val(PAGE_GCS_RO); + else + prot = pgprot_val(protection_map[VM_NONE]); } else { prot = pgprot_val(protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing 2026-02-20 14:05 ` [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing Catalin Marinas @ 2026-02-20 16:16 ` David Hildenbrand (Arm) 2026-02-20 19:52 ` Catalin Marinas 0 siblings, 1 reply; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-20 16:16 UTC (permalink / raw) To: Catalin Marinas, linux-arm-kernel Cc: Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On 2/20/26 15:05, Catalin Marinas wrote: > vm_get_page_prot() short-circuits the protection_map[] lookup for a > VM_SHADOW_STACK mapping since its permissions are special. However, it > also ignores PAGE_NONE mappings used for NUMA balancing by creating an > accessible PTE. > > Special-case the VM_NONE permission to create an invalid PTE even if it > is a GCS mapping. > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") > Cc: <stable@vger.kernel.org> > Cc: Mark Brown <broonie@kernel.org> > Cc: Will Deacon <will@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > --- > arch/arm64/mm/mmap.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c > index 2e404441063b..f8993e3fa5d1 100644 > --- a/arch/arm64/mm/mmap.c > +++ b/arch/arm64/mm/mmap.c > @@ -87,7 +87,15 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) > > /* Short circuit GCS to avoid bloating the table. */ > if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { > - prot = pgprot_val(PAGE_GCS_RO); > + /* > + * Allow PAGE_NONE for NUMA balancing, otherwise use > + * PAGE_GCS_RO. The permission will be made writeable > + * (PAGE_GCS) on a GCS fault. > + */ > + if (vm_flags & (VM_READ | VM_WRITE)) Could consider using VM_ACCESS_FLAGS here. For Shadow stacks we'd never expect executable properties. 
> + prot = pgprot_val(PAGE_GCS_RO); > + else > + prot = pgprot_val(protection_map[VM_NONE]); change_protection() documents that "This is assuming that NUMA faults are handled using PROT_NONE. If an architecture makes a different choice, it will need further changes to the core." So task_numa_work()->change_prot_numa()->change_protection() passes "newprot = PAGE_NONE". Where is the vm_get_page_prot() called on that path where your change would make a difference? I'd think that vm_get_page_prot() gets only invoked through a "proper" mprotect() in mprotect_fixup()->vma_set_page_prot()...->vm_get_page_prot(), not for NUMA hinting that leaves the VMA untouched. OTOH, I wonder whether mprotect(PROT_NONE) could trigger the path you thought of above. -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing 2026-02-20 16:16 ` David Hildenbrand (Arm) @ 2026-02-20 19:52 ` Catalin Marinas 0 siblings, 0 replies; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 19:52 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: linux-arm-kernel, Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On Fri, Feb 20, 2026 at 05:16:52PM +0100, David Hildenbrand wrote: > On 2/20/26 15:05, Catalin Marinas wrote: > > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c > > index 2e404441063b..f8993e3fa5d1 100644 > > --- a/arch/arm64/mm/mmap.c > > +++ b/arch/arm64/mm/mmap.c > > @@ -87,7 +87,15 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) > > /* Short circuit GCS to avoid bloating the table. */ > > if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { > > - prot = pgprot_val(PAGE_GCS_RO); > > + /* > > + * Allow PAGE_NONE for NUMA balancing, otherwise use > > + * PAGE_GCS_RO. The permission will be made writeable > > + * (PAGE_GCS) on a GCS fault. > > + */ > > + if (vm_flags & (VM_READ | VM_WRITE)) > > Could consider using VM_ACCESS_FLAGS here. For Shadow stacks we'd never > expect executable properties. Yes, this is better. > > + prot = pgprot_val(PAGE_GCS_RO); > > + else > > + prot = pgprot_val(protection_map[VM_NONE]); > > change_protection() documents that "This is assuming that NUMA faults are > handled using PROT_NONE. If an architecture makes a different choice, it > will need further changes to the core." > > So task_numa_work()->change_prot_numa()->change_protection() passes "newprot > = PAGE_NONE". > > Where is the vm_get_page_prot() called on that path where your change would > make a difference? > > I'd thing that vm_get_page_prot() gets only invoked through a "proper" > mpotect() in mprotect_fixup()->vma_set_page_prot()...->vm_get_page_prot(), > not for NUMA hinting that leaves the VMA untouched. 
> OTOH, I wonder whether mprotect(PROT_NONE) could trigger the path you > thought of above. I started with the mprotect(PROT_NONE) in mind but thought the NUMA case is a better argument. You are right, it doesn't use the same path. I need to check what we do with mprotect(PROT_NONE). If it's not rejected somewhere on the path to change_protection(), we end up with an accessible GCS mapping. Maybe it doesn't matter much but I'd rather have the access disabled. Anyway, I'll write some test next week to see what it does. The above comment will need to be changed. Thanks for the review. -- Catalin ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 14:05 [PATCH 0/3] arm64: Assorted GCS fixes Catalin Marinas 2026-02-20 14:05 ` [PATCH 1/3] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled Catalin Marinas 2026-02-20 14:05 ` [PATCH 2/3] arm64: gcs: Allow PAGE_NONE mappings for NUMA balancing Catalin Marinas @ 2026-02-20 14:05 ` Catalin Marinas 2026-02-20 14:34 ` Mark Brown 2026-02-20 15:33 ` David Hildenbrand (Arm) 2 siblings, 2 replies; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 14:05 UTC (permalink / raw) To: linux-arm-kernel Cc: Mark Brown, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland The default GCS size allocated on first prctl() for the main thread or subsequently on clone() is either half of RLIMIT_STACK or half of a thread's stack size. Both of these are likely to be suitable for a THP allocation and the kernel is more aggressive in creating such mappings. However, it does not make much sense to use a huge page as it didn't make sense for the normal stacks either. See commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE"). Force VM_NOHUGEPAGE when allocating/mapping the GCS. As per commit 7190b3c8bd2b ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled"), only pass this flag if TRANSPARENT_HUGEPAGE is enabled as not to confuse CRIU tools. While at it, use the PROT_WRITE prot argument rather than the VM_WRITE flag when calling do_mmap(). 
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: David Hildenbrand <david@kernel.org> --- arch/arm64/mm/gcs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c index 6e93f78de79b..bbdb62ae47cd 100644 --- a/arch/arm64/mm/gcs.c +++ b/arch/arm64/mm/gcs.c @@ -13,15 +13,19 @@ static unsigned long alloc_gcs(unsigned long addr, unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; + vm_flags_t vm_flags = VM_SHADOW_STACK; struct mm_struct *mm = current->mm; unsigned long mapped_addr, unused; if (addr) flags |= MAP_FIXED_NOREPLACE; + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) + vm_flags |= VM_NOHUGEPAGE; + mmap_write_lock(mm); - mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + mapped_addr = do_mmap(NULL, addr, size, PROT_READ | PROT_WRITE, + flags, vm_flags, 0, &unused, NULL); mmap_write_unlock(mm); return mapped_addr; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 14:05 ` [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP Catalin Marinas @ 2026-02-20 14:34 ` Mark Brown 2026-02-20 15:13 ` Catalin Marinas 0 siblings, 1 reply; 14+ messages in thread From: Mark Brown @ 2026-02-20 14:34 UTC (permalink / raw) To: Catalin Marinas Cc: linux-arm-kernel, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland [-- Attachment #1: Type: text/plain, Size: 1374 bytes --] On Fri, Feb 20, 2026 at 02:05:31PM +0000, Catalin Marinas wrote: > The default GCS size allocated on first prctl() for the main thread or > subsequently on clone() is either half of RLIMIT_STACK or half of a > thread's stack size. Both of these are likely to be suitable for a THP > allocation and the kernel is more aggressive in creating such mappings. > However, it does not make much sense to use a huge page as it didn't > make sense for the normal stacks either. See commit c4608d1bf7c6 ("mm: > mmap: map MAP_STACK to VM_NOHUGEPAGE"). > Force VM_NOHUGEPAGE when allocating/mapping the GCS. As per commit > 7190b3c8bd2b ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is > enabled"), only pass this flag if TRANSPARENT_HUGEPAGE is enabled as not > to confuse CRIU tools. I agree that this is sensible, however I'm fairly sure this will also apply to the other shadow stack implementations so I think it would be better to either do it cross architecture (ideally factoring this out of the arch code entirely) or put a note in the commit log that it's likely going to apply to other architectures. There's a bunch of stuff that we should start factoring out into common code now that RISC-V landed and it looks like the clone3() stuff ran its course, we should make it as easy as possible to understand why we're adding any divergences.
* Re: [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 14:34 ` Mark Brown @ 2026-02-20 15:13 ` Catalin Marinas 2026-02-20 16:17 ` Mark Brown 0 siblings, 1 reply; 14+ messages in thread From: Catalin Marinas @ 2026-02-20 15:13 UTC (permalink / raw) To: Mark Brown Cc: linux-arm-kernel, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland On Fri, Feb 20, 2026 at 02:34:08PM +0000, Mark Brown wrote: > On Fri, Feb 20, 2026 at 02:05:31PM +0000, Catalin Marinas wrote: > > The default GCS size allocated on first prctl() for the main thread or > > subsequently on clone() is either half of RLIMIT_STACK or half of a > > thread's stack size. Both of these are likely to be suitable for a THP > > allocation and the kernel is more aggressive in creating such mappings. > > However, it does not make much sense to use a huge page as it didn't > > make sense for the normal stacks either. See commit c4608d1bf7c6 ("mm: > > mmap: map MAP_STACK to VM_NOHUGEPAGE"). > > > Force VM_NOHUGEPAGE when allocating/mapping the GCS. As per commit > > 7190b3c8bd2b ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is > > enabled"), only pass this flag if TRANSPARENT_HUGEPAGE is enabled as not > > to confuse CRIU tools. > > I agree that this is sensible however I'm fairly sure this will also > apply to the other shadow stack implementations so I think it would be > better to either do it cross architecture (ideally factoring this out of > the arch code entirely) or put a note in the commit log that it's likely > going to apply to other architectures. There's a bunch of stuff that we > should start factoring out into common code now that RISC-V landed and > it looks like the clone3() stuff ran it's course, we should make it as > easy as possible to understand why any divergences we're adding. 
Something like below (not tested yet and not addressing riscv, waiting for -rc1): diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c index bbdb62ae47cd..21ed78d129de 100644 --- a/arch/arm64/mm/gcs.c +++ b/arch/arm64/mm/gcs.c @@ -12,23 +12,7 @@ static unsigned long alloc_gcs(unsigned long addr, unsigned long size) { - int flags = MAP_ANONYMOUS | MAP_PRIVATE; - vm_flags_t vm_flags = VM_SHADOW_STACK; - struct mm_struct *mm = current->mm; - unsigned long mapped_addr, unused; - - if (addr) - flags |= MAP_FIXED_NOREPLACE; - - if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) - vm_flags |= VM_NOHUGEPAGE; - - mmap_write_lock(mm); - mapped_addr = do_mmap(NULL, addr, size, PROT_READ | PROT_WRITE, - flags, vm_flags, 0, &unused, NULL); - mmap_write_unlock(mm); - - return mapped_addr; + return vm_mmap_shadow_stack(addr, size, 0); } static unsigned long gcs_size(unsigned long size) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 978232b6d48d..9725e7d89b1e 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -100,17 +100,9 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) static unsigned long alloc_shstk(unsigned long addr, unsigned long size, unsigned long token_offset, bool set_res_tok) { - int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; - struct mm_struct *mm = current->mm; - unsigned long mapped_addr, unused; + unsigned long mapped_addr; - if (addr) - flags |= MAP_FIXED_NOREPLACE; - - mmap_write_lock(mm); - mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); - mmap_write_unlock(mm); + mapped_addr = vm_mmap_shadow_stack(addr, size, MAP_ABOVE4G); if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) goto out; diff --git a/include/linux/mm.h b/include/linux/mm.h index f0d5be9dc736..4bde7539adc8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3711,6 +3711,8 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} /* This takes 
the mm semaphore itself */ extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); extern int vm_munmap(unsigned long, size_t); +extern unsigned long __must_check vm_mmap_shadow_stack(unsigned long addr, + unsigned long len, unsigned long flags); extern unsigned long __must_check vm_mmap(struct file *, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); diff --git a/mm/util.c b/mm/util.c index 97cae40c0209..5c0d92f52157 100644 --- a/mm/util.c +++ b/mm/util.c @@ -588,6 +588,24 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, return ret; } +unsigned long vm_mmap_shadow_stack(unsigned long addr, unsigned long len, + unsigned long flags) +{ + struct mm_struct *mm = current->mm; + unsigned long ret, unused; + + flags |= MAP_ANONYMOUS | MAP_PRIVATE; + if (addr) + flags |= MAP_FIXED_NOREPLACE; + + mmap_write_lock(mm); + ret = do_mmap(NULL, addr, len, PROT_READ | PROT_WRITE, flags, + VM_SHADOW_STACK | VM_NOHUGEPAGE, 0, &unused, NULL); + mmap_write_unlock(mm); + + return ret; +} + /* * Perform a userland memory mapping into the current process address space. See * the comment for do_mmap() for more details on this operation in general. -- Catalin ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 15:13 ` Catalin Marinas @ 2026-02-20 16:17 ` Mark Brown 0 siblings, 0 replies; 14+ messages in thread From: Mark Brown @ 2026-02-20 16:17 UTC (permalink / raw) To: Catalin Marinas Cc: linux-arm-kernel, Will Deacon, David Hildenbrand, Emanuele Rocca, Mark Rutland [-- Attachment #1: Type: text/plain, Size: 537 bytes --] On Fri, Feb 20, 2026 at 03:13:51PM +0000, Catalin Marinas wrote: > On Fri, Feb 20, 2026 at 02:34:08PM +0000, Mark Brown wrote: > > I agree that this is sensible however I'm fairly sure this will also > > apply to the other shadow stack implementations so I think it would be > > better to either do it cross architecture (ideally factoring this out of > > the arch code entirely) or put a note in the commit log that it's likely > Something like below (not tested yet and not addressing riscv, waiting > for -rc1): LGTM. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 14:05 ` [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP Catalin Marinas 2026-02-20 14:34 ` Mark Brown @ 2026-02-20 15:33 ` David Hildenbrand (Arm) 2026-02-20 15:36 ` Mark Brown 1 sibling, 1 reply; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-20 15:33 UTC (permalink / raw) To: Catalin Marinas, linux-arm-kernel Cc: Mark Brown, Will Deacon, Emanuele Rocca, Mark Rutland On 2/20/26 15:05, Catalin Marinas wrote: > The default GCS size allocated on first prctl() for the main thread or > subsequently on clone() is either half of RLIMIT_STACK or half of a > thread's stack size. Both of these are likely to be suitable for a THP > allocation and the kernel is more aggressive in creating such mappings. > However, it does not make much sense to use a huge page as it didn't > make sense for the normal stacks either. See commit c4608d1bf7c6 ("mm: > mmap: map MAP_STACK to VM_NOHUGEPAGE"). Agreed. At least when it comes to PMD THPs. > > Force VM_NOHUGEPAGE when allocating/mapping the GCS. As per commit > 7190b3c8bd2b ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is > enabled"), only pass this flag if TRANSPARENT_HUGEPAGE is enabled as not > to confuse CRIU tools. I was briefly concerned that we wouldn't even have PMD support to handle shadow stack, but turns out we do have pmd_mkwrite() that consumes a VMA to handle it. > > While at it, use the PROT_WRITE prot argument rather than the VM_WRITE > flag when calling do_mmap(). That LGTM as well. Agreed with trying to let common code deal with that. I'll note that madvise() would be able to turn THPs back on; I assume that's okay. -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] arm64: gcs: Do not map the guarded control stack as THP 2026-02-20 15:33 ` David Hildenbrand (Arm) @ 2026-02-20 15:36 ` Mark Brown 0 siblings, 0 replies; 14+ messages in thread From: Mark Brown @ 2026-02-20 15:36 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Catalin Marinas, linux-arm-kernel, Will Deacon, Emanuele Rocca, Mark Rutland [-- Attachment #1: Type: text/plain, Size: 249 bytes --] On Fri, Feb 20, 2026 at 04:33:12PM +0100, David Hildenbrand (Arm) wrote: > I'll note that madvise() would be able to turn THPs back on; I assume that's > okay. It's not a functional problem so if people want to do that it seems like it's on them. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread