public inbox for linux-arch@vger.kernel.org
* [PATCH v2 0/3] KASAN: HW_TAGS: Disable tagging for stack and page-tables
@ 2026-03-24 13:26 Muhammad Usama Anjum
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Muhammad Usama Anjum @ 2026-03-24 13:26 UTC (permalink / raw)
  To: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Catalin Marinas, Will Deacon, Ryan.Roberts,
	david.hildenbrand
  Cc: Muhammad Usama Anjum

Stacks and page tables are always accessed with the match-all tag,
so assigning a new random tag at every allocation and setting an
invalid tag at deallocation just adds overhead without improving
detection.

With __GFP_SKIP_KASAN, KASAN_TAG_KERNEL (the match-all tag) is stored in
the page flags while the page keeps its poison tag in hardware. The
benefit is that the 256 tag-setting instructions per 4 kB page are no
longer needed at allocation and deallocation time.
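
The figure of 256 follows from the MTE tag granule size. A minimal sketch
of the arithmetic, assuming one tag-store instruction per 16-byte granule
(the helper name and the constant below are illustrative, not kernel
symbols):

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MTE keeps one tag per 16-byte granule of memory. */
#define MODEL_TAG_GRANULE_SIZE 16u

/*
 * Illustrative helper (not a kernel function): number of granules, and
 * hence single-granule tag stores, needed to tag `size` bytes.
 */
static size_t tag_stores_needed(size_t size)
{
	return size / MODEL_TAG_GRANULE_SIZE;
}
```

Under that assumption, tag_stores_needed(4096) evaluates to 256, and a
16 kB stack would need 1024 such stores.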

Thus match-all pointers still work, while non-matching tags (other than
the poison tag) still fault.
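
The behaviour described above can be sketched as a small user-space
model. This is a simplification, not the hardware logic: the tag values
(0xF for match-all, 0xE for poison) and the function name are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TAG_MATCH_ALL 0xFu /* assumed 4-bit match-all tag */
#define MODEL_TAG_POISON    0xEu /* assumed poison tag left in hardware */

/*
 * Simplified tag check: a match-all pointer is never checked, any other
 * pointer must carry the same tag as the memory it accesses.
 */
static bool access_allowed(uint8_t ptr_tag, uint8_t mem_tag)
{
	if (ptr_tag == MODEL_TAG_MATCH_ALL)
		return true;
	return ptr_tag == mem_tag;
}
```

With memory left at the poison tag, access_allowed(MODEL_TAG_MATCH_ALL,
MODEL_TAG_POISON) is true, a stray pointer tagged 0x3 is rejected, and
only a pointer that happens to carry the poison tag itself would pass.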

__GFP_SKIP_KASAN only takes effect in KASAN_HW_TAGS mode, so coverage in
the other KASAN modes is unchanged.

Benchmark:
The benchmark has two modes. In thread mode, the child process forks
and creates N threads. In pgtable mode, the parent maps and faults a
specified memory size and then forks repeatedly with children exiting
immediately.

Thread benchmark:
2000 iterations, 2000 threads:	2.575 s → 2.229 s (~13.4% faster)

The pgtable samples:
- 2048 MB, 2000 iters		19.08 s → 17.62 s (~7.6% faster)
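
The percentages above are the relative reduction in run time. A minimal
sketch of that calculation (the helper name is illustrative):

```c
#include <assert.h>

/* Illustrative helper: relative reduction in run time, in percent. */
static double percent_faster(double before_s, double after_s)
{
	return (before_s - after_s) / before_s * 100.0;
}
```

percent_faster(2.575, 2.229) gives roughly 13.4 and
percent_faster(19.08, 17.62) gives roughly 7.65.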
---
Changes since v1 (summary only):
- Update description/title
- Patch 1: Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
  is zero in non-hw-tags mode
- Patch 2: Specify __GFP_SKIP_KASAN in THREADINFO_GFP and GFP_VMAP_STACK

Muhammad Usama Anjum (3):
  vmalloc: add __GFP_SKIP_KASAN support
  kasan: skip HW tagging for all kernel thread stacks
  mm: skip KASAN tagging for page-allocated page tables

 include/asm-generic/pgalloc.h |  2 +-
 include/linux/thread_info.h   |  2 +-
 kernel/fork.c                 |  5 +++--
 mm/vmalloc.c                  | 11 ++++++++---
 4 files changed, 13 insertions(+), 7 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-03-24 13:26 [PATCH v2 0/3] KASAN: HW_TAGS: Disable tagging for stack and page-tables Muhammad Usama Anjum
@ 2026-03-24 13:26 ` Muhammad Usama Anjum
  2026-04-10 18:10   ` Catalin Marinas
                     ` (2 more replies)
  2026-03-24 13:26 ` [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks Muhammad Usama Anjum
  2026-03-24 13:26 ` [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables Muhammad Usama Anjum
  2 siblings, 3 replies; 19+ messages in thread
From: Muhammad Usama Anjum @ 2026-03-24 13:26 UTC (permalink / raw)
  To: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Catalin Marinas, Will Deacon, Ryan.Roberts,
	david.hildenbrand
  Cc: Muhammad Usama Anjum

For allocations that will be accessed only with match-all pointers
(e.g., kernel stacks), setting tags is wasted work. If the caller
already set __GFP_SKIP_KASAN, don't skip zeroing the pages and
don't set KASAN_VMALLOC_PROT_NORMAL, so that kasan_unpoison_vmalloc()
returns early without tagging.

Before this patch, __GFP_SKIP_KASAN wasn't used with the vmalloc
APIs, so it wasn't checked. Now it is checked and acted upon. Other
KASAN modes are unchanged because __GFP_SKIP_KASAN evaluates to zero
there.

This is a preparatory patch for optimizing kernel stack allocations.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
---
Changes since v1:
- Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
  is zero in non-hw-tags mode.
- Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
---
 mm/vmalloc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c607307c657a6..69ae205effb46 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				__GFP_NOFAIL | __GFP_ZERO |\
 				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
 				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
-				GFP_USER | __GFP_NOLOCKDEP)
+				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
 
 static gfp_t vmalloc_fix_flags(gfp_t flags)
 {
@@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
  *
  * %__GFP_NOWARN can be used to suppress failure messages.
  *
+ * %__GFP_SKIP_KASAN can be used to skip poisoning
+ *
  * Can not be called from interrupt nor NMI contexts.
  * Return: the address of the area or %NULL on failure
  */
@@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 	 * kasan_unpoison_vmalloc().
 	 */
 	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
-		if (kasan_hw_tags_enabled()) {
+		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
+
+		if (kasan_hw_tags_enabled() && !skip_kasan) {
 			/*
 			 * Modify protection bits to allow tagging.
 			 * This must be done before mapping.
@@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 		}
 
 		/* Take note that the mapping is PAGE_KERNEL. */
-		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
+		if (!skip_kasan)
+			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
 	}
 
 	/* Allocate physical pages and map them into vmalloc space. */
-- 
2.47.3
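
The control flow that the hunks above introduce can be modelled in user
space as follows. This is only a sketch of the two conditions visible in
the diff: the struct, the function, and the flag bit are stand-ins
invented for illustration (the bit is not the kernel's value for
__GFP_SKIP_KASAN), and the parts of the function elided from the diff are
not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bit, NOT the kernel's value for __GFP_SKIP_KASAN. */
#define MODEL_GFP_SKIP_KASAN 0x1u

struct vmalloc_kasan_plan {
	bool tag_mapping; /* protection bits get modified to allow tagging */
	bool prot_normal; /* KASAN_VMALLOC_PROT_NORMAL would be set */
};

/* Hypothetical model of the PAGE_KERNEL branch after this patch. */
static struct vmalloc_kasan_plan
plan_vmalloc_kasan(bool prot_is_page_kernel, bool hw_tags_enabled,
		   unsigned int gfp_mask)
{
	struct vmalloc_kasan_plan plan = { false, false };

	if (prot_is_page_kernel) {
		bool skip_kasan = gfp_mask & MODEL_GFP_SKIP_KASAN;

		if (hw_tags_enabled && !skip_kasan)
			plan.tag_mapping = true;
		if (!skip_kasan)
			plan.prot_normal = true;
	}
	return plan;
}
```

In this model, a caller that passes the skip bit gets neither the tagged
mapping nor KASAN_VMALLOC_PROT_NORMAL, while a caller that does not pass
it sees the same behaviour as before the patch.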



* [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-03-24 13:26 [PATCH v2 0/3] KASAN: HW_TAGS: Disable tagging for stack and page-tables Muhammad Usama Anjum
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
@ 2026-03-24 13:26 ` Muhammad Usama Anjum
  2026-04-10 18:32   ` Catalin Marinas
  2026-03-24 13:26 ` [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables Muhammad Usama Anjum
  2 siblings, 1 reply; 19+ messages in thread
From: Muhammad Usama Anjum @ 2026-03-24 13:26 UTC (permalink / raw)
  To: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Catalin Marinas, Will Deacon, Ryan.Roberts,
	david.hildenbrand
  Cc: Muhammad Usama Anjum

HW-tag KASAN never checks kernel stacks because stack pointers carry the
match-all tag, so setting/poisoning tags is pure overhead.

- Add __GFP_SKIP_KASAN to THREADINFO_GFP so every stack allocator that
  uses it skips tagging (fork path plus arch users)
- Add __GFP_SKIP_KASAN to GFP_VMAP_STACK for the fork-specific vmap
  stacks.
- When reusing cached vmap stacks, skip kasan_unpoison_range() if HW tags
  are enabled.

Software KASAN modes are unchanged; this only affects hardware tag-based KASAN.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
---
Changes since v1:
- Specify __GFP_SKIP_KASAN in THREADINFO_GFP and GFP_VMAP_STACK to use
  it everywhere and cover the missed locations
- Update description
---
 include/linux/thread_info.h | 2 +-
 kernel/fork.c               | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 051e429026904..307b8390fc670 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -92,7 +92,7 @@ static inline long set_restart_fn(struct restart_block *restart,
 #define THREAD_ALIGN	THREAD_SIZE
 #endif
 
-#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
+#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_SKIP_KASAN)
 
 /*
  * flag set/clear/test wrappers
diff --git a/kernel/fork.c b/kernel/fork.c
index bb0c2613a5604..4bc7a03662109 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -207,7 +207,7 @@ static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
  * accounting is performed by the code assigning/releasing stacks to tasks.
  * We need a zeroed memory without __GFP_ACCOUNT.
  */
-#define GFP_VMAP_STACK (GFP_KERNEL | __GFP_ZERO)
+#define GFP_VMAP_STACK (GFP_KERNEL | __GFP_ZERO | __GFP_SKIP_KASAN)
 
 struct vm_stack {
 	struct rcu_head rcu;
@@ -345,7 +345,8 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node)
 		}
 
 		/* Reset stack metadata. */
-		kasan_unpoison_range(vm_area->addr, THREAD_SIZE);
+		if (!kasan_hw_tags_enabled())
+			kasan_unpoison_range(vm_area->addr, THREAD_SIZE);
 
 		stack = kasan_reset_tag(vm_area->addr);
 
-- 
2.47.3
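
The premise that stack pointers carry the match-all tag can be
illustrated with a simplified model of what kasan_reset_tag() does to an
address. The sketch assumes the tag lives in the top byte of a 64-bit
pointer and that the match-all tag byte is 0xFF; the helper names are
hypothetical and the real kernel helpers are architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TAG_SHIFT  56
#define MODEL_TAG_KERNEL 0xFFull /* assumed match-all tag byte */

/* Hypothetical model: force the tag byte to the match-all value. */
static uint64_t model_reset_tag(uint64_t addr)
{
	return addr | (MODEL_TAG_KERNEL << MODEL_TAG_SHIFT);
}

/* Hypothetical model: read the tag byte back out of the pointer. */
static uint8_t model_get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> MODEL_TAG_SHIFT);
}
```

Whatever tag the vmalloc address carried, the pointer stored as the task
stack ends up with the match-all tag, which is why the memory tags
behind it are never compared.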



* [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables
  2026-03-24 13:26 [PATCH v2 0/3] KASAN: HW_TAGS: Disable tagging for stack and page-tables Muhammad Usama Anjum
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
  2026-03-24 13:26 ` [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks Muhammad Usama Anjum
@ 2026-03-24 13:26 ` Muhammad Usama Anjum
  2026-04-10 18:19   ` Catalin Marinas
  2026-04-16  8:55   ` David Hildenbrand (Arm)
  2 siblings, 2 replies; 19+ messages in thread
From: Muhammad Usama Anjum @ 2026-03-24 13:26 UTC (permalink / raw)
  To: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Catalin Marinas, Will Deacon, Ryan.Roberts,
	david.hildenbrand
  Cc: Muhammad Usama Anjum, Ryan Roberts

Page tables are always accessed via the linear mapping with a match-all
tag, so HW-tag KASAN never checks them. For page-allocated tables (PTEs,
PGDs, etc.), avoid the tag setup and poisoning overhead by using
__GFP_SKIP_KASAN. SLUB-backed page tables are unchanged for now. (They
aren't widely used and require more SLUB-related skip logic, so that is
left for later.)

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
---
Changes since v1:
- Update description to mention SLUB-backed page tables are unchanged
---
 include/asm-generic/pgalloc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 57137d3ac1592..051aa1331051c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -4,7 +4,7 @@
 
 #ifdef CONFIG_MMU
 
-#define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)
+#define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO | __GFP_SKIP_KASAN)
 #define GFP_PGTABLE_USER	(GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
 
 /**
-- 
2.47.3



* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
@ 2026-04-10 18:10   ` Catalin Marinas
  2026-04-16  9:10   ` David Hildenbrand
  2026-04-22 13:21   ` Ryan Roberts
  2 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2026-04-10 18:10 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Will Deacon, Ryan.Roberts, david.hildenbrand

On Tue, Mar 24, 2026 at 01:26:27PM +0000, Muhammad Usama Anjum wrote:
> For allocations that will be accessed only with match-all pointers
> (e.g., kernel stacks), setting tags is wasted work. If the caller
> already set __GFP_SKIP_KASAN, don't skip zeroing the pages and
> don't set KASAN_VMALLOC_PROT_NORMAL, so that kasan_unpoison_vmalloc()
> returns early without tagging.
> 
> Before this patch, __GFP_SKIP_KASAN wasn't used with the vmalloc
> APIs, so it wasn't checked. Now it is checked and acted upon. Other
> KASAN modes are unchanged because __GFP_SKIP_KASAN evaluates to zero
> there.
> 
> This is a preparatory patch for optimizing kernel stack allocations.
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
> ---
> Changes since v1:
> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>   is zero in non-hw-tags mode.
> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
> ---
>  mm/vmalloc.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index c607307c657a6..69ae205effb46 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				__GFP_NOFAIL | __GFP_ZERO |\
>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> -				GFP_USER | __GFP_NOLOCKDEP)
> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>  
>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>  {
> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>   *
>   * %__GFP_NOWARN can be used to suppress failure messages.
>   *
> + * %__GFP_SKIP_KASAN can be used to skip poisoning
> + *
>   * Can not be called from interrupt nor NMI contexts.
>   * Return: the address of the area or %NULL on failure
>   */
> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>  	 * kasan_unpoison_vmalloc().
>  	 */
>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
> -		if (kasan_hw_tags_enabled()) {
> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
> +
> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>  			/*
>  			 * Modify protection bits to allow tagging.
>  			 * This must be done before mapping.
> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>  		}
>  
>  		/* Take note that the mapping is PAGE_KERNEL. */
> -		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
> +		if (!skip_kasan)
> +			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>  	}

In the cover letter, you said that __GFP_SKIP_KASAN is only meant for
KASAN_HW_TAGS. IIUC, here you skip passing KASAN_VMALLOC_PROT_NORMAL
even for KASAN_SW_TAGS. The flag is used in mm/kasan/shadow.c.

-- 
Catalin


* Re: [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables
  2026-03-24 13:26 ` [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables Muhammad Usama Anjum
@ 2026-04-10 18:19   ` Catalin Marinas
  2026-04-16  8:55   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2026-04-10 18:19 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Will Deacon, Ryan.Roberts, david.hildenbrand

On Tue, Mar 24, 2026 at 01:26:29PM +0000, Muhammad Usama Anjum wrote:
> Page tables are always accessed via the linear mapping with a match-all
> tag, so HW-tag KASAN never checks them. For page-allocated tables (PTEs,
> PGDs, etc.), avoid the tag setup and poisoning overhead by using
> __GFP_SKIP_KASAN. SLUB-backed page tables are unchanged for now. (They
> aren't widely used and require more SLUB-related skip logic, so that is
> left for later.)
> 
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-03-24 13:26 ` [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks Muhammad Usama Anjum
@ 2026-04-10 18:32   ` Catalin Marinas
  2026-04-10 18:36     ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2026-04-10 18:32 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Will Deacon, Ryan.Roberts, david.hildenbrand

On Tue, Mar 24, 2026 at 01:26:28PM +0000, Muhammad Usama Anjum wrote:
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index 051e429026904..307b8390fc670 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -92,7 +92,7 @@ static inline long set_restart_fn(struct restart_block *restart,
>  #define THREAD_ALIGN	THREAD_SIZE
>  #endif
>  
> -#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
> +#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_SKIP_KASAN)
>  
>  /*
>   * flag set/clear/test wrappers
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bb0c2613a5604..4bc7a03662109 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -207,7 +207,7 @@ static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
>   * accounting is performed by the code assigning/releasing stacks to tasks.
>   * We need a zeroed memory without __GFP_ACCOUNT.
>   */
> -#define GFP_VMAP_STACK (GFP_KERNEL | __GFP_ZERO)
> +#define GFP_VMAP_STACK (GFP_KERNEL | __GFP_ZERO | __GFP_SKIP_KASAN)
>  
>  struct vm_stack {
>  	struct rcu_head rcu;
> @@ -345,7 +345,8 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node)
>  		}
>  
>  		/* Reset stack metadata. */
> -		kasan_unpoison_range(vm_area->addr, THREAD_SIZE);
> +		if (!kasan_hw_tags_enabled())
> +			kasan_unpoison_range(vm_area->addr, THREAD_SIZE);
>  
>  		stack = kasan_reset_tag(vm_area->addr);

I wonder, since kasan_reset_tag() returns a match-all pointer even
with KASAN_SW_TAGS, whether it is worth unpoisoning the range at all
(unless it somehow interferes with vfree(), but I couldn't see how).

What the original approach might help with is use-after-realloc, in case
we had a tagged pointer in a past life of a page and it still works now.
Oh well, I guess that's for other types of hardening to address, like
delayed reallocation.

-- 
Catalin


* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-04-10 18:32   ` Catalin Marinas
@ 2026-04-10 18:36     ` Catalin Marinas
  2026-04-16  9:03       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2026-04-10 18:36 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Uladzislau Rezki, linux-arch, linux-kernel, linux-mm,
	Andrey Konovalov, Marco Elver, Vincenzo Frascino,
	Peter Collingbourne, Will Deacon, Ryan.Roberts, david.hildenbrand

On Fri, Apr 10, 2026 at 07:32:23PM +0100, Catalin Marinas wrote:
> What the original approach might help with is use-after-realloc in case
> we had a tagged pointer in a past life of a page and it still works now.
> Oh well, that's I guess for other types of hardening to address like
> delayed reallocation.

Another thought (for a separate series) - we could try to map the stack
as Untagged (unless stack tagging is enabled; needs compiler
instrumentation) and enable canonical tag checking (newer addition to
MTE). This way, any stray tagged pointer won't work on the stack since
it needs a 0xf tag (canonical).

-- 
Catalin


* Re: [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables
  2026-03-24 13:26 ` [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables Muhammad Usama Anjum
  2026-04-10 18:19   ` Catalin Marinas
@ 2026-04-16  8:55   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-16  8:55 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
	Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki,
	linux-arch, linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, Ryan.Roberts, david.hildenbrand

On 3/24/26 14:26, Muhammad Usama Anjum wrote:
> Page tables are always accessed via the linear mapping with a match-all
> tag, so HW-tag KASAN never checks them. For page-allocated tables (PTEs,
> PGDs, etc.), avoid the tag setup and poisoning overhead by using
> __GFP_SKIP_KASAN. SLUB-backed page tables are unchanged for now. (They
> aren't widely used and require more SLUB-related skip logic, so that is
> left for later.)
> 
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
> ---
> Changes since v1:
> - Update description to mention SLUB-backed page tables are unchanged
> ---
>  include/asm-generic/pgalloc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index 57137d3ac1592..051aa1331051c 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -4,7 +4,7 @@
>  
>  #ifdef CONFIG_MMU
>  
> -#define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)
> +#define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO | __GFP_SKIP_KASAN)
>  #define GFP_PGTABLE_USER	(GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-04-10 18:36     ` Catalin Marinas
@ 2026-04-16  9:03       ` David Hildenbrand (Arm)
  2026-04-17  8:31         ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-16  9:03 UTC (permalink / raw)
  To: Catalin Marinas, Muhammad Usama Anjum
  Cc: Arnd Bergmann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Kees Cook, Andrew Morton,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Will Deacon, Ryan.Roberts,
	david.hildenbrand

On 4/10/26 20:36, Catalin Marinas wrote:
> On Fri, Apr 10, 2026 at 07:32:23PM +0100, Catalin Marinas wrote:
>> What the original approach might help with is use-after-realloc in case
>> we had a tagged pointer in a past life of a page and it still works now.
>> Oh well, that's I guess for other types of hardening to address like
>> delayed reallocation.
> 
> Another thought (for a separate series) - we could try to map the stack
> as Untagged (unless stack tagging is enabled; needs compiler
> instrumentation) and enable canonical tag checking (newer addition to
> MTE). This way, any stray tagged pointer won't work on the stack since
> it needs a 0xf tag (canonical).

Do you mean mapping it as Untagged in the vmap for CONFIG_VMAP_STACK or
also as Untagged in the directmap?

The latter brings in the set of problems with direct map fragmentation.
-- 
Cheers,

David


* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
  2026-04-10 18:10   ` Catalin Marinas
@ 2026-04-16  9:10   ` David Hildenbrand
  2026-04-22 13:21   ` Ryan Roberts
  2 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2026-04-16  9:10 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, Ryan.Roberts

On 3/24/26 14:26, Muhammad Usama Anjum wrote:
> For allocations that will be accessed only with match-all pointers
> (e.g., kernel stacks), setting tags is wasted work. If the caller
> already set __GFP_SKIP_KASAN, don't skip zeroing the pages and
> don't set KASAN_VMALLOC_PROT_NORMAL, so that kasan_unpoison_vmalloc()
> returns early without tagging.
> 
> Before this patch, __GFP_SKIP_KASAN wasn't used with the vmalloc
> APIs, so it wasn't checked. Now it is checked and acted upon. Other
> KASAN modes are unchanged because __GFP_SKIP_KASAN evaluates to zero
> there.
> 
> This is a preparatory patch for optimizing kernel stack allocations.
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
> ---
> Changes since v1:
> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>   is zero in non-hw-tags mode.
> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
> ---
>  mm/vmalloc.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index c607307c657a6..69ae205effb46 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				__GFP_NOFAIL | __GFP_ZERO |\
>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> -				GFP_USER | __GFP_NOLOCKDEP)
> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>  
>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>  {
> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>   *
>   * %__GFP_NOWARN can be used to suppress failure messages.
>   *
> + * %__GFP_SKIP_KASAN can be used to skip poisoning
> + *
>   * Can not be called from interrupt nor NMI contexts.
>   * Return: the address of the area or %NULL on failure
>   */
> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>  	 * kasan_unpoison_vmalloc().
>  	 */
>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
> -		if (kasan_hw_tags_enabled()) {
> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
> +
> +		if (kasan_hw_tags_enabled() && !skip_kasan) {

This code gets ever more ugly. :)

After I spotted the horrible ___GFP_SKIP_ZERO that shouldn't even exist,
I thought about teaching vmalloc.c to use a sub-allocator interface to
the buddy instead, where we would essentially say "leave zeroing and
KASAN to the sub-allocator": vmalloc.

Then, we'd get rid of ___GFP_SKIP_ZERO and just use __GFP_SKIP_KASAN to
decide ourselves here what to do with KASAN.

I tried to implement that, but that SW KASAN / !KASAN handling messes
with my brain. :)

In particular, the order for HW KASAN is currently:

a) Allocate pages *and map them*.

b) Zero the pages

That means that we have temporarily unzeroed pages mapped there. I don't
know if that's problematic, but it's one of the differences to SW KASAN
/ ! KASAN handling here.

-- 
Cheers,

David


* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-04-16  9:03       ` David Hildenbrand (Arm)
@ 2026-04-17  8:31         ` Catalin Marinas
  2026-04-22 13:31           ` Ryan Roberts
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2026-04-17  8:31 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
	Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki,
	linux-arch, linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Will Deacon, Ryan.Roberts,
	david.hildenbrand

On Thu, Apr 16, 2026 at 11:03:46AM +0200, David Hildenbrand wrote:
> On 4/10/26 20:36, Catalin Marinas wrote:
> > On Fri, Apr 10, 2026 at 07:32:23PM +0100, Catalin Marinas wrote:
> >> What the original approach might help with is use-after-realloc in case
> >> we had a tagged pointer in a past life of a page and it still works now.
> >> Oh well, that's I guess for other types of hardening to address like
> >> delayed reallocation.
> > 
> > Another thought (for a separate series) - we could try to map the stack
> > as Untagged (unless stack tagging is enabled; needs compiler
> > instrumentation) and enable canonical tag checking (newer addition to
> > MTE). This way, any stray tagged pointer won't work on the stack since
> > it needs a 0xf tag (canonical).
> 
> Do you mean mapping it as Untagged in the vmap for CONFIG_VMAP_STACK or
> also as Untagged in the directmap?
> 
> The latter brings in the set of problems with direct map fragmentation.

Just the vmap, there are a lot more problems with the direct map. Not
sure how much it does in terms of security, maybe marginally. A
match-all tag (0xf) would still be able to access the canonically tagged
memory.

-- 
Catalin


* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
  2026-04-10 18:10   ` Catalin Marinas
  2026-04-16  9:10   ` David Hildenbrand
@ 2026-04-22 13:21   ` Ryan Roberts
  2026-04-22 14:23     ` Dev Jain
  2 siblings, 1 reply; 19+ messages in thread
From: Ryan Roberts @ 2026-04-22 13:21 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, david.hildenbrand

On 24/03/2026 13:26, Muhammad Usama Anjum wrote:
> For allocations that will be accessed only with match-all pointers
> (e.g., kernel stacks), setting tags is wasted work. If the caller
> already set __GFP_SKIP_KASAN, don’t skip zeroing the pages and
> don’t set KASAN_VMALLOC_PROT_NORMAL so kasan_unpoison_vmalloc()
> returns early without tagging.
> 
> Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc
> APIs. So it wasn't being checked. Now it's being checked and acted
> upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN isn't
> defined there.
> 
> This is a preparatory patch for optimizing kernel stack allocations.
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
> ---
> Changes since v1:
> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>   is zero in non-hw-tags mode.
> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
> ---
>  mm/vmalloc.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index c607307c657a6..69ae205effb46 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				__GFP_NOFAIL | __GFP_ZERO |\
>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> -				GFP_USER | __GFP_NOLOCKDEP)
> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>  
>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>  {
> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>   *
>   * %__GFP_NOWARN can be used to suppress failure messages.
>   *
> + * %__GFP_SKIP_KASAN can be used to skip poisoning

You mean skip *un*poisoning, I think? But you would only want this to apply to
the actual pages mapped by vmalloc. You wouldn't want to skip unpoisoning for
any allocated metadata; I think that is currently possible since the gfp_flags
that are passed into __vmalloc_node_range_noprof() are passed down to
__get_vm_area_node() unmodified. You probably want to explicitly ensure
__GFP_SKIP_KASAN is clear for that internal call?

> + *
>   * Can not be called from interrupt nor NMI contexts.
>   * Return: the address of the area or %NULL on failure
>   */
> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>  	 * kasan_unpoison_vmalloc().
>  	 */
>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
> -		if (kasan_hw_tags_enabled()) {
> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
> +
> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>  			/*
>  			 * Modify protection bits to allow tagging.
>  			 * This must be done before mapping.
> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>  		}
>  
>  		/* Take note that the mapping is PAGE_KERNEL. */
> -		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
> +		if (!skip_kasan)
> +			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;

It's pretty ugly to use the absence of this flag to rely on
kasan_unpoison_vmalloc() not unpoisoning. Perhaps it is preferable to just not
call kasan_unpoison_vmalloc() for the skip_kasan case?

>  	}
>  
>  	/* Allocate physical pages and map them into vmalloc space. */

Perhaps something like this would work:

---8<---
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c31a8615a8328..c340db141df57 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3979,6 +3979,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
  * under moderate memory pressure.
  *
  * %__GFP_NOWARN can be used to suppress failure messages.
+ *
+ * %__GFP_SKIP_KASAN skips unpoisoning of mapped pages (when prot=PAGE_KERNEL).
  *
  * Can not be called from interrupt nor NMI contexts.
  * Return: the address of the area or %NULL on failure
@@ -3993,6 +3995,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 	kasan_vmalloc_flags_t kasan_flags = KASAN_VMALLOC_NONE;
 	unsigned long original_align = align;
 	unsigned int shift = PAGE_SHIFT;
+	bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
+
+	gfp_mask &= ~__GFP_SKIP_KASAN;
 
 	if (WARN_ON_ONCE(!size))
 		return NULL;
@@ -4041,7 +4046,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 	 * kasan_unpoison_vmalloc().
 	 */
 	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
-		if (kasan_hw_tags_enabled()) {
+		if (kasan_hw_tags_enabled() && !skip_kasan) {
 			/*
 			 * Modify protection bits to allow tagging.
 			 * This must be done before mapping.
@@ -4054,6 +4059,12 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 			 * poisoned and zeroed by kasan_unpoison_vmalloc().
 			 */
 			gfp_mask |= __GFP_SKIP_KASAN | __GFP_SKIP_ZERO;
+		} else if (skip_kasan) {
+			/*
+			 * Skip page_alloc unpoisoning physical pages backing
+			 * VM_ALLOC mapping, as requested by caller.
+			 */
+			gfp_mask |= __GFP_SKIP_KASAN;
 		}
 
 		/* Take note that the mapping is PAGE_KERNEL. */
@@ -4078,7 +4089,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 	    (gfp_mask & __GFP_SKIP_ZERO))
 		kasan_flags |= KASAN_VMALLOC_INIT;
 	/* KASAN_VMALLOC_PROT_NORMAL already set if required. */
-	area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
+	if (!skip_kasan)
+		area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
 
 	/*
 	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED

---8<---
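
To make the intended flag flow in the diff above easier to check, here is a
small userspace model of it. This is only a sketch: the MODEL_* bit values,
struct model_plan and model_vmalloc_plan() are hypothetical names made up for
illustration, and nothing here is kernel code. It records which gfp flags the
metadata allocation and the backing-page allocation would see, whether the
tagged protection would be applied, and whether kasan_unpoison_vmalloc() would
be called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this model. */
#define MODEL_GFP_SKIP_KASAN 0x1u
#define MODEL_GFP_SKIP_ZERO  0x2u

struct model_plan {
	unsigned int meta_gfp;	/* flags seen by the vm_struct (metadata) allocation */
	unsigned int page_gfp;	/* flags seen by the backing-page allocation */
	bool tagged_prot;	/* would arch_vmap_pgprot_tagged() be applied? */
	bool call_unpoison;	/* would kasan_unpoison_vmalloc() be called? */
};

static struct model_plan model_vmalloc_plan(unsigned int gfp, bool hw_tags,
					    bool page_kernel)
{
	struct model_plan p = { 0 };
	/* Remember the caller's request, then hide it from internal callees. */
	bool skip_kasan = (gfp & MODEL_GFP_SKIP_KASAN) != 0;

	gfp &= ~MODEL_GFP_SKIP_KASAN;
	p.meta_gfp = gfp;

	if (page_kernel) {
		if (hw_tags && !skip_kasan) {
			/* Tags and zeroing are handled later, in one go. */
			p.tagged_prot = true;
			gfp |= MODEL_GFP_SKIP_KASAN | MODEL_GFP_SKIP_ZERO;
		} else if (skip_kasan) {
			/* Caller asked for untagged pages: skip page_alloc unpoisoning. */
			gfp |= MODEL_GFP_SKIP_KASAN;
		}
	}

	p.page_gfp = gfp;
	p.call_unpoison = !skip_kasan;
	return p;
}
```

With the skip bit set and HW_TAGS enabled, the model gives meta_gfp == 0,
page_gfp == MODEL_GFP_SKIP_KASAN, no tagged protection and no unpoison call,
which is the behaviour I'm aiming for.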

Thanks,
Ryan


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-04-17  8:31         ` Catalin Marinas
@ 2026-04-22 13:31           ` Ryan Roberts
  2026-04-22 18:00             ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Ryan Roberts @ 2026-04-22 13:31 UTC (permalink / raw)
  To: Catalin Marinas, David Hildenbrand (Arm)
  Cc: Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
	Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki,
	linux-arch, linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Will Deacon,
	david.hildenbrand

On 17/04/2026 09:31, Catalin Marinas wrote:
> On Thu, Apr 16, 2026 at 11:03:46AM +0200, David Hildenbrand wrote:
>> On 4/10/26 20:36, Catalin Marinas wrote:
>>> On Fri, Apr 10, 2026 at 07:32:23PM +0100, Catalin Marinas wrote:
>>>> What the original approach might help with is use-after-realloc in case
>>>> we had a tagged pointer in a past life of a page and it still works now.
>>>> Oh well, that's I guess for other types of hardening to address like
>>>> delayed reallocation.
>>>
>>> Another thought (for a separate series) - we could try to map the stack
>>> as Untagged (unless stack tagging is enabled; needs compiler
>>> instrumentation) and enable canonical tag checking (newer addition to
>>> MTE). This way, any stray tagged pointer won't work on the stack since
>>> it needs a 0xf tag (canonical).
>>
>> Do you mean mapping it as Untagged in the vmap for CONFIG_VMAP_STACK or
>> also as Untagged in the directmap?
>>
>> The latter brings in the set of problems with direct map fragmentation.
> 
> Just the vmap, there are a lot more problems with the direct map. Not
> sure how much it does in terms of security, maybe marginally. A
> match-all tag (0xf) would still be able to access the canonically tagged
> memory.
> 

I think with the first patch in this series, we are already vmapping the stack
memory as untagged, right? vmalloc only calls arch_vmap_pgprot_tagged() if we
are not skipping kasan. So I think we already have this protection? (perhaps we
need to explicitly enable the canonical tag checks?)

Thanks,
Ryan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-04-22 13:21   ` Ryan Roberts
@ 2026-04-22 14:23     ` Dev Jain
  2026-04-22 14:38       ` Ryan Roberts
  0 siblings, 1 reply; 19+ messages in thread
From: Dev Jain @ 2026-04-22 14:23 UTC (permalink / raw)
  To: Ryan Roberts, Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Kees Cook, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, david.hildenbrand



On 22/04/26 6:51 pm, Ryan Roberts wrote:
> On 24/03/2026 13:26, Muhammad Usama Anjum wrote:
>> For allocations that will be accessed only with match-all pointers
>> (e.g., kernel stacks), setting tags is wasted work. If the caller
>> already set __GFP_SKIP_KASAN, don’t skip zeroing the pages and
>> don’t set KASAN_VMALLOC_PROT_NORMAL so kasan_unpoison_vmalloc()
>> returns early without tagging.
>>
>> Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc
>> APIs. So it wasn't being checked. Now it's being checked and acted
>> upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN isn't
>> defined there.
>>
>> This is a preparatory patch for optimizing kernel stack allocations.
>>
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
>> ---
>> Changes since v1:
>> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>>   is zero in non-hw-tags mode.
>> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
>> ---
>>  mm/vmalloc.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index c607307c657a6..69ae205effb46 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>  				__GFP_NOFAIL | __GFP_ZERO |\
>>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
>> -				GFP_USER | __GFP_NOLOCKDEP)
>> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>>  
>>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>>  {
>> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>   *
>>   * %__GFP_NOWARN can be used to suppress failure messages.
>>   *
>> + * %__GFP_SKIP_KASAN can be used to skip poisoning
> 
> You mean skip *un*poisoning, I think? But you would only want this to apply to
> the actual pages mapped by vmalloc. You wouldn't want to skip unpoisoning for
> any allocated metadata; I think that is currently possible since the gfp_flags
> that are passed into __vmalloc_node_range_noprof() are passed down to
> __get_vm_area_node() unmodified. You probably want to explicitly ensure
> __GFP_SKIP_KASAN is clear for that internal call?
> 
>> + *
>>   * Can not be called from interrupt nor NMI contexts.
>>   * Return: the address of the area or %NULL on failure
>>   */
>> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>  	 * kasan_unpoison_vmalloc().
>>  	 */
>>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>> -		if (kasan_hw_tags_enabled()) {
>> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>> +
>> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>>  			/*
>>  			 * Modify protection bits to allow tagging.
>>  			 * This must be done before mapping.
>> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>  		}
>>  
>>  		/* Take note that the mapping is PAGE_KERNEL. */
>> -		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>> +		if (!skip_kasan)
>> +			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
> 
> It's pretty ugly to use the absence of this flag to rely on
> kasan_unpoison_vmalloc() not unpoisoning. Perhaps it is preferable to just not
> call kasan_unpoison_vmalloc() for the skip_kasan case?
> 
>>  	}
>>  
>>  	/* Allocate physical pages and map them into vmalloc space. */
> 
> Perhaps something like this would work:
> 
> ---8<---
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index c31a8615a8328..c340db141df57 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3979,6 +3979,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>   * under moderate memory pressure.
>   *
>   * %__GFP_NOWARN can be used to suppress failure messages.
> +
> + * %__GFP_SKIP_KASAN skip unpoisoning of mapped pages (when prot=PAGE_KERNEL).
>   *
>   * Can not be called from interrupt nor NMI contexts.
>   * Return: the address of the area or %NULL on failure
> @@ -3993,6 +3995,9 @@ void *__vmalloc_node_range_noprof(unsigned long size,
> unsigned long align,
>  	kasan_vmalloc_flags_t kasan_flags = KASAN_VMALLOC_NONE;
>  	unsigned long original_align = align;
>  	unsigned int shift = PAGE_SHIFT;
> +	bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
> +
> +	gfp_mask &= ~__GFP_SKIP_KASAN;

Okay so this is so that metadata allocation can keep using normal
page allocator side unpoisoning.

>   	if (WARN_ON_ONCE(!size))
>  		return NULL;
> @@ -4041,7 +4046,7 @@ void *__vmalloc_node_range_noprof(unsigned long size,
> unsigned long align,
>  	 * kasan_unpoison_vmalloc().
>  	 */
>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
> -		if (kasan_hw_tags_enabled()) {
> +		if (kasan_hw_tags_enabled() && !skip_kasan) {

Why do we want to elide GFP_SKIP_ZERO (set below) in this case?

>  			/*
>  			 * Modify protection bits to allow tagging.
>  			 * This must be done before mapping.
> @@ -4054,6 +4059,12 @@ void *__vmalloc_node_range_noprof(unsigned long size,
> unsigned long align,
>  			 * poisoned and zeroed by kasan_unpoison_vmalloc().
>  			 */
>  			gfp_mask |= __GFP_SKIP_KASAN | __GFP_SKIP_ZERO;
> +		} else if (skip_kasan) {
> +			/*
> +			 * Skip page_alloc unpoisoning physical pages backing
> +			 * VM_ALLOC mapping, as requested by caller.
> +			 */
> +			gfp_mask |= __GFP_SKIP_KASAN;
>  		}
>   		/* Take note that the mapping is PAGE_KERNEL. */
> @@ -4078,7 +4089,8 @@ void *__vmalloc_node_range_noprof(unsigned long size,
> unsigned long align,
>  	    (gfp_mask & __GFP_SKIP_ZERO))
>  		kasan_flags |= KASAN_VMALLOC_INIT;
>  	/* KASAN_VMALLOC_PROT_NORMAL already set if required. */
> -	area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
> +	if (!skip_kasan)
> +		area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);

I really think we should do some decoupling here - GFP_SKIP_KASAN means,
"skip KASAN when going through page allocator". Now we reuse this flag
to skip vmalloc unpoisoning.

Some code path using GFP_SKIP_KASAN (which is highly likely given that
GFP_HIGHUSER_MOVABLE has this) and also using vmalloc() will unintentionally
also skip vmalloc unpoisoning.

I think we are doing patch 1 because of patch 2 - so in patch 2, perhaps
instead of calling __vmalloc_node we can call __vmalloc_node_range_noprof and
shift this "skip vmalloc unpoisoning" functionality into vmalloc flags instead?
Perhaps this won't work for the nommu case (__vmalloc_node has two definitions),
just a line of thought.
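
To illustrate the concern, a userspace sketch follows. All names and bit values
are hypothetical and only model the idea: a composite mask that already carries
the skip bit would make vmalloc skip unpoisoning too, while in non-HW_TAGS
modes the bit is zero and the check can never fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this model. */
#define MODEL_GFP_MOVABLE  0x1u
#define MODEL_GFP_HIGHUSER 0x2u
#define MODEL_GFP_KERNEL   0x10u

/* Models the skip bit being zero unless HW_TAGS mode is built in. */
static unsigned int model_skip_kasan_bit(bool hw_tags)
{
	return hw_tags ? 0x4u : 0u;
}

/* Models a composite mask that includes the skip bit. */
static unsigned int model_highuser_movable(bool hw_tags)
{
	return MODEL_GFP_HIGHUSER | MODEL_GFP_MOVABLE |
	       model_skip_kasan_bit(hw_tags);
}

/* Models the "did the caller ask to skip?" test done on the vmalloc side. */
static bool model_vmalloc_would_skip(unsigned int gfp, bool hw_tags)
{
	return (gfp & model_skip_kasan_bit(hw_tags)) != 0;
}
```

Here model_vmalloc_would_skip(model_highuser_movable(true), true) is true even
though the caller never spelled out the skip bit, whereas a plain
MODEL_GFP_KERNEL request is unaffected.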


>   	/*
>  	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
> 
> ---8<---
> 
> Thanks,
> Ryan
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-04-22 14:23     ` Dev Jain
@ 2026-04-22 14:38       ` Ryan Roberts
  2026-04-22 15:59         ` David Hildenbrand (Arm)
  2026-04-23  6:13         ` Dev Jain
  0 siblings, 2 replies; 19+ messages in thread
From: Ryan Roberts @ 2026-04-22 14:38 UTC (permalink / raw)
  To: Dev Jain, Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Kees Cook, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, david.hildenbrand

On 22/04/2026 15:23, Dev Jain wrote:
> 
> 
> On 22/04/26 6:51 pm, Ryan Roberts wrote:
>> On 24/03/2026 13:26, Muhammad Usama Anjum wrote:
>>> For allocations that will be accessed only with match-all pointers
>>> (e.g., kernel stacks), setting tags is wasted work. If the caller
>>> already set __GFP_SKIP_KASAN, don’t skip zeroing the pages and
>>> don’t set KASAN_VMALLOC_PROT_NORMAL so kasan_unpoison_vmalloc()
>>> returns early without tagging.
>>>
>>> Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc
>>> APIs. So it wasn't being checked. Now it's being checked and acted
>>> upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN isn't
>>> defined there.
>>>
>>> This is a preparatory patch for optimizing kernel stack allocations.
>>>
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
>>> ---
>>> Changes since v1:
>>> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>>>   is zero in non-hw-tags mode.
>>> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
>>> ---
>>>  mm/vmalloc.c | 11 ++++++++---
>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index c607307c657a6..69ae205effb46 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>  				__GFP_NOFAIL | __GFP_ZERO |\
>>>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>>>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
>>> -				GFP_USER | __GFP_NOLOCKDEP)
>>> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>>>  
>>>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>  {
>>> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>   *
>>>   * %__GFP_NOWARN can be used to suppress failure messages.
>>>   *
>>> + * %__GFP_SKIP_KASAN can be used to skip poisoning
>>
>> You mean skip *un*poisoning, I think? But you would only want this to apply to
>> the actual pages mapped by vmalloc. You wouldn't want to skip unpoisoning for
>> any allocated metadata; I think that is currently possible since the gfp_flags
>> that are passed into __vmalloc_node_range_noprof() are passed down to
>> __get_vm_area_node() unmodified. You probably want to explicitly ensure
>> __GFP_SKIP_KASAN is clear for that internal call?
>>
>>> + *
>>>   * Can not be called from interrupt nor NMI contexts.
>>>   * Return: the address of the area or %NULL on failure
>>>   */
>>> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  	 * kasan_unpoison_vmalloc().
>>>  	 */
>>>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>>> -		if (kasan_hw_tags_enabled()) {
>>> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>>> +
>>> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>>>  			/*
>>>  			 * Modify protection bits to allow tagging.
>>>  			 * This must be done before mapping.
>>> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  		}
>>>  
>>>  		/* Take note that the mapping is PAGE_KERNEL. */
>>> -		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>> +		if (!skip_kasan)
>>> +			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>
>> It's pretty ugly to use the absence of this flag to rely on
>> kasan_unpoison_vmalloc() not unpoisoning. Perhaps it is preferable to just not
>> call kasan_unpoison_vmalloc() for the skip_kasan case?
>>
>>>  	}
>>>  
>>>  	/* Allocate physical pages and map them into vmalloc space. */
>>
>> Perhaps something like this would work:
>>
>> ---8<---
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index c31a8615a8328..c340db141df57 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3979,6 +3979,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>   * under moderate memory pressure.
>>   *
>>   * %__GFP_NOWARN can be used to suppress failure messages.
>> +
>> + * %__GFP_SKIP_KASAN skip unpoisoning of mapped pages (when prot=PAGE_KERNEL).
>>   *
>>   * Can not be called from interrupt nor NMI contexts.
>>   * Return: the address of the area or %NULL on failure
>> @@ -3993,6 +3995,9 @@ void *__vmalloc_node_range_noprof(unsigned long size,
>> unsigned long align,
>>  	kasan_vmalloc_flags_t kasan_flags = KASAN_VMALLOC_NONE;
>>  	unsigned long original_align = align;
>>  	unsigned int shift = PAGE_SHIFT;
>> +	bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>> +
>> +	gfp_mask &= ~__GFP_SKIP_KASAN;
> 
> Okay so this is so that metadata allocation can keep using normal
> page allocator side unpoisoning.

Yes.

> 
>>   	if (WARN_ON_ONCE(!size))
>>  		return NULL;
>> @@ -4041,7 +4046,7 @@ void *__vmalloc_node_range_noprof(unsigned long size,
>> unsigned long align,
>>  	 * kasan_unpoison_vmalloc().
>>  	 */
>>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>> -		if (kasan_hw_tags_enabled()) {
>> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
> 
> Why do we want to elide GFP_SKIP_ZERO (set below) in this case?

You mean why do we want to skip initializing the allocated memory to zero for
the case where kasan HW_TAGS is enabled and we are not skipping kasan unpoisoning?

Because setting tags at the same time as zeroing the memory is less expensive
than doing them both as separate operations. So we tell page_alloc not to bother
zeroing the memory and kasan_unpoison_vmalloc() does it at the same time as
setting the tags instead. See kasan_unpoison() which ultimately calls
mte_set_mem_tag_range().
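
As a rough illustration of why the combined operation is cheaper, here is a
userspace model (not kernel code; struct model_page and both helpers are
hypothetical). It assumes a 4 kB page split into 16-byte tag granules, i.e. 256
granules per page, and simply counts how many granule visits each strategy
needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_GRANULE	16u
#define MODEL_PAGE	4096u
#define MODEL_GRANULES	(MODEL_PAGE / MODEL_GRANULE)	/* 256 */

struct model_page {
	uint8_t data[MODEL_PAGE];	/* page contents */
	uint8_t tags[MODEL_GRANULES];	/* one tag per 16-byte granule */
};

/* Two separate passes: zero all the data first, then set all the tags. */
static size_t model_zero_then_tag(struct model_page *pg, uint8_t tag)
{
	size_t visits = 0;

	for (size_t g = 0; g < MODEL_GRANULES; g++, visits++)
		memset(&pg->data[g * MODEL_GRANULE], 0, MODEL_GRANULE);
	for (size_t g = 0; g < MODEL_GRANULES; g++, visits++)
		pg->tags[g] = tag;
	return visits;
}

/* One fused pass: each granule gets its tag and is zeroed in the same step. */
static size_t model_zero_and_tag(struct model_page *pg, uint8_t tag)
{
	size_t visits = 0;

	for (size_t g = 0; g < MODEL_GRANULES; g++, visits++) {
		pg->tags[g] = tag;
		memset(&pg->data[g * MODEL_GRANULE], 0, MODEL_GRANULE);
	}
	return visits;
}
```

Both helpers leave the page in the same state, but the fused version touches
each granule once (256 visits) instead of twice (512). The real saving depends
on the hardware, so treat the counts as a way to picture the argument rather
than as a measurement.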

> 
>>  			/*
>>  			 * Modify protection bits to allow tagging.
>>  			 * This must be done before mapping.
>> @@ -4054,6 +4059,12 @@ void *__vmalloc_node_range_noprof(unsigned long size,
>> unsigned long align,
>>  			 * poisoned and zeroed by kasan_unpoison_vmalloc().
>>  			 */
>>  			gfp_mask |= __GFP_SKIP_KASAN | __GFP_SKIP_ZERO;
>> +		} else if (skip_kasan) {
>> +			/*
>> +			 * Skip page_alloc unpoisoning physical pages backing
>> +			 * VM_ALLOC mapping, as requested by caller.
>> +			 */
>> +			gfp_mask |= __GFP_SKIP_KASAN;
>>  		}
>>   		/* Take note that the mapping is PAGE_KERNEL. */
>> @@ -4078,7 +4089,8 @@ void *__vmalloc_node_range_noprof(unsigned long size,
>> unsigned long align,
>>  	    (gfp_mask & __GFP_SKIP_ZERO))
>>  		kasan_flags |= KASAN_VMALLOC_INIT;
>>  	/* KASAN_VMALLOC_PROT_NORMAL already set if required. */
>> -	area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
>> +	if (!skip_kasan)
>> +		area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
> 
> I really think we should do some decoupling here - GFP_SKIP_KASAN means,
> "skip KASAN when going through page allocator". Now we reuse this flag
> to skip vmalloc unpoisoning.
> 
> Some code path using GFP_SKIP_KASAN (which is highly likely given that
> GFP_HIGHUSER_MOVABLE has this) and also using vmalloc() will unintentionally
> also skip vmalloc unpoisoning.

If a caller wants to vmalloc() memory with GFP_HIGHUSER_MOVABLE (which seems
HIGHLY suspect to me) then surely leaving the memory poisoned is *exactly* what
they expect?

> 
> I think we are doing patch 1 because of patch 2 - so in patch 2, perhaps
> instead of calling __vmalloc_node we can call __vmalloc_node_range_noprof and
> shift this "skip vmalloc unpoisoning" functionality into vmalloc flags instead?

This is exactly how Usama was doing it in v1. I suggested we should just reuse
the existing flag since it already provides the semantic we want and is less
confusing than introducing a new flag.

I know David is keen to do a wider rework and remove/rename/change the semantics
of __GFP_SKIP_KASAN, but I'm hoping that if we just continue to use the existing
flag and its semantics for vmalloc then there is no reason why this series can't
be merged independently of that wider rework.

Thanks,
Ryan


> Perhaps this won't work for the nommu case (__vmalloc_node has two definitions),
> just a line of thought.
> 
> 
>>   	/*
>>  	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
>>
>> ---8<---
>>
>> Thanks,
>> Ryan
>>
>>
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-04-22 14:38       ` Ryan Roberts
@ 2026-04-22 15:59         ` David Hildenbrand (Arm)
  2026-04-23  6:13         ` Dev Jain
  1 sibling, 0 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-22 15:59 UTC (permalink / raw)
  To: Ryan Roberts, Dev Jain, Muhammad Usama Anjum, Arnd Bergmann,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Kees Cook, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, david.hildenbrand

>> I think we are doing patch 1 because of patch 2 - so in patch 2, perhaps
>> instead of calling __vmalloc_node we can call __vmalloc_node_range_noprof and
>> shift this "skip vmalloc unpoisoning" functionality into vmalloc flags instead?
> 
> This is exactly how Usama was doing it in v1. I suggested we should just reuse
> the existing flag since it already provides the semantic we want and is less
> confusing than introducing a new flag.
> 
> I know David is keen to do a wider rework and remove/rename/change the semantics
> of __GFP_SKIP_KASAN, but I'm hoping that if we just continue to use the existing
> flag and its semantics for vmalloc then there is no reason why this series can't
> be merged independently of that wider rework.

Independent of how the flag will be called, I think it will have the same
semantics. How we'll implement that internally is a different question.

So I agree that adding __GFP_SKIP_KASAN support here is the right approach for
the time being.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks
  2026-04-22 13:31           ` Ryan Roberts
@ 2026-04-22 18:00             ` Catalin Marinas
  0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2026-04-22 18:00 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: David Hildenbrand (Arm), Muhammad Usama Anjum, Arnd Bergmann,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Kees Cook, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Will Deacon,
	david.hildenbrand

On Wed, Apr 22, 2026 at 02:31:14PM +0100, Ryan Roberts wrote:
> On 17/04/2026 09:31, Catalin Marinas wrote:
> > On Thu, Apr 16, 2026 at 11:03:46AM +0200, David Hildenbrand wrote:
> >> On 4/10/26 20:36, Catalin Marinas wrote:
> >>> On Fri, Apr 10, 2026 at 07:32:23PM +0100, Catalin Marinas wrote:
> >>>> What the original approach might help with is use-after-realloc in case
> >>>> we had a tagged pointer in a past life of a page and it still works now.
> >>>> Oh well, that's I guess for other types of hardening to address like
> >>>> delayed reallocation.
> >>>
> >>> Another thought (for a separate series) - we could try to map the stack
> >>> as Untagged (unless stack tagging is enabled; needs compiler
> >>> instrumentation) and enable canonical tag checking (newer addition to
> >>> MTE). This way, any stray tagged pointer won't work on the stack since
> >>> it needs a 0xf tag (canonical).
> >>
> >> Do you mean mapping it as Untagged in the vmap for CONFIG_VMAP_STACK or
> >> also as Untagged in the directmap?
> >>
> >> The latter brings in the set of problems with direct map fragmentation.
> > 
> > Just the vmap, there are a lot more problems with the direct map. Not
> > sure how much it does in terms of security, maybe marginally. A
> > match-all tag (0xf) would still be able to access the canonically tagged
> > memory.
> 
> I think with the first patch in this series, we are already vmapping the stack
> memory as untagged, right? vmalloc only calls arch_vmap_pgprot_tagged() if we
> are not skipping kasan. So I think we already have this protection? (perhaps we
> need to explicitly enable the canonical tag checks?)

Ah, yes, good point. So, we could just enable canonical tag checking so
that untagged memory only uses the 0xf tag while in the kernel (not sure
what might break but in theory these would only happen if we have use
after free bugs etc.)

I think it's just a matter of setting TCR_EL1.MTX1 but it has some
implications on the PAC bits. This setting would affect the kernel
image mapping, modules. Anyway, something to investigate separately.

-- 
Catalin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support
  2026-04-22 14:38       ` Ryan Roberts
  2026-04-22 15:59         ` David Hildenbrand (Arm)
@ 2026-04-23  6:13         ` Dev Jain
  1 sibling, 0 replies; 19+ messages in thread
From: Dev Jain @ 2026-04-23  6:13 UTC (permalink / raw)
  To: Ryan Roberts, Muhammad Usama Anjum, Arnd Bergmann, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Kees Cook, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, linux-arch,
	linux-kernel, linux-mm, Andrey Konovalov, Marco Elver,
	Vincenzo Frascino, Peter Collingbourne, Catalin Marinas,
	Will Deacon, david.hildenbrand



On 22/04/26 8:08 pm, Ryan Roberts wrote:
> On 22/04/2026 15:23, Dev Jain wrote:
>>
>>
>> On 22/04/26 6:51 pm, Ryan Roberts wrote:
>>> On 24/03/2026 13:26, Muhammad Usama Anjum wrote:
>>>> For allocations that will be accessed only with match-all pointers
>>>> (e.g., kernel stacks), setting tags is wasted work. If the caller
>>>> already set __GFP_SKIP_KASAN, don’t skip zeroing the pages and
>>>> don’t set KASAN_VMALLOC_PROT_NORMAL so kasan_unpoison_vmalloc()
>>>> returns early without tagging.
>>>>
>>>> Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc
>>>> APIs. So it wasn't being checked. Now it's being checked and acted
>>>> upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN isn't
>>>> defined there.
>>>>
>>>> This is a preparatory patch for optimizing kernel stack allocations.
>>>>
>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
>>>> ---
>>>> Changes since v1:
>>>> - Simplify skip conditions based on the fact that __GFP_SKIP_KASAN
>>>>   is zero in non-hw-tags mode.
>>>> - Add __GFP_SKIP_KASAN to GFP_VMALLOC_SUPPORTED list of flags
>>>> ---
>>>>  mm/vmalloc.c | 11 ++++++++---
>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index c607307c657a6..69ae205effb46 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -3939,7 +3939,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>>  				__GFP_NOFAIL | __GFP_ZERO |\
>>>>  				__GFP_NORETRY | __GFP_RETRY_MAYFAIL |\
>>>>  				GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
>>>> -				GFP_USER | __GFP_NOLOCKDEP)
>>>> +				GFP_USER | __GFP_NOLOCKDEP | __GFP_SKIP_KASAN)
>>>>  
>>>>  static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>>  {
>>>> @@ -3980,6 +3980,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>>   *
>>>>   * %__GFP_NOWARN can be used to suppress failure messages.
>>>>   *
>>>> + * %__GFP_SKIP_KASAN can be used to skip poisoning
>>>
>>> You mean skip *un*poisoning, I think? But you would only want this to apply to
>>> the actual pages mapped by vmalloc. You wouldn't want to skip unpoisoning for
>>> any allocated meta data; I think that is currently possible since the gfp_flags
>>> that are passed into __vmalloc_node_range_noprof() are passed down to
>>> __get_vm_area_node() unmodified. You probably want to explicitly ensure
>>> __GFP_SKIP_KASAN is clear for that internal call?
>>>
>>>> + *
>>>>   * Can not be called from interrupt nor NMI contexts.
>>>>   * Return: the address of the area or %NULL on failure
>>>>   */
>>>> @@ -4041,7 +4043,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>>  	 * kasan_unpoison_vmalloc().
>>>>  	 */
>>>>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>>>> -		if (kasan_hw_tags_enabled()) {
>>>> +		bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>>>> +
>>>> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>>>>  			/*
>>>>  			 * Modify protection bits to allow tagging.
>>>>  			 * This must be done before mapping.
>>>> @@ -4057,7 +4061,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>>  		}
>>>>  
>>>>  		/* Take note that the mapping is PAGE_KERNEL. */
>>>> -		kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>>> +		if (!skip_kasan)
>>>> +			kasan_flags |= KASAN_VMALLOC_PROT_NORMAL;
>>>
>>> It's pretty ugly to use the absence of this flag to rely on
>>> kasan_unpoison_vmalloc() not unpoisoning. Perhaps it is preferable to just not
>>> call kasan_unpoison_vmalloc() for the skip_kasan case?
>>>
>>>>  	}
>>>>  
>>>>  	/* Allocate physical pages and map them into vmalloc space. */
>>>
>>> Perhaps something like this would work:
>>>
>>> ---8<---
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index c31a8615a8328..c340db141df57 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3979,6 +3979,8 @@ static gfp_t vmalloc_fix_flags(gfp_t flags)
>>>   * under moderate memory pressure.
>>>   *
>>>   * %__GFP_NOWARN can be used to suppress failure messages.
>>> + *
>>> + * %__GFP_SKIP_KASAN skip unpoisoning of mapped pages (when prot=PAGE_KERNEL).
>>>   *
>>>   * Can not be called from interrupt nor NMI contexts.
>>>   * Return: the address of the area or %NULL on failure
>>> @@ -3993,6 +3995,9 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  	kasan_vmalloc_flags_t kasan_flags = KASAN_VMALLOC_NONE;
>>>  	unsigned long original_align = align;
>>>  	unsigned int shift = PAGE_SHIFT;
>>> +	bool skip_kasan = gfp_mask & __GFP_SKIP_KASAN;
>>> +
>>> +	gfp_mask &= ~__GFP_SKIP_KASAN;
>>
>> Okay so this is so that metadata allocation can keep using normal
>> page allocator side unpoisoning.
> 
> Yes.
> 
>>
>>>   	if (WARN_ON_ONCE(!size))
>>>  		return NULL;
>>> @@ -4041,7 +4046,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  	 * kasan_unpoison_vmalloc().
>>>  	 */
>>>  	if (pgprot_val(prot) == pgprot_val(PAGE_KERNEL)) {
>>> -		if (kasan_hw_tags_enabled()) {
>>> +		if (kasan_hw_tags_enabled() && !skip_kasan) {
>>
>> Why do we want to elide GFP_SKIP_ZERO (set below) in this case?
> 
> You mean why do we want to skip initializing the allocated memory to zero for
> the case where kasan HW_TAGS is enabled and we are not skipping kasan unpoisoning?
> 
> Because setting tags at the same time as zeroing the memory is less expensive
> than doing them both as separate operations. So we tell page_alloc not to bother
> zeroing the memory and kasan_unpoison_vmalloc() does it at the same time as
> setting the tags instead. See kasan_unpoison() which ultimately calls
> mte_set_mem_tag_range().

I was asking the opposite question. So in the skip_kasan case we also want to
avoid setting __GFP_SKIP_ZERO, because we no longer rely on the KASAN HW_TAGS
path to zero the memory; we rely on the page allocator instead. Got it.
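
To check my understanding of the flag handling in the proposed diff, here is
a minimal user-space sketch. It is not kernel code: the SKETCH_* bit values,
struct sketch_decision and sketch_decide() are all hypothetical names made up
for illustration, and the model assumes the skip bit is non-zero (in the
kernel, __GFP_SKIP_KASAN is zero outside HW_TAGS mode).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in bits; the real __GFP_* values live in the kernel. */
#define SKETCH_GFP_SKIP_KASAN 0x1u
#define SKETCH_GFP_SKIP_ZERO  0x2u

struct sketch_decision {
	unsigned int meta_gfp;	/* flags used for vmalloc's own metadata */
	unsigned int page_gfp;	/* flags used for the mapped data pages */
	bool call_unpoison;	/* would kasan_unpoison_vmalloc() be called? */
};

/* Models only the flag decisions from the proposed diff, nothing else. */
static struct sketch_decision sketch_decide(unsigned int gfp_mask,
					    bool hw_tags, bool prot_normal)
{
	struct sketch_decision d;
	bool skip_kasan = gfp_mask & SKETCH_GFP_SKIP_KASAN;

	/* Metadata must keep the normal page allocator unpoisoning. */
	gfp_mask &= ~SKETCH_GFP_SKIP_KASAN;
	d.meta_gfp = gfp_mask;
	assert(!(d.meta_gfp & SKETCH_GFP_SKIP_KASAN));

	if (prot_normal) {
		if (hw_tags && !skip_kasan) {
			/* Tagging and zeroing are done together at vmalloc level. */
			gfp_mask |= SKETCH_GFP_SKIP_KASAN | SKETCH_GFP_SKIP_ZERO;
		} else if (skip_kasan) {
			/* Caller asked to skip tagging; page_alloc still zeroes. */
			gfp_mask |= SKETCH_GFP_SKIP_KASAN;
		}
	}

	d.page_gfp = gfp_mask;
	d.call_unpoison = !skip_kasan;
	return d;
}
```

With hw_tags and prot_normal both true, a caller passing the skip bit ends up
with only the skip bit on the data pages (so the page allocator zeroes them if
asked) and no vmalloc-level unpoisoning, while a caller not passing it gets
both skip bits and the combined tag-and-zero path.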

> 
>>
>>>  			/*
>>>  			 * Modify protection bits to allow tagging.
>>>  			 * This must be done before mapping.
>>> @@ -4054,6 +4059,12 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  			 * poisoned and zeroed by kasan_unpoison_vmalloc().
>>>  			 */
>>>  			gfp_mask |= __GFP_SKIP_KASAN | __GFP_SKIP_ZERO;
>>> +		} else if (skip_kasan) {
>>> +			/*
>>> +			 * Skip page_alloc unpoisoning physical pages backing
>>> +			 * VM_ALLOC mapping, as requested by caller.
>>> +			 */
>>> +			gfp_mask |= __GFP_SKIP_KASAN;
>>>  		}
>>>   		/* Take note that the mapping is PAGE_KERNEL. */
>>> @@ -4078,7 +4089,8 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
>>>  	    (gfp_mask & __GFP_SKIP_ZERO))
>>>  		kasan_flags |= KASAN_VMALLOC_INIT;
>>>  	/* KASAN_VMALLOC_PROT_NORMAL already set if required. */
>>> -	area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
>>> +	if (!skip_kasan)
>>> +		area->addr = kasan_unpoison_vmalloc(area->addr, size, kasan_flags);
>>
>> I really think we should do some decoupling here - __GFP_SKIP_KASAN means,
>> "skip KASAN when going through page allocator". Now we reuse this flag
>> to skip vmalloc unpoisoning.
>>
>> Some code path using GFP_SKIP_KASAN (which is highly likely given that
>> GFP_HIGHUSER_MOVABLE has this) and also using vmalloc() will unintentionally
>> also skip vmalloc unpoisoning.
> 
> If a caller wants to vmalloc() memory with GFP_HIGHUSER_MOVABLE (which seems
> HIGHLY suspect to me) then surely leaving the memory poisoned is *exactly* what
> they expect?

Okay, I get your point.

> 
>>
>> I think we are doing patch 1 because of patch 2 - so in patch 2, perhaps
>> instead of calling __vmalloc_node we can call __vmalloc_node_range_noprof and
>> shift this "skip vmalloc unpoisoning" functionality into vmalloc flags instead?
> 
> This is exactly how Usama was doing it in v1. I suggested we should just reuse
> the existing flag since it already provides the semantic we want and is less
> confusing than introducing a new flag.
> 
> I know David is keen to do a wider rework and remove/rename/change the semantics
> of __GFP_SKIP_KASAN, but I'm hoping that if we just continue to use the existing
> flag and its semantics for vmalloc then there is no reason why this series can't
> be merged independently of that wider rework.

Okay makes sense.

> 
> Thanks,
> Ryan
> 
> 
>> Perhaps this won't work for the nommu case (__vmalloc_node has two definitions),
>> just a line of thought.
>>
>>
>>>   	/*
>>>  	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
>>>
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>
> 




Thread overview: 19+ messages
2026-03-24 13:26 [PATCH v2 0/3] KASAN: HW_TAGS: Disable tagging for stack and page-tables Muhammad Usama Anjum
2026-03-24 13:26 ` [PATCH v2 1/3] vmalloc: add __GFP_SKIP_KASAN support Muhammad Usama Anjum
2026-04-10 18:10   ` Catalin Marinas
2026-04-16  9:10   ` David Hildenbrand
2026-04-22 13:21   ` Ryan Roberts
2026-04-22 14:23     ` Dev Jain
2026-04-22 14:38       ` Ryan Roberts
2026-04-22 15:59         ` David Hildenbrand (Arm)
2026-04-23  6:13         ` Dev Jain
2026-03-24 13:26 ` [PATCH v2 2/3] kasan: skip HW tagging for all kernel thread stacks Muhammad Usama Anjum
2026-04-10 18:32   ` Catalin Marinas
2026-04-10 18:36     ` Catalin Marinas
2026-04-16  9:03       ` David Hildenbrand (Arm)
2026-04-17  8:31         ` Catalin Marinas
2026-04-22 13:31           ` Ryan Roberts
2026-04-22 18:00             ` Catalin Marinas
2026-03-24 13:26 ` [PATCH v2 3/3] mm: skip KASAN tagging for page-allocated page tables Muhammad Usama Anjum
2026-04-10 18:19   ` Catalin Marinas
2026-04-16  8:55   ` David Hildenbrand (Arm)
