public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/2] arm64: mte: Improve performance by explicitly disabling unwanted tag checking
@ 2026-01-15 23:07 Carl Worth
  2026-01-15 23:07 ` [PATCH v2 1/2] arm64: mte: Clarify kernel MTE policy and manipulation of TCO Carl Worth
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Carl Worth @ 2026-01-15 23:07 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, linux-kernel, Taehyun Noh, Carl Worth

[Thanks to Taehyun Noh from UT Austin for originally reporting this
bug. In this cover letter, "we" refers to a collaborative effort
between individuals at both Ampere Computing and UT Austin.]

We measured severe performance overhead (25-50%) when enabling
userspace MTE and running memcached on an AmpereOne machine (detailed
benchmark results are provided below).

We identified excessive tag checking taking place in the kernel (even
though only userspace tag checking was requested) as the culprit for
the performance slowdown. The existing kernel code assumes that if
tag check faults are not requested, the hardware will not perform tag
checking. We found (empirically) that this is not the case for at
least some implementations, and verified that there is no
architectural requirement that tag checking be disabled when tag
check faults are not requested.

This patch series addresses the slowdown by using TCMA1 to explicitly
disable unwanted tag checking.
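
The mechanism can be illustrated with a small sketch (plain Python,
not kernel code): with TCR_EL1.TCMA1 set, accesses made through TTBR1
whose logical address tag (bits [59:56]) is 0b1111 are Unchecked, and
ordinary kernel pointers carry exactly that tag, so only deliberately
tagged accesses remain checked.

```python
# Illustration (not kernel code) of why setting TCR_EL1.TCMA1
# suppresses unwanted kernel tag checking: accesses via TTBR1
# addresses with logical tag 0b1111 become Unchecked, and ordinary
# kernel pointers carry that tag.
TCR_TCMA1 = 1 << 58  # TCR_EL1.TCMA1 bit position

def logical_tag(addr: int) -> int:
    """MTE logical tag of a 64-bit virtual address (bits [59:56])."""
    return (addr >> 56) & 0xF

kernel_ptr = 0xFFFF_8000_0000_1000   # typical TTBR1 (kernel) address
print(hex(TCR_TCMA1), hex(logical_tag(kernel_ptr)))
```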

The effect of this patch series is most readily seen by using perf to
count tag-checked accesses in both kernel and userspace, for example
while running "perf bench futex hash" with MTE enabled.

Prior to the patch series, we see:

 # GLIBC_TUNABLES=glibc.mem.tagging=3 perf stat -e mem_access_checked_rd:u,mem_access_checked_wr:u,mem_access_checked_rd:k,mem_access_checked_wr:k perf bench futex hash
...
 Performance counter stats for 'perf bench futex hash':
     4,246,651,954      mem_access_checked_rd:u
        29,375,167      mem_access_checked_wr:u
   246,588,717,771      mem_access_checked_rd:k
    78,805,316,911      mem_access_checked_wr:k

And after the patch series we see (for the same command):

 Performance counter stats for 'perf bench futex hash':
     4,337,091,554      mem_access_checked_rd:u
            23,487      mem_access_checked_wr:u
     4,342,774,550      mem_access_checked_rd:k
               788      mem_access_checked_wr:k

As can be seen above, with roughly equivalent counts of userspace
tag-checked accesses, over 98% of the kernel-space tag-checked
accesses are eliminated.
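
The raw counts bear that figure out; summing the kernel-space read
and write counters from the two runs quoted above:

```python
# Arithmetic check of the reduction in kernel-space tag-checked
# accesses, using the mem_access_checked_*:k counts quoted above.
before_k = 246_588_717_771 + 78_805_316_911   # rd:k + wr:k, before
after_k  = 4_342_774_550 + 788                # rd:k + wr:k, after
reduction = (1 - after_k / before_k) * 100
print(f"{reduction:.1f}% of kernel tag-checked accesses eliminated")
# -> 98.7%
```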

As to performance, the patch series should have no behavioral impact
if the kernel is not compiled with MTE support. And the series has not
been observed to have any impact when the kernel includes MTE support
but the workloads have MTE disabled in userspace.

For workloads with MTE enabled, we measured the series giving a 2%
improvement for "perf bench futex hash" at 95% confidence.

Also, we used the Phoronix Test Suite pts/memcached benchmark with a
get-heavy workload (1:10 Set:Get ratio) which is where the slowdown
appears most clearly. The slowdown worsens with increased core count,
levelling out above 32 cores. The numbers below are based on averages
from 50 runs each, with 96 cores on each run. For "MTE on",
GLIBC_TUNABLES was set to "glibc.mem.tagging=3". For "MTE off",
GLIBC_TUNABLES was unset.

The numbers below are normalized ops./sec. (higher is better),
normalized to the baseline case (unpatched kernel, MTE off).

Before the patch series (upstream v6.19-rc5+):

	MTE off: 1.000
	MTE  on: 0.742

	MTE overhead: 25.8% +/- 1.6%

After applying this patch series:

	MTE off: 0.991
	MTE  on: 0.990

	MTE overhead: No difference proven at 95.0% confidence
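
For clarity, the overhead percentages follow directly from the
normalized throughput numbers above (higher ops./sec. is better, so
overhead is the fractional throughput loss relative to MTE off):

```python
# Deriving the MTE overhead figures from the normalized ops./sec.
# values quoted above.
def mte_overhead_pct(mte_off: float, mte_on: float) -> float:
    """Percent throughput lost when MTE is enabled."""
    return (mte_off - mte_on) / mte_off * 100.0

print(f"before: {mte_overhead_pct(1.000, 0.742):.1f}%")  # 25.8%
print(f"after:  {mte_overhead_pct(0.991, 0.990):.1f}%")  # ~0.1%
```

(The +/- 1.6% error bar and the 95% confidence statements come from
the distribution across the 50 runs, which the point values here do
not capture.)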

-Carl

---
Changes in v2:
- Fixed to correctly pass 'current' vs. 'next' in set_kernel_mte_policy
  (thanks to Will Deacon)
- Changed approach to use TCMA1 rather than toggling PSTATE.TCO
  (thanks to Catalin Marinas)
- Link to v1: https://lore.kernel.org/r/20251030-mte-tighten-tco-v1-0-88c92e7529d9@os.amperecomputing.com
---
Carl Worth (1):
      arm64: mte: Set TCMA1 whenever MTE is present in the kernel

Taehyun Noh (1):
      arm64: mte: Clarify kernel MTE policy and manipulation of TCO

 arch/arm64/include/asm/mte.h     | 40 +++++++++++++++++++++++++++++++++-------
 arch/arm64/kernel/entry-common.c |  4 ++--
 arch/arm64/kernel/mte.c          |  2 +-
 arch/arm64/mm/proc.S             | 10 +++++-----
 4 files changed, 41 insertions(+), 15 deletions(-)
---
base-commit: 944aacb68baf7624ab8d277d0ebf07f025ca137c




Thread overview: 9+ messages
2026-01-15 23:07 [PATCH v2 0/2] arm64: mte: Improve performance by explicitly disabling unwanted tag checking Carl Worth
2026-01-15 23:07 ` [PATCH v2 1/2] arm64: mte: Clarify kernel MTE policy and manipulation of TCO Carl Worth
2026-01-19 18:17   ` Catalin Marinas
2026-01-20 19:44     ` Taehyun Noh
2026-01-15 23:07 ` [PATCH v2 2/2] arm64: mte: Set TCMA1 whenever MTE is present in the kernel Carl Worth
2026-01-19 17:57   ` Catalin Marinas
2026-01-22 10:23   ` Usama Anjum
2026-01-22 11:49     ` Catalin Marinas
2026-01-27 11:39 ` [PATCH v2 0/2] arm64: mte: Improve performance by explicitly disabling unwanted tag checking Will Deacon
