From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Muhammad Usama Anjum <usama.anjum@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Thomas Huth <thuth@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Lance Yang <lance.yang@linux.dev>,
Yeoreum Yun <yeoreum.yun@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
Date: Wed, 25 Mar 2026 12:46:40 +0100
Message-ID: <c347fdc9-90b4-43ac-8c2e-559248777ba8@kernel.org>
In-Reply-To: <20260311175054.3889093-1-usama.anjum@arm.com>
On 3/11/26 18:50, Muhammad Usama Anjum wrote:
> In MTE synchronous mode, tag check faults are reported as immediate
> Data Abort exceptions. The TFSR_EL1.TF1 bit is never set, since faults
> never go through the asynchronous path. Therefore, reading TFSR_EL1
> and executing data and instruction barriers on kernel entry, exit,
> context switch, and suspend is unnecessary overhead in sync mode.
>
> The exit path (mte_check_tfsr_exit) and the assembly paths
> (check_mte_async_tcf / clear_mte_async_tcf in entry.S) already had this
> check.
Right, that's for user space (TFSR_EL1.TF0 IIUC). What you are adding is
for KASAN. Maybe make that clearer.
> Extend the same optimization to kernel entry/exit, context
> switch and suspend.
>
> All MTE kselftests pass. KUnit results are the same before and after the
> patch.
>
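> A selection of benchmarks (micromm fork/munmap/test_vmalloc, schbench and
> syscall) run on an arm64 machine, with v6.19 as the baseline. Positive
> values are faster, negative values slower; (R)/(I) mark statistically
> significant regressions and improvements. Counting only the statistically
> significant results, the benchmarks improved; the remaining differences
> are within noise.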
>
> * 77 result classes were considered, with 9 wins, 0 losses and 68 ties
>
> Results of fastpath [1] on v6.19 vs this patch:
>
> +----------------------------+----------------------------------------------------------+------------+
> | Benchmark                  | Result Class                                             | this patch |
> +============================+==========================================================+============+
> | micromm/fork               | fork: p:1, d:10 (seconds)                                |  (I) 2.75% |
> |                            | fork: p:512, d:10 (seconds)                              |      0.96% |
> +----------------------------+----------------------------------------------------------+------------+
> | micromm/munmap             | munmap: p:1, d:10 (seconds)                              |     -1.78% |
> |                            | munmap: p:512, d:10 (seconds)                            |      5.02% |
> +----------------------------+----------------------------------------------------------+------------+
> | micromm/vmalloc            | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |     -0.56% |
> |                            | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |      0.70% |
> |                            | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |      1.18% |
> |                            | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |     -5.01% |
> |                            | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |     13.81% |
> |                            | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |      6.51% |
> |                            | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |     32.87% |
> |                            | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |      4.17% |
> |                            | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |      8.40% |
> |                            | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |     -0.48% |
> |                            | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |     -0.74% |
> |                            | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |      0.53% |
> |                            | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.81% |
> |                            | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.06% |
> |                            | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |     -0.56% |
> |                            | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |     -0.41% |
> |                            | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |      0.89% |
> |                            | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |      1.71% |
> |                            | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |      0.83% |
> +----------------------------+----------------------------------------------------------+------------+
> | schbench/thread-contention | -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |      0.05% |
> |                            | -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.60% |
> |                            | -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
> |                            | -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
> |                            | -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.58% |
> |                            | -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      9.09% |
> |                            | -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.74% |
> |                            | -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |     -1.40% |
> |                            | -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.00% |
> |                            | -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |     -0.78% |
> |                            | -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |     -0.11% |
> |                            | -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
> |                            | -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.64% |
> |                            | -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      3.15% |
> |                            | -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     17.54% |
> |                            | -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -1.22% |
> |                            | -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.85% |
> |                            | -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
> |                            | -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
> |                            | -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      1.05% |
> |                            | -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
> |                            | -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.41% |
> |                            | -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.58% |
> |                            | -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      2.13% |
> |                            | -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.67% |
> |                            | -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      2.07% |
> |                            | -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -1.28% |
> |                            | -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      1.01% |
> |                            | -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      0.69% |
> |                            | -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     13.12% |
> |                            | -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -0.25% |
> |                            | -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.48% |
> |                            | -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |     10.53% |
> |                            | -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.06% |
> |                            | -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      0.00% |
> |                            | -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
> |                            | -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.36% |
> |                            | -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.52% |
> |                            | -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
> |                            | -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.52% |
> |                            | -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      3.53% |
> |                            | -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -0.10% |
> |                            | -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.53% |
> |                            | -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      1.82% |
> |                            | -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     -5.80% |
> +----------------------------+----------------------------------------------------------+------------+
> | syscall/getpid             | mean (ns)                                                | (I) 15.98% |
> |                            | p99 (ns)                                                 | (I) 11.11% |
> |                            | p99.9 (ns)                                               | (I) 16.13% |
> +----------------------------+----------------------------------------------------------+------------+
> | syscall/getppid            | mean (ns)                                                | (I) 14.82% |
> |                            | p99 (ns)                                                 | (I) 17.86% |
> |                            | p99.9 (ns)                                               |  (I) 9.09% |
> +----------------------------+----------------------------------------------------------+------------+
> | syscall/invalid            | mean (ns)                                                | (I) 17.78% |
> |                            | p99 (ns)                                                 | (I) 11.11% |
> |                            | p99.9 (ns)                                               |     13.33% |
> +----------------------------+----------------------------------------------------------+------------+
>
> [1] https://gitlab.arm.com/tooling/fastpath
>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
> ---
> The patch applies on v6.19 and next-20260309.
> ---
> arch/arm64/include/asm/mte.h | 6 +++++-
> arch/arm64/kernel/mte.c | 5 +++++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 6d4a78b9dc3e6..0e05d20cf2583 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -252,7 +252,8 @@ static inline void mte_check_tfsr_entry(void)
> if (!kasan_hw_tags_enabled())
> return;
>
> - mte_check_tfsr_el1();
> + if (system_uses_mte_async_or_asymm_mode())
> + mte_check_tfsr_el1();
For symmetry, I would also write this as
	if (!system_uses_mte_async_or_asymm_mode())
		return;

	mte_check_tfsr_el1();
Nothing jumped out at me, but I am still new to this code and the spec :)
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
Thread overview: 6+ messages
2026-03-11 17:50 [PATCH] arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode Muhammad Usama Anjum
2026-03-12 6:48 ` Yeoreum Yun
2026-03-19 10:58 ` Muhammad Usama Anjum
2026-03-25 11:46 ` David Hildenbrand (Arm) [this message]
2026-04-09 16:27 ` Catalin Marinas
2026-04-09 18:34 ` Catalin Marinas