From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
James Morse <james.morse@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
Date: Tue, 2 Sep 2025 17:23:19 +0100 [thread overview]
Message-ID: <aLcZ93VeOYa4ilZb@arm.com> (raw)
In-Reply-To: <20250829153510.2401161-3-ryan.roberts@arm.com>
On Fri, Aug 29, 2025 at 04:35:08PM +0100, Ryan Roberts wrote:
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index f66b8c4696d0..651440e0aff9 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -182,6 +182,52 @@ static inline unsigned long get_trans_granule(void)
> (__pages >> (5 * (scale) + 1)) - 1; \
> })
>
> +/*
> + * Determines whether the user tlbi invalidation can be performed only on the
> + * local CPU or whether it needs to be broadcast. (Returns true for local).
> + * Additionally issues appropriate barrier to ensure prior pgtable updates are
> + * visible to the table walker. Must be paired with flush_tlb_user_post().
> + */
> +static inline bool flush_tlb_user_pre(struct mm_struct *mm)
> +{
> + unsigned int self, active;
> + bool local;
> +
> + migrate_disable();
> +
> + self = smp_processor_id();
> +
> + /*
> + * The load of mm->context.active_cpu must not be reordered before the
> + * store to the pgtable that necessitated this flush. This ensures that
> + * if the value read is our cpu id, then no other cpu can have seen the
> + * old pgtable value and therefore does not need this old value to be
> + * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
> + * needed to make the pgtable updates visible to the walker, to a
> + * dsb(ish) by default. So speculatively load without a barrier and if
> + * it indicates our cpu id, then upgrade the barrier and re-load.
> + */
> + active = READ_ONCE(mm->context.active_cpu);
> + if (active == self) {
> + dsb(ish);
> + active = READ_ONCE(mm->context.active_cpu);
> + } else {
> + dsb(ishst);
> + }
Does the ISH vs ISHST make much difference in practice? I wonder whether
we could keep this simple.
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index b2ac06246327..adf4fc26ddb6 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -214,9 +214,10 @@ static u64 new_context(struct mm_struct *mm)
>
> void check_and_switch_context(struct mm_struct *mm)
> {
> - unsigned long flags;
> - unsigned int cpu;
> + unsigned int cpu = smp_processor_id();
> u64 asid, old_active_asid;
> + unsigned int active;
> + unsigned long flags;
>
> if (system_supports_cnp())
> cpu_set_reserved_ttbr0();
> @@ -251,7 +252,6 @@ void check_and_switch_context(struct mm_struct *mm)
> atomic64_set(&mm->context.id, asid);
> }
>
> - cpu = smp_processor_id();
> if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
> local_flush_tlb_all();
>
> @@ -262,6 +262,30 @@ void check_and_switch_context(struct mm_struct *mm)
>
> arm64_apply_bp_hardening();
>
> + /*
> + * Update mm->context.active_cpu in such a manner that we avoid cmpxchg
> + * and dsb unless we definitely need it. If initially ACTIVE_CPU_NONE
> + * then we are the first cpu to run so set it to our id. If initially
> + * any id other than ours, we are the second cpu to run so set it to
> + * ACTIVE_CPU_MULTIPLE. If we update the value then we must issue
> + * dsb(ishst) to ensure stores to mm->context.active_cpu are ordered
> + * against the TTBR0 write in cpu_switch_mm()/uaccess_enable(); the
> + * store must be visible to another cpu before this cpu could have
> + * populated any TLB entries based on the pgtables that will be
> + * installed.
> + */
> + active = READ_ONCE(mm->context.active_cpu);
> + if (active != cpu && active != ACTIVE_CPU_MULTIPLE) {
> + if (active == ACTIVE_CPU_NONE)
> + active = cmpxchg_relaxed(&mm->context.active_cpu,
> + ACTIVE_CPU_NONE, cpu);
> +
> + if (active != ACTIVE_CPU_NONE)
> + WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_MULTIPLE);
> +
> + dsb(ishst);
> + }
I think this works. I recall James had a litmus test for the model
checker confirming this.
In a cut-down version, we'd have:
P0: P1:
set_pte(); WRITE_ONCE(active_cpu);
dsb(); dsb();
READ_ONCE(active_cpu); READ_ONCE(pte);
The pte read on P1 is implied following the TTBR0 write. As you
described, if P0 did not see the active_cpu update, P1 would have seen
the updated pte.
So far I couldn't fail this, so:
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
next prev parent reply other threads:[~2025-09-02 21:44 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-29 15:35 [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 1/2] arm64: tlbflush: Move invocation of __flush_tlb_range_op() to a macro Ryan Roberts
2025-09-02 16:25 ` Catalin Marinas
2025-09-11 5:50 ` Anshuman Khandual
2025-09-11 14:12 ` Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu Ryan Roberts
2025-09-01 9:08 ` Alexandru Elisei
2025-09-01 9:18 ` Ryan Roberts
2025-09-02 16:23 ` Catalin Marinas [this message]
2025-09-02 16:54 ` Ryan Roberts
2025-09-10 23:58 ` Yang Shi
2025-09-11 1:20 ` Huang, Ying
2025-09-11 14:19 ` Ryan Roberts
2025-09-11 22:29 ` Yang Shi
2025-09-18 15:18 ` Catalin Marinas
2025-09-02 16:47 ` [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Catalin Marinas
2025-09-02 16:56 ` Ryan Roberts
2025-09-15 16:05 ` Christoph Lameter (Ampere)
2025-09-03 2:12 ` Huang, Ying
2025-09-15 16:02 ` Christoph Lameter (Ampere)
2025-09-10 10:57 ` Huang, Ying
2025-09-10 12:42 ` Ryan Roberts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLcZ93VeOYa4ilZb@arm.com \
--to=catalin.marinas@arm.com \
--cc=james.morse@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=ryan.roberts@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.