From: Linu Cherian <linu.cherian@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Kevin Brodsky <kevin.brodsky@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <yang@os.amperecomputing.com>,
Mark Rutland <mark.rutland@arm.com>,
Huang Ying <ying.huang@linux.alibaba.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
shameerali.kolothum.thodi@huawei.com
Subject: Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
Date: Tue, 16 Jun 2026 10:35:04 +0530 [thread overview]
Message-ID: <ajDZgPAYlecHFEdl@a079125.arm.com> (raw)
In-Reply-To: <4aa78619-5a79-4fd0-aaac-a990b8c3fd05@arm.com>
Hi,
On Mon, Jun 15, 2026 at 04:41:04PM +0100, Ryan Roberts wrote:
> On 15/06/2026 15:43, Will Deacon wrote:
> > On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
> >> On 14/06/2026 12:04, Will Deacon wrote:
> >>> On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
> >>>> From: Ryan Roberts <ryan.roberts@arm.com>
> >>>>
> >>>> Testing with 7.1-rc4 :
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>> | Benchmark | Result Class | Improvement|
> >>>> +=======================+===================================================+=============+
> >>>> | perf/syscall | fork (ops/sec) | (I) 3.25% |
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) | (I) 2.70% |
> >>>> | | Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) | (I) 2.13% |
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>
> >>> I think we need a much more comprehensive set of benchmarks before we can
> >>> begin to consider a change like this.
> >>
> >> I believe that Linu ran a wider set of benchmarks and didn't find any
> >> regressions. These are just the ones that show improvement (Linu, please correct
> >> me and/or provide details).
> >
> > I think it's important to show the ones that suffer as well... and also
> > look at different configurations (e.g. preemptible settings) and different
> > environments (e.g. native vs in a VM).
> >
> >> Additionally Huang Ying did some testing against the RFC and reported 4.5%
> >> improvement with Redis:
> >>
> >> https://lore.kernel.org/linux-arm-kernel/87segumv6w.fsf@DESKTOP-5N7EMDA
> >
> > To be clear: I'm not disputing that some benchmarks appear to show a small
> > boost from this series. I'm just worried that's not the whole story.
> >
> >>>> arch/arm64/include/asm/mmu.h | 12 +++
> >>>> arch/arm64/include/asm/mmu_context.h | 2 +
> >>>> arch/arm64/include/asm/tlbflush.h | 127 +++++++++++++++++++++------
> >>>> arch/arm64/mm/context.c | 30 ++++++-
> >>>> 4 files changed, 141 insertions(+), 30 deletions(-)
> >>>
> >>> Doesn't this break BTM/SVM with the SMMU? I think that's a non-starter
> >>> even if you can provide some more compelling numbers.
> >>
> >> AIUI, we don't support BTM upstream - the SMMU uses private ASIDs and implements
> >> MMU notifiers to forward the TLBIs via its command queue interface.
> >>
> >> I was also under the impression that supporting BTM upsteam was not desired;
> >> Please correct me if that's not accurate or if you're aware of plans to add
> >> support. I've been (coincidentlly) looking at some other stuff that could
> >> benefit from BTM but had concluded it wouldn't be an acceptable approach upstream.
> >>
> >> If we did ever want to add SMMU BTM support though, I think it would be simple
> >> enough to add an interface to allow the SMMU to disable the optimization (i.e.
> >> force active_cpu to ACTIVE_CPU_MULTIPLE)?
> >
> > We used to have some initial BTM support in the SMMUv3 driver but the
> > main problem was finding an upstream driver/soc that can use it properly
> > and so it was ultimately removed in d38c28dbefee ("iommu/arm-smmu-v3: Put
> > the SVA mmu notifier in the smmu_domain") because it was getting in the
> > way of wider driver rework and we couldn't test it.
> >
> > However, there *is* work to re-enable it on top of that rework (and other
> > changes):
> >
> > https://lore.kernel.org/linux-iommu/20250319173202.78988-6-shameerali.kolothum.thodi@huawei.com/
> >
> > although I don't know if Shameer intends to repost that...
>
> Thanks for the pointers; That's very interesting feedback. I'll take a closer
> look :)
>
> >
> >>>> +static inline bool flush_tlb_user_pre(struct mm_struct *mm, tlbf_t flags)
> >>>> +{
> >>>> + unsigned int self, active;
> >>>> + bool local;
> >>>> +
> >>>> + migrate_disable();
> >>>> +
> >>>> + if (flags & TLBF_NOBROADCAST) {
> >>>> + dsb(nshst);
> >>>> + return true;
> >>>> + }
> >>>
> >>> Why does the NOBROADCAST case need migration disabled? It didn't before...
> >>
> >> The existing semantic for TLBF_BOBROADCAST is that it emits a local TLBI on
> >> whatever CPU we happen to be executing on. It's used for lazily fixing up
> >> spurious faults (i.e. hitting RO TLB entries when the PTE has been relaxed to
> >> RW). So it's still functionally correct if the thread migrates CPU between
> >> taking the fault and issuing the local TLBI - in the worst case it just leads to
> >> another spurious fault.
> >>
> >> For this new case, we need to ensure we don't get migrated between reading
> >> active_cpu and issuing the local TLBI, otherwise we would only issue a local
> >> TLBI when a broadcast was required.
> >
> > Sounds like those two users probably need separating out, then?
>
> Ahh, I see; I'll admit I hadn't actually reviewed the new integration part. I
> agree - NOBROADCAST is different to to this. This is an optimization for the
> "not NOBROADCAST" case. We need to avoid disabling migration in the NOBROADCAST
> case.
>
Ack.
next prev parent reply other threads:[~2026-06-16 5:05 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-23 13:47 [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu Linu Cherian
2026-06-14 11:04 ` Will Deacon
2026-06-14 11:33 ` Will Deacon
2026-06-15 11:21 ` Ryan Roberts
2026-06-15 14:43 ` Will Deacon
2026-06-15 15:41 ` Ryan Roberts
2026-06-16 5:05 ` Linu Cherian [this message]
2026-06-16 5:00 ` Linu Cherian
2026-06-16 4:54 ` Linu Cherian
2026-06-15 12:39 ` Mark Rutland
2026-06-15 14:44 ` Will Deacon
2026-06-16 6:13 ` Mark Rutland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajDZgPAYlecHFEdl@a079125.arm.com \
--to=linu.cherian@arm.com \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=kevin.brodsky@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=ryan.roberts@arm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=will@kernel.org \
--cc=yang@os.amperecomputing.com \
--cc=ying.huang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox