Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Linu Cherian <linu.cherian@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Kevin Brodsky <kevin.brodsky@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <yang@os.amperecomputing.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Huang Ying <ying.huang@linux.alibaba.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	shameerali.kolothum.thodi@huawei.com
Subject: Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
Date: Tue, 16 Jun 2026 10:35:04 +0530	[thread overview]
Message-ID: <ajDZgPAYlecHFEdl@a079125.arm.com> (raw)
In-Reply-To: <4aa78619-5a79-4fd0-aaac-a990b8c3fd05@arm.com>

Hi,

On Mon, Jun 15, 2026 at 04:41:04PM +0100, Ryan Roberts wrote:
> On 15/06/2026 15:43, Will Deacon wrote:
> > On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
> >> On 14/06/2026 12:04, Will Deacon wrote:
> >>> On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
> >>>> From: Ryan Roberts <ryan.roberts@arm.com>
> >>>>
> >>>> Testing with 7.1-rc4 :
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>> | Benchmark             | Result Class                                      |  Improvement|  
> >>>> +=======================+===================================================+=============+
> >>>> | perf/syscall          | fork (ops/sec)                                    |   (I) 3.25% |
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) |   (I) 2.70% |
> >>>> | 			| Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) |   (I) 2.13% |
> >>>> +-----------------------+---------------------------------------------------+-------------+
> >>>
> >>> I think we need a much more comprehensive set of benchmarks before we can
> >>> begin to consider a change like this.
> >>
> >> I believe that Linu ran a wider set of benchmarks and didn't find any
> >> regressions. These are just the ones that show improvement (Linu, please correct
> >> me and/or provide details).
> > 
> > I think it's important to show the ones that suffer as well... and also
> > look at different configurations (e.g. preemptible settings) and different
> > environments (e.g. native vs in a VM).
> > 
> >> Additionally Huang Ying did some testing against the RFC and reported 4.5%
> >> improvement with Redis:
> >>
> >> https://lore.kernel.org/linux-arm-kernel/87segumv6w.fsf@DESKTOP-5N7EMDA 
> > 
> > To be clear: I'm not disputing that some benchmarks appear to show a small
> > boost from this series. I'm just worried that's not the whole story.
> > 
> >>>>  arch/arm64/include/asm/mmu.h         |  12 +++
> >>>>  arch/arm64/include/asm/mmu_context.h |   2 +
> >>>>  arch/arm64/include/asm/tlbflush.h    | 127 +++++++++++++++++++++------
> >>>>  arch/arm64/mm/context.c              |  30 ++++++-
> >>>>  4 files changed, 141 insertions(+), 30 deletions(-)
> >>>
> >>> Doesn't this break BTM/SVM with the SMMU? I think that's a non-starter
> >>> even if you can provide some more compelling numbers.
> >>
> >> AIUI, we don't support BTM upstream - the SMMU uses private ASIDs and implements
> >> MMU notifiers to forward the TLBIs via its command queue interface.
> >>
> >> I was also under the impression that supporting BTM upsteam was not desired;
> >> Please correct me if that's not accurate or if you're aware of plans to add
> >> support. I've been (coincidentlly) looking at some other stuff that could
> >> benefit from BTM but had concluded it wouldn't be an acceptable approach upstream.
> >>
> >> If we did ever want to add SMMU BTM support though, I think it would be simple
> >> enough to add an interface to allow the SMMU to disable the optimization (i.e.
> >> force active_cpu to ACTIVE_CPU_MULTIPLE)?
> > 
> > We used to have some initial BTM support in the SMMUv3 driver but the
> > main problem was finding an upstream driver/soc that can use it properly
> > and so it was ultimately removed in d38c28dbefee ("iommu/arm-smmu-v3: Put
> > the SVA mmu notifier in the smmu_domain") because it was getting in the
> > way of wider driver rework and we couldn't test it.
> > 
> > However, there *is* work to re-enable it on top of that rework (and other
> > changes):
> > 
> >   https://lore.kernel.org/linux-iommu/20250319173202.78988-6-shameerali.kolothum.thodi@huawei.com/
> > 
> > although I don't know if Shameer intends to repost that...
> 
> Thanks for the pointers; That's very interesting feedback. I'll take a closer
> look :)
> 
> > 
> >>>> +static inline bool flush_tlb_user_pre(struct mm_struct *mm, tlbf_t flags)
> >>>> +{
> >>>> +	unsigned int self, active;
> >>>> +	bool local;
> >>>> +
> >>>> +	migrate_disable();
> >>>> +
> >>>> +	if (flags & TLBF_NOBROADCAST) {
> >>>> +		dsb(nshst);
> >>>> +		return true;
> >>>> +	}
> >>>
> >>> Why does the NOBROADCAST case need migration disabled? It didn't before...
> >>
> >> The existing semantic for TLBF_BOBROADCAST is that it emits a local TLBI on
> >> whatever CPU we happen to be executing on. It's used for lazily fixing up
> >> spurious faults (i.e. hitting RO TLB entries when the PTE has been relaxed to
> >> RW). So it's still functionally correct if the thread migrates CPU between
> >> taking the fault and issuing the local TLBI - in the worst case it just leads to
> >> another spurious fault.
> >>
> >> For this new case, we need to ensure we don't get migrated between reading
> >> active_cpu and issuing the local TLBI, otherwise we would only issue a local
> >> TLBI when a broadcast was required.
> > 
> > Sounds like those two users probably need separating out, then?
> 
> Ahh, I see; I'll admit I hadn't actually reviewed the new integration part. I
> agree - NOBROADCAST is different to to this. This is an optimization for the
> "not NOBROADCAST" case. We need to avoid disabling migration in the NOBROADCAST
> case.
> 

Ack.

next prev parent reply	other threads:[~2026-06-16  5:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-23 13:47 [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu Linu Cherian
2026-06-14 11:04 ` Will Deacon
2026-06-14 11:33   ` Will Deacon
2026-06-15 11:21   ` Ryan Roberts
2026-06-15 14:43     ` Will Deacon
2026-06-15 15:41       ` Ryan Roberts
2026-06-16  5:05         ` Linu Cherian [this message]
2026-06-16  5:00       ` Linu Cherian
2026-06-16  4:54     ` Linu Cherian
2026-06-15 12:39   ` Mark Rutland
2026-06-15 14:44     ` Will Deacon
2026-06-16  6:13       ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajDZgPAYlecHFEdl@a079125.arm.com \
    --to=linu.cherian@arm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=kevin.brodsky@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=will@kernel.org \
    --cc=yang@os.amperecomputing.com \
    --cc=ying.huang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.