From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	 James Morse <james.morse@arm.com>,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Takao Indoh <indou.takao@jp.fujitsu.com>,
	QI Fuli <qi.fuli@fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rafael Aquini <aquini@redhat.com>
Subject: Re: [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU
Date: Wed, 03 Sep 2025 10:12:52 +0800
Message-ID: <874itk1dy3.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250829153510.2401161-1-ryan.roberts@arm.com> (Ryan Roberts's message of "Fri, 29 Aug 2025 16:35:06 +0100")

Hi, Ryan,

Ryan Roberts <ryan.roberts@arm.com> writes:

> Hi All,
>
> This is an RFC for my implementation of an idea from James Morse to avoid
> broadcasting TLBIs to remote CPUs if it can be proven that no remote CPU could
> have ever observed the pgtable entry for the TLB entry that is being
> invalidated. It turns out that x86 does something similar in principle.
>
> The primary feedback I'm looking for is: is this actually correct and safe?
> James and I both believe it to be, but it would be useful to get further
> validation.
>
> Beyond that, the next question is: does it actually improve performance?
> stress-ng's --tlb-shootdown stressor suggests yes; as concurrency increases, we
> do a much better job of sustaining the overall number of "tlb shootdowns per
> second" after the change:
>
> +------------+--------------------------+--------------------------+--------------------------+
> |            |     Baseline (v6.15)     |        tlbi local        |        Improvement       |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> | nr_threads |     ops/sec |    ops/sec |     ops/sec |    ops/sec |     ops/sec |    ops/sec |
> |            | (real time) | (cpu time) | (real time) | (cpu time) | (real time) | (cpu time) |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> |          1 |        9109 |       2573 |        8903 |       3653 |         -2% |        42% |
> |          4 |        8115 |       1299 |        9892 |       1059 |         22% |       -18% |
> |          8 |        5119 |        477 |       11854 |       1265 |        132% |       165% |
> |         16 |        4796 |        286 |       14176 |        821 |        196% |       187% |
> |         32 |        1593 |         38 |       15328 |        474 |        862% |      1147% |
> |         64 |        1486 |         19 |        8096 |        131 |        445% |       589% |
> |        128 |        1315 |         16 |        8257 |        145 |        528% |       806% |
> +------------+-------------+------------+-------------+------------+-------------+------------+
>
> But looking at real-world benchmarks, I haven't yet found anything where it
> makes a huge difference. When compiling the kernel, it reduces kernel time by
> ~2.2%, but overall wall time remains the same. I'd be interested in any
> suggestions for workloads where this might prove valuable.
>
> All mm selftests have been run and no regressions are observed. Applies on
> v6.17-rc3.

Thanks for working on this.

Several TLBI broadcast optimizations have been tried before; I have
Cc'ed the original authors for discussion.  Some workloads showed good
improvement:

https://lore.kernel.org/lkml/20190617143255.10462-1-indou.takao@jp.fujitsu.com/
https://lore.kernel.org/all/20200203201745.29986-1-aarcange@redhat.com/

See especially the following mail:

https://lore.kernel.org/all/20200314031609.GB2250@redhat.com/

---
Best Regards,
Huang, Ying

