All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU
Date: Tue, 2 Sep 2025 17:47:56 +0100	[thread overview]
Message-ID: <aLcfvIfFb6xD-NXp@arm.com> (raw)
In-Reply-To: <20250829153510.2401161-1-ryan.roberts@arm.com>

On Fri, Aug 29, 2025 at 04:35:06PM +0100, Ryan Roberts wrote:
> Beyond that, the next question is; does it actually improve performance?
> stress-ng's --tlb-shootdown stressor suggests yes; as concurrency increases, we
> do a much better job of sustaining the overall number of "tlb shootdowns per
> second" after the change:
> 
> +------------+--------------------------+--------------------------+--------------------------+
> |            |     Baseline (v6.15)     |        tlbi local        |        Improvement       |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> | nr_threads |     ops/sec |    ops/sec |     ops/sec |    ops/sec |     ops/sec |    ops/sec |
> |            | (real time) | (cpu time) | (real time) | (cpu time) | (real time) | (cpu time) |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> |          1 |        9109 |       2573 |        8903 |       3653 |         -2% |        42% |
> |          4 |        8115 |       1299 |        9892 |       1059 |         22% |       -18% |
> |          8 |        5119 |        477 |       11854 |       1265 |        132% |       165% |
> |         16 |        4796 |        286 |       14176 |        821 |        196% |       187% |
> |         32 |        1593 |         38 |       15328 |        474 |        862% |      1147% |
> |         64 |        1486 |         19 |        8096 |        131 |        445% |       589% |
> |        128 |        1315 |         16 |        8257 |        145 |        528% |       806% |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> 
> But looking at real-world benchmarks, I haven't yet found anything where it
> makes a huge difference; When compiling the kernel, it reduces kernel time by
> ~2.2%, but overall wall time remains the same. I'd be interested in any
> suggestions for workloads where this might prove valuable.

I suspect it's highly dependent on hardware and how it handles the DVM
messages. There were some old proposals from Fujitsu:

https://lore.kernel.org/linux-arm-kernel/20190617143255.10462-1-indou.takao@jp.fujitsu.com/

Christoph Lameter (Ampere) also followed with some refactoring in this
area to allow a boot-configurable way to do TLBI via IS ops or IPI:

https://lore.kernel.org/linux-arm-kernel/20231207035703.158053467@gentwo.org/

(for some reason, the patches did not make it to the list, I have them
in my inbox if you are interested)

I don't remember any real-world workload, more like hand-crafted
mprotect() loops.

Anyway, I think the approach in your series doesn't have downsides, it's
fairly clean and addresses some low-hanging fruits. For multi-threaded
workloads where a flush_tlb_mm() is cheaper than a series of per-page
TLBIs, I think we can wait for that hardware to be phased out. The TLBI
range operations should significantly reduce the DVM messages between
CPUs.

-- 
Catalin


  parent reply	other threads:[~2025-09-02 21:44 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-29 15:35 [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 1/2] arm64: tlbflush: Move invocation of __flush_tlb_range_op() to a macro Ryan Roberts
2025-09-02 16:25   ` Catalin Marinas
2025-09-11  5:50   ` Anshuman Khandual
2025-09-11 14:12     ` Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu Ryan Roberts
2025-09-01  9:08   ` Alexandru Elisei
2025-09-01  9:18     ` Ryan Roberts
2025-09-02 16:23   ` Catalin Marinas
2025-09-02 16:54     ` Ryan Roberts
2025-09-10 23:58   ` Yang Shi
2025-09-11  1:20     ` Huang, Ying
2025-09-11 14:19     ` Ryan Roberts
2025-09-11 22:29       ` Yang Shi
2025-09-18 15:18   ` Catalin Marinas
2025-09-02 16:47 ` Catalin Marinas [this message]
2025-09-02 16:56   ` [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Ryan Roberts
2025-09-15 16:05   ` Christoph Lameter (Ampere)
2025-09-03  2:12 ` Huang, Ying
2025-09-15 16:02   ` Christoph Lameter (Ampere)
2025-09-10 10:57 ` Huang, Ying
2025-09-10 12:42   ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLcfvIfFb6xD-NXp@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.