public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	 James Morse <james.morse@arm.com>,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU
Date: Wed, 10 Sep 2025 18:57:27 +0800	[thread overview]
Message-ID: <87segumv6w.fsf@DESKTOP-5N7EMDA> (raw)
In-Reply-To: <20250829153510.2401161-1-ryan.roberts@arm.com> (Ryan Roberts's message of "Fri, 29 Aug 2025 16:35:06 +0100")

Ryan Roberts <ryan.roberts@arm.com> writes:

> Hi All,
>
> This is an RFC for my implementation of an idea from James Morse to avoid
> broadcasting TBLIs to remote CPUs if it can be proven that no remote CPU could
> have ever observed the pgtable entry for the TLB entry that is being
> invalidated. It turns out that x86 does something similar in principle.
>
> The primary feedback I'm looking for is; is this actually correct and safe?
> James and I both believe it to be, but it would be useful to get further
> validation.
>
> Beyond that, the next question is; does it actually improve performance?
> stress-ng's --tlb-shootdown stressor suggests yes; as concurrency increases, we
> do a much better job of sustaining the overall number of "tlb shootdowns per
> second" after the change:
>
> +------------+--------------------------+--------------------------+--------------------------+
> |            |     Baseline (v6.15)     |        tlbi local        |        Improvement       |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> | nr_threads |     ops/sec |    ops/sec |     ops/sec |    ops/sec |     ops/sec |    ops/sec |
> |            | (real time) | (cpu time) | (real time) | (cpu time) | (real time) | (cpu time) |
> +------------+-------------+------------+-------------+------------+-------------+------------+
> |          1 |        9109 |       2573 |        8903 |       3653 |         -2% |        42% |
> |          4 |        8115 |       1299 |        9892 |       1059 |         22% |       -18% |
> |          8 |        5119 |        477 |       11854 |       1265 |        132% |       165% |
> |         16 |        4796 |        286 |       14176 |        821 |        196% |       187% |
> |         32 |        1593 |         38 |       15328 |        474 |        862% |      1147% |
> |         64 |        1486 |         19 |        8096 |        131 |        445% |       589% |
> |        128 |        1315 |         16 |        8257 |        145 |        528% |       806% |
> +------------+-------------+------------+-------------+------------+-------------+------------+
>
> But looking at real-world benchmarks, I haven't yet found anything where it
> makes a huge difference; When compiling the kernel, it reduces kernel time by
> ~2.2%, but overall wall time remains the same. I'd be interested in any
> suggestions for workloads where this might prove valuable.
>
> All mm selftests have been run and no regressions are observed. Applies on
> v6.17-rc3.

I have used redis (a single threaded in-memory database) to test the
patchset on an ARM server.  32 redis-server processes are run on the
NUMA node 1 to enlarge the overhead of TLBI broadcast.  32
memtier-benchmark processes are run on the NUMA node 0 accordingly.
Snapshot is triggered constantly in redis-server, which fork(), saves
memory database to disk, exit(), so that COW in the redis-server will
trigger a large amount of TLBI.  Basically, this tests the performance
of redis-server during snapshot.  The test time is about 300s.  Test
results show that the benchmark score can improve ~4.5% with the
patchset.

Feel free to add my

Tested-by: Huang Ying <ying.huang@linux.alibaba.com>

in the future versions.

---
Best Regards,
Huang, Ying

  parent reply	other threads:[~2025-09-10 10:57 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-29 15:35 [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 1/2] arm64: tlbflush: Move invocation of __flush_tlb_range_op() to a macro Ryan Roberts
2025-09-02 16:25   ` Catalin Marinas
2025-09-11  5:50   ` Anshuman Khandual
2025-09-11 14:12     ` Ryan Roberts
2025-08-29 15:35 ` [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu Ryan Roberts
2025-09-01  9:08   ` Alexandru Elisei
2025-09-01  9:18     ` Ryan Roberts
2025-09-02 16:23   ` Catalin Marinas
2025-09-02 16:54     ` Ryan Roberts
2025-09-10 23:58   ` Yang Shi
2025-09-11  1:20     ` Huang, Ying
2025-09-11 14:19     ` Ryan Roberts
2025-09-11 22:29       ` Yang Shi
2025-09-18 15:18   ` Catalin Marinas
2025-09-02 16:47 ` [RFC PATCH v1 0/2] Don't broadcast TLBI if mm was only active on local CPU Catalin Marinas
2025-09-02 16:56   ` Ryan Roberts
2025-09-15 16:05   ` Christoph Lameter (Ampere)
2025-09-03  2:12 ` Huang, Ying
2025-09-15 16:02   ` Christoph Lameter (Ampere)
2025-09-10 10:57 ` Huang, Ying [this message]
2025-09-10 12:42   ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87segumv6w.fsf@DESKTOP-5N7EMDA \
    --to=ying.huang@linux.alibaba.com \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox