linux-mm.kvack.org archive mirror
* [PATCH v14 00/13] AMD broadcast TLB invalidation
@ 2025-02-26  3:00 Rik van Riel
From: Rik van Riel @ 2025-02-26  3:00 UTC
  To: x86
  Cc: linux-kernel, bp, peterz, dave.hansen, zhengqi.arch, nadav.amit,
	thomas.lendacky, kernel-team, linux-mm, akpm, jackmanb, jannh,
	mhklinux, andrew.cooper3, Manali.Shukla, mingo

Add support for broadcast TLB invalidation using AMD's INVLPGB instruction.

This allows the kernel to invalidate TLB entries on remote CPUs without
needing to send IPIs, without having to wait for remote CPUs to handle
those interrupts, and with less interruption to what was running on
those CPUs.

Because x86 PCID space is limited, and there are some very large
systems out there, broadcast TLB invalidation is only used for
processes that are active on 3 or more CPUs, with the threshold
being gradually increased the more the PCID space gets exhausted.
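
The scaling threshold described above could look roughly like the sketch below. The doubling-per-quarter policy and the names broadcast_cpu_threshold() and should_use_broadcast() are assumptions made for illustration; only the base value of 3 CPUs comes from this cover letter, and the series itself may compute the threshold differently.

```c
#include <assert.h>
#include <stdbool.h>

/* From the cover letter: broadcast flushing starts at 3 active CPUs. */
#define BASE_CPU_THRESHOLD 3u

/*
 * Hypothetical policy: double the threshold for every quarter of the
 * ID space that is already in use. This is an illustration, not the
 * formula used by the patches.
 */
static unsigned int broadcast_cpu_threshold(unsigned int ids_used,
					    unsigned int ids_total)
{
	assert(ids_total > 0 && ids_used <= ids_total);

	/* steps is 0 while under a quarter is used, and 4 when full. */
	unsigned int steps = (unsigned int)((4ull * ids_used) / ids_total);

	return BASE_CPU_THRESHOLD << steps;
}

/* A process qualifies once it is active on at least "threshold" CPUs. */
static bool should_use_broadcast(unsigned int active_cpus,
				 unsigned int ids_used,
				 unsigned int ids_total)
{
	return active_cpus >= broadcast_cpu_threshold(ids_used, ids_total);
}
```

With an empty ID space a process active on 3 CPUs qualifies; once half of the (hypothetical) 2048 IDs are handed out, the same sketch requires 12 active CPUs.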

Combined with the removal of unnecessary lru_add_drain calls
(see https://lkml.org/lkml/2024/12/19/1388) this results in a
nice performance boost for the will-it-scale tlb_flush2_threads
test on an AMD Milan system with 36 cores:

- vanilla kernel:           527k loops/second
- lru_add_drain removal:    731k loops/second
- only INVLPGB:             527k loops/second
- lru_add_drain + INVLPGB: 1157k loops/second

Profiling with only the INVLPGB changes showed that while
TLB invalidation went down from 40% of the total CPU
time to only around 4% of CPU time, the contention
simply moved to the LRU lock.

Fixing both at the same time roughly doubles the
number of iterations per second for this case.
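
For anyone who wants to check the ratios, here is a small helper that expresses each result relative to a baseline. The function name relative_permille() is made up for this example; the inputs are the loops/second figures quoted above, in thousands.

```c
#include <assert.h>

/*
 * Throughput relative to a baseline, in permille
 * (1000 means "same speed as the baseline"). Integer math only.
 */
static unsigned long relative_permille(unsigned long kloops,
				       unsigned long baseline_kloops)
{
	assert(baseline_kloops > 0);

	return kloops * 1000ul / baseline_kloops;
}
```

relative_permille(1157, 527) evaluates to 2195, so the combined result is about 2.2x the vanilla kernel, while relative_permille(731, 527) gives 1387 for the lru_add_drain removal alone.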

Some numbers closer to real world performance
can be found at Phoronix, thanks to Michael:

https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits

My current plan is to implement support for Intel's RAR
(Remote Action Request) TLB flushing in a follow-up series,
after this thing has been merged into -tip. Making things
any larger would just be unwieldy for reviewers.

v14:
 - code & comment cleanups (Boris)
 - drop "noinvlpgb" commandline option (Boris)
 - fix !CONFIG_X86_BROADCAST_TLB_FLUSH compile anywhere in the series
v13:
 - move invlpgb_count_max back to amd.c for resume (Boris, Oleksandr)
 - fix Kconfig circular dependency (Tom, Boris)
 - add performance numbers to the patch adding invlpgb for userspace (Ingo)
 - drop page table RCU free patches (already in -tip)
v12:
 - make sure "nopcid" command line option turns off invlpgb (Brendan)
 - add "noinvlpgb" kernel command line option
 - split out kernel TLB flushing differently (Dave & Yosry)
 - split up the patch that does invlpgb flushing for user processes (Dave)
 - clean up get_flush_tlb_info (Boris)
 - move invlpgb_count_max initialization to get_cpu_cap (Boris)
 - bunch more comments as requested
v11:
 - resolve conflict with CONFIG_PT_RECLAIM code
 - a few more cleanups (Peter, Brendan, Nadav)
v10:
 - simplify partial pages with min(nr, 1) in the invlpgb loop (Peter)
 - document x86 paravirt, AMD invlpgb, and ARM64 flush without IPI (Brendan)
 - remove IS_ENABLED(CONFIG_X86_BROADCAST_TLB_FLUSH) (Brendan)
 - various cleanups (Brendan)
v9:
 - print warning when start or end address was rounded (Peter)
 - in the reclaim code, tlbsync at context switch time (Peter)
 - fix !CONFIG_CPU_SUP_AMD compile error in arch_tlbbatch_add_pending (Jan)
v8:
 - round start & end to handle non-page-aligned callers (Steven & Jan)
 - fix up changelog & add tested-by tags (Manali)
v7:
 - a few small code cleanups (Nadav)
 - fix spurious VM_WARN_ON_ONCE in mm_global_asid
 - code simplifications & better barriers (Peter & Dave)
v6:
 - fix info->end check in flush_tlb_kernel_range (Michael)
 - disable broadcast TLB flushing on 32 bit x86
v5:
 - use byte assembly for compatibility with older toolchains (Borislav, Michael)
 - ensure a panic on an invalid number of extra pages (Dave, Tom)
 - add cant_migrate() assertion to tlbsync (Jann)
 - a bunch more cleanups (Nadav)
 - key TCE enabling off X86_FEATURE_TCE (Andrew)
 - fix a race between reclaim and ASID transition (Jann)
v4:
 - Use only bitmaps to track free global ASIDs (Nadav)
 - Improved AMD initialization (Borislav & Tom)
 - Various naming and documentation improvements (Peter, Nadav, Tom, Dave)
 - Fixes for subtle race conditions (Jann)
v3:
 - Remove paravirt tlb_remove_table call (thank you Qi Zheng)
 - More suggested cleanups and changelog fixes by Peter and Nadav
v2:
 - Apply suggestions by Peter and Borislav (thank you!)
 - Fix bug in arch_tlbbatch_flush, where we need to do both
   the TLBSYNC, and flush the CPUs that are in the cpumask.
 - Some updates to comments and changelogs based on questions.
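
The changelog mentions rounding start & end for non-page-aligned callers (v8) and walking a range in an invlpgb loop bounded by a maximum count (v10, patch 02). The userspace sketch below only illustrates that splitting logic under assumptions: 4 KiB pages, a caller-supplied count_max, and made-up names (struct flush_chunk, split_flush_range()). It issues no instructions and is not the code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* One would-be flush operation: a start address and a page count. */
struct flush_chunk {
	uint64_t addr;
	uint64_t nr_pages;
};

/*
 * Split [start, end) into chunks of at most count_max pages.
 * Returns the number of chunks needed; at most out_len are stored.
 * Ranges ending within one page of UINT64_MAX are not handled.
 */
static size_t split_flush_range(uint64_t start, uint64_t end,
				uint64_t count_max,
				struct flush_chunk *out, size_t out_len)
{
	size_t n = 0;

	assert(count_max >= 1 && start < end);

	/* Round start down and end up to page boundaries. */
	start &= ~(SKETCH_PAGE_SIZE - 1);
	end = (end + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

	for (uint64_t addr = start; addr < end; ) {
		/* Pages left in the range, capped at the per-call maximum. */
		uint64_t nr = (end - addr) >> SKETCH_PAGE_SHIFT;

		if (nr > count_max)
			nr = count_max;

		if (n < out_len) {
			out[n].addr = addr;
			out[n].nr_pages = nr;
		}
		n++;
		addr += nr << SKETCH_PAGE_SHIFT;
	}

	return n;
}
```

For example, the unaligned range 0x1234..0x5001 with count_max = 2 is rounded to 0x1000..0x6000 and split into three chunks of 2, 2 and 1 pages.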




Thread overview: 64+ messages
2025-02-26  3:00 [PATCH v14 00/13] AMD broadcast TLB invalidation Rik van Riel
2025-02-26  3:00 ` [PATCH v14 01/13] x86/mm: consolidate full flush threshold decision Rik van Riel
2025-09-02 15:44   ` [BUG] x86/mm: regression after 4a02ed8e1cc3 Giovanni Cabiddu
2025-09-02 15:50     ` Dave Hansen
2025-09-02 16:08       ` Nadav Amit
2025-09-02 16:11         ` Dave Hansen
2025-09-03 14:00       ` Rik van Riel
2025-09-02 16:05     ` Jann Horn
2025-09-02 16:13       ` Jann Horn
2025-09-03 14:18       ` Nadav Amit
2025-09-03 14:42         ` Jann Horn
2025-09-02 16:31     ` Jann Horn
2025-09-02 16:57       ` Giovanni Cabiddu
2025-02-26  3:00 ` [PATCH v14 02/13] x86/mm: get INVLPGB count max from CPUID Rik van Riel
2025-02-28 16:21   ` Borislav Petkov
2025-02-28 19:27   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 03/13] x86/mm: add INVLPGB support code Rik van Riel
2025-02-28 18:46   ` Borislav Petkov
2025-02-28 18:51   ` Dave Hansen
2025-02-28 19:47   ` Borislav Petkov
2025-03-03 18:41     ` Dave Hansen
2025-03-03 19:23       ` Dave Hansen
2025-03-04 11:00         ` Borislav Petkov
2025-03-04 15:10           ` Dave Hansen
2025-03-04 16:19             ` Borislav Petkov
2025-03-04 16:57               ` Dave Hansen
2025-03-04 21:12                 ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 04/13] x86/mm: use INVLPGB for kernel TLB flushes Rik van Riel
2025-02-28 19:00   ` Dave Hansen
2025-02-28 21:43   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 05/13] x86/mm: use INVLPGB in flush_tlb_all Rik van Riel
2025-02-28 19:18   ` Dave Hansen
2025-03-01 12:20     ` Borislav Petkov
2025-03-01 15:54       ` Rik van Riel
2025-02-28 22:20   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 06/13] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing Rik van Riel
2025-02-28 18:57   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 07/13] x86/mm: add global ASID allocation helper functions Rik van Riel
2025-03-02  7:06   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 08/13] x86/mm: global ASID context switch & TLB flush handling Rik van Riel
2025-03-02  7:58   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 09/13] x86/mm: global ASID process exit helpers Rik van Riel
2025-03-02 12:38   ` Borislav Petkov
2025-03-02 13:53     ` Rik van Riel
2025-03-03 10:16       ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 10/13] x86/mm: enable broadcast TLB invalidation for multi-threaded processes Rik van Riel
2025-03-03 10:57   ` Borislav Petkov
2025-02-26  3:00 ` [PATCH v14 11/13] x86/mm: do targeted broadcast flushing from tlbbatch code Rik van Riel
2025-03-03 11:46   ` Borislav Petkov
2025-03-03 21:47     ` Dave Hansen
2025-03-04 11:52       ` Borislav Petkov
2025-03-04 15:24         ` Dave Hansen
2025-03-04 12:52       ` Brendan Jackman
2025-03-04 14:11         ` Borislav Petkov
2025-03-04 15:33           ` Brendan Jackman
2025-03-04 17:51             ` Dave Hansen
2025-02-26  3:00 ` [PATCH v14 12/13] x86/mm: enable AMD translation cache extensions Rik van Riel
2025-02-26  3:00 ` [PATCH v14 13/13] x86/mm: only invalidate final translations with INVLPGB Rik van Riel
2025-03-03 22:40   ` Dave Hansen
2025-03-04 11:53     ` Borislav Petkov
2025-03-03 12:42 ` [PATCH v14 00/13] AMD broadcast TLB invalidation Borislav Petkov
2025-03-03 13:29   ` Borislav Petkov
2025-03-04 12:04 ` [PATCH] x86/mm: Always set the ASID valid bit for the INVLPGB instruction Borislav Petkov
2025-03-04 12:43   ` Borislav Petkov
