All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: akpm@linux-foundation.org
Cc: david@kernel.org, dave.hansen@intel.com,
	dave.hansen@linux.intel.com, ypodemsk@redhat.com,
	hughd@google.com, will@kernel.org, aneesh.kumar@kernel.org,
	npiggin@gmail.com, peterz@infradead.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	arnd@arndb.de, lorenzo.stoakes@oracle.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, shy828301@gmail.com, riel@surriel.com,
	jannh@google.com, jgross@suse.com, seanjc@google.com,
	pbonzini@redhat.com, boris.ostrovsky@oracle.com,
	virtualization@lists.linux.dev, kvm@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, ioworker0@gmail.com
Subject: [PATCH v4 0/3] targeted TLB sync IPIs for lockless page table walkers
Date: Mon,  2 Feb 2026 15:45:54 +0800	[thread overview]
Message-ID: <20260202074557.16544-1-lance.yang@linux.dev> (raw)

When freeing or unsharing page tables we send an IPI to synchronize with
concurrent lockless page table walkers (e.g. GUP-fast). Today we broadcast
that IPI to all CPUs, which is costly on large machines and hurts RT
workloads[1].

This series makes those IPIs targeted. We track which CPUs are currently
doing a lockless page table walk for a given mm (per-CPU
active_lockless_pt_walk_mm). When we need to sync, we only IPI those CPUs.
GUP-fast and perf_get_page_size() set/clear the tracker around their walk;
tlb_remove_table_sync_mm() uses it and replaces the previous broadcast in
the free/unshare paths.

On x86, when the TLB flush path already sends IPIs (native without INVLPGB,
or KVM), the extra sync IPI is redundant. We add a property on pv_mmu_ops
so each backend can declare whether its flush_tlb_multi sends real IPIs; if
so, tlb_remove_table_sync_mm() is a no-op. We also have tlb_flush() pass
both freed_tables and unshared_tables so lazy-TLB CPUs get IPIs during
hugetlb unshare.

David Hildenbrand did the initial implementation. I built on his work and
relied on off-list discussions to push it further - thanks a lot David!

[1] https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/

v3 -> v4:
- Rework based on David's two-step direction and per-CPU idea:
  1) Targeted IPIs: per-CPU variable when entering/leaving lockless page
     table walk; tlb_remove_table_sync_mm() IPIs only those CPUs.
  2) On x86, pv_mmu_ops property set at init to skip the extra sync when
     flush_tlb_multi() already sends IPIs.
  https://lore.kernel.org/linux-mm/bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org/
- https://lore.kernel.org/linux-mm/20260106120303.38124-1-lance.yang@linux.dev/

v2 -> v3:
- Complete rewrite: use dynamic IPI tracking instead of static checks
  (per Dave Hansen, thanks!)
- Track IPIs via mmu_gather: native_flush_tlb_multi() sets flag when
  actually sending IPIs
- Motivation for skipping redundant IPIs explained by David:
  https://lore.kernel.org/linux-mm/1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org/
- https://lore.kernel.org/linux-mm/20251229145245.85452-1-lance.yang@linux.dev/

v1 -> v2:
- Fix cover letter encoding to resolve send-email issues. Apologies for
  any email flood caused by the failed send attempts :(

RFC -> v1:
- Use a callback function in pv_mmu_ops instead of comparing function
  pointers (per David)
- Embed the check directly in tlb_remove_table_sync_one() instead of
  requiring every caller to check explicitly (per David)
- Move tlb_table_flush_implies_ipi_broadcast() outside of
  CONFIG_MMU_GATHER_RCU_TABLE_FREE to fix build error on architectures
  that don't enable this config.
  https://lore.kernel.org/oe-kbuild-all/202512142156.cShiu6PU-lkp@intel.com/
- https://lore.kernel.org/linux-mm/20251213080038.10917-1-lance.yang@linux.dev/

Lance Yang (3):
  mm: use targeted IPIs for TLB sync with lockless page table walkers
  mm: switch callers to tlb_remove_table_sync_mm()
  x86/tlb: add architecture-specific TLB IPI optimization support

 arch/x86/hyperv/mmu.c                 |  5 ++
 arch/x86/include/asm/paravirt.h       |  5 ++
 arch/x86/include/asm/paravirt_types.h |  6 +++
 arch/x86/include/asm/tlb.h            | 20 +++++++-
 arch/x86/kernel/kvm.c                 |  6 +++
 arch/x86/kernel/paravirt.c            | 18 +++++++
 arch/x86/kernel/smpboot.c             |  1 +
 arch/x86/xen/mmu_pv.c                 |  2 +
 include/asm-generic/tlb.h             | 28 +++++++++--
 include/linux/mm.h                    | 34 +++++++++++++
 kernel/events/core.c                  |  2 +
 mm/gup.c                              |  2 +
 mm/khugepaged.c                       |  2 +-
 mm/mmu_gather.c                       | 69 ++++++++++++++++++++++++---
 14 files changed, 187 insertions(+), 13 deletions(-)

-- 
2.49.0


             reply	other threads:[~2026-02-02  7:46 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-02  7:45 Lance Yang [this message]
2026-02-02  7:45 ` [PATCH v4 1/3] mm: use targeted IPIs for TLB sync with lockless page table walkers Lance Yang
2026-02-02  9:42   ` Peter Zijlstra
2026-02-02 12:14     ` Lance Yang
2026-02-02 12:51       ` Peter Zijlstra
2026-02-02 13:23         ` Lance Yang
2026-02-02 13:42           ` Peter Zijlstra
2026-02-02 14:28             ` Lance Yang
2026-02-02 16:20       ` Dave Hansen
2026-02-02 11:37   ` kernel test robot
2026-02-03 23:49   ` kernel test robot
2026-02-02  7:45 ` [PATCH v4 2/3] mm: switch callers to tlb_remove_table_sync_mm() Lance Yang
2026-02-02  7:45 ` [PATCH v4 3/3] x86/tlb: add architecture-specific TLB IPI optimization support Lance Yang
2026-02-25 20:11   ` Sean Christopherson
2026-02-26 11:37     ` Lance Yang
2026-02-26 18:24       ` Sean Christopherson
2026-03-01  6:56         ` Lance Yang
2026-02-02  9:54 ` [PATCH v4 0/3] targeted TLB sync IPIs for lockless page table walkers Peter Zijlstra
2026-02-02 11:00   ` [PATCH v4 0/3] targeted TLB sync IPIs for lockless page table Lance Yang
2026-02-02 12:50     ` Peter Zijlstra
2026-02-02 12:58       ` Lance Yang
2026-02-02 13:07         ` Lance Yang
2026-02-02 13:37           ` Peter Zijlstra
2026-02-02 14:37             ` Lance Yang
2026-02-02 15:09               ` Peter Zijlstra
2026-02-02 15:52                 ` Lance Yang
2026-02-05 13:25                   ` David Hildenbrand (Arm)
2026-02-05 15:01                     ` Lance Yang
2026-02-05 15:05                       ` David Hildenbrand (Arm)
2026-02-05 15:28                         ` Lance Yang
2026-02-05 15:09                       ` Dave Hansen
2026-02-05 15:31                         ` Lance Yang
2026-02-05 15:41                           ` Dave Hansen
2026-02-05 16:30                             ` Lance Yang
2026-02-05 16:46                               ` David Hildenbrand (Arm)
2026-02-05 16:48                               ` Matthew Wilcox
2026-02-05 17:06                                 ` David Hildenbrand (Arm)
2026-02-05 18:36                                   ` Dave Hansen
2026-02-05 22:49                                     ` David Hildenbrand (Arm)
2026-02-05 21:30                                   ` David Hildenbrand (Arm)
2026-02-05 17:00                               ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260202074557.16544-1-lance.yang@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=arnd@arndb.de \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=ioworker0@gmail.com \
    --cc=jannh@google.com \
    --cc=jgross@suse.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mingo@redhat.com \
    --cc=npache@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=seanjc@google.com \
    --cc=shy828301@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux.dev \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=ypodemsk@redhat.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.