linux-mm.kvack.org archive mirror
* [RFC v2 PATCH 0/9] Intel RAR TLB invalidation
@ 2025-05-20  1:02 Rik van Riel
  2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit

This patch series adds support for IPI-less TLB invalidation
using Intel RAR technology.

Intel RAR differs from AMD INVLPGB in a few ways:
- RAR goes through (emulated?) APIC writes, not instructions
- RAR flushes go through a memory table with 64 entries
- RAR flushes can be targeted to a cpumask
- The RAR functionality must be set up at boot time before it can be used

The cpumask targeting has resulted in Intel RAR and AMD INVLPGB having
slightly different rules:
- Processes with dynamic ASIDs use IPI-based shootdowns
- INVLPGB: processes with a global ASID
   - always have the TLB up to date, on every CPU
   - never need to flush the TLB at context switch time
- RAR: processes with a global ASID
   - have the TLB up to date on CPUs in the mm_cpumask
   - can skip a TLB flush at context switch time if the CPU is in the mm_cpumask
   - need to flush the TLB when scheduled on a CPU not in the mm_cpumask,
     in case the task ran there before and the TLB still has stale entries
     (see the sketch below)
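
A minimal sketch of the resulting context switch rule, in pseudo-C (the
helper name is made up for illustration; the real decision lives in
choose_new_asid() later in the series):

	/* Assumes @next already has a global ASID. */
	static bool needs_flush_on_switch(struct mm_struct *next, int cpu)
	{
		/* INVLPGB keeps global ASIDs current on every CPU. */
		if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
			return false;

		/*
		 * RAR only reaches CPUs in mm_cpumask(next); a CPU outside
		 * that mask may hold stale entries from an earlier run.
		 */
		return !cpumask_test_cpu(cpu, mm_cpumask(next));
	}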

RAR functionality is present on Sapphire Rapids and newer CPUs.

Information about Intel RAR can be found in this whitepaper:

https://www.intel.com/content/dam/develop/external/us/en/documents/341431-remote-action-request-white-paper.pdf

This patch series is based on a 2019 patch series created by
Intel, with the later patches modified to fit the TLB flush code
structure we ended up with after the AMD INVLPGB functionality
was integrated.

RFC v2:
- Cleanups suggested by Ingo and Nadav (thank you)
- Basic RAR code seems to actually work now
- Kernel TLB flushes with RAR seem to work correctly
- User TLB flushes with RAR are still broken, with two symptoms:
  - The !is_lazy WARN_ON in leave_mm() is tripped
  - Random segfaults




* [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-21 14:57   ` Dave Hansen
  2025-05-22 15:10   ` Sean Christopherson
  2025-05-20  1:02 ` [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs Rik van Riel
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu, Rik van Riel

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

MSR_IA32_CORE_CAPABILITIES enumerates the existence of other MSRs.
Bit 1 indicates the presence of the Remote Action Request (RAR) TLB flush MSRs.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..c848dd4bfceb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -220,6 +220,12 @@
 						     * their affected status.
 						     */
 
+#define MSR_IA32_CORE_CAPABILITIES	0x000000cf
+#define CORE_CAP_RAR			BIT(1)	/*
+						 * Remote Action Request. Used to directly
+						 * flush the TLB on remote CPUs.
+						 */
+
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*
 						 * Writeback and invalidate the
-- 
2.49.0




* [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
  2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-21 11:49   ` Borislav Petkov
  2025-05-20  1:02 ` [RFC v2 3/9] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too Rik van Riel
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu, Rik van Riel

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Remote Action Request (RAR) is a TLB flushing broadcast facility.
Introduce the RAR MSR definitions; the code that uses them is added
in later patches.

There are five RAR-related MSRs:

  MSR_IA32_CORE_CAPABILITIES
  MSR_IA32_RAR_CTRL
  MSR_IA32_RAR_ACT_VEC
  MSR_IA32_RAR_PAYLOAD_BASE
  MSR_IA32_RAR_INFO

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/msr-index.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c848dd4bfceb..adff8f0dc7bb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -122,6 +122,17 @@
 #define SNB_C3_AUTO_UNDEMOTE		(1UL << 27)
 #define SNB_C1_AUTO_UNDEMOTE		(1UL << 28)
 
+/*
+ * Remote Action Requests (RAR) MSRs
+ */
+#define MSR_IA32_RAR_CTRL		0x000000ed
+#define MSR_IA32_RAR_ACT_VEC		0x000000ee
+#define MSR_IA32_RAR_PAYLOAD_BASE	0x000000ef
+#define MSR_IA32_RAR_INFO		0x000000f0
+
+#define RAR_CTRL_ENABLE			BIT(31)
+#define RAR_CTRL_IGNORE_IF		BIT(30)
+
 #define MSR_MTRRcap			0x000000fe
 
 #define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
-- 
2.49.0




* [RFC v2 3/9] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
  2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
  2025-05-20  1:02 ` [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-20  1:02 ` [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR Rik van Riel
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Rik van Riel

From: Rik van Riel <riel@fb.com>

Much of the code for Intel RAR and AMD INVLPGB is shared.

Place both under the same config option.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/Kconfig.cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f928cf6e3252..f9cdd145abba 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -360,7 +360,7 @@ menuconfig PROCESSOR_SELECT
 
 config BROADCAST_TLB_FLUSH
 	def_bool y
-	depends on CPU_SUP_AMD && 64BIT
+	depends on (CPU_SUP_AMD || CPU_SUP_INTEL) && 64BIT
 
 config CPU_SUP_INTEL
 	default y
-- 
2.49.0




* [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (2 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 3/9] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-21 11:53   ` Borislav Petkov
  2025-05-20  1:02 ` [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly Rik van Riel
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu,
	Rik van Riel

From: Rik van Riel <riel@fb.com>

Introduce X86_FEATURE_RAR and enumeration of the feature.

[riel: moved initialization to intel.c and disabling to Kconfig.cpufeatures]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/Kconfig.cpufeatures       |  4 ++++
 arch/x86/include/asm/cpufeatures.h |  2 +-
 arch/x86/kernel/cpu/common.c       | 13 +++++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index 250c10627ab3..7d459b5f47f7 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -195,3 +195,7 @@ config X86_DISABLED_FEATURE_SEV_SNP
 config X86_DISABLED_FEATURE_INVLPGB
 	def_bool y
 	depends on !BROADCAST_TLB_FLUSH
+
+config X86_DISABLED_FEATURE_RAR
+	def_bool y
+	depends on !BROADCAST_TLB_FLUSH
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5b50e0e35129..0729c2d54109 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -76,7 +76,7 @@
 #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
 #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
-/* Free                                 ( 3*32+ 7) */
+#define X86_FEATURE_RAR			( 3*32+ 7) /* Intel Remote Action Request */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
 #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
 #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8feb8fd2957a..dd662c42f510 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1545,6 +1545,18 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 	setup_force_cpu_bug(X86_BUG_L1TF);
 }
 
+static void __init detect_rar(struct cpuinfo_x86 *c)
+{
+	u64 msr;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, msr);
+
+		if (msr & CORE_CAP_RAR)
+			setup_force_cpu_cap(X86_FEATURE_RAR);
+	}
+}
+
 /*
  * The NOPL instruction is supposed to exist on all CPUs of family >= 6;
  * unfortunately, that's not true in practice because of early VIA
@@ -1771,6 +1783,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 		setup_clear_cpu_cap(X86_FEATURE_LA57);
 
 	detect_nopl();
+	detect_rar(c);
 }
 
 void __init init_cpu_devs(void)
-- 
2.49.0




* [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (3 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-21 11:54   ` Borislav Petkov
  2025-05-21 15:16   ` Dave Hansen
  2025-05-20  1:02 ` [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations Rik van Riel
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu,
	Rik van Riel

From: Rik van Riel <riel@fb.com>

The function cpa_flush() calls __flush_tlb_one_kernel() and
flush_tlb_all().

Replacing that with a call to flush_tlb_kernel_range() allows
cpa_flush() to make use of INVLPGB or RAR without any additional
changes.

Initialize invlpgb_count_max to 1, since flush_tlb_kernel_range()
can now be called before invlpgb_count_max has been initialized
to the value read from CPUID.

[riel: remove now unused __cpa_flush_tlb]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/kernel/cpu/amd.c    |  2 +-
 arch/x86/mm/pat/set_memory.c | 20 +++++++-------------
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 93da466dfe2c..b2ad8d13211a 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -31,7 +31,7 @@
 
 #include "cpu.h"
 
-u16 invlpgb_count_max __ro_after_init;
+u16 invlpgb_count_max __ro_after_init = 1;
 
 static inline int rdmsrq_amd_safe(unsigned msr, u64 *p)
 {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 30ab4aced761..2454f5249329 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -399,15 +399,6 @@ static void cpa_flush_all(unsigned long cache)
 	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
 }
 
-static void __cpa_flush_tlb(void *data)
-{
-	struct cpa_data *cpa = data;
-	unsigned int i;
-
-	for (i = 0; i < cpa->numpages; i++)
-		flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
-}
-
 static int collapse_large_pages(unsigned long addr, struct list_head *pgtables);
 
 static void cpa_collapse_large_pages(struct cpa_data *cpa)
@@ -444,6 +435,7 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
 
 static void cpa_flush(struct cpa_data *cpa, int cache)
 {
+	unsigned long start, end;
 	unsigned int i;
 
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
@@ -453,10 +445,12 @@ static void cpa_flush(struct cpa_data *cpa, int cache)
 		goto collapse_large_pages;
 	}
 
-	if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
-		flush_tlb_all();
-	else
-		on_each_cpu(__cpa_flush_tlb, cpa, 1);
+	start = fix_addr(__cpa_addr(cpa, 0));
+	end = fix_addr(__cpa_addr(cpa, cpa->numpages));
+	if (cpa->force_flush_all)
+		end = TLB_FLUSH_ALL;
+
+	flush_tlb_kernel_range(start, end);
 
 	if (!cache)
 		goto collapse_large_pages;
-- 
2.49.0




* [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (4 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-20  9:16   ` Ingo Molnar
  2025-05-21 15:28   ` Dave Hansen
  2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu,
	Rik van Riel

From: Rik van Riel <riel@fb.com>

RAR TLB flushing is started by sending a command to the APIC.
Add the Remote Action Request APIC command and the smp_ops plumbing
needed to send it to a single CPU or to a cpumask.

[riel: move some things around to account for 6 years of changes]
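
As a usage sketch (this is roughly how later patches in the series end
up using the new hooks; it is not code added by this patch):

	/* Fast path: a single destination CPU. */
	arch_send_rar_single_ipi(cpu);

	/* Otherwise: every CPU in the destination mask. */
	arch_send_rar_ipi_mask(dest_mask);

Note that no interrupt handler is installed for RAR_VECTOR; the request
is handled by the receiving CPU itself, based on the payload and action
vector set up in later patches.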

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/apicdef.h     |  1 +
 arch/x86/include/asm/irq_vectors.h |  5 +++++
 arch/x86/include/asm/smp.h         | 15 +++++++++++++++
 arch/x86/kernel/apic/ipi.c         | 23 +++++++++++++++++++----
 arch/x86/kernel/apic/local.h       |  3 +++
 arch/x86/kernel/smp.c              |  3 +++
 6 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 094106b6a538..b152d45af91a 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -92,6 +92,7 @@
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
+#define		APIC_DM_RAR		0x00300
 #define		APIC_DM_NMI		0x00400
 #define		APIC_DM_INIT		0x00500
 #define		APIC_DM_STARTUP		0x00600
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 47051871b436..c417b0015304 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -103,6 +103,11 @@
  */
 #define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
 
+/*
+ * RAR (remote action request) TLB flush
+ */
+#define RAR_VECTOR			0xe0
+
 #define NR_VECTORS			 256
 
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 0c1c68039d6f..1ab9f5fcac8a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -40,6 +40,9 @@ struct smp_ops {
 
 	void (*send_call_func_ipi)(const struct cpumask *mask);
 	void (*send_call_func_single_ipi)(int cpu);
+
+	void (*send_rar_ipi)(const struct cpumask *mask);
+	void (*send_rar_single_ipi)(int cpu);
 };
 
 /* Globals due to paravirt */
@@ -100,6 +103,16 @@ static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 	smp_ops.send_call_func_ipi(mask);
 }
 
+static inline void arch_send_rar_single_ipi(int cpu)
+{
+	smp_ops.send_rar_single_ipi(cpu);
+}
+
+static inline void arch_send_rar_ipi_mask(const struct cpumask *mask)
+{
+	smp_ops.send_rar_ipi(mask);
+}
+
 void cpu_disable_common(void);
 void native_smp_prepare_boot_cpu(void);
 void smp_prepare_cpus_common(void);
@@ -120,6 +133,8 @@ void __noreturn mwait_play_dead(unsigned int eax_hint);
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
+void native_send_rar_ipi(const struct cpumask *mask);
+void native_send_rar_single_ipi(int cpu);
 
 asmlinkage __visible void smp_reboot_interrupt(void);
 __visible void smp_reschedule_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 98a57cb4aa86..e5e9fc08f86c 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -79,7 +79,7 @@ void native_send_call_func_single_ipi(int cpu)
 	__apic_send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
-void native_send_call_func_ipi(const struct cpumask *mask)
+static void do_native_send_ipi(const struct cpumask *mask, int vector)
 {
 	if (static_branch_likely(&apic_use_ipi_shorthand)) {
 		unsigned int cpu = smp_processor_id();
@@ -88,14 +88,19 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 			goto sendmask;
 
 		if (cpumask_test_cpu(cpu, mask))
-			__apic_send_IPI_all(CALL_FUNCTION_VECTOR);
+			__apic_send_IPI_all(vector);
 		else if (num_online_cpus() > 1)
-			__apic_send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+			__apic_send_IPI_allbutself(vector);
 		return;
 	}
 
 sendmask:
-	__apic_send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+	__apic_send_IPI_mask(mask, vector);
+}
+
+void native_send_call_func_ipi(const struct cpumask *mask)
+{
+	do_native_send_ipi(mask, CALL_FUNCTION_VECTOR);
 }
 
 void apic_send_nmi_to_offline_cpu(unsigned int cpu)
@@ -106,6 +111,16 @@ void apic_send_nmi_to_offline_cpu(unsigned int cpu)
 		return;
 	apic->send_IPI(cpu, NMI_VECTOR);
 }
+
+void native_send_rar_single_ipi(int cpu)
+{
+	apic->send_IPI_mask(cpumask_of(cpu), RAR_VECTOR);
+}
+
+void native_send_rar_ipi(const struct cpumask *mask)
+{
+	do_native_send_ipi(mask, RAR_VECTOR);
+}
 #endif /* CONFIG_SMP */
 
 static inline int __prepare_ICR2(unsigned int mask)
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index bdcf609eb283..833669174267 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -38,6 +38,9 @@ static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 	case NMI_VECTOR:
 		icr |= APIC_DM_NMI;
 		break;
+	case RAR_VECTOR:
+		icr |= APIC_DM_RAR;
+		break;
 	}
 	return icr;
 }
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 18266cc3d98c..2c51ed6aaf03 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -297,5 +297,8 @@ struct smp_ops smp_ops = {
 
 	.send_call_func_ipi	= native_send_call_func_ipi,
 	.send_call_func_single_ipi = native_send_call_func_single_ipi,
+
+	.send_rar_ipi		= native_send_rar_ipi,
+	.send_rar_single_ipi	= native_send_rar_single_ipi,
 };
 EXPORT_SYMBOL_GPL(smp_ops);
-- 
2.49.0




* [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (5 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-20  9:28   ` Ingo Molnar
                     ` (2 more replies)
  2025-05-20  1:02 ` [RFC v2 8/9] x86/mm: use RAR for kernel TLB flushes Rik van Riel
  2025-05-20  1:02 ` [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR Rik van Riel
  8 siblings, 3 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu, Rik van Riel

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Remote Action Request (RAR) is a TLB flushing broadcast facility.
To start a TLB flush, the initiating CPU creates a RAR payload and
sends a command to the APIC.  The receiving CPUs automatically flush
TLBs as specified in the payload, without any involvement of the
kernel on the receiving side.

[ riel: add pcid parameter to smp_call_rar_many so other mms can be flushed ]
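
For reference, the initiator side of smp_call_rar_many() below boils
down to roughly the following (condensed sketch, not the literal code):

	idx = get_payload();                   /* claim a slot in the payload table */
	set_payload(idx, pcid, start, pages);  /* describe what to invalidate */
	for_each_cpu(cpu, dest_mask)
		set_action_entry(idx, cpu);    /* mark "start" in each target's action vector */
	arch_send_rar_ipi_mask(dest_mask);     /* ring the doorbell through the APIC */
	for_each_cpu(cpu, dest_mask)
		wait_for_done(idx, cpu);       /* spin until each target CPU acks */
	free_payload(idx);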

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/rar.h   |  69 +++++++++++++
 arch/x86/kernel/cpu/common.c |   4 +
 arch/x86/mm/Makefile         |   1 +
 arch/x86/mm/rar.c            | 195 +++++++++++++++++++++++++++++++++++
 4 files changed, 269 insertions(+)
 create mode 100644 arch/x86/include/asm/rar.h
 create mode 100644 arch/x86/mm/rar.c

diff --git a/arch/x86/include/asm/rar.h b/arch/x86/include/asm/rar.h
new file mode 100644
index 000000000000..78c039e40e81
--- /dev/null
+++ b/arch/x86/include/asm/rar.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_RAR_H
+#define _ASM_X86_RAR_H
+
+/*
+ * RAR payload types
+ */
+#define RAR_TYPE_INVPG		0
+#define RAR_TYPE_INVPG_NO_CR3	1
+#define RAR_TYPE_INVPCID	2
+#define RAR_TYPE_INVEPT		3
+#define RAR_TYPE_INVVPID	4
+#define RAR_TYPE_WRMSR		5
+
+/*
+ * Subtypes for RAR_TYPE_INVLPG
+ */
+#define RAR_INVPG_ADDR			0 /* address specific */
+#define RAR_INVPG_ALL			2 /* all, include global */
+#define RAR_INVPG_ALL_NO_GLOBAL		3 /* all, exclude global */
+
+/*
+ * Subtypes for RAR_TYPE_INVPCID
+ */
+#define RAR_INVPCID_ADDR		0 /* address specific */
+#define RAR_INVPCID_PCID		1 /* all of PCID */
+#define RAR_INVPCID_ALL			2 /* all, include global */
+#define RAR_INVPCID_ALL_NO_GLOBAL	3 /* all, exclude global */
+
+/*
+ * Page size for RAR_TYPE_INVLPG
+ */
+#define RAR_INVLPG_PAGE_SIZE_4K		0
+#define RAR_INVLPG_PAGE_SIZE_2M		1
+#define RAR_INVLPG_PAGE_SIZE_1G		2
+
+/*
+ * Max number of pages per payload
+ */
+#define RAR_INVLPG_MAX_PAGES 63
+
+struct rar_payload {
+	u64 for_sw		: 8;
+	u64 type		: 8;
+	u64 must_be_zero_1	: 16;
+	u64 subtype		: 3;
+	u64 page_size		: 2;
+	u64 num_pages		: 6;
+	u64 must_be_zero_2	: 21;
+
+	u64 must_be_zero_3;
+
+	/*
+	 * Starting address
+	 */
+	u64 initiator_cr3;
+	u64 linear_address;
+
+	/*
+	 * Padding
+	 */
+	u64 padding[4];
+};
+
+void rar_cpu_init(void);
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end);
+
+#endif /* _ASM_X86_RAR_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index dd662c42f510..b1e1b9afb2ac 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -71,6 +71,7 @@
 #include <asm/tdx.h>
 #include <asm/posted_intr.h>
 #include <asm/runtime-const.h>
+#include <asm/rar.h>
 
 #include "cpu.h"
 
@@ -2438,6 +2439,9 @@ void cpu_init(void)
 	if (is_uv_system())
 		uv_cpu_init();
 
+	if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_cpu_init();
+
 	load_fixmap_gdt(cpu);
 }
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5b9908f13dcf..f36fc99e8b10 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
+obj-$(CONFIG_BROADCAST_TLB_FLUSH)		+= rar.o
 
 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
diff --git a/arch/x86/mm/rar.c b/arch/x86/mm/rar.c
new file mode 100644
index 000000000000..16dc9b889cbd
--- /dev/null
+++ b/arch/x86/mm/rar.c
@@ -0,0 +1,195 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * RAR TLB shootdown
+ */
+#include <linux/sched.h>
+#include <linux/bug.h>
+#include <asm/current.h>
+#include <asm/io.h>
+#include <asm/sync_bitops.h>
+#include <asm/rar.h>
+#include <asm/tlbflush.h>
+
+static DEFINE_PER_CPU(struct cpumask, rar_cpu_mask);
+
+#define RAR_ACTION_OK		0x00
+#define RAR_ACTION_START	0x01
+#define RAR_ACTION_ACKED	0x02
+#define RAR_ACTION_FAIL		0x80
+
+#define RAR_MAX_PAYLOADS 32UL
+
+static unsigned long rar_in_use = ~(RAR_MAX_PAYLOADS - 1);
+static struct rar_payload rar_payload[RAR_MAX_PAYLOADS] __page_aligned_bss;
+static DEFINE_PER_CPU_ALIGNED(u8[RAR_MAX_PAYLOADS], rar_action);
+
+static unsigned long get_payload(void)
+{
+	while (1) {
+		unsigned long bit;
+
+		/*
+		 * Find a free bit and confirm it with
+		 * test_and_set_bit() below.
+		 */
+		bit = ffz(READ_ONCE(rar_in_use));
+
+		if (bit >= RAR_MAX_PAYLOADS)
+			continue;
+
+		if (!test_and_set_bit((long)bit, &rar_in_use))
+			return bit;
+	}
+}
+
+static void free_payload(unsigned long idx)
+{
+	clear_bit(idx, &rar_in_use);
+}
+
+static void set_payload(unsigned long idx, u16 pcid, unsigned long start,
+			uint32_t pages)
+{
+	struct rar_payload *p = &rar_payload[idx];
+
+	p->must_be_zero_1	= 0;
+	p->must_be_zero_2	= 0;
+	p->must_be_zero_3	= 0;
+	p->page_size		= RAR_INVLPG_PAGE_SIZE_4K;
+	p->type			= RAR_TYPE_INVPCID;
+	p->num_pages		= pages;
+	p->initiator_cr3	= pcid;
+	p->linear_address	= start;
+
+	if (pcid) {
+		/* RAR invalidation of the mapping of a specific process. */
+		if (pages >= RAR_INVLPG_MAX_PAGES)
+			p->subtype = RAR_INVPCID_PCID;
+		else
+			p->subtype = RAR_INVPCID_ADDR;
+	} else {
+		/*
+		 * Unfortunately RAR_INVPCID_ADDR excludes global translations.
+		 * Always do a full flush for kernel invalidations.
+		 */
+		p->subtype = RAR_INVPCID_ALL;
+	}
+
+	smp_wmb();
+}
+
+static void set_action_entry(unsigned long idx, int target_cpu)
+{
+	u8 *bitmap = per_cpu(rar_action, target_cpu);
+
+	WRITE_ONCE(bitmap[idx], RAR_ACTION_START);
+}
+
+static void wait_for_done(unsigned long idx, int target_cpu)
+{
+	u8 status;
+	u8 *rar_actions = per_cpu(rar_action, target_cpu);
+
+	status = READ_ONCE(rar_actions[idx]);
+
+	while ((status != RAR_ACTION_OK) && (status != RAR_ACTION_FAIL)) {
+		cpu_relax();
+		status = READ_ONCE(rar_actions[idx]);
+	}
+
+	WARN_ON_ONCE(rar_actions[idx] == RAR_ACTION_FAIL);
+}
+
+void rar_cpu_init(void)
+{
+	u64 r;
+	u8 *bitmap;
+	int this_cpu = smp_processor_id();
+
+	cpumask_clear(&per_cpu(rar_cpu_mask, this_cpu));
+
+	rdmsrl(MSR_IA32_RAR_INFO, r);
+	pr_info_once("RAR: support %lld payloads\n", r >> 32);
+
+	bitmap = (u8 *)per_cpu(rar_action, this_cpu);
+	memset(bitmap, 0, RAR_MAX_PAYLOADS);
+	wrmsrl(MSR_IA32_RAR_ACT_VEC, (u64)virt_to_phys(bitmap));
+	wrmsrl(MSR_IA32_RAR_PAYLOAD_BASE, (u64)virt_to_phys(rar_payload));
+
+	r = RAR_CTRL_ENABLE | RAR_CTRL_IGNORE_IF;
+	// reserved bits!!! r |= (RAR_VECTOR & 0xff);
+	wrmsrl(MSR_IA32_RAR_CTRL, r);
+}
+
+/*
+ * This is a modified version of smp_call_function_many() of kernel/smp.c,
+ * without a function pointer, because the RAR handler is the ucode.
+ */
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end)
+{
+	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
+	int cpu, next_cpu, this_cpu = smp_processor_id();
+	cpumask_t *dest_mask;
+	unsigned long idx;
+
+	if (pages > RAR_INVLPG_MAX_PAGES || end == TLB_FLUSH_ALL)
+		pages = RAR_INVLPG_MAX_PAGES;
+
+	/*
+	 * Can deadlock when called with interrupts disabled.
+	 * We allow cpu's that are not yet online though, as no one else can
+	 * send smp call function interrupt to this cpu and as such deadlocks
+	 * can't happen.
+	 */
+	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
+		     && !oops_in_progress && !early_boot_irqs_disabled);
+
+	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */
+	cpu = cpumask_first_and(mask, cpu_online_mask);
+	if (cpu == this_cpu)
+		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
+
+	/* No online cpus?  We're done. */
+	if (cpu >= nr_cpu_ids)
+		return;
+
+	/* Do we have another CPU which isn't us? */
+	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
+	if (next_cpu == this_cpu)
+		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+
+	/* Fastpath: do that cpu by itself. */
+	if (next_cpu >= nr_cpu_ids) {
+		idx = get_payload();
+		set_payload(idx, pcid, start, pages);
+		set_action_entry(idx, cpu);
+		arch_send_rar_single_ipi(cpu);
+		wait_for_done(idx, cpu);
+		free_payload(idx);
+		return;
+	}
+
+	dest_mask = this_cpu_ptr(&rar_cpu_mask);
+	cpumask_and(dest_mask, mask, cpu_online_mask);
+	cpumask_clear_cpu(this_cpu, dest_mask);
+
+	/* Some callers race with other cpus changing the passed mask */
+	if (unlikely(!cpumask_weight(dest_mask)))
+		return;
+
+	idx = get_payload();
+	set_payload(idx, pcid, start, pages);
+
+	for_each_cpu(cpu, dest_mask)
+		set_action_entry(idx, cpu);
+
+	/* Send a message to all CPUs in the map */
+	arch_send_rar_ipi_mask(dest_mask);
+
+	for_each_cpu(cpu, dest_mask)
+		wait_for_done(idx, cpu);
+
+	free_payload(idx);
+}
+EXPORT_SYMBOL(smp_call_rar_many);
-- 
2.49.0




* [RFC v2 8/9] x86/mm: use RAR for kernel TLB flushes
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (6 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-20  1:02 ` [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR Rik van Riel
  8 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Rik van Riel

From: Rik van Riel <riel@fb.com>

Use Intel RAR for kernel TLB flushes, when enabled.

Pass in PCID 0 to smp_call_rar_many() to flush the specified addresses,
regardless of which PCID they might be cached under on any destination CPU.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/mm/rar.c |  4 ++--
 arch/x86/mm/tlb.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/rar.c b/arch/x86/mm/rar.c
index 16dc9b889cbd..9a18c926ea7b 100644
--- a/arch/x86/mm/rar.c
+++ b/arch/x86/mm/rar.c
@@ -142,8 +142,8 @@ void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress && !early_boot_irqs_disabled);
+	if (cpu_online(this_cpu) && !oops_in_progress && !early_boot_irqs_disabled)
+		lockdep_assert_irqs_enabled();
 
 	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f5761e8be77f..35489df811dc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -21,6 +21,7 @@
 #include <asm/apic.h>
 #include <asm/msr.h>
 #include <asm/perf_event.h>
+#include <asm/rar.h>
 #include <asm/tlb.h>
 
 #include "mm_internal.h"
@@ -1446,6 +1447,18 @@ static void do_flush_tlb_all(void *info)
 	__flush_tlb_all();
 }
 
+static void rar_full_flush(const cpumask_t *cpumask)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpumask, 0, 0, TLB_FLUSH_ALL);
+	invpcid_flush_all();
+}
+
+static void rar_flush_all(void)
+{
+	rar_full_flush(cpu_online_mask);
+}
+
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
@@ -1453,6 +1466,8 @@ void flush_tlb_all(void)
 	/* First try (faster) hardware-assisted TLB invalidation. */
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		/* Fall back to the IPI-based invalidation. */
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
@@ -1482,15 +1497,36 @@ static void do_kernel_range_flush(void *info)
 	struct flush_tlb_info *f = info;
 	unsigned long addr;
 
+	/*
+	 * With PTI kernel TLB entries in all PCIDs need to be flushed.
+	 * With RAR the PCID space becomes so large, we might as well flush it all.
+	 *
+	 * Either of the two by itself works with targeted flushes.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_RAR) &&
+	    cpu_feature_enabled(X86_FEATURE_PTI)) {
+		invpcid_flush_all();
+		return;
+	}
+
 	/* flush range by one by one 'invlpg' */
 	for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
 		flush_tlb_one_kernel(addr);
 }
 
+static void rar_kernel_range_flush(struct flush_tlb_info *info)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpu_online_mask, 0, info->start, info->end);
+	do_kernel_range_flush(info);
+}
+
 static void kernel_tlb_flush_all(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
 }
@@ -1499,6 +1535,8 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_kernel_range_flush(info);
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_kernel_range_flush(info);
 	else
 		on_each_cpu(do_kernel_range_flush, info, 1);
 }
-- 
2.49.0




* [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR
  2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
                   ` (7 preceding siblings ...)
  2025-05-20  1:02 ` [RFC v2 8/9] x86/mm: use RAR for kernel TLB flushes Rik van Riel
@ 2025-05-20  1:02 ` Rik van Riel
  2025-05-20  2:48   ` [RFC v2.1 " Rik van Riel
  8 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  1:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Rik van Riel

From: Rik van Riel <riel@fb.com>

Use Intel RAR to flush userspace mappings.

Because RAR flushes are targeted using a cpu bitmap, the rules are
a little bit different than for true broadcast TLB invalidation.

For true broadcast TLB invalidation, like done with AMD INVLPGB,
a global ASID always has up to date TLB entries on every CPU.
The context switch code never has to flush the TLB when switching
to a global ASID on any CPU with INVLPGB.

For RAR, the TLB mappings for a global ASID are kept up to date
only on CPUs within the mm_cpumask, which lazily follows the
threads around the system. The context switch code does not
need to flush the TLB if the CPU is in the mm_cpumask, and
the PCID used stays the same.

However, a CPU that falls outside of the mm_cpumask can have
out of date TLB mappings for this task. When switching to
that task on a CPU not in the mm_cpumask, the TLB does need
to be flushed.
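
Condensed sketch of the resulting userspace flush path (it mirrors the
rar_tlb_flush() added below, and is not the literal code):

	/* Flush the remote CPUs that are in the mm_cpumask, via RAR. */
	smp_call_rar_many(mm_cpumask(mm), kern_pcid(asid), start, end);
	if (cpu_feature_enabled(X86_FEATURE_PTI))
		smp_call_rar_many(mm_cpumask(mm), user_pcid(asid), start, end);

	/* smp_call_rar_many() skips the calling CPU; flush it locally. */
	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
		local_irq_disable();
		flush_tlb_func(info);
		local_irq_enable();
	}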

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/tlbflush.h |   9 ++-
 arch/x86/mm/tlb.c               | 121 ++++++++++++++++++++++++++------
 2 files changed, 104 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cc9935bbbd45..bdde3ce6c9b1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -276,7 +276,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return 0;
 
 	asid = smp_load_acquire(&mm->context.global_asid);
@@ -289,7 +290,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 
 static inline void mm_init_global_asid(struct mm_struct *mm)
 {
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		mm->context.global_asid = 0;
 		mm->context.asid_transition = false;
 	}
@@ -313,7 +315,8 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm)
 
 static inline bool mm_in_asid_transition(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	return mm && READ_ONCE(mm->context.asid_transition);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 35489df811dc..51658bdaa0b3 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -203,7 +203,8 @@ struct new_asid {
 	unsigned int need_flush : 1;
 };
 
-static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
+static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
+				       bool new_cpu)
 {
 	struct new_asid ns;
 	u16 asid;
@@ -216,14 +217,22 @@ static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
 
 	/*
 	 * TLB consistency for global ASIDs is maintained with hardware assisted
-	 * remote TLB flushing. Global ASIDs are always up to date.
+	 * remote TLB flushing. Global ASIDs are always up to date with INVLPGB,
+	 * and up to date for CPUs in the mm_cpumask with RAR.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		u16 global_asid = mm_global_asid(next);
 
 		if (global_asid) {
 			ns.asid = global_asid;
 			ns.need_flush = 0;
+			/*
+			 * If the CPU fell out of the cpumask, it can be
+			 * out of date with RAR, and should be flushed.
+			 */
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				ns.need_flush = new_cpu;
 			return ns;
 		}
 	}
@@ -281,7 +290,14 @@ static void reset_global_asid_space(void)
 {
 	lockdep_assert_held(&global_asid_lock);
 
-	invlpgb_flush_all_nonglobals();
+	/*
+	 * The global flush ensures that a freshly allocated global ASID
+	 * has no entries in any TLB, and can be used immediately.
+	 * With Intel RAR, the TLB may still need to be flushed at context
+	 * switch time when dealing with a CPU that was not in the mm_cpumask
+	 * for the process, and may have missed flushes along the way.
+	 */
+	flush_tlb_all();
 
 	/*
 	 * The TLB flush above makes it safe to re-use the previously
@@ -358,7 +374,7 @@ static void use_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* This process is already using broadcast TLB invalidation. */
 	if (mm_global_asid(mm))
@@ -384,13 +400,14 @@ static void use_global_asid(struct mm_struct *mm)
 
 void mm_free_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	if (!mm_global_asid(mm))
 		return;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* The global ASID can be re-used only after flush at wrap-around. */
 #ifdef CONFIG_BROADCAST_TLB_FLUSH
@@ -408,7 +425,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
 {
 	u16 global_asid = mm_global_asid(mm);
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	/* Process is transitioning to a global ASID */
@@ -426,13 +444,17 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	/* Check every once in a while. */
 	if ((current->pid & 0x1f) != (jiffies & 0x1f))
 		return;
 
+	if (mm == &init_mm)
+		return;
+
 	/*
 	 * Assign a global ASID if the process is active on
 	 * 4 or more CPUs simultaneously.
@@ -480,7 +502,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	mm_clear_asid_transition(mm);
 }
 
-static void broadcast_tlb_flush(struct flush_tlb_info *info)
+static void invlpgb_tlb_flush(struct flush_tlb_info *info)
 {
 	bool pmd = info->stride_shift == PMD_SHIFT;
 	unsigned long asid = mm_global_asid(info->mm);
@@ -511,8 +533,6 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		addr += nr << info->stride_shift;
 	} while (addr < info->end);
 
-	finish_asid_transition(info);
-
 	/* Wait for the INVLPGBs kicked off above to finish. */
 	__tlbsync();
 }
@@ -840,7 +860,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		/* Check if the current mm is transitioning to a global ASID */
 		if (mm_needs_global_asid(next, prev_asid)) {
 			next_tlb_gen = atomic64_read(&next->context.tlb_gen);
-			ns = choose_new_asid(next, next_tlb_gen);
+			ns = choose_new_asid(next, next_tlb_gen, true);
 			goto reload_tlb;
 		}
 
@@ -878,6 +898,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		ns.asid = prev_asid;
 		ns.need_flush = true;
 	} else {
+		bool new_cpu = false;
 		/*
 		 * Apply process to process speculation vulnerability
 		 * mitigations if applicable.
@@ -892,20 +913,25 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 
-		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
+		/* Start receiving IPIs and RAR invalidations */
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next))) {
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				new_cpu = true;
+		}
+
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, new_cpu);
 	}
 
 reload_tlb:
 	new_lam = mm_lam_cr3_mask(next);
 	if (ns.need_flush) {
-		VM_WARN_ON_ONCE(is_global_asid(ns.asid));
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		if (is_dyn_asid(ns.asid)) {
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		}
 		load_new_mm_cr3(next->pgd, ns.asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -1122,8 +1148,13 @@ static void flush_tlb_func(void *info)
 		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	}
 
-	/* Broadcast ASIDs are always kept up to date with INVLPGB. */
-	if (is_global_asid(loaded_mm_asid))
+	/*
+	 * Broadcast ASIDs are always kept up to date with INVLPGB; with
+	 * Intel RAR IPI based flushes are used periodically to trim the
+	 * mm_cpumask, and flushes that get here should be processed.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    is_global_asid(loaded_mm_asid))
 		return;
 
 	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
@@ -1358,6 +1389,35 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
 static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 #endif
 
+static void rar_tlb_flush(struct flush_tlb_info *info)
+{
+	unsigned long asid = mm_global_asid(info->mm);
+	u16 pcid = kern_pcid(asid);
+
+	/* Flush the remote CPUs. */
+	smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
+
+	/* Flush the local TLB, if needed. */
+	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(info->mm))) {
+		lockdep_assert_irqs_enabled();
+		local_irq_disable();
+		flush_tlb_func(info);
+		local_irq_enable();
+	}
+}
+
+static void broadcast_tlb_flush(struct flush_tlb_info *info)
+{
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		invlpgb_tlb_flush(info);
+	else /* Intel RAR */
+		rar_tlb_flush(info);
+
+	finish_asid_transition(info);
+}
+
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
@@ -1418,15 +1478,22 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
 				  new_tlb_gen);
 
+	/*
+	 * IPIs and RAR can be targeted to a cpumask. Periodically trim that
+	 * mm_cpumask by sending TLB flush IPIs, even when most TLB flushes
+	 * are done with RAR.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) || !mm_global_asid(mm))
+		info->trim_cpumask = should_trim_cpumask(mm);
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (mm_global_asid(mm)) {
+	if (mm_global_asid(mm) && !info->trim_cpumask) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
@@ -1737,6 +1804,14 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
+	} else if (cpu_feature_enabled(X86_FEATURE_RAR) && cpumask_any(&batch->cpumask) < nr_cpu_ids) {
+		rar_full_flush(&batch->cpumask);
+		if (cpumask_test_cpu(cpu, &batch->cpumask)) {
+			lockdep_assert_irqs_enabled();
+			local_irq_disable();
+			invpcid_flush_all_nonglobals();
+			local_irq_enable();
+		}
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
-- 
2.49.0




* Re: [RFC v2.1 9/9] x86/mm: userspace & pageout flushing using Intel RAR
  2025-05-20  1:02 ` [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR Rik van Riel
@ 2025-05-20  2:48   ` Rik van Riel
  0 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20  2:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel

On Mon, 19 May 2025 21:02:34 -0400
Rik van Riel <riel@surriel.com> wrote:

> From: Rik van Riel <riel@fb.com>
> 
> Use Intel RAR to flush userspace mappings.

The version below no longer segfaults.

However, I am still hitting the WARN_ON() in leave_mm(),
when called from the idle task through cpuidle_enter_state().

---8<---
From e80e10cdb6f15d29a65ab438cb07ba4b99f64b6e Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Thu, 24 Apr 2025 07:15:44 -0700
Subject: [PATCH 10/11] x86/mm: userspace & pageout flushing using Intel RAR

Use Intel RAR to flush userspace mappings.

Because RAR flushes are targeted using a cpu bitmap, the rules are
a little bit different than for true broadcast TLB invalidation.

For true broadcast TLB invalidation, like done with AMD INVLPGB,
a global ASID always has up to date TLB entries on every CPU.
The context switch code never has to flush the TLB when switching
to a global ASID on any CPU with INVLPGB.

For RAR, the TLB mappings for a global ASID are kept up to date
only on CPUs within the mm_cpumask, which lazily follows the
threads around the system. The context switch code does not
need to flush the TLB if the CPU is in the mm_cpumask, and
the PCID used stays the same.

However, a CPU that falls outside of the mm_cpumask can have
out of date TLB mappings for this task. When switching to
that task on a CPU not in the mm_cpumask, the TLB does need
to be flushed.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/tlbflush.h |   9 ++-
 arch/x86/mm/tlb.c               | 133 +++++++++++++++++++++++++-------
 2 files changed, 111 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cc9935bbbd45..bdde3ce6c9b1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -276,7 +276,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return 0;
 
 	asid = smp_load_acquire(&mm->context.global_asid);
@@ -289,7 +290,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 
 static inline void mm_init_global_asid(struct mm_struct *mm)
 {
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		mm->context.global_asid = 0;
 		mm->context.asid_transition = false;
 	}
@@ -313,7 +315,8 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm)
 
 static inline bool mm_in_asid_transition(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	return mm && READ_ONCE(mm->context.asid_transition);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 35489df811dc..457191c2b5de 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -203,7 +203,8 @@ struct new_asid {
 	unsigned int need_flush : 1;
 };
 
-static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
+static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
+				       bool new_cpu)
 {
 	struct new_asid ns;
 	u16 asid;
@@ -216,14 +217,22 @@ static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
 
 	/*
 	 * TLB consistency for global ASIDs is maintained with hardware assisted
-	 * remote TLB flushing. Global ASIDs are always up to date.
+	 * remote TLB flushing. Global ASIDs are always up to date with INVLPGB,
+	 * and up to date for CPUs in the mm_cpumask with RAR.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		u16 global_asid = mm_global_asid(next);
 
 		if (global_asid) {
 			ns.asid = global_asid;
 			ns.need_flush = 0;
+			/*
+			 * If the CPU fell out of the cpumask, it can be
+			 * out of date with RAR, and should be flushed.
+			 */
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				ns.need_flush = new_cpu;
 			return ns;
 		}
 	}
@@ -281,7 +290,14 @@ static void reset_global_asid_space(void)
 {
 	lockdep_assert_held(&global_asid_lock);
 
-	invlpgb_flush_all_nonglobals();
+	/*
+	 * The global flush ensures that a freshly allocated global ASID
+	 * has no entries in any TLB, and can be used immediately.
+	 * With Intel RAR, the TLB may still need to be flushed at context
+	 * switch time when dealing with a CPU that was not in the mm_cpumask
+	 * for the process, and may have missed flushes along the way.
+	 */
+	flush_tlb_all();
 
 	/*
 	 * The TLB flush above makes it safe to re-use the previously
@@ -358,7 +374,7 @@ static void use_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* This process is already using broadcast TLB invalidation. */
 	if (mm_global_asid(mm))
@@ -384,13 +400,14 @@ static void use_global_asid(struct mm_struct *mm)
 
 void mm_free_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	if (!mm_global_asid(mm))
 		return;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* The global ASID can be re-used only after flush at wrap-around. */
 #ifdef CONFIG_BROADCAST_TLB_FLUSH
@@ -408,7 +425,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
 {
 	u16 global_asid = mm_global_asid(mm);
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	/* Process is transitioning to a global ASID */
@@ -426,13 +444,17 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	/* Check every once in a while. */
 	if ((current->pid & 0x1f) != (jiffies & 0x1f))
 		return;
 
+	if (mm == &init_mm)
+		return;
+
 	/*
 	 * Assign a global ASID if the process is active on
 	 * 4 or more CPUs simultaneously.
@@ -480,7 +502,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	mm_clear_asid_transition(mm);
 }
 
-static void broadcast_tlb_flush(struct flush_tlb_info *info)
+static void invlpgb_tlb_flush(struct flush_tlb_info *info)
 {
 	bool pmd = info->stride_shift == PMD_SHIFT;
 	unsigned long asid = mm_global_asid(info->mm);
@@ -511,8 +533,6 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		addr += nr << info->stride_shift;
 	} while (addr < info->end);
 
-	finish_asid_transition(info);
-
 	/* Wait for the INVLPGBs kicked off above to finish. */
 	__tlbsync();
 }
@@ -840,7 +860,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		/* Check if the current mm is transitioning to a global ASID */
 		if (mm_needs_global_asid(next, prev_asid)) {
 			next_tlb_gen = atomic64_read(&next->context.tlb_gen);
-			ns = choose_new_asid(next, next_tlb_gen);
+			ns = choose_new_asid(next, next_tlb_gen, true);
 			goto reload_tlb;
 		}
 
@@ -878,6 +898,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		ns.asid = prev_asid;
 		ns.need_flush = true;
 	} else {
+		bool new_cpu = false;
 		/*
 		 * Apply process to process speculation vulnerability
 		 * mitigations if applicable.
@@ -892,20 +913,25 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 
-		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
+		/* Start receiving IPIs and RAR invalidations */
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next))) {
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				new_cpu = true;
+		}
+
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, new_cpu);
 	}
 
 reload_tlb:
 	new_lam = mm_lam_cr3_mask(next);
 	if (ns.need_flush) {
-		VM_WARN_ON_ONCE(is_global_asid(ns.asid));
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		if (is_dyn_asid(ns.asid)) {
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		}
 		load_new_mm_cr3(next->pgd, ns.asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -1096,7 +1122,7 @@ static void flush_tlb_func(void *info)
 	u64 local_tlb_gen;
 	bool local = smp_processor_id() == f->initiating_cpu;
 	unsigned long nr_invalidate = 0;
-	u64 mm_tlb_gen;
+	u64 mm_tlb_gen = 0;
 
 	/* This code cannot presently handle being reentered. */
 	VM_WARN_ON(!irqs_disabled());
@@ -1122,12 +1148,17 @@ static void flush_tlb_func(void *info)
 		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	}
 
-	/* Broadcast ASIDs are always kept up to date with INVLPGB. */
-	if (is_global_asid(loaded_mm_asid))
+	/*
+	 * Broadcast ASIDs are always kept up to date with INVLPGB; with
+	 * Intel RAR IPI based flushes are used periodically to trim the
+	 * mm_cpumask, and flushes that get here should be processed.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    is_global_asid(loaded_mm_asid))
 		return;
 
-	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
-		   loaded_mm->context.ctx_id);
+	VM_WARN_ON(is_dyn_asid(loaded_mm_asid) && loaded_mm->context.ctx_id !=
+		   this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id));
 
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
 		/*
@@ -1143,7 +1174,8 @@ static void flush_tlb_func(void *info)
 		return;
 	}
 
-	local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+	if (is_dyn_asid(loaded_mm_asid))
+		local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
 
 	if (unlikely(f->new_tlb_gen != TLB_GENERATION_INVALID &&
 		     f->new_tlb_gen <= local_tlb_gen)) {
@@ -1242,7 +1274,8 @@ static void flush_tlb_func(void *info)
 	}
 
 	/* Both paths above update our state to mm_tlb_gen. */
-	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+	if (is_dyn_asid(loaded_mm_asid))
+		this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
 
 	/* Tracing is done in a unified manner to reduce the code size */
 done:
@@ -1358,6 +1391,35 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
 static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 #endif
 
+static void rar_tlb_flush(struct flush_tlb_info *info)
+{
+	unsigned long asid = mm_global_asid(info->mm);
+	u16 pcid = kern_pcid(asid);
+
+	/* Flush the remote CPUs. */
+	smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
+
+	/* Flush the local TLB, if needed. */
+	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(info->mm))) {
+		lockdep_assert_irqs_enabled();
+		local_irq_disable();
+		flush_tlb_func(info);
+		local_irq_enable();
+	}
+}
+
+static void broadcast_tlb_flush(struct flush_tlb_info *info)
+{
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		invlpgb_tlb_flush(info);
+	else /* Intel RAR */
+		rar_tlb_flush(info);
+
+	finish_asid_transition(info);
+}
+
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
@@ -1418,15 +1480,22 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
 				  new_tlb_gen);
 
+	/*
+	 * IPIs and RAR can be targeted to a cpumask. Periodically trim that
+	 * mm_cpumask by sending TLB flush IPIs, even when most TLB flushes
+	 * are done with RAR.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) || !mm_global_asid(mm))
+		info->trim_cpumask = should_trim_cpumask(mm);
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (mm_global_asid(mm)) {
+	if (mm_global_asid(mm) && !info->trim_cpumask) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
@@ -1737,6 +1806,14 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
+	} else if (cpu_feature_enabled(X86_FEATURE_RAR) && cpumask_any(&batch->cpumask) < nr_cpu_ids) {
+		rar_full_flush(&batch->cpumask);
+		if (cpumask_test_cpu(cpu, &batch->cpumask)) {
+			lockdep_assert_irqs_enabled();
+			local_irq_disable();
+			invpcid_flush_all_nonglobals();
+			local_irq_enable();
+		}
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations
  2025-05-20  1:02 ` [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations Rik van Riel
@ 2025-05-20  9:16   ` Ingo Molnar
  2025-06-04  0:11     ` Rik van Riel
  2025-05-21 15:28   ` Dave Hansen
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2025-05-20  9:16 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Rik van Riel,
	Yu-cheng Yu


* Rik van Riel <riel@surriel.com> wrote:

> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 47051871b436..c417b0015304 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -103,6 +103,11 @@
>   */
>  #define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
>  
> +/*
> + * RAR (remote action request) TLB flush
> + */
> +#define RAR_VECTOR			0xe0
> +
>  #define NR_VECTORS			 256

This subtly breaks x86 IRQ vector allocation AFAICS.

Right now device IRQ vectors are allocated from 0x81 to 
FIRST_SYSTEM_VECTOR (POSTED_MSI_NOTIFICATION_VECTOR) or 0xeb.

But RAR_VECTOR is within that range, so the IRQ allocator will overlap
it and result in what I guess will be misbehaving RAR code and
misbehaving device IRQ handling once it hands out 0xe0 as well.

So you need to lower NR_EXTERNAL_VECTORS for there to be no overlap 
between device IRQ vectors and system IRQ vectors.

This will substantially compress the available device vector space
from ~108 vectors to ~95 vectors, a ~12% reduction. RAR, under the 
current device IRQ vector allocator, will effectively reduce the number 
of vectors not by 1 vector, but by 13 vectors. This should be pointed 
out in the changelog.

It probably doesn't matter much due to MSI multiplexing, but should 
nevertheless be implemented correctly and should be documented.
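
For illustration, the kind of adjustment I mean (a sketch only - whether
FIRST_SYSTEM_VECTOR is the right knob to move, and the exact layout, is
an assumption here, not something I checked against the allocator):

/* arch/x86/include/asm/irq_vectors.h - sketch only */

/* RAR (remote action request) TLB flush, fixed at 0xe0 */
#define RAR_VECTOR			0xe0

/*
 * Make RAR_VECTOR the lowest system vector so the device vector
 * range (and with it NR_EXTERNAL_VECTORS) ends below it and the
 * allocator can never hand the RAR vector out to a device.
 */
#define FIRST_SYSTEM_VECTOR		RAR_VECTOR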

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
@ 2025-05-20  9:28   ` Ingo Molnar
  2025-05-20 12:57     ` Rik van Riel
  2025-05-20 11:29   ` Nadav Amit
  2025-05-21 16:38   ` Dave Hansen
  2 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2025-05-20  9:28 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Yu-cheng Yu


* Rik van Riel <riel@surriel.com> wrote:

> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Remote Action Request (RAR) is a TLB flushing broadcast facility.
> To start a TLB flush, the initiator CPU creates a RAR payload and
> sends a command to the APIC.  The receiving CPUs automatically flush
> TLBs as specified in the payload without the kernel's involvement.
> 
> [ riel: add pcid parameter to smp_call_rar_many so other mms can be flushed ]

Please actually review & tidy up patches that you pass through, don't 
just hack them up minimally and slap your tag and SOB on top of it.

One example, of many:

> +	 * We allow cpu's that are not yet online though, as no one else can

Here the comment has 'CPU' in lowercase, and with a grammar mistake.

> +	 * send smp call function interrupt to this cpu and as such deadlocks

Here 'CPU' is in lowercase.

> +	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */

Oh, here 'CPU' is uppercase again! What happened?

> +	/* No online cpus?  We're done. */

Lowercase again. Damn, I thought we settled on a way to spell this 
thing already.

> +	/* Do we have another CPU which isn't us? */

And uppercase. What a roller-coaster.

> +	/* Fastpath: do that cpu by itself. */
> +	/* Some callers race with other cpus changing the passed mask */

And lowercase.

> +	/* Send a message to all CPUs in the map */

And uppercase.

It's almost as if nobody has ever read these comments after writing 
them.

There's like a zillion small random-noise details through the entire 
series that insert unnecessary extra white noise in critical system 
code that should be a lot more carefully written, which emits a foul 
aura of carelessness. Reviewers should not be forced to point these out 
to you, in fact reviewers should not be exposed to such noise at all.

Please review the entire thing *much* more carefully before submitting 
-v3.

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
  2025-05-20  9:28   ` Ingo Molnar
@ 2025-05-20 11:29   ` Nadav Amit
  2025-05-20 13:00     ` Rik van Riel
  2025-05-21 16:38   ` Dave Hansen
  2 siblings, 1 reply; 35+ messages in thread
From: Nadav Amit @ 2025-05-20 11:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	the arch/x86 maintainers, kernel-team, Dave Hansen, luto, peterz,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Yu-cheng Yu

Not a full review, but..

> On 20 May 2025, at 4:02, Rik van Riel <riel@surriel.com> wrote:
> 
> +/*
> + * This is a modified version of smp_call_function_many() of kernel/smp.c,

The updated function name is smp_call_function_many_cond() and it is
not aligned with smp_call_rar_many. I think the new version is (surprisingly)
better, so it’d be beneficial to bring smp_call_rar_many() to be like the
updated one in smp.c.

> + * without a function pointer, because the RAR handler is the ucode.
> + */
> +void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
> +		       unsigned long start, unsigned long end)
> +{
> +	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
> +	int cpu, next_cpu, this_cpu = smp_processor_id();
> +	cpumask_t *dest_mask;
> +	unsigned long idx;
> +
> +	if (pages > RAR_INVLPG_MAX_PAGES || end == TLB_FLUSH_ALL)
> +		pages = RAR_INVLPG_MAX_PAGES;
> +
> +	/*
> +	 * Can deadlock when called with interrupts disabled.
> +	 * We allow cpu's that are not yet online though, as no one else can
> +	 * send smp call function interrupt to this cpu and as such deadlocks
> +	 * can't happen.
> +	 */
> +	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> +		     && !oops_in_progress && !early_boot_irqs_disabled);

I thought you agreed to change it to make it use lockdep instead (so it will
be compiled out without LOCKDEP), like done in smp_call_function_many_cond()

> +
> +	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */
> +	cpu = cpumask_first_and(mask, cpu_online_mask);
> +	if (cpu == this_cpu)
> +		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> +

Putting aside the rest of the code, I see you don’t call should_flush_tlb().
I think it is worth mentioning in commit log or comment the rationale behind
it (and maybe benchmarks to justify it).



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20  9:28   ` Ingo Molnar
@ 2025-05-20 12:57     ` Rik van Riel
  2025-05-24  9:22       ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-20 12:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Yu-cheng Yu

On Tue, 2025-05-20 at 11:28 +0200, Ingo Molnar wrote:
> 
> * Rik van Riel <riel@surriel.com> wrote:
> 
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > 
> > Remote Action Request (RAR) is a TLB flushing broadcast facility.
> > To start a TLB flush, the initiator CPU creates a RAR payload and
> > sends a command to the APIC.  The receiving CPUs automatically
> > flush
> > TLBs as specified in the payload without the kernel's involvement.
> > 
> > [ riel: add pcid parameter to smp_call_rar_many so other mms can be
> > flushed ]
> 
> Please actually review & tidy up patches that you pass through, don't
> just hack them up minimally and slap your tag and SOB on top of it.

I'm happy to do that now that the code is finally working.

v3 will have many more cleanups, and hopefully a few
optimizations.

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20 11:29   ` Nadav Amit
@ 2025-05-20 13:00     ` Rik van Riel
  2025-05-20 20:26       ` Nadav Amit
  0 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-20 13:00 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	the arch/x86 maintainers, kernel-team, Dave Hansen, luto, peterz,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Yu-cheng Yu

On Tue, 2025-05-20 at 14:29 +0300, Nadav Amit wrote:
> Not a full review, but..
> 
> > On 20 May 2025, at 4:02, Rik van Riel <riel@surriel.com> wrote:
> > 
> > +/*
> > + * This is a modified version of smp_call_function_many() of
> > kernel/smp.c,
> 
> The updated function name is smp_call_function_many_cond() and it is
> not aligned with smp_call_rar_many. I think the new version is
> (surprisingly)
> better, so it’d be beneficial to bring smp_call_rar_many() to be like
> the
> updated one in smp.c.
> 
Agreed, it will be good to conditionally not send 
the RAR vector to some CPUs, especially ones that
are in deeper idle states.

That means structuring the code more like
smp_call_function_many_cond().
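
Something roughly like this, perhaps (just a sketch to show the shape;
the _cond name and the details are made up):

void smp_call_rar_many_cond(const struct cpumask *mask, u16 pcid,
			    unsigned long start, unsigned long end,
			    smp_cond_func_t cond_func)
{
	cpumask_t *dest_mask = this_cpu_ptr(&rar_cpu_mask);
	int cpu;

	cpumask_clear(dest_mask);
	for_each_cpu_and(cpu, mask, cpu_online_mask) {
		/* Let the caller skip CPUs that do not need the flush. */
		if (cond_func && !cond_func(cpu, NULL))
			continue;
		cpumask_set_cpu(cpu, dest_mask);
	}

	/* ... allocate a payload, arm the action entries, send the RAR IPI ... */
}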

> > +	/*
> > +	 * Can deadlock when called with interrupts disabled.
> > +	 * We allow cpu's that are not yet online though, as no
> > one else can
> > +	 * send smp call function interrupt to this cpu and as
> > such deadlocks
> > +	 * can't happen.
> > +	 */
> > +	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> > +		     && !oops_in_progress &&
> > !early_boot_irqs_disabled);
> 
> I thought you agreed to change it to make it use lockdep instead (so
> it will
> be compiled out without LOCKDEP), like done in
> smp_call_function_many_cond()
> 
I thought I had made that change in my tree.

I guess I lost it in a rebase :(

> > +
> > +	/* Try to fastpath.  So, what's a CPU they want?  Ignoring
> > this one. */
> > +	cpu = cpumask_first_and(mask, cpu_online_mask);
> > +	if (cpu == this_cpu)
> > +		cpu = cpumask_next_and(cpu, mask,
> > cpu_online_mask);
> > +
> 
> Putting aside the rest of the code, I see you don’t call
> should_flush_tlb().
> I think it is worth mentioning in commit log or comment the rationale
> behind
> it (and maybe benchmarks to justify it).
> 
> 
The long term plan here is to simply have the originating
CPU included in the cpumask, and have it send a RAR
request to itself.

That way all the CPUs can invalidate their entries in
parallel, without any extra code.
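
Roughly like this (sketch only, not what is in this series; PTI
handling omitted):

static void rar_tlb_flush(struct flush_tlb_info *info)
{
	u16 pcid = kern_pcid(mm_global_asid(info->mm));

	/*
	 * No separate local flush_tlb_func() call: the initiating CPU
	 * stays in mm_cpumask and handles its own RAR payload in
	 * parallel with the remote CPUs.  This assumes smp_call_rar_many()
	 * is taught to signal the local CPU as well.
	 */
	smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
}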

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20 13:00     ` Rik van Riel
@ 2025-05-20 20:26       ` Nadav Amit
  2025-05-20 20:31         ` Rik van Riel
  0 siblings, 1 reply; 35+ messages in thread
From: Nadav Amit @ 2025-05-20 20:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	the arch/x86 maintainers, kernel-team, Dave Hansen, luto, peterz,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Yu-cheng Yu



> On 20 May 2025, at 16:00, Rik van Riel <riel@surriel.com> wrote:
> 
>> Putting aside the rest of the code, I see you don’t call
>> should_flush_tlb().
>> I think it is worth mentioning in commit log or comment the rationale
>> behind
>> it (and maybe benchmarks to justify it).
>> 
>> 
> The long term plan here is to simply have the originating
> CPU included in the cpumask, and have it send a RAR
> request to itself.

That’s unrelated. I was referring to considering supporting
some sort of lazy TLB to eliminate sending RAR to cores that
do not care about it. Is there a cost of RAR to more cores than
needed? My guess is that there is one, and maybe in such cases
you would want actual IPI and special handling.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20 20:26       ` Nadav Amit
@ 2025-05-20 20:31         ` Rik van Riel
  0 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-20 20:31 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	the arch/x86 maintainers, kernel-team, Dave Hansen, luto, peterz,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Yu-cheng Yu

On Tue, 2025-05-20 at 23:26 +0300, Nadav Amit wrote:
> 
> > On 20 May 2025, at 16:00, Rik van Riel <riel@surriel.com> wrote:
> > 
> > > Putting aside the rest of the code, I see you don’t call
> > > should_flush_tlb().
> > > I think it is worth mentioning in commit log or comment the
> > > rationale
> > > behind
> > > it (and maybe benchmarks to justify it).
> > > 
> > > 
> > The long term plan here is to simply have the originating
> > CPU included in the cpumask, and have it send a RAR
> > request to itself.
> 
> That’s unrelated. I was referring to considering supporting
> some sort of lazy TLB to eliminate sending RAR to cores that
> do not care about it. Is there a cost of RAR to more cores than
> needed? My guess is that there is one, and maybe in such cases
> you would want actual IPI and special handling.

For RAR, I suspect the big cost is waking up
CPUs in idle states, and waiting for them to
wake up.

One possibility may be to change leave_mm()
to have an argument to set some flag that
the RAR code can read to see whether or
not to send a RAR interrupt to that CPU,
even if it is in the cpumask.

I don't think we can use the exact same
should_flush_tlb() logic, because the
tlb_gen is not updated by a RAR flush,
and the should_flush_tlb() logic is
somewhat intertwined with the tlb_gen
logic.
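
Very roughly, something like this is what I have in mind (all names
invented, nothing tested):

static DEFINE_PER_CPU(bool, rar_lazy);

/* Called from leave_mm() when this CPU stops caring about the mm. */
void rar_set_lazy(bool lazy)
{
	this_cpu_write(rar_lazy, lazy);
}

/* Checked by the initiator before arming a CPU's action entry. */
static bool rar_cpu_needs_flush(int cpu)
{
	return !per_cpu(rar_lazy, cpu);
}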

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs
  2025-05-20  1:02 ` [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs Rik van Riel
@ 2025-05-21 11:49   ` Borislav Petkov
  0 siblings, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2025-05-21 11:49 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Yu-cheng Yu

On Mon, May 19, 2025 at 09:02:27PM -0400, Rik van Riel wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Remote Action Request (RAR) is a TLB flushing broadcast facility.
> This patch introduces RAR MSRs.  RAR is introduced in later patches.
> 
> There are five RAR MSRs:
> 
>   MSR_CORE_CAPABILITIES
>   MSR_IA32_RAR_CTRL
>   MSR_IA32_RAR_ACT_VEC
>   MSR_IA32_RAR_PAYLOAD_BASE
>   MSR_IA32_RAR_INFO
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  arch/x86/include/asm/msr-index.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)

You can merge this one with the previous one.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-20  1:02 ` [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR Rik van Riel
@ 2025-05-21 11:53   ` Borislav Petkov
  2025-05-21 13:57     ` Rik van Riel
  0 siblings, 1 reply; 35+ messages in thread
From: Borislav Petkov @ 2025-05-21 11:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Mon, May 19, 2025 at 09:02:29PM -0400, Rik van Riel wrote:
> From: Rik van Riel <riel@fb.com>
> 
> Introduce X86_FEATURE_RAR and enumeration of the feature.
> 
> [riel: moved initialization to intel.c and disabling to Kconfig.cpufeatures]
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

I'm guessing Yu-cheng is the original author - that's expressed differently.

> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  arch/x86/Kconfig.cpufeatures       |  4 ++++
>  arch/x86/include/asm/cpufeatures.h |  2 +-
>  arch/x86/kernel/cpu/common.c       | 13 +++++++++++++
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
> index 250c10627ab3..7d459b5f47f7 100644
> --- a/arch/x86/Kconfig.cpufeatures
> +++ b/arch/x86/Kconfig.cpufeatures
> @@ -195,3 +195,7 @@ config X86_DISABLED_FEATURE_SEV_SNP
>  config X86_DISABLED_FEATURE_INVLPGB
>  	def_bool y
>  	depends on !BROADCAST_TLB_FLUSH
> +
> +config X86_DISABLED_FEATURE_RAR
> +	def_bool y
> +	depends on !BROADCAST_TLB_FLUSH
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 5b50e0e35129..0729c2d54109 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -76,7 +76,7 @@
>  #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
>  #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
>  #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
> -/* Free                                 ( 3*32+ 7) */
> +#define X86_FEATURE_RAR			( 3*32+ 7) /* Intel Remote Action Request */
>  #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
>  #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
>  #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 8feb8fd2957a..dd662c42f510 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1545,6 +1545,18 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  	setup_force_cpu_bug(X86_BUG_L1TF);
>  }
>  
> +static void __init detect_rar(struct cpuinfo_x86 *c)
> +{
> +	u64 msr;
> +
> +	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> +		rdmsrl(MSR_IA32_CORE_CAPABILITIES, msr);
> +
> +		if (msr & CORE_CAP_RAR)
> +			setup_force_cpu_cap(X86_FEATURE_RAR);
> +	}
> +}
> +
>  /*
>   * The NOPL instruction is supposed to exist on all CPUs of family >= 6;
>   * unfortunately, that's not true in practice because of early VIA
> @@ -1771,6 +1783,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  		setup_clear_cpu_cap(X86_FEATURE_LA57);
>  
>  	detect_nopl();
> +	detect_rar(c);
>  }

Move all this gunk into early_init_intel().

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly
  2025-05-20  1:02 ` [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly Rik van Riel
@ 2025-05-21 11:54   ` Borislav Petkov
  2025-05-21 15:16   ` Dave Hansen
  1 sibling, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2025-05-21 11:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Mon, May 19, 2025 at 09:02:30PM -0400, Rik van Riel wrote:
> From: Rik van Riel <riel@fb.com>
> 
> The function cpa_flush() calls __flush_tlb_one_kernel() and
> flush_tlb_all().
> 
> Replacing that with a call to flush_tlb_kernel_range() allows
> cpa_flush() to make use of INVLPGB or RAR without any additional
> changes.
> 
> Initialize invlpgb_count_max to 1, since flush_tlb_kernel_range()
> can now be called before invlpgb_count_max has been initialized
> to the value read from CPUID.
> 
> [riel: remove now unused __cpa_flush_tlb]
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Signed-off-by: Rik van Riel <riel@surriel.com>

Please audit all your SOB chains.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-21 11:53   ` Borislav Petkov
@ 2025-05-21 13:57     ` Rik van Riel
  2025-05-21 14:53       ` Borislav Petkov
  0 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-21 13:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Wed, 2025-05-21 at 13:53 +0200, Borislav Petkov wrote:
> On Mon, May 19, 2025 at 09:02:29PM -0400, Rik van Riel wrote:
> > From: Rik van Riel <riel@fb.com>
> > 
> > Introduce X86_FEATURE_RAR and enumeration of the feature.
> > 
> > [riel: moved initialization to intel.c and disabling to
> > Kconfig.cpufeatures]
> > 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> I'm guessing Yu-cheng is the original author - that's expressed
> differently.

I will fix that up!

> 
> > @@ -1771,6 +1783,7 @@ static void __init early_identify_cpu(struct
> > cpuinfo_x86 *c)
> >  		setup_clear_cpu_cap(X86_FEATURE_LA57);
> >  
> >  	detect_nopl();
> > +	detect_rar(c);
> >  }
> 
> Move all this gunk into early_init_intel().
> 
I had the same thought, and tried that already.

It didn't work.

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-21 13:57     ` Rik van Riel
@ 2025-05-21 14:53       ` Borislav Petkov
  2025-05-21 16:06         ` Rik van Riel
  0 siblings, 1 reply; 35+ messages in thread
From: Borislav Petkov @ 2025-05-21 14:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Wed, May 21, 2025 at 09:57:52AM -0400, Rik van Riel wrote:
> I had the same thought, and tried that already.
> 
> It didn't work.

Care to share why?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES
  2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
@ 2025-05-21 14:57   ` Dave Hansen
  2025-05-22 15:10   ` Sean Christopherson
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2025-05-21 14:57 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu

On 5/19/25 18:02, Rik van Riel wrote:
> MSR_IA32_CORE_CAPABILITIES indicates the existence of other MSRs.
> Bit[1] indicates Remote Action Request (RAR) TLB registers.

Nit: This may have changed from when Yu-cheng wrote the changelog, but
RAR can do more than just flush the TLB. This probably needs to get a
refresh.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly
  2025-05-20  1:02 ` [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly Rik van Riel
  2025-05-21 11:54   ` Borislav Petkov
@ 2025-05-21 15:16   ` Dave Hansen
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2025-05-21 15:16 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On 5/19/25 18:02, Rik van Riel wrote:
> The function cpa_flush() calls __flush_tlb_one_kernel() and
> flush_tlb_all().
> 
> Replacing that with a call to flush_tlb_kernel_range() allows
> cpa_flush() to make use of INVLPGB or RAR without any additional
> changes.

Yeah, the pageattr.c flushing code has gone through some twists and
turns over the years but it does indeed look like it has converged to be
awfully close to the other flushing code. It used to do wbinvd() and a
full flush, but the wbinvd() disappeared at some point.

I don't immediately see any downsides to doing this. You could probably
even hoist this up to the top of the series. I think it's a good cleanup
on its own.

Also, I'd make the point in the subject and changelog that this isn't
just changing one function to call another, it's removing some
duplicated functionality and consolidating it to existing common code.

Maybe this for the subject:

	x86/mm: Have cpa_flush() use common TLB flushing infrastructure

One super nit:

> +	start = fix_addr(__cpa_addr(cpa, 0));
> +	end = fix_addr(__cpa_addr(cpa, cpa->numpages));

Please vertically align the fix_addr()'s.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations
  2025-05-20  1:02 ` [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations Rik van Riel
  2025-05-20  9:16   ` Ingo Molnar
@ 2025-05-21 15:28   ` Dave Hansen
  2025-05-21 15:59     ` Rik van Riel
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2025-05-21 15:28 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 0c1c68039d6f..1ab9f5fcac8a 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -40,6 +40,9 @@ struct smp_ops {
>  
>  	void (*send_call_func_ipi)(const struct cpumask *mask);
>  	void (*send_call_func_single_ipi)(int cpu);
> +
> +	void (*send_rar_ipi)(const struct cpumask *mask);
> +	void (*send_rar_single_ipi)(int cpu);
>  };

I assume Yu-cheng did it this way.

I'm curious why new smp_ops are needed for this, though. It's not like
there are a bunch of different implementations to pick between.


> -void native_send_call_func_ipi(const struct cpumask *mask)
> +static void do_native_send_ipi(const struct cpumask *mask, int vector)
>  {
>  	if (static_branch_likely(&apic_use_ipi_shorthand)) {
>  		unsigned int cpu = smp_processor_id();
> @@ -88,14 +88,19 @@ void native_send_call_func_ipi(const struct cpumask *mask)
>  			goto sendmask;
>  
>  		if (cpumask_test_cpu(cpu, mask))
> -			__apic_send_IPI_all(CALL_FUNCTION_VECTOR);
> +			__apic_send_IPI_all(vector);
>  		else if (num_online_cpus() > 1)
> -			__apic_send_IPI_allbutself(CALL_FUNCTION_VECTOR);
> +			__apic_send_IPI_allbutself(vector);
>  		return;
>  	}
>  
>  sendmask:
> -	__apic_send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
> +	__apic_send_IPI_mask(mask, vector);
> +}
> +
> +void native_send_call_func_ipi(const struct cpumask *mask)
> +{
> +	do_native_send_ipi(mask, CALL_FUNCTION_VECTOR);
>  }

This refactoring probably belongs in a separate patch.

>  void apic_send_nmi_to_offline_cpu(unsigned int cpu)
> @@ -106,6 +111,16 @@ void apic_send_nmi_to_offline_cpu(unsigned int cpu)
>  		return;
>  	apic->send_IPI(cpu, NMI_VECTOR);
>  }
> +
> +void native_send_rar_single_ipi(int cpu)
> +{
> +	apic->send_IPI_mask(cpumask_of(cpu), RAR_VECTOR);
> +}
> +
> +void native_send_rar_ipi(const struct cpumask *mask)
> +{
> +	do_native_send_ipi(mask, RAR_VECTOR);
> +}
>  #endif /* CONFIG_SMP */
>  
>  static inline int __prepare_ICR2(unsigned int mask)
> diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
> index bdcf609eb283..833669174267 100644
> --- a/arch/x86/kernel/apic/local.h
> +++ b/arch/x86/kernel/apic/local.h
> @@ -38,6 +38,9 @@ static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
>  	case NMI_VECTOR:
>  		icr |= APIC_DM_NMI;
>  		break;
> +	case RAR_VECTOR:
> +		icr |= APIC_DM_RAR;
> +		break;
>  	}
>  	return icr;
>  }
I feel like this patch is doing three separate things:

1. Adds smp_ops
2. Refactors native_send_call_func_ipi()
3. Adds RAR support

None of those are huge, but it would make a lot more sense to break
those out. I'm also still not sure of the point of the smp_ops.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations
  2025-05-21 15:28   ` Dave Hansen
@ 2025-05-21 15:59     ` Rik van Riel
  0 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-05-21 15:59 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Wed, 2025-05-21 at 08:28 -0700, Dave Hansen wrote:
> > diff --git a/arch/x86/include/asm/smp.h
> > b/arch/x86/include/asm/smp.h
> > index 0c1c68039d6f..1ab9f5fcac8a 100644
> > --- a/arch/x86/include/asm/smp.h
> > +++ b/arch/x86/include/asm/smp.h
> > @@ -40,6 +40,9 @@ struct smp_ops {
> >  
> >  	void (*send_call_func_ipi)(const struct cpumask *mask);
> >  	void (*send_call_func_single_ipi)(int cpu);
> > +
> > +	void (*send_rar_ipi)(const struct cpumask *mask);
> > +	void (*send_rar_single_ipi)(int cpu);
> >  };
> 
> I assume Yu-cheng did it this way.
> 
> I'm curious why new smp_ops are needed for this, though. It's not like
> there are a bunch of different implementations to pick between.
> 
You are right, this was in the code I received.

> I feel like this patch is doing three separate things:
> 
> 1. Adds smp_ops
> 2. Refactors native_send_call_func_ipi()
> 3. Adds RAR support
> 
> None of those are huge, but it would make a lot more sense to break
> those out. I'm also still not sure of the point of the smp_ops.
> 
I am not very familiar with this part of the kernel,
but would be happy to make whatever changes the
maintainers want to see.

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-21 14:53       ` Borislav Petkov
@ 2025-05-21 16:06         ` Rik van Riel
  2025-05-21 19:39           ` Borislav Petkov
  0 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2025-05-21 16:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Wed, 2025-05-21 at 16:53 +0200, Borislav Petkov wrote:
> On Wed, May 21, 2025 at 09:57:52AM -0400, Rik van Riel wrote:
> > I had the same thought, and tried that already.
> > 
> > It didn't work.
> 
> Care to share why?
> 
It resulted in RAR not being properly initialized,
and the system hanging when trying to use RAR to
flush the TLB.

I don't remember exactly what sequence of events
was happening here, maybe something with the
boot CPU per-cpu RAR initialization being called
(or not, due to X86_FEATURE_RAR not being set)
before the systemwide initialization?

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
  2025-05-20  9:28   ` Ingo Molnar
  2025-05-20 11:29   ` Nadav Amit
@ 2025-05-21 16:38   ` Dave Hansen
  2025-05-21 19:06     ` Thomas Gleixner
  2025-06-03 20:08     ` Rik van Riel
  2 siblings, 2 replies; 35+ messages in thread
From: Dave Hansen @ 2025-05-21 16:38 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu

On 5/19/25 18:02, Rik van Riel wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Remote Action Request (RAR) is a TLB flushing broadcast facility.
> To start a TLB flush, the initiator CPU creates a RAR payload and
> sends a command to the APIC.  The receiving CPUs automatically flush
> TLBs as specified in the payload without the kernel's involvement.
> 
> [ riel: add pcid parameter to smp_call_rar_many so other mms can be flushed ]
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  arch/x86/include/asm/rar.h   |  69 +++++++++++++
>  arch/x86/kernel/cpu/common.c |   4 +
>  arch/x86/mm/Makefile         |   1 +
>  arch/x86/mm/rar.c            | 195 +++++++++++++++++++++++++++++++++++
>  4 files changed, 269 insertions(+)
>  create mode 100644 arch/x86/include/asm/rar.h
>  create mode 100644 arch/x86/mm/rar.c
> 
> diff --git a/arch/x86/include/asm/rar.h b/arch/x86/include/asm/rar.h
> new file mode 100644
> index 000000000000..78c039e40e81
> --- /dev/null
> +++ b/arch/x86/include/asm/rar.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_RAR_H
> +#define _ASM_X86_RAR_H
> +
> +/*
> + * RAR payload types
> + */
> +#define RAR_TYPE_INVPG		0
> +#define RAR_TYPE_INVPG_NO_CR3	1
> +#define RAR_TYPE_INVPCID	2
> +#define RAR_TYPE_INVEPT		3
> +#define RAR_TYPE_INVVPID	4
> +#define RAR_TYPE_WRMSR		5
> +
> +/*
> + * Subtypes for RAR_TYPE_INVLPG
> + */
> +#define RAR_INVPG_ADDR			0 /* address specific */
> +#define RAR_INVPG_ALL			2 /* all, include global */
> +#define RAR_INVPG_ALL_NO_GLOBAL		3 /* all, exclude global */
> +
> +/*
> + * Subtypes for RAR_TYPE_INVPCID
> + */
> +#define RAR_INVPCID_ADDR		0 /* address specific */
> +#define RAR_INVPCID_PCID		1 /* all of PCID */
> +#define RAR_INVPCID_ALL			2 /* all, include global */
> +#define RAR_INVPCID_ALL_NO_GLOBAL	3 /* all, exclude global */
> +
> +/*
> + * Page size for RAR_TYPE_INVLPG
> + */
> +#define RAR_INVLPG_PAGE_SIZE_4K		0
> +#define RAR_INVLPG_PAGE_SIZE_2M		1
> +#define RAR_INVLPG_PAGE_SIZE_1G		2
> +
> +/*
> + * Max number of pages per payload
> + */
> +#define RAR_INVLPG_MAX_PAGES 63
> +
> +struct rar_payload {
> +	u64 for_sw		: 8;
> +	u64 type		: 8;
> +	u64 must_be_zero_1	: 16;
> +	u64 subtype		: 3;
> +	u64 page_size		: 2;
> +	u64 num_pages		: 6;
> +	u64 must_be_zero_2	: 21;
> +
> +	u64 must_be_zero_3;
> +
> +	/*
> +	 * Starting address
> +	 */
> +	u64 initiator_cr3;
> +	u64 linear_address;
> +
> +	/*
> +	 * Padding
> +	 */
> +	u64 padding[4];
> +};
> +
> +void rar_cpu_init(void);
> +void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
> +		       unsigned long start, unsigned long end);
> +
> +#endif /* _ASM_X86_RAR_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index dd662c42f510..b1e1b9afb2ac 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -71,6 +71,7 @@
>  #include <asm/tdx.h>
>  #include <asm/posted_intr.h>
>  #include <asm/runtime-const.h>
> +#include <asm/rar.h>
>  
>  #include "cpu.h"
>  
> @@ -2438,6 +2439,9 @@ void cpu_init(void)
>  	if (is_uv_system())
>  		uv_cpu_init();
>  
> +	if (cpu_feature_enabled(X86_FEATURE_RAR))
> +		rar_cpu_init();
> +
>  	load_fixmap_gdt(cpu);
>  }
>  
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 5b9908f13dcf..f36fc99e8b10 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
>  obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
>  obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
> +obj-$(CONFIG_BROADCAST_TLB_FLUSH)		+= rar.o
>  
>  obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
> diff --git a/arch/x86/mm/rar.c b/arch/x86/mm/rar.c
> new file mode 100644
> index 000000000000..16dc9b889cbd
> --- /dev/null
> +++ b/arch/x86/mm/rar.c
> @@ -0,0 +1,195 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * RAR TLB shootdown
> + */
> +#include <linux/sched.h>
> +#include <linux/bug.h>
> +#include <asm/current.h>
> +#include <asm/io.h>
> +#include <asm/sync_bitops.h>
> +#include <asm/rar.h>
> +#include <asm/tlbflush.h>
> +
> +static DEFINE_PER_CPU(struct cpumask, rar_cpu_mask);
> +
> +#define RAR_ACTION_OK		0x00
> +#define RAR_ACTION_START	0x01
> +#define RAR_ACTION_ACKED	0x02
> +#define RAR_ACTION_FAIL		0x80

These don't match up with the names that ended up in the public
documentation. Could we realign them, please?

> +#define RAR_MAX_PAYLOADS 32UL
> +
> +static unsigned long rar_in_use = ~(RAR_MAX_PAYLOADS - 1);
> +static struct rar_payload rar_payload[RAR_MAX_PAYLOADS] __page_aligned_bss;
> +static DEFINE_PER_CPU_ALIGNED(u8[RAR_MAX_PAYLOADS], rar_action);

At some point, there needs to be a description of the data structures.
For instance, there's nothing architecturally requiring all CPUs to
share a payload table. But this implementation chooses to have them
share. We need a discussion somewhere of those design decisions.

One thing that also needs discussion: 'rar_in_use' isn't really about
RAR itself. It's a bitmap of which payload slots are allocated.

> +static unsigned long get_payload(void)
> +{

This is more like "allocate a payload slot" than a "get payload"
operation, IMNHO.

> +	while (1) {
> +		unsigned long bit;
> +
> +		/*
> +		 * Find a free bit and confirm it with
> +		 * test_and_set_bit() below.
> +		 */
> +		bit = ffz(READ_ONCE(rar_in_use));
> +
> +		if (bit >= RAR_MAX_PAYLOADS)
> +			continue;
> +
> +		if (!test_and_set_bit((long)bit, &rar_in_use))
> +			return bit;
> +	}
> +}

This also serves as a kind of spinlock, waiting for a payload slot to
become free.

> +static void free_payload(unsigned long idx)
> +{
> +	clear_bit(idx, &rar_in_use);
> +}
> +
> +static void set_payload(unsigned long idx, u16 pcid, unsigned long start,
> +			uint32_t pages)
> +{
> +	struct rar_payload *p = &rar_payload[idx];

I'd _probably_ just pass the 'struct rar_payload *' instead of an index.
It's harder to screw up a pointer.

> +	p->must_be_zero_1	= 0;
> +	p->must_be_zero_2	= 0;
> +	p->must_be_zero_3	= 0;
> +	p->page_size		= RAR_INVLPG_PAGE_SIZE_4K;
> +	p->type			= RAR_TYPE_INVPCID;
> +	p->num_pages		= pages;
> +	p->initiator_cr3	= pcid;
> +	p->linear_address	= start;
> +
> +	if (pcid) {
> +		/* RAR invalidation of the mapping of a specific process. */
> +		if (pages >= RAR_INVLPG_MAX_PAGES)
> +			p->subtype = RAR_INVPCID_PCID;
> +		else
> +			p->subtype = RAR_INVPCID_ADDR;
> +	} else {
> +		/*
> +		 * Unfortunately RAR_INVPCID_ADDR excludes global translations.
> +		 * Always do a full flush for kernel invalidations.
> +		 */
> +		p->subtype = RAR_INVPCID_ALL;
> +	}
> +
> +	smp_wmb();
> +}

The barrier could use a comment too.

> +static void set_action_entry(unsigned long idx, int target_cpu)

Just trying to read this, I think we probably should remove the 'idx'
nomenclature and call them "payload_nr"'s or something more descriptive.

> +{
> +	u8 *bitmap = per_cpu(rar_action, target_cpu);
> +
> +	WRITE_ONCE(bitmap[idx], RAR_ACTION_START);
> +}

Maybe a comment like this for set_action_entry() would be helpful:

/*
 * Given a remote CPU, "arm" its action vector to ensure it
 * handles payload number 'idx' when it receives the RAR signal.
 * The remote CPU will overwrite RAR_ACTION_START when it handles
 * the request.
 */

> +static void wait_for_done(unsigned long idx, int target_cpu)
> +{
> +	u8 status;
> +	u8 *rar_actions = per_cpu(rar_action, target_cpu);
> +
> +	status = READ_ONCE(rar_actions[idx]);
> +
> +	while ((status != RAR_ACTION_OK) && (status != RAR_ACTION_FAIL)) {

Should this be:

	while (status == RAR_ACTION_START) {
	...

? That would more clearly link it to set_action_entry() and would also
be shorter.

> +		cpu_relax();
> +		status = READ_ONCE(rar_actions[idx]);
> +	}
> +
> +	WARN_ON_ONCE(rar_actions[idx] == RAR_ACTION_FAIL);
> +}
> +
> +void rar_cpu_init(void)
> +{
> +	u64 r;
> +	u8 *bitmap;
> +	int this_cpu = smp_processor_id();
> +
> +	cpumask_clear(&per_cpu(rar_cpu_mask, this_cpu));
> +
> +	rdmsrl(MSR_IA32_RAR_INFO, r);
> +	pr_info_once("RAR: support %lld payloads\n", r >> 32);

Doesn't this need to get coordinated or checked against RAR_MAX_PAYLOADS?

It might also be nice to use one of the mask functions for this. It's
nice when the spec says "37:32" and you then see an actual GENMASK(37, 32)
in the code to match it.
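
Something like this, say (the field position is just an example, not
checked against the spec):

	u64 max_payloads = FIELD_GET(GENMASK_ULL(37, 32), r);	/* <linux/bitfield.h> */

	pr_info_once("RAR: supports %llu payloads\n", max_payloads);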

> +	bitmap = (u8 *)per_cpu(rar_action, this_cpu);
> +	memset(bitmap, 0, RAR_MAX_PAYLOADS);
> +	wrmsrl(MSR_IA32_RAR_ACT_VEC, (u64)virt_to_phys(bitmap));
> +	wrmsrl(MSR_IA32_RAR_PAYLOAD_BASE, (u64)virt_to_phys(rar_payload));

	please vertically align the virt_to_phys() ^

> +
> +	r = RAR_CTRL_ENABLE | RAR_CTRL_IGNORE_IF;

Setting RAR_CTRL_IGNORE_IF is probably worth a _little_ discussion in
the changelog.

> +	// reserved bits!!! r |= (RAR_VECTOR & 0xff);

Is this just some cruft from testing?

> +	wrmsrl(MSR_IA32_RAR_CTRL, r);
> +}
> +
> +/*
> + * This is a modified version of smp_call_function_many() of kernel/smp.c,
> + * without a function pointer, because the RAR handler is the ucode.
> + */

It doesn't look _that_ much like smp_call_function_many(). I don't see
much that can be consolidated.

> +void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
> +		       unsigned long start, unsigned long end)
> +{
> +	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
> +	int cpu, next_cpu, this_cpu = smp_processor_id();
> +	cpumask_t *dest_mask;
> +	unsigned long idx;
> +
> +	if (pages > RAR_INVLPG_MAX_PAGES || end == TLB_FLUSH_ALL)
> +		pages = RAR_INVLPG_MAX_PAGES;
> +
> +	/*
> +	 * Can deadlock when called with interrupts disabled.
> +	 * We allow cpu's that are not yet online though, as no one else can

Nit: at some point all of the "we's" need to be excised and moved over
to imperative voice.

> +	 * send smp call function interrupt to this cpu and as such deadlocks
> +	 * can't happen.
> +	 */
> +	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> +		     && !oops_in_progress && !early_boot_irqs_disabled);
> +
> +	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */
> +	cpu = cpumask_first_and(mask, cpu_online_mask);
> +	if (cpu == this_cpu)
> +		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> +
> +	/* No online cpus?  We're done. */
> +	if (cpu >= nr_cpu_ids)
> +		return;

This little idiom _is_ in smp_call_function_many_cond(). I wonder if it
can be refactored out.

> +	/* Do we have another CPU which isn't us? */
> +	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> +	if (next_cpu == this_cpu)
> +		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
> +
> +	/* Fastpath: do that cpu by itself. */
> +	if (next_cpu >= nr_cpu_ids) {
> +		idx = get_payload();
> +		set_payload(idx, pcid, start, pages);
> +		set_action_entry(idx, cpu);
> +		arch_send_rar_single_ipi(cpu);
> +		wait_for_done(idx, cpu);
> +		free_payload(idx);
> +		return;
> +	}

FWIW, I'm not sure this is that much of a fast path. I wouldn't be
shocked if _some_ hardware has a much faster way of IPI'ing a single CPU
versus a bunch. But I think arch_send_rar_single_ipi() and
arch_send_rar_ipi_mask() end up frobbing the hardware in pretty similar
ways.

I'd probably just axe this in the name of simplification unless there
are numbers behind it.

> +	dest_mask = this_cpu_ptr(&rar_cpu_mask);
> +	cpumask_and(dest_mask, mask, cpu_online_mask);
> +	cpumask_clear_cpu(this_cpu, dest_mask);
> +
> +	/* Some callers race with other cpus changing the passed mask */
> +	if (unlikely(!cpumask_weight(dest_mask)))
> +		return;
> +
> +	idx = get_payload();
> +	set_payload(idx, pcid, start, pages);
> +
> +	for_each_cpu(cpu, dest_mask)
> +		set_action_entry(idx, cpu);
> +
> +	/* Send a message to all CPUs in the map */
> +	arch_send_rar_ipi_mask(dest_mask);
> +
> +	for_each_cpu(cpu, dest_mask)
> +		wait_for_done(idx, cpu);

Naming nit: Let's give wait_for_done() a more RAR-specific name. It'll
make it clear that this is a RAR operation and not something generic.

> +	free_payload(idx);
> +}
> +EXPORT_SYMBOL(smp_call_rar_many);



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-21 16:38   ` Dave Hansen
@ 2025-05-21 19:06     ` Thomas Gleixner
  2025-06-03 20:08     ` Rik van Riel
  1 sibling, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2025-05-21 19:06 UTC (permalink / raw)
  To: Dave Hansen, Rik van Riel, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, mingo, bp,
	hpa, nadav.amit, Yu-cheng Yu

On Wed, May 21 2025 at 09:38, Dave Hansen wrote:
> On 5/19/25 18:02, Rik van Riel wrote:
>> +/*
>> + * This is a modified version of smp_call_function_many() of kernel/smp.c,
>> + * without a function pointer, because the RAR handler is the ucode.
>> + */
>
> It doesn't look _that_ much like smp_call_function_many(). I don't see
> much that can be consolidated.

It does not look like it because it has a gazillion function
arguments, which can all be packed into a data structure, i.e. the
function argument of smp_call_function_many().

There is zero justification to reinvent the wheel and create another
source of hard to debug problems.

IMNSHO it's absolutely not rocket science to reuse
smp_call_function_many() for this, but I might be missing something as
always and I'm happy to be enlightened.

Just for the record: the changelog contains an utter void of information
on why this modified version of well-established common code is required
and desired.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR
  2025-05-21 16:06         ` Rik van Riel
@ 2025-05-21 19:39           ` Borislav Petkov
  0 siblings, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2025-05-21 19:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, hpa, nadav.amit, Rik van Riel, Yu-cheng Yu

On Wed, May 21, 2025 at 12:06:59PM -0400, Rik van Riel wrote:
> On Wed, 2025-05-21 at 16:53 +0200, Borislav Petkov wrote:
> > On Wed, May 21, 2025 at 09:57:52AM -0400, Rik van Riel wrote:
> > > I had the same thought, and tried that already.
> > > 
> > > It didn't work.
> > 
> > Care to share why?
> > 
> It resulted in RAR not being properly initialized,
> and the system hanging when trying to use RAR to
> flush the TLB.
> 
> I don't remember exactly what sequence of events
> was happening here, maybe something with the
> boot CPU per-cpu RAR initialization being called
> (or not, due to X86_FEATURE_RAR not being set)
> before the systemwide initialization?

I'm asking you to move it from this path to

                if (this_cpu->c_early_init)
                        this_cpu->c_early_init(c);

or

                if (this_cpu->c_bsp_init)
                        this_cpu->c_bsp_init(c);

a couple of lines above.

This doesn't change anything: you're still running it on the BSP once. So
I don't see how any of the above confusion would happen.
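
IOW, something like this (a sketch, reusing the detect_rar() you already
have):

/* arch/x86/kernel/cpu/intel.c - sketch only */
static void detect_rar(struct cpuinfo_x86 *c)
{
	u64 msr;

	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES))
		return;

	rdmsrl(MSR_IA32_CORE_CAPABILITIES, msr);
	if (msr & CORE_CAP_RAR)
		setup_force_cpu_cap(X86_FEATURE_RAR);
}

static void early_init_intel(struct cpuinfo_x86 *c)
{
	/* ... existing early init ... */
	detect_rar(c);
}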

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES
  2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
  2025-05-21 14:57   ` Dave Hansen
@ 2025-05-22 15:10   ` Sean Christopherson
  1 sibling, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2025-05-22 15:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Yu-cheng Yu

On Mon, May 19, 2025, Rik van Riel wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> MSR_IA32_CORE_CAPABILITIES indicates the existence of other MSRs.
> Bit[1] indicates Remote Action Request (RAR) TLB registers.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  arch/x86/include/asm/msr-index.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index b7dded3c8113..c848dd4bfceb 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -220,6 +220,12 @@
>  						     * their affected status.
>  						     */
>  
> +#define MSR_IA32_CORE_CAPABILITIES	0x000000cf
> +#define CORE_CAP_RAR			BIT(1)	/*
> +						 * Remote Action Request. Used to directly
> +						 * flush the TLB on remote CPUs.
> +						 */

CORE_CAPABILITIES is already supported and enumerated, it's just abbreviated:

/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
#define MSR_IA32_CORE_CAPS			  0x000000cf
#define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT	  2
#define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS	  BIT(MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT)
#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-20 12:57     ` Rik van Riel
@ 2025-05-24  9:22       ` Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2025-05-24  9:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Yu-cheng Yu


* Rik van Riel <riel@surriel.com> wrote:

> v3 will have many more cleanups, and hopefully a few optimizations.

Thanks!

	Ingo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 7/9] x86/mm: Introduce Remote Action Request
  2025-05-21 16:38   ` Dave Hansen
  2025-05-21 19:06     ` Thomas Gleixner
@ 2025-06-03 20:08     ` Rik van Riel
  1 sibling, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-06-03 20:08 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: linux-mm, x86, kernel-team, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, nadav.amit, Yu-cheng Yu

On Wed, 2025-05-21 at 09:38 -0700, Dave Hansen wrote:
> 
> > +static void wait_for_done(unsigned long idx, int target_cpu)
> > +{
> > +	u8 status;
> > +	u8 *rar_actions = per_cpu(rar_action, target_cpu);
> > +
> > +	status = READ_ONCE(rar_actions[idx]);
> > +
> > +	while ((status != RAR_ACTION_OK) && (status !=
> > RAR_ACTION_FAIL)) {
> 
> Should this be:
> 
> 	while (status == RAR_ACTION_START) {
> 	...
> 
> ? That would more clearly link it to set_action_entry() and would
> also
> be shorter.
> 
That is a very good question. The old RAR code
suggests there might be some intermediate state
when the target CPU works on processing the
RAR entry, but the current documentation only
shows RAR_SUCCESS, RAR_PENDING, and RAR_FAILURE
as possible values.

Let's try with status == RAR_ACTION_PENDING.
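
I.e. something like this for v3 (together with the more RAR-specific
name from your other comment; exact constant names aside):

static void rar_wait_for_completion(unsigned long idx, int target_cpu)
{
	u8 *rar_actions = per_cpu(rar_action, target_cpu);

	while (READ_ONCE(rar_actions[idx]) == RAR_ACTION_PENDING)
		cpu_relax();

	WARN_ON_ONCE(READ_ONCE(rar_actions[idx]) == RAR_ACTION_FAIL);
}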

> > 
> > +void rar_cpu_init(void)
> > +{
> > +	u64 r;
> > +	u8 *bitmap;
> > +	int this_cpu = smp_processor_id();
> > +
> > +	cpumask_clear(&per_cpu(rar_cpu_mask, this_cpu));
> > +
> > +	rdmsrl(MSR_IA32_RAR_INFO, r);
> > +	pr_info_once("RAR: support %lld payloads\n", r >> 32);
> 
> Doesn't this need to get coordinated or checked against
> RAR_MAX_PAYLOADS?

I just added that in, and also applied all the cleanups
from your email.

> 
> > +	// reserved bits!!! r |= (RAR_VECTOR & 0xff);
> 
> Is this just some cruft from testing?
> 
I'm kind of guessing the old code might have used this
value to specify which IRQ vector to use for RAR, but
modern microcode hardcodes the RAR_VECTOR value.

> > +	wrmsrl(MSR_IA32_RAR_CTRL, r);
> > +}
> > +
> > +/*
> > + * This is a modified version of smp_call_function_many() of
> > kernel/smp.c,
> > + * without a function pointer, because the RAR handler is the
> > ucode.
> > + */
> 
> It doesn't look _that_ much like smp_call_function_many(). I don't
> see
> much that can be consolidated.

Agreed. It looks even less like it after some more
simplifications.

> 
> > +	/* No online cpus?  We're done. */
> > +	if (cpu >= nr_cpu_ids)
> > +		return;
> 
> This little idiom _is_ in smp_call_function_many_cond(). I wonder if
> it
> can be refactored out.

Removing the arch_send_rar_single_ipi fast path
gets rid of this code completely.

Once we cpumask_and with the cpu_online_mask,
the cpumask_weight should end up as 0 if no
online CPUs are in the mask.
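
I.e. roughly:

	dest_mask = this_cpu_ptr(&rar_cpu_mask);
	cpumask_and(dest_mask, mask, cpu_online_mask);
	cpumask_clear_cpu(this_cpu, dest_mask);

	/* No online CPUs left in the mask?  We're done. */
	if (cpumask_empty(dest_mask))
		return;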

Thank you for all the cleanup suggestions.
I've tried to address them all for v3.


-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations
  2025-05-20  9:16   ` Ingo Molnar
@ 2025-06-04  0:11     ` Rik van Riel
  0 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2025-06-04  0:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-mm, x86, kernel-team, dave.hansen, luto,
	peterz, tglx, mingo, bp, hpa, nadav.amit, Rik van Riel,
	Yu-cheng Yu

On Tue, 2025-05-20 at 11:16 +0200, Ingo Molnar wrote:
> 
> * Rik van Riel <riel@surriel.com> wrote:
> 
> > diff --git a/arch/x86/include/asm/irq_vectors.h
> > b/arch/x86/include/asm/irq_vectors.h
> > index 47051871b436..c417b0015304 100644
> > --- a/arch/x86/include/asm/irq_vectors.h
> > +++ b/arch/x86/include/asm/irq_vectors.h
> > @@ -103,6 +103,11 @@
> >   */
> >  #define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
> >  
> > +/*
> > + * RAR (remote action request) TLB flush
> > + */
> > +#define RAR_VECTOR			0xe0
> > +
> >  #define NR_VECTORS			 256
> 
> This subtly breaks x86 IRQ vector allocation AFAICS.
> 
> Right now device IRQ vectors are allocated from 0x81 to 
> FIRST_SYSTEM_VECTOR (POSTED_MSI_NOTIFICATION_VECTOR) or 0xeb.
> 
> But RAR_VECTOR is within that range, so the IRQ allocator will overlap
> it and result in what I guess will be misbehaving RAR code and
> misbehaving device IRQ handling once it hands out 0xe0 as well.

Sure enough! After fixing this issue, the nearly instant
segfaults for programs using RAR are no longer happening.
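
(Purely as an illustration of the kind of change involved, not the
exact v3 fix: one way to avoid the overlap is to make RAR_VECTOR a
system vector just below the current lowest one, and move
FIRST_SYSTEM_VECTOR down with it:

	#define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
	#define RAR_VECTOR			0xea
	#define FIRST_SYSTEM_VECTOR		RAR_VECTOR

The 0xea value here is a placeholder, not the vector actually chosen.)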

I'll let it run tests overnight, and will hopefully be able
to post a reliable v3 tomorrow.

Thank you!

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2025-06-04  0:11 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-20  1:02 [RFC v2 PATCH 0/9] Intel RAR TLB invalidation Rik van Riel
2025-05-20  1:02 ` [RFC v2 1/9] x86/mm: Introduce MSR_IA32_CORE_CAPABILITIES Rik van Riel
2025-05-21 14:57   ` Dave Hansen
2025-05-22 15:10   ` Sean Christopherson
2025-05-20  1:02 ` [RFC v2 2/9] x86/mm: Introduce Remote Action Request MSRs Rik van Riel
2025-05-21 11:49   ` Borislav Petkov
2025-05-20  1:02 ` [RFC v2 3/9] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too Rik van Riel
2025-05-20  1:02 ` [RFC v2 4/9] x86/mm: Introduce X86_FEATURE_RAR Rik van Riel
2025-05-21 11:53   ` Borislav Petkov
2025-05-21 13:57     ` Rik van Riel
2025-05-21 14:53       ` Borislav Petkov
2025-05-21 16:06         ` Rik van Riel
2025-05-21 19:39           ` Borislav Petkov
2025-05-20  1:02 ` [RFC v2 5/9] x86/mm: Change cpa_flush() to call flush_kernel_range() directly Rik van Riel
2025-05-21 11:54   ` Borislav Petkov
2025-05-21 15:16   ` Dave Hansen
2025-05-20  1:02 ` [RFC v2 6/9] x86/apic: Introduce Remote Action Request Operations Rik van Riel
2025-05-20  9:16   ` Ingo Molnar
2025-06-04  0:11     ` Rik van Riel
2025-05-21 15:28   ` Dave Hansen
2025-05-21 15:59     ` Rik van Riel
2025-05-20  1:02 ` [RFC v2 7/9] x86/mm: Introduce Remote Action Request Rik van Riel
2025-05-20  9:28   ` Ingo Molnar
2025-05-20 12:57     ` Rik van Riel
2025-05-24  9:22       ` Ingo Molnar
2025-05-20 11:29   ` Nadav Amit
2025-05-20 13:00     ` Rik van Riel
2025-05-20 20:26       ` Nadav Amit
2025-05-20 20:31         ` Rik van Riel
2025-05-21 16:38   ` Dave Hansen
2025-05-21 19:06     ` Thomas Gleixner
2025-06-03 20:08     ` Rik van Riel
2025-05-20  1:02 ` [RFC v2 8/9] x86/mm: use RAR for kernel TLB flushes Rik van Riel
2025-05-20  1:02 ` [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR Rik van Riel
2025-05-20  2:48   ` [RFC v2.1 " Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).