All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@surriel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>
Subject: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
Date: Tue, 3 Dec 2024 14:48:45 -0500	[thread overview]
Message-ID: <20241203144845.7093ea1a@fangorn> (raw)
In-Reply-To: <5dcb4050-f0f3-43d6-b4b1-42fa305a0fba@efficios.com>

On Tue, 3 Dec 2024 09:57:55 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:


> I'd recommend to rename "last_trimmed_cpumask" to "next_trim_cpumask",
> and always update it to "jiffies + HZ". Then we can remove the addition
> from the comparison in the should_flush_tlb() fast-path:

Thanks Mathieu, I have applied your suggested improvements,
except for the one you posted as a separate patch earlier.

---8<---

From c7d04233f15ba217ce6ebd0dcf12fab91c437e96 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second

Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.

Synchronization is handled by switch_mm_irqs_off blocking interrupts.

Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.

Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.

The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test roboto <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
 arch/x86/include/asm/mmu.h         |  2 ++
 arch/x86/include/asm/mmu_context.h |  1 +
 arch/x86/mm/tlb.c                  | 27 ++++++++++++++++++++++++---
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..3b496cdcb74b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
 	 */
 	atomic64_t tlb_gen;
 
+	unsigned long next_trim_cpumask;
+
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct rw_semaphore	ldt_usr_sem;
 	struct ldt_struct	*ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb668d7f..795fdd53bd0a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
 
 	mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
 	atomic64_set(&mm->context.tlb_gen, 0);
+	mm->context.next_trim_cpumask = jiffies + HZ;
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa90d3d..e90edbbf0188 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -759,8 +759,11 @@ static void flush_tlb_func(void *info)
 
 		/* Can only happen on remote CPUs */
 		if (f->mm && f->mm != loaded_mm) {
+			unsigned long next_jiffies = jiffies + HZ;
 			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
 			trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
+			if (time_after(next_jiffies, READ_ONCE(f->mm->context.next_trim_cpumask)))
+				WRITE_ONCE(f->mm->context.next_trim_cpumask, next_jiffies);
 			return;
 		}
 	}
@@ -892,9 +895,27 @@ static void flush_tlb_func(void *info)
 			nr_invalidate);
 }
 
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
 {
-	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+	struct flush_tlb_info *info = data;
+
+	/* Lazy TLB will get flushed at the next context switch. */
+	if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+		return false;
+
+	/* No mm means kernel memory flush. */
+	if (!info->mm)
+		return true;
+
+	/* The target mm is loaded, and the CPU is not lazy. */
+	if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+		return true;
+
+	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+	if (time_after(jiffies, info->mm->context.next_trim_cpumask))
+		return true;
+
+	return false;
 }
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +949,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	if (info->freed_tables)
 		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
 	else
-		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+		on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
 				(void *)info, 1, cpumask);
 }
 
-- 
2.47.0


  reply	other threads:[~2024-12-03 19:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
2024-11-28 16:21 ` Peter Zijlstra
2024-11-29  1:44   ` Oliver Sang
2024-11-28 19:46 ` Mathieu Desnoyers
2024-11-29  2:52   ` Rik van Riel
2024-12-02 16:30     ` Mathieu Desnoyers
2024-12-02 18:10       ` Rik van Riel
2024-12-02 16:50   ` Dave Hansen
2024-12-03  0:43 ` [PATCH] x86,mm: only trim the mm_cpumask once a second Rik van Riel
2024-12-04 13:15   ` Oliver Sang
2024-12-04 16:07     ` Rik van Riel
2024-12-04 16:56     ` [PATCH v3] " Rik van Riel
2024-12-04 20:19       ` Mathieu Desnoyers
2024-12-05  2:03         ` [PATCH v4] " Rik van Riel
2024-12-06  1:30           ` Oliver Sang
2024-12-06  9:40           ` [tip: x86/mm] x86/mm/tlb: Only " tip-bot2 for Rik van Riel
2024-12-03  1:22 ` [PATCH -tip] x86,mm: only " Rik van Riel
2024-12-03 14:57   ` Mathieu Desnoyers
2024-12-03 19:48     ` Rik van Riel [this message]
2024-12-03 20:05       ` [PATCH v2] " Dave Hansen
2024-12-03 20:07         ` Rik van Riel
2024-12-04  0:46           ` Dave Hansen
2024-12-04  1:43             ` Rik van Riel
2024-12-03 23:27       ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241203144845.7093ea1a@fangorn \
    --to=riel@surriel.com \
    --cc=dave.hansen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=oe-lkp@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.