public inbox for linux-kernel@vger.kernel.org
* [REF PATCH] x86/tlb: just do tlb flush on one of siblings of SMT
@ 2016-04-06  3:14 Alex Shi
  2016-04-06  4:47 ` Andy Lutomirski
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Shi @ 2016-04-06  3:14 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Alex Shi, Andrew Morton, Andy Lutomirski, Rik van Riel

It seems Intel SMT siblings still share the TLB pool, so flushing the TLB
on both threads of a core just causes an extra, useless IPI and an extra
flush. That extra flush also evicts TLB entries which the sibling thread
has just loaded. That's a double waste.

A microbenchmark shows memory access saving about 25% of its time on my
Haswell i7 desktop.
The munmap test's source code is here: https://lkml.org/lkml/2012/5/17/59

test result on Kernel v4.5.0:
$/home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 57ms 14072ns/time, memory access uses 48356 times/thread/ms, cost 20ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        18,739,808      dTLB-load-misses          #    2.47% of all dTLB cache hits   (43.05%)
       757,380,911      dTLB-loads                                                    (34.34%)
         2,125,275      dTLB-store-misses                                             (32.23%)
       318,307,759      dTLB-stores                                                   (46.32%)
            32,765      iTLB-load-misses          #    2.03% of all iTLB cache hits   (56.90%)
         1,616,237      iTLB-loads                                                    (44.47%)
            41,476      tlb:tlb_flush

       1.443484546 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 32262

test result on Kernel v4.5.0 + this patch:
$/home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 48ms 11933ns/time, memory access uses 59966 times/thread/ms, cost 16ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        15,984,772      dTLB-load-misses          #    1.89% of all dTLB cache hits   (41.72%)
       844,099,241      dTLB-loads                                                    (33.30%)
         1,328,102      dTLB-store-misses                                             (52.13%)
       280,902,875      dTLB-stores                                                   (52.03%)
            27,678      iTLB-load-misses          #    1.67% of all iTLB cache hits   (35.35%)
         1,659,550      iTLB-loads                                                    (38.38%)
            25,137      tlb:tlb_flush

       1.428880301 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 15912

BTW, the TLB sharing between siblings that this change relies on isn't
architecturally guaranteed.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-kernel@vger.kernel.org
To: Mel Gorman <mgorman@suse.de>
To: x86@kernel.org
To: "H. Peter Anvin" <hpa@zytor.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Alex Shi <alex.shi@linaro.org>
---
 arch/x86/mm/tlb.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8f4cc3d..6510316 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -134,7 +134,10 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 				 struct mm_struct *mm, unsigned long start,
 				 unsigned long end)
 {
+	int cpu;
 	struct flush_tlb_info info;
+	cpumask_t flush_mask, *sblmask;
+
 	info.flush_mm = mm;
 	info.flush_start = start;
 	info.flush_end = end;
@@ -151,7 +154,23 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 								&info, 1);
 		return;
 	}
-	smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+
+	if (unlikely(smp_num_siblings <= 1)) {
+		smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+		return;
+	}
+
+	/* Only one flush is needed per pair of SMT siblings */
+	cpumask_copy(&flush_mask, cpumask);
+	for_each_cpu(cpu, &flush_mask) {
+		sblmask = topology_sibling_cpumask(cpu);
+		if (!cpumask_subset(sblmask, &flush_mask))
+			continue;
+
+		cpumask_clear_cpu(cpumask_next(cpu, sblmask), &flush_mask);
+	}
+
+	smp_call_function_many(&flush_mask, flush_tlb_func, &info, 1);
 }
 
 void flush_tlb_current_task(void)
-- 
2.7.2.333.g70bd996
