From: "H. Peter Anvin" <hpa@zytor.com>
To: Alex Shi <alex.shi@intel.com>
Cc: Borislav Petkov <bp@amd64.org>,
tglx@linutronix.de, mingo@redhat.com, arnd@arndb.de,
rostedt@goodmis.org, fweisbec@gmail.com, jeremy@goop.org,
luto@mit.edu, yinghai@kernel.org, riel@redhat.com,
avi@redhat.com, len.brown@intel.com, tj@kernel.org,
akpm@linux-foundation.org, cl@gentwo.org,
borislav.petkov@amd.com, ak@linux.intel.com, jbeulich@suse.com,
eric.dumazet@gmail.com, akinobu.mita@gmail.com,
vapier@gentoo.org, cpw@sgi.com, steiner@sgi.com,
viro@zeniv.linux.org.uk, kamezawa.hiroyu@jp.fujitsu.com,
rientjes@google.com, aarcange@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 7/9] x86/tlb: enable tlb flush range support for x86
Date: Thu, 19 Jul 2012 17:44:49 -0700 [thread overview]
Message-ID: <ace2ffd4-5c19-4e82-8324-4ce70bf641f3@email.android.com> (raw)
In-Reply-To: <5008A108.7070602@intel.com>
Separate is better. When I say "clean patch" I mean one in a separate email, so git am can process it.
Alex Shi <alex.shi@intel.com> wrote:
>On 07/20/2012 07:56 AM, H. Peter Anvin wrote:
>
>> On 07/19/2012 04:52 PM, Alex Shi wrote:
>>>
>>> Sure, it is a bug, the fix had sent:
>>> https://lkml.org/lkml/2012/7/6/350
>>>
>>
>> Could you please re-send that as a clean patch?
>>
>> -hpa
>>
>
>
>
>
>Since it has no impact on the remaining patches in the series, and
>linux-next has not merged this patchset, I folded this fix into the
>original patch. Is that OK, or do you need a separate one?
>
>===
>From 2e6117dfda5b323261e959bb5faf778cbe4b3c64 Mon Sep 17 00:00:00 2001
>From: Alex Shi <alex.shi@intel.com>
>Date: Mon, 25 Jun 2012 11:06:46 +0800
>Subject: [PATCH 7/9] x86/tlb: enable tlb flush range support for x86
>
>Not every tlb_flush needs to evacuate all TLB entries; for munmap, a
>few 'invlpg' instructions are better for overall process performance,
>since they leave most TLB entries intact for later accesses.
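To make the tradeoff concrete, here is a minimal user-space sketch of the full-flush vs. per-page decision. The constants and function name are hypothetical; the kernel fills the real TLB sizes from CPUID and tunes the shift per microarchitecture (see tlb_flushall_shift later in the patch):

```c
/* Hypothetical constants for illustration only; the kernel reads
 * tlb_lld_4k[ENTRIES] from CPUID and picks tlb_flushall_shift per
 * CPU model. */
#define TLB_4K_ENTRIES 512
#define FLUSHALL_SHIFT 5   /* flush all beyond 1/32 of usable entries */

/* Return 1 for a full TLB flush, 0 to 'invlpg' page by page.
 * Per-page flushing only pays off while the range is small compared
 * with the number of TLB entries this task can actually occupy. */
int want_full_flush(unsigned long nr_pages, unsigned long total_vm)
{
	unsigned long act_entries =
		total_vm < TLB_4K_ENTRIES ? total_vm : TLB_4K_ENTRIES;

	return nr_pages > (act_entries >> FLUSHALL_SHIFT);
}
```

A small munmap of a few pages takes the invlpg path; unmapping a range larger than the task could realistically keep cached falls back to a full flush.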
>
>This patch also rewrites flush_tlb_range for two purposes:
>1. split it out into a flush_tlb_mm_range function;
>2. clean it up to reduce line breaking, thanks to Borislav's input.
>
>My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
>shows that random memory access on another CPU gets a 0~50% speedup
>on a 2P * 4-core * HT NHM EP machine while doing 'munmap'.
>
>Thanks to Yongjie for testing this patch:
>-------------
>I used Linux 3.4-RC6 w/ and w/o his patches as the Xen dom0 and guest
>kernel.
>After running two benchmarks in a Xen HVM guest, I found his patches
>brought about a 1%~3% performance gain in 'kernel build' and 'netperf'
>testing, though the gain was not very stable in 'kernel build'
>testing.
>
>Some detailed testing results are below.
>
>Testing Environment:
> Hardware: Romley-EP platform
> Xen version: latest upstream
> Linux kernel: 3.4-RC6
> Guest vCPU number: 8
> NIC: Intel 82599 (10GB bandwidth)
>
>In 'kernel build' testing in guest:
> Command line | performance gain
> make -j 4 | 3.81%
> make -j 8 | 0.37%
> make -j 16 | -0.52%
>
>In 'netperf' testing, we tested TCP_STREAM with the default socket
>size, using 16384 bytes as the large packet and 64 bytes as the small
>packet.
>I used several clients to add networking pressure; the 'netperf'
>server automatically spawned several threads to respond to them.
> Packet size | Thread number | performance gain
> 16384 bytes | 4 | 0.02%
> 16384 bytes | 8 | 2.21%
> 16384 bytes | 16 | 2.04%
> 64 bytes | 4 | 1.07%
> 64 bytes | 8 | 3.31%
> 64 bytes | 16 | 0.71%
>
>This patch also folds in a flush_tlb_mm_range() fix for 'make
>allnoconfig', reported by Tetsuo Handa. Thanks!
>
>Signed-off-by: Alex Shi <alex.shi@intel.com>
>Tested-by: Ren, Yongjie <yongjie.ren@intel.com>
>---
> arch/x86/include/asm/tlb.h | 9 +++-
> arch/x86/include/asm/tlbflush.h | 17 +++++-
> arch/x86/mm/tlb.c | 112 ++++++++++++++++-----------------------
> 3 files changed, 68 insertions(+), 70 deletions(-)
>
>diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
>index 829215f..4fef207 100644
>--- a/arch/x86/include/asm/tlb.h
>+++ b/arch/x86/include/asm/tlb.h
>@@ -4,7 +4,14 @@
> #define tlb_start_vma(tlb, vma) do { } while (0)
> #define tlb_end_vma(tlb, vma) do { } while (0)
> #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
>-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
>+
>+#define tlb_flush(tlb) \
>+{ \
>+ if (tlb->fullmm == 0) \
>+ flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, 0UL); \
>+ else \
>+ flush_tlb_mm_range(tlb->mm, 0UL, TLB_FLUSH_ALL, 0UL); \
>+}
>
> #include <asm-generic/tlb.h>
>
>diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>index 33608d9..4fc8faf 100644
>--- a/arch/x86/include/asm/tlbflush.h
>+++ b/arch/x86/include/asm/tlbflush.h
>@@ -105,6 +105,13 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
> __flush_tlb();
> }
>
>+static inline void flush_tlb_mm_range(struct mm_struct *mm,
>+ unsigned long start, unsigned long end, unsigned long vmflag)
>+{
>+ if (mm == current->active_mm)
>+ __flush_tlb();
>+}
>+
> static inline void native_flush_tlb_others(const struct cpumask *cpumask,
> struct mm_struct *mm,
> unsigned long start,
>@@ -122,12 +129,16 @@ static inline void reset_lazy_tlbstate(void)
>
> #define local_flush_tlb() __flush_tlb()
>
>+#define flush_tlb_mm(mm) flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
>+
>+#define flush_tlb_range(vma, start, end) \
>+ flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
>+
> extern void flush_tlb_all(void);
> extern void flush_tlb_current_task(void);
>-extern void flush_tlb_mm(struct mm_struct *);
> extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
>-extern void flush_tlb_range(struct vm_area_struct *vma,
>- unsigned long start, unsigned long end);
>+extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>+ unsigned long end, unsigned long vmflag);
>
> #define flush_tlb() flush_tlb_current_task()
>
>diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>index 5911f61..481737d 100644
>--- a/arch/x86/mm/tlb.c
>+++ b/arch/x86/mm/tlb.c
>@@ -301,23 +301,10 @@ void flush_tlb_current_task(void)
> preempt_enable();
> }
>
>-void flush_tlb_mm(struct mm_struct *mm)
>-{
>- preempt_disable();
>-
>- if (current->active_mm == mm) {
>- if (current->mm)
>- local_flush_tlb();
>- else
>- leave_mm(smp_processor_id());
>- }
>- if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
>- flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
>-
>- preempt_enable();
>-}
>-
>-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>+/*
>+ * Detect a THP large page, or a HUGETLB
>+ * page in tlb_flush when THP is disabled.
>+ */
> static inline unsigned long has_large_page(struct mm_struct *mm,
> unsigned long start, unsigned long end)
> {
>@@ -339,68 +326,61 @@ static inline unsigned long has_large_page(struct mm_struct *mm,
> }
> return 0;
> }
>-#else
>-static inline unsigned long has_large_page(struct mm_struct *mm,
>- unsigned long start, unsigned long end)
>-{
>- return 0;
>-}
>-#endif
>-void flush_tlb_range(struct vm_area_struct *vma,
>- unsigned long start, unsigned long end)
>-{
>- struct mm_struct *mm;
>
>- if (vma->vm_flags & VM_HUGETLB || tlb_flushall_shift == -1) {
>-flush_all:
>- flush_tlb_mm(vma->vm_mm);
>- return;
>- }
>+void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>+ unsigned long end, unsigned long vmflag)
>+{
>+ unsigned long addr;
>+ unsigned act_entries, tlb_entries = 0;
>
> preempt_disable();
>- mm = vma->vm_mm;
>- if (current->active_mm == mm) {
>- if (current->mm) {
>- unsigned long addr, vmflag = vma->vm_flags;
>- unsigned act_entries, tlb_entries = 0;
>+ if (current->active_mm != mm)
>+ goto flush_all;
>
>- if (vmflag & VM_EXEC)
>- tlb_entries = tlb_lli_4k[ENTRIES];
>- else
>- tlb_entries = tlb_lld_4k[ENTRIES];
>-
>- act_entries = tlb_entries > mm->total_vm ?
>- mm->total_vm : tlb_entries;
>+ if (!current->mm) {
>+ leave_mm(smp_processor_id());
>+ goto flush_all;
>+ }
>
>- if ((end - start) >> PAGE_SHIFT >
>- act_entries >> tlb_flushall_shift)
>- local_flush_tlb();
>- else {
>- if (has_large_page(mm, start, end)) {
>- preempt_enable();
>- goto flush_all;
>- }
>- for (addr = start; addr < end;
>- addr += PAGE_SIZE)
>- __flush_tlb_single(addr);
>+ if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1
>+ || vmflag == VM_HUGETLB) {
>+ local_flush_tlb();
>+ goto flush_all;
>+ }
>
>- if (cpumask_any_but(mm_cpumask(mm),
>- smp_processor_id()) < nr_cpu_ids)
>- flush_tlb_others(mm_cpumask(mm), mm,
>- start, end);
>- preempt_enable();
>- return;
>- }
>- } else {
>- leave_mm(smp_processor_id());
>+ /* On modern CPUs the last level TLB is shared by data and instructions */
>+ if (vmflag & VM_EXEC)
>+ tlb_entries = tlb_lli_4k[ENTRIES];
>+ else
>+ tlb_entries = tlb_lld_4k[ENTRIES];
>+ /* Assume all TLB entries were occupied by this task */
>+ act_entries = mm->total_vm > tlb_entries ? tlb_entries : mm->total_vm;
>+
>+ /* tlb_flushall_shift is the balance point, details in commit log */
>+ if ((end - start) >> PAGE_SHIFT > act_entries >> tlb_flushall_shift)
>+ local_flush_tlb();
>+ else {
>+ if (has_large_page(mm, start, end)) {
>+ local_flush_tlb();
>+ goto flush_all;
> }
>+ /* flush the range page by page with 'invlpg' */
>+ for (addr = start; addr < end; addr += PAGE_SIZE)
>+ __flush_tlb_single(addr);
>+
>+ if (cpumask_any_but(mm_cpumask(mm),
>+ smp_processor_id()) < nr_cpu_ids)
>+ flush_tlb_others(mm_cpumask(mm), mm, start, end);
>+ preempt_enable();
>+ return;
> }
>+
>+flush_all:
> if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
> flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
> preempt_enable();
> }
>
>-
> void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
> {
> struct mm_struct *mm = vma->vm_mm;
>--
>1.7.5.4
--
Sent from my mobile phone. Please excuse brevity and lack of formatting.