From: Borislav Petkov <bp@amd64.org>
To: Alex Shi <alex.shi@intel.com>
Cc: rob@landley.net, tglx@linutronix.de, mingo@redhat.com,
hpa@zytor.com, arnd@arndb.de, rostedt@goodmis.org,
fweisbec@gmail.com, jeremy@goop.org, gregkh@linuxfoundation.org,
borislav.petkov@amd.com, riel@redhat.com, luto@mit.edu,
avi@redhat.com, len.brown@intel.com, dhowells@redhat.com,
fenghua.yu@intel.com, ak@linux.intel.com, cpw@sgi.com,
steiner@sgi.com, akpm@linux-foundation.org, penberg@kernel.org,
hughd@google.com, rientjes@google.com,
kosaki.motohiro@jp.fujitsu.com, n-horiguchi@ah.jp.nec.com,
paul.gortmaker@windriver.com, trenn@suse.de, tj@kernel.org,
oleg@redhat.com, axboe@kernel.dk, a.p.zijlstra@chello.nl,
kamezawa.hiroyu@jp.fujitsu.com, viro@zeniv.linux.org.uk,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 3/7] x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
Date: Thu, 10 May 2012 10:42:13 +0200
Message-ID: <20120510084213.GD30055@aftab.osrc.amd.com>
In-Reply-To: <1336626013-28413-4-git-send-email-alex.shi@intel.com>

On Thu, May 10, 2012 at 01:00:09PM +0800, Alex Shi wrote:
> x86 has no instruction-level support for flush_tlb_range. Currently
> flush_tlb_range is just implemented by flushing the whole TLB. That is
> not the best solution for all scenarios. In fact, if we just use
> 'invlpg' to flush a few entries from the TLB, we gain performance from
> later accesses that hit the remaining TLB entries.
>
> But the 'invlpg' instruction is expensive. Its execution time can
> rival that of a cr3 rewrite, and is even a bit higher on SNB CPUs.
>
> So, on a CPU with 512 4KB TLB entries, the balance point is at:
>         (512 - X) * 100ns (assumed TLB refill cost) =
>                 X (TLB flush entries) * 100ns (assumed invlpg cost)
>
> Here, X is 256, that is 1/2 of the 512 entries.
>
> But with the mysterious CPU prefetcher and page miss handler unit, the
> assumed TLB refill cost is far lower than 100ns in sequential access. And
> 2 HT siblings in one core make memory access faster if they are
> accessing the same memory. So, in the patch, I only make the change when
> the number of target entries is less than 1/16 of all active TLB entries.
> Actually, I have no data to support the ratio '1/16', so any
> suggestions are welcome.
>
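The balance-point arithmetic and the 1/16 cutoff described above can be sketched in a few lines of C. This is only an illustration: `balance_point()` and `flush_one_by_one()` are hypothetical helper names, not functions from the patch, and treating FLUSHALL_BAR as a plain divisor is an assumption, since the code that uses it is elided in this message.

```c
#include <assert.h>

/*
 * Hypothetical helper: solve
 *     (entries - X) * refill_ns == X * invlpg_ns
 * for X, rounding down. X is the number of flushed entries at which
 * refilling the rest of the TLB costs as much as the invlpg calls.
 */
static unsigned long balance_point(unsigned long entries,
                                   unsigned long refill_ns,
                                   unsigned long invlpg_ns)
{
        assert(refill_ns + invlpg_ns > 0);
        return entries * refill_ns / (refill_ns + invlpg_ns);
}

/*
 * Hypothetical helper: nonzero when a range of 'pages' pages is small
 * enough to be flushed page by page, i.e. it does not exceed 1/bar of
 * the active TLB entries. Assumes 'bar' plays the role of FLUSHALL_BAR.
 */
static int flush_one_by_one(unsigned long pages, unsigned long act_entries,
                            unsigned long bar)
{
        assert(bar > 0);
        return pages <= act_entries / bar;
}
```

With the numbers from the message, `balance_point(512, 100, 100)` gives 256, and a divisor of 16 on 512 entries allows page-by-page flushing for ranges of up to 32 pages. Plugging in a lower refill cost, say an illustrative 25ns (not a measured value), moves the balance point down to 102, which is the direction of the argument for choosing a cutoff well below 1/2.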
> As to hugetlb, I guess due to the smaller page table and fewer active TLB
> entries, I didn't see a benefit in my benchmark, so no optimization for now.
>
> My macro benchmark shows that in ideal scenarios, read performance improves
> by 70 percent. And in the worst scenario, the read/write
> performance is similar to the unpatched 3.4-rc4 kernel.
>
> Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
> 'always':
>
> multi-thread testing, the '-t' parameter is the thread number:
>                         with patch   unpatched 3.4-rc4
> ./mprotect -t 1            14ns          24ns
> ./mprotect -t 2            13ns          22ns
> ./mprotect -t 4            12ns          19ns
> ./mprotect -t 8            14ns          16ns
> ./mprotect -t 16           28ns          26ns
> ./mprotect -t 32           54ns          51ns
> ./mprotect -t 128          200ns         199ns
>
> Single process with sequential flushing and memory accessing:
>
>                         with patch   unpatched 3.4-rc4
> ./mprotect                 7ns           11ns
> ./mprotect -p 4096 -l 8 -n 10240
>                            21ns          21ns
>
> I also tried other benchmarks on Intel core2/NHM/SNB EP and NHM EX machine.
> No clear performance change on specjbb2005 with openjdk, and oltp reading.
>
> Signed-off-by: Alex Shi <alex.shi@intel.com>
[ … ]
> +
> +#define FLUSHALL_BAR 16
> +

Btw, you can save a bunch of indenting in this function. Let me add
the final version here from the whole patchset so I can comment on it
more easily:

> void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>                        unsigned long end, unsigned long vmflag)
> {
>         preempt_disable();
>         if (current->active_mm == mm) {

        if (current->active_mm != mm)
                goto flush_all;

Now this whole piece below can move one indentation level to the left.
Then you can do:

        if (!current->mm)
                goto leave;

and add the "leave" label below.

Now you're saving yet another indentation level, bringing the meat of
the function to the 1st indentation level, which is cool and gives you
much more room so that you don't have to linebreak longer statements.

>                 if (current->mm) {
>                         unsigned long addr;
>                         unsigned long act_entries, tlb_entries = 0;
>
>                         if (end == TLB_FLUSH_ALL ||
>                             tlb_flushall_factor == (u16)TLB_FLUSH_ALL) {
>                                 local_flush_tlb();
>                                 goto flush_all;
>                         }
>                         if (vmflag & VM_EXEC)
>                                 tlb_entries = tlb_lli_4k[ENTRIES];
>                         else
>                                 tlb_entries = tlb_lld_4k[ENTRIES];
>                         act_entries = min(mm->total_vm, tlb_entries);
>
>                         if ((end - start) >> PAGE_SHIFT >
>                                         act_entries >> tlb_flushall_factor)
>                                 local_flush_tlb();
>                         else {
>                                 if (has_large_page(mm, start, end)) {
>                                         local_flush_tlb();
>                                         goto flush_all;
>                                 }
>                                 for (addr = start; addr <= end;
>                                                 addr += PAGE_SIZE)
>                                         __flush_tlb_single(addr);
>
>                                 if (cpumask_any_but(mm_cpumask(mm),
>                                         smp_processor_id()) < nr_cpu_ids)
>                                         flush_tlb_others(mm_cpumask(mm), mm,
>                                                 start, end);
>                                 preempt_enable();
>                                 return;
>                         }
>                 } else {
>                         leave_mm(smp_processor_id());
>                 }
>         }

leave:
        leave_mm(smp_processor_id());

> flush_all:
>         if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
>                 flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
>         preempt_enable();
> }

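To make the suggested shape concrete, here is a minimal, self-contained model of the flattened control flow. It is a sketch and not the kernel function: `struct flush_req`, `struct flush_log` and `flush_range_model()` are hypothetical names, the kernel calls are replaced by flags that record which action would have been taken, and the preempt_disable()/preempt_enable() pair is left out. One detail worth noting: once the "leave" label sits directly above "flush_all", every path that does a full local flush has to jump to flush_all explicitly, otherwise it would fall through into leave_mm(). The model folds the three full-flush conditions into one test for that reason, which is a choice made here and not part of the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the kernel state. */
struct flush_req {
        bool mm_is_active;      /* current->active_mm == mm */
        bool has_user_mm;       /* current->mm != NULL */
        bool force_full;        /* end == TLB_FLUSH_ALL or factor disabled */
        bool has_large_page;    /* has_large_page(mm, start, end) */
        bool other_cpus;        /* another CPU is in mm_cpumask(mm) */
        unsigned long pages;    /* (end - start) >> PAGE_SHIFT */
        unsigned long cutoff;   /* act_entries >> tlb_flushall_factor */
};

/* Records which actions the real function would have performed. */
struct flush_log {
        bool local_full;        /* local_flush_tlb() */
        bool local_range;       /* __flush_tlb_single() loop */
        bool left_mm;           /* leave_mm() */
        bool remote_full;       /* flush_tlb_others(..., 0UL, TLB_FLUSH_ALL) */
        bool remote_range;      /* flush_tlb_others(..., start, end) */
};

static struct flush_log flush_range_model(const struct flush_req *r)
{
        struct flush_log log = { 0 };

        assert(r);

        if (!r->mm_is_active)
                goto flush_all;
        if (!r->has_user_mm)
                goto leave;

        /* Every full-flush case must skip the "leave" label. */
        if (r->force_full || r->pages > r->cutoff || r->has_large_page) {
                log.local_full = true;
                goto flush_all;
        }

        /* Small range: flush page by page, tell others the range. */
        log.local_range = true;
        log.remote_range = r->other_cpus;
        return log;

leave:
        log.left_mm = true;
flush_all:
        log.remote_full = r->other_cpus;
        return log;
}
```

The body of the function now sits at the first indentation level, and each exit path is visible as either an early goto or the single early return.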
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551