From: tip-bot for Borislav Petkov <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: tglx@linutronix.de, peterz@infradead.org, luto@kernel.org,
brgerst@gmail.com, bp@alien8.de, dvlasenk@redhat.com,
fengguang.wu@intel.com, torvalds@linux-foundation.org,
bp@suse.de, jpoimboe@redhat.com, hpa@zytor.com,
linux-kernel@vger.kernel.org, mingo@kernel.org
Subject: [tip:x86/asm] x86/asm: Optimize clear_page()
Date: Wed, 1 Mar 2017 01:47:09 -0800 [thread overview]
Message-ID: <tip-49ca7bb328c630dd43be626534b49e19513296fd@git.kernel.org> (raw)
In-Reply-To: <20170215111927.emdgxf2pide3kwro@pd.tnic>
Commit-ID: 49ca7bb328c630dd43be626534b49e19513296fd
Gitweb: http://git.kernel.org/tip/49ca7bb328c630dd43be626534b49e19513296fd
Author: Borislav Petkov <bp@suse.de>
AuthorDate: Thu, 9 Feb 2017 01:34:49 +0100
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 1 Mar 2017 10:18:32 +0100
x86/asm: Optimize clear_page()
Currently, we CALL clear_page() which then JMPs to the proper function
chosen by the alternatives.
What we should do instead is CALL the proper function directly. (This
was something Ingo suggested a while ago). So let's do that.
Measuring our favourite kernel build workload shows that there are no
significant changes in performance.
AMD
===
-- /tmp/before 2017-02-09 18:01:46.451961188 +0100
++ /tmp/after 2017-02-09 18:01:54.883961175 +0100
@@ -1,15 +1,15 @@
Performance counter stats for 'system wide' (5 runs):
- 1028960.373643 cpu-clock (msec) # 6.000 CPUs utilized ( +- 1.41% )
+ 1023086.018961 cpu-clock (msec) # 6.000 CPUs utilized ( +- 1.20% )
- 518,744 context-switches # 0.504 K/sec ( +- 1.04% )
+ 518,254 context-switches # 0.507 K/sec ( +- 1.01% )
- 38,112 cpu-migrations # 0.037 K/sec ( +- 1.95% )
+ 37,917 cpu-migrations # 0.037 K/sec ( +- 1.02% )
- 20,874,266 page-faults # 0.020 M/sec ( +- 0.07% )
+ 20,918,897 page-faults # 0.020 M/sec ( +- 0.18% )
- 2,043,646,230,667 cycles # 1.986 GHz ( +- 0.14% ) (66.67%)
+ 2,045,305,584,032 cycles # 1.999 GHz ( +- 0.16% ) (66.67%)
- 553,698,855,431 stalled-cycles-frontend # 27.09% frontend cycles idle ( +- 0.07% ) (66.67%)
+ 555,099,401,413 stalled-cycles-frontend # 27.14% frontend cycles idle ( +- 0.13% ) (66.67%)
- 621,544,286,390 stalled-cycles-backend # 30.41% backend cycles idle ( +- 0.39% ) (66.67%)
+ 621,371,430,254 stalled-cycles-backend # 30.38% backend cycles idle ( +- 0.32% ) (66.67%)
- 1,738,364,431,659 instructions # 0.85 insn per cycle
+ 1,739,895,771,901 instructions # 0.85 insn per cycle
- # 0.36 stalled cycles per insn ( +- 0.11% ) (66.67%)
+ # 0.36 stalled cycles per insn ( +- 0.13% ) (66.67%)
- 391,170,943,850 branches # 380.161 M/sec ( +- 0.13% ) (66.67%)
+ 391,398,551,757 branches # 382.567 M/sec ( +- 0.13% ) (66.67%)
- 22,567,810,411 branch-misses # 5.77% of all branches ( +- 0.11% ) (66.67%)
+ 22,574,726,683 branch-misses # 5.77% of all branches ( +- 0.13% ) (66.67%)
- 171.480741921 seconds time elapsed ( +- 1.41% )
+ 170.509229451 seconds time elapsed ( +- 1.20% )
Intel
=====
-- /tmp/before 2017-02-09 20:36:19.851947473 +0100
++ /tmp/after 2017-02-09 20:36:30.151947458 +0100
@@ -1,15 +1,15 @@
Performance counter stats for 'system wide' (5 runs):
- 2207248.598126 cpu-clock (msec) # 8.000 CPUs utilized ( +- 0.69% )
+ 2213300.106631 cpu-clock (msec) # 8.000 CPUs utilized ( +- 0.73% )
- 899,342 context-switches # 0.407 K/sec ( +- 0.68% )
+ 898,381 context-switches # 0.406 K/sec ( +- 0.79% )
- 80,553 cpu-migrations # 0.036 K/sec ( +- 1.13% )
+ 80,979 cpu-migrations # 0.037 K/sec ( +- 1.11% )
- 36,171,148 page-faults # 0.016 M/sec ( +- 0.02% )
+ 36,179,791 page-faults # 0.016 M/sec ( +- 0.02% )
- 6,665,288,826,484 cycles # 3.020 GHz ( +- 0.07% ) (83.33%)
+ 6,671,638,410,799 cycles # 3.014 GHz ( +- 0.06% ) (83.33%)
- 5,065,975,115,197 stalled-cycles-frontend # 76.01% frontend cycles idle ( +- 0.11% ) (83.33%)
+ 5,076,835,183,223 stalled-cycles-frontend # 76.10% frontend cycles idle ( +- 0.11% ) (83.33%)
- 3,841,556,350,614 stalled-cycles-backend # 57.64% backend cycles idle ( +- 0.13% ) (66.67%)
+ 3,852,823,974,333 stalled-cycles-backend # 57.75% backend cycles idle ( +- 0.12% ) (66.67%)
- 4,148,398,171,079 instructions # 0.62 insn per cycle
+ 4,148,997,156,059 instructions # 0.62 insn per cycle
- # 1.22 stalled cycles per insn ( +- 0.10% ) (83.33%)
+ # 1.22 stalled cycles per insn ( +- 0.11% ) (83.33%)
- 887,187,118,591 branches # 401.943 M/sec ( +- 0.09% ) (83.33%)
+ 887,271,341,121 branches # 400.882 M/sec ( +- 0.11% ) (83.33%)
- 30,139,439,034 branch-misses # 3.40% of all branches ( +- 0.09% ) (83.33%)
+ 30,134,864,997 branch-misses # 3.40% of all branches ( +- 0.06% ) (83.33%)
- 275.904405540 seconds time elapsed ( +- 0.69% )
+ 276.660352016 seconds time elapsed ( +- 0.73% )
allmodconfig vmlinux size grows by a ~1Kb but that's fine - we optimize
our calling of the clear_page variants.
text data bss dec hex filename
9051979 23067670 27009024 59128673 3863b61 vmlinux
9053000 23067670 27009024 59129694 3863f5e vmlinux.clear_page
Reported-by: kernel test robot <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170215111927.emdgxf2pide3kwro@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/alternative.h | 17 +++++++++++++++++
arch/x86/include/asm/page_64.h | 15 ++++++++++++++-
arch/x86/lib/clear_page_64.S | 17 +++++++----------
3 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b02038..12e3d8d 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -227,6 +227,23 @@ static inline int alternatives_text_reserved(void *start, void *end)
}
/*
+ * Like alternative_call(), but there are two features and respective functions.
+ * If CPU has feature2, function2 is used.
+ * Otherwise, if CPU has feature1, function1 is used.
+ * Otherwise, old function is used.
+ */
+#define alternative_void_call_2(oldfunc, newfunc1, feature1, newfunc2, \
+ feature2, input...) \
+{ \
+ register void *__sp asm(_ASM_SP); \
+ asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1, \
+ "call %P[new2]", feature2) \
+ : "+r" (__sp) \
+ : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
+ [new2] "i" (newfunc2), ## input); \
+}
+
+/*
* use this macro(s) if you need more than one output parameter
* in alternative_io
*/
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index b3bebf9..254abce 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -4,6 +4,7 @@
#include <asm/page_64_types.h>
#ifndef __ASSEMBLY__
+#include <asm/alternative.h>
/* duplicated to the one in bootmem.h */
extern unsigned long max_pfn;
@@ -34,7 +35,19 @@ extern unsigned long __phys_addr_symbol(unsigned long);
#define pfn_valid(pfn) ((pfn) < max_pfn)
#endif
-void clear_page(void *page);
+void clear_page_orig(void *page);
+void clear_page_rep(void *page);
+void clear_page_erms(void *page);
+
+static inline void clear_page(void *page)
+{
+ alternative_void_call_2(clear_page_orig,
+ clear_page_rep, X86_FEATURE_REP_GOOD,
+ clear_page_erms, X86_FEATURE_ERMS,
+ "D" (page)
+ : "memory", "rax", "rcx");
+}
+
void copy_page(void *to, void *from);
#endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 5e2af3a..81b1635 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -14,20 +14,15 @@
* Zero a page.
* %rdi - page
*/
-ENTRY(clear_page)
-
- ALTERNATIVE_2 "jmp clear_page_orig", "", X86_FEATURE_REP_GOOD, \
- "jmp clear_page_c_e", X86_FEATURE_ERMS
-
+ENTRY(clear_page_rep)
movl $4096/8,%ecx
xorl %eax,%eax
rep stosq
ret
-ENDPROC(clear_page)
-EXPORT_SYMBOL(clear_page)
+ENDPROC(clear_page_rep)
+EXPORT_SYMBOL_GPL(clear_page_rep)
ENTRY(clear_page_orig)
-
xorl %eax,%eax
movl $4096/64,%ecx
.p2align 4
@@ -47,10 +42,12 @@ ENTRY(clear_page_orig)
nop
ret
ENDPROC(clear_page_orig)
+EXPORT_SYMBOL_GPL(clear_page_orig)
-ENTRY(clear_page_c_e)
+ENTRY(clear_page_erms)
movl $4096,%ecx
xorl %eax,%eax
rep stosb
ret
-ENDPROC(clear_page_c_e)
+ENDPROC(clear_page_erms)
+EXPORT_SYMBOL_GPL(clear_page_erms)
next prev parent reply other threads:[~2017-03-01 9:49 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-09 19:51 [PATCH] x86: Optimize clear_page() Borislav Petkov
2017-02-15 11:19 ` [PATCH -v1.1] " Borislav Petkov
2017-02-19 13:41 ` Borislav Petkov
2017-03-01 9:47 ` tip-bot for Borislav Petkov [this message]
2017-03-07 5:11 ` [tip:x86/asm] x86/asm: " Yinghai Lu
2017-03-07 7:30 ` Ingo Molnar
2017-03-07 18:57 ` Yinghai Lu
2017-03-08 10:47 ` Ingo Molnar
2017-03-02 1:09 ` [lkp-robot] [x86] ed3ce2a917: BUG:unable_to_handle_kernel kernel test robot
2017-03-02 1:09 ` kernel test robot
2017-03-02 18:19 ` Borislav Petkov
2017-03-02 18:19 ` Borislav Petkov
2017-03-09 2:13 ` Ye Xiaolong
2017-03-09 2:13 ` Ye Xiaolong
2017-03-09 2:30 ` Fengguang Wu
2017-03-09 2:30 ` Fengguang Wu
2017-03-09 8:17 ` Borislav Petkov
2017-03-09 8:17 ` Borislav Petkov
2017-03-09 8:14 ` Borislav Petkov
2017-03-09 8:14 ` Borislav Petkov
2017-03-10 2:33 ` Ye Xiaolong
2017-03-10 2:33 ` Ye Xiaolong
2017-03-10 8:30 ` Borislav Petkov
2017-03-10 8:30 ` Borislav Petkov
2017-03-07 8:34 ` [tip:x86/asm] x86/asm: Optimize clear_page() tip-bot for Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tip-49ca7bb328c630dd43be626534b49e19513296fd@git.kernel.org \
--to=tipbot@zytor.com \
--cc=bp@alien8.de \
--cc=bp@suse.de \
--cc=brgerst@gmail.com \
--cc=dvlasenk@redhat.com \
--cc=fengguang.wu@intel.com \
--cc=hpa@zytor.com \
--cc=jpoimboe@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.