From: Borislav Petkov <bp@alien8.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@amacapital.net>,
Denys Vlasenko <vda.linux@googlemail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Brian Gerst <brgerst@gmail.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
Ingo Molnar <mingo@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Oleg Nesterov <oleg@redhat.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Alexei Starovoitov <ast@plumgrid.com>,
Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Date: Sat, 25 Apr 2015 23:12:06 +0200
Message-ID: <20150425211206.GE32099@pd.tnic>
In-Reply-To: <5d120f358612d73fc909f5bfa47e7bd082db0af0.1429841474.git.luto@kernel.org>
On Thu, Apr 23, 2015 at 07:15:01PM -0700, Andy Lutomirski wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by replacing NULL SS values with __KERNEL_DS
> in __switch_to, thus ensuring that SYSRET never happens with SS set
> to NULL.
>
> This was exposed by a recent vDSO cleanup.
>
> Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>
> Tested only on Intel, which isn't very interesting. I'll tidy up
> and send a test case, too, once Borislav confirms that it works.
So I did some benchmarking today. Custom kernel build measured with perf
stat, 10 builds with --pre doing
$ cat pre-build-kernel.sh
make -s clean
echo 3 > /proc/sys/vm/drop_caches
$ cat measure.sh
EVENTS="cpu-clock,task-clock,cycles,instructions,branches,branch-misses,context-switches,migrations"
perf stat -e $EVENTS --sync -a --repeat 10 --pre ~/kernel/pre-build-kernel.sh make -s -j64
I've prepended the perf stat output with markers A:, B: or C: for easier
comparison. The markers mean:
A: Linus' master from a couple of days ago + tip/master + tip/x86/asm
B: With Andy's SYSRET patch on top
C: Without RCX canonicalness check (see patch at the end).
Numbers are from an AMD F16h box:
A: 2835570.145246 cpu-clock (msec) ( +- 0.02% ) [100.00%]
B: 2833364.074970 cpu-clock (msec) ( +- 0.04% ) [100.00%]
C: 2834708.335431 cpu-clock (msec) ( +- 0.02% ) [100.00%]
This is interesting - the SYSRET SS fix makes it minimally better and
the C-patch is a bit worse again. Net win of C over A is 861 msec,
almost a second, oh well.
A: 2835570.099981 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
B: 2833364.073633 task-clock (msec) # 3.996 CPUs utilized ( +- 0.04% ) [100.00%]
C: 2834708.350387 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
Similar thing observable here.
A: 5,591,213,166,613 cycles # 1.972 GHz ( +- 0.03% ) [75.00%]
B: 5,585,023,802,888 cycles # 1.971 GHz ( +- 0.03% ) [75.00%]
C: 5,587,983,212,758 cycles # 1.971 GHz ( +- 0.02% ) [75.00%]
Net win (A -> C) is a drop of 3,229,953,855 cycles.
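For anyone who wants to redo the arithmetic on the A and C counts, here is a
minimal sketch in C. The helper names net_win() and net_win_percent() are made
up for this illustration; they are not part of perf or of the kernel.

```c
#include <stdint.h>

/* Hypothetical helper: absolute drop between a baseline count (A)
 * and a patched count (C), as read from the perf stat output. */
int64_t net_win(int64_t baseline, int64_t patched)
{
	return baseline - patched;
}

/* Hypothetical helper: the same drop as a percentage of the baseline,
 * which is easier to compare against the "+-" column of perf stat. */
double net_win_percent(int64_t baseline, int64_t patched)
{
	return 100.0 * (double)(baseline - patched) / (double)baseline;
}
```

Calling net_win(5591213166613LL, 5587983212758LL) reproduces the cycles delta
quoted above.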
A: 3,106,707,101,530 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]
B: 3,106,632,251,528 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
C: 3,106,265,958,142 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
This looks like it would make sense - instruction count drops from A -> B -> C.
A: 683,676,044,429 branches # 241.107 M/sec ( +- 0.01% ) [75.00%]
B: 683,670,899,595 branches # 241.293 M/sec ( +- 0.01% ) [75.00%]
C: 683,675,772,858 branches # 241.180 M/sec ( +- 0.01% ) [75.00%]
Also makes sense - the C patch adds an unconditional JMP over the
RCX-canonicalness check.
A: 43,829,535,008 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
B: 43,844,118,416 branch-misses # 6.41% of all branches ( +- 0.03% ) [75.00%]
C: 43,819,871,086 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
And this is nice, branch misses are the smallest with C, cool. It makes
sense again - the C patch adds an unconditional JMP which doesn't miss.
A: 2,030,357 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
B: 2,029,313 context-switches # 0.716 K/sec ( +- 0.05% ) [100.00%]
C: 2,028,566 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
Those look good.
A: 52,421 migrations # 0.018 K/sec ( +- 1.13% )
B: 52,049 migrations # 0.018 K/sec ( +- 1.02% )
C: 51,365 migrations # 0.018 K/sec ( +- 0.92% )
Same here.
A: 709.528485252 seconds time elapsed ( +- 0.02% )
B: 708.976557288 seconds time elapsed ( +- 0.04% )
C: 709.312844791 seconds time elapsed ( +- 0.02% )
Interestingly, the unconditional JMP kinda costs... Btw, I'm not sure if
kernel build is the optimal workload for benchmarking here but I don't
see why not - it does a lot of syscalls so it should exercise the SYSRET
path sufficiently.
Anyway, we can do this below. Or not, I'm sitting on the fence about
that one.
---
From: Borislav Petkov <bp@suse.de>
Date: Sat, 25 Apr 2015 19:30:33 +0200
Subject: [PATCH] x86/entry: Avoid canonical RCX check on AMD
It is not needed on AMD as RCX canonicalness is not checked during
SYSRET there.
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/intel.c | 2 ++
arch/x86/kernel/entry_64.S | 13 +++++++++----
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 7ee9b94d9921..8d555b046fe9 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -265,6 +265,7 @@
#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */
#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
+#define X86_BUG_CANONICAL_RCX X86_BUG(8) /* SYSRET #GPs when %RCX non-canonical */
#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 50163fa9034f..109a51815e92 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -159,6 +159,8 @@ static void early_init_intel(struct cpuinfo_x86 *c)
pr_info("Disabling PGE capability bit\n");
setup_clear_cpu_cap(X86_FEATURE_PGE);
}
+
+ set_cpu_bug(c, X86_BUG_CANONICAL_RCX);
}
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e952f6bf1d6d..d01fb6c1362f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -415,16 +415,20 @@ syscall_return:
jne opportunistic_sysret_failed
/*
- * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
- * in kernel space. This essentially lets the user take over
- * the kernel, since userspace controls RSP.
- *
* If width of "canonical tail" ever becomes variable, this will need
* to be updated to remain correct on both old and new CPUs.
*/
.ifne __VIRTUAL_MASK_SHIFT - 47
.error "virtual address width changed -- SYSRET checks need update"
.endif
+
+ /*
+ * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
+ * in kernel space. This essentially lets the user take over
+ * the kernel, since userspace controls RSP.
+ */
+ ALTERNATIVE "jmp 1f", "", X86_BUG_CANONICAL_RCX
+
/* Change top 16 bits to be the sign-extension of 47th bit */
shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
@@ -432,6 +436,7 @@ syscall_return:
cmpq %rcx, %r11
jne opportunistic_sysret_failed
+1:
cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */
jne opportunistic_sysret_failed
--
2.3.5
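As an aside, the shl/sar pair in the entry_64.S hunk above can be sketched in
plain C. This is only an illustration, not kernel code: canonicalize(),
is_canonical() and the VIRTUAL_MASK_SHIFT macro are names invented for the
example. It assumes 48-bit virtual addresses (__VIRTUAL_MASK_SHIFT == 47, as
the .ifne check enforces) and a compiler that shifts signed values right
arithmetically, which gcc and clang do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: mirrors __VIRTUAL_MASK_SHIFT == 47. */
#define VIRTUAL_MASK_SHIFT 47

/* Replace the top 16 bits with the sign-extension of bit 47,
 * like the "shl $16, %rcx; sar $16, %rcx" sequence does. */
uint64_t canonicalize(uint64_t addr)
{
	const int shift = 64 - (VIRTUAL_MASK_SHIFT + 1);	/* 16 */

	/* Left shift as unsigned (well defined), then an arithmetic
	 * right shift on the signed value to smear bit 47 upwards. */
	return (uint64_t)((int64_t)(addr << shift) >> shift);
}

/* An address is canonical iff sign-extending it changes nothing,
 * which is what the cmpq/jne after the shifts tests for. */
bool is_canonical(uint64_t addr)
{
	return canonicalize(addr) == addr;
}
```

With these definitions, 0x00007fffffffffff and 0xffff800000000000 are
canonical, while 0x0000800000000000 is not. The latter is the kind of RCX
value that must not reach SYSRET on CPUs with X86_BUG_CANONICAL_RCX set.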
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.