From: Denys Vlasenko <dvlasenk@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>, Brian Gerst <brgerst@gmail.com>
Cc: "Steven Rostedt" <rostedt@goodmis.org>,
"Oleg Nesterov" <oleg@redhat.com>,
"Ingo Molnar" <mingo@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"Borislav Petkov" <bp@alien8.de>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Will Drewry" <wad@chromium.org>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Alexei Starovoitov" <ast@plumgrid.com>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Kees Cook" <keescook@chromium.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"linux-tip-commits@vger.kernel.org"
<linux-tip-commits@vger.kernel.org>
Subject: Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Date: Thu, 23 Apr 2015 11:56:21 +0200 [thread overview]
Message-ID: <5538C1C5.7010408@redhat.com> (raw)
In-Reply-To: <CALCETrVH6STDwEuLY3gFaNevtjvtOZ7Ox3e10ZhtmitCJnJUKQ@mail.gmail.com>
On 04/23/2015 10:49 AM, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 12:37 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
>> <tipbot@zytor.com> wrote:
>>> Commit-ID: e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Gitweb: http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Author: Denys Vlasenko <dvlasenk@redhat.com>
>>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>>> Committer: Ingo Molnar <mingo@kernel.org>
>>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>>
>>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>>
>>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>>
>>> On 64-bit kernels, the data segment is the same for 32-bit and
>>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>>> selector.
>>>
>>> So there's no need to repeat it by hand. Segment loads are somewhat
>>> expensive: tens of cycles.
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>>> [ Removed unnecessary comment. ]
>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Oleg Nesterov <oleg@redhat.com>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Will Drewry <wad@chromium.org>
>>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>> ---
>>> arch/x86/vdso/vdso32/syscall.S | 2 --
>>> 1 file changed, 2 deletions(-)
>>>
>>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>>> index 5415b56..6b286bb 100644
>>> --- a/arch/x86/vdso/vdso32/syscall.S
>>> +++ b/arch/x86/vdso/vdso32/syscall.S
>>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>> .Lpush_ebp:
>>> movl %ecx, %ebp
>>> syscall
>>> - movl $__USER32_DS, %ecx
>>> - movl %ecx, %ss
>>> movl %ebp, %ecx
>>> popl %ebp
>>> .Lpop_ebp:
>>
>> This patch unfortunately is causing Wine to break on some applications:
>>
>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>> Register dump:
>> CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>> EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216( R- -- I -A-P- )
>> EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>> ESI:00000040 EDI:7ffd4000
>> Stack dump:
>> 0x00aed60c: 00aed648 f7575e5b 7bcc8000 00000000
>> 0x00aed61c: 7bc7bc09 00000010 00aed750 00000040
>> 0x00aed62c: 00aed750 00aed650 7bcc8000 7bc7bbdd
>> 0x00aed63c: 7bcc8000 00aed6a0 00aed750 00aed738
>> 0x00aed64c: 7bc7cfa9 00000011 00aed750 00000040
>> 0x00aed65c: 00000020 00000000 00000000 7bc4f141
>> Backtrace:
>> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>> 1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>> 2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
>> in ntdll (0x00aed648)
>> 3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>> 4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
>> NumberOfThreadsReleased=0x0(nil))
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
>> in ntdll (0x00aed7c8)
>> 5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
>> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
>> in kernel32 (0x00aed7e8)
>> 6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
>> ...
>>
>> __kernel_vsyscall+0x7 points to "pop %ebp".
>>
>> This is on an AMD Phenom(tm) II X6 1055T Processor.
>>
>> It appears that there are some subtle differences in how sysretl works
>> on AMD vs. Intel. According to the Intel docs, the SS selector and
>> descriptor cache is completely reset by sysret to fixed values. The
>> AMD docs however are concerning:
>
> My understanding is that, in long mode, the segment attributes are
> ignored, and that there is no such thing as a "64-bit stack". So...
>
>>
>> AMD's syscall:
>> SS.sel = MSR_STAR.SYSCALL_CS + 8
>> SS.attr = 64-bit stack,dpl0
>
> I don't really believe that.
>
>> SS.base = 0x00000000
>> SS.limit = 0xFFFFFFFF
>>
>> AMD's sysret:
>> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>> // SS base, limit, attributes unchanged.
>>
>
> I'm pretty sure that this is at least a little bit wrong. It makes no
> sense for me for syscall to set SS.DPL=0 and for sysret to leave
> SS.DPL=0. It had better at least change DPL to 3. (Except... don't
> they mean RPL? Why is the DPL cached at all? But RPL is clearly
> changed, since it's part of the selector.)
>
>> Not changing base or limit is no big deal, but not changing attributes
>> could be the problem. It might be leaving the "64-bit stack"
>> attribute set, for whatever that means.
>
> Hmm. I don't know if I believe that explanation. For one thing, the
> APM says "Executing SYSRET in non-64-bit mode or with a 16- or 32-bit
> operand size returns to 32-bit mode with a 32-bit stack pointer."
>
> We can revert this patch or fix it, but I'd like to at least try to
> understand what's wrong first. Borislav, any ideas?
>
> I'm curious whether we can somehow end up in the kernel without a
> sensible SS. What happens if we have SS = 0?
Nothing happens. We continue to execute as if nothing is wrong.
24593_APM_v21.pdf says:
8.9 Long-Mode Interrupt Control Transfers
The long-mode architecture expands the legacy interrupt-mechanism to support 64-bit operating
systems and applications. These changes include:
* All interrupt handlers are 64-bit code and operate in 64-bit mode.
* The size of an interrupt-stack push is fixed at 64 bits (8 bytes).
* The interrupt-stack frame is aligned on a 16-byte boundary.
* The stack pointer, SS:RSP, is pushed unconditionally on interrupts, rather than conditionally based
on a change in CPL.
* The SS selector register is loaded with a null selector as a result of an interrupt, if the CPL changes.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ !!!!!
* The IRET instruction behavior changes, to unconditionally pop SS:RSP, allowing a null SS to be
popped.
...
Intel's doc has a similar statement.
> Try this on for size:
>
> 1. Wine process does syscall
> 2. Context switch to any other task
> 3. Interrupt (software or hardware), which loads SS with ss0, which is
> 0 on x86_64.
(should be "which loads SS with 0 unconditionally" according to AMD docs)
> 4. Context switch back to Wine.
> 5. sysretl
Plausible. But why it happens only with Wine?
The fix can look like this (untested):
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9f4c232 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -198,6 +198,18 @@ sysexit_from_sys_call:
* with 'sysenter' and it uses the SYSENTER calling convention.
*/
andl $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
+ /*
+ * On AMD, SYSRET32 does not modify %ss cached descriptor;
+ * and in kernel, %ss can be loaded with 0, invalidating it.
+ * On return to 32-bit mode, this makes stack ops fail.
+ * Fix %ss only if it's wrong: it takes ~40 cycles.
+ */
+ movl %ss, %ecx
+ cmpl $__USER32_DS, %ecx
+ je 1f
+ movl $__USER32_DS, %ecx
+ movl %ecx, %ss
+1:
movl RIP(%rsp),%ecx /* User %eip */
CFI_REGISTER rip,rcx
RESTORE_RSI_RDI
@@ -408,6 +420,18 @@ cstar_dispatch:
sysretl_from_sys_call:
andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
RESTORE_RSI_RDI_RDX
+ /*
+ * On AMD, SYSRET32 does not modify %ss cached descriptor;
+ * and in kernel, %ss can be loaded with 0, invalidating it.
+ * On return to 32-bit mode, this makes stack ops fail.
+ * Fix %ss only if it's wrong: it takes ~40 cycles.
+ */
+ movl %ss, %ecx
+ cmpl $__USER32_DS, %ecx
+ je 1f
+ movl $__USER32_DS, %ecx
+ movl %ecx, %ss
+1:
movl RIP(%rsp),%ecx
CFI_REGISTER rip,rcx
movl EFLAGS(%rsp),%r11d
next prev parent reply other threads:[~2015-04-23 9:57 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-26 1:11 [GIT PULL] x86/vdso changes for 4.1 Andy Lutomirski
[not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
2015-03-27 18:47 ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
2015-03-27 18:48 ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso32/syscall.S: Do " tip-bot for Denys Vlasenko
2015-04-23 7:37 ` Brian Gerst
2015-04-23 8:49 ` Andy Lutomirski
2015-04-23 9:07 ` Andy Lutomirski
2015-04-23 9:23 ` Denys Vlasenko
2015-04-23 9:47 ` Borislav Petkov
2015-04-23 9:56 ` Denys Vlasenko [this message]
2015-04-23 10:18 ` Borislav Petkov
2015-04-23 10:26 ` Denys Vlasenko
2015-04-23 10:44 ` Borislav Petkov
2015-04-23 11:05 ` Denys Vlasenko
2015-04-23 15:48 ` Andy Lutomirski
2015-04-23 16:41 ` Denys Vlasenko
2015-04-23 16:50 ` Andy Lutomirski
2015-04-23 17:14 ` Borislav Petkov
2015-04-23 18:24 ` Andy Lutomirski
2015-04-23 18:36 ` Linus Torvalds
2015-04-23 18:52 ` Borislav Petkov
2015-04-23 19:20 ` Andy Lutomirski
2015-04-23 19:50 ` Denys Vlasenko
2015-04-23 9:20 ` Denys Vlasenko
2015-04-23 9:56 ` Borislav Petkov
2015-04-23 11:11 ` Brian Gerst
2015-04-23 11:28 ` Brian Gerst
2015-04-23 11:46 ` Denys Vlasenko
2015-04-23 12:01 ` Brian Gerst
2015-04-23 12:35 ` Denys Vlasenko
2015-04-23 11:12 ` Denys Vlasenko
2015-03-27 18:48 ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso: Teach 'make clean' to " tip-bot for Andrey Skvortsov
2015-03-27 18:48 ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
2015-03-31 12:39 ` [tip:x86/vdso] x86/vdso: Remove x32 intermediates during ' make clean' tip-bot for Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes tip-bot for Tommi Kyntola
-- strict thread matches above, loose matches on Subject: below --
2015-02-16 14:15 [PATCH] x86,vdso: fix " Tommi Kyntola
2015-02-16 16:29 ` Andy Lutomirski
[not found] ` <CAO2cUkRstHcKzy+sMvaQoXHBjTX1yheN2EMQW-wCd0tDRCLNYQ@mail.gmail.com>
2015-02-16 20:40 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5538C1C5.7010408@redhat.com \
--to=dvlasenk@redhat.com \
--cc=ast@plumgrid.com \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=wad@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox