From: Denys Vlasenko <dvlasenk@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>, Brian Gerst <brgerst@gmail.com>
Cc: "Steven Rostedt" <rostedt@goodmis.org>,
"Oleg Nesterov" <oleg@redhat.com>,
"Ingo Molnar" <mingo@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"Borislav Petkov" <bp@alien8.de>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Will Drewry" <wad@chromium.org>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Alexei Starovoitov" <ast@plumgrid.com>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Kees Cook" <keescook@chromium.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"linux-tip-commits@vger.kernel.org"
<linux-tip-commits@vger.kernel.org>
Subject: Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Date: Thu, 23 Apr 2015 11:56:21 +0200 [thread overview]
Message-ID: <5538C1C5.7010408@redhat.com> (raw)
In-Reply-To: <CALCETrVH6STDwEuLY3gFaNevtjvtOZ7Ox3e10ZhtmitCJnJUKQ@mail.gmail.com>
On 04/23/2015 10:49 AM, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 12:37 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
>> <tipbot@zytor.com> wrote:
>>> Commit-ID: e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Gitweb: http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Author: Denys Vlasenko <dvlasenk@redhat.com>
>>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>>> Committer: Ingo Molnar <mingo@kernel.org>
>>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>>
>>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>>
>>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>>
>>> On 64-bit kernels, the data segment is the same for 32-bit and
>>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>>> selector.
>>>
>>> So there's no need to repeat it by hand. Segment loads are somewhat
>>> expensive: tens of cycles.
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>>> [ Removed unnecessary comment. ]
>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Oleg Nesterov <oleg@redhat.com>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Will Drewry <wad@chromium.org>
>>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>> ---
>>> arch/x86/vdso/vdso32/syscall.S | 2 --
>>> 1 file changed, 2 deletions(-)
>>>
>>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>>> index 5415b56..6b286bb 100644
>>> --- a/arch/x86/vdso/vdso32/syscall.S
>>> +++ b/arch/x86/vdso/vdso32/syscall.S
>>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>> .Lpush_ebp:
>>> movl %ecx, %ebp
>>> syscall
>>> - movl $__USER32_DS, %ecx
>>> - movl %ecx, %ss
>>> movl %ebp, %ecx
>>> popl %ebp
>>> .Lpop_ebp:
>>
>> This patch unfortunately is causing Wine to break on some applications:
>>
>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>> Register dump:
>> CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>> EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216( R- -- I -A-P- )
>> EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>> ESI:00000040 EDI:7ffd4000
>> Stack dump:
>> 0x00aed60c: 00aed648 f7575e5b 7bcc8000 00000000
>> 0x00aed61c: 7bc7bc09 00000010 00aed750 00000040
>> 0x00aed62c: 00aed750 00aed650 7bcc8000 7bc7bbdd
>> 0x00aed63c: 7bcc8000 00aed6a0 00aed750 00aed738
>> 0x00aed64c: 7bc7cfa9 00000011 00aed750 00000040
>> 0x00aed65c: 00000020 00000000 00000000 7bc4f141
>> Backtrace:
>> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>> 1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>> 2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
>> in ntdll (0x00aed648)
>> 3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>> 4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
>> NumberOfThreadsReleased=0x0(nil))
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
>> in ntdll (0x00aed7c8)
>> 5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
>> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
>> in kernel32 (0x00aed7e8)
>> 6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
>> ...
>>
>> __kernel_vsyscall+0x7 points to "pop %ebp".
>>
>> This is on an AMD Phenom(tm) II X6 1055T Processor.
>>
>> It appears that there are some subtle differences in how sysretl works
>> on AMD vs. Intel. According to the Intel docs, the SS selector and
>> descriptor cache is completely reset by sysret to fixed values. The
>> AMD docs however are concerning:
>
> My understanding is that, in long mode, the segment attributes are
> ignored, and that there is no such thing as a "64-bit stack". So...
>
>>
>> AMD's syscall:
>> SS.sel = MSR_STAR.SYSCALL_CS + 8
>> SS.attr = 64-bit stack,dpl0
>
> I don't really believe that.
>
>> SS.base = 0x00000000
>> SS.limit = 0xFFFFFFFF
>>
>> AMD's sysret:
>> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>> // SS base, limit, attributes unchanged.
>>
>
> I'm pretty sure that this is at least a little bit wrong. It makes no
> sense for me for syscall to set SS.DPL=0 and for sysret to leave
> SS.DPL=0. It had better at least change DPL to 3. (Except... don't
> they mean RPL? Why is the DPL cached at all? But RPL is clearly
> changed, since it's part of the selector.)
>
>> Not changing base or limit is no big deal, but not changing attributes
>> could be the problem. It might be leaving the "64-bit stack"
>> attribute set, for whatever that means.
>
> Hmm. I don't know if I believe that explanation. For one thing, the
> APM says "Executing SYSRET in non-64-bit mode or with a 16- or 32-bit
> operand size returns to 32-bit mode with a 32-bit stack pointer."
>
> We can revert this patch or fix it, but I'd like to at least try to
> understand what's wrong first. Borislav, any ideas?
>
> I'm curious whether we can somehow end up in the kernel without a
> sensible SS. What happens if we have SS = 0?
Nothing happens. We continue to execute as if nothing is wrong.
24593_APM_v21.pdf says:
8.9 Long-Mode Interrupt Control Transfers
The long-mode architecture expands the legacy interrupt-mechanism to support 64-bit operating
systems and applications. These changes include:
* All interrupt handlers are 64-bit code and operate in 64-bit mode.
* The size of an interrupt-stack push is fixed at 64 bits (8 bytes).
* The interrupt-stack frame is aligned on a 16-byte boundary.
* The stack pointer, SS:RSP, is pushed unconditionally on interrupts, rather than conditionally based
on a change in CPL.
* The SS selector register is loaded with a null selector as a result of an interrupt, if the CPL changes.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ !!!!!
* The IRET instruction behavior changes, to unconditionally pop SS:RSP, allowing a null SS to be
popped.
...
Intel's doc has a similar statement.
> Try this on for size:
>
> 1. Wine process does syscall
> 2. Context switch to any other task
> 3. Interrupt (software or hardware), which loads SS with ss0, which is
> 0 on x86_64.
(should be "which loads SS with 0 unconditionally" according to AMD docs)
> 4. Context switch back to Wine.
> 5. sysretl
Plausible. But why it happens only with Wine?
The fix can look like this (untested):
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9f4c232 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -198,6 +198,18 @@ sysexit_from_sys_call:
* with 'sysenter' and it uses the SYSENTER calling convention.
*/
andl $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
+ /*
+ * On AMD, SYSRET32 does not modify %ss cached descriptor;
+ * and in kernel, %ss can be loaded with 0, invalidating it.
+ * On return to 32-bit mode, this makes stack ops fail.
+ * Fix %ss only if it's wrong: it takes ~40 cycles.
+ */
+ movl %ss, %ecx
+ cmpl $__USER32_DS, %ecx
+ je 1f
+ movl $__USER32_DS, %ecx
+ movl %ecx, %ss
+1:
movl RIP(%rsp),%ecx /* User %eip */
CFI_REGISTER rip,rcx
RESTORE_RSI_RDI
@@ -408,6 +420,18 @@ cstar_dispatch:
sysretl_from_sys_call:
andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
RESTORE_RSI_RDI_RDX
+ /*
+ * On AMD, SYSRET32 does not modify %ss cached descriptor;
+ * and in kernel, %ss can be loaded with 0, invalidating it.
+ * On return to 32-bit mode, this makes stack ops fail.
+ * Fix %ss only if it's wrong: it takes ~40 cycles.
+ */
+ movl %ss, %ecx
+ cmpl $__USER32_DS, %ecx
+ je 1f
+ movl $__USER32_DS, %ecx
+ movl %ecx, %ss
+1:
movl RIP(%rsp),%ecx
CFI_REGISTER rip,rcx
movl EFLAGS(%rsp),%r11d
next prev parent reply other threads:[~2015-04-23 9:57 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-26 1:11 [GIT PULL] x86/vdso changes for 4.1 Andy Lutomirski
[not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
2015-03-27 18:47 ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
2015-03-27 18:48 ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso32/syscall.S: Do " tip-bot for Denys Vlasenko
2015-04-23 7:37 ` Brian Gerst
2015-04-23 8:49 ` Andy Lutomirski
2015-04-23 9:07 ` Andy Lutomirski
2015-04-23 9:23 ` Denys Vlasenko
2015-04-23 9:47 ` Borislav Petkov
2015-04-23 9:56 ` Denys Vlasenko [this message]
2015-04-23 10:18 ` Borislav Petkov
2015-04-23 10:26 ` Denys Vlasenko
2015-04-23 10:44 ` Borislav Petkov
2015-04-23 11:05 ` Denys Vlasenko
2015-04-23 15:48 ` Andy Lutomirski
2015-04-23 16:41 ` Denys Vlasenko
2015-04-23 16:50 ` Andy Lutomirski
2015-04-23 17:14 ` Borislav Petkov
2015-04-23 18:24 ` Andy Lutomirski
2015-04-23 18:36 ` Linus Torvalds
2015-04-23 18:52 ` Borislav Petkov
2015-04-23 19:20 ` Andy Lutomirski
2015-04-23 19:50 ` Denys Vlasenko
2015-04-23 9:20 ` Denys Vlasenko
2015-04-23 9:56 ` Borislav Petkov
2015-04-23 11:11 ` Brian Gerst
2015-04-23 11:28 ` Brian Gerst
2015-04-23 11:46 ` Denys Vlasenko
2015-04-23 12:01 ` Brian Gerst
2015-04-23 12:35 ` Denys Vlasenko
2015-04-23 11:12 ` Denys Vlasenko
2015-03-27 18:48 ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso: Teach 'make clean' to " tip-bot for Andrey Skvortsov
2015-03-27 18:48 ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
2015-03-31 12:39 ` [tip:x86/vdso] x86/vdso: Remove x32 intermediates during ' make clean' tip-bot for Andy Lutomirski
2015-03-31 12:38 ` [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes tip-bot for Tommi Kyntola
-- strict thread matches above, loose matches on Subject: below --
2015-02-16 14:15 [PATCH] x86,vdso: fix " Tommi Kyntola
2015-02-16 16:29 ` Andy Lutomirski
[not found] ` <CAO2cUkRstHcKzy+sMvaQoXHBjTX1yheN2EMQW-wCd0tDRCLNYQ@mail.gmail.com>
2015-02-16 20:40 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5538C1C5.7010408@redhat.com \
--to=dvlasenk@redhat.com \
--cc=ast@plumgrid.com \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=wad@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.