* [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set
@ 2015-04-01 19:25 Andy Lutomirski
2015-04-01 21:18 ` Denys Vlasenko
0 siblings, 1 reply; 3+ messages in thread
From: Andy Lutomirski @ 2015-04-01 19:25 UTC (permalink / raw)
To: Ingo Molnar, x86, linux-kernel
Cc: Borislav Petkov, Denys Vlasenko, Andy Lutomirski
When I wrote the opportunistic SYSRET code, I missed an important
difference between SYSRET and IRET. Both instructions are capable
of setting EFLAGS.TF, but they behave differently when doing so.
IRET will not issue a #DB trap after execution when it sets TF This
is critical -- otherwise you'd never be able to make forward
progress when returning to userspace. SYSRET, on the other hand,
will trap with #DB immediately after returning to CPL3, and the next
instruction will never execute.
This breaks anything that opportunistically SYSRETs to a user
context with TF set. For example, running this code with TF set and
a SIGTRAP handler loaded never gets past post_nop.
extern unsigned char post_nop[];
asm volatile ("pushfq\n\t"
"popq %%r11\n\t"
"nop\n\t"
"post_nop:"
: : "c" (post_nop) : "r11");
In my defense, I can't find this documented in the AMD or Intel
manual.
Fix it by using IRET to restore TF. Since it's late, I'm keeping
this minimal and keeping "testq" instead of switching to "testl".
Fixes: 2a23c6b8a9c4 x86_64, entry: Use sysret to return to userspace when possible
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
This affects 4.0-rc as well as -tip. A full test case lives here:
https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
It's called single_step_syscall_64.
On Intel systems, the 32-bit version of that test fails for unrelated
reasons, but that's not a regression, and fixing it will be much more
intrusive.
arch/x86/kernel/entry_64.S | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 750c6efcb718..369f2716ef3f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -715,7 +715,14 @@ retint_swapgs: /* return to user-space */
cmpq %r11,EFLAGS(%rsp) /* R11 == RFLAGS */
jne opportunistic_sysret_failed
- testq $X86_EFLAGS_RF,%r11 /* sysret can't restore RF */
+ /*
+ * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
+ * restoring TF results in a trap from userspace immediately after
+ * SYSRET. This would cause an infinite loop whenever #DB happens
+ * with register state that satisfies the opportunistic SYSRET
+ * conditions.
+ */
+ testq $(X86_EFLAGS_RF|X86_EFLAGS_TF),%r11
jnz opportunistic_sysret_failed
/* nothing to check for RSP */
--
2.3.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set
2015-04-01 19:25 [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set Andy Lutomirski
@ 2015-04-01 21:18 ` Denys Vlasenko
2015-04-02 6:16 ` Borislav Petkov
0 siblings, 1 reply; 3+ messages in thread
From: Denys Vlasenko @ 2015-04-01 21:18 UTC (permalink / raw)
To: Andy Lutomirski, Ingo Molnar, x86, linux-kernel; +Cc: Borislav Petkov
On 04/01/2015 09:25 PM, Andy Lutomirski wrote:
> Fix it by using IRET to restore TF. Since it's late, I'm keeping
> this minimal and keeping "testq" instead of switching to "testl".
Changing to "testl" here wins nothing. Since r11 is used,
REX prefix will be encoded anyway.
>
> - testq $X86_EFLAGS_RF,%r11 /* sysret can't restore RF */
> + /*
> + * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
> + * restoring TF results in a trap from userspace immediately after
> + * SYSRET.
> This would cause an infinite loop whenever #DB happens
> + * with register state that satisfies the opportunistic SYSRET
> + * conditions.
> + */
I propose to just show an example of the affected code:
> This can cause an infinite loop. Example:
> * asm volatile("movq $1f,%rcx\n\t"
> * "pushfq\n\t"
> * "popq %r11\n\t"
> * "nop\n\t"
> * "1:");
> * The above example would get stuck at "1:".
> */
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set
2015-04-01 21:18 ` Denys Vlasenko
@ 2015-04-02 6:16 ` Borislav Petkov
0 siblings, 0 replies; 3+ messages in thread
From: Borislav Petkov @ 2015-04-02 6:16 UTC (permalink / raw)
To: Denys Vlasenko; +Cc: Andy Lutomirski, Ingo Molnar, x86, linux-kernel
On Wed, Apr 01, 2015 at 11:18:16PM +0200, Denys Vlasenko wrote:
> On 04/01/2015 09:25 PM, Andy Lutomirski wrote:
> > Fix it by using IRET to restore TF. Since it's late, I'm keeping
> > this minimal and keeping "testq" instead of switching to "testl".
>
> Changing to "testl" here wins nothing.
Except less data (a dword) being shuffled and tracked for dependencies
in the machine instead of qword.
> Since r11 is used, REX prefix will be encoded anyway.
As a future cleanup, one could use one of the "old", i.e. not-extended
registers to save 2 bytes per insn (REX pfx and ModRM) but one has to
remember to do
mov %rax, %r11
in the end.
And yep, it should preferrably be %rax as we have opcode 0xa9 which
tests an immediate and RAX and saves us the ModRM as we don't need to
specify a register.
orig:
a42: 49 f7 c3 00 00 01 00 test $0x10000,%r11
a49: 75 41 jne a8c <opportunistic_sysret_failed>
Andy:
a42: 49 f7 c3 00 01 01 00 test $0x10100,%r11
a49: 75 41 jne a8c <opportunistic_sysret_failed>
testl:
a42: 41 f7 c3 00 01 01 00 test $0x10100,%r11d
a49: 75 41 jne a8c <opportunistic_sysret_failed>
%rax:
a42: a9 00 01 01 00 test $0x10100,%eax
a47: 75 41 jne a8a <opportunistic_sysret_failed>
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-04-02 6:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-01 19:25 [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set Andy Lutomirski
2015-04-01 21:18 ` Denys Vlasenko
2015-04-02 6:16 ` Borislav Petkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox