From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Rik van Riel" <riel@redhat.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"Denys Vlasenko" <vda.linux@googlemail.com>,
"Borislav Petkov" <bp@alien8.de>,
"Kees Cook" <keescook@chromium.org>,
"Brian Gerst" <brgerst@gmail.com>
Subject: Re: [RFC/INCOMPLETE 00/13] x86: Rewrite exit-to-userspace code
Date: Wed, 17 Jun 2015 12:32:26 +0200 [thread overview]
Message-ID: <20150617103226.GA30325@gmail.com> (raw)
In-Reply-To: <cover.1434485184.git.luto@kernel.org>
* Andy Lutomirski <luto@kernel.org> wrote:
> The main things that are missing are that I haven't done the 32-bit parts
> (anyone want to help?) and therefore I haven't deleted the old C code. I also
> think this may break UML for trivial reasons.
So I'd suggest moving most of the SYSRET fast path to C too.
This is how it looks like now after your patches:
testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz tracesys
entry_SYSCALL_64_fastpath:
#if __SYSCALL_MASK == ~0
cmpq $__NR_syscall_max, %rax
#else
andl $__SYSCALL_MASK, %eax
cmpl $__NR_syscall_max, %eax
#endif
ja 1f /* return -ENOSYS (already in pt_regs->ax) */
movq %r10, %rcx
call *sys_call_table(, %rax, 8)
movq %rax, RAX(%rsp)
1:
/*
* Syscall return path ending with SYSRET (fast path).
* Has incompletely filled pt_regs.
*/
LOCKDEP_SYS_EXIT
/*
* We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
* it is too small to ever cause noticeable irq latency.
*/
DISABLE_INTERRUPTS(CLBR_NONE)
/*
* We must check ti flags with interrupts (or at least preemption)
* off because we must *never* return to userspace without
* processing exit work that is enqueued if we're preempted here.
* In particular, returning to userspace with any of the one-shot
* flags (TIF_NOTIFY_RESUME, TIF_USER_RETURN_NOTIFY, etc) set is
* very bad.
*/
testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz int_ret_from_sys_call_irqs_off /* Go to the slow path */
Most of that can be done in C.
And I think we could also convert the IRET syscall return slow path to C too:
GLOBAL(int_ret_from_sys_call)
SAVE_EXTRA_REGS
movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */
RESTORE_EXTRA_REGS
/*
* Try to use SYSRET instead of IRET if we're returning to
* a completely clean 64-bit userspace context.
*/
movq RCX(%rsp), %rcx
movq RIP(%rsp), %r11
cmpq %rcx, %r11 /* RCX == RIP */
jne opportunistic_sysret_failed
/*
* On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
* in kernel space. This essentially lets the user take over
* the kernel, since userspace controls RSP.
*
* If width of "canonical tail" ever becomes variable, this will need
* to be updated to remain correct on both old and new CPUs.
*/
.ifne __VIRTUAL_MASK_SHIFT - 47
.error "virtual address width changed -- SYSRET checks need update"
.endif
/* Change top 16 bits to be the sign-extension of 47th bit */
shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
/* If this changed %rcx, it was not canonical */
cmpq %rcx, %r11
jne opportunistic_sysret_failed
cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
jne opportunistic_sysret_failed
movq R11(%rsp), %r11
cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
jne opportunistic_sysret_failed
/*
* SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
* restoring TF results in a trap from userspace immediately after
* SYSRET. This would cause an infinite loop whenever #DB happens
* with register state that satisfies the opportunistic SYSRET
* conditions. For example, single-stepping this user code:
*
* movq $stuck_here, %rcx
* pushfq
* popq %r11
* stuck_here:
*
* would never get past 'stuck_here'.
*/
testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
jnz opportunistic_sysret_failed
/* nothing to check for RSP */
cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
jne opportunistic_sysret_failed
/*
* We win! This label is here just for ease of understanding
* perf profiles. Nothing jumps here.
*/
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
RESTORE_C_REGS_EXCEPT_RCX_R11
movq RSP(%rsp), %rsp
USERGS_SYSRET64
opportunistic_sysret_failed:
SWAPGS
jmp restore_c_regs_and_iret
END(entry_SYSCALL_64)
Basically there would be a single C function we'd call, which returns a condition
(or fixes up its return address on the stack directly) to determine between the
SYSRET and IRET return paths.
Moving this to C too has immediate benefits: that way we could easily add
instrumentation to see how efficient these various return methods are, etc.
I.e. I don't think there's two ways about this: once the entry code moves to the
domain of C code, we get the best benefits by moving as much of it as possible.
The only low level bits remaining in assembly will be low level hardware ABI
details: saving registers and restoring registers to the expected format - no
'active' code whatsoever.
Thanks,
Ingo
next prev parent reply other threads:[~2015-06-17 10:32 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-16 20:16 [RFC/INCOMPLETE 00/13] x86: Rewrite exit-to-userspace code Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 01/13] context_tracking: Add context_tracking_assert_state Andy Lutomirski
2015-06-17 9:41 ` Ingo Molnar
2015-06-17 14:15 ` Andy Lutomirski
2015-06-18 9:57 ` Ingo Molnar
2015-06-18 11:07 ` Andy Lutomirski
2015-06-18 15:52 ` Andy Lutomirski
2015-06-18 16:17 ` Ingo Molnar
2015-06-18 16:26 ` Frederic Weisbecker
2015-06-18 19:26 ` Andy Lutomirski
2015-06-17 15:27 ` Paul E. McKenney
2015-06-18 9:59 ` Ingo Molnar
2015-06-18 22:54 ` Paul E. McKenney
2015-06-19 2:19 ` Paul E. McKenney
2015-06-30 11:04 ` Ingo Molnar
2015-06-30 16:16 ` Paul E. McKenney
2015-06-16 20:16 ` [RFC/INCOMPLETE 02/13] notifiers: Assert that RCU is watching in notify_die Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 03/13] x86: Move C entry and exit code to arch/x86/entry/common.c Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 04/13] x86/traps: Assert that we're in CONTEXT_KERNEL in exception entries Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 05/13] x86/entry: Add enter_from_user_mode and use it in syscalls Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 06/13] x86/entry: Add new, comprehensible entry and exit hooks Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 07/13] x86/entry/64: Really create an error-entry-from-usermode code path Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 08/13] x86/entry/64: Migrate 64-bit syscalls to new exit hooks Andy Lutomirski
2015-06-17 10:00 ` Ingo Molnar
2015-06-17 10:02 ` Ingo Molnar
2015-06-17 14:12 ` Andy Lutomirski
2015-06-18 10:17 ` Ingo Molnar
2015-06-18 10:19 ` Ingo Molnar
2015-06-16 20:16 ` [RFC/INCOMPLETE 09/13] x86/entry/compat: Migrate compat " Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 10/13] x86/asm/entry/64: Save all regs on interrupt entry Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 11/13] x86/asm/entry/64: Simplify irq stack pt_regs handling Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 12/13] x86/asm/entry/64: Migrate error and interrupt exit work to C Andy Lutomirski
2015-06-16 20:16 ` [RFC/INCOMPLETE 13/13] x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h Andy Lutomirski
2015-06-17 9:48 ` [RFC/INCOMPLETE 00/13] x86: Rewrite exit-to-userspace code Ingo Molnar
2015-06-17 10:13 ` Richard Weinberger
2015-06-17 11:04 ` Ingo Molnar
2015-06-17 14:19 ` Andy Lutomirski
2015-06-17 15:16 ` Andy Lutomirski
2015-06-18 10:14 ` Ingo Molnar
2015-06-17 10:32 ` Ingo Molnar [this message]
2015-06-17 11:14 ` Ingo Molnar
2015-06-17 14:23 ` Andy Lutomirski
2015-06-18 10:11 ` Ingo Molnar
2015-06-18 11:06 ` Andy Lutomirski
2015-06-18 16:24 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150617103226.GA30325@gmail.com \
--to=mingo@kernel.org \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=fweisbec@gmail.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=oleg@redhat.com \
--cc=riel@redhat.com \
--cc=vda.linux@googlemail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox