* [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments
@ 2015-03-25 18:20 Denys Vlasenko
2015-03-25 18:20 ` [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path Denys Vlasenko
2015-03-26 11:45 ` [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Borislav Petkov
0 siblings, 2 replies; 5+ messages in thread
From: Denys Vlasenko @ 2015-03-25 18:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
x86, linux-kernel
A named label "ret_from_sys_call" implies that there are jumps
to this location from elsewhere, as happens with many other labels
in this file.
But this label is used only by the JMP a few insns above.
To make that obvious, use local numeric label instead.
Do the same in the second copy of the syscall table dispatch code,
it has a similar JMP which skips the CALL.
Improve comments:
"and return regs->ax" isn't too informative. We always return regs->ax.
The comment suggesting that it'd be cool to use rip relative addressing for CALL
is deleted. It's unclear why that would be an improvement - we aren't striving
to use position-independent code here. PIC code here would require something like
LEA sys_call_table(%rip),reg + CALL *(reg,%rax*8)...
"iret frame is also incomplete" is no longer true, fix that too.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
Changes in v2: gave "jmp 1f" treatment to the second jump.
arch/x86/kernel/entry_64.S | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index bf9afad..9c8661c 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -258,16 +258,16 @@ system_call_fastpath:
andl $__SYSCALL_MASK,%eax
cmpl $__NR_syscall_max,%eax
#endif
- ja ret_from_sys_call /* and return regs->ax */
+ ja 1f /* return -ENOSYS (already in pt_regs->ax) */
movq %r10,%rcx
- call *sys_call_table(,%rax,8) # XXX: rip relative
+ call *sys_call_table(,%rax,8)
movq %rax,RAX(%rsp)
+1:
+
/*
- * Syscall return path ending with SYSRET (fast path)
- * Has incompletely filled pt_regs, iret frame is also incomplete.
+ * Syscall return path ending with SYSRET (fast path).
+ * Has incompletely filled pt_regs.
*/
-ret_from_sys_call:
-
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -334,10 +334,11 @@ tracesys_phase2:
andl $__SYSCALL_MASK,%eax
cmpl $__NR_syscall_max,%eax
#endif
- ja int_ret_from_sys_call /* RAX(%rsp) is already set */
+ ja 1f /* return -ENOSYS (already in pt_regs->ax) */
movq %r10,%rcx /* fixup for C */
call *sys_call_table(,%rax,8)
movq %rax,RAX(%rsp)
+1:
/* Use IRET because user could have changed pt_regs->foo */
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path
2015-03-25 18:20 [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Denys Vlasenko
@ 2015-03-25 18:20 ` Denys Vlasenko
2015-03-25 18:38 ` Ingo Molnar
2015-03-26 11:45 ` [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Borislav Petkov
1 sibling, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2015-03-25 18:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
x86, linux-kernel
SYSRET code path has a small irq-off block.
On this code path, TRACE_IRQS_ON can't be called right before interrupts
are enabled for real, we can't clobber registers there.
So current code does it earlier, in a safe place.
But with this, TRACE_IRQS_OFF/ON frames just two fast instructions,
which is ridiculous: now most of irq-off block is _outside_ of the framing.
Do the same thing that we do on SYSCALL entry: do not track this irq-off block,
it is very small to ever cause noticeable irq latency.
Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off" now does
invoke TRACE_IRQS_OFF - move int_ret_from_sys_call_irqs_off label before
TRACE_IRQS_OFF.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
Changes in v2: added comment
arch/x86/kernel/entry_64.S | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9c8661c..658cf2e 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -269,8 +269,11 @@ system_call_fastpath:
* Has incompletely filled pt_regs.
*/
LOCKDEP_SYS_EXIT
+ /*
+ * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
+ * it is too small to ever cause noticeable irq latency.
+ */
DISABLE_INTERRUPTS(CLBR_NONE)
- TRACE_IRQS_OFF
/*
* We must check ti flags with interrupts (or at least preemption)
@@ -284,10 +287,7 @@ system_call_fastpath:
jnz int_ret_from_sys_call_irqs_off /* Go to the slow path */
CFI_REMEMBER_STATE
- /*
- * sysretq will re-enable interrupts:
- */
- TRACE_IRQS_ON
+
RESTORE_C_REGS_EXCEPT_RCX_R11
movq RIP(%rsp),%rcx
CFI_REGISTER rip,rcx
@@ -298,6 +298,7 @@ system_call_fastpath:
* 64bit SYSRET restores rip from rcx,
* rflags from r11 (but RF and VM bits are forced to 0),
* cs and ss are loaded from MSRs.
+ * Restoration of rflags re-enables interrupts.
*/
USERGS_SYSRET64
@@ -347,8 +348,8 @@ tracesys_phase2:
*/
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
+int_ret_from_sys_call_irqs_off: /* jumps come here from the irqs-off SYSRET path */
TRACE_IRQS_OFF
-int_ret_from_sys_call_irqs_off:
movl $_TIF_ALLWORK_MASK,%edi
/* edi: mask to check */
GLOBAL(int_with_check)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path
2015-03-25 18:20 ` [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path Denys Vlasenko
@ 2015-03-25 18:38 ` Ingo Molnar
2015-03-25 19:15 ` Denys Vlasenko
0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2015-03-25 18:38 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Linus Torvalds, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel
* Denys Vlasenko <dvlasenk@redhat.com> wrote:
> SYSRET code path has a small irq-off block.
> On this code path, TRACE_IRQS_ON can't be called right before interrupts
> are enabled for real, we can't clobber registers there.
> So current code does it earlier, in a safe place.
>
> But with this, TRACE_IRQS_OFF/ON frames just two fast instructions,
> which is ridiculous: now most of irq-off block is _outside_ of the framing.
>
> Do the same thing that we do on SYSCALL entry: do not track this irq-off block,
> it is very small to ever cause noticeable irq latency.
>
> Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off" now does
> invoke TRACE_IRQS_OFF - move int_ret_from_sys_call_irqs_off label before
> TRACE_IRQS_OFF.
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>
> Changes in v2: added comment
>
> arch/x86/kernel/entry_64.S | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 9c8661c..658cf2e 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -269,8 +269,11 @@ system_call_fastpath:
> * Has incompletely filled pt_regs.
> */
> LOCKDEP_SYS_EXIT
> + /*
> + * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
> + * it is too small to ever cause noticeable irq latency.
* ... but if we enter the slowpath from here, we'll execute a
* proper TRACE_IRQS_OFF call.
> @@ -298,6 +298,7 @@ system_call_fastpath:
> * 64bit SYSRET restores rip from rcx,
> * rflags from r11 (but RF and VM bits are forced to 0),
> * cs and ss are loaded from MSRs.
> + * Restoration of rflags re-enables interrupts.
> */
> USERGS_SYSRET64
Is that true even if user-space disabled irqs (via CLI) and executed a
syscall while having irqs off?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path
2015-03-25 18:38 ` Ingo Molnar
@ 2015-03-25 19:15 ` Denys Vlasenko
0 siblings, 0 replies; 5+ messages in thread
From: Denys Vlasenko @ 2015-03-25 19:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel
On 03/25/2015 07:38 PM, Ingo Molnar wrote:
>
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>
>> SYSRET code path has a small irq-off block.
>> On this code path, TRACE_IRQS_ON can't be called right before interrupts
>> are enabled for real, we can't clobber registers there.
>> So current code does it earlier, in a safe place.
>>
>> But with this, TRACE_IRQS_OFF/ON frames just two fast instructions,
>> which is ridiculous: now most of irq-off block is _outside_ of the framing.
>>
>> Do the same thing that we do on SYSCALL entry: do not track this irq-off block,
>> it is very small to ever cause noticeable irq latency.
>>
>> Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off" now does
>> invoke TRACE_IRQS_OFF - move int_ret_from_sys_call_irqs_off label before
>> TRACE_IRQS_OFF.
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: Ingo Molnar <mingo@kernel.org>
>> CC: Borislav Petkov <bp@alien8.de>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: Frederic Weisbecker <fweisbec@gmail.com>
>> CC: Alexei Starovoitov <ast@plumgrid.com>
>> CC: Will Drewry <wad@chromium.org>
>> CC: Kees Cook <keescook@chromium.org>
>> CC: x86@kernel.org
>> CC: linux-kernel@vger.kernel.org
>> ---
>>
>> Changes in v2: added comment
>>
>> arch/x86/kernel/entry_64.S | 13 +++++++------
>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 9c8661c..658cf2e 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -269,8 +269,11 @@ system_call_fastpath:
>> * Has incompletely filled pt_regs.
>> */
>> LOCKDEP_SYS_EXIT
>> + /*
>> + * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
>> + * it is too small to ever cause noticeable irq latency.
>
> * ... but if we enter the slowpath from here, we'll execute a
> * proper TRACE_IRQS_OFF call.
>
>> @@ -298,6 +298,7 @@ system_call_fastpath:
>> * 64bit SYSRET restores rip from rcx,
>> * rflags from r11 (but RF and VM bits are forced to 0),
>> * cs and ss are loaded from MSRs.
>> + * Restoration of rflags re-enables interrupts.
>> */
>> USERGS_SYSRET64
>
> Is that true even if user-space disabled irqs (via CLI) and executed a
> syscall while having irqs off?
sysret restore "interrupt enable" state as it was before syscall.
Userspace normally can't disable interrupts. Therefore
usually sysret will enable interrupts because they were enabled
before syscall.
Userspace (root) can disable interrupts after it executed sys_iopl(3).
Then CLI starts working. In this case, sysret won't enable interrupts.
This is a very untypical use case.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments
2015-03-25 18:20 [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Denys Vlasenko
2015-03-25 18:20 ` [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path Denys Vlasenko
@ 2015-03-26 11:45 ` Borislav Petkov
1 sibling, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2015-03-26 11:45 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, H. Peter Anvin,
Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel
On Wed, Mar 25, 2015 at 07:20:28PM +0100, Denys Vlasenko wrote:
> A named label "ret_from_sys_call" implies that there are jumps
> to this location from elsewhere, as happens with many other labels
> in this file.
> But this label is used only by the JMP a few insns above.
> To make that obvious, use local numeric label instead.
>
> Do the same in the second copy of the syscall table dispatch code,
> it has a similar JMP which skips the CALL.
>
> Improve comments:
>
> "and return regs->ax" isn't too informative. We always return regs->ax.
>
> The comment suggesting that it'd be cool to use rip relative addressing for CALL
> is deleted. It's unclear why that would be an improvement - we aren't striving
> to use position-independent code here. PIC code here would require something like
> LEA sys_call_table(%rip),reg + CALL *(reg,%rax*8)...
>
> "iret frame is also incomplete" is no longer true, fix that too.
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>
> Changes in v2: gave "jmp 1f" treatment to the second jump.
>
> arch/x86/kernel/entry_64.S | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
Acked-by: Borislav Petkov <bp@suse.de>
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-03-26 11:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-25 18:20 [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Denys Vlasenko
2015-03-25 18:20 ` [PATCH 2/2] x86/asm/entry/64: do not TRACE_IRQS fast SYSRET64 path Denys Vlasenko
2015-03-25 18:38 ` Ingo Molnar
2015-03-25 19:15 ` Denys Vlasenko
2015-03-26 11:45 ` [PATCH 1/2 v2] x86/asm/entry/64: better label name, fix comments Borislav Petkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox