From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Willy Tarreau <w@1wt.eu>
Subject: [ 15/25] x86_64, traps: Rework bad_iret
Date: Sat, 06 Dec 2014 18:42:03 +0100 [thread overview]
Message-ID: <20141206174149.063219019@1wt.eu> (raw)
In-Reply-To: <2a26e912d2438674771c36169c190830@local>
2.6.32-longterm review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski <luto@amacapital.net>
It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b645af2d5905c4e32399005b867987919cbfc3ae)
[wt: notes for backport to 2.6.32:
- _ASM_EXTABLE was open-coded.
- removed unneeded CFI_ENDPROC
- removed __visible (introduced in 2.6.37-rc1, not needed here)
/wt]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
arch/x86/kernel/entry_64.S | 48 ++++++++++++++++++----------------------------
arch/x86/kernel/traps.c | 29 ++++++++++++++++++++++++++++
2 files changed, 48 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c862780..d9bcee0 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -873,12 +873,14 @@ ENTRY(native_iret)
.global native_irq_return_iret
native_irq_return_iret:
+ /*
+ * This may fault. Non-paranoid faults on return to userspace are
+ * handled by fixup_bad_iret. These include #SS, #GP, and #NP.
+ * Double-faults due to espfix64 are handled in do_double_fault.
+ * Other faults here are fatal.
+ */
iretq
- .section __ex_table,"a"
- .quad native_irq_return_iret, bad_iret
- .previous
-
#ifdef CONFIG_X86_ESPFIX64
native_irq_return_ldt:
pushq_cfi %rax
@@ -905,25 +907,6 @@ native_irq_return_ldt:
jmp native_irq_return_iret
#endif
- .section .fixup,"ax"
-bad_iret:
- /*
- * The iret traps when the %cs or %ss being restored is bogus.
- * We've lost the original trap vector and error code.
- * #GPF is the most likely one to get for an invalid selector.
- * So pretend we completed the iret and took the #GPF in user mode.
- *
- * We are now running with the kernel GS after exception recovery.
- * But error_entry expects us to have user GS to match the user %cs,
- * so swap back.
- */
- pushq $0
-
- SWAPGS
- jmp general_protection
-
- .previous
-
/* edi: workmask, edx: work */
retint_careful:
CFI_RESTORE_STATE
@@ -1512,16 +1495,15 @@ error_sti:
/*
* There are two places in the kernel that can potentially fault with
- * usergs. Handle them here. The exception handlers after iret run with
- * kernel gs again, so don't set the user space flag. B stepping K8s
- * sometimes report an truncated RIP for IRET exceptions returning to
- * compat mode. Check for these here too.
+ * usergs. Handle them here. B stepping K8s sometimes report a
+ * truncated RIP for IRET exceptions returning to compat mode. Check
+ * for these here too.
*/
error_kernelspace:
incl %ebx
leaq native_irq_return_iret(%rip),%rcx
cmpq %rcx,RIP+8(%rsp)
- je error_swapgs
+ je error_bad_iret
movl %ecx,%eax /* zero extend */
cmpq %rax,RIP+8(%rsp)
je bstep_iret
@@ -1532,7 +1514,15 @@ error_kernelspace:
bstep_iret:
/* Fix truncated RIP */
movq %rcx,RIP+8(%rsp)
- je error_swapgs
+ /* fall through */
+
+error_bad_iret:
+ SWAPGS
+ mov %rsp,%rdi
+ call fixup_bad_iret
+ mov %rax,%rsp
+ decl %ebx /* Return to usergs */
+ jmp error_sti
END(error_entry)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03563a4..8a39a6c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -512,6 +512,35 @@ asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
*regs = *eregs;
return regs;
}
+
+struct bad_iret_stack {
+ void *error_entry_ret;
+ struct pt_regs regs;
+};
+
+asmlinkage
+struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+{
+ /*
+ * This is called from entry_64.S early in handling a fault
+ * caused by a bad iret to user mode. To handle the fault
+ * correctly, we want move our stack frame to task_pt_regs
+ * and we want to pretend that the exception came from the
+ * iret target.
+ */
+ struct bad_iret_stack *new_stack =
+ container_of(task_pt_regs(current),
+ struct bad_iret_stack, regs);
+
+ /* Copy the IRET target to the new stack. */
+ memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
+
+ /* Copy the remainder of the stack from the current stack. */
+ memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
+
+ BUG_ON(!user_mode_vm(&new_stack->regs));
+ return new_stack;
+}
#endif
/*
--
1.7.12.2.21.g234cd45.dirty
next prev parent reply other threads:[~2014-12-06 17:42 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <2a26e912d2438674771c36169c190830@local>
2014-12-06 17:41 ` [ 00/25] 2.6.32.65-longterm review Willy Tarreau
2014-12-08 0:58 ` Willy Tarreau
2014-12-06 17:41 ` [ 01/25] net: sendmsg: fix failed backport of "fix NULL pointer dereference" Willy Tarreau
2014-12-06 17:41 ` [ 02/25] x86, 64-bit: Move K8 B step iret fixup to fault entry asm Willy Tarreau
2014-12-06 17:41 ` [ 03/25] x86-64: Adjust frame type at paranoid_exit: Willy Tarreau
2014-12-06 17:41 ` [ 04/25] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels Willy Tarreau
2014-12-06 17:41 ` [ 05/25] x86-32, espfix: Remove filter for espfix32 due to race Willy Tarreau
2014-12-06 17:41 ` [ 06/25] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Willy Tarreau
2014-12-06 17:41 ` [ 07/25] x86, espfix: Move espfix definitions into a separate header file Willy Tarreau
2014-12-06 17:41 ` [ 08/25] x86, espfix: Fix broken header guard Willy Tarreau
2014-12-06 17:41 ` [ 09/25] x86, espfix: Make espfix64 a Kconfig option, fix UML Willy Tarreau
2014-12-06 17:41 ` [ 10/25] x86, espfix: Make it possible to disable 16-bit support Willy Tarreau
2014-12-08 2:58 ` Ben Hutchings
2014-12-08 7:11 ` Willy Tarreau
2014-12-06 17:41 ` [ 11/25] x86_64/entry/xen: Do not invoke espfix64 on Xen Willy Tarreau
2014-12-06 17:42 ` [ 12/25] x86/espfix/xen: Fix allocation of pages for paravirt page tables Willy Tarreau
2014-12-06 17:42 ` [ 13/25] x86_64, traps: Stop using IST for #SS Willy Tarreau
2014-12-06 17:42 ` [ 14/25] x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C Willy Tarreau
2014-12-06 17:42 ` Willy Tarreau [this message]
2014-12-06 17:42 ` [ 16/25] net/l2tp: dont fall back on UDP [get|set]sockopt Willy Tarreau
2014-12-06 17:42 ` [ 17/25] ALSA: control: Dont access controls outside of protected regions Willy Tarreau
2014-12-06 17:42 ` [ 18/25] ALSA: control: Fix replacing user controls Willy Tarreau
2014-12-06 17:42 ` [ 19/25] USB: whiteheat: Added bounds checking for bulk command response Willy Tarreau
2014-12-06 17:42 ` [ 20/25] net: sctp: fix panic on duplicate ASCONF chunks Willy Tarreau
2014-12-06 17:42 ` [ 21/25] net: sctp: fix remote memory pressure from excessive queueing Willy Tarreau
2014-12-06 17:42 ` [ 22/25] udf: Avoid infinite loop when processing indirect ICBs Willy Tarreau
2014-12-06 17:42 ` [ 23/25] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Willy Tarreau
2014-12-06 17:42 ` [ 24/25] mac80211: fix fragmentation code, particularly for encryption Willy Tarreau
2014-12-06 17:42 ` [ 25/25] ttusb-dec: buffer overflow in ioctl Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141206174149.063219019@1wt.eu \
--to=w@1wt.eu \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).