From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from borg.tf-network.de ([62.75.218.204]:45699 "EHLO borg.tf-network.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751801AbbHRRMH (ORCPT ); Tue, 18 Aug 2015 13:12:07 -0400
Subject: Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
To: Jiri Slaby , "stable@vger.kernel.org"
References: <20150817132349.GA26797@kroah.com>
 <1439852125-6581-1-git-send-email-whissi@whissi.de>
 <1439852125-6581-4-git-send-email-whissi@whissi.de>
 <55D35304.8050000@suse.cz>
Cc: "luto@kernel.org" , Linus Torvalds , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Greg Kroah-Hartman
From: "Thomas D."
Message-ID: <55D36763.80609@whissi.de>
Date: Tue, 18 Aug 2015 19:12:03 +0200
MIME-Version: 1.0
In-Reply-To: <55D35304.8050000@suse.cz>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: stable-owner@vger.kernel.org
List-ID:

Hi,

Jiri Slaby wrote:
> On 08/18/2015, 12:55 AM, Thomas D wrote:
>> From: Andy Lutomirski
>>
>> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
>>
>> Returning to userspace is tricky: IRET can fail, and ESPFIX can
>> rearrange the stack prior to IRET.
>>
>> The NMI nesting fixup relies on a precise stack layout and
>> atomic IRET. Rather than trying to teach the NMI nesting fixup
>> to handle ESPFIX and failed IRET, punt: run NMIs that came from
>> user mode on the normal kernel stack.
>>
>> This will make some nested NMIs visible to C code, but the C
>> code is okay with that.
>>
>> As a side effect, this should speed up perf: it eliminates an
>> RDMSR when NMIs come from user mode.
>>
>> Signed-off-by: Andy Lutomirski
>> Reviewed-by: Steven Rostedt
>> Reviewed-by: Borislav Petkov
>> Cc: Linus Torvalds
>> Cc: Peter Zijlstra
>> Cc: Thomas Gleixner
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Ingo Molnar
>> Signed-off-by: Greg Kroah-Hartman
>> ---
>>  arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 28b08345..bd7d8aa 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>>          * a nested NMI that updated the copy interrupt stack frame, a
>>          * jump will be made to the repeat_nmi code that will handle the second
>>          * NMI.
>> +        *
>> +        * However, espfix prevents us from directly returning to userspace
>> +        * with a single IRET instruction. Similarly, IRET to user mode
>> +        * can fault. We therefore handle NMIs from user space like
>> +        * other IST entries.
>>          */
>>
>>         /* Use %rdx as out temp variable throughout */
>>         pushq_cfi %rdx
>>         CFI_REL_OFFSET rdx, 0
>>
>> +       testb $3, CS-RIP+8(%rsp)
>> +       jz .Lnmi_from_kernel
>> +
>> +       /*
>> +        * NMI from user mode. We need to run on the thread stack, but we
>> +        * can't go through the normal entry paths: NMIs are masked, and
>> +        * we don't want to enable interrupts, because then we'll end
>> +        * up in an awkward situation in which IRQs are on but NMIs
>> +        * are off.
>> +        */
>> +
>> +       SWAPGS
>> +       cld
>> +       movq %rsp, %rdx
>> +       movq PER_CPU_VAR(kernel_stack), %rsp
>
> I think you are wasting stack space here. With kernel_stack, you should
> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
> registers is pre-reserved at kernel_stack already. (Or use movq instead
> of the 5 pushq below.)
>
> Why don't you re-use the 3.16's version anyway?
>
>> +       pushq 5*8(%rdx) /* pt_regs->ss */
>> +       pushq 4*8(%rdx) /* pt_regs->rsp */
>> +       pushq 3*8(%rdx) /* pt_regs->flags */
>> +       pushq 2*8(%rdx) /* pt_regs->cs */
>> +       pushq 1*8(%rdx) /* pt_regs->rip */
>> +       pushq $-1 /* pt_regs->orig_ax */
>> +       pushq %rdi /* pt_regs->di */
>> +       pushq %rsi /* pt_regs->si */
>> +       pushq (%rdx) /* pt_regs->dx */
>> +       pushq %rcx /* pt_regs->cx */
>> +       pushq %rax /* pt_regs->ax */
>> +       pushq %r8 /* pt_regs->r8 */
>> +       pushq %r9 /* pt_regs->r9 */
>> +       pushq %r10 /* pt_regs->r10 */
>> +       pushq %r11 /* pt_regs->r11 */
>> +       pushq %rbx /* pt_regs->rbx */
>> +       pushq %rbp /* pt_regs->rbp */
>> +       pushq %r12 /* pt_regs->r12 */
>> +       pushq %r13 /* pt_regs->r13 */
>> +       pushq %r14 /* pt_regs->r14 */
>> +       pushq %r15 /* pt_regs->r15 */

Mh, so you mean

> +       addq $KERNEL_STACK_OFFSET, %rsp

between

> +       movq PER_CPU_VAR(kernel_stack), %rsp

and

> +       pushq 5*8(%rdx) /* pt_regs->ss */

is missing? That seems to be the only difference between this patch and
Debian's 3.16.7-ckt11-1+deb8u2 [1] version.

[1] https://anonscm.debian.org/cgit/kernel/linux.git/tree/debian/patches/bugfix/x86/0006-x86-nmi-64-Switch-stacks-on-userspace-NMI-entry.patch?h=jessie#n69

-Thomas
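
The offset arithmetic discussed above can be checked with a small model. This is only a sketch, not kernel code: it assumes, following Jiri's comment, that the per-CPU kernel_stack value sits KERNEL_STACK_OFFSET (5*8) bytes below the real top of the thread stack. THREAD_SIZE, STACK_BASE, and the helper frame_after_pushes are hypothetical names and values chosen for illustration.

```python
# Illustrative model only; THREAD_SIZE and STACK_BASE are made-up values.
WORD = 8
KERNEL_STACK_OFFSET = 5 * WORD  # the "5*8" from the review comment
PT_REGS_WORDS = 21              # number of pushq instructions in the quoted hunk

THREAD_SIZE = 8192              # assumption: stack size used for this sketch
STACK_BASE = 0x10000            # assumption: arbitrary stack base address
stack_top = STACK_BASE + THREAD_SIZE

# Assumption (from the review comment): kernel_stack already reserves
# room for 5 registers, so it points 5*8 bytes below the stack top.
kernel_stack = stack_top - KERNEL_STACK_OFFSET


def frame_after_pushes(start_rsp, words=PT_REGS_WORDS):
    """Return the stack pointer after pushing `words` 8-byte values."""
    return start_rsp - words * WORD


# Patch as posted: start pushing directly at kernel_stack.
rsp_without_addq = frame_after_pushes(kernel_stack)

# With "addq $KERNEL_STACK_OFFSET, %rsp" before the first pushq.
rsp_with_addq = frame_after_pushes(kernel_stack + KERNEL_STACK_OFFSET)

wasted = rsp_with_addq - rsp_without_addq
print(f"bytes left unused above pt_regs without the addq: {wasted}")
```

Under these assumptions the frame built without the addq ends up 40 bytes lower than necessary, which matches the "space for 5 registers" mentioned in the review.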