From mboxrd@z Thu Jan 1 00:00:00 1970 From: Brian Gerst Subject: Re: [PATCH v4 15/29] x86/mm/64: Enable vmapped stacks Date: Mon, 27 Jun 2016 12:17:50 -0400 Message-ID: References: Reply-To: kernel-hardening@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Return-path: List-Post: List-Help: List-Unsubscribe: List-Subscribe: In-Reply-To: To: Andy Lutomirski Cc: Andy Lutomirski , the arch/x86 maintainers , Linux Kernel Mailing List , linux-arch , Borislav Petkov , Nadav Amit , Kees Cook , "kernel-hardening@lists.openwall.com" , Linus Torvalds , Josh Poimboeuf , Jann Horn , Heiko Carstens List-Id: linux-arch.vger.kernel.org On Mon, Jun 27, 2016 at 11:54 AM, Andy Lutomirski wrote: > On Mon, Jun 27, 2016 at 8:22 AM, Andy Lutomirski wrote: >> On Mon, Jun 27, 2016 at 8:12 AM, Brian Gerst wrote: >>> On Mon, Jun 27, 2016 at 11:01 AM, Brian Gerst wrote: >>>> On Sun, Jun 26, 2016 at 5:55 PM, Andy Lutomirski wrote: >>>>> #ifdef CONFIG_X86_64 >>>>> /* Runs on IST stack */ >>>>> dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) >>>>> { >>>>> static const char str[] = "double fault"; >>>>> struct task_struct *tsk = current; >>>>> +#ifdef CONFIG_VMAP_STACK >>>>> + unsigned long cr2; >>>>> +#endif >>>>> >>>>> #ifdef CONFIG_X86_ESPFIX64 >>>>> extern unsigned char native_irq_return_iret[]; >>>>> @@ -332,6 +350,20 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) >>>>> tsk->thread.error_code = error_code; >>>>> tsk->thread.trap_nr = X86_TRAP_DF; >>>>> >>>>> +#ifdef CONFIG_VMAP_STACK >>>>> + /* >>>>> + * If we overflow the stack into a guard page, the CPU will fail >>>>> + * to deliver #PF and will send #DF instead. CR2 will contain >>>>> + * the linear address of the second fault, which will be in the >>>>> + * guard page below the bottom of the stack. >>>>> + */ >>>>> + cr2 = read_cr2(); >>>>> + if ((unsigned long)tsk->stack - 1 - cr2 < PAGE_SIZE) >>>>> + handle_stack_overflow( >>>>> + "kernel stack overflow (double-fault)", >>>>> + regs, cr2); >>>>> +#endif >>>> >>>> Is there any other way to tell if this was from a page fault? If it >>>> wasn't a page fault then CR2 is undefined. >>> >>> I guess it doesn't really matter, since the fault is fatal either way. >>> The error message might be incorrect though. >>> >> >> It's at least worth a comment, though. Maybe I should check if >> regs->rsp is within 40 bytes of the bottom of the stack, too, such >> that delivery of an inner fault would have double-faulted assuming the >> inner fault didn't use an IST vector. >> > > How about: > > /* > * If we overflow the stack into a guard page, the CPU will fail > * to deliver #PF and will send #DF instead. CR2 will contain > * the linear address of the second fault, which will be in the > * guard page below the bottom of the stack. > * > * We're limited to using heuristics here, since the CPU does > * not tell us what type of fault failed and, if the first fault > * wasn't a page fault, CR2 may contain stale garbage. To mostly > * rule out garbage, we check if the saved RSP is close enough to > * the bottom of the stack to cause exception delivery to fail. > * The worst case is 7 stack slots: one for alignment, five for > * SS..RIP, and one for the error code. > */ > tsk_stack = (unsigned long)task_stack_page(tsk); > if (regs->rsp <= tsk_stack + 7*8 && regs->rsp > tsk_stack - PAGE_SIZE) { > /* A double-fault due to #PF delivery failure is plausible. */ > cr2 = read_cr2(); > if (tsk_stack - 1 - cr2 < PAGE_SIZE) > handle_stack_overflow( > "kernel stack overflow (double-fault)", > regs, cr2); > } I think RSP anywhere in the guard page would be best, since it could have been decremented by a function prologue into the guard page before an access that triggers the page fault. -- Brian Gerst From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f68.google.com ([209.85.218.68]:36065 "EHLO mail-oi0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751673AbcF0QSF (ORCPT ); Mon, 27 Jun 2016 12:18:05 -0400 MIME-Version: 1.0 In-Reply-To: References: From: Brian Gerst Date: Mon, 27 Jun 2016 12:17:50 -0400 Message-ID: Subject: Re: [PATCH v4 15/29] x86/mm/64: Enable vmapped stacks Content-Type: text/plain; charset=UTF-8 Sender: linux-arch-owner@vger.kernel.org List-ID: To: Andy Lutomirski Cc: Andy Lutomirski , the arch/x86 maintainers , Linux Kernel Mailing List , linux-arch , Borislav Petkov , Nadav Amit , Kees Cook , "kernel-hardening@lists.openwall.com" , Linus Torvalds , Josh Poimboeuf , Jann Horn , Heiko Carstens Message-ID: <20160627161750.RwPTi9k_6yjLTWBW-lKzJ1MF3WA9tyPe5RqNP2_RYhE@z> On Mon, Jun 27, 2016 at 11:54 AM, Andy Lutomirski wrote: > On Mon, Jun 27, 2016 at 8:22 AM, Andy Lutomirski wrote: >> On Mon, Jun 27, 2016 at 8:12 AM, Brian Gerst wrote: >>> On Mon, Jun 27, 2016 at 11:01 AM, Brian Gerst wrote: >>>> On Sun, Jun 26, 2016 at 5:55 PM, Andy Lutomirski wrote: >>>>> #ifdef CONFIG_X86_64 >>>>> /* Runs on IST stack */ >>>>> dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) >>>>> { >>>>> static const char str[] = "double fault"; >>>>> struct task_struct *tsk = current; >>>>> +#ifdef CONFIG_VMAP_STACK >>>>> + unsigned long cr2; >>>>> +#endif >>>>> >>>>> #ifdef CONFIG_X86_ESPFIX64 >>>>> extern unsigned char native_irq_return_iret[]; >>>>> @@ -332,6 +350,20 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) >>>>> tsk->thread.error_code = error_code; >>>>> tsk->thread.trap_nr = X86_TRAP_DF; >>>>> >>>>> +#ifdef CONFIG_VMAP_STACK >>>>> + /* >>>>> + * If we overflow the stack into a guard page, the CPU will fail >>>>> + * to deliver #PF and will send #DF instead. CR2 will contain >>>>> + * the linear address of the second fault, which will be in the >>>>> + * guard page below the bottom of the stack. >>>>> + */ >>>>> + cr2 = read_cr2(); >>>>> + if ((unsigned long)tsk->stack - 1 - cr2 < PAGE_SIZE) >>>>> + handle_stack_overflow( >>>>> + "kernel stack overflow (double-fault)", >>>>> + regs, cr2); >>>>> +#endif >>>> >>>> Is there any other way to tell if this was from a page fault? If it >>>> wasn't a page fault then CR2 is undefined. >>> >>> I guess it doesn't really matter, since the fault is fatal either way. >>> The error message might be incorrect though. >>> >> >> It's at least worth a comment, though. Maybe I should check if >> regs->rsp is within 40 bytes of the bottom of the stack, too, such >> that delivery of an inner fault would have double-faulted assuming the >> inner fault didn't use an IST vector. >> > > How about: > > /* > * If we overflow the stack into a guard page, the CPU will fail > * to deliver #PF and will send #DF instead. CR2 will contain > * the linear address of the second fault, which will be in the > * guard page below the bottom of the stack. > * > * We're limited to using heuristics here, since the CPU does > * not tell us what type of fault failed and, if the first fault > * wasn't a page fault, CR2 may contain stale garbage. To mostly > * rule out garbage, we check if the saved RSP is close enough to > * the bottom of the stack to cause exception delivery to fail. > * The worst case is 7 stack slots: one for alignment, five for > * SS..RIP, and one for the error code. > */ > tsk_stack = (unsigned long)task_stack_page(tsk); > if (regs->rsp <= tsk_stack + 7*8 && regs->rsp > tsk_stack - PAGE_SIZE) { > /* A double-fault due to #PF delivery failure is plausible. */ > cr2 = read_cr2(); > if (tsk_stack - 1 - cr2 < PAGE_SIZE) > handle_stack_overflow( > "kernel stack overflow (double-fault)", > regs, cr2); > } I think RSP anywhere in the guard page would be best, since it could have been decremented by a function prologue into the guard page before an access that triggers the page fault. -- Brian Gerst