Re: [PATCH v4 15/29] x86/mm/64: Enable vmapped stacks

From: Andy Lutomirski <luto@amacapital.net>
To: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>, Nadav Amit <nadav.amit@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	"kernel-hardening@lists.openwall.com" 
	<kernel-hardening@lists.openwall.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>, Jann Horn <jann@thejh.net>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: Re: [PATCH v4 15/29] x86/mm/64: Enable vmapped stacks
Date: Mon, 27 Jun 2016 09:35:22 -0700
Message-ID: <CALCETrU6fFNJDNvKn2ZJDz+5CW22mWb5nH6AXmNbGUvC4t2RSA@mail.gmail.com>
In-Reply-To: <CAMzpN2jVghmEbpx7yyOVcOa4GMw8iRUez0WeehGYnrtdm7Ad1w@mail.gmail.com>

On Mon, Jun 27, 2016 at 9:17 AM, Brian Gerst <brgerst@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 11:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, Jun 27, 2016 at 8:22 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Mon, Jun 27, 2016 at 8:12 AM, Brian Gerst <brgerst@gmail.com> wrote:
>>>> On Mon, Jun 27, 2016 at 11:01 AM, Brian Gerst <brgerst@gmail.com> wrote:
>>>>> On Sun, Jun 26, 2016 at 5:55 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>  #ifdef CONFIG_X86_64
>>>>>>  /* Runs on IST stack */
>>>>>>  dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
>>>>>>  {
>>>>>>         static const char str[] = "double fault";
>>>>>>         struct task_struct *tsk = current;
>>>>>> +#ifdef CONFIG_VMAP_STACK
>>>>>> +       unsigned long cr2;
>>>>>> +#endif
>>>>>>
>>>>>>  #ifdef CONFIG_X86_ESPFIX64
>>>>>>         extern unsigned char native_irq_return_iret[];
>>>>>> @@ -332,6 +350,20 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
>>>>>>         tsk->thread.error_code = error_code;
>>>>>>         tsk->thread.trap_nr = X86_TRAP_DF;
>>>>>>
>>>>>> +#ifdef CONFIG_VMAP_STACK
>>>>>> +       /*
>>>>>> +        * If we overflow the stack into a guard page, the CPU will fail
>>>>>> +        * to deliver #PF and will send #DF instead.  CR2 will contain
>>>>>> +        * the linear address of the second fault, which will be in the
>>>>>> +        * guard page below the bottom of the stack.
>>>>>> +        */
>>>>>> +       cr2 = read_cr2();
>>>>>> +       if ((unsigned long)tsk->stack - 1 - cr2 < PAGE_SIZE)
>>>>>> +               handle_stack_overflow(
>>>>>> +                       "kernel stack overflow (double-fault)",
>>>>>> +                       regs, cr2);
>>>>>> +#endif
>>>>>
>>>>> Is there any other way to tell if this was from a page fault?  If it
>>>>> wasn't a page fault then CR2 is undefined.
>>>>
>>>> I guess it doesn't really matter, since the fault is fatal either way.
>>>> The error message might be incorrect though.
>>>>
>>>
>>> It's at least worth a comment, though.  Maybe I should check if
>>> regs->rsp is within 40 bytes of the bottom of the stack, too, such
>>> that delivery of an inner fault would have double-faulted assuming the
>>> inner fault didn't use an IST vector.
>>>
>>
>> How about:
>>
>>     /*
>>      * If we overflow the stack into a guard page, the CPU will fail
>>      * to deliver #PF and will send #DF instead.  CR2 will contain
>>      * the linear address of the second fault, which will be in the
>>      * guard page below the bottom of the stack.
>>      *
>>      * We're limited to using heuristics here, since the CPU does
>>      * not tell us what type of fault failed and, if the first fault
>>      * wasn't a page fault, CR2 may contain stale garbage.  To mostly
>>      * rule out garbage, we check if the saved RSP is close enough to
>>      * the bottom of the stack to cause exception delivery to fail.
>>      * The worst case is 7 stack slots: one for alignment, five for
>>      * SS..RIP, and one for the error code.
>>      */
>>     tsk_stack = (unsigned long)task_stack_page(tsk);
>>     if (regs->rsp <= tsk_stack + 7*8 && regs->rsp > tsk_stack - PAGE_SIZE) {
>>         /* A double-fault due to #PF delivery failure is plausible. */
>>         cr2 = read_cr2();
>>         if (tsk_stack - 1 - cr2 < PAGE_SIZE)
>>             handle_stack_overflow(
>>                 "kernel stack overflow (double-fault)",
>>                 regs, cr2);
>>     }
>
> I think RSP anywhere in the guard page would be best, since it could
> have been decremented by a function prologue into the guard page
> before an access that triggers the page fault.
>

I think that can miss some stack overflows.  Suppose that RSP points
very close to the bottom of the stack, but still above the guard page,
and we take an unrelated fault.  The CPU can fail to deliver that fault
and we get a double fault instead.  But I miscounted the stack slots,
too.  Do you like this version and its explanation?

    /*
     * If we overflow the stack into a guard page, the CPU will fail
     * to deliver #PF and will send #DF instead.  Similarly, if we
     * take any non-IST exception while too close to the bottom of
     * the stack, the processor will get a page fault while
     * delivering the exception and will generate a double fault.
     *
     * According to the SDM (footnote in 6.15 under "Interrupt 14 -
     * Page-Fault Exception (#PF)"):
     *
     *   Processors update CR2 whenever a page fault is detected. If a
     *   second page fault occurs while an earlier page fault is being
     *   delivered, the faulting linear address of the second fault will
     *   overwrite the contents of CR2 (replacing the previous
     *   address). These updates to CR2 occur even if the page fault
     *   results in a double fault or occurs during the delivery of a
     *   double fault.
     *
     * However, if we got here due to a non-page-fault exception while
     * delivering a non-page-fault exception, CR2 may contain a
     * stale value.
     *
     * We're limited to using heuristics here, since the CPU does
     * not tell us what type of fault failed.  We consider this
     * double fault to be a stack overflow if CR2 points to the
     * guard page and RSP is either in the guard page or close
     * enough to the bottom of the stack to cause exception
     * delivery to fail.  If RSP == tsk_stack + 48 and we take an
     * exception, the stack is already aligned and there will be
     * enough room for SS, RSP, RFLAGS, CS, RIP, and a possible
     * error code.  With any less space left, exception delivery
     * could fail.
     */
    tsk_stack = (unsigned long)task_stack_page(tsk);
    if (regs->rsp < tsk_stack + 48 && regs->rsp > tsk_stack - PAGE_SIZE) {
        /* A double-fault due to #PF delivery failure is plausible. */
        cr2 = read_cr2();
        if (tsk_stack - 1 - cr2 < PAGE_SIZE)
            handle_stack_overflow(
                "kernel stack overflow (double-fault)",
                regs, cr2);
    }
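
For anyone who wants to check the arithmetic outside the kernel, here is
a rough userspace sketch of the two tests above.  It is not the patch
itself: the helper names, the fixed 4 KiB PAGE_SIZE, and the model of
exception delivery (align RSP down to 16 bytes, then push six 8-byte
slots) are assumptions made for illustration.  The guard-page test
relies on unsigned wraparound, so one comparison covers both bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_SIZE      ((uint64_t)4096)
/* SS, RSP, RFLAGS, CS, RIP and a possible error code: 6 slots of 8 bytes. */
#define HW_FRAME_BYTES ((uint64_t)(6 * 8))

/*
 * True if addr lies in the page directly below stack_base.
 * For addr >= stack_base the subtraction wraps around to a huge
 * unsigned value, so the single comparison rejects it as well.
 */
static bool in_guard_page(uint64_t stack_base, uint64_t addr)
{
	return stack_base - 1 - addr < PAGE_SIZE;
}

/*
 * Hypothetical model of non-IST exception delivery in 64-bit mode:
 * the CPU aligns RSP down to 16 bytes and then pushes the frame.
 * Returns true if the frame would extend below stack_base.
 */
static bool delivery_hits_guard(uint64_t stack_base, uint64_t rsp)
{
	uint64_t frame_bottom = (rsp & ~(uint64_t)0xF) - HW_FRAME_BYTES;

	return frame_bottom < stack_base;
}

/* The RSP half of the heuristic, written as in the snippet above. */
static bool rsp_plausible(uint64_t stack_base, uint64_t rsp)
{
	return rsp < stack_base + HW_FRAME_BYTES &&
	       rsp > stack_base - PAGE_SIZE;
}

/* Both halves combined: RSP is plausible and CR2 is in the guard page. */
static bool looks_like_stack_overflow(uint64_t stack_base, uint64_t rsp,
				      uint64_t cr2)
{
	return rsp_plausible(stack_base, rsp) &&
	       in_guard_page(stack_base, cr2);
}
```

With a page-aligned stack_base, delivery_hits_guard() is true exactly
for RSP values in [stack_base, stack_base + 48), which is where the
"+ 48" bound in the check comes from.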
