public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	Borislav Petkov <bp@suse.de>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>
Subject: Re: [PATCH urgent v2] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set
Date: Thu, 2 Apr 2015 11:07:44 +0200	[thread overview]
Message-ID: <20150402090744.GA26846@gmail.com> (raw)
In-Reply-To: <9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org>


* Andy Lutomirski <luto@kernel.org> wrote:

> When I wrote the opportunistic SYSRET code, I missed an important
> difference between SYSRET and IRET.  Both instructions are capable
> of setting EFLAGS.TF, but they behave differently when doing so.
> IRET will not issue a #DB trap after execution when it sets TF This
> is critical -- otherwise you'd never be able to make forward
> progress when returning to userspace.  SYSRET, on the other hand,
> will trap with #DB immediately after returning to CPL3, and the next
> instruction will never execute.
> 
> This breaks anything that opportunistically SYSRETs to a user
> context with TF set.  For example, running this code with TF set and
> a SIGTRAP handler loaded never gets past post_nop.
> 
> 	extern unsigned char post_nop[];
> 	asm volatile ("pushfq\n\t"
> 		      "popq %%r11\n\t"
> 		      "nop\n\t"
> 		      "post_nop:"
> 		      : : "c" (post_nop) : "r11");
> 
> In my defense, I can't find this documented in the AMD or Intel
> manual.
> 
> Fix it by using IRET to restore TF.
> 
> Fixes: 2a23c6b8a9c4 x86_64, entry: Use sysret to return to userspace when possible
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> 
> This affects 4.0-rc as well as -tip.  A full test case lives here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
> 
> It's called single_step_syscall_64.
> 
> On Intel systems, the 32-bit version of that test fails for unrelated
> reasons, but that's not a regression, and fixing it will be much more
> intrusive.
> 
> Changes from v1:
>  - Remove mention of testl from changelog.
>  - Improve comment per Denys' suggestion.
> 
> arch/x86/kernel/entry_64.S | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 750c6efcb718..537716380959 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -715,7 +715,21 @@ retint_swapgs:		/* return to user-space */
>  	cmpq %r11,EFLAGS(%rsp)		/* R11 == RFLAGS */
>  	jne opportunistic_sysret_failed
>  
> -	testq $X86_EFLAGS_RF,%r11	/* sysret can't restore RF */
> +	/*
> +	 * SYSRET can't restore RF.  SYSRET can restore TF, but unlike IRET,
> +	 * restoring TF results in a trap from userspace immediately after
> +	 * SYSRET.  This would cause an infinite loop whenever #DB happens
> +	 * with register state that satisfies the opportunistic SYSRET
> +	 * conditions.  For example, single-stepping this user code:
> +	 *
> +	 *           movq $stuck_here,%rcx
> +	 *           pushfq
> +	 *           popq %r11
> +	 *   stuck_here:
> +	 *
> +	 * would never get past stuck_here.
> +	 */
> +	testq $(X86_EFLAGS_RF|X86_EFLAGS_TF),%r11
>  	jnz opportunistic_sysret_failed

So I merged this as it's an obvious bugfix, but in hindsight I'm 
really uneasy about the whole opportunistic SYSRET concept: it appears 
that the chance that %rcx matches return-%rip is astronomical - this 
is why this bug wasn't noticed live so far.

So should we really be doing this?

It invites fragility and despite being clever, it adds average 
overhead to interrupt returns to user-space: the chance of the 
optimization hitting and helping us are much lower than the always 
paid for cost of doing the tests for it...

So please defend it.

Thanks,

	Ingo

  parent reply	other threads:[~2015-04-02  9:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-01 21:26 [PATCH urgent v2] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set Andy Lutomirski
2015-04-02  6:21 ` Borislav Petkov
2015-04-02  9:07 ` Ingo Molnar [this message]
2015-04-02 10:07   ` Denys Vlasenko
2015-04-02 10:37     ` Ingo Molnar
2015-04-02 11:14       ` Brian Gerst
2015-04-02 12:24         ` Denys Vlasenko
2015-04-02 12:31           ` Ingo Molnar
2015-04-02 12:59             ` Denys Vlasenko
2015-04-02 15:49               ` Denys Vlasenko
2015-04-02 16:08                 ` Ingo Molnar
2015-04-02 14:26             ` Andy Lutomirski
2015-04-02 12:32 ` [tip:x86/urgent] x86/asm/entry/64: " tip-bot for Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150402090744.GA26846@gmail.com \
    --to=mingo@kernel.org \
    --cc=bp@alien8.de \
    --cc=bp@suse.de \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox