All of lore.kernel.org
 help / color / mirror / Atom feed
From: Denys Vlasenko <dvlasenk@redhat.com>
To: Ingo Molnar <mingo@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: ia32_sysenter_target does not preserve EFLAGS
Date: Fri, 27 Mar 2015 15:25:47 +0100	[thread overview]
Message-ID: <5515686B.3080204@redhat.com> (raw)

Hi,

While running some tests I noticed that EFLAGS
is not saved across syscalls if I use 32-bit
userspace, use SYSENTER, and paravirt is active.

Looking at the code, it's actually clear why that happens.

/*
 * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
 * IF and VM in rflags are cleared (IOW: interrupts are off).
 * SYSENTER does not save anything on the stack,
 * and does not save old rip (!!!) and rflags.
 */
ENTRY(ia32_sysenter_target)
        SWAPGS_UNSAFE_STACK  <============================
        movq    PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp
        ENABLE_INTERRUPTS(CLBR_NONE)

        movl    %ebp, %ebp
        movl    %eax, %eax
        movl    ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d

        /* Construct struct pt_regs on stack */
        pushq_cfi       $__USER32_DS            /* pt_regs->ss */
        pushq_cfi       %rbp                    /* pt_regs->sp */
        CFI_REL_OFFSET  rsp,0
        pushfq_cfi                              /* pt_regs->flags */

The SWAPGS_UNSAFE_STACK, it's it involves paravirt callbacks,
will change EFLAGS, and it *can't* save/restore them -
there is no place to save it, since neither stack nor
PER_CPU() is usable at that point.

Interestingly, *no one ever complained*!

Apparently, users *don't* depend on arithmetic flags
to survive over syscall. They also okay with DF flag
being cleared.

Let's go flag-by-flag.

ID - probably no one depends on it
VIP,VIF,VM - v86 stuff, not supported in 64bit
AC - someone probably do use this
RF - should be cleared to 0
NT - iret via task gate, not supported in 64bit
IOPL - usually 00, sys_iopl() can change it
DF - according to C ABI, should be 0
IF - should be preserved (but almost always 1)
TF - should be preserved
arith flags - probably no one cares

IOW. Bits to be preseved are only AC, IOPL, TF, and _maybe_
IF.

AC and IOPL are preserved even with this paravirt quirk
because paravirt hooks do not mangle them.

TF preservation and proper restoration is handled by
	do_debug + syscall_trace_enter_phase2 + iret
combo.

We unconditionally set IF. This is only a problem for applications
which use sys_iopl(3) and, disable IRQs in userspace and perform
syscalls. The set of such apps is probably empty.
(This "bug" exists even for non-paravirt case).

So, formally, we have a bug: we do not preserve IF,
DF and arith flags.

I'm proposing to use this opportunity to amend syscall ABI
to say that arith flags are not preserved across syscalls,
and DF can be cleared to 0 by syscalls (but can't be set to 1).
Evidently, it's broken for some time for some virtualized
setups and users are okay.

I'm not sure what to do with the "bug" of forcing IF=1.
Fix it? Or also declare that syscalls can set IF=1?
Do you think this is a legitimate userspace code?

	sys_iopl(3);
	cli;
	syscall();
	/* expects irqs still disabled */

-- 
vda

             reply	other threads:[~2015-03-27 14:26 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-27 14:25 Denys Vlasenko [this message]
2015-03-27 17:12 ` ia32_sysenter_target does not preserve EFLAGS Borislav Petkov
2015-03-27 18:37 ` Andy Lutomirski
2015-03-27 20:09   ` Linus Torvalds
2015-03-27 20:16     ` Andy Lutomirski
2015-03-27 20:31       ` Linus Torvalds
2015-03-27 21:11         ` Andy Lutomirski
2015-03-28  9:42     ` Borislav Petkov
2015-03-27 20:53   ` Brian Gerst
2015-03-27 21:02     ` Linus Torvalds
2015-03-28  9:30       ` Ingo Molnar
2015-03-28  9:39       ` Ingo Molnar
2015-03-27 20:00 ` Linus Torvalds
2015-03-28  0:34   ` Denys Vlasenko
2015-03-28  9:28     ` Olivier Galibert
2015-03-28  9:46     ` Ingo Molnar
2015-03-28 11:17       ` Denys Vlasenko
2015-03-29 17:12         ` Andy Lutomirski
     [not found]     ` <CA+55aFxwNq6g+Oi-UhGBgEZuDQyNkeg6qZnkDY11PNhTN=fmzg@mail.gmail.com>
2015-03-28 11:01       ` Denys Vlasenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5515686B.3080204@redhat.com \
    --to=dvlasenk@redhat.com \
    --cc=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.