public inbox for linux-kernel@vger.kernel.org
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>
Subject: [RFC 00/30] x86: Rewrite all syscall entries except native 64-bit
Date: Tue,  1 Sep 2015 15:41:00 -0700
Message-ID: <cover.1441146105.git.luto@kernel.org>

Here's a monster series that I'm working on.  I think it's in decent
shape now.

The first couple patches are tests and some old stuff.  There's a
test that validates the vDSO AT_SYSINFO annotations.  It fails on
32-bit Debian systems for some reason that I can't yet fathom,
because fast syscalls simply don't happen on my VM (for unknown
reasons, presumably related to glibc bugs or misconfiguration), and
I still need to do something about the test.  There's also a test
that exercises some assumptions that signal handling and ptracers
make about syscalls.  Those assumptions currently do *not* hold on
64-bit AMD using 32-bit AT_SYSINFO.

The next few patches are the NT stuff.  Ingo, feel free to pretend
you don't see it until the merge window closes :)

The rest is basically a rewrite of syscalls for all cases except
64-bit native.  With these patches applied, there is a single 32-bit
vDSO and it uses SYSCALL, SYSENTER, and INT80 almost interchangeably
via alternatives.  The semantics of SYSENTER and SYSCALL are defined
as:

 1. If SYSCALL, ESP = ECX
 2. ECX = *ESP
 3. IP = INT80 landing pad
 4. Opportunistic SYSRET/SYSEXIT is enabled on return

The vDSO is rearranged so that these semantics work.  Anything that
backs IP up by 2 ends up pointing at a bona fide int $0x80
instruction with the expected regs.
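
To illustrate the "back IP up by 2" property, here is a minimal
userspace sketch.  The byte array is a toy stand-in and NOT the real
vDSO layout; toy_stub, TOY_LANDING_PAD, toy_is_int80 and
toy_restart_ip are hypothetical names made up for this example.  The
only facts it relies on are the encodings: sysenter is 0f 34, syscall
is 0f 05, and int $0x80 is cd 80, so all three are two bytes long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for the tail of the vDSO entry stub (assumed layout):
 * a two-byte fast-syscall instruction, then int $0x80, then the
 * landing pad.
 */
static const uint8_t toy_stub[] = {
	0x0f, 0x34,	/* sysenter (syscall would be 0x0f 0x05) */
	0xcd, 0x80,	/* int $0x80 */
	0xc3,		/* landing pad: ret */
};

#define TOY_LANDING_PAD	4	/* offset of the byte after int $0x80 */
#define INT80_LEN	2	/* int $0x80 encodes as two bytes */

/* Return true if the two bytes at 'off' encode int $0x80. */
static bool toy_is_int80(const uint8_t *code, size_t len, size_t off)
{
	return off + INT80_LEN <= len &&
	       code[off] == 0xcd && code[off + 1] == 0x80;
}

/* Where a restart resumes: the reported IP backed up by two bytes. */
static size_t toy_restart_ip(size_t ip)
{
	assert(ip >= INT80_LEN);
	return ip - INT80_LEN;
}
```

Because every entry type reports the landing pad as IP, a restart or
a ptracer that subtracts 2 finds the int $0x80 bytes in this toy
layout no matter which instruction actually entered the kernel.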

In the process, the vDSO CFI annotations (which are actually used)
get rewritten using normal CFI directives.

Opportunistic SYSRET/SYSEXIT only happens on return when CS and SS
are as expected, IP points to the INT80 landing pad, and flags are
in good shape.
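
As a rough sketch of that check (not the code in the series), the
decision can be written as a single predicate.  Everything named
toy_* is hypothetical, and the reading of "flags are in good shape"
as "TF and RF are both clear" is my assumption; the text above does
not say which flags matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EFLAGS_TF	0x00000100u	/* trap flag (single-step) */
#define TOY_EFLAGS_RF	0x00010000u	/* resume flag */

/* The parts of the saved user state that the check looks at. */
struct toy_frame {
	uint32_t ip;
	uint32_t cs;
	uint32_t ss;
	uint32_t flags;
};

/* What a fast return would restore; supplied by the caller. */
struct toy_expect {
	uint32_t cs;
	uint32_t ss;
	uint32_t landing_pad;
};

/*
 * Return true only if a fast return could reproduce the saved state.
 * Any mismatch means falling back to the slow return path.
 */
static bool toy_can_fast_return(const struct toy_frame *regs,
				const struct toy_expect *exp)
{
	assert(regs != NULL && exp != NULL);
	return regs->cs == exp->cs &&
	       regs->ss == exp->ss &&
	       regs->ip == exp->landing_pad &&
	       (regs->flags & (TOY_EFLAGS_TF | TOY_EFLAGS_RF)) == 0;
}
```

The check is "opportunistic" in the sense that it runs on the way
out: if a signal handler or ptracer changed IP, CS, SS or the flags
during the syscall, the predicate fails and the slow return is used.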

Other than that, the system call entries are simplified to the bare
minimum prologue and a call to a C function.  Amusingly, SYSENTER
and SYSCALL32 use the same C function.

To make that work, I had to remove all the 32-bit syscall stubs
except the clone argument hack.  This is because, for C code to call
through the system call table, the system call table entries need to
be real function pointers with C-compatible ABIs.
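
A minimal userspace sketch of what "real function pointers" buys us:
once every table entry has one C-compatible signature, C code can
bounds-check the syscall number and call straight through the table.
The names here (toy_sys_call_ptr_t, toy_table, toy_do_syscall and the
two toy syscalls) are invented for illustration, and the six
unsigned long arguments are an assumption, not a quote from the
patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One uniform signature for every table entry (assumed: six args). */
typedef long (*toy_sys_call_ptr_t)(unsigned long, unsigned long,
				   unsigned long, unsigned long,
				   unsigned long, unsigned long);

/* Toy syscall: adds its first two arguments, ignores the rest. */
static long toy_sys_add(unsigned long a, unsigned long b,
			unsigned long c, unsigned long d,
			unsigned long e, unsigned long f)
{
	(void)c; (void)d; (void)e; (void)f;
	return (long)(a + b);
}

/* Toy placeholder for an unimplemented table slot. */
static long toy_sys_ni(unsigned long a, unsigned long b,
		       unsigned long c, unsigned long d,
		       unsigned long e, unsigned long f)
{
	(void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
	return -ENOSYS;
}

static const toy_sys_call_ptr_t toy_table[] = {
	toy_sys_ni,	/* nr 0 */
	toy_sys_add,	/* nr 1 */
};

#define TOY_NR_SYSCALLS	(sizeof(toy_table) / sizeof(toy_table[0]))

/* Dispatch from C: bounds-check nr, then call through the table. */
static long toy_do_syscall(unsigned long nr, const unsigned long args[6])
{
	assert(args != NULL);
	if (nr >= TOY_NR_SYSCALLS)
		return -ENOSYS;
	return toy_table[nr](args[0], args[1], args[2],
			     args[3], args[4], args[5]);
}
```

An asm stub with a private register convention could not sit in such
a table, since the compiler assumes every entry follows the normal
calling convention.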

There is nothing at all anymore that requires that x86_32 syscalls
be asmlinkage.  That could be removed in a subsequent patch.

The upshot appears to be a ~25 cycle performance hit on 32-bit fast
path syscalls.  The slow path is probably faster under most
circumstances and, if the exit slow path gets hit, it'll be much
faster because (as we already do in the 64-bit native case) we can
still use SYSEXIT/SYSRET.

The patchset is structured as a removal of the old fast syscall
code, then the change that makes syscalls into real functions, then
a clean re-implementation of fast syscalls.

If we want some of the 25 cycles back, we could consider open-coding
a new C fast path.

When reading the diffstat, keep in mind that 544 lines are new tests
and ~187 are reinstatement of the CFI macros for compatibility with
old binutils.  There are only ~444 lines of real new code in here,
which I think is pretty good.  The asm diffstat is:

 arch/x86/entry/entry_32.S                | 184 +++----
 arch/x86/entry/entry_64_compat.S         | 541 +++++----------------
 arch/x86/entry/vdso/vdso32/int80.S       |  56 ---
 arch/x86/entry/vdso/vdso32/syscall.S     |  75 ---
 arch/x86/entry/vdso/vdso32/sysenter.S    | 116 -----
 arch/x86/entry/vdso/vdso32/system_call.S |  57 +++
 6 files changed, 246 insertions(+), 783 deletions(-)

Andy Lutomirski (30):
  selftests/x86: Add a test for vDSO unwinding
  selftests/x86: Add a test for syscall restart and arg modification
  x86/entry/64/compat: Fix SYSENTER's NT flag before user memory access
  x86/entry: Move lockdep_sys_exit to prepare_exit_to_usermode
  x86/entry/64/compat: After SYSENTER, move STI after the NT fixup
  x86/sched/64: Don't save flags on context switch (reinstated)
  x86/vdso: Remove runtime 32-bit vDSO selection
  x86/asm: Re-add manual CFI infrastructure
  x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
  x86/vdso: Replace hex int80 CFI annotations with gas directives
  x86/vdso/32: Save extra registers in the INT80 vsyscall path
  x86/entry/64/compat: Disable SYSENTER and SYSCALL32 entries
  x86/entry/64/compat: Remove audit optimizations
  x86/entry/64/compat: Remove most of the fast system call machinery
  x86/entry/64/compat: Set up full pt_regs for all compat syscalls
  x86/entry/syscalls: Move syscall table declarations into
    asm/syscalls.h
  x86/syscalls: Give sys_call_ptr_t a useful type
  x86/entry: Add do_syscall_32, a C function to do 32-bit syscalls
  x86/entry/64/compat: Migrate the body of the syscall entry to C
  x86/entry: Add C code for fast system call entries
  x86/vdso/compat: Wire up SYSENTER and SYSCALL for compat userspace
  x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
  x86/entry/32: Open-code return tracking from fork and kthreads
  x86/entry/32: Switch INT80 to the new C syscall path
  x86/entry/32: Re-implement SYSENTER using the new C path
  x86/asm: Remove thread_info.sysenter_return
  x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
  x86/entry: Make irqs_disabled checks in exit code depend on lockdep
  x86/entry: Force inlining of 32-bit syscall code
  x86/entry: Micro-optimize compat fast syscall arg fetch

 arch/x86/Makefile                                  |  10 +-
 arch/x86/entry/common.c                            | 138 +++++-
 arch/x86/entry/entry_32.S                          | 184 +++----
 arch/x86/entry/entry_64.S                          |   3 +-
 arch/x86/entry/entry_64_compat.S                   | 541 +++++----------------
 arch/x86/entry/syscall_32.c                        |   9 +-
 arch/x86/entry/syscall_64.c                        |   4 +-
 arch/x86/entry/syscalls/syscall_32.tbl             |   8 +-
 arch/x86/entry/vdso/Makefile                       |  39 +-
 arch/x86/entry/vdso/vdso2c.c                       |   2 +-
 arch/x86/entry/vdso/vdso32-setup.c                 |  28 +-
 arch/x86/entry/vdso/vdso32/int80.S                 |  56 ---
 arch/x86/entry/vdso/vdso32/syscall.S               |  75 ---
 arch/x86/entry/vdso/vdso32/sysenter.S              | 116 -----
 arch/x86/entry/vdso/vdso32/system_call.S           |  57 +++
 arch/x86/entry/vdso/vma.c                          |  13 +-
 arch/x86/ia32/ia32_signal.c                        |   4 +-
 arch/x86/include/asm/dwarf2.h                      | 177 +++++++
 arch/x86/include/asm/elf.h                         |   2 +-
 arch/x86/include/asm/switch_to.h                   |  12 +-
 arch/x86/include/asm/syscall.h                     |  14 +-
 arch/x86/include/asm/thread_info.h                 |   1 -
 arch/x86/include/asm/vdso.h                        |  10 +-
 arch/x86/kernel/asm-offsets.c                      |   3 -
 arch/x86/kernel/signal.c                           |   4 +-
 arch/x86/um/sys_call_table_32.c                    |   7 +-
 arch/x86/um/sys_call_table_64.c                    |   7 +-
 arch/x86/xen/setup.c                               |  13 +-
 tools/testing/selftests/x86/Makefile               |   5 +-
 tools/testing/selftests/x86/ptrace_syscall.c       | 294 +++++++++++
 .../testing/selftests/x86/raw_syscall_helper_32.S  |  46 ++
 tools/testing/selftests/x86/unwind_vdso.c          | 204 ++++++++
 32 files changed, 1175 insertions(+), 911 deletions(-)
 delete mode 100644 arch/x86/entry/vdso/vdso32/int80.S
 delete mode 100644 arch/x86/entry/vdso/vdso32/syscall.S
 delete mode 100644 arch/x86/entry/vdso/vdso32/sysenter.S
 create mode 100644 arch/x86/entry/vdso/vdso32/system_call.S
 create mode 100644 arch/x86/include/asm/dwarf2.h
 create mode 100644 tools/testing/selftests/x86/ptrace_syscall.c
 create mode 100644 tools/testing/selftests/x86/raw_syscall_helper_32.S
 create mode 100644 tools/testing/selftests/x86/unwind_vdso.c

-- 
2.4.3


Thread overview: 37+ messages
2015-09-01 22:41 Andy Lutomirski [this message]
2015-09-01 22:41 ` [RFC 01/30] selftests/x86: Add a test for vDSO unwinding Andy Lutomirski
2015-09-01 22:41 ` [RFC 02/30] selftests/x86: Add a test for syscall restart and arg modification Andy Lutomirski
2015-09-01 22:41 ` [RFC 03/30] x86/entry/64/compat: Fix SYSENTER's NT flag before user memory access Andy Lutomirski
2015-09-01 22:41 ` [RFC 04/30] x86/entry: Move lockdep_sys_exit to prepare_exit_to_usermode Andy Lutomirski
2015-09-01 22:41 ` [RFC 05/30] x86/entry/64/compat: After SYSENTER, move STI after the NT fixup Andy Lutomirski
2015-09-01 22:41 ` [RFC 06/30] x86/sched/64: Don't save flags on context switch (reinstated) Andy Lutomirski
2015-09-24 17:11   ` Andy Lutomirski
2015-09-25 12:21   ` [tip:x86/asm] x86/sched/64: Don't save flags on context switch (reinstated) tip-bot for Andy Lutomirski
2015-09-01 22:41 ` [RFC 07/30] x86/vdso: Remove runtime 32-bit vDSO selection Andy Lutomirski
2015-09-01 22:41 ` [RFC 08/30] x86/asm: Re-add manual CFI infrastructure Andy Lutomirski
2015-09-01 22:41 ` [RFC 09/30] x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm Andy Lutomirski
2015-09-01 22:41 ` [RFC 10/30] x86/vdso: Replace hex int80 CFI annotations with gas directives Andy Lutomirski
2015-09-01 22:41 ` [RFC 11/30] x86/vdso/32: Save extra registers in the INT80 vsyscall path Andy Lutomirski
2015-09-01 22:41 ` [RFC 12/30] x86/entry/64/compat: Disable SYSENTER and SYSCALL32 entries Andy Lutomirski
2015-09-01 22:41 ` [RFC 13/30] x86/entry/64/compat: Remove audit optimizations Andy Lutomirski
2015-09-01 22:41 ` [RFC 14/30] x86/entry/64/compat: Remove most of the fast system call machinery Andy Lutomirski
2015-09-01 22:41 ` [RFC 15/30] x86/entry/64/compat: Set up full pt_regs for all compat syscalls Andy Lutomirski
2015-09-01 22:41 ` [RFC 16/30] x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h Andy Lutomirski
2015-09-01 22:41 ` [RFC 17/30] x86/syscalls: Give sys_call_ptr_t a useful type Andy Lutomirski
2015-09-01 22:41 ` [RFC 18/30] x86/entry: Add do_syscall_32, a C function to do 32-bit syscalls Andy Lutomirski
2015-09-01 22:41 ` [RFC 19/30] x86/entry/64/compat: Migrate the body of the syscall entry to C Andy Lutomirski
2015-09-01 22:41 ` [RFC 20/30] x86/entry: Add C code for fast system call entries Andy Lutomirski
2015-09-01 22:41 ` [RFC 21/30] x86/vdso/compat: Wire up SYSENTER and SYSCALL for compat userspace Andy Lutomirski
2015-09-01 22:41 ` [RFC 22/30] x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls Andy Lutomirski
2015-09-01 22:41 ` [RFC 23/30] x86/entry/32: Open-code return tracking from fork and kthreads Andy Lutomirski
2015-09-01 22:41 ` [RFC 24/30] x86/entry/32: Switch INT80 to the new C syscall path Andy Lutomirski
2015-09-03 16:45   ` Brian Gerst
2015-09-03 17:22     ` Andy Lutomirski
2015-09-01 22:41 ` [RFC 25/30] x86/entry/32: Re-implement SYSENTER using the new C path Andy Lutomirski
2015-09-01 22:41 ` [RFC 26/30] x86/asm: Remove thread_info.sysenter_return Andy Lutomirski
2015-09-01 22:41 ` [RFC 27/30] x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls Andy Lutomirski
2015-09-01 22:41 ` [RFC 28/30] x86/entry: Make irqs_disabled checks in exit code depend on lockdep Andy Lutomirski
2015-09-01 22:41 ` [RFC 29/30] x86/entry: Force inlining of 32-bit syscall code Andy Lutomirski
2015-09-01 22:41 ` [RFC 30/30] x86/entry: Micro-optimize compat fast syscall arg fetch Andy Lutomirski
2015-09-03  5:23 ` [RFC 00/30] x86: Rewrite all syscall entries except native 64-bit Brian Gerst
2015-09-03 17:18   ` Andy Lutomirski
