public inbox for linux-kernel@vger.kernel.org
* Proposal for finishing the 64-bit x86 syscall cleanup
@ 2015-08-24 21:13 Andy Lutomirski
  2015-08-25  7:29 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-08-24 21:13 UTC
  To: X86 ML, Denys Vlasenko, Brian Gerst, Borislav Petkov,
	Linus Torvalds, linux-kernel@vger.kernel.org, Jan Beulich

Hi all-

I want to (try to) mostly or fully get rid of the messy bits (as
opposed to the hardware-bs-forced bits) of the 64-bit syscall asm.
There are two major conceptual things that are in the way.

Thing 1: partial pt_regs

64-bit fast path syscalls don't fully initialize pt_regs: bx, bp, and
r12-r15 are uninitialized.  Some syscalls require them to be
initialized, and they have special awful stubs to do it.  The entry
and exit tracing code (except for phase1 tracing) also need them
initialized, and they have their own messy initialization.  Compat
syscalls are their own private little mess here.

This gets in the way of all kinds of cleanups, because C code can't
switch between the full and partial pt_regs states.
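
A rough userspace sketch of the split described above, for illustration
only.  The struct is a stand-in rather than the kernel's own header, and
the sketch_* names are hypothetical.  It assumes the six callee-saved
registers sit together at the start of the frame, so a partial frame
just leaves that region unset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the 64-bit register frame; not the kernel header. */
struct sketch_pt_regs {
	/* Callee-saved registers: left uninitialized by the fast path. */
	uint64_t r15, r14, r13, r12, bp, bx;
	/* Caller-saved registers and syscall arguments: always saved. */
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
	uint64_t orig_ax;
	/* Return frame. */
	uint64_t ip, cs, flags, sp, ss;
};

/* Number of bytes a partial frame leaves unset at the start of the struct. */
static size_t sketch_partial_gap(void)
{
	return offsetof(struct sketch_pt_regs, r11);
}

/* Hypothetical check: may C code read this field from a partial frame? */
static int sketch_valid_in_partial_frame(size_t field_offset)
{
	return field_offset >= sketch_partial_gap();
}
```

Under these assumptions the gap is six 8-byte slots, and any C code
that touches a field below that offset has to know which kind of frame
it was handed.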

I can see two ways out.  We could remove the optimization entirely.
That means pushing and popping six more registers, which adds about
ten cycles to fast path syscalls on Sandy Bridge.  It would also
simplify and presumably speed up the slow paths.

We could also annotate which syscalls need full regs and jump to the
slow path for them.  This would leave the fast path unchanged (we
could duplicate the syscall table so that regs-requiring syscalls
would turn into some asm that switches to the slow path).  We'd make
the syscall table say something like:

59      64      execve                  sys_execve:regs

The fast path would have exactly identical performance and the slow
path would presumably speed up.  The downside would be additional
complexity.
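
A minimal sketch of how a table generator might consume such an
annotation, assuming the ":regs" suffix shown above.  This is not the
kernel's actual table tooling; the sketch_* names and the
"slow_path_stub_" prefix are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sketch_entry {
	int nr;
	char abi[16];
	char name[32];
	char entry[64];
	int needs_full_regs;
};

/* Parse "nr abi name entry[:regs]"; returns 0 on success, -1 on bad input. */
static int sketch_parse_line(const char *line, struct sketch_entry *e)
{
	char *colon;

	if (sscanf(line, "%d %15s %31s %63s",
		   &e->nr, e->abi, e->name, e->entry) != 4)
		return -1;

	e->needs_full_regs = 0;
	colon = strchr(e->entry, ':');
	if (colon) {
		if (strcmp(colon + 1, "regs") != 0)
			return -1;	/* unknown annotation */
		*colon = '\0';		/* strip the suffix from the symbol */
		e->needs_full_regs = 1;
	}
	return 0;
}

/* Write the symbol that the fast-path copy of the table would point at. */
static void sketch_fast_path_target(const struct sketch_entry *e,
				    char *buf, size_t len)
{
	if (e->needs_full_regs)
		snprintf(buf, len, "slow_path_stub_%s", e->entry);
	else
		snprintf(buf, len, "%s", e->entry);
}
```

With two tables generated this way, the slow-path table would always
use the plain entry symbol, and only the fast-path table would carry
the stubs.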

Thing 2: vdso compilation with binutils that doesn't support .cfi directives

Userspace debuggers really like having the vdso properly
CFI-annotated, and the 32-bit fast syscall entries are annotated
manually in hexadecimal.  AFAIK Jan Beulich is the only person who
understands it.
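
To give a feel for why hand-written hex CFI is hard to maintain, here
is a small decoder sketch for a few standard DWARF call-frame opcodes.
The opcode values come from the DWARF specification; the sketch_*
helpers are hypothetical and handle only this subset, and nothing here
is copied from the vdso sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Standard DWARF call-frame opcodes (a small subset). */
#define DW_CFA_nop		0x00
#define DW_CFA_def_cfa_offset	0x0e	/* ULEB128 operand */
#define DW_CFA_advance_loc	0x40	/* low 6 bits: code delta */
#define DW_CFA_offset		0x80	/* low 6 bits: register; ULEB128 operand */

/* Decode one ULEB128 value; returns the number of bytes consumed. */
static size_t sketch_uleb128(const unsigned char *p, unsigned long *val)
{
	size_t n = 0;
	unsigned shift = 0;

	*val = 0;
	do {
		*val |= (unsigned long)(p[n] & 0x7f) << shift;
		shift += 7;
	} while (p[n++] & 0x80);
	return n;
}

/* Render one instruction as text; returns the number of bytes consumed. */
static size_t sketch_decode_one(const unsigned char *p, char *buf, size_t len)
{
	unsigned long v;
	size_t n = 1;

	switch (p[0] & 0xc0) {		/* opcodes packed into the top 2 bits */
	case DW_CFA_advance_loc:
		snprintf(buf, len, "advance_loc %u", (unsigned)(p[0] & 0x3f));
		return n;
	case DW_CFA_offset:		/* operand is a factored offset */
		n += sketch_uleb128(p + 1, &v);
		snprintf(buf, len, "offset r%u, %lu",
			 (unsigned)(p[0] & 0x3f), v);
		return n;
	}
	switch (p[0]) {
	case DW_CFA_nop:
		snprintf(buf, len, "nop");
		return n;
	case DW_CFA_def_cfa_offset:
		n += sketch_uleb128(p + 1, &v);
		snprintf(buf, len, "def_cfa_offset %lu", v);
		return n;
	}
	snprintf(buf, len, "unknown 0x%02x", (unsigned)p[0]);
	return n;
}
```

Fed a made-up byte string such as 0x0e 0x08 0x41 0x85 0x02, the
decoder yields three instructions.  With assembler support, each would
be a single readable .cfi directive instead.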

I want to be able to change the entries a little bit to clean them up
(and possibly rework the SYSCALL32 and SYSENTER register tricks, which
currently suck), but it's really, really messy right now because of
the hex CFI stuff.  Could we just drop the CFI annotations if the
binutils version is too old or even just require new enough binutils
to build 32-bit and compat kernels?

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

Thread overview: 11+ messages
2015-08-24 21:13 Proposal for finishing the 64-bit x86 syscall cleanup Andy Lutomirski
2015-08-25  7:29 ` Jan Beulich
2015-08-25  8:18 ` Ingo Molnar
2015-08-25  8:42   ` Ingo Molnar
2015-08-25 10:59 ` Brian Gerst
2015-08-25 16:28   ` Andy Lutomirski
2015-08-25 16:59     ` Linus Torvalds
2015-08-26  5:20     ` Brian Gerst
2015-08-26 17:10       ` Andy Lutomirski
2015-08-27  3:13         ` Brian Gerst
2015-08-27  3:38           ` Andy Lutomirski
