From: tip-bot for Dominik Brodowski <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: peterz@infradead.org, linux-kernel@vger.kernel.org,
jpoimboe@redhat.com, hpa@zytor.com, linux@dominikbrodowski.net,
viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
dvlasenk@redhat.com, mingo@kernel.org, luto@kernel.org,
brgerst@gmail.com, bp@alien8.de, torvalds@linux-foundation.org,
tglx@linutronix.de
Subject: [tip:x86/asm] syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
Date: Fri, 6 Apr 2018 10:11:17 -0700 [thread overview]
Message-ID: <tip-fa697140f9a20119a9ec8fd7460cc4314fbdaff3@git.kernel.org> (raw)
In-Reply-To: <20180405095307.3730-4-linux@dominikbrodowski.net>
Commit-ID: fa697140f9a20119a9ec8fd7460cc4314fbdaff3
Gitweb: https://git.kernel.org/tip/fa697140f9a20119a9ec8fd7460cc4314fbdaff3
Author: Dominik Brodowski <linux@dominikbrodowski.net>
AuthorDate: Thu, 5 Apr 2018 11:53:02 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 5 Apr 2018 16:59:26 +0200
syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
Each syscall defines a stub which takes struct pt_regs as its only
argument. It decodes just those parameters it needs, e.g:
asmlinkage long sys_xyzzy(const struct pt_regs *regs)
{
return SyS_xyzzy(regs->di, regs->si, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
For example, for sys_recv() which is a 4-parameter syscall, the assembly
now is (in slightly reordered fashion):
<sys_recv>:
callq <__fentry__>
/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx
[ SyS_recv() is automatically inlined by the compiler,
as it is not [yet] used anywhere else ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d
/* do the actual work */
callq __sys_recvfrom
/* cleanup and return */
cltq
retq
The only valid place in an x86-64 kernel which rightfully calls
a syscall function on its own -- vsyscall -- needs to be modified
to pass struct pt_regs onwards as well.
To keep the syscall table generation working independent of
SYSCALL_PTREGS being enabled, the stubs are named the same as the
"original" syscall stubs, i.e. sys_*().
This patch is based on an original proof-of-concept
| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
and to update the vsyscall to the new calling convention.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/Kconfig | 5 +++
arch/x86/entry/common.c | 4 ++
arch/x86/entry/syscall_64.c | 9 ++++-
arch/x86/entry/vsyscall/vsyscall_64.c | 22 +++++++++++
arch/x86/include/asm/syscall.h | 4 ++
arch/x86/include/asm/syscall_wrapper.h | 70 ++++++++++++++++++++++++++++++++++
arch/x86/include/asm/syscalls.h | 7 ++++
include/linux/syscalls.h | 2 +-
8 files changed, 120 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 27fede438959..67348efc2540 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2954,3 +2954,8 @@ source "crypto/Kconfig"
source "arch/x86/kvm/Kconfig"
source "lib/Kconfig"
+
+config SYSCALL_PTREGS
+ def_bool y
+ depends on X86_64 && !COMPAT
+ select ARCH_HAS_SYSCALL_WRAPPER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index a8b066dbbf48..e1b91bffa988 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -284,9 +284,13 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
nr &= __SYSCALL_MASK;
if (likely(nr < NR_syscalls)) {
nr = array_index_nospec(nr, NR_syscalls);
+#ifdef CONFIG_SYSCALL_PTREGS
+ regs->ax = sys_call_table[nr](regs);
+#else
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
+#endif
}
syscall_return_slowpath(regs);
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index c176d2fab1da..6197850adf91 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,19 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
+#ifdef CONFIG_SYSCALL_PTREGS
+/* this is a lie, but it does not hurt as sys_ni_syscall just returns -EINVAL */
+extern asmlinkage long sys_ni_syscall(const struct pt_regs *);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(const struct pt_regs *);
+#else /* CONFIG_SYSCALL_PTREGS */
+extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#endif /* CONFIG_SYSCALL_PTREGS */
#include <asm/syscalls_64.h>
#undef __SYSCALL_64
#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
-extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-
asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 317be365bce3..05eebbf9b989 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -127,6 +127,9 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
int vsyscall_nr, syscall_nr, tmp;
int prev_sig_on_uaccess_err;
long ret;
+#ifdef CONFIG_SYSCALL_PTREGS
+ unsigned long orig_dx;
+#endif
/*
* No point in checking CS -- the only way to get here is a user mode
@@ -227,19 +230,38 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* this decodes regs->di and regs->si on its own */
+ ret = sys_gettimeofday(regs);
+#else
ret = sys_gettimeofday(
(struct timeval __user *)regs->di,
(struct timezone __user *)regs->si);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 1:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* this decodes regs->di on its own */
+ ret = sys_time(regs);
+#else
ret = sys_time((time_t __user *)regs->di);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 2:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* while we could clobber regs->dx, we didn't in the past... */
+ orig_dx = regs->dx;
+ regs->dx = 0;
+ /* this decodes regs->di, regs->si and regs->dx on its own */
+ ret = sys_getcpu(regs);
+ regs->dx = orig_dx;
+#else
ret = sys_getcpu((unsigned __user *)regs->di,
(unsigned __user *)regs->si,
NULL);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
}
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 03eedc21246d..17c62373a6f9 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -20,9 +20,13 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
+#ifdef CONFIG_SYSCALL_PTREGS
+typedef asmlinkage long (*sys_call_ptr_t)(const struct pt_regs *);
+#else
typedef asmlinkage long (*sys_call_ptr_t)(unsigned long, unsigned long,
unsigned long, unsigned long,
unsigned long, unsigned long);
+#endif /* CONFIG_SYSCALL_PTREGS */
extern const sys_call_ptr_t sys_call_table[];
#if defined(CONFIG_X86_32)
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
new file mode 100644
index 000000000000..702bdee377af
--- /dev/null
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_wrapper.h - x86 specific wrappers to syscall definitions
+ */
+
+#ifndef _ASM_X86_SYSCALL_WRAPPER_H
+#define _ASM_X86_SYSCALL_WRAPPER_H
+
+/*
+ * Instead of the generic __SYSCALL_DEFINEx() definition, this macro takes
+ * struct pt_regs *regs as the only argument of the syscall stub named
+ * sys_*(). It decodes just the registers it needs and passes them on to
+ * the SyS_*() wrapper and then to the SYSC_*() function doing the actual job.
+ * These wrappers and functions are inlined, meaning that the assembly looks
+ * as follows (slightly re-ordered):
+ *
+ * <sys_recv>: <-- syscall with 4 parameters
+ * callq <__fentry__>
+ *
+ * mov 0x70(%rdi),%rdi <-- decode regs->di
+ * mov 0x68(%rdi),%rsi <-- decode regs->si
+ * mov 0x60(%rdi),%rdx <-- decode regs->dx
+ * mov 0x38(%rdi),%rcx <-- decode regs->r10
+ *
+ * xor %r9d,%r9d <-- clear %r9
+ * xor %r8d,%r8d <-- clear %r8
+ *
+ * callq __sys_recvfrom <-- do the actual work in __sys_recvfrom()
+ * which takes 6 arguments
+ *
+ * cltq <-- extend return value to 64-bit
+ * retq <-- return
+ *
+ * This approach avoids leaking random user-provided register content down
+ * the call chain.
+ *
+ * As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
+ * obvious reasons, and passing struct pt_regs *regs to it in %rdi does not
+ * hurt, there is no need to override it.
+ */
+#define __SYSCALL_DEFINEx(x, name, ...) \
+ asmlinkage long sys##name(const struct pt_regs *regs); \
+ ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
+ static long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+ static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+ asmlinkage long sys##name(const struct pt_regs *regs) \
+ { \
+ return SyS##name(__MAP(x,__SC_ARGS \
+ ,,regs->di,,regs->si,,regs->dx \
+ ,,regs->r10,,regs->r8,,regs->r9)); \
+ } \
+ static long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+ { \
+ long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
+ __MAP(x,__SC_TEST,__VA_ARGS__); \
+ __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ return ret; \
+ } \
+ static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+
+/*
+ * For VSYSCALLS, we need to declare these three syscalls with the new
+ * pt_regs-based calling convention for in-kernel use.
+ */
+struct pt_regs;
+asmlinkage long sys_getcpu(const struct pt_regs *regs); /* di,si,dx */
+asmlinkage long sys_gettimeofday(const struct pt_regs *regs); /* di,si */
+asmlinkage long sys_time(const struct pt_regs *regs); /* di */
+
+#endif /* _ASM_X86_SYSCALL_WRAPPER_H */
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index ae6e05fdc24b..e4ad93c05f02 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -18,6 +18,12 @@
/* Common in X86_32 and X86_64 */
/* kernel/ioport.c */
long ksys_ioperm(unsigned long from, unsigned long num, int turn_on);
+
+#ifndef CONFIG_SYSCALL_PTREGS
+/*
+ * If CONFIG_SYSCALL_PTREGS is enabled, a different syscall calling convention
+ * is used. Do not include these -- invalid -- prototypes then
+ */
asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
asmlinkage long sys_iopl(unsigned int);
@@ -53,4 +59,5 @@ asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
unsigned long, unsigned long, unsigned long);
#endif /* CONFIG_X86_32 */
+#endif /* CONFIG_SYSCALL_PTREGS */
#endif /* _ASM_X86_SYSCALLS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 503ab245d4ce..d7168b3a4b4c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -102,7 +102,7 @@ union bpf_attr;
* for SYSCALL_DEFINE<n>/COMPAT_SYSCALL_DEFINE<n>
*/
#define __MAP0(m,...)
-#define __MAP1(m,t,a) m(t,a)
+#define __MAP1(m,t,a,...) m(t,a)
#define __MAP2(m,t,a,...) m(t,a), __MAP1(m,__VA_ARGS__)
#define __MAP3(m,t,a,...) m(t,a), __MAP2(m,__VA_ARGS__)
#define __MAP4(m,t,a,...) m(t,a), __MAP3(m,__VA_ARGS__)
next prev parent reply other threads:[~2018-04-06 17:12 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-05 9:52 [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
2018-04-05 9:53 ` [PATCH 1/8] x86: don't pointlessly reload the system call number Dominik Brodowski
2018-04-06 17:09 ` [tip:x86/asm] x86/syscalls: Don't " tip-bot for Linus Torvalds
2018-04-05 9:53 ` [PATCH 2/8] syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER Dominik Brodowski
2018-04-06 17:10 ` [tip:x86/asm] syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y tip-bot for Dominik Brodowski
2018-04-05 9:53 ` [PATCH 3/8] syscalls/x86: use struct pt_regs based syscall calling for 64-bit syscalls Dominik Brodowski
2018-04-06 17:11 ` tip-bot for Dominik Brodowski [this message]
2018-04-05 9:53 ` [PATCH 4/8] syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls Dominik Brodowski
2018-04-06 17:11 ` [tip:x86/asm] syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y " tip-bot for Dominik Brodowski
2018-04-05 9:53 ` [PATCH 5/8] syscalls/x86: use struct pt_regs based syscall calling for IA32_EMULATION and x32 Dominik Brodowski
2018-04-06 17:12 ` [tip:x86/asm] syscalls/x86: Use 'struct pt_regs' " tip-bot for Dominik Brodowski
2018-04-05 9:53 ` [PATCH 6/8] syscalls/x86: unconditionally enable struct pt_regs based syscalls on x86_64 Dominik Brodowski
2018-04-06 17:12 ` [tip:x86/asm] syscalls/x86: Unconditionally enable 'struct pt_regs' " tip-bot for Dominik Brodowski
2018-04-05 9:53 ` [PATCH 7/8] x86/entry/64: extend register clearing on syscall entry to lower registers Dominik Brodowski
2018-04-06 17:13 ` [tip:x86/asm] syscalls/x86: Extend " tip-bot for Dominik Brodowski
2018-04-05 9:53 ` [PATCH 8/8] syscalls/x86: rename struct pt_regs-based sys_*() to __sys_x86_*() Dominik Brodowski
2018-04-05 18:35 ` kbuild test robot
2018-04-05 15:19 ` [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 Ingo Molnar
2018-04-05 20:31 ` Dominik Brodowski
2018-04-06 8:23 ` Ingo Molnar
2018-04-06 8:31 ` Dominik Brodowski
2018-04-06 8:34 ` Dominik Brodowski
2018-04-06 9:20 ` Ingo Molnar
2018-04-06 9:34 ` Dominik Brodowski
2018-04-06 12:34 ` Ingo Molnar
2018-04-06 13:07 ` Dominik Brodowski
2018-04-06 17:03 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tip-fa697140f9a20119a9ec8fd7460cc4314fbdaff3@git.kernel.org \
--to=tipbot@zytor.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=dvlasenk@redhat.com \
--cc=hpa@zytor.com \
--cc=jpoimboe@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=linux@dominikbrodowski.net \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.