* [PATCH 000/109] remove in-kernel calls to syscalls
@ 2018-03-29 11:22 Dominik Brodowski
2018-03-29 11:23 ` [PATCH 044/109] mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c Dominik Brodowski
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:22 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, hmclauchlan, tautschn,
Amir Goldstein, Andi Kleen, Andrew Morton, Christoph Hellwig,
Darren Hart, David S . Miller, Eric W . Biederman,
H . Peter Anvin, Ingo Molnar, Jaswinder Singh, Jeff Dike,
Jiri Slaby, kexec, linux-fsdevel, linux-mm, linux-s390,
Luis R . Rodriguez, netdev, Peter Zijlstra, Thomas Gleixner,
user-mode-linux-devel, x86
[ While most parts of this patch set have been sent out already at least
once, I send out *all* patches to lkml once again as this whole series
touches several different subsystems in sensitive areas. ]
System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy()
should only be called from userspace via the syscall table, but not from
elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from v4.17
onwards to not call system call functions in the kernel: It is better to
use a different calling convention for system calls there, where
struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
processing over to the actual syscall function. This means that only those
parameters which are actually needed for a specific syscall are passed on
during syscall entry, instead of filling in six CPU registers with random
user space content all the time (which may cause serious trouble down the
call chain).[*]
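The calling convention described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not kernel code: the
names struct fake_regs, do_xyzzy(), sys_xyzzy_wrapper() and dispatch() are
hypothetical, and the register layout is a simplified stand-in for struct
pt_regs. The point it tries to show is that the wrapper is the only function
reachable through the table, and that it hands exactly one decoded argument
to the syscall body while the other five registers are never read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the six argument registers in struct pt_regs. */
struct fake_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/* The actual syscall body takes only the one parameter it needs. */
static long do_xyzzy(int fd)
{
	return fd < 0 ? -EBADF : (long)fd * 2;
}

/* Wrapper stored in the table: decodes regs and passes on only di. */
static long sys_xyzzy_wrapper(const struct fake_regs *regs)
{
	assert(regs != NULL);
	return do_xyzzy((int)regs->di);
}

typedef long (*syscall_fn)(const struct fake_regs *regs);

static const syscall_fn fake_syscall_table[] = { sys_xyzzy_wrapper };

/* Entry path: every syscall is reached through the table, never directly. */
static long dispatch(size_t nr, const struct fake_regs *regs)
{
	if (nr >= sizeof(fake_syscall_table) / sizeof(fake_syscall_table[0]))
		return -ENOSYS;
	return fake_syscall_table[nr](regs);
}
```

With this shape, other code that needs the same functionality would call
do_xyzzy() with ordinary C arguments, since sys_xyzzy_wrapper() only accepts
a register block.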
Moreover, rules on how data may be accessed may differ between kernel data
and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific code.
This patchset removes all in-kernel calls to syscall functions, with the
exception of those in arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h.
Patches 1 to 101 have been sent out earlier, namely
- part 1 ( http://lkml.kernel.org/r/20180315190529.20943-1-linux@dominikbrodowski.net )
- part 2 ( http://lkml.kernel.org/r/20180316170614.5392-1-linux@dominikbrodowski.net )
- part 3 ( http://lkml.kernel.org/r/20180322090059.19361-1-linux@dominikbrodowski.net ).
Changes since these earlier versions are:
- I have added a lot more documentation and improved the commit messages,
namely to explain the naming convention and the rationale of these
patches.
- ACKs/Reviewed-by (thanks!) were added.
- The patches were shuffled around to group them together systematically:
First goes a patch which defines the goal and explains the rationale:
syscalls: define and explain goal to not call syscalls in the kernel
A few codepaths can trivially be converted to existing in-kernel interfaces:
kernel: use kernel_wait4() instead of sys_wait4()
kernel: open-code sys_rt_sigpending() in sys_sigpending()
kexec: call do_kexec_load() in compat syscall directly
mm: use do_futex() instead of sys_futex() in mm_release()
x86: use _do_fork() in compat_sys_x86_clone()
x86: remove compat_sys_x86_waitpid()
Then follow many patches which only affect specific subsystems each, and
replace sys_*() with internal helpers named __sys_*() or do_sys_*(). Let's
start with net/:
net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
net: socket: add __sys_socket() helper; remove in-kernel call to syscall
net: socket: add __sys_bind() helper; remove in-kernel call to syscall
net: socket: add __sys_connect() helper; remove in-kernel call to syscall
net: socket: add __sys_listen() helper; remove in-kernel call to syscall
net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
net: socket: move check for forbid_cmsg_compat to __sys_...msg()
net: socket: replace calls to sys_send() with __sys_sendto()
net: socket: replace call to sys_recv() with __sys_recvfrom()
net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to compat syscall
net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to compat syscall
net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to compat syscall
net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to compat syscall
net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls
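The helper pattern used by the net/ patches listed above can be sketched in
userspace C. This is a simplified illustration, not the code from net/socket.c:
fake_sendto_helper() plays the role of __sys_sendto(), the two entry points
stand in for the sendto and send syscalls, and struct fake_sock is a
hypothetical buffer that merely records what was "sent". It relies on the fact
that send() is sendto() without a destination address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FAKE_NR_SOCKS 2

/* Hypothetical socket: a small buffer that collects what was "sent". */
struct fake_sock {
	char buf[16];
	size_t used;
};

static struct fake_sock fake_socks[FAKE_NR_SOCKS];

/* Plays the role of __sys_sendto(): shared logic behind both entry points. */
static long fake_sendto_helper(int fd, const void *buff, size_t len,
			       const void *addr, int addr_len)
{
	struct fake_sock *sk;

	assert(buff != NULL || len == 0);
	if (fd < 0 || fd >= FAKE_NR_SOCKS)
		return -EBADF;
	if (addr == NULL && addr_len != 0)
		return -EINVAL;
	sk = &fake_socks[fd];
	if (len > sizeof(sk->buf) - sk->used)
		len = sizeof(sk->buf) - sk->used;	/* short send */
	if (len)
		memcpy(sk->buf + sk->used, buff, len);
	sk->used += len;
	return (long)len;
}

/* Entry point standing in for the sendto syscall: a one-line wrapper. */
static long fake_sys_sendto(int fd, const void *buff, size_t len,
			    const void *addr, int addr_len)
{
	return fake_sendto_helper(fd, buff, len, addr, addr_len);
}

/* Entry point standing in for send: calls the helper, not fake_sys_sendto(). */
static long fake_sys_send(int fd, const void *buff, size_t len)
{
	return fake_sendto_helper(fd, buff, len, NULL, 0);
}
```

Neither entry point calls the other; both go through the helper, which is the
property this series establishes for the real syscalls.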
The changes in ipc/ are limited to this specific subsystem. The wrappers are
named ksys_*() to denote that these functions are meant as a drop-in replacement
for the syscalls.
ipc: add semtimedop syscall/compat_syscall wrappers
ipc: add semget syscall wrapper
ipc: add semctl syscall/compat_syscall wrappers
ipc: add msgget syscall wrapper
ipc: add shmget syscall wrapper
ipc: add shmdt syscall wrapper
ipc: add shmctl syscall/compat_syscall wrappers
ipc: add msgctl syscall/compat_syscall wrappers
ipc: add msgrcv syscall/compat_syscall wrappers
ipc: add msgsnd syscall/compat_syscall wrappers
A few mindless conversions in kernel/ and mm/:
kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
kernel: add do_compat_sigaltstack() helper; remove in-kernel call to compat syscall
kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c
sched: add do_sched_yield() helper; remove in-kernel call to sched_yield()
mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c
mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c
mm: add kernel_mbind() helper; remove in-kernel call to syscall
mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls
Then, let's handle those instances internal to fs/ which call syscalls:
fs: add do_readlinkat() helper; remove internal call to sys_readlinkat()
fs: add do_pipe2() helper; remove internal call to sys_pipe2()
fs: add do_renameat2() helper; remove internal call to sys_renameat2()
fs: add do_futimesat() helper; remove internal call to sys_futimesat()
fs: add do_epoll_*() helpers; remove internal calls to sys_epoll_*()
fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4()
fs: add do_eventfd() helper; remove internal call to sys_eventfd()
fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall
fs: add do_vmsplice() helper; remove in-kernel call to syscall
fs: add kern_select() helper; remove in-kernel call to sys_select()
fs: add do_compat_fcntl64() helper; remove in-kernel call to compat syscall
fs: add do_compat_select() helper; remove in-kernel call to compat syscall
fs: add do_compat_signalfd4() helper; remove in-kernel call to compat syscall
fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall
inotify: add do_inotify_init() helper; remove in-kernel call to syscall
fanotify: add do_fanotify_mark() helper; remove in-kernel call to syscall
fs/quota: add kernel_quotactl() helper; remove in-kernel call to syscall
fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl()
Several fs- and some mm-related syscalls are called in initramfs, initrd and
init, devtmpfs, and pm code. While many of these instances should be
converted to use proper in-kernel VFS interfaces in the future, convert them
mindlessly to ksys_*() helpers or wrappers for now.
fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
fs: add ksys_write() helper; remove in-kernel calls to sys_write()
fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
hostfs: rename do_rmdir() to hostfs_do_rmdir()
fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64()
fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()
fs: add ksys_lseek() helper; remove in-kernel calls to sys_lseek()
fs: add ksys_read() helper; remove in-kernel calls to sys_read()
fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
To reach the goal of getting rid of all in-kernel calls to syscalls for x86, we
need to handle a few further syscalls called from compat syscalls in x86 and
(mostly) from other architectures. Those could be made generic by making use of
Al Viro's macro trickery. For v4.17, I'd suggest keeping it simple:
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
Then, throw in two fixes for x86:
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/sigreturn: use SYSCALL_DEFINE0 (by Michael Tautschnig)
... and clean up the three places where many syscalls are referenced or
prototyped (kernel/sys_ni.c, include/linux/syscalls.h and
include/linux/compat.h):
kexec: move sys_kexec_load() prototype to syscalls.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/compat.h
syscalls/x86: auto-create compat_sys_*() prototypes
kernel/sys_ni: sort cond_syscall() entries
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
Last but not least, add a patch by Howard McLauchlan to whitelist all syscalls
for error injection:
bpf: whitelist all syscalls for error injection (by Howard McLauchlan)
The whole series is available at
https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
and I intend to push this upstream early in the v4.17-rc1 cycle.
Thanks,
Dominik
Documentation/process/adding-syscalls.rst | 34 +-
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm/kernel/sys_arm.c | 2 +-
arch/arm64/kernel/sys.c | 2 +-
arch/ia64/kernel/sys_ia64.c | 4 +-
arch/m68k/kernel/sys_m68k.c | 2 +-
arch/microblaze/kernel/sys_microblaze.c | 6 +-
arch/mips/kernel/linux32.c | 22 +-
arch/mips/kernel/syscall.c | 6 +-
arch/parisc/kernel/sys_parisc.c | 30 +-
arch/powerpc/kernel/sys_ppc32.c | 18 +-
arch/powerpc/kernel/syscalls.c | 6 +-
arch/riscv/kernel/sys_riscv.c | 4 +-
arch/s390/kernel/compat_linux.c | 37 +-
arch/s390/kernel/sys_s390.c | 2 +-
arch/sh/kernel/sys_sh.c | 4 +-
arch/sh/kernel/sys_sh32.c | 12 +-
arch/sparc/kernel/setup_32.c | 2 +-
arch/sparc/kernel/sys_sparc32.c | 26 +-
arch/sparc/kernel/sys_sparc_32.c | 6 +-
arch/sparc/kernel/sys_sparc_64.c | 2 +-
arch/um/kernel/syscall.c | 2 +-
arch/x86/entry/syscalls/syscall_32.tbl | 4 +-
arch/x86/ia32/ia32_signal.c | 1 -
arch/x86/ia32/sys_ia32.c | 50 +-
arch/x86/include/asm/sys_ia32.h | 67 --
arch/x86/include/asm/syscalls.h | 3 +-
arch/x86/kernel/ioport.c | 7 +-
arch/x86/kernel/signal.c | 5 +-
arch/x86/kernel/sys_x86_64.c | 2 +-
arch/xtensa/kernel/syscall.c | 2 +-
drivers/base/devtmpfs.c | 11 +-
drivers/tty/sysrq.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 6 +-
fs/autofs4/dev-ioctl.c | 2 +-
fs/binfmt_misc.c | 2 +-
fs/dcookies.c | 11 +-
fs/eventfd.c | 9 +-
fs/eventpoll.c | 23 +-
fs/fcntl.c | 12 +-
fs/file.c | 17 +-
fs/hostfs/hostfs.h | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hostfs/hostfs_user.c | 2 +-
fs/internal.h | 14 +
fs/ioctl.c | 7 +-
fs/namei.c | 61 +-
fs/namespace.c | 19 +-
fs/notify/fanotify/fanotify_user.c | 14 +-
fs/notify/inotify/inotify_user.c | 9 +-
fs/open.c | 77 +-
fs/pipe.c | 9 +-
fs/quota/compat.c | 13 +-
fs/quota/quota.c | 10 +-
fs/read_write.c | 45 +-
fs/readdir.c | 11 +-
fs/select.c | 29 +-
fs/signalfd.c | 31 +-
fs/splice.c | 12 +-
fs/stat.c | 12 +-
fs/sync.c | 19 +-
fs/utimes.c | 25 +-
include/linux/compat.h | 644 ++++++------
include/linux/futex.h | 13 +-
include/linux/kexec.h | 4 -
include/linux/quotaops.h | 3 +
include/linux/socket.h | 37 +-
include/linux/syscalls.h | 1511 +++++++++++++++++------------
include/net/compat.h | 11 -
init/do_mounts.c | 26 +-
init/do_mounts.h | 4 +-
init/do_mounts_initrd.c | 42 +-
init/do_mounts_md.c | 29 +-
init/do_mounts_rd.c | 40 +-
init/initramfs.c | 52 +-
init/main.c | 9 +-
init/noinitramfs.c | 6 +-
ipc/msg.c | 60 +-
ipc/sem.c | 44 +-
ipc/shm.c | 28 +-
ipc/syscall.c | 58 +-
ipc/util.h | 31 +
kernel/compat.c | 55 --
kernel/exit.c | 2 +-
kernel/fork.c | 11 +-
kernel/kexec.c | 52 +-
kernel/pid_namespace.c | 6 +-
kernel/power/hibernate.c | 2 +-
kernel/power/suspend.c | 2 +-
kernel/power/user.c | 2 +-
kernel/sched/core.c | 8 +-
kernel/signal.c | 29 +-
kernel/sys.c | 74 +-
kernel/sys_ni.c | 617 +++++++-----
kernel/uid16.c | 25 +-
kernel/uid16.h | 14 +
kernel/umh.c | 4 +-
mm/fadvise.c | 10 +-
mm/mempolicy.c | 92 +-
mm/migrate.c | 39 +-
mm/mmap.c | 17 +-
mm/nommu.c | 17 +-
mm/readahead.c | 7 +-
net/compat.c | 136 ++-
net/socket.c | 234 +++--
105 files changed, 3129 insertions(+), 1868 deletions(-)
delete mode 100644 arch/x86/include/asm/sys_ia32.h
create mode 100644 kernel/uid16.h
[*] An early, not-yet-ready and partly untested (i386, x32) version of the
patches required to implement this on top of this series is available at
https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
--
2.16.3
* [PATCH 044/109] mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
@ 2018-03-29 11:23 ` Dominik Brodowski
2018-03-29 11:23 ` [PATCH 045/109] mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c Dominik Brodowski
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:23 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Al Viro, linux-mm,
Andrew Morton
Move compat_sys_migrate_pages() to mm/mempolicy.c and make it call a newly
introduced helper -- kernel_migrate_pages() -- instead of the syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
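For readers following the compat thunk moved by this patch, the two size
computations at its top (nr_bits and size) can be worked through in a small
userspace sketch. Everything here is an assumption made for illustration:
FAKE_MAX_NUMNODES is fixed at 64 although the kernel value depends on the
configuration, and align_up(), compat_nr_bits() and native_mask_bytes() are
hypothetical names that only mirror the min_t() and ALIGN() expressions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define FAKE_MAX_NUMNODES 64UL	/* assumption; configurable in the kernel */
#define FAKE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors: nr_bits = min_t(unsigned long, maxnode - 1, MAX_NUMNODES) */
static unsigned long compat_nr_bits(unsigned long maxnode)
{
	unsigned long n = maxnode - 1;

	return n < FAKE_MAX_NUMNODES ? n : FAKE_MAX_NUMNODES;
}

/* Mirrors: size = ALIGN(nr_bits, BITS_PER_LONG) / 8, in bytes */
static unsigned long native_mask_bytes(unsigned long nr_bits)
{
	return align_up(nr_bits, FAKE_BITS_PER_LONG) / 8;
}
```

In this sketch a maxnode of 33 yields 32 usable bits, which occupy one native
unsigned long; the thunk then passes nr_bits + 1 on to the helper.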
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
kernel/compat.c | 33 ---------------------------------
mm/mempolicy.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/kernel/compat.c b/kernel/compat.c
index 3f5fa8902e7d..51bdf1808943 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -508,39 +508,6 @@ COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
}
return sys_move_pages(pid, nr_pages, pages, nodes, status, flags);
}
-
-COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
- compat_ulong_t, maxnode,
- const compat_ulong_t __user *, old_nodes,
- const compat_ulong_t __user *, new_nodes)
-{
- unsigned long __user *old = NULL;
- unsigned long __user *new = NULL;
- nodemask_t tmp_mask;
- unsigned long nr_bits;
- unsigned long size;
-
- nr_bits = min_t(unsigned long, maxnode - 1, MAX_NUMNODES);
- size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
- if (old_nodes) {
- if (compat_get_bitmap(nodes_addr(tmp_mask), old_nodes, nr_bits))
- return -EFAULT;
- old = compat_alloc_user_space(new_nodes ? size * 2 : size);
- if (new_nodes)
- new = old + size / sizeof(unsigned long);
- if (copy_to_user(old, nodes_addr(tmp_mask), size))
- return -EFAULT;
- }
- if (new_nodes) {
- if (compat_get_bitmap(nodes_addr(tmp_mask), new_nodes, nr_bits))
- return -EFAULT;
- if (new == NULL)
- new = compat_alloc_user_space(size);
- if (copy_to_user(new, nodes_addr(tmp_mask), size))
- return -EFAULT;
- }
- return sys_migrate_pages(pid, nr_bits + 1, old, new);
-}
#endif
/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d879f1d8a44a..7399ede02b5f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1377,9 +1377,9 @@ SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
return do_set_mempolicy(mode, flags, &nodes);
}
-SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
- const unsigned long __user *, old_nodes,
- const unsigned long __user *, new_nodes)
+static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
+ const unsigned long __user *old_nodes,
+ const unsigned long __user *new_nodes)
{
struct mm_struct *mm = NULL;
struct task_struct *task;
@@ -1469,6 +1469,13 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
}
+SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
+ const unsigned long __user *, old_nodes,
+ const unsigned long __user *, new_nodes)
+{
+ return kernel_migrate_pages(pid, maxnode, old_nodes, new_nodes);
+}
+
/* Retrieve NUMA policy */
SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
@@ -1571,7 +1578,40 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
return sys_mbind(start, len, mode, nm, nr_bits+1, flags);
}
-#endif
+COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
+ compat_ulong_t, maxnode,
+ const compat_ulong_t __user *, old_nodes,
+ const compat_ulong_t __user *, new_nodes)
+{
+ unsigned long __user *old = NULL;
+ unsigned long __user *new = NULL;
+ nodemask_t tmp_mask;
+ unsigned long nr_bits;
+ unsigned long size;
+
+ nr_bits = min_t(unsigned long, maxnode - 1, MAX_NUMNODES);
+ size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+ if (old_nodes) {
+ if (compat_get_bitmap(nodes_addr(tmp_mask), old_nodes, nr_bits))
+ return -EFAULT;
+ old = compat_alloc_user_space(new_nodes ? size * 2 : size);
+ if (new_nodes)
+ new = old + size / sizeof(unsigned long);
+ if (copy_to_user(old, nodes_addr(tmp_mask), size))
+ return -EFAULT;
+ }
+ if (new_nodes) {
+ if (compat_get_bitmap(nodes_addr(tmp_mask), new_nodes, nr_bits))
+ return -EFAULT;
+ if (new == NULL)
+ new = compat_alloc_user_space(size);
+ if (copy_to_user(new, nodes_addr(tmp_mask), size))
+ return -EFAULT;
+ }
+ return kernel_migrate_pages(pid, nr_bits + 1, old, new);
+}
+
+#endif /* CONFIG_COMPAT */
struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
unsigned long addr)
--
2.16.3
* [PATCH 045/109] mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
2018-03-29 11:23 ` [PATCH 044/109] mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c Dominik Brodowski
@ 2018-03-29 11:23 ` Dominik Brodowski
2018-03-29 11:23 ` [PATCH 046/109] mm: add kernel_mbind() helper; remove in-kernel call to syscall Dominik Brodowski
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:23 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Al Viro, linux-mm,
Andrew Morton
Move compat_sys_move_pages() to mm/migrate.c and make it call a newly
introduced helper -- kernel_move_pages() -- instead of the syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
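The core of the compat thunk moved by this patch is a loop that widens an
array of 32-bit user pointers into native-width pointers before calling the
helper. A minimal userspace sketch of that step follows, under loud
assumptions: fake_compat_uptr_t, fake_compat_ptr() and widen_pages() are
hypothetical names, plain array accesses replace get_user()/put_user(), and
-1 stands in for -EFAULT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit user pointer type, in the spirit of compat_uptr_t. */
typedef uint32_t fake_compat_uptr_t;

/* Plays the role of compat_ptr(): zero-extend a 32-bit pointer value. */
static void *fake_compat_ptr(fake_compat_uptr_t p)
{
	assert(sizeof(void *) >= sizeof(fake_compat_uptr_t));
	return (void *)(uintptr_t)p;
}

/*
 * Widen nr 32-bit pointers from pages32 into native pointers in pages.
 * Returns 0 on success and -1 (stand-in for -EFAULT) on a bad argument.
 */
static int widen_pages(const fake_compat_uptr_t *pages32, void **pages,
		       size_t nr)
{
	size_t i;

	if (nr != 0 && (pages32 == NULL || pages == NULL))
		return -1;
	for (i = 0; i < nr; i++)
		pages[i] = fake_compat_ptr(pages32[i]);
	return 0;
}
```

After the widening step, the thunk can pass the native array to the shared
helper with ordinary C arguments, which is what replaces the former call to
the syscall function.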
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
kernel/compat.c | 22 ----------------------
mm/migrate.c | 39 +++++++++++++++++++++++++++++++++++----
2 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/kernel/compat.c b/kernel/compat.c
index 51bdf1808943..6d21894806b4 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -488,28 +488,6 @@ get_compat_sigset(sigset_t *set, const compat_sigset_t __user *compat)
}
EXPORT_SYMBOL_GPL(get_compat_sigset);
-#ifdef CONFIG_NUMA
-COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
- compat_uptr_t __user *, pages32,
- const int __user *, nodes,
- int __user *, status,
- int, flags)
-{
- const void __user * __user *pages;
- int i;
-
- pages = compat_alloc_user_space(nr_pages * sizeof(void *));
- for (i = 0; i < nr_pages; i++) {
- compat_uptr_t p;
-
- if (get_user(p, pages32 + i) ||
- put_user(compat_ptr(p), pages + i))
- return -EFAULT;
- }
- return sys_move_pages(pid, nr_pages, pages, nodes, status, flags);
-}
-#endif
-
/*
* Allocate user-space memory for the duration of a single system call,
* in order to marshall parameters inside a compat thunk.
diff --git a/mm/migrate.c b/mm/migrate.c
index 1e5525a25691..003886606a22 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -34,6 +34,7 @@
#include <linux/backing-dev.h>
#include <linux/compaction.h>
#include <linux/syscalls.h>
+#include <linux/compat.h>
#include <linux/hugetlb.h>
#include <linux/hugetlb_cgroup.h>
#include <linux/gfp.h>
@@ -1745,10 +1746,10 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
* Move a list of pages in the address space of the currently executing
* process.
*/
-SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
- const void __user * __user *, pages,
- const int __user *, nodes,
- int __user *, status, int, flags)
+static int kernel_move_pages(pid_t pid, unsigned long nr_pages,
+ const void __user * __user *pages,
+ const int __user *nodes,
+ int __user *status, int flags)
{
struct task_struct *task;
struct mm_struct *mm;
@@ -1807,6 +1808,36 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
return err;
}
+SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
+ const void __user * __user *, pages,
+ const int __user *, nodes,
+ int __user *, status, int, flags)
+{
+ return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
+ compat_uptr_t __user *, pages32,
+ const int __user *, nodes,
+ int __user *, status,
+ int, flags)
+{
+ const void __user * __user *pages;
+ int i;
+
+ pages = compat_alloc_user_space(nr_pages * sizeof(void *));
+ for (i = 0; i < nr_pages; i++) {
+ compat_uptr_t p;
+
+ if (get_user(p, pages32 + i) ||
+ put_user(compat_ptr(p), pages + i))
+ return -EFAULT;
+ }
+ return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
+}
+#endif /* CONFIG_COMPAT */
+
#ifdef CONFIG_NUMA_BALANCING
/*
* Returns true if this is a safe migration target node for misplaced NUMA
--
2.16.3
* [PATCH 046/109] mm: add kernel_mbind() helper; remove in-kernel call to syscall
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
2018-03-29 11:23 ` [PATCH 044/109] mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c Dominik Brodowski
2018-03-29 11:23 ` [PATCH 045/109] mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c Dominik Brodowski
@ 2018-03-29 11:23 ` Dominik Brodowski
2018-03-29 11:23 ` [PATCH 047/109] mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls Dominik Brodowski
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:23 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Al Viro, linux-mm,
Andrew Morton
Using the mm-internal kernel_mbind() helper allows us to get rid of the
mm-internal call to the sys_mbind() syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
mm/mempolicy.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7399ede02b5f..e4d7d4c0b253 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1336,9 +1336,9 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0;
}
-SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
- unsigned long, mode, const unsigned long __user *, nmask,
- unsigned long, maxnode, unsigned, flags)
+static long kernel_mbind(unsigned long start, unsigned long len,
+ unsigned long mode, const unsigned long __user *nmask,
+ unsigned long maxnode, unsigned int flags)
{
nodemask_t nodes;
int err;
@@ -1357,6 +1357,13 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
return do_mbind(start, len, mode, mode_flags, &nodes, flags);
}
+SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
+ unsigned long, mode, const unsigned long __user *, nmask,
+ unsigned long, maxnode, unsigned int, flags)
+{
+ return kernel_mbind(start, len, mode, nmask, maxnode, flags);
+}
+
/* Set the process memory policy */
SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
unsigned long, maxnode)
@@ -1575,7 +1582,7 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
return -EFAULT;
}
- return sys_mbind(start, len, mode, nm, nr_bits+1, flags);
+ return kernel_mbind(start, len, mode, nm, nr_bits+1, flags);
}
COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
--
2.16.3
* [PATCH 047/109] mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
` (2 preceding siblings ...)
2018-03-29 11:23 ` [PATCH 046/109] mm: add kernel_mbind() helper; remove in-kernel call to syscall Dominik Brodowski
@ 2018-03-29 11:23 ` Dominik Brodowski
2018-03-29 11:24 ` [PATCH 096/109] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:23 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Al Viro, linux-mm,
Andrew Morton
Using the mm-internal kernel_[sg]et_mempolicy() helper allows us to get
rid of the mm-internal calls to the sys_[sg]et_mempolicy() syscalls.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
mm/mempolicy.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e4d7d4c0b253..ca817e768d0e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1365,8 +1365,8 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
}
/* Set the process memory policy */
-SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
- unsigned long, maxnode)
+static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
+ unsigned long maxnode)
{
int err;
nodemask_t nodes;
@@ -1384,6 +1384,12 @@ SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
return do_set_mempolicy(mode, flags, &nodes);
}
+SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
+ unsigned long, maxnode)
+{
+ return kernel_set_mempolicy(mode, nmask, maxnode);
+}
+
static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
const unsigned long __user *old_nodes,
const unsigned long __user *new_nodes)
@@ -1485,9 +1491,11 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
/* Retrieve NUMA policy */
-SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
- unsigned long __user *, nmask, unsigned long, maxnode,
- unsigned long, addr, unsigned long, flags)
+static int kernel_get_mempolicy(int __user *policy,
+ unsigned long __user *nmask,
+ unsigned long maxnode,
+ unsigned long addr,
+ unsigned long flags)
{
int err;
int uninitialized_var(pval);
@@ -1510,6 +1518,13 @@ SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
return err;
}
+SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
+ unsigned long __user *, nmask, unsigned long, maxnode,
+ unsigned long, addr, unsigned long, flags)
+{
+ return kernel_get_mempolicy(policy, nmask, maxnode, addr, flags);
+}
+
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
@@ -1528,7 +1543,7 @@ COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
if (nmask)
nm = compat_alloc_user_space(alloc_size);
- err = sys_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
+ err = kernel_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
if (!err && nmask) {
unsigned long copy_size;
@@ -1560,7 +1575,7 @@ COMPAT_SYSCALL_DEFINE3(set_mempolicy, int, mode, compat_ulong_t __user *, nmask,
return -EFAULT;
}
- return sys_set_mempolicy(mode, nm, nr_bits+1);
+ return kernel_set_mempolicy(mode, nm, nr_bits+1);
}
COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
--
2.16.3
* [PATCH 096/109] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
` (3 preceding siblings ...)
2018-03-29 11:23 ` [PATCH 047/109] mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls Dominik Brodowski
@ 2018-03-29 11:24 ` Dominik Brodowski
2018-03-29 11:24 ` [PATCH 097/109] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:24 UTC (permalink / raw)
To: linux-kernel; +Cc: viro, torvalds, arnd, linux-arch, Andrew Morton, linux-mm
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that
this function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fadvise64_64().
Some compat stubs called sys_fadvise64(), which then just passed through
the arguments to sys_fadvise64_64(). Get rid of this indirection, and call
ksys_fadvise64_64() directly.
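To illustrate the shape of this change, here is a minimal userspace sketch,
not kernel code. All demo_* names are hypothetical stand-ins: one shared
helper holds the logic, the generic entry point becomes a thin wrapper, and
an arch stub that receives its arguments in a different order (as
sys_arm_fadvise64_64() does) reorders them and calls the helper directly
instead of going through the syscall function.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for ksys_fadvise64_64(): the shared helper.
 * It only validates its arguments and echoes the advice value so that
 * callers can be checked; the real helper does the actual work.
 */
static int demo_fadvise64_64(int fd, int64_t offset, int64_t len, int advice)
{
	if (fd < 0 || offset < 0 || len < 0)
		return -1;	/* stand-in for a negative error code */
	return advice;
}

/* Stand-in for the generic syscall entry point: a thin wrapper only. */
static int demo_sys_fadvise64_64(int fd, int64_t offset, int64_t len,
				 int advice)
{
	return demo_fadvise64_64(fd, offset, len, advice);
}

/*
 * Stand-in for an arch stub that takes (fd, advice, offset, len).
 * It reorders the arguments and calls the helper, not the syscall.
 */
static int demo_arch_fadvise64_64(int fd, int advice, int64_t offset,
				  int64_t len)
{
	return demo_fadvise64_64(fd, offset, len, advice);
}
```

Both entry points end up in the same helper, so the syscall function itself
is never called from other kernel code.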
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/arm/kernel/sys_arm.c | 2 +-
arch/mips/kernel/linux32.c | 2 +-
arch/parisc/kernel/sys_parisc.c | 2 +-
arch/powerpc/kernel/sys_ppc32.c | 4 ++--
arch/powerpc/kernel/syscalls.c | 4 ++--
arch/s390/kernel/compat_linux.c | 5 +++--
arch/sh/kernel/sys_sh32.c | 8 ++++----
arch/sparc/kernel/sys_sparc32.c | 10 +++++-----
arch/x86/ia32/sys_ia32.c | 12 ++++++------
arch/xtensa/kernel/syscall.c | 2 +-
include/linux/syscalls.h | 9 +++++++++
mm/fadvise.c | 10 ++++++++--
12 files changed, 43 insertions(+), 27 deletions(-)
diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index 3151f5623d0e..bdf7514204ab 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -35,5 +35,5 @@
asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
loff_t offset, loff_t len)
{
- return sys_fadvise64_64(fd, offset, len, advice);
+ return ksys_fadvise64_64(fd, offset, len, advice);
}
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 0779d474c8ad..1c5785e72db4 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -149,7 +149,7 @@ asmlinkage long sys32_fadvise64_64(int fd, int __pad,
unsigned long a4, unsigned long a5,
int flags)
{
- return sys_fadvise64_64(fd,
+ return ksys_fadvise64_64(fd,
merge_64(a2, a3), merge_64(a4, a5),
flags);
}
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 588fab336ddd..f36ab1f09595 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -352,7 +352,7 @@ asmlinkage long parisc_fadvise64_64(int fd,
unsigned int high_off, unsigned int low_off,
unsigned int high_len, unsigned int low_len, int advice)
{
- return sys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
+ return ksys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
(loff_t)high_len << 32 | low_len, advice);
}
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 68f11e1065f8..0b95fa13307f 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -113,8 +113,8 @@ asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long h
long ppc32_fadvise64(int fd, u32 unused, u32 offset_high, u32 offset_low,
size_t len, int advice)
{
- return sys_fadvise64(fd, (u64)offset_high << 32 | offset_low, len,
- advice);
+ return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low, len,
+ advice);
}
asmlinkage long compat_sys_sync_file_range2(int fd, unsigned int flags,
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index a877bf8269fe..ecb981eea74b 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -119,8 +119,8 @@ long ppc64_personality(unsigned long personality)
long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
u32 len_high, u32 len_low)
{
- return sys_fadvise64(fd, (u64)offset_high << 32 | offset_low,
- (u64)len_high << 32 | len_low, advice);
+ return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low,
+ (u64)len_high << 32 | len_low, advice);
}
long sys_switch_endian(void)
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 039858f9f128..9bb897e443a6 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -483,7 +483,8 @@ COMPAT_SYSCALL_DEFINE5(s390_fadvise64, int, fd, u32, high, u32, low, compat_size
advise = POSIX_FADV_DONTNEED;
else if (advise == 5)
advise = POSIX_FADV_NOREUSE;
- return sys_fadvise64(fd, (unsigned long)high << 32 | low, len, advise);
+ return ksys_fadvise64_64(fd, (unsigned long)high << 32 | low, len,
+ advise);
}
struct fadvise64_64_args {
@@ -503,7 +504,7 @@ COMPAT_SYSCALL_DEFINE1(s390_fadvise64_64, struct fadvise64_64_args __user *, arg
a.advice = POSIX_FADV_DONTNEED;
else if (a.advice == 5)
a.advice = POSIX_FADV_NOREUSE;
- return sys_fadvise64_64(a.fd, a.offset, a.len, a.advice);
+ return ksys_fadvise64_64(a.fd, a.offset, a.len, a.advice);
}
COMPAT_SYSCALL_DEFINE6(s390_sync_file_range, int, fd, u32, offhigh, u32, offlow,
diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
index c37ee3d0c803..9dca568509a5 100644
--- a/arch/sh/kernel/sys_sh32.c
+++ b/arch/sh/kernel/sys_sh32.c
@@ -52,10 +52,10 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
u32 len0, u32 len1, int advice)
{
#ifdef __LITTLE_ENDIAN__
- return sys_fadvise64_64(fd, (u64)offset1 << 32 | offset0,
- (u64)len1 << 32 | len0, advice);
+ return ksys_fadvise64_64(fd, (u64)offset1 << 32 | offset0,
+ (u64)len1 << 32 | len0, advice);
#else
- return sys_fadvise64_64(fd, (u64)offset0 << 32 | offset1,
- (u64)len0 << 32 | len1, advice);
+ return ksys_fadvise64_64(fd, (u64)offset0 << 32 | offset1,
+ (u64)len0 << 32 | len1, advice);
#endif
}
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index 4ba62d676632..4da66aed50b4 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -225,7 +225,7 @@ long compat_sys_fadvise64(int fd,
unsigned long offlo,
compat_size_t len, int advice)
{
- return sys_fadvise64_64(fd, (offhi << 32) | offlo, len, advice);
+ return ksys_fadvise64_64(fd, (offhi << 32) | offlo, len, advice);
}
long compat_sys_fadvise64_64(int fd,
@@ -233,10 +233,10 @@ long compat_sys_fadvise64_64(int fd,
unsigned long lenhi, unsigned long lenlo,
int advice)
{
- return sys_fadvise64_64(fd,
- (offhi << 32) | offlo,
- (lenhi << 32) | lenlo,
- advice);
+ return ksys_fadvise64_64(fd,
+ (offhi << 32) | offlo,
+ (lenhi << 32) | lenlo,
+ advice);
}
long sys32_sync_file_range(unsigned int fd, unsigned long off_high, unsigned long off_low, unsigned long nb_high, unsigned long nb_low, unsigned int flags)
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index df2acb13623f..401bd8ec9cf0 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -194,10 +194,10 @@ COMPAT_SYSCALL_DEFINE6(x86_fadvise64_64, int, fd, __u32, offset_low,
__u32, offset_high, __u32, len_low, __u32, len_high,
int, advice)
{
- return sys_fadvise64_64(fd,
- (((u64)offset_high)<<32) | offset_low,
- (((u64)len_high)<<32) | len_low,
- advice);
+ return ksys_fadvise64_64(fd,
+ (((u64)offset_high)<<32) | offset_low,
+ (((u64)len_high)<<32) | len_low,
+ advice);
}
COMPAT_SYSCALL_DEFINE4(x86_readahead, int, fd, unsigned int, off_lo,
@@ -218,8 +218,8 @@ COMPAT_SYSCALL_DEFINE6(x86_sync_file_range, int, fd, unsigned int, off_low,
COMPAT_SYSCALL_DEFINE5(x86_fadvise64, int, fd, unsigned int, offset_lo,
unsigned int, offset_hi, size_t, len, int, advice)
{
- return sys_fadvise64_64(fd, ((u64)offset_hi << 32) | offset_lo,
- len, advice);
+ return ksys_fadvise64_64(fd, ((u64)offset_hi << 32) | offset_lo,
+ len, advice);
}
COMPAT_SYSCALL_DEFINE6(x86_fallocate, int, fd, int, mode,
diff --git a/arch/xtensa/kernel/syscall.c b/arch/xtensa/kernel/syscall.c
index 74afbf02d07e..8201748da05b 100644
--- a/arch/xtensa/kernel/syscall.c
+++ b/arch/xtensa/kernel/syscall.c
@@ -55,7 +55,7 @@ asmlinkage long xtensa_shmat(int shmid, char __user *shmaddr, int shmflg)
asmlinkage long xtensa_fadvise64_64(int fd, int advice,
unsigned long long offset, unsigned long long len)
{
- return sys_fadvise64_64(fd, offset, len, advice);
+ return ksys_fadvise64_64(fd, offset, len, advice);
}
#ifdef CONFIG_MMU
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 613b8127834d..466d408deefd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -970,6 +970,15 @@ ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
size_t count, loff_t pos);
int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len);
+#ifdef CONFIG_ADVISE_SYSCALLS
+int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice);
+#else
+static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
+ int advice)
+{
+ return -EINVAL;
+}
+#endif
/*
* The following kernel syscall equivalents are just wrappers to fs-internal
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 767887f5f3bf..afa41491d324 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -26,7 +26,8 @@
* POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
* deactivate the pages and clear PG_Referenced.
*/
-SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
+
+int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
{
struct fd f = fdget(fd);
struct inode *inode;
@@ -185,11 +186,16 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
return ret;
}
+SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
+{
+ return ksys_fadvise64_64(fd, offset, len, advice);
+}
+
#ifdef __ARCH_WANT_SYS_FADVISE64
SYSCALL_DEFINE4(fadvise64, int, fd, loff_t, offset, size_t, len, int, advice)
{
- return sys_fadvise64_64(fd, offset, len, advice);
+ return ksys_fadvise64_64(fd, offset, len, advice);
}
#endif
--
2.16.3
* [PATCH 097/109] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
` (4 preceding siblings ...)
2018-03-29 11:24 ` [PATCH 096/109] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
@ 2018-03-29 11:24 ` Dominik Brodowski
2018-03-29 11:24 ` [PATCH 098/109] mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() Dominik Brodowski
2018-03-29 14:20 ` [PATCH 000/109] remove in-kernel calls to syscalls Matthew Wilcox
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:24 UTC (permalink / raw)
To: linux-kernel; +Cc: viro, torvalds, arnd, linux-arch, Andrew Morton, linux-mm
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().
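Most of the arch stubs touched below do the same two things before calling
the helper: reject an offset that is not page aligned, and convert it into a
page offset. The following userspace sketch shows both variants. It is an
illustration only; the demo_* names are hypothetical, the 4 KiB page size in
demo_mmap() is an assumption, and demo_mmap_pgoff() merely returns the page
offset it was given instead of mapping anything.

```c
#define DEMO_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE	(1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))
#define DEMO_EINVAL	22

/* Hypothetical stand-in for ksys_mmap_pgoff(): echoes the page offset. */
static long demo_mmap_pgoff(unsigned long pgoff)
{
	return (long)pgoff;
}

/* mmap()-style stub: byte offset must be page aligned, then is shifted. */
static long demo_mmap(unsigned long off)
{
	if (off & ~DEMO_PAGE_MASK)
		return -DEMO_EINVAL;
	return demo_mmap_pgoff(off >> DEMO_PAGE_SHIFT);
}

/*
 * mmap2()-style stub: the offset arrives in fixed 4096-byte units and is
 * rescaled to pages of size (1 << page_shift). Caller must pass a
 * page_shift of at least 12 and below the width of unsigned long.
 */
static long demo_mmap2(unsigned long pgoff_4k, unsigned int page_shift)
{
	unsigned long low_bits = (1UL << page_shift) - 1;

	if (pgoff_4k & (low_bits >> 12))
		return -DEMO_EINVAL;
	return demo_mmap_pgoff(pgoff_4k >> (page_shift - 12));
}
```

With 64 KiB pages (page_shift 16), an mmap2-style offset of 16 units is
exactly one page, while 17 units is not page aligned and is rejected.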
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm64/kernel/sys.c | 2 +-
arch/ia64/kernel/sys_ia64.c | 4 ++--
arch/m68k/kernel/sys_m68k.c | 2 +-
arch/microblaze/kernel/sys_microblaze.c | 6 +++---
arch/mips/kernel/linux32.c | 4 ++--
arch/mips/kernel/syscall.c | 6 ++++--
arch/parisc/kernel/sys_parisc.c | 6 +++---
arch/powerpc/kernel/syscalls.c | 2 +-
arch/riscv/kernel/sys_riscv.c | 4 ++--
arch/s390/kernel/compat_linux.c | 6 +++---
arch/s390/kernel/sys_s390.c | 2 +-
arch/sh/kernel/sys_sh.c | 4 ++--
arch/sparc/kernel/sys_sparc_32.c | 6 +++---
arch/sparc/kernel/sys_sparc_64.c | 2 +-
arch/um/kernel/syscall.c | 2 +-
arch/x86/ia32/sys_ia32.c | 2 +-
arch/x86/kernel/sys_x86_64.c | 2 +-
include/linux/syscalls.h | 3 +++
mm/mmap.c | 17 ++++++++++++-----
mm/nommu.c | 17 ++++++++++++-----
21 files changed, 60 insertions(+), 41 deletions(-)
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index fa1a392ca9a2..89faa6f4de47 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -189,7 +189,7 @@ SYSCALL_DEFINE6(osf_mmap, unsigned long, addr, unsigned long, len,
goto out;
if (off & ~PAGE_MASK)
goto out;
- ret = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ ret = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
out:
return ret;
}
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 26fe8ea93ea2..72981bae10eb 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -34,7 +34,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
if (offset_in_page(off) != 0)
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}
SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
diff --git a/arch/ia64/kernel/sys_ia64.c b/arch/ia64/kernel/sys_ia64.c
index 085adfcc74a4..9ebe1d633abc 100644
--- a/arch/ia64/kernel/sys_ia64.c
+++ b/arch/ia64/kernel/sys_ia64.c
@@ -139,7 +139,7 @@ int ia64_mmap_check(unsigned long addr, unsigned long len,
asmlinkage unsigned long
sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, long pgoff)
{
- addr = sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+ addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
if (!IS_ERR((void *) addr))
force_successful_syscall_return();
return addr;
@@ -151,7 +151,7 @@ sys_mmap (unsigned long addr, unsigned long len, int prot, int flags, int fd, lo
if (offset_in_page(off) != 0)
return -EINVAL;
- addr = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
if (!IS_ERR((void *) addr))
force_successful_syscall_return();
return addr;
diff --git a/arch/m68k/kernel/sys_m68k.c b/arch/m68k/kernel/sys_m68k.c
index 27e10af5153a..6363ec83a290 100644
--- a/arch/m68k/kernel/sys_m68k.c
+++ b/arch/m68k/kernel/sys_m68k.c
@@ -46,7 +46,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
* so we need to shift the argument down by 1; m68k mmap64(3)
* (in libc) expects the last argument of mmap2 in 4Kb units.
*/
- return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
}
/* Convert virtual (user) address VADDR to physical address PADDR */
diff --git a/arch/microblaze/kernel/sys_microblaze.c b/arch/microblaze/kernel/sys_microblaze.c
index f1e1f666ddde..ed9f34da1a2a 100644
--- a/arch/microblaze/kernel/sys_microblaze.c
+++ b/arch/microblaze/kernel/sys_microblaze.c
@@ -40,7 +40,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
if (pgoff & ~PAGE_MASK)
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
}
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
@@ -50,6 +50,6 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
if (pgoff & (~PAGE_MASK >> 12))
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
- pgoff >> (PAGE_SHIFT - 12));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ pgoff >> (PAGE_SHIFT - 12));
}
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 1c5785e72db4..0571ab7b68b0 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -67,8 +67,8 @@ SYSCALL_DEFINE6(32_mmap2, unsigned long, addr, unsigned long, len,
{
if (pgoff & (~PAGE_MASK >> 12))
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
- pgoff >> (PAGE_SHIFT-12));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ pgoff >> (PAGE_SHIFT-12));
}
#define RLIM_INFINITY32 0x7fffffff
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index 58c6f634b550..69c17b549fd3 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -63,7 +63,8 @@ SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
{
if (offset & ~PAGE_MASK)
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ offset >> PAGE_SHIFT);
}
SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
@@ -73,7 +74,8 @@ SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
if (pgoff & (~PAGE_MASK >> 12))
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> (PAGE_SHIFT-12));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ pgoff >> (PAGE_SHIFT - 12));
}
save_static_function(sys_fork);
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index f36ab1f09595..080d566654ea 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -270,8 +270,8 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
{
/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
we have. */
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
- pgoff >> (PAGE_SHIFT - 12));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ pgoff >> (PAGE_SHIFT - 12));
}
asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
@@ -279,7 +279,7 @@ asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
unsigned long offset)
{
if (!(offset & ~PAGE_MASK)) {
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
offset >> PAGE_SHIFT);
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index ecb981eea74b..1ef3b80b62a6 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -57,7 +57,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
off >>= shift;
}
- ret = sys_mmap_pgoff(addr, len, prot, flags, fd, off);
+ ret = ksys_mmap_pgoff(addr, len, prot, flags, fd, off);
out:
return ret;
}
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 79c78668258e..f7181ed8aafc 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -24,8 +24,8 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
{
if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
- offset >> (PAGE_SHIFT - page_shift_offset));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ offset >> (PAGE_SHIFT - page_shift_offset));
}
#ifdef CONFIG_64BIT
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 9bb897e443a6..da5ef7718254 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -442,8 +442,8 @@ COMPAT_SYSCALL_DEFINE1(s390_old_mmap, struct mmap_arg_struct_emu31 __user *, arg
return -EFAULT;
if (a.offset & ~PAGE_MASK)
return -EINVAL;
- return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
- a.offset >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+ a.offset >> PAGE_SHIFT);
}
COMPAT_SYSCALL_DEFINE1(s390_mmap2, struct mmap_arg_struct_emu31 __user *, arg)
@@ -452,7 +452,7 @@ COMPAT_SYSCALL_DEFINE1(s390_mmap2, struct mmap_arg_struct_emu31 __user *, arg)
if (copy_from_user(&a, arg, sizeof(a)))
return -EFAULT;
- return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
+ return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
}
COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count)
diff --git a/arch/s390/kernel/sys_s390.c b/arch/s390/kernel/sys_s390.c
index 0090037ab148..31cefe0c28c0 100644
--- a/arch/s390/kernel/sys_s390.c
+++ b/arch/s390/kernel/sys_s390.c
@@ -53,7 +53,7 @@ SYSCALL_DEFINE1(mmap2, struct s390_mmap_arg_struct __user *, arg)
if (copy_from_user(&a, arg, sizeof(a)))
goto out;
- error = sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
+ error = ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
out:
return error;
}
diff --git a/arch/sh/kernel/sys_sh.c b/arch/sh/kernel/sys_sh.c
index 724911c59e7d..f8afc014e084 100644
--- a/arch/sh/kernel/sys_sh.c
+++ b/arch/sh/kernel/sys_sh.c
@@ -35,7 +35,7 @@ asmlinkage int old_mmap(unsigned long addr, unsigned long len,
{
if (off & ~PAGE_MASK)
return -EINVAL;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, off>>PAGE_SHIFT);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, off>>PAGE_SHIFT);
}
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
@@ -51,7 +51,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
pgoff >>= PAGE_SHIFT - 12;
- return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
}
/* sys_cacheflush -- flush (part of) the processor cache. */
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index 990703b7cf4d..d980da4ffd7b 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -104,8 +104,8 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
{
/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
we have. */
- return sys_mmap_pgoff(addr, len, prot, flags, fd,
- pgoff >> (PAGE_SHIFT - 12));
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+ pgoff >> (PAGE_SHIFT - 12));
}
asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
@@ -113,7 +113,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
unsigned long off)
{
/* no alignment check? */
- return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}
long sparc_remap_file_pages(unsigned long start, unsigned long size,
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 55416db482ad..ebb84dc8a5a7 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -458,7 +458,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
goto out;
if (off & ~PAGE_MASK)
goto out;
- retval = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ retval = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
out:
return retval;
}
diff --git a/arch/um/kernel/syscall.c b/arch/um/kernel/syscall.c
index 6258676bed85..35f7047bdebc 100644
--- a/arch/um/kernel/syscall.c
+++ b/arch/um/kernel/syscall.c
@@ -22,7 +22,7 @@ long old_mmap(unsigned long addr, unsigned long len,
if (offset & ~PAGE_MASK)
goto out;
- err = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
+ err = ksys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
out:
return err;
}
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index 401bd8ec9cf0..bff71b9ae3f5 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -166,7 +166,7 @@ COMPAT_SYSCALL_DEFINE1(x86_mmap, struct mmap_arg_struct32 __user *, arg)
if (a.offset & ~PAGE_MASK)
return -EINVAL;
- return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+ return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
a.offset>>PAGE_SHIFT);
}
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 676774b9bb8d..a3f15ed545b5 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -97,7 +97,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
if (off & ~PAGE_MASK)
goto out;
- error = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+ error = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
out:
return error;
}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 466d408deefd..ec866c959e7d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -979,6 +979,9 @@ static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
return -EINVAL;
}
#endif
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff);
/*
* The following kernel syscall equivalents are just wrappers to fs-internal
diff --git a/mm/mmap.c b/mm/mmap.c
index 9efdc021ad22..aa0dc8231c0d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1488,9 +1488,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
return addr;
}
-SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
- unsigned long, prot, unsigned long, flags,
- unsigned long, fd, unsigned long, pgoff)
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff)
{
struct file *file = NULL;
unsigned long retval;
@@ -1537,6 +1537,13 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
return retval;
}
+SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
+ unsigned long, prot, unsigned long, flags,
+ unsigned long, fd, unsigned long, pgoff)
+{
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
#ifdef __ARCH_WANT_SYS_OLD_MMAP
struct mmap_arg_struct {
unsigned long addr;
@@ -1556,8 +1563,8 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
if (offset_in_page(a.offset))
return -EINVAL;
- return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
- a.offset >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+ a.offset >> PAGE_SHIFT);
}
#endif /* __ARCH_WANT_SYS_OLD_MMAP */
diff --git a/mm/nommu.c b/mm/nommu.c
index ebb6e618dade..cad329629530 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1423,9 +1423,9 @@ unsigned long do_mmap(struct file *file,
return -ENOMEM;
}
-SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
- unsigned long, prot, unsigned long, flags,
- unsigned long, fd, unsigned long, pgoff)
+unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff)
{
struct file *file = NULL;
unsigned long retval = -EBADF;
@@ -1447,6 +1447,13 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
return retval;
}
+SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
+ unsigned long, prot, unsigned long, flags,
+ unsigned long, fd, unsigned long, pgoff)
+{
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
#ifdef __ARCH_WANT_SYS_OLD_MMAP
struct mmap_arg_struct {
unsigned long addr;
@@ -1466,8 +1473,8 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
if (offset_in_page(a.offset))
return -EINVAL;
- return sys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
- a.offset >> PAGE_SHIFT);
+ return ksys_mmap_pgoff(a.addr, a.len, a.prot, a.flags, a.fd,
+ a.offset >> PAGE_SHIFT);
}
#endif /* __ARCH_WANT_SYS_OLD_MMAP */
--
2.16.3
* [PATCH 098/109] mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
` (5 preceding siblings ...)
2018-03-29 11:24 ` [PATCH 097/109] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
@ 2018-03-29 11:24 ` Dominik Brodowski
2018-03-29 14:20 ` [PATCH 000/109] remove in-kernel calls to syscalls Matthew Wilcox
7 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 11:24 UTC (permalink / raw)
To: linux-kernel; +Cc: viro, torvalds, arnd, linux-arch, Andrew Morton, linux-mm
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().
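The compat stubs changed below receive the 64-bit file offset as two 32-bit
halves and reassemble it before calling the helper. A minimal userspace
sketch of that step follows; the demo_* names are hypothetical and
demo_readahead() simply returns the offset it was handed. The point to note
is that the high half is widened to 64 bits before the shift, since shifting
a 32-bit value by 32 is undefined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine two 32-bit halves; widen the high half before shifting. */
static int64_t demo_merge_64(uint32_t hi, uint32_t lo)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* Hypothetical stand-in for ksys_readahead(): echoes the offset. */
static int64_t demo_readahead(int fd, int64_t offset, size_t count)
{
	(void)count;	/* unused in this sketch */
	return fd < 0 ? -1 : offset;
}

/* Compat-style stub: rebuild the offset, then call the helper. */
static int64_t demo_compat_readahead(int fd, uint32_t off_hi,
				     uint32_t off_lo, size_t count)
{
	return demo_readahead(fd, demo_merge_64(off_hi, off_lo), count);
}
```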
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/mips/kernel/linux32.c | 2 +-
arch/parisc/kernel/sys_parisc.c | 2 +-
arch/powerpc/kernel/sys_ppc32.c | 2 +-
arch/s390/kernel/compat_linux.c | 2 +-
arch/sparc/kernel/sys_sparc32.c | 2 +-
arch/x86/ia32/sys_ia32.c | 2 +-
include/linux/syscalls.h | 1 +
mm/readahead.c | 7 ++++++-
8 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 0571ab7b68b0..318f1c05c5b3 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -131,7 +131,7 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality)
asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3,
size_t count)
{
- return sys_readahead(fd, merge_64(a2, a3), count);
+ return ksys_readahead(fd, merge_64(a2, a3), count);
}
asmlinkage long sys32_sync_file_range(int fd, int __pad,
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 080d566654ea..8c99ebbe2bac 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -345,7 +345,7 @@ asmlinkage ssize_t parisc_pwrite64(unsigned int fd, const char __user *buf,
asmlinkage ssize_t parisc_readahead(int fd, unsigned int high, unsigned int low,
size_t count)
{
- return sys_readahead(fd, (loff_t)high << 32 | low, count);
+ return ksys_readahead(fd, (loff_t)high << 32 | low, count);
}
asmlinkage long parisc_fadvise64_64(int fd,
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 0b95fa13307f..c11c73373691 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -88,7 +88,7 @@ compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, com
compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 count)
{
- return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count);
+ return ksys_readahead(fd, ((loff_t)offhi << 32) | offlo, count);
}
asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4,
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index da5ef7718254..8ac38d51ed7d 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -328,7 +328,7 @@ COMPAT_SYSCALL_DEFINE5(s390_pwrite64, unsigned int, fd, const char __user *, ubu
COMPAT_SYSCALL_DEFINE4(s390_readahead, int, fd, u32, high, u32, low, s32, count)
{
- return sys_readahead(fd, (unsigned long)high << 32 | low, count);
+ return ksys_readahead(fd, (unsigned long)high << 32 | low, count);
}
struct stat64_emu31 {
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index 4da66aed50b4..f166e5bbf506 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -217,7 +217,7 @@ asmlinkage long compat_sys_readahead(int fd,
unsigned long offlo,
compat_size_t count)
{
- return sys_readahead(fd, (offhi << 32) | offlo, count);
+ return ksys_readahead(fd, (offhi << 32) | offlo, count);
}
long compat_sys_fadvise64(int fd,
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index bff71b9ae3f5..bd8a7020b9a7 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -203,7 +203,7 @@ COMPAT_SYSCALL_DEFINE6(x86_fadvise64_64, int, fd, __u32, offset_low,
COMPAT_SYSCALL_DEFINE4(x86_readahead, int, fd, unsigned int, off_lo,
unsigned int, off_hi, size_t, count)
{
- return sys_readahead(fd, ((u64)off_hi << 32) | off_lo, count);
+ return ksys_readahead(fd, ((u64)off_hi << 32) | off_lo, count);
}
COMPAT_SYSCALL_DEFINE6(x86_sync_file_range, int, fd, unsigned int, off_low,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index ec866c959e7d..815fbdd9cca1 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -982,6 +982,7 @@ static inline int ksys_fadvise64_64(int fd, loff_t offset, loff_t len,
unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
+ssize_t ksys_readahead(int fd, loff_t offset, size_t count);
/*
* The following kernel syscall equivalents are just wrappers to fs-internal
diff --git a/mm/readahead.c b/mm/readahead.c
index c4ca70239233..4d57b4644f98 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -573,7 +573,7 @@ do_readahead(struct address_space *mapping, struct file *filp,
return force_page_cache_readahead(mapping, filp, index, nr);
}
-SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
+ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
{
ssize_t ret;
struct fd f;
@@ -592,3 +592,8 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
}
return ret;
}
+
+SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
+{
+ return ksys_readahead(fd, offset, count);
+}
--
2.16.3
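
The compat wrappers touched by this patch all rebuild a 64-bit file offset
from two 32-bit register halves before handing it to ksys_readahead(). A
minimal user-space sketch of that step follows. The name join_offset() and
the use of the <stdint.h> types are assumptions made for illustration; they
are not part of the patch, which uses loff_t, u64 or unsigned long depending
on the architecture.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: join the high and low
 * 32-bit halves of a file offset into one 64-bit value.
 */
static inline int64_t join_offset(uint32_t hi, uint32_t lo)
{
	/*
	 * Widen hi to 64 bits before shifting. Shifting a 32-bit value
	 * left by 32 is undefined behaviour in C.
	 */
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The cast on the high half is the important part: each wrapper in the diff
either casts the high half or receives it in a 64-bit type before the shift.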
* Re: [PATCH 000/109] remove in-kernel calls to syscalls
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
` (6 preceding siblings ...)
2018-03-29 11:24 ` [PATCH 098/109] mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() Dominik Brodowski
@ 2018-03-29 14:20 ` Matthew Wilcox
2018-03-29 14:42 ` Dominik Brodowski
7 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2018-03-29 14:20 UTC (permalink / raw)
To: Dominik Brodowski
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, hmclauchlan,
tautschn, Amir Goldstein, Andi Kleen, Andrew Morton,
Christoph Hellwig, Darren Hart, David S . Miller,
Eric W . Biederman, H . Peter Anvin, Ingo Molnar, Jaswinder Singh,
Jeff Dike, Jiri Slaby, kexec, linux-fsdevel, linux-mm, linux-s390,
Luis R . Rodriguez, netdev, Peter Zijlstra, Thomas Gleixner,
user-mode-linux-devel, x86
On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> At least on 64-bit x86, it will likely be a hard requirement from v4.17
> onwards to not call system call functions in the kernel: It is better to
> use a different calling convention for system calls there, where
> struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> processing over to the actual syscall function. This means that only those
> parameters which are actually needed for a specific syscall are passed on
> during syscall entry, instead of filling in six CPU registers with random
> user space content all the time (which may cause serious trouble down the
> call chain).[*]
How do we stop new ones from springing up? Some kind of linker trick
like was used to, er, "dissuade" people from using gets()?
* Re: [PATCH 000/109] remove in-kernel calls to syscalls
2018-03-29 14:20 ` [PATCH 000/109] remove in-kernel calls to syscalls Matthew Wilcox
@ 2018-03-29 14:42 ` Dominik Brodowski
2018-03-29 14:46 ` David Laight
0 siblings, 1 reply; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 14:42 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, hmclauchlan,
tautschn, Amir Goldstein, Andi Kleen, Andrew Morton,
Christoph Hellwig, Darren Hart, David S . Miller,
Eric W . Biederman, H . Peter Anvin, Ingo Molnar, Jaswinder Singh,
Jeff Dike, Jiri Slaby, kexec, linux-fsdevel, linux-mm, linux-s390,
Luis R . Rodriguez, netdev, Peter Zijlstra, Thomas Gleixner,
user-mode-linux-devel, x86
On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > onwards to not call system call functions in the kernel: It is better to
> > use a different calling convention for system calls there, where
> > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > processing over to the actual syscall function. This means that only those
> > parameters which are actually needed for a specific syscall are passed on
> > during syscall entry, instead of filling in six CPU registers with random
> > user space content all the time (which may cause serious trouble down the
> > call chain).[*]
>
> How do we stop new ones from springing up? Some kind of linker trick
> like was used to, er, "dissuade" people from using gets()?
Once the patches which modify the syscall calling convention are merged,
it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
Meow.
Thanks,
Dominik
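
To make the quoted calling-convention change a little more concrete, here is
a user-space sketch of a wrapper that decodes a register block and passes on
only the arguments the call needs. Everything in it is an assumption for
illustration: struct fake_regs stands in for struct pt_regs, and the names
entry_readahead_demo() and do_readahead_demo() are invented. The real design
was still under review when this thread was written.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs: six argument registers. */
struct fake_regs {
	unsigned long a0, a1, a2, a3, a4, a5;
};

/*
 * The actual implementation. It is static, so code outside this file
 * has no prototype to call and the symbol is not visible to the linker.
 */
static long do_readahead_demo(int fd, long long offset, size_t count)
{
	(void)fd;
	(void)offset;
	return (long)count;	/* placeholder result for the sketch */
}

/*
 * The only externally visible entry point. It reads just the three
 * registers this call uses; a3..a5 are never looked at.
 */
long entry_readahead_demo(const struct fake_regs *regs)
{
	return do_readahead_demo((int)regs->a0, (long long)regs->a1,
				 (size_t)regs->a2);
}
```

In this sketch a direct call to the implementation from another file fails
to build, because the only exported symbol takes a register block rather
than the decoded arguments.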
* RE: [PATCH 000/109] remove in-kernel calls to syscalls
2018-03-29 14:42 ` Dominik Brodowski
@ 2018-03-29 14:46 ` David Laight
2018-03-29 14:55 ` Dominik Brodowski
0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2018-03-29 14:46 UTC (permalink / raw)
To: 'Dominik Brodowski', Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, viro@ZenIV.linux.org.uk,
torvalds@linux-foundation.org, arnd@arndb.de,
linux-arch@vger.kernel.org, hmclauchlan@fb.com,
tautschn@amazon.co.uk, Amir Goldstein, Andi Kleen, Andrew Morton,
Christoph Hellwig, Darren Hart, David S . Miller,
Eric W . Biederman, H . Peter Anvin, Ingo Molnar, Jaswinder Singh,
Jeff Dike, Jiri Slaby, kexec@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-s390@vger.kernel.org, Luis R . Rodriguez,
netdev@vger.kernel.org, Peter Zijlstra, Thomas Gleixner,
user-mode-linux-devel@lists.sourceforge.net, x86@kernel.org
From: Dominik Brodowski
> Sent: 29 March 2018 15:42
> On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > onwards to not call system call functions in the kernel: It is better to
> > > use a different calling convention for system calls there, where
> > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > processing over to the actual syscall function. This means that only those
> > > parameters which are actually needed for a specific syscall are passed on
> > > during syscall entry, instead of filling in six CPU registers with random
> > > user space content all the time (which may cause serious trouble down the
> > > call chain).[*]
> >
> > How do we stop new ones from springing up? Some kind of linker trick
> > like was used to, er, "dissuade" people from using gets()?
>
> Once the patches which modify the syscall calling convention are merged,
> it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> Meow.
Should be pretty easy to ensure the prototypes aren't in any normal header.
Renaming the global symbols (to not match the function name) will make it
much harder to call them as well.
David
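
A sketch of the symbol-renaming idea, assuming a definition macro built on
token pasting: the macro below is hypothetical (DEMO_SYSCALL_DEFINE1 and the
demo_entry_sys_/do_sys_ prefixes are invented for this example) and is not
the kernel's SYSCALL_DEFINEx. It only shows how the exported symbol can
differ from the name a developer would guess.

```c
/*
 * Hypothetical macro: the body the author writes becomes a static
 * function do_sys_<name>, and the only global symbol is the mangled
 * demo_entry_sys_<name>, for which no shared header has a prototype.
 */
#define DEMO_SYSCALL_DEFINE1(name, type, arg)			\
	static long do_sys_##name(type arg);			\
	long demo_entry_sys_##name(type arg)			\
	{							\
		return do_sys_##name(arg);			\
	}							\
	static long do_sys_##name(type arg)

/* Defines demo_entry_sys_twice() (global) and do_sys_twice() (static). */
DEMO_SYSCALL_DEFINE1(twice, long, x)
{
	return 2 * x;
}
```

With this layout there is no global symbol called sys_twice at all, so a
caller elsewhere would have to spell out the mangled name and supply its own
prototype.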
* Re: [PATCH 000/109] remove in-kernel calls to syscalls
2018-03-29 14:46 ` David Laight
@ 2018-03-29 14:55 ` Dominik Brodowski
0 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2018-03-29 14:55 UTC (permalink / raw)
To: David Laight
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org,
viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org,
arnd@arndb.de, linux-arch@vger.kernel.org, hmclauchlan@fb.com,
tautschn@amazon.co.uk, Amir Goldstein, Andi Kleen, Andrew Morton,
Christoph Hellwig, Darren Hart, David S . Miller,
Eric W . Biederman, H . Peter Anvin, Ingo Molnar, Jaswinder Singh,
Jeff Dike, Jiri Slaby, kexec@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-s390@vger.kernel.org, Luis R . Rodriguez,
netdev@vger.kernel.org, Peter Zijlstra, Thomas Gleixner,
user-mode-linux-devel@lists.sourceforge.net, x86@kernel.org
On Thu, Mar 29, 2018 at 02:46:44PM +0000, David Laight wrote:
> From: Dominik Brodowski
> > Sent: 29 March 2018 15:42
> > On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > > onwards to not call system call functions in the kernel: It is better to
> > > > use a different calling convention for system calls there, where
> > > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > > processing over to the actual syscall function. This means that only those
> > > > parameters which are actually needed for a specific syscall are passed on
> > > > during syscall entry, instead of filling in six CPU registers with random
> > > > user space content all the time (which may cause serious trouble down the
> > > > call chain).[*]
> > >
> > > How do we stop new ones from springing up? Some kind of linker trick
> > > like was used to, er, "dissuade" people from using gets()?
> >
> > Once the patches which modify the syscall calling convention are merged,
> > it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> > Meow.
>
> Should be pretty easy to ensure the prototypes aren't in any normal header.
That's exactly why the compile will fail.
> Renaming the global symbols (to not match the function name) will make it
> much harder to call them as well.
That still depends on the exact design of the patchset, which is still under
review.
Thanks,
Dominik
Thread overview: 12+ messages
2018-03-29 11:22 [PATCH 000/109] remove in-kernel calls to syscalls Dominik Brodowski
2018-03-29 11:23 ` [PATCH 044/109] mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c Dominik Brodowski
2018-03-29 11:23 ` [PATCH 045/109] mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c Dominik Brodowski
2018-03-29 11:23 ` [PATCH 046/109] mm: add kernel_mbind() helper; remove in-kernel call to syscall Dominik Brodowski
2018-03-29 11:23 ` [PATCH 047/109] mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls Dominik Brodowski
2018-03-29 11:24 ` [PATCH 096/109] mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Dominik Brodowski
2018-03-29 11:24 ` [PATCH 097/109] mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Dominik Brodowski
2018-03-29 11:24 ` [PATCH 098/109] mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() Dominik Brodowski
2018-03-29 14:20 ` [PATCH 000/109] remove in-kernel calls to syscalls Matthew Wilcox
2018-03-29 14:42 ` Dominik Brodowski
2018-03-29 14:46 ` David Laight
2018-03-29 14:55 ` Dominik Brodowski