* [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe @ 2017-11-22 19:36 Mike Rapoport [not found] ` <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> ` (4 more replies) 0 siblings, 5 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-22 19:36 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Yossi Kuperman

From: Yossi Kuperman <yossiku@il.ibm.com>

Hi,

This patch set introduces a new process_vmsplice system call that combines
the functionality of process_vm_readv and vmsplice.

It allows mapping the memory of another process into a pipe, similarly to
what vmsplice does for its own address space.

Patch 2/4 ("vm: add a syscall to map a process memory into a pipe")
adds the new system call and describes it in detail.

The patchset is against the -mm tree.

v3: minor refactoring to reduce code duplication
v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
    give correct flags to get_user_pages_remote()

Andrei Vagin (3):
  vm: add a syscall to map a process memory into a pipe
  x86: wire up the process_vmsplice syscall
  test: add a test for the process_vmsplice syscall

Mike Rapoport (1):
  fs/splice: introduce pages_to_pipe helper

 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 2 +
 fs/splice.c | 262 +++++++++++++++++++--
 include/linux/compat.h | 3 +
 include/linux/syscalls.h | 4 +
 include/uapi/asm-generic/unistd.h | 5 +-
 kernel/sys_ni.c | 2 +
 tools/testing/selftests/process_vmsplice/Makefile | 5 +
 .../process_vmsplice/process_vmsplice_test.c | 188 +++++++++++++++
 9 files changed, 450 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

--
2.7.4

^ permalink raw reply [flat|nested] 9+
messages in thread
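For readers who have not used vmsplice() before, here is a minimal userspace sketch of what it does for the caller's own address space, which is the behaviour the cover letter wants to extend to another process. The helper name vmsplice_roundtrip is made up for illustration, and the sketch assumes that len fits into a default-sized pipe so that neither call blocks.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the series): map `len` bytes at `src`
 * into a fresh pipe with vmsplice() and read them back into `dst`.
 * Assumes `len` is no larger than the pipe capacity.
 * Returns the number of bytes read back, or -1 on error.
 */
static ssize_t vmsplice_roundtrip(void *src, char *dst, size_t len)
{
	struct iovec iov = { .iov_base = src, .iov_len = len };
	ssize_t n, got = -1;
	int p[2];

	if (pipe(p))
		return -1;

	/* The iovec always refers to the caller's own memory: no pid here. */
	n = vmsplice(p[1], &iov, 1, 0);
	if (n > 0)
		got = read(p[0], dst, n);

	close(p[0]);
	close(p[1]);
	return got;
}
```

Calling vmsplice_roundtrip(src, dst, 5) on a five-byte buffer should hand the same five bytes back through the pipe. The missing piece, and the point of the series, is that there is no way to name another process in this call.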
[parent not found: <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* [PATCH v3 1/4] fs/splice: introduce pages_to_pipe helper [not found] ` <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-11-22 19:36 ` Mike Rapoport 0 siblings, 0 replies; 9+ messages in thread From: Mike Rapoport @ 2017-11-22 19:36 UTC (permalink / raw) To: Andrew Morton, Alexander Viro Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Mike Rapoport Signed-off-by: Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> --- fs/splice.c | 57 ++++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 36 insertions(+), 21 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 39e2dc0..7f1ffc5 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1185,6 +1185,36 @@ static long do_splice(struct file *in, loff_t __user *off_in, return -EINVAL; } +static int pages_to_pipe(struct page **pages, struct pipe_inode_info *pipe, + struct pipe_buffer *buf, size_t *total, + ssize_t copied, size_t start) +{ + bool failed = false; + size_t len = 0; + int ret = 0; + int n; + + for (n = 0; copied; n++, start = 0) { + int size = min_t(int, copied, PAGE_SIZE - start); + if (!failed) { + buf->page = pages[n]; + buf->offset = start; + buf->len = size; + ret = add_to_pipe(pipe, buf); + if (unlikely(ret < 0)) + failed = true; + else + len += ret; + } else { + put_page(pages[n]); + } + copied -= size; + } + + *total += len; + return failed ? 
ret : len; +} + static int iter_to_pipe(struct iov_iter *from, struct pipe_inode_info *pipe, unsigned flags) @@ -1195,13 +1225,11 @@ static int iter_to_pipe(struct iov_iter *from, }; size_t total = 0; int ret = 0; - bool failed = false; - while (iov_iter_count(from) && !failed) { + while (iov_iter_count(from)) { struct page *pages[16]; ssize_t copied; size_t start; - int n; copied = iov_iter_get_pages(from, pages, ~0UL, 16, &start); if (copied <= 0) { @@ -1209,24 +1237,11 @@ static int iter_to_pipe(struct iov_iter *from, break; } - for (n = 0; copied; n++, start = 0) { - int size = min_t(int, copied, PAGE_SIZE - start); - if (!failed) { - buf.page = pages[n]; - buf.offset = start; - buf.len = size; - ret = add_to_pipe(pipe, &buf); - if (unlikely(ret < 0)) { - failed = true; - } else { - iov_iter_advance(from, ret); - total += ret; - } - } else { - put_page(pages[n]); - } - copied -= size; - } + ret = pages_to_pipe(pages, pipe, &buf, &total, copied, start); + if (unlikely(ret < 0)) + break; + + iov_iter_advance(from, ret); } return total ? total : ret; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
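To make the loop in pages_to_pipe() easier to follow, the per-page arithmetic can be sketched in plain userspace C. This only mirrors how `copied` bytes starting at offset `start` are cut into page-sized pieces; it does not touch pipes or page references, it uses size_t where the patch uses int, and the names split_into_chunks, struct chunk and SKETCH_PAGE_SIZE are made up for illustration (a 4 KiB page size is assumed).

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

struct chunk {
	size_t offset;	/* offset of the data within its page */
	size_t len;	/* number of bytes taken from that page */
};

/*
 * Split `copied` bytes, beginning `start` bytes into the first page,
 * into per-page chunks in the same order the pages_to_pipe() loop
 * visits them. Only the first chunk can have a non-zero offset.
 * Returns the number of chunks stored in `out` (at most `max`).
 */
static size_t split_into_chunks(size_t copied, size_t start,
				struct chunk *out, size_t max)
{
	size_t n;

	for (n = 0; copied && n < max; n++, start = 0) {
		size_t room = SKETCH_PAGE_SIZE - start;
		size_t size = copied < room ? copied : room;

		out[n].offset = start;
		out[n].len = size;
		copied -= size;
	}
	return n;
}
```

With copied = 4096 and start = 128 this gives two chunks, (offset 128, len 3968) and (offset 0, len 128), which corresponds to the second iovec used by the selftest in patch 4/4.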
* [PATCH v3 2/4] vm: add a syscall to map a process memory into a pipe 2017-11-22 19:36 [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport [not found] ` <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-11-22 19:36 ` Mike Rapoport 2017-11-22 19:36 ` [PATCH v3 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport ` (2 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-22 19:36 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Andrei Vagin, Mike Rapoport

From: Andrei Vagin <avagin@virtuozzo.com>

It is a hybrid of process_vm_readv() and vmsplice().

vmsplice() can map memory from the current address space into a pipe.
process_vm_readv() can read the memory of another process.

The new system call can map the memory of another process into a pipe.

ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
                         unsigned long nr_segs, unsigned int flags)

All arguments are identical to those of vmsplice(), except pid, which
specifies the target process.

Currently, if we want to dump a process's memory to a file or to a socket,
we can use process_vm_readv() + write(), but it is slow because the data is
copied into a temporary user-space buffer.

A second way is to use vmsplice() + splice(). It is more efficient, because
the data is not copied into a temporary buffer, but it has another problem:
vmsplice() works with the current address space, so it can be used only if
we inject our code into the target process.

The second way suffers from a few other issues:
* the process has to be stopped to run the parasite code
* the number of pipes is limited, so it may be impossible to dump all
  memory in one iteration, and we have to stop the process and inject our
  code a few times.
* pages in pipes are unreclaimable, so it isn't good to hold a lot of
  memory in pipes.

The introduced syscall allows using the second way without injecting any
code into the target process.

My experiments show that process_vmsplice() + splice() is twice as fast as
process_vm_readv() + write().

It is particularly useful at the pre-dump stage. At this stage we enable a
memory tracker and then dump the process memory while the process continues
to work. On the first iteration we dump all memory; on later iterations we
dump only the memory modified since the previous iteration. After a few
pre-dump operations, the process is stopped and dumped for the final time.
The pre-dump operations significantly decrease process downtime when a
process is migrated to another host.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 fs/splice.c | 205 ++++++++++++++++++++++++++++++++++++++
 include/linux/compat.h | 3 +
 include/linux/syscalls.h | 4 +
 include/uapi/asm-generic/unistd.h | 5 +-
 kernel/sys_ni.c | 2 +
 5 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c index 7f1ffc5..72397d2 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -34,6 +34,7 @@ #include <linux/socket.h> #include <linux/compat.h> #include <linux/sched/signal.h> +#include <linux/sched/mm.h> #include "internal.h" @@ -1373,6 +1374,210 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, iov, return error; } +#ifdef CONFIG_CROSS_MEMORY_ATTACH +/* + * Map pages from a specified task into a pipe + */ +static int remote_single_vec_to_pipe(struct task_struct *task, + struct mm_struct *mm, + const struct iovec *rvec, + struct pipe_inode_info *pipe, + unsigned int flags, + size_t *total) +{ + struct pipe_buffer buf = { + .ops = &user_page_pipe_buf_ops, + .flags = flags + }; + unsigned long addr = (unsigned long) rvec->iov_base; + unsigned long pa = addr & PAGE_MASK; + unsigned long start_offset = addr - pa; + unsigned
long nr_pages; + ssize_t len = rvec->iov_len; + struct page *process_pages[16]; + bool failed = false; + int ret = 0; + + nr_pages = (addr + len - 1) / PAGE_SIZE - addr / PAGE_SIZE + 1; + while (nr_pages) { + long pages = min(nr_pages, 16UL); + int locked = 1; + ssize_t copied; + + /* + * Get the pages we're interested in. We must + * access remotely because task/mm might not + * current/current->mm + */ + down_read(&mm->mmap_sem); + pages = get_user_pages_remote(task, mm, pa, pages, 0, + process_pages, NULL, &locked); + if (locked) + up_read(&mm->mmap_sem); + if (pages <= 0) { + failed = true; + ret = -EFAULT; + break; + } + + copied = pages * PAGE_SIZE - start_offset; + if (copied > len) + copied = len; + len -= copied; + + ret = pages_to_pipe(process_pages, pipe, &buf, total, copied, + start_offset); + if (unlikely(ret < 0)) + break; + + start_offset = 0; + nr_pages -= pages; + pa += pages * PAGE_SIZE; + } + return ret < 0 ? ret : 0; +} + +static ssize_t remote_iovec_to_pipe(struct task_struct *task, + struct mm_struct *mm, + const struct iovec *rvec, + unsigned long riovcnt, + struct pipe_inode_info *pipe, + unsigned int flags) +{ + size_t total = 0; + int ret = 0, i; + + for (i = 0; i < riovcnt; i++) { + /* Work out address and page range required */ + if (rvec[i].iov_len == 0) + continue; + + ret = remote_single_vec_to_pipe( + task, mm, &rvec[i], pipe, flags, &total); + if (ret < 0) + break; + } + return total ? 
total : ret; +} + +static long process_vmsplice_to_pipe(struct task_struct *task, + struct mm_struct *mm, struct file *file, + const struct iovec __user *uiov, + unsigned long nr_segs, unsigned int flags) +{ + struct pipe_inode_info *pipe; + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + unsigned int buf_flag = 0; + long ret; + + if (flags & SPLICE_F_GIFT) + buf_flag = PIPE_BUF_FLAG_GIFT; + + pipe = get_pipe_info(file); + if (!pipe) + return -EBADF; + + ret = rw_copy_check_uvector(CHECK_IOVEC_ONLY, uiov, nr_segs, + UIO_FASTIOV, iovstack, &iov); + if (ret < 0) + return ret; + + pipe_lock(pipe); + ret = wait_for_space(pipe, flags); + if (!ret) + ret = remote_iovec_to_pipe(task, mm, iov, + nr_segs, pipe, buf_flag); + pipe_unlock(pipe); + if (ret > 0) + wakeup_pipe_readers(pipe); + + if (iov != iovstack) + kfree(iov); + return ret; +} + +/* process_vmsplice splices a process address range into a pipe. */ +SYSCALL_DEFINE5(process_vmsplice, int, pid, int, fd, + const struct iovec __user *, iov, + unsigned long, nr_segs, unsigned int, flags) +{ + struct task_struct *task; + struct mm_struct *mm; + struct fd f; + long ret; + + if (unlikely(flags & ~SPLICE_F_ALL)) + return -EINVAL; + if (unlikely(nr_segs > UIO_MAXIOV)) + return -EINVAL; + else if (unlikely(!nr_segs)) + return 0; + + f = fdget(fd); + if (!f.file) + return -EBADF; + + /* Get process information */ + task = find_get_task_by_vpid(pid); + if (!task) { + ret = -ESRCH; + goto out_fput; + } + + mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS); + if (!mm || IS_ERR(mm)) { + ret = IS_ERR(mm) ? 
PTR_ERR(mm) : -ESRCH; + /* + * Explicitly map EACCES to EPERM as EPERM is a more a + * appropriate error code for process_vw_readv/writev + */ + if (ret == -EACCES) + ret = -EPERM; + goto put_task_struct; + } + + ret = -EBADF; + if (f.file->f_mode & FMODE_WRITE) + ret = process_vmsplice_to_pipe(task, mm, f.file, + iov, nr_segs, flags); + mmput(mm); + +put_task_struct: + put_task_struct(task); + +out_fput: + fdput(f); + + return ret; +} + +#ifdef CONFIG_COMPAT +COMPAT_SYSCALL_DEFINE5(process_vmsplice, pid_t, pid, int, fd, + const struct compat_iovec __user *, iov32, + unsigned int, nr_segs, unsigned int, flags) +{ + struct iovec __user *iov; + unsigned int i; + + if (nr_segs > UIO_MAXIOV) + return -EINVAL; + + iov = compat_alloc_user_space(nr_segs * sizeof(struct iovec)); + for (i = 0; i < nr_segs; i++) { + struct compat_iovec v; + + if (get_user(v.iov_base, &iov32[i].iov_base) || + get_user(v.iov_len, &iov32[i].iov_len) || + put_user(compat_ptr(v.iov_base), &iov[i].iov_base) || + put_user(v.iov_len, &iov[i].iov_len)) + return -EFAULT; + } + return sys_process_vmsplice(pid, fd, iov, nr_segs, flags); +} +#endif +#endif /* CONFIG_CROSS_MEMORY_ATTACH */ + #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32, unsigned int, nr_segs, unsigned int, flags) diff --git a/include/linux/compat.h b/include/linux/compat.h index 0fc3640..11b3753 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -550,6 +550,9 @@ asmlinkage long compat_sys_getdents(unsigned int fd, unsigned int count); asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *, unsigned int nr_segs, unsigned int flags); +asmlinkage long compat_sys_process_vmsplice(pid_t pid, int fd, + const struct compat_iovec __user *, + unsigned int nr_segs, unsigned int flags); asmlinkage long compat_sys_open(const char __user *filename, int flags, umode_t mode); asmlinkage long compat_sys_openat(int dfd, const char __user *filename, diff 
--git a/include/linux/syscalls.h b/include/linux/syscalls.h index a78186d..4ba9333 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -941,4 +941,8 @@ asmlinkage long sys_pkey_free(int pkey); asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, unsigned mask, struct statx __user *buffer); +asmlinkage long sys_process_vmsplice(pid_t pid, + int fd, const struct iovec __user *iov, + unsigned long nr_segs, unsigned int flags); + #endif diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 8b87de0..37f1832 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -732,9 +732,12 @@ __SYSCALL(__NR_pkey_alloc, sys_pkey_alloc) __SYSCALL(__NR_pkey_free, sys_pkey_free) #define __NR_statx 291 __SYSCALL(__NR_statx, sys_statx) +#define __NR_process_vmsplice 292 +__SC_COMP(__NR_process_vmsplice, sys_process_vmsplice, + compat_sys_process_vmsplice) #undef __NR_syscalls -#define __NR_syscalls 292 +#define __NR_syscalls 293 /* * All syscalls below here should go away really, diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index b518976..a939fbb 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -158,8 +158,10 @@ cond_syscall(sys_sysfs); cond_syscall(sys_syslog); cond_syscall(sys_process_vm_readv); cond_syscall(sys_process_vm_writev); +cond_syscall(sys_process_vmsplice); cond_syscall(compat_sys_process_vm_readv); cond_syscall(compat_sys_process_vm_writev); +cond_syscall(compat_sys_process_vmsplice); cond_syscall(sys_uselib); cond_syscall(sys_fadvise64); cond_syscall(sys_fadvise64_64); -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 9+ messages in thread
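As a reading aid for remote_single_vec_to_pipe(), the two bits of arithmetic it relies on can be tried out in userspace: how many pages an iovec spans, and how many bytes one batch of pinned pages contributes. This is only a sketch under the assumption of 4 KiB pages; pages_spanned, batch_bytes and SKETCH_PAGE_SIZE are names made up here, and len is assumed to be non-zero (the patch skips zero-length iovecs before this point).

```c
#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * Same expression as the nr_pages computation in the patch;
 * `len` must be non-zero.
 */
static unsigned long pages_spanned(unsigned long addr, unsigned long len)
{
	return (addr + len - 1) / SKETCH_PAGE_SIZE
		- addr / SKETCH_PAGE_SIZE + 1;
}

/*
 * Bytes contributed by one batch of `pages` pinned pages when the data
 * starts `start_offset` bytes into the first page and `len` bytes of
 * the iovec are still outstanding (the "copied" value in the patch).
 */
static unsigned long batch_bytes(unsigned long pages,
				 unsigned long start_offset,
				 unsigned long len)
{
	unsigned long copied = pages * SKETCH_PAGE_SIZE - start_offset;

	return copied > len ? len : copied;
}
```

For the selftest's second iovec (base addr + 4096 + 128, length 4096) this gives two pages and a single batch of 4096 bytes; a 100-page range is walked in batches of at most 16 pages, i.e. 65536 bytes each when the start is page-aligned.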
* [PATCH v3 3/4] x86: wire up the process_vmsplice syscall 2017-11-22 19:36 [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport [not found] ` <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-11-22 19:36 ` [PATCH v3 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport @ 2017-11-22 19:36 ` Mike Rapoport 2017-11-22 19:36 ` [PATCH v3 4/4] test: add a test for " Mike Rapoport 2017-11-22 20:43 ` [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Michael Kerrisk (man-pages) 4 siblings, 0 replies; 9+ messages in thread From: Mike Rapoport @ 2017-11-22 19:36 UTC (permalink / raw) To: Andrew Morton, Alexander Viro Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Andrei Vagin From: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 2 ++ 2 files changed, 3 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 448ac21..dc64bf5 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -391,3 +391,4 @@ 382 i386 pkey_free sys_pkey_free 383 i386 statx sys_statx 384 i386 arch_prctl sys_arch_prctl compat_sys_arch_prctl +385 i386 process_vmsplice sys_process_vmsplice compat_sys_process_vmsplice diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 5aef183..d2f916c 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -339,6 +339,7 @@ 330 common pkey_alloc sys_pkey_alloc 331 common pkey_free sys_pkey_free 332 common statx sys_statx +333 64 process_vmsplice sys_process_vmsplice # # x32-specific system call numbers start at 512 to avoid cache impact @@ -380,3 +381,4 @@ 
545 x32 execveat compat_sys_execveat/ptregs 546 x32 preadv2 compat_sys_preadv64v2 547 x32 pwritev2 compat_sys_pwritev64v2 +548 x32 process_vmsplice compat_sys_process_vmsplice -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v3 4/4] test: add a test for the process_vmsplice syscall 2017-11-22 19:36 [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport ` (2 preceding siblings ...) 2017-11-22 19:36 ` [PATCH v3 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport @ 2017-11-22 19:36 ` Mike Rapoport [not found] ` <1511379391-988-5-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-11-22 20:43 ` [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Michael Kerrisk (man-pages) 4 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2017-11-22 19:36 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Andrei Vagin

From: Andrei Vagin <avagin@openvz.org>

This test checks that process_vmsplice() can splice pages from a remote
process and that it returns EFAULT when asked to splice pages from an
inaccessible address.

Signed-off-by: Andrei Vagin <avagin@openvz.org> --- tools/testing/selftests/process_vmsplice/Makefile | 5 + .../process_vmsplice/process_vmsplice_test.c | 188 +++++++++++++++++++++ 2 files changed, 193 insertions(+) create mode 100644 tools/testing/selftests/process_vmsplice/Makefile create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c diff --git a/tools/testing/selftests/process_vmsplice/Makefile b/tools/testing/selftests/process_vmsplice/Makefile new file mode 100644 index 0000000..246d5a7 --- /dev/null +++ b/tools/testing/selftests/process_vmsplice/Makefile @@ -0,0 +1,5 @@ +CFLAGS += -I../../../../usr/include/ + +TEST_GEN_PROGS := process_vmsplice_test + +include ../lib.mk diff --git a/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c new file mode 100644 index 0000000..8abf59b --- /dev/null +++ b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c @@ -0,0 +1,188 @@ +#define _GNU_SOURCE +#include <stdio.h> +#include <unistd.h> +#include <sys/mman.h> +#include <sys/syscall.h> +#include <fcntl.h> +#include <sys/uio.h> +#include <errno.h> +#include <signal.h> +#include <sys/prctl.h> +#include <sys/wait.h> + +#include "../kselftest.h" + +#ifndef __NR_process_vmsplice +#define __NR_process_vmsplice 333 +#endif + +#define pr_err(fmt, ...) \ + ({ \ + fprintf(stderr, "%s:%d:" fmt, \ + __func__, __LINE__, ##__VA_ARGS__); \ + KSFT_FAIL; \ + }) +#define pr_perror(fmt, ...) pr_err(fmt ": %m\n", ##__VA_ARGS__) +#define fail(fmt, ...) 
pr_err("FAIL:" fmt, ##__VA_ARGS__) + +static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov, + unsigned long nr_segs, unsigned int flags) +{ + return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags); + +} + +#define MEM_SIZE (4096 * 100) +#define MEM_WRONLY_SIZE (4096 * 10) + +int main(int argc, char **argv) +{ + char *addr, *addr_wronly; + int p[2]; + struct iovec iov[2]; + char buf[4096]; + int status, ret; + pid_t pid; + + ksft_print_header(); + + addr = mmap(0, MEM_SIZE, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (addr == MAP_FAILED) + return pr_perror("Unable to create a mapping"); + + addr_wronly = mmap(0, MEM_WRONLY_SIZE, PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (addr_wronly == MAP_FAILED) + return pr_perror("Unable to create a write-only mapping"); + + if (pipe(p)) + return pr_perror("Unable to create a pipe"); + + pid = fork(); + if (pid < 0) + return pr_perror("Unable to fork"); + + if (pid == 0) { + addr[0] = 'C'; + addr[4096 + 128] = 'A'; + addr[4096 + 128 + 4096 - 1] = 'B'; + + if (prctl(PR_SET_PDEATHSIG, SIGKILL)) + return pr_perror("Unable to set PR_SET_PDEATHSIG"); + if (write(p[1], "c", 1) != 1) + return pr_perror("Unable to write data into pipe"); + + while (1) + sleep(1); + return 1; + } + if (read(p[0], buf, 1) != 1) { + pr_perror("Unable to read data from pipe"); + kill(pid, SIGKILL); + wait(&status); + return 1; + } + + munmap(addr, MEM_SIZE); + munmap(addr_wronly, MEM_WRONLY_SIZE); + + iov[0].iov_base = addr; + iov[0].iov_len = 1; + + iov[1].iov_base = addr + 4096 + 128; + iov[1].iov_len = 4096; + + /* check one iovec */ + if (process_vmsplice(pid, p[1], iov, 1, SPLICE_F_GIFT) != 1) + return pr_perror("Unable to splice pages"); + + if (read(p[0], buf, 1) != 1) + return pr_perror("Unable to read from pipe"); + + if (buf[0] != 'C') + ksft_test_result_fail("Get wrong data\n"); + else + ksft_test_result_pass("Check process_vmsplice with one vec\n"); + + /* check two iovec-s 
*/ + if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 4097) + return pr_perror("Unable to spice pages\n"); + + if (read(p[0], buf, 1) != 1) + return pr_perror("Unable to read from pipe\n"); + + if (buf[0] != 'C') + ksft_test_result_fail("Get wrong data\n"); + + if (read(p[0], buf, 4096) != 4096) + return pr_perror("Unable to read from pipe\n"); + + if (buf[0] != 'A' || buf[4095] != 'B') + ksft_test_result_fail("Get wrong data\n"); + else + ksft_test_result_pass("check process_vmsplice with two vecs\n"); + + /* check how an unreadable region in a second vec is handled */ + iov[0].iov_base = addr; + iov[0].iov_len = 1; + + iov[1].iov_base = addr_wronly + 5; + iov[1].iov_len = 1; + + if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 1) + return pr_perror("Unable to splice data"); + + if (read(p[0], buf, 1) != 1) + return pr_perror("Unable to read form pipe"); + + if (buf[0] != 'C') + ksft_test_result_fail("Get wrong data\n"); + else + ksft_test_result_pass("unreadable region in a second vec\n"); + + /* check how an unreadable region in a first vec is handled */ + errno = 0; + if (process_vmsplice(pid, p[1], iov + 1, 1, SPLICE_F_GIFT) != -1 || + errno != EFAULT) + ksft_test_result_fail("Got anexpected errno %d\n", errno); + else + ksft_test_result_pass("splice as much as possible\n"); + + iov[0].iov_base = addr; + iov[0].iov_len = 1; + + iov[1].iov_base = addr; + iov[1].iov_len = MEM_SIZE; + + /* splice as much as possible */ + ret = process_vmsplice(pid, p[1], iov, 2, + SPLICE_F_GIFT | SPLICE_F_NONBLOCK); + if (ret != 4096 * 15 + 1) /* by default a pipe can fit 16 pages */ + return pr_perror("Unable to splice pages"); + + while (ret > 0) { + int len; + + len = read(p[0], buf, 4096); + if (len < 0) + return pr_perror("Unable to read data"); + if (len > ret) + return pr_err("Read more than expected\n"); + ret -= len; + } + ksft_test_result_pass("splice as much as possible\n"); + + if (kill(pid, SIGTERM)) + return pr_perror("Unable to kill a child 
process"); + status = -1; + if (wait(&status) < 0) + return pr_perror("Unable to wait a child process"); + if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM) + return pr_err("The child exited with an unexpected code %d\n", + status); + + if (ksft_get_fail_cnt()) + return ksft_exit_fail(); + return ksft_exit_pass(); +} -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <1511379391-988-5-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH v3 4/4] test: add a test for the process_vmsplice syscall [not found] ` <1511379391-988-5-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-11-23 8:01 ` Greg KH 2017-11-23 14:07 ` Mike Rapoport 0 siblings, 1 reply; 9+ messages in thread From: Greg KH @ 2017-11-23 8:01 UTC (permalink / raw) To: Mike Rapoport Cc: Andrew Morton, Alexander Viro, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Andrei Vagin On Wed, Nov 22, 2017 at 09:36:31PM +0200, Mike Rapoport wrote: > From: Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> > > This test checks that process_vmsplice() can splice pages from a remote > process and returns EFAULT, if process_vmsplice() tries to splice pages > by an unaccessiable address. > > Signed-off-by: Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> > --- > tools/testing/selftests/process_vmsplice/Makefile | 5 + > .../process_vmsplice/process_vmsplice_test.c | 188 +++++++++++++++++++++ > 2 files changed, 193 insertions(+) > create mode 100644 tools/testing/selftests/process_vmsplice/Makefile > create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c > > diff --git a/tools/testing/selftests/process_vmsplice/Makefile b/tools/testing/selftests/process_vmsplice/Makefile > new file mode 100644 > index 0000000..246d5a7 > --- /dev/null > +++ b/tools/testing/selftests/process_vmsplice/Makefile > @@ -0,0 +1,5 @@ > +CFLAGS += -I../../../../usr/include/ > + > +TEST_GEN_PROGS := process_vmsplice_test > + > +include ../lib.mk > diff --git a/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c > new file mode 100644 > index 0000000..8abf59b > --- /dev/null > +++ 
b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c > @@ -0,0 +1,188 @@ > +#define _GNU_SOURCE > +#include <stdio.h> > +#include <unistd.h> > +#include <sys/mman.h> > +#include <sys/syscall.h> > +#include <fcntl.h> > +#include <sys/uio.h> > +#include <errno.h> > +#include <signal.h> > +#include <sys/prctl.h> > +#include <sys/wait.h> > + > +#include "../kselftest.h" > + > +#ifndef __NR_process_vmsplice > +#define __NR_process_vmsplice 333 > +#endif > + > +#define pr_err(fmt, ...) \ > + ({ \ > + fprintf(stderr, "%s:%d:" fmt, \ > + __func__, __LINE__, ##__VA_ARGS__); \ > + KSFT_FAIL; \ > + }) > +#define pr_perror(fmt, ...) pr_err(fmt ": %m\n", ##__VA_ARGS__) > +#define fail(fmt, ...) pr_err("FAIL:" fmt, ##__VA_ARGS__) > + > +static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov, > + unsigned long nr_segs, unsigned int flags) > +{ > + return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags); > + > +} > + > +#define MEM_SIZE (4096 * 100) > +#define MEM_WRONLY_SIZE (4096 * 10) > + > +int main(int argc, char **argv) > +{ > + char *addr, *addr_wronly; > + int p[2]; > + struct iovec iov[2]; > + char buf[4096]; > + int status, ret; > + pid_t pid; > + > + ksft_print_header(); > + > + addr = mmap(0, MEM_SIZE, PROT_READ | PROT_WRITE, > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); > + if (addr == MAP_FAILED) > + return pr_perror("Unable to create a mapping"); > + > + addr_wronly = mmap(0, MEM_WRONLY_SIZE, PROT_WRITE, > + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); > + if (addr_wronly == MAP_FAILED) > + return pr_perror("Unable to create a write-only mapping"); > + > + if (pipe(p)) > + return pr_perror("Unable to create a pipe"); > + > + pid = fork(); > + if (pid < 0) > + return pr_perror("Unable to fork"); > + > + if (pid == 0) { > + addr[0] = 'C'; > + addr[4096 + 128] = 'A'; > + addr[4096 + 128 + 4096 - 1] = 'B'; > + > + if (prctl(PR_SET_PDEATHSIG, SIGKILL)) > + return pr_perror("Unable to set PR_SET_PDEATHSIG"); > + if (write(p[1], "c", 1) != 
1) > + return pr_perror("Unable to write data into pipe"); > + > + while (1) > + sleep(1); > + return 1; > + } > + if (read(p[0], buf, 1) != 1) { > + pr_perror("Unable to read data from pipe"); > + kill(pid, SIGKILL); > + wait(&status); > + return 1; > + } > + > + munmap(addr, MEM_SIZE); > + munmap(addr_wronly, MEM_WRONLY_SIZE); > + > + iov[0].iov_base = addr; > + iov[0].iov_len = 1; > + > + iov[1].iov_base = addr + 4096 + 128; > + iov[1].iov_len = 4096; > + > + /* check one iovec */ > + if (process_vmsplice(pid, p[1], iov, 1, SPLICE_F_GIFT) != 1) > + return pr_perror("Unable to splice pages"); Shouldn't you check to see if the syscall is even present? You should not error if it is not, as this test will then "fail" on kernels/arches without the syscall enabled, which isn't the nicest. thanks, greg k-h ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 4/4] test: add a test for the process_vmsplice syscall 2017-11-23 8:01 ` Greg KH @ 2017-11-23 14:07 ` Mike Rapoport 0 siblings, 0 replies; 9+ messages in thread From: Mike Rapoport @ 2017-11-23 14:07 UTC (permalink / raw) To: Greg KH Cc: Andrew Morton, Alexander Viro, linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn, Andrei Vagin On Thu, Nov 23, 2017 at 09:01:03AM +0100, Greg KH wrote: > On Wed, Nov 22, 2017 at 09:36:31PM +0200, Mike Rapoport wrote: > > From: Andrei Vagin <avagin@openvz.org> > > > > This test checks that process_vmsplice() can splice pages from a remote > > process and returns EFAULT, if process_vmsplice() tries to splice pages > > by an unaccessiable address. > > > > Signed-off-by: Andrei Vagin <avagin@openvz.org> > > --- > > tools/testing/selftests/process_vmsplice/Makefile | 5 + > > .../process_vmsplice/process_vmsplice_test.c | 188 +++++++++++++++++++++ > > 2 files changed, 193 insertions(+) > > create mode 100644 tools/testing/selftests/process_vmsplice/Makefile > > create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c > > [ ... ] > > Shouldn't you check to see if the syscall is even present? You should > not error if it is not, as this test will then "fail" on kernels/arches > without the syscall enabled, which isn't the nicest. Sure, will fix. > thanks, > > greg k-h > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 9+ messages in thread
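One possible shape for the check discussed above, as a rough sketch rather than what the next revision will necessarily do: probe the syscall once and treat ENOSYS as "not available" instead of a failure. The helper name syscall_is_missing is made up for illustration. It passes all-zero arguments, which is only reasonable for calls where that is harmless; going by patch 2/4, process_vmsplice returns 0 for nr_segs == 0 before touching the pid or the fd.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): returns 1 if the running
 * kernel reports syscall number `nr` as not implemented, 0 otherwise.
 * The probe uses all-zero arguments, so it must only be used for
 * syscalls where such a call has no side effects.
 */
static int syscall_is_missing(long nr)
{
	errno = 0;
	return syscall(nr, 0, 0, 0, 0, 0) == -1 && errno == ENOSYS;
}
```

In the selftest this could be called with __NR_process_vmsplice before the first check, reporting a skip (for instance through the kselftest skip helpers) rather than returning an error when it yields 1.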
* Re: [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe
  2017-11-22 19:36 [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (3 preceding siblings ...)
  2017-11-22 19:36 ` [PATCH v3 4/4] test: add a test for " Mike Rapoport
@ 2017-11-22 20:43 ` Michael Kerrisk (man-pages)
  2017-11-23  6:29   ` Mike Rapoport
  4 siblings, 1 reply; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2017-11-22 20:43 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Alexander Viro, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, lkml, Linux API, criu,
	Arnd Bergmann, Pavel Emelyanov, Thomas Gleixner, Josh Triplett,
	Jann Horn, Yossi Kuperman

Hi Mike,

On 22 November 2017 at 20:36, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> From: Yossi Kuperman <yossiku@il.ibm.com>
>
> Hi,
>
> This patches introduces new process_vmsplice system call that combines
> functionality of process_vm_read and vmsplice.
>
> It allows to map the memory of another process into a pipe, similarly to
> what vmsplice does for its own address space.
>
> The patch 2/4 ("vm: add a syscall to map a process memory into a pipe")
> actually adds the new system call and provides its elaborate description.

Where is the man page for this new syscall?

Cheers,

Michael

> The patchset is against -mm tree.
>
> v3: minor refactoring to reduce code duplication
> v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
>     give correct flags to get_user_pages_remote()
>
> Andrei Vagin (3):
>   vm: add a syscall to map a process memory into a pipe
>   x86: wire up the process_vmsplice syscall
>   test: add a test for the process_vmsplice syscall
>
> Mike Rapoport (1):
>   fs/splice: introduce pages_to_pipe helper
>
>  arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
>  fs/splice.c                                        | 262 +++++++++++++++++++--
>  include/linux/compat.h                             |   3 +
>  include/linux/syscalls.h                           |   4 +
>  include/uapi/asm-generic/unistd.h                  |   5 +-
>  kernel/sys_ni.c                                    |   2 +
>  tools/testing/selftests/process_vmsplice/Makefile  |   5 +
>  .../process_vmsplice/process_vmsplice_test.c       | 188 +++++++++++++++
>  9 files changed, 450 insertions(+), 22 deletions(-)
>  create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
>  create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
>
> --
> 2.7.4
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 9+ messages in thread
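For readers unfamiliar with the existing call that the cover letter compares against, here is a small sketch of what vmsplice() does for a process's own address space. It is an illustration only: vmsplice_roundtrip() is a hypothetical helper name, the one-page limit is an assumption made so the write can never block on a full pipe, and nothing here exercises the proposed process_vmsplice().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of the caller's own memory into a
 * fresh pipe with vmsplice(), then read them back into `out`.
 * Returns the number of bytes read back, or -1 on error.
 */
static ssize_t vmsplice_roundtrip(void *src, size_t len, void *out)
{
	struct iovec iov = { .iov_base = src, .iov_len = len };
	ssize_t spliced, got = -1;
	int p[2];

	/* Assumption: stay within one page so the pipe cannot fill up. */
	assert(len > 0 && len <= 4096);

	if (pipe(p) < 0)
		return -1;

	/* Attach the user pages described by iov to the pipe's write end. */
	spliced = vmsplice(p[1], &iov, 1, 0);
	if (spliced > 0)
		got = read(p[0], out, (size_t)spliced);

	close(p[0]);
	close(p[1]);
	return got;
}
```

The proposed syscall would, per the cover letter, do the same kind of mapping but with the iovec describing memory of another process identified by pid.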
* Re: [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe
  2017-11-22 20:43 ` [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Michael Kerrisk (man-pages)
@ 2017-11-23  6:29   ` Mike Rapoport
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-23 6:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Andrew Morton, Alexander Viro, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, lkml, Linux API, criu,
	Arnd Bergmann, Pavel Emelyanov, Thomas Gleixner, Josh Triplett,
	Jann Horn, Yossi Kuperman

On Wed, Nov 22, 2017 at 09:43:31PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
>
> On 22 November 2017 at 20:36, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > Hi,
> >
> > This patches introduces new process_vmsplice system call that combines
> > functionality of process_vm_read and vmsplice.
> >
> > It allows to map the memory of another process into a pipe, similarly to
> > what vmsplice does for its own address space.
> >
> > The patch 2/4 ("vm: add a syscall to map a process memory into a pipe")
> > actually adds the new system call and provides its elaborate description.
>
> Where is the man page for this new syscall?

It's still WIP, I'll send it out soon.

> Cheers,
>
> Michael
>
> > The patchset is against -mm tree.
> >
> > v3: minor refactoring to reduce code duplication
> > v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
> >     give correct flags to get_user_pages_remote()
> >
> > Andrei Vagin (3):
> >   vm: add a syscall to map a process memory into a pipe
> >   x86: wire up the process_vmsplice syscall
> >   test: add a test for the process_vmsplice syscall
> >
> > Mike Rapoport (1):
> >   fs/splice: introduce pages_to_pipe helper
> >
> >  arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
> >  fs/splice.c                                        | 262 +++++++++++++++++++--
> >  include/linux/compat.h                             |   3 +
> >  include/linux/syscalls.h                           |   4 +
> >  include/uapi/asm-generic/unistd.h                  |   5 +-
> >  kernel/sys_ni.c                                    |   2 +
> >  tools/testing/selftests/process_vmsplice/Makefile  |   5 +
> >  .../process_vmsplice/process_vmsplice_test.c       | 188 +++++++++++++++
> >  9 files changed, 450 insertions(+), 22 deletions(-)
> >  create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
> >  create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
> >
> > --
> > 2.7.4
> >
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
>

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-11-23 14:07 UTC | newest]

Thread overview: 9+ messages
2017-11-22 19:36 [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
     [not found] ` <1511379391-988-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-11-22 19:36   ` [PATCH v3 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
2017-11-22 19:36   ` [PATCH v3 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
2017-11-22 19:36   ` [PATCH v3 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
2017-11-22 19:36   ` [PATCH v3 4/4] test: add a test for " Mike Rapoport
     [not found]     ` <1511379391-988-5-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-11-23  8:01       ` Greg KH
2017-11-23 14:07         ` Mike Rapoport
2017-11-22 20:43   ` [PATCH v3 0/4] vm: add a syscall to map a process memory into a pipe Michael Kerrisk (man-pages)
2017-11-23  6:29     ` Mike Rapoport