* [RFC] ptrace, pidfd: add pidfd_ptrace syscall @ 2020-04-26 13:01 Hagen Paul Pfeifer 2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer 0 siblings, 1 reply; 17+ messages in thread From: Hagen Paul Pfeifer @ 2020-04-26 13:01 UTC (permalink / raw) To: linux-kernel Cc: Florian Weimer, Al Viro, Hagen Paul Pfeifer, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch Working on a safety-critical stress testing tool, using ptrace in an rather uncommon way (stop, peeking memory, ...) for a bunch of applications in an automated way I realized that once opened processes where restarted and PIDs recycled. Resulting in monitoring and manipulating the wrong processes. With the advent of pidfd we are now able to stick with one stable handle to identifying processes exactly. We now have the ability to get this race free. Sending signals now works like a charm, next step is to extend the functionality also for ptrace. API: long pidfd_ptrace(int pidfd, enum __ptrace_request request, void *addr, void *data, unsigned flags); Based on original ptrace, the following API changes where made: - Process identificator (pidfd) is now moved as first argument, this is aligned with pidfd_send_signal(int pidfd, ...) because potential future pidfd_* will have one thing in common: the pid identifier. I think is natural to have this argument upfront - Add an additional flags argument, not used now - but you never know All other arguments are identical compared to ptrace - no other modifications where made. Currently there are some pieces missing! This is just an early proposal for a new syscall. Still missing: - support for every architecture - re-use shared functions and move to common place - perf syscall registration - selftests - ... Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Christian Brauner <christian@brauner.io> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 4 +- kernel/ptrace.c | 129 ++++++++++++++++++++----- kernel/sys_ni.c | 1 + 6 files changed, 115 insertions(+), 23 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 54581ac671b4..593f7fab90eb 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -442,3 +442,4 @@ 435 i386 clone3 sys_clone3 437 i386 openat2 sys_openat2 438 i386 pidfd_getfd sys_pidfd_getfd +438 i386 pidfd_ptrace sys_pidfd_ptrace diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 37b844f839bc..cd76d8343510 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -359,6 +359,7 @@ 435 common clone3 sys_clone3 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd +439 common pidfd_ptrace sys_pidfd_ptrace # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 1815065d52f3..254b071a5334 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1003,6 +1003,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); +asmlinkage long sys_pidfd_ptrace(int pidfd, long request, unsigned long addr, + unsigned long data, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 3a3201e4618e..64749a6f156e 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3) __SYSCALL(__NR_openat2, sys_openat2) #define __NR_pidfd_getfd 438 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) +#define __NR_pidfd_getfd 439 +__SYSCALL(__NR_pidfd_ptrace, sys_pidfd_ptrace) #undef __NR_syscalls -#define __NR_syscalls 439 +#define __NR_syscalls 440 /* * 32 bit systems traditionally used different diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 43d6179508d6..8f4e99247742 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -29,6 +29,7 @@ #include <linux/regset.h> #include <linux/hw_breakpoint.h> #include <linux/cn_proc.h> +#include <linux/proc_fs.h> #include <linux/compat.h> #include <linux/sched/signal.h> @@ -1239,48 +1240,132 @@ int ptrace_request(struct task_struct *child, long request, #define arch_ptrace_attach(child) do { } while (0) #endif +static inline long ptrace_call(struct task_struct *task, long request, unsigned long addr, + unsigned long data) +{ + long ret; + + if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) { + ret = ptrace_attach(task, request, addr, data); + /* + * Some architectures need to do book-keeping after + * a ptrace attach. + */ + if (!ret) + arch_ptrace_attach(task); + goto out; + } + + ret = ptrace_check_attach(task, request == PTRACE_KILL || + request == PTRACE_INTERRUPT); + if (ret < 0) + goto out; + + ret = arch_ptrace(task, request, addr, data); + if (ret || request != PTRACE_DETACH) + ptrace_unfreeze_traced(task); + + out: + put_task_struct(task); + return ret; +} + SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr, unsigned long, data) { - struct task_struct *child; + struct task_struct *task; long ret; if (request == PTRACE_TRACEME) { ret = ptrace_traceme(); if (!ret) arch_ptrace_attach(current); - goto out; + return ret; } - child = find_get_task_by_vpid(pid); - if (!child) { + task = find_get_task_by_vpid(pid); + if (!task) { ret = -ESRCH; goto out; } - if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) { - ret = ptrace_attach(child, request, addr, data); - /* - * Some architectures need to do book-keeping after - * a ptrace attach. - */ + + ret = ptrace_call(task, request, addr, data); +out: + return ret; +} + +static struct pid *pidfd_to_pid(const struct file *file) +{ + struct pid *pid; + + pid = pidfd_pid(file); + if (!IS_ERR(pid)) + return pid; + + return tgid_pidfd_to_pid(file); +} + +static bool access_pidfd_pidns(struct pid *pid) +{ + struct pid_namespace *active = task_active_pid_ns(current); + struct pid_namespace *p = ns_of_pid(pid); + + for (;;) { + if (!p) + return false; + if (p == active) + break; + p = p->parent; + } + + return true; +} + +SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr, + unsigned long, data, unsigned int, flags) +{ + long ret; + struct fd f; + struct pid *pid; + struct task_struct *task; + + /* Enforce flags be set to 0 until we add an extension. */ + if (flags) + return -EINVAL; + + if (request == PTRACE_TRACEME) { + ret = ptrace_traceme(); if (!ret) - arch_ptrace_attach(child); - goto out_put_task_struct; + arch_ptrace_attach(current); + goto out; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); - if (ret < 0) - goto out_put_task_struct; + f = fdget(pidfd); + if (!f.file) + return -EBADF; - ret = arch_ptrace(child, request, addr, data); - if (ret || request != PTRACE_DETACH) - ptrace_unfreeze_traced(child); + /* Is this a pidfd? */ + pid = pidfd_to_pid(f.file); + if (IS_ERR(pid)) { + ret = PTR_ERR(pid); + goto err; + } - out_put_task_struct: - put_task_struct(child); - out: + ret = -EINVAL; + if (!access_pidfd_pidns(pid)) + goto err; + + task = pid_task(pid, PIDTYPE_PID); + if (!task) { + ret = -EINVAL; + goto err; + } + + ret = ptrace_call(task, request, addr, data); +err: + fdput(f); +out: return ret; } diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 3b69a560a7ac..f7795294b8c4 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -166,6 +166,7 @@ COND_SYSCALL(delete_module); COND_SYSCALL(syslog); /* kernel/ptrace.c */ +COND_SYSCALL_COMPAT(pidfd_ptrace); /* kernel/sched/core.c */ -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-26 13:01 [RFC] ptrace, pidfd: add pidfd_ptrace syscall Hagen Paul Pfeifer @ 2020-04-26 16:34 ` Hagen Paul Pfeifer 2020-04-27 8:30 ` Arnd Bergmann 2020-04-27 17:08 ` Christian Brauner 0 siblings, 2 replies; 17+ messages in thread From: Hagen Paul Pfeifer @ 2020-04-26 16:34 UTC (permalink / raw) To: linux-kernel Cc: Florian Weimer, Al Viro, Hagen Paul Pfeifer, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch Working on a safety-critical stress testing tool, using ptrace in an rather uncommon way (stop, peeking memory, ...) for a bunch of applications in an automated way I realized that once opened processes where restarted and PIDs recycled. Resulting in monitoring and manipulating the wrong processes. With the advent of pidfd we are now able to stick with one stable handle to identifying processes exactly. We now have the ability to get this race free. Sending signals now works like a charm, next step is to extend the functionality also for ptrace. API: long pidfd_ptrace(int pidfd, enum __ptrace_request request, void *addr, void *data, unsigned flags); Based on original ptrace, the following API changes where made: - Process identificator (pidfd) is now moved to start, this is aligned with pidfd_send_signal(int pidfd, ...) because potential future pidfd_* will have one thing in common: the pid identifier. I think is natural to have this argument upfront - Add an additional flags argument, not used now - but you never know All other arguments are identical compared to ptrace - no other modifications where made. Currently there are some pieces missing! This is just an early proposal for a new syscall. Still missing: - support for every architecture - re-use shared functions and move to common place - perf syscall registration - selftests - ...| Userspace Example: #define _GNU_SOURCE #include <errno.h> #include <sched.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/user.h> #include <sys/ptrace.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <linux/limits.h> #ifndef __NR_pidfd_ptrace #define __NR_pidfd_ptrace 439 #endif static inline long do_pidfd_ptrace(int pidfd, int request, void *addr, void *data, unsigned int flags) { #ifdef __NR_pidfd_ptrace return syscall(__NR_pidfd_ptrace, pidfd, request, addr, data, flags); #else return -ENOSYS; #endif } int main(int argc, char *argv[]) { int pid, pidfd, ret, sleep_time = 10; char pid_path[PATH_MAX]; struct user_regs_struct regs; if (argc < 2) { fprintf(stderr, "Usage: %s <pid>\n", argv[0]); goto err; } pid = atoi(argv[1]); sprintf(pid_path, "/proc/%d", pid); pidfd = open(pid_path, O_DIRECTORY | O_CLOEXEC); if (pidfd == -1) { fprintf(stderr, "failed to open %s\n", pid_path); goto err; } ret = do_pidfd_ptrace(pidfd, PTRACE_ATTACH, 0, 0, 0); if (ret < 0) { perror("do_pidfd_ptrace, PTRACE_ATTACH:"); goto err; } waitpid(pid, NULL, 0); ret = do_pidfd_ptrace(pidfd, PTRACE_GETREGS, NULL, ®s, 0); if (ret == -1) { perror("do_pidfd_ptrace, PTRACE_GETREGS:"); goto err; } printf("RIP: %llx\nRAX: %llx\nRCX: %llx\nRDX: %llx\nRSI: %llx\nRDI: %llx\n", regs.rip, regs.rax, regs.rcx, regs.rdx, regs.rsi, regs.rdi); fprintf(stdout, "stopping task for %d seconds\n", sleep_time); sleep(sleep_time); ret = do_pidfd_ptrace(pidfd, PTRACE_DETACH, 0, 0, 0); if (ret == -1) { perror("do_pidfd_ptrace, PTRACE_DETACH:"); goto err; } exit(EXIT_SUCCESS); err: exit(EXIT_FAILURE); } Cc: Christian Brauner <christian@brauner.io> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> --- v2: - fixed a OOPS in __x64_sys_pidfd_ptrace+0x1bf/0x220 (call to __put_task_struct()) - add userland example --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 4 +- kernel/ptrace.c | 126 ++++++++++++++++++++----- kernel/sys_ni.c | 1 + 6 files changed, 113 insertions(+), 22 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 54581ac671b4..593f7fab90eb 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -442,3 +442,4 @@ 435 i386 clone3 sys_clone3 437 i386 openat2 sys_openat2 438 i386 pidfd_getfd sys_pidfd_getfd +438 i386 pidfd_ptrace sys_pidfd_ptrace diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 37b844f839bc..cd76d8343510 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -359,6 +359,7 @@ 435 common clone3 sys_clone3 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd +439 common pidfd_ptrace sys_pidfd_ptrace # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 1815065d52f3..254b071a5334 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1003,6 +1003,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); +asmlinkage long sys_pidfd_ptrace(int pidfd, long request, unsigned long addr, + unsigned long data, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 3a3201e4618e..d62505742447 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3) __SYSCALL(__NR_openat2, sys_openat2) #define __NR_pidfd_getfd 438 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) +#define __NR_pidfd_ptrace 439 +__SYSCALL(__NR_pidfd_ptrace, sys_pidfd_ptrace) #undef __NR_syscalls -#define __NR_syscalls 439 +#define __NR_syscalls 440 /* * 32 bit systems traditionally used different diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 43d6179508d6..e9e7e3225b9a 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -29,6 +29,7 @@ #include <linux/regset.h> #include <linux/hw_breakpoint.h> #include <linux/cn_proc.h> +#include <linux/proc_fs.h> #include <linux/compat.h> #include <linux/sched/signal.h> @@ -1239,10 +1240,39 @@ int ptrace_request(struct task_struct *child, long request, #define arch_ptrace_attach(child) do { } while (0) #endif +static inline long ptrace_call(struct task_struct *task, long request, unsigned long addr, + unsigned long data) +{ + long ret; + + if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) { + ret = ptrace_attach(task, request, addr, data); + /* + * Some architectures need to do book-keeping after + * a ptrace attach. + */ + if (!ret) + arch_ptrace_attach(task); + goto out; + } + + ret = ptrace_check_attach(task, request == PTRACE_KILL || + request == PTRACE_INTERRUPT); + if (ret < 0) + goto out; + + ret = arch_ptrace(task, request, addr, data); + if (ret || request != PTRACE_DETACH) + ptrace_unfreeze_traced(task); + + out: + return ret; +} + SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr, unsigned long, data) { - struct task_struct *child; + struct task_struct *task; long ret; if (request == PTRACE_TRACEME) { @@ -1252,35 +1282,89 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr, goto out; } - child = find_get_task_by_vpid(pid); - if (!child) { + task = find_get_task_by_vpid(pid); + if (!task) { ret = -ESRCH; goto out; } - if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) { - ret = ptrace_attach(child, request, addr, data); - /* - * Some architectures need to do book-keeping after - * a ptrace attach. - */ + ret = ptrace_call(task, request, addr, data); + put_task_struct(task); +out: + return ret; +} + +static struct pid *pidfd_to_pid(const struct file *file) +{ + struct pid *pid; + + pid = pidfd_pid(file); + if (!IS_ERR(pid)) + return pid; + + return tgid_pidfd_to_pid(file); +} + +static bool access_pidfd_pidns(struct pid *pid) +{ + struct pid_namespace *active = task_active_pid_ns(current); + struct pid_namespace *p = ns_of_pid(pid); + + for (;;) { + if (!p) + return false; + if (p == active) + break; + p = p->parent; + } + + return true; +} + +SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr, + unsigned long, data, unsigned int, flags) +{ + long ret; + struct fd f; + struct pid *pid; + struct task_struct *task; + + /* Enforce flags be set to 0 until we add an extension. */ + if (flags) + return -EINVAL; + + if (request == PTRACE_TRACEME) { + ret = ptrace_traceme(); if (!ret) - arch_ptrace_attach(child); - goto out_put_task_struct; + arch_ptrace_attach(current); + goto out; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); - if (ret < 0) - goto out_put_task_struct; + f = fdget(pidfd); + if (!f.file) + return -EBADF; - ret = arch_ptrace(child, request, addr, data); - if (ret || request != PTRACE_DETACH) - ptrace_unfreeze_traced(child); + /* Is this a pidfd? */ + pid = pidfd_to_pid(f.file); + if (IS_ERR(pid)) { + ret = PTR_ERR(pid); + goto err; + } - out_put_task_struct: - put_task_struct(child); - out: + ret = -EINVAL; + if (!access_pidfd_pidns(pid)) + goto err; + + task = pid_task(pid, PIDTYPE_PID); + if (!task) { + ret = -EINVAL; + goto err; + } + + ret = ptrace_call(task, request, addr, data); +err: + fdput(f); +out: return ret; } diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 3b69a560a7ac..f7795294b8c4 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -166,6 +166,7 @@ COND_SYSCALL(delete_module); COND_SYSCALL(syslog); /* kernel/ptrace.c */ +COND_SYSCALL_COMPAT(pidfd_ptrace); /* kernel/sched/core.c */ -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer @ 2020-04-27 8:30 ` Arnd Bergmann 2020-04-27 9:00 ` Hagen Paul Pfeifer 2020-04-27 17:08 ` Christian Brauner 1 sibling, 1 reply; 17+ messages in thread From: Arnd Bergmann @ 2020-04-27 8:30 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: linux-kernel@vger.kernel.org, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch > arch/x86/entry/syscalls/syscall_32.tbl | 1 + > arch/x86/entry/syscalls/syscall_64.tbl | 1 + > include/linux/syscalls.h | 2 + > include/uapi/asm-generic/unistd.h | 4 +- When you add a new system call, please add it to all architectures. See the patches for the last few additions on how to do it, in particular the bit around adding the arm64 compat entry that is a bit tricky. It may be best to split out the patch changing all architectures from the one adding the new syscall. > +SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr, > + unsigned long, data, unsigned int, flags) > +{ When you add a new variant of ptrace, there also needs to be the corresponding COMPAT_SYSCLAL_DEFINE5(...) calling compat_ptrace_request(). If you want, you could unify the native and compat code paths more by merging compat_ptrace_request() into ptrace_request() and using in_compat_syscall() checks for the ones that are different. This also would best be done as a separate cleanup patch upfront. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 8:30 ` Arnd Bergmann @ 2020-04-27 9:00 ` Hagen Paul Pfeifer 0 siblings, 0 replies; 17+ messages in thread From: Hagen Paul Pfeifer @ 2020-04-27 9:00 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel@vger.kernel.org, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch > On April 27, 2020 10:30 AM Arnd Bergmann <arnd@arndb.de> wrote: Hey Arnd > When you add a new system call, please add it to all architectures. > See the patches for the last few additions on how to do it, in > particular the bit around adding the arm64 compat entry that is > a bit tricky. Yes, the patch was intended to be as an rough (but x86_64 working) RFC patch to basically check if there is interest in it. Or whether there are fundamental reasons against pidfd_ptrace(). If not I will prepare v3 with all input, sure! Thank you Arnd Hagen ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer 2020-04-27 8:30 ` Arnd Bergmann @ 2020-04-27 17:08 ` Christian Brauner 2020-04-27 17:52 ` Jann Horn 1 sibling, 1 reply; 17+ messages in thread From: Christian Brauner @ 2020-04-27 17:08 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: linux-kernel, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote: > Working on a safety-critical stress testing tool, using ptrace in an > rather uncommon way (stop, peeking memory, ...) for a bunch of > applications in an automated way I realized that once opened processes > where restarted and PIDs recycled. Resulting in monitoring and > manipulating the wrong processes. > > With the advent of pidfd we are now able to stick with one stable handle > to identifying processes exactly. We now have the ability to get this > race free. Sending signals now works like a charm, next step is to > extend the functionality also for ptrace. > > API: > long pidfd_ptrace(int pidfd, enum __ptrace_request request, > void *addr, void *data, unsigned flags); I'm in general not opposed to this if there's a clear need for this and users that are interested. But I think if people really prefer having this a new syscall then we should probably try to improve on the old one. Things that come to mind right away without doing a deep review are replacing the void *addr pointer with a dedicated struct ptract_args or union ptrace_args and a size argument. If we're not doing something like this or something more fundamental we can equally well either just duplicate all enums in the old ptrace syscall and append a _PIDFD to it where it makes sense. Christian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 17:08 ` Christian Brauner @ 2020-04-27 17:52 ` Jann Horn 2020-04-27 18:18 ` Eric W. Biederman 0 siblings, 1 reply; 17+ messages in thread From: Jann Horn @ 2020-04-27 17:52 UTC (permalink / raw) To: Christian Brauner Cc: Hagen Paul Pfeifer, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch On Mon, Apr 27, 2020 at 7:08 PM Christian Brauner <christian.brauner@ubuntu.com> wrote: > On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote: > > Working on a safety-critical stress testing tool, using ptrace in an > > rather uncommon way (stop, peeking memory, ...) for a bunch of > > applications in an automated way I realized that once opened processes > > where restarted and PIDs recycled. Resulting in monitoring and > > manipulating the wrong processes. > > > > With the advent of pidfd we are now able to stick with one stable handle > > to identifying processes exactly. We now have the ability to get this > > race free. Sending signals now works like a charm, next step is to > > extend the functionality also for ptrace. > > > > API: > > long pidfd_ptrace(int pidfd, enum __ptrace_request request, > > void *addr, void *data, unsigned flags); > > I'm in general not opposed to this if there's a clear need for this and > users that are interested. But I think if people really prefer having > this a new syscall then we should probably try to improve on the old > one. Things that come to mind right away without doing a deep review are > replacing the void *addr pointer with a dedicated struct ptract_args or > union ptrace_args and a size argument. If we're not doing something > like this or something more fundamental we can equally well either just > duplicate all enums in the old ptrace syscall and append a _PIDFD to it > where it makes sense. Yeah, it seems like just adding pidfd flavors of PTRACE_ATTACH and PTRACE_SEIZE should do the job. And if we do make a new syscall, there is a bunch of de-crufting that can be done... for example, just as some low-hanging fruit, a new ptrace API probably shouldn't have PTRACE_PEEKTEXT/PTRACE_PEEKDATA/PTRACE_POKETEXT/PTRACE_POKEDATA (we have /proc/$pid/mem for that, which is much saner than doing peek/poke in word-size units), and probably also shouldn't support all the weird arch-specific register-accessing requests (e.g. PTRACE_PEEKUSR/PTRACE_POKEUSR/PTRACE_GETREGS/PTRACE_SETREGS/PTRACE_GETFPREGS/...) that are nowadays accessible via PTRACE_GETREGSET/PTRACE_SETREGSET. (And there are also some more major changes that I think would be sensible; for example, it'd be neat if you could have notifications about the tracee delivered through a pollable file descriptor, and if you could get the kernel to tell you in each notification which type of ptrace stop you're dealing with (e.g. distinguishing syscall-entry-stop vs syscall-exit-stop), and it would be great to be able to directly inject syscalls into the child instead of having to figure out where a syscall instruction you can abuse is and then setting the instruction pointer to that, and it'd be helpful to be able to have multiple tracers attached to a single process so that you can e.g. have strace and gdb active on the same process concurrently...) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 17:52 ` Jann Horn @ 2020-04-27 18:18 ` Eric W. Biederman 2020-04-27 18:59 ` Hagen Paul Pfeifer 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-04-27 18:18 UTC (permalink / raw) To: Jann Horn Cc: Christian Brauner, Hagen Paul Pfeifer, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch Jann Horn <jannh@google.com> writes: > On Mon, Apr 27, 2020 at 7:08 PM Christian Brauner > <christian.brauner@ubuntu.com> wrote: >> On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote: >> > Working on a safety-critical stress testing tool, using ptrace in an >> > rather uncommon way (stop, peeking memory, ...) for a bunch of >> > applications in an automated way I realized that once opened processes >> > where restarted and PIDs recycled. Resulting in monitoring and >> > manipulating the wrong processes. >> > >> > With the advent of pidfd we are now able to stick with one stable handle >> > to identifying processes exactly. We now have the ability to get this >> > race free. Sending signals now works like a charm, next step is to >> > extend the functionality also for ptrace. >> > >> > API: >> > long pidfd_ptrace(int pidfd, enum __ptrace_request request, >> > void *addr, void *data, unsigned flags); >> >> I'm in general not opposed to this if there's a clear need for this and >> users that are interested. But I think if people really prefer having >> this a new syscall then we should probably try to improve on the old >> one. Things that come to mind right away without doing a deep review are >> replacing the void *addr pointer with a dedicated struct ptract_args or >> union ptrace_args and a size argument. If we're not doing something >> like this or something more fundamental we can equally well either just >> duplicate all enums in the old ptrace syscall and append a _PIDFD to it >> where it makes sense. > > Yeah, it seems like just adding pidfd flavors of PTRACE_ATTACH and > PTRACE_SEIZE should do the job. I am conflicted about that but I have to agree. Instead of duplicating everything it would be good enough to duplicate the once that cause the process to be attached to use. Then there would be no more pid races to worry about. > And if we do make a new syscall, there is a bunch of de-crufting that > can be done... for example, just as some low-hanging fruit, a new > ptrace API probably shouldn't have > PTRACE_PEEKTEXT/PTRACE_PEEKDATA/PTRACE_POKETEXT/PTRACE_POKEDATA (we > have /proc/$pid/mem for that, which is much saner than doing peek/poke > in word-size units), and probably also shouldn't support all the weird > arch-specific register-accessing requests (e.g. > PTRACE_PEEKUSR/PTRACE_POKEUSR/PTRACE_GETREGS/PTRACE_SETREGS/PTRACE_GETFPREGS/...) > that are nowadays accessible via PTRACE_GETREGSET/PTRACE_SETREGSET. > (And there are also some more major changes that I think would be > sensible; for example, it'd be neat if you could have notifications > about the tracee delivered through a pollable file descriptor, and if > you could get the kernel to tell you in each notification which type > of ptrace stop you're dealing with (e.g. distinguishing > syscall-entry-stop vs syscall-exit-stop), and it would be great to be > able to directly inject syscalls into the child instead of having to > figure out where a syscall instruction you can abuse is and then > setting the instruction pointer to that, and it'd be helpful to be > able to have multiple tracers attached to a single process so that you > can e.g. have strace and gdb active on the same process > concurrently...) How does this differ using the tracing related infrastructure we have for the kernel on a userspace process? I suspect augmenting the tracing infrastructure with the ability to set breakpoints and watchpoints (aka stopping userspace threads and processes might be a more fertile direction to go). But I agree either we want to just address the races in PTRACE_ATTACH and PTRACE_SIEZE or we want to take a good hard look at things. There is a good case for minimal changes because one of the cases that comes up is how much work will it take to change existing programs. But ultimately ptrace pretty much sucks so a very good set of test cases and documentation for what we want to implement would be a very good idea. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 18:18 ` Eric W. Biederman @ 2020-04-27 18:59 ` Hagen Paul Pfeifer 2020-04-27 20:08 ` Arnd Bergmann 0 siblings, 1 reply; 17+ messages in thread From: Hagen Paul Pfeifer @ 2020-04-27 18:59 UTC (permalink / raw) To: Eric W. Biederman Cc: Jann Horn, Christian Brauner, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]: >I am conflicted about that but I have to agree. Instead of >duplicating everything it would be good enough to duplicate the once >that cause the process to be attached to use. Then there would be no >more pid races to worry about. >How does this differ using the tracing related infrastructure we have >for the kernel on a userspace process? I suspect augmenting the tracing >infrastructure with the ability to set breakpoints and watchpoints (aka >stopping userspace threads and processes might be a more fertile >direction to go). > >But I agree either we want to just address the races in PTRACE_ATTACH >and PTRACE_SIEZE or we want to take a good hard look at things. > >There is a good case for minimal changes because one of the cases that >comes up is how much work will it take to change existing programs. But >ultimately ptrace pretty much sucks so a very good set of test cases and >documentation for what we want to implement would be a very good idea. Hey Eric, Jann, Christian, Arnd, thank you for your valuable input! IMHO I think we have exactly two choices here: a) we go with my patchset that is 100% ptrace feature compatible - except the pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is automatically extended and vice versa. Both APIs are feature identical without any headaches. b) leave ptrace completely behind us and design ptrace that we have always dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness, a speedy API to copy & modify large chunks of data, io_uring/epoll support and of course: pidfd based (missed likely thousands of other dreams) I think a solution in between is not worth the effort! It will not be compatible in any way for the userspace and the benefit will be negligible. Ptrace is horrible API - everybody knows that but developers get comfy with it. You find examples everywhere, why should we make it harder for the user for no or little benefit (except that stable handle with pidfd and some cleanups)? Any thoughts on this? Hagen ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 18:59 ` Hagen Paul Pfeifer @ 2020-04-27 20:08 ` Arnd Bergmann 2020-04-27 20:13 ` Christian Brauner 0 siblings, 1 reply; 17+ messages in thread From: Arnd Bergmann @ 2020-04-27 20:08 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: Eric W. Biederman, Jann Horn, Christian Brauner, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman On Mon, Apr 27, 2020 at 8:59 PM Hagen Paul Pfeifer <hagen@jauu.net> wrote: > > * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]: > > >I am conflicted about that but I have to agree. Instead of > >duplicating everything it would be good enough to duplicate the once > >that cause the process to be attached to use. Then there would be no > >more pid races to worry about. > > >How does this differ using the tracing related infrastructure we have > >for the kernel on a userspace process? I suspect augmenting the tracing > >infrastructure with the ability to set breakpoints and watchpoints (aka > >stopping userspace threads and processes might be a more fertile > >direction to go). > > > >But I agree either we want to just address the races in PTRACE_ATTACH > >and PTRACE_SIEZE or we want to take a good hard look at things. > > > >There is a good case for minimal changes because one of the cases that > >comes up is how much work will it take to change existing programs. But > >ultimately ptrace pretty much sucks so a very good set of test cases and > >documentation for what we want to implement would be a very good idea. > > Hey Eric, Jann, Christian, Arnd, > > thank you for your valuable input! IMHO I think we have exactly two choices > here: > > a) we go with my patchset that is 100% ptrace feature compatible - except the > pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is > automatically extended and vice versa. Both APIs are feature identical > without any headaches. > b) leave ptrace completely behind us and design ptrace that we have always > dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness, > a speedy API to copy & modify large chunks of data, io_uring/epoll support > and of course: pidfd based (missed likely thousands of other dreams) > > I think a solution in between is not worth the effort! It will not be > compatible in any way for the userspace and the benefit will be negligible. > Ptrace is horrible API - everybody knows that but developers get comfy with > it. You find examples everywhere, why should we make it harder for the user for > no or little benefit (except that stable handle with pidfd and some cleanups)? > > Any thoughts on this? The way I understood Jann was that instead of a new syscall that duplicates everything in ptrace(), there would only need to be a new ptrace request such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH but takes a pidfd as the second argument, perhaps setting the return value to the pid on success. Same for PTRACE_SEIZE. In effect this is not much different from your a), just a variation on the calling conventions. The main upside is that it avoids adding another ugly interface, the flip side is that it makes the existing one slightly worse by adding complexity. Arnd ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 20:08 ` Arnd Bergmann @ 2020-04-27 20:13 ` Christian Brauner 2020-04-28 0:45 ` Aleksa Sarai 0 siblings, 1 reply; 17+ messages in thread From: Christian Brauner @ 2020-04-27 20:13 UTC (permalink / raw) To: Arnd Bergmann Cc: Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman On Mon, Apr 27, 2020 at 10:08:03PM +0200, Arnd Bergmann wrote: > On Mon, Apr 27, 2020 at 8:59 PM Hagen Paul Pfeifer <hagen@jauu.net> wrote: > > > > * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]: > > > > >I am conflicted about that but I have to agree. Instead of > > >duplicating everything it would be good enough to duplicate the once > > >that cause the process to be attached to use. Then there would be no > > >more pid races to worry about. > > > > >How does this differ using the tracing related infrastructure we have > > >for the kernel on a userspace process? I suspect augmenting the tracing > > >infrastructure with the ability to set breakpoints and watchpoints (aka > > >stopping userspace threads and processes might be a more fertile > > >direction to go). > > > > > >But I agree either we want to just address the races in PTRACE_ATTACH > > >and PTRACE_SIEZE or we want to take a good hard look at things. > > > > > >There is a good case for minimal changes because one of the cases that > > >comes up is how much work will it take to change existing programs. But > > >ultimately ptrace pretty much sucks so a very good set of test cases and > > >documentation for what we want to implement would be a very good idea. > > > > Hey Eric, Jann, Christian, Arnd, > > > > thank you for your valuable input! IMHO I think we have exactly two choices > > here: > > > > a) we go with my patchset that is 100% ptrace feature compatible - except the > > pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is > > automatically extended and vice versa. Both APIs are feature identical > > without any headaches. > > b) leave ptrace completely behind us and design ptrace that we have always > > dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness, > > a speedy API to copy & modify large chunks of data, io_uring/epoll support > > and of course: pidfd based (missed likely thousands of other dreams) > > > > I think a solution in between is not worth the effort! It will not be > > compatible in any way for the userspace and the benefit will be negligible. > > Ptrace is horrible API - everybody knows that but developers get comfy with > > it. You find examples everywhere, why should we make it harder for the user for > > no or little benefit (except that stable handle with pidfd and some cleanups)? > > > > Any thoughts on this? > > The way I understood Jann was that instead of a new syscall that duplicates > everything in ptrace(), there would only need to be a new ptrace request > such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH > but takes a pidfd as the second argument, perhaps setting the return value > to the pid on success. Same for PTRACE_SEIZE. That was my initial suggestion, yes. Any enum that identifies a target by a pid will get a new _PIDFD version and the pidfd is passed as pid_t argument. That should work and is similar to what I did for waitid() P_PIDFD. Realistically, there shouldn't be any system where pid_t is smaller than an int that we care about. > > In effect this is not much different from your a), just a variation on the > calling conventions. The main upside is that it avoids adding another > ugly interface, the flip side is that it makes the existing one slightly worse > by adding complexity. Basically, if a new syscall than please a proper re-design with real benefits. In the meantime we could make due with the _PIDFD variant. And then if someone wants to do the nitty gritty work of adding a ptrace variant purely based on pidfds and with a better api and features that e.g. Jann pointed out then by all means, please do so. I'm sure we would all welcome this as well. Christian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-27 20:13 ` Christian Brauner @ 2020-04-28 0:45 ` Aleksa Sarai 2020-04-28 1:36 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Aleksa Sarai @ 2020-04-28 0:45 UTC (permalink / raw) To: Christian Brauner Cc: Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman [-- Attachment #1: Type: text/plain, Size: 1949 bytes --] On 2020-04-27, Christian Brauner <christian.brauner@ubuntu.com> wrote: > On Mon, Apr 27, 2020 at 10:08:03PM +0200, Arnd Bergmann wrote: > > The way I understood Jann was that instead of a new syscall that duplicates > > everything in ptrace(), there would only need to be a new ptrace request > > such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH > > but takes a pidfd as the second argument, perhaps setting the return value > > to the pid on success. Same for PTRACE_SEIZE. > > That was my initial suggestion, yes. Any enum that identifies a target > by a pid will get a new _PIDFD version and the pidfd is passed as pid_t > argument. That should work and is similar to what I did for waitid() > P_PIDFD. Realistically, there shouldn't be any system where pid_t is > smaller than an int that we care about. > > > In effect this is not much different from your a), just a variation on the > > calling conventions. The main upside is that it avoids adding another > > ugly interface, the flip side is that it makes the existing one slightly worse > > by adding complexity. > > Basically, if a new syscall than please a proper re-design with real > benefits. > > In the meantime we could make due with the _PIDFD variant. And then if > someone wants to do the nitty gritty work of adding a ptrace variant > purely based on pidfds and with a better api and features that e.g. Jann > pointed out then by all means, please do so. I'm sure we would all > welcome this as well. I agree. It would be a shame to add a new ptrace syscall and not take the opportunity to fix the multitude of problems with the existing API. But that's a Pandora's box which we shouldn't open unless we want to wait a long time to get an API everyone is okay with -- a pretty high price to just get pidfds support in ptrace. -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 0:45 ` Aleksa Sarai @ 2020-04-28 1:36 ` Linus Torvalds 2020-04-28 4:17 ` Andy Lutomirski 0 siblings, 1 reply; 17+ messages in thread From: Linus Torvalds @ 2020-04-28 1:36 UTC (permalink / raw) To: Aleksa Sarai Cc: Christian Brauner, Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman On Mon, Apr 27, 2020 at 5:46 PM Aleksa Sarai <cyphar@cyphar.com> wrote: > > I agree. It would be a shame to add a new ptrace syscall and not take > the opportunity to fix the multitude of problems with the existing API. > But that's a Pandora's box which we shouldn't open unless we want to > wait a long time to get an API everyone is okay with -- a pretty high > price to just get pidfds support in ptrace. We should really be very very careful with some "smarter ptrace". We've had _so_ many security issues with ptrace that it's not even funny. And that's ignoring all the practical issues we've had. I would definitely not want to have anything that looks like ptrace AT ALL using pidfd. If we have a file descriptor to specify the target process, then we should probably take advantage of that file descriptor to actually make it more of a asynchronous interface that doesn't cause the kinds of deadlocks that we've had with ptrace. The synchronous nature of ptrace() means that not only do we have those nasty deadlocks, it's also very very expensive to use. It also has some other fundamental problems, like the whole "take over parent" and the SIGCHLD behavior. It also is hard to ptrace a ptracer. Which is annoying when you're debugging gdb or strace or whatever. So I think the thing to do is ask the gdb (and strace) people if they have any _very_ particular painpoints that we could perhaps help with. And then very carefully think things through and not repeat all the mistakes ptrace did. I'm not very optimistic. Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 1:36 ` Linus Torvalds @ 2020-04-28 4:17 ` Andy Lutomirski 2020-04-28 4:28 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Andy Lutomirski @ 2020-04-28 4:17 UTC (permalink / raw) To: Linus Torvalds Cc: Aleksa Sarai, Christian Brauner, Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman > On Apr 27, 2020, at 6:36 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, Apr 27, 2020 at 5:46 PM Aleksa Sarai <cyphar@cyphar.com> wrote: >> >> I agree. It would be a shame to add a new ptrace syscall and not take >> the opportunity to fix the multitude of problems with the existing API. >> But that's a Pandora's box which we shouldn't open unless we want to >> wait a long time to get an API everyone is okay with -- a pretty high >> price to just get pidfds support in ptrace. > > We should really be very very careful with some "smarter ptrace". > We've had _so_ many security issues with ptrace that it's not even > funny. > > And that's ignoring all the practical issues we've had. > > I would definitely not want to have anything that looks like ptrace AT > ALL using pidfd. If we have a file descriptor to specify the target > process, then we should probably take advantage of that file > descriptor to actually make it more of a asynchronous interface that > doesn't cause the kinds of deadlocks that we've had with ptrace. > > The synchronous nature of ptrace() means that not only do we have > those nasty deadlocks, it's also very very expensive to use. It also > has some other fundamental problems, like the whole "take over parent" > and the SIGCHLD behavior. > > It also is hard to ptrace a ptracer. Which is annoying when you're > debugging gdb or strace or whatever. > > So I think the thing to do is ask the gdb (and strace) people if they > have any _very_ particular painpoints that we could perhaps help with. > > And then very carefully think things through and not repeat all the > mistakes ptrace did. > > I'm not very optimistic. I hate to say this, but I’m not convinced that asking the gdb folks is the right approach. GDB has an ancient architecture and is *incredibly* buggy. I’m sure ptrace is somewhere on the pain point list, but I suspect it’s utterly dwarfed by everything else. Maybe the LLDB people would have a better perspective? The rr folks would be a good bet, too. Or, and I know this is sacrilege, the VSCode people? I think one requirement for a better ptrace is that it should work if you try to debug, simultaneously, a debugger and its debugee. Maybe not perfectly, but it should work. And you should be able to debug init. Another major pain point I’ve seen is compat. A 64-bit debugger should be able to debug a program that switches back and forth between 32-bit and 64-bit. A debugger that is entirely unaware of a set of registers should be able to debug a process using those registers. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 4:17 ` Andy Lutomirski @ 2020-04-28 4:28 ` Linus Torvalds 2020-04-28 6:39 ` Hagen Paul Pfeifer 2020-04-28 8:21 ` Christian Brauner 0 siblings, 2 replies; 17+ messages in thread From: Linus Torvalds @ 2020-04-28 4:28 UTC (permalink / raw) To: Andy Lutomirski Cc: Aleksa Sarai, Christian Brauner, Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman On Mon, Apr 27, 2020 at 9:17 PM Andy Lutomirski <luto@amacapital.net> wrote: > > I hate to say this, but I’m not convinced that asking the gdb folks is > the right approach. GDB has an ancient architecture and is > *incredibly* buggy. I’m sure ptrace is somewhere on the pain point > list, but I suspect it’s utterly dwarfed by everything else. You may be right. However, if gdbn isn't going to use it, then I seriously don't think it's worth changing much. It might be worth looking at people who don't use ptrace() for debugging, but for "incidental" reasons. IOW sandboxing, tracing, things like that. Maybe those people want things that are simpler and don't actually need the kinds of hard serialization that ptrace() wants. I'd rather add a few really simple things that might not be a full complement of operations for a debugger, but exactly because they aren't a full debugger, maybe they are things that we can tell are obviously secure and simple? Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 4:28 ` Linus Torvalds @ 2020-04-28 6:39 ` Hagen Paul Pfeifer 2020-04-28 7:45 ` Christian Brauner 2020-04-28 8:21 ` Christian Brauner 1 sibling, 1 reply; 17+ messages in thread From: Hagen Paul Pfeifer @ 2020-04-28 6:39 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Aleksa Sarai, Christian Brauner, Arnd Bergmann, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman * Linus Torvalds | 2020-04-27 21:28:14 [-0700]: >> I hate to say this, but I’m not convinced that asking the gdb folks is >> the right approach. GDB has an ancient architecture and is >> *incredibly* buggy. I’m sure ptrace is somewhere on the pain point >> list, but I suspect it’s utterly dwarfed by everything else. > >You may be right. However, if gdbn isn't going to use it, then I >seriously don't think it's worth changing much. > >It might be worth looking at people who don't use ptrace() for >debugging, but for "incidental" reasons. IOW sandboxing, tracing, >things like that. > >Maybe those people want things that are simpler and don't actually >need the kinds of hard serialization that ptrace() wants. > >I'd rather add a few really simple things that might not be a full >complement of operations for a debugger, but exactly because they >aren't a full debugger, maybe they are things that we can tell are >obviously secure and simple? Okay, to sum up the the whole discussion: we go forward with Jann's proposal by simple adding PTRACE_ATTACH_PIDFD and friends. This is the minimal invasive solution and the risk of an potenial security problem is almost not present[TM]. Changing the whole ptrace API is a different beast. I rather believe that I see Linus Linux successor rather than a ptrace successor. I am fine with PTRACE_ATTACH_PIDFD! Hagen ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 6:39 ` Hagen Paul Pfeifer @ 2020-04-28 7:45 ` Christian Brauner 0 siblings, 0 replies; 17+ messages in thread From: Christian Brauner @ 2020-04-28 7:45 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: Linus Torvalds, Andy Lutomirski, Aleksa Sarai, Arnd Bergmann, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman On Tue, Apr 28, 2020 at 08:39:35AM +0200, Hagen Paul Pfeifer wrote: > * Linus Torvalds | 2020-04-27 21:28:14 [-0700]: > > >> I hate to say this, but I’m not convinced that asking the gdb folks is > >> the right approach. GDB has an ancient architecture and is > >> *incredibly* buggy. I’m sure ptrace is somewhere on the pain point > >> list, but I suspect it’s utterly dwarfed by everything else. > > > >You may be right. However, if gdbn isn't going to use it, then I > >seriously don't think it's worth changing much. > > > >It might be worth looking at people who don't use ptrace() for > >debugging, but for "incidental" reasons. IOW sandboxing, tracing, > >things like that. > > > >Maybe those people want things that are simpler and don't actually > >need the kinds of hard serialization that ptrace() wants. > > > >I'd rather add a few really simple things that might not be a full > >complement of operations for a debugger, but exactly because they > >aren't a full debugger, maybe they are things that we can tell are > >obviously secure and simple? > > Okay, to sum up the the whole discussion: we go forward with Jann's proposal > by simple adding PTRACE_ATTACH_PIDFD and friends. This is the minimal invasive > solution and the risk of an potenial security problem is almost not present[TM]. > > Changing the whole ptrace API is a different beast. I rather believe that I > see Linus Linux successor rather than a ptrace successor. > > I am fine with PTRACE_ATTACH_PIDFD! If this is enough for you use-case then we should make due with my initial suggestion, yes. I'd be fine with adding this variant. I initially thought that we'd likely would need to support a few more but I don't think we want to actually; there's a bunch of crazy stuff in there. Christian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall 2020-04-28 4:28 ` Linus Torvalds 2020-04-28 6:39 ` Hagen Paul Pfeifer @ 2020-04-28 8:21 ` Christian Brauner 1 sibling, 0 replies; 17+ messages in thread From: Christian Brauner @ 2020-04-28 8:21 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Aleksa Sarai, Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list, Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch, Greg Kroah-Hartman On Mon, Apr 27, 2020 at 09:28:14PM -0700, Linus Torvalds wrote: > On Mon, Apr 27, 2020 at 9:17 PM Andy Lutomirski <luto@amacapital.net> wrote: > > > > I hate to say this, but I’m not convinced that asking the gdb folks is > > the right approach. GDB has an ancient architecture and is > > *incredibly* buggy. I’m sure ptrace is somewhere on the pain point > > list, but I suspect it’s utterly dwarfed by everything else. > > You may be right. However, if gdbn isn't going to use it, then I > seriously don't think it's worth changing much. > > It might be worth looking at people who don't use ptrace() for > debugging, but for "incidental" reasons. IOW sandboxing, tracing, > things like that. > > Maybe those people want things that are simpler and don't actually > need the kinds of hard serialization that ptrace() wants. > > I'd rather add a few really simple things that might not be a full > complement of operations for a debugger, but exactly because they > aren't a full debugger, maybe they are things that we can tell are > obviously secure and simple? I think the biggest non-anecdotal user of ptrace() besides debuggers is actually criu (and strace of course). They use it to inject parasite code (their phrasing not mine) into another task to handle restoring the parts of a task that can't be restored from the outside. Looking through their repo they make quite a bit of use of ptrace functionality including some arch specific bits: PTRACE_GETREGSET PTRACE_GETFPREGS PTRACE_PEEKUSER PTRACE_POKEUSER PTRACE_CONT PTRACE_SETREGSET PTRACE_GETVFPREGS /* arm/arm64 */ PTRACE_GETVRREGS /* powerpc */ PTRACE_GETVSRREGS /* powerpc */ PTRACE_EVENT_STOP PTRACE_GETSIGMASK PTRACE_INTERRUPT PTRACE_DETACH PTRACE_GETSIGINFO PTRACE_SEIZE PTRACE_SETSIGMASK PTRACE_SI_EVENT PTRACE_SYSCALL PTRACE_SETOPTIONS PTRACE_ATTACH PTRACE_O_SUSPEND_SECCOMP PTRACE_PEEKSIGINFO PTRACE_SECCOMP_GET_FILTER PTRACE_SECCOMP_GET_METADATA So I guess strace and criu would be the ones to go and ask and if they don't care enough we already need to start squinting for other larg-ish users. proot comes to mind https://github.com/proot-me/proot (From personal experience, most of the time when ptrace is used in a non-debugger codebase it's either to plug a security hole exploitable through ptracing the task and the fix is ptracing that very task to prevent the attacker from ptracing it (where non-dumpability alone doesn't cut it) or the idea is dropped immediately to not lose the ability to use a debugger on the program.) Christian ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-04-28 8:21 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-04-26 13:01 [RFC] ptrace, pidfd: add pidfd_ptrace syscall Hagen Paul Pfeifer 2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer 2020-04-27 8:30 ` Arnd Bergmann 2020-04-27 9:00 ` Hagen Paul Pfeifer 2020-04-27 17:08 ` Christian Brauner 2020-04-27 17:52 ` Jann Horn 2020-04-27 18:18 ` Eric W. Biederman 2020-04-27 18:59 ` Hagen Paul Pfeifer 2020-04-27 20:08 ` Arnd Bergmann 2020-04-27 20:13 ` Christian Brauner 2020-04-28 0:45 ` Aleksa Sarai 2020-04-28 1:36 ` Linus Torvalds 2020-04-28 4:17 ` Andy Lutomirski 2020-04-28 4:28 ` Linus Torvalds 2020-04-28 6:39 ` Hagen Paul Pfeifer 2020-04-28 7:45 ` Christian Brauner 2020-04-28 8:21 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).