[RFC] ptrace, pidfd: add pidfd

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] ptrace, pidfd: add pidfd_ptrace syscall
@ 2020-04-26 13:01 Hagen Paul Pfeifer
  2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer
  0 siblings, 1 reply; 17+ messages in thread
From: Hagen Paul Pfeifer @ 2020-04-26 13:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Florian Weimer, Al Viro, Hagen Paul Pfeifer, Christian Brauner,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells,
	Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
	Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch

Working on a safety-critical stress testing tool, using ptrace in an
rather uncommon way (stop, peeking memory, ...) for a bunch of
applications in an automated way I realized that once opened processes
where restarted and PIDs recycled.  Resulting in monitoring and
manipulating the wrong processes.

With the advent of pidfd we are now able to stick with one stable handle
to identifying processes exactly. We now have the ability to get this
race free. Sending signals now works like a charm, next step is to
extend the functionality also for ptrace.

API:
         long pidfd_ptrace(int pidfd, enum __ptrace_request request,
                           void *addr, void *data, unsigned flags);

Based on original ptrace, the following API changes where made:

- Process identificator (pidfd) is now moved as first argument, this is aligned
  with pidfd_send_signal(int pidfd, ...) because potential future pidfd_* will have
  one thing in common: the pid identifier. I think is natural to have
  this argument upfront
- Add an additional flags argument, not used now - but you never know

All other arguments are identical compared to ptrace - no other
modifications where made.

Currently there are some pieces missing! This is just an early proposal
for a new syscall. Still missing:
- support for every architecture
- re-use shared functions and move to common place
- perf syscall registration
- selftests
- ...

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Christian Brauner <christian@brauner.io>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/syscalls.h               |   2 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/ptrace.c                        | 129 ++++++++++++++++++++-----
 kernel/sys_ni.c                        |   1 +
 6 files changed, 115 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54581ac671b4..593f7fab90eb 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -442,3 +442,4 @@
 435	i386	clone3			sys_clone3
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
+438	i386	pidfd_ptrace		sys_pidfd_ptrace
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc..cd76d8343510 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,7 @@
 435	common	clone3			sys_clone3
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
+439	common	pidfd_ptrace		sys_pidfd_ptrace
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..254b071a5334 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1003,6 +1003,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_pidfd_ptrace(int pidfd, long request, unsigned long addr,
+		                 unsigned long data, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..64749a6f156e 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
+#define __NR_pidfd_getfd 439
+__SYSCALL(__NR_pidfd_ptrace, sys_pidfd_ptrace)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..8f4e99247742 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -29,6 +29,7 @@
 #include <linux/regset.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/cn_proc.h>
+#include <linux/proc_fs.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 
@@ -1239,48 +1240,132 @@ int ptrace_request(struct task_struct *child, long request,
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+static inline long ptrace_call(struct task_struct *task, long request, unsigned long addr,
+		               unsigned long data)
+{
+	long ret;
+
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(task, request, addr, data);
+		/*
+		 * Some architectures need to do book-keeping after
+		 * a ptrace attach.
+		 */
+		if (!ret)
+			arch_ptrace_attach(task);
+		goto out;
+	}
+
+	ret = ptrace_check_attach(task, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
+	if (ret < 0)
+		goto out;
+
+	ret = arch_ptrace(task, request, addr, data);
+	if (ret || request != PTRACE_DETACH)
+		ptrace_unfreeze_traced(task);
+
+ out:
+	put_task_struct(task);
+	return ret;
+}
+
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
-	struct task_struct *child;
+	struct task_struct *task;
 	long ret;
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
 		if (!ret)
 			arch_ptrace_attach(current);
-		goto out;
+		return ret;
 	}
 
-	child = find_get_task_by_vpid(pid);
-	if (!child) {
+	task = find_get_task_by_vpid(pid);
+	if (!task) {
 		ret = -ESRCH;
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
-		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
+
+	ret = ptrace_call(task, request, addr, data);
+out:
+	return ret;
+}
+
+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	struct pid *pid;
+
+	pid = pidfd_pid(file);
+	if (!IS_ERR(pid))
+		return pid;
+
+	return tgid_pidfd_to_pid(file);
+}
+
+static bool access_pidfd_pidns(struct pid *pid)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *p = ns_of_pid(pid);
+
+	for (;;) {
+		if (!p)
+			return false;
+		if (p == active)
+			break;
+		p = p->parent;
+	}
+
+	return true;
+}
+
+SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr,
+		unsigned long, data, unsigned int, flags)
+{
+	long ret;
+	struct fd f;
+	struct pid *pid;
+	struct task_struct *task;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags)
+		return -EINVAL;
+
+	if (request == PTRACE_TRACEME) {
+		ret = ptrace_traceme();
 		if (!ret)
-			arch_ptrace_attach(child);
-		goto out_put_task_struct;
+			arch_ptrace_attach(current);
+		goto out;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
-	if (ret < 0)
-		goto out_put_task_struct;
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
 
-	ret = arch_ptrace(child, request, addr, data);
-	if (ret || request != PTRACE_DETACH)
-		ptrace_unfreeze_traced(child);
+	/* Is this a pidfd? */
+	pid = pidfd_to_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto err;
+	}
 
- out_put_task_struct:
-	put_task_struct(child);
- out:
+	ret = -EINVAL;
+	if (!access_pidfd_pidns(pid))
+		goto err;
+
+	task = pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = ptrace_call(task, request, addr, data);
+err:
+	fdput(f);
+out:
 	return ret;
 }
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..f7795294b8c4 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -166,6 +166,7 @@ COND_SYSCALL(delete_module);
 COND_SYSCALL(syslog);
 
 /* kernel/ptrace.c */
+COND_SYSCALL_COMPAT(pidfd_ptrace);
 
 /* kernel/sched/core.c */
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-26 13:01 [RFC] ptrace, pidfd: add pidfd_ptrace syscall Hagen Paul Pfeifer
@ 2020-04-26 16:34 ` Hagen Paul Pfeifer
  2020-04-27  8:30   ` Arnd Bergmann
  2020-04-27 17:08   ` Christian Brauner
  0 siblings, 2 replies; 17+ messages in thread
From: Hagen Paul Pfeifer @ 2020-04-26 16:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Florian Weimer, Al Viro, Hagen Paul Pfeifer, Christian Brauner,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells,
	Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
	Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch

Working on a safety-critical stress testing tool, using ptrace in an
rather uncommon way (stop, peeking memory, ...) for a bunch of
applications in an automated way I realized that once opened processes
where restarted and PIDs recycled.  Resulting in monitoring and
manipulating the wrong processes.

With the advent of pidfd we are now able to stick with one stable handle
to identifying processes exactly. We now have the ability to get this
race free. Sending signals now works like a charm, next step is to
extend the functionality also for ptrace.

API:
         long pidfd_ptrace(int pidfd, enum __ptrace_request request,
                           void *addr, void *data, unsigned flags);

Based on original ptrace, the following API changes where made:

- Process identificator (pidfd) is now moved to start, this is aligned
  with pidfd_send_signal(int pidfd, ...) because potential future pidfd_* will have
  one thing in common: the pid identifier. I think is natural to have
  this argument upfront
- Add an additional flags argument, not used now - but you never know

All other arguments are identical compared to ptrace - no other
modifications where made.

Currently there are some pieces missing! This is just an early proposal
for a new syscall. Still missing:
- support for every architecture
- re-use shared functions and move to common place
- perf syscall registration
- selftests
- ...|

Userspace Example:

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <linux/limits.h>

#ifndef __NR_pidfd_ptrace
#define __NR_pidfd_ptrace 439
#endif

static inline long do_pidfd_ptrace(int pidfd, int request, void *addr, void *data, unsigned int flags)
{
#ifdef __NR_pidfd_ptrace
        return syscall(__NR_pidfd_ptrace, pidfd, request, addr, data, flags);
#else
        return -ENOSYS;
#endif
}

int main(int argc, char *argv[])
{
	int pid, pidfd, ret, sleep_time = 10;
	char pid_path[PATH_MAX];
	struct user_regs_struct regs;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
		goto err;
	}
	pid = atoi(argv[1]);

	sprintf(pid_path, "/proc/%d", pid);
	pidfd = open(pid_path, O_DIRECTORY | O_CLOEXEC);
	if (pidfd == -1) {
		fprintf(stderr, "failed to open %s\n", pid_path);
		goto err;
	}

	ret = do_pidfd_ptrace(pidfd, PTRACE_ATTACH, 0, 0, 0);
	if (ret < 0) {
		perror("do_pidfd_ptrace, PTRACE_ATTACH:");
		goto err;
	}
	waitpid(pid, NULL, 0);
	ret = do_pidfd_ptrace(pidfd, PTRACE_GETREGS, NULL, &regs, 0);
	if (ret == -1) {
		perror("do_pidfd_ptrace, PTRACE_GETREGS:");
		goto err;
	}
	printf("RIP: %llx\nRAX: %llx\nRCX: %llx\nRDX: %llx\nRSI: %llx\nRDI: %llx\n",
	       regs.rip, regs.rax, regs.rcx, regs.rdx, regs.rsi, regs.rdi);
	fprintf(stdout, "stopping task for %d seconds\n",  sleep_time);
	sleep(sleep_time);
	ret = do_pidfd_ptrace(pidfd, PTRACE_DETACH, 0, 0, 0);
	if (ret == -1) {
		perror("do_pidfd_ptrace, PTRACE_DETACH:");
		goto err;
	}

	exit(EXIT_SUCCESS);
err:
	exit(EXIT_FAILURE);
}


Cc: Christian Brauner <christian@brauner.io>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---

v2:
- fixed a OOPS in __x64_sys_pidfd_ptrace+0x1bf/0x220 (call to __put_task_struct())
- add userland example

---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/syscalls.h               |   2 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/ptrace.c                        | 126 ++++++++++++++++++++-----
 kernel/sys_ni.c                        |   1 +
 6 files changed, 113 insertions(+), 22 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54581ac671b4..593f7fab90eb 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -442,3 +442,4 @@
 435	i386	clone3			sys_clone3
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
+438	i386	pidfd_ptrace		sys_pidfd_ptrace
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc..cd76d8343510 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,7 @@
 435	common	clone3			sys_clone3
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
+439	common	pidfd_ptrace		sys_pidfd_ptrace
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..254b071a5334 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1003,6 +1003,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_pidfd_ptrace(int pidfd, long request, unsigned long addr,
+		                 unsigned long data, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..d62505742447 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
+#define __NR_pidfd_ptrace 439
+__SYSCALL(__NR_pidfd_ptrace, sys_pidfd_ptrace)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..e9e7e3225b9a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -29,6 +29,7 @@
 #include <linux/regset.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/cn_proc.h>
+#include <linux/proc_fs.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 
@@ -1239,10 +1240,39 @@ int ptrace_request(struct task_struct *child, long request,
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+static inline long ptrace_call(struct task_struct *task, long request, unsigned long addr,
+		               unsigned long data)
+{
+	long ret;
+
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(task, request, addr, data);
+		/*
+		 * Some architectures need to do book-keeping after
+		 * a ptrace attach.
+		 */
+		if (!ret)
+			arch_ptrace_attach(task);
+		goto out;
+	}
+
+	ret = ptrace_check_attach(task, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
+	if (ret < 0)
+		goto out;
+
+	ret = arch_ptrace(task, request, addr, data);
+	if (ret || request != PTRACE_DETACH)
+		ptrace_unfreeze_traced(task);
+
+ out:
+	return ret;
+}
+
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
-	struct task_struct *child;
+	struct task_struct *task;
 	long ret;
 
 	if (request == PTRACE_TRACEME) {
@@ -1252,35 +1282,89 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out;
 	}
 
-	child = find_get_task_by_vpid(pid);
-	if (!child) {
+	task = find_get_task_by_vpid(pid);
+	if (!task) {
 		ret = -ESRCH;
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
-		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
+	ret = ptrace_call(task, request, addr, data);
+	put_task_struct(task);
+out:
+	return ret;
+}
+
+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	struct pid *pid;
+
+	pid = pidfd_pid(file);
+	if (!IS_ERR(pid))
+		return pid;
+
+	return tgid_pidfd_to_pid(file);
+}
+
+static bool access_pidfd_pidns(struct pid *pid)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *p = ns_of_pid(pid);
+
+	for (;;) {
+		if (!p)
+			return false;
+		if (p == active)
+			break;
+		p = p->parent;
+	}
+
+	return true;
+}
+
+SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr,
+		unsigned long, data, unsigned int, flags)
+{
+	long ret;
+	struct fd f;
+	struct pid *pid;
+	struct task_struct *task;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags)
+		return -EINVAL;
+
+	if (request == PTRACE_TRACEME) {
+		ret = ptrace_traceme();
 		if (!ret)
-			arch_ptrace_attach(child);
-		goto out_put_task_struct;
+			arch_ptrace_attach(current);
+		goto out;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
-	if (ret < 0)
-		goto out_put_task_struct;
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
 
-	ret = arch_ptrace(child, request, addr, data);
-	if (ret || request != PTRACE_DETACH)
-		ptrace_unfreeze_traced(child);
+	/* Is this a pidfd? */
+	pid = pidfd_to_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto err;
+	}
 
- out_put_task_struct:
-	put_task_struct(child);
- out:
+	ret = -EINVAL;
+	if (!access_pidfd_pidns(pid))
+		goto err;
+
+	task = pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = ptrace_call(task, request, addr, data);
+err:
+	fdput(f);
+out:
 	return ret;
 }
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..f7795294b8c4 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -166,6 +166,7 @@ COND_SYSCALL(delete_module);
 COND_SYSCALL(syslog);
 
 /* kernel/ptrace.c */
+COND_SYSCALL_COMPAT(pidfd_ptrace);
 
 /* kernel/sched/core.c */
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer
@ 2020-04-27  8:30   ` Arnd Bergmann
  2020-04-27  9:00     ` Hagen Paul Pfeifer
  2020-04-27 17:08   ` Christian Brauner
  1 sibling, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2020-04-27  8:30 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: linux-kernel@vger.kernel.org, Florian Weimer, Al Viro,
	Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells,
	Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch

>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  include/linux/syscalls.h               |   2 +
>  include/uapi/asm-generic/unistd.h      |   4 +-

When you add a new system call, please add it to all architectures.
See the patches for the last few additions on how to do it, in
particular the bit around adding the arm64 compat entry that is
a bit tricky.

It may be best to split out the patch changing all architectures from
the one adding the new syscall.

> +SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr,
> +               unsigned long, data, unsigned int, flags)
> +{

When you add a new variant of ptrace, there also needs to be the
corresponding COMPAT_SYSCLAL_DEFINE5(...) calling
compat_ptrace_request().

If you want, you could unify the native and compat code paths more by
merging compat_ptrace_request() into ptrace_request() and using
in_compat_syscall() checks for the ones that are different. This also
would best be done as a separate cleanup patch upfront.

      Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27  8:30   ` Arnd Bergmann
@ 2020-04-27  9:00     ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 17+ messages in thread
From: Hagen Paul Pfeifer @ 2020-04-27  9:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel@vger.kernel.org, Florian Weimer, Al Viro,
	Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Brian Gerst, Sami Tolvanen, David Howells,
	Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch

> On April 27, 2020 10:30 AM Arnd Bergmann <arnd@arndb.de> wrote:

Hey Arnd

> When you add a new system call, please add it to all architectures.
> See the patches for the last few additions on how to do it, in
> particular the bit around adding the arm64 compat entry that is
> a bit tricky.

Yes, the patch was intended to be as an rough (but x86_64 working)
RFC patch to basically check if there is interest in it. Or whether
there are fundamental reasons against pidfd_ptrace().

If not I will prepare v3 with all input, sure! Thank you Arnd

Hagen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer
  2020-04-27  8:30   ` Arnd Bergmann
@ 2020-04-27 17:08   ` Christian Brauner
  2020-04-27 17:52     ` Jann Horn
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Brauner @ 2020-04-27 17:08 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: linux-kernel, Florian Weimer, Al Viro, Christian Brauner,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells,
	Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, Eric W . Biederman,
	Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api, linux-arch

On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote:
> Working on a safety-critical stress testing tool, using ptrace in an
> rather uncommon way (stop, peeking memory, ...) for a bunch of
> applications in an automated way I realized that once opened processes
> where restarted and PIDs recycled.  Resulting in monitoring and
> manipulating the wrong processes.
> 
> With the advent of pidfd we are now able to stick with one stable handle
> to identifying processes exactly. We now have the ability to get this
> race free. Sending signals now works like a charm, next step is to
> extend the functionality also for ptrace.
> 
> API:
>          long pidfd_ptrace(int pidfd, enum __ptrace_request request,
>                            void *addr, void *data, unsigned flags);

I'm in general not opposed to this if there's a clear need for this and
users that are interested. But I think if people really prefer having
this a new syscall then we should probably try to improve on the old
one. Things that come to mind right away without doing a deep review are
replacing the void *addr pointer with a dedicated struct ptract_args or
union ptrace_args and a size argument. If we're not doing something
like this or something more fundamental we can equally well either just
duplicate all enums in the old ptrace syscall and append a _PIDFD to it
where it makes sense.

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 17:08   ` Christian Brauner
@ 2020-04-27 17:52     ` Jann Horn
  2020-04-27 18:18       ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Jann Horn @ 2020-04-27 17:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Hagen Paul Pfeifer, kernel list, Florian Weimer, Al Viro,
	Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Arnd Bergmann, Brian Gerst, Sami Tolvanen,
	David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov,
	Eric W . Biederman, Arnaldo Carvalho de Melo, Sargun Dhillon,
	Linux API, linux-arch

On Mon, Apr 27, 2020 at 7:08 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote:
> > Working on a safety-critical stress testing tool, using ptrace in an
> > rather uncommon way (stop, peeking memory, ...) for a bunch of
> > applications in an automated way I realized that once opened processes
> > where restarted and PIDs recycled.  Resulting in monitoring and
> > manipulating the wrong processes.
> >
> > With the advent of pidfd we are now able to stick with one stable handle
> > to identifying processes exactly. We now have the ability to get this
> > race free. Sending signals now works like a charm, next step is to
> > extend the functionality also for ptrace.
> >
> > API:
> >          long pidfd_ptrace(int pidfd, enum __ptrace_request request,
> >                            void *addr, void *data, unsigned flags);
>
> I'm in general not opposed to this if there's a clear need for this and
> users that are interested. But I think if people really prefer having
> this a new syscall then we should probably try to improve on the old
> one. Things that come to mind right away without doing a deep review are
> replacing the void *addr pointer with a dedicated struct ptract_args or
> union ptrace_args and a size argument. If we're not doing something
> like this or something more fundamental we can equally well either just
> duplicate all enums in the old ptrace syscall and append a _PIDFD to it
> where it makes sense.

Yeah, it seems like just adding pidfd flavors of PTRACE_ATTACH and
PTRACE_SEIZE should do the job.

And if we do make a new syscall, there is a bunch of de-crufting that
can be done... for example, just as some low-hanging fruit, a new
ptrace API probably shouldn't have
PTRACE_PEEKTEXT/PTRACE_PEEKDATA/PTRACE_POKETEXT/PTRACE_POKEDATA (we
have /proc/$pid/mem for that, which is much saner than doing peek/poke
in word-size units), and probably also shouldn't support all the weird
arch-specific register-accessing requests (e.g.
PTRACE_PEEKUSR/PTRACE_POKEUSR/PTRACE_GETREGS/PTRACE_SETREGS/PTRACE_GETFPREGS/...)
that are nowadays accessible via PTRACE_GETREGSET/PTRACE_SETREGSET.

(And there are also some more major changes that I think would be
sensible; for example, it'd be neat if you could have notifications
about the tracee delivered through a pollable file descriptor, and if
you could get the kernel to tell you in each notification which type
of ptrace stop you're dealing with (e.g. distinguishing
syscall-entry-stop vs syscall-exit-stop), and it would be great to be
able to directly inject syscalls into the child instead of having to
figure out where a syscall instruction you can abuse is and then
setting the instruction pointer to that, and it'd be helpful to be
able to have multiple tracers attached to a single process so that you
can e.g. have strace and gdb active on the same process
concurrently...)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 17:52     ` Jann Horn
@ 2020-04-27 18:18       ` Eric W. Biederman
  2020-04-27 18:59         ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-04-27 18:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Christian Brauner, Hagen Paul Pfeifer, kernel list,
	Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Arnd Bergmann,
	Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai,
	Andy Lutomirski, Oleg Nesterov, Arnaldo Carvalho de Melo,
	Sargun Dhillon, Linux API, linux-arch

Jann Horn <jannh@google.com> writes:

> On Mon, Apr 27, 2020 at 7:08 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
>> On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote:
>> > Working on a safety-critical stress testing tool, using ptrace in an
>> > rather uncommon way (stop, peeking memory, ...) for a bunch of
>> > applications in an automated way I realized that once opened processes
>> > where restarted and PIDs recycled.  Resulting in monitoring and
>> > manipulating the wrong processes.
>> >
>> > With the advent of pidfd we are now able to stick with one stable handle
>> > to identifying processes exactly. We now have the ability to get this
>> > race free. Sending signals now works like a charm, next step is to
>> > extend the functionality also for ptrace.
>> >
>> > API:
>> >          long pidfd_ptrace(int pidfd, enum __ptrace_request request,
>> >                            void *addr, void *data, unsigned flags);
>>
>> I'm in general not opposed to this if there's a clear need for this and
>> users that are interested. But I think if people really prefer having
>> this a new syscall then we should probably try to improve on the old
>> one. Things that come to mind right away without doing a deep review are
>> replacing the void *addr pointer with a dedicated struct ptract_args or
>> union ptrace_args and a size argument. If we're not doing something
>> like this or something more fundamental we can equally well either just
>> duplicate all enums in the old ptrace syscall and append a _PIDFD to it
>> where it makes sense.
>
> Yeah, it seems like just adding pidfd flavors of PTRACE_ATTACH and
> PTRACE_SEIZE should do the job.

I am conflicted about that but I have to agree.    Instead of
duplicating everything it would be good enough to duplicate the once
that cause the process to be attached to use.  Then there would be no
more pid races to worry about.

> And if we do make a new syscall, there is a bunch of de-crufting that
> can be done... for example, just as some low-hanging fruit, a new
> ptrace API probably shouldn't have
> PTRACE_PEEKTEXT/PTRACE_PEEKDATA/PTRACE_POKETEXT/PTRACE_POKEDATA (we
> have /proc/$pid/mem for that, which is much saner than doing peek/poke
> in word-size units), and probably also shouldn't support all the weird
> arch-specific register-accessing requests (e.g.
> PTRACE_PEEKUSR/PTRACE_POKEUSR/PTRACE_GETREGS/PTRACE_SETREGS/PTRACE_GETFPREGS/...)
> that are nowadays accessible via PTRACE_GETREGSET/PTRACE_SETREGSET.


> (And there are also some more major changes that I think would be
> sensible; for example, it'd be neat if you could have notifications
> about the tracee delivered through a pollable file descriptor, and if
> you could get the kernel to tell you in each notification which type
> of ptrace stop you're dealing with (e.g. distinguishing
> syscall-entry-stop vs syscall-exit-stop), and it would be great to be
> able to directly inject syscalls into the child instead of having to
> figure out where a syscall instruction you can abuse is and then
> setting the instruction pointer to that, and it'd be helpful to be
> able to have multiple tracers attached to a single process so that you
> can e.g. have strace and gdb active on the same process
> concurrently...)

How does this differ using the tracing related infrastructure we have
for the kernel on a userspace process?  I suspect augmenting the tracing
infrastructure with the ability to set breakpoints and watchpoints (aka
stopping userspace threads and processes might be a more fertile
direction to go).

But I agree either we want to just address the races in PTRACE_ATTACH
and PTRACE_SIEZE or we want to take a good hard look at things.

There is a good case for minimal changes because one of the cases that
comes up is how much work will it take to change existing programs.  But
ultimately ptrace pretty much sucks so a very good set of test cases and
documentation for what we want to implement would be a very good idea.

Eric



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 18:18       ` Eric W. Biederman
@ 2020-04-27 18:59         ` Hagen Paul Pfeifer
  2020-04-27 20:08           ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Hagen Paul Pfeifer @ 2020-04-27 18:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jann Horn, Christian Brauner, kernel list, Florian Weimer,
	Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Arnd Bergmann, Brian Gerst,
	Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski,
	Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon,
	Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman

* Eric W. Biederman | 2020-04-27 13:18:47 [-0500]:

>I am conflicted about that but I have to agree.    Instead of
>duplicating everything it would be good enough to duplicate the once
>that cause the process to be attached to use.  Then there would be no
>more pid races to worry about.

>How does this differ using the tracing related infrastructure we have
>for the kernel on a userspace process?  I suspect augmenting the tracing
>infrastructure with the ability to set breakpoints and watchpoints (aka
>stopping userspace threads and processes might be a more fertile
>direction to go).
>
>But I agree either we want to just address the races in PTRACE_ATTACH
>and PTRACE_SIEZE or we want to take a good hard look at things.
>
>There is a good case for minimal changes because one of the cases that
>comes up is how much work will it take to change existing programs.  But
>ultimately ptrace pretty much sucks so a very good set of test cases and
>documentation for what we want to implement would be a very good idea.

Hey Eric, Jann, Christian, Arnd,

thank you for your valuable input! IMHO I think we have exactly two choices
here:

a) we go with my patchset that is 100% ptrace feature compatible - except the
   pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is
   automatically extended and vice versa. Both APIs are feature identical
   without any headaches.
b) leave ptrace completely behind us and design ptrace that we have always
   dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness,
   a speedy API to copy & modify large chunks of data, io_uring/epoll support
   and of course: pidfd based (missed likely thousands of other dreams)

I think a solution in between is not worth the effort! It will not be
compatible in any way for the userspace and the benefit will be negligible.
Ptrace is horrible API - everybody knows that but developers get comfy with
it. You find examples everywhere, why should we make it harder for the user for
no or little benefit (except that stable handle with pidfd and some cleanups)?

Any thoughts on this?

Hagen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 18:59         ` Hagen Paul Pfeifer
@ 2020-04-27 20:08           ` Arnd Bergmann
  2020-04-27 20:13             ` Christian Brauner
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2020-04-27 20:08 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Eric W. Biederman, Jann Horn, Christian Brauner, kernel list,
	Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst,
	Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski,
	Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon,
	Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman

On Mon, Apr 27, 2020 at 8:59 PM Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>
> * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]:
>
> >I am conflicted about that but I have to agree.    Instead of
> >duplicating everything it would be good enough to duplicate the once
> >that cause the process to be attached to use.  Then there would be no
> >more pid races to worry about.
>
> >How does this differ using the tracing related infrastructure we have
> >for the kernel on a userspace process?  I suspect augmenting the tracing
> >infrastructure with the ability to set breakpoints and watchpoints (aka
> >stopping userspace threads and processes might be a more fertile
> >direction to go).
> >
> >But I agree either we want to just address the races in PTRACE_ATTACH
> >and PTRACE_SIEZE or we want to take a good hard look at things.
> >
> >There is a good case for minimal changes because one of the cases that
> >comes up is how much work will it take to change existing programs.  But
> >ultimately ptrace pretty much sucks so a very good set of test cases and
> >documentation for what we want to implement would be a very good idea.
>
> Hey Eric, Jann, Christian, Arnd,
>
> thank you for your valuable input! IMHO I think we have exactly two choices
> here:
>
> a) we go with my patchset that is 100% ptrace feature compatible - except the
>    pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is
>    automatically extended and vice versa. Both APIs are feature identical
>    without any headaches.
> b) leave ptrace completely behind us and design ptrace that we have always
>    dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness,
>    a speedy API to copy & modify large chunks of data, io_uring/epoll support
>    and of course: pidfd based (missed likely thousands of other dreams)
>
> I think a solution in between is not worth the effort! It will not be
> compatible in any way for the userspace and the benefit will be negligible.
> Ptrace is horrible API - everybody knows that but developers get comfy with
> it. You find examples everywhere, why should we make it harder for the user for
> no or little benefit (except that stable handle with pidfd and some cleanups)?
>
> Any thoughts on this?

The way I understood Jann was that instead of a new syscall that duplicates
everything in ptrace(), there would only need to be a new ptrace request
such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH
but takes a pidfd as the second argument, perhaps setting the return value
to the pid on success. Same for PTRACE_SEIZE.

In effect this is not much different from your a), just a variation on the
calling conventions. The main upside is that it avoids adding another
ugly interface, the flip side is that it makes the existing one slightly worse
by adding complexity.

       Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 20:08           ` Arnd Bergmann
@ 2020-04-27 20:13             ` Christian Brauner
  2020-04-28  0:45               ` Aleksa Sarai
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Brauner @ 2020-04-27 20:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list,
	Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst,
	Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski,
	Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon,
	Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman

On Mon, Apr 27, 2020 at 10:08:03PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 27, 2020 at 8:59 PM Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> >
> > * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]:
> >
> > >I am conflicted about that but I have to agree.    Instead of
> > >duplicating everything it would be good enough to duplicate the once
> > >that cause the process to be attached to use.  Then there would be no
> > >more pid races to worry about.
> >
> > >How does this differ using the tracing related infrastructure we have
> > >for the kernel on a userspace process?  I suspect augmenting the tracing
> > >infrastructure with the ability to set breakpoints and watchpoints (aka
> > >stopping userspace threads and processes might be a more fertile
> > >direction to go).
> > >
> > >But I agree either we want to just address the races in PTRACE_ATTACH
> > >and PTRACE_SIEZE or we want to take a good hard look at things.
> > >
> > >There is a good case for minimal changes because one of the cases that
> > >comes up is how much work will it take to change existing programs.  But
> > >ultimately ptrace pretty much sucks so a very good set of test cases and
> > >documentation for what we want to implement would be a very good idea.
> >
> > Hey Eric, Jann, Christian, Arnd,
> >
> > thank you for your valuable input! IMHO I think we have exactly two choices
> > here:
> >
> > a) we go with my patchset that is 100% ptrace feature compatible - except the
> >    pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is
> >    automatically extended and vice versa. Both APIs are feature identical
> >    without any headaches.
> > b) leave ptrace completely behind us and design ptrace that we have always
> >    dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness,
> >    a speedy API to copy & modify large chunks of data, io_uring/epoll support
> >    and of course: pidfd based (missed likely thousands of other dreams)
> >
> > I think a solution in between is not worth the effort! It will not be
> > compatible in any way for the userspace and the benefit will be negligible.
> > Ptrace is horrible API - everybody knows that but developers get comfy with
> > it. You find examples everywhere, why should we make it harder for the user for
> > no or little benefit (except that stable handle with pidfd and some cleanups)?
> >
> > Any thoughts on this?
> 
> The way I understood Jann was that instead of a new syscall that duplicates
> everything in ptrace(), there would only need to be a new ptrace request
> such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH
> but takes a pidfd as the second argument, perhaps setting the return value
> to the pid on success. Same for PTRACE_SEIZE.

That was my initial suggestion, yes. Any enum that identifies a target
by a pid will get a new _PIDFD version and the pidfd is passed as pid_t
argument. That should work and is similar to what I did for waitid()
P_PIDFD. Realistically, there shouldn't be any system where pid_t is
smaller than an int that we care about.

> 
> In effect this is not much different from your a), just a variation on the
> calling conventions. The main upside is that it avoids adding another
> ugly interface, the flip side is that it makes the existing one slightly worse
> by adding complexity.

Basically, if a new syscall than please a proper re-design with real
benefits.

In the meantime we could make due with the _PIDFD variant. And then if
someone wants to do the nitty gritty work of adding a ptrace variant
purely based on pidfds and with a better api and features that e.g. Jann
pointed out then by all means, please do so. I'm sure we would all
welcome this as well.

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-27 20:13             ` Christian Brauner
@ 2020-04-28  0:45               ` Aleksa Sarai
  2020-04-28  1:36                 ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Aleksa Sarai @ 2020-04-28  0:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Arnd Bergmann, Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn,
	kernel list, Florian Weimer, Al Viro, Christian Brauner,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Brian Gerst, Sami Tolvanen, David Howells, Andy Lutomirski,
	Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon,
	Linux API, linux-arch, Linus Torvalds, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 1949 bytes --]

On 2020-04-27, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Mon, Apr 27, 2020 at 10:08:03PM +0200, Arnd Bergmann wrote:
> > The way I understood Jann was that instead of a new syscall that duplicates
> > everything in ptrace(), there would only need to be a new ptrace request
> > such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH
> > but takes a pidfd as the second argument, perhaps setting the return value
> > to the pid on success. Same for PTRACE_SEIZE.
> 
> That was my initial suggestion, yes. Any enum that identifies a target
> by a pid will get a new _PIDFD version and the pidfd is passed as pid_t
> argument. That should work and is similar to what I did for waitid()
> P_PIDFD. Realistically, there shouldn't be any system where pid_t is
> smaller than an int that we care about.
> 
> > In effect this is not much different from your a), just a variation on the
> > calling conventions. The main upside is that it avoids adding another
> > ugly interface, the flip side is that it makes the existing one slightly worse
> > by adding complexity.
> 
> Basically, if a new syscall than please a proper re-design with real
> benefits.
> 
> In the meantime we could make due with the _PIDFD variant. And then if
> someone wants to do the nitty gritty work of adding a ptrace variant
> purely based on pidfds and with a better api and features that e.g. Jann
> pointed out then by all means, please do so. I'm sure we would all
> welcome this as well.

I agree. It would be a shame to add a new ptrace syscall and not take
the opportunity to fix the multitude of problems with the existing API.
But that's a Pandora's box which we shouldn't open unless we want to
wait a long time to get an API everyone is okay with -- a pretty high
price to just get pidfds support in ptrace.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  0:45               ` Aleksa Sarai
@ 2020-04-28  1:36                 ` Linus Torvalds
  2020-04-28  4:17                   ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2020-04-28  1:36 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Christian Brauner, Arnd Bergmann, Hagen Paul Pfeifer,
	Eric W. Biederman, Jann Horn, kernel list, Florian Weimer,
	Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen,
	David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman

On Mon, Apr 27, 2020 at 5:46 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> I agree. It would be a shame to add a new ptrace syscall and not take
> the opportunity to fix the multitude of problems with the existing API.
> But that's a Pandora's box which we shouldn't open unless we want to
> wait a long time to get an API everyone is okay with -- a pretty high
> price to just get pidfds support in ptrace.

We should really be very very careful with some "smarter ptrace".
We've had _so_ many security issues with ptrace that it's not even
funny.

And that's ignoring all the practical issues we've had.

I would definitely not want to have anything that looks like ptrace AT
ALL using pidfd. If we have a file descriptor to specify the target
process, then we should probably take advantage of that file
descriptor to actually make it more of a asynchronous interface that
doesn't cause the kinds of deadlocks that we've had with ptrace.

The synchronous nature of ptrace() means that not only do we have
those nasty deadlocks, it's also very very expensive to use. It also
has some other fundamental problems, like the whole "take over parent"
and the SIGCHLD behavior.

It also is hard to ptrace a ptracer. Which is annoying when you're
debugging gdb or strace or whatever.

So I think the thing to do is ask the gdb (and strace) people if they
have any _very_ particular painpoints that we could perhaps help with.

And then very carefully think things through and not repeat all the
mistakes ptrace did.

I'm not very optimistic.

              Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  1:36                 ` Linus Torvalds
@ 2020-04-28  4:17                   ` Andy Lutomirski
  2020-04-28  4:28                     ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2020-04-28  4:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Aleksa Sarai, Christian Brauner, Arnd Bergmann,
	Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list,
	Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst,
	Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman



> On Apr 27, 2020, at 6:36 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Mon, Apr 27, 2020 at 5:46 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>> 
>> I agree. It would be a shame to add a new ptrace syscall and not take
>> the opportunity to fix the multitude of problems with the existing API.
>> But that's a Pandora's box which we shouldn't open unless we want to
>> wait a long time to get an API everyone is okay with -- a pretty high
>> price to just get pidfds support in ptrace.
> 
> We should really be very very careful with some "smarter ptrace".
> We've had _so_ many security issues with ptrace that it's not even
> funny.
> 
> And that's ignoring all the practical issues we've had.
> 
> I would definitely not want to have anything that looks like ptrace AT
> ALL using pidfd. If we have a file descriptor to specify the target
> process, then we should probably take advantage of that file
> descriptor to actually make it more of a asynchronous interface that
> doesn't cause the kinds of deadlocks that we've had with ptrace.
> 
> The synchronous nature of ptrace() means that not only do we have
> those nasty deadlocks, it's also very very expensive to use. It also
> has some other fundamental problems, like the whole "take over parent"
> and the SIGCHLD behavior.
> 
> It also is hard to ptrace a ptracer. Which is annoying when you're
> debugging gdb or strace or whatever.
> 
> So I think the thing to do is ask the gdb (and strace) people if they
> have any _very_ particular painpoints that we could perhaps help with.
> 
> And then very carefully think things through and not repeat all the
> mistakes ptrace did.
> 
> I'm not very optimistic.

I hate to say this, but I’m not convinced that asking the gdb folks is the right approach. GDB has an ancient architecture and is *incredibly* buggy.  I’m sure ptrace is somewhere on the pain point list, but I suspect it’s utterly dwarfed by everything else.

Maybe the LLDB people would have a better perspective?  The rr folks would be a good bet, too. Or, and I know this is sacrilege, the VSCode people?


I think one requirement for a better ptrace is that it should work if you try to debug, simultaneously, a debugger and its debugee. Maybe not perfectly, but it should work. And you should be able to debug init.

Another major pain point I’ve seen is compat. A 64-bit debugger should be able to debug a program that switches back and forth between 32-bit and 64-bit.  A debugger that is entirely unaware of a set of registers should be able to debug a process using those registers.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  4:17                   ` Andy Lutomirski
@ 2020-04-28  4:28                     ` Linus Torvalds
  2020-04-28  6:39                       ` Hagen Paul Pfeifer
  2020-04-28  8:21                       ` Christian Brauner
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2020-04-28  4:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Aleksa Sarai, Christian Brauner, Arnd Bergmann,
	Hagen Paul Pfeifer, Eric W. Biederman, Jann Horn, kernel list,
	Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Brian Gerst,
	Sami Tolvanen, David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman

On Mon, Apr 27, 2020 at 9:17 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
> I hate to say this, but I’m not convinced that asking the gdb folks is
> the right approach. GDB has an ancient architecture and is
> *incredibly* buggy. I’m sure ptrace is somewhere on the pain point
> list, but I suspect it’s utterly dwarfed by everything else.

You may be right. However, if gdbn isn't going to use it, then I
seriously don't think it's worth changing much.

It might be worth looking at people who don't use ptrace() for
debugging, but for "incidental" reasons. IOW sandboxing, tracing,
things like that.

Maybe those people want things that are simpler and don't actually
need the kinds of hard serialization that ptrace() wants.

I'd rather add a few really simple things that might not be a full
complement of operations for a debugger, but exactly because they
aren't a full debugger, maybe they are things that we can tell are
obviously secure and simple?

               Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  4:28                     ` Linus Torvalds
@ 2020-04-28  6:39                       ` Hagen Paul Pfeifer
  2020-04-28  7:45                         ` Christian Brauner
  2020-04-28  8:21                       ` Christian Brauner
  1 sibling, 1 reply; 17+ messages in thread
From: Hagen Paul Pfeifer @ 2020-04-28  6:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Aleksa Sarai, Christian Brauner, Arnd Bergmann,
	Eric W. Biederman, Jann Horn, kernel list, Florian Weimer,
	Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen,
	David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman

* Linus Torvalds | 2020-04-27 21:28:14 [-0700]:

>> I hate to say this, but I’m not convinced that asking the gdb folks is
>> the right approach. GDB has an ancient architecture and is
>> *incredibly* buggy. I’m sure ptrace is somewhere on the pain point
>> list, but I suspect it’s utterly dwarfed by everything else.
>
>You may be right. However, if gdbn isn't going to use it, then I
>seriously don't think it's worth changing much.
>
>It might be worth looking at people who don't use ptrace() for
>debugging, but for "incidental" reasons. IOW sandboxing, tracing,
>things like that.
>
>Maybe those people want things that are simpler and don't actually
>need the kinds of hard serialization that ptrace() wants.
>
>I'd rather add a few really simple things that might not be a full
>complement of operations for a debugger, but exactly because they
>aren't a full debugger, maybe they are things that we can tell are
>obviously secure and simple?

Okay, to sum up the the whole discussion: we go forward with Jann's proposal
by simple adding PTRACE_ATTACH_PIDFD and friends. This is the minimal invasive
solution and the risk of an potenial security problem is almost not present[TM].

Changing the whole ptrace API is a different beast. I rather believe that I
see Linus Linux successor rather than a ptrace successor.

I am fine with PTRACE_ATTACH_PIDFD!

Hagen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  6:39                       ` Hagen Paul Pfeifer
@ 2020-04-28  7:45                         ` Christian Brauner
  0 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2020-04-28  7:45 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Linus Torvalds, Andy Lutomirski, Aleksa Sarai, Arnd Bergmann,
	Eric W. Biederman, Jann Horn, kernel list, Florian Weimer,
	Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen,
	David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman

On Tue, Apr 28, 2020 at 08:39:35AM +0200, Hagen Paul Pfeifer wrote:
> * Linus Torvalds | 2020-04-27 21:28:14 [-0700]:
> 
> >> I hate to say this, but I’m not convinced that asking the gdb folks is
> >> the right approach. GDB has an ancient architecture and is
> >> *incredibly* buggy. I’m sure ptrace is somewhere on the pain point
> >> list, but I suspect it’s utterly dwarfed by everything else.
> >
> >You may be right. However, if gdbn isn't going to use it, then I
> >seriously don't think it's worth changing much.
> >
> >It might be worth looking at people who don't use ptrace() for
> >debugging, but for "incidental" reasons. IOW sandboxing, tracing,
> >things like that.
> >
> >Maybe those people want things that are simpler and don't actually
> >need the kinds of hard serialization that ptrace() wants.
> >
> >I'd rather add a few really simple things that might not be a full
> >complement of operations for a debugger, but exactly because they
> >aren't a full debugger, maybe they are things that we can tell are
> >obviously secure and simple?
> 
> Okay, to sum up the the whole discussion: we go forward with Jann's proposal
> by simple adding PTRACE_ATTACH_PIDFD and friends. This is the minimal invasive
> solution and the risk of an potenial security problem is almost not present[TM].
> 
> Changing the whole ptrace API is a different beast. I rather believe that I
> see Linus Linux successor rather than a ptrace successor.
> 
> I am fine with PTRACE_ATTACH_PIDFD!

If this is enough for you use-case then we should make due with my
initial suggestion, yes. I'd be fine with adding this variant.

I initially thought that we'd likely would need to support a few more
but I don't think we want to actually; there's a bunch of crazy stuff in
there.

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
  2020-04-28  4:28                     ` Linus Torvalds
  2020-04-28  6:39                       ` Hagen Paul Pfeifer
@ 2020-04-28  8:21                       ` Christian Brauner
  1 sibling, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2020-04-28  8:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Aleksa Sarai, Arnd Bergmann, Hagen Paul Pfeifer,
	Eric W. Biederman, Jann Horn, kernel list, Florian Weimer,
	Al Viro, Christian Brauner, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Brian Gerst, Sami Tolvanen,
	David Howells, Andy Lutomirski, Oleg Nesterov,
	Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API, linux-arch,
	Greg Kroah-Hartman

On Mon, Apr 27, 2020 at 09:28:14PM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2020 at 9:17 PM Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > I hate to say this, but I’m not convinced that asking the gdb folks is
> > the right approach. GDB has an ancient architecture and is
> > *incredibly* buggy. I’m sure ptrace is somewhere on the pain point
> > list, but I suspect it’s utterly dwarfed by everything else.
> 
> You may be right. However, if gdbn isn't going to use it, then I
> seriously don't think it's worth changing much.
> 
> It might be worth looking at people who don't use ptrace() for
> debugging, but for "incidental" reasons. IOW sandboxing, tracing,
> things like that.
> 
> Maybe those people want things that are simpler and don't actually
> need the kinds of hard serialization that ptrace() wants.
> 
> I'd rather add a few really simple things that might not be a full
> complement of operations for a debugger, but exactly because they
> aren't a full debugger, maybe they are things that we can tell are
> obviously secure and simple?

I think the biggest non-anecdotal user of ptrace() besides debuggers
is actually criu (and strace of course). They use it to inject parasite
code (their phrasing not mine) into another task to handle restoring the
parts of a task that can't be restored from the outside. Looking through
their repo they make quite a bit of use of ptrace functionality including
some arch specific bits:
PTRACE_GETREGSET
PTRACE_GETFPREGS
PTRACE_PEEKUSER
PTRACE_POKEUSER
PTRACE_CONT
PTRACE_SETREGSET
PTRACE_GETVFPREGS /* arm/arm64 */
PTRACE_GETVRREGS /* powerpc */
PTRACE_GETVSRREGS /* powerpc */
PTRACE_EVENT_STOP
PTRACE_GETSIGMASK
PTRACE_INTERRUPT
PTRACE_DETACH
PTRACE_GETSIGINFO
PTRACE_SEIZE
PTRACE_SETSIGMASK
PTRACE_SI_EVENT
PTRACE_SYSCALL
PTRACE_SETOPTIONS
PTRACE_ATTACH
PTRACE_O_SUSPEND_SECCOMP
PTRACE_PEEKSIGINFO
PTRACE_SECCOMP_GET_FILTER
PTRACE_SECCOMP_GET_METADATA

So I guess strace and criu would be the ones to go and ask and if they
don't care enough we already need to start squinting for other larg-ish
users. proot comes to mind
https://github.com/proot-me/proot

(From personal experience, most of the time when ptrace is used in a
 non-debugger codebase it's either to plug a security hole exploitable
 through ptracing the task and the fix is ptracing that very task to
 prevent the attacker from ptracing it (where non-dumpability alone
 doesn't cut it) or the idea is dropped immediately to not lose the
 ability to use a debugger on the program.)

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-04-28  8:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-26 13:01 [RFC] ptrace, pidfd: add pidfd_ptrace syscall Hagen Paul Pfeifer
2020-04-26 16:34 ` [RFC v2] " Hagen Paul Pfeifer
2020-04-27  8:30   ` Arnd Bergmann
2020-04-27  9:00     ` Hagen Paul Pfeifer
2020-04-27 17:08   ` Christian Brauner
2020-04-27 17:52     ` Jann Horn
2020-04-27 18:18       ` Eric W. Biederman
2020-04-27 18:59         ` Hagen Paul Pfeifer
2020-04-27 20:08           ` Arnd Bergmann
2020-04-27 20:13             ` Christian Brauner
2020-04-28  0:45               ` Aleksa Sarai
2020-04-28  1:36                 ` Linus Torvalds
2020-04-28  4:17                   ` Andy Lutomirski
2020-04-28  4:28                     ` Linus Torvalds
2020-04-28  6:39                       ` Hagen Paul Pfeifer
2020-04-28  7:45                         ` Christian Brauner
2020-04-28  8:21                       ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).