The Linux Kernel Mailing List
* [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification
@ 2026-07-01 15:05 Renzo Davoli
  2026-07-01 15:05 ` [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by allowing a tracer to modify details of a
system call in which the tracee is currently blocked.

The API is designed to let tracers inspect and modify system call
information in a simple, architecture-agnostic manner.

The current implementation only supports modifying the subset of
system call information needed by strace: the system call number,
arguments, and return value.

This patch set extends PTRACE_SET_SYSCALL_INFO with support for:

    Skipping a system call triggered via seccomp

    Modifying the tracee's instruction pointer

1. Seccomp system call skip

When a seccomp filter returns SECCOMP_RET_TRACE, the tracer receives,
via PTRACE_GET_SYSCALL_INFO, a struct ptrace_syscall_info with
op == PTRACE_SYSCALL_INFO_SECCOMP.
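
As an illustration (not part of this series), a tracer that has set
PTRACE_O_TRACESECCOMP can recognise this stop from the wait status before
calling PTRACE_GET_SYSCALL_INFO. The sketch uses the same status test as the
selftest in patch 2/5; is_seccomp_stop() is a hypothetical helper name.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/ptrace.h>

/*
 * Return non-zero if a wait status reports a PTRACE_EVENT_SECCOMP stop.
 * Such a stop is encoded as SIGTRAP | (PTRACE_EVENT_SECCOMP << 8) in the
 * bits above the low byte of the status.
 */
static int is_seccomp_stop(int status)
{
	return WIFSTOPPED(status) &&
	       (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}
```

Only when this test succeeds should the tracer expect
op == PTRACE_SYSCALL_INFO_SECCOMP in the returned structure.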

The tracer can skip the system call by setting the system call number
to -1. However, the current PTRACE_SET_SYSCALL_INFO interface does not
provide a way to specify the return value or error code that should be
reported to the tracee after skipping the call.

Patch 1/5 introduces a new op value,
PTRACE_SYSCALL_INFO_SECCOMP_SKIP, for use with
PTRACE_SET_SYSCALL_INFO.

When the tracer retrieves a ptrace_syscall_info structure with
op == PTRACE_SYSCALL_INFO_SECCOMP, it may choose to skip the system
call by changing op to PTRACE_SYSCALL_INFO_SECCOMP_SKIP and
populating the exit union fields (rval and is_error) to define
the return value and error status for the tracee.
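
A minimal sketch of the tracer-side rewrite described above. It works on a
local mirror of struct ptrace_syscall_info so that it builds without patched
headers; struct psi, psi_make_skip() and the PSI_* names are hypothetical, the
layout is assumed to match the uapi header, and the value 4 for the skip op is
the one proposed in patch 1/5, not a released ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSI_OP_SECCOMP      3	/* PTRACE_SYSCALL_INFO_SECCOMP */
#define PSI_OP_SECCOMP_SKIP 4	/* proposed in patch 1/5 (assumption) */

/* Local mirror of struct ptrace_syscall_info (assumed layout). */
struct psi {
	uint8_t  op;
	uint8_t  reserved;
	uint16_t flags;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct { uint64_t nr; uint64_t args[6];
			 uint32_t ret_data; uint32_t reserved2; } seccomp;
	};
};

/*
 * Turn a seccomp stop record into a "skip and report this result" request.
 * Returns 0 on success, -1 if the record is not a seccomp stop.
 */
static int psi_make_skip(struct psi *info, int64_t rval, int is_error)
{
	assert(info != NULL);
	if (info->op != PSI_OP_SECCOMP)
		return -1;
	info->op = PSI_OP_SECCOMP_SKIP;
	/* exit.rval shares storage with seccomp.nr, so nr is overwritten. */
	info->exit.rval = rval;
	info->exit.is_error = is_error ? 1 : 0;
	return 0;
}
```

Because exit.rval and seccomp.nr occupy the same storage in the union, writing
the return value overwrites the syscall number the tracer read. That is
harmless with the proposed op, since the kernel side sets the syscall number
to -1 itself. The filled structure would then be passed to
PTRACE_SET_SYSCALL_INFO, as the selftest in patch 2/5 does.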

2. Setting the instruction pointer

Patch 4/5 adds support for modifying the tracee's instruction pointer.

To do this, the tracer stores the new instruction pointer value in the
instruction_pointer field of the ptrace_syscall_info structure and
sets the PTRACE_SYSCALL_INFO_FLAG_SET_IP flag in the flags field.

This flag is introduced to avoid breaking existing code that uses
PTRACE_SET_SYSCALL_INFO and currently ignores the
instruction_pointer field.
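
A minimal sketch of how a tracer would request the change, again on a local
mirror of the structure so that it builds without patched headers. struct psi,
psi_request_ip() and PSI_FLAG_SET_IP are hypothetical names; the bit value is
the one proposed in patch 4/5, and the layout is assumed to match the uapi
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSI_FLAG_SET_IP (1u << 0)	/* proposed in patch 4/5 (assumption) */

/* Local mirror of struct ptrace_syscall_info (assumed layout). */
struct psi {
	uint8_t  op;
	uint8_t  reserved;
	uint16_t flags;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct { uint64_t nr; uint64_t args[6];
			 uint32_t ret_data; uint32_t reserved2; } seccomp;
	};
};

/* Ask the kernel to resume the tracee at new_ip. */
static void psi_request_ip(struct psi *info, uint64_t new_ip)
{
	assert(info != NULL);
	info->instruction_pointer = new_ip;
	info->flags |= PSI_FLAG_SET_IP;
}
```

A tracer that leaves flags at zero keeps the existing behaviour: whatever
value PTRACE_GET_SYSCALL_INFO stored in instruction_pointer is passed back and
ignored.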

Renzo Davoli (5):
  ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  asm/ptrace.h: add instruction_pointer_set
  ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP
  selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP

 arch/alpha/include/asm/ptrace.h               |   6 +
 arch/hexagon/include/asm/ptrace.h             |   6 +
 arch/m68k/include/asm/ptrace.h                |   6 +
 arch/microblaze/include/asm/ptrace.h          |   6 +
 arch/nios2/include/asm/ptrace.h               |   6 +
 arch/um/include/asm/ptrace-generic.h          |   6 +
 arch/xtensa/include/asm/ptrace.h              |   6 +
 include/uapi/linux/ptrace.h                   |   5 +
 kernel/ptrace.c                               |  39 ++-
 .../selftests/ptrace/set_syscall_info.c       | 321 +++++++++++++++++-
 10 files changed, 401 insertions(+), 6 deletions(-)

-- 
2.53.0



* [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
@ 2026-07-01 15:05 ` Renzo Davoli
  2026-07-02  8:43   ` Oleg Nesterov
  2026-07-01 15:05 ` [PATCH 2/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

This patch extends PTRACE_SET_SYSCALL_INFO with support for skipping a system
call triggered via seccomp.

When the tracer retrieves a ptrace_syscall_info structure with
op == PTRACE_SYSCALL_INFO_SECCOMP, it may choose to skip the system
call by changing op to PTRACE_SYSCALL_INFO_SECCOMP_SKIP and
populating the exit union fields (rval and is_error) to define
the return value and error status for the tracee.

Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
---
 include/uapi/linux/ptrace.h |  1 +
 kernel/ptrace.c             | 28 +++++++++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 5f8ef6156752..22489597a325 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -79,6 +79,7 @@ struct seccomp_metadata {
 #define PTRACE_SYSCALL_INFO_ENTRY	1
 #define PTRACE_SYSCALL_INFO_EXIT	2
 #define PTRACE_SYSCALL_INFO_SECCOMP	3
+#define PTRACE_SYSCALL_INFO_SECCOMP_SKIP 4
 
 struct ptrace_syscall_info {
 	__u8 op;	/* PTRACE_SYSCALL_INFO_* */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d041645d9d17..ff763c87e4f7 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1119,12 +1119,22 @@ ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
 	return 0;
 }
 
+static int
+ptrace_set_syscall_info_seccomp_skip(struct task_struct *child,
+				     struct pt_regs *regs,
+				     struct ptrace_syscall_info *info)
+{
+	syscall_set_nr(child, regs, -1);
+	return ptrace_set_syscall_info_exit(child, regs, info);
+}
+
 static int
 ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 			const void __user *datavp)
 {
 	struct pt_regs *regs = task_pt_regs(child);
 	struct ptrace_syscall_info info;
+	int child_op;
 
 	if (user_size < sizeof(info))
 		return -EINVAL;
@@ -1141,9 +1151,19 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 	if (info.flags || info.reserved)
 		return -EINVAL;
 
-	/* Changing the type of the system call stop is not supported yet. */
-	if (ptrace_get_syscall_info_op(child) != info.op)
-		return -EINVAL;
+	/*
+	 * Changing the type of the system call stop is
+	 * not allowed, with the following exception:
+	 * PTRACE_SYSCALL_INFO_SECCOMP can be changed to
+	 * PTRACE_SYSCALL_INFO_SECCOMP_SKIP.
+	 */
+
+	child_op = ptrace_get_syscall_info_op(child);
+	if (child_op != info.op) {
+		if ((info.op != PTRACE_SYSCALL_INFO_SECCOMP_SKIP) ||
+				(child_op != PTRACE_SYSCALL_INFO_SECCOMP))
+			return -EINVAL;
+	}
 
 	switch (info.op) {
 	case PTRACE_SYSCALL_INFO_ENTRY:
@@ -1152,6 +1172,8 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 		return ptrace_set_syscall_info_exit(child, regs, &info);
 	case PTRACE_SYSCALL_INFO_SECCOMP:
 		return ptrace_set_syscall_info_seccomp(child, regs, &info);
+	case PTRACE_SYSCALL_INFO_SECCOMP_SKIP:
+		return ptrace_set_syscall_info_seccomp_skip(child, regs, &info);
 	default:
 		/* Other types of system call stops are not supported yet. */
 		return -EINVAL;
-- 
2.53.0



* [PATCH 2/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
  2026-07-01 15:05 ` [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
@ 2026-07-01 15:05 ` Renzo Davoli
  2026-07-01 15:05 ` [PATCH 3/5] asm/ptrace.h: add instruction_pointer_set Renzo Davoli
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

Check whether the PTRACE_SYSCALL_INFO_SECCOMP_SKIP semantics implemented in
the kernel match userspace expectations.

Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
---
 .../selftests/ptrace/set_syscall_info.c       | 174 +++++++++++++++++-
 1 file changed, 173 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/ptrace/set_syscall_info.c b/tools/testing/selftests/ptrace/set_syscall_info.c
index 1cc411a41cd6..7f397469fd00 100644
--- a/tools/testing/selftests/ptrace/set_syscall_info.c
+++ b/tools/testing/selftests/ptrace/set_syscall_info.c
@@ -11,9 +11,16 @@
 #include <err.h>
 #include <fcntl.h>
 #include <signal.h>
+#include <stdlib.h>
+#include <stddef.h>
 #include <asm/unistd.h>
+#include <sys/prctl.h>
 #include <linux/types.h>
 #include <linux/ptrace.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+#include <linux/prctl.h>
+
 
 #if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32
 /*
@@ -36,6 +43,7 @@ struct si_exit {
 
 static unsigned int ptrace_stop;
 static pid_t tracee_pid;
+static pid_t tracer_pid;
 
 static int
 kill_tracee(pid_t pid)
@@ -64,6 +72,25 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data)
 		       ptrace_stop, ##__VA_ARGS__);		\
 	} while (0)
 
+static int sys_seccomp(unsigned int operation, unsigned int flags, void *args)
+{
+	return syscall(__NR_seccomp, operation, flags, args);
+}
+
+static struct sock_filter seccomp_filter[] = {
+	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),
+
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_restart_syscall, 0, 1),
+	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+
+	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRACE),
+};
+
+static struct sock_fprog seccomp_prog = {
+	.filter = seccomp_filter,
+	.len = ARRAY_SIZE(seccomp_filter)
+};
+
 static void
 check_psi_entry(struct __test_metadata *_metadata,
 		const struct ptrace_syscall_info *info,
@@ -128,7 +155,6 @@ check_psi_exit(struct __test_metadata *_metadata,
 
 TEST(set_syscall_info)
 {
-	const pid_t tracer_pid = getpid();
 	const kernel_ulong_t dummy[] = {
 		(kernel_ulong_t) 0xdad0bef0bad0fed0ULL,
 		(kernel_ulong_t) 0xdad1bef1bad1fed1ULL,
@@ -138,6 +164,7 @@ TEST(set_syscall_info)
 		(kernel_ulong_t) 0xdad5bef5bad5fed5ULL,
 	};
 	int splice_in[2], splice_out[2];
+	tracer_pid = getpid();
 
 	ASSERT_EQ(0, pipe(splice_in));
 	ASSERT_EQ(0, pipe(splice_out));
@@ -516,4 +543,149 @@ TEST(set_syscall_info)
 	ASSERT_EQ(ptrace_stop, ARRAY_SIZE(si) * 2);
 }
 
+TEST(set_syscall_info_seccomp)
+{
+	tracer_pid = getpid();
+	tracee_pid = fork();
+
+	ASSERT_LE(0, tracee_pid) {
+		TH_LOG("fork: %m");
+	}
+
+	/* tracee */
+	if (tracee_pid == 0) {
+		tracee_pid = getpid();
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		ASSERT_EQ(0, kill(tracee_pid, SIGSTOP)) {
+			/* cannot happen */
+			TH_LOG("kill SIGSTOP: %m");
+		}
+
+		ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+			TH_LOG("prctl: %m");
+			_exit(1);
+		}
+		ASSERT_EQ(0, sys_seccomp(SECCOMP_SET_MODE_FILTER, 0,
+					(void *) &seccomp_prog)) {
+			TH_LOG("seccomp: %m");
+			_exit(1);
+		}
+
+		/* run getpid unmodified */
+		ASSERT_EQ(tracee_pid, getpid()) {
+			TH_LOG("getpid seccomp unchanged: %m");
+			_exit(1);
+		}
+
+		/* run getppid instead of getpid */
+		ASSERT_EQ(tracer_pid, getpid()) {
+			TH_LOG("getpid seccomp nr changes: %m");
+			_exit(1);
+		}
+
+		/* skip getpid and return 42 */
+		ASSERT_EQ(42, getpid()) {
+			TH_LOG("getpid skip set return value changes: %m");
+			_exit(1);
+		}
+		_exit(0);
+	}
+
+	int status;
+
+	/* tracer */
+	ASSERT_LE(0, waitpid(-1,&status,0)) {
+		LOG_KILL_TRACEE("waitpid: %m");
+	}
+
+	ASSERT_EQ(0, sys_ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP | PTRACE_O_TRACESYSGOOD))
+		LOG_KILL_TRACEE("PTRACE_SETOPTIONS: %m");
+
+	ASSERT_EQ(0, sys_ptrace(PTRACE_CONT, tracee_pid, 0, 0)) {
+		LOG_KILL_TRACEE("PTRACE_CONT: %m");
+	}
+
+	while (1) {
+		ASSERT_EQ(tracee_pid, wait(&status)) {
+			/* cannot happen */
+			LOG_KILL_TRACEE("wait: %m");
+		}
+		if (WIFEXITED(status)) {
+			tracee_pid = 0; /* the tracee is no more */
+			ASSERT_EQ(0, WEXITSTATUS(status)) {
+				LOG_KILL_TRACEE("unexpected exit status %u",
+						WEXITSTATUS(status));
+			}
+			break;
+		}
+		ASSERT_FALSE(WIFSIGNALED(status)) {
+			tracee_pid = 0; /* the tracee is no more */
+			LOG_KILL_TRACEE("unexpected signal %u",
+					WTERMSIG(status));
+		}
+		ASSERT_TRUE(WIFSTOPPED(status)) {
+			LOG_KILL_TRACEE("unexpected wait status %#x", status);
+		}
+
+		if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
+			struct ptrace_syscall_info info;
+			size_t info_size = sizeof(info);
+			ASSERT_LT(0, sys_ptrace(PTRACE_GET_SYSCALL_INFO, tracee_pid, info_size, (uintptr_t) &info)) {
+				LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m");
+			}
+			ASSERT_EQ(PTRACE_SYSCALL_INFO_SECCOMP, info.op) {
+				LOG_KILL_TRACEE("entry op mismatch: %m");
+			}
+			ASSERT_TRUE(info.arch) {
+				LOG_KILL_TRACEE("entry arch mismatch: %m");
+			}
+			ASSERT_TRUE(info.instruction_pointer) {
+				LOG_KILL_TRACEE("entry instruction_pointer mismatch: %m");
+			}
+			ASSERT_TRUE(info.stack_pointer) {
+				LOG_KILL_TRACEE("entry stack_pointer mismatch: %m");
+			}
+
+			switch (ptrace_stop) {
+				case 0: ASSERT_EQ(__NR_getpid, info.seccomp.nr) {
+						LOG_KILL_TRACEE("step %d nr __NR_getpid mismatch: %m", ptrace_stop);
+					}
+					ptrace_stop++;
+					break;
+				case 1: ASSERT_EQ(__NR_getpid, info.seccomp.nr) {
+						LOG_KILL_TRACEE("step %d nr __NR_getpid mismatch: %m", ptrace_stop);
+					}
+					info.seccomp.nr = __NR_getppid;
+					ptrace_stop++;
+					break;
+				case 2: ASSERT_EQ(__NR_getpid, info.seccomp.nr) {
+						LOG_KILL_TRACEE("step %d nr __NR_getpid mismatch: %m", ptrace_stop);
+					}
+					info.op = PTRACE_SYSCALL_INFO_SECCOMP_SKIP;
+					info.exit.rval = 42;
+					info.exit.is_error = 0;
+					ptrace_stop++;
+					break;
+				case 3:  ASSERT_EQ(__NR_exit_group, info.seccomp.nr) {
+						 LOG_KILL_TRACEE("step %d nr __NR_exit_group mismatch: %m", ptrace_stop);
+					 }
+					 break;
+				default:
+					 LOG_KILL_TRACEE("unexpected system call: %m");
+					 break;
+
+			}
+			ASSERT_EQ(0,sys_ptrace(PTRACE_SET_SYSCALL_INFO, tracee_pid, info_size, (uintptr_t) &info)) {
+				LOG_KILL_TRACEE("PTRACE_SET_SYSCALL_INFO: %m");
+			}
+
+			ASSERT_EQ(0,sys_ptrace(PTRACE_CONT, tracee_pid, 0, 0)) {
+				LOG_KILL_TRACEE("PTRACE_CONT: %m");
+			}
+		}
+	}
+}
+
 TEST_HARNESS_MAIN
-- 
2.53.0



* [PATCH 3/5] asm/ptrace.h: add instruction_pointer_set
  2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
  2026-07-01 15:05 ` [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
  2026-07-01 15:05 ` [PATCH 2/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
@ 2026-07-01 15:05 ` Renzo Davoli
  2026-07-01 15:05 ` [PATCH 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
  2026-07-01 15:05 ` [PATCH 5/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
  4 siblings, 0 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

Add an instruction_pointer_set function for architectures that do
not currently provide one.

Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
---
 arch/alpha/include/asm/ptrace.h      | 6 ++++++
 arch/hexagon/include/asm/ptrace.h    | 6 ++++++
 arch/m68k/include/asm/ptrace.h       | 6 ++++++
 arch/microblaze/include/asm/ptrace.h | 6 ++++++
 arch/nios2/include/asm/ptrace.h      | 6 ++++++
 arch/um/include/asm/ptrace-generic.h | 6 ++++++
 arch/xtensa/include/asm/ptrace.h     | 6 ++++++
 7 files changed, 42 insertions(+)

diff --git a/arch/alpha/include/asm/ptrace.h b/arch/alpha/include/asm/ptrace.h
index 3557ce64ed21..0821fe9a27c8 100644
--- a/arch/alpha/include/asm/ptrace.h
+++ b/arch/alpha/include/asm/ptrace.h
@@ -24,4 +24,10 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
 	return regs->r0;
 }
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 #endif
diff --git a/arch/hexagon/include/asm/ptrace.h b/arch/hexagon/include/asm/ptrace.h
index ed35da1ee685..0a121f6e3bfc 100644
--- a/arch/hexagon/include/asm/ptrace.h
+++ b/arch/hexagon/include/asm/ptrace.h
@@ -18,6 +18,12 @@ extern const char *regs_query_register_name(unsigned int offset);
 	((struct pt_regs *) \
 	 ((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 #if CONFIG_HEXAGON_ARCH_VERSION >= 4
 #define arch_has_single_step()	(1)
 #endif
diff --git a/arch/m68k/include/asm/ptrace.h b/arch/m68k/include/asm/ptrace.h
index bc86ce012025..6e8a8f0daee8 100644
--- a/arch/m68k/include/asm/ptrace.h
+++ b/arch/m68k/include/asm/ptrace.h
@@ -18,6 +18,12 @@
 	(struct pt_regs *)((char *)current_thread_info() + THREAD_SIZE) - 1
 #define current_user_stack_pointer() rdusp()
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 #define arch_has_single_step()	(1)
 
 #ifdef CONFIG_MMU
diff --git a/arch/microblaze/include/asm/ptrace.h b/arch/microblaze/include/asm/ptrace.h
index 17982292a64f..69e10658d7a9 100644
--- a/arch/microblaze/include/asm/ptrace.h
+++ b/arch/microblaze/include/asm/ptrace.h
@@ -20,5 +20,11 @@ static inline long regs_return_value(struct pt_regs *regs)
 	return regs->r3;
 }
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 #endif /* __ASSEMBLER__ */
 #endif /* _ASM_MICROBLAZE_PTRACE_H */
diff --git a/arch/nios2/include/asm/ptrace.h b/arch/nios2/include/asm/ptrace.h
index 96cbcd40c7ce..d120d8ecb187 100644
--- a/arch/nios2/include/asm/ptrace.h
+++ b/arch/nios2/include/asm/ptrace.h
@@ -70,6 +70,12 @@ struct switch_stack {
 #define user_stack_pointer(regs)	((regs)->sp)
 extern void show_regs(struct pt_regs *);
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 #define current_pt_regs() \
 	((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE)\
 		- 1)
diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index 86d74f9d33cf..44beb96862d8 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -29,6 +29,12 @@ struct pt_regs {
 
 #define PTRACE_OLDSETOPTIONS 21
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 struct task_struct;
 
 extern long subarch_ptrace(struct task_struct *child, long request,
diff --git a/arch/xtensa/include/asm/ptrace.h b/arch/xtensa/include/asm/ptrace.h
index d0568ff6d349..97b14418955e 100644
--- a/arch/xtensa/include/asm/ptrace.h
+++ b/arch/xtensa/include/asm/ptrace.h
@@ -103,6 +103,12 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
 	return regs->areg[2];
 }
 
+static inline void instruction_pointer_set(struct pt_regs *regs,
+					   unsigned long val)
+{
+	instruction_pointer(regs) = val;
+}
+
 int do_syscall_trace_enter(struct pt_regs *regs);
 void do_syscall_trace_leave(struct pt_regs *regs);
 
-- 
2.53.0



* [PATCH 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP
  2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
                   ` (2 preceding siblings ...)
  2026-07-01 15:05 ` [PATCH 3/5] asm/ptrace.h: add instruction_pointer_set Renzo Davoli
@ 2026-07-01 15:05 ` Renzo Davoli
  2026-07-01 15:05 ` [PATCH 5/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
  4 siblings, 0 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

This flag adds support for modifying the tracee's instruction pointer.

To do this, the tracer stores the new instruction pointer value in the
instruction_pointer field of the ptrace_syscall_info structure and
sets the PTRACE_SYSCALL_INFO_FLAG_SET_IP flag in the flags field.

This flag is introduced to avoid breaking existing code that uses
PTRACE_SET_SYSCALL_INFO and currently ignores the
instruction_pointer field.
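
One detail of the kernel-side change is easier to see in isolation: struct
ptrace_syscall_info carries a 64-bit instruction_pointer, while the kernel
stores it in an unsigned long, so on a 32-bit kernel a value that does not fit
is rejected with -ERANGE. The following userspace model of the flag and range
checks is only a sketch; check_set_ip(), ulong32_t and the FLAG_* names are
hypothetical, and uint32_t stands in for a 32-bit unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_SET_IP (1u << 0)	/* models PTRACE_SYSCALL_INFO_FLAG_SET_IP */
#define FLAG_ALL    (FLAG_SET_IP)

/* Stand-in for unsigned long on a 32-bit kernel. */
typedef uint32_t ulong32_t;

/*
 * Model of the validation order in ptrace_set_syscall_info():
 * unknown flags or a non-zero reserved byte give -EINVAL, and an
 * instruction pointer that does not fit gives -ERANGE. On success
 * with the flag set, *out receives the accepted value.
 */
static int check_set_ip(uint16_t flags, uint8_t reserved,
			uint64_t ip64, ulong32_t *out)
{
	assert(out != NULL);
	if ((flags & ~FLAG_ALL) || reserved)
		return -EINVAL;
	if (flags & FLAG_SET_IP) {
		ulong32_t ip = (ulong32_t)ip64;

		if (ip != ip64)
			return -ERANGE;
		*out = ip;
	}
	return 0;
}
```

Note that an out-of-range value is only an error when the flag is set; without
the flag the field is not examined at all.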

Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
---
 include/uapi/linux/ptrace.h |  4 ++++
 kernel/ptrace.c             | 11 +++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 22489597a325..1d9c04447bb7 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -81,6 +81,10 @@ struct seccomp_metadata {
 #define PTRACE_SYSCALL_INFO_SECCOMP	3
 #define PTRACE_SYSCALL_INFO_SECCOMP_SKIP 4
 
+#define PTRACE_SYSCALL_INFO_FLAG_SET_IP (1 << 0)
+#define PTRACE_SYSCALL_INFO_FLAG_ALL \
+	(PTRACE_SYSCALL_INFO_FLAG_SET_IP)
+
 struct ptrace_syscall_info {
 	__u8 op;	/* PTRACE_SYSCALL_INFO_* */
 	__u8 reserved;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ff763c87e4f7..57a1731451d0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1147,8 +1147,8 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 	if (copy_from_user(&info, datavp, sizeof(info)))
 		return -EFAULT;
 
-	/* Reserved for future use. */
-	if (info.flags || info.reserved)
+	/* Unused flags and fields reserved for future use. */
+	if ((info.flags & ~PTRACE_SYSCALL_INFO_FLAG_ALL) || info.reserved)
 		return -EINVAL;
 
 	/*
@@ -1165,6 +1165,13 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 			return -EINVAL;
 	}
 
+	if (info.flags & PTRACE_SYSCALL_INFO_FLAG_SET_IP) {
+		unsigned long ip = info.instruction_pointer;
+		if (ip != info.instruction_pointer)
+			return -ERANGE;
+		instruction_pointer_set(regs, ip);
+	}
+
 	switch (info.op) {
 	case PTRACE_SYSCALL_INFO_ENTRY:
 		return ptrace_set_syscall_info_entry(child, regs, &info);
-- 
2.53.0



* [PATCH 5/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP
  2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
                   ` (3 preceding siblings ...)
  2026-07-01 15:05 ` [PATCH 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
@ 2026-07-01 15:05 ` Renzo Davoli
  4 siblings, 0 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-01 15:05 UTC
  To: linux-kernel
  Cc: Renzo Davoli, Andrew Morton, Oleg Nesterov, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

Check whether the PTRACE_SYSCALL_INFO_FLAG_SET_IP semantics implemented in
the kernel match userspace expectations.

Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
---
 .../selftests/ptrace/set_syscall_info.c       | 147 ++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/tools/testing/selftests/ptrace/set_syscall_info.c b/tools/testing/selftests/ptrace/set_syscall_info.c
index 7f397469fd00..772312bb8328 100644
--- a/tools/testing/selftests/ptrace/set_syscall_info.c
+++ b/tools/testing/selftests/ptrace/set_syscall_info.c
@@ -91,6 +91,10 @@ static struct sock_fprog seccomp_prog = {
 	.len = ARRAY_SIZE(seccomp_filter)
 };
 
+static char w1[] = {'A', '\n'};
+static char w2[] = {'B', '\n'};
+static char w3[] = {'C', '\n'};
+
 static void
 check_psi_entry(struct __test_metadata *_metadata,
 		const struct ptrace_syscall_info *info,
@@ -688,4 +692,147 @@ TEST(set_syscall_info_seccomp)
 	}
 }
 
+TEST(set_syscall_info_setip)
+{
+	tracer_pid = getpid();
+	tracee_pid = fork();
+
+	ASSERT_LE(0, tracee_pid) {
+		TH_LOG("fork: %m");
+	}
+
+	/* tracee */
+	if (tracee_pid == 0) {
+		tracee_pid = getpid();
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		ASSERT_EQ(0, kill(tracee_pid, SIGSTOP)) {
+			/* cannot happen */
+			TH_LOG("kill SIGSTOP: %m");
+		}
+
+		ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+			TH_LOG("prctl: %m");
+			_exit(1);
+		}
+		ASSERT_EQ(0, sys_seccomp(SECCOMP_SET_MODE_FILTER, 0,
+					(void *) &seccomp_prog)) {
+			TH_LOG("seccomp: %m");
+			_exit(1);
+		}
+
+presyscall:;
+		/* this syscall will run twice
+		   (the tracer steps back the instruction pointer) */
+		int rv = write(1, w1, sizeof(w1));
+		ASSERT_EQ(2, rv) {
+			TH_LOG("repeated write return value mismatch: %m");
+			_exit(1);
+		}
+
+		/* run write unmodified */
+		ASSERT_EQ(2, write(1, w3, sizeof(w3))) {
+			TH_LOG("unmodified write return value mismatch: %m");
+			_exit(1);
+		}
+		_exit(0);
+	}
+
+	int status;
+	void *doitagain = &&presyscall;
+
+	/* tracer */
+	ASSERT_LE(0, waitpid(-1,&status,0)) {
+		LOG_KILL_TRACEE("waitpid: %m");
+	}
+
+	ASSERT_EQ(0, sys_ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP | PTRACE_O_TRACESYSGOOD))
+		LOG_KILL_TRACEE("PTRACE_SETOPTIONS: %m");
+
+	ASSERT_EQ(0, sys_ptrace(PTRACE_CONT, tracee_pid, 0, 0)) {
+		LOG_KILL_TRACEE("PTRACE_CONT: %m");
+	}
+
+	while (1) {
+		ASSERT_EQ(tracee_pid, wait(&status)) {
+			/* cannot happen */
+			LOG_KILL_TRACEE("wait: %m");
+		}
+		if (WIFEXITED(status)) {
+			tracee_pid = 0; /* the tracee is no more */
+			ASSERT_EQ(0, WEXITSTATUS(status)) {
+				LOG_KILL_TRACEE("unexpected exit status %u",
+						WEXITSTATUS(status));
+			}
+			break;
+		}
+		ASSERT_FALSE(WIFSIGNALED(status)) {
+			tracee_pid = 0; /* the tracee is no more */
+			LOG_KILL_TRACEE("unexpected signal %u",
+					WTERMSIG(status));
+		}
+		ASSERT_TRUE(WIFSTOPPED(status)) {
+			LOG_KILL_TRACEE("unexpected wait status %#x", status);
+		}
+
+		if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
+			struct ptrace_syscall_info info;
+			size_t info_size = sizeof(info);
+			ASSERT_LT(0, sys_ptrace(PTRACE_GET_SYSCALL_INFO, tracee_pid, info_size, (uintptr_t) &info)) {
+				LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m");
+			}
+			ASSERT_EQ(PTRACE_SYSCALL_INFO_SECCOMP, info.op) {
+				LOG_KILL_TRACEE("entry op mismatch: %m");
+			}
+			ASSERT_TRUE(info.arch) {
+				LOG_KILL_TRACEE("entry arch mismatch: %m");
+			}
+			ASSERT_TRUE(info.instruction_pointer) {
+				LOG_KILL_TRACEE("entry instruction_pointer mismatch: %m");
+			}
+			ASSERT_TRUE(info.stack_pointer) {
+				LOG_KILL_TRACEE("entry stack_pointer mismatch: %m");
+			}
+
+			switch (ptrace_stop) {
+				case 0: ASSERT_EQ(__NR_write, info.seccomp.nr) {
+						LOG_KILL_TRACEE("step %d nr __NR_write mismatch: %m", ptrace_stop);
+					}
+					info.instruction_pointer = (uintptr_t) doitagain;
+					info.flags = PTRACE_SYSCALL_INFO_FLAG_SET_IP;
+					ptrace_stop++;
+					break;
+				case 1:
+					info.seccomp.nr = __NR_write;
+					info.seccomp.args[0] = 1;
+					info.seccomp.args[1] = (uintptr_t) w2;
+					info.seccomp.args[2] = sizeof(w2);
+					ptrace_stop++;
+					break;
+				case 2: ASSERT_EQ(__NR_write, info.seccomp.nr) {
+						LOG_KILL_TRACEE("step %d nr __NR_write mismatch: %m", ptrace_stop);
+					}
+					ptrace_stop++;
+					break;
+				case 3:  ASSERT_EQ(__NR_exit_group, info.seccomp.nr) {
+						 LOG_KILL_TRACEE("step %d nr __NR_exit_group mismatch: %m", ptrace_stop);
+					 }
+					 break;
+				default:
+					 LOG_KILL_TRACEE("unexpected system call: %m");
+					 break;
+
+			}
+			ASSERT_EQ(0,sys_ptrace(PTRACE_SET_SYSCALL_INFO, tracee_pid, info_size, (uintptr_t) &info)) {
+				LOG_KILL_TRACEE("PTRACE_SET_SYSCALL_INFO: %m");
+			}
+
+			ASSERT_EQ(0,sys_ptrace(PTRACE_CONT, tracee_pid, 0, 0)) {
+				LOG_KILL_TRACEE("PTRACE_CONT: %m");
+			}
+		}
+	}
+}
+
 TEST_HARNESS_MAIN
-- 
2.53.0



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-01 15:05 ` [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
@ 2026-07-02  8:43   ` Oleg Nesterov
  2026-07-02  9:09     ` Renzo Davoli
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2026-07-02  8:43 UTC
  To: Renzo Davoli
  Cc: linux-kernel, Andrew Morton, Shuah Khan, Alexey Gladkov,
	Eugene Syromyatnikov, Mike Frysinger, Davide Berardi,
	strace-devel, Dmitry Levin

On 07/01, Renzo Davoli wrote:
>
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -79,6 +79,7 @@ struct seccomp_metadata {
>  #define PTRACE_SYSCALL_INFO_ENTRY	1
>  #define PTRACE_SYSCALL_INFO_EXIT	2
>  #define PTRACE_SYSCALL_INFO_SECCOMP	3
> +#define PTRACE_SYSCALL_INFO_SECCOMP_SKIP 4
>
>  struct ptrace_syscall_info {
>  	__u8 op;	/* PTRACE_SYSCALL_INFO_* */
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index d041645d9d17..ff763c87e4f7 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1119,12 +1119,22 @@ ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
>  	return 0;
>  }
>
> +static int
> +ptrace_set_syscall_info_seccomp_skip(struct task_struct *child,
> +				     struct pt_regs *regs,
> +				     struct ptrace_syscall_info *info)
> +{
> +	syscall_set_nr(child, regs, -1);
> +	return ptrace_set_syscall_info_exit(child, regs, info);
> +}

Rather than add the new PTRACE_SYSCALL_INFO_SECCOMP_SKIP, can't we teach
ptrace_set_syscall_info_seccomp() to treat info->entry.nr == -1 as "skip" ?
Note that ptrace_set_syscall_info_seccomp() -> ptrace_set_syscall_info_entry()
already does syscall_set_nr().

And perhaps the changelog should say more about motivation...

See also https://sashiko.dev/#/patchset/20260701150558.330348-1-renzo%40cs.unibo.it

Oleg.



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02  8:43   ` Oleg Nesterov
@ 2026-07-02  9:09     ` Renzo Davoli
  2026-07-02  9:58       ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Renzo Davoli @ 2026-07-02  9:09 UTC
  To: Oleg Nesterov
  Cc: linux-kernel, Andrew Morton, Shuah Khan, Alexey Gladkov,
	Eugene Syromyatnikov, Mike Frysinger, Davide Berardi,
	strace-devel, Dmitry Levin

Hi Oleg,

> Rather than add the new PTRACE_SYSCALL_INFO_SECCOMP_SKIP, can't we teach
> ptrace_set_syscall_info_seccomp() to treat info->entry.nr == -1 as "skip" ?
it already does
> Note that ptrace_set_syscall_info_seccomp() -> ptrace_set_syscall_info_entry()
> already does syscall_set_nr().
Syscall skipping is useless if there is not a way to set the return value/errno.

As I explain in the cover letter
+ The tracer can skip the system call by setting the system call number
+ to -1. However, the current PTRACE_SET_SYSCALL_INFO interface does not
+ provide a way to specify the return value or error code that should be
+ reported to the tracee after skipping the call.

currently retvalue/errno can be set only at PTRACE_SYSCALL_INFO_EXIT

renzo


* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02  9:09     ` Renzo Davoli
@ 2026-07-02  9:58       ` Oleg Nesterov
  2026-07-02 11:07         ` Dmitry V. Levin
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2026-07-02  9:58 UTC (permalink / raw)
  To: Renzo Davoli
  Cc: linux-kernel, Andrew Morton, Shuah Khan, Alexey Gladkov,
	Eugene Syromyatnikov, Mike Frysinger, Davide Berardi,
	strace-devel, Dmitry Levin

On 07/02, Renzo Davoli wrote:
>
> Hi Oleg,
>
> > Rather than add the new PTRACE_SYSCALL_INFO_SECCOMP_SKIP, can't we teach
> > ptrace_set_syscall_info_seccomp() to treat info->entry.nr == -1 as "skip" ?
> it already does
> > Note that ptrace_set_syscall_info_seccomp() -> ptrace_set_syscall_info_entry()
> > already does syscall_set_nr().
> Syscall skipping is useless if there is not a way to set the return value/errno.
>
> As I explain in the cover letter:
> + The tracer can skip the system call by setting the system call number
> + to -1. However, the current PTRACE_SET_SYSCALL_INFO interface does not
> + provide a way to specify the return value or error code that should be
> + reported to the tracee after skipping the call.
>
> Currently, the return value/errno can be set only at PTRACE_SYSCALL_INFO_EXIT.

I meant something like below. This way both PTRACE_SYSCALL_INFO_ENTRY and
__SECCOMP can skip the syscall and set the return/error value.

Oleg.
---


diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 5f8ef6156752..4ee7870f3291 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -90,7 +90,13 @@ struct ptrace_syscall_info {
 	union {
 		struct {
 			__u64 nr;
-			__u64 args[6];
+			union {
+				__u64 args[6];
+				struct {
+					__s64 rval;
+					__u8 is_error;
+				};
+			};
 		} entry;
 		struct {
 			__s64 rval;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 130043bfc209..1daac0e62cfa 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1031,6 +1031,28 @@ ptrace_get_syscall_info(struct task_struct *child, unsigned long user_size,
 	return copy_to_user(datavp, &info, write_size) ? -EFAULT : actual_size;
 }
 
+static int
+__set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
+			__s64 __rval, __u8 __is_error)
+{
+	long rval = __rval;
+
+	/*
+	 * Check that the return value specified in info->exit.rval
+	 * is either a value of type "long" or a sign-extended value
+	 * of type "long".
+	 */
+	if (rval != __rval)
+		return -ERANGE;
+
+	if (__is_error)
+		syscall_set_return_value(child, regs, rval, 0);
+	else
+		syscall_set_return_value(child, regs, 0, rval);
+
+	return 0;
+}
+
 static int
 ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
 			      struct ptrace_syscall_info *info)
@@ -1047,6 +1069,11 @@ ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
 	if (nr != info->entry.nr)
 		return -ERANGE;
 
+	syscall_set_nr(child, regs, nr);
+	if (nr == -1)
+		return __set_syscall_info_exit(child, regs,
+						info->entry.rval, info->entry.is_error);
+
 	for (i = 0; i < ARRAY_SIZE(args); i++) {
 		args[i] = info->entry.args[i];
 		/*
@@ -1058,16 +1085,7 @@ ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
 			return -ERANGE;
 	}
 
-	syscall_set_nr(child, regs, nr);
-	/*
-	 * If the syscall number is set to -1, setting syscall arguments is not
-	 * just pointless, it would also clobber the syscall return value on
-	 * those architectures that share the same register both for the first
-	 * argument of syscall and its return value.
-	 */
-	if (nr != -1)
-		syscall_set_arguments(child, regs, args);
-
+	syscall_set_arguments(child, regs, args);
 	return 0;
 }
 
@@ -1086,22 +1104,8 @@ static int
 ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
 			     struct ptrace_syscall_info *info)
 {
-	long rval = info->exit.rval;
-
-	/*
-	 * Check that the return value specified in info->exit.rval
-	 * is either a value of type "long" or a sign-extended value
-	 * of type "long".
-	 */
-	if (rval != info->exit.rval)
-		return -ERANGE;
-
-	if (info->exit.is_error)
-		syscall_set_return_value(child, regs, rval, 0);
-	else
-		syscall_set_return_value(child, regs, 0, rval);
-
-	return 0;
+	return __set_syscall_info_exit(child, regs,
+		info->exit.rval, info->exit.is_error);
 }
 
 static int
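
The range check in the factored-out helper only matters where "long" is
narrower than __s64. A rough userspace model of the helper follows, with
"long" replaced by int32_t to mimic a 32-bit kernel. The names
retval_model and model_set_exit are made up for this sketch, and the two
struct fields merely stand in for the "error" and "val" arguments of
syscall_set_return_value(); this is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tracee's return-value state. */
struct retval_model {
	int32_t error;	/* models the "error" argument */
	int32_t val;	/* models the "val" argument */
};

/*
 * Userspace model of the proposed __set_syscall_info_exit().
 * int32_t plays the role of "long" on a 32-bit kernel.
 */
static int model_set_exit(struct retval_model *out, int64_t rval64,
			  uint8_t is_error)
{
	int32_t rval = (int32_t)rval64;

	assert(out != NULL);

	/* Reject values that do not survive the narrowing round trip. */
	if ((int64_t)rval != rval64)
		return -ERANGE;

	if (is_error) {
		out->error = rval;
		out->val = 0;
	} else {
		out->error = 0;
		out->val = rval;
	}
	return 0;
}
```

In this model, model_set_exit(&r, -14, 1) stores -14 as the error, while
a value such as 1LL << 32 is rejected with -ERANGE and leaves r untouched.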



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02  9:58       ` Oleg Nesterov
@ 2026-07-02 11:07         ` Dmitry V. Levin
  2026-07-02 11:31           ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry V. Levin @ 2026-07-02 11:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Renzo Davoli, linux-kernel, Andrew Morton, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

On Thu, Jul 02, 2026 at 11:58:14AM +0200, Oleg Nesterov wrote:
[...]
> @@ -1047,6 +1069,11 @@ ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
>  	if (nr != info->entry.nr)
>  		return -ERANGE;
>  
> +	syscall_set_nr(child, regs, nr);
> +	if (nr == -1)
> +		return __set_syscall_info_exit(child, regs,
> +						info->entry.rval, info->entry.is_error);

The kernel shouldn't suddenly start interpreting info->entry.rval and
info->entry.is_error because the current users of this interface are not
aware that the kernel might be doing it.  If we want to extend
PTRACE_SYSCALL_INFO_ENTRY/PTRACE_SYSCALL_INFO_SECCOMP this way, we would
have to require setting a flag in info->flags signalling the kernel that
the user requests this new behaviour.
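
To make the concern concrete, here is a small userspace sketch of the
proposed "entry" layout. The struct entry_model and the helper
legacy_skip_request are hypothetical names used only for illustration;
they are not the UAPI header and not what any existing tracer is known
to do. The point is that rval aliases args[0] and is_error aliases one
byte of args[1], so whatever an unaware tracer leaves in those slots
would be reinterpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed "entry" member. */
struct entry_model {
	uint64_t nr;
	union {
		uint64_t args[6];
		struct {
			int64_t rval;		/* overlays args[0] */
			uint8_t is_error;	/* overlays one byte of args[1] */
		};
	};
};

/*
 * Build the request a tracer unaware of the new fields might send to
 * skip a syscall: nr = -1, with args copied from an earlier GET.
 */
static struct entry_model legacy_skip_request(const uint64_t stale_args[6])
{
	struct entry_model e = { .nr = (uint64_t)-1 };

	assert(stale_args != NULL);
	for (size_t i = 0; i < 6; i++)
		e.args[i] = stale_args[i];
	return e;
}
```

With stale args[0] equal to (uint64_t)-14, reading e.rval yields -14, and
any args[1] whose overlaid byte is nonzero reads back as a set is_error,
even though the tracer never meant to pass a return value.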


-- 
ldv


* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02 11:07         ` Dmitry V. Levin
@ 2026-07-02 11:31           ` Oleg Nesterov
  2026-07-02 11:39             ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2026-07-02 11:31 UTC (permalink / raw)
  To: Dmitry V. Levin
  Cc: Renzo Davoli, linux-kernel, Andrew Morton, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

On 07/02, Dmitry V. Levin wrote:
>
> On Thu, Jul 02, 2026 at 11:58:14AM +0200, Oleg Nesterov wrote:
> [...]
> > @@ -1047,6 +1069,11 @@ ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
> >  	if (nr != info->entry.nr)
> >  		return -ERANGE;
> >
> > +	syscall_set_nr(child, regs, nr);
> > +	if (nr == -1)
> > +		return __set_syscall_info_exit(child, regs,
> > +						info->entry.rval, info->entry.is_error);
>
> The kernel shouldn't suddenly start interpreting info->entry.rval and
> info->entry.is_error because the current users of this interface are not
> aware that the kernel might be doing it.  If we want to extend
> PTRACE_SYSCALL_INFO_ENTRY/PTRACE_SYSCALL_INFO_SECCOMP this way, we would
> have to require setting a flag in info->flags signalling the kernel that
> the user requests this new behaviour.

Ah. I forgot to mention that (obviously) this is a user-visible change,
and a new flag in info->flags will be safer. Of course.

Or we can define a special SKIP_AND_SET_RVAL value for info->entry.nr.

But I am just curious, will this change (without new flag) actually break
strace? What does strace do when it uses PTRACE_SYSCALL_INFO_ENTRY with
info->entry.nr == -1?

Oleg.



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02 11:31           ` Oleg Nesterov
@ 2026-07-02 11:39             ` Oleg Nesterov
  2026-07-02 14:47               ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2026-07-02 11:39 UTC (permalink / raw)
  To: Dmitry V. Levin
  Cc: Renzo Davoli, linux-kernel, Andrew Morton, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

Or we can simply allow the ENTRY/SECCOMP -> EXIT transition, I dunno.
This is safer.

But somehow I don't like the new PTRACE_SYSCALL_INFO_SECCOMP_SKIP...

Oleg.

On 07/02, Oleg Nesterov wrote:
>
> On 07/02, Dmitry V. Levin wrote:
> >
> > On Thu, Jul 02, 2026 at 11:58:14AM +0200, Oleg Nesterov wrote:
> > [...]
> > > @@ -1047,6 +1069,11 @@ ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
> > >  	if (nr != info->entry.nr)
> > >  		return -ERANGE;
> > >
> > > +	syscall_set_nr(child, regs, nr);
> > > +	if (nr == -1)
> > > +		return __set_syscall_info_exit(child, regs,
> > > +						info->entry.rval, info->entry.is_error);
> >
> > The kernel shouldn't suddenly start interpreting info->entry.rval and
> > info->entry.is_error because the current users of this interface are not
> > aware that the kernel might be doing it.  If we want to extend
> > PTRACE_SYSCALL_INFO_ENTRY/PTRACE_SYSCALL_INFO_SECCOMP this way, we would
> > have to require setting a flag in info->flags signalling the kernel that
> > the user requests this new behaviour.
>
> Ah. I forgot to mention that (obviously) this is a user-visible change,
> and a new flag in info->flags will be safer. Of course.
>
> Or we can define a special SKIP_AND_SET_RVAL value for info->entry.nr.
>
> But I am just curious, will this change (without new flag) actually break
> strace? What does strace do when it uses PTRACE_SYSCALL_INFO_ENTRY with
> info->entry.nr == -1?
> 
> Oleg.



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02 11:39             ` Oleg Nesterov
@ 2026-07-02 14:47               ` Oleg Nesterov
  2026-07-02 16:10                 ` Renzo Davoli
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2026-07-02 14:47 UTC (permalink / raw)
  To: Dmitry V. Levin
  Cc: Renzo Davoli, linux-kernel, Andrew Morton, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

On 07/02, Oleg Nesterov wrote:
>
> Or we can simply allow the ENTRY/SECCOMP -> EXIT transition, I dunno.
> This is safer.

and simpler

> But somehow I don't like the new PTRACE_SYSCALL_INFO_SECCOMP_SKIP...

Wdyt about something like below?

Oleg.
---

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 130043bfc209..ecbfa28dfbf6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1084,7 +1084,7 @@ ptrace_set_syscall_info_seccomp(struct task_struct *child, struct pt_regs *regs,
 
 static int
 ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
-			     struct ptrace_syscall_info *info)
+			     struct ptrace_syscall_info *info, bool force)
 {
 	long rval = info->exit.rval;
 
@@ -1101,6 +1101,9 @@ ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
 	else
 		syscall_set_return_value(child, regs, 0, rval);
 
+	if (force)
+		syscall_set_nr(child, regs, -1);
+
 	return 0;
 }
 
@@ -1110,6 +1113,7 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 {
 	struct pt_regs *regs = task_pt_regs(child);
 	struct ptrace_syscall_info info;
+	bool force = false;
 
 	if (user_size < sizeof(info))
 		return -EINVAL;
@@ -1127,14 +1131,17 @@ ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
 		return -EINVAL;
 
 	/* Changing the type of the system call stop is not supported yet. */
-	if (ptrace_get_syscall_info_op(child) != info.op)
-		return -EINVAL;
+	if (ptrace_get_syscall_info_op(child) != info.op) {
+		if (info.op != PTRACE_SYSCALL_INFO_EXIT)
+			return -EINVAL;
+		force = true;
+	}
 
 	switch (info.op) {
 	case PTRACE_SYSCALL_INFO_ENTRY:
 		return ptrace_set_syscall_info_entry(child, regs, &info);
 	case PTRACE_SYSCALL_INFO_EXIT:
-		return ptrace_set_syscall_info_exit(child, regs, &info);
+		return ptrace_set_syscall_info_exit(child, regs, &info, force);
 	case PTRACE_SYSCALL_INFO_SECCOMP:
 		return ptrace_set_syscall_info_seccomp(child, regs, &info);
 	default:



* Re: [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP
  2026-07-02 14:47               ` Oleg Nesterov
@ 2026-07-02 16:10                 ` Renzo Davoli
  0 siblings, 0 replies; 14+ messages in thread
From: Renzo Davoli @ 2026-07-02 16:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry V. Levin, linux-kernel, Andrew Morton, Shuah Khan,
	Alexey Gladkov, Eugene Syromyatnikov, Mike Frysinger,
	Davide Berardi, strace-devel

On Thu, Jul 02, 2026 at 04:47:08PM +0200, Oleg Nesterov wrote:
> On 07/02, Oleg Nesterov wrote:
> Wdyt about something like below?

I like it.

I have one comment:
> +	if (ptrace_get_syscall_info_op(child) != info.op) {
> +		if (info.op != PTRACE_SYSCALL_INFO_EXIT)
> +			return -EINVAL;
> +		force = true;
> +	}

In the manual I have found the behavior "negative syscall number => skip
syscall" defined only for PTRACE_EVENT_SECCOMP. I'd restrict the option to
that case, for safety.

A minor detail: I'd also rename "force" to "skip" or "skip_syscall", just
for the sake of readability.
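
With both remarks applied, the op check might look roughly like the
userspace model below. This is only a sketch: model_check_op and the
MODEL_OP_* constants are hypothetical names defined locally (the
numbering is meant to mirror the PTRACE_SYSCALL_INFO_* values) so that
it builds without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the PTRACE_SYSCALL_INFO_* op values. */
enum model_op {
	MODEL_OP_NONE = 0,
	MODEL_OP_ENTRY = 1,
	MODEL_OP_EXIT = 2,
	MODEL_OP_SECCOMP = 3,
};

/*
 * Decide whether a SET request with op "requested" is acceptable while
 * the tracee is in a stop of type "current". On success, *skip_syscall
 * tells the caller whether the syscall number must also be set to -1.
 */
static int model_check_op(enum model_op current, enum model_op requested,
			  bool *skip_syscall)
{
	assert(skip_syscall != NULL);
	*skip_syscall = false;

	if (current == requested)
		return 0;

	/* The only permitted change of type: SECCOMP stop -> EXIT. */
	if (current != MODEL_OP_SECCOMP || requested != MODEL_OP_EXIT)
		return -EINVAL;

	*skip_syscall = true;
	return 0;
}
```

Here a SECCOMP stop with op set to EXIT is accepted with skip_syscall
set, while the same request at an ENTRY stop still fails with -EINVAL.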

Dmitry: does this proposal have any adverse effects on strace?

renzo


Thread overview: 14+ messages
2026-07-01 15:05 [PATCH 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
2026-07-01 15:05 ` [PATCH 1/5] ptrace: add PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
2026-07-02  8:43   ` Oleg Nesterov
2026-07-02  9:09     ` Renzo Davoli
2026-07-02  9:58       ` Oleg Nesterov
2026-07-02 11:07         ` Dmitry V. Levin
2026-07-02 11:31           ` Oleg Nesterov
2026-07-02 11:39             ` Oleg Nesterov
2026-07-02 14:47               ` Oleg Nesterov
2026-07-02 16:10                 ` Renzo Davoli
2026-07-01 15:05 ` [PATCH 2/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_SECCOMP_SKIP Renzo Davoli
2026-07-01 15:05 ` [PATCH 3/5] asm/ptrace.h: add instruction_pointer_set Renzo Davoli
2026-07-01 15:05 ` [PATCH 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
2026-07-01 15:05 ` [PATCH 5/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
