Process killed by seccomp looks live by tracer

* Process killed by seccomp looks live by tracer
From: Max Ver @ 2026-03-04 10:51 UTC
  To: linux-kernel, bpf; +Cc: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov

I was using ptrace to trace a tracee, using `PTRACE_SYSCALL` and
`PTRACE_GET_SYSCALL_INFO` to get its syscall arguments and results.
When the tracee is killed by its seccomp filter, the tracer can't tell
immediately; instead, `PTRACE_GET_SYSCALL_INFO` reports a syscall exit
with no error. The syscall wasn't actually executed: it was caught by
seccomp, and the tracee was even killed by seccomp.

Here is a PoC demonstrating what I described.
I expected to learn of the tracee's death at the fourth
`PTRACE_GET_SYSCALL_INFO`; at the least, `PTRACE_GET_SYSCALL_INFO`
should report that the syscall failed, or waitpid should return
something different. But the results are below: the tracer only
learns of the tracee's death on the fifth loop iteration.

---
Start tracing child
entry
exit ok
entry
exit ok
get_syscall_info: No such process
---
// poc file
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <string.h>
#include <sys/ptrace.h>
#include <linux/ptrace.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <threads.h>
#include <unistd.h>

void
child ()
{
  if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
    {
      perror ("prctl");
      exit (1);
    }
  ptrace (PTRACE_TRACEME, 0, 0, 0);
  raise (SIGSTOP);

  struct sock_filter filter[] = {
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
  };

  struct sock_fprog prog = {
    .len = 1,
    .filter = filter,
  };

  if (syscall (SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog))
    {
      perror ("seccomp");
      exit (1);
    }
  write (1, "a", 1);
}

void
get_child_info (pid_t pid)
{
  int status;

  struct ptrace_syscall_info info;
  memset (&info, 0, sizeof (info));
  ptrace (PTRACE_SYSCALL, pid, 0, 0);
  if ((waitpid (pid, &status, 0) == -1) || WIFEXITED (status))
    {
      puts ("child exit");
      exit (1);
    }
  if (ptrace (PTRACE_GET_SYSCALL_INFO, pid,
              sizeof (struct ptrace_syscall_info), &info)
      == -1)
    {
      perror ("get_syscall_info");
      exit (1);
    }

  if (info.op == PTRACE_SYSCALL_INFO_ENTRY)
    puts ("entry");
  else if (info.op == PTRACE_SYSCALL_INFO_EXIT)
    puts (info.exit.is_error ? "exit with err" : "exit ok");
  else if (info.op == PTRACE_SYSCALL_INFO_NONE)
    puts ("none");
  else if (info.op == PTRACE_SYSCALL_INFO_SECCOMP)
    puts ("seccomp");
  else
    printf ("%d\n", info.op);
}

void
parent (pid_t pid)
{
  int status;
  waitpid (pid, &status, 0);
  puts ("Start tracing child");

  if (ptrace (PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == -1)
    exit (1);
  while (1)
    get_child_info (pid);
}

int
main ()
{
  pid_t pid = fork ();
  if (pid == 0)
    child ();
  else
    parent (pid);
}

* Re: Process killed by seccomp looks live by tracer
From: Kees Cook @ 2026-03-04 18:05 UTC
  To: Max Ver, linux-kernel, bpf; +Cc: Andy Lutomirski, Will Drewry, Oleg Nesterov



On March 4, 2026 2:51:38 AM PST, Max Ver <dudududumaxver@gmail.com> wrote:
>I was using ptrace to trace a tracee, using `PTRACE_SYSCALL` and
>`PTRACE_GET_SYSCALL_INFO` to get its syscall arguments and results.
>When the tracee is killed by its seccomp filter, the tracer can't tell
>immediately; instead, `PTRACE_GET_SYSCALL_INFO` reports a syscall exit
>with no error. The syscall wasn't actually executed: it was caught by
>seccomp, and the tracee was even killed by seccomp.
>
>Here is a PoC demonstrating what I described.
>I expected to learn of the tracee's death at the fourth
>`PTRACE_GET_SYSCALL_INFO`; at the least, `PTRACE_GET_SYSCALL_INFO`
>should report that the syscall failed, or waitpid should return
>something different. But the results are below: the tracer only
>learns of the tracee's death on the fifth loop iteration.

This is expected; PTRACE_GET_SYSCALL_INFO is at syscall entry before seccomp filtering has run.

-- 
Kees Cook

* Re: Process killed by seccomp looks live by tracer
From: Max Ver @ 2026-03-05  2:00 UTC
  To: Kees Cook, bpf, linux-kernel; +Cc: Andy Lutomirski, Oleg Nesterov, Will Drewry

>This is expected; PTRACE_GET_SYSCALL_INFO is at syscall entry before seccomp filtering has run.

It also happens at syscall exit. Take a look at the output: it
shows 'exit ok' twice.
If we can agree that this is a bug, I suggest the kernel give a hint
about the tracee's exit in the waitpid return value. What do you think?

* Re: Process killed by seccomp looks live by tracer
From: Oleg Nesterov @ 2026-03-05 14:49 UTC
  To: Max Ver; +Cc: Kees Cook, bpf, linux-kernel, Andy Lutomirski, Will Drewry

Hi Max,

On 03/05, Max Ver wrote:
>
> >This is expected; PTRACE_GET_SYSCALL_INFO is at syscall entry before seccomp filtering has run.
>
> It also happens at syscall exit. Take a look at the output: it
> shows 'exit ok' twice.

Why do you think this is wrong? (and I don't think this has anything to
do with seccomp, btw).

> If we can agree that this is a bug, I suggest the kernel give a hint
> about the tracee's exit in the waitpid return value. What do you think?

But the kernel already gives you a hint, no?

Perhaps I missed your point, but see the change of your test-case below.

Oleg.


--- /tmp/PT.c~	2026-03-05 15:18:18.397319905 +0100
+++ /tmp/PT.c	2026-03-05 15:40:11.044415647 +0100
@@ -15,6 +15,8 @@
 #include <sys/wait.h>
 #include <threads.h>
 #include <unistd.h>
+#include <assert.h>
+#include <errno.h>
 
 void
 child ()
@@ -57,6 +59,14 @@
       puts ("child exit");
       exit (1);
     }
+
+	if (WIFSIGNALED(status)) {
+		printf("signalled pid=%d sig=%d\n", pid, WTERMSIG(status));
+		assert(kill(pid, 0) == -1 && errno == ESRCH);
+		exit(0);
+	}
+
+
   if (ptrace (PTRACE_GET_SYSCALL_INFO, pid,
               sizeof (struct ptrace_syscall_info), &info)
       == -1)


* Re: Process killed by seccomp looks live by tracer
From: Oleg Nesterov @ 2026-03-05 17:45 UTC
  To: Max Ver; +Cc: Kees Cook, bpf, linux-kernel, Andy Lutomirski, Will Drewry

That said...

__seccomp_filter() does

	case SECCOMP_RET_KILL_PROCESS:
	...
		/* Show the original registers in the dump. */
		syscall_rollback(current, current_pt_regs());

		/* Trigger a coredump with SIGSYS */
		force_sig_seccomp(this_syscall, data, true);

This means that after syscall_rollback() regs->ax == orig_ax, so
ptrace_get_syscall_info_exit() will always report .is_error == 0.

And since force_sig_seccomp() uses force_coredump == true, SIGSYS
won't be reported (see the SA_IMMUTABLE check in get_signal()).

Again, it is not that I think this is wrong. But perhaps Kees and Andy
can take a look and confirm that this is what we actually want.

Oleg.

* Re: Process killed by seccomp looks live by tracer
From: Max Ver @ 2026-03-06  2:55 UTC
  To: Oleg Nesterov; +Cc: bpf, linux-kernel, Andy Lutomirski, Will Drewry, Kees Cook

Thanks for the `WIFSIGNALED` check; it does catch the tracee's death on
the fifth loop iteration.

>Why do you think this is wrong? (and I don't think this has anything to
>do with seccomp, btw).

I think it would be more reasonable for the kernel to give a hint right
after the syscall is killed by seccomp, i.e. on the fourth loop
iteration, so that we know the syscall was rolled back; otherwise we can
only assume the syscall may have succeeded.

* Re: Process killed by seccomp looks live by tracer
From: Oleg Nesterov @ 2026-03-08 13:08 UTC
  To: Max Ver, Andy Lutomirski, Kees Cook, Dmitry Levin
  Cc: bpf, linux-kernel, Will Drewry

On 03/06, Max Ver wrote:
>
> I think it would be more reasonable for the kernel to give a hint right
> after the syscall is killed by seccomp, i.e. on the fourth loop
> iteration, so that we know the syscall was rolled back; otherwise we can
> only assume the syscall may have succeeded.

Perhaps you are right, but this is a question for seccomp experts...

Kees, Andy, what do you think?

Say, we can do something like

	--- a/kernel/ptrace.c
	+++ b/kernel/ptrace.c
	@@ -951,11 +951,20 @@ ptrace_get_syscall_info_seccomp(struct task_struct *child, struct pt_regs *regs,
		return offsetofend(struct ptrace_syscall_info, seccomp.ret_data);
	 }
	 
	+// currently not exposed
	+#define SECCOMP_MODE_DEAD	(SECCOMP_MODE_FILTER + 1)
	+
	+static long xxx_get_error(struct task_struct *task, struct pt_regs *regs)
	+{
	+	return task->seccomp.mode == SECCOMP_MODE_DEAD
	+		? -EINTR : syscall_get_error(task, regs);
	+}
	+
	 static unsigned long
	 ptrace_get_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
				     struct ptrace_syscall_info *info)
	 {
	-	info->exit.rval = syscall_get_error(child, regs);
	+	info->exit.rval = xxx_get_error(child, regs);
		info->exit.is_error = !!info->exit.rval;
		if (!info->exit.is_error)
			info->exit.rval = syscall_get_return_value(child, regs);

but probably this is not a good solution.

Perhaps we can add a new "killed_by_seccomp" member into ptrace_syscall_info.exit ?

Or even add a new ptrace_syscall_info.op = PTRACE_SYSCALL_INFO_KILLED_BY_SECCOMP ?

Or change ptrace_report_syscall(PTRACE_EVENTMSG_SYSCALL_EXIT) to not report the event
if the tracee was killed by force_sig_seccomp(force_coredump => true) ?

Oleg.

* Re: Process killed by seccomp looks live by tracer
From: Oleg Nesterov @ 2026-03-22 13:40 UTC
  To: Max Ver, Andy Lutomirski, Kees Cook, Dmitry Levin
  Cc: bpf, linux-kernel, Will Drewry

On 03/08, Oleg Nesterov wrote:

> Perhaps you are right, but this is a question for seccomp experts...
>
> Kees, Andy, what do you think?

no comments ;)

Kees, Andy, I think that Max has a point. I'll send the RFC patch in
a minute.

Please review.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-22 13:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-04 10:51 Process killed by seccomp looks live by tracer Max Ver
2026-03-04 18:05 ` Kees Cook
2026-03-05  2:00   ` Max Ver
2026-03-05 14:49     ` Oleg Nesterov
2026-03-05 17:45       ` Oleg Nesterov
2026-03-06  2:55         ` Max Ver
2026-03-08 13:08           ` Oleg Nesterov
2026-03-22 13:40             ` Oleg Nesterov
