[RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp

public inbox for linux-kernel@vger.kernel.org

* [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
@ 2026-03-22 13:44 Oleg Nesterov
  2026-03-22 14:47 ` Kees Cook
  2026-03-22 16:36 ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2026-03-22 13:44 UTC (permalink / raw)
  To: Andrew Morton, Andy Lutomirski, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Will Drewry
  Cc: Max Ver, linux-kernel

__seccomp_filter() does

	case SECCOMP_RET_KILL_THREAD:
	case SECCOMP_RET_KILL_PROCESS:
	...
		/* Show the original registers in the dump. */
		syscall_rollback(current, current_pt_regs());

		/* Trigger a coredump with SIGSYS */
		force_sig_seccomp(this_syscall, data, true);

syscall_rollback() does regs->ax = orig_ax. This means that
ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
it looks as if the aborted syscall actually succeeded and returned its
own syscall number.

And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
will "silently" exit with error_code == SIGSYS after the bogus report.

Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
reports if the tracee is SECCOMP_MODE_DEAD.

TODO: With or without this change, get_signal() -> ptrace_signal() may
report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
too and prioritize the fatal SIGSYS.

Reported-by: Max Ver <dudududumaxver@gmail.com>
Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/entry-common.h | 3 +++
 include/linux/seccomp.h      | 8 ++++++++
 kernel/seccomp.c             | 3 ---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f83ca0abf2cd..5c62bda9dcf9 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -250,6 +250,9 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
 		trace_syscall_exit(regs, syscall_get_return_value(current, regs));
 
+	if (killed_by_seccomp(current))
+		return;
+
 	step = report_single_step(work);
 	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
 		arch_ptrace_report_syscall_exit(regs, step);
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9b959972bf4a..e95a251955c1 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -22,6 +22,12 @@
 #include <linux/atomic.h>
 #include <asm/seccomp.h>
 
+/* Not exposed in uapi headers: internal use only. */
+#define SECCOMP_MODE_DEAD	(SECCOMP_MODE_FILTER + 1)
+
+#define killed_by_seccomp(task)	\
+	((task)->seccomp.mode == SECCOMP_MODE_DEAD)
+
 extern int __secure_computing(void);
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -49,6 +55,8 @@ static inline int seccomp_mode(struct seccomp *s)
 
 struct seccomp_data;
 
+#define killed_by_seccomp(task)	0
+
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
 static inline int secure_computing(void) { return 0; }
 #else
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..461eb15c66c3 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -31,9 +31,6 @@
 
 #include <asm/syscall.h>
 
-/* Not exposed in headers: strictly internal use only. */
-#define SECCOMP_MODE_DEAD	(SECCOMP_MODE_FILTER + 1)
-
 #ifdef CONFIG_SECCOMP_FILTER
 #include <linux/file.h>
 #include <linux/filter.h>
-- 
2.52.0




* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-22 13:44 [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp Oleg Nesterov
@ 2026-03-22 14:47 ` Kees Cook
  2026-03-22 15:14   ` Oleg Nesterov
  2026-03-22 16:36 ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Kees Cook @ 2026-03-22 14:47 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry
  Cc: Max Ver, linux-kernel



On March 22, 2026 6:44:54 AM PDT, Oleg Nesterov <oleg@redhat.com> wrote:
>__seccomp_filter() does
>
>	case SECCOMP_RET_KILL_THREAD:
>	case SECCOMP_RET_KILL_PROCESS:
>	...
>		/* Show the original registers in the dump. */
>		syscall_rollback(current, current_pt_regs());
>
>		/* Trigger a coredump with SIGSYS */
>		force_sig_seccomp(this_syscall, data, true);
>
>syscall_rollback() does regs->ax = orig_ax. This means that
>ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
>it looks as if the aborted syscall actually succeeded and returned its
>own syscall number.
>
>And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
>be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
>will "silently" exit with error_code == SIGSYS after the bogus report.
>
>Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
>reports if the tracee is SECCOMP_MODE_DEAD.
>
>TODO: With or without this change, get_signal() -> ptrace_signal() may
>report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
>Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
>too and prioritize the fatal SIGSYS.
>
>Reported-by: Max Ver <dudududumaxver@gmail.com>
>Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
>Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>---
> include/linux/entry-common.h | 3 +++
> include/linux/seccomp.h      | 8 ++++++++
> kernel/seccomp.c             | 3 ---
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
>index f83ca0abf2cd..5c62bda9dcf9 100644
>--- a/include/linux/entry-common.h
>+++ b/include/linux/entry-common.h
>@@ -250,6 +250,9 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
> 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> 		trace_syscall_exit(regs, syscall_get_return_value(current, regs));
> 
>+	if (killed_by_seccomp(current))
>+		return;

Hmm. I'm still not convinced this is right, but if we make this change, I'd want to see a behavioral test added (likely to the seccomp self tests), and to make sure the rr test suite doesn't regress. It's traditionally been the most sensitive to these kinds of notification ordering/behavior changes.

-Kees

>+
> 	step = report_single_step(work);
> 	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> 		arch_ptrace_report_syscall_exit(regs, step);
>diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>index 9b959972bf4a..e95a251955c1 100644
>--- a/include/linux/seccomp.h
>+++ b/include/linux/seccomp.h
>@@ -22,6 +22,12 @@
> #include <linux/atomic.h>
> #include <asm/seccomp.h>
> 
>+/* Not exposed in uapi headers: internal use only. */
>+#define SECCOMP_MODE_DEAD	(SECCOMP_MODE_FILTER + 1)
>+
>+#define killed_by_seccomp(task)	\
>+	((task)->seccomp.mode == SECCOMP_MODE_DEAD)
>+
> extern int __secure_computing(void);
> 
> #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
>@@ -49,6 +55,8 @@ static inline int seccomp_mode(struct seccomp *s)
> 
> struct seccomp_data;
> 
>+#define killed_by_seccomp(task)	0
>+
> #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
> static inline int secure_computing(void) { return 0; }
> #else
>diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>index 066909393c38..461eb15c66c3 100644
>--- a/kernel/seccomp.c
>+++ b/kernel/seccomp.c
>@@ -31,9 +31,6 @@
> 
> #include <asm/syscall.h>
> 
>-/* Not exposed in headers: strictly internal use only. */
>-#define SECCOMP_MODE_DEAD	(SECCOMP_MODE_FILTER + 1)
>-
> #ifdef CONFIG_SECCOMP_FILTER
> #include <linux/file.h>
> #include <linux/filter.h>

-- 
Kees Cook


* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-22 14:47 ` Kees Cook
@ 2026-03-22 15:14   ` Oleg Nesterov
  2026-03-23 12:09     ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2026-03-22 15:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Will Drewry, Max Ver, linux-kernel

On 03/22, Kees Cook wrote:
>
> On March 22, 2026 6:44:54 AM PDT, Oleg Nesterov <oleg@redhat.com> wrote:
> >__seccomp_filter() does
> >
> >	case SECCOMP_RET_KILL_THREAD:
> >	case SECCOMP_RET_KILL_PROCESS:
> >	...
> >		/* Show the original registers in the dump. */
> >		syscall_rollback(current, current_pt_regs());
> >
> >		/* Trigger a coredump with SIGSYS */
> >		force_sig_seccomp(this_syscall, data, true);
> >
> >syscall_rollback() does regs->ax = orig_ax. This means that
> >ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
> >it looks as if the aborted syscall actually succeeded and returned its
> >own syscall number.
> >
> >And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
> >be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
> >will "silently" exit with error_code == SIGSYS after the bogus report.
> >
> >Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
> >reports if the tracee is SECCOMP_MODE_DEAD.
> >
> >TODO: With or without this change, get_signal() -> ptrace_signal() may
> >report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
> >Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
> >too and prioritize the fatal SIGSYS.
> >
> >Reported-by: Max Ver <dudududumaxver@gmail.com>
> >Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
> >Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >---
> > include/linux/entry-common.h | 3 +++
> > include/linux/seccomp.h      | 8 ++++++++
> > kernel/seccomp.c             | 3 ---
> > 3 files changed, 11 insertions(+), 3 deletions(-)
> >
> >diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> >index f83ca0abf2cd..5c62bda9dcf9 100644
> >--- a/include/linux/entry-common.h
> >+++ b/include/linux/entry-common.h
> >@@ -250,6 +250,9 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
> > 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> > 		trace_syscall_exit(regs, syscall_get_return_value(current, regs));
> >
> >+	if (killed_by_seccomp(current))
> >+		return;
>
> Hmm. I'm still not convinced this is right,

Me too actually ;)

That is why RFC. So:

	- Do you agree that the current behaviour is not really "sane" and
	  can confuse ptracers?

	- If yes, what else do you think we can do? No, I no longer think it
	  makes sense to change the ptrace_get_syscall_info_exit() paths...


> but if we make this change, I'd want to see a behavioral test added
> (likely to the seccomp self tests), and to make sure the rr test suite doesn't regress.

OK. I'll try to take a look at these tests and possibly add another one.

But (sorry) not the next week, I will be travelling.

Oleg.



* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-22 13:44 [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp Oleg Nesterov
  2026-03-22 14:47 ` Kees Cook
@ 2026-03-22 16:36 ` Andrew Morton
  2026-03-22 17:32   ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2026-03-22 16:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry, Max Ver, linux-kernel

On Sun, 22 Mar 2026 14:44:54 +0100 Oleg Nesterov <oleg@redhat.com> wrote:

> __seccomp_filter() does
> 
> 	case SECCOMP_RET_KILL_THREAD:
> 	case SECCOMP_RET_KILL_PROCESS:
> 	...
> 		/* Show the original registers in the dump. */
> 		syscall_rollback(current, current_pt_regs());
> 
> 		/* Trigger a coredump with SIGSYS */
> 		force_sig_seccomp(this_syscall, data, true);
> 
> syscall_rollback() does regs->ax = orig_ax. This means that
> ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
> it looks as if the aborted syscall actually succeeded and returned its
> own syscall number.
> 
> And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
> be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
> will "silently" exit with error_code == SIGSYS after the bogus report.
> 
> Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
> reports if the tracee is SECCOMP_MODE_DEAD.
> 
> TODO: With or without this change, get_signal() -> ptrace_signal() may
> report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
> Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
> too and prioritize the fatal SIGSYS.

AI review has questions:
	https://sashiko.dev/#/patchset/ab_yVqQ7WW3flal3@redhat.com


* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-22 16:36 ` Andrew Morton
@ 2026-03-22 17:32   ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2026-03-22 17:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry, Max Ver, linux-kernel

On 03/22, Andrew Morton wrote:
>
> On Sun, 22 Mar 2026 14:44:54 +0100 Oleg Nesterov <oleg@redhat.com> wrote:
>
> > __seccomp_filter() does
> >
> > 	case SECCOMP_RET_KILL_THREAD:
> > 	case SECCOMP_RET_KILL_PROCESS:
> > 	...
> > 		/* Show the original registers in the dump. */
> > 		syscall_rollback(current, current_pt_regs());
> >
> > 		/* Trigger a coredump with SIGSYS */
> > 		force_sig_seccomp(this_syscall, data, true);
> >
> > syscall_rollback() does regs->ax = orig_ax. This means that
> > ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
> > it looks as if the aborted syscall actually succeeded and returned its
> > own syscall number.
> >
> > And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
> > be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
> > will "silently" exit with error_code == SIGSYS after the bogus report.
> >
> > Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
> > reports if the tracee is SECCOMP_MODE_DEAD.
> >
> > TODO: With or without this change, get_signal() -> ptrace_signal() may
> > report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
> > Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
> > too and prioritize the fatal SIGSYS.
>
> AI review has questions:
> 	https://sashiko.dev/#/patchset/ab_yVqQ7WW3flal3@redhat.com

Excellent question ;) Thanks sashiko!

I will have this in mind when (if) I send V2.

So far my main concern is the behavioral change caused by my RFC, I will wait
for more comments before that.

In any case: yes! I have missed another syscall_rollback() on SECCOMP_RET_TRAP in
__seccomp_filter(). In this case force_sig_seccomp() uses force_coredump == false,
so SIGSYS will be reported. But this doesn't really make a difference wrt ptrace
confusion.
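
The confusion can be modeled in user space. This is only a rough sketch: it
assumes x86-64 register naming and that the exit report treats a return value
in [-MAX_ERRNO, -1] as an error. The struct and the model_* helpers are
illustrative names, not kernel code.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095L	/* largest errno value treated as an error return */

/* Hypothetical stand-in for the two x86-64 pt_regs fields that matter here. */
struct model_regs {
	long ax;	/* return-value register */
	long orig_ax;	/* syscall number saved at syscall entry */
};

/* Models the effect of syscall_rollback() on x86: ax is restored from orig_ax. */
static void model_syscall_rollback(struct model_regs *regs)
{
	regs->ax = regs->orig_ax;
}

/* Models the is_error test: true only for values in [-MAX_ERRNO, -1]. */
static bool model_is_error(long rval)
{
	return (unsigned long)rval >= (unsigned long)-MAX_ERRNO;
}
```

After the rollback the return register holds the syscall number, which is a
small positive value, so the model reports "no error" and a return value equal
to the syscall number. That holds for both the kill and the trap actions.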

Thanks!

Oleg.



* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-22 15:14   ` Oleg Nesterov
@ 2026-03-23 12:09     ` Oleg Nesterov
  2026-04-03 15:26       ` Kusaram Devineni
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2026-03-23 12:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Will Drewry, Max Ver, linux-kernel

On 03/22, Oleg Nesterov wrote:
>
> On 03/22, Kees Cook wrote:
> >
> > Hmm. I'm still not convinced this is right,
>
> Me too actually ;)
>
> That is why RFC. So:
>
> 	- Do you agree that the current behaviour is not really "sane" and
> 	  can confuse ptracers?
>
> 	- If yes, what else do you think we can do? No, I no longer think it
> 	  makes sense to change the ptrace_get_syscall_info_exit() paths...

Perhaps _something_ like the change below makes more sense?

Oleg.

--- x/kernel/seccomp.c
+++ x/kernel/seccomp.c
@@ -1357,8 +1357,8 @@ static int __seccomp_filter(int this_sys
 		/* Dump core only if this is the last remaining thread. */
 		if (action != SECCOMP_RET_KILL_THREAD ||
 		    (atomic_read(&current->signal->live) == 1)) {
-			/* Show the original registers in the dump. */
-			syscall_rollback(current, current_pt_regs());
+			syscall_set_return_value(current, current_pt_regs(),
+						 -EINTR, 0);
 			/* Trigger a coredump with SIGSYS */
 			force_sig_seccomp(this_syscall, data, true);
 		} else {
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2916,6 +2916,11 @@ bool get_signal(struct ksignal *ksig)
 		if (!signr)
 			break; /* will return 0 */
 
+
+		// incomplete and ugly, just for illustration
+		if (ksig->info.si_code == SYS_SECCOMP)
+			syscall_rollback(current, current_pt_regs());
+
 		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
 		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
 			signr = ptrace_signal(signr, &ksig->info, type);
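
The intent of the two hunks above, as a user-space sketch: the kill path leaves
an error in the return register so that a syscall-exit report sees a failure,
and the rollback is deferred until signal delivery so that the dump still shows
the original registers. The model_* names and the struct are assumptions made
for illustration; only the -EINTR value and the ordering come from the diff.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO 4095L

/* Hypothetical stand-in for the relevant x86-64 pt_regs fields. */
struct model_regs {
	long ax;	/* return-value register */
	long orig_ax;	/* syscall number saved at syscall entry */
};

/* True only for return values in [-MAX_ERRNO, -1]. */
static bool model_is_error(long rval)
{
	return (unsigned long)rval >= (unsigned long)-MAX_ERRNO;
}

/* Step 1, seccomp kill path in the sketch: report the syscall as failed. */
static void model_seccomp_kill(struct model_regs *regs)
{
	regs->ax = -EINTR;
}

/* Step 2, signal delivery in the sketch: restore registers for the dump. */
static void model_get_signal_rollback(struct model_regs *regs)
{
	regs->ax = regs->orig_ax;
}
```

Between step 1 and step 2 a tracer looking at the exit state would see an
error return rather than the syscall number.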



* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-03-23 12:09     ` Oleg Nesterov
@ 2026-04-03 15:26       ` Kusaram Devineni
  2026-04-03 15:48         ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Kusaram Devineni @ 2026-04-03 15:26 UTC (permalink / raw)
  To: Oleg Nesterov, Kees Cook
  Cc: Andrew Morton, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Will Drewry, Max Ver, linux-kernel

On 23-03-2026 17:39, Oleg Nesterov wrote:
> Perhaps _something_ like the change below makes more sense?

We have been working internally on a related issue in the same 
seccomp/signal area, so sharing our thoughts here in case they are useful.

This change does seem closer to the real condition than checking
SECCOMP_MODE_DEAD in syscall_exit_work(). In our analysis too, the bogus
syscall-exit report appears to be a real issue: in seccomp paths which
do syscall_rollback(), e.g. the fatal kill path and also
SECCOMP_RET_TRAP, the return register no longer reflects a valid exit
result. So ptrace can observe a value that did not come from a completed
syscall.

Because of that, using SECCOMP_MODE_DEAD still feels a bit broader than 
the exact condition. It couples syscall-exit suppression to a persistent 
seccomp task state, while the reason to suppress reporting seems more 
specific to a single syscall instance: once that syscall has been rolled 
back, it never actually completed, so there is no valid exit result to 
report. From that point of view, a per-syscall “aborted after rollback” 
condition still feels like the more natural abstraction.
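
To make the distinction concrete, here is a small user-space sketch comparing
the two conditions. Everything in it is hypothetical: the syscall_aborted flag
does not exist in the kernel, and the model_* names are ours. It assumes the
kill action marks the task dead while the trap action does not.

```c
#include <stdbool.h>

enum model_mode { MODEL_MODE_FILTER, MODEL_MODE_DEAD };

struct model_task {
	enum model_mode mode;	/* persistent: stays DEAD once set */
	bool syscall_aborted;	/* hypothetical per-syscall flag */
};

/* The per-syscall flag is reset whenever a new syscall starts. */
static void model_syscall_enter(struct model_task *t)
{
	t->syscall_aborted = false;
}

/* Kill action: the syscall is rolled back and the task is marked dead. */
static void model_seccomp_kill(struct model_task *t)
{
	t->mode = MODEL_MODE_DEAD;
	t->syscall_aborted = true;
}

/* Trap action: the syscall is rolled back but the task keeps running. */
static void model_seccomp_trap(struct model_task *t)
{
	t->syscall_aborted = true;
}

/* Suppression keyed on the persistent task state (the RFC's condition). */
static bool model_suppress_by_mode(const struct model_task *t)
{
	return t->mode == MODEL_MODE_DEAD;
}

/* Suppression keyed on the per-syscall "aborted after rollback" state. */
static bool model_suppress_by_flag(const struct model_task *t)
{
	return t->syscall_aborted;
}
```

In this model the trap case is caught only by the per-syscall flag, and the
flag clears again at the next syscall entry, which the persistent mode cannot
express.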

It also seems worth considering whether the same issue extends beyond 
ptrace syscall-exit reporting to other exit-side observers such as 
audit_syscall_exit() and trace_syscall_exit().

Also, on the TODO from the RFC:

> TODO: With or without this change, get_signal() -> ptrace_signal() may
> report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
> Perhaps it makes sense to change get_signal() to check
> SECCOMP_MODE_DEAD too and prioritize the fatal SIGSYS.

While tracing the same overall issue locally, we hit another path where
the forced fatal SIGSYS could be taken off the normal delivery path 
before get_signal() handled it, in our case via signalfd. There,
force_sig_seccomp(..., true) marks SIGSYS as SA_IMMUTABLE via 
HANDLER_EXIT, but signalfd could still dequeue it before normal fatal 
delivery.

So this direction looks better than the original RFC, but for the 
overall solution to be reliable, it would probably also need to ensure 
that a forced fatal SA_IMMUTABLE signal is not bypassed by other 
signal-ordering, delivery, or consumption paths.

Thanks
Kusaram


* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-04-03 15:26       ` Kusaram Devineni
@ 2026-04-03 15:48         ` Oleg Nesterov
  2026-04-03 17:16           ` Kusaram Devineni
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2026-04-03 15:48 UTC (permalink / raw)
  To: Kusaram Devineni
  Cc: Kees Cook, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry, Max Ver, linux-kernel

Thanks Kusaram!

I was travelling, hope to send V2 this weekend. And write a more
detailed reply.

Just one note for now:

On 04/03, Kusaram Devineni wrote:
>
> while tracing the same overall issue locally, we hit another path where the
> forced fatal SIGSYS could be taken off the normal delivery path before
> get_signal() handled it, in our case via signalfd. There,
> force_sig_seccomp(..., true) marks SIGSYS as SA_IMMUTABLE via HANDLER_EXIT,
> but signalfd could still dequeue it before normal fatal delivery.

How?

force_sig_seccomp() sends the signal to current. current can't return to
usermode and call signalfd_dequeue(); get_signal() must dequeue SIGSYS and
notice SA_IMMUTABLE.

And since this signal is private, signalfd_dequeue() from another thread can't
dequeue it either.

No?

Oleg.


* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-04-03 15:48         ` Oleg Nesterov
@ 2026-04-03 17:16           ` Kusaram Devineni
  2026-04-04 14:33             ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Kusaram Devineni @ 2026-04-03 17:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry, Max Ver, linux-kernel

On 03-04-2026 21:18, Oleg Nesterov wrote:

> force_sig_seccomp() sends the signal to current. current can't return to
> usermode and call signalfd_dequeue(); get_signal() must dequeue SIGSYS and
> notice SA_IMMUTABLE.
>
> And since this signal is private, signalfd_dequeue() from another thread
> can't dequeue it either.
>
> No?

Right Oleg, not by returning to userspace and calling signalfd_dequeue()
afterward, and not from another thread.

We identified a case when working on a syzbot bug
https://syzbot.org/bug?extid=0a4c46806941297fecb9 where the forced SIGSYS was
consumed through the signalfd path from task_work on the same task before
get_signal() handled normal fatal delivery. The setup there had an outstanding
io_uring-driven signalfd request, and task_work_run() executed before
get_signal() dequeued the fatal SIGSYS.

So the sequence was roughly:
     seccomp -> force_sig_seccomp(..., true) -> pending private SIGSYS
     get_signal() entry -> task_work_run()
     task_work/signalfd path consumes SIGSYS
     get_signal() then no longer sees it to dequeue

That allowed the task to survive 'long enough' to enter another syscall in
SECCOMP_MODE_DEAD and hit the WARN_ON_ONCE() in __secure_computing().

So your point is correct in the normal case: current cannot return to
userspace and then call signalfd_dequeue(), and another thread cannot dequeue
this private signal. The case we hit was narrower and more specific: same-task
consumption via task_work before normal fatal delivery.

For that specific path, one approach that seems to work is making signalfd
exclude SA_IMMUTABLE signals from the mask it passes to
next_signal()/dequeue_signal(), so kernel-forced fatal signals remain pending
for normal delivery via get_signal().
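
A user-space sketch of that idea, for illustration only. It models pending
signals as a bitmask and treats the mask as the set of signals the signalfd
reader wants, which is a simplification of how the kernel represents it. All
model_* names and constants are assumptions, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSIG	64
#define MODEL_SIGSYS	31

struct model_task {
	uint64_t pending;		/* bit (sig - 1) set: signal is pending */
	bool immutable[MODEL_NSIG + 1];	/* models SA_IMMUTABLE on the action */
};

static uint64_t model_sigbit(int sig)
{
	return UINT64_C(1) << (sig - 1);
}

/* Hypothetical filter: drop signals with an immutable action from the mask. */
static uint64_t model_signalfd_mask(const struct model_task *t, uint64_t wanted)
{
	for (int sig = 1; sig <= MODEL_NSIG; sig++)
		if (t->immutable[sig])
			wanted &= ~model_sigbit(sig);
	return wanted;
}

/* Dequeue the lowest-numbered pending signal in mask; returns 0 if none. */
static int model_dequeue(struct model_task *t, uint64_t mask)
{
	uint64_t ready = t->pending & mask;

	for (int sig = 1; sig <= MODEL_NSIG; sig++) {
		if (ready & model_sigbit(sig)) {
			t->pending &= ~model_sigbit(sig);
			return sig;
		}
	}
	return 0;
}
```

With the unfiltered mask, the signalfd-style dequeue consumes the forced
SIGSYS. With the filtered mask it finds nothing, and the signal stays pending
for the normal delivery path.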

Kusaram


* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-04-03 17:16           ` Kusaram Devineni
@ 2026-04-04 14:33             ` Oleg Nesterov
  2026-04-05 15:57               ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2026-04-04 14:33 UTC (permalink / raw)
  To: Kusaram Devineni
  Cc: Kees Cook, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry, Max Ver, linux-kernel

On 04/03, Kusaram Devineni wrote:
>
> On 03-04-2026 21:18, Oleg Nesterov wrote:
>
> > force_sig_seccomp() sends the signal to current. current can't return to
> > usermode and call signalfd_dequeue(); get_signal() must dequeue SIGSYS and
> > notice SA_IMMUTABLE.
> >
> > And since this signal is private, signalfd_dequeue() from another thread
> > can't dequeue it either.
> >
> > No?
>
> Right Oleg, not by returning to userspace and calling signalfd_dequeue()
> afterward, and not from another thread.
>
> We identified a case when working on a syzbot bug
> https://syzbot.org/bug?extid=0a4c46806941297fecb9 where the forced SIGSYS
> was consumed through the signalfd path from task_work on the same task
> before get_signal() handled normal fatal delivery. The setup there had an
> outstanding io_uring-driven signalfd

Aaah... Thanks again.

OK, this is another (although related) issue, let's discuss it separately.

> For that specific path, one approach that seems to work is making signalfd
> exclude SA_IMMUTABLE signals from the mask it passes to
Perhaps... But this is nasty.

Maybe something like the "brute force" hack I sent to syzbot can work...

Oleg.



* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-04-04 14:33             ` Oleg Nesterov
@ 2026-04-05 15:57               ` Oleg Nesterov
  2026-04-06 10:43                 ` Kusaram Devineni
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2026-04-05 15:57 UTC (permalink / raw)
  To: Kusaram Devineni
  Cc: Kees Cook, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry, Max Ver, linux-kernel

On 04/04, Oleg Nesterov wrote:
>
> On 04/03, Kusaram Devineni wrote:
> >
> OK, this is another (although related) issue, let's discuss it separately.

Yes...

> > For that specific path, one approach that seems to work is making signalfd
> > exclude SA_IMMUTABLE signals from the mask it passes to
>
> Perhaps... But this is nasty.

OK, let's do it. I'll send the patch in a minute. It was already tested by syzbot.
Sorry, forgot to CC you, see https://lore.kernel.org/all/adJvw9gEC9D1Gxtq@redhat.com/

I don't see a better fix for now. Hopefully we can clean this up later.
I think force_exit_sig / force_sig_seccomp should make fatal_signal_pending()
true, but we need a simple fix...

Oleg.



* Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp
  2026-04-05 15:57               ` Oleg Nesterov
@ 2026-04-06 10:43                 ` Kusaram Devineni
  0 siblings, 0 replies; 12+ messages in thread
From: Kusaram Devineni @ 2026-04-06 10:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Andrew Morton, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Will Drewry, Max Ver, linux-kernel

> I don't see a better fix for now. Hopefully we can clean this up later.
> I think force_exit_sig / force_sig_seccomp should make fatal_signal_pending()
> true, but we need a simple fix...

Thanks Oleg.

This is exactly aligned with what we validated locally as well. Excluding
SA_IMMUTABLE signals from the signalfd dequeue mask addresses that directly.

So this fix makes sense as the contained solution for the reported bug.
The broader signal-ordering / fatal_signal_pending() cleanup you mentioned
also seems worthwhile, but as a follow-up rather than something to fold into
this fix.

Kusaram
