[RFC PATCH 0/2] seccomp: defer syscall_rollback() to get


* [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
@ 2026-04-14 16:47 Oleg Nesterov
  2026-04-14 16:48 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:47 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Kusaram Devineni, Max Ver, linux-kernel

Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
in __seccomp_filter() is not right.

This is just an RFC.

In fact I think that syscall_exit_work() should do nothing if a
syscall was rejected with force_sig_seccomp() by __seccomp_filter().
If nothing else, the syscall was never actually executed.

Perhaps we can add a new SYSCALL_WORK_SYSCALL_XXX to SYSCALL_WORK_EXIT.
seccomp_nack_syscall() can set this flag, and syscall_exit_work() can do

	if (work & SYSCALL_WORK_SYSCALL_XXX) {
		clear_syscall_work(SYSCALL_XXX); // for the !force_coredump case
		return;
	}

after the "if (SYSCALL_WORK_SYSCALL_USER_DISPATCH)" block.
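
A minimal user-space sketch of this idea, not kernel code: the MODEL_* bits
and both functions are hypothetical stand-ins (MODEL_WORK_NACKED plays the
role of SYSCALL_WORK_SYSCALL_XXX), and the real syscall_exit_work() does
much more than set bits in a mask.

```c
/* Hypothetical work bits; the kernel keeps its own in syscall_work. */
enum {
	MODEL_WORK_AUDIT  = 1 << 0,
	MODEL_WORK_TRACE  = 1 << 1,
	MODEL_WORK_NACKED = 1 << 2,	/* stands in for SYSCALL_WORK_SYSCALL_XXX */
};

struct model_task {
	unsigned int syscall_work;	/* pending work bits */
	unsigned int exit_reports;	/* exit-side reports that actually ran */
};

/* Models seccomp_nack_syscall() marking the syscall as rejected. */
static void model_nack_syscall(struct model_task *t)
{
	t->syscall_work |= MODEL_WORK_NACKED;
}

/* Models syscall_exit_work() with the proposed early return. */
static void model_syscall_exit_work(struct model_task *t)
{
	if (t->syscall_work & MODEL_WORK_NACKED) {
		/* Clear the bit so the next syscall is reported normally. */
		t->syscall_work &= ~MODEL_WORK_NACKED;
		return;
	}
	t->exit_reports |= t->syscall_work &
			   (MODEL_WORK_AUDIT | MODEL_WORK_TRACE);
}
```

In this model a rejected syscall produces no exit-side report, and because
the bit is cleared on the way out, a task that survives (the
!force_coredump case) gets normal reporting on its next syscall.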

But I didn't dare to do such a change.

What do you think?

Oleg.


* [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper
  2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
@ 2026-04-14 16:48 ` Oleg Nesterov
  2026-04-14 16:48 ` [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
  2026-04-15 10:44 ` [RFC PATCH 0/2] " Oleg Nesterov
  2 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:48 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Kusaram Devineni, Max Ver, linux-kernel

Factor out the duplicated code to simplify the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/seccomp.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..cb8dd78791cd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1256,6 +1256,14 @@ static int seccomp_do_user_notification(int this_syscall,
 	return -1;
 }
 
+static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
+{
+	/* Show the handler or coredump the original registers. */
+	syscall_rollback(current, current_pt_regs());
+	/* Let the filter pass back 16 bits of data. */
+	force_sig_seccomp(this_syscall, data, force_coredump);
+}
+
 static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 {
 	u32 filter_ret, action;
@@ -1285,10 +1293,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		goto skip;
 
 	case SECCOMP_RET_TRAP:
-		/* Show the handler the original registers. */
-		syscall_rollback(current, current_pt_regs());
-		/* Let the filter pass back 16 bits of data. */
-		force_sig_seccomp(this_syscall, data, false);
+		seccomp_nack_syscall(this_syscall, data, false);
 		goto skip;
 
 	case SECCOMP_RET_TRACE:
@@ -1360,10 +1365,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		/* Dump core only if this is the last remaining thread. */
 		if (action != SECCOMP_RET_KILL_THREAD ||
 		    (atomic_read(&current->signal->live) == 1)) {
-			/* Show the original registers in the dump. */
-			syscall_rollback(current, current_pt_regs());
-			/* Trigger a coredump with SIGSYS */
-			force_sig_seccomp(this_syscall, data, true);
+			seccomp_nack_syscall(this_syscall, data, true);
 		} else {
 			do_exit(SIGSYS);
 		}
-- 
2.52.0



* [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
  2026-04-14 16:48 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
@ 2026-04-14 16:48 ` Oleg Nesterov
  2026-04-14 17:27   ` Kees Cook
  2026-04-15 10:44 ` [RFC PATCH 0/2] " Oleg Nesterov
  2 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:48 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Kusaram Devineni, Max Ver, linux-kernel

Currently, seccomp_nack_syscall() calls syscall_rollback() immediately.
Because this restores the original registers, the syscall exit path sees
the original syscall number as the return value.

This confuses audit_syscall_exit(), trace_syscall_exit(), and ptrace.

Change seccomp_nack_syscall() to call syscall_set_return_value(-EINTR),
and add the new check_force_sig_seccomp() helper called by get_signal()
which does syscall_rollback() if the signal was sent by seccomp.

Note that the si_code == SYS_SECCOMP check in check_force_sig_seccomp()
is not 100% reliable, see the comment in check_force_sig_seccomp(), but
I hope we don't really care.
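
The ordering issue can be illustrated with a small user-space model. This
is only a sketch under assumptions: struct model_regs and the model_*()
helpers are hypothetical, they mimic the x86-64 behaviour where
syscall_rollback() copies orig_ax back into ax, and the initial -ENOSYS in
ax follows the x86-64 syscall entry convention.

```c
#include <errno.h>

/* Hypothetical two-register view of pt_regs (x86-64 naming). */
struct model_regs {
	long ax;	/* return value slot */
	long orig_ax;	/* syscall number saved at entry */
};

static void model_syscall_rollback(struct model_regs *r)
{
	r->ax = r->orig_ax;
}

static void model_set_return_value(struct model_regs *r, int error, long val)
{
	r->ax = error ? error : val;
}

/* What audit/trace/ptrace would read on the exit path in this model. */
static long model_exit_path_value(const struct model_regs *r)
{
	return r->ax;
}

/* Current behaviour: rollback first, then the exit path looks at ax. */
static long exit_value_immediate_rollback(long nr)
{
	struct model_regs r = { .ax = -ENOSYS, .orig_ax = nr };

	model_syscall_rollback(&r);
	return model_exit_path_value(&r);
}

/* Proposed behaviour: -EINTR for the exit path, rollback at signal time. */
static long exit_value_deferred_rollback(long nr, long *ax_at_signal)
{
	struct model_regs r = { .ax = -ENOSYS, .orig_ax = nr };
	long seen;

	model_set_return_value(&r, -EINTR, 0);
	seen = model_exit_path_value(&r);
	model_syscall_rollback(&r);	/* done from get_signal() in the patch */
	*ax_at_signal = r.ax;
	return seen;
}
```

With nr = 39 the first helper returns 39 (the syscall number shows up as
the "return value"), while the second returns -EINTR and still leaves 39
in ax by the time the signal is delivered.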

Reported-by: Max Ver <dudududumaxver@gmail.com>
Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/seccomp.c |  4 ++--
 kernel/signal.c  | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index cb8dd78791cd..a8d103054212 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1258,8 +1258,8 @@ static int seccomp_do_user_notification(int this_syscall,
 
 static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
 {
-	/* Show the handler or coredump the original registers. */
-	syscall_rollback(current, current_pt_regs());
+	/* check_force_sig_seccomp() will restore the original registers */
+	syscall_set_return_value(current, current_pt_regs(), -EINTR, 0);
 	/* Let the filter pass back 16 bits of data. */
 	force_sig_seccomp(this_syscall, data, force_coredump);
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index d65d0fe24bfb..b93e37517d6d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2796,6 +2796,24 @@ static void hide_si_addr_tag_bits(struct ksignal *ksig)
 	}
 }
 
+static inline void check_force_sig_seccomp(kernel_siginfo_t *info)
+{
+	/*
+	 * See seccomp_nack_syscall(). Show the original registers to
+	 * the handler or coredump.
+	 *
+	 * Note: a task can send a .si_code == SYS_SECCOMP signal to
+	 * itself, but syscall_rollback() is harmless in this case.
+	 * SYS_SECCOMP can also be missed if a prior SIGSYS was pending
+	 * and blocked before force_sig_seccomp(), but in that case the
+	 * seccomp siginfo is already lost anyway.
+	 */
+	if (IS_ENABLED(CONFIG_SECCOMP_FILTER)) {
+		if (info->si_code == SYS_SECCOMP)
+			syscall_rollback(current, current_pt_regs());
+	}
+}
+
 bool get_signal(struct ksignal *ksig)
 {
 	struct sighand_struct *sighand = current->sighand;
@@ -2916,6 +2934,8 @@ bool get_signal(struct ksignal *ksig)
 		if (!signr)
 			break; /* will return 0 */
 
+		check_force_sig_seccomp(&ksig->info);
+
 		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
 		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
 			signr = ptrace_signal(signr, &ksig->info, type);
-- 
2.52.0



* Re: [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-14 16:48 ` [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
@ 2026-04-14 17:27   ` Kees Cook
  2026-04-14 17:41     ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2026-04-14 17:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On Tue, Apr 14, 2026 at 06:48:20PM +0200, Oleg Nesterov wrote:
> Currently, seccomp_nack_syscall() calls syscall_rollback() immediately.
> Because this restores the original registers, the syscall exit path sees
> the original syscall number as the return value.
> 
> This confuses audit_syscall_exit(), trace_syscall_exit(), and ptrace.
> 
> Change seccomp_nack_syscall() to call syscall_set_return_value(-EINTR),
> and add the new check_force_sig_seccomp() helper called by get_signal()
> which does syscall_rollback() if the signal was sent by seccomp.
> 
> Note that the si_code == SYS_SECCOMP check in check_force_sig_seccomp()
> is not 100% reliable, see the comment in check_force_sig_seccomp(), but
> I hope we don't really care.
> 
> Reported-by: Max Ver <dudududumaxver@gmail.com>
> Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Can we also add a new selftest for this case? I'd like to be sure we
don't regress when we make changes in the future...
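
A rough starting point for such a test, as a standalone sketch rather than
something written against kselftest_harness.h. The names on_sigsys(),
child_body() and run_trap_check() and the exit code 77 for "skipped" are
assumptions made for illustration. The filter omits the usual arch check
for brevity, and the sketch only verifies the siginfo the handler sees for
SECCOMP_RET_TRAP; covering the exit-side reporting would additionally need
a ptrace or audit observer.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1	/* si_code for seccomp-generated SIGSYS */
#endif

static volatile sig_atomic_t seen_nr = -1;
static volatile sig_atomic_t seen_code = -1;

static void on_sigsys(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	seen_nr = info->si_syscall;
	seen_code = info->si_code;
}

/* Runs in the child: 0 = siginfo as expected, 1 = mismatch, 77 = skipped. */
static int child_body(void)
{
	struct sock_filter insns[] = {
		/* Load the syscall number; a real filter checks arch first. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigsys;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSYS, &sa, NULL))
		return 77;
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return 77;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		return 77;

	syscall(SYS_getppid);	/* trapped: the handler runs instead */
	return (seen_nr == SYS_getppid && seen_code == SYS_SECCOMP) ? 0 : 1;
}

/* Forks so that the filter never applies to the calling process. */
int run_trap_check(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return 77;
	if (pid == 0)
		_exit(child_body());
	if (waitpid(pid, &status, 0) != pid)
		return 77;
	return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

A caller would treat 0 as pass, 77 as "seccomp not usable here", and
anything else as a failure.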

-Kees

-- 
Kees Cook


* Re: [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-14 17:27   ` Kees Cook
@ 2026-04-14 17:41     ` Oleg Nesterov
  2026-04-15 15:50       ` Kees Cook
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-14 17:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On 04/14, Kees Cook wrote:
>
> On Tue, Apr 14, 2026 at 06:48:20PM +0200, Oleg Nesterov wrote:
> > Currently, seccomp_nack_syscall() calls syscall_rollback() immediately.
> > Because this restores the original registers, the syscall exit path sees
> > the original syscall number as the return value.
> >
> > This confuses audit_syscall_exit(), trace_syscall_exit(), and ptrace.
> >
> > Change seccomp_nack_syscall() to call syscall_set_return_value(-EINTR),
> > and add the new check_force_sig_seccomp() helper called by get_signal()
> > which does syscall_rollback() if the signal was sent by seccomp.
> >
> > Note that the si_code == SYS_SECCOMP check in check_force_sig_seccomp()
> > is not 100% reliable, see the comment in check_force_sig_seccomp(), but
> > I hope we don't really care.
> >
> > Reported-by: Max Ver <dudududumaxver@gmail.com>
> > Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Can we also add a new selftest for this case?

Yes, sure. But do you agree with this RFC approach?

See also 0/2. Perhaps SYSCALL_WORK_SYSCALL_XXX makes more sense?

I do think it makes more sense and it is closer to my initial
"[RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp"
attempt.

But I'm afraid this change would be "too visible".

> I'd like to be sure we
> don't regress when we make changes in the future...

Yes, I understand.

And just in case... I ran tools/testing/selftests/seccomp/seccomp_bpf, it
doesn't show any regression.

Oleg.



* Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
  2026-04-14 16:48 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
  2026-04-14 16:48 ` [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
@ 2026-04-15 10:44 ` Oleg Nesterov
  2026-04-15 16:07   ` Kees Cook
  2026-04-15 19:21   ` Kees Cook
  2 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-15 10:44 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Kusaram Devineni, Max Ver, linux-kernel

On 04/14, Oleg Nesterov wrote:
>
> Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
> in __seccomp_filter() is not right.

I'll recheck, but in fact this logic looks broken... force_sig_seccomp() assumes
that it can't race with (say) SIGSEGV which has a handler. And 2/2 makes things
slightly worse. So self-nack for now.

> In fact I think that syscall_exit_work() should do nothing if a
> syscall was rejected with force_sig_seccomp() by __seccomp_filter().
> If nothing else, the syscall was never actually executed.
>
> Perhaps we can add a new SYSCALL_WORK_SYSCALL_XXX to SYSCALL_WORK_EXIT.
> seccomp_nack_syscall() can set this flag, and syscall_exit_work() can do
>
> 	if (work & SYSCALL_WORK_SYSCALL_XXX) {
> 		clear_syscall_work(SYSCALL_XXX); // for the !force_coredump case
> 		return;
> 	}
>
> after the "if (SYSCALL_WORK_SYSCALL_USER_DISPATCH)" block.
>
> But I didn't dare to do such a change.
>
> What do you think?

I'll try to send a patch based on the above this week.

Oleg.



* Re: [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-14 17:41     ` Oleg Nesterov
@ 2026-04-15 15:50       ` Kees Cook
  2026-04-15 16:08         ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2026-04-15 15:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On Tue, Apr 14, 2026 at 07:41:39PM +0200, Oleg Nesterov wrote:
> Yes sure. but do you agree with this RFC approach?

I like it so far; I'm going to run the rr regression tests to
double-check.

> See also 0/2. Perhaps SYSCALL_WORK_SYSCALL_XXX makes more sense?

This _feels_ like a more complex solution, but I'll study it more.

> And just in case... I ran tools/testing/selftests/seccomp/seccomp_bpf, it
> doesn't show any regression.

Great!

-- 
Kees Cook


* Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-15 10:44 ` [RFC PATCH 0/2] " Oleg Nesterov
@ 2026-04-15 16:07   ` Kees Cook
  2026-04-15 19:21   ` Kees Cook
  1 sibling, 0 replies; 11+ messages in thread
From: Kees Cook @ 2026-04-15 16:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On Wed, Apr 15, 2026 at 12:44:25PM +0200, Oleg Nesterov wrote:
> On 04/14, Oleg Nesterov wrote:
> >
> > Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
> > in __seccomp_filter() is not right.
> 
> I'll recheck, but in fact this logic looks broken... force_sig_seccomp() assumes
> that it can't race with (say) SIGSEGV which has a handler. And 2/2 makes the things
> slightly worse. So self-nack for now.

Oh, I just read this now. Yeah, that's a good point. Hrmpf. A corner
case, but yeah, the proposed change makes things worse.

-- 
Kees Cook


* Re: [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-15 15:50       ` Kees Cook
@ 2026-04-15 16:08         ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-15 16:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On 04/15, Kees Cook wrote:
>
> On Tue, Apr 14, 2026 at 07:41:39PM +0200, Oleg Nesterov wrote:
> > Yes sure. but do you agree with this RFC approach?
>
> I like it so far; I'm going to run the rr regression tests to
> double-check.

Thanks!

But see my reply to 0/2 ... I'll write another email later.

And I just noticed that I forgot to check info->si_signo == SIGSYS
in check_force_sig_seccomp().

So if you are going to run the test, please apply the fix below...

Oleg.


diff --git a/kernel/signal.c b/kernel/signal.c
index b93e37517d6d..49d73e4991b2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2809,7 +2816,7 @@ static inline void check_force_sig_seccomp(kernel_siginfo_t *info)
 	 * seccomp siginfo is already lost anyway.
 	 */
 	if (IS_ENABLED(CONFIG_SECCOMP_FILTER)) {
-		if (info->si_code == SYS_SECCOMP)
+		if (info->si_signo == SIGSYS && info->si_code == SYS_SECCOMP)
 			syscall_rollback(current, current_pt_regs());
 	}
 }



* Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-15 10:44 ` [RFC PATCH 0/2] " Oleg Nesterov
  2026-04-15 16:07   ` Kees Cook
@ 2026-04-15 19:21   ` Kees Cook
  2026-04-16 14:07     ` Oleg Nesterov
  1 sibling, 1 reply; 11+ messages in thread
From: Kees Cook @ 2026-04-15 19:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On Wed, Apr 15, 2026 at 12:44:25PM +0200, Oleg Nesterov wrote:
> On 04/14, Oleg Nesterov wrote:
> >
> > Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
> > in __seccomp_filter() is not right.
> 
> I'll recheck, but in fact this logic looks broken... force_sig_seccomp() assumes
> that it can't race with (say) SIGSEGV which has a handler. And 2/2 makes the things
> slightly worse. So self-nack for now.

I've spent some more time looking at all this. It does seem to me that
dropping syscall_exit_work() entirely for killed syscalls is the right way
to go for fixing the audit/trace/ptrace confusion on the exit side. But
I don't think it closes the whole problem. Apologies for any verbosity
here, I'm kind of taking notes for myself too. :)

Once the spurious exit-stop is gone, the sequence for RET_KILL becomes:

 - entry-stop for syscall (tracer sees "entry")
 - tracer PTRACE_SYSCALLs
 - seccomp RET_KILLs the syscall; no exit-stop
 - get_signal() dequeues SIGSYS
   - SA_IMMUTABLE means ptrace_signal() is skipped and there is no signal-delivery stop
 - task dies (and maybe coredumps)
 - tracer's next waitpid() returns WIFSIGNALED, WTERMSIG==SIGSYS
    (and maybe WCOREDUMP==1)

This view is technically correct (entry then death), but the
tracer has no visibility into what happened as the whole siginfo that
force_sig_seccomp() assembled never reaches the tracer due to SA_IMMUTABLE
being set, since get_signal() short-circuits ptrace_signal() entirely.

RET_TRAP doesn't have this problem (force_coredump=false, no SA_IMMUTABLE,
the tracer sees a normal signal-delivery stop for SIGSYS). So the
asymmetry is specifically the RET_KILL path, AIUI.

I was trying to consider whether fixing this with a new ptrace event
(PTRACE_EVENT_SECCOMP_KILL or a new PTRACE_SYSCALL_INFO op) would be
better than reusing the existing signal-delivery stop (but perhaps in a
"read-only" mode). My sense is that a new event isn't worth it, because
the mutation surface a tracer can reach would be nearly the same:

Tracer action  | signal-stop for SIGSYS        | new event
---------------+-------------------------------+--------------------------
CONT sig=0     | Direct mutation, must reject. | N/A (SIGSYS still queued)
CONT sig=X     | Direct mutation, must reject. | Injects X racing SIGSYS,
               |                               | must reject.
SETSIGINFO     | Mutates last_siginfo,         | last_siginfo unset,
               | must reject or restore.       | mostly no-op?
SETREGS        | Corrupts coredump view,       | Same.
               | should reject or restore.     |
POKEDATA       | Info only, doesn't matter.    | Same.

The only thing the new event gets for free is "can't suppress/replace
the stopping signal," which is a single check to enforce in the sig-stop
approach (ignore exit_code on resume). Register and siginfo mutation is
identical in both. On the other hand, the sig-stop approach doesn't change
ABI and existing tracers already handle SYS_SECCOMP siginfo because they
see it on RET_TRAP today.

So I'm thinking the full fix is to change what SA_IMMUTABLE actually
means: instead of "ptrace is disabled", it can be "the signal cannot
be changed (i.e. cannot stop the kill)". Which means in get_signal()
at the SA_IMMUTABLE check, stop gating ptrace_signal() on the flag and
instead pass the flag into ptrace_signal() (or check in other places) so
it can run in a "read-only" mode. I think refusing tracer actions would
be best, but perhaps just snapshot all the things we don't want changed?
For example:

 - Snapshot ksig->info and the relevant pt_regs before ptrace_stop().
 - After resume, if the immutable flag was set:
   - ignore current->exit_code; keep the original signr (no
     suppression, no replacement);
   - restore ksig->info from the snapshot (SETSIGINFO is ignored);
   - restore pt_regs from the snapshot so the coredump still sees
     the original syscall attempt that syscall_rollback() set up.
 - Leave POKEDATA alone: it's not a security concern, AFAICT.
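
The snapshot/restore steps above can be sketched as a user-space model.
Everything here is hypothetical (the ro_* structs, ro_ptrace_signal() and
the callback standing in for ptrace_stop()); it only shows the intended
invariant, not how the kernel would implement it.

```c
#include <stdbool.h>

struct ro_siginfo {
	int si_signo;
	int si_code;
	int si_syscall;
};

struct ro_regs {
	long ax;
	long orig_ax;
};

struct ro_tracee {
	struct ro_siginfo info;
	struct ro_regs regs;
	int exit_code;		/* signal the tracer asks to resume with */
};

typedef void (*ro_tracer_fn)(struct ro_tracee *);

/*
 * Models a signal-delivery stop. The tracer callback may change anything;
 * in "immutable" mode all of its changes are discarded afterwards.
 */
static int ro_ptrace_signal(struct ro_tracee *t, int signr, bool immutable,
			    ro_tracer_fn tracer)
{
	struct ro_siginfo saved_info = t->info;
	struct ro_regs saved_regs = t->regs;

	t->exit_code = signr;
	tracer(t);		/* stands in for ptrace_stop() */

	if (immutable) {
		t->info = saved_info;	/* SETSIGINFO is ignored */
		t->regs = saved_regs;	/* SETREGS is ignored */
		return signr;		/* no suppression, no replacement */
	}
	return t->exit_code;
}

/* A tracer that tries to suppress the signal and scribble on state. */
static void ro_hostile_tracer(struct ro_tracee *t)
{
	t->exit_code = 0;
	t->info.si_code = 0;
	t->regs.ax = -1;
}
```

Refusing the tracer's requests outright would avoid the copies, at the cost
of touching more ptrace request paths; the model only covers the snapshot
variant.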

This preserves what SA_IMMUTABLE was actually meant to guarantee (the
tracee dies, from SIGSYS, with the coredump reflecting the attempted
syscall) while giving the tracer the observation point they need. rr,
strace, and gdb all already know how to read SYS_SECCOMP siginfo from
a SIGSYS stop, so there's nothing to teach them. However, they may not
be expecting the stop, which is the only part we'd need to double check.

So, tl;dr:

 - syscall_exit_work() skips the exit tracehook, audit, and trace
   when the syscall was RET_KILLed.
 - SA_IMMUTABLE stops disabling ptrace_signal() and starts gating
   mutations within it.

What do you think?

-Kees

-- 
Kees Cook


* Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
  2026-04-15 19:21   ` Kees Cook
@ 2026-04-16 14:07     ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2026-04-16 14:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
	Kusaram Devineni, Max Ver, linux-kernel

On 04/15, Kees Cook wrote:
>
> I've spent some more time looking at all this. It does seem to me that
> dropping syscall_exit_work() entirely for killed syscalls is the right way
> to go for fixing the audit/trace/ptrace confusion on the exit side.

OK, great. I'll try to make a patch as soon as I have time. Hopefully this week.

> But
> I don't think it closes the whole problem.

I guess we can discuss this in more detail later/separately?

> Apologies for any verbosity
> here, I'm kind of taking notes for myself too. :)

Thanks for the detailed email ;)

I will snip some parts for now...

> I was trying to consider whether fixing this with a new ptrace event
> (PTRACE_EVENT_SECCOMP_KILL or a new PTRACE_SYSCALL_INFO op) would be
> better than reusing the existing signal-delivery stop (but perhaps in a
> "read-only" mode). My sense is that a new event isn't worth it,

Agreed.

> So I'm thinking the full fix is to change what SA_IMMUTABLE actually
> means: instead of "ptrace is disabled", it can be "the signal cannot
> be changed (i.e. cannot stop the kill)". Which means in get_signal()
> at the SA_IMMUTABLE check, stop gating ptrace_signal() on the flag and
> instead pass the flag into ptrace_signal() (or check in other places) so
> it can run in a "read-only" mode.

OK, we can add something like PT_FREEZED which lives in task->ptrace,
but see below.

> I think refusing tracer actions would be best,

agreed

>  - syscall_exit_work() skips the exit tracehook, audit, and trace
>    when the syscall was RET_KILLed.

Good ;)

>  - SA_IMMUTABLE stops disabling ptrace_signal() and starts gating
>    mutations within it.

Honestly, I am not sure this is really useful... But I do not know.
And again, we can discuss this later I hope.

----------------------------------------------------------------------
Now a stupid question ;)

Why does __seccomp_filter() use syscall_rollback() anyway?

OK, maybe ax == orig_ax makes sense for the coredump, I dunno.

But
	case SECCOMP_RET_TRAP:
		/* Show the handler the original registers. */
		syscall_rollback(current, current_pt_regs());
		/* Let the filter pass back 16 bits of data. */
		force_sig_seccomp(this_syscall, data, false);

the handler can just use info.si_syscall instead of sigcontext.rax ?

Oleg.


