Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Kees Cook <kees@kernel.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@kernel.org>, Will Drewry <wad@chromium.org>,
	Kusaram Devineni <kusaram@devineni.in>,
	Max Ver <dudududumaxver@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
Date: Wed, 15 Apr 2026 12:21:38 -0700	[thread overview]
Message-ID: <202604151217.4E571EC3E4@keescook> (raw)
In-Reply-To: <ad9sCZbKTrNIAOuq@redhat.com>

On Wed, Apr 15, 2026 at 12:44:25PM +0200, Oleg Nesterov wrote:
> On 04/14, Oleg Nesterov wrote:
> >
> > Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
> > in __seccomp_filter() is not right.
> 
> I'll recheck, but in fact this logic looks broken... force_sig_seccomp() assumes
> that it can't race with (say) SIGSEGV which has a handler. And 2/2 makes the things
> slightly worse. So self-nack for now.

I've spent some more time looking at all this. It does seem to me that
dropping syscall_exit_work() entirely for killed syscalls is the right way
to go for fixing the audit/trace/ptrace confusion on the exit side. But
I don't think it closes the whole problem. Apologies for any verbosity
here, I'm kind of taking notes for myself too. :)

Once the spurious exit-stop is gone, the sequence for RET_KILL becomes:

 - entry-stop for syscall (tracer sees "entry")
 - tracer PTRACE_SYSCALLs
 - seccomp RET_KILLs the syscall; no exit-stop
 - get_signal() dequeues SIGSYS
   - SA_IMMUTABLE means ptrace_signal() skipped and no signal-delivery stop
 - task dies (and maybe coredumps)
 - tracer's next waitpid() returns WIFSIGNALED, WTERMSIG==SIGSYS
    (and maybe WCOREDUMP==1)

This view is technically correct (entry then death), but the
tracer has no visibility into what happened as the whole siginfo that
force_sig_seccomp() assembled never reaches the tracer due to SA_IMMUTABLE
being set, since get_signal() short-circuits ptrace_signal() entirely.

RET_TRAP doesn't have this problem (force_coredump=false, no SA_IMMUTABLE,
the tracer sees a normal signal-delivery stop for SIGSYS). So the
asymmetry is specifically the RET_KILL path, AIUI.

I was trying to consider whether fixing this with a new ptrace event
(PTRACE_EVENT_SECCOMP_KILL or a new PTRACE_SYSCALL_INFO op) would be
better than reusing the existing signal-delivery stop (but perhaps in a
"read-only" mode). My sense is that a new event isn't worth it, because
the mutation surface a tracer can reach would be nearly the same:

Tracer action  | signal-stop for SIGSYS        | new event
---------------+-------------------------------+--------------------------
CONT sig=0     | Direct mutation, must reject. | N/A (SIGSYS still queued)
CONT sig=X     | Direct mutation, must reject. | Injects X racing SIGSYS,
               |                               | must reject.
SETSIGINFO     | Mutates last_siginfo,         | last_siginfo unset,
               | must reject or restore.       | mostly no-op?
SETREGS        | Corrupts coredump view,       | Same.
               | should reject or restore.     |
POKEDATA       | Info only, doesn't matter.    | Same.

The only thing the new event gets for free is "can't suppress/replace
the stopping signal," which is a single check to enforce in the sig-stop
approach (ignore exit_code on resume). Register and siginfo mutation is
identical in both. On the other hand, the sig-stop approach doesn't change
ABI and existing tracers already handle SYS_SECCOMP siginfo because they
see it on RET_TRAP today.

So I'm thinking the full fix is to change what SA_IMMUTABLE actually
means: instead of "ptrace is disabled", it can be "the signal cannot
be changed (i.e. cannot stop the kill)". Which means in get_signal()
at the SA_IMMUTABLE check, stop gating ptrace_signal() on the flag and
instead pass the flag into ptrace_signal() (or check in other places) so
it can run in a "read-only" mode. I think refusing tracer actions would
be best, but perhaps just snapshot all the things we don't want changed?
For example:

 - Snapshot ksig->info and the relevant pt_regs before ptrace_stop().
 - After resume, if the immutable flag was set:
   - ignore current->exit_code; keep the original signr (no
     suppression, no replacement);
   - restore ksig->info from the snapshot (SETSIGINFO is ignored);
   - restore pt_regs from the snapshot so the coredump still sees
     the original syscall attempt that syscall_rollback() set up.
 - Leave POKEDATA alone: it's not a security concern, AFAICT.

This preserves what SA_IMMUTABLE was actually meant to guarantee (the
tracee dies, from SIGSYS, with the coredump reflecting the attempted
syscall) while giving the tracer the observation point they need. rr,
strace, and gdb all already know how to read SYS_SECCOMP siginfo from
a SIGSYS stop, so there's nothing to teach them. However, they may not
be expecting the stop, which is the only part we'd need to double check.

So, tl;dr:

 - syscall_exit_work() skips the exit tracehook, audit, and trace
   when the syscall was RET_KILLed.
 - SA_IMMUTABLE stops disabling ptrace_signal() and starts gating
   mutations within it.

What do you think?

-Kees

-- 
Kees Cook

next prev parent reply	other threads:[~2026-04-15 19:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
2026-04-14 16:48 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
2026-04-14 16:48 ` [RFC PATCH 2/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
2026-04-14 17:27   ` Kees Cook
2026-04-14 17:41     ` Oleg Nesterov
2026-04-15 15:50       ` Kees Cook
2026-04-15 16:08         ` Oleg Nesterov
2026-04-15 10:44 ` [RFC PATCH 0/2] " Oleg Nesterov
2026-04-15 16:07   ` Kees Cook
2026-04-15 19:21   ` Kees Cook [this message]
2026-04-16 14:07     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202604151217.4E571EC3E4@keescook \
    --to=kees@kernel.org \
    --cc=dudududumaxver@gmail.com \
    --cc=kusaram@devineni.in \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@kernel.org \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox