Date: Wed, 15 Apr 2026 12:21:38 -0700
From: Kees Cook
To: Oleg Nesterov
Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Will Drewry,
    Kusaram Devineni, Max Ver, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
Message-ID: <202604151217.4E571EC3E4@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 15, 2026 at 12:44:25PM +0200, Oleg Nesterov wrote:
> On 04/14, Oleg Nesterov wrote:
> >
> > Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
> > in __seccomp_filter() is not right.
>
> I'll recheck, but in fact this logic looks broken... force_sig_seccomp() assumes
> that it can't race with (say) SIGSEGV which has a handler. And 2/2 makes the things
> slightly worse. So self-nack for now.

I've spent some more time looking at all this. It does seem to me that
dropping syscall_exit_work() entirely for killed syscalls is the right
way to go for fixing the audit/trace/ptrace confusion on the exit side.
But I don't think it closes the whole problem. Apologies for any verbosity
here, I'm kind of taking notes for myself too.
:)

Once the spurious exit-stop is gone, the sequence for RET_KILL becomes:

  - entry-stop for syscall (tracer sees "entry")
  - tracer PTRACE_SYSCALLs
  - seccomp RET_KILLs the syscall; no exit-stop
  - get_signal() dequeues SIGSYS
  - SA_IMMUTABLE means ptrace_signal() skipped and no signal-delivery stop
  - task dies (and maybe coredumps)
  - tracer's next waitpid() returns WIFSIGNALED, WTERMSIG==SIGSYS
    (and maybe WCOREDUMP==1)

This view is technically correct (entry then death), but the tracer has
no visibility into what happened as the whole siginfo that
force_sig_seccomp() assembled never reaches the tracer due to SA_IMMUTABLE
being set, since get_signal() short-circuits ptrace_signal() entirely.

RET_TRAP doesn't have this problem (force_coredump=false, no SA_IMMUTABLE,
the tracer sees a normal signal-delivery stop for SIGSYS). So the asymmetry
is specifically the RET_KILL path, AIUI.

I was trying to consider whether fixing this with a new ptrace event
(PTRACE_EVENT_SECCOMP_KILL or a new PTRACE_SYSCALL_INFO op) would be better
than reusing the existing signal-delivery stop (but perhaps in a
"read-only" mode). My sense is that a new event isn't worth it, because
the mutation surface a tracer can reach would be nearly the same:

Tracer action  | signal-stop for SIGSYS        | new event
---------------+-------------------------------+--------------------------
CONT sig=0     | Direct mutation, must reject. | N/A (SIGSYS still queued)
CONT sig=X     | Direct mutation, must reject. | Injects X racing SIGSYS,
               |                               | must reject.
SETSIGINFO     | Mutates last_siginfo,         | last_siginfo unset,
               | must reject or restore.       | mostly no-op?
SETREGS        | Corrupts coredump view,       | Same.
               | should reject or restore.     |
POKEDATA       | Info only, doesn't matter.    | Same.

The only thing the new event gets for free is "can't suppress/replace the
stopping signal," which is a single check to enforce in the sig-stop
approach (ignore exit_code on resume). Register and siginfo mutation is
identical in both.
On the other hand, the sig-stop approach doesn't change ABI and existing
tracers already handle SYS_SECCOMP siginfo because they see it on RET_TRAP
today.

So I'm thinking the full fix is to change what SA_IMMUTABLE actually means:
instead of "ptrace is disabled", it can be "the signal cannot be changed
(i.e. cannot stop the kill)". Which means in get_signal() at the
SA_IMMUTABLE check, stop gating ptrace_signal() on the flag and instead
pass the flag into ptrace_signal() (or check in other places) so it can
run in a "read-only" mode.

I think refusing tracer actions would be best, but perhaps just snapshot
all the things we don't want changed? For example:

  - Snapshot ksig->info and the relevant pt_regs before ptrace_stop().
  - After resume, if the immutable flag was set:
    - ignore current->exit_code; keep the original signr (no suppression,
      no replacement);
    - restore ksig->info from the snapshot (SETSIGINFO is ignored);
    - restore pt_regs from the snapshot so the coredump still sees the
      original syscall attempt that syscall_rollback() set up.
  - Leave POKEDATA alone: it's not a security concern, AFAICT.

This preserves what SA_IMMUTABLE was actually meant to guarantee (the
tracee dies, from SIGSYS, with the coredump reflecting the attempted
syscall) while giving the tracer the observation point they need. rr,
strace, and gdb all already know how to read SYS_SECCOMP siginfo from a
SIGSYS stop, so there's nothing to teach them. However, they may not be
expecting the stop, which is the only part we'd need to double check.

So, tl;dr:

  - syscall_exit_work() skips the exit tracehook, audit, and trace when
    the syscall was RET_KILLed.
  - SA_IMMUTABLE stops disabling ptrace_signal() and starts gating
    mutations within it.

What do you think?

-Kees

-- 
Kees Cook
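
The snapshot-and-restore steps listed above can be modelled in plain
user-space C. This is only a toy sketch under stated assumptions, not
kernel code: struct toy_regs, struct toy_siginfo, struct toy_tracee,
toy_ptrace_signal(), toy_hostile_tracer() and TOY_SYS_SECCOMP are all
hypothetical names, the callback stands in for whatever the tracer does
while the tracee sits in ptrace_stop(), and the real pt_regs and siginfo
layouts are deliberately not used.

```c
#include <signal.h>
#include <stdbool.h>

#define TOY_SYS_SECCOMP 1	/* toy stand-in for the SYS_SECCOMP si_code */

/* Toy stand-ins; these are NOT the kernel's pt_regs or siginfo types. */
struct toy_regs    { long syscall_nr; long ret; };
struct toy_siginfo { int signo; int code; long syscall_nr; };

struct toy_tracee {
	struct toy_regs    regs;
	struct toy_siginfo info;
	int exit_code;	/* signal the tracer asks for on resume; 0 = none */
};

/* Models the tracer's actions during the stop. */
typedef void (*toy_tracer_fn)(struct toy_tracee *t);

/*
 * Models the proposed "read-only" stop. Returns the signal number that
 * will actually be delivered after the tracer resumes the tracee.
 */
static int toy_ptrace_signal(struct toy_tracee *t, int signr, bool immutable,
			     toy_tracer_fn tracer)
{
	/* Snapshot everything the tracer must not be able to change. */
	struct toy_regs    saved_regs = t->regs;
	struct toy_siginfo saved_info = t->info;

	t->exit_code = signr;
	tracer(t);			/* the stop: tracer may poke at *t */

	if (immutable) {
		t->regs = saved_regs;	/* SETREGS is undone */
		t->info = saved_info;	/* SETSIGINFO is undone */
		t->exit_code = signr;	/* CONT sig=0 / sig=X is ignored */
		return signr;
	}

	/* Ordinary signal-delivery stop: the tracer's choice wins. */
	return t->exit_code;
}

/* A tracer that tries every mutation from the table: suppress, rewrite. */
static void toy_hostile_tracer(struct toy_tracee *t)
{
	t->exit_code = 0;		/* CONT sig=0 */
	t->info.code = 0;		/* SETSIGINFO */
	t->regs.syscall_nr = -1;	/* SETREGS */
}
```

With immutable set, toy_ptrace_signal() still returns SIGSYS and the
registers and siginfo come back unchanged, while the tracer has had its
chance to look. With immutable clear, the same tracer suppresses the
signal, which is the ordinary behaviour. Restoring after the fact is the
simpler of the two options mentioned above; rejecting the requests
outright would need checks in each ptrace request path instead.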