All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Robert Święcki" <robert@swiecki.net>,
	"Andy Lutomirski" <luto@amacapital.net>,
	"Will Drewry" <wad@chromium.org>,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH 0/3] signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
Date: Thu, 10 Feb 2022 12:43:14 -0800	[thread overview]
Message-ID: <202202101137.B48D02138@keescook> (raw)
In-Reply-To: <87pmnu5z28.fsf@email.froward.int.ebiederm.org>

On Thu, Feb 10, 2022 at 12:58:07PM -0600, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Thu, Feb 10, 2022 at 12:17:50PM -0600, Eric W. Biederman wrote:
> >> Kees Cook <keescook@chromium.org> writes:
> >> 
> >> > Hi,
> >> >
> >> > This fixes the signal refactoring to actually kill unkillable processes
> >> > when receiving a fatal SIGSYS from seccomp. Thanks to Robert for the
> >> > report and Eric for the fix! I've also tweaked seccomp internal a bit to
> >> > fail more safely. This was a partial seccomp bypass, in the sense that
> >> > SECCOMP_RET_KILL_* didn't kill the process, but it didn't bypass other
> >> > aspects of the filters. (i.e. the syscall was still blocked, etc.)
> >> 
> >> Any luck on figuring out how to suppress the extra event?
> >
> > I haven't found a good single indicator of a process being in an "I am dying"
> > state, and even if I did, it seems every architecture's exit path would
> > need to add a new test.
> 
> The "I am dying" state for a task is fatal_signal_pending, at least
> before get_signal is reached, for a process there is SIGNAL_GROUP_EXIT.
> Something I am busily cleaning up and making more reliable at the
> moment.

The state I need to catch is "I am dying and this syscall was
interrupted". fatal_signal_pending() is kind of only the first half
(though it doesn't cover fatal SIGSYS?)

For example, if a process hits a BUG() in the middle of running a
syscall, that syscall isn't expected to "exit" from the perspective of
userspace. This is similarly true for seccomp's fatal SIGSYS.

> What is the event that is happening?  Is it
> tracehook_report_syscall_exit or something else?

Yes, but in more completely, it's these three, which are called in
various fashions from architecture syscall exit code:

	audit_syscall_exit()		(audit)
	trace_sys_exit()		(see "TRACE_EVENT_FN(sys_exit,")
	tracehook_report_syscall_exit()	(ptrace)

> From the bits I have seen it seems like something else.

But yes, the place Robert and I both noticed it was with ptrace from
tracehook_report_syscall_exit(), which is rather poorly named. :)

Looking at the results, audit_syscall_exit() and trace_sys_exit() need
to be skipped too, since they would each be reporting potential nonsense.

> > The best approach seems to be clearing the TIF_*WORK* bits, but that's
> > still a bit arch-specific. And I'm not sure which layer would do that.
> > At what point have we decided the process will not continue? More
> > than seccomp was calling do_exit() in the middle of a syscall, but those
> > appear to have all been either SIGKILL or SIGSEGV?
> 
> This is where I get confused what TIF_WORK bits matter?

This is where I wish all the architectures were using the common syscall
code. The old do_exit() path would completely skip _everything_ in the
exit path, so it was like never calling anything after the syscall
dispatch table. The only userspace visible things in there are triggered
from having TIF_WORK... flags (but again, it's kind of a per-arch mess).

Skipping the entire exit path makes a fair bit of sense. For example,
rseq_syscall() is redundant (forcing SIGSEGV).

Regardless, at least the three places above need to be skipped.

But just testing fatal_signal_pending() seems wrong: a normal syscall
could be finishing just fine, it just happens to have a fatal signal
ready to be processed.

Here's the ordering after a syscall on x86 from do_syscall_64():

do_syscall_x64()
	sys_call_table[...](regs)
syscall_exit_to_user_mode()
	__syscall_exit_to_user_mode_work()
		syscall_exit_to_user_mode_prepare()
			syscall_exit_work()
				arch_syscall_exit_tracehook()
					tracehook_report_syscall_exit()
	exit_to_user_mode_prepare()
		exit_to_user_mode_loop()
			handle_signal_work()
				arch_do_signal_or_restart()
					get_signal()
						do_group_exit()

Here's arm64 from el0_svc():

do_el0_svc()
	el0_svc_common()
		invoke_syscall()
			syscall_table[...](regs)
		syscall_trace_exit()
			tracehook_report_syscall()
				tracehook_report_syscall_exit()
exit_to_user_mode()
	prepare_exit_to_user_mode()
		do_notify_resume()
			do_signal()
				get_signal()
					do_group_exit()

In the past, any do_exit() would short circuit everything after the
syscall table. Now, we do all the exit work before starting the return
to user mode which is what processes the signals. So I guess there's
more precisely a difference between "visible to userspace" and "return
to userspace".

(an aside: where to PF_IO_WORKER threads die?)

> I expect if anything else mattered we would need to change it to
> HANDLER_EXIT.
> 
> I made a mistake conflating to cases and I want to make certain I
> successfully separate those two cases at the end of the day.

For skipping the exit work, I'm not sure it matters, since all the
signal stuff is "too late"...

-- 
Kees Cook

  reply	other threads:[~2022-02-10 20:43 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-10  2:53 [PATCH 0/3] signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE Kees Cook
2022-02-10  2:53 ` [PATCH 1/3] " Kees Cook
2022-02-10 16:18   ` Jann Horn
2022-02-10 17:37     ` Kees Cook
2022-02-10 18:01       ` Jann Horn
2022-02-10 18:12         ` Eric W. Biederman
2022-02-10 21:09         ` Kees Cook
2022-02-11 20:15           ` Jann Horn
2022-02-10 18:16   ` Eric W. Biederman
2022-02-10  2:53 ` [PATCH 2/3] seccomp: Invalidate seccomp mode to catch death failures Kees Cook
2022-02-10  2:53 ` [PATCH 3/3] samples/seccomp: Adjust sample to also provide kill option Kees Cook
2022-02-10 18:17 ` [PATCH 0/3] signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE Eric W. Biederman
2022-02-10 18:41   ` Kees Cook
2022-02-10 18:58     ` Eric W. Biederman
2022-02-10 20:43       ` Kees Cook [this message]
2022-02-10 22:48         ` Eric W. Biederman
2022-02-11  1:26           ` Kees Cook
2022-02-11  1:47             ` Eric W. Biederman
2022-02-11  2:53               ` Kees Cook
2022-02-11 12:54                 ` Robert Święcki
2022-02-11 17:46                   ` Eric W. Biederman
2022-02-11 18:57                     ` Robert Święcki
2022-02-11 20:01                     ` Kees Cook
2022-02-11 19:58                   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202202101137.B48D02138@keescook \
    --to=keescook@chromium.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=robert@swiecki.net \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.