[PATCH 0/3] signal: requeuing undeliverable signals

public inbox for linux-api@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] signal: requeuing undeliverable signals
       [not found]         ` <CAP045AqsstnxfTyXhhCGDSucqGN7BTtfHJ5s6ZxUQC5K-JU56A@mail.gmail.com>
@ 2021-11-16  5:29           ` Eric W. Biederman
  2021-11-16  5:32             ` [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop Eric W. Biederman
                               ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-16  5:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, linux-api, Al Viro,
	Linus Torvalds, Kees Cook


Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
ptrace_signal from delivering a signal, as the kernel setups up a signal
frame for a signal that rr did not have a chance to observe with ptrace.

In looking into it I found a couple of bugs and a quality of
implementation issue.

- The test for signal_group_exit should be inside the for loop in get_signal.
- Signals should be requeued on the same queue they were dequeued from.
- When a fatal signal is pending ptrace_signal should not return another
  signal for delivery.

Kyle Huey has verified[2] an earlier version of this change.

I have reworked things one more time to completely fix the issues
raised, and to keep the code maintainable long term.

I have smoke tested this code and combined with a careful review I
expect this code to work fine.  Kyle if you can double check that
my last round of changes still works for rr I would appreciate it.

Eric W. Biederman (3):
      signal: In get_signal test for signal_group_exit every time through the loop
      signal: Requeue signals in the appropriate queue
      signal: Requeue ptrace signals

 fs/signalfd.c                |  5 +++--
 include/linux/sched/signal.h |  7 ++++---
 kernel/signal.c              | 44 ++++++++++++++++++++++++++------------------
 3 files changed, 33 insertions(+), 23 deletions(-)

[1] https://lkml.kernel.org/r/20211101034147.6203-1-khuey@kylehuey.com
[2] https://lkml.kernel.org/r/CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop
  2021-11-16  5:29           ` [PATCH 0/3] signal: requeuing undeliverable signals Eric W. Biederman
@ 2021-11-16  5:32             ` Eric W. Biederman
  2021-11-16 18:23               ` Kees Cook
  2021-11-16  5:33             ` [PATCH 2/3] signal: Requeue signals in the appropriate queue Eric W. Biederman
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-16  5:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, linux-api, Al Viro,
	Linus Torvalds, Kees Cook


Recently while investigating a problem with rr and signals I noticed
that siglock is dropped in ptrace_signal and get_signal does not jump
to relock.

Looking farther to see if the problem is anywhere else I see that
do_signal_stop also returns if signal_group_exit is true.  I believe
that test can now never be true, but it is a bit hard to trace
through and be certain.

Testing signal_group_exit is not expensive, so move the test for
signal_group_exit into the for loop inside of get_signal to ensure
the test is never skipped improperly.

This has been a potential since I added the test for signal_group_exit
was added.

Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7c4b7ae714d4..986fa69c15c5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
 		goto relock;
 	}
 
-	/* Has this task already been marked for death? */
-	if (signal_group_exit(signal)) {
-		ksig->info.si_signo = signr = SIGKILL;
-		sigdelset(&current->pending.signal, SIGKILL);
-		trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
-				&sighand->action[SIGKILL - 1]);
-		recalc_sigpending();
-		goto fatal;
-	}
-
 	for (;;) {
 		struct k_sigaction *ka;
 
+		/* Has this task already been marked for death? */
+		if (signal_group_exit(signal)) {
+			ksig->info.si_signo = signr = SIGKILL;
+			sigdelset(&current->pending.signal, SIGKILL);
+			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
+				&sighand->action[SIGKILL - 1]);
+			recalc_sigpending();
+			goto fatal;
+		}
+
 		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
 		    do_signal_stop(0))
 			goto relock;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] signal: Requeue signals in the appropriate queue
  2021-11-16  5:29           ` [PATCH 0/3] signal: requeuing undeliverable signals Eric W. Biederman
  2021-11-16  5:32             ` [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop Eric W. Biederman
@ 2021-11-16  5:33             ` Eric W. Biederman
  2021-11-16 18:30               ` Kees Cook
  2021-11-16  5:34             ` [PATCH 3/3] signal: Requeue ptrace signals Eric W. Biederman
  2021-11-17 16:24             ` [PATCH 0/3] signal: requeuing undeliverable signals Kyle Huey
  3 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-16  5:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, linux-api, Al Viro,
	Linus Torvalds, Kees Cook


In the event that a tracer changes which signal needs to be delivered
and that signal is currently blocked then the signal needs to be
requeued for later delivery.

With the advent of CLONE_THREAD the kernel has 2 signal queues per
task.  The per process queue and the per task queue.  Update the code
so that if the signal is removed from the per process queue it is
requeued on the per process queue.  This is necessary to make it
appear the signal was never dequeued.

The rr debugger reasonably believes that the state of the process from
the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
by simply letting a process run.  If a SIGKILL interrupts a ptrace_stop
this is not true today.

So return signals to their original queue in ptrace_signal so that
signals that are not delivered appear like they were never dequeued.

Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/signalfd.c                |  5 +++--
 include/linux/sched/signal.h |  7 ++++---
 kernel/signal.c              | 21 ++++++++++++++-------
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 040e1cf90528..74f134cd1ff6 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
 				int nonblock)
 {
+	enum pid_type type;
 	ssize_t ret;
 	DECLARE_WAITQUEUE(wait, current);
 
 	spin_lock_irq(&current->sighand->siglock);
-	ret = dequeue_signal(current, &ctx->sigmask, info);
+	ret = dequeue_signal(current, &ctx->sigmask, info, &type);
 	switch (ret) {
 	case 0:
 		if (!nonblock)
@@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
 	add_wait_queue(&current->sighand->signalfd_wqh, &wait);
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		ret = dequeue_signal(current, &ctx->sigmask, info);
+		ret = dequeue_signal(current, &ctx->sigmask, info, &type);
 		if (ret != 0)
 			break;
 		if (signal_pending(current)) {
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 23505394ef70..167995d471da 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
 extern void flush_signals(struct task_struct *);
 extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
-extern int dequeue_signal(struct task_struct *task,
-			  sigset_t *mask, kernel_siginfo_t *info);
+extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
+			  kernel_siginfo_t *info, enum pid_type *type);
 
 static inline int kernel_dequeue_signal(void)
 {
 	struct task_struct *task = current;
 	kernel_siginfo_t __info;
+	enum pid_type __type;
 	int ret;
 
 	spin_lock_irq(&task->sighand->siglock);
-	ret = dequeue_signal(task, &task->blocked, &__info);
+	ret = dequeue_signal(task, &task->blocked, &__info, &__type);
 	spin_unlock_irq(&task->sighand->siglock);
 
 	return ret;
diff --git a/kernel/signal.c b/kernel/signal.c
index 986fa69c15c5..43e8b7e362b0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
  *
  * All callers have to hold the siglock.
  */
-int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
+int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
+		   kernel_siginfo_t *info, enum pid_type *type)
 {
 	bool resched_timer = false;
 	int signr;
@@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
+	*type = PIDTYPE_PID;
 	signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
 	if (!signr) {
+		*type = PIDTYPE_TGID;
 		signr = __dequeue_signal(&tsk->signal->shared_pending,
 					 mask, info, &resched_timer);
 #ifdef CONFIG_POSIX_TIMERS
@@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
 	freezable_schedule();
 }
 
-static int ptrace_signal(int signr, kernel_siginfo_t *info)
+static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 {
 	/*
 	 * We do not check sig_kernel_stop(signr) but set this marker
@@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
 
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr)) {
-		send_signal(signr, info, current, PIDTYPE_PID);
+		send_signal(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)
 
 	for (;;) {
 		struct k_sigaction *ka;
+		enum pid_type type;
 
 		/* Has this task already been marked for death? */
 		if (signal_group_exit(signal)) {
@@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
 		 * so that the instruction pointer in the signal stack
 		 * frame points to the faulting instruction.
 		 */
+		type = PIDTYPE_PID;
 		signr = dequeue_synchronous_signal(&ksig->info);
 		if (!signr)
-			signr = dequeue_signal(current, &current->blocked, &ksig->info);
+			signr = dequeue_signal(current, &current->blocked,
+					       &ksig->info, &type);
 
 		if (!signr)
 			break; /* will return 0 */
 
 		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
 		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
-			signr = ptrace_signal(signr, &ksig->info);
+			signr = ptrace_signal(signr, &ksig->info, type);
 			if (!signr)
 				continue;
 		}
@@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
 	ktime_t *to = NULL, timeout = KTIME_MAX;
 	struct task_struct *tsk = current;
 	sigset_t mask = *which;
+	enum pid_type type;
 	int sig, ret = 0;
 
 	if (ts) {
@@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
 	signotset(&mask);
 
 	spin_lock_irq(&tsk->sighand->siglock);
-	sig = dequeue_signal(tsk, &mask, info);
+	sig = dequeue_signal(tsk, &mask, info, &type);
 	if (!sig && timeout) {
 		/*
 		 * None ready, temporarily unblock those we're interested
@@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
 		spin_lock_irq(&tsk->sighand->siglock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
 		sigemptyset(&tsk->real_blocked);
-		sig = dequeue_signal(tsk, &mask, info);
+		sig = dequeue_signal(tsk, &mask, info, &type);
 	}
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] signal: Requeue ptrace signals
  2021-11-16  5:29           ` [PATCH 0/3] signal: requeuing undeliverable signals Eric W. Biederman
  2021-11-16  5:32             ` [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop Eric W. Biederman
  2021-11-16  5:33             ` [PATCH 2/3] signal: Requeue signals in the appropriate queue Eric W. Biederman
@ 2021-11-16  5:34             ` Eric W. Biederman
  2021-11-16 18:31               ` Kees Cook
  2021-11-17 16:24             ` [PATCH 0/3] signal: requeuing undeliverable signals Kyle Huey
  3 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-16  5:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, linux-api, Al Viro,
	Linus Torvalds, Kees Cook

Kyle Huey <me@kylehuey.com> writes:

> rr, a userspace record and replay debugger[0], uses the recorded register
> state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> executing the program during replay.
>
> If a SIGKILL races with processing another signal in get_signal, it is
> possible for the kernel to decline to notify the tracer of the original
> signal. But if the original signal had a handler, the kernel proceeds
> with setting up a signal handler frame as if the tracer had chosen to
> deliver the signal unmodified to the tracee. When the kernel goes to
> execute the signal handler that it has now modified the stack and registers
> for, it will discover the pending SIGKILL, and terminate the tracee
> without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> the tracer, however, the effects of handler setup will be visible to
> the tracer.
>
> Because rr (the tracer) was never notified of the signal, it is not aware
> that a signal handler frame was set up and expects the state of the program
> at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> by allowing the program to execute from the last event. When that fails
> to happen during replay, rr will assert and die.
>
> The following patches add an explicit check for a newly pending SIGKILL
> after the ptracer has been notified and the siglock has been reacquired.
> If this happens, we stop processing the current signal and proceed
> immediately to handling the SIGKILL. This makes the state reported at
> PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> work to set up a signal handler frame that will never be used.
>
> [0] https://rr-project.org/

The problem is that while the traced process makes it into ptrace_stop,
the tracee is killed before the tracer manages to wait for the
tracee and discover which signal was about to be delivered.

More generally the problem is that while siglock was dropped a signal
with process wide effect is short cirucit delivered to the entire
process killing it, but the process continues to try and deliver another
signal.

In general it impossible to avoid all cases where work is performed
after the process has been killed.  In particular if the process is
killed after get_signal returns the code will simply not know it has
been killed until after delivering the signal frame to userspace.

On the other hand when the code has already discovered the process
has been killed and taken user space visible action that shows
the kernel knows the process has been killed, it is just silly
to then write the signal frame to the user space stack.

Instead of being silly detect the process has been killed
in ptrace_signal and requeue the signal so the code can pretend
it was simply never dequeued for delivery.

To test the process has been killed I use fatal_signal_pending rather
than signal_group_exit to match the test in signal_pending_state which
is used in schedule which is where ptrace_stop detects the process has
been killed.

Requeuing the signal so the code can pretend it was simply never
dequeued improves the user space visible behavior that has been
present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").

Kyle Huey verified that this change in behavior and makes rr happy.

Reported-by: Kyle Huey <khuey@kylehuey.com>
Reported-by: Marko Mäkelä <marko.makela@mariadb.com>
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 43e8b7e362b0..621401550f0f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	}

 	/* If the (new) signal is now blocked, requeue it.  */
-	if (sigismember(&current->blocked, signr)) {
+	if (sigismember(&current->blocked, signr) ||
+	    fatal_signal_pending(current)) {
 		send_signal(signr, info, current, type);
 		signr = 0;
 	}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop
  2021-11-16  5:32             ` [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop Eric W. Biederman
@ 2021-11-16 18:23               ` Kees Cook
  2021-11-17 16:31                 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2021-11-16 18:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

On Mon, Nov 15, 2021 at 11:32:50PM -0600, Eric W. Biederman wrote:
> 
> Recently while investigating a problem with rr and signals I noticed
> that siglock is dropped in ptrace_signal and get_signal does not jump
> to relock.
> 
> Looking farther to see if the problem is anywhere else I see that
> do_signal_stop also returns if signal_group_exit is true.  I believe
> that test can now never be true, but it is a bit hard to trace
> through and be certain.
> 
> Testing signal_group_exit is not expensive, so move the test for
> signal_group_exit into the for loop inside of get_signal to ensure
> the test is never skipped improperly.

Seems reasonable.

> 
> This has been a potential since I added the test for signal_group_exit

nit: missing word after "potential"? "bug", "problem"?

> was added.
> 
> Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

> ---
>  kernel/signal.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 7c4b7ae714d4..986fa69c15c5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
>  		goto relock;
>  	}
>  
> -	/* Has this task already been marked for death? */
> -	if (signal_group_exit(signal)) {
> -		ksig->info.si_signo = signr = SIGKILL;
> -		sigdelset(&current->pending.signal, SIGKILL);
> -		trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
> -				&sighand->action[SIGKILL - 1]);
> -		recalc_sigpending();
> -		goto fatal;
> -	}
> -
>  	for (;;) {
>  		struct k_sigaction *ka;
>  
> +		/* Has this task already been marked for death? */
> +		if (signal_group_exit(signal)) {
> +			ksig->info.si_signo = signr = SIGKILL;
> +			sigdelset(&current->pending.signal, SIGKILL);
> +			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
> +				&sighand->action[SIGKILL - 1]);
> +			recalc_sigpending();
> +			goto fatal;
> +		}
> +
>  		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
>  		    do_signal_stop(0))
>  			goto relock;
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] signal: Requeue signals in the appropriate queue
  2021-11-16  5:33             ` [PATCH 2/3] signal: Requeue signals in the appropriate queue Eric W. Biederman
@ 2021-11-16 18:30               ` Kees Cook
  2021-11-17 16:42                 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2021-11-16 18:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

On Mon, Nov 15, 2021 at 11:33:19PM -0600, Eric W. Biederman wrote:
> 
> In the event that a tracer changes which signal needs to be delivered
> and that signal is currently blocked then the signal needs to be
> requeued for later delivery.
> 
> With the advent of CLONE_THREAD the kernel has 2 signal queues per
> task.  The per process queue and the per task queue.  Update the code
> so that if the signal is removed from the per process queue it is
> requeued on the per process queue.  This is necessary to make it
> appear the signal was never dequeued.
> 
> The rr debugger reasonably believes that the state of the process from
> the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
> by simply letting a process run.  If a SIGKILL interrupts a ptrace_stop
> this is not true today.
> 
> So return signals to their original queue in ptrace_signal so that
> signals that are not delivered appear like they were never dequeued.

The only comment I have on this is that it seems like many callers
totally ignore the result store in "type" (signalfd_dequeue,
kernel_dequeue_signal, do_sigtimedwait), which would imply a different
API might be desirable. That said, it's also not a big deal.

> 
> Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>


> ---
>  fs/signalfd.c                |  5 +++--
>  include/linux/sched/signal.h |  7 ++++---
>  kernel/signal.c              | 21 ++++++++++++++-------
>  3 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/signalfd.c b/fs/signalfd.c
> index 040e1cf90528..74f134cd1ff6 100644
> --- a/fs/signalfd.c
> +++ b/fs/signalfd.c
> @@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
>  static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
>  				int nonblock)
>  {
> +	enum pid_type type;
>  	ssize_t ret;
>  	DECLARE_WAITQUEUE(wait, current);
>  
>  	spin_lock_irq(&current->sighand->siglock);
> -	ret = dequeue_signal(current, &ctx->sigmask, info);
> +	ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>  	switch (ret) {
>  	case 0:
>  		if (!nonblock)
> @@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
>  	add_wait_queue(&current->sighand->signalfd_wqh, &wait);
>  	for (;;) {
>  		set_current_state(TASK_INTERRUPTIBLE);
> -		ret = dequeue_signal(current, &ctx->sigmask, info);
> +		ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>  		if (ret != 0)
>  			break;
>  		if (signal_pending(current)) {
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 23505394ef70..167995d471da 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
>  extern void flush_signals(struct task_struct *);
>  extern void ignore_signals(struct task_struct *);
>  extern void flush_signal_handlers(struct task_struct *, int force_default);
> -extern int dequeue_signal(struct task_struct *task,
> -			  sigset_t *mask, kernel_siginfo_t *info);
> +extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
> +			  kernel_siginfo_t *info, enum pid_type *type);
>  
>  static inline int kernel_dequeue_signal(void)
>  {
>  	struct task_struct *task = current;
>  	kernel_siginfo_t __info;
> +	enum pid_type __type;
>  	int ret;
>  
>  	spin_lock_irq(&task->sighand->siglock);
> -	ret = dequeue_signal(task, &task->blocked, &__info);
> +	ret = dequeue_signal(task, &task->blocked, &__info, &__type);
>  	spin_unlock_irq(&task->sighand->siglock);
>  
>  	return ret;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 986fa69c15c5..43e8b7e362b0 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
>   *
>   * All callers have to hold the siglock.
>   */
> -int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
> +int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
> +		   kernel_siginfo_t *info, enum pid_type *type)
>  {
>  	bool resched_timer = false;
>  	int signr;
> @@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
>  	/* We only dequeue private signals from ourselves, we don't let
>  	 * signalfd steal them
>  	 */
> +	*type = PIDTYPE_PID;
>  	signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
>  	if (!signr) {
> +		*type = PIDTYPE_TGID;
>  		signr = __dequeue_signal(&tsk->signal->shared_pending,
>  					 mask, info, &resched_timer);
>  #ifdef CONFIG_POSIX_TIMERS
> @@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
>  	freezable_schedule();
>  }
>  
> -static int ptrace_signal(int signr, kernel_siginfo_t *info)
> +static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>  {
>  	/*
>  	 * We do not check sig_kernel_stop(signr) but set this marker
> @@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>  
>  	/* If the (new) signal is now blocked, requeue it.  */
>  	if (sigismember(&current->blocked, signr)) {
> -		send_signal(signr, info, current, PIDTYPE_PID);
> +		send_signal(signr, info, current, type);
>  		signr = 0;
>  	}
>  
> @@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)
>  
>  	for (;;) {
>  		struct k_sigaction *ka;
> +		enum pid_type type;
>  
>  		/* Has this task already been marked for death? */
>  		if (signal_group_exit(signal)) {
> @@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
>  		 * so that the instruction pointer in the signal stack
>  		 * frame points to the faulting instruction.
>  		 */
> +		type = PIDTYPE_PID;
>  		signr = dequeue_synchronous_signal(&ksig->info);
>  		if (!signr)
> -			signr = dequeue_signal(current, &current->blocked, &ksig->info);
> +			signr = dequeue_signal(current, &current->blocked,
> +					       &ksig->info, &type);
>  
>  		if (!signr)
>  			break; /* will return 0 */
>  
>  		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
>  		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
> -			signr = ptrace_signal(signr, &ksig->info);
> +			signr = ptrace_signal(signr, &ksig->info, type);
>  			if (!signr)
>  				continue;
>  		}
> @@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>  	ktime_t *to = NULL, timeout = KTIME_MAX;
>  	struct task_struct *tsk = current;
>  	sigset_t mask = *which;
> +	enum pid_type type;
>  	int sig, ret = 0;
>  
>  	if (ts) {
> @@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>  	signotset(&mask);
>  
>  	spin_lock_irq(&tsk->sighand->siglock);
> -	sig = dequeue_signal(tsk, &mask, info);
> +	sig = dequeue_signal(tsk, &mask, info, &type);
>  	if (!sig && timeout) {
>  		/*
>  		 * None ready, temporarily unblock those we're interested
> @@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>  		spin_lock_irq(&tsk->sighand->siglock);
>  		__set_task_blocked(tsk, &tsk->real_blocked);
>  		sigemptyset(&tsk->real_blocked);
> -		sig = dequeue_signal(tsk, &mask, info);
> +		sig = dequeue_signal(tsk, &mask, info, &type);
>  	}
>  	spin_unlock_irq(&tsk->sighand->siglock);
>  
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] signal: Requeue ptrace signals
  2021-11-16  5:34             ` [PATCH 3/3] signal: Requeue ptrace signals Eric W. Biederman
@ 2021-11-16 18:31               ` Kees Cook
  2021-11-17 16:44                 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2021-11-16 18:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

On Mon, Nov 15, 2021 at 11:34:33PM -0600, Eric W. Biederman wrote:
> 
> Kyle Huey <me@kylehuey.com> writes:
> 
> > rr, a userspace record and replay debugger[0], uses the recorded register
> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> > executing the program during replay.
> >
> > If a SIGKILL races with processing another signal in get_signal, it is
> > possible for the kernel to decline to notify the tracer of the original
> > signal. But if the original signal had a handler, the kernel proceeds
> > with setting up a signal handler frame as if the tracer had chosen to
> > deliver the signal unmodified to the tracee. When the kernel goes to
> > execute the signal handler that it has now modified the stack and registers
> > for, it will discover the pending SIGKILL, and terminate the tracee
> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> > the tracer, however, the effects of handler setup will be visible to
> > the tracer.
> >
> > Because rr (the tracer) was never notified of the signal, it is not aware
> > that a signal handler frame was set up and expects the state of the program
> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> > by allowing the program to execute from the last event. When that fails
> > to happen during replay, rr will assert and die.
> >
> > The following patches add an explicit check for a newly pending SIGKILL
> > after the ptracer has been notified and the siglock has been reacquired.
> > If this happens, we stop processing the current signal and proceed
> > immediately to handling the SIGKILL. This makes the state reported at
> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> > work to set up a signal handler frame that will never be used.
> >
> > [0] https://rr-project.org/
> 
> The problem is that while the traced process makes it into ptrace_stop,
> the tracee is killed before the tracer manages to wait for the
> tracee and discover which signal was about to be delivered.
> 
> More generally the problem is that while siglock was dropped a signal
> with process wide effect is short cirucit delivered to the entire
> process killing it, but the process continues to try and deliver another
> signal.
> 
> In general it impossible to avoid all cases where work is performed
> after the process has been killed.  In particular if the process is
> killed after get_signal returns the code will simply not know it has
> been killed until after delivering the signal frame to userspace.
> 
> On the other hand when the code has already discovered the process
> has been killed and taken user space visible action that shows
> the kernel knows the process has been killed, it is just silly
> to then write the signal frame to the user space stack.
> 
> Instead of being silly detect the process has been killed
> in ptrace_signal and requeue the signal so the code can pretend
> it was simply never dequeued for delivery.
> 
> To test the process has been killed I use fatal_signal_pending rather
> than signal_group_exit to match the test in signal_pending_state which
> is used in schedule which is where ptrace_stop detects the process has
> been killed.
> 
> Requeuing the signal so the code can pretend it was simply never
> dequeued improves the user space visible behavior that has been
> present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").
> 
> Kyle Huey verified that this change in behavior and makes rr happy.
> 
> Reported-by: Kyle Huey <khuey@kylehuey.com>
> Reported-by: Marko Mäkelä <marko.makela@mariadb.com>
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yay pre-git-history! :)

Reviewed-by: Kees Cook <keescook@chromium.org>

> ---
>  kernel/signal.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 43e8b7e362b0..621401550f0f 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>  	}
>  
>  	/* If the (new) signal is now blocked, requeue it.  */
> -	if (sigismember(&current->blocked, signr)) {
> +	if (sigismember(&current->blocked, signr) ||
> +	    fatal_signal_pending(current)) {
>  		send_signal(signr, info, current, type);
>  		signr = 0;
>  	}
> -- 
> 2.20.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] signal: requeuing undeliverable signals
  2021-11-16  5:29           ` [PATCH 0/3] signal: requeuing undeliverable signals Eric W. Biederman
                               ` (2 preceding siblings ...)
  2021-11-16  5:34             ` [PATCH 3/3] signal: Requeue ptrace signals Eric W. Biederman
@ 2021-11-17 16:24             ` Kyle Huey
  2021-11-17 16:51               ` Eric W. Biederman
  3 siblings, 1 reply; 13+ messages in thread
From: Kyle Huey @ 2021-11-17 16:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: open list, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, Linux API, Al Viro,
	Linus Torvalds, Kees Cook

On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>
> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> ptrace_signal from delivering a signal, as the kernel setups up a signal
> frame for a signal that rr did not have a chance to observe with ptrace.
>
> In looking into it I found a couple of bugs and a quality of
> implementation issue.
>
> - The test for signal_group_exit should be inside the for loop in get_signal.
> - Signals should be requeued on the same queue they were dequeued from.
> - When a fatal signal is pending ptrace_signal should not return another
>   signal for delivery.
>
> Kyle Huey has verified[2] an earlier version of this change.
>
> I have reworked things one more time to completely fix the issues
> raised, and to keep the code maintainable long term.
>
> I have smoke tested this code and combined with a careful review I
> expect this code to work fine.  Kyle if you can double check that
> my last round of changes still works for rr I would appreciate it.

This still fixes the race we reported.

Tested-by: Kyle Huey <khuey@kylehuey.com>

- Kyle

> Eric W. Biederman (3):
>       signal: In get_signal test for signal_group_exit every time through the loop
>       signal: Requeue signals in the appropriate queue
>       signal: Requeue ptrace signals
>
>  fs/signalfd.c                |  5 +++--
>  include/linux/sched/signal.h |  7 ++++---
>  kernel/signal.c              | 44 ++++++++++++++++++++++++++------------------
>  3 files changed, 33 insertions(+), 23 deletions(-)
>
> [1] https://lkml.kernel.org/r/20211101034147.6203-1-khuey@kylehuey.com
> [2] https://lkml.kernel.org/r/CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com
>
> Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop
  2021-11-16 18:23               ` Kees Cook
@ 2021-11-17 16:31                 ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-17 16:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

Kees Cook <keescook@chromium.org> writes:

> On Mon, Nov 15, 2021 at 11:32:50PM -0600, Eric W. Biederman wrote:
>> 
>> Recently while investigating a problem with rr and signals I noticed
>> that siglock is dropped in ptrace_signal and get_signal does not jump
>> to relock.
>> 
>> Looking farther to see if the problem is anywhere else I see that
>> do_signal_stop also returns if signal_group_exit is true.  I believe
>> that test can now never be true, but it is a bit hard to trace
>> through and be certain.
>> 
>> Testing signal_group_exit is not expensive, so move the test for
>> signal_group_exit into the for loop inside of get_signal to ensure
>> the test is never skipped improperly.
>
> Seems reasonable.
>
>> 
>> This has been a potential since I added the test for signal_group_exit
>
> nit: missing word after "potential"? "bug", "problem"?

Yes.  Potential problem.  I will update the comment.

>> was added.
>> 
>> Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>> ---
>>  kernel/signal.c | 20 ++++++++++----------
>>  1 file changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 7c4b7ae714d4..986fa69c15c5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
>>  		goto relock;
>>  	}
>>  
>> -	/* Has this task already been marked for death? */
>> -	if (signal_group_exit(signal)) {
>> -		ksig->info.si_signo = signr = SIGKILL;
>> -		sigdelset(&current->pending.signal, SIGKILL);
>> -		trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>> -				&sighand->action[SIGKILL - 1]);
>> -		recalc_sigpending();
>> -		goto fatal;
>> -	}
>> -
>>  	for (;;) {
>>  		struct k_sigaction *ka;
>>  
>> +		/* Has this task already been marked for death? */
>> +		if (signal_group_exit(signal)) {
>> +			ksig->info.si_signo = signr = SIGKILL;
>> +			sigdelset(&current->pending.signal, SIGKILL);
>> +			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>> +				&sighand->action[SIGKILL - 1]);
>> +			recalc_sigpending();
>> +			goto fatal;
>> +		}
>> +
>>  		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
>>  		    do_signal_stop(0))
>>  			goto relock;
>> -- 
>> 2.20.1
>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] signal: Requeue signals in the appropriate queue
  2021-11-16 18:30               ` Kees Cook
@ 2021-11-17 16:42                 ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-17 16:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

Kees Cook <keescook@chromium.org> writes:

> On Mon, Nov 15, 2021 at 11:33:19PM -0600, Eric W. Biederman wrote:
>> 
>> In the event that a tracer changes which signal needs to be delivered
>> and that signal is currently blocked then the signal needs to be
>> requeued for later delivery.
>> 
>> With the advent of CLONE_THREAD the kernel has 2 signal queues per
>> task.  The per process queue and the per task queue.  Update the code
>> so that if the signal is removed from the per process queue it is
>> requeued on the per process queue.  This is necessary to make it
>> appear the signal was never dequeued.
>> 
>> The rr debugger reasonably believes that the state of the process from
>> the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
>> by simply letting a process run.  If a SIGKILL interrupts a ptrace_stop
>> this is not true today.
>> 
>> So return signals to their original queue in ptrace_signal so that
>> signals that are not delivered appear like they were never dequeued.
>
> The only comment I have on this is that it seems like many callers
> totally ignore the result store in "type" (signalfd_dequeue,
> kernel_dequeue_signal, do_sigtimedwait), which would imply a different
> API might be desirable. That said, it's also not a big deal.

I thought about it.  The primary user of dequeue_signal get_signal does
care which queue you can from.  Always returning the value makes it
possible for the other callers to ask the question do they care, when
they are updated.

My conclusion was that in this case it is more useful for the callers of
dequeue_signal to ask if they care.  My sense is that if we have the
information right in people's faces and they care they will realize they
care.  If the information is not trivially available people are unlikely
to remember the kernel supports two queues, even when it makes a
difference.

Eric

>> Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>
>> ---
>>  fs/signalfd.c                |  5 +++--
>>  include/linux/sched/signal.h |  7 ++++---
>>  kernel/signal.c              | 21 ++++++++++++++-------
>>  3 files changed, 21 insertions(+), 12 deletions(-)
>> 
>> diff --git a/fs/signalfd.c b/fs/signalfd.c
>> index 040e1cf90528..74f134cd1ff6 100644
>> --- a/fs/signalfd.c
>> +++ b/fs/signalfd.c
>> @@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
>>  static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
>>  				int nonblock)
>>  {
>> +	enum pid_type type;
>>  	ssize_t ret;
>>  	DECLARE_WAITQUEUE(wait, current);
>>  
>>  	spin_lock_irq(&current->sighand->siglock);
>> -	ret = dequeue_signal(current, &ctx->sigmask, info);
>> +	ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>>  	switch (ret) {
>>  	case 0:
>>  		if (!nonblock)
>> @@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
>>  	add_wait_queue(&current->sighand->signalfd_wqh, &wait);
>>  	for (;;) {
>>  		set_current_state(TASK_INTERRUPTIBLE);
>> -		ret = dequeue_signal(current, &ctx->sigmask, info);
>> +		ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>>  		if (ret != 0)
>>  			break;
>>  		if (signal_pending(current)) {
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index 23505394ef70..167995d471da 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
>>  extern void flush_signals(struct task_struct *);
>>  extern void ignore_signals(struct task_struct *);
>>  extern void flush_signal_handlers(struct task_struct *, int force_default);
>> -extern int dequeue_signal(struct task_struct *task,
>> -			  sigset_t *mask, kernel_siginfo_t *info);
>> +extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
>> +			  kernel_siginfo_t *info, enum pid_type *type);
>>  
>>  static inline int kernel_dequeue_signal(void)
>>  {
>>  	struct task_struct *task = current;
>>  	kernel_siginfo_t __info;
>> +	enum pid_type __type;
>>  	int ret;
>>  
>>  	spin_lock_irq(&task->sighand->siglock);
>> -	ret = dequeue_signal(task, &task->blocked, &__info);
>> +	ret = dequeue_signal(task, &task->blocked, &__info, &__type);
>>  	spin_unlock_irq(&task->sighand->siglock);
>>  
>>  	return ret;
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 986fa69c15c5..43e8b7e362b0 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
>>   *
>>   * All callers have to hold the siglock.
>>   */
>> -int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
>> +int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
>> +		   kernel_siginfo_t *info, enum pid_type *type)
>>  {
>>  	bool resched_timer = false;
>>  	int signr;
>> @@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
>>  	/* We only dequeue private signals from ourselves, we don't let
>>  	 * signalfd steal them
>>  	 */
>> +	*type = PIDTYPE_PID;
>>  	signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
>>  	if (!signr) {
>> +		*type = PIDTYPE_TGID;
>>  		signr = __dequeue_signal(&tsk->signal->shared_pending,
>>  					 mask, info, &resched_timer);
>>  #ifdef CONFIG_POSIX_TIMERS
>> @@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
>>  	freezable_schedule();
>>  }
>>  
>> -static int ptrace_signal(int signr, kernel_siginfo_t *info)
>> +static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>>  {
>>  	/*
>>  	 * We do not check sig_kernel_stop(signr) but set this marker
>> @@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>>  
>>  	/* If the (new) signal is now blocked, requeue it.  */
>>  	if (sigismember(&current->blocked, signr)) {
>> -		send_signal(signr, info, current, PIDTYPE_PID);
>> +		send_signal(signr, info, current, type);
>>  		signr = 0;
>>  	}
>>  
>> @@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)
>>  
>>  	for (;;) {
>>  		struct k_sigaction *ka;
>> +		enum pid_type type;
>>  
>>  		/* Has this task already been marked for death? */
>>  		if (signal_group_exit(signal)) {
>> @@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
>>  		 * so that the instruction pointer in the signal stack
>>  		 * frame points to the faulting instruction.
>>  		 */
>> +		type = PIDTYPE_PID;
>>  		signr = dequeue_synchronous_signal(&ksig->info);
>>  		if (!signr)
>> -			signr = dequeue_signal(current, &current->blocked, &ksig->info);
>> +			signr = dequeue_signal(current, &current->blocked,
>> +					       &ksig->info, &type);
>>  
>>  		if (!signr)
>>  			break; /* will return 0 */
>>  
>>  		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
>>  		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
>> -			signr = ptrace_signal(signr, &ksig->info);
>> +			signr = ptrace_signal(signr, &ksig->info, type);
>>  			if (!signr)
>>  				continue;
>>  		}
>> @@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>>  	ktime_t *to = NULL, timeout = KTIME_MAX;
>>  	struct task_struct *tsk = current;
>>  	sigset_t mask = *which;
>> +	enum pid_type type;
>>  	int sig, ret = 0;
>>  
>>  	if (ts) {
>> @@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>>  	signotset(&mask);
>>  
>>  	spin_lock_irq(&tsk->sighand->siglock);
>> -	sig = dequeue_signal(tsk, &mask, info);
>> +	sig = dequeue_signal(tsk, &mask, info, &type);
>>  	if (!sig && timeout) {
>>  		/*
>>  		 * None ready, temporarily unblock those we're interested
>> @@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>>  		spin_lock_irq(&tsk->sighand->siglock);
>>  		__set_task_blocked(tsk, &tsk->real_blocked);
>>  		sigemptyset(&tsk->real_blocked);
>> -		sig = dequeue_signal(tsk, &mask, info);
>> +		sig = dequeue_signal(tsk, &mask, info, &type);
>>  	}
>>  	spin_unlock_irq(&tsk->sighand->siglock);
>>  
>> -- 
>> 2.20.1
>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] signal: Requeue ptrace signals
  2021-11-16 18:31               ` Kees Cook
@ 2021-11-17 16:44                 ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-17 16:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Kyle Huey, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Marko Mäkelä,
	linux-api, Al Viro, Linus Torvalds

Kees Cook <keescook@chromium.org> writes:

> On Mon, Nov 15, 2021 at 11:34:33PM -0600, Eric W. Biederman wrote:
>> 
>> Kyle Huey <me@kylehuey.com> writes:
>> 
>> > rr, a userspace record and replay debugger[0], uses the recorded register
>> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
>> > executing the program during replay.
>> >
>> > If a SIGKILL races with processing another signal in get_signal, it is
>> > possible for the kernel to decline to notify the tracer of the original
>> > signal. But if the original signal had a handler, the kernel proceeds
>> > with setting up a signal handler frame as if the tracer had chosen to
>> > deliver the signal unmodified to the tracee. When the kernel goes to
>> > execute the signal handler that it has now modified the stack and registers
>> > for, it will discover the pending SIGKILL, and terminate the tracee
>> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
>> > the tracer, however, the effects of handler setup will be visible to
>> > the tracer.
>> >
>> > Because rr (the tracer) was never notified of the signal, it is not aware
>> > that a signal handler frame was set up and expects the state of the program
>> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
>> > by allowing the program to execute from the last event. When that fails
>> > to happen during replay, rr will assert and die.
>> >
>> > The following patches add an explicit check for a newly pending SIGKILL
>> > after the ptracer has been notified and the siglock has been reacquired.
>> > If this happens, we stop processing the current signal and proceed
>> > immediately to handling the SIGKILL. This makes the state reported at
>> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
>> > work to set up a signal handler frame that will never be used.
>> >
>> > [0] https://rr-project.org/
>> 
>> The problem is that while the traced process makes it into ptrace_stop,
>> the tracee is killed before the tracer manages to wait for the
>> tracee and discover which signal was about to be delivered.
>> 
>> More generally the problem is that while siglock was dropped a signal
>> with process wide effect is short cirucit delivered to the entire
>> process killing it, but the process continues to try and deliver another
>> signal.
>> 
>> In general it impossible to avoid all cases where work is performed
>> after the process has been killed.  In particular if the process is
>> killed after get_signal returns the code will simply not know it has
>> been killed until after delivering the signal frame to userspace.
>> 
>> On the other hand when the code has already discovered the process
>> has been killed and taken user space visible action that shows
>> the kernel knows the process has been killed, it is just silly
>> to then write the signal frame to the user space stack.
>> 
>> Instead of being silly detect the process has been killed
>> in ptrace_signal and requeue the signal so the code can pretend
>> it was simply never dequeued for delivery.
>> 
>> To test the process has been killed I use fatal_signal_pending rather
>> than signal_group_exit to match the test in signal_pending_state which
>> is used in schedule which is where ptrace_stop detects the process has
>> been killed.
>> 
>> Requeuing the signal so the code can pretend it was simply never
>> dequeued improves the user space visible behavior that has been
>> present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").
>> 
>> Kyle Huey verified that this change in behavior and makes rr happy.
>> 
>> Reported-by: Kyle Huey <khuey@kylehuey.com>
>> Reported-by: Marko Mäkelä <marko.makela@mariadb.com>
>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Yay pre-git-history! :)

One of these days we might finish removing the rough edges and
fixing the corner case bugs in the original linux pthreads support.

Eric


> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>> ---
>>  kernel/signal.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 43e8b7e362b0..621401550f0f 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>>  	}
>>  
>>  	/* If the (new) signal is now blocked, requeue it.  */
>> -	if (sigismember(&current->blocked, signr)) {
>> +	if (sigismember(&current->blocked, signr) ||
>> +	    fatal_signal_pending(current)) {
>>  		send_signal(signr, info, current, type);
>>  		signr = 0;
>>  	}
>> -- 
>> 2.20.1
>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] signal: requeuing undeliverable signals
  2021-11-17 16:24             ` [PATCH 0/3] signal: requeuing undeliverable signals Kyle Huey
@ 2021-11-17 16:51               ` Eric W. Biederman
  2021-11-18  6:12                 ` Marko Mäkelä
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-11-17 16:51 UTC (permalink / raw)
  To: Kyle Huey
  Cc: open list, Jens Axboe, Peter Zijlstra, Marco Elver, Oleg Nesterov,
	Thomas Gleixner, Peter Collingbourne, Alexey Gladkov,
	Robert O'Callahan, Marko Mäkelä, Linux API, Al Viro,
	Linus Torvalds, Kees Cook

Kyle Huey <me@kylehuey.com> writes:

> On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>>
>> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
>> ptrace_signal from delivering a signal, as the kernel setups up a signal
>> frame for a signal that rr did not have a chance to observe with ptrace.
>>
>> In looking into it I found a couple of bugs and a quality of
>> implementation issue.
>>
>> - The test for signal_group_exit should be inside the for loop in get_signal.
>> - Signals should be requeued on the same queue they were dequeued from.
>> - When a fatal signal is pending ptrace_signal should not return another
>>   signal for delivery.
>>
>> Kyle Huey has verified[2] an earlier version of this change.
>>
>> I have reworked things one more time to completely fix the issues
>> raised, and to keep the code maintainable long term.
>>
>> I have smoke tested this code and combined with a careful review I
>> expect this code to work fine.  Kyle if you can double check that
>> my last round of changes still works for rr I would appreciate it.
>
> This still fixes the race we reported.

>
> Tested-by: Kyle Huey <khuey@kylehuey.com>

Thank you very much for retesting.

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] signal: requeuing undeliverable signals
  2021-11-17 16:51               ` Eric W. Biederman
@ 2021-11-18  6:12                 ` Marko Mäkelä
  0 siblings, 0 replies; 13+ messages in thread
From: Marko Mäkelä @ 2021-11-18  6:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kyle Huey, open list, Jens Axboe, Peter Zijlstra, Marco Elver,
	Oleg Nesterov, Thomas Gleixner, Peter Collingbourne,
	Alexey Gladkov, Robert O'Callahan, Linux API, Al Viro,
	Linus Torvalds, Kees Cook

On Wed, Nov 17, 2021 at 6:51 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kyle Huey <me@kylehuey.com> writes:
>
> > On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >>
> >> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> >> ptrace_signal from delivering a signal, as the kernel setups up a signal
> >> frame for a signal that rr did not have a chance to observe with ptrace.
> >>
> >> In looking into it I found a couple of bugs and a quality of
> >> implementation issue.
> >>
> >> - The test for signal_group_exit should be inside the for loop in get_signal.
> >> - Signals should be requeued on the same queue they were dequeued from.
> >> - When a fatal signal is pending ptrace_signal should not return another
> >>   signal for delivery.
> >>
> >> Kyle Huey has verified[2] an earlier version of this change.
> >>
> >> I have reworked things one more time to completely fix the issues
> >> raised, and to keep the code maintainable long term.
> >>
> >> I have smoke tested this code and combined with a careful review I
> >> expect this code to work fine.  Kyle if you can double check that
> >> my last round of changes still works for rr I would appreciate it.
> >
> > This still fixes the race we reported.
>
> >
> > Tested-by: Kyle Huey <khuey@kylehuey.com>
>
> Thank you very much for retesting.
>
> Eric

Thank you, Kyle and Eric, for reporting and fixing the root cause of this race.

Meanwhile, I followed Kyle's suggestion and will disable the crash
handlers in the tracee whenever it is being traced.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-11-18  6:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20211101034147.6203-1-khuey@kylehuey.com>
     [not found] ` <877ddqabvs.fsf@disp2133>
     [not found]   ` <CAP045AqJVXA60R9RF8Gb2PWGBsK6bZ7tVBkdCcPYYrp6rOkG-Q@mail.gmail.com>
     [not found]     ` <87fsse8maf.fsf@disp2133>
     [not found]       ` <CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com>
     [not found]         ` <CAP045AqsstnxfTyXhhCGDSucqGN7BTtfHJ5s6ZxUQC5K-JU56A@mail.gmail.com>
2021-11-16  5:29           ` [PATCH 0/3] signal: requeuing undeliverable signals Eric W. Biederman
2021-11-16  5:32             ` [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop Eric W. Biederman
2021-11-16 18:23               ` Kees Cook
2021-11-17 16:31                 ` Eric W. Biederman
2021-11-16  5:33             ` [PATCH 2/3] signal: Requeue signals in the appropriate queue Eric W. Biederman
2021-11-16 18:30               ` Kees Cook
2021-11-17 16:42                 ` Eric W. Biederman
2021-11-16  5:34             ` [PATCH 3/3] signal: Requeue ptrace signals Eric W. Biederman
2021-11-16 18:31               ` Kees Cook
2021-11-17 16:44                 ` Eric W. Biederman
2021-11-17 16:24             ` [PATCH 0/3] signal: requeuing undeliverable signals Kyle Huey
2021-11-17 16:51               ` Eric W. Biederman
2021-11-18  6:12                 ` Marko Mäkelä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox