ptrace bug in -rc2+

* ptrace bug in -rc2+
@ 2004-10-14 17:49 Gerd Knorr
  2004-10-26  5:04 ` Roland McGrath
  0 siblings, 1 reply; 3+ messages in thread
From: Gerd Knorr @ 2004-10-14 17:49 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Kernel List

  Hi,

The introduction of the new TASK_TRACED state in 2.6.9-rc2 changed the
behavior of the kernel in an IMHO buggy way.  Sending a SIGKILL to a
process which is traced _and_ stopped doesn't work any more.  User-mode
Linux kernels do that on shutdown, that's why I ran into this.

Below is a short test app which shows the behavior.  On 2.6.9-rc2+ the
last waitpid() call blocks forever, on older kernels it doesn't ...

  Gerd

==============================[ cut here ]==============================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
	int child,rc,status;

	child = fork();
	if (0 == child) {
		fprintf(stderr,"[child] ptrace me ...\n");
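		/* pass all four arguments; the variadic prototype would
		   otherwise read indeterminate values for pid/addr/data */
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);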
		fprintf(stderr,"[child] exec sleep 10 ...\n");
		execlp("sleep", "sleep", "10", NULL);
		perror("execlp");
		exit(1);
	}

	sleep(1);
	fprintf(stderr,"kill %d,STOP ...\n",child);
	kill(child,SIGSTOP);
	fprintf(stderr,"waitpid %d...\n",child);
	rc = waitpid(child,&status,WUNTRACED);
	fprintf(stderr,"%s: rc=%d status=%s%s%s termsig=%d\n",__FUNCTION__,rc,
		WIFEXITED(status)   ? "exit"    : "",
		WIFSIGNALED(status) ? "signal"  : "",
		WIFSTOPPED(status)  ? "stopped" : "",
		WTERMSIG(status));

	sleep(1);
	fprintf(stderr,"kill %d,KILL ...\n",child);
	kill(child,SIGKILL);
	fprintf(stderr,"waitpid %d...\n",child);
	rc = waitpid(child,&status,WUNTRACED);
	fprintf(stderr,"%s: rc=%d status=%s%s%s termsig=%d\n",__FUNCTION__,rc,
		WIFEXITED(status)   ? "exit"    : "",
		WIFSIGNALED(status) ? "signal"  : "",
		WIFSTOPPED(status)  ? "stopped" : "",
		WTERMSIG(status));

	exit(0);
}


* Re: ptrace bug in -rc2+
  2004-10-14 17:49 ptrace bug in -rc2+ Gerd Knorr
@ 2004-10-26  5:04 ` Roland McGrath
  2004-10-26  6:37   ` Gerd Knorr
  0 siblings, 1 reply; 3+ messages in thread
From: Roland McGrath @ 2004-10-26  5:04 UTC (permalink / raw)
  To: Gerd Knorr, Andrew Morton, Linus Torvalds; +Cc: Kernel List

Sorry it took a while for me to get back to you on this problem.

> The introduction of the new TASK_TRACED state in 2.6.9-rc2 changed the
> behavior of the kernel in an IMHO buggy way.  Sending a SIGKILL to a
> process which is traced _and_ stopped doesn't work any more.  User-mode
> Linux kernels do that on shutdown, that's why I ran into this.

This is a change that I explained when I posted the ptrace cleanup patches.
In general it is not safe to do any non-ptrace wakeup of a thread in
TASK_TRACED, because the waking thread could race with a ptrace call that
could be doing things like mucking directly with its kernel stack.  AFAIK
no one has established that whatever clobbering ptrace can do to a running
thread is safe even if it will never return to user mode, so we can't allow
this even for SIGKILL.

What we can safely do is make a thread switching out of TASK_TRACED resume
rather than sitting in TASK_STOPPED if it has a pending SIGKILL or SIGCONT.
The following patch does this.  

That doesn't make your test program happy.  But it should be sufficient for
the shutdown case.  When killing all processes, if the tracer gets killed
first, the tracee goes into TASK_STOPPED and will be woken and killed by
the SIGKILL (same as before).  If the tracee gets killed first, it gets a
pending SIGKILL and doesn't wake up immediately--but, now, when the tracer
gets killed, the tracee will then wake up to die.  You can observe the
change by kill -9'ing your test program and seeing that its child winds up
dead rather than stopped.  This will also fix the (same) situations that
can arise now where you have used gdb (or whatever ptrace caller), killed
-9 the gdb and the process being debugged, but still have to kill -CONT the
process before it goes away (now it should just go away either the first
time or when you kill gdb).
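
A rough model of the two kill orders described above may help.  This is
purely illustrative user-space C, not kernel code: the names model_task,
model_send_sigkill and model_tracer_exit are made up for the sketch, and
the "patched" flag stands for whether the patch below is applied.

```c
/* Hypothetical toy model of the tracee's state; not kernel code. */
enum model_state { MODEL_TRACED, MODEL_STOPPED, MODEL_DEAD };

struct model_task {
	enum model_state state;
	int kill_pending;	/* a SIGKILL has been queued */
};

/* SIGKILL is always queued, but only a task in a normal stop is
 * woken to die; a task in the tracing stop stays where it is. */
static void model_send_sigkill(struct model_task *t)
{
	t->kill_pending = 1;
	if (t->state == MODEL_STOPPED)
		t->state = MODEL_DEAD;
}

/* The tracer goes away: the tracing stop becomes a normal stop.
 * With the patch, a SIGKILL queued earlier now takes effect. */
static void model_tracer_exit(struct model_task *t, int patched)
{
	if (t->state != MODEL_TRACED)
		return;
	t->state = MODEL_STOPPED;
	if (patched && t->kill_pending)
		t->state = MODEL_DEAD;
}
```

Killing the tracer first and then the tracee ends in MODEL_DEAD either
way.  Killing the tracee first leaves it in MODEL_TRACED until the tracer
exits; after that it is MODEL_DEAD with patched set and MODEL_STOPPED
without.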

Your particular test program is the one special case where we could make
the SIGKILL work immediately: the caller of kill is the ptracer, so we know
no one else can be using ptrace at the same time.  But I am not in favor of
adding this special case.  If you use ptrace yourself, you should cope.


Thanks,
Roland


Signed-off-by: Roland McGrath <roland@redhat.com>

--- linux-2.6/kernel/ptrace.c 23 Oct 2004 17:07:11 -0000 1.39
+++ linux-2.6/kernel/ptrace.c 26 Oct 2004 04:13:16 -0000
@@ -38,6 +38,12 @@ void __ptrace_link(task_t *child, task_t
 	SET_LINKS(child);
 }
  
+static inline int pending_resume_signal(struct sigpending *pending)
+{
+#define M(sig) (1UL << ((sig)-1))
+	return sigtestsetmask(&pending->signal, M(SIGCONT) | M(SIGKILL));
+}
+
 /*
  * unptrace a task: move it back to its original parent and
  * remove it from the ptrace list.
@@ -61,8 +67,16 @@ void __ptrace_unlink(task_t *child)
 		 * Turn a tracing stop into a normal stop now,
 		 * since with no tracer there would be no way
 		 * to wake it up with SIGCONT or SIGKILL.
+		 * If there was a signal sent that would resume the child,
+		 * but didn't because it was in TASK_TRACED, resume it now.
 		 */
+		spin_lock(&child->sighand->siglock);
 		child->state = TASK_STOPPED;
+		if (pending_resume_signal(&child->pending) ||
+		    pending_resume_signal(&child->signal->shared_pending)) {
+			signal_wake_up(child, 1);
+		}
+		spin_unlock(&child->sighand->siglock);
 	}
 }
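
For readers unfamiliar with the mask idiom in pending_resume_signal(),
here is a small user-space sketch of the same bit test.  It assumes the
pending set fits in a single unsigned long, which is a simplification of
the kernel's sigset_t; SIG_BIT and would_resume are hypothetical names
used only for this illustration.

```c
#include <signal.h>

/* Same layout as the patch's M(): signal number N occupies bit N-1. */
#define SIG_BIT(sig) (1UL << ((sig) - 1))

/* Hypothetical stand-in for the kernel helper: nonzero if the given
 * word of pending-signal bits contains SIGCONT or SIGKILL. */
static int would_resume(unsigned long pending_bits)
{
	return (pending_bits & (SIG_BIT(SIGCONT) | SIG_BIT(SIGKILL))) != 0;
}
```

With this, would_resume(SIG_BIT(SIGKILL)) is nonzero, while
would_resume(SIG_BIT(SIGTERM)) is zero, since a pending SIGTERM alone
does not resume a stopped task.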


* Re: ptrace bug in -rc2+
  2004-10-26  5:04 ` Roland McGrath
@ 2004-10-26  6:37   ` Gerd Knorr
  0 siblings, 0 replies; 3+ messages in thread
From: Gerd Knorr @ 2004-10-26  6:37 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Andrew Morton, Linus Torvalds, Kernel List

On Mon, Oct 25, 2004 at 10:04:14PM -0700, Roland McGrath wrote:
> Sorry it took a while for me to get back to you on this problem.
> 
> > The introduction of the new TASK_TRACED state in 2.6.9-rc2 changed the
> > behavior of the kernel in an IMHO buggy way.  Sending a SIGKILL to a
> > process which is traced _and_ stopped doesn't work any more.  User-mode
> > Linux kernels do that on shutdown, that's why I ran into this.
> 
> This is a change that I explained when I posted the ptrace cleanup patches.
> In general it is not safe to do any non-ptrace wakeup of a thread in
> TASK_TRACED, because the waking thread could race with a ptrace call that
> could be doing things like mucking directly with its kernel stack.  AFAIK
> no one has established that whatever clobbering ptrace can do to a running
> thread is safe even if it will never return to user mode, so we can't allow
> this even for SIGKILL.

Yes, some days later, after studying the source code for some time (and
learning a lot about ptrace), I figured that out myself as well ;)

> Your particular test program is the one special case where we could make
> the SIGKILL work immediately: the caller of kill is the ptracer, so we know
> no one else can be using ptrace at the same time.  But I am not in favor of
> adding this special case.  If you use ptrace yourself, you should cope.

I agree, that can easily be fixed in the app: either first SIGKILL then
PTRACE_CONT, or just PTRACE_KILL directly ...

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

