linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Pavel Labath <labath@google.com>
Cc: linux-kernel@vger.kernel.org, Josh Stone <jistone@redhat.com>
Subject: Re: A peculiarity in ptrace/waitpid behavior
Date: Sun, 22 Mar 2015 16:33:45 +0100	[thread overview]
Message-ID: <20150322153345.GA30897@redhat.com> (raw)
In-Reply-To: <20150321185745.GA11090@redhat.com>

On 03/21, Oleg Nesterov wrote:
>
> On 03/20, Pavel Labath wrote:
> >
> > One difference I see though is that in
> > our test, we are not sending any additional signals to the thread in
> > question (at least we shouldn't be sending them, but we are sending some
> > signals to other threads in the same process). Do you think it could still
> > be the same issue?
>
> Not sure...
>
> And. I found another race, which looks more promising wrt your description.
> ptrace_resume() sets ->exit_code before it wakes the tracee up. If the
> tracer's sub-thread calls wait() right after that, it can wrongly see
> task_stopped_code(tracee, true) != 0, as if the tracee reports its
> ->exit_code.
>
> > I would be happy to test your patch. I don't think I can patch the kernel
> > on my work machine directly, but I think I might be able to set up some
> > sort of a test environment to try it out.
>
> Thanks! could you try the patch below? It won't help my test-case, but
> _perhaps_ it can fix the problem you hit?
>
> And a couple of questions just in case...
>
> Which kernel version? Although probably this doesn't matter, this race
> is very-very old.
>
> Let me return to your description,
>
> 	1) we get a waitpid() notification that the tracee got SIGUSR1
> 	2) we do a ptrace(GETSIGINFO) to get more info
> 	3) eventually we decide to restart the tracee with PTRACE_CONT, passing it
> 	   SIGUSR1
> 	4) immediately after that we get another waitpid notification, again with
> 	   SIGUSR1,
>
> Does this "waitpid notification" mean that _another_ thread returns
> from waitpid() ?
>
> And status == (SIGUSR1 << 8) | 0x7f , yes? IOW, is WIFSTOPPED() true?


Please see the updated (comments + optimization) patch below. Note that
it has the test-case, so I'll send it in any case.

But it would be nice to know if it fixes your problem or not.

Josh, this reminds me the obsure bug report we discussed some time ago,
perhaps this is the same thing...

Oleg.

------------------------------------------------------------------------------
Subject: [PATCH] ptrace: fix race between ptrace_resume() and wait_task_stopped()

ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <sys/ptrace.h>
	#include <pthread.h>
	#include <assert.h>

	int pid;

	void *waiter(void *arg)
	{
		int stat;

		for (;;) {
			assert(pid == wait(&stat));
			assert(WIFSTOPPED(stat));
			if (WSTOPSIG(stat) == SIGHUP)
				continue;

			assert(WSTOPSIG(stat) == SIGCONT);
			printf("ERR! extra/wrong report:%x\n", stat);
		}
	}

	int main(void)
	{
		pthread_t thread;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			for (;;)
				kill(getpid(), SIGHUP);
		}

		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

		for (;;)
			ptrace(PTRACE_CONT, pid, 0, SIGCONT);

		return 0;
	}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Cc: <stable@vger.kernel.org>
---
 kernel/ptrace.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1eb9d90..5009263 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -697,6 +697,8 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
+	bool need_siglock;
+
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -724,8 +726,26 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
+	/*
+	 * Change ->exit_code and ->state under siglock to avoid the race
+	 * with wait_task_stopped() in between; a non-zero ->exit_code will
+	 * wrongly look like another report from tracee.
+	 *
+	 * Note that we need siglock even if ->exit_code == data and/or this
+	 * status was not reported yet, the new status must not be cleared by
+	 * wait_task_stopped() after resume.
+	 *
+	 * If data == 0 we do not care if wait_task_stopped() reports the old
+	 * status and clears the code too; this can't race with the tracee, it
+	 * takes siglock after resume.
+	 */
+	need_siglock = data && !thread_group_empty(current);
+	if (need_siglock)
+		spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
+	if (need_siglock)
+		spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
1.5.5.1



  reply	other threads:[~2015-03-22 15:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAJt8pk-+UGsmAzA8cTn3deWfSrDAy__Yh=bqi4_NRqJVhg63JQ@mail.gmail.com>
2015-03-20 16:25 ` A peculiarity in ptrace/waitpid behavior Oleg Nesterov
     [not found]   ` <CAJt8pk8-=xV_Tofr2W7jT8kLX-GseEeL5d2+w0U4zv2QqnP6rQ@mail.gmail.com>
2015-03-20 18:53     ` Pavel Labath
2015-03-21 18:57     ` Oleg Nesterov
2015-03-22 15:33       ` Oleg Nesterov [this message]
2015-03-23  9:42       ` Pavel Labath
2015-03-24 16:53         ` Pavel Labath

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150322153345.GA30897@redhat.com \
    --to=oleg@redhat.com \
    --cc=jistone@redhat.com \
    --cc=labath@google.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).