From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752438AbbCUS7s (ORCPT <rfc822;w@1wt.eu>);
	Sat, 21 Mar 2015 14:59:48 -0400
Received: from mx1.redhat.com ([209.132.183.28]:57802 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751608AbbCUS7o (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Sat, 21 Mar 2015 14:59:44 -0400
Date: Sat, 21 Mar 2015 19:57:45 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: Pavel Labath <labath@google.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: A peculiarity in ptrace/waitpid behavior
Message-ID: <20150321185745.GA11090@redhat.com>
References: <CAJt8pk-+UGsmAzA8cTn3deWfSrDAy__Yh=bqi4_NRqJVhg63JQ@mail.gmail.com> <20150320162548.GA21069@redhat.com> <CAJt8pk8-=xV_Tofr2W7jT8kLX-GseEeL5d2+w0U4zv2QqnP6rQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAJt8pk8-=xV_Tofr2W7jT8kLX-GseEeL5d2+w0U4zv2QqnP6rQ@mail.gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/20, Pavel Labath wrote:
>
> One difference I see though is that in
> our test, we are not sending any additional signals to the thread in
> question (at least we shouldn't be sending them, but we are sending some
> signals to other threads in the same process). Do you think it could still
> be the same issue?

Not sure...

And I found another race, which looks more promising given your description.
ptrace_resume() sets ->exit_code before it wakes the tracee up. If a
sub-thread of the tracer calls wait() right after that, it can wrongly see
task_stopped_code(tracee, true) != 0, as if the tracee were reporting its
->exit_code as a new stop.
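A simplified, single-threaded model of that window follows. It is not kernel code. model_task, model_reports_stop() and the resume_* helpers are hypothetical names. The model assumes the tracer has already consumed the previous stop, which clears ->exit_code. The check inside replay_race() stands for the sub-thread's wait() landing between the two stores that ptrace_resume() performs.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a tracee; not kernel code. */
enum model_state { MODEL_RUNNING, MODEL_TRACED };

struct model_task {
    enum model_state state;
    int exit_code;                   /* stands in for task->exit_code */
};

/* Rough analogue of task_stopped_code(p, true) plus the nonzero
 * ->exit_code check in the wait path: a traced task with a nonzero
 * exit_code looks like a stop that can be reported. */
static bool model_reports_stop(const struct model_task *t)
{
    return t->state == MODEL_TRACED && t->exit_code != 0;
}

/* The unpatched ptrace_resume(), split into its two unlocked stores. */
static void resume_store_exit_code(struct model_task *t, int sig)
{
    t->exit_code = sig;              /* child->exit_code = data; */
}

static void resume_wake(struct model_task *t)
{
    t->state = MODEL_RUNNING;        /* wake_up_state(child, __TASK_TRACED); */
}

/* Replays the bad interleaving: the previous stop was already consumed
 * (exit_code == 0), the tracer resumes with sig, and a sibling tracer
 * thread calls wait() between the two stores. Returns true if that
 * wait() would wrongly report a stop. */
static bool replay_race(int sig)
{
    struct model_task t = { MODEL_TRACED, 0 };
    bool spurious;

    resume_store_exit_code(&t, sig);
    spurious = model_reports_stop(&t);   /* the sibling's wait() */
    resume_wake(&t);
    return spurious;
}
```

With sig == 0 the model reports nothing. That would fit a spurious report that carries the same signal that was passed to PTRACE_CONT.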

> I would be happy to test your patch. I don't think I can patch the kernel
> on my work machine directly, but I think I might be able to set up some
> sort of a test environment to try it out.

Thanks! Could you try the patch below? It won't help with my test case, but
_perhaps_ it fixes the problem you hit.

And a couple of questions, just in case:

Which kernel version? This probably doesn't matter, though; this race
is very, very old.

Let me return to your description:

	1) we get a waitpid() notification that the tracee got SIGUSR1
	2) we do a ptrace(GETSIGINFO) to get more info
	3) eventually we decide to restart the tracee with PTRACE_CONT, passing it
	   SIGUSR1
	4) immediately after that we get another waitpid notification, again with
	   SIGUSR1,
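For reference, the four steps above correspond roughly to the single-threaded tracer sketch below. Because only one tracer thread calls waitpid(), it does not reproduce the race, and step 4 should see the child exit rather than a second SIGUSR1 stop. The child program, its handler and the helper names are assumptions for illustration. The sketch also assumes Linux with PTRACE_TRACEME permitted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Hypothetical tracee: request tracing, then send itself SIGUSR1.
 * Exit status 0 means the handler ran after the tracer injected SIGUSR1. */
static void run_tracee(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
        _exit(2);
    raise(SIGUSR1);                  /* signal-delivery-stop happens here */
    _exit(got_usr1 ? 0 : 1);
}

/* Returns 0 if the sequence behaved as expected, -1 otherwise. */
static int trace_sigusr1_once(void)
{
    pid_t pid;
    int status;
    siginfo_t si;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        run_tracee();

    /* 1) waitpid() reports that the tracee stopped with SIGUSR1 */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status) ||
        WSTOPSIG(status) != SIGUSR1)
        goto fail;
    /* 2) PTRACE_GETSIGINFO for more details */
    memset(&si, 0, sizeof si);
    if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &si) == -1 ||
        si.si_signo != SIGUSR1)
        goto fail;
    /* 3) resume with PTRACE_CONT, injecting SIGUSR1 */
    if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)SIGUSR1) == -1)
        goto fail;
    /* 4) normally the next report is the exit, not another SIGUSR1 stop */
    if (waitpid(pid, &status, 0) != pid)
        goto fail;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? 0 : -1;
fail:
    kill(pid, SIGKILL);              /* don't leave a stopped child behind */
    waitpid(pid, NULL, 0);
    return -1;
}
```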

Does this "waitpid notification" mean that _another_ thread returns
from waitpid()?

And status == (SIGUSR1 << 8) | 0x7f, yes? In other words, is WIFSTOPPED() true?
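For reference, this is how that status word is laid out and decoded with the standard <sys/wait.h> macros. The layout, with 0x7f in the low byte and the signal number in the next byte, is the Linux/glibc one. The helper names are made up for illustration.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Builds the status word a plain signal stop produces on Linux:
 * low byte 0x7f marks "stopped", bits 8..15 hold the signal number. */
static int make_stop_status(int sig)
{
    return (sig << 8) | 0x7f;
}

/* True if status describes a stop caused by sig. */
static bool is_stop_with(int status, int sig)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == sig;
}
```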

Oleg.

--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -724,8 +724,10 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
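The idea behind the patch can be modelled in userspace as follows. The wait path inspects the tracee under ->siglock. If ptrace_resume() performs both stores under the same lock, a concurrent wait() sees either the old state (traced, ->exit_code already consumed) or the new one (woken), never the mix in between. This is a hypothetical sketch, not kernel code: a pthread mutex stands in for sighand->siglock, and every name below is made up.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the patched ptrace_resume(); not kernel code.
 * The mutex stands in for child->sighand->siglock. */
struct locked_task {
    pthread_mutex_t siglock;
    bool traced;                     /* task is in TASK_TRACED */
    int exit_code;
};

/* Patched resume: both stores happen under the lock. */
static void resume_locked(struct locked_task *t, int sig)
{
    pthread_mutex_lock(&t->siglock);
    t->exit_code = sig;              /* child->exit_code = data; */
    t->traced = false;               /* wake_up_state(child, __TASK_TRACED); */
    pthread_mutex_unlock(&t->siglock);
}

/* A sibling tracer thread's wait(): checks under the same lock. */
static bool wait_sees_stop(struct locked_task *t)
{
    bool stop;

    pthread_mutex_lock(&t->siglock);
    stop = t->traced && t->exit_code != 0;
    pthread_mutex_unlock(&t->siglock);
    return stop;
}

struct probe_arg {
    struct locked_task *task;
    int spurious;                    /* written only by the probe thread */
};

static void *probe(void *p)
{
    struct probe_arg *a = p;

    for (int i = 0; i < 1000; i++)
        if (wait_sees_stop(a->task))
            a->spurious++;
    return NULL;
}

/* Races `rounds` resumes against a probing thread and returns how many
 * spurious stops were seen (0 expected), or -1 on setup failure. The
 * previous stop is assumed consumed, so exit_code starts at 0. */
static int count_spurious_stops(int rounds, int sig)
{
    int total = 0;

    for (int r = 0; r < rounds; r++) {
        struct locked_task t = { .traced = true, .exit_code = 0 };
        struct probe_arg a = { &t, 0 };
        pthread_t th;

        if (pthread_mutex_init(&t.siglock, NULL) != 0)
            return -1;
        if (pthread_create(&th, NULL, probe, &a) != 0) {
            pthread_mutex_destroy(&t.siglock);
            return -1;
        }
        resume_locked(&t, sig);
        pthread_join(th, NULL);
        pthread_mutex_destroy(&t.siglock);
        total += a.spurious;
    }
    return total;
}
```

A check like this can only confirm that the locked variant never reports a stop. The unlocked variant fails only under unlucky timing, so a test of this kind cannot reliably demonstrate that failure.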