Date: Sat, 21 Mar 2015 19:57:45 +0100
From: Oleg Nesterov
To: Pavel Labath
Cc: linux-kernel@vger.kernel.org
Subject: Re: A peculiarity in ptrace/waitpid behavior
Message-ID: <20150321185745.GA11090@redhat.com>
References: <20150320162548.GA21069@redhat.com>

On 03/20, Pavel Labath wrote:
>
> One difference I see though is that in
> our test, we are not sending any additional signals to the thread in
> question (at least we shouldn't be sending them, but we are sending some
> signals to other threads in the same process). Do you think it could still
> be the same issue?

Not sure...

And I found another race, which looks more promising with respect to your
description.

ptrace_resume() sets ->exit_code before it wakes the tracee up. If the
tracer's sub-thread calls wait() right after that, it can wrongly see
task_stopped_code(tracee, true) != 0, as if the tracee were reporting its
->exit_code: the tracee is still in TASK_TRACED at that point, so the new
non-zero ->exit_code looks like a fresh stop report.

> I would be happy to test your patch. I don't think I can patch the kernel
> on my work machine directly, but I think I might be able to set up some
> sort of a test environment to try it out.

Thanks! Could you try the patch below? It won't help my test-case, but
_perhaps_ it can fix the problem you hit?

And a couple of questions just in case...

Which kernel version? This probably doesn't matter, though; this race is
very, very old.
Let me return to your description:

    1) we get a waitpid() notification that the tracee got SIGUSR1
    2) we do a ptrace(GETSIGINFO) to get more info
    3) eventually we decide to restart the tracee with PTRACE_CONT,
       passing it SIGUSR1
    4) immediately after that we get another waitpid notification,
       again with SIGUSR1

Does this "waitpid notification" mean that _another_ thread returns from
waitpid()? And status == (SIGUSR1 << 8) | 0x7f, yes? IOW, is WIFSTOPPED()
true?

Oleg.

--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -724,8 +724,10 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }