From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751982AbbCTQ1s (ORCPT ); Fri, 20 Mar 2015 12:27:48 -0400
Received: from mx1.redhat.com ([209.132.183.28]:49107 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751142AbbCTQ1q (ORCPT ); Fri, 20 Mar 2015 12:27:46 -0400
Date: Fri, 20 Mar 2015 17:25:48 +0100
From: Oleg Nesterov
To: Pavel Labath
Cc: linux-kernel@vger.kernel.org
Subject: Re: A peculiarity in ptrace/waitpid behavior
Message-ID: <20150320162548.GA21069@redhat.com>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Pavel,

let me add lkml, we should not discuss this offlist.

On 03/20, Pavel Labath wrote:
>
> 1) we get a waitpid() notification that the tracee got SIGUSR1
> 2) we do a ptrace(GETSIGINFO) to get more info
> 3) eventually we decide to restart the tracee with PTRACE_CONT, passing it
>    SIGUSR1
> 4) immediately after that we get another waitpid notification, again with
>    SIGUSR1, even though the thread had received no additional signals
> 5) we again try to a GETSIGINFO, however this time it fails with ESRCH.
>    Therefore, we assume that the thread has died

I found a similar bug by code inspection some time ago. I even have a fix,
but I need to think more... And I even wrote the test-case ;) see below.

But so far I can't say if you hit the same problem or not. If you can
reproduce the problem, perhaps I can send you debugging patch?

Oleg.

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#define tkill(pid, sig) \
	syscall(__NR_tkill, pid, sig)

void run_test(void)
{
	int pid, stat;

	pid = fork();
	if (!pid) {
		assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
		raise(SIGSTOP);
		assert(0);
	}

	/* 0x137f: the tracee is stopped by SIGSTOP */
	assert(pid == wait(&stat) && stat == 0x137f);

	tkill(pid, SIGTRAP);	/* should not be reported */
	tkill(pid, SIGKILL);

	assert(pid == wait(&stat));
	/* 0x9: the tracee was killed by SIGKILL, as expected */
	if (stat == 0x9)
		return;

	printf("unexpected wait: stat=%x\n", stat);
	kill(0, SIGKILL);
}

int main(void)
{
	int i = 8;	/* random */

	while (--i)
		if (!fork())
			break;

	for (;;)
		run_test();

	return 0;
}
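
The test-case compares the wait() status against two raw constants, 0x137f
and 0x9. As a reading aid, here is a minimal sketch of how those values
decode with the standard wait macros, assuming Linux with glibc. The names
describe_wait_status() and check_testcase_constants() are hypothetical and
are not part of the test-case above. Note that signal number 0x13 is SIGSTOP
on x86 and ARM; other architectures may number it differently.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of the original test-case: render a
 * wait() status word as text using the standard W* macros.
 */
const char *describe_wait_status(int stat, char *buf, size_t len)
{
	if (WIFSTOPPED(stat))
		snprintf(buf, len, "stopped by signal %d", WSTOPSIG(stat));
	else if (WIFSIGNALED(stat))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(stat));
	else if (WIFEXITED(stat))
		snprintf(buf, len, "exited with code %d", WEXITSTATUS(stat));
	else
		snprintf(buf, len, "unrecognized status 0x%x", stat);
	return buf;
}

/* Check the two constants the test-case compares against. */
void check_testcase_constants(void)
{
	int stopped = 0x137f;	/* low byte 0x7f marks a stop */
	int killed = 0x9;	/* low 7 bits carry the fatal signal */

	/* The second byte of a "stopped" status is the stop signal. */
	assert(WIFSTOPPED(stopped) && WSTOPSIG(stopped) == 0x13);

	/* A "signaled" status carries the terminating signal. */
	assert(WIFSIGNALED(killed) && WTERMSIG(killed) == SIGKILL);
}
```

With this helper, the "unexpected wait" message could print
describe_wait_status(stat, buf, sizeof(buf)) next to the raw hex value,
which makes it easier to see whether a stray SIGTRAP stop was reported
instead of the expected SIGKILL.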