From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tvrtko Ursulin
Subject: Re: [PATCH 1/3] Wait for any pid in order to reap failure quicker
Date: Fri, 11 Jul 2014 12:56:43 +0100
Message-ID: <53BFD0FB.3060505@linux.intel.com>
References: <1405071640-29692-1-git-send-email-chris@chris-wilson.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
	by gabe.freedesktop.org (Postfix) with ESMTP id E7F356E20C
	for ; Fri, 11 Jul 2014 04:56:44 -0700 (PDT)
In-Reply-To: <1405071640-29692-1-git-send-email-chris@chris-wilson.co.uk>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Chris Wilson, intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On 07/11/2014 10:40 AM, Chris Wilson wrote:
> When waiting for the forked tests, we can respond quicker to a failure
> (such as oom) by waiting for any child to exit rather than waiting for
> each child in order. Then when we see that a test failed, we can kill
> all other children before aborting.
>
> Signed-off-by: Chris Wilson
> ---
>  lib/igt_core.c | 42 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/lib/igt_core.c b/lib/igt_core.c
> index 7ac7ebe..e5dc78b 100644
> --- a/lib/igt_core.c
> +++ b/lib/igt_core.c
> @@ -915,32 +915,52 @@ bool __igt_fork(void)
>   */
>  void igt_waitchildren(void)
>  {
> +	int err = 0;
> +	int count;
> +
>  	assert(!test_child);
>
> -	for (int nc = 0; nc < num_test_children; nc++) {
> +	count = 0;
> +	while (count < num_test_children) {
>  		int status = -1;
> -		while (waitpid(test_children[nc], &status, 0) == -1 &&
> -		       errno == EINTR)
> -			;
> +		pid_t pid;
> +		int c;
>
> -		if (status != 0) {
> +		pid = wait(&status);
> +		if (pid == -1)
> +			continue;

Not sure if it would make sense to be more defensive and maybe assert on
some errors from wait(2) here (like ECHILD). But it is the same as
before the patch so doesn't really matter.

> +
> +		for (c = 0; c < num_test_children; c++)
> +			if (pid == test_children[c])
> +				break;
> +		if (c == num_test_children)
> +			continue;
> +
> +		if (err == 0 && status != 0) {
>  			if (WIFEXITED(status)) {
>  				printf("child %i failed with exit status %i\n",
> -				       nc, WEXITSTATUS(status));
> -				igt_fail(WEXITSTATUS(status));
> +				       c, WEXITSTATUS(status));
> +				err = WEXITSTATUS(status);
>  			} else if (WIFSIGNALED(status)) {
>  				printf("child %i died with signal %i, %s\n",
> -				       nc, WTERMSIG(status),
> +				       c, WTERMSIG(status),
>  				       strsignal(WTERMSIG(status)));
> -				igt_fail(99);
> +				err = 128 + WTERMSIG(status);
>  			} else {
> -				printf("Unhandled failure in child %i\n", nc);
> -				abort();
> +				printf("Unhandled failure [%d] in child %i\n", status, c);
> +				err = 256;
>  			}
> +
> +			for (c = 0; c < num_test_children; c++)
> +				kill(test_children[c], SIGKILL);
>  		}
> +
> +		count++;
>  	}
>
>  	num_test_children = 0;
> +	if (err)
> +		igt_fail(err);
>  }
>
>  /* exit handler code */
>

Looks fine.

Reviewed-by: Tvrtko Ursulin
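
For illustration only, the more defensive handling of wait(2) errors mentioned above could look roughly like the following. This is a minimal standalone sketch, not the code in lib/igt_core.c: reap_children() and status_to_err() are hypothetical names, the -1 return on ECHILD is an assumption about what a caller would want (the review only suggests possibly asserting), and marking reaped slots with 0 so they are never signalled again is a deviation from the patch as posted.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a wait status to an error code using the
 * same convention as the patch (exit status, 128 + signal, or 256). */
static int status_to_err(int status)
{
	if (status == 0)
		return 0;
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return 256;
}

/* Hypothetical helper: reap `num` children in whatever order they exit.
 * Returns 0 if all succeeded, the code of the first failure otherwise,
 * or -1 if wait() ran out of children (ECHILD) before any failure was
 * recorded. Reaped slots in `children` are overwritten with 0. */
static int reap_children(pid_t *children, int num)
{
	int err = 0;
	int count = 0;

	assert(num >= 0);

	while (count < num) {
		int status = -1;
		pid_t pid = wait(&status);
		int c;

		if (pid == -1) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			/* ECHILD: no children left, so looping again
			 * would spin forever. Report it instead. */
			return err ? err : -1;
		}

		for (c = 0; c < num; c++)
			if (children[c] == pid)
				break;
		if (c == num)
			continue; /* not one of ours, ignore */

		children[c] = 0; /* reaped: never signal this pid again */
		count++;

		if (err == 0 && status != 0) {
			err = status_to_err(status);
			/* First failure: kill the remaining children. */
			for (c = 0; c < num; c++)
				if (children[c] > 0)
					kill(children[c], SIGKILL);
		}
	}

	return err;
}
```

A caller would fork its children, store the pids in an array, and pass that array to reap_children(); whether ECHILD should then be fatal is left to the caller.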