Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: Kamil Konieczny <kamil.konieczny@linux.intel.com>,
	igt-dev@lists.freedesktop.org,
	Petri Latvala <adrinael@adrinael.net>
Subject: Re: [PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal
Date: Wed, 11 Sep 2024 09:06:12 +0200	[thread overview]
Message-ID: <da9e03fa-1ecf-49eb-84ef-66aeee6bd99d@linux.intel.com> (raw)
In-Reply-To: <20240910155814.jq2og45xf7apwb64@kamilkon-DESK.igk.intel.com>

Hi Kamil,

Thank you for your review.

On 10.09.2024 17:58, Kamil Konieczny wrote:
> Hi Peter,
> On 2024-09-03 at 14:05:05 +0200, Peter Senna Tschudin wrote:
>> Make igt-runner aware about tests being killed by signals. Before this
>> patch, manually killing a test process would result in igt-runner silently
>> marking the test as incomplete.
>>
>> Now igt-runner aborts the run verbosely. As an example the following was
>> extracted from results.json:
>>  This test caused an abort condition: Test terminated by a signal Killed (-9).
>>
>>  v5: do not use sigdescr_np() as it seems to be a fairly new lib function that does
>>      not compile on older Ubuntu
>>  v4: improve abort code path to not interfere with igt-runner timeouts
>>  v3: do not interfere with igt-runner killing tests due to timeout and diskspace
>>  v2: fix race condition
>>
>> Cc: Petri Latvala <adrinael@adrinael.net>
>> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
>> Signed-off-by: Peter Senna Tschudin <peter.senna@intel.com>
> 
> Your signed-off-by is different from your e-mail:
> Peter Senna Tschudin <peter.senna@linux.intel.com>

That was a feature, not a bug.

> 
> checkpatch.pl script warns:
> 
> 100: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Peter Senna Tschudin <peter.senna@linux.intel.com>' != 'Signed-off-by: Peter Senna Tschudin <peter.senna@intel.com>'

Will fix and not repeat.

> 
>> ---
>>  runner/executor.c | 38 +++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 37 insertions(+), 1 deletion(-)
>>
>> diff --git a/runner/executor.c b/runner/executor.c
>> index ac73e1dde..990d932f3 100644
>> --- a/runner/executor.c
>> +++ b/runner/executor.c
>> @@ -888,6 +888,8 @@ static int monitor_output(pid_t child,
>>  	const int interval_length = 1;
>>  	int wd_timeout;
>>  	int killed = 0; /* 0 if not killed, signal number otherwise */
>> +	bool child_reaped = false;
>> +	bool child_killed_by_signal = false;
>>  	struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
>>  	unsigned long taints = 0;
>>  	bool aborting = false;
>> @@ -960,6 +962,25 @@ static int monitor_output(pid_t child,
>>  
>>  		igt_gettime(&time_now);
>>  
>> +		/* Testing for !killed to prevent aborting too early after igt-runner
>> +		 * decides to kill a process.
>> +		 */
>> +		if (!killed && (child == waitpid(child, &status, WNOHANG))) {
>> +			child_reaped = true;
>> +			if (WIFSIGNALED(status)) {
>> +				child_killed_by_signal = true;
>> +				killed = WTERMSIG(status);
>> +
>> +				/*
>> +				 * Do not abort just yet, because igt-runner can kill the test
>> +				 * due to a timeout for example. Aborting here prevents
>> +				 * igt-runner from reporting a timeout. The code that aborts
>> +				 * the run after the test was killed is at the end of the
>> +				 * while() loop.
>> +				 */
>> +			}
>> +		}
>> +
> 
> Why is it here? Imho better place it at second hunk.

Please notice that the problem is that igt-runner does not check if a child ended due to a signal, but it notices when a child ends. The problem with your suggestion is that the loop _may_ decide to exercise the old behavior and simply mark the test as incomplete.

My proposal is to always check for child termination due to a signal, and hence unconditional detection before all other logic inside the loop. However there are corner cases, such as excessive disk space consumption by a test, in which igt-runner will signal a test, and will check for the child being signaled. To not interfere with these corner cases, the detection of a signal and taking the action of aborting the run are at different places.

> 
>>  		/* TODO: Refactor these handlers to their own functions */
>>  		if (outfd >= 0 && FD_ISSET(outfd, &set)) {
>>  			char *newline;
>> @@ -1241,7 +1262,11 @@ static int monitor_output(pid_t child,
>>  				errf("Error reading from signalfd: %m\n");
>>  				continue;
>>  			} else if (siginfo.ssi_signo == SIGCHLD) {
>> -				if (child != waitpid(child, &status, WNOHANG)) {
> 
> This code here do what you wrote at first hunk:
> 	if (child != waitpid(child, &status, WNOHANG)) {
> 		// fatal err, set status=9999
> 	} else { // all 'else if' checks 'child == '
> 	}
> 
> The only difference is that your code calls waitpid() unconditionally
> before loop. Could you write some testing code in one simple test

Exactly. Calling waitpid() unconditionally, and early enough to catch all signals.

> and make checks on trybot? e.g. to not rely that your code will
> catch such scenarios if they are rarly seen but to test them in
> predictable way. You could also add here some debugs.

I did not understand what you want to test. The "condition" here is igt-runner itself killing a test due to excessive disk space consumption for example.

Please let me know how to proceed.

Thank you!

[...]

  reply	other threads:[~2024-09-11  7:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-28  6:40 [PATCH i-g-t] runner/executor.c: Detect tests killed by a signal Peter Senna Tschudin
2024-08-28  8:37 ` ✗ GitLab.Pipeline: warning for " Patchwork
2024-08-28  8:40 ` ✓ CI.xeBAT: success " Patchwork
2024-08-28  8:52 ` ✗ Fi.CI.BAT: failure " Patchwork
2024-08-28 13:15 ` ✗ CI.xeFULL: " Patchwork
2024-08-29 12:13 ` [PATCH i-g-t v2] runner/executor: Detect when child process is " Peter Senna Tschudin
2024-08-29 17:37 ` ✗ CI.xeBAT: failure for runner/executor.c: Detect tests killed by a signal (rev2) Patchwork
2024-08-30  4:28   ` Peter Senna Tschudin
2024-08-29 17:49 ` ✓ Fi.CI.BAT: success " Patchwork
2024-08-30  6:34 ` ✗ CI.xeFULL: failure " Patchwork
2024-08-30 12:39 ` [PATCH i-g-t v3] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-08-30 16:19 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev3) Patchwork
2024-08-30 16:31 ` ✓ Fi.CI.BAT: " Patchwork
2024-08-31  0:57 ` ✗ Fi.CI.IGT: failure for runner/executor.c: Detect tests killed by a signal (rev2) Patchwork
2024-08-31  4:32 ` ✓ CI.xeFULL: success for runner/executor.c: Detect tests killed by a signal (rev3) Patchwork
2024-09-01  3:32 ` ✓ Fi.CI.IGT: " Patchwork
2024-09-03  6:19 ` [PATCH i-g-t v4] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-09-03 11:44 ` ✗ Fi.CI.BUILD: failure for runner/executor.c: Detect tests killed by a signal (rev4) Patchwork
2024-09-03 12:05 ` [PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-09-10 15:58   ` Kamil Konieczny
2024-09-11  7:06     ` Peter Senna Tschudin [this message]
2024-09-11 10:50       ` Kamil Konieczny
2024-09-03 12:36 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev5) Patchwork
2024-09-03 13:04 ` ✓ Fi.CI.BAT: " Patchwork
2024-09-03 14:46 ` ✗ CI.xeFULL: failure " Patchwork
2024-09-03 14:56   ` Peter Senna Tschudin
2024-09-04 14:34 ` ✓ Fi.CI.IGT: success " Patchwork
2024-09-05 10:44 ` ✗ GitLab.Pipeline: warning for runner/executor.c: Detect tests killed by a signal (rev4) Patchwork
2024-10-03 13:27 ` [PATCH i-g-t v6] runner/executor: Abort when child process is killed by a signal Peter Senna Tschudin
2024-10-09 14:46   ` Kamil Konieczny
2024-10-09 16:36   ` Kamil Konieczny
2024-10-11 15:18   ` Kamil Konieczny
2024-10-03 16:17 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev6) Patchwork
2024-10-03 16:29 ` ✓ Fi.CI.BAT: " Patchwork
2024-10-03 18:00 ` ✗ CI.xeFULL: failure " Patchwork
2024-10-04  5:30   ` Peter Senna Tschudin
2024-10-04 10:34 ` ✗ GitLab.Pipeline: warning " Patchwork
2024-10-08  3:28 ` ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da9e03fa-1ecf-49eb-84ef-66aeee6bd99d@linux.intel.com \
    --to=peter.senna@linux.intel.com \
    --cc=adrinael@adrinael.net \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=kamil.konieczny@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox