From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: Kamil Konieczny <kamil.konieczny@linux.intel.com>,
igt-dev@lists.freedesktop.org,
Petri Latvala <adrinael@adrinael.net>
Subject: Re: [PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal
Date: Wed, 11 Sep 2024 09:06:12 +0200 [thread overview]
Message-ID: <da9e03fa-1ecf-49eb-84ef-66aeee6bd99d@linux.intel.com> (raw)
In-Reply-To: <20240910155814.jq2og45xf7apwb64@kamilkon-DESK.igk.intel.com>
Hi Kamil,
Thank you for your review.
On 10.09.2024 17:58, Kamil Konieczny wrote:
> Hi Peter,
> On 2024-09-03 at 14:05:05 +0200, Peter Senna Tschudin wrote:
>> Make igt-runner aware about tests being killed by signals. Before this
>> patch, manually killing a test process would result in igt-runner silently
>> marking the test as incomplete.
>>
>> Now igt-runner aborts the run verbosely. As an example the following was
>> extracted from results.json:
>> This test caused an abort condition: Test terminated by a signal Killed (-9).
>>
>> v5: do not use sigdescr_np() as it seems to be a fairly new lib function that does
>> not compile on older Ubuntu
>> v4: improve abort code path to not interfere with igt-runner timeouts
>> v3: do not interfere with igt-runner killing tests due to timeout and diskspace
>> v2: fix race condition
>>
>> Cc: Petri Latvala <adrinael@adrinael.net>
>> Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>
>> Signed-off-by: Peter Senna Tschudin <peter.senna@intel.com>
>
> Your signed-off-by is different from your e-mail:
> Peter Senna Tschudin <peter.senna@linux.intel.com>
That was a feature, not a bug.
>
> checkpatch.pl script warns:
>
> 100: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Peter Senna Tschudin <peter.senna@linux.intel.com>' != 'Signed-off-by: Peter Senna Tschudin <peter.senna@intel.com>'
Will fix and not repeat.
>
>> ---
>> runner/executor.c | 38 +++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 37 insertions(+), 1 deletion(-)
>>
>> diff --git a/runner/executor.c b/runner/executor.c
>> index ac73e1dde..990d932f3 100644
>> --- a/runner/executor.c
>> +++ b/runner/executor.c
>> @@ -888,6 +888,8 @@ static int monitor_output(pid_t child,
>> const int interval_length = 1;
>> int wd_timeout;
>> int killed = 0; /* 0 if not killed, signal number otherwise */
>> + bool child_reaped = false;
>> + bool child_killed_by_signal = false;
>> struct timespec time_beg, time_now, time_last_activity, time_last_subtest, time_killed;
>> unsigned long taints = 0;
>> bool aborting = false;
>> @@ -960,6 +962,25 @@ static int monitor_output(pid_t child,
>>
>> igt_gettime(&time_now);
>>
>> + /* Testing for !killed to prevent aborting too early after igt-runner
>> + * decides to kill a process.
>> + */
>> + if (!killed && (child == waitpid(child, &status, WNOHANG))) {
>> + child_reaped = true;
>> + if (WIFSIGNALED(status)) {
>> + child_killed_by_signal = true;
>> + killed = WTERMSIG(status);
>> +
>> + /*
>> + * Do not abort just yet, because igt-runner can kill the test
>> + * due to a timeout for example. Aborting here prevents
>> + * igt-runner from reporting a timeout. The code that aborts
>> + * the run after the test was killed is at the end of the
>> + * while() loop.
>> + */
>> + }
>> + }
>> +
>
> Why is it here? Imho better place it at second hunk.
Please notice that the problem is that igt-runner does not check if a child ended due to a signal, but it notices when a child ends. The problem with your suggestion is that the loop _may_ decide to exercise the old behavior and simply mark the test as incomplete.
My proposal is to always check for child termination due to a signal, and hence unconditional detection before all other logic inside the loop. However there are corner cases, such as excessive disk space consumption by a test, in which igt-runner will signal a test, and will check for the child being signaled. To not interfere with these corner cases, the detection of a signal and taking the action of aborting the run are at different places.
>
>> /* TODO: Refactor these handlers to their own functions */
>> if (outfd >= 0 && FD_ISSET(outfd, &set)) {
>> char *newline;
>> @@ -1241,7 +1262,11 @@ static int monitor_output(pid_t child,
>> errf("Error reading from signalfd: %m\n");
>> continue;
>> } else if (siginfo.ssi_signo == SIGCHLD) {
>> - if (child != waitpid(child, &status, WNOHANG)) {
>
> This code here do what you wrote at first hunk:
> if (child != waitpid(child, &status, WNOHANG)) {
> // fatal err, set status=9999
> } else { // all 'else if' checks 'child == '
> }
>
> The only difference is that your code calls waitpid() unconditionally
> before loop. Could you write some testing code in one simple test
Exactly. Calling waitpid() unconditionally, and early enough to catch all signals.
> and make checks on trybot? e.g. to not rely that your code will
> catch such scenarios if they are rarly seen but to test them in
> predictable way. You could also add here some debugs.
I did not understand what you want to test. The "condition" here is igt-runner itself killing a test due to excessive disk space consumption for example.
Please let me know how to proceed.
Thank you!
[...]
next prev parent reply other threads:[~2024-09-11 7:07 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-28 6:40 [PATCH i-g-t] runner/executor.c: Detect tests killed by a signal Peter Senna Tschudin
2024-08-28 8:37 ` ✗ GitLab.Pipeline: warning for " Patchwork
2024-08-28 8:40 ` ✓ CI.xeBAT: success " Patchwork
2024-08-28 8:52 ` ✗ Fi.CI.BAT: failure " Patchwork
2024-08-28 13:15 ` ✗ CI.xeFULL: " Patchwork
2024-08-29 12:13 ` [PATCH i-g-t v2] runner/executor: Detect when child process is " Peter Senna Tschudin
2024-08-29 17:37 ` ✗ CI.xeBAT: failure for runner/executor.c: Detect tests killed by a signal (rev2) Patchwork
2024-08-30 4:28 ` Peter Senna Tschudin
2024-08-29 17:49 ` ✓ Fi.CI.BAT: success " Patchwork
2024-08-30 6:34 ` ✗ CI.xeFULL: failure " Patchwork
2024-08-30 12:39 ` [PATCH i-g-t v3] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-08-30 16:19 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev3) Patchwork
2024-08-30 16:31 ` ✓ Fi.CI.BAT: " Patchwork
2024-08-31 0:57 ` ✗ Fi.CI.IGT: failure for runner/executor.c: Detect tests killed by a signal (rev2) Patchwork
2024-08-31 4:32 ` ✓ CI.xeFULL: success for runner/executor.c: Detect tests killed by a signal (rev3) Patchwork
2024-09-01 3:32 ` ✓ Fi.CI.IGT: " Patchwork
2024-09-03 6:19 ` [PATCH i-g-t v4] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-09-03 11:44 ` ✗ Fi.CI.BUILD: failure for runner/executor.c: Detect tests killed by a signal (rev4) Patchwork
2024-09-03 12:05 ` [PATCH i-g-t v5] runner/executor: Detect when child process is killed by a signal Peter Senna Tschudin
2024-09-10 15:58 ` Kamil Konieczny
2024-09-11 7:06 ` Peter Senna Tschudin [this message]
2024-09-11 10:50 ` Kamil Konieczny
2024-09-03 12:36 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev5) Patchwork
2024-09-03 13:04 ` ✓ Fi.CI.BAT: " Patchwork
2024-09-03 14:46 ` ✗ CI.xeFULL: failure " Patchwork
2024-09-03 14:56 ` Peter Senna Tschudin
2024-09-04 14:34 ` ✓ Fi.CI.IGT: success " Patchwork
2024-09-05 10:44 ` ✗ GitLab.Pipeline: warning for runner/executor.c: Detect tests killed by a signal (rev4) Patchwork
2024-10-03 13:27 ` [PATCH i-g-t v6] runner/executor: Abort when child process is killed by a signal Peter Senna Tschudin
2024-10-09 14:46 ` Kamil Konieczny
2024-10-09 16:36 ` Kamil Konieczny
2024-10-11 15:18 ` Kamil Konieczny
2024-10-03 16:17 ` ✓ CI.xeBAT: success for runner/executor.c: Detect tests killed by a signal (rev6) Patchwork
2024-10-03 16:29 ` ✓ Fi.CI.BAT: " Patchwork
2024-10-03 18:00 ` ✗ CI.xeFULL: failure " Patchwork
2024-10-04 5:30 ` Peter Senna Tschudin
2024-10-04 10:34 ` ✗ GitLab.Pipeline: warning " Patchwork
2024-10-08 3:28 ` ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=da9e03fa-1ecf-49eb-84ef-66aeee6bd99d@linux.intel.com \
--to=peter.senna@linux.intel.com \
--cc=adrinael@adrinael.net \
--cc=igt-dev@lists.freedesktop.org \
--cc=kamil.konieczny@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.