From: Petri Latvala <petri.latvala@intel.com>
To: igt-dev@lists.freedesktop.org
Cc: Petri Latvala <petri.latvala@intel.com>
Subject: [igt-dev] [PATCH i-g-t v2 2/2] runner: Don't wait forever for processes to die
Date: Tue,  3 Dec 2019 13:15:37 +0200
Message-ID: <20191203111537.23389-1-petri.latvala@intel.com>
In-Reply-To: <20191203104457.20176-2-petri.latvala@intel.com>

While the originally written timeout for process killing (2 seconds)
was way too short, waiting indefinitely is suboptimal as well. We're
seeing cases where the test is stuck, possibly for hours, in
uninterruptible sleep (IO). Wait for a considerably longer but bounded
period of 2 minutes instead, because even if the process is still
making progress after that long, the machine is in a bad enough state
to require a good kicking and booting.
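
As a rough illustration of the timing described above, the sketch below
simulates the post-SIGKILL countdown outside the runner. It assumes the
values used in this patch (a 120 second budget checked every 20
seconds) and that one interval elapses before each check. The helper
seconds_until_abort() and its taint_at_check parameter are hypothetical
and do not exist in runner/executor.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of runner/executor.c. Simulates the
 * post-SIGKILL countdown and returns how many seconds pass before the
 * runner gives up. taint_at_check is the 1-based check at which the
 * kernel is found tainted, or 0 if it never becomes tainted.
 */
static int seconds_until_abort(int timeout, int interval, int taint_at_check)
{
	assert(interval > 0 && timeout >= interval);

	int intervals_left = timeout / interval;
	int elapsed = 0;

	for (int check = 1; ; check++) {
		/* One interval expires before each check. */
		elapsed += interval;

		bool tainted = taint_at_check && check >= taint_at_check;

		/* Same condition as the patch: keep waiting only while untainted and intervals remain. */
		if (!tainted && --intervals_left)
			continue;

		return elapsed;
	}
}
```

Under these assumptions, seconds_until_abort(120, 20, 0) uses the full
two minutes, while a taint seen at the first check ends the wait after
a single 20 second interval.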

v2:
 - Abort quicker if kernel is tainted (Chris)
 - Correctly convert process-exists check with kill() to process-does-not-exist
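
The second point is easy to get wrong, so here is a minimal standalone
sketch of it. kill(pid, 0) returning 0 means the process exists, but a
non-zero return does not by itself mean it is gone: the call can also
fail with EPERM for a process that exists. Only ESRCH says there is no
such process. The helpers process_is_gone() and reaped_child_is_gone()
are hypothetical names used for illustration only.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: true only when the kernel reports no such process. */
static bool process_is_gone(pid_t pid)
{
	/* Signal 0 runs the existence and permission checks without sending a signal. */
	return kill(pid, 0) == -1 && errno == ESRCH;
}

/* Hypothetical demo: fork a child, reap it, then probe its PID. */
static bool reaped_child_is_gone(void)
{
	pid_t child = fork();

	if (child < 0)
		return false;
	if (child == 0)
		_exit(0);

	/* Reap the child so that its PID is released. */
	if (waitpid(child, NULL, 0) != child)
		return false;

	return process_is_gone(child);
}
```

Note that a child that has exited but has not yet been reaped still
counts as existing for kill(pid, 0), so the probe only reports ESRCH
once the process is fully gone.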

Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
---
 runner/executor.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/runner/executor.c b/runner/executor.c
index e6086772..f36bfd3d 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -682,6 +682,9 @@ static int monitor_output(pid_t child,
 	int timeout_intervals = 1, intervals_left;
 	int wd_extra = 10;
 	int killed = 0; /* 0 if not killed, signal number otherwise */
+	int sigkill_timeout = 120;
+	int sigkill_interval = 20;
+	int sigkill_intervals_left = sigkill_timeout / sigkill_interval;
 	struct timespec time_beg, time_end;
 	unsigned long taints = 0;
 	bool aborting = false;
@@ -776,25 +779,33 @@ static int monitor_output(pid_t child,
 				if (!kill_child(killed, child))
 					return -1;
 
-				intervals_left = timeout_intervals = 1;
-				break;
-			case SIGKILL:
 				/*
-				 * If the child still exists, and the kernel
-				 * hasn't oopsed, assume it is still making
-				 * forward progress towards exiting (i.e. still
-				 * freeing all of its resources).
+				 * Allow the test two minutes to die
+				 * on SIGKILL. If it takes more than
+				 * that, we're quite likely in a
+				 * scenario where we want to reboot
+				 * the machine anyway.
 				 */
-				if (kill(child, 0) == 0 && !tainted(&taints)) {
-					intervals_left =  1;
+				watchdogs_set_timeout(sigkill_timeout);
+				timeout = sigkill_interval;
+				intervals_left = 1; /* Intervals handled separately for sigkill */
+				break;
+			case SIGKILL:
+				if (!tainted(&taints) && --sigkill_intervals_left) {
+					intervals_left = 1;
 					break;
 				}
 
 				/* Nothing that can be done, really. Let's tell the caller we want to abort. */
+
 				if (settings->log_level >= LOG_LEVEL_NORMAL) {
 					errf("Child refuses to die, tainted %lx. Aborting.\n",
 					     taints);
+					if (kill(child, 0) && errno == ESRCH)
+						errf("The test process no longer exists, "
+						     "but we didn't get informed of its demise...\n");
 				}
+
 				close_watchdogs(settings);
 				free(outbuf);
 				close(outfd);
-- 
2.19.1
