* [for-linus][PATCH 0/3] ktest.pl: Fix ssh hanging and reseting of console
From: Steven Rostedt @ 2023-01-18 21:54 UTC
To: linux-kernel; +Cc: John Warthog9 Hawley, Masami Hiramatsu
I've noticed that my ssh sessions would hang during test runs, which
is really frustrating when you kick off a 13-hour test before going to
bed, the second test (1 hour into it) hangs, and you need to kick
it off again in the morning (wasting all that time overnight).
I finally figured out the cause. There is a disconnect between
the run_command that executes the test, and the "wait_for_input" that
monitors the test. wait_for_input has a default timeout of 2 minutes;
if it doesn't see any output in that time, it returns. run_command
takes the empty result from wait_for_input to mean the test is
finished, stops monitoring it, and calls waitpid() to wait for the
test to exit.
The problem is that if the test has a lot of output, it will continue
writing into the pipe that was supposed to go to the monitor, which has
now exited the loop. Once the pipe fills up, the test blocks on the
write and cannot finish. When the test is over, it just hangs waiting
for the pipe to flush (which never happens).
To fix this, change run_command to run with no time limit by default
(which can be overridden by the new RUN_TIMEOUT option), and make
wait_for_input also wait indefinitely in this case. Now the test's
output is read continuously and the test exits normally.
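The failure mode above comes down to a reader that stops draining a pipe
while the writer still has output to send. As a rough illustration only
(this is not the code in ktest.pl; read_chunk and drain_command are
hypothetical names made up for this sketch), a reader that keeps calling
select() with no timeout until EOF lets a child with more output than
the pipe can hold exit normally:

```perl
use strict;
use warnings;

# Hypothetical helper (not ktest.pl's wait_for_input): read one chunk from
# $fh, waiting up to $timeout seconds. An undef $timeout makes select()
# block until data or EOF arrives. Returns undef on timeout, EOF or error.
sub read_chunk {
    my ($fh, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;
    my $rout = $rin;
    my $nfound = select($rout, undef, undef, $timeout);
    return if $nfound <= 0;            # timed out, or select() failed
    my $n = sysread($fh, my $buf, 4096);
    return unless $n;                  # 0 is EOF, undef is a read error
    return $buf;
}

# Hypothetical helper: run $cmd and keep draining its output until EOF, so
# the child never blocks on a full pipe. Returns (bytes read, exit status).
sub drain_command {
    my ($cmd) = @_;
    open(my $fh, '-|', $cmd) or die "cannot run $cmd: $!";
    my $total = 0;
    while (defined(my $chunk = read_chunk($fh, undef))) {
        $total += length($chunk);
    }
    close($fh);    # for a piped open, close waits for the child and sets $?
    return ($total, $? >> 8);
}

# The child writes 200000 bytes, more than a typical pipe buffer holds.
my ($bytes, $status) = drain_command("$^X -e 'print q(x) x 200000'");
print "read $bytes bytes, exit status $status\n";
```

If the loop instead gave up after a quiet period and went straight to
waiting for the child, any output produced afterwards would have nowhere
to go once the pipe filled, which is the hang described above.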
While debugging this, I also found out why you can sometimes lose
stdout on the terminal, especially if you hit Ctrl^C while the monitor
is running. It was due to a missing "end_monitor", which gives the
tty back to the terminal. The first two patches fix that.
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest.git
devel
Head SHA1: aa9aba9382884554fe6a303744884866d137422d
Steven Rostedt (3):
ktest.pl: Fix missing "end_monitor" when machine check fails
ktest.pl: Give back console on Ctrt^C on monitor
ktest.pl: Add RUN_TIMEOUT option with default unlimited
----
tools/testing/ktest/ktest.pl | 26 +++++++++++++++++++++-----
tools/testing/ktest/sample.conf | 5 +++++
2 files changed, 26 insertions(+), 5 deletions(-)
* [for-linus][PATCH 1/3] ktest.pl: Fix missing "end_monitor" when machine check fails
From: Steven Rostedt @ 2023-01-18 21:54 UTC
To: linux-kernel; +Cc: John Warthog9 Hawley, Masami Hiramatsu, stable
From: Steven Rostedt <rostedt@goodmis.org>
In the "reboot" command, it does a check of the machine to see if it is
still alive with a simple "ssh echo" command. If that fails, it assumes
that a normal "ssh reboot" is not possible and forces a power cycle.
In this case, "start_monitor" is executed but "end_monitor" is not, so
the screen is not given back to the console. That is, after the test, a
"reset" command needs to be performed, as "echo" is turned off.
Cc: stable@vger.kernel.org
Fixes: 6474ace999edd ("ktest.pl: Powercycle the box on reboot if no connection can be made")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
tools/testing/ktest/ktest.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/ktest/ktest.pl b/tools/testing/ktest/ktest.pl
index 6f9fff88cedf..f2f48ce6ac4d 100755
--- a/tools/testing/ktest/ktest.pl
+++ b/tools/testing/ktest/ktest.pl
@@ -1499,7 +1499,8 @@ sub reboot {
# Still need to wait for the reboot to finish
wait_for_monitor($time, $reboot_success_line);
-
+ }
+ if ($powercycle || $time) {
end_monitor;
}
}
--
2.39.0
* [for-linus][PATCH 2/3] ktest.pl: Give back console on Ctrt^C on monitor
From: Steven Rostedt @ 2023-01-18 21:54 UTC
To: linux-kernel; +Cc: John Warthog9 Hawley, Masami Hiramatsu, stable
From: Steven Rostedt <rostedt@goodmis.org>
When monitoring the console output, stdout is redirected to do so. If
Ctrl^C is hit in this mode, stdout is not given back to the console
and the user does not see anything they type (no echo).
Add "end_monitor" to the SIGINT interrupt handler to give back the console
on Ctrl^C.
Cc: stable@vger.kernel.org
Fixes: 9f2cdcbbb90e7 ("ktest: Give console process a dedicated tty")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
tools/testing/ktest/ktest.pl | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/ktest/ktest.pl b/tools/testing/ktest/ktest.pl
index f2f48ce6ac4d..78249c3a03a5 100755
--- a/tools/testing/ktest/ktest.pl
+++ b/tools/testing/ktest/ktest.pl
@@ -4205,6 +4205,9 @@ sub send_email {
}
sub cancel_test {
+ if ($monitor_cnt) {
+ end_monitor;
+ }
if ($email_when_canceled) {
my $name = get_test_name;
send_email("KTEST: Your [$name] test was cancelled",
--
2.39.0
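The idea in this patch, releasing the console from the cancel path when a
monitor is still active, can be sketched in isolation. Everything below is
hypothetical and only illustrates the pattern: $monitor_depth stands in
for $monitor_cnt, $console_redirected stands in for the real tty state,
and begin_monitor/finish_monitor/on_cancel are made-up names, not
ktest.pl functions.

```perl
use strict;
use warnings;

my $monitor_depth      = 0;   # hypothetical stand-in for $monitor_cnt
my $console_redirected = 0;   # hypothetical stand-in for the tty state

# Take the console on the first (outermost) monitor request.
sub begin_monitor {
    $console_redirected = 1 if $monitor_depth++ == 0;
}

# Give the console back once the last monitor user is done.
sub finish_monitor {
    return if $monitor_depth == 0;
    $console_redirected = 0 if --$monitor_depth == 0;
}

# Cleanup for an interrupted run: release the console if a monitor is
# active, mirroring the "if ($monitor_cnt) { end_monitor; }" check above.
sub on_cancel {
    finish_monitor() if $monitor_depth;
}

# Run the cleanup before exiting on Ctrl^C.
$SIG{INT} = sub { on_cancel(); exit 1; };

begin_monitor();
on_cancel();
print "console redirected after cancel: ",
      ($console_redirected ? "yes" : "no"), "\n";
```

Keeping the cleanup in a plain function and calling it from the signal
handler makes the release path easy to exercise without sending a signal.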
* [for-linus][PATCH 3/3] ktest.pl: Add RUN_TIMEOUT option with default unlimited
From: Steven Rostedt @ 2023-01-18 21:54 UTC
To: linux-kernel; +Cc: John Warthog9 Hawley, Masami Hiramatsu, stable
From: Steven Rostedt <rostedt@goodmis.org>
There is a disconnect between the run_command function and
wait_for_input. wait_for_input has a default timeout of 2 minutes. If
that timeout is hit, the run_command loop exits to the waitpid() of
the executing command. This fails in that it no longer monitors the
command, and the ssh to the test box can hang when it's finished, as
it's waiting for the pipe it's writing to to flush, but the loop that
reads that pipe has already exited, leaving the command stuck, and the
test hangs.
Instead, make the default "wait_for_input" timeout of run_command
infinite, and allow the user to override it with a new timeout
option, "RUN_TIMEOUT".
This fixes the hang that happens when the pipe is full and the ssh
session never exits.
Cc: stable@vger.kernel.org
Fixes: 6e98d1b4415fe ("ktest: Add timeout to ssh command")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
tools/testing/ktest/ktest.pl | 20 ++++++++++++++++----
tools/testing/ktest/sample.conf | 5 +++++
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/tools/testing/ktest/ktest.pl b/tools/testing/ktest/ktest.pl
index 78249c3a03a5..7c91d753a9f2 100755
--- a/tools/testing/ktest/ktest.pl
+++ b/tools/testing/ktest/ktest.pl
@@ -178,6 +178,7 @@ my $store_failures;
my $store_successes;
my $test_name;
my $timeout;
+my $run_timeout;
my $connect_timeout;
my $config_bisect_exec;
my $booted_timeout;
@@ -340,6 +341,7 @@ my %option_map = (
"STORE_SUCCESSES" => \$store_successes,
"TEST_NAME" => \$test_name,
"TIMEOUT" => \$timeout,
+ "RUN_TIMEOUT" => \$run_timeout,
"CONNECT_TIMEOUT" => \$connect_timeout,
"CONFIG_BISECT_EXEC" => \$config_bisect_exec,
"BOOTED_TIMEOUT" => \$booted_timeout,
@@ -1862,6 +1864,14 @@ sub run_command {
$command =~ s/\$SSH_USER/$ssh_user/g;
$command =~ s/\$MACHINE/$machine/g;
+ if (!defined($timeout)) {
+ $timeout = $run_timeout;
+ }
+
+ if (!defined($timeout)) {
+ $timeout = -1; # tell wait_for_input to wait indefinitely
+ }
+
doprint("$command ... ");
$start_time = time;
@@ -1888,13 +1898,10 @@ sub run_command {
while (1) {
my $fp = \*CMD;
- if (defined($timeout)) {
- doprint "timeout = $timeout\n";
- }
my $line = wait_for_input($fp, $timeout);
if (!defined($line)) {
my $now = time;
- if (defined($timeout) && (($now - $start_time) >= $timeout)) {
+ if ($timeout >= 0 && (($now - $start_time) >= $timeout)) {
doprint "Hit timeout of $timeout, killing process\n";
$hit_timeout = 1;
kill 9, $pid;
@@ -2066,6 +2073,11 @@ sub wait_for_input {
$time = $timeout;
}
+ if ($time < 0) {
+ # Negative number means wait indefinitely
+ undef $time;
+ }
+
$rin = '';
vec($rin, fileno($fp), 1) = 1;
vec($rin, fileno(\*STDIN), 1) = 1;
diff --git a/tools/testing/ktest/sample.conf b/tools/testing/ktest/sample.conf
index 2d0fe15a096d..f43477a9b857 100644
--- a/tools/testing/ktest/sample.conf
+++ b/tools/testing/ktest/sample.conf
@@ -817,6 +817,11 @@
# is issued instead of a reboot.
# CONNECT_TIMEOUT = 25
+# The timeout in seconds for how long to wait for any running command
+# to timeout. If not defined, it will let it go indefinitely.
+# (default undefined)
+#RUN_TIMEOUT = 600
+
# In between tests, a reboot of the box may occur, and this
# is the time to wait for the console after it stops producing
# output. Some machines may not produce a large lag on reboot
--
2.39.0
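The timeout handling in the diff above has three parts: pick the effective
timeout (per-call value, then RUN_TIMEOUT, then -1 for "no limit"), turn a
negative value into an undefined select() timeout, and only treat the run
as timed out when the limit is non-negative. A minimal standalone sketch of
that logic follows; the function names are hypothetical helpers written for
this illustration and do not exist in ktest.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the effective timeout in the same order as
# the patch: explicit per-call value, then RUN_TIMEOUT, then -1 (no limit).
sub resolve_timeout {
    my ($call_timeout, $run_timeout) = @_;
    return $call_timeout if defined $call_timeout;
    return $run_timeout  if defined $run_timeout;
    return -1;
}

# Hypothetical helper: map the -1 sentinel to what select() expects.
# An undef timeout makes select() block until input arrives.
sub select_timeout {
    my ($timeout) = @_;
    return $timeout < 0 ? undef : $timeout;
}

# Hypothetical helper: a run can only time out when a limit is set.
sub timed_out {
    my ($timeout, $start, $now) = @_;
    return $timeout >= 0 && ($now - $start) >= $timeout;
}

my $timeout = resolve_timeout(undef, undef);
print "effective timeout: $timeout\n";
print "select() blocks indefinitely\n" unless defined select_timeout($timeout);
```

Using a negative sentinel rather than undef inside run_command keeps the
later numeric comparison simple, since the value is always defined by the
time the loop starts.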
* Re: [for-linus][PATCH 0/3] ktest.pl: Fix ssh hanging and reseting of console
From: Masami Hiramatsu @ 2023-01-20 9:10 UTC
To: Steven Rostedt; +Cc: linux-kernel, John Warthog9 Hawley, Masami Hiramatsu
On Wed, 18 Jan 2023 16:54:35 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> I've noticed that my ssh sessions would hang during test runs, which
> is really frustrating when you kick off a 13-hour test before going to
> bed, the second test (1 hour into it) hangs, and you need to kick
> it off again in the morning (wasting all that time overnight).
>
> I finally figured out the cause. There is a disconnect between
> the run_command that executes the test, and the "wait_for_input" that
> monitors the test. wait_for_input has a default timeout of 2 minutes;
> if it doesn't see any output in that time, it returns. run_command
> takes the empty result from wait_for_input to mean the test is
> finished, stops monitoring it, and calls waitpid() to wait for the
> test to exit.
>
> The problem is that if the test has a lot of output, it will continue
> writing into the pipe that was supposed to go to the monitor, which has
> now exited the loop. Once the pipe fills up, the test blocks on the
> write and cannot finish. When the test is over, it just hangs waiting
> for the pipe to flush (which never happens).
>
> To fix this, change run_command to run with no time limit by default
> (which can be overridden by the new RUN_TIMEOUT option), and make
> wait_for_input also wait indefinitely in this case. Now the test's
> output is read continuously and the test exits normally.
>
> While debugging this, I also found out why you can sometimes lose
> stdout on the terminal, especially if you hit Ctrl^C while the monitor
> is running. It was due to a missing "end_monitor", which gives the
> tty back to the terminal. The first two patches fix that.
>
>
Thanks for updating. I ran the test and confirmed that the terminal
setting is recovered :)
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
BTW, I found another issue: if sendmail is not set up, stty is not
recovered. Let me send a fix.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest.git
> devel
Also, this devel branch does not seem to be pushed yet.
Thank you,
>
> Head SHA1: aa9aba9382884554fe6a303744884866d137422d
>
>
> Steven Rostedt (3):
> ktest.pl: Fix missing "end_monitor" when machine check fails
> ktest.pl: Give back console on Ctrt^C on monitor
> ktest.pl: Add RUN_TIMEOUT option with default unlimited
>
> ----
> tools/testing/ktest/ktest.pl | 26 +++++++++++++++++++++-----
> tools/testing/ktest/sample.conf | 5 +++++
> 2 files changed, 26 insertions(+), 5 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>