qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
@ 2023-06-01 16:31 Daniel P. Berrangé
  2023-06-01 16:31 ` [PATCH 1/6] qtest: bump min meson timeout to 60 seconds Daniel P. Berrangé
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

Perhaps the most painful of all the GitLab CI failures we see are
the enforced job timeouts:

   "ERROR: Job failed: execution took longer than 1h15m0s seconds"

   https://gitlab.com/qemu-project/qemu/-/jobs/4387047648

when that hits the CI log shows what has *already* run, but figuring
out what was currently running (or rather stuck) is an horrendously
difficult.

The initial meson port disabled the meson test timeouts, in order to
limit the scope for introducing side effects from the port that would
complicate adoption.

Now that the meson port is basically finished we can take advantage of
more of its improved features. It has the ability to set timeouts for
test programs, defaulting to 30 seconds, but overridable per test. This
is further helped by fact that we changed the iotests integration so
that each iotests was a distinct meson test, instead of having one
single giant (slow) test.

We already set overrides for a bunch of tests, but they've not been
kept up2date since we had timeouts disabled. So this series first
updates the timeout overrides such that all tests pass when run in
my test gitlab CI pipeline. Then it enables use of meson timeouts.

We might still hit timeouts due to non-deterministic performance of
gitlab CI runners. So we'll probably have to increase a few more
timeouts in the short term. Fortunately this is going to be massively
easier to diagnose. For example this job during my testing:

   https://gitlab.com/berrange/qemu/-/jobs/4392029495

we can immediately see  the problem tests

Summary of Failures:
  6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                TIMEOUT        120.02s   killed by signal 15 SIGTERM
  7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          TIMEOUT        120.03s   killed by signal 15 SIGTERM
 64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  TIMEOUT        300.03s   killed by signal 15 SIGTERM

The full meson testlog.txt will show each individual TAP log output,
so we can then see exactly which test case we got stuck on.

NB, the artifacts are missing on the job links above, until this
patch merges:

   https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html

NB, this series sets the migration-test timeout to 5 minutes, which
is only valid if this series is merged to make the migration test
not suck:

  https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html

without that series, we'll need to set the migration-test timeout to
30 minutes instead.

Daniel P. Berrangé (6):
  qtest: bump min meson timeout to 60 seconds
  qtest: bump migration-test timeout to 5 minutes
  qtest: bump qom-test timeout to 7 minutes
  qtest: bump aspeed_smc-test timeout to 2 minutes
  qtest: bump bios-table-test timeout to 6 minutes
  mtest2make: stop disabling meson test timeouts

 scripts/mtest2make.py   |  3 ++-
 tests/qtest/meson.build | 16 ++++++----------
 2 files changed, 8 insertions(+), 11 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] qtest: bump min meson timeout to 60 seconds
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 18:38   ` Thomas Huth
  2023-06-01 16:31 ` [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes Daniel P. Berrangé
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

Even some of the relatively fast qtests can sometimes hit the 30 second
timeout in GitLab CI under high parallelism/load conditions. Bump the
min to 60 seconds to give a higher margin for reliability.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/meson.build | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 087f2dc9d7..18d046b53c 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,12 +1,7 @@
 slow_qtests = {
-  'ahci-test' : 60,
   'bios-tables-test' : 120,
-  'boot-serial-test' : 60,
   'migration-test' : 150,
   'npcm7xx_pwm-test': 150,
-  'prom-env-test' : 60,
-  'pxe-test' : 60,
-  'qos-test' : 60,
   'qom-test' : 300,
   'test-hmp' : 120,
 }
@@ -373,8 +368,8 @@ foreach dir : target_dirs
          env: qtest_env,
          args: ['--tap', '-k'],
          protocol: 'tap',
-         timeout: slow_qtests.get(test, 30),
-         priority: slow_qtests.get(test, 30),
+         timeout: slow_qtests.get(test, 60),
+         priority: slow_qtests.get(test, 60),
          suite: ['qtest', 'qtest-' + target_base])
   endforeach
 endforeach
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
  2023-06-01 16:31 ` [PATCH 1/6] qtest: bump min meson timeout to 60 seconds Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 18:39   ` Thomas Huth
  2023-06-01 16:31 ` [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes Daniel P. Berrangé
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

The migration test should take between 1 min 30 and 2 mins on reasonably
modern hardware. The test is not especially compute bound, rather its
running time is dominated by the guest RAM size relative to the
bandwidth cap, which forces each iteration to take at least 30 seconds.
None the less under high load conditions with multiple QEMU processes
spawned and competing with other parallel tests, the worst case running
time might be somewhat extended. Bumping the timeout to 5 minutes gives
us good headroom, while still catching stuck tests relatively quickly.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 18d046b53c..6684591fcf 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,6 +1,6 @@
 slow_qtests = {
   'bios-tables-test' : 120,
-  'migration-test' : 150,
+  'migration-test' : 300,
   'npcm7xx_pwm-test': 150,
   'qom-test' : 300,
   'test-hmp' : 120,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
  2023-06-01 16:31 ` [PATCH 1/6] qtest: bump min meson timeout to 60 seconds Daniel P. Berrangé
  2023-06-01 16:31 ` [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 19:07   ` Thomas Huth
  2023-06-01 16:31 ` [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes Daniel P. Berrangé
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

The qom-test is periodically hitting the 5 minute timeout when running
on the aarch64 emulator under GitLab CI. Add another 2 minutes headroom
to the timeout to improve reliability.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6684591fcf..6943bbfdd5 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -2,7 +2,7 @@ slow_qtests = {
   'bios-tables-test' : 120,
   'migration-test' : 300,
   'npcm7xx_pwm-test': 150,
-  'qom-test' : 300,
+  'qom-test' : 420,
   'test-hmp' : 120,
 }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2023-06-01 16:31 ` [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 19:07   ` Thomas Huth
  2023-06-05 11:31   ` Thomas Huth
  2023-06-01 16:31 ` [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes Daniel P. Berrangé
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

On a reasonably modern laptop this test takes 40 seconds with the arm
emulator. Raising the timeout to 2 minutes gives greater headroom for
slowdown under GitLab CI.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6943bbfdd5..d9fa30edbc 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,4 +1,5 @@
 slow_qtests = {
+  'aspeed_smc-test': 120,
   'bios-tables-test' : 120,
   'migration-test' : 300,
   'npcm7xx_pwm-test': 150,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2023-06-01 16:31 ` [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 19:10   ` Thomas Huth
  2023-06-05 11:37   ` Thomas Huth
  2023-06-01 16:31 ` [PATCH 6/6] mtest2make: stop disabling meson test timeouts Daniel P. Berrangé
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

This is reliably hitting the current 2 minute timeout in GitLab CI
for the TCI job, and even hits a 4 minute timeout. At 6 minutes it
looks sufficiently reliable.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index d9fa30edbc..4f45369421 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,6 +1,6 @@
 slow_qtests = {
   'aspeed_smc-test': 120,
-  'bios-tables-test' : 120,
+  'bios-tables-test' : 360,
   'migration-test' : 300,
   'npcm7xx_pwm-test': 150,
   'qom-test' : 420,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/6] mtest2make: stop disabling meson test timeouts
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2023-06-01 16:31 ` [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes Daniel P. Berrangé
@ 2023-06-01 16:31 ` Daniel P. Berrangé
  2023-06-01 19:15   ` Thomas Huth
  2023-06-01 18:44 ` [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Richard Henderson
  2023-06-05 14:07 ` Thomas Huth
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth, Daniel P. Berrangé

The mtest2make.py script passes the arg '-t 0' to 'meson test' which
disables all test timeouts. This is a major source of pain when running
in GitLab CI and a test gets stuck. It will stall until GitLab kills the
CI job. This leaves us with little easily consumable information about
the stalled test. The TAP format doesn't show the test name until it is
completed, and TAP output from multiple tests it interleaved. So we
have to analyse the log to figure out what tests had un-finished TAP
output present and thus infer which test case caused the hang. This is
very time consuming and error prone.

By allowing meson to kill stalled tests, we get a direct display of what
test program got stuck, which lets us more directly focus in on what
specific test case within the test program hung.

The other issue with disabling meson test timeouts by default is that it
makes it more likely that maintainers inadvertantly introduce slowdowns.
For example the recent-ish change that accidentally made migrate-test
take 15-20 minutes instead of around 1 minute.

The main risk of this change is that the individual test timeouts might
be too short to allow completion in high load scenarios. Thus, there is
likely to be some short term pain where we have to bump the timeouts for
certain tests to make them reliable enough. The preceeding few patches
raised the timeouts for all failures that were immediately apparent
in GitLab CI.

Even with the possible short term instability, this should still be a
net win for debuggability of failed CI pipelines over the long term.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 scripts/mtest2make.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
index 179dd54871..eb01a05ddb 100644
--- a/scripts/mtest2make.py
+++ b/scripts/mtest2make.py
@@ -27,7 +27,8 @@ def names(self, base):
 .speed.slow = $(foreach s,$(sort $(filter-out %-thorough, $1)), --suite $s)
 .speed.thorough = $(foreach s,$(sort $1), --suite $s)
 
-.mtestargs = --no-rebuild -t 0
+TIMEOUT_MULTIPLIER = 1
+.mtestargs = --no-rebuild -t $(TIMEOUT_MULTIPLIER)
 ifneq ($(SPEED), quick)
 .mtestargs += --setup $(SPEED)
 endif
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] qtest: bump min meson timeout to 60 seconds
  2023-06-01 16:31 ` [PATCH 1/6] qtest: bump min meson timeout to 60 seconds Daniel P. Berrangé
@ 2023-06-01 18:38   ` Thomas Huth
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 18:38 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> Even some of the relatively fast qtests can sometimes hit the 30 second
> timeout in GitLab CI under high parallelism/load conditions. Bump the
> min to 60 seconds to give a higher margin for reliability.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 087f2dc9d7..18d046b53c 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,12 +1,7 @@
>   slow_qtests = {
> -  'ahci-test' : 60,
>     'bios-tables-test' : 120,
> -  'boot-serial-test' : 60,
>     'migration-test' : 150,
>     'npcm7xx_pwm-test': 150,
> -  'prom-env-test' : 60,
> -  'pxe-test' : 60,
> -  'qos-test' : 60,
>     'qom-test' : 300,
>     'test-hmp' : 120,
>   }
> @@ -373,8 +368,8 @@ foreach dir : target_dirs
>            env: qtest_env,
>            args: ['--tap', '-k'],
>            protocol: 'tap',
> -         timeout: slow_qtests.get(test, 30),
> -         priority: slow_qtests.get(test, 30),
> +         timeout: slow_qtests.get(test, 60),
> +         priority: slow_qtests.get(test, 60),
>            suite: ['qtest', 'qtest-' + target_base])
>     endforeach
>   endforeach

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes
  2023-06-01 16:31 ` [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes Daniel P. Berrangé
@ 2023-06-01 18:39   ` Thomas Huth
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 18:39 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> The migration test should take between 1 min 30 and 2 mins on reasonably
> modern hardware. The test is not especially compute bound, rather its
> running time is dominated by the guest RAM size relative to the
> bandwidth cap, which forces each iteration to take at least 30 seconds.
> None the less under high load conditions with multiple QEMU processes
> spawned and competing with other parallel tests, the worst case running
> time might be somewhat extended. Bumping the timeout to 5 minutes gives
> us good headroom, while still catching stuck tests relatively quickly.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 18d046b53c..6684591fcf 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,6 +1,6 @@
>   slow_qtests = {
>     'bios-tables-test' : 120,
> -  'migration-test' : 150,
> +  'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,
>     'qom-test' : 300,
>     'test-hmp' : 120,

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
                   ` (5 preceding siblings ...)
  2023-06-01 16:31 ` [PATCH 6/6] mtest2make: stop disabling meson test timeouts Daniel P. Berrangé
@ 2023-06-01 18:44 ` Richard Henderson
  2023-06-05 14:07 ` Thomas Huth
  7 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2023-06-01 18:44 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier, Thomas Huth

On 6/1/23 09:31, Daniel P. Berrangé wrote:
> Daniel P. Berrangé (6):
>    qtest: bump min meson timeout to 60 seconds
>    qtest: bump migration-test timeout to 5 minutes
>    qtest: bump qom-test timeout to 7 minutes
>    qtest: bump aspeed_smc-test timeout to 2 minutes
>    qtest: bump bios-table-test timeout to 6 minutes
>    mtest2make: stop disabling meson test timeouts

Acked-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes
  2023-06-01 16:31 ` [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes Daniel P. Berrangé
@ 2023-06-01 19:07   ` Thomas Huth
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 19:07 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> The qom-test is periodically hitting the 5 minute timeout when running
> on the aarch64 emulator under GitLab CI. Add another 2 minutes headroom
> to the timeout to improve reliability.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 6684591fcf..6943bbfdd5 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -2,7 +2,7 @@ slow_qtests = {
>     'bios-tables-test' : 120,
>     'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,
> -  'qom-test' : 300,
> +  'qom-test' : 420,
>     'test-hmp' : 120,
>   }


Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes
  2023-06-01 16:31 ` [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes Daniel P. Berrangé
@ 2023-06-01 19:07   ` Thomas Huth
  2023-06-05 11:31   ` Thomas Huth
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 19:07 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> On a reasonably modern laptop this test takes 40 seconds with the arm
> emulator. Raising the timeout to 2 minutes gives greater headroom for
> slowdown under GitLab CI.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 6943bbfdd5..d9fa30edbc 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,4 +1,5 @@
>   slow_qtests = {
> +  'aspeed_smc-test': 120,
>     'bios-tables-test' : 120,
>     'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes
  2023-06-01 16:31 ` [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes Daniel P. Berrangé
@ 2023-06-01 19:10   ` Thomas Huth
  2023-06-05 11:37   ` Thomas Huth
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 19:10 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> This is reliably hitting the current 2 minute timeout in GitLab CI
> for the TCI job, and even hits a 4 minute timeout.

That sentence is somewhat hard to parse... maybe rather:

This is reliably hitting the current 2 minute timeout in GitLab CI, and for 
the TCI job, it even hits a 4 minute timeout.

?

> At 6 minutes it
> looks sufficiently reliable.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index d9fa30edbc..4f45369421 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,6 +1,6 @@
>   slow_qtests = {
>     'aspeed_smc-test': 120,
> -  'bios-tables-test' : 120,
> +  'bios-tables-test' : 360,
>     'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,
>     'qom-test' : 420,

With the commit description updated:
Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] mtest2make: stop disabling meson test timeouts
  2023-06-01 16:31 ` [PATCH 6/6] mtest2make: stop disabling meson test timeouts Daniel P. Berrangé
@ 2023-06-01 19:15   ` Thomas Huth
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-01 19:15 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> The mtest2make.py script passes the arg '-t 0' to 'meson test' which
> disables all test timeouts. This is a major source of pain when running
> in GitLab CI and a test gets stuck. It will stall until GitLab kills the
> CI job. This leaves us with little easily consumable information about
> the stalled test. The TAP format doesn't show the test name until it is
> completed, and TAP output from multiple tests it interleaved. So we
> have to analyse the log to figure out what tests had un-finished TAP
> output present and thus infer which test case caused the hang. This is
> very time consuming and error prone.
> 
> By allowing meson to kill stalled tests, we get a direct display of what
> test program got stuck, which lets us more directly focus in on what
> specific test case within the test program hung.
> 
> The other issue with disabling meson test timeouts by default is that it
> makes it more likely that maintainers inadvertantly introduce slowdowns.
> For example the recent-ish change that accidentally made migrate-test
> take 15-20 minutes instead of around 1 minute.
> 
> The main risk of this change is that the individual test timeouts might
> be too short to allow completion in high load scenarios. Thus, there is
> likely to be some short term pain where we have to bump the timeouts for
> certain tests to make them reliable enough. The preceeding few patches
> raised the timeouts for all failures that were immediately apparent
> in GitLab CI.
> 
> Even with the possible short term instability, this should still be a
> net win for debuggability of failed CI pipelines over the long term.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   scripts/mtest2make.py | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
> index 179dd54871..eb01a05ddb 100644
> --- a/scripts/mtest2make.py
> +++ b/scripts/mtest2make.py
> @@ -27,7 +27,8 @@ def names(self, base):
>   .speed.slow = $(foreach s,$(sort $(filter-out %-thorough, $1)), --suite $s)
>   .speed.thorough = $(foreach s,$(sort $1), --suite $s)
>   
> -.mtestargs = --no-rebuild -t 0
> +TIMEOUT_MULTIPLIER = 1
> +.mtestargs = --no-rebuild -t $(TIMEOUT_MULTIPLIER)
>   ifneq ($(SPEED), quick)
>   .mtestargs += --setup $(SPEED)
>   endif

Basically Ack, but could you please double-check that "make check 
-j$(nproc)" still works if configure has been run with "--enable-debug" ? 
... maybe we need to adjust the multiplier in that case...

  Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes
  2023-06-01 16:31 ` [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes Daniel P. Berrangé
  2023-06-01 19:07   ` Thomas Huth
@ 2023-06-05 11:31   ` Thomas Huth
  2023-06-05 11:48     ` Daniel P. Berrangé
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Huth @ 2023-06-05 11:31 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> On a reasonably modern laptop this test takes 40 seconds with the arm
> emulator. Raising the timeout to 2 minutes gives greater headroom for
> slowdown under GitLab CI.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 6943bbfdd5..d9fa30edbc 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,4 +1,5 @@
>   slow_qtests = {
> +  'aspeed_smc-test': 120,
>     'bios-tables-test' : 120,
>     'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,

I gave it a try in the CI and this one was failing for me:

  https://gitlab.com/thuth/qemu/-/jobs/4412460476

Looking at an earlier run, it needed 146 seconds to finish:

  https://gitlab.com/thuth/qemu/-/jobs/4411373417#L4954

  Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes
  2023-06-01 16:31 ` [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes Daniel P. Berrangé
  2023-06-01 19:10   ` Thomas Huth
@ 2023-06-05 11:37   ` Thomas Huth
  2023-06-05 11:49     ` Daniel P. Berrangé
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Huth @ 2023-06-05 11:37 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> This is reliably hitting the current 2 minute timeout in GitLab CI
> for the TCI job, and even hits a 4 minute timeout. At 6 minutes it
> looks sufficiently reliable.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index d9fa30edbc..4f45369421 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,6 +1,6 @@
>   slow_qtests = {
>     'aspeed_smc-test': 120,
> -  'bios-tables-test' : 120,
> +  'bios-tables-test' : 360,
>     'migration-test' : 300,
>     'npcm7xx_pwm-test': 150,
>     'qom-test' : 420,

Seems not to be enough for the CI (in the TCI job which is very slow in 
running the tests):

  https://gitlab.com/thuth/qemu/-/jobs/4412460439

An earlier run needed 403 seconds here already:

  https://gitlab.com/thuth/qemu/-/jobs/4411373383

  Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes
  2023-06-05 11:31   ` Thomas Huth
@ 2023-06-05 11:48     ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-05 11:48 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Alex Bennée, Cleber Rosa, Paolo Bonzini,
	John Snow, Laurent Vivier

On Mon, Jun 05, 2023 at 01:31:36PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > On a reasonably modern laptop this test takes 40 seconds with the arm
> > emulator. Raising the timeout to 2 minutes gives greater headroom for
> > slowdown under GitLab CI.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   tests/qtest/meson.build | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > index 6943bbfdd5..d9fa30edbc 100644
> > --- a/tests/qtest/meson.build
> > +++ b/tests/qtest/meson.build
> > @@ -1,4 +1,5 @@
> >   slow_qtests = {
> > +  'aspeed_smc-test': 120,
> >     'bios-tables-test' : 120,
> >     'migration-test' : 300,
> >     'npcm7xx_pwm-test': 150,
> 
> I gave it a try in the CI and this one was failing for me:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/4412460476
> 
> Looking at an earlier run, it needed 146 seconds to finish:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/4411373417#L4954

Ok, lets make it 4 minutes then.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes
  2023-06-05 11:37   ` Thomas Huth
@ 2023-06-05 11:49     ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-05 11:49 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Alex Bennée, Cleber Rosa, Paolo Bonzini,
	John Snow, Laurent Vivier

On Mon, Jun 05, 2023 at 01:37:39PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > This is reliably hitting the current 2 minute timeout in GitLab CI
> > for the TCI job, and even hits a 4 minute timeout. At 6 minutes it
> > looks sufficiently reliable.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   tests/qtest/meson.build | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > index d9fa30edbc..4f45369421 100644
> > --- a/tests/qtest/meson.build
> > +++ b/tests/qtest/meson.build
> > @@ -1,6 +1,6 @@
> >   slow_qtests = {
> >     'aspeed_smc-test': 120,
> > -  'bios-tables-test' : 120,
> > +  'bios-tables-test' : 360,
> >     'migration-test' : 300,
> >     'npcm7xx_pwm-test': 150,
> >     'qom-test' : 420,
> 
> Seems not to be enough for the CI (in the TCI job which is very slow in
> running the tests):
> 
>  https://gitlab.com/thuth/qemu/-/jobs/4412460439
> 
> An earlier run needed 403 seconds here already:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/4411373383

Guess gitlab CI was less heavily loaded when i ran my many tests at the
end of last week.

9 minutes it is.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
  2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
                   ` (6 preceding siblings ...)
  2023-06-01 18:44 ` [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Richard Henderson
@ 2023-06-05 14:07 ` Thomas Huth
  2023-06-05 14:14   ` Daniel P. Berrangé
  2023-06-05 15:45   ` Peter Maydell
  7 siblings, 2 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-05 14:07 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Alex Bennée, Cleber Rosa, Paolo Bonzini, John Snow,
	Laurent Vivier

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> Perhaps the most painful of all the GitLab CI failures we see are
> the enforced job timeouts:
> 
>     "ERROR: Job failed: execution took longer than 1h15m0s seconds"
> 
>     https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
> 
> when that hits the CI log shows what has *already* run, but figuring
> out what was currently running (or rather stuck) is an horrendously
> difficult.
> 
> The initial meson port disabled the meson test timeouts, in order to
> limit the scope for introducing side effects from the port that would
> complicate adoption.
> 
> Now that the meson port is basically finished we can take advantage of
> more of its improved features. It has the ability to set timeouts for
> test programs, defaulting to 30 seconds, but overridable per test. This
> is further helped by fact that we changed the iotests integration so
> that each iotests was a distinct meson test, instead of having one
> single giant (slow) test.
> 
> We already set overrides for a bunch of tests, but they've not been
> kept up2date since we had timeouts disabled. So this series first
> updates the timeout overrides such that all tests pass when run in
> my test gitlab CI pipeline. Then it enables use of meson timeouts.
> 
> We might still hit timeouts due to non-deterministic performance of
> gitlab CI runners. So we'll probably have to increase a few more
> timeouts in the short term. Fortunately this is going to be massively
> easier to diagnose. For example this job during my testing:
> 
>     https://gitlab.com/berrange/qemu/-/jobs/4392029495
> 
> we can immediately see  the problem tests
> 
> Summary of Failures:
>    6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                TIMEOUT        120.02s   killed by signal 15 SIGTERM
>    7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          TIMEOUT        120.03s   killed by signal 15 SIGTERM
>   64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  TIMEOUT        300.03s   killed by signal 15 SIGTERM
> 
> The full meson testlog.txt will show each individual TAP log output,
> so we can then see exactly which test case we got stuck on.
> 
> NB, the artifacts are missing on the job links above, until this
> patch merges:
> 
>     https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
> 
> NB, this series sets the migration-test timeout to 5 minutes, which
> is only valid if this series is merged to make the migration test
> not suck:
> 
>    https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
> 
> without that series, we'll need to set the migration-test timeout to
> 30 minutes instead.
> 
> Daniel P. Berrangé (6):
>    qtest: bump min meson timeout to 60 seconds
>    qtest: bump migration-test timeout to 5 minutes
>    qtest: bump qom-test timeout to 7 minutes
>    qtest: bump aspeed_smc-test timeout to 2 minutes
>    qtest: bump bios-table-test timeout to 6 minutes
>    mtest2make: stop disabling meson test timeouts
> 
>   scripts/mtest2make.py   |  3 ++-
>   tests/qtest/meson.build | 16 ++++++----------
>   2 files changed, 8 insertions(+), 11 deletions(-)

FWIW, I now ran this on my rather old laptop with an --enable-debug
build with "make -j$(nproc) check-qtest" and got these additional
failures (beside the expected migration-test that still needs its
final speedup):

  qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
  qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
  qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
  qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
  qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
  qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
  qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM
  
When I run them manually without the timeout patch, I get these
values:

  qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
  qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
  qtest-arm/qom-test                 OK   621.64s   89 subtests passed
  qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
  qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
  qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
  qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed

  HTH,
   Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
  2023-06-05 14:07 ` Thomas Huth
@ 2023-06-05 14:14   ` Daniel P. Berrangé
  2023-06-05 15:36     ` Thomas Huth
  2023-06-05 15:45   ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2023-06-05 14:14 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Alex Bennée, Cleber Rosa, Paolo Bonzini,
	John Snow, Laurent Vivier

On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > Perhaps the most painful of all the GitLab CI failures we see are
> > the enforced job timeouts:
> > 
> >     "ERROR: Job failed: execution took longer than 1h15m0s seconds"
> > 
> >     https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
> > 
> > when that hits the CI log shows what has *already* run, but figuring
> > out what was currently running (or rather stuck) is an horrendously
> > difficult.
> > 
> > The initial meson port disabled the meson test timeouts, in order to
> > limit the scope for introducing side effects from the port that would
> > complicate adoption.
> > 
> > Now that the meson port is basically finished we can take advantage of
> > more of its improved features. It has the ability to set timeouts for
> > test programs, defaulting to 30 seconds, but overridable per test. This
> > is further helped by fact that we changed the iotests integration so
> > that each iotests was a distinct meson test, instead of having one
> > single giant (slow) test.
> > 
> > We already set overrides for a bunch of tests, but they've not been
> > kept up2date since we had timeouts disabled. So this series first
> > updates the timeout overrides such that all tests pass when run in
> > my test gitlab CI pipeline. Then it enables use of meson timeouts.
> > 
> > We might still hit timeouts due to non-deterministic performance of
> > gitlab CI runners. So we'll probably have to increase a few more
> > timeouts in the short term. Fortunately this is going to be massively
> > easier to diagnose. For example this job during my testing:
> > 
> >     https://gitlab.com/berrange/qemu/-/jobs/4392029495
> > 
> > we can immediately see  the problem tests
> > 
> > Summary of Failures:
> >    6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                TIMEOUT        120.02s   killed by signal 15 SIGTERM
> >    7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          TIMEOUT        120.03s   killed by signal 15 SIGTERM
> >   64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  TIMEOUT        300.03s   killed by signal 15 SIGTERM
> > 
> > The full meson testlog.txt will show each individual TAP log output,
> > so we can then see exactly which test case we got stuck on.
> > 
> > NB, the artifacts are missing on the job links above, until this
> > patch merges:
> > 
> >     https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
> > 
> > NB, this series sets the migration-test timeout to 5 minutes, which
> > is only valid if this series is merged to make the migration test
> > not suck:
> > 
> >    https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
> > 
> > without that series, we'll need to set the migration-test timeout to
> > 30 minutes instead.
> > 
> > Daniel P. Berrangé (6):
> >    qtest: bump min meson timeout to 60 seconds
> >    qtest: bump migration-test timeout to 5 minutes
> >    qtest: bump qom-test timeout to 7 minutes
> >    qtest: bump aspeed_smc-test timeout to 2 minutes
> >    qtest: bump bios-table-test timeout to 6 minutes
> >    mtest2make: stop disabling meson test timeouts
> > 
> >   scripts/mtest2make.py   |  3 ++-
> >   tests/qtest/meson.build | 16 ++++++----------
> >   2 files changed, 8 insertions(+), 11 deletions(-)
> 
> FWIW, I now ran this on my rather old laptop with an --enable-debug
> build with "make -j$(nproc) check-qtest" and got these additional
> failures (beside the expected migration-test that still needs its
> final speedup):
> 
>  qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
>  qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
>  qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
>  qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
>  qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM

Did you see any others in the 45-60 second time window, as those would
be candidates for increases too - don't want to have things right below
the 60 second cutoff ?

> When I run them manually without the timeout patch, I get these
> values:
> 
>  qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
>  qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
>  qtest-arm/qom-test                 OK   621.64s   89 subtests passed
>  qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
>  qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
>  qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
>  qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed
> 
>  HTH,
>   Thomas
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
  2023-06-05 14:14   ` Daniel P. Berrangé
@ 2023-06-05 15:36     ` Thomas Huth
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Huth @ 2023-06-05 15:36 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Alex Bennée, Cleber Rosa, Paolo Bonzini,
	John Snow, Laurent Vivier

On 05/06/2023 16.14, Daniel P. Berrangé wrote:
> On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote:
>> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
>>> Perhaps the most painful of all the GitLab CI failures we see are
>>> the enforced job timeouts:
>>>
>>>      "ERROR: Job failed: execution took longer than 1h15m0s seconds"
>>>
>>>      https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
>>>
>>> when that hits the CI log shows what has *already* run, but figuring
>>> out what was currently running (or rather stuck) is an horrendously
>>> difficult.
>>>
>>> The initial meson port disabled the meson test timeouts, in order to
>>> limit the scope for introducing side effects from the port that would
>>> complicate adoption.
>>>
>>> Now that the meson port is basically finished we can take advantage of
>>> more of its improved features. It has the ability to set timeouts for
>>> test programs, defaulting to 30 seconds, but overridable per test. This
>>> is further helped by fact that we changed the iotests integration so
>>> that each iotests was a distinct meson test, instead of having one
>>> single giant (slow) test.
>>>
>>> We already set overrides for a bunch of tests, but they've not been
>>> kept up2date since we had timeouts disabled. So this series first
>>> updates the timeout overrides such that all tests pass when run in
>>> my test gitlab CI pipeline. Then it enables use of meson timeouts.
>>>
>>> We might still hit timeouts due to non-deterministic performance of
>>> gitlab CI runners. So we'll probably have to increase a few more
>>> timeouts in the short term. Fortunately this is going to be massively
>>> easier to diagnose. For example this job during my testing:
>>>
>>>      https://gitlab.com/berrange/qemu/-/jobs/4392029495
>>>
>>> we can immediately see  the problem tests
>>>
>>> Summary of Failures:
>>>     6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                TIMEOUT        120.02s   killed by signal 15 SIGTERM
>>>     7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          TIMEOUT        120.03s   killed by signal 15 SIGTERM
>>>    64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  TIMEOUT        300.03s   killed by signal 15 SIGTERM
>>>
>>> The full meson testlog.txt will show each individual TAP log output,
>>> so we can then see exactly which test case we got stuck on.
>>>
>>> NB, the artifacts are missing on the job links above, until this
>>> patch merges:
>>>
>>>      https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
>>>
>>> NB, this series sets the migration-test timeout to 5 minutes, which
>>> is only valid if this series is merged to make the migration test
>>> not suck:
>>>
>>>     https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
>>>
>>> without that series, we'll need to set the migration-test timeout to
>>> 30 minutes instead.
>>>
>>> Daniel P. Berrangé (6):
>>>     qtest: bump min meson timeout to 60 seconds
>>>     qtest: bump migration-test timeout to 5 minutes
>>>     qtest: bump qom-test timeout to 7 minutes
>>>     qtest: bump aspeed_smc-test timeout to 2 minutes
>>>     qtest: bump bios-table-test timeout to 6 minutes
>>>     mtest2make: stop disabling meson test timeouts
>>>
>>>    scripts/mtest2make.py   |  3 ++-
>>>    tests/qtest/meson.build | 16 ++++++----------
>>>    2 files changed, 8 insertions(+), 11 deletions(-)
>>
>> FWIW, I now ran this on my rather old laptop with an --enable-debug
>> build with "make -j$(nproc) check-qtest" and got these additional
>> failures (beside the expected migration-test that still needs its
>> final speedup):
>>
>>   qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
>>   qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
>>   qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
>>   qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
>>   qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
>>   qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
>>   qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM
> 
> Did you see any others in the 45-60 second time window, as those would
> be candidates for increases too - don't want to have things right below
> the 60 second cutoff ?

The qos-test on arm likely needs some adjustment, too:

  qtest-arm/qos-test                    OK     40.72s   115 subtests passed
  qtest-aarch64/qos-test                OK     54.00s   108 subtests passed

  Thomas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
  2023-06-05 14:07 ` Thomas Huth
  2023-06-05 14:14   ` Daniel P. Berrangé
@ 2023-06-05 15:45   ` Peter Maydell
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2023-06-05 15:45 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Daniel P. Berrangé, qemu-devel, Alex Bennée,
	Cleber Rosa, Paolo Bonzini, John Snow, Laurent Vivier,
	Tyrone Ting, Hao Wu

On Mon, 5 Jun 2023 at 15:08, Thomas Huth <thuth@redhat.com> wrote:
> When I run them manually without the timeout patch, I get these
> values:
>
>   qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
>   qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
>   qtest-arm/qom-test                 OK   621.64s   89 subtests passed
>   qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
>   qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
>   qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
>   qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed

The qom-tests being slow is kind of expected, since they have
to go through every board and device, and there are a lot of
arm boards and devices. That npcm7xx_pwm-test runtime is a bit OTT
though for something testing a single device.

Would one of the Nuvoton maintainers like to take a look at
that test and make it a bit less of a hog of runtime ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-06-05 15:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-01 16:31 [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Daniel P. Berrangé
2023-06-01 16:31 ` [PATCH 1/6] qtest: bump min meson timeout to 60 seconds Daniel P. Berrangé
2023-06-01 18:38   ` Thomas Huth
2023-06-01 16:31 ` [PATCH 2/6] qtest: bump migration-test timeout to 5 minutes Daniel P. Berrangé
2023-06-01 18:39   ` Thomas Huth
2023-06-01 16:31 ` [PATCH 3/6] qtest: bump qom-test timeout to 7 minutes Daniel P. Berrangé
2023-06-01 19:07   ` Thomas Huth
2023-06-01 16:31 ` [PATCH 4/6] qtest: bump aspeed_smc-test timeout to 2 minutes Daniel P. Berrangé
2023-06-01 19:07   ` Thomas Huth
2023-06-05 11:31   ` Thomas Huth
2023-06-05 11:48     ` Daniel P. Berrangé
2023-06-01 16:31 ` [PATCH 5/6] qtest: bump bios-table-test timeout to 6 minutes Daniel P. Berrangé
2023-06-01 19:10   ` Thomas Huth
2023-06-05 11:37   ` Thomas Huth
2023-06-05 11:49     ` Daniel P. Berrangé
2023-06-01 16:31 ` [PATCH 6/6] mtest2make: stop disabling meson test timeouts Daniel P. Berrangé
2023-06-01 19:15   ` Thomas Huth
2023-06-01 18:44 ` [PATCH 0/6] tests: enable meson test timeouts to improve debuggability Richard Henderson
2023-06-05 14:07 ` Thomas Huth
2023-06-05 14:14   ` Daniel P. Berrangé
2023-06-05 15:36     ` Thomas Huth
2023-06-05 15:45   ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).