From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Thomas Huth <thuth@redhat.com>
Cc: qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Cleber Rosa" <crosa@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"John Snow" <jsnow@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>
Subject: Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
Date: Mon, 5 Jun 2023 15:14:35 +0100
Message-ID: <ZH3tyx//NRnvKY0m@redhat.com>
In-Reply-To: <a0bb4c2a-92ad-4290-3eb8-4168b8828d76@redhat.com>

On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > Perhaps the most painful of all the GitLab CI failures we see are
> > the enforced job timeouts:
> > 
> >     "ERROR: Job failed: execution took longer than 1h15m0s seconds"
> > 
> >     https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
> > 
> > When that hits, the CI log shows what has *already* run, but figuring
> > out what was currently running (or rather stuck) is horrendously
> > difficult.
> > 
> > The initial meson port disabled the meson test timeouts, in order to
> > limit the scope for introducing side effects from the port that would
> > complicate adoption.
> > 
> > Now that the meson port is basically finished we can take advantage of
> > more of its improved features. It has the ability to set timeouts for
> > test programs, defaulting to 30 seconds, but overridable per test. This
> > is further helped by the fact that we changed the iotests integration so
> > that each iotest is a distinct meson test, instead of having one
> > single giant (slow) test.
> > 
> > We already set overrides for a bunch of tests, but they've not been
> > kept up to date since we had timeouts disabled. So this series first
> > updates the timeout overrides such that all tests pass when run in
> > my test gitlab CI pipeline. Then it enables use of meson timeouts.
> > 
> > We might still hit timeouts due to non-deterministic performance of
> > gitlab CI runners. So we'll probably have to increase a few more
> > timeouts in the short term. Fortunately this is going to be massively
> > easier to diagnose. For example this job during my testing:
> > 
> >     https://gitlab.com/berrange/qemu/-/jobs/4392029495
> > 
> > we can immediately see the problem tests:
> > 
> > Summary of Failures:
> >    6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                TIMEOUT        120.02s   killed by signal 15 SIGTERM
> >    7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          TIMEOUT        120.03s   killed by signal 15 SIGTERM
> >   64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  TIMEOUT        300.03s   killed by signal 15 SIGTERM
> > 
> > The full meson testlog.txt will show each individual TAP log output,
> > so we can then see exactly which test case we got stuck on.
> > 
> > NB, the artifacts are missing on the job links above, until this
> > patch merges:
> > 
> >     https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
> > 
> > NB, this series sets the migration-test timeout to 5 minutes, which
> > is only valid if the following series is merged to make the migration
> > test not suck:
> > 
> >    https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
> > 
> > without that series, we'll need to set the migration-test timeout to
> > 30 minutes instead.
> > 
> > Daniel P. Berrangé (6):
> >    qtest: bump min meson timeout to 60 seconds
> >    qtest: bump migration-test timeout to 5 minutes
> >    qtest: bump qom-test timeout to 7 minutes
> >    qtest: bump aspeed_smc-test timeout to 2 minutes
> >    qtest: bump bios-table-test timeout to 6 minutes
> >    mtest2make: stop disabling meson test timeouts
> > 
> >   scripts/mtest2make.py   |  3 ++-
> >   tests/qtest/meson.build | 16 ++++++----------
> >   2 files changed, 8 insertions(+), 11 deletions(-)
> 
> FWIW, I now ran this on my rather old laptop with an --enable-debug
> build with "make -j$(nproc) check-qtest" and got these additional
> failures (besides the expected migration-test that still needs its
> final speedup):
> 
>  qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
>  qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
>  qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
>  qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
>  qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM

Did you see any others in the 45-60 second time window? Those would be
candidates for increases too, as we don't want to have things sitting
right below the 60 second cutoff.
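
For spotting such borderline tests, here is a minimal Python sketch. It
assumes summary lines in the trimmed "name  STATUS  12.34s  ..." form
pasted in this thread, not the full "N/252 suite / name" form of the meson
summary. The function name, the 15 second margin and the two "example/"
lines are hypothetical, made up purely for illustration.

```python
import re

# Matches trimmed summary lines such as:
#   qtest-ppc64/pxe-test   OK   96.95s   2 subtests passed
# Assumption: the leading "N/252 suite / " prefix has already been removed.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<status>OK|TIMEOUT)\s+(?P<secs>\d+(?:\.\d+)?)s\b"
)


def near_timeout(lines, timeout=60.0, margin=15.0):
    """Return (name, seconds) for passing tests within `margin` of `timeout`."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        # Skip unparseable lines and tests that did not pass.
        if not m or m.group("status") != "OK":
            continue
        secs = float(m.group("secs"))
        # Passed, but finished uncomfortably close to the cutoff.
        if timeout - margin <= secs < timeout:
            hits.append((m.group("name"), secs))
    return hits


sample = [
    # Taken from the results quoted in this thread.
    " qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed",
    " qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM",
    # Hypothetical lines, not real results.
    " example/hypothetical-slow-test     OK    52.10s   4 subtests passed",
    " example/hypothetical-fast-test     OK     3.20s   1 subtests passed",
]

for name, secs in near_timeout(sample):
    print(f"{name}: {secs:.2f}s is within 15s of the 60s timeout")
```

Only passing tests are reported, since anything that already timed out
shows up in the failure summary anyway.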

> When I run them manually without the timeout patch, I get these
> values:
> 
>  qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
>  qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
>  qtest-arm/qom-test                 OK   621.64s   89 subtests passed
>  qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
>  qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
>  qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
>  qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed
> 
>  HTH,
>   Thomas
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


