qemu-devel.nongnu.org archive mirror
From: "Alex Bennée" <alex.bennee@linaro.org>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Beraldo Leal" <bleal@redhat.com>
Subject: Re: [PATCH] gitlab: remove unreliable avocado CI jobs
Date: Tue, 12 Sep 2023 18:39:29 +0100
Message-ID: <87y1hbtf09.fsf@linaro.org>
In-Reply-To: <CAJSP0QX09DUU1GQNLBW2ZkAsiR2HNC03+ZohmOZpwJDq04fz3Q@mail.gmail.com>


Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé <berrange@redhat.com> wrote:
>
>  On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote:
>  > 
>  > Daniel P. Berrangé <berrange@redhat.com> writes:
>  > 
>  > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
>  > >> The avocado-system-alpine, avocado-system-fedora, and
>  > >> avocado-system-ubuntu jobs are unreliable. I identified them while
>  > >> looking over CI failures from the past week:
>  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
>  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
>  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
>  > >> 
>  > >> Thomas Huth suggested on IRC today that there may be a legitimate failure
>  > >> in there:
>  > >> 
>  > >>   th_huth: f4bug, yes, seems like it does not start at all correctly on
>  > >>   alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
>  > >>   noticed this by now, this is worrying
>  > >> 
>  > >> It crept in because the jobs were already unreliable.
>  > >> 
>  > >> I don't know how to interpret the job output, so all I can do is to
>  > >> propose removing these jobs. A useful CI job has two outcomes: pass or
>  > >> fail. Timeouts and other in-between states are not useful because they
>  > >> require constant triaging by someone who understands the details of the
>  > >> tests and they can occur when run against pull requests that have
>  > >> nothing to do with the area covered by the test.
>  > >> 
>  > >> Hopefully test owners will be able to identify the root causes and solve
>  > >> them so that these jobs can stay. In their current state the jobs are
>  > >> not useful since I cannot tell whether job failures are real or
>  > >> just intermittent when merging qemu.git pull requests.
>  > >> 
>  > >> If you are a test owner, please take a look.
>  > >> 
>  > >> It is likely that other avocado-system-* CI jobs have similar failures
>  > >> from time to time, but I'll leave them as long as they are passing.
>  > >> 
>  > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
>  > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>  > >> ---
>  > >>  .gitlab-ci.d/buildtest.yml | 27 ---------------------------
>  > >>  1 file changed, 27 deletions(-)
>  > >> 
>  > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>  > >> index aee9101507..83ce448c4d 100644
>  > >> --- a/.gitlab-ci.d/buildtest.yml
>  > >> +++ b/.gitlab-ci.d/buildtest.yml
>  > >> @@ -22,15 +22,6 @@ check-system-alpine:
>  > >>      IMAGE: alpine
>  > >>      MAKE_CHECK_ARGS: check-unit check-qtest
>  > >>  
>  > >> -avocado-system-alpine:
>  > >> -  extends: .avocado_test_job_template
>  > >> -  needs:
>  > >> -    - job: build-system-alpine
>  > >> -      artifacts: true
>  > >> -  variables:
>  > >> -    IMAGE: alpine
>  > >> -    MAKE_CHECK_ARGS: check-avocado
>  > >
>  > > Instead of entirely deleting, I'd suggest adding
>  > >
>  > >    # Disabled due to frequent random failures
>  > >    # https://gitlab.com/qemu-project/qemu/-/issues/1884
>  > >    when: manual
>  > >
>  > > See example: https://docs.gitlab.com/ee/ci/yaml/#when
>  > >
>  > > This disables the job from running unless someone explicitly
>  > > tells it to run
>  > 
>  > What I don't understand is why we didn't gate the release back when they
>  > first tripped. We should have noticed between:
>  > 
>  >   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>  > 
>  > and
>  > 
>  >   https://gitlab.com/qemu-project/qemu/-/pipelines/957154381
>  > 
>  > that the system tests were regressing. Yet we merged the changes
>  > anyway.
>
>  I think that green series is misleading, based on Richard's
>  mail on list wrt the TCG pull series:
>
>    https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html
>
>    "It's some sort of timing issue, which sometimes goes away
>     when re-run. I was re-running tests *a lot* in order to
>     get them to go green while running the 8.1 release. "

But I think in that particular case a change exposed a race condition
which has only recently been fixed. However, we've had additional
regressions since.

Rather than kill the system tests we can disable the flaky individual
tests in avocado. 
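
As a rough sketch of what disabling individual tests could look like, the
snippet below uses plain Python unittest rather than the real avocado test
classes. The flaky() decorator, the QEMU_TEST_FLAKY_TESTS variable name and
the BootTest class are all hypothetical names made up for illustration.
Avocado ships its own skip decorators that could fill the same role.

```python
import functools
import os
import unittest

# Hypothetical opt-in variable; the name is an assumption for this sketch.
FLAKY_ENV = "QEMU_TEST_FLAKY_TESTS"


def flaky(reason):
    """Skip the decorated test unless flaky tests are explicitly enabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            # Checked at run time so the same job definition can opt in.
            if not os.getenv(FLAKY_ENV):
                self.skipTest(f"flaky: {reason}")
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class BootTest(unittest.TestCase):
    """Stand-in for a system test class; the bodies are placeholders."""

    def test_stable(self):
        self.assertTrue(True)

    @flaky("intermittent timeout")
    def test_intermittent(self):
        self.assertTrue(True)


def run_suite():
    """Run BootTest and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BootTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the variable unset the marked test is reported as skipped and the job
still has a clean pass/fail outcome. Setting it lets a test owner run the
flaky tests on demand while chasing the root cause.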

>
>  Essentially I'd put this down to the tests being so non-deterministic
>  that we've given up trusting them.
>
> Yes.
>
> Stefan
>
>  With regards,
>  Daniel
>  -- 
>  |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>  |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>  |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


Thread overview: 13+ messages
2023-09-12 15:06 [PATCH] gitlab: remove unreliable avocado CI jobs Stefan Hajnoczi
2023-09-12 15:20 ` Daniel P. Berrangé
2023-09-12 16:01   ` Alex Bennée
2023-09-12 16:14     ` Daniel P. Berrangé
2023-09-12 16:19       ` Stefan Hajnoczi
2023-09-12 17:39         ` Alex Bennée [this message]
2023-09-12 18:52           ` Stefan Hajnoczi
2023-09-12 19:58 ` Thomas Huth
2023-09-13  6:43   ` Philippe Mathieu-Daudé
2023-09-13  9:18   ` Peter Maydell
2023-09-13  9:45     ` Philippe Mathieu-Daudé
2023-09-13 10:35       ` Alex Bennée
2023-09-13 10:39         ` Daniel P. Berrangé
