From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: "Stefan Hajnoczi" <stefanha@redhat.com>,
	qemu-devel@nongnu.org,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Beraldo Leal" <bleal@redhat.com>
Subject: Re: [PATCH] gitlab: remove unreliable avocado CI jobs
Date: Tue, 12 Sep 2023 17:14:04 +0100
Message-ID: <ZQCOTJMyMCgNCu3l@redhat.com>
In-Reply-To: <8734zjv0ph.fsf@linaro.org>

On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote:
> 
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
> >> The avocado-system-alpine, avocado-system-fedora, and
> >> avocado-system-ubuntu jobs are unreliable. I identified them while
> >> looking over CI failures from the past week:
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
> >> 
> >> Thomas Huth suggested on IRC today that there may be a legitimate failure
> >> in there:
> >> 
> >>   th_huth: f4bug, yes, seems like it does not start at all correctly on
> >>   alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
> >>   noticed this by now, this is worrying
> >> 
> >> It crept in because the jobs were already unreliable.
> >> 
> >> I don't know how to interpret the job output, so all I can do is to
> >> propose removing these jobs. A useful CI job has two outcomes: pass or
> >> fail. Timeouts and other in-between states are not useful because they
> >> require constant triaging by someone who understands the details of the
> >> tests and they can occur when run against pull requests that have
> >> nothing to do with the area covered by the test.
> >> 
> >> Hopefully test owners will be able to identify the root causes and solve
> >> them so that these jobs can stay. In their current state the jobs are
> >> not useful since I cannot tell whether job failures are real or
> >> just intermittent when merging qemu.git pull requests.
> >> 
> >> If you are a test owner, please take a look.
> >> 
> >> It is likely that other avocado-system-* CI jobs have similar failures
> >> from time to time, but I'll leave them as long as they are passing.
> >> 
> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
> >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> ---
> >>  .gitlab-ci.d/buildtest.yml | 27 ---------------------------
> >>  1 file changed, 27 deletions(-)
> >> 
> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> index aee9101507..83ce448c4d 100644
> >> --- a/.gitlab-ci.d/buildtest.yml
> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> @@ -22,15 +22,6 @@ check-system-alpine:
> >>      IMAGE: alpine
> >>      MAKE_CHECK_ARGS: check-unit check-qtest
> >>  
> >> -avocado-system-alpine:
> >> -  extends: .avocado_test_job_template
> >> -  needs:
> >> -    - job: build-system-alpine
> >> -      artifacts: true
> >> -  variables:
> >> -    IMAGE: alpine
> >> -    MAKE_CHECK_ARGS: check-avocado
> >
> > Instead of entirely deleting, I'd suggest adding
> >
> >    # Disabled due to frequent random failures
> >    # https://gitlab.com/qemu-project/qemu/-/issues/1884
> >    when: manual
> >
> > See example: https://docs.gitlab.com/ee/ci/yaml/#when
> >
> > This disables the job from running unless someone explicitly
> > tells it to run
> 
> What I don't understand is why we didn't gate the release back when they
> first tripped. We should have noticed between:
> 
>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
> 
> and
> 
>   https://gitlab.com/qemu-project/qemu/-/pipelines/957154381
> 
> that the system tests were regressing. Yet we merged the changes
> anyway.

I think that green series is misleading, based on Richard's
mail on list wrt the TCG pull series:

  https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html

  "It's some sort of timing issue, which sometimes goes away
   when re-run. I was re-running tests *a lot* in order to
   get them to go green while running the 8.1 release. "


Essentially I'd put this down to the tests being so non-deterministic
that we've given up trusting them.
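
For reference, a rough sketch of the "when: manual" suggestion quoted
above, applied to the alpine job from the patch. The job body is copied
from the removed hunk. The position of the key within the job is an
assumption, as is the absence of any "rules:" entry in
.avocado_test_job_template that sets its own "when:" and would take
precedence over it. Not tested against the tree.

```yaml
avocado-system-alpine:
  extends: .avocado_test_job_template
  needs:
    - job: build-system-alpine
      artifacts: true
  variables:
    IMAGE: alpine
    MAKE_CHECK_ARGS: check-avocado
  # Disabled due to frequent random failures
  # https://gitlab.com/qemu-project/qemu/-/issues/1884
  # Assumption: no rule inherited from the template overrides this key.
  when: manual
```

The same three lines would apply to the fedora and ubuntu jobs named in
the commit message, so the definitions stay in the file and can still be
triggered by hand while the failures are investigated.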

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


