Re: [Qemu-devel] [PATCH 1/2] Acceptance tests: exclude "flaky" tests

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Wainer dos Santos Moschetta <wainersm@redhat.com>
To: "Cleber Rosa" <crosa@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: "Fam Zheng" <fam@euphon.net>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Aleksandar Rikalo" <arikalo@wavecomp.com>,
	qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Aurelien Jarno" <aurelien@aurel32.net>
Subject: Re: [Qemu-devel] [PATCH 1/2] Acceptance tests: exclude "flaky" tests
Date: Fri, 28 Jun 2019 17:43:09 -0300	[thread overview]
Message-ID: <785c89ed-a7a4-78f6-972a-e36615017268@redhat.com> (raw)
In-Reply-To: <20190621143816.GA24282@localhost.localdomain>


On 06/21/2019 11:38 AM, Cleber Rosa wrote:
> On Fri, Jun 21, 2019 at 09:03:33AM +0200, Philippe Mathieu-Daudé wrote:
>> On 6/21/19 8:09 AM, Cleber Rosa wrote:
>>> It's a fact that some tests may not be 100% reliable in all
>>> environments.  While it's a tough call to remove a useful test that
>>> from the tree because it may fail every 1/100th time (or so), having
>>> human attention drawn to known issues is very bad for humans and for
>>> the projects they manage.
>>>
>>> As a compromise solution, this marks tests that are known to have
>>> issues, or that exercises known issues in QEMU or other components,
>>> and excludes them from the entry point.  As a consequence, tests
>>> marked as "flaky" will not be executed as part of "make
>>> check-acceptance".
>>>
>>> Because such tests should be forgiven but never be forgotten, it's
>>> possible to list them with (assuming "make check-venv" or "make
>>> check-acceptance" has already initiatilized the venv):
>>>
>>>    $ ./tests/venv/bin/avocado list -t flaky tests/acceptance

It needs a Make target to run those flaky tests (If we ever agree on 
this idea of flaky tests). Other Avocado flags are passed (e.g. -t for 
tags) that can happen to fail tests on their absent. One clear example 
is the spice test on patch 02 of this series...

Side note: check-acceptance seems to get growing in complexity that I 
worry will end up in pitfalls. is a Make target the proper way to 
implement complex test runs (I don't think so). Perhaps Avocado runner 
concept could help somehow?

>>>
>>> The current list of tests marked as flaky are a result of running
>>> the entire set of acceptance tests around 20 times.  The results
>>> were then processed with a helper script[1].  That either confirmed
>>> known issues (in the case of aarch64 and arm)[2] or revealed new
>>> ones (mips).
>>>
>>> This also bumps the Avocado version to one that includes a fix to the
>>> parsing of multiple and mix "key:val" and simple tag values.
>>>
>>> [1] https://raw.githubusercontent.com/avocado-framework/avocado/master/contrib/scripts/summarize-job-failures.py
>>> [2] https://bugs.launchpad.net/qemu/+bug/1829779
>>>
>>> Signed-off-by: Cleber Rosa <crosa@redhat.com>
>>> ---
>>>   docs/devel/testing.rst                   | 17 +++++++++++++++++
>>>   tests/Makefile.include                   |  6 +++++-
>>>   tests/acceptance/boot_linux_console.py   |  2 ++
>>>   tests/acceptance/linux_ssh_mips_malta.py |  2 ++
>>>   tests/requirements.txt                   |  2 +-
>>>   5 files changed, 27 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
>>> index da2d0fc964..ff4d8e2e1c 100644
>>> --- a/docs/devel/testing.rst
>>> +++ b/docs/devel/testing.rst
>>> @@ -574,6 +574,23 @@ may be invoked by running:
>>>   
>>>     tests/venv/bin/avocado run $OPTION1 $OPTION2 tests/acceptance/
>>>   
>>> +Tagging tests
>>> +-------------
>>> +
>>> +flaky
>>> +~~~~~
>>> +
>>> +If a test is known to fail intermittently, even if only every one
>>> +hundredth time, it's highly advisable to mark it as a flaky test.
>>> +This will prevent these individual tests from failing much larger
>>> +jobs, will avoid human interaction and time wasted to verify a known
>>> +issue, and worse of all, can lead to the discredit of automated
>>> +testing.
>>> +
>>> +To mark a test as flaky, add to its docstring.::
>>> +
>>> +  :avocado: tags=flaky
>> I certainly disagree with this patch, failing tests have to be fixed.
>> Why not tag all the codebase flaky and sing "happy coding"?
>>
> That's a great idea! :)
>
> Now, seriously, I also resisted this for quite a long time.  The
> reality, though, is that intermittent failures will continue to
> appear, and letting tests (and jobs, and CI pipelines, and whatnot)
> fail is a very bad idea.  We all agree that real fixes are better than
> this, but many times they don't come quickly.

It seems to me that flaky test is just a case in a broaden scenario: run 
(or not) grouped tests. You may have tests indeed broken or that takes 
considerable time (those tagged "slow") which one may fairly want to 
exclude from `make check-acceptance` as well. Thus some way to group 
tests plus define run inclusion/exclusion patterns seems the ultimate 
goal here.

>
>> Anyway if this get accepted, 'flaky' tags must have the intermittent
>> failure well described, and a Launchpad/Bugzilla tracking ticket referenced.
>>
> And here you have a key point that I absolutely agree with.  The
> "flaky" approach can either poison a lot of tests, and be seen as
> quick way out of a difficult issue revealed by a test.  Or, it can
> serve as an effective tool to keep track of these very important
> issues.
>
> If we add:
>
>     # https://bugs.launchpad.net/qemu/+bug/1829779
>     :avocado: flaky
>
> Topped with some human, I believe this can be very effective.  This goes
> without saying, but comments here are very much welcome.

I agree that all flaky test should have a tracking bug. In the end it 
represents a technical debit that we should address.

- Wainer

>
> - Cleber.
>
>>> +
>>>   Manual Installation
>>>   -------------------
>>>   
>>> diff --git a/tests/Makefile.include b/tests/Makefile.include
>>> index db750dd6d0..4c97da2878 100644
>>> --- a/tests/Makefile.include
>>> +++ b/tests/Makefile.include
>>> @@ -1125,7 +1125,11 @@ TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
>>>   # Any number of command separated loggers are accepted.  For more
>>>   # information please refer to "avocado --help".
>>>   AVOCADO_SHOW=app
>>> -AVOCADO_TAGS=$(patsubst %-softmmu,-t arch:%, $(filter %-softmmu,$(TARGET_DIRS)))
>>> +
>>> +# Additional tags that are added to each occurence of "--filter-by-tags"
>>> +AVOCADO_EXTRA_TAGS := ,-flaky
>>> +
>>> +AVOCADO_TAGS=$(patsubst %-softmmu,--filter-by-tags=arch:%$(AVOCADO_EXTRA_TAGS), $(filter %-softmmu,$(TARGET_DIRS)))
>>>   
>>>   ifneq ($(findstring v2,"v$(PYTHON_VERSION)"),v2)
>>>   $(TESTS_VENV_DIR): $(TESTS_VENV_REQ)
>>> diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
>>> index 32159503e9..6bd5c1ab53 100644
>>> --- a/tests/acceptance/boot_linux_console.py
>>> +++ b/tests/acceptance/boot_linux_console.py
>>> @@ -249,6 +249,7 @@ class BootLinuxConsole(Test):
>>>           """
>>>           :avocado: tags=arch:aarch64
>>>           :avocado: tags=machine:virt
>>> +        :avocado: tags=flaky
>>>           """
>>>           kernel_url = ('https://download.fedoraproject.org/pub/fedora/linux/'
>>>                         'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz')
>>> @@ -270,6 +271,7 @@ class BootLinuxConsole(Test):
>>>           """
>>>           :avocado: tags=arch:arm
>>>           :avocado: tags=machine:virt
>>> +        :avocado: tags=flaky
>>>           """
>>>           kernel_url = ('https://download.fedoraproject.org/pub/fedora/linux/'
>>>                         'releases/29/Everything/armhfp/os/images/pxeboot/vmlinuz')
>>> diff --git a/tests/acceptance/linux_ssh_mips_malta.py b/tests/acceptance/linux_ssh_mips_malta.py
>>> index aafb0c39f6..ae70b658e0 100644
>>> --- a/tests/acceptance/linux_ssh_mips_malta.py
>>> +++ b/tests/acceptance/linux_ssh_mips_malta.py
>>> @@ -208,6 +208,7 @@ class LinuxSSH(Test):
>>>           :avocado: tags=machine:malta
>>>           :avocado: tags=endian:big
>>>           :avocado: tags=device:pcnet32
>>> +        :avocado: tags=flaky
>>>           """
>>>           kernel_url = ('https://people.debian.org/~aurel32/qemu/mips/'
>>>                         'vmlinux-3.2.0-4-5kc-malta')
>>> @@ -222,6 +223,7 @@ class LinuxSSH(Test):
>>>           :avocado: tags=machine:malta
>>>           :avocado: tags=endian:little
>>>           :avocado: tags=device:pcnet32
>>> +        :avocado: tags=flaky
>>>           """
>>>           kernel_url = ('https://people.debian.org/~aurel32/qemu/mipsel/'
>>>                         'vmlinux-3.2.0-4-5kc-malta')
>>> diff --git a/tests/requirements.txt b/tests/requirements.txt
>>> index 3ae0e29ad7..58d63d171f 100644
>>> --- a/tests/requirements.txt
>>> +++ b/tests/requirements.txt
>>> @@ -1,5 +1,5 @@
>>>   # Add Python module requirements, one per line, to be installed
>>>   # in the tests/venv Python virtual environment. For more info,
>>>   # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
>>> -avocado-framework==68.0
>>> +avocado-framework==69.1
>>>   paramiko
>>>

next prev parent reply	other threads:[~2019-06-28 20:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-21  6:09 [Qemu-devel] [PATCH 0/2] Acceptance tests: exclude "flaky" tests and introduce SPICE test Cleber Rosa
2019-06-21  6:09 ` [Qemu-devel] [PATCH 1/2] Acceptance tests: exclude "flaky" tests Cleber Rosa
2019-06-21  7:03   ` Philippe Mathieu-Daudé
2019-06-21 14:38     ` Cleber Rosa
2019-06-28 20:43       ` Wainer dos Santos Moschetta [this message]
2019-06-30 17:51         ` Cleber Rosa
2019-07-05 19:01           ` Wainer dos Santos Moschetta
2019-06-21  6:09 ` [Qemu-devel] [PATCH 2/2] Acceptance tests: add SPICE protocol check Cleber Rosa
2019-06-28 20:54   ` Wainer dos Santos Moschetta
2019-06-30 18:01     ` Cleber Rosa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=785c89ed-a7a4-78f6-972a-e36615017268@redhat.com \
    --to=wainersm@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=arikalo@wavecomp.com \
    --cc=aurelien@aurel32.net \
    --cc=crosa@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=fam@euphon.net \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).