From: Jani Nikula <jani.nikula@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>,
	Martin Peres <martin.peres@linux.intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: Making IGT runnable by CI and developers
Date: Fri, 21 Jul 2017 18:13:37 +0300
Message-ID: <87mv7xn7q6.fsf@intel.com>
In-Reply-To: <CAKMK7uFzd9JQLLV9_5hZD13Htr56gHdXqCScgkc7xDTPiZ4YYg@mail.gmail.com>

On Fri, 21 Jul 2017, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
> <martin.peres@linux.intel.com> wrote:
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making our CI
>> system usable, especially in the last 6 months when everything started
>> clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> started working on increasing the coverage from just fast-feedback to a
>> bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few manage to
>> overcome. Not only is the execution time counted in machine months, but it
>> can also lead to disk corruption, which does not encourage developers to run
>> it either. One test alone takes 21 days, and it is a subset of another
>> test which we have never run, for obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> sharply decrease the execution time of IGT, and get these tests run multiple
>> times per day!
>>
>> There are three use cases that the CI team envisions (up for debate):
>>  - Basic acceptance testing: Meant for developers and CI to check quickly if
>> a patch series is not completely breaking the world (< 10 minutes, per-test
>> timeout of 30s)
>>  - Full run: Meant to be run overnight by developers and users (< 6 hours)
>>  - Stress tests: They can be in the test suite as a way to catch rare
>> issues, but they cannot be part of the default run mode. They likely should
>> be run on a case-by-case basis, on demand of a developer. Each test could be
>> allowed to take up to 1h.
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>>  1) All the tests exposed by default are fast and meant to be run:
>>   - Fast-feedback is provided by a testlist, for BAT
>>   - Stress tests run using a special command, kept for on-demand testing
>>
>>  2) Tests are all tagged with information about their exec time:
>>   - igt@basic@.*: Meant for BAT
>>   - igt@complete@.*: Meant for FULL
>>   - igt@stress@.*: The stress tests

Ugh. I don't want any scheme that relies on modifying or renaming the
tests themselves to categorize them. IMO the names of tests should only
be informative. Categorization should be external to that.

>>
>>  3) Testlists all the way:
>>   - fast-feedback: for BAT
>>   - all: the tests that people are expected to run (CI will run them)
>>   - Stress tests will not be part of any testlist.
>>
>> Whatever decision is accepted, the CI team is mandating global
>> timeouts for both BAT and FULL testing, in order to guarantee throughput.
>> This will require the team as a whole to agree on time quotas per
>> subsystem, and enforce them.
>>
>> Can we try to get some healthy debate and reach a consensus on this? Our CI
>> efforts are being limited by this issue right now, and we will be doing
>> whatever we can until the test suite becomes saner and runnable, but this
>> may be unfair to some developers.
>>
>> Looking forward to some constructive feedback and intelligent discussions!
>> Martin
>
> Imo the critical bit for the full run (which should regression test
> all features while being fast enough that we can use it for pre-merge
> testing) must be the default set. Default here means what you get
> without any special cmdline options (to either the test or piglit),
> and without any special testlists that are separately maintained.
> Default also means that it will be included by default if you add a new
> testcase. There are two reasons for that:
>
> - Maintaining a separate test list is a pain. Also, it encourages
> adding tons of tests that no one runs.
>
> - If tests aren't run by default, we can't test them pre-merge, before
> they land in igt and wreak havoc.

I agree the goal should be to run all tests by default. And this means
we should start being more critical of the tests we add.

For stress tests I would like to look more into splitting up the tests
in a way that you could run one iteration fast (as part of default), and
repeat the tests for more stress and coverage. I don't know how feasible
this is, and if it requires carrying over state from one iteration to
the next, but I like the goal of also running some of this by default. This
would better catch silly bugs in tests too. (We discussed this offline
with Martin and Tomi.)
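A minimal sketch of that split, in Python for illustration only: one iteration runs by default, and a larger count is passed for on-demand stress runs. Each iteration derives its state from its own seed, so nothing is carried over between iterations. The function names and the `iterations` parameter are hypothetical, not an existing IGT option, and the iteration body is a placeholder for real GPU work.

```python
import random


def stress_iteration(seed: int) -> None:
    """One self-contained iteration of a hypothetical stress test.

    All state is derived from the seed, so no state carries over
    from one iteration to the next.
    """
    rng = random.Random(seed)
    buffer = [rng.randrange(256) for _ in range(1024)]
    # Placeholder check standing in for the real work of the test.
    assert len(buffer) == 1024


def run_stress_test(iterations: int = 1, base_seed: int = 0) -> int:
    """Run one iteration by default (cheap enough for the default set);
    pass a larger count for on-demand stress coverage."""
    for i in range(iterations):
        stress_iteration(base_seed + i)
    return iterations


run_stress_test()                 # default run: a single fast iteration
run_stress_test(iterations=1000)  # on-demand stress run
```

Running the single iteration by default still exercises the test's own code paths on every run, which is what catches silly bugs in the tests themselves.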

> Second, we must have a reasonable runtime, and reasonable runtime here
> means a few hours of machine time for everything, total. There are two
> reasons for that:
> - Only pre-merge is early enough to catch regressions. We can lament
> all day long, but fact is that post-merge regressions don't get fixed
> or handled in a timely manner, except when they're really serious.
> This means any testing strategy that depends upon lots of post-merge
> testing, or expects such post-merge testing to work, is bound to fail.
> Either we can test everything pre-merge, or there's no regression
> testing at all.

It's rarely as black and white as you make it out to be, but it's easy
to agree pre-merge is the thing that really motivates people to figure
stuff out because it blocks their patch from being merged.

> - We can't mix together multiple patch series, because autobisecting is
> too unreliable. I've been promised an autobisector for 3 years by
> about 4 different teams now; making that happen in a reliable way is
> _really_ hard. Blocking CI on this is not reasonable.
>
> Also, the testsuite really should be fast enough that developers can
> run it locally on their machines in a work day. Current plan is that
> we can only test on HSW for now, until more budget appears (again, we
> can lament about this, but it's not going to change), which means
> developers _must_ be able to run stuff on e.g. SKL in a reasonable
> amount of time.
>
> Right now we have a runtime of the gem|prime tests of around 24 days,
> and an estimated 10 months with the stress tests included. I think the
> actual machine time we'll have available in the near future, on this
> HSW farm is going to allow 2-3h for gem tests. That's the time budget
> for this default set of regression tests.
>
> Wrt actually implementing it: I don't care, as long as it fulfills the
> above. So tagging, per-test cmdline options, outright deleting all
> the tests we can't run anyway, disabling them in the build system or
> whatever else is all fine with me, as long as the default set doesn't
> require any special action. For tags this would mean that untagged
> tests are _all_ included.
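For a sense of scale, the figures quoted above (around 24 days of machine time for the gem|prime tests, against a budget of 2-3h) imply the reduction computed below. This is only back-of-the-envelope arithmetic; `required_speedup` is a hypothetical helper, and the inputs are the numbers from this thread, not new measurements.

```python
HOURS_PER_DAY = 24


def required_speedup(current_days: float, budget_hours: float) -> float:
    """Factor by which total runtime must shrink to fit the budget."""
    return current_days * HOURS_PER_DAY / budget_hours


# Figures from the thread: ~24 days today, 2-3 h budget on the HSW farm.
for budget in (2.0, 3.0):
    print(f"{budget:.0f} h budget: ~{required_speedup(24, budget):.0f}x faster")
# 2 h budget: ~288x faster
# 3 h budget: ~192x faster
```

In other words, the default set has to become roughly two orders of magnitude faster, which is why trimming individual tests alone is unlikely to be enough.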

Off-topic, but IMO test lists are just an implementation of tags. In
that sense, we already have tagging.
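A minimal sketch of that equivalence: each testlist is read as the set of tests carrying one tag, and the default run includes every test, untagged ones too, minus the tests carrying an excluded tag such as stress. This matches the rule that untagged tests are all included. The one-name-per-line format with `#` comments, the function names, and the test names are assumptions for illustration, not piglit's or IGT's actual code.

```python
def parse_testlist(text: str) -> set[str]:
    """Parse a testlist, assumed to hold one test name per line.
    Blank lines and '#' comment lines are skipped."""
    names: set[str] = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.add(entry)
    return names


def default_set(all_tests: set[str], tags: dict[str, set[str]],
                excluded_tags: set[str]) -> set[str]:
    """Everything runs by default, including untagged tests, except
    tests that carry one of the excluded tags."""
    skipped: set[str] = set()
    for tag in excluded_tags:
        skipped |= tags.get(tag, set())
    return all_tests - skipped


# Hypothetical test names, for illustration only.
all_tests = {
    "igt@example@basic",
    "igt@example@full",
    "igt@example@stress-forever",
    "igt@new_test@subtest",  # newly added, not on any testlist
}
# Each testlist doubles as a tag: the tag is the list's name.
tags = {
    "fast-feedback": parse_testlist("igt@example@basic\n"),
    "stress": parse_testlist("# on demand only\nigt@example@stress-forever\n"),
}
print(sorted(default_set(all_tests, tags, {"stress"})))
# ['igt@example@basic', 'igt@example@full', 'igt@new_test@subtest']
```

Keeping the tags in external lists also leaves test names purely informative: no test has to be renamed to change its category.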


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx