* Making IGT runnable by CI and developers
@ 2017-07-20 16:23 Martin Peres
2017-07-21 9:39 ` Daniel Vetter
2017-07-21 10:56 ` Tvrtko Ursulin
0 siblings, 2 replies; 11+ messages in thread
From: Martin Peres @ 2017-07-20 16:23 UTC (permalink / raw)
To: intel-gfx@lists.freedesktop.org
Hi everyone,
As some of you may already know, we have made great strides in making
our CI system usable, especially in the last 6 months when everything
started clicking together.
The CI team is no longer overwhelmed with fires and bug reports, so we
started working on increasing the coverage from just fast-feedback, to a
bigger set of IGT tests.
As some of you may know, running IGT has been a challenge that few
manage to overcome. Not only is the execution time counted in machine
months, but it can also lead to disk corruption, which does not
encourage developers to run it either. One test takes 21 days, on its
own, and it is a subset of another test which we never ran for obvious
reasons.
I would thus like to get the CI team and developers to work together to
sharply decrease the execution time of IGT, and get these tests run
multiple times per day!
There are three usages that the CI team envisions (up for debate):
- Basic acceptance testing: Meant for developers and CI to check
quickly if a patch series is not completely breaking the world (< 10
minutes, timeout per test of 30s)
- Full run: Meant to be run overnight by developers and users (< 6 hours)
- Stress tests: They can be in the test suite as a way to catch rare
issues, but they cannot be part of the default run mode. They likely
should be run on a case-by-case basis, on demand of a developer. Each
test could be allowed to take up to 1h.
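The usages above boil down to a total budget per run plus, for BAT, a per-test timeout. Below is a minimal sketch of checking measured runtimes against such a budget. The `Budget` class, `check_budget`, and the sample test names and runtimes are hypothetical and only for illustration; the 10 minute, 30 s and 6 hour figures are the ones proposed above, and the absence of a per-test limit for the full run is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    total_s: float     # budget for the whole run, in seconds
    per_test_s: float  # timeout for a single test, in seconds


# Figures from the proposal; no per-test limit was given for the full run,
# so "infinite" here is an assumption.
BAT = Budget(total_s=10 * 60, per_test_s=30)
FULL = Budget(total_s=6 * 3600, per_test_s=float("inf"))


def check_budget(runtimes, budget):
    """Return (fits, offenders).

    fits is True when the summed runtime is within the total budget and no
    test exceeds the per-test timeout; offenders lists the tests that do.
    """
    offenders = sorted(
        name for name, secs in runtimes.items() if secs > budget.per_test_s
    )
    total = sum(runtimes.values())
    return total <= budget.total_s and not offenders, offenders


# Made-up runtimes in seconds, not real measurements.
sample = {"igt@gem_example@basic": 4.0, "igt@kms_example@basic": 45.0}
bat_fits, bat_offenders = check_budget(sample, BAT)
full_fits, full_offenders = check_budget(sample, FULL)
```

With the made-up numbers, the second test blows the 30 s BAT timeout even though the total is far below 10 minutes, so the same set fits the full run but not BAT.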
There are multiple ways of getting to this situation (up for debate):
1) All the tests exposed by default are fast and meant to be run:
- Fast-feedback is provided by a testlist, for BAT
- Stress tests run using a special command, kept for on-demand testing
2) Tests are all tagged with information about their exec time:
- igt@basic@.*: Meant for BAT
- igt@complete@.*: Meant for FULL
- igt@stress@.*: The stress tests
3) Testlists all the way:
- fast-feedback: for BAT
- all: the tests that people are expected to run (CI will run them)
- Stress tests will not be part of any testlist.
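As an illustration of option 2, here is a minimal sketch of selecting tests by the tag embedded in their name, using the three patterns listed above. The `select` helper and the sample test names are hypothetical, and it is an assumption that each mode picks only names carrying its own tag (the proposal does not say whether FULL would also include the basic tests).

```python
import re

# Patterns taken from option 2; the mode names are just labels.
MODES = {
    "BAT": re.compile(r"igt@basic@.*"),
    "FULL": re.compile(r"igt@complete@.*"),
    "STRESS": re.compile(r"igt@stress@.*"),
}


def select(tests, mode):
    """Return the tests whose full name matches the pattern for mode."""
    pattern = MODES[mode]
    return [t for t in tests if pattern.fullmatch(t)]


# Hypothetical test names, for illustration only.
all_tests = [
    "igt@basic@gem_example",
    "igt@complete@gem_example_long",
    "igt@stress@gem_example_forever",
    "igt@basic@kms_example",
]
bat_tests = select(all_tests, "BAT")
stress_tests = select(all_tests, "STRESS")
```

Option 3 would replace the patterns with explicit lists of names kept in separate files, which trades renaming tests for maintaining the lists.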
Whatever decision is accepted, the CI team is mandating global
timeouts for both BAT and FULL testing, in order to guarantee
throughput. This will require the team as a whole to agree on time
quotas per sub-systems, and enforce them.
Can we try to get some healthy debate and reach a consensus on this? Our
CI efforts are being limited by this issue right now, and we will be
doing whatever we can until the test suite becomes saner and runnable,
but this may be unfair to some developers.
Looking forward to some constructive feedback and intelligent discussions!
Martin
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: Making IGT runnable by CI and developers
2017-07-20 16:23 Making IGT runnable by CI and developers Martin Peres
@ 2017-07-21 9:39 ` Daniel Vetter
2017-07-21 9:47 ` Daniel Vetter
2017-07-21 15:13 ` Jani Nikula
2017-07-21 10:56 ` Tvrtko Ursulin
1 sibling, 2 replies; 11+ messages in thread
From: Daniel Vetter @ 2017-07-21 9:39 UTC (permalink / raw)
To: Martin Peres; +Cc: intel-gfx@lists.freedesktop.org
On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
<martin.peres@linux.intel.com> wrote:
> Hi everyone,
>
> As some of you may already know, we have made great strides in making our CI
> system usable, especially in the last 6 months when everything started
> clicking together.
>
> The CI team is no longer overwhelmed with fires and bug reports, so we
> started working on increasing the coverage from just fast-feedback, to a
> bigger set of IGT tests.
>
> As some of you may know, running IGT has been a challenge that few manage to
> overcome. Not only is the execution time counted in machine months, but it
> can also lead to disk corruption, which does not encourage developers to run
> it either. One test takes 21 days, on its own, and it is a subset of another
> test which we never ran for obvious reasons.
>
> I would thus like to get the CI team and developers to work together to
> decrease sharply the execution time of IGT, and get these tests run multiple
> times per day!
>
> There are three usages that the CI team envision (up for debate):
> - Basic acceptance testing: Meant for developers and CI to check quickly if
> a patch series is not completely breaking the world (< 10 minutes, timeout
> per test of 30s)
> - Full run: Meant to be ran overnight by developers and users (< 6 hours)
> - Stress tests: They can be in the test suite as a way to catch rare
> issues, but they cannot be part of the default run mode. They likely should
> be run on a case-by-case basis, on demand of a developer. Each test could be
> allowed to take up to 1h.
>
> There are multiple ways of getting to this situation (up for debate):
>
> 1) All the tests exposed by default are fast and meant to be run:
> - Fast-feedback is provided by a testlist, for BAT
> - Stress tests ran using a special command, kept for on-demand testing
>
> 2) Tests are all tagged with information about their exec time:
> - igt@basic@.*: Meant for BAT
> - igt@complete@.*: Meant for FULL
> - igt@stress@.*: The stress tests
>
> 3) Testlists all the way:
> - fast-feedback: for BAT
> - all: the tests that people are expected to run (CI will run them)
> - Stress tests will not be part of any testlist.
>
> Whatever decision is being accepted, the CI team is mandating global
> timeouts for both BAT and FULL testing, in order to guarantee throughput.
> This will require the team as a whole to agree on time quotas per
> sub-systems, and enforce them.
>
> Can we try to get some healthy debate and reach a consensus on this? Our CI
> efforts are being limited by this issue right now, and we will be doing
> whatever we can until the test suite becomes saner and runnable, but this
> may be unfair to some developers.
>
> Looking forward to some constructive feedback and intelligent discussions!
> Martin
Imo the critical bit is that the full run (which should regression test
all features while being fast enough that we can use it for pre-merge
testing) must be the default set. Default here means what you get
without any special cmdline options (to either the test or piglit),
and without any special testlists that are separately maintained.
Default also means that a new testcase is included by default when you
add it. There are two reasons for that:
- Maintaining a separate test list is a pain. Also, it encourages
adding tons of tests that no one runs.
- If tests aren't run by default we can't test them pre-merging before
they land in igt and wreak havoc.
Second, we must have a reasonable runtime, and reasonable runtime here
means a few hours of machine time for everything, total. There's two
reasons for that:
- Only pre-merge is early enough to catch regressions. We can lament
all day long, but fact is that post-merge regressions don't get fixed
or handled in a timely manner, except when they're really serious.
This means any testing strategy that depends upon lots of post-merge
testing, or expects such post-merge testing to work, is bound to fail.
Either we can test everything pre-merge, or there's no regression
testing at all.
- We can't mix together multiple patch series because autobisecting is
too unreliable. I've been promised an autobisector for 3 years by
about 4 different teams now; making that happen in a reliable way is
_really_ hard. Blocking CI on this is not reasonable.
Also, the testsuite really should be fast enough that developers can
run it locally on their machines in a work day. Current plan is that
we can only test on HSW for now, until more budget appears (again, we
can lament about this, but it's not going to change), which means
developers _must_ be able to run stuff on e.g. SKL in a reasonable
amount of time.
Right now we have a runtime of the gem|prime tests of around 24 days,
and an estimated 10 months with the stress tests included. I think the
actual machine time we'll have available in the near future, on this
HSW farm is going to allow 2-3h for gem tests. That's the time budget
for this default set of regression tests.
Wrt actually implementing it: I don't care, as long as it fulfills the
above. So tagging, per-test cmdline options, outright deleting all
the tests we can't run anyway, disabling them in the build system or
whatever else is all fine with me, as long as the default set doesn't
require any special action. For tags this would mean that untagged
tests are _all_ included.
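To make the "untagged tests are all included" rule concrete, here is a minimal sketch of a default-set selection that only drops tests carrying an explicit opt-out tag. The `default_set` function, the `stress` tag name and the test names are hypothetical; the thread does not fix any tag vocabulary.

```python
def default_set(tests, excluded_tags=frozenset({"stress"})):
    """Return the names that run by default.

    tests maps a test name to its set of tags. A test is left out only if
    it carries one of excluded_tags, so an untagged (e.g. newly added)
    test is always part of the default set.
    """
    return sorted(
        name for name, tags in tests.items() if not (tags & excluded_tags)
    )


# Hypothetical tests and tags, for illustration only.
tagged = {
    "gem_new_test": set(),          # freshly added, nobody tagged it yet
    "gem_long_test": {"stress"},    # explicitly opted out of the default run
    "kms_quick_test": {"basic"},
}
selected = default_set(tagged)
```

The point of the design is that the opt-out is the special action: forgetting to tag a test makes it run, not vanish.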
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: Making IGT runnable by CI and developers
2017-07-21 9:39 ` Daniel Vetter
@ 2017-07-21 9:47 ` Daniel Vetter
2017-07-21 15:13 ` Jani Nikula
1 sibling, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2017-07-21 9:47 UTC (permalink / raw)
To: Martin Peres; +Cc: intel-gfx@lists.freedesktop.org
On Fri, Jul 21, 2017 at 11:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
> <martin.peres@linux.intel.com> wrote:
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making our CI
>> system usable, especially in the last 6 months when everything started
>> clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> started working on increasing the coverage from just fast-feedback, to a
>> bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few manage to
>> overcome. Not only is the execution time counted in machine months, but it
>> can also lead to disk corruption, which does not encourage developers to run
>> it either. One test takes 21 days, on its own, and it is a subset of another
>> test which we never ran for obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> decrease sharply the execution time of IGT, and get these tests run multiple
>> times per day!
>>
>> There are three usages that the CI team envision (up for debate):
>> - Basic acceptance testing: Meant for developers and CI to check quickly if
>> a patch series is not completely breaking the world (< 10 minutes, timeout
>> per test of 30s)
>> - Full run: Meant to be ran overnight by developers and users (< 6 hours)
>> - Stress tests: They can be in the test suite as a way to catch rare
>> issues, but they cannot be part of the default run mode. They likely should
>> be run on a case-by-case basis, on demand of a developer. Each test could be
>> allowed to take up to 1h.
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>> 1) All the tests exposed by default are fast and meant to be run:
>> - Fast-feedback is provided by a testlist, for BAT
>> - Stress tests ran using a special command, kept for on-demand testing
>>
>> 2) Tests are all tagged with information about their exec time:
>> - igt@basic@.*: Meant for BAT
>> - igt@complete@.*: Meant for FULL
>> - igt@stress@.*: The stress tests
>>
>> 3) Testlists all the way:
>> - fast-feedback: for BAT
>> - all: the tests that people are expected to run (CI will run them)
>> - Stress tests will not be part of any testlist.
>>
>> Whatever decision is being accepted, the CI team is mandating global
>> timeouts for both BAT and FULL testing, in order to guarantee throughput.
>> This will require the team as a whole to agree on time quotas per
>> sub-systems, and enforce them.
>>
>> Can we try to get some healthy debate and reach a consensus on this? Our CI
>> efforts are being limited by this issue right now, and we will be doing
>> whatever we can until the test suite becomes saner and runnable, but this
>> may be unfair to some developers.
>>
>> Looking forward to some constructive feedback and intelligent discussions!
>> Martin
>
> Imo the critical bit for the full run (which should regression test
> all features while being fast enough that we can use it for pre-merge
> testing) must be the default set. Default here means what you get
> without any special cmdline options (to either the test or piglit),
> and without any special testlist that are separately maintained.
> Default also means that it will be included by default if you do a new
> testcase. There's two reasons for that:
>
> - Maintaining a separate test list is a pain. Also, it encourages
> adding tons of tests that no one runs.
>
> - If tests aren't run by default we can't test them pre-merging before
> they land in igt and wreak havoc.
>
> Second, we must have a reasonable runtime, and reasonable runtime here
> means a few hours of machine time for everything, total. There's two
> reasons for that:
> - Only pre-merge is early enough to catch regressions. We can lament
> all day long, but fact is that post-merge regressions don't get fixed
> or handled in a timely manner, except when they're really serious.
> This means any testing strategy that depends upon lots of post-merge
> testing, or expects such post-merge testing to work, is bound to fail.
> Either we can test everything pre-merge, or there's no regression
> testing at all.
>
> - We can't mix together multiple patch series because autobisecting is
> too unreliable. I've been promised an autobisector for 3 years by
> about 4 different teams now, making that happen in a reliable way is
> _really_ hard. Blocking CI on this is not reasonable.
I forgot one: Randomized test running is also not an acceptable
solution for pre-merge testing, because it's guaranteed to make the
results more noisy. It also guarantees that we'll miss some important
regression tests.
> Also, the testsuite really should be fast enough that developers can
> run it locally on their machines in a work day. Current plan is that
> we can only test on HSW for now, until more budget appears (again, we
> can lament about this, but it's not going to change), which means
> developers _must_ be able to run stuff on e.g. SKL in a reasonable
> amount of time.
>
> Right now we have a runtime of the gem|prime tests of around 24 days,
> and estimated 10 months for the stress tests included. I think the
> actual machine time we'll have available in the near future, on this
> HSW farm is going to allow 2-3h for gem tests. That's the time budget
> for this default set of regression tests.
>
> Wrt actually implementing it: I don't care, as long as it fulfills the
> above. So tagging, per-test cmdline options, outright deleting all
> the tests we can't run anyway, disabling them in the build system or
> whatever else is all fine with me, as long as the default set doesn't
> require any special action. For tags this would mean that untagged
> tests are _all_ included.
Just to make it clear: These are the hard constraints we have.
Demanding that we have more machines to run more tests, demanding that
we have better post-regression tracking or anything else isn't a
constructive approach here. The challenge is to engineer a test suite
that fits within the hard constraints, and refusing to do that is
simply not proper software engineering.
And the current gem/prime regression test suite we do have entirely
fails that reality check.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: Making IGT runnable by CI and developers
2017-07-20 16:23 Making IGT runnable by CI and developers Martin Peres
2017-07-21 9:39 ` Daniel Vetter
@ 2017-07-21 10:56 ` Tvrtko Ursulin
2017-07-21 15:45 ` Daniel Vetter
1 sibling, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2017-07-21 10:56 UTC (permalink / raw)
To: Martin Peres, intel-gfx@lists.freedesktop.org
On 20/07/2017 17:23, Martin Peres wrote:
> Hi everyone,
>
> As some of you may already know, we have made great strides in making
> our CI system usable, especially in the last 6 months when everything
> started clicking together.
>
> The CI team is no longer overwhelmed with fires and bug reports, so we
> started working on increasing the coverage from just fast-feedback, to a
> bigger set of IGT tests.
>
> As some of you may know, running IGT has been a challenge that few
> manage to overcome. Not only is the execution time counted in machine
> months, but it can also lead to disk corruption, which does not
> encourage developers to run it either. One test takes 21 days, on its
> own, and it is a subset of another test which we never ran for obvious
> reasons.
>
> I would thus like to get the CI team and developers to work together to
> decrease sharply the execution time of IGT, and get these tests run
> multiple times per day!
>
> There are three usages that the CI team envision (up for debate):
> - Basic acceptance testing: Meant for developers and CI to check
> quickly if a patch series is not completely breaking the world (< 10
> minutes, timeout per test of 30s)
> - Full run: Meant to be ran overnight by developers and users (< 6 hours)
We could start by splitting this budget between logical components/teams.
So far we have been talking about GEM and KMS, but I was just thinking
that we may want to have separate units at this level for the likes of
power management, DRM (core), external stuff like sw fences? TBD I guess.
Assuming a GEM/KMS split only, the fair thing seems to be to split the
time budget 50-50 and let the respective teams start working.
I assume this is x hours on the slowest machine?
Teams would also need easy access to up-to-date test run times.
> - Stress tests: They can be in the test suite as a way to catch rare
> issues, but they cannot be part of the default run mode. They likely
> should be run on a case-by-case basis, on demand of a developer. Each
> test could be allowed to take up to 1h.
>
> There are multiple ways of getting to this situation (up for debate):
>
> 1) All the tests exposed by default are fast and meant to be run:
> - Fast-feedback is provided by a testlist, for BAT
> - Stress tests ran using a special command, kept for on-demand testing
>
> 2) Tests are all tagged with information about their exec time:
> - igt@basic@.*: Meant for BAT
> - igt@complete@.*: Meant for FULL
> - igt@stress@.*: The stress tests
>
> 3) Testlists all the way:
> - fast-feedback: for BAT
> - all: the tests that people are expected to run (CI will run them)
> - Stress tests will not be part of any testlist.
I have a historical fondness for tagging and have just sent a v2 of my
tagging RFC. There would be some work involved to convert all tests to
support --list-subtest, but once there it sounds flexible and easy to
use to me.
I don't have good visibility into how well this would fit with the CI
systems. So ultimately I don't care that much what gets picked
unless it ends up being very cumbersome or work intensive for either side.
To re-iterate, if we get:
* a clear time allocation, for GEM for example
* a URL showing us dynamically where we stand relative to that
* a method of adding/removing tests to the default/full/extended
(whatever people want to call it) test run
then I think this is enough for us to start working towards the common goal.
> Whatever decision is being accepted, the CI team is mandating global
> timeouts for both BAT and FULL testing, in order to guarantee
> throughput. This will require the team as a whole to agree on time
> quotas per sub-systems, and enforce them.
Is the current CI capable of adding together total per sub-system
runtimes, and if so, on what basis does it do that? I am wondering about
tests whose names are not prefixed with gem_ or kms_ here.
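For the sake of discussion, here is a minimal sketch of what per-sub-system accounting could look like if it were done purely on the test name prefix, with a 50-50 GEM/KMS split of the proposed 6 hour full run. This is not how the CI system works as far as this thread says; the function names, the `other` bucket and the runtimes are all hypothetical.

```python
from collections import defaultdict

FULL_BUDGET_S = 6 * 3600  # proposed full-run budget, in seconds

# Assumed 50-50 split between the two sub-systems discussed so far.
QUOTAS = {"gem": FULL_BUDGET_S / 2, "kms": FULL_BUDGET_S / 2}


def subsystem(test_name):
    """Map a test binary name to a sub-system by its prefix.

    Anything that is not gem_* or kms_* lands in "other", which is exactly
    the bucket nobody has been assigned a quota for.
    """
    prefix = test_name.split("_", 1)[0]
    return prefix if prefix in QUOTAS else "other"


def totals(runtimes):
    """Sum runtimes (seconds) per sub-system."""
    out = defaultdict(float)
    for name, secs in runtimes.items():
        out[subsystem(name)] += secs
    return dict(out)


def over_quota(runtimes):
    """Return {sub-system: seconds over quota} for sub-systems over budget."""
    return {
        sub: total - QUOTAS[sub]
        for sub, total in totals(runtimes).items()
        if sub in QUOTAS and total > QUOTAS[sub]
    }


# Made-up runtimes in seconds, not real measurements.
measured = {"gem_a": 12000.0, "kms_b": 100.0, "prime_c": 50.0, "pm_d": 5.0}
per_subsystem = totals(measured)
excess = over_quota(measured)
```

The `other` bucket makes the open question visible: tests such as the prime or PM ones need an owner before any quota can be enforced on them.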
Regards,
Tvrtko
* Re: Making IGT runnable by CI and developers
2017-07-21 9:39 ` Daniel Vetter
2017-07-21 9:47 ` Daniel Vetter
@ 2017-07-21 15:13 ` Jani Nikula
2017-07-21 15:52 ` Daniel Vetter
1 sibling, 1 reply; 11+ messages in thread
From: Jani Nikula @ 2017-07-21 15:13 UTC (permalink / raw)
To: Daniel Vetter, Martin Peres; +Cc: intel-gfx@lists.freedesktop.org
On Fri, 21 Jul 2017, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
> <martin.peres@linux.intel.com> wrote:
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making our CI
>> system usable, especially in the last 6 months when everything started
>> clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> started working on increasing the coverage from just fast-feedback, to a
>> bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few manage to
>> overcome. Not only is the execution time counted in machine months, but it
>> can also lead to disk corruption, which does not encourage developers to run
>> it either. One test takes 21 days, on its own, and it is a subset of another
>> test which we never ran for obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> decrease sharply the execution time of IGT, and get these tests run multiple
>> times per day!
>>
>> There are three usages that the CI team envision (up for debate):
>> - Basic acceptance testing: Meant for developers and CI to check quickly if
>> a patch series is not completely breaking the world (< 10 minutes, timeout
>> per test of 30s)
>> - Full run: Meant to be ran overnight by developers and users (< 6 hours)
>> - Stress tests: They can be in the test suite as a way to catch rare
>> issues, but they cannot be part of the default run mode. They likely should
>> be run on a case-by-case basis, on demand of a developer. Each test could be
>> allowed to take up to 1h.
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>> 1) All the tests exposed by default are fast and meant to be run:
>> - Fast-feedback is provided by a testlist, for BAT
>> - Stress tests ran using a special command, kept for on-demand testing
>>
>> 2) Tests are all tagged with information about their exec time:
>> - igt@basic@.*: Meant for BAT
>> - igt@complete@.*: Meant for FULL
>> - igt@stress@.*: The stress tests
Ugh. I don't want any scheme that relies on modifying or renaming the
tests themselves to categorize them. IMO the names of tests should only
be informative. Categorization should be external to that.
>>
>> 3) Testlists all the way:
>> - fast-feedback: for BAT
>> - all: the tests that people are expected to run (CI will run them)
>> - Stress tests will not be part of any testlist.
>>
>> Whatever decision is being accepted, the CI team is mandating global
>> timeouts for both BAT and FULL testing, in order to guarantee throughput.
>> This will require the team as a whole to agree on time quotas per
>> sub-systems, and enforce them.
>>
>> Can we try to get some healthy debate and reach a consensus on this? Our CI
>> efforts are being limited by this issue right now, and we will be doing
>> whatever we can until the test suite becomes saner and runnable, but this
>> may be unfair to some developers.
>>
>> Looking forward to some constructive feedback and intelligent discussions!
>> Martin
>
> Imo the critical bit for the full run (which should regression test
> all features while being fast enough that we can use it for pre-merge
> testing) must be the default set. Default here means what you get
> without any special cmdline options (to either the test or piglit),
> and without any special testlist that are separately maintained.
> Default also means that it will be included by default if you do a new
> testcase. There's two reasons for that:
>
> - Maintaining a separate test list is a pain. Also, it encourages
> adding tons of tests that no one runs.
>
> - If tests aren't run by default we can't test them pre-merging before
> they land in igt and wreak havoc.
I agree the goal should be to run all tests by default. And this means
we should start being more critical of the tests we add.
For stress tests I would like to look more into splitting up the tests
in a way that you could run one iteration fast (as part of default), and
repeat the tests for more stress and coverage. I don't know how feasible
this is, and if it requires carrying over state from one iteration to
the next, but I like the goal of also running some of this by default.
This would better catch silly bugs in tests too. (We discussed this offline
with Martin and Tomi.)
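A minimal sketch of that idea, assuming a stress test can be written as a single self-contained iteration with no state carried over between iterations (which, as noted, may not hold for every test). `run_repeated` and `example_iteration` are hypothetical names, not IGT API.

```python
def run_repeated(test, iterations=1):
    """Run test() the given number of times and return the failure count.

    iterations=1 is the cheap default-run case; a large value turns the
    very same test body into a stress run.
    """
    failures = 0
    for _ in range(iterations):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def example_iteration():
    # Hypothetical stand-in for one iteration of a real test.
    assert sum(range(10)) == 45


default_failures = run_repeated(example_iteration)
stress_failures = run_repeated(example_iteration, iterations=1000)
```

Because the default run executes the same code path once, a silly bug in the test itself shows up pre-merge instead of only in an on-demand stress run.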
> Second, we must have a reasonable runtime, and reasonable runtime here
> means a few hours of machine time for everything, total. There's two
> reasons for that:
> - Only pre-merge is early enough to catch regressions. We can lament
> all day long, but fact is that post-merge regressions don't get fixed
> or handled in a timely manner, except when they're really serious.
> This means any testing strategy that depends upon lots of post-merge
> testing, or expects such post-merge testing to work, is bound to fail.
> Either we can test everything pre-merge, or there's no regression
> testing at all.
It's rarely as black and white as you make it out to be, but it's easy
to agree pre-merge is the thing that really motivates people to figure
stuff out because it blocks their patch from being merged.
> - We can't mix together multiple patch series because autobisecting is
> too unreliable. I've been promised an autobisector for 3 years by
> about 4 different teams now, making that happen in a reliable way is
> _really_ hard. Blocking CI on this is not reasonable.
>
> Also, the testsuite really should be fast enough that developers can
> run it locally on their machines in a work day. Current plan is that
> we can only test on HSW for now, until more budget appears (again, we
> can lament about this, but it's not going to change), which means
> developers _must_ be able to run stuff on e.g. SKL in a reasonable
> amount of time.
>
> Right now we have a runtime of the gem|prime tests of around 24 days,
> and estimated 10 months for the stress tests included. I think the
> actual machine time we'll have available in the near future, on this
> HSW farm is going to allow 2-3h for gem tests. That's the time budget
> for this default set of regression tests.
>
> Wrt actually implementing it: I don't care, as long as it fulfills the
> above. So tagging, per-test cmdline options, outright deleting all
> the tests we can't run anyway, disabling them in the build system or
> whatever else is all fine with me, as long as the default set doesn't
> require any special action. For tags this would mean that untagged
> tests are _all_ included.
Off-topic, but IMO test lists are just an implementation of tags. In
that sense, we already have tagging.
BR,
Jani.
--
Jani Nikula, Intel Open Source Technology Center
* Re: Making IGT runnable by CI and developers
2017-07-21 10:56 ` Tvrtko Ursulin
@ 2017-07-21 15:45 ` Daniel Vetter
2017-07-24 8:15 ` Tvrtko Ursulin
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2017-07-21 15:45 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx@lists.freedesktop.org
On Fri, Jul 21, 2017 at 12:56 PM, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> On 20/07/2017 17:23, Martin Peres wrote:
>>
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making our
>> CI system usable, especially in the last 6 months when everything started
>> clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> started working on increasing the coverage from just fast-feedback, to a
>> bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few manage
>> to overcome. Not only is the execution time counted in machine months, but
>> it can also lead to disk corruption, which does not encourage developers to
>> run it either. One test takes 21 days, on its own, and it is a subset of
>> another test which we never ran for obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> decrease sharply the execution time of IGT, and get these tests run multiple
>> times per day!
>>
>> There are three usages that the CI team envision (up for debate):
>> - Basic acceptance testing: Meant for developers and CI to check quickly
>> if a patch series is not completely breaking the world (< 10 minutes,
>> timeout per test of 30s)
>> - Full run: Meant to be ran overnight by developers and users (< 6
>> hours)
>
>
> We could start by splitting this budget to logical components/teams.
>
> So far we have been talking about GEM and KMS, but I was just thinking that
> we may want to have a separate units on this level of likes of power
> management, DRM (core), external stuff like sw fences? TBD I guess.
>
> Assuming GEM/KMS split only, fair thing seems to be split the time budget
> 50-50 and let the respective teams start working.
Yes, KMS is also not perfect, but there it's maybe a factor of 2x that
it's taking too long. GEM is 50x or worse. Also note KMS includes
everything, so core drm, PM tests. 2x is something that can be fixed as
we go, which is good, since it means we should be able to pre-merge test
any changes to igt before pushing. GEM is not even close.
> I assume this is x hours on the slowest machine?
>
> Teams would also need easy access to up-to-date test run times.
Right now you can't have that for GEM, because it takes 24d. That
means 1 run of GEM takes away 50 runs of everything else (need to
check, it might be worse). There's simply no way we can even hand out
that data without blocking pre-merge CI for everyone else.
We might be able to schedule the occasional manual run over the w/e,
but that's about it.
>> - Stress tests: They can be in the test suite as a way to catch rare
>> issues, but they cannot be part of the default run mode. They likely should
>> be run on a case-by-case basis, on demand of a developer. Each test could be
>> allowed to take up to 1h.
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>> 1) All the tests exposed by default are fast and meant to be run:
>> - Fast-feedback is provided by a testlist, for BAT
>> - Stress tests ran using a special command, kept for on-demand testing
>>
>> 2) Tests are all tagged with information about their exec time:
>> - igt@basic@.*: Meant for BAT
>> - igt@complete@.*: Meant for FULL
>> - igt@stress@.*: The stress tests
>>
>> 3) Testlists all the way:
>> - fast-feedback: for BAT
>> - all: the tests that people are expected to run (CI will run them)
>> - Stress tests will not be part of any testlist.
>
>
> I have a historical fondness for tagging and have just sent a v2 of my
> tagging RFC. There would be some work involved to convert all tests to
> support --list-subtest, but once there it sounds flexible and easy to use to
> me.
>
> How well this would fit with the CI systems I don't have a good visibility
> to. So ultimately I don't care that much what gets picked unless it ends up
> being very cumbersome or work intensive for either side.
>
> To re-iterate:
>
> * if we get a clear time allocation for GEM for example
2h as a start or goal, maybe 3h, at which point we can start to run it
in pre-merge, measured on a fast HSW. Yes, this is really tough, but I think
by the time the GEM testsuite gets close to that number, KMS will be a lot
faster. At least I plan to invest a pile of my own time into
optimizing stuff.
> * URL showing us how do we stand relative to that dynamically
> * method of adding/removing tests to the default/full/extended (whatever
> people want to call it) test run
>
> Then I think this is enough for us to start working towards the common goal.
>
>> Whatever decision is being accepted, the CI team is mandating global
>> timeouts for both BAT and FULL testing, in order to guarantee throughput.
>> This will require the team as a whole to agree on time quotas per
>> sub-systems, and enforce them.
>
>
> Is the current CI capable of adding together total per sub-system runtimes,
> and based on what does it do that? I am wondering about tests which do not
> prefix with gem_ or kms_ here.
We have run-data for everything. Jani S. has the full spreadsheet
somewhere from an outdated run (note that that one has a lot of issues
because gpu reset killed boxes back then, now it's just a bit too
slow). Tomi can generate a new list, but if you want GEM data it's 3
full days of real time across the entire HSW farm, and I really don't
think that's a terribly good use of these machines.
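As a rough illustration of how per-sub-system totals could be derived from run-data, here is a Python sketch. It is only an assumption that the bucket is chosen by test-name prefix: gem_ and prime_ go to GEM, and everything else (core drm, PM tests and so on) is counted as KMS, following the split described in this thread. The thread does not say how CI actually computes this, and the test names and runtimes below are invented.

```python
from collections import defaultdict

# Assumption: GEM covers the "gem|prime" tests; everything else counts as KMS.
GEM_PREFIXES = ("gem_", "prime_")


def subsystem(test_name):
    """Map a test name to a sub-system bucket by its prefix."""
    return "GEM" if test_name.startswith(GEM_PREFIXES) else "KMS"


def total_runtime_per_subsystem(runtimes):
    """Sum runtimes (in seconds) per sub-system from a {test_name: seconds} map."""
    totals = defaultdict(float)
    for name, seconds in runtimes.items():
        totals[subsystem(name)] += seconds
    return dict(totals)


# Invented run-data, for illustration only.
example = {
    "gem_example_a": 120.0,
    "prime_example": 30.0,
    "kms_example": 45.0,
    "pm_example": 15.0,
    "drm_example": 5.0,
}
print(total_runtime_per_subsystem(example))
```

Tests that carry neither a gem_ nor a kms_ prefix fall into the KMS bucket here, which answers the prefix question only under the stated assumption.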
Personally I'd say once you are at a point where you can run the
entirety of GEM on your own local box in less than 8h, that's the point
where we can at least make daily runs on the CI farm. Before that it's
simply a waste of machine-time that we don't have (insert lament about
budget freeze, but that's simply the reality for the next few months
at least).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: Making IGT runnable by CI and developers
2017-07-21 15:13 ` Jani Nikula
@ 2017-07-21 15:52 ` Daniel Vetter
2017-07-24 8:21 ` Tvrtko Ursulin
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2017-07-21 15:52 UTC (permalink / raw)
To: Jani Nikula; +Cc: intel-gfx@lists.freedesktop.org
On Fri, Jul 21, 2017 at 5:13 PM, Jani Nikula
<jani.nikula@linux.intel.com> wrote:
> On Fri, 21 Jul 2017, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
>> <martin.peres@linux.intel.com> wrote:
>>> Hi everyone,
>>>
>>> As some of you may already know, we have made great strides in making our CI
>>> system usable, especially in the last 6 months when everything started
>>> clicking together.
>>>
>>> The CI team is no longer overwhelmed with fires and bug reports, so we
>>> started working on increasing the coverage from just fast-feedback, to a
>>> bigger set of IGT tests.
>>>
>>> As some of you may know, running IGT has been a challenge that few manage to
>>> overcome. Not only is the execution time counted in machine months, but it
>>> can also lead to disk corruption, which does not encourage developers to run
>>> it either. One test takes 21 days, on its own, and it is a subset of another
>>> test which we never ran for obvious reasons.
>>>
>>> I would thus like to get the CI team and developers to work together to
>>> decrease sharply the execution time of IGT, and get these tests run multiple
>>> times per day!
>>>
>>> There are three usages that the CI team envision (up for debate):
>>> - Basic acceptance testing: Meant for developers and CI to check quickly if
>>> a patch series is not completely breaking the world (< 10 minutes, timeout
>>> per test of 30s)
>>> - Full run: Meant to be ran overnight by developers and users (< 6 hours)
>>> - Stress tests: They can be in the test suite as a way to catch rare
>>> issues, but they cannot be part of the default run mode. They likely should
>>> be run on a case-by-case basis, on demand of a developer. Each test could be
>>> allowed to take up to 1h.
>>>
>>> There are multiple ways of getting to this situation (up for debate):
>>>
>>> 1) All the tests exposed by default are fast and meant to be run:
>>> - Fast-feedback is provided by a testlist, for BAT
>>> - Stress tests ran using a special command, kept for on-demand testing
>>>
>>> 2) Tests are all tagged with information about their exec time:
>>> - igt@basic@.*: Meant for BAT
>>> - igt@complete@.*: Meant for FULL
>>> - igt@stress@.*: The stress tests
>
> Ugh. I don't want any scheme that relies on modifying or renaming the
> tests themselves to categorize them. IMO the names of tests should only
> be informative. Categorization should be external to that.
>
>>>
>>> 3) Testlists all the way:
>>> - fast-feedback: for BAT
>>> - all: the tests that people are expected to run (CI will run them)
>>> - Stress tests will not be part of any testlist.
>>>
>>> Whatever decision is being accepted, the CI team is mandating global
>>> timeouts for both BAT and FULL testing, in order to guarantee throughput.
>>> This will require the team as a whole to agree on time quotas per
>>> sub-systems, and enforce them.
>>>
>>> Can we try to get some healthy debate and reach a consensus on this? Our CI
>>> efforts are being limited by this issue right now, and we will be doing
>>> whatever we can until the test suite becomes saner and runnable, but this
>>> may be unfair to some developers.
>>>
>>> Looking forward to some constructive feedback and intelligent discussions!
>>> Martin
>>
>> Imo the critical bit for the full run (which should regression test
>> all features while being fast enough that we can use it for pre-merge
>> testing) must be the default set. Default here means what you get
>> without any special cmdline options (to either the test or piglit),
>> and without any special testlist that are separately maintained.
>> Default also means that it will be included by default if you do a new
>> testcase. There's two reasons for that:
>>
>> - Maintaining a separate test list is a pain. Also, it encourages
>> adding tons of tests that no one runs.
>>
>> - If tests aren't run by default we can't test them pre-merging before
>> they land in igt and wreak havoc.
>
> I agree the goal should be to run all tests by default. And this means
> we should start being more critical of the tests we add.
>
> For stress tests I would like to look more into splitting up the tests
> in a way that you could run one iteration fast (as part of default), and
> repeat the tests for more stress and coverage. I don't know how feasible
> this is, and if it requires carrying over state from one iteration to
> other, but I like the goal of running also some of this by default. This
> would better catch silly bugs in tests too. (We discussed this offline
> with Martin and Tomi.)
I think right now, and for the near future (up to at least a year), the
only time we'll run stress tests is if developers need nastier testcases
to help reproduce a bug locally. We simply have neither the CI
nor the QA resources to run these tests. If we're making really great
progress on overall quality and CI infrastructure (that needs budget
we don't have right now) and pre-merge testing we might be able to
start looking into running stress tests in CI or QA.
I think a good example is kms_frontbuffer_tracking --show-hidden,
which Paulo used to debug issues on his own machine.
>> Second, we must have a reasonable runtime, and reasonable runtime here
>> means a few hours of machine time for everything, total. There's two
>> reasons for that:
>> - Only pre-merge is early enough to catch regressions. We can lament
>> all day long, but fact is that post-merge regressions don't get fixed
>> or handled in a timely manner, except when they're really serious.
>> This means any testing strategy that depends upon lots of post-merge
>> testing, or expects such post-merge testing to work, is bound to fail.
>> Either we can test everything pre-merge, or there's no regression
>> testing at all.
>
> It's rarely as black and white as you make it out to be, but it's easy
> to agree pre-merge is the thing that really motivates people to figure
> stuff out because it blocks their patch from being merged.
Yeah it's not black/white for sure, but the difference between how seriously
a regression that BAT highlights is taken, versus anything once stuff
is merged, is rather extreme. I'd say with BAT-caught issues you have a
good chance that the developer will come up with a fixed patch version
within a day.
With regressions in the tree it takes a few weeks simply to get
agreement that maybe we should revert it. If (and that's a big one)
someone decides to deal with that hassle. I'd say there's at least 1
order of magnitude difference in how effectively we deal with
regressions pre-merge vs. post-merge.
>> - We can't mix together multiple patch series, because autobisecting is
>> too unreliable. I've been promised an autobisector for 3 years by
>> about 4 different teams now, making that happen in a reliable way is
>> _really_ hard. Blocking CI on this is not reasonable.
>>
>> Also, the testsuite really should be fast enough that developers can
>> run it locally on their machines in a work day. Current plan is that
>> we can only test on HSW for now, until more budget appears (again, we
>> can lament about this, but it's not going to change), which means
>> developers _must_ be able to run stuff on e.g. SKL in a reasonable
>> amount of time.
>>
>> Right now we have a runtime of the gem|prime tests of around 24 days,
>> and estimated 10 months for the stress tests included. I think the
>> actual machine time we'll have available in the near future, on this
>> HSW farm is going to allow 2-3h for gem tests. That's the time budget
>> for this default set of regression tests.
>>
>> Wrt actually implementing it: I don't care, as long as it fulfills the
>> above. So tagging, per-test comdline options, outright deleting all
>> the tests we can't run anyway, disabling them in the build system or
>> whatever else is all fine with me, as long as the default set doesn't
>> require any special action. For tags this would mean that untagged
>> tests are _all_ included.
>
> Off-topic, but IMO test lists are just an implementation of tags. In
> that sense, we already have tagging.
Agreed in principle, but in practice the test-list is very far away from
where the subtest is defined. It's unlikely someone types a new
subtest and forgets to wire it up with an igt_subtest stanza. It's
very likely they'll forget to add it to some random file somewhere
(because that's not needed to run things locally). That's why I think
we need the default set. So even with tags, tags would be for the
special case (stress tests that need to be excluded, or maybe for
marking BAT stuff).
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: Making IGT runnable by CI and developers
2017-07-21 15:45 ` Daniel Vetter
@ 2017-07-24 8:15 ` Tvrtko Ursulin
2017-07-24 9:27 ` Daniel Vetter
0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2017-07-24 8:15 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx@lists.freedesktop.org
On 21/07/2017 16:45, Daniel Vetter wrote:
> On Fri, Jul 21, 2017 at 12:56 PM, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>> On 20/07/2017 17:23, Martin Peres wrote:
>>>
>>> Hi everyone,
>>>
>>> As some of you may already know, we have made great strides in making our
>>> CI system usable, especially in the last 6 months when everything started
>>> clicking together.
>>>
>>> The CI team is no longer overwhelmed with fires and bug reports, so we
>>> started working on increasing the coverage from just fast-feedback, to a
>>> bigger set of IGT tests.
>>>
>>> As some of you may know, running IGT has been a challenge that few manage
>>> to overcome. Not only is the execution time counted in machine months, but
>>> it can also lead to disk corruption, which does not encourage developers to
>>> run it either. One test takes 21 days, on its own, and it is a subset of
>>> another test which we never ran for obvious reasons.
>>>
>>> I would thus like to get the CI team and developers to work together to
>>> decrease sharply the execution time of IGT, and get these tests run multiple
>>> times per day!
>>>
>>> There are three usages that the CI team envision (up for debate):
>>> - Basic acceptance testing: Meant for developers and CI to check quickly
>>> if a patch series is not completely breaking the world (< 10 minutes,
>>> timeout per test of 30s)
>>> - Full run: Meant to be ran overnight by developers and users (< 6
>>> hours)
>>
>>
>> We could start by splitting this budget to logical components/teams.
>>
>> So far we have been talking about GEM and KMS, but I was just thinking that
>> we may want to have a separate units on this level of likes of power
>> management, DRM (core), external stuff like sw fences? TBD I guess.
>>
>> Assuming GEM/KMS split only, fair thing seems to be split the time budget
>> 50-50 and let the respective teams start working.
>
> Yes, KMS is also not perfect, but there it's maybe a factor of 2x that
> it's taking too long. GEM is 50x or worse. Also note KMS includes
> everything, so core drm, PM tests. 2x is something can be fixed as we
> go, which is good, since it means we should be able to pre-merge test
> any changes to igt before pushing. GEM is not even close.
>
>> I assume this is x hours on the slowest machine?
>>
>> Teams would also need easy access to up-to-date test run times.
>
> Right now you can't have that for GEM, because it takes 24d. That
> means 1 run of GEM takes away 50 runs of everything else (need to
> check, it might be worse). There's simply no way we can even hand out
> that data without blocking pre-merge CI for everyone else.
>
> We might be able to schedule the occasional manual run over the w/e,
> but that's about it.
I did not explain well here what I meant by access to
up-to-date runtimes. I assumed we would start from a cut-down list, the
one which fits in the time budget.
As Martin and I discussed on Friday, I would be completely fine with the
CI team just picking a list of GEM tests which fits, and then the GEM
team's responsibility is to add, remove and improve tests until this time
is used in the most optimal way.
This way we would be getting daily test run time updates.
We also talked about the idea of setting up an IGT trybot, where we could
send test changes, followed by testlist updates, and so see the
specific test runtimes across the platforms.
Once that looks ok, we could submit a patch to the real test list and so
keep iterating until the above goal is reached.
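A minimal sketch of what "picking a list which fits" could look like, in Python. This is only an illustration under assumptions: the shortest-first greedy rule, the function name and the runtimes are all made up here, and nothing in this thread says CI would pick tests this way.

```python
def pick_tests_within_budget(runtimes, budget_seconds):
    """Greedily pick the shortest tests first until the time budget is used up.

    runtimes: {test_name: measured runtime in seconds}
    Returns (list of picked test names, total seconds used).
    """
    picked, used = [], 0.0
    # Sort by runtime, then by name, so the result is deterministic.
    for name, seconds in sorted(runtimes.items(), key=lambda kv: (kv[1], kv[0])):
        if used + seconds > budget_seconds:
            break  # every remaining test is at least this long
        picked.append(name)
        used += seconds
    return picked, used


# Invented runtimes, for illustration only.
measured = {"gem_example_a": 10, "gem_example_b": 5, "gem_example_slow": 100}
print(pick_tests_within_budget(measured, budget_seconds=20))
```

Shortest-first maximises the number of tests that fit, not their value; deciding which tests deserve the time would remain the GEM team's job, with the daily runtimes feeding back into the list.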
Regards,
Tvrtko
* Re: Making IGT runnable by CI and developers
2017-07-21 15:52 ` Daniel Vetter
@ 2017-07-24 8:21 ` Tvrtko Ursulin
2017-07-24 9:23 ` Daniel Vetter
0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2017-07-24 8:21 UTC (permalink / raw)
To: Daniel Vetter, Jani Nikula; +Cc: intel-gfx@lists.freedesktop.org
On 21/07/2017 16:52, Daniel Vetter wrote:
> On Fri, Jul 21, 2017 at 5:13 PM, Jani Nikula
> <jani.nikula@linux.intel.com> wrote:
[snip]
>> I agree the goal should be to run all tests by default. And this means
>> we should start being more critical of the tests we add.
>>
>> For stress tests I would like to look more into splitting up the tests
>> in a way that you could run one iteration fast (as part of default), and
>> repeat the tests for more stress and coverage. I don't know how feasible
>> this is, and if it requires carrying over state from one iteration to
>> other, but I like the goal of running also some of this by default. This
>> would better catch silly bugs in tests too. (We discussed this offline
>> with Martin and Tomi.)
>
> I think right now, and for the near future (up to at least a year) the
> only time we'll run stress tests if developers need nastier testcases
> to help reproduce a bug locally. We simply don't have neither the CI
> nor the QA resources to run these tests. If we're making really great
> progress on overall quality and CI infrastructure (that needs budget
> we don't have right now) and pre-merge testing we might be able to
> start looking into running stress tests in CI or QA.
>
> I think a good example is kms_frontbuffer_tracking --show-hidden,
> which Paulo used to debug issues on his own machine.
Just on this particular point - to make this facility generic would be a
flavour of test tagging, so pretty much the same high level effect as
the RFC I sent. I am not pushing it (again), just saying it would be
effectively the same approach/effort, only that tags are more
generic/flexible.
Regards,
Tvrtko
* Re: Making IGT runnable by CI and developers
2017-07-24 8:21 ` Tvrtko Ursulin
@ 2017-07-24 9:23 ` Daniel Vetter
0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2017-07-24 9:23 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx@lists.freedesktop.org
On Mon, Jul 24, 2017 at 09:21:39AM +0100, Tvrtko Ursulin wrote:
>
> On 21/07/2017 16:52, Daniel Vetter wrote:
> > On Fri, Jul 21, 2017 at 5:13 PM, Jani Nikula
> > <jani.nikula@linux.intel.com> wrote:
>
> [snip]
>
> > > I agree the goal should be to run all tests by default. And this means
> > > we should start being more critical of the tests we add.
> > >
> > > For stress tests I would like to look more into splitting up the tests
> > > in a way that you could run one iteration fast (as part of default), and
> > > repeat the tests for more stress and coverage. I don't know how feasible
> > > this is, and if it requires carrying over state from one iteration to
> > > other, but I like the goal of running also some of this by default. This
> > > would better catch silly bugs in tests too. (We discussed this offline
> > > with Martin and Tomi.)
> >
> > I think right now, and for the near future (up to at least a year) the
> > only time we'll run stress tests if developers need nastier testcases
> > to help reproduce a bug locally. We simply don't have neither the CI
> > nor the QA resources to run these tests. If we're making really great
> > progress on overall quality and CI infrastructure (that needs budget
> > we don't have right now) and pre-merge testing we might be able to
> > start looking into running stress tests in CI or QA.
> >
> > I think a good example is kms_frontbuffer_tracking --show-hidden,
> > which Paulo used to debug issues on his own machine.
>
> Just on this particular point - to make this facility generic would be a
> flavour of test tagging, so pretty much the same high level effect as the
> RFC I sent. I am not pushing it (again), just saying it would be effectively
> the same approach/effort, only that tags are more generic/flexible.
The one difference would be it's binary, as in it only differentiates
between real regression tests everyone should care about vs. tools and
stuff that is only useful for the feature developer. It's _not_ tagging of
everything (which I fully agree with Jani is not going to work if we put
it into the source files).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: Making IGT runnable by CI and developers
2017-07-24 8:15 ` Tvrtko Ursulin
@ 2017-07-24 9:27 ` Daniel Vetter
0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2017-07-24 9:27 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx@lists.freedesktop.org
On Mon, Jul 24, 2017 at 09:15:28AM +0100, Tvrtko Ursulin wrote:
>
> On 21/07/2017 16:45, Daniel Vetter wrote:
> > On Fri, Jul 21, 2017 at 12:56 PM, Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> > >
> > > On 20/07/2017 17:23, Martin Peres wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > As some of you may already know, we have made great strides in making our
> > > > CI system usable, especially in the last 6 months when everything started
> > > > clicking together.
> > > >
> > > > The CI team is no longer overwhelmed with fires and bug reports, so we
> > > > started working on increasing the coverage from just fast-feedback, to a
> > > > bigger set of IGT tests.
> > > >
> > > > As some of you may know, running IGT has been a challenge that few manage
> > > > to overcome. Not only is the execution time counted in machine months, but
> > > > it can also lead to disk corruption, which does not encourage developers to
> > > > run it either. One test takes 21 days, on its own, and it is a subset of
> > > > another test which we never ran for obvious reasons.
> > > >
> > > > I would thus like to get the CI team and developers to work together to
> > > > decrease sharply the execution time of IGT, and get these tests run multiple
> > > > times per day!
> > > >
> > > > There are three usages that the CI team envision (up for debate):
> > > > - Basic acceptance testing: Meant for developers and CI to check quickly
> > > > if a patch series is not completely breaking the world (< 10 minutes,
> > > > timeout per test of 30s)
> > > > - Full run: Meant to be ran overnight by developers and users (< 6
> > > > hours)
> > >
> > >
> > > We could start by splitting this budget to logical components/teams.
> > >
> > > So far we have been talking about GEM and KMS, but I was just thinking that
> > > we may want to have a separate units on this level of likes of power
> > > management, DRM (core), external stuff like sw fences? TBD I guess.
> > >
> > > Assuming GEM/KMS split only, fair thing seems to be split the time budget
> > > 50-50 and let the respective teams start working.
> >
> > Yes, KMS is also not perfect, but there it's maybe a factor of 2x that
> > it's taking too long. GEM is 50x or worse. Also note KMS includes
> > everything, so core drm, PM tests. 2x is something can be fixed as we
> > go, which is good, since it means we should be able to pre-merge test
> > any changes to igt before pushing. GEM is not even close.
> >
> > > I assume this is x hours on the slowest machine?
> > >
> > > Teams would also need easy access to up-to-date test run times.
> >
> > Right now you can't have that for GEM, because it takes 24d. That
> > means 1 run of GEM takes away 50 runs of everything else (need to
> > check, it might be worse). There's simply no way we can even hand out
> > that data without blocking pre-merge CI for everyone else.
> >
> > We might be able to schedule the occasional manual run over the w/e,
> > but that's about it.
>
> I did not explain well here what I was thinking about by access to
> up-to-date runtime. I assumed we would start from a cut down list, the one
> which fits in the time budget.
>
> As Martin and me chatted on Friday, I would be completely fine with the CI
> team just picking a list of GEM tests which fits, and then the GEM team
> responsibility is to add, remove and improve tests until this time is used
> in the most optimal way.
>
> This was we would be getting daily test run time updates.
>
> We also talked about the idea to set up an IGT trybot, where we could send
> test changes, followed by a testlist updates, and so see the specific test
> runtimes across the platforms.
>
> Once that looks ok, we could submit a patch to the real test list and so
> keep iterating until the above goal is reached.
Atm we have 99% of GEM stuff that we simply cannot run. I don't think it's
a good idea to carry that around forever (simply because enumerating all
these tests alone kills machine time if you try to run stuff locally). Is
the plan to not clean that up?
2nd issue I have with an explicit gem test suite: New testcases won't get
tested by default, which means no pressure on them to be fast or useful or
stable. That's imo a big reason for why we ended up here. So if you think
an explicit gem test list is the way to go, then I think the only way to
do that is with a blacklist (which would start out with all gem tests).
And after a few months we'd just go through the sources and delete all the
tests still blacklisted, or something like that.
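The difference between a blacklist and an explicit test list can be sketched in a few lines of Python. This is a hypothetical illustration, not an existing igt or piglit mechanism; the function names and test names are invented.

```python
def default_set(all_tests, blacklist):
    """Everything that is enumerated runs by default, unless it is blacklisted."""
    blocked = set(blacklist)
    return [t for t in all_tests if t not in blocked]


def deletion_candidates(all_tests, blacklist):
    """Tests that still exist and are still blacklisted: candidates for deletion."""
    blocked = set(blacklist)
    return [t for t in all_tests if t in blocked]


# The blacklist starts out with all existing gem tests (invented names).
blacklist = ["gem_example_old_a", "gem_example_old_b"]
tests = ["gem_example_old_a", "gem_example_old_b", "kms_example"]

# A newly written test shows up in the enumeration and nowhere else.
tests.append("gem_example_new")

print(default_set(tests, blacklist))
print(deletion_candidates(tests, blacklist))
```

The new test lands in the default set without anyone touching a list file, so it is under pressure to be fast and stable from day one; with an explicit list it would stay untested until someone remembered to add it.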
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Thread overview: 11+ messages
2017-07-20 16:23 Making IGT runnable by CI and developers Martin Peres
2017-07-21 9:39 ` Daniel Vetter
2017-07-21 9:47 ` Daniel Vetter
2017-07-21 15:13 ` Jani Nikula
2017-07-21 15:52 ` Daniel Vetter
2017-07-24 8:21 ` Tvrtko Ursulin
2017-07-24 9:23 ` Daniel Vetter
2017-07-21 10:56 ` Tvrtko Ursulin
2017-07-21 15:45 ` Daniel Vetter
2017-07-24 8:15 ` Tvrtko Ursulin
2017-07-24 9:27 ` Daniel Vetter