* proposal to run Ceph tests on pull requests
From: Loic Dachary @ 2015-12-05 11:49 UTC
To: Ceph Development
Hi Ceph,
TL;DR: a ceph-qa-suite bot running on pull requests is sustainable and is an incentive for contributors to use teuthology-openstack independently
When a pull request is submitted, it is compiled, some tests are run[1] and the result is added to the pull request to confirm that it does not introduce a trivial problem. Such tests are however limited because they must:
* run within a few minutes at most
* not require multiple machines
* not require root privileges
More extensive tests (primarily integration tests) are needed before a contribution can be merged into Ceph [2], to verify it does not introduce a subtle regression. It would be ideal to run these integration tests on each pull request but there are two obstacles:
* each test takes ~1.5 hours
* each test costs ~0.30 euros
On the current master, running all tests would require ~1000 jobs [3]. That would cost ~300 euros on each pull request and take ~10 hours assuming 100 jobs can run in parallel. We could resolve that problem by:
* maintaining a ceph-qa-suite map to be used as a whitelist mapping a diff to a set of tests. For instance, if the diff modifies the src/ceph-disk file, it outputs the ceph-disk suite[4]. This would effectively trim the tests that are unrelated to the contribution and reduce the number of tests to a maximum of ~100 [4] and most likely a dozen.
* running the tests only if one of the commits of the pull request has the *Needs-qa: true* flag in its commit message[5]
* limiting the number of tests to fit in the allocated budget. If there was enough funding for 10,000 jobs during the previous period and a total of 1,000 test runs were required (a test run is a set of tests as produced by the ceph-qa-suite map), each run is trimmed to a maximum of ten tests, regardless of how many tests the map selected.
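The budget arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, every job is assumed to take the full ~1.5 hours, and trimming simply keeps the first tests of a run (the proposal does not say which tests would be dropped). Taken at face value, 1,000 jobs at 100 in parallel is ten waves of ~1.5 hours, so the wall-clock estimate comes out closer to 15 hours; the exact figure depends on how job durations vary.

```python
import math

JOB_HOURS = 1.5       # ~1.5 hours per test job (figure quoted in the proposal)
JOB_COST_EUR = 0.30   # ~0.30 euros per test job (figure quoted in the proposal)


def full_run_estimate(jobs, parallel_jobs):
    """Return (cost in euros, wall-clock hours) for running `jobs` test jobs."""
    cost = jobs * JOB_COST_EUR
    # Jobs run in waves of at most `parallel_jobs`; assume each wave lasts JOB_HOURS.
    waves = math.ceil(jobs / parallel_jobs)
    return cost, waves * JOB_HOURS


def per_run_cap(funded_jobs, expected_runs):
    """Largest number of tests each run may use without exceeding the budget."""
    return funded_jobs // expected_runs


def trim(tests, cap):
    """Keep at most `cap` tests. Keeping the first ones is an arbitrary policy."""
    return tests[:cap]


cost, hours = full_run_estimate(jobs=1000, parallel_jobs=100)
print(f"full suite: ~{cost:.0f} euros, ~{hours:.0f} hours")
print(f"cap per run: {per_run_cap(funded_jobs=10_000, expected_runs=1_000)} tests")
```

With funding for 10,000 jobs and 1,000 expected runs, the cap works out to ten tests per run, which matches the example in the last bullet.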
Here is an example:
Joe submits a pull request to fix a bug in the librados API
The make check bot compiles the pull request and make check fails because the change introduces a bug
Joe uses run-make-check.sh locally to repeat the failure, fixes it and repushes
The make check bot compiles and passes make check
Joe amends the commit message to add *Needs-qa: true* and repushes
The ceph-qa-suite map script finds a change on the librados API and outputs smoke/basic/tasks/rados_api_tests.yaml
The ceph-qa-suite bot runs the test smoke/basic/tasks/rados_api_tests.yaml which fails
Joe examines the logs found at http://teuthology-logs.public.ceph.com/ and decides to debug by running the test himself
Joe runs teuthology-openstack --suite smoke/basic/tasks/rados_api_tests.yaml against his own OpenStack tenant [6]
Joe repushes with a fix
The ceph-qa-suite bot runs the test smoke/basic/tasks/rados_api_tests.yaml which succeeds
Kefu reviews the pull request and has a link to the successful test runs in the comments
This approach scales with the size of the Ceph developer community [7] because regular contributors benefit directly from funding the ceph-qa-suite bot. New contributors can focus on learning how to interpret the ceph-qa-suite error logs for their contribution and learn about how to debug it via teuthology-openstack if needed, which is a better user experience than trying to figure out which ceph-qa-suite job to run, learning about teuthology, scheduling the test and interpreting the results.
The maintenance workload of a ceph-qa-suite bot probably requires one work day a week, to handle funding and the sysadmin of the server where the bot runs, but mostly to sort out the false negatives. I believe a pure self-service approach where each contributor would be asked to run teuthology-openstack independently would actually require more work. The ceph-qa-suite bot provides a baseline on which everybody can agree to sort out the false negatives. When a contributor runs teuthology-openstack by herself/himself, it is difficult for her/him to figure out if a failure comes from something she/he did incorrectly because she/he is not familiar with teuthology-openstack or if it is related to her/his contribution. She/He will ask for assistance in situations where comparing her/his run with the output of the ceph-qa-suite bot would probably give her/him enough hints to fix the problem herself/himself.
If the ceph-qa-suite bot becomes unavailable, the contributors are not blocked because they can run the tests themselves on their own OpenStack tenant and link the results to the pull request in the same way the bot would. Debugging a failed test is essentially the same thing as running the ceph-qa-suite bot.
Cheers
[1] run-make-check.sh https://github.com/ceph/ceph/blob/master/run-make-check.sh
[2] Ceph test suites https://github.com/ceph/ceph-qa-suite/tree/master/suites
[3] teuthology-suite --suite . --subset 1/40000
[4] minimal number of tests to run all tasks at least once: 130 for rados, 76 for fs, 113 for upgrade, 18 for rgw, 45 for rbd.
[5] a former proposal was to include the test suite to run in the commit message, but this is more difficult to maintain than a boolean flag that states a given commit needs to pass all the relevant tests
[6] teuthology-openstack https://github.com/dachary/teuthology/tree/openstack#openstack-backend
[7] Scaling out the Ceph community lab http://dachary.org/?p=3852
--
Loïc Dachary, Artisan Logiciel Libre
* Re: proposal to run Ceph tests on pull requests
From: John Spray @ 2015-12-07 11:29 UTC
To: Loic Dachary; +Cc: Ceph Development
On Sat, Dec 5, 2015 at 11:49 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi Ceph,
>
> TL;DR: a ceph-qa-suite bot running on pull requests is sustainable and is an incentive for contributors to use teuthology-openstack independently
A bot for scheduling a named suite on a named PR, and posting the
results back to the PR is definitely a good thing.
Thinking further about using commit messages to toggle the testing, I
think that this could get awkward when it's coupled to the human side
of code review. When someone pushes a "how about this?" modification
they don't necessarily want to re-run the test suite until the
reviewer has okayed it, but then that means that they have to push
again, and the final thing that's tested would be a different SHA1
(hopefully the same code) than what the human last reviewed. We'll
also have e.g. rebases, where there tends to be some discretion about
whether a rebase requires a re-test.
When you were talking about having the suite selected in the qa: tag,
there was the motivation to put it in the commit message so that it
would be preserved in backports. However, if the "Needs-qa:" flag is
just a boolean, then I think it makes more sense to control it with a
github label or by posting a command in a PR comment.
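As a rough sketch of that alternative, the trigger could be read from the PR's labels and comments instead of from commit messages, so the tested SHA1 stays the one that was reviewed. The label name and the comment command syntax below are invented for illustration, and the functions work on plain lists rather than calling any GitHub API.

```python
import re

QA_LABEL = "needs-qa"  # hypothetical label name

# Hypothetical command syntax: a comment line such as "qa-bot run rados".
COMMAND = re.compile(r"^qa-bot run (?P<suite>[\w./-]+)[ \t]*$", re.MULTILINE)


def suites_from_comments(comments):
    """Collect the suites named in comment commands, in order, without duplicates."""
    suites = []
    for body in comments:
        for match in COMMAND.finditer(body):
            suite = match.group("suite")
            if suite not in suites:
                suites.append(suite)
    return suites


def qa_requested(labels, comments):
    """True if the label is set or at least one comment contains a command."""
    return QA_LABEL in labels or bool(suites_from_comments(comments))


comments = ["looks good to me\nqa-bot run rados", "qa-bot run fs"]
print(suites_from_comments(comments))
```

Because neither a label nor a comment changes the commits, a rebase or a "how about this?" push does not by itself trigger or invalidate a run.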
I'm not sure how this really helps with the resource issues; for
example with the fs suite we would probably not be able to make a
finer-grained choice about what tests to run based on the diff. The
part about randomly dropping a subset of tests when resources are low
doesn't make sense to me -- I think the bot should either give up or
enqueue itself.
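A minimal sketch of that policy, assuming capacity is counted in parallel job slots (the class and method names are hypothetical): a request either runs in full, waits in a FIFO queue untrimmed, or is rejected when it could never fit.

```python
from collections import deque


class QaScheduler:
    """Run a request whole, queue it untrimmed, or reject it. Never drop tests."""

    def __init__(self, capacity):
        self.capacity = capacity      # total job slots the lab can run at once
        self.free_slots = capacity
        self.pending = deque()        # FIFO queue of (pr, jobs) waiting for slots

    def submit(self, pr, jobs):
        if jobs > self.capacity:
            return "rejected"         # give up: this request can never fit
        if not self.pending and jobs <= self.free_slots:
            self.free_slots -= jobs
            return "running"
        self.pending.append((pr, jobs))
        return "queued"

    def release(self, jobs):
        """Free slots from finished jobs and start queued requests in order."""
        self.free_slots += jobs
        started = []
        while self.pending and self.pending[0][1] <= self.free_slots:
            pr, needed = self.pending.popleft()
            self.free_slots -= needed
            started.append(pr)
        return started


scheduler = QaScheduler(capacity=10)
print(scheduler.submit("pr-1", 8), scheduler.submit("pr-2", 5))
print(scheduler.release(8))
```
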
Cheers,
John
* Re: proposal to run Ceph tests on pull requests
From: Gregory Farnum @ 2015-12-07 14:42 UTC
To: John Spray; +Cc: Loic Dachary, Ceph Development
On Mon, Dec 7, 2015 at 3:29 AM, John Spray <jspray@redhat.com> wrote:
> On Sat, Dec 5, 2015 at 11:49 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Ceph,
>>
>> TL;DR: a ceph-qa-suite bot running on pull requests is sustainable and is an incentive for contributors to use teuthology-openstack independently
>
> A bot for scheduling a named suite on a named PR, and posting the
> results back to the PR is definitely a good thing.
>
> Thinking further about using commit messages to toggle the testing, I
> think that this could get awkward when it's coupled to the human side
> of code review. When someone pushes a "how about this?" modification
> they don't necessarily want to re-run the test suite until the
> reviewer has okayed it, but then that means that they have to push
> again, and the final thing that's tested would be a different SHA1
> (hopefully the same code) than what the human last reviewed. We'll
> also have e.g. rebases, where there tends to be some discretion about
> whether a rebase requires a re-test.
>
> When you were talking about having the suite selected in the qa: tag,
> there was the motivation to put it in the commit message so that it
> would be preserved in backports. However, if the "Needs-qa:" flag is
> just a boolean, then I think it makes more sense to control it with a
> github label or by posting a command in a PR comment.
>
> I'm not sure how this really helps with the resource issues; for
> example with the fs suite we would probably not be able to make a
> finer-grained choice about what tests to run based on the diff. The
> part about randomly dropping a subset of tests when resources are low
> doesn't make sense to me -- I think the bot should either give up or
> enqueue itself.
Yeah. It might eventually be nice for whitelisted people to be able to
trigger a suite run from the github interface, but right now we don't
have anywhere near the resources for an automatic run of each PR, and
I don't see us getting them.
And as John says, for almost every change an automatic selection isn't
going to be able to do any better than picking a full suite to run
(based on what a developer has manually entered into the commit
message).
Let's just stick with what we've got for now and see how it goes with
getting people to do their own testing.
-Greg