On Thu, May 17, 2012 at 09:11:12AM +0100, Matt Fleming wrote:
> On Thu, 2012-05-17 at 14:59 +0800, Fengguang Wu wrote:
> > I happen to be building a kernel test backend for our team and
> > hopefully for the wider kernel community.
> > 
> > What I do is to fetch a number of git trees (including yours) into my
> > test server, iterate through *all* remote branches and test out *every
> > single* commits there.
> 
> That is very cool! I'll keep this in mind in future to make sure that my
> trees don't contain known buggy commits.

Heh. On the other hand, you are welcome to take advantage of this
test infrastructure by creating a random test branch and pushing
commits there. Within hours you should be able to receive (private)
email notifications about possible compile/boot errors.

[ more words on the motivations ]

Of course one should still try to code it right in the first place and
carry out the more targeted in-house tests. However, in general not all
kernel developers can afford to do comprehensive compile and boot tests
on a variety of kconfigs and hardware. From a global view it's also not
economical for everyone to set up and run such test environments.

So I'm setting up this "0day" kernel test service with highlights:

0) zero effort to use
1) 1-hour 24x7 response (aka 0day)
2) commit-by-commit tests
3) covers all branches of a git tree
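
To illustrate points 2) and 3), here is a minimal Python sketch of
walking every commit on every remote branch. It is not the actual 0day
code: the `base` revision (what counts as "already tested"), the
de-duplication of commits shared between branches, and the function
names are all assumptions made for the example. Only the stock
`git for-each-ref` and `git rev-list` commands are used.

```python
import subprocess
from typing import Callable, Iterator, List, Tuple


def git(args: List[str]) -> List[str]:
    """Run git in the current repository and return its output lines."""
    result = subprocess.run(["git", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()


def commits_to_test(base: str,
                    run: Callable[[List[str]], List[str]] = git
                    ) -> Iterator[Tuple[str, str]]:
    """Yield (branch, commit) for each commit on a remote branch that is
    not reachable from `base` (assumed here to be the upstream tree)."""
    seen = set()
    branches = run(["for-each-ref", "--format=%(refname)", "refs/remotes"])
    for branch in branches:
        # Oldest first, so a failure is reported at the commit that
        # introduced it rather than at the branch head.
        for commit in run(["rev-list", "--reverse", f"{base}..{branch}"]):
            if commit not in seen:  # shared commits are tested only once
                seen.add(commit)
                yield branch, commit
```

The `run` parameter exists only so the walk can be exercised without a
real repository; in normal use one would call `commits_to_test("v3.4")`
or similar from inside the fetched tree.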

It is meant to be a fast-responding system because I would really,
really like bugs to be found and fixed ASAP, so that they don't land in
linux-next at all. IMHO linux-next was supposed to be the "integration"
test tree, however the bugs landing in it are mostly non-integration
bugs.

linux-next is over-used. The result is a bad experience when running
linux-next kernels. People run into silly mistakes made by others, which
could have been avoided if the tests were carried out at the very
moment new commits are pushed to the git tree: at a time when the
developer is still "hot" to fix any problems, and when no one else has
been disturbed by, or is even aware of, the problem being fixed.

Look at the attached compile status graphs. The X-axis is the git
commits in a branch and the Y-axis is the kconfigs. Each blue 'C'/'B'
character means a successful compile/boot. Red 'c'/'b' characters
indicate failures. 'r' means the compiled kernel is waiting to be run,
and 'R' means the kernel is being boot tested.
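
For readers who prefer code to a legend, the status characters can be
written down as a small Python table. This is only an illustration:
representing one kconfig's line as a plain string with one character
per commit is an assumption, as are the helper names.

```python
# Meaning of each status character, as described in the legend above.
LEGEND = {
    "C": "compile succeeded",
    "B": "boot succeeded",
    "c": "compile failed",
    "b": "boot failed",
    "r": "compiled, waiting to be boot tested",
    "R": "boot test in progress",
}

FAILED = ("c", "b")


def failures(row: str) -> list:
    """Return (commit_index, meaning) for each failed cell in one row."""
    return [(i, LEGEND[ch]) for i, ch in enumerate(row) if ch in FAILED]


def is_all_red(row: str) -> bool:
    """True if every commit in this kconfig's row failed."""
    return bool(row) and all(ch in FAILED for ch in row)
```

For example, `failures("CCcbB")` picks out the third and fourth commits
as the failing ones for that kconfig.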

As you can see, some kconfigs show a completely red line. And there are
lines that have a long run of red 'c' or 'b' in the middle of the line,
which means either that the bug silently appears and disappears, or
that the bug is found/fixed late and the git branch is an important one
that cannot be rebased. (The mm tree deliberately keeps standalone bug
fixes, which is fine for us users, other than making the graph look
ugly.)

Interestingly, it only takes a few (perhaps trivial) bugs to affect a
lot of commits, kconfigs and users.

Overall, the red portions in the compile status graphs are significant
and that's a common phenomenon across the big subsystem trees. What I'd
like to see is for bugs to be fixed promptly in each developer's topic
or testing branches before they are merged into linux-next (and found
by others there the painful way). I'd like linux-next to be more
pleasant for me and other developers to use. If so, linux-next will be
able to attract more users, resulting in even better quality for itself
and then for Linus' tree. Hopefully the whole user base and code
quality chain can be moved a bit forward in this way.

On the technical side, I'm currently running a 16-core compile server
which can compile-test one commit in 2 minutes on average, covering
about 20 kconfigs. Another 6-core boot server will run 5 kvm instances,
each of which can boot-test a kernel in about 1 minute. Note that only
a few kconfigs will be boot tested for now. Overall the current setup
can test up to 700 commits each day. Hopefully the hardware pool can be
further expanded on demand, provided that it proves to be important for
the community.
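
As a back-of-the-envelope check of those numbers, a short Python sketch
follows. It assumes both servers run around the clock with no idle
time, which is an idealisation, and the function names are made up for
the example.

```python
MINUTES_PER_DAY = 24 * 60


def compiles_per_day(minutes_per_commit: float) -> float:
    """Commits the compile server can get through in one day."""
    return MINUTES_PER_DAY / minutes_per_commit


def boots_per_day(instances: int, minutes_per_boot: float) -> float:
    """Kernels the boot server can boot in one day across all instances."""
    return instances * MINUTES_PER_DAY / minutes_per_boot


compile_capacity = compiles_per_day(2)   # one commit every 2 minutes
boot_capacity = boots_per_day(5, 1)      # 5 kvm instances, 1 minute each

# The slower stage bounds how many commits can be fully tested per day.
daily_limit = min(compile_capacity, boot_capacity)
```

Under these assumptions the compile stage is the bottleneck at 720
commits a day, which is in line with the "up to 700" figure once some
overhead is allowed for.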

There is a lot of room for future improvement:

- more kconfigs are desirable

- it'd be valuable to include some stress tests in the boot test, so
  as to trigger more possible kernel panics

- physical test boxes with different hw profiles will be allocated to
  improve coverage of the boot tests

- micro performance benchmarks are also good candidate features to
  catch performance regressions early, though performance drops are
  far less painful than kernel panics for the other developers

- the test scripts can be improved over time

- more test back-ends could be established (with different focus) as
  long as there is interest and resources: there are never too many
  tests!

Feedback is welcome! Please drop me a mail if you would like me to
add your tree to (or drop it from) the 0day kernel tests :-)

Thanks,
Fengguang