public inbox for linux-kernel@vger.kernel.org
* kernel-testers - finding regressions in the kernel
@ 2008-05-07 19:58 Adrian Bunk
From: Adrian Bunk @ 2008-05-07 19:58 UTC
  To: linux-kernel, kernel-testers, kernelnewbies, kernel-janitors

[ Please send further questions only on the kernel-testers list. ]

One big problem we face in kernel development is regressions: a kernel
worked for the user, and a more recent kernel no longer does.

Another problem is that it is hard for newbies to get into kernel
development. There aren't many useful coding tasks available for them.

Testing the kernel and learning how to debug problems both bring
immediate value to the kernel and can be a good way to get started
with kernel development.


Regressions are bad because they make it harder for users to upgrade the
kernel. As a result, many people keep running old kernels with known
security holes.

Some regressions, such as compile errors, are easy to find, but the real
problems are things like:
- system doesn't boot
- system crashes with a specific workload
- mp3 playback now stutters when copying files in the background
- when doing this or that the system feels worse

Automated tests can find some of these problems, but many of the
problems that affect only specific hardware or the interactive feel of
the computer cannot be found automatically.

A helpful property of regressions is that, once a regression is
reproducible, bisection makes it relatively easy to find the change that
broke things.
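
As a rough illustration of why bisection is cheap, here is a minimal
sketch in plain shell. It does not use git: revisions are simply
numbered, and the is_bad function and the FIRST_BAD value are
hypothetical stand-ins for "build, boot and test this kernel". Real
kernel history is not linear, so git bisect may need a slightly
different number of rounds.

```shell
#!/bin/sh
# Hypothetical stand-in for a real test: revisions are numbered, and
# every revision at or after FIRST_BAD is considered broken.
FIRST_BAD=613

is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# bisect_sim GOOD BAD
#   GOOD is a revision known to work, BAD is a later one known to fail.
#   Prints the first bad revision and how many revisions were tested.
bisect_sim() {
    good=$1
    bad=$2
    tests=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))     # pick the middle revision
        tests=$((tests + 1))
        if is_bad "$mid"; then
            bad=$mid                    # culprit is at or before mid
        else
            good=$mid                   # culprit is after mid
        fi
    done
    echo "first bad: $bad, tests: $tests"
}

# 10000 candidate revisions between the good and the bad one
bisect_sim 0 10000
```

Each round halves the remaining range, so roughly log2(N) tests are
enough for N candidate revisions.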

Usually about two months pass between a kernel developer introducing a
regression and that regression reaching a stable kernel, so it's
important to catch as many of them as early as possible.

The next step will be to also test -next and -mm kernels, to identify
regressions even before they reach Linus' tree.



Mailing list:
http://vger.kernel.org/vger-lists.html#kernel-testers

The list is primarily for people who are starting to test kernels, but if
people with experience in kernel testing and/or kernel debugging want
to join, that's highly appreciated.


Webpage:
http://kernelnewbies.org/KernelTesters


FAQ:


Q:
What will be discussed on the mailing list?

A:
- which kernels to test
- how to test them
- how to turn observed problems into proper bug reports


Q:
What are the prerequisites for participating?

A:
You should already know how to build and install your own kernel
(http://www.kroah.com/lkn/ is a good introduction).

You should also be willing to spend quite some time debugging the
problems you discover.


Q:
How to start?

A:
Run 2.6.25 for a week.
If you have already done this, check which snapshot is listed as the
latest at http://www.kernel.org/ and try that kernel.


Q:
The snapshot worked just fine for some days.

A:
The kernel was less broken than expected. :-)
(Well, this actually should be the normal case...)

Try a more recent snapshot and follow the discussions on the mailing
list about what to test next.


Q:
Is it dangerous to test such kernels?

A:
Most of the time there are no problems.

But your system might crash, and in the worst case it might turn the 
contents of your disks into garbage.

But you have a backup anyway, don't you?

If you need your system to work to meet some real-life deadline,
you obviously shouldn't test any kernels.


Q:
I found a regression!

A:
Great!

Please report the details on the mailing list.


Q:
What about bugs that are not regressions?

A:
Such bugs can also be discussed, but they are not the primary target.



* Re: kernel-testers - finding regressions in the kernel
@ 2008-05-09 16:22 Rafael J. Wysocki
From: Rafael J. Wysocki @ 2008-05-09 16:22 UTC
  To: Ingo Molnar, Peter Zijlstra
  Cc: brot, Adrian Bunk, kernel-testers, LKML, Andrew Morton

Peter, Ingo, can you please have a look at this?

On Friday, 9 May 2008, brot wrote:
> Adrian Bunk wrote:
> > On Thu, May 08, 2008 at 11:54:34PM +0200, brot wrote:
> >> 2008/5/8 Peter Teoh <htmldeveloper@gmail.com>:
> >>
> >>     Recently, I had two different problems:
> >>
> >>     a.   Since somewhere after 2.6.24-rc9, and still now, even with the
> >>     sched-devel git version, keyboard input on my laptop skips some
> >>     characters.   Sometimes a character is SUDDENLY repeated many times
> >>     when I pressed the key just once.   That is on a Dell Inspiron 9300
> >>     (Intel ICH6 chipset, i.e. already > 3 years old, with FC7 as the
> >>     OS).   There is no such problem on desktops (Intel and AMD alike).
> >>     But I don't know what to capture in the kernel when this happens.
> >>
> >>
> >> Same here. I think it depends heavily on the load. I have boinc
> >> running all the time, crunching numbers for worldcommunitygrid. As long
> >> as both cores have 100% load, everything is fine. But as soon as another
> >> task gets started (compiling kde is one task where this happens, at the
> >> same nice level), the keyboard gets sluggish, just like Peter said.
> >> Also, the mp3 player (amarok) sometimes stutters.
> >> I have been getting this since the early 2.6.25-rc kernels. The PC is
> >> an amd64 (Opteron 170, 2G RAM).
> >>
> >> If I can help debug this somehow, I would be more than happy to.
> > 
> > Assuming you can create a workload with which you can blindly
> > distinguish a good kernel from a bad kernel, one of you could
> > bisect the problem (first decide here which of you wants to try it):
> > 
> > After you've verified that 2.6.24 is good and both 2.6.25 and the latest 
> > 2.6.26-rc1-git are bad do:
> > 
> > <--  snip  -->
> > 
> > # install git
> > 
> > # clone Linus' tree:
> > git clone \
> >   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > 
> > # start bisecting:
> > cd linux-2.6
> > git bisect start
> > git bisect bad v2.6.25
> > git bisect good v2.6.24
> > cp /path/to/.config .
> > 
> > # start a round
> > make oldconfig
> > make
> > # install kernel, check whether it's good or bad, then run one of:
> > #   git bisect bad
> > #   git bisect good
> > # start next round
> > 
> > 
> > After about 15 reboots you'll have found the guilty commit
> > ("...  is first bad commit").
> > 
> > 
> > More information on git bisecting:
> >   man git-bisect
> > 
> > <--  snip  -->
> > 
> > You might be busy for several hours, but this has a good chance of 
> > finding the source of your problem.
> > 
> >> Have a nice day,
> >> brot
> > 
> > cu
> > Adrian
> > 
> 
> You were right, I have been busy for the last few hours, but I was able
> to find the source of the problem.
> I am not really sure what to do now, but this is what the last git
> bisect command said:
> 
> brotkastn linux # git bisect bad
> 3fe69747dab906cd6a8523230276a9820d6a514f is first bad commit
> commit 3fe69747dab906cd6a8523230276a9820d6a514f
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Fri Mar 14 20:55:51 2008 +0100
> 
>     sched: min_vruntime fix
> 
>     Current min_vruntime tracking is incorrect and will cause serious
>     problems when we don't run the leftmost task for some reason.
> 
>     min_vruntime does two things; 1) it's used to determine a forward
>     direction when the u64 vruntime wraps, 2) it's used to track the
>     leftmost vruntime to position newly enqueued tasks from.
> 
>     The current logic advances min_vruntime whenever the current task's
>     vruntime advance. Because the current task may pass the leftmost task
>     still waiting we're failing the second goal. This causes new tasks to be
>     placed too far ahead and thus penalizes their runtime.
> 
>     Fix this by making min_vruntime the min_vruntime of the waiting tasks by
>     tracking it in enqueue/dequeue, and compare against current's vruntime
>     to obtain the absolute minimum when placing new tasks.
> 
>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> :040000 040000 a4f0b33e63cdb4e46d16d58a72a9d2dfbce88580 a48374666644a94cb33c0caf70604e2e53e0881e M      kernel
> 
> I hope my testing was useful.
> 
> Have a nice day everyone,

Thanks,
Rafael


* Re: kernel-testers - finding regressions in the kernel
@ 2008-05-09 16:33 brot
From: brot @ 2008-05-09 16:33 UTC
  Cc: linux-kernel

brot wrote:
> 
> I found the specific load. I am running prime torture test, in mode 2:
> 
> 2 = In-place large FFTs (maximum heat and power consumption, some RAM
> tested)
> 
> 1 thread per cpu.
> 
> Prime can be downloaded from:
> http://www.mersenneforum.org/showthread.php?t=9779
> 
> While that test is running, I compile gtkmm-2.12.7 with -j3. I
> remembered that I really had problems typing the last time I had to
> compile it, and it looks like I was right. In 2.6.25.1 I had some skips
> in amarok while playing an mp3 (just normal short skips, but clearly
> noticeable). My typing in kvirc looked like this:
> 
> 
> [11:07:41] <!brot> und nu laggds keyboard auch, aber nur
> manchmaaaaaaaaaaaaaaal
> (..)
> [11:09:18] <!brot> ichh  ababer auch priiiiiiiiiiiiime laaufen
> 
> That happens during normal typing: I see that kvirc has "stuck", and then
> I can type normally again. The key pressed during the lag is repeated, as
> you can see above, and the other keys I typed come out normally.
> 
> In 2.6.24.3 there is no such problem, so i am starting a bisect test now.
> 
> Have a nice day everyone.
> brot

The above is just in case you want to know how I distinguished good from
bad kernels, and how to reproduce those lags.

Thanks,
brot


