Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?

public inbox for linux-kernel@vger.kernel.org

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
       [not found] <200507140912.22532.mgross@linux.intel.com.suse.lists.linux.kernel>
@ 2005-07-15  0:38 ` Andi Kleen
  2005-07-15  1:45   ` Jesper Juhl
  2005-07-15  2:09   ` Parag Warudkar
  0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2005-07-15  0:38 UTC (permalink / raw)
  To: Mark Gross; +Cc: linux-kernel

Mark Gross <mgross@linux.intel.com> writes:
> 
> The problem is the process, not the code.
> * The issue is too much ad hoc code flux without enough disciplined/formal 
> regression testing and review.  

It's basically impossible to regression test swsusp except to release it. 
Its success or failure depends on exactly the driver combination/platform/BIOS
version etc.  e.g. all drivers have to cooperate and the particular
bugs in your BIOS need to be worked around etc. Since that is quite fragile
regressions are common.

However in some other cases I agree some more regression testing
before release would be nice. But that's not how Linux works.  Linux
does regression testing after release.

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  0:38 ` Why is 2.6.12.2 less stable on my laptop than 2.6.10? Andi Kleen
@ 2005-07-15  1:45   ` Jesper Juhl
  2005-07-15  2:02     ` Chris Friesen
  2005-07-15  2:09     ` Dave Jones
  2005-07-15  2:09   ` Parag Warudkar
  1 sibling, 2 replies; 21+ messages in thread
From: Jesper Juhl @ 2005-07-15  1:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Gross, linux-kernel

On 15 Jul 2005 02:38:58 +0200, Andi Kleen <ak@suse.de> wrote:
> Mark Gross <mgross@linux.intel.com> writes:
> >
> > The problem is the process, not the code.
> > * The issue is too much ad hoc code flux without enough disciplined/formal
> > regression testing and review.
> 
> It's basically impossible to regression test swsusp except to release it.
> Its success or failure depends on exactly the driver combination/platform/BIOS
> version etc.  e.g. all drivers have to cooperate and the particular
> bugs in your BIOS need to be worked around etc. Since that is quite fragile
> regressions are common.
> 
> However in some other cases I agree some more regression testing
> before release would be nice. But that's not how Linux works.  Linux
> does regression testing after release.
> 
And who says that couldn't change?

In my opinion it would be nice if Linus/Andrew had some basic
regression tests they could run on kernels before releasing them.
There are plenty of "Linux test" projects out there that could be
borrowed from to create some sort of regression test harness for them
to run prior to release.   It would be super nice if they had a suite
of tests to run and could then drop a mail on lkml saying 2.6.x is
almost ready to go, but it currently fails regression tests #x, #y &
#z, we need to get those fixed first before we can release this - and
then every time a bug was found that could reasonably be tested for in
the future it would be added to the regression test suite...  That
would lead to more consistent quality, I believe.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  1:45   ` Jesper Juhl
@ 2005-07-15  2:02     ` Chris Friesen
  2005-07-15  2:06       ` Jesper Juhl
  2005-07-15  2:09     ` Dave Jones
  1 sibling, 1 reply; 21+ messages in thread
From: Chris Friesen @ 2005-07-15  2:02 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Andi Kleen, Mark Gross, linux-kernel

Jesper Juhl wrote:

> In my opinion it would be nice if Linus/Andrew had some basic
> regression tests they could run on kernels before releasing them.

How do you regression test behaviour on broken hardware (and BIOSes) 
that you don't have?

Chris



* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:02     ` Chris Friesen
@ 2005-07-15  2:06       ` Jesper Juhl
  2005-07-15  2:09         ` Andi Kleen
  2005-07-15  2:16         ` Dave Airlie
  0 siblings, 2 replies; 21+ messages in thread
From: Jesper Juhl @ 2005-07-15  2:06 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Andi Kleen, Mark Gross, linux-kernel

On 7/15/05, Chris Friesen <cfriesen@nortel.com> wrote:
> Jesper Juhl wrote:
> 
> > In my opinion it would be nice if Linus/Andrew had some basic
> > regression tests they could run on kernels before releasing them.
> 
> How do you regression test behaviour on broken hardware (and BIOSes)
> that you don't have?
> 
That, of course, you cannot do. But, you can regression test a lot of
other things, and having a default test suite that is constantly being
added to and always being run before releases (that test hardware
agnostic stuff) could help cut down on the number of regressions in
new releases.
You can't test everything this way, nor should you, but you can test
many things, and adding a bit of formal testing to the release
procedure wouldn't be a bad thing IMO.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:06       ` Jesper Juhl
@ 2005-07-15  2:09         ` Andi Kleen
  2005-07-15 21:33           ` Mark Gross
  2005-07-15  2:16         ` Dave Airlie
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-07-15  2:09 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Chris Friesen, Andi Kleen, Mark Gross, linux-kernel

> You can't test everything this way, nor should you, but you can test
> many things, and adding a bit of formal testing to the release
> procedure wouldn't be a bad thing IMO.

In the Linux model that's left to the distributions. In fact, doing it properly
takes months. You wouldn't want to wait months for a new mainline kernel.

Formal testing is not really compatible with "release early, release often".

You could do things like "run LTP first", but in practice LTP rarely finds
bugs.

-Andi

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:09         ` Andi Kleen
@ 2005-07-15 21:33           ` Mark Gross
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Gross @ 2005-07-15 21:33 UTC (permalink / raw)
  To: Andi Kleen, Jesper Juhl; +Cc: Chris Friesen, Andi Kleen, linux-kernel

On Thursday 14 July 2005 19:09, Andi Kleen wrote:
> > You can't test everything this way, nor should you, but you can test
> > many things, and adding a bit of formal testing to the release
> > procedure wouldn't be a bad thing IMO.
>
> In the linux model that's left to the distributions. In fact doing it
> properly takes months. You wouldn't want to wait months for a new mainline
> kernel.
>
> Formal testing is not really compatible with "release early, release often"
>

This is true.  I think we are seeing the effects of releasing into a "stable" 
tree more often than we should.  Early and Often makes sense for developing 
new features, but should they be pushed into a stable release so often?

> You could do things like "run LTP first", but in practice LTP rarely finds
> bugs.
>
> -Andi

-- 
--mgross
BTW: This may or may not be the opinion of my employer, more likely not.  


* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:06       ` Jesper Juhl
  2005-07-15  2:09         ` Andi Kleen
@ 2005-07-15  2:16         ` Dave Airlie
  2005-07-15 21:39           ` Mark Gross
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Airlie @ 2005-07-15  2:16 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Chris Friesen, Andi Kleen, Mark Gross, linux-kernel

> That, of course, you cannot do. But, you can regression test a lot of
> other things, and having a default test suite that is constantly being
> added to and always being run before releases (that test hardware
> agnostic stuff) could help cut down on the number of regressions in
> new releases.
> You can't test everything this way, nor should you, but you can test
> many things, and adding a bit of formal testing to the release
> procedure wouldn't be a bad thing IMO.

But if you read people's complaints about regressions they are nearly
always to do with hardware that used to work not working any more ..
alps touchpads, sound cards, software suspend.. so these people still
gain nothing by you regression testing anything so you still get as
many reports.. the -rc series is meant to provide the testing for the
release so nothing really big gets through (like can't boot from IDE
anymore or something like that)....

Dave.

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:16         ` Dave Airlie
@ 2005-07-15 21:39           ` Mark Gross
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Gross @ 2005-07-15 21:39 UTC (permalink / raw)
  To: Dave Airlie, Jesper Juhl; +Cc: Chris Friesen, Andi Kleen, linux-kernel

On Thursday 14 July 2005 19:16, Dave Airlie wrote:
> > That, of course, you cannot do. But, you can regression test a lot of
> > other things, and having a default test suite that is constantly being
> > added to and always being run before releases (that test hardware
> > agnostic stuff) could help cut down on the number of regressions in
> > new releases.
> > You can't test everything this way, nor should you, but you can test
> > many things, and adding a bit of formal testing to the release
> > procedure wouldn't be a bad thing IMO.
>
> But if you read people's complaints about regressions they are nearly
> always to do with hardware that used to work not working any more ..
> alps touchpads, sound cards, software suspend.. so these people still
> gain nothing by you regression testing anything so you still get as
> many reports.. the -rc series is meant to provide the testing for the
> release so nothing really big gets through (like can't boot from IDE
> anymore or something like that)....
>

I've seen large labs of lots of different systems used for dedicated testing 
of products I've worked on in the past.  The validation folks held the keys 
to the build and if a change got in that broke on an important OEM's 
hardware, then everything stops until that change is either fixed or backed 
out.

It ain't cheap.  In open source we are attempting to simulate this, but we 
don't simulate the control that the validation leads have.

> Dave.

-- 
--mgross
BTW: This may or may not be the opinion of my employer, more likely not.  


* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  1:45   ` Jesper Juhl
  2005-07-15  2:02     ` Chris Friesen
@ 2005-07-15  2:09     ` Dave Jones
  2005-07-15 21:47       ` Mark Gross
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Jones @ 2005-07-15  2:09 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Andi Kleen, Mark Gross, linux-kernel

On Fri, Jul 15, 2005 at 03:45:28AM +0200, Jesper Juhl wrote:
 
 > > > The problem is the process, not the code.
 > > > * The issue is too much ad hoc code flux without enough disciplined/formal
 > > > regression testing and review.
 > > 
 > > It's basically impossible to regression test swsusp except to release it.
 > > Its success or failure depends on exactly the driver combination/platform/BIOS
 > > version etc.  e.g. all drivers have to cooperate and the particular
 > > bugs in your BIOS need to be worked around etc. Since that is quite fragile
 > > regressions are common.
 > > 
 > > However in some other cases I agree some more regression testing
 > > before release would be nice. But that's not how Linux works.  Linux
 > > does regression testing after release.
 > > 
 > And who says that couldn't change?
 > 
 > In my opinion it would be nice if Linus/Andrew had some basic
 > regression tests they could run on kernels before releasing them.

The problem is that this wouldn't cover the more painful problems
such as hardware specific problems.

As Fedora kernel maintainer, I frequently get asked why people's
sound cards stopped working when they did an update, or why
their system no longer boots, usually followed by a
"wasn't this update tested before it was released?"

The bulk of all the regressions I see reported every time
I put out a kernel update rpm that rebases to a newer
upstream release are in drivers. Those just aren't going
to be caught by folks that don't have the hardware.

The only way to cover as many of the hardware combinations
out there as possible is by releasing test kernels. (Updates-testing
repository for Fedora users, or -rc kernels in Linus' case).
If users won't/don't test those 'test' releases, we're
going to regress when the final release happens, there's
no two ways about it.

		Dave


* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:09     ` Dave Jones
@ 2005-07-15 21:47       ` Mark Gross
  2005-07-15 22:19         ` Dave Jones
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Mark Gross @ 2005-07-15 21:47 UTC (permalink / raw)
  To: Dave Jones, Jesper Juhl; +Cc: Andi Kleen, linux-kernel

On Thursday 14 July 2005 19:09, Dave Jones wrote:
> On Fri, Jul 15, 2005 at 03:45:28AM +0200, Jesper Juhl wrote:
>  > > > The problem is the process, not the code.
>  > > > * The issue is too much ad hoc code flux without enough
>  > > > disciplined/formal regression testing and review.
>  > >
>  > > It's basically impossible to regression test swsusp except to release
>  > > it. Its success or failure depends on exactly the driver
>  > > combination/platform/BIOS version etc.  e.g. all drivers have to
>  > > cooperate and the particular bugs in your BIOS need to be worked
>  > > around etc. Since that is quite fragile regressions are common.
>  > >
>  > > However in some other cases I agree some more regression testing
>  > > before release would be nice. But that's not how Linux works.  Linux
>  > > does regression testing after release.
>  >
>  > And who says that couldn't change?
>  >
>  > In my opinion it would be nice if Linus/Andrew had some basic
>  > regression tests they could run on kernels before releasing them.
>
> The problem is that this wouldn't cover the more painful problems
> such as hardware specific problems.
>
> As Fedora kernel maintainer, I frequently get asked why people's
> sound cards stopped working when they did an update, or why
> their system no longer boots, usually followed by a
> "wasn't this update tested before it was released?"
>
> The bulk of all the regressions I see reported every time
> I put out a kernel update rpm that rebases to a newer
> upstream release are in drivers. Those just aren't going
> to be caught by folks that don't have the hardware.

The problem is developers making driver changes without having the resources 
to test them on enough of the affected hardware; they probably shouldn't be 
making changes they cannot realistically test.

What would be wrong in expecting the folks making the driver changes to have some 
story on how they are validating that their changes don't break existing working 
hardware?  I could probably be accomplished in open source with subsystem 
testing volunteers.

>
> The only way to cover as many of the hardware combinations
> out there as possible is by releasing test kernels. (Updates-testing
> repository for Fedora users, or -rc kernels in Linus' case).
> If users won't/don't test those 'test' releases, we're
> going to regress when the final release happens, there's
> no two ways about it.

You can't blame the users!  Don't fall into that trap.  It's not productive.

>
> 		Dave

-- 
--mgross
BTW: This may or may not be the opinion of my employer, more likely not.  


* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15 21:47       ` Mark Gross
@ 2005-07-15 22:19         ` Dave Jones
  2005-07-15 22:25         ` David Lang
  2005-07-15 23:14         ` Rik van Riel
  2 siblings, 0 replies; 21+ messages in thread
From: Dave Jones @ 2005-07-15 22:19 UTC (permalink / raw)
  To: Mark Gross; +Cc: Jesper Juhl, Andi Kleen, linux-kernel

On Fri, Jul 15, 2005 at 02:47:46PM -0700, Mark Gross wrote:

 > The problem is developers making driver changes without having the resources 
 > to test them on enough of the affected hardware; they probably shouldn't be 
 > making changes they cannot realistically test.

Such is life. The situation arises quite often where fixing a bug
for one person breaks it for another. The lack of hardware to test on
isn't the fault of the person making the change, nor the person requesting
the change. The problem is that the person it breaks for doesn't test
testing kernels, so the problem is only found out about when it's too late.

The agpgart driver for example supports around 50-60 different chipsets.
I don't have a tenth of the hardware that it supports at my disposal,
yet when I get patches fixing some problem for someone, or adding support
for yet another variant, I'm not going to go out and find the variants
I don't have.  By your metric I shouldn't apply that change.

That's not how things work.

 > What would be wrong in expecting the folks making the driver changes to have some 
 > story on how they are validating that their changes don't break existing working 
 > hardware?

It's impractical given the plethora of hardware combinations out there.

 > I could probably be accomplished in open source with subsystem 
 > testing volunteers.

People tend not to test things marked 'test kernels' or 'rc kernels'.
They prefer to shout loudly when the final release happens, and
blame it on 'the new kernel development model sucking'.

 > > The only way to cover as many of the hardware combinations
 > > out there as possible is by releasing test kernels. (Updates-testing
 > > repository for Fedora users, or -rc kernels in Linus' case).
 > > If users won't/don't test those 'test' releases, we're
 > > going to regress when the final release happens, there's
 > > no two ways about it.
 > 
 > You can't blame the users!  Don't fall into that trap.  It's not productive.

You're missing my point. The bits are out there for people to
test with.  We can't help people who won't help themselves,
and they shouldn't be at all surprised to find things breaking
if they choose to not take part in testing.

		Dave

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15 21:47       ` Mark Gross
  2005-07-15 22:19         ` Dave Jones
@ 2005-07-15 22:25         ` David Lang
  2005-07-15 23:14         ` Rik van Riel
  2 siblings, 0 replies; 21+ messages in thread
From: David Lang @ 2005-07-15 22:25 UTC (permalink / raw)
  To: Mark Gross; +Cc: Dave Jones, Jesper Juhl, Andi Kleen, linux-kernel

On Fri, 15 Jul 2005, Mark Gross wrote:

> On Thursday 14 July 2005 19:09, Dave Jones wrote:
>> On Fri, Jul 15, 2005 at 03:45:28AM +0200, Jesper Juhl wrote:
>> >>> The problem is the process, not the code.
>> >>> * The issue is too much ad hoc code flux without enough
>> >>> disciplined/formal regression testing and review.
>> >>
>> >> It's basically impossible to regression test swsusp except to release
>> >> it. Its success or failure depends on exactly the driver
>> >> combination/platform/BIOS version etc.  e.g. all drivers have to
>> >> cooperate and the particular bugs in your BIOS need to be worked
>> >> around etc. Since that is quite fragile regressions are common.
>> >>
>> >> However in some other cases I agree some more regression testing
>> >> before release would be nice. But that's not how Linux works.  Linux
>> >> does regression testing after release.
>> >
>> > And who says that couldn't change?
>> >
>> > In my opinion it would be nice if Linus/Andrew had some basic
>> > regression tests they could run on kernels before releasing them.
>>
>> The problem is that this wouldn't cover the more painful problems
>> such as hardware specific problems.
>>
>> As Fedora kernel maintainer, I frequently get asked why people's
>> sound cards stopped working when they did an update, or why
>> their system no longer boots, usually followed by a
>> "wasn't this update tested before it was released?"
>>
>> The bulk of all the regressions I see reported every time
>> I put out a kernel update rpm that rebases to a newer
>> upstream release are in drivers. Those just aren't going
>> to be caught by folks that don't have the hardware.
>
> The problem is developers making driver changes without having the resources
> to test them on enough of the affected hardware; they probably shouldn't be
> making changes they cannot realistically test.
>
> What would be wrong in expecting the folks making the driver changes to have some
> story on how they are validating that their changes don't break existing working
> hardware?  I could probably be accomplished in open source with subsystem
> testing volunteers.

In that case you will have a lot of drivers that won't work because the 
rest of the kernel has changed and they haven't been changed to match.

Do you have the resources to test a few hundred network cards, video 
cards, etc.? If you do, great, I hope you can help out; if not, why should you 
require other kernel folks to have resources that you don't have?

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15 21:47       ` Mark Gross
  2005-07-15 22:19         ` Dave Jones
  2005-07-15 22:25         ` David Lang
@ 2005-07-15 23:14         ` Rik van Riel
  2005-07-18 21:14           ` Mark Gross
  2 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2005-07-15 23:14 UTC (permalink / raw)
  To: Mark Gross; +Cc: Dave Jones, Jesper Juhl, Andi Kleen, linux-kernel

On Fri, 15 Jul 2005, Mark Gross wrote:

> What would be wrong in expecting the folks making the driver changes 
> to have some story on how they are validating that their changes don't break 
> existing working hardware?  I could probably be accomplished in open 
> source with subsystem testing volunteers.

Are you volunteering ?

-- 
The Theory of Escalating Commitment: "The cost of continuing mistakes is
borne by others, while the cost of admitting mistakes is borne by yourself."
  -- Joseph Stiglitz, Nobel Laureate in Economics

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15 23:14         ` Rik van Riel
@ 2005-07-18 21:14           ` Mark Gross
  2005-07-19 10:12             ` Paolo Ciarrocchi
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Gross @ 2005-07-18 21:14 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Dave Jones, Jesper Juhl, Andi Kleen, linux-kernel

On Friday 15 July 2005 16:14, Rik van Riel wrote:
> On Fri, 15 Jul 2005, Mark Gross wrote:
> > What would be wrong in expecting the folks making the driver changes
> > to have some story on how they are validating that their changes don't break
> > existing working hardware?  I could probably be accomplished in open
> > source with subsystem testing volunteers.
>
> Are you volunteering ?

I am not volunteering.  That last sentence was meant to say "It could 
probably..."

I'm just poking at a process change that would include a more formal 
validation / testing phase as part of getting change into the stable tree.  I 
don't have any silver bullets.

-- 
--mgross
BTW: This may or may not be the opinion of my employer, more likely not.  


* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-18 21:14           ` Mark Gross
@ 2005-07-19 10:12             ` Paolo Ciarrocchi
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Ciarrocchi @ 2005-07-19 10:12 UTC (permalink / raw)
  To: Mark Gross
  Cc: Rik van Riel, Dave Jones, Jesper Juhl, Andi Kleen, linux-kernel

2005/7/18, Mark Gross <mgross@linux.intel.com>:
> On Friday 15 July 2005 16:14, Rik van Riel wrote:
> > On Fri, 15 Jul 2005, Mark Gross wrote:
> > > What would be wrong in expecting the folks making the driver changes
> > > to have some story on how they are validating that their changes don't break
> > > existing working hardware?  I could probably be accomplished in open
> > > source with subsystem testing volunteers.
> >
> > Are you volunteering ?
> 
> I am not volunteering.  That last sentence was meant to say "It could
> probably..."
> 
> I'm just poking at a process change that would include a more formal
> validation / testing phase as part of getting change into the stable tree.  I
> don't have any silver bullets.

I totally agree with you, but the real problem is *how* to do that.
Do you have any suggestions?

-- 
Paolo

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  0:38 ` Why is 2.6.12.2 less stable on my laptop than 2.6.10? Andi Kleen
  2005-07-15  1:45   ` Jesper Juhl
@ 2005-07-15  2:09   ` Parag Warudkar
  2005-07-15  2:14     ` Andi Kleen
  2005-07-15 13:32     ` Alan Cox
  1 sibling, 2 replies; 21+ messages in thread
From: Parag Warudkar @ 2005-07-15  2:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Gross, linux-kernel

On Thursday 14 July 2005 20:38, Andi Kleen wrote:
> It's basically impossible to regression test swsusp except to release it.
> Its success or failure depends on exactly the driver
> combination/platform/BIOS version etc.  e.g. all drivers have to cooperate
> and the particular bugs in your BIOS need to be worked around etc. Since
> that is quite fragile regressions are common.

I have always wondered how Windows got it right circa 1995 - version after 
version, several different kinds of hardware, and it always works reliably. 
I have been using Linux since 1997 and not a single time have I succeeded in getting 
it to suspend and resume reliably. 

Is it too uninteresting a subject to warrant serious effort, or is a 
lot of hardware documentation missing, or does the driver model and OS 
design in general make it impossible to get suspend / resume right?

Parag

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:09   ` Parag Warudkar
@ 2005-07-15  2:14     ` Andi Kleen
  2005-07-15 13:32     ` Alan Cox
  1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-07-15  2:14 UTC (permalink / raw)
  To: Parag Warudkar; +Cc: Andi Kleen, Mark Gross, linux-kernel

On Thu, Jul 14, 2005 at 10:09:11PM -0400, Parag Warudkar wrote:
> I have always wondered how Windows got it right circa 1995 - version after 
> version, several different kinds of hardware, and it always works reliably. 
> I have been using Linux since 1997 and not a single time have I succeeded in getting 
> it to suspend and resume reliably. 

What happens with Windows is that the laptop vendor takes the
frozen Windows version available at the time the machine hits the market 
and then tweaks the BIOS and the drivers until everything runs and then
releases the machine.

But if you use newer (or older) Windows releases or even service packs or different
drivers on that machine you end up with exactly the same problem.

> Is it too uninteresting a subject to warrant serious effort, or is a 
> lot of hardware documentation missing, or does the driver model and OS 
> design in general make it impossible to get suspend / resume right?

I think you underestimate the complexity of the problem. Suspend/resume
is a fragile cooperation of many, many different components in the
kernel/firmware/hardware, and all of them have to work flawlessly together.
That's hard.

-Andi

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-15  2:09   ` Parag Warudkar
  2005-07-15  2:14     ` Andi Kleen
@ 2005-07-15 13:32     ` Alan Cox
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Cox @ 2005-07-15 13:32 UTC (permalink / raw)
  To: Parag Warudkar; +Cc: Andi Kleen, Mark Gross, Linux Kernel Mailing List

> I have always wondered how Windows got it right circa 1995 - version after 
> version, several different kinds of hardware, and it always works reliably. 
> I have been using Linux since 1997 and not a single time have I succeeded in getting 
> it to suspend and resume reliably. 

Because Windows at the time used the APM BIOS and the APM BIOS vendors
made sure Windows worked and generally didn't care about more. When the
vendor got it right it worked; indeed, Linux back to 1.x will suspend to
disk nicely on an old IBM ThinkPad.



* Why is 2.6.12.2 less stable on my laptop than 2.6.10?
@ 2005-07-14 16:12 Mark Gross
  2005-07-14 23:55 ` Andrew Morton
  2005-07-15  8:45 ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Mark Gross @ 2005-07-14 16:12 UTC (permalink / raw)
  To: linux-kernel

I know this is a broken record, but the development process within the LKML 
isn't resulting in more stable and better code.  Some process change could be 
a good thing.

Why does my alps mouse pad have to stop working every time I test a new 
"STABLE" kernel?  

Why does swsusp have to start hanging on shutdown and startup randomly?

I rolled back my home box to 2.6.10 because I want some stability (2.6.10 
has problems with swsusp from time to time, but it's livable for me, for now.)

The process is broken if on a stable series we cannot at least make sure 
obvious regressions don't smack users between the eyes.

As I see it, the problem is that too much code flux is coming from people without 
the resources, or discipline, to effectively regression test for side effects 
of their changes.  

I know there is a lot of back patting on how well the dot-dot stability 
release process is working, but that process is a solution for a different 
and simpler problem and we still have breakage.

Stability and deliberate feature design and development along with disciplined 
regression testing and validation is what is needed.  Why can't there be more 
targeted and planned development?  Are we in a race to see how many changes 
we can push into a "stable" tree?

Shouldn't changes be regression tested, formally, before they are allowed to go 
into a tree? 

Why can't I expect swsusp to work better and more reliably from release to 
release?  

I know there is a point where software goes from fun to work, but without more 
deliberate and disciplined WORK I see the 2.6 tree spinning out of control.

The problem is the process, not the code.
* The issue is too much ad hoc code flux without enough disciplined/formal 
regression testing and review.  
* Small regressions are accepted and expected to be caught later.
* Validation before changes are accepted is ad hoc.

Some possible things that could help:

* Adopt a no-regressions-allowed policy: everything stops until any
identified regression (in performance, functionality or stability) is fixed
or the changes are all rolled back.  This works really well if, in addition,
organized pre-flight testing is done before calling a new version number.
You simply cannot rely on ad hoc regression testing and reporting.  It's got
too much latency.
* Assign validation folks whom developers need to appease before changes
are accepted into the tree.
* Have all changes to the kernel submitted not by the developers, but by
designated subsystem validation owners.  If too many bugs continue to sneak
by, address the problem by adding validation help to that subsystem or get a
new owner for the problem subsystem.  (<-- I like this one a lot.)
* Start 2.7.
* All of the above (<-- this one is good too)

--mgross
BTW: This may or may not be the opinion of my employer, more likely not.

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-14 16:12 Mark Gross
@ 2005-07-14 23:55 ` Andrew Morton
  2005-07-15  8:45 ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2005-07-14 23:55 UTC (permalink / raw)
  To: Mark Gross; +Cc: linux-kernel

Mark Gross <mgross@linux.intel.com> wrote:
>
> I know this is a broken record, but the development process within the LKML 
>  isn't resulting in more stable and better code.  Some process change could be 
>  a good thing.

We rely upon people (such as mgross@linux.intel.com!) to send bug reports.

>  Why does my alps mouse pad have to stop working every time I test a new 
>  "STABLE" kernel?  

The alps driver is always broken.  Seems to be a feature.

Please test 2.6.13-rc3 and if it also fails send a comprehensive bug report
to Dmitry Torokhov <dtor_core@ameritech.net> and Vojtech Pavlik
<vojtech@suse.cz>

>  Why does swsusp have to start hanging on shutdown and startup randomly?

swsusp also is a problematic feature.  You appear to have chosen two of the
very most problematic parts of the kernel (you missed ACPI) and then
generalised them to the whole.  That isn't valid.

Please test 2.6.13-rc3 and if it also fails send a comprehensive bug report
to Pavel Machek <pavel@ucw.cz>

* Re: Why is 2.6.12.2 less stable on my laptop than 2.6.10?
  2005-07-14 16:12 Mark Gross
  2005-07-14 23:55 ` Andrew Morton
@ 2005-07-15  8:45 ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2005-07-15  8:45 UTC (permalink / raw)
  To: Mark Gross; +Cc: linux-kernel

Hi!

> Why can't I expect swsusp to work better and more reliably from release to
> release?

Patches welcome. Or employ someone to do swsusp development for you.

> Some possible things that could help:
> 
> * Adopt a no-regressions-allowed policy: everything stops until any
> identified regression (in performance, functionality or stability) is fixed
> or the changes are all rolled back.  This works really well if, in addition,
> organized pre-flight testing is done before calling a new version number.
> You simply cannot rely on ad hoc regression testing and reporting.  It's got
> too much latency.

This would also mean "no development at all".

> * Assign validation folks whom developers need to appease before changes
> are accepted into the tree.

So... get me someone to test swsusp in each -rc and -mm
release... that would help. If you can't provide the manpower, why are
you whining?

									Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

Thread overview: 21+ messages
     [not found] <200507140912.22532.mgross@linux.intel.com.suse.lists.linux.kernel>
2005-07-15  0:38 ` Why is 2.6.12.2 less stable on my laptop than 2.6.10? Andi Kleen
2005-07-15  1:45   ` Jesper Juhl
2005-07-15  2:02     ` Chris Friesen
2005-07-15  2:06       ` Jesper Juhl
2005-07-15  2:09         ` Andi Kleen
2005-07-15 21:33           ` Mark Gross
2005-07-15  2:16         ` Dave Airlie
2005-07-15 21:39           ` Mark Gross
2005-07-15  2:09     ` Dave Jones
2005-07-15 21:47       ` Mark Gross
2005-07-15 22:19         ` Dave Jones
2005-07-15 22:25         ` David Lang
2005-07-15 23:14         ` Rik van Riel
2005-07-18 21:14           ` Mark Gross
2005-07-19 10:12             ` Paolo Ciarrocchi
2005-07-15  2:09   ` Parag Warudkar
2005-07-15  2:14     ` Andi Kleen
2005-07-15 13:32     ` Alan Cox
2005-07-14 16:12 Mark Gross
2005-07-14 23:55 ` Andrew Morton
2005-07-15  8:45 ` Pavel Machek
