* stable? quality assurance?
@ 2010-07-11 7:18 Martin Steigerwald
2010-07-11 8:39 ` Eric Dumazet
` (4 more replies)
0 siblings, 5 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 7:18 UTC (permalink / raw)
To: linux-kernel
Hi!
2.6.34 was a disaster for me: bug #15969 - patch was available before
2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
as most important two complete lockups - well maybe just X.org and radeon
KMS, I didn't start my second laptop to SSH into the locked up one - on my
ThinkPad T42. I fixed the first one with the patch, but after the lockups I
just downgraded to 2.6.33 again.
I still actually *use* my machines for something else than hunting patches
for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
(accentuation from me). I know of the argument that one should use a
distro kernel for machines that are for production use. But frankly, does
that justify to deliver in advance known crap to the distributors? What
impact do partly grave bugs reported on bugzilla have on the release
decision?
And how about people who have their reasons - mine is TuxOnIce - to
compile their own kernels?
Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
freezes as well. So far so good.
Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
still have some problems like the hang on hibernation reported in
hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
on this mailing list just a moment ago. But then 2.6.33 did hang with
TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
2.6.34 did not hang with it anymore which was a reason for me to try
2.6.34 earlier.
I am quite a bit worried about the quality of the recent kernels. Some
iterations earlier I just compiled them, partly even rc-ones which I do
not expect to be stable, and they just worked. But in the recent times .0,
partly even .1 or .2 versions haven't been stable for me quite some times
already and thus they better not be advertised as such on kernel.org I
think. I am willing to risk some testing and do bug reports, but these are
still production machines, I do not have any spare test machines, and
there needs to be some balance, i.e. the kernels should basically work.
Thus I for sure will be more reluctant to upgrade in the future.
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 7:18 stable? quality assurance? Martin Steigerwald
@ 2010-07-11 8:39 ` Eric Dumazet
2010-07-11 14:22 ` Martin Steigerwald
` (2 more replies)
2010-07-11 13:16 ` Ted Ts'o
` (3 subsequent siblings)
4 siblings, 3 replies; 72+ messages in thread
From: Eric Dumazet @ 2010-07-11 8:39 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel
On Sunday, 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> Hi!
>
> 2.6.34 was a disaster for me: bug #15969 - patch was available before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations earlier I just compiled them, partly even rc-ones which I do
> not expect to be stable, and they just worked. But in the recent times .0,
> partly even .1 or .2 versions haven't been stable for me quite some times
> already and thus they better not be advertised as such on kernel.org I
> think. I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I for sure will be more reluctant to upgrade in the future.
>
> Ciao,

Anybody running latest kernel on a production machine is living
dangerously. Don't you already know that?

When 2.6.X is released, everybody knows it contains at least 100 bugs.

It was true for all previous values of X, it will be true for all
future values.

If you want to be safer, use a one year old kernel, with all stable
patches in.

Something like 2.6.32.16: it's probably more stable than all 2.6.X
kernels.

If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
contains numerous bug fixes.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 8:39 ` Eric Dumazet
@ 2010-07-11 14:22 ` Martin Steigerwald
2010-07-11 14:52 ` Martin Steigerwald
2010-07-11 15:58 ` William Pitcock
2010-07-11 17:04 ` Heinz Diehl
2 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:22 UTC (permalink / raw)
To: linux-kernel
On Sunday, 11 July 2010, Eric Dumazet wrote:
> On Sunday, 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> > Hi!

Hi Eric,

> > 2.6.34 was a disaster for me: bug #15969 - patch was available before
> > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as
> > well as most important two complete lockups - well maybe just X.org
> > and radeon KMS, I didn't start my second laptop to SSH into the
> > locked up one - on my ThinkPad T42. I fixed the first one with the
> > patch, but after the lockups I just downgraded to 2.6.33 again.
> >
> > I still actually *use* my machines for something else than hunting
> > patches for kernel bugs and on kernel.org it is written "Latest
> > *Stable* Kernel" (accentuation from me). I know of the argument that
[...]
> > advertised as such on kernel.org I think. I am willing to risk some
> > testing and do bug reports, but these are still production machines,
> > I do not have any spare test machines, and there needs to be some
> > balance, i.e. the kernels should basically work. Thus I for sure
> > will be more reluctant to upgrade in the future.
> >
> > Ciao,
>
> Anybody running latest kernel on a production machine is living
> dangerously. Don't you already know that?

Yes, and I indicated it above. But in my - naturally rather subjective, I
admit - perception, the balance between stable and unstable from about 1 or
2 years ago has been lost. In my personal experience it has gotten much
worse lately. To the extent that I skipped some major kernel versions
completely. For example 2.6.30. And it's not servers - these use distro
kernels.

> When 2.6.X is released, everybody knows it contains at least 100 bugs.

Then why is it still labeled "stable" on kernel.org? It is not. It is at
most beta quality software. It's no more stable than KDE 4.0 was, but at
least they mentioned that in the release notes.

> It was true for all previous values of X, it will be true for all
> future values.
>
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
>
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.
>
> If 2.6.33 runs OK on your machine, you are lucky, since 2.6.33.6
> contains numerous bug fixes.

Actually it was 2.6.33.1 with userspace software suspend and it had pretty
good uptimes above 20 days - only interrupted by installing 2.6.34.

Well then, if everybody else takes this for granted, I will just replace
that "stable" on kernel.org by "beta quality" in my mind - from my
perception it does not even have release candidate status in the last
iterations - and be done with it.

And as soon as the kernel contains a performant hibernation infrastructure
I will probably just use distro kernels and be done with it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 14:22 ` Martin Steigerwald
@ 2010-07-11 14:52 ` Martin Steigerwald
0 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:52 UTC (permalink / raw)
To: linux-kernel
On Sunday, 11 July 2010, Martin Steigerwald wrote:
> worse lately. To the extent that I skipped some major kernel versions
> completely. For example 2.6.30.

Okay, not some, but one.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 8:39 ` Eric Dumazet
2010-07-11 14:22 ` Martin Steigerwald
@ 2010-07-11 15:58 ` William Pitcock
2010-07-11 16:34 ` Eric Dumazet
2010-07-16 6:59 ` Greg KH
2010-07-11 17:04 ` Heinz Diehl
2 siblings, 2 replies; 72+ messages in thread
From: William Pitcock @ 2010-07-11 15:58 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel
----- "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> On Sunday, 11 July 2010 at 09:18 +0200, Martin Steigerwald wrote:
> > Hi!
> >
> > 2.6.34 was a disaster for me: bug #15969 - patch was available before
> > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> > as most important two complete lockups - well maybe just X.org and radeon
> > KMS, I didn't start my second laptop to SSH into the locked up one - on my
> > ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> > just downgraded to 2.6.33 again.
> >
> > I still actually *use* my machines for something else than hunting patches
> > for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> > (accentuation from me). I know of the argument that one should use a
> > distro kernel for machines that are for production use. But frankly, does
> > that justify to deliver in advance known crap to the distributors? What
> > impact do partly grave bugs reported on bugzilla have on the release
> > decision?
> >
> > And how about people who have their reasons - mine is TuxOnIce - to
> > compile their own kernels?
> >
> > Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> > freezes as well. So far so good.
> >
> > Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> > website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> > still have some problems like the hang on hibernation reported in
> >
> > hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
> >
> > on this mailing list just a moment ago. But then 2.6.33 did hang with
> > TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> > 2.6.34 did not hang with it anymore which was a reason for me to try
> > 2.6.34 earlier.
> >
> > I am quite a bit worried about the quality of the recent kernels. Some
> > iterations earlier I just compiled them, partly even rc-ones which I do
> > not expect to be stable, and they just worked. But in the recent times .0,
> > partly even .1 or .2 versions haven't been stable for me quite some times
> > already and thus they better not be advertised as such on kernel.org I
> > think. I am willing to risk some testing and do bug reports, but these are
> > still production machines, I do not have any spare test machines, and
> > there needs to be some balance, i.e. the kernels should basically work.
> > Thus I for sure will be more reluctant to upgrade in the future.
> >
> > Ciao,
>
> Anybody running latest kernel on a production machine is living
> dangerously. Don't you already know that?
>
> When 2.6.X is released, everybody knows it contains at least 100 bugs.
>
> It was true for all previous values of X, it will be true for all
> future values.
>
> If you want to be safer, use a one year old kernel, with all stable
> patches in.
>
> Something like 2.6.32.16: it's probably more stable than all 2.6.X
> kernels.

2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
what other regressions there are in .16.

William
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 15:58 ` William Pitcock
@ 2010-07-11 16:34 ` Eric Dumazet
2010-07-16 6:59 ` Greg KH
1 sibling, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2010-07-11 16:34 UTC (permalink / raw)
To: William Pitcock; +Cc: linux-kernel
On Sunday, 11 July 2010 at 19:58 +0400, William Pitcock wrote:
> ----- "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> >
> > Something like 2.6.32.16: it's probably more stable than all 2.6.X
> > kernels.
>
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.
>

Yea, strictly speaking, you can be sure no kernel will be bug free,
ever.

This is why I said "probably more stable" ;)
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 15:58 ` William Pitcock
2010-07-11 16:34 ` Eric Dumazet
@ 2010-07-16 6:59 ` Greg KH
2010-08-05 3:27 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 72+ messages in thread
From: Greg KH @ 2010-07-16 6:59 UTC (permalink / raw)
To: William Pitcock; +Cc: Eric Dumazet, linux-kernel
On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
> what other regressions there are in .16.

Did you happen to tell the stable maintainer about this and do a simple
'git bisect' to find the offending patch so that it can be resolved?

{sigh}
^ permalink raw reply [flat|nested] 72+ messages in thread
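For readers unfamiliar with the 'git bisect' workflow Greg refers to, here is a minimal sketch on a throwaway repository. Everything in it is hypothetical (the file names, the tags v1..v5, the grep check standing in for a real test). On an actual stable tree the endpoints would presumably be the release tags on either side of the regression (going by William's report, something like v2.6.32.12 as good and v2.6.32.16 as bad), and the test step would be building each candidate and booting it as a Xen domU, which cannot be automated this simply.

```shell
#!/bin/sh
# Toy 'git bisect run' demo on a throwaway repository.
# All names here (history.txt, status.txt, tags v1..v5) are made up for illustration.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit with a fixed identity so the script does not depend on global git config.
commit() {
    git add -A
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

# Five commits tagged v1..v5; the fourth one introduces the "regression".
for i in 1 2 3 4 5; do
    echo "change $i" >> history.txt
    if [ "$i" -eq 1 ]; then
        echo pass > status.txt
    fi
    if [ "$i" -eq 4 ]; then
        echo fail > status.txt
    fi
    commit "commit $i"
    git tag "v$i"
done

# Give bisect the endpoints: known-bad revision first, then known-good.
git bisect start v5 v1 >/dev/null

# 'git bisect run' checks out each candidate and runs the given command:
# exit status 0 marks the commit good, exit status 1 marks it bad.
git bisect run grep -q pass status.txt >/dev/null

# After a successful run, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

# Return to the original branch and clean up the scratch repository.
git bisect reset >/dev/null 2>&1
cd /
rm -rf "$repo"
```

With a real kernel the command passed to 'git bisect run' would be a script that builds and tests the checked-out revision; such a script can exit with status 125 to tell bisect to skip a revision that cannot be built or tested.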
* Re: stable? quality assurance?
2010-07-16 6:59 ` Greg KH
@ 2010-08-05 3:27 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 72+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-05 3:27 UTC (permalink / raw)
To: Greg KH; +Cc: William Pitcock, Eric Dumazet, linux-kernel
On 07/15/2010 11:59 PM, Greg KH wrote:
> On Sun, Jul 11, 2010 at 07:58:42PM +0400, William Pitcock wrote:
>> 2.6.32.16 (possibly 2.6.32.15) has a regression where it is unusable
>> as a Xen domU. I would say 2.6.32.12 is the best choice since who knows
>> what other regressions there are in .16.
> Did you happen to tell the stable maintainer about this and do a simple
> 'git bisect' to find the offending patch so that it can be resolved?

If it is compiled on Debian then it's probably that cmpxchg memory
argument bug which hits in pvclock.c.

J
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 8:39 ` Eric Dumazet
2010-07-11 14:22 ` Martin Steigerwald
2010-07-11 15:58 ` William Pitcock
@ 2010-07-11 17:04 ` Heinz Diehl
2 siblings, 0 replies; 72+ messages in thread
From: Heinz Diehl @ 2010-07-11 17:04 UTC (permalink / raw)
To: linux-kernel
On 11.07.2010, Eric Dumazet wrote:
> When 2.6.X is released, everybody knows it contains at least 100 bugs.
[....]

http://s5.directupload.net/file/d/2217/ckghonrx_jpg.htm

:-)
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 7:18 stable? quality assurance? Martin Steigerwald
2010-07-11 8:39 ` Eric Dumazet
@ 2010-07-11 13:16 ` Ted Ts'o
2010-07-11 18:02 ` Anca Emanuel
` (2 more replies)
2010-07-11 13:56 ` Lee Mathers
` (2 subsequent siblings)
4 siblings, 3 replies; 72+ messages in thread
From: Ted Ts'o @ 2010-07-11 13:16 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel
On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?

So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
I find bugs, I report them and I help fix them. If more people did
that, then the 2.6.X.0 releases would be more stable. But kernel
development is a volunteer effort, so it's up to the volunteers to
test and fix bugs during the -rc4, -rc5 and -rc6 time frame. But if
the work tails off, because the developers are busily working on new
features for the new release, then past a certain point, delaying the
release reaches a point of diminishing returns.

This is why we do time-based releases. It is possible to do other
types of release strategies, but look at Debian
Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens if you
insist on waiting until all release blockers are fixed (and even with
Debian, past a certain point the release engineer will still just
reclassify bugs as no longer being release blockers --- after the
stable release has slipped for months or years past the original
projected release date.)

So if you and others like you are willing to help, then the quality of
the Linux kernels can continue to improve. But simply complaining
about it is not likely to solve things, since threatening to not be
willing to upgrade kernels is generally not going to motivate many, if
not most, of the volunteers who work on stabilizing the kernel.

> I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.

So you want the latest and greatest new features in a brand-new kernel
release, but you're not willing to pay for test machines, and you're
not willing to pay for distribution support...

The fact that you are willing to do some testing is appreciated, but
remember, there's no such thing as a free lunch. Linux may be a very
good bargain (look at how much Oracle has increased its support
contracts for Solaris!), but it's still not a free lunch. At the end
of the day, you get what you put into it.

Best regards,

- Ted
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 13:16 ` Ted Ts'o
@ 2010-07-11 18:02 ` Anca Emanuel
2010-07-12 6:46 ` David Newall
2010-09-04 17:12 ` Martin Steigerwald
2 siblings, 0 replies; 72+ messages in thread
From: Anca Emanuel @ 2010-07-11 18:02 UTC (permalink / raw)
To: Ted Ts'o, Martin Steigerwald, linux-kernel
Offtopic.

I'm using Ubuntu 10.04 and kernel 2.6.35-rc1 from kernel.ubuntu.com.
Working fine (stable, but my webcam is still not working).

I used this tutorial to compile the kernel:
https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
But no success (the compile finishes, but there are no deb packages).
I did it from VirtualBox some weeks ago, and grub could not mount.

Is there any tutorial on how to build the kernel for Ubuntu 10.04?

Please test it yourself (on Ubuntu 10.04):
sudo cfdisk
result: Bad primary partition 1.
(any kernel, any environment).
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 13:16 ` Ted Ts'o
2010-07-11 18:02 ` Anca Emanuel
@ 2010-07-12 6:46 ` David Newall
[not found] ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
2010-09-04 17:12 ` Martin Steigerwald
2 siblings, 1 reply; 72+ messages in thread
From: David Newall @ 2010-07-12 6:46 UTC (permalink / raw)
To: Ted Ts'o, Martin Steigerwald, linux-kernel
Ted Ts'o wrote:
> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed

I don't know if Ted intended to be snide, but that is how he sounded.
And yet, his comment was a fair reflection of how core developers seem
to feel about stability, namely that a stable kernel is obsolete and
therefore not particularly desirable. (I use the word "stable" in its
common English meaning, not the almost inexplicable Tux variation.)

I think the truth is that linux kernels are only ever stable as released
by distributions, and then only the more conservative of them. What
comes direct from kernel.org, I mean those called "latest stable", are
an exercise in dissembling. It's stable because someone calls it stable,
even though it crashes and has regressions? That's not stable, that's
just misleading.

Stable kernels *could* be stable. Debian succeeds. If it takes them a
long time, that is only because the core developers fail to release
reasonable quality kernels. Don't sneer at them because they do the
right thing; do the right thing yourself so that they can produce more
timely updates.

I don't expect fair consideration of these comments; why change when
shooting the messenger is so much more satisfying?
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>]
* Fwd: stable? quality assurance?
[not found] ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
@ 2010-07-12 12:35 ` Marcin Letyns
2010-07-12 12:42 ` Alexey Dobriyan
2010-07-12 15:56 ` David Newall
1 sibling, 1 reply; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 12:35 UTC (permalink / raw)
To: linux-kernel
---------- Forwarded message ----------
From: Marcin Letyns <mletyns@gmail.com>
Date: 2010/7/12
Subject: Re: stable? quality assurance?
To: David Newall <davidn@davidnewall.com>

2010/7/12 David Newall <davidn@davidnewall.com>:
>
> I don't know if Ted intended to be snide, but that is how he sounded. And
> yet, his comment was a fair reflection of how core developers seem to feel
> about stability, namely that a stable kernel is obsolete and therefore not
> particularly desirable. (I use the word "stable" in its common English
> meaning, not the almost inexplicable Tux variation.)

What about a BSD variation? Last time I tried FreeBSD it wasn't
stable. It had problems with my hard drive controller. There are many
regressions introduced in newer releases. I see you don't want Linux
to be developed rapidly (remember your lame "slow down, please"?).

> I think the truth is that linux kernels are only ever stable as released by
> distributions, and then only the more conservative of them. What comes
> direct from kernel.org, I mean those called "latest stable", are an exercise
> in dissembling. It's stable because someone calls it stable, even though it
> crashes and has regressions? That's not stable, that's just misleading.

Show me a "stable" kernel. Windows, *BSD, Solaris, OS X? There's none.
I've never had problems with the newest mainline kernels, because
they're rock stable and rock solid for me. Why don't you go to
freebsd.com and complain that they should stop calling some of the
FreeBSD releases stable ones? There are regressions and crashes, but I
guess it's a *BSD variation of the term "stable".

> Stable kernels *could* be stable. Debian succeeds. If it takes them a long
> time, that is only because the core developers fail to release reasonable
> quality kernels. Don't sneer at them because they do the right thing; do
> the right thing yourself so that they can produce more timely updates.

If there's Debian with the stable kernel, then what the hell do you
want? :> I don't want Debian with its old user space and with the old
kernel. If this is what you want, then what are you complaining about
here? You want everyone to choose Debian's way? Btw., it takes Debian
developers a long time to make a release mainly because of the user
space...

> I don't expect fair consideration of these comments; why change when
> shooting the messenger is so much more satisfying?

You missed the point, so what do you expect? Btw., slowing down would
be very stupid. If you don't know why, it's because you're missing the
point.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-12 12:35 ` Fwd: " Marcin Letyns
@ 2010-07-12 12:42 ` Alexey Dobriyan
[not found] ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
2010-07-12 14:57 ` Valdis.Kletnieks
0 siblings, 2 replies; 72+ messages in thread
From: Alexey Dobriyan @ 2010-07-12 12:42 UTC (permalink / raw)
To: Marcin Letyns; +Cc: linux-kernel
On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com> wrote:
> Last time I tried FreeBSD it wasn't stable. It had problems with my hard
> drive controller.

This thread needs more anecdotal evidence.
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>]
* Fwd: stable? quality assurance?
[not found] ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
@ 2010-07-12 12:52 ` Marcin Letyns
0 siblings, 0 replies; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 12:52 UTC (permalink / raw)
To: linux-kernel
---------- Forwarded message ----------
From: Marcin Letyns <mletyns@gmail.com>
Date: 2010/7/12
Subject: Re: stable? quality assurance?
To: Alexey Dobriyan <adobriyan@gmail.com>

2010/7/12 Alexey Dobriyan <adobriyan@gmail.com>:
> On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com>
>
> This thread needs more anecdotal evidence.
>

This for sure! However, why should I care to provide something while
others don't? :> Anyway, I won't install FreeBSD anymore and I'm not
interested in helping them.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-12 12:42 ` Alexey Dobriyan
[not found] ` <AANLkTik64lxDiCN-eRo3i_-cTqAvCzbaRI4EEXoD44Vj@mail.gmail.com>
@ 2010-07-12 14:57 ` Valdis.Kletnieks
1 sibling, 0 replies; 72+ messages in thread
From: Valdis.Kletnieks @ 2010-07-12 14:57 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Marcin Letyns, linux-kernel
On Mon, 12 Jul 2010 15:42:32 +0300, Alexey Dobriyan said:
> On Mon, Jul 12, 2010 at 3:35 PM, Marcin Letyns <mletyns@gmail.com> wrote:
> > Last time I tried FreeBSD it wasn't stable. It had problems with my hard
> > drive controller.
>
> This thread needs more anecdotal evidence.

To be fair, the continual re-appearance of this thread is *always*
anecdotal. It's always somebody who has trouble getting it to work on
*their* hardware, or with *their* software, and insisting that stuff
doesn't get shipped unless it works properly on everything. Apparently,
having it work on 99.997% of the gear out there isn't good enough for
them.

Then there's the inevitable call for "no shipping with blocker bugs" -
never with a good objective definition of what constitutes a "blocker"
bug. Ted had it right - you insist on fixing *everything*, you end up
with Debian Obsolete.

It's the nature of the beast - you *will* detect regressions at
something resembling an exponential-decay curve. The only question that
remains is how close to zero it has to decay before the ship date - and
there's no single answer for that which fits everybody.

One point to note is that if you ship earlier, the decay rate increases
because of wider deployment. As a result, it's quite probable that you
get to some objective level of "stable" faster by releasing early and
then releasing a half-dozen dot releases, instead of waiting for the 3
or 4 dozen people testing it before release to shake out all the bugs
(which obviously won't happen due to things like access to hardware).
^ permalink raw reply [flat|nested] 72+ messages in thread
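Valdis's last point can be made concrete with a small worked model. This is only an illustrative sketch and not something stated in the thread: it assumes the number of undetected regressions R decays exponentially, at a low rate while only the pre-release testers run the kernel and at a higher rate once it is widely deployed. The symbols R_0 (regressions at the start of testing), \lambda_{rc} and \lambda_{rel} (detection rates before and after release), T (release date) and \varepsilon (the level regarded as "stable") are all hypothetical.

```latex
% Before release (t <= T), only a few testers find regressions:
R(t) = R_0 \, e^{-\lambda_{rc} t}

% After release (t > T), wider deployment raises the rate to \lambda_{rel} > \lambda_{rc}:
R(t) = R_0 \, e^{-\lambda_{rc} T} \, e^{-\lambda_{rel} (t - T)}

% Time t^* at which R(t^*) = \varepsilon, assuming the threshold is reached after release:
t^{*} = \frac{1}{\lambda_{rel}} \ln\frac{R_0}{\varepsilon}
        + T \left( 1 - \frac{\lambda_{rc}}{\lambda_{rel}} \right)
```

Because \lambda_{rc} < \lambda_{rel}, the factor multiplying T is positive, so under these assumptions every extra day spent in pre-release testing pushes t^* later, and releasing earlier reaches the same residual level \varepsilon sooner. The model ignores the cost borne by the users who hit those regressions, which is the other side of the argument in this thread.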
* Re: stable? quality assurance?
[not found] ` <AANLkTilGjfx9sb66qVfZn1SeFPURHUrrdE7JCrild8VX@mail.gmail.com>
2010-07-12 12:35 ` Fwd: " Marcin Letyns
@ 2010-07-12 15:56 ` David Newall
2010-07-12 17:48 ` Marcin Letyns
` (2 more replies)
1 sibling, 3 replies; 72+ messages in thread
From: David Newall @ 2010-07-12 15:56 UTC (permalink / raw)
To: Marcin Letyns; +Cc: Linux Kernel Mailing List
Marcin,

>> I don't expect fair consideration of these comments; why change when
>> shooting the messenger is so much more satisfying?
>>

Q.E.D.

First, for the sake of brevity, I want it agreed that we're talking
about new kernels, not those which are old, time-tested and patched.

I didn't notice anyone say they want Linux development to slow down;
rather, and not just in this thread but in many threads before, that
kernels released as "stable" fail to meet the common meaning of that
word; and this needs to be improved. Predictably, the common response
sounds a bit like "shut up, go away, you're an idiot, it doesn't happen
to me." These are not useful as they serve not one whit to improve the
situation, but give pause to those who might otherwise want to bring up
a valid issue, once more.

Expectations are key to the problem. When Linus says, "here is a shiny
new, stable kernel", he creates expectations. When that kernel proves
unstable, those expectations are dashed and confidence in Linux suffers.
There's no reason why development methods need to change in order to
reduce the number of flaky "stable" kernels. It would be sufficient to
replace the somewhat deceptive word "stable" with one that is more
accurate; beta or gamma test make sense as they already have industry
acceptance. Clearly "stable" is not appropriate, as implicitly agreed by
others who have advised: "don't use in production"; "wait at least a
year"; and more.

Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt
anybody honestly thinks otherwise.

As to whether other operating systems are stable, well that's a fair
question. I agree that few large bodies of computer code are flawless,
and so stability can be relative. In that spirit I venture to put the
stipulated kernels into order of decreasing reliability: Best is BSD,
Solaris & OS X; then Windows; and then there's Linux. If named
distributions had been included, the list would look better (for us);
they'd go in the first group. Thank goodness for the Debian, Red Hat and
Novell (to name just a few) for giving the world something which does,
at least largely, meet expectations.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-12 15:56 ` David Newall
@ 2010-07-12 17:48 ` Marcin Letyns
2010-07-12 18:00 ` Stefan Richter
2010-07-13 16:50 ` Theodore Tso
2 siblings, 0 replies; 72+ messages in thread
From: Marcin Letyns @ 2010-07-12 17:48 UTC (permalink / raw)
To: David Newall; +Cc: Linux Kernel Mailing List

2010/7/12 David Newall <davidn@davidnewall.com>:
>
> First, for the sake of brevity, I want it agreed that we're talking about
> new kernels, not those which are old, time-tested and patched.
>
> I didn't notice anyone say they want Linux development to slow down;
> rather, and not just in this thread but in many threads before, that
> kernels released as "stable" fail to meet the common meaning of that
> word; and this needs to be improved.

I remember when Greg (correct me if I'm wrong) said something like
there are no more stable releases. It is the distros that should choose
a 'proper' kernel. This seems to be working well: Ubuntu usually ships
with a kernel that is one release old, and so do Debian (though they're
much more restrictive) and some other distros. Those who want to live on
the bleeding edge choose Fedora with the latest kernel etc. Personally,
I consider the LTS kernel the stable one and IMHO, like someone said in
this thread before, the latest mainline kernel shouldn't be called
stable, but something else.

> Predictably, the common response sounds a bit like
> "shut up, go away, you're an idiot, it doesn't happen to me." These are not
> useful as they serve not one whit to improve the situation, but give pause
> to those who might otherwise want to bring up a valid issue, once more.

Yes, I apologize for this. After reading your response, such complaints
are much clearer to me.

> There's no reason why development methods need to change in order to
> reduce the number of flaky "stable" kernels. It would be sufficient to
> replace the somewhat deceptive word "stable" with one that is more
> accurate; beta or gamma test make sense as they already have industry
> acceptance. Clearly "stable" is not appropriate, as implicitly agreed by
> others who have advised: "don't use in production"; "wait at least a
> year"; and more.
>
> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt
> anybody honestly thinks otherwise.

This is the whole point IMHO. :D Fully agree with you here.

> As to whether other operating systems are stable, well that's a fair
> question. I agree that few large bodies of computer code are flawless, and
> so stability can be relative. In that spirit I venture to put the
> stipulated kernels into order of decreasing reliability: Best is BSD,
> Solaris & OS X; then Windows; and then there's Linux. If named
> distributions had been included, the list would look better (for us); they'd
> go in the first group. Thank goodness for the Debian, Red Hat and Novell
> (to name just a few) for giving the world something which does, at least
> largely, meet expectations.
>

In my opinion you shouldn't compare the latest Linux kernel to other
operating systems (such a comparison would be fair only if the latest
Linux kernel were a 'real' stable one). Rather, you should compare
proper Linux distributions: Debian and RHEL to FreeBSD and Solaris;
OpenSuse and Kubuntu to Windows and OS X; etc. Otherwise, it's like
comparing some *BSD development branch to Debian.

A situation similar to the one described in this thread arises with
Fedora. There are people (Linux newbies etc.) who consider Fedora just
another ordinary Linux distribution, but they're wrong. Fedora usually
ships with the latest, experimental stuff, and if some newbie (or even a
developer) decides to use Fedora and then discovers that things simply
break, he may conclude that Linux is a mess. Fedora shipped with the
KDE 4.0 development release and even Linus was taken in, because he
probably thought it was a stable KDE release. IMHO there should be a
notice telling people what they are dealing with.

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-12 15:56 ` David Newall 2010-07-12 17:48 ` Marcin Letyns @ 2010-07-12 18:00 ` Stefan Richter 2010-07-12 19:58 ` David Newall 2010-07-13 16:50 ` Theodore Tso 2 siblings, 1 reply; 72+ messages in thread From: Stefan Richter @ 2010-07-12 18:00 UTC (permalink / raw) To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List David Newall wrote: > Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I > doubt anybody honestly thinks otherwise. It works stable for what I use it for. If it doesn't for you, then I hope you are already in contact with the respective subsystem developers to get the regressions that you experience fixed. -- Stefan Richter -=====-==-=- -=== -==-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-12 18:00 ` Stefan Richter @ 2010-07-12 19:58 ` David Newall 2010-07-12 21:11 ` Stefan Richter ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: David Newall @ 2010-07-12 19:58 UTC (permalink / raw) To: Stefan Richter; +Cc: Marcin Letyns, Linux Kernel Mailing List Stefan Richter wrote: > David Newall wrote: > >> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I >> doubt anybody honestly thinks otherwise. >> > > It works stable for what I use it for. > Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term "stable" is not appropriate for a newly released kernel; "gamma" should be used instead. Merely six months ago 2.6.32 was released; today we're preparing for 2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the latest stable kernel; or else 2.6.27 does, which is the other 2.6 on the front page of kernel.org. I'm pretty sure 2.4 is stable (which might explain why I see it embedded *much* more frequently than 2.6.) > If it doesn't for you, then I hope you are already in contact with the > respective subsystem developers to get the regressions that you > experience fixed. > (Segue to a problem which follows from calling bleeding-edge kernels "stable".) When reporting bugs, the first response is often, "we're not interested in such an old kernel; try it with the latest." That's not hugely useful when the latest kernels are not suitable for production use. If kernels weren't marked stable until they had earned the moniker, for example 2.6.27, then the expectation of developers and of users would be consistent: developers could expect users to try it again with latest stable kernel, and users could reasonably expect that trying it wouldn't break their system. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-12 19:58 ` David Newall @ 2010-07-12 21:11 ` Stefan Richter 2010-07-12 21:39 ` Martin Steigerwald 2010-07-15 7:23 ` david 2 siblings, 0 replies; 72+ messages in thread From: Stefan Richter @ 2010-07-12 21:11 UTC (permalink / raw) To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List David Newall wrote: > Stefan Richter wrote: >> If it doesn't for you, then I hope you are already in contact with the >> respective subsystem developers to get the regressions that you >> experience fixed. >> > (Segue to a problem which follows from calling bleeding-edge kernels > "stable".) > > When reporting bugs, the first response is often, "we're not interested > in such an old kernel; try it with the latest." Because there are continuously going bug fixes into the new kernels. > That's not hugely useful when the latest kernels are not suitable for > production use. "I have this bug here." - "It might be fixed in 2.6.mn. Try it." - "I don't want to because I got burned by 2.6.jk." Well, then don't do it and keep using the old buggy kernel. Or use a forked kernel where somebody adds bugfix backports and feature backports as you require them, if that somebody does a really good job. > If kernels weren't marked stable until they had earned the moniker, > for example 2.6.27, then the expectation of developers and of users > would be consistent: 2.6.27.y is what you call stable exactly because none of the boatloads of bug fixes and improvements of each subsequent 2.6.x release goes into it anymore. That's the nature of the beast. You can't have the cake and eat it. Which is why it is important that we keep the regression count in new kernels low and try to detect and fix regressions as early as possible. I admit that I do not really help with this myself outside the subsystem which I maintain. 
I usually start to run -rc kernels only at later -rc's (say, -rc5, only sometimes earlier) and don't test them beyond the one or two configurations that I use personally. There were occasionally regressions in the subsystem that I maintain but they were few and always fixed quickly, and each one was a lesson in how to do better. So, for that subsystem, the "Latest Stable Kernel" that is advertised on the front page of kernel.org really and truly /is/ the latest stable release that is recommended for production use, as far as that subsystem is concerned. -- Stefan Richter -=====-==-=- -=== -==-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-12 19:58 ` David Newall 2010-07-12 21:11 ` Stefan Richter @ 2010-07-12 21:39 ` Martin Steigerwald 2010-07-12 22:44 ` Stefan Richter 2010-07-15 7:23 ` david 2 siblings, 1 reply; 72+ messages in thread From: Martin Steigerwald @ 2010-07-12 21:39 UTC (permalink / raw) To: linux-kernel; +Cc: David Newall, Stefan Richter, Marcin Letyns [-- Attachment #1: Type: Text/Plain, Size: 8705 bytes --] Am Montag 12 Juli 2010 schrieb David Newall: > Stefan Richter wrote: > > David Newall wrote: > >> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I > >> doubt anybody honestly thinks otherwise. > > > > It works stable for what I use it for. > > Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term > "stable" is not appropriate for a newly released kernel; "gamma" should > be used instead. I indeed think stable should mean "stable for the majority of users". Its difficult to estimate. But I doubt that every dot-0 release qualified for that. > Merely six months ago 2.6.32 was released; today we're preparing for > 2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the > latest stable kernel; or else 2.6.27 does, which is the other 2.6 on > the front page of kernel.org. I'm pretty sure 2.4 is stable (which > might explain why I see it embedded *much* more frequently than 2.6.) 
I have these metrics:

martin@shambhala:~> uprecords -m 20 | cut -c1-70
     #               Uptime | System
----------------------------+-----------------------------------------
     1    36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi- Tue Jan 12 09:
     2    31 days, 01:07:24 | Linux 2.6.26.5-tp42-toi- Tue Sep 30 13:
     3    24 days, 13:29:07 | Linux 2.6.33.2-tp42-toi- Mon May 31 22:
     4    21 days, 15:08:21 | Linux 2.6.29.2-tp42-toi- Tue Apr 28 22:
     5    19 days, 21:22:14 | Linux 2.6.33.2-tp42-toi- Tue May 11 17:
     6    19 days, 09:49:05 | Linux 2.6.32.8-tp42-toi- Fri Mar  5 11:
     7    18 days, 02:31:41 | Linux 2.6.29.6-tp42-toi- Thu Jul  9 09:
     8    17 days, 12:38:36 | Linux 2.6.28.8-tp42-toi- Wed Mar 18 10:
     9    16 days, 16:10:28 | Linux 2.6.31-tp42-toi-3. Tue Sep 22 21:
    10    15 days, 14:39:26 | Linux 2.6.28.4-tp42-toi- Mon Feb  9 22:
    11    15 days, 13:58:12 | Linux 2.6.27.7-tp42-toi- Tue Dec  9 22:
    12    13 days, 21:11:06 | Linux 2.6.31-rc7-tp42-to Mon Aug 31 21:
    13    13 days, 18:34:00 | Linux 2.6.29.2-tp42-toi- Wed May 27 19:
    14    12 days, 21:54:18 | Linux 2.6.26.5-tp42-toi- Fri Oct 31 13:
    15    10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi- Thu Feb 26 16:
    16    10 days, 16:29:02 | Linux 2.6.33.2-tp42-toi- Fri Jun 25 19:
    17    10 days, 08:04:52 | Linux 2.6.26.2-tp42-toi- Thu Sep 18 14:
    18    10 days, 03:52:30 | Linux 2.6.31.3-tp42-toi- Thu Oct 15 09:
    19     9 days, 22:03:29 | Linux 2.6.31.5-tp42-toi- Tue Nov  3 11:
    20     9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi- Thu Jun 25 14:
----------------------------+-----------------------------------------
-> 116     0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi- Mo
----------------------------+-----------------------------------------
1up in     0 days, 00:31:56 | at Mon Jul 12 23:
t10 in    15 days, 13:47:24 | at Wed Jul 28 12:
no1 in    36 days, 09:05:29 | at Wed Aug 18 08:
    up   608 days, 02:40:08 | since Thu Sep 18 14:
  down    54 days, 06:12:57 | since Thu Sep 18 14:
   %up               91.808 | since Thu Sep 18 14:

And 228 entries in there in total since 2.6.26, with

martin@shambhala:~> uprecords -m 300 | cut -c1-70 | grep "0 days" | wc -l
148

entries
for shorter than one day. Sure, these are not to be read without the
experiences I had and the reasons for rebooting, since sometimes I just
messed up some kernel option and compiled another one.

AFAIR 2.6.26 up to 2.6.32 were fine, except 2.6.30, where TuxOnIce just
didn't work, but I am not yet sure whether this was caused by TuxOnIce
or by some problem with the general hibernation infrastructure. I then
just skipped 2.6.30. Since I only tried 2.6.31 with my T42, I got a
whopping uptime of over 100 days for 2.6.29 on my T23! That's stable.

Well, any kernels that reproducibly reach more than 15 or 30 days are
quite stable in my own subjective view. Most kernels that got that far
would likely have lasted much longer if I hadn't just compiled the next
one, be it a dot release or a major release.

All this without Radeon KMS! 2.6.33.2 was only stable when I used Radeon
KMS without TuxOnIce. OK, so it might be a TuxOnIce problem, but then at
least those quite frequent hangs on hibernation, at the point where the
screen goes black for a few seconds and then comes back, which I had
with 2.6.33.2, were gone with 2.6.34. Maybe they are gone with 2.6.33.6
too, since it carries some more radeon drm fixes.

2.6.34 has not reached an uptime of more than 2 or 3 days yet. Well,
maybe Nix is right and it's just that Radeon KMS has not been stabilized
enough and the rest of the kernel is quite stable.

And when the combination of 2.6.33 now .6 and userspace software suspend
works for me - for the first time, often it was TuxOnIce that worked, but
not any in kernel method I tried from time to time - so be it for the time
being, even if userspace software suspend is way slower and doesn't
satisfy the disk on writing the image.

> > If it doesn't for you, then I hope you are already in contact with
> > the respective subsystem developers to get the regressions that you
> > experience fixed.
>
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest." That's not hugely
> useful when the latest kernels are not suitable for production use. If
> kernels weren't marked stable until they had earned the moniker, for
> example 2.6.27, then the expectation of developers and of users would
> be consistent: developers could expect users to try it again with
> latest stable kernel, and users could reasonably expect that trying it
> wouldn't break their system.

I think that's really a question of how to attract more widespread
testing. For more widespread testing, a kernel needs to be stable enough
that enough users are willing to deal with it. But without more
widespread testing it might not get there.

I just dropped 2.6.34 for now and I will wait for more dot releases.
Maybe I am really the only one for whom 2.6.34 doesn't work; maybe other
people were just too frustrated and did the same without saying so here
or in bugzilla.

Maybe providing better ways to report bugs and gather information, even
on freeze bugs, without setting up too much manually could help. I
certainly think that the enhanced DrKonqi crash reporter in KDE 4.3 and
up helped users to provide *good bug reports*. Maybe there could be
something like that for the kernel, and an easy option to have the
kernel store even backtraces for hard crashes. Unfortunately there is no
reset button on notebooks, so memory might be the wrong place. Well, one
could dedicate a ring buffer space on the swap partition for that, or
something like that - that area should be writable even when no
filesystem is working anymore. On the next reboot the bug report
application would recover the crash data from there. This would carry a
risk that, on severe memory corruption, the kernel writes crash data
somewhere it shouldn't. A USB stick comes to mind, but what if the USB
stack doesn't work anymore?

Well, not every bug is a freeze bug, and maybe something could be done
for non-freeze bugs.
Like an application which records selected data while the user
reproduces the bug. Just like the enhanced DrKonqi collects crash data
and even helps the user to install the necessary debug packages.

But I think when a kernel behaves too unstably for lots of users, they
just drop it. Some bugs are okay, but especially freeze bugs, and even
more so fs corruption bugs, scare away anyone who is not a die-hard
kernel debugger who bisects a kernel a day.

Maybe I just had lots of bad luck, so I would love to hear other
experiences; some already said 2.6.34 works pretty stably for them.

I will leave 2.6.34.1 on my T23, which has a Savage which maybe will
never get KMS, who knows, and on the workstation at work, which doesn't
use Radeon KMS due to its rock solid stable Debian Lenny userspace.
Maybe this at least sheds some light on whether most of my issues have
likely been Radeon KMS related.

As a side note: Ext4 is absolutely rock stable for me! As is XFS on my
T23 and even BTRFS for the T23 /home and some work directory on the
workstation (not yet on my production T42).

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 72+ messages in thread
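As a cross-check of the uprecords summary quoted in the message above, the %up figure can be reproduced from the "up" and "down" totals. The following is a minimal Python sketch, assuming that %up is simply up / (up + down); the helper name `to_seconds` is made up for this illustration, and the two durations are copied from the quoted output.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert an uprecords-style 'D days, HH:MM:SS' string to seconds."""
    match = re.fullmatch(r"(\d+) days?, (\d+):(\d+):(\d+)", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Totals copied from the quoted uprecords output.
up = to_seconds("608 days, 02:40:08")
down = to_seconds("54 days, 06:12:57")

# Assumption: %up is the share of total recorded time spent running.
percent_up = 100 * up / (up + down)
print(f"%up {percent_up:.3f}")  # prints "%up 91.808"
```

Under that assumption the result agrees with the 91.808 reported in the table, so the totals are at least internally consistent.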
* Re: stable? quality assurance? 2010-07-12 21:39 ` Martin Steigerwald @ 2010-07-12 22:44 ` Stefan Richter 0 siblings, 0 replies; 72+ messages in thread From: Stefan Richter @ 2010-07-12 22:44 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel, David Newall, Marcin Letyns Martin Steigerwald wrote: > And when the combination of 2.6.33 now .6 and userspace software suspend > works for me - for the first time, often it was TuxOnIce that worked, but > not any in kernel method I tried from time to time - so be it for the time > being, even if userspace software suspend is way slower and doesn't > satisfy the disk on writing the image. BTW, the need to rely on a quite fundamental kernel component that is not in the mainline (for whichever reason) in the long term, almost guarantees you a lot of recurring pain, one way or another. -- Stefan Richter -=====-==-=- -=== -==-= http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-12 19:58 ` David Newall
2010-07-12 21:11 ` Stefan Richter
2010-07-12 21:39 ` Martin Steigerwald
@ 2010-07-15 7:23 ` david
2 siblings, 0 replies; 72+ messages in thread
From: david @ 2010-07-15 7:23 UTC (permalink / raw)
To: David Newall; +Cc: Stefan Richter, Marcin Letyns, Linux Kernel Mailing List

On Tue, 13 Jul 2010, David Newall wrote:

> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested in
> such an old kernel; try it with the latest." That's not hugely useful when
> the latest kernels are not suitable for production use. If kernels weren't
> marked stable until they had earned the moniker, for example 2.6.27, then the
> expectation of developers and of users would be consistent: developers could
> expect users to try it again with latest stable kernel, and users could
> reasonably expect that trying it wouldn't break their system.

2.6.27 didn't get declared 'stable' because it had very few bugs; it was
declared 'stable' because someone volunteered to maintain it longer and
back-port patches to it long past the normal process.

2.6.32 got declared 'long-term stable' before 2.6.33 was released, again
not because it was especially good, but because it didn't appear to be
especially bad and several distros were shipping kernels based on it, so
again someone volunteered (or was volunteered by the distro that pays
their paycheck) to back-port patches to it longer.

I have been running kernel.org kernels on my production systems for >13
years. I am _very_ short of time, so I generally don't get a chance to
test the -rc kernels (once in a while I do get a chance to do so on my
laptop).
What I do is this: every 2-3 kernel releases, I wait a couple of days
after the kernel release to see if there are show-stopper bugs, and if
nothing shows up (which has been the common case for the last several
years) I compile a kernel and load it on machines in my lab. I try to
have a selection of machines that match the systems I have in production
in what I have found are the 'important' ways (a definition that changes
once in a while when I find something that should 'just work' that
doesn't ;-). This primarily includes systems with all the network card
types and RAID card types that I use in production, but now also
includes a machine with an SSD (after I found a bug that only affected
that combination).

If my lab machines don't crash immediately, I leave them running
(usually not even stress testing them, again for lack of time) for a
week or so, then I put the new kernel on my development machines, wait a
few days, then put it on QA machines, wait a few days, then put it in
production. I keep the old kernel around so that I can reboot into it if
needed.

This tends to work very well for me. It's not perfect, and every couple
of cycles I run into grief and have to report a bug to the kernel list.
Usually I find it before I get into production, but I have run into
cases that got all the way into production before I found a problem.
With the 'new' -stable series, I generally wait until at least 2.6.x.1
is released before I consider it ready to go anywhere outside my lab
(I'll still install the 2.6.x kernel in the lab, but I'll wait for the
additional testing that comes with the .1 stable kernels before moving
it on).

I don't go through this entire process with the later -stable kernels.
If I'm already running 2.6.x and there is a 2.6.x.y released that
contains fixes that look like they are relevant to the configuration
that I run (which rules out the majority of changes; I do fairly minimal
kernel configs), I will just test it in the lab as a smoke test, then
schedule a rollout through the rest of my network. If there are no
problems before I get permission to deploy to production, I put it on
half my boxes, fail over to them, then wait a little bit (a day to a
week) before upgrading the backups.

This writeup actually makes it sound like I spend a lot of time working
with kernels, but I really don't. I'll spend a couple of half days twice
a year on testing, and then additional time rolling it out to the 150+
clusters of servers I have in place.

If you can't spend at least this much time on the kernel you are
probably better off just running your distro kernel, but even there you
really should do a very similar set of tests on its kernel releases.

There's another department in my company that uses distro kernels (big
name distro, but I will avoid flames by not naming names) without the
testing routine that I use, and my track record for stability compares
favorably to theirs over the last 7 years or so (they haven't been
running Linux as long as I have, so we can't go back as far ;-). They
also do more updates than I do, simply because they can't as easily look
at the kernel release and decide it doesn't apply to them.

David Lang

^ permalink raw reply [flat|nested] 72+ messages in thread
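The staged rollout described in the message above (lab, then development, then QA, then production, with a waiting period before each step) can be sketched as a simple schedule. This is only an illustration in Python: the stage names follow the message, but the concrete day counts are assumptions picked from the ranges it mentions ("a couple days", "a week or so", "a few days"), the start date is arbitrary, and `rollout_schedule` is a hypothetical helper, not a tool the author describes.

```python
from datetime import date, timedelta

# (stage, days to wait after the previous step). The day counts are
# assumptions chosen from the ranges given in the message.
STAGES = [
    ("lab", 2),          # wait "a couple days" after the release
    ("development", 7),  # lab machines run "for a week or so"
    ("QA", 3),           # "wait a few days"
    ("production", 3),   # "wait a few days"
]


def rollout_schedule(release_day: date) -> list[tuple[str, date]]:
    """Return the earliest install date for each stage, in order."""
    schedule = []
    day = release_day
    for stage, wait_days in STAGES:
        day += timedelta(days=wait_days)
        schedule.append((stage, day))
    return schedule


# Arbitrary example date, not an actual kernel release date.
for stage, day in rollout_schedule(date(2010, 7, 1)):
    print(f"{day.isoformat()}  {stage}")
```

With these assumed waits, a kernel reaches production about two weeks after release at the earliest. Any problem found at an earlier stage stops the chain, which is the point of the process.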
* Re: stable? quality assurance?
2010-07-12 15:56 ` David Newall
2010-07-12 17:48 ` Marcin Letyns
2010-07-12 18:00 ` Stefan Richter
@ 2010-07-13 16:50 ` Theodore Tso
2010-07-13 20:45 ` David Newall
2 siblings, 1 reply; 72+ messages in thread
From: Theodore Tso @ 2010-07-13 16:50 UTC (permalink / raw)
To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List

On Jul 12, 2010, at 11:56 AM, David Newall wrote:

> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt anybody honestly thinks otherwise.

Stable is relative. Some people are willing to consider Fedora
"stable". Other people will only use a RHEL kernel, and there are those
who are using RHEL 4 or even RHEL 3 because they are extremely
risk-averse.

So arguments about whether or not a specific kernel version deserves to
be called "stable" are going to be a waste of time and electrons,
because it's all about expectations.

But the one huge thing that people are forgetting is that the
fundamental premise behind open source is "scratch your own itch". That
means that people who own a specific piece of hardware have to
collectively be responsible for making sure that it works. It's not
possible for me to ensure that some eSATA PCMCIA card on a T23 laptop
still works, because I don't own the hardware. So the only way we know
whether or not there is a regression is if there is *someone* who owns
that hardware who is willing to try it out, hopefully during -rc3 or
-rc4, and let us know if there is a problem, and hopefully help us debug
the problem.

If you have people saying, "-rc3 isn't stable, I'll wait until -rc5 to
test things", then it will be that much later before we discover a
potential problem with the T23 laptop, and before we can fix it. If
people say, "2.6.34.0 isn't stable, I refuse to run a kernel until
2.6.34.4", then if they are the only person with the T23 eSATA device,
we won't hear about the problem until 2.6.34.4, and it might not get
fixed until 2.6.34.5 or 2.6.34.6!
What this means is yes that stable basically means, "stable for the core
kernel developers". You can say that this isn't correct, and maybe even
dishonest, but if we wait until 2.6.34.N before we call a release
"stable", and this discourages users from testing 2.6.34.M for M<N, it
just delays when bugs will be found and fixed.

This is why to me, arguing that 2.6.34.0 is not "stable" really isn't
useful. If you really want to frequently update your kernel and use the
latest and greatest, part of the price that you have to pay is to help
us with the testing, bug reporting, and root cause determination.

If you don't like this, your other choice is to pay $$$ to the folks who
provide support for Solaris and OS X, and accept the restrictions in
hardware implied by Solaris and OS X. (Hint: neither supports a Thinkpad
T23.) But to compare Linux, especially the non-distribution source code
distribution from kernel.org, with operating systems that have very
different business models is to really and fundamentally misunderstand
how things work in the Linux world.

If you want that kind of stability, then you will need to use an older
kernel. Or use a distribution kernel which has a support and testing and
business model compatible with your desires. Fedora for example uses
kernels which are six months out of date, because during those six
months, the people who use the testing versions of Fedora are doing
testing and helping with the bug fixing. Red Hat uses this free testing
pool to improve the testing and stability of Red Hat Enterprise Linux,
so if you are willing to live with a 2-3 year release cycle, RHEL will
be more stable than Fedora. And if you need to make sure that bugs are
fixed very quickly, and you can call and demand a developer's attention,
you can pay $$$ for a support contract.

I will say once again. There is no such thing as a free lunch.
Linux is a better deal than most, and you have multiple choices about how frequently you update, whether you let someone else decide whether or not a particular kernel release plus patches is "stable", or more accurately, "stable enough", and you can choose how much you are willing to pay, either in personal time and effort, or $$$ to some support organization. But demanding that kernel.org become "more stable" when it is supported by purely volunteers is simply not reasonable. -- Ted ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-13 16:50 ` Theodore Tso @ 2010-07-13 20:45 ` David Newall 2010-07-14 6:33 ` Theodore Tso 0 siblings, 1 reply; 72+ messages in thread From: David Newall @ 2010-07-13 20:45 UTC (permalink / raw) To: Theodore Tso; +Cc: Marcin Letyns, Linux Kernel Mailing List Theodore Tso wrote: > What this means is yes that stable basically means, "stable > for the core kernel developers". You can say that this isn't > correct, and maybe even dishonest, but if we wait until 2.6.34.N > before we call a release "stable", and this discourages users > from testing 2.6.34.M for M<N, it just delays when bugs will > be found and fixed. > Calling it stable instils and reinforces a Pavlovian response in typical users, that recent Linux kernels are dangerous and unreliable; one year old was suggested as a safe benchmark. Typical users being 99% of the population, testing hardly begins until a kernel is "sufficiently old." This Pavlovian response is what really delays finding and fixing bugs. Being up-front and saying which kernels are likely to fail would help many users calculate the risk and improve their willingness to try newer kernels. "Sufficiently old" might well come down to six months, maybe four. That is to say, instead of taking a year to pass gamma-testing, new kernels could be passed in six months or less. That would be a big improvement in stability and quality assurance however you dice it. > But demanding that kernel.org become "more stable" when it > is supported by purely volunteers is simply not reasonable. Let's not be hysterical; nobody made any demands. Semantics aside, the suggestion is reasonable because it affects developers' workloads not one whit. The only change is the label that Linus applies to new releases. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-13 20:45 ` David Newall @ 2010-07-14 6:33 ` Theodore Tso 0 siblings, 0 replies; 72+ messages in thread From: Theodore Tso @ 2010-07-14 6:33 UTC (permalink / raw) To: David Newall; +Cc: Marcin Letyns, Linux Kernel Mailing List On Jul 13, 2010, at 4:45 PM, David Newall wrote: > > Calling it stable instils and reinforces a Pavlovian response in typical users, that recent Linux kernels are dangerous and unreliable; one year old was suggested as a safe benchmark. Typical users being 99% of the population, testing hardly begins until a kernel is "sufficiently old." This Pavlovian response is what really delays finding and fixing bugs. Being up-front and saying which kernels are likely to fail would help many users calculate the risk and improve their willingness to try newer kernels. "Sufficiently old" might well come down to six months, maybe four. Most typical users should be using distribution kernels. Period. We can't say which kernels are likely to fail, because we don't know. If people don't test newer kernels, the mere passage of time, whether it's four months, or six months, or a year, or two years, is not going to magically make problems go away and get fixed. That only happens if someone steps up and tries it out, and if it breaks submits bug reports or patches. A fairly large number of Linux developers seem to prefer relatively recent vintage Thinkpads, preferably without Nvidia or ATI chipsets. These laptops are generally safe and reliable by -rc3 or so --- because if they aren't the Linux developers step up and complain and do code bisections and they fix the problem. If someone has a T23 laptop, and they help out by doing the same, then it will also be safe and reliable by the time of 2.6.X.0. 
If they just kvetch and complain, and stamp their feet, and say "Linux is unsafe and unreliable", and no other T23 owners step up to the challenge, then two years might go by and the same kernel might still be unreliable --- for them. -- Ted ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 13:16 ` Ted Ts'o
2010-07-11 18:02 ` Anca Emanuel
2010-07-12 6:46 ` David Newall
@ 2010-09-04 17:12 ` Martin Steigerwald
2 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 17:12 UTC (permalink / raw)
To: linux-kernel; +Cc: Ted Ts'o
[-- Attachment #1: Type: Text/Plain, Size: 6297 bytes --]

Hi Ted,

I have wanted to answer this for a long time...

On Sunday 11 July 2010, Ted Ts'o wrote:
> On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
> > I still actually *use* my machines for something else than hunting
> > patches for kernel bugs and on kernel.org it is written "Latest
> > *Stable* Kernel" (accentuation from me). I know of the argument that
> > one should use a distro kernel for machines that are for production
> > use. But frankly, does that justify to deliver in advance known crap
> > to the distributors? What impact do partly grave bugs reported on
> > bugzilla have on the release decision?
>
> So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> I find bugs, I report them and I help fix them. If more people did
> that, then the 2.6.X.0 releases would be more stable. But kernel
> development is a volunteer effort, so it's up to the volunteers to
> test and fix bugs during the rc4, -rc5 and -rc6 time frame. But if
> the work tails off, because the developers are busily working on new
> features for the new release, then past a certain point, delaying the
> release reaches a point of diminishing returns. This is why we do
> time-based releases.

It sure helps the quality of the kernel if people test release
candidates and report bugs, but I think you at least partly missed my
point. I wrote in my initial mail:

> 2.6.34 was a desaster for me: bug #15969 - patch was availble before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as
> well as most important two complete lockups - well maybe just

So two out of the three bugs I experienced were actually reported by
testers who had tested rc kernels. The third one is [Bug 16376], the
random freezes (possibly Radeon DRM KMS), which I am currently
bisecting. One of the two even had a patch before 2.6.34 was released.
So for these two bugs, testing rc kernels clearly has not helped to
raise the quality of the *release* kernel.

I now understand that deferring a stable kernel release can cause a lot
of pain. But I still wonder why at least the patch from bug 15969 was
not taken before the release. Not to assign blame, but to possibly find
ways to improve the process. I can't check bugzilla right now due to too
many MySQL connections on the server - already reported, but supposedly
already known to the admins anyway - but AFAIR the patch was available
and AFAIR also tested well before the release.

So my question still stands whether anything can be improved by at
least getting as many bugfix patches as possible from Bugzilla into the
stable kernel. At least for critical bugs like "does not boot" or "only
garbage on screen after booting". I can accept that bug 15788 would have
been missed by that, but this bug was not that important - it was just
the tip of the iceberg.

> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed (and
> even with Debian, past a certain point the release engineer will still
> just reclassify bugs as no longer being release blockers --- after the
> stable release has slipped for months or years past the original
> projected release date.)

I made a suggestion on how to improve the development process while
still holding to time-based releases in my other mail to this thread
today.

> So if you and others like you are willing to help, then the quality of
> the Linux kernels can continue to improve. But simply complaining
> about it is not likely to solve things, since threating to not be
> willing to upgrade kernels is generally not going to motivate many, if
> not most, of the volunteers who work on stablizing the kernel.

I do, but I need to balance this. I have already spent quite some hours
on bisecting the freeze bug mentioned above and it might take some more
weeks to nail it down.

And it was not a threat at all. I just have to balance how much
instability I can take on systems that I use for my daily stuff.

> > I am willing to risk some testing and do bug reports, but these are
> > still production machines, I do not have any spare test machines, and
> > there needs to be some balance, i.e. the kernels should basically
> > work.
>
> So you want the latest and greatest new features in a brand-new kernel
> release, but you're not willing to pay for test machines, and you're
> not willing to pay for a distribution support... The fact that you
> are willing to do some testing is appreciated, but remember, there's
> no such thing as a free lunch. Linux may be a very good bargain (look
> at how much Oracle has increased its support contracts for Solaris!),
> but it's still not a free lunch. At the end of the day, you get what
> you put into it.

Ted, I think there is no need to attack me like that. Actually all of
the bugs have been on my laptop that I use for work *and* private work.
Most of the time I spent on these bugs has been during my spare
volunteer time as well. And we are still a small company.

When I apply what you wrote above, the only sane thing would be to use a
distro kernel and be done with it - which means less testing of recent
kernels.

Still, even then that likely Radeon KMS related freeze could have
slipped even into a Debian stable kernel, considering that no one posted
to the bug report that he was able to reproduce the bug.

Then I'd just accept the slower turn-around cycles with in-kernel or
userspace software suspend and be done with compiling TuxOnIce kernels.

But I am not there yet, because compiling TuxOnIce kernels worked pretty
well from 2.6.11 to 2.6.33. And I want to help as well as I can.
Hopefully, after bisecting the Radeon KMS related freeze bug, things
will be calmer again - although there is another weird, possibly
difficult to track bug left.

Maybe I just had lots of bad luck with 2.6.34, and after tracking down
those two bugs things will be calmer again. The Radeon KMS stuff has
been a big change as well.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 7:18 stable? quality assurance? Martin Steigerwald
2010-07-11 8:39 ` Eric Dumazet
2010-07-11 13:16 ` Ted Ts'o
@ 2010-07-11 13:56 ` Lee Mathers
2010-07-11 14:51 ` Martin Steigerwald
2010-07-12 19:46 ` stable? quality assurance? Nix
[not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>
4 siblings, 1 reply; 72+ messages in thread
From: Lee Mathers @ 2010-07-11 13:56 UTC (permalink / raw)
To: Martin Steigerwald, linux-kernel

Wow!

First question: what is a "desaster"?

Second question: what makes you so important that you feel you can make
demands and comments as you did?

If indeed these are production systems and you are an administrator of
said production systems, I suggest you need to do a little more homework
to expand your knowledge base.

I would follow Eric's advice. It's sound advice and, better yet, it was
free.

Hope you have better luck in getting your systems running well.

On 7/11/10, Martin Steigerwald <Martin@lichtvoll.de> wrote:
>
> Hi!
>
> 2.6.34 was a desaster for me: bug #15969 - patch was availble before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as most important two complete lockups - well maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
>
> I still actually *use* my machines for something else than hunting patches
> for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel"
> (accentuation from me). I know of the argument that one should use a
> distro kernel for machines that are for production use. But frankly, does
> that justify to deliver in advance known crap to the distributors? What
> impact do partly grave bugs reported on bugzilla have on the release
> decision?
>
> And how about people who have their reasons - mine is TuxOnIce - to
> compile their own kernels?
>
> Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the
> freezes as well. So far so good.
>
> Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the
> website. And I just again always wait for .2 or .3, as with 2.6.34.1 I
> still have some problems like the hang on hibernation reported in
>
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore which was a reason for me to try
> 2.6.34 earlier.
>
> I am quite a bit worried about the quality of the recent kernels. Some
> iterations earlier I just compiled them, partly even rc-ones which I do
> not expact to be table, and they just worked. But in the recent times .0,
> partly even .1 or .2 versions haven't been stable for me quite some times
> already and thus they better not be advertised as such on kernel.org I
> think. I am willing to risk some testing and do bug reports, but these are
> still production machines, I do not have any spare test machines, and
> there needs to be some balance, i.e. the kernels should basically work.
> Thus I for sure will be more reluctant to upgrade in the future.
>
> Ciao,
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
>

--
Sent from my mobile device

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 13:56 ` Lee Mathers
@ 2010-07-11 14:51 ` Martin Steigerwald
2010-07-11 17:22 ` Willy Tarreau
` (2 more replies)
0 siblings, 3 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-11 14:51 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 3155 bytes --]

Hi Lee,

On Sunday 11 July 2010, Lee Mathers wrote:
> Wow!
>
> First question what is a "desaster"?

For me, for example, the machine or at least the complete desktop
freezing randomly. And actually I said "for me", as you can reread at
the bottom of your top posting.

> Second question, what makes you so important that you feel you can
> makes demands and comments as you did.

Since when do I need to be considered important by you or anyone else to
make comments? Actually I think I do not - this is still an open mailing
list, isn't it? And I won't waste my time with proofs that I contributed
to free software here and there - also to kernel testing, as for example
Ingo Molnar could testify, back in early CFS times when I compiled
roughly a kernel a day, and to kernel documentation once.

I also do not get why you are attacking me personally. It seems to me
that you feel personally attacked by me. But I did not attack you. I
just questioned the quality of the kernel and its current quality
assurance process. No one is personally bad when any of that is lacking.

One reason for my demand is best expressed by this question: Does the
kernel developer community want to encourage a group of advanced Linux
users - but mostly non-developers - to compile their own vanilla or
near-vanilla kernels, provide wider testing and report a bug now and
then?

I can live with either answer. If not, I just will be much more
reluctant to try out new kernels. But I have experienced working
productively with kernel developers like Ingo and TuxOnIce developer
Nigel, who were pretty interested in my usage of the latest kernels.

I admit my wording could have been friendlier, too, but I was just
frustrated by my recent experiences. What I wanted to achieve is to
raise the concern whether kernel quality has actually decreased and,
more importantly, whether something needs to be done to make it more
stable again. Well, Linus has at least been a bit more reluctant to take
big changes after rc1 this cycle, so maybe 2.6.35 will be better again.

> If indeed these are production systems and you are an administrator of
> said production systems. I suggest you need to do a little more home
> work to expand your knowledge base.

These are production systems that have some fault tolerance, i.e. not
servers, but laptops and one, not yet all, workstations. But for me a
certain balance has to be met. I will just downgrade and drop newer
kernels or even start skipping whole major versions completely on a
regular basis if that turns out to be the only way to have stable enough
machines for me. One approach would be to stick to the stable kernels
that Greg and the stable team maintain for a longer time.

> Hope you have better luck in getting your systems running well.

Thanks. I certainly will. If need be by downgrading.

I hope that someone answers who actually can take some critique. From
the current replies I perceive a lack of that ability.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 14:51 ` Martin Steigerwald
@ 2010-07-11 17:22 ` Willy Tarreau
2010-07-11 21:38 ` Rafael J. Wysocki
` (3 more replies)
2010-07-11 19:49 ` Stefan Richter
2010-07-13 11:11 ` Alejandro Riveira Fernández
2 siblings, 4 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-07-11 17:22 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel

Hi Martin,

On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> I hope that someone answers who actually can take some critique. From the
> current replies I perceive a lack of that ability.

well, I'll try to do then :-)

There were some threads in the past about kernel release quality,
where Linus explained why it could not be completely black or white.

Among the things he explained, I remember that one primary concern
was the inability to slow down development. I mean, if he waits 2 more
weeks for things to stabilize, then there will be two more weeks of
crap^H^H^H^Hdevelopment merged in the next merge window, so in fact this
will just shift dates and not quality.

There are also some regressions that get merged with every pre-release.
Thus, assuming he would wait for one more pre-release to merge the
fixes you spotted, 2 or 3 more would appear, so there's a point where
it must be decided when to release.

Right now it's released when he feels it is "good enough". This can be
very subjective, but I'd think that "good enough" basically means
that the kernel will be able to live in its stable branch without
major changes and without reverting features.

Also, you have to consider that there are several types of users.
Some of them are developers who will run a latest -git kernel at
some point. Some of them will be enthusiasts waiting for a feature,
and who will run every -rc kernel once the feature is merged, to
ensure it does not break before the release. There are also janitors
and the curious ones who'll basically run a few of the last -rc as
time permits to see if they can spot a few last-minute issues before
the release. There are the brave ones who systematically download
the dot-0 release once Linus announces it and will proudly run it
to show their friends how it's better than the last one. There are
those who need a bit of stability (eg: professional laptop or home
server) and will prefer to wait for a few stable releases to ensure
they won't waste their time on a big stupid issue that all other ones
above will have immediately spotted for them. And there are the ones
who run production servers who will either use distro kernels or
long term stable kernels, with a more or less long qualification
process between upgrades.

It's just an ecosystem where you have to find your place. From your
description, I think you're just before the last ones above: you need
something which works, even though it's not critical, so you could
very well wait for 2-3 stable updates before upgrading (that does
not prevent you from testing earlier on other systems if you want
to test performance, new features, regressions, etc...).

It's not really advisable to call dot-0 releases "unstable" because
it will only result in shifting the adoption point between the user
classes above. We need to have enthusiasts who proudly say "hey
look, dot-0 and it's already rock solid". We've all seen some of them
and they're the ones who help reporting issues that get fixed in the
next stable release.

I think that the most reasonable thing to do is to acknowledge your need
for stability and always refrain from running on the latest release.

Speaking for myself, I tend to run rock solid kernels for my data (my
file server was still on 2.4.37.9 till this afternoon, I just upgraded
it to 2.6). The distro's kernel currently is 2.6.33.4 and I'm going to
switch it back to 2.6.32.x or 2.6.27.x because I'd rather have something
fully tested there.

My desktop, which regularly reaches 50-100 days uptime, runs on whatever
looks stable enough for the job when I upgrade. Usually it's one of
Greg's long term stable series, 2.6.27.x or 2.6.32.x, with x >= 10.

My work laptop is on similar kernels.

My netbook is generally running experimental code, it does not matter
much. It's where I'd try 2.6.35-rc for instance, or where I test
2.6.32.x-rc when Greg announces them.

You see, there's a kernel for everyone, and for every usage. You just
have to make your choice. And when you don't know or don't want to
guess, stick to the distro's kernel.

Regards,
Willy

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 17:22 ` Willy Tarreau
@ 2010-07-11 21:38 ` Rafael J. Wysocki
2010-07-12 4:17 ` Willy Tarreau
2010-07-12 9:56 ` Martin Steigerwald
` (2 subsequent siblings)
3 siblings, 1 reply; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-07-11 21:38 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Martin Steigerwald, linux-kernel

On Sunday, July 11, 2010, Willy Tarreau wrote:
> Hi Martin,
>
> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From the
> > current replies I perceive a lack of that ability.
>
> well, I'll try to do then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
>
> Among the things he explained, I remember that one of primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.
...
> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above.

IMnshO it's not exactly fair to call them "stable" either. I tend to call them
"major releases" which basically reflects what they are - events in the
development process that each start a new merge window. Nothing more, either
way.

Rafael

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 21:38 ` Rafael J. Wysocki
@ 2010-07-12 4:17 ` Willy Tarreau
0 siblings, 0 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-07-12 4:17 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Martin Steigerwald, linux-kernel

Hi Rafael,

On Sun, Jul 11, 2010 at 11:38:28PM +0200, Rafael J. Wysocki wrote:
> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> > classes above.
>
> IMnshO it's not exactly fair to call them "stable" either. I tend to call them
> "major releases" which basically reflects what they are - events in the
> development process that each start a new merge window. Nothing more, either
> way.

Indeed, just exactly that. Maybe the confusion comes from the title
"Latest Stable Kernel" on kernel.org, which we could rename "Latest
Kernel Release" whatever it reflects ?

Willy

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 17:22 ` Willy Tarreau
2010-07-11 21:38 ` Rafael J. Wysocki
@ 2010-07-12 9:56 ` Martin Steigerwald
2010-07-12 15:43 ` Martin Steigerwald
2010-09-04 16:38 ` Martin Steigerwald
3 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 9:56 UTC (permalink / raw)
To: linux-kernel; +Cc: Willy Tarreau
[-- Attachment #1: Type: Text/Plain, Size: 943 bytes --]

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
>
> well, I'll try to do then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Wow! Thanks to you and all the others who provided such constructive
feedback. I need a bit of time to digest it and think it through. I will
answer then.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 17:22 ` Willy Tarreau
2010-07-11 21:38 ` Rafael J. Wysocki
2010-07-12 9:56 ` Martin Steigerwald
@ 2010-07-12 15:43 ` Martin Steigerwald
2010-07-12 17:36 ` Willy Tarreau
2010-07-12 17:55 ` Stefan Richter
2010-09-04 16:38 ` Martin Steigerwald
3 siblings, 2 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 15:43 UTC (permalink / raw)
To: linux-kernel; +Cc: Willy Tarreau
[-- Attachment #1: Type: Text/Plain, Size: 7409 bytes --]

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
>
> well, I'll try to do then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
>
> Among the things he explained, I remember that one of primary concern
> was the inability to slow down development. I mean, if he waits 2 more
> weeks for things to stabilize, then there will be two more weeks of
> crap^H^H^H^Hdevelopment merged in next merge window, so in fact this
> will just shift dates and not quality.

Would it make that much of a difference? Linus could still say no to
obvious crap, couldn't he?

> There are also some regressions that get merged with every pre-release.
> Thus, assuming he would wait for one more pre-release to merge the
> fixes you spotted, 2 or 3 more would appear, so there's a point where
> it must be decided when to release.

Some sort of bug classification could help here, I think. Something that
helps Linus to decide whether it is worth doing another release
candidate round or not.

Actually I think the "USB soundcard not working after resume" bug I
mentioned (bug #15788) wouldn't warrant a new release candidate round,
especially as it didn't have a patch yet and will likely affect just a
minority of users. Still, it would be nice if it were fixed in time. I
do think that the "Radeon KMS does not work after resume" bug (#15969)
does qualify, since it causes loss of data handled by the current X
session(s) - sure, I normally save my stuff before hibernating, but...
And it actually had a patch that has been tested! The desktop freeze bug
I mentioned would slip, because I didn't report it and, apart from a
Debian bug report I found, it wasn't confirmed at all. A reported and
confirmed desktop freeze would qualify IMHO.

Actually I read postings from Linus saying that he reads the regression
list kindly provided by Rafael. 15788 was in there, but IMHO wouldn't
qualify (see posting "2.6.34-rc5: Reported regressions from 2.6.33").
But 15969 was not - well, it was reported for rc7, so too late for the
manual report by Rafael. So yes, I see how it can have slipped.

Maybe an approach would be to dynamically generate the list from all bug
reports marked for 2.6.34 versions and have it posted to the kernel
mailing list after every rc. This way bug #15969 would at least have
been in the list of known regressions.

Bugzilla severity and priority fields or something similar could be used
to set the importance of a bug report, and the regression list could be
sorted by importance. One important criterion also would be whether
someone could confirm it, reproduce it. Even if I had reported those
desktop freezes, unless someone confirmed them they might just happen
for me. Well, a "confirm" or vote button might be good, so that the
number of confirmations could be counted.

It would need some triaging and classifying and I am willing to help
with that.

> Right now it's released when he feels it "good enough". This can be
> very subjective, but I'd think that "good enough" basically means
> that the kernel will be able to live in its stable branch without
> major changes and without reverting features.

Okay, then that's two different definitions of stable. I mean stable
enough for (adventurous) end users. And here it's more of a development
point of view.

> Also, you have to consider that there are several types of users.
> Some of them are developers who will run a latest -git kernel at
> some point. Some of them will be enthousiasts waiting for a feature,
> and who will run every -rc kernel once the feature is merged, to
> ensure it does not break before the release. There are also janitors
> and the curious ones who'll basically run a few of the last -rc as
> time permits to see if they can spot a few last-minute issues before
> the release. There are the brave ones who systematically download
> the dot-0 release once Linus announces it and will proudly run it
> to show their friends who it's better than the last one. There are
> those who need a bit of stability (eg: professional laptop or home
> server) and will prefer to wait for a few stable releases to ensure
> they won't waste their time on a big stupid issue that all other ones
> above will have immediately spotted for them. And there are the ones
> who run production servers who will either use distro kernels of
> long term stable kernels, with a more or less long qualification
> process between upgrades.

Yes, stable enough for whom? I see.

> It's just an ecosystem where you have to find your place. From your
> description, I think you're before the last ones above, you need
> something which works, eventhough it's not critical, so you could
> very well wait for 2-3 stable updates before upgrading (that does
> not prevent you from testing earlier on other systems if you want
> to test performance, new features, regressions, etc...).

ACK.

> It's not really advisable to call dot-0 releases "unstable" because
> it will only result in shifting the adoption point between the user
> classes above. We need to have enthousiasts who proudly say "hey
> look, dot-0 and it's already rock solid". We've all seen some of them
> and they're the ones who help reporting issues that get fixed in the
> next stable release.

I do think the claim should be honest. "stable" IMHO is not, at least
from a user's point of view. "unstable" isn't either, because a dot-0
kernel is not guaranteed to be unstable ;). So I agree with the major
release kernel approach from Rafael.

> I think that the most reasonable thing to do is to assume your need
> for stability and always refrain from running on the latest release.
>
> Speaking for myself, I tend to run rock solid kernels for my data (my
[...]
> You see, there's a kernel for everyone, and for every usage. You just
> have to make your choice. And when you don't know or don't want to
> guess, stick to the distro's kernel.

Yes. As I already said, I will rebalance my decision on which kernel to
use. And I now understand some of the problems better. Thanks.

But beyond that, I do think it's worth thinking about ways to improve
the process of ensuring as much stability as sensibly possible. A dot-0
kernel won't be error-free - but I find just claiming the current
process as "the best we can have" not actually satisfying. And I do
think it can be improved upon.

I do not do kernel development, but I am willing to help with collecting
information about the current state of the kernel, and to help with bug
triaging as well as I can and as time permits. I do have some experience
with quality management, as I coordinated the beta test of some AmigaOS
versions, but that was in a closed group. Here it's a different scale
and I believe it needs somewhat different approaches.

I will reply to other posts in this thread in the next days.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 72+ messages in thread
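
The suggestion in the message above, generating the regression list from bug reports and sorting it by importance and by number of confirmations, could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Regression` record, the `SEVERITY_RANK` table and both functions are hypothetical names, not part of Bugzilla or of any existing kernel tooling, and the severities given to the two bugs from this thread are assumed for the example.

```python
from dataclasses import dataclass

# Hypothetical ranking (lower sorts first). The labels loosely follow
# Bugzilla-style severities but are an assumption of this sketch.
SEVERITY_RANK = {"critical": 0, "major": 1, "normal": 2, "minor": 3}


@dataclass
class Regression:
    bug_id: int
    summary: str
    severity: str                  # one of the SEVERITY_RANK keys
    confirmations: int = 0         # how many other users reproduced it
    has_tested_patch: bool = False


def sort_regressions(reports):
    """Order by severity, then confirmations, then patch availability."""
    return sorted(
        reports,
        key=lambda r: (
            SEVERITY_RANK[r.severity],  # most severe first
            -r.confirmations,           # widely confirmed bugs first
            not r.has_tested_patch,     # ready-to-merge fixes first
            r.bug_id,                   # stable tie-breaker
        ),
    )


def format_regression_list(reports):
    """Render the sorted list as plain-text lines for a mail body."""
    return [
        f"#{r.bug_id} [{r.severity}] {r.summary} "
        f"(confirmations: {r.confirmations}, "
        f"tested patch: {'yes' if r.has_tested_patch else 'no'})"
        for r in sort_regressions(reports)
    ]


if __name__ == "__main__":
    # Bug numbers and patch status come from this thread; the severity
    # values are assumptions made only for the example.
    reports = [
        Regression(15788, "USB sound card not working after resume", "normal"),
        Regression(15969, "Radeon KMS not working after resume", "major",
                   has_tested_patch=True),
    ]
    print("\n".join(format_regression_list(reports)))
```

Severity is deliberately the first sort key, so a serious bug reported by a single user still lands at the top of the list; the confirmation count only orders bugs of equal severity.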
* Re: stable? quality assurance? 2010-07-12 15:43 ` Martin Steigerwald @ 2010-07-12 17:36 ` Willy Tarreau 2010-07-12 19:56 ` Martin Steigerwald 2010-07-12 17:55 ` Stefan Richter 1 sibling, 1 reply; 72+ messages in thread From: Willy Tarreau @ 2010-07-12 17:36 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel Hi Martin, On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote: > > Among the things he explained, I remember that one of primary concern > > was the inability to slow down development. I mean, if he waits 2 more > > weeks for things to stabilize, then there will be two more weeks of > > crap^H^H^H^Hdevelopment merged in next merge window, so in fact this > > will just shift dates and not quality. > > Would it make that much of a difference? Linus could still say no to > obvious crap, couldn't he? It's not "obvious" crap, it's that the developers will simply have advanced two more weeks ahead of their schedule, so their merge will be larger as it will contain some parts that ought to be in next release should the kernel be release earlier. And it will not be possible to delay merging because among them there's always the killer feature everybody wants. This is the reason for the strict merge window. > > There are also some regressions that get merged with every pre-release. > > Thus, assuming he would wait for one more pre-release to merge the > > fixes you spotted, 2 or 3 more would appear, so there's a point where > > it must be decided when to release. > > Some sort of classifying bugs could help here I think. Something that > helps Linus to decide whether it is worth to do another release candidate > round or not. Maybe sometimes that could indeed help, but that must not be done too often, otherwise releases slip and patches get even bigger. (...) 
> I do think that the Radeon KMS does not work after resume bug (#15969)
> does qualify since it causes loss of data handled by the current X
> session(s) - sure I normally save my stuff before hibernating, but...
> And it actually had a patch that has been tested!

Then the problem should be checked on this side: why didn't this patch
get merged in time? Maybe the maintainer needed more time to recheck it,
maybe he was on holiday, maybe he was ill on the wrong day, maybe he had
already merged tons of fixes and preferred to keep this one for next
time, ... But even if there are fixes pending, this should not be a
reason to *delay* releases, otherwise we go back to the problem above,
with also the problem of new regressions reported with tested fixes
available...

(...)

> Maybe an approach would be to dynamically generate the list from all
> bug reports marked for 2.6.34 versions and have it posted to kernel
> mailing list after every rc. This way bug #15969 would at least have
> been in the list of known regressions.

In fact, Rafael regularly emits this list, and the respective
maintainers are informed. That means to me that there's little hope
that you'll get the maintainers to merge and send a fix they did not
manage to do. What *could* be improved though would be if Linus
publicly stated the deadline for last fixes, as Greg does with the
stable branch. That can give hope to some of them to finish a little
merge work in time instead of considering it's too late.

> Bugzilla severity and priority fields or something similar could be
> used to set the importance of a bug report and the regression list
> could be sorted by importance. One important criterion also would be
> whether someone could confirm it, reproduce it. Even when I reported
> those desktop freezes, unless someone confirmed them it might just
> happen for me. Well a "confirm" or vote button might be good, so that
> the number of confirmations could be counted.

Maybe that could help, but it will not necessarily be the best solution.
Keep in mind that some issues may be more important but still reported
only by one user. If someone reports FS corruption, you certainly don't
want to wait for a few others to confirm the bug, for instance. Security
issues don't need counting either.

(...)

> > It's not really advisable to call dot-0 releases "unstable" because
> > it will only result in shifting the adoption point between the user
> > classes above. We need to have enthusiasts who proudly say "hey
> > look, dot-0 and it's already rock solid". We've all seen some of them
> > and they're the ones who help reporting issues that get fixed in the
> > next stable release.
>
> I do think the claim should be honest. "stable" IMHO is not, at least
> from a user's point of view. "unstable" isn't either, cause a dot-0
> kernel is not guaranteed to be unstable ;). So I agree with the major
> release kernel approach from Rafael.

But it's also the starting point of the stable branch. And what about
the -stable branch itself? Sometimes an awful bug will prevent the
kernel from even booting for most users, and a single patch will be
present in the stable branch to fix this early. Same if a major security
issue gets discovered at the time of release, it's possible that the
stable branch only contains one patch. That does not qualify it as more
stable than the main branch either, even though it's called "stable".
Maybe we should indicate on www.kernel.org that a new release has
generally received little testing but should be good enough for
experienced users to test it, and that stable releases before .3-.4 are
not recommended for general use.

> But beyond that, I do think it's worth thinking about ways to improve
> the process of ensuring as much stability as sensibly possible. A dot-0
> kernel won't be error-free - but I find just claiming the current
> process as "the best we can have" not actually satisfying.
> And I do think it can be improved upon. I do not do kernel development,
> but I am willing to help with collecting information about the current
> state of the kernel, help with bug triaging as good as I can and manage
> to take time. I do have some experience with quality management as I
> coordinated the betatest of some AmigaOS versions, but then this has
> been in a closed group. Here it's a different scale and I believe it
> needs somewhat different approaches.

In fact, I think we're at a point where the development process scales
linearly with every brain and every pair of eyeballs. There are two
orthogonal axes to scale, one on the quality and one on the quantity.
Both are required, but the time spent on one is not spent on the other
one. Customers want quantity (features) and expect implicit quality.

It is possible for some people to bring a lot of added value, a lot more
than they would through their share of brain time on code. This is the
case for Rafael and Greg who noticeably enhance quality, but it's not
limited to them. Code reviews, bug reviews, -next branch, etc... are
all geared towards quality. But one thing is sure, there are far fewer
people working on quality than there are working on features, so I think
that if you want to help, there is possibly a way to noticeably improve
quality with one more guy there, though you have to find how to
efficiently spend that time!

Regards,
Willy

* Re: stable? quality assurance?
2010-07-12 17:36 ` Willy Tarreau
@ 2010-07-12 19:56 ` Martin Steigerwald
2010-07-12 23:03 ` Stefan Richter
0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-12 19:56 UTC (permalink / raw)
To: Willy Tarreau; +Cc: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 12427 bytes --]

On Monday 12 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy,

for now I downgraded to 2.6.33.2 and started a compile of 2.6.33.6. I
hit yet another bug, but that's a TuxOnIce one (nevertheless reported at
bugzilla.kernel.org at #15873). And after booting again after the resume
did not work, the machine just locked up again while just playing an avi
file from photo sd card - I *think* that dubious freeze bug I mentioned
before. Since I am holding a Linux training this week I just decided to
downgrade now. Again I didn't try to SSH into the machine, but it was
after eight o'clock after a long work day, it's really hot here and I
just couldn't stand doing any of the information-collecting work for the
bug, which might easily have taken two or more hours.

Actually I also do not know what to do with such a random freeze bug.
How to best approach it without sinking insane amounts of time into it?
The last freeze bug I had was with my ThinkPad T23 when plugging in and
later removing the eSATA PCMCIA card. It worked for quite some kernel
versions, but since a certain version it just started to freeze on
removal. Up to 2.6.33 where I last tried I think. And there I had at
least found in what situation it happens. What do I do with such bugs?
Back then I just decided to not use the eSATA PCMCIA card in that
ThinkPad T23 again, which isn't that unreasonable I think. I didn't even
report it, which granted might be the reason that it's not yet fixed.

I am willing to do some testing, but I also like to use Linux. And above
a certain amount it's just too much for me. Frankly, for me it's all
happening too fast.
I experienced it with some KDE 4 versions - later ones like 4.3 and
4.4! - where I reported so many bugs I easily stumbled upon that at some
point I just gave up reporting anything.

Sure I wanted Radeon DRM KMS. It's great. But I really hope things will
be more stable again soon. A new feature is great - when it works. That
said, I am not sure whether the recent freeze bug on my ThinkPad T42 is
related to Radeon DRM.

I think I will wait for 2.6.34.2 or .3 and then try again. If it then
happens again, hopefully in a moment where I have nerve to deal with
such bugs, I will fire up my second notebook and try to SSH into the
machine. If that works I at least could look into dmesg and X.org logs.

That's what I meant: For me personally the balance is lost. The kernel
does not have to be perfect, but I am experiencing just too many issues
including quite nasty ones at the moment. 2.6.33.2 with userspace
software suspend was stable, or 2.6.32 with TuxOnIce. Thus I am trying
2.6.33.6.

> On Mon, Jul 12, 2010 at 05:43:56PM +0200, Martin Steigerwald wrote:
> > > Among the things he explained, I remember that one of the primary
> > > concerns was the inability to slow down development. I mean, if he
> > > waits 2 more weeks for things to stabilize, then there will be two
> > > more weeks of crap^H^H^H^Hdevelopment merged in next merge window,
> > > so in fact this will just shift dates and not quality.
> >
> > Would it make that much of a difference? Linus could still say no to
> > obvious crap, couldn't he?
>
> It's not "obvious" crap, it's that the developers will simply have
> advanced two more weeks ahead of their schedule, so their merge will
> be larger, as it will contain some parts that would have gone into the
> next release had the kernel been released earlier. And it will not be
> possible to delay merging because among them there's always the killer
> feature everybody wants. This is the reason for the strict merge
> window.

Hmmm, it could also be used as two more weeks for testing the new stuff
that should go in, but that might just be wishful thinking...

Is the Linux kernel development really in balance with feature work and
stabilization work? Currently at least from my personal perception it is
not. Development goes that fast - can you all cope with that speed?
Maybe it's just time to *slow it down* a bit? Does it really scale?

I am overwhelmed. Several times I just had enough of it. Others had
other experiences. So it might just be me having lots of bad luck. What
are the experiences of others?

Actually I think a bit more shift to quality work couldn't harm.

> > > There are also some regressions that get merged with every
> > > pre-release. Thus, assuming he would wait for one more pre-release
> > > to merge the fixes you spotted, 2 or 3 more would appear, so
> > > there's a point where it must be decided when to release.
> >
> > Some sort of classifying bugs could help here I think. Something that
> > helps Linus to decide whether it is worth doing another release
> > candidate round or not.
>
> Maybe sometimes that could indeed help, but that must not be done too
> often, otherwise releases slip and patches get even bigger.
>
> (...)
>
> > I do think that the Radeon KMS does not work after resume bug
> > (#15969) does qualify since it causes loss of data handled by the
> > current X session(s) - sure I normally save my stuff before
> > hibernating, but... And it actually had a patch that has been tested!
>
> Then the problem should be checked on this side: why didn't this patch
> get merged in time? Maybe the maintainer needed more time to recheck
> it, maybe he was on holiday, maybe he was ill on the wrong day, maybe
> he had already merged tons of fixes and preferred to keep this one for
> next time, ...
> But even if there are fixes pending, this should not be a reason to
> *delay* releases, otherwise we go back to the problem above, with also
> the problem of new regressions reported with tested fixes available...
>
> (...)

Well it should only be done for major regressions I think. I still think
some sorting in the regression list regarding importance and tested
patch availability could help. I think that the Radeon DRM fix was quite
a low-hanging fruit.

> > Maybe an approach would be to dynamically generate the list from all
> > bug reports marked for 2.6.34 versions and have it posted to kernel
> > mailing list after every rc. This way bug #15969 would at least have
> > been in the list of known regressions.
>
> In fact, Rafael regularly emits this list, and the respective
> maintainers are informed. That means to me that there's little hope
> that you'll get the maintainers to merge and send a fix they did not
> manage to do. What *could* be improved though would be if Linus
> publicly stated the deadline for last fixes, as Greg does with the
> stable branch. That can give hope to some of them to finish a little
> merge work in time instead of considering it's too late.

Hmmm, I did not find any regression list after 2.6.34-rc5 but before
2.6.35 on kernel mailing list here. And the bug and fix was with rc7.
What if the list were generated right after every rc? I wouldn't want to
demand of anyone to do it that often, but with some automation and a
team of people triaging and collecting regressions...

> > Bugzilla severity and priority fields or something similar could be
> > used to set the importance of a bug report and the regression list
> > could be sorted by importance. One important criterion also would be
> > whether someone could confirm it, reproduce it. Even when I reported
> > those desktop freezes, unless someone confirmed them it might just
> > happen for me.
> > Well a "confirm" or vote button might be good, so that the number of
> > confirmations could be counted.
>
> Maybe that could help, but it will not necessarily be the best
> solution. Keep in mind that some issues may be more important but
> still reported only by one user. If someone reports FS corruption, you
> certainly don't want to wait for a few others to confirm the bug, for
> instance. Security issues don't need counting either.

Okay, granted. It would just be an indication. But a complete or desktop
freeze bug could lead to huge data loss, too, depending on when the user
saved his data the last time. So is it really that much less important?

> > > It's not really advisable to call dot-0 releases "unstable" because
> > > it will only result in shifting the adoption point between the user
> > > classes above. We need to have enthusiasts who proudly say "hey
> > > look, dot-0 and it's already rock solid". We've all seen some of
> > > them and they're the ones who help reporting issues that get fixed
> > > in the next stable release.
> >
> > I do think the claim should be honest. "stable" IMHO is not, at least
> > from a user's point of view. "unstable" isn't either, cause a dot-0
> > kernel is not guaranteed to be unstable ;). So I agree with the
> > major release kernel approach from Rafael.
>
> But it's also the starting point of the stable branch. And what about
> the -stable branch itself? Sometimes an awful bug will prevent the
> kernel from even booting for most users, and a single patch will be
> present in the stable branch to fix this early. Same if a major
> security issue gets discovered at the time of release, it's possible
> that the stable branch only contains one patch. That does not qualify
> it as more stable than the main branch either, even though it's called
> "stable".
> Maybe we should indicate on www.kernel.org that a new release has
> generally received little testing but should be good enough for
> experienced users to test it, and that stable releases before .3-.4
> are not recommended for general use.

I thought about calling it a "major kernel release" or something like
that from dot-0 and then, after the stable patches settle - but by what
criterion to decide that? - "stable". Just .3 or .4? Or when there have
been some dot releases with few patches? But then what if Greg just
takes a bit longer to make the next one and it just contains more
patches for that reason?

> > But beyond that, I do think it's worth thinking about ways to improve
> > the process of ensuring as much stability as sensibly possible. A
> > dot-0 kernel won't be error-free - but I find just claiming the
> > current process as "the best we can have" not actually satisfying.
> > And I do think it can be improved upon. I do not do kernel
> > development, but I am willing to help with collecting information
> > about the current state of the kernel, help with bug triaging as
> > good as I can and manage to take time. I do have some experience
> > with quality management as I coordinated the betatest of some
> > AmigaOS versions, but then this has been in a closed group. Here
> > it's a different scale and I believe it needs somewhat different
> > approaches.
>
> In fact, I think we're at a point where the development process scales
> linearly with every brain and every pair of eyeballs. There are two
> orthogonal axes to scale, one on the quality and one on the quantity.
> Both are required, but the time spent on one is not spent on the other
> one. Customers want quantity (features) and expect implicit quality.

Don't customers also want stability? I certainly want it. And many
people running servers too in my experience.

> It is possible for some people to bring a lot of added value, a lot
> more than they would through their share of brain time on code.
> This is the case for Rafael and Greg who noticeably enhance quality,
> but it's not limited to them. Code reviews, bug reviews, -next branch,
> etc... are all geared towards quality. But one thing is sure, there
> are far fewer people working on quality than there are working on
> features, so I think that if you want to help, there is possibly a way
> to noticeably improve quality with one more guy there, though you have
> to find how to efficiently spend that time!

Yes, and I didn't find that yet. I am not in a state where I can just
read kernel code and actually understand what it does. Where I might be
able to start helping is with collecting and categorizing bug and
regression information, bug triaging and stuff. For some bugs at least.
I think there are bugs where I just do not understand enough to do
anything helpful.

Last post for today. Enough of computing.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

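The ordering of the regression list discussed in the message above
(importance first, then whether a tested patch exists, then the number
of confirmations) could be sketched roughly as follows. This is only an
illustration under stated assumptions: the severity scale, the field
names and the sample entries "A" to "D" are all hypothetical and do not
come from Bugzilla or from Rafael's actual lists. Data loss and security
issues are ranked first regardless of confirmations, following Willy's
objection that such reports should not wait for votes.

```python
from dataclasses import dataclass

# Hypothetical severity scale (an assumption, not Bugzilla's own fields).
# A lower rank sorts earlier in the list.
SEVERITY_RANK = {"data-loss": 0, "security": 0, "freeze": 1, "normal": 2}


@dataclass
class Regression:
    bug_id: str             # placeholder identifier, not a real bug number
    severity: str           # one of the keys of SEVERITY_RANK
    has_tested_patch: bool  # a fix exists and somebody has tested it
    confirmations: int      # number of "I can reproduce it" reports


def sort_key(reg):
    # 1. severity, 2. entries with a tested patch first (cheap to merge),
    # 3. more confirmations first.
    return (SEVERITY_RANK[reg.severity],
            not reg.has_tested_patch,
            -reg.confirmations)


def sorted_regressions(regressions):
    return sorted(regressions, key=sort_key)


# Made-up sample entries, for illustration only.
sample = [
    Regression("A", "normal", False, 5),
    Regression("B", "freeze", True, 1),
    Regression("C", "data-loss", False, 1),
    Regression("D", "freeze", False, 3),
]

for reg in sorted_regressions(sample):
    print(reg.bug_id, reg.severity, reg.has_tested_patch, reg.confirmations)
```

With this key, a single-reporter data-loss entry still sorts ahead of a
widely confirmed minor one, so the confirmation count only breaks ties
within a severity class.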
* Re: stable? quality assurance?
2010-07-12 19:56 ` Martin Steigerwald
@ 2010-07-12 23:03 ` Stefan Richter
2010-07-13 10:30 ` Martin Steigerwald
2010-07-15 7:32 ` david
0 siblings, 2 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 23:03 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Willy Tarreau, linux-kernel

Martin Steigerwald wrote:
> I think I will wait for 2.6.34.2 or .3 and then try again. If it then
> happens again, hopefully in a moment where I have nerve to deal with
> such bugs, I will fire up my second notebook and try to SSH into the
> machine. If that works I at least could look into dmesg and X.org logs.

netconsole might be required.

...
> Is the Linux kernel development really in balance with feature work and
> stabilization work? Currently at least from my personal perception it is
> not. Development goes that fast - can you all cope with that speed?
> Maybe it's just time to *slow it down* a bit?

If those who added the regressions are found out and asked to debug and
fix them, the balance should be corrected and perhaps more precautions
will be taken in the future. Alas, finding the point in history at which
the kernel regressed might take a lot more time than actually fixing it
then. In that case, maybe give the author of the bug an estimate of
the volunteered hours that were spent on reporting this bug, to put
its repercussions into perspective. OTOH I suspect a lack of
responsibility among the developers is not so much an issue here, more so
that the number of people who take the time for -rc tests (not to
mention linux-next tests) _and_ to file reports is rather low. Plus, a
good bug report often requires experience or good intuition, besides
patience and rigor.

There were discussions in the past on how more enthusiasts who are
willing and able to test prereleases could be attracted. But maybe
(just maybe) there are more ways in which the developers themselves
could perform more extensive/ more systematic tests.
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/

* Re: stable? quality assurance?
2010-07-12 23:03 ` Stefan Richter
@ 2010-07-13 10:30 ` Martin Steigerwald
2010-07-15 7:32 ` david
1 sibling, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-07-13 10:30 UTC (permalink / raw)
To: Stefan Richter; +Cc: Willy Tarreau, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 2218 bytes --]

On Tuesday 13 July 2010, Stefan Richter wrote:
> ...
>
> > Is the Linux kernel development really in balance with feature work
> > and stabilization work? Currently at least from my personal
> > perception it is not. Development goes that fast - can you all cope
> > with that speed? Maybe it's just time to slow it down a bit?
>
> If those who added the regressions are found out and asked to debug and
> fix them, the balance should be corrected and perhaps more precautions
> will be taken in the future. Alas, finding the point in history at which
> the kernel regressed might take a lot more time than actually fixing it
> then. In that case, maybe give the author of the bug an estimate of
> the volunteered hours that were spent on reporting this bug, to put
> its repercussions into perspective. OTOH I suspect a lack of
> responsibility among the developers is not so much an issue here, more
> so that the number of people who take the time for -rc tests (not to
> mention linux-next tests) and to file reports is rather low. Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.
>
> There were discussions in the past on how more enthusiasts who are
> willing and able to test prereleases could be attracted. But maybe
> (just maybe) there are more ways in which the developers themselves
> could perform more extensive/ more systematic tests.

Well I reported it now, although the report does not contain much
information on how to reproduce it, nor any other debug information.
I just did not report it before cause I didn't find the information I
can provide very helpful and until yesterday I thought it might just
have been these two freezes and that's it. But maybe reporting it early
is better than not reporting it at all.

Bug 16376 - random - possibly Radeon DRM KMS related - freezes
https://bugzilla.kernel.org/show_bug.cgi?id=16376

I will look in the logs to see whether I might have luck and find
anything this afternoon when my students learn vi/vim, but I doubt it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
2010-07-12 23:03 ` Stefan Richter
2010-07-13 10:30 ` Martin Steigerwald
@ 2010-07-15 7:32 ` david
1 sibling, 0 replies; 72+ messages in thread
From: david @ 2010-07-15 7:32 UTC (permalink / raw)
To: Stefan Richter; +Cc: Martin Steigerwald, Willy Tarreau, linux-kernel

On Tue, 13 Jul 2010, Stefan Richter wrote:

> Plus, a good bug report often requires experience or good intuition,
> besides patience and rigor.

In my experience these are less of a requirement than patience and
persistence. With these attributes you will be able to work your way
through figuring out what data is needed for this bug report by
answering questions (and if you get no response, trying again).

Nobody starts off knowing how to report a bug, and frequently you don't
start off knowing all the info that will be needed to solve the bug, but
if you report it and keep digging you will almost always get helped.

David Lang

* Re: stable? quality assurance?
2010-07-12 15:43 ` Martin Steigerwald
2010-07-12 17:36 ` Willy Tarreau
@ 2010-07-12 17:55 ` Stefan Richter
1 sibling, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-12 17:55 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel, Willy Tarreau

Martin Steigerwald wrote:
> Bugzilla severity and priority fields or something similar could be
> used to set the importance of a bug report and the regression list
> could be sorted by importance. One important criterion also would be
> whether someone could confirm it, reproduce it. Even when I reported
> those desktop freezes, unless someone confirmed them it might just
> happen for me. Well a "confirm" or vote button might be good, so that
> the number of confirmations could be counted.

"I can reproduce it" comments are often very helpful. "It is important
to me (and it should be to you too)" comments perhaps not so much.

If a bug doesn't make any progress, it may be because the cause of the
bug (i.e. which subsystem is at fault or when the bug was introduced) is
not known well enough. In such a case, more reproducers won't really
help (let alone stating that it is important to somebody); then somebody
needs to delve deeper into it and narrow the cause further down.

A bug which can be reproduced by several people is usually a bug that
can be reproduced quite reliably, and hence is a bug whose cause can
likely be found by bisection. A bug report with the git commit ID to
blame attached (at least as far as the reporter could determine), Cc'd
to author and committer of that commit, has a better chance of getting
fixed quickly than others.

So, votes don't help IMO; good reports do. And the reports need to be
early enough --- i.e. somebody needs to run -rc kernels --- since coming
up with a fix, validating the fix, and merging it may take time.

If there is little progress on a regression for which at least the
faulty subsystem is known, and the release goes by, the merge window
opens, and you see a pull request for that subsystem, then reply to that
pull request with a friendly reminder that there is still an unresolved
regression in that subsystem waiting for attention.

[...]
> As told already I will rebalance my decision on which kernel to use.

If or when you cannot spare resources to test a kernel yourself (be it
Linus' final release, or an -rc, not to mention linux-next), you can
also look out for Rafael's regression lists around the time of a final
release, to get a picture whether it is a worse or better one.
--
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/

* Re: stable? quality assurance?
2010-07-11 17:22 ` Willy Tarreau
` (2 preceding siblings ...)
2010-07-12 15:43 ` Martin Steigerwald
@ 2010-09-04 16:38 ` Martin Steigerwald
2010-09-04 18:46 ` Ted Ts'o
` (2 more replies)
3 siblings, 3 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 16:38 UTC (permalink / raw)
To: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 4902 bytes --]

On Sunday 11 July 2010, Willy Tarreau wrote:
> Hi Martin,

Hi Willy, hi everyone else reading this,

> On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote:
> > I hope that someone answers who actually can take some critique. From
> > the current replies I perceive a lack of that ability.
>
> well, I'll try to do then :-)
>
> There were some threads in the past about kernel releases quality,
> where Linus explained why it could not be completely black or white.
>
> Among the things he explained, I remember that one of the primary
> concerns was the inability to slow down development. I mean, if he
> waits 2 more weeks for things to stabilize, then there will be two more
> weeks of crap^H^H^H^Hdevelopment merged in next merge window, so in
> fact this will just shift dates and not quality.

During bisecting [Bug 16376] random - possibly Radeon DRM KMS related
freezes, which goes very slowly due to having lots of unbootable kernels
with an ext4 / readahead related backtrace during boot, I had an idea:

I think the main problem is that the current development process does
not give time for quality work and bug fixing. As I understand it,
currently it's just a constant development of new features with bug
fixing and quality work having to be done beneath that development:

- before 2.6.36 is released developers aim at developing new stuff for
  2.6.37.

- after 2.6.36 is released developers aim at getting as much stuff into
  2.6.37 and then after two weeks at developing new features for 2.6.38.

This process does not take bug fixing into account at all, cause after
the merge window has closed, developers hurry to get the stuff ready for
the next window. In that model extending the freeze period after rc1
doesn't help at all, cause as you say more "crap^H^H^H^Hdevelopment"
gets collected for the next kernel.

But is that a *given* that no one actually has any influence on? Is
collecting changes for next kernel like rain that either pours down or
not - usually pours down in this case like in August in Germany ;)?

Who feeds Linus with new stuff during the merge window? From what I
understand of the Linux development process it's mainly the subsystem
maintainers and Andrew Morton.

What if those people stop collecting new stuff for Linus except bugfixes
about two or three weeks before the next kernel is released? This would
give the subsystem trees and the mm tree some time to stabilize a bit,
so that Linus gets more quality stuff in the first place.

And more importantly, since developers know that subsystem maintainers
and Andrew only collect bugfixes 2-3 weeks before the release of a
stable kernel, they can as well spend some time on quality work. Of
course, developers can still decide "well, 2.6.37 work is closed
already" and continue developing for 2.6.38 even earlier, but I still
think this would help to slow things down a bit prior to the critical
phase before releasing a stable kernel. Cause when I know my subsystem
maintainer or Andrew won't be taking my stuff anyway before the kernel
is released, I can take a little time for other things.

The main idea here is to have a two-staged freeze process and to
distribute the "I am only taking bug fixes" work to more people than
Linus.

For this to work properly, I think at the time of the release of the
stable kernel subsystem maintainers and Andrew should branch their
trees.

For example when 2.6.36 is released:

- tree => 2.6.36-stable-tree
       => tree, where 2.6.37 stuff will be going in

Thus when subsystem maintainers take new stuff during the merge window,
it will be for the next kernel release already, not for the current one.
Except bugfix work. Whereas I think the criteria for bug fix work should
not be as strict as for the stable patches Greg collects.

Thus it needs to be clear: no new stuff for the next kernel already two
weeks prior to the release of the current stable kernel.

I think this could help. It's a bit like the two-staged development
process of Debian, but with the freeze period for "unstable" being a
fixed time interval of about 2 weeks instead of RC=0 for stable ;).

It's a bit of a formal shift of attention to the stable kernel about 2
weeks before its release. Developers might find creative ways to
circumvent it, or they understand that this process serves a purpose of
improving kernel quality.

If you think these two weeks cannot be squeezed into the three-monthly
development cycle, a four-monthly development cycle might do. But
actually I don't see why these two weeks could not be made to fit in
there.

Installing and testing next kernel after yet another mail to this
thread,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: stable? quality assurance?
2010-09-04 16:38 ` Martin Steigerwald
@ 2010-09-04 18:46 ` Ted Ts'o
2010-09-04 19:11 ` Martin Steigerwald
2010-09-04 19:24 ` Stefan Richter
2010-09-05 8:35 ` Avi Kivity
2 siblings, 1 reply; 72+ messages in thread
From: Ted Ts'o @ 2010-09-04 18:46 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel

On Sat, Sep 04, 2010 at 06:38:59PM +0200, Martin Steigerwald wrote:
>
> During bisecting [Bug 16376] random - possibly Radeon DRM KMS related
> freezes, which goes very slowly due to having lots of unbootable kernels
> with an ext4 / readahead related backtrace during boot, I had an idea:

So I'm not sure what you're referring to here. If there's an ext4
bug, why haven't you reported it to the linux-ext4 list? I've done a
Google search for "Steigerwald ext4 readahead" and I can't find any
bug report related to kernel oops that are ext4/readahead-related.

No one else has reported such a bug to me, and I run a complete set of
regression tests before I push ext4 changes to Linus. So I'm not sure
what you're seeing. But complaining about it in passing on an e-mail
without sending a formal bug report to the linux-ext4 mailing list is
not likely to solve your problem...

- Ted

* Re: stable? quality assurance? 2010-09-04 18:46 ` Ted Ts'o @ 2010-09-04 19:11 ` Martin Steigerwald 2010-09-04 23:23 ` Ted Ts'o 0 siblings, 1 reply; 72+ messages in thread From: Martin Steigerwald @ 2010-09-04 19:11 UTC (permalink / raw) To: Ted Ts'o, linux-kernel [-- Attachment #1: Type: Text/Plain, Size: 3448 bytes --] Am Samstag 04 September 2010 schrieb Ted Ts'o: > On Sat, Sep 04, 2010 at 06:38:59PM +0200, Martin Steigerwald wrote: > > During bisecting [Bug 16376] random - possibly Radeon DRM KMS related > > freezes, which goes very slowly due to having lots of unbootable > > kernels > > > with an ext4 / readahead related backtrace during boot, I had an idea: > So I'm not sure what you're referring to here. If there's an ext4 > bug, why haven't you reported it to the linux-ext4 list? I've done a > Google search for "Steigerwald ext4 readahead" and I can't find any > bug report related to kernel oops that are ext4/readahead-related. > > No one else has reported such a bug to me, and I run a complete set of > regression tests before I push ext4 changes to Linus. So I'm not sure > what you're seeing. But complaining about it in passing on an e-mail > without sending a formal bug report to the linux-ext4 mailing list is > not likely to solve your problem... Stop! I think we are misunderstanding. Its a bug I stumpled across the bisecting process. Neither 2.6.33 or 2.6.34 are affected, but some kernels in between. As such I didn't think its worth reporting the bug. I made a photo of part of the backtrace tough, so if you want I open a bug report about it nonetheless. But I really think it has been fixed during the 2.6.33 to 2.6.34 development cycle. For now I just skipped affected kernels in the bisection process in the hope that none is the first last good or first bad one regarding the freeze bug. Since for now it has all been kernels of a usb merge that showed this issue, I don't think the freeze bug is in there. 
It's from:

# skip: [124d255382ddd37ffa920e9f5183efa54bbfe4f2] USB: pl2303: remove unnecessary reset of usb_device in urbs

to

# skip: [c68bb0d738897ed39b90c7ccb22e01c938117051] USB: cxacru: document how to interact with the flash memory

I did not test booting every single one of those >100 revisions, but
got fed up with this after the fifth non-booting kernel or so. I didn't
get why git bisect insisted on taking me back to this range of commits
- even in the middle of two skips! - instead of just readjusting the
binary search so that that range is met later in the process. Because
then it might not have been met again at all. In the end I skipped
every commit in this USB merge manually.

The ext4 readahead thing must have been introduced before that merge
and fixed somewhere after that merge. But I didn't find the commit that
might have fixed it from a quick glance. I do not even know whether
it's ext4 related at all, but ext4 and readahead have been in that
backtrace.

So I just wanted to show that I am seriously working on tracking down
that likely radeon kms related freeze bug and that it's time-consuming
for me due to having lots of unbootable kernels.

I got another one of these with "Destination address too large" before
even InitRD seems to have done anything. I skipped this one commit as
well, and now git bisect seems to have taken me to a good one again,
let's see. At least it didn't freeze up to now and I better press send
now ;-). But from my bet on where the offending commit might be, this
should be a good one.

I am learning a lot on how to bisect a kernel right now ;).
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 72+ messages in thread
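The cost of skipping unbootable kernels one at a time, as described in
the message above, can be illustrated with a small Python sketch. It is
a toy model under stated assumptions and not git's real algorithm: the
history is linear, commit 0 is known good, the last commit is known bad,
and the names N, UNBOOTABLE, make_test and bisect_with_skips as well as
all numbers are made up for illustration.

```python
# Toy model, NOT git's real algorithm. History is linear: commit 0 is known
# good, commit N-1 is known bad. All numbers below are made up.
N = 1000
UNBOOTABLE = range(400, 520)   # hypothetical commits hit by an unrelated boot bug


def make_test(first_freeze):
    """Return a test function for a history whose freeze starts at first_freeze."""
    def test(commit):
        if commit in UNBOOTABLE:
            return "skip"                       # kernel does not boot: no verdict
        return "bad" if commit >= first_freeze else "good"
    return test


def bisect_with_skips(n, test, preskipped=()):
    """Return (suspects, builds).

    suspects: commits that may be the first bad one.
    builds:   how many kernels had to be built and booted.
    """
    good, bad = 0, n - 1
    skipped = set(preskipped)
    builds = 0
    while True:
        untested = [c for c in range(good + 1, bad) if c not in skipped]
        if not untested:
            # Only skipped commits (if any) are left between good and bad.
            return list(range(good + 1, bad + 1)), builds
        mid = (good + bad) // 2
        # Take the commit closest to the midpoint that is not yet skipped.
        pick = min(untested, key=lambda c: abs(c - mid))
        builds += 1
        verdict = test(pick)
        if verdict == "skip":
            skipped.add(pick)
        elif verdict == "bad":
            bad = pick
        else:
            good = pick


if __name__ == "__main__":
    one_by_one = bisect_with_skips(N, make_test(700))
    whole_range = bisect_with_skips(N, make_test(700), preskipped=UNBOOTABLE)
    hidden = bisect_with_skips(N, make_test(450), preskipped=UNBOOTABLE)
    print(one_by_one, whole_range, len(hidden[0]), hidden[1])
```

In this model, skipping the unbootable commits one at a time costs 51
builds, while marking the whole range as skipped up front costs 9, and
both end at the same commit. If the freeze had started inside the
unbootable range, the search could only report that range plus the first
bootable bad commit as suspects (121 commits here). With real git, a
whole range can be skipped in one go with "git bisect skip
<rev1>..<rev2>".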
* Re: stable? quality assurance? 2010-09-04 19:11 ` Martin Steigerwald @ 2010-09-04 23:23 ` Ted Ts'o 2010-09-05 7:59 ` Martin Steigerwald 0 siblings, 1 reply; 72+ messages in thread From: Ted Ts'o @ 2010-09-04 23:23 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel On Sat, Sep 04, 2010 at 09:11:34PM +0200, Martin Steigerwald wrote: > > Stop! I think we are misunderstanding. > > Its a bug I stumpled across the bisecting process. Neither 2.6.33 or > 2.6.34 are affected, but some kernels in between. As such I didn't think > its worth reporting the bug. > > I made a photo of part of the backtrace tough, so if you want I open a bug > report about it nonetheless. But I really think it has been fixed during > the 2.6.33 to 2.6.34 development cycle. FYI, it's fair game to send a note to LKML with the backtrace, saying, I'm getting this wierd stack trace while trying to do a bisect; it looks like it's fixed in 2.6.34, does it look familiar? If so, someone might be able to point you at the commit that fixes the bug, and then you can apply that patch by hand while doing the bisect at each step (and then unapply it before doing the next bisect iteration). > For now I just skipped affected kernels in the bisection process in the > hope that none is the first last good or first bad one regarding the freeze > bug. Since for now it has all been kernels of a usb merge that showed this > issue, I don't think the freeze bug is in there. Are you actually booting off of a USB device? Even if you are, it seems... strange... that a series of USB patches would cause an ext4/readahead kernel OOPS. Can you disable using USB devices, which would hopefully prevent the problem from showing up? Note by the way, that you don't have to try compiling at the points chosen by "git bisect". If you run into problems, you can try going to the head of the USB patches, and if that works, report that particular commit as "good" or "bad". 
> So I just wanted to show that I am seriously working on tracking down that > likely radeon kms related freeze bug and that its time-consuming for me > due to having lots of unbootable kernels. Have you reported this bug to the maintainer? Is he helping you out? Have you looked at the various Radeon-related commits between 2.6.34 and 2.6.33? I imagine there probably aren't that many of them. You might try testing commits just before and after the Radeon-related commits, which might speed up the git bisect significantly. - Ted ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-09-04 23:23 ` Ted Ts'o @ 2010-09-05 7:59 ` Martin Steigerwald 0 siblings, 0 replies; 72+ messages in thread From: Martin Steigerwald @ 2010-09-05 7:59 UTC (permalink / raw) To: Ted Ts'o; +Cc: linux-kernel [-- Attachment #1: Type: Text/Plain, Size: 3725 bytes --] Am Sonntag 05 September 2010 schrieb Ted Ts'o: > On Sat, Sep 04, 2010 at 09:11:34PM +0200, Martin Steigerwald wrote: > > Stop! I think we are misunderstanding. > > > > Its a bug I stumpled across the bisecting process. Neither 2.6.33 or > > 2.6.34 are affected, but some kernels in between. As such I didn't > > think its worth reporting the bug. > > > > I made a photo of part of the backtrace tough, so if you want I open > > a bug report about it nonetheless. But I really think it has been > > fixed during the 2.6.33 to 2.6.34 development cycle. > > FYI, it's fair game to send a note to LKML with the backtrace, saying, > I'm getting this wierd stack trace while trying to do a bisect; it > looks like it's fixed in 2.6.34, does it look familiar? If so, > someone might be able to point you at the commit that fixes the bug, > and then you can apply that patch by hand while doing the bisect at > each step (and then unapply it before doing the next bisect > iteration). Thanks. As to your advice I am seeking help again with bisecting this bug. See the thread "help with git bisecting a bug 16376: random - possibly Radeon DRM KMS related - freezes". I put you on Cc for the Ext4 / readahead related backtrace. > > For now I just skipped affected kernels in the bisection process in > > the hope that none is the first last good or first bad one regarding > > the freeze bug. Since for now it has all been kernels of a usb merge > > that showed this issue, I don't think the freeze bug is in there. > > Are you actually booting off of a USB device? Even if you are, it > seems... strange... that a series of USB patches would cause an > ext4/readahead kernel OOPS. 
Can you disable using USB devices, which > would hopefully prevent the problem from showing up? Nope. I think the bug is completely unrelated to the commits from the USB merge. I think that the USB commits just had the bad luck having been merged between the other bug was introduced and fixed. > Note by the way, that you don't have to try compiling at the points > chosen by "git bisect". If you run into problems, you can try going > to the head of the USB patches, and if that works, report that > particular commit as "good" or "bad". Yes, thats what the git reset --hard example should do. But I wondered on how to do it exactly. I saw "git reset --hard HEAD~3" in the manpage to go three commits back and only later found out that I could give a commit id to "git reset". Is just going to the head of that USB merge and testing that better than skipping the complete range? Anyway I really think that none of the commits in there caused or fixed that bug. > > So I just wanted to show that I am seriously working on tracking down > > that likely radeon kms related freeze bug and that its > > time-consuming for me due to having lots of unbootable kernels. > > Have you reported this bug to the maintainer? Is he helping you out? > Have you looked at the various Radeon-related commits between 2.6.34 > and 2.6.33? I imagine there probably aren't that many of them. You > might try testing commits just before and after the Radeon-related > commits, which might speed up the git bisect significantly. Yes, of course. I also posted my previous git bisect results already. I wanted to add a comment with the current results yesterday, but bugzilla had to many MySQL connection for an extended period of time. 
Now I did, asking more specifically for help [1].

[1] https://bugzilla.kernel.org/show_bug.cgi?id=16376#c38

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 72+ messages in thread
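Ted's suggestion quoted above, to test only around the Radeon-related
commits instead of bisecting the whole range between 2.6.33 and 2.6.34,
can be put into rough numbers. The following is a minimal Python sketch,
not a measurement: both commit counts are made up for illustration, the
function name bisect_steps is hypothetical, and it assumes a linear
history in which every tested kernel boots and gives a clear good/bad
answer.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case number of kernels to build and boot when a binary search
    has to find the first bad commit among `candidates` commits."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(candidates)), computed with integers only
    return (candidates - 1).bit_length()


# Hypothetical numbers, chosen only for illustration.
ALL_COMMITS = 10_000      # assumed size of the whole range between two releases
RADEON_COMMITS = 40       # assumed number of Radeon-related commits in that range

if __name__ == "__main__":
    print("whole range:", bisect_steps(ALL_COMMITS), "builds")
    print("Radeon only:", bisect_steps(RADEON_COMMITS), "builds")
```

With these assumed counts the whole range needs up to 14 builds and the
Radeon-only set up to 6. The saving only materialises if the freeze
really comes from one of those commits; otherwise the full bisection is
still needed afterwards.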
* Re: stable? quality assurance? 2010-09-04 16:38 ` Martin Steigerwald 2010-09-04 18:46 ` Ted Ts'o @ 2010-09-04 19:24 ` Stefan Richter 2010-09-04 19:34 ` Stefan Richter 2010-09-04 20:21 ` Martin Steigerwald 2010-09-05 8:35 ` Avi Kivity 2 siblings, 2 replies; 72+ messages in thread From: Stefan Richter @ 2010-09-04 19:24 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel Martin Steigerwald wrote: > I think main problem is that the current development process does not give > time for quality work and bug fixing. This has little to do with process. Put simply, the paid developers work on what they are paid for. The volunteers work on what they are interested in. If you feel that too little work is spent on stabilization and bug fixing, pay someone or take matters into your own hand. I.e. report bugs and work with the developers to get the bugs fixed. The current development process OTOH gives plenty of time for quality work and bug fixing: - There are several stages at which new code can be tested: When it lives in subsystem development trees, when it has been pulled into the linux-next tree, when it has been pulled into Linus' tree. - Bug fixes are pulled by Linus almost any time whenever they are ready. (Of course, since fixes can and do introduce regressions too, only critical fixes are accepted in later -rcs.) - New code submissions are pulled by Linus in a fairly reliable cycle with reasonable frequency (less than three months). That way, developers know that if their stuff did not quite cut it for mainline merge in month N, they know they can try again in month N+2 or N+3. They are not left to guess whether their next chance will be in half a year or two years or next week. Hence, nobody needs to panic and rush things when a merge window draws near. Plus, the code and the repository are open, so anybody can ship features to customers at any time independently of Linus' release cycle. Linux distributors do this all the time. 
> But is that a *given* that no one actually has any influence to? Is > collecting changes for next kernel like rain that either pours down or not > - usually pours down in this case like in August in Germany ;)? Who feeds > Linus with new stuff during the merge window? From what I understand of the > Linux development process its mainly the subsystem maintainers and Andrew > Morton. > > What if those people stop collecting new stuff for Linus except bugfixes > about two or three weeks before the next kernel is relased? Most of the maintainers are responsible enough to put only stuff into linux-next which belongs there, i.e. tested, release-ready stuff. Likewise with submissions to Linus during the merge window. Only some maintainers do in fact try to submit rushed, untested crap. Sometimes they get caught red-handed. The release-ready submissions that come via responsible maintainers still contain some regressions though. This is inevitable. There are less regressions if there are more enthusiasts who test development trees and linux-next. There are less regressions in Linus' releases if there are more enthusiasts who test -rc kernels. (And submit good bug reports and work with the developers on them.) And vice versa. Process does not do much to prevent bugs or fix bugs. People do. :-) However, you can hardly tell people to implement less features and fix more bugs if they don't owe you anything. -- Stefan Richter -=====-==-=- =--= --=-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-09-04 19:24 ` Stefan Richter
@ 2010-09-04 19:34 ` Stefan Richter
2010-09-04 20:21 ` Martin Steigerwald
1 sibling, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-09-04 19:34 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel
Stefan Richter wrote:
> Process does not do much to prevent bugs or fix bugs. People do. :-)
>
> However, you can hardly tell people to implement less features and fix more
> bugs if they don't owe you anything.

PS:
When a tester has sunk a lot of time into a bisection or generally into
a good bug report, like you did recently according to your other post,
then the developer of the bug for sure owes you something...
But I am sure that most developers do appreciate such work a lot.
-- 
Stefan Richter
-=====-==-=- =--= --=--
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-09-04 19:24 ` Stefan Richter 2010-09-04 19:34 ` Stefan Richter @ 2010-09-04 20:21 ` Martin Steigerwald 2010-09-04 22:50 ` Stefan Richter 2010-09-04 23:16 ` Ted Ts'o 1 sibling, 2 replies; 72+ messages in thread From: Martin Steigerwald @ 2010-09-04 20:21 UTC (permalink / raw) To: Stefan Richter; +Cc: linux-kernel [-- Attachment #1: Type: Text/Plain, Size: 6220 bytes --] Am Samstag 04 September 2010 schrieb Stefan Richter: > Martin Steigerwald wrote: > > I think main problem is that the current development process does not > > give time for quality work and bug fixing. > > This has little to do with process. > > Put simply, the paid developers work on what they are paid for. The > volunteers work on what they are interested in. And they are paid for features instead of fixing bugs? I doubt enterprise customers have this preference. I admit, they have no reason to pay for fixing my bug, unless they experience it too, however. > If you feel that too little work is spent on stabilization and bug > fixing, pay someone or take matters into your own hand. I.e. report > bugs and work with the developers to get the bugs fixed. I do already for the bugs I encountered. > The current development process OTOH gives plenty of time for quality > work and bug fixing: > > - There are several stages at which new code can be tested: > When it lives in subsystem development trees, > when it has been pulled into the linux-next tree, > when it has been pulled into Linus' tree. > > - Bug fixes are pulled by Linus almost any time whenever they are > ready. (Of course, since fixes can and do introduce regressions too, > only critical fixes are accepted in later -rcs.) > > - New code submissions are pulled by Linus in a fairly reliable cycle > with reasonable frequency (less than three months). That way, > developers know that if their stuff did not quite cut it for > mainline merge in month N, they know they can try again in month > N+2 or N+3. 
They are not left to guess whether their next chance [...] I will think a bit more about this. But my first impression is that all of these provisions are currently in conflict with time for feature work. If there is no stabilization or sorta of freeze period, the speed won't calm down in order to give stabilizitation a realistic chance. > > But is that a *given* that no one actually has any influence to? Is > > collecting changes for next kernel like rain that either pours down > > or not - usually pours down in this case like in August in Germany > > ;)? Who feeds Linus with new stuff during the merge window? From > > what I understand of the Linux development process its mainly the > > subsystem maintainers and Andrew Morton. > > > > What if those people stop collecting new stuff for Linus except > > bugfixes about two or three weeks before the next kernel is relased? > > Most of the maintainers are responsible enough to put only stuff into > linux-next which belongs there, i.e. tested, release-ready stuff. > Likewise with submissions to Linus during the merge window. > > Only some maintainers do in fact try to submit rushed, untested crap. > Sometimes they get caught red-handed. > > The release-ready submissions that come via responsible maintainers > still contain some regressions though. This is inevitable. There are > less regressions if there are more enthusiasts who test development > trees and linux-next. There are less regressions in Linus' releases > if there are more enthusiasts who test -rc kernels. (And submit good > bug reports and work with the developers on them.) And vice versa. > > Process does not do much to prevent bugs or fix bugs. People do. :-) Yes, my suggestion do not guarantee that people do report and fix bugs. But it gives more room for doing so, especially regarding fixing the open and known regressions. Again two of those that I mentioned initially have been reported by people *during* the rc phase already. 
Still the stable kernel did not receive the bug fix patch for the nastier one of it in time: That is what I am concerned about. If people do test, do report and someone even does a patch and yet its not in the stable kernel then, what for did they do it? Okay, it was in 2.6.35.1, but when a major and reported regression is only fixed in stable patches I still think that any release without at least two or three stable patches should not be called stable at all - its just misleading then. And I think I am perfectly entitled to that oppinion. Anyway I will relabel kernels in my mind and not consider a kernel without stable patches stable anymore. I did so theoretically before already but now I experienced it for myself the first time. > However, you can hardly tell people to implement less features and fix > more bugs if they don't owe you anything. Sorry for the demanding tone in my post that initiated the thread, but in the post you are answering too I merely made a suggestion. No one does owe me anything and I am aware of that. But still even when I do not prepend each of my mails with a list of what I have done for the kernel - which is clearly less than what any core kernel developer or even a casual kernel developer did for the kernel - I still can make a valuable suggestion. That said I compiled a kernel a day or two for some time to help Ingo Molnar with testing an use case for his CFS scheduler. And am I regularily testing new TuxOnIce kernels and report back to Nigel how they fare. I report bugs for other open source projects like KDE or Debian as well and contribute a bit here and then, like my first debian package "fio". And this work mostly has been enjoyable. Neither Ingo, nor Nigel, nor Jens Axboe asked me what I did for the kernel prior to working with me. They have just been happy for the feedback I gave. I admit my initial post did well to provoke the kind of "what did you do?" feedback as it actually was demanding. 
But then I really was frustrated with the kernel and I think sometimes an oppinionated post like my "stable? quality assurance?" can be quite good. If I think a kernel is crap, why should it be prohibited that I tell it to their developers? At least I learned a lot and even started bisecting that bug even though it takes an insane amount of time to do so. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-09-04 20:21 ` Martin Steigerwald @ 2010-09-04 22:50 ` Stefan Richter 2010-09-04 23:16 ` Ted Ts'o 1 sibling, 0 replies; 72+ messages in thread From: Stefan Richter @ 2010-09-04 22:50 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel Martin Steigerwald wrote: > Am Samstag 04 September 2010 schrieb Stefan Richter: >> Put simply, the paid developers work on what they are paid for. The >> volunteers work on what they are interested in. > > And they are paid for features instead of fixing bugs? There are lots of people who fix bugs on paid time or are even specifically paid to fix bugs. [...] > I will think a bit more about this. But my first impression is that all of > these provisions are currently in conflict with time for feature work. If > there is no stabilization or sorta of freeze period, the speed won't calm > down in order to give stabilizitation a realistic chance. Linus' merge--rc--release cycle only influences what is pulled into the mainline when. It does not prevent anyone to implement a new feature or to stabilize an existing feature any time. [...] >> However, you can hardly tell people to implement less features and fix >> more bugs if they don't owe you anything. > > Sorry for the demanding tone in my post that initiated the thread, but in > the post you are answering too I merely made a suggestion. No one does owe > me anything and I am aware of that. > > But still even when I do not prepend each of my mails with a list of what > I have done for the kernel - which is clearly less than what any core > kernel developer or even a casual kernel developer did for the kernel - I > still can make a valuable suggestion. > > That said I compiled a kernel a day or two for some time to help Ingo > Molnar with testing an use case for his CFS scheduler. And am I regularily > testing new TuxOnIce kernels and report back to Nigel how they fare. 
I > report bugs for other open source projects like KDE or Debian as well and > contribute a bit here and then, like my first debian package "fio". > > And this work mostly has been enjoyable. Neither Ingo, nor Nigel, nor Jens > Axboe asked me what I did for the kernel prior to working with me. They > have just been happy for the feedback I gave. > > I admit my initial post did well to provoke the kind of "what did you do?" > feedback as it actually was demanding. By the sentence above I merely meant to say that you or I or anybody cannot lay out work schedules for others who are not our employees. :-) > But then I really was frustrated > with the kernel and I think sometimes an oppinionated post like my > "stable? quality assurance?" can be quite good. If I think a kernel is > crap, why should it be prohibited that I tell it to their developers? It is not prohibited. OTOH I don't know how useful it is at this general level. There are lots of subsystem projects in the kernel project, all in different situations regarding how mature their subsystem is, how many developers and testers they have, what their balance of new features vs. stabilization work is. -- Stefan Richter -=====-==-=- =--= --=-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-09-04 20:21 ` Martin Steigerwald 2010-09-04 22:50 ` Stefan Richter @ 2010-09-04 23:16 ` Ted Ts'o 1 sibling, 0 replies; 72+ messages in thread From: Ted Ts'o @ 2010-09-04 23:16 UTC (permalink / raw) To: Martin Steigerwald; +Cc: Stefan Richter, linux-kernel On Sat, Sep 04, 2010 at 10:21:34PM +0200, Martin Steigerwald wrote: > Am Samstag 04 September 2010 schrieb Stefan Richter: > > Martin Steigerwald wrote: > > > I think main problem is that the current development process does not > > > give time for quality work and bug fixing. > > > > This has little to do with process. > > > > Put simply, the paid developers work on what they are paid for. The > > volunteers work on what they are interested in. > > And they are paid for features instead of fixing bugs? I doubt enterprise > customers have this preference. I admit, they have no reason to pay for > fixing my bug, unless they experience it too, however. Kernel developers are paid to work on feature, yes. They are not paid to fix bugs for random folks who want run the latest stable kernel. There are separate groups of people who work on stablizing kernels for the community and enterprise kernels. These folks tend to spend about 3 months stablizing a community distribution, and probably 6-9 months stablizing a kernel for an enterprise distribution. These folks also tend to do most performance tuning on kernels destined for enterprise kernels as well. Obviously some developeres who happen to be employed by distributions will help out in stablizing an enterprise kernel, but usually they get called in to fix a bug after the testers have found it. You can argue that this maybe shouldn't be the way things work, but you're not the ones paying the salaries for the enterprise distributions. I'm sure if enough enterprise distribution customers were willing to pay the enterprise distro folks to stablizing each 2.6.x kernel, the distro's would put their people on it. 
I know there are some kernel developers who would prefer it if enterprise distro's didn't spend so long stablizing the tree, but instead worked on stablizing each and every mainline release. > I will think a bit more about this. But my first impression is that all of > these provisions are currently in conflict with time for feature work. If > there is no stabilization or sorta of freeze period, the speed won't calm > down in order to give stabilizitation a realistic chance. Again, you can't force developers to work on stablization. Many will work on bugs because they want their driver or their file system to have a good reputation. And we do have people like Rafael who tracks regressions; if there is a regression, and the patch isn't being accepted by the maintainer, nag the maintainer; make sure it's in the kernel bugzilla, and nag Rafael, who normally will also ping maintainers when there is a know bug fix. In the worst case, send mail to Linus. You are empowered to do this. So do it! And BTW, if the fix is reported in -rc7, to be fair, sometimes the maintainer simply won't have time to test and quality control the patch before Linus does a release. So having something show up in a 2.6.x.y release really isn't the end of the world. And the bug did get fixed. It just didn't get fixed in time for *your* needs, but especially in the case of drivers (and your problems seemed to be mostly driver related problems), remember that sometimes the driver maintainer is a volunteer. (One of the advantages of sticking with an Intel video chipset is that maintainer is paid by Intel to support the Intel video drivers, and he is normally quite responsive. In contrast, the Radeon support is, if I recall correctly all done by volunteers, and regardless of whether or not the driver maintainers are paid full-time to work on supporting the driver, or volunteers, people do go on vacation during the summer months...) 
As far as whether or not a kernel stable, I think the answer is, it's stable if it's stable for *you*. As I've said, with the hardware I've chosen, very often it's stable by -rc3 or -rc4. For others, they may need to wait until several 2.6.x.y releases have gone by. I tend to complain when drivers I care about are broken in the -rc2 or -rc3 time frame. But if people wait until -rc7 to try out the kernel, then it might not get fixed before 2.6.35 comes out. - Ted ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-09-04 16:38 ` Martin Steigerwald
2010-09-04 18:46 ` Ted Ts'o
2010-09-04 19:24 ` Stefan Richter
@ 2010-09-05 8:35 ` Avi Kivity
2010-09-05 9:48 ` Martin Steigerwald
2 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2010-09-05 8:35 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel
On 09/04/2010 07:38 PM, Martin Steigerwald wrote:
> On Sunday 11 July 2010, Willy Tarreau wrote:
>> Hi Martin,
> Hi Willy, hi everyone else reading this,
>

Interesting, how do you expect Willy to read this if you don't copy him?

Don't trim cc lists if you want people to read your email, especially on
a high volume list like lkml.

-- 
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-09-05 8:35 ` Avi Kivity
@ 2010-09-05 9:48 ` Martin Steigerwald
0 siblings, 0 replies; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-05 9:48 UTC (permalink / raw)
To: Avi Kivity; +Cc: linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 1007 bytes --]
On Sunday 05 September 2010, Avi Kivity wrote:
> On 09/04/2010 07:38 PM, Martin Steigerwald wrote:
> > On Sunday 11 July 2010, Willy Tarreau wrote:
> >> Hi Martin,
> >
> > Hi Willy, hi everyone else reading this,
>
> Interesting, how do you expect Willy to read this if you don't copy
> him?
>
> Don't trim cc lists if you want people to read your email, especially on
> a high volume list like lkml.

It was a mistake. I sent another copy with him on cc and he actually
also replied already.

There are mailing lists, like all the Debian ones, where cc's are
usually not wanted - even on the higher volume lists - and mailing lists
where they are wanted, like most Linux kernel related ones. Sometimes
when I switch from Debian lists to Linux kernel related ones I forget
the cc.

Would be nice to have a default setting per folder in KMail for this. ;)
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 14:51 ` Martin Steigerwald
2010-07-11 17:22 ` Willy Tarreau
@ 2010-07-11 19:49 ` Stefan Richter
2010-07-13 11:11 ` Alejandro Riveira Fernández
2 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-11 19:49 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel, Lee Mathers
Martin Steigerwald wrote:
> One reason for a demand for me is best expressed by this question: Does
> the kernel developer community want to encourage that a group of advanced
> Linux users - but mostly non-developers - compile their own vanilla or
> vanilla near kernels, provide wider testing and report a bug now and
> then?
Yes, testing is desired --- in order to shake out bugs that are not
manifest on the developers' systems. Remember that the kernel is a
special program in which there are many classes of bugs that can only be
reproduced on special hardware and/or with special workloads.
Alas, there are not only new bugs in new features but also new bugs in
existing features, a.k.a. regressions. But like new bugs, many
regressions can alas not be found by the developers themselves on their
test systems.
You mentioned two particular regressions in your initial posting. Do
you have suggestions how they could have been prevented in the first
place? Or how they could have been handled better than they were? Do
you see subsystems of the kernel in which regressions are not taken as
seriously as in other ones?
> Well Linus has at least been a bit more reluctant to take big changes
> after rc1 this cycle, so maybe 2.6.35 will be better again.
2.6.35 will only be better if this (gradual) change of procedure means
that -rc kernels are going to be tested more and new bugs are going to
be found and fixed quicker in the -rc phase than before.
And 2.6.36+ will only be better if the stricter post -rc1 merges do not
motivate developers to put even more hastily assembled under-tested crap
into their pre -rc1 pull requests than they already do.
[PS: 2.6.34 works very well for me, as most 2.6.x releases do.]
[PS2: When on lkml, please use reply-to-all, not reply-to-list-only.]
--
Stefan Richter
-=====-==-=- -=== -=-==
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-11 14:51 ` Martin Steigerwald
2010-07-11 17:22 ` Willy Tarreau
2010-07-11 19:49 ` Stefan Richter
@ 2010-07-13 11:11 ` Alejandro Riveira Fernández
2010-07-13 12:50 ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
2 siblings, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 11:11 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 840 bytes --]
On Sun, 11 Jul 2010 16:51:42 +0200,
Martin Steigerwald <Martin@lichtvoll.de> wrote:
>
> One reason for a demand for me is best expressed by this question: Does
> the kernel developer community want to encourage that a group of advanced
> Linux users - but mostly non-developers - compile their own vanilla or
> vanilla near kernels, provide wider testing and report a bug now and
> then?
>
> I can live with either answer. If not, I just will be much more reluctant
> to try out new kernels.
I for one stopped booting into -rc kernels.
The fact that I still have to patch my kernels with a *one* liner
since the 2.6.29 kernel [1] does not give me confidence in the "test
report/bisect and it will be fixed" promise some have made in this
thread.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 72+ messages in thread
* rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 11:11 ` Alejandro Riveira Fernández
@ 2010-07-13 12:50 ` Stefan Richter
2010-07-13 15:35 ` John W. Linville
2010-07-13 18:06 ` Alejandro Riveira Fernández
0 siblings, 2 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-13 12:50 UTC (permalink / raw)
To: Alejandro Riveira Fernández
Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
John W. Linville, linux-wireless
Alejandro Riveira Fernández wrote:
> I for one stopped booting into -rc kernels.
> The fact that I still have to patch my kernels with a *one* liner
> since the 2.6.29 kernel [1] does not give me confidence in the "test
> report/bisect and it will be fixed" promise some have made in this
> thread.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
There were promises made in this thread? Then I must have read a
different mailing list or so.
I do not know why your WLAN regression has not been fixed yet, but at
least it seems rather plausible why commit
7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
such a revert is the one-liner that you are referring to).
Why is one reporter's rt2500 OK now though but not yours? Are there
different card revisions or firmwares out there that require quirk handling?
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 12:50 ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
@ 2010-07-13 15:35 ` John W. Linville
2010-07-13 18:19 ` Alejandro Riveira Fernández
2010-07-13 18:06 ` Alejandro Riveira Fernández
1 sibling, 1 reply; 72+ messages in thread
From: John W. Linville @ 2010-07-13 15:35 UTC (permalink / raw)
To: Stefan Richter
Cc: Alejandro Riveira Fernández, Martin Steigerwald, linux-kernel,
Johannes Berg, linux-wireless
On Tue, Jul 13, 2010 at 02:50:14PM +0200, Stefan Richter wrote:
> Alejandro Riveira Fernández wrote:
> > I for one stopped booting into -rc kernels.
> > The fact that I still have to patch my kernels with a *one* liner
> > since the 2.6.29 kernel [1] does not give me confidence in the "test
> > report/bisect and it will be fixed" promise some have made in this
> > thread.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
>
> There were promises made in this thread? Then I must have read a
> different mailing list or so.
>
> I do not know why your WLAN regression has not been fixed yet, but at
> least it seems rather plausible why commit
> 7e0986c17f695952ce5d61ed793ce048ba90a661 is not going to be reverted (if
> such a revert is the one-liner that you are referring to).
>
> Why is one reporter's rt2500 OK now though but not yours? Are there
> different card revisions or firmwares out there that require quirk handling?
The patch (7e0986c1) corrects an obvious error. Reverting it might
improve your (i.e. Alejandro's) performance, but it seems likely to
cause connectivity problems for others.
The fact that reverting 7e0986c1 helps you suggests that rt2500usb
isn't using the basic_rates map properly. But after reviewing the
code and the data I have, I can't see what would be causing that.
It is at least possible that your AP is sending bad rate information.
Have you tried this device with other APs?
John
--
John W. Linville                Someday the world will need a hero, and you
linville@tuxdriver.com          might be all we have. Be ready.
^ permalink raw reply [flat|nested] 72+ messages in thread
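[Editorial sketch, not part of the original thread.] The "basic rate
bitmap" discussed above is derived from the rate octets an AP advertises,
for example in its beacons. In the 802.11 Supported Rates element, the low
seven bits of each octet give a rate in units of 500 kbit/s and the top
bit marks that rate as basic. The Python below only illustrates that
encoding; it is not rt2x00 or mac80211 code, and the table order, the
function names and the sample bytes are assumptions made for the example.

```python
# Rates in units of 500 kbit/s, in an ASSUMED table order:
# 1, 2, 5.5, 11, 6, 9, 12, 18, 24, 36, 48, 54 Mbit/s.
# A real driver defines its own rate table; this order is illustrative.
RATE_TABLE = [2, 4, 11, 22, 12, 18, 24, 36, 48, 72, 96, 108]

BASIC_FLAG = 0x80  # top bit of a rate octet marks a basic rate
RATE_MASK = 0x7F   # low seven bits hold the rate value


def basic_rate_bitmap(rate_octets):
    """Return a bitmap with bit i set when RATE_TABLE[i] is marked basic."""
    bitmap = 0
    for octet in rate_octets:
        if not octet & BASIC_FLAG:
            continue  # rate is supported but not basic
        rate = octet & RATE_MASK
        if rate in RATE_TABLE:
            bitmap |= 1 << RATE_TABLE.index(rate)
    return bitmap


def describe(bitmap):
    """List the rates (in Mbit/s) whose bits are set in the bitmap."""
    return [RATE_TABLE[i] / 2 for i in range(len(RATE_TABLE))
            if bitmap & (1 << i)]


if __name__ == "__main__":
    # Hypothetical rate octets: 1, 2, 5.5 and 11 Mbit/s flagged as basic,
    # 6, 9, 12 and 18 Mbit/s supported only.
    sample = bytes([0x82, 0x84, 0x8B, 0x96, 0x0C, 0x12, 0x18, 0x24])
    bitmap = basic_rate_bitmap(sample)
    print(hex(bitmap), describe(bitmap))
```

Comparing a bitmap computed this way from captured beacon data with the
value the driver actually programs would be one way to tell a bad AP
advertisement apart from a driver-side problem.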
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 15:35 ` John W. Linville
@ 2010-07-13 18:19 ` Alejandro Riveira Fernández
2010-07-13 18:38 ` John W. Linville
0 siblings, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 18:19 UTC (permalink / raw)
To: John W. Linville
Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
linux-wireless
On Tue, 13 Jul 2010 11:35:31 -0400,
"John W. Linville" <linville@tuxdriver.com> wrote:
>
> The patch (7e0986c1) corrects an obvious error. Reverting it might
> improve your (i.e. Alejandro) performance, but it seems likely to
> cause connectivity problems for others.
>
> The fact that reverting 7e0986c1 helps you suggests that rt2500usb
My card is PCI, so it would be rt2500pci.
> isn't using the basic_rates map properly. But after reviewing the
> code and the data I have, I can't see what would be causing that.
> It is at least possible that your AP is sending bad rate information.
> Have you tried this device with other APs?
No; this is a desktop PC that connects to my home router/AP. A new wifi
card is cheaper than a new AP ...
>
> John
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 18:19 ` Alejandro Riveira Fernández
@ 2010-07-13 18:38 ` John W. Linville
2010-07-13 19:07 ` Alejandro Riveira Fernández
0 siblings, 1 reply; 72+ messages in thread
From: John W. Linville @ 2010-07-13 18:38 UTC (permalink / raw)
To: Alejandro Riveira Fernández
Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
linux-wireless
On Tue, Jul 13, 2010 at 08:19:27PM +0200, Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 11:35:31 -0400,
> "John W. Linville" <linville@tuxdriver.com> wrote:
>
> >
> > The patch (7e0986c1) corrects an obvious error. Reverting it might
> > improve your (i.e. Alejandro) performance, but it seems likely to
> > cause connectivity problems for others.
> >
> > The fact that reverting 7e0986c1 helps you suggests that rt2500usb
>
> My card is PCI, so it would be rt2500pci.
Sorry, typo...
> > isn't using the basic_rates map properly. But after reviewing the
> > code and the data I have, I can't see what would be causing that.
> > It is at least possible that your AP is sending bad rate information.
> > Have you tried this device with other APs?
>
> No; this is a desktop PC that connects to my home router/AP. A new wifi
> card is cheaper than a new AP ...
Perhaps you could capture some beacons from that AP?
--
John W. Linville                Someday the world will need a hero, and you
linville@tuxdriver.com          might be all we have. Be ready.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 18:38 ` John W. Linville
@ 2010-07-13 19:07 ` Alejandro Riveira Fernández
0 siblings, 0 replies; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 19:07 UTC (permalink / raw)
To: John W. Linville
Cc: Stefan Richter, Martin Steigerwald, linux-kernel, Johannes Berg,
linux-wireless
On Tue, 13 Jul 2010 14:38:52 -0400,
"John W. Linville" <linville@tuxdriver.com> wrote:
>
> > > isn't using the basic_rates map properly. But after reviewing the
> > > code and the data I have, I can't see what would be causing that.
> > > It is at least possible that your AP is sending bad rate information.
> > > Have you tried this device with other APs?
I do not know; I captured some debug data for Ivo back in the day and
from what he said all the info passed to the card was correct...
See http://lkml.org/lkml/2009/5/25/163 (link is in bugzilla) in case you
missed it.
> >
> > No; this is a desktop PC that connects to my home router/AP. A new wifi
> > card is cheaper than a new AP ...
>
> Perhaps you could capture some beacons from that AP?
If you explain how, I can try.
>
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 12:50 ` rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?) Stefan Richter
2010-07-13 15:35 ` John W. Linville
@ 2010-07-13 18:06 ` Alejandro Riveira Fernández
2010-07-13 19:18 ` Stefan Richter
1 sibling, 1 reply; 72+ messages in thread
From: Alejandro Riveira Fernández @ 2010-07-13 18:06 UTC (permalink / raw)
To: Stefan Richter
Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
John W. Linville, linux-wireless
On Tue, 13 Jul 2010 14:50:14 +0200,
Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:
> Alejandro Riveira Fernández wrote:
> > I for one stopped booting into -rc kernels.
> > The fact that I still have to patch my kernels with a *one* liner
> > since the 2.6.29 kernel [1] does not give me confidence in the "test
> > report/bisect and it will be fixed" promise some have made in this
> > thread.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=13362
>
> There were promises made in this thread? Then I must have read a
> different mailing list or so.
OK, no promises.
Maybe I read too much into Mr Tso's previous mail. My apologies.
[quote]
> So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> I find bugs, I report them and I help fix them. If more people did
> that, then the 2.6.X.0 releases would be more stable. But kernel
> development is a volunteer effort, so it's up to the volunteers to
> test and fix bugs during the rc4, -rc5 and -rc6 time frame.
[...]
> [...] Linux may be a very good bargain (look
> at how much Oracle has increased its support contracts for Solaris!),
> but it's still not a free lunch. At the end of the day, you get what
> you put into it.
I tested the kernels, I reported the bugs and I helped (to the best of my
knowledge; I'm not a programmer).
I got no result.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rt2x00: slow wifi with correct basic rate bitmap (was Re: stable? quality assurance?)
2010-07-13 18:06 ` Alejandro Riveira Fernández
@ 2010-07-13 19:18 ` Stefan Richter
0 siblings, 0 replies; 72+ messages in thread
From: Stefan Richter @ 2010-07-13 19:18 UTC (permalink / raw)
To: Alejandro Riveira Fernández
Cc: Martin Steigerwald, linux-kernel, Johannes Berg,
John W. Linville, linux-wireless
Alejandro Riveira Fernández wrote:
> On Tue, 13 Jul 2010 14:50:14 +0200,
> Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:
>> There were promises made in this thread? Then I must have read a
>> different mailing list or so.
>
> OK, no promises.
> Maybe I read too much into Mr Tso's previous mail. My apologies.
> [quote]
> > So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> > I find bugs, I report them and I help fix them. If more people did
> > that, then the 2.6.X.0 releases would be more stable. But kernel
> > development is a volunteer effort, so it's up to the volunteers to
> > test and fix bugs during the rc4, -rc5 and -rc6 time frame.
>
> [...]
> > [...] Linux may be a very good bargain (look
> > at how much Oracle has increased its support contracts for Solaris!),
> > but it's still not a free lunch. At the end of the day, you get what
> > you put into it.
>
> I tested the kernels, I reported the bugs and I helped (to the best of my
> knowledge; I'm not a programmer).
> I got no result.
"You get what you put into it" probably did not mean "report a bug, get
it fixed, every time". Often enough, kernel bugs or hardware quirks are
very hard to fix without direct access to affected hardware.
Here is how my involvement with Linux started: I reported a bug but
nobody reacted. I collected some more information, reported the bug
again, and it was immediately fixed by the driver authors. From then on
I kept following driver development as a tester and answered user
questions.
A few years later, the driver authors all had left for other projects
but there were still bugs to tackle. So I started to write and submit
bug fixes myself. (I'm not a programmer either but by then I already
knew a lot about the subsystem.)
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? 2010-07-11 7:18 stable? quality assurance? Martin Steigerwald ` (2 preceding siblings ...) 2010-07-11 13:56 ` Lee Mathers @ 2010-07-12 19:46 ` Nix [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com> 4 siblings, 0 replies; 72+ messages in thread From: Nix @ 2010-07-12 19:46 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-kernel On 11 Jul 2010, Martin Steigerwald said: > 2.6.34 was a desaster for me: bug #15969 - patch was availble before > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well > as most important two complete lockups - well maybe just X.org and radeon > KMS, I didn't start my second laptop to SSH into the locked up one - on my > ThinkPad T42. I fixed the first one with the patch, but after the lockups I > just downgraded to 2.6.33 again. [...] > hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1 > > on this mailing list just a moment ago. But then 2.6.33 did hang with > TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since > 2.6.34 did not hang with it anymore which was a reason for me to try > 2.6.34 earlier. To introduce yet more anecdata into this thread, I too had problems with TuxOnIce-driven suspend/resume from just post-2.6.32 to just pre-2.6.34. The solution was, surprise surprise, to *raise a bug report*, whereupon in short order I had a workaround. In 2.6.34, the problem vanished as mysteriously as it appeared, as did the bug whereby X coredumped and the screen stayed dark forever upon quitting X. 2.6.34 and 2.6.34.1 have worked better for me than any kernel I've used since 2.6.30, with no bugs noticeable on any of my machines (that's a first since 2.6.26). I speculate that there may be some subtle piece of overwriting inside the Radeon KMS and/or DRM code, which is obscure enough that it is relatively easily perturbed by changes elsewhere in the kernel. 
But nonetheless, one cannot extrapolate from a single bug in a subsystem as complex as DRM/KMS to the quality of the entire kernel. This is doubly true given the degree of difference between different cards labelled as Radeons: I'd venture to state that most of the Radeon bugs I've seen flow past over the last year or so only affect a small subset of cards: but if you add them all up, it's likely that most users have been bitten by at least one. But the problem here is not the kernel developers, nor the kernel quality: it's that ATI Radeons are a horrifically complicated and tangled web of slightly variable hardware. (In this they are no different from any other modern graphics card.) Martin, might I suggest considering stable kernels 'experimental' until at least .1 is out? Before Linus releases a kernel, its only users are dedicated masochists and developers: after the release, piles of regular early adopters pour in, and heaps of bug reports head to lkml and fixes head to -stable. The .1 kernels, with fixes for some of those, are the first you can really call *stable*, as they've got fixes for bugs isolated after testing by a much larger userbase of suckers. -- N., dedicated sucker and masochist ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com>]
* Re: stable? quality assurance? [not found] ` <AANLkTimEdVsmIgXBbmhsq75ElQvGAI8avsM8-wlDpm4z@mail.gmail.com> @ 2010-07-15 9:09 ` Valeo de Vries 2010-07-16 7:00 ` Greg KH 0 siblings, 1 reply; 72+ messages in thread From: Valeo de Vries @ 2010-07-15 9:09 UTC (permalink / raw) To: linux-kernel; +Cc: Martin On 11 July 2010 08:18, Martin Steigerwald <Martin@lichtvoll.de> wrote: > > Hi! > > 2.6.34 was a desaster for me: bug #15969 - patch was availble before > 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well > as most important two complete lockups - well maybe just X.org and radeon > KMS, I didn't start my second laptop to SSH into the locked up one - on my > ThinkPad T42. I fixed the first one with the patch, but after the lockups I > just downgraded to 2.6.33 again. > > I still actually *use* my machines for something else than hunting patches > for kernel bugs and on kernel.org it is written "Latest *Stable* Kernel" > (accentuation from me). I know of the argument that one should use a > distro kernel for machines that are for production use. But frankly, does > that justify to deliver in advance known crap to the distributors? What > impact do partly grave bugs reported on bugzilla have on the release > decision? > > And how about people who have their reasons - mine is TuxOnIce - to > compile their own kernels? > > Well 2.6.34.1 fixed the two reported bugs and it seemed to have fixed the > freezes as well. So far so good. > > Maybe it should read "prerelease of stable" for at least 2.6.34.0 on the > website. And I just again always wait for .2 or .3, as with 2.6.34.1 I > still have some problems like the hang on hibernation reported in > > hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1 > > on this mailing list just a moment ago. But then 2.6.33 did hang with > TuxOnIce which apparently (!) wasn't a TuxOnIce problem either, since > 2.6.34 did not hang with it anymore which was a reason for me to try > 2.6.34 earlier. 
> > I am quite a bit worried about the quality of the recent kernels. Some > iterations earlier I just compiled them, partly even rc-ones which I do > not expact to be table, and they just worked. But in the recent times .0, > partly even .1 or .2 versions haven't been stable for me quite some times > already and thus they better not be advertised as such on kernel.org I > think. I am willing to risk some testing and do bug reports, but these are > still production machines, I do not have any spare test machines, and > there needs to be some balance, i.e. the kernels should basically work. > Thus I for sure will be more reluctant to upgrade in the future. Ooh, it's been a while since I've partaken in a LKML flamewar. ;-) On a slightly less childish note, I agree with a few of your points. I have noticed *stable* releases (I'm talking distro kernels here) being less than stable on occasion recently (the sporadic hard lock-up, bdi-writeback taking damn long, the recent 'umount with dirty buffers taking an ice-age to complete' bug). Additionally there seems to have been some very chunky point-releases in the last 3-6 months, many containing patches that really should have been kept for the next Linus kernel.org kernel, IMO. These annoyances drove me away from Linux for a good few months... it's amazing what working full-time with Windows can do to one's soul, though! That said, from what I've seen of late, there's only one guy (Greg) handling most of the stable stuff (there are probably others working behind the scenes), and he has a hell of a lot on his plate. So if you, like me, want to see more reliable stable releases, I'd recommend either offering to help out in some way reviewing/testing stable patches, as telling volunteers their shit doesn't tend to gain you much at all, generally. :-) Valeo ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-15 9:09 ` Valeo de Vries
@ 2010-07-16 7:00 ` Greg KH
2010-07-16 7:19 ` Justin P. Mattock
` (2 more replies)
0 siblings, 3 replies; 72+ messages in thread
From: Greg KH @ 2010-07-16 7:00 UTC (permalink / raw)
To: Valeo de Vries; +Cc: linux-kernel, Martin
On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> That said, from what I've seen of late, there's only one guy (Greg) handling
> most of the stable stuff (there are probably others working behind the
> scenes), and he has a hell of a lot on his plate.
Nope, it's just me :)
thanks,
greg "i need some minions" k-h
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-16 7:00 ` Greg KH
@ 2010-07-16 7:19 ` Justin P. Mattock
2010-07-16 15:25 ` Randy Dunlap
2010-07-16 15:34 ` Valeo de Vries
2 siblings, 0 replies; 72+ messages in thread
From: Justin P. Mattock @ 2010-07-16 7:19 UTC (permalink / raw)
To: Greg KH; +Cc: Valeo de Vries, linux-kernel, Martin
On 07/16/2010 12:00 AM, Greg KH wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h
>
you need some minions...
Justin P. Mattock
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-16 7:00 ` Greg KH
2010-07-16 7:19 ` Justin P. Mattock
@ 2010-07-16 15:25 ` Randy Dunlap
2010-07-16 15:34 ` Valeo de Vries
2 siblings, 0 replies; 72+ messages in thread
From: Randy Dunlap @ 2010-07-16 15:25 UTC (permalink / raw)
To: Greg KH; +Cc: Valeo de Vries, linux-kernel, Martin
On Fri, 16 Jul 2010 00:00:10 -0700 Greg KH wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
> > That said, from what I've seen of late, there's only one guy (Greg) handling
> > most of the stable stuff (there are probably others working behind the
> > scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h
> --
Chris Wright is still listed in MAINTAINERS...
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-07-16 7:00 ` Greg KH
2010-07-16 7:19 ` Justin P. Mattock
2010-07-16 15:25 ` Randy Dunlap
@ 2010-07-16 15:34 ` Valeo de Vries
2 siblings, 0 replies; 72+ messages in thread
From: Valeo de Vries @ 2010-07-16 15:34 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel
On 16 July 2010 08:00, Greg KH <greg@kroah.com> wrote:
> On Thu, Jul 15, 2010 at 10:09:03AM +0100, Valeo de Vries wrote:
>> That said, from what I've seen of late, there's only one guy (Greg) handling
>> most of the stable stuff (there are probably others working behind the
>> scenes), and he has a hell of a lot on his plate.
>
> Nope, it's just me :)
>
> thanks,
>
> greg "i need some minions" k-h
I thought that was the case, alas.
I'm not sure how much time I could commit, but I'd be interested in
helping out, even if it's just reviewing and testing patches heading
for stable. Are there any specific areas you could use a hand with
though?
Valeo
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance? @ 2010-09-04 16:42 Martin Steigerwald 2010-09-04 17:22 ` Willy Tarreau 0 siblings, 1 reply; 72+ messages in thread From: Martin Steigerwald @ 2010-09-04 16:42 UTC (permalink / raw) To: linux-kernel; +Cc: Willy Tarreau [-- Attachment #1: Type: text/plain, Size: 4929 bytes --] Sorry, forgot Cc again. Am Sonntag 11 Juli 2010 schrieb Willy Tarreau: > Hi Martin, Hi Willy, hi everyone else reading this, > On Sun, Jul 11, 2010 at 04:51:42PM +0200, Martin Steigerwald wrote: > > I hope that someone answers who actually can take some critique. From > > the current replies I perceive a lack of that ability. > > well, I'll try to do then :-) > > There were some threads in the past about kernel releases quality, > where Linus explained why it could not be completely black or white. > > Among the things he explained, I remember that one of primary concern > was the inability to slow down development. I mean, if he waits 2 more > weeks for things to stabilize, then there will be two more weeks of > crap^H^H^H^Hdevelopment merged in next merge window, so in fact this > will just shift dates and not quality. During bisecting [Bug 16376] random - possibly Radeon DRM KMS related freezes, which goes very slowly due to having lots of unbootable kernels with an ext4 / readahead related backtrace during boot, I had an idea: I think main problem is that the current development process does not give time for quality work and bug fixing. As I understand it currently its just a constant development of new features with bug fixing and quality work having to be done beneath that development: - before 2.6.36 is released developers aim at developing new stuff for 2.6.37. - after 2.6.36 is released developers aim at getting as much stuff into 2.6.37 and then after two weeks at developing new features for 2.6.38. 
This process does not take bug fixing into account at all, cause after the merge window has closing, developers hurry to get the stuff ready for the next window. In that model extending the freeze period after rc1 doesn't help at all, cause as you say more "crap^H^H^H^Hdevelopment" gets collected for the next kernel. But is that a *given* that no one actually has any influence to? Is collecting changes for next kernel like rain that either pours down or not - usually pours down in this case like in August in Germany ;)? Who feeds Linus with new stuff during the merge window? From what I understand of the Linux development process its mainly the subsystem maintainers and Andrew Morton. What if those people stop collecting new stuff for Linus except bugfixes about two or three weeks before the next kernel is relased? This would give the subsystem trees and the mm tree some time to stabilize a bit, so that Linus gets more quality stuff in the first time. And more importantly, since developers know that subsystem maintainers and Andrew only collect bugfixes 2-3 weeks before the release of a stable kernel, they can as well spend some time on quality work. Of course, developers can still decide: Well if 2.6.37 work is closed already and continue developing for 2.6.38 even earlier, but I still think this would help to slow things down a bit prior to the critical phase before releasing a stable kernel. Cause when I know my subsystem maintainer or Andrew won't be taking my stuff anyway, before the release kernel is released, I can take a little time for other things. The main idea here is to have a two-staged freeze process and to distribute the "I am only taking bug fixes" work to more people than Linus. For this to work properly, I think at the time of the release of the stable kernel subsystem maintainers and Andrew should branch their trees. 
For example when 2.6.36 is released: - tree => 2.6.36-stable-tree => tree, where 2.6.37 stuff will be going in Thus when subsystem maintainers take new stuff during the merge window, it will be for the next kernel release already, not for the current one. Except bugfix work. Whereas I think the criteria for bug fix work should not be that strict than for the stable patches Greg collects. Thus it needs to be clear: No new stuff for next kernel already two weeks prior to release the current stable kernel. I think, this could help. Its a bit like the two-staged development process of Debian, but with the freeze period for "unstable" being a fixed time interval of about 2 weeks instead of RC=0 for stable ;). Its a bit of a formal shift of attention to the stable kernel about 2 weeks before its release. Developers might find creative ways to circumvent it, or they understand, that this process serves a purpose of improving kernel quality. When you think these two weeks cannot be squeezed into the three-monthly development cycle, a four-monthly development cycle might do. But actually I don't see why these two weeks could not be made to fit in there. Installing and testing next kernel after yet another mail to this thread, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: stable? quality assurance?
2010-09-04 16:42 Martin Steigerwald
@ 2010-09-04 17:22 ` Willy Tarreau
2010-09-04 19:33 ` Martin Steigerwald
0 siblings, 1 reply; 72+ messages in thread
From: Willy Tarreau @ 2010-09-04 17:22 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel

Hi Martin,

On Sat, Sep 04, 2010 at 06:42:19PM +0200, Martin Steigerwald wrote:
(...)
> The main idea here is to have a two-staged freeze process and to
> distribute the "I am only taking bug fixes" work to more people than Linus.
>
> For this to work properly, I think at the time of the release of the
> stable kernel subsystem maintainers and Andrew should branch their trees.
> For example when 2.6.36 is released:
>
> - tree
>   => 2.6.36-stable-tree
>   => tree, where 2.6.37 stuff will be going in
>
> Thus when subsystem maintainers take new stuff during the merge window, it
> will be for the next kernel release already, not for the current one.
> Except bugfix work. Whereas I think the criteria for bug fix work should not
> be that strict than for the stable patches Greg collects.
>
> Thus it needs to be clear: No new stuff for next kernel already two weeks
> prior to release the current stable kernel.

While I respect your beliefs on this matter (they once were mine too),
I have now realized I was wrong for several reasons:

  - most developers want to create. They (generally) test what they
    create, they believe it's flawless because it works for them. No
    need for more testing, go on with new features; if you refuse to
    merge their new work for some time, they work on their own tree and
    push you more work at once next time.

  - developers need real world use cases. That means more testers.
    Developers are bad testers because they don't trigger the unexpected
    use cases. And how do you get good testers? By motivating end users
    to test your code. Most testers will only test a new kernel to get a
    new feature. If it works for them, no need to push the tests further.
    So that means that the first RCs are the most tested, and that the
    later ones are the least tested. Thus at one point you can't hope to
    get bug reports anymore. When you see an -rc7 or -rc8, you think
    "hey, -rc4 was OK, let's wait for -final and install it".

  - people concerned by stability don't test every release. They test
    when they can, precisely because they can't impact production. So
    they don't contribute bug reports in time. And as the 2.4 maintainer,
    I'm well aware of that, because when I break something, I only know
    about it 3-4 months later.

For this reason, I think the release rhythm can't much be changed.

I think that trying to evaluate and publish quality per developer or
maintainer can have a better effect because everyone in the commit
chain is responsible. But even doing that is hard because some changes
touch everything and it's not obvious to say that Mr X or Y has done
some crap.

In my opinion, reporting bugs is the most effective way of improving
quality. If you report 10 bugs in a week on the same driver, there are
chances that at one point this driver's author will want to take some
time to audit his code and find other bugs before you next point your
finger at him/her. As you see, the goal is not just to report bugs to
get them fixed, but to educate bug authors.

I can tell you that I am an author of quite a number of bugs in another
project (haproxy), and I absolutely hate it when a bug is detected in
production (especially given the product's goal), to the point that the
code is generally reworked 2, 3, 5, 10 times before being committed. Of
course it is still not enough to catch all bugs, but since the product
has got a widely accepted reputation of being rock solid, I think it
works quite well after all.

Last, developers must not betray their users' trust. When they're not
certain of their code, this must be advertised (this is often the case
but not always). That helps end users a lot to select only reliable
features and experience more stability.

Regards,
Willy

* Re: stable? quality assurance?
2010-09-04 17:22 ` Willy Tarreau
@ 2010-09-04 19:33 ` Martin Steigerwald
2010-09-04 20:19 ` Willy Tarreau
0 siblings, 1 reply; 72+ messages in thread
From: Martin Steigerwald @ 2010-09-04 19:33 UTC (permalink / raw)
To: Willy Tarreau; +Cc: linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 7236 bytes --]

Hi again,

On Saturday 04 September 2010, Willy Tarreau wrote:
> On Sat, Sep 04, 2010 at 06:42:19PM +0200, Martin Steigerwald wrote:
> (...)
>
> > The main idea here is to have a two-staged freeze process and to
> > distribute the "I am only taking bug fixes" work to more people than
> > Linus.
> >
> > For this to work properly, I think at the time of the release of the
> > stable kernel subsystem maintainers and Andrew should branch their
> > trees. For example when 2.6.36 is released:
> >
> > - tree
> >
> >   => 2.6.36-stable-tree
> >   => tree, where 2.6.37 stuff will be going in
> >
> > Thus when subsystem maintainers take new stuff during the merge
> > window, it will be for the next kernel release already, not for the
> > current one. Except bugfix work. Whereas I think the criteria for
> > bug fix work should not be that strict than for the stable patches
> > Greg collects.
> >
> > Thus it needs to be clear: No new stuff for next kernel already two
> > weeks prior to release the current stable kernel.
>
> While I respect your beliefs on this matter (they once were mine too),
> I have now realized I was wrong for several reasons:
>   - most developers want to create. They (generally) test what they
>     create, they believe it's flawless because it works for them. No need
>     for more testing, go on with new features; if you refuse to merge
>     their new work for some time, they work on their own tree and push you
>     more work at once next time.
>
>   - developers need real world use cases. That means more testers.
>     Developers are bad testers because they don't trigger the unexpected
>     use cases. And how do you get good testers? By motivating end users
>     to test your code. Most testers will only test a new kernel to get a
>     new feature. If it works for them, no need to push the tests further.
>     So that means that the first RCs are the most tested, and that the
>     later ones are the least tested. Thus at one point you can't hope to
>     get bug reports anymore. When you see an -rc7 or -rc8, you think "hey,
>     -rc4 was OK, let's wait for -final and install it".

That fits perfectly well. If the first rcs are nicely tested, then ideally
all major issues should be done when rc7 or rc8 is reached. And thus
time can be spent on fixing the major remaining open regressions. I guess
those who reported these regressions are interested in testing a fix.

For me features have been the number one reason to upgrade kernels as
well, but then it's not a yes or no decision, but more a tuning of how
much new feature stuff each stable kernel release should have and a way to
put a little bit more attention on making a stable kernel release stable.

>   - people concerned by stability don't test every release. They test
>     when they can, precisely because they can't impact production. So they
>     don't contribute bug reports in time. And as the 2.4 maintainer, I'm
>     well aware of that, because when I break something, I only know about
>     it 3-4 months later.

How does this affect my suggestion above? If as you say the first rcs are
tested better and if as I assume those who reported regressions have an
interest in testing their fixes, I think this can work out nicely.

Aside from that, I am not sure whether most people step in with rc1 or rc2
already. When I tested rc kernels - there have been some times - I usually
waited until rc3 or rc4 so I could be somewhat confident that really major
issues are fixed already.

> For this reason, I think the release rhythm can't much be changed.

I still object to that, for the reasons given above. And because I think
that if something does not work out perfectly it can still be improved.

But I am interested in your other suggestions as well, because maybe it's
not so much the release process but something else that is the issue here:

> I think that trying to evaluate and publish quality per developer or
> maintainer can have a better effect because everyone in the commit
> chain is responsible. But even doing that is hard because some changes
> touch everything and it's not obvious to say that Mr X or Y has done
> some crap.

And who judges on what is crap? Build failures could be tracked
automatically. Partly maybe even performance regressions as the automated
tests from Phoronix show. Well boot failures or freezes are even more
important. But then, you are probably not judging the quality of the work
of the developer but the difficulty of the area he works on.

Nix pointed out that programming ATI Radeon cards can be quite
challenging. And I do have lots of respect for the Radeon KMS related
work. So I think it would be unfair to point at one of the Radeon KMS
developers and say to him "you did crap" for example.

I think crap does happen and am more concerned about how to handle it when
it does.

> In my opinion, reporting bugs is the most effective way of improving
> quality. If you report 10 bugs in a week on the same driver, there are
> chances that at one point this driver's author will want to take some
> time to audit his code and find other bugs before you next point your
> finger at him/her. As you see, the goal is not just to report bugs to
> get them fixed, but to educate bug authors.

Okay, my contribution then: I report bugs. I reported 4-5 kernel bugs
recently. I reported some before, but only occasionally. I didn't
face that many bugs prior to 2.6.34 which contributed to my admittedly
very subjective impression that kernel quality has lowered.

> I can tell you that I am an author of quite a number of bugs in another
> project (haproxy), and I absolutely hate it when a bug is detected in
> production (especially given the product's goal), to the point that the
> code is generally reworked 2, 3, 5, 10 times before being committed. Of
> course it is still not enough to catch all bugs, but since the product
> has got a widely accepted reputation of being rock solid, I think it
> works quite well after all.

Interesting project. I am currently implementing a highly available
active/passive load balancer cluster using Corosync, Pacemaker and the
IPVS frontend Ldirectord at work.

> Last, developers must not betray their users' trust. When they're not
> certain of their code, this must be advertised (this is often the case
> but not always). That helps end users a lot to select only reliable
> features and experience more stability.

Well for me a balance must be met: A kernel has to work good enough for me
to use it regularly. And currently 2.6.34 up to 2.6.36-rc2 on my ThinkPad
T42 simply do not fulfil that criterion. What annoys me most: Radeon KMS
already works perfectly stable on 2.6.33 for me. So the issue was not in
the initial version of Radeon KMS. It has been introduced afterwards. Thus
a supposedly more matured and stable version of it is working less stable
for me.

2.6.33-tp42-01231-g11b897c has been good to me so far. I am glad it has
not frozen yet. I better press send now.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

* Re: stable? quality assurance?
2010-09-04 19:33 ` Martin Steigerwald
@ 2010-09-04 20:19 ` Willy Tarreau
0 siblings, 0 replies; 72+ messages in thread
From: Willy Tarreau @ 2010-09-04 20:19 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-kernel

On Sat, Sep 04, 2010 at 09:33:27PM +0200, Martin Steigerwald wrote:
> > Thus at one point you can't hope to get bug reports anymore.
> > When you see an -rc7 or -rc8, you think "hey, -rc4 was OK, let's
> > wait for -final and install it".
>
> That fits perfectly well. If the first rcs are nicely tested, then ideally
> all major issues should be done when rc7 or rc8 is reached. And thus
> time can be spent on fixing the major remaining open regressions.

OK I see that you're talking about *open* regressions. I thought you
were talking about bugs in general. I think (but that's my own feeling)
that as soon as the cause of a regression is narrowed down enough to
identify the commit that caused it, it gets quickly fixed (though I
have no numbers on the subject). But when someone says "I was doing
this or that when my kernel froze", it can be anything. Drivers are
different because they impact fewer people than the core. However the
developers don't always have access to the hardware combination causing
a reproducible error case.

> I guess
> those who reported these regressions are interested in testing a fix.

I really think that there's good interactivity when the bug is spotted.
The hard part is the one before.

> >   - people concerned by stability don't test every release. They test
> >     when they can, precisely because they can't impact production. So they
> >     don't contribute bug reports in time. And as the 2.4 maintainer, I'm
> >     well aware of that, because when I break something, I only know about
> >     it 3-4 months later.
>
> How does this affect my suggestion above? If as you say the first rcs are
> tested better and if as I assume those who reported regressions have an
> interest in testing their fixes, I think this can work out nicely.

But you can't have developers sit on their code for 4 months waiting for
bug reports to come in. And if you're talking about open bugs only, each
one of them will think the issue is probably in the other one's code.
Common problem of development teams.

> Aside from that, I am not sure whether most people step in with rc1 or rc2
> already. When I tested rc kernels - there have been some times - I usually
> waited until rc3 or rc4 so I could be somewhat confident that really major
> issues are fixed already.

I think that people waiting for a specific feature will immediately
jump on rc1 or rc2. People who are curious about what was stuffed in
the new kernel will likely wait for rc3/4, hoping to get something
they can run a day long.

> > I think that trying to evaluate and publish quality per developer or
> > maintainer can have a better effect because everyone in the commit
> > chain is responsible. But even doing that is hard because some changes
> > touch everything and it's not obvious to say that Mr X or Y has done
> > some crap.
>
> And who judges on what is crap? Build failures could be tracked
> automatically. Partly maybe even performance regressions as the automated
> tests from Phoronix show. Well boot failures or freezes are even more
> important. But then, you are probably not judging the quality of the work
> of the developer but the difficulty of the area he works on.

I agree with you in general on this point, which makes the issue even
harder to solve. However, some bugs are definitely caused by crap (look
for Al Viro's occasional audit reports, missing locks and things like
this should not get merged). Every developer starts inexperienced, and
may humbly ask for help.

> Nix pointed out that programming ATI Radeon cards can be quite
> challenging. And I do have lots of respect for the Radeon KMS related
> work. So I think it would be unfair to point at one of the Radeon KMS
> developers and say to him "you did crap" for example.

100% agreed. It's the same in my opinion for every piece of code that
relies on configs that are hard to obtain. For instance, if a driver
breaks on configs with more than 256 CPUs or 1 TB of RAM, we can't
necessarily blame the author for not being able to test his code in
such situations.

> I think crap does happen and am more concerned about how to handle it when
> it does.

OK, but when an unusual config is required, sometimes the author cannot
help getting his code fixed.

> Okay, my contribution then: I report bugs. I reported 4-5 kernel bugs
> recently. I reported some before, but only occasionally.

That's really nice.

> I didn't
> face that many bugs prior to 2.6.34 which contributed to my admittedly
> very subjective impression that kernel quality has lowered.

Possible, but it's also possible that the new bugs affect an area that
you're using much more than the ones affected by bugs in older versions.
It's also possible that you became better at noticing bugs.

> > Last, developers must not betray their users' trust. When they're not
> > certain of their code, this must be advertised (this is often the case
> > but not always). That helps end users a lot to select only reliable
> > features and experience more stability.
>
> Well for me a balance must be met: A kernel has to work good enough for me
> to use it regularly.

That's what everyone looks for, and obviously the threshold is not the
same for everyone, and the bugs don't affect everyone. You see, while
2.4 is in feature freeze and thought to be very stable by its users
(and I occasionally encounter systems with 2 years of uptime under
permanent stress), I would not be surprised that some people consider
it still not stable enough for their usages. It's just a matter of
personal taste.

> And currently 2.6.34 up to 2.6.36-rc2 on my ThinkPad
> T42 simply do not fulfil that criterion. What annoys me most: Radeon KMS
> already works perfectly stable on 2.6.33 for me. So the issue was not in
> the initial version of Radeon KMS. It has been introduced afterwards. Thus
> a supposedly more matured and stable version of it is working less stable
> for me.

That's where you're on the wrong side. 2.6.34 is not supposed to be a
more matured and stable version than 2.6.33. It's supposed to be a more
*advanced* version. Some issues were fixed, some features were added,
some improvements were performed and many bugs were added in that whole
process.

There's a rule to follow concerning kernel upgrades in my opinion: you
should only upgrade for at least one of these 4 reasons:

  - test new kernels
  - get new features
  - fix a known bug
  - remain on a supported version

It's very likely that you'll regularly switch between newer and older
kernels to switch between the first 2 and the last 2 reasons. But
people who upgrade just to be on the edge and who don't even contribute
bug reports back are just looking for trouble in my opinion.

Regards,
Willy