From: Martin Steigerwald <Martin@lichtvoll.de>
To: linux-kernel@vger.kernel.org
Cc: David Newall <davidn@davidnewall.com>,
	Stefan Richter <stefanr@s5r6.in-berlin.de>,
	Marcin Letyns <mletyns@gmail.com>
Subject: Re: stable? quality assurance?
Date: Mon, 12 Jul 2010 23:39:58 +0200
Message-ID: <201007122340.06951.Martin@lichtvoll.de>
In-Reply-To: <4C3B73D7.8050802@davidnewall.com>


On Monday 12 July 2010, David Newall wrote:
> Stefan Richter wrote:
> > David Newall wrote:
> >> Thus 2.6.34 is the latest gamma-test kernel.  It's not stable and I
> >> doubt anybody honestly thinks otherwise.
> > 
> > It works stable for what I use it for.
> 
> Mea culpa.  I didn't mean that 2.6.34 is unstable, but that the term
> "stable" is not appropriate for a newly released kernel; "gamma" should
> be used instead.

I do think "stable" should mean "stable for the majority of users". That
is difficult to estimate, but I doubt that every dot-0 release has
qualified for it.

> Merely six months ago 2.6.32 was released; today we're preparing for
> 2.6.35; a new kernel every two months!  Perhaps 2.6.31 is truly the
> latest stable kernel; or else 2.6.27 does, which is the other 2.6 on
> the front page of kernel.org.  I'm pretty sure 2.4 is stable (which
> might explain why I see it embedded *much* more frequently than 2.6.)

I have these metrics:

martin@shambhala:~> uprecords -m 20 | cut -c1-70
     #               Uptime | System                                  
----------------------------+-----------------------------------------
     1    36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi-  Tue Jan 12 09:
     2    31 days, 01:07:24 | Linux 2.6.26.5-tp42-toi-  Tue Sep 30 13:
     3    24 days, 13:29:07 | Linux 2.6.33.2-tp42-toi-  Mon May 31 22:
     4    21 days, 15:08:21 | Linux 2.6.29.2-tp42-toi-  Tue Apr 28 22:
     5    19 days, 21:22:14 | Linux 2.6.33.2-tp42-toi-  Tue May 11 17:
     6    19 days, 09:49:05 | Linux 2.6.32.8-tp42-toi-  Fri Mar  5 11:
     7    18 days, 02:31:41 | Linux 2.6.29.6-tp42-toi-  Thu Jul  9 09:
     8    17 days, 12:38:36 | Linux 2.6.28.8-tp42-toi-  Wed Mar 18 10:
     9    16 days, 16:10:28 | Linux 2.6.31-tp42-toi-3.  Tue Sep 22 21:
    10    15 days, 14:39:26 | Linux 2.6.28.4-tp42-toi-  Mon Feb  9 22:
    11    15 days, 13:58:12 | Linux 2.6.27.7-tp42-toi-  Tue Dec  9 22:
    12    13 days, 21:11:06 | Linux 2.6.31-rc7-tp42-to  Mon Aug 31 21:
    13    13 days, 18:34:00 | Linux 2.6.29.2-tp42-toi-  Wed May 27 19:
    14    12 days, 21:54:18 | Linux 2.6.26.5-tp42-toi-  Fri Oct 31 13:
    15    10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi-  Thu Feb 26 16:
    16    10 days, 16:29:02 | Linux 2.6.33.2-tp42-toi-  Fri Jun 25 19:
    17    10 days, 08:04:52 | Linux 2.6.26.2-tp42-toi-  Thu Sep 18 14:
    18    10 days, 03:52:30 | Linux 2.6.31.3-tp42-toi-  Thu Oct 15 09:
    19     9 days, 22:03:29 | Linux 2.6.31.5-tp42-toi-  Tue Nov  3 11:
    20     9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi-  Thu Jun 25 14:
----------------------------+-----------------------------------------
-> 116     0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi-  Mo
----------------------------+-----------------------------------------
1up in     0 days, 00:31:56 | at                        Mon Jul 12 23:
t10 in    15 days, 13:47:24 | at                        Wed Jul 28 12:
no1 in    36 days, 09:05:29 | at                        Wed Aug 18 08:
    up   608 days, 02:40:08 | since                     Thu Sep 18 14:
  down    54 days, 06:12:57 | since                     Thu Sep 18 14:
   %up               91.808 | since                     Thu Sep 18 14:

There are 228 entries in total since 2.6.26, with

martin@shambhala:~> uprecords -m 300 | cut -c1-70 | grep "0 days" | wc -l
148

entries shorter than one day.
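
A caveat on that count: the pattern "0 days" also matches rows such as
"10 days" and footer lines like "1up in", so the figure may be slightly
too high. Below is a minimal sketch of a stricter count in Python. It
assumes the uprecords layout shown above; the names RECORD, SAMPLE,
naive_count and short_uptimes are made up for illustration and are not
part of uptimed.

```python
import re

# One record row: optional "->" marker, rank, then "<N> days, HH:MM:SS |".
# Footer rows ("1up in", "t10 in", "up", "down", "%up") do not start with a
# bare rank number, so they are not matched.
RECORD = re.compile(
    r"^\s*(?:->\s*)?(\d+)\s+(\d+) days?, (\d{2}):(\d{2}):(\d{2}) \|"
)

# A few rows copied from the listing above, used as sample input.
SAMPLE = """\
     1    36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi-  Tue Jan 12 09:
    15    10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi-  Thu Feb 26 16:
    20     9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi-  Thu Jun 25 14:
-> 116     0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi-  Mo
1up in     0 days, 00:31:56 | at                        Mon Jul 12 23:
"""


def naive_count(text):
    """Count lines containing "0 days", like grep "0 days" | wc -l."""
    return sum(1 for line in text.splitlines() if "0 days" in line)


def short_uptimes(text):
    """Count record rows whose uptime is strictly below one day."""
    count = 0
    for line in text.splitlines():
        match = RECORD.match(line)
        if match and int(match.group(2)) == 0:
            count += 1
    return count


if __name__ == "__main__":
    print("naive:", naive_count(SAMPLE))
    print("strict:", short_uptimes(SAMPLE))
```

On the sample rows the two counts differ because the "10 days" row and
the "1up in" footer row both contain the string "0 days".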

Sure, these numbers should not be read without knowing my experiences and
the reasons for rebooting, since sometimes I just messed up some kernel
option and compiled another kernel.
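
For reference, the %up line in the uprecords footer appears consistent
with up / (up + down). A small sketch of that arithmetic in Python,
using the up and down totals from the listing; the helper name
to_seconds is made up for illustration.

```python
def to_seconds(days, hours, minutes, seconds):
    """Convert a "D days, HH:MM:SS" duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Totals taken from the "up" and "down" footer rows above.
up = to_seconds(608, 2, 40, 8)
down = to_seconds(54, 6, 12, 57)

# Share of the recorded period during which the machine was running.
pct_up = 100 * up / (up + down)

if __name__ == "__main__":
    print(f"{pct_up:.3f}")
```

Rounded to three decimals this agrees with the 91.808 reported by
uprecords.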

AFAIR 2.6.26 up to 2.6.32 have been fine, except 2.6.30, where TuxOnIce
just didn't work; I am not yet sure whether this was caused by TuxOnIce or
by some problem with the general hibernation infrastructure. I then just
skipped 2.6.30. Since I only tried 2.6.31 on my T42, I got a whopping
uptime of over 100 days for 2.6.29 on my T23! That's stable. Well, any
kernel that reproducibly reaches more than 15 or 30 days is quite stable
in my own subjective view. Most kernels that got that far would likely
have lasted much longer if I hadn't just compiled the next one, be it a
dot release or a major release.

This all without Radeon KMS!

2.6.33.2 was only stable when I used Radeon KMS without TuxOnIce. OK, so
it might be a TuxOnIce problem, but at least the quite frequent hangs on
hibernation that I had with 2.6.33.2, at the point where the screen goes
black for a few seconds and then comes back, were gone with 2.6.34. Maybe
they are also gone with 2.6.33.6, since it carries some more radeon drm
fixes.

2.6.34 has not yet reached an uptime of more than 2 or 3 days.

Well, maybe Nix is right and it's just that Radeon KMS has not been
stabilized enough, while the rest of the kernel is quite stable.

And if the combination of 2.6.33 (now .6) and userspace software suspend
works for me, so be it for the time being. That would be a first: so far
it was usually TuxOnIce that worked, not any of the in-kernel methods I
tried from time to time. So be it even though userspace software suspend
is way slower and doesn't saturate the disk when writing the image.

> > If it doesn't for you, then I hope you are already in contact with
> > the respective subsystem developers to get the regressions that you
> > experience fixed.
> 
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
> 
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest."  That's not hugely
> useful when the latest kernels are not suitable for production use.  If
> kernels weren't marked stable until they had earned the moniker, for
> example 2.6.27, then the expectation of developers and of users would
> be consistent: developers could expect users to try it again with
> latest stable kernel, and users could reasonably expect that trying it
> wouldn't break their system.

I think that's really a question of how to attract more widespread
testing. To get wider testing, a kernel needs to be stable enough that
enough users are willing to deal with it. But without wider testing it
might not get there.

I just dropped 2.6.34 for now and I will wait for more dot releases. Maybe
I am really the only one for whom 2.6.34 doesn't work; maybe other people
dropped it too, out of frustration, without reporting it here or in
bugzilla.

Maybe providing better ways to report bugs and gather information, even on
freeze bugs, without too much manual setup could help. I certainly think
that the enhanced DrKonqi crash reporter from KDE 4.3 and up helped users
to provide *good bug reports*. Maybe there could be something like that
for the kernel, and an easy option to have the kernel store backtraces
even for hard crashes. Unfortunately there is no reset button on
notebooks, so memory might be the wrong place. Well, one could dedicate a
ring buffer area on the swap partition for that, or something similar;
that area should be writable even when no filesystem is working anymore.
On the next reboot the bug report application would recover the crash data
from there. This would carry the risk that, on severe memory corruption,
the kernel writes crash data somewhere it shouldn't. A USB stick comes to
mind, but what if the USB stack doesn't work anymore?

Well, not every bug is a freeze bug, and maybe something could be done for
non-freeze bugs, like an application which records selected data while the
user reproduces the bug, just as the enhanced DrKonqi collects crash data
and even helps the user to install the necessary debug packages.

But I think when a kernel is too unstable for lots of users, they just
drop it. Some bugs are okay, but freeze bugs, and even more so fs
corruption bugs, scare away anyone who is not a die-hard kernel debugger
bisecting a kernel a day.

Maybe I just had lots of bad luck, so I would love to hear about other
experiences; some have already said 2.6.34 works quite stably for them.

I will leave 2.6.34.1 on my T23, which has a Savage that may never get
KMS, who knows, and on the workstation at work, which doesn't use Radeon
KMS because of its rock-solid Debian Lenny userspace. Maybe this at least
sheds some light on whether most of my issues have been Radeon KMS
related.

As a side note: Ext4 is absolutely rock solid for me! So is XFS on my T23,
and even BTRFS for the T23 /home and a work directory on the workstation
(not yet on my production T42).

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

