* Determining patch impact on a specific config
From: Nicholas Mc Guire @ 2016-08-17 12:39 UTC (permalink / raw)
To: kernelnewbies
Hi !
For a given patch I would like to find out if it impacts a
given configuration or not. Now of course one could compile the
kernel for the configuration prior to the patch, then apply the
patch and recompile to find out if there is an impact, but I am
looking for a smarter solution. Checking files alone
unfortunately will not do it: due to ifdefs and friends, make
would detect a change and recompile even if the affected code
area is actually dropped by the preprocessor.
What I'm trying to find out is how many of the stable
fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
whole exercise is intended for some statistical analysis of bugs
in linux-stable.
Maybe this is a trivial problem - but I did not find any usable solution,
so any ideas other than brute-force compile/patch/recompile/diff *.o?
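[For reference, a minimal sketch of that brute-force baseline - an assumed helper
script, not an existing tool; the tree and patch paths are placeholders: build the
tree for the given .config, hash every object file, apply the patch, rebuild, and
report which objects changed.]

    #!/usr/bin/env python3
    # Sketch: does a patch change any object file for an already-configured tree?
    # KDIR and PATCH are placeholders; reproducible-build settings would be
    # needed in practice to avoid false positives from timestamps and similar noise.
    import hashlib, os, subprocess

    KDIR = "/path/to/linux"        # hypothetical kernel tree, .config already in place
    PATCH = "/path/to/fix.patch"   # hypothetical stable fix to test

    def build_and_hash():
        subprocess.run(["make", "-C", KDIR, "-j8"], check=True)
        hashes = {}
        for root, _, files in os.walk(KDIR):
            for name in files:
                if name.endswith(".o"):
                    path = os.path.join(root, name)
                    with open(path, "rb") as f:
                        hashes[path] = hashlib.sha256(f.read()).hexdigest()
        return hashes

    before = build_and_hash()
    subprocess.run(["git", "-C", KDIR, "apply", PATCH], check=True)
    after = build_and_hash()

    changed = sorted(p for p in after if before.get(p) != after[p])
    print("patch impacts this config" if changed else "no object files changed")
    for p in changed:
        print("  ", p)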
thx!
hofrat
* Determining patch impact on a specific config
From: Greg KH @ 2016-08-17 13:25 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
>
> Hi !
>
> For a given patch I would like to find out if it impacts a
> given configuration or not. Now of course one could compile the
> kernel for the configuration prior to the patch, then apply the
> patch and recompile to find out if there is an impact, but I am
> looking for a smarter solution. Checking files alone
> unfortunately will not do it: due to ifdefs and friends, make
> would detect a change and recompile even if the affected code
> area is actually dropped by the preprocessor.
>
> What I'm trying to find out is how many of the stable
> fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> whole exercise is intended for some statistical analysis of bugs
> in linux-stable.
Good question. Such a good one that I know of a number of people
working on this very topic as part of a PhD research project; it's not a
"trivial" thing to deduce automatically.
good luck!
greg k-h
* Determining patch impact on a specific config
From: Greg KH @ 2016-08-17 13:52 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> >
> > Hi !
> >
> > For a given patch I would like to find out if it impacts a
> > given configuration or not. Now of course one could compile the
> > kernel for the configuration prior to the patch, then apply the
> > patch and recompile to find out if there is an impact, but I am
> > looking for a smarter solution. Checking files alone
> > unfortunately will not do it: due to ifdefs and friends, make
> > would detect a change and recompile even if the affected code
> > area is actually dropped by the preprocessor.
> >
> > What I'm trying to find out is how many of the stable
> > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > whole exercise is intended for some statistical analysis of bugs
> > in linux-stable.
Also, are you going to be analyzing the bugs in the stable trees, or
the ones we just happen to fix?
Note, that's not always the same thing :)
thanks,
greg k-h
* Determining patch impact on a specific config
From: Nicholas Mc Guire @ 2016-08-17 14:01 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > >
> > > Hi !
> > >
> > > For a given patch I would like to find out if it impacts a
> > > given configuration or not. Now of course one could compile the
> > > kernel for the configuration prior to the patch, then apply the
> > > patch and recompile to find out if there is an impact, but I am
> > > looking for a smarter solution. Checking files alone
> > > unfortunately will not do it: due to ifdefs and friends, make
> > > would detect a change and recompile even if the affected code
> > > area is actually dropped by the preprocessor.
> > >
> > > What I'm trying to find out is how many of the stable
> > > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > > whole exercise is intended for some statistical analysis of bugs
> > > in linux-stable.
>
> Also, are you going to be analyzing the bugs in the stable trees, or
> the ones we just happen to fix?
>
> Note, that's not always the same thing :)
>
What we have been looking at first is the stable fixes
for which the bug-introducing commit is known via a Fixes: tag. That is only
a first approximation but it correlates very well with the
overall stable fix rates. And from the regression analysis
of the stable fix rates over versions one can then estimate the
residual bugs if one knows the distribution of the bug
survival times - which in turn can be estimated based on the
bug-fixes that have Fixes: tags.
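[As an illustration of the survival-time idea, a sketch under the assumption
that the bug-introducing commit is the one named in the Fixes: tag; the repo
path and version range are placeholders.]

    #!/usr/bin/env python3
    # Sketch: estimate bug survival times from Fixes: tags in a stable range.
    # Assumes a linux-stable clone in REPO (placeholder path).
    import re, subprocess

    REPO = "/path/to/linux-stable"
    RANGE = "v4.4..v4.4.14"   # stable fixes under study

    def commit_time(rev):
        out = subprocess.run(["git", "-C", REPO, "show", "-s", "--format=%ct", rev],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    log = subprocess.run(["git", "-C", REPO, "log", "--no-merges",
                          "--format=%H%x00%B%x01", RANGE],
                         capture_output=True, text=True, check=True).stdout

    lifetimes = []
    for entry in log.split("\x01"):
        if "\x00" not in entry:
            continue
        sha, body = entry.split("\x00", 1)
        m = re.search(r"^Fixes:\s+([0-9a-f]{6,40})", body, re.M | re.I)
        if m:
            try:
                days = (commit_time(sha.strip()) - commit_time(m.group(1))) / 86400.0
                lifetimes.append(days)
            except subprocess.CalledProcessError:
                pass  # referenced commit not reachable in this clone

    print(len(lifetimes), "fixes with usable Fixes: tags")
    if lifetimes:
        print("median survival time: %.0f days" % sorted(lifetimes)[len(lifetimes) // 2])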
I don't know yet how robust these models will be in the end
but from what we have until now I do think we can come up
with quite sound predictions for the residual faults in the
kernel.
Some early results were presented at ALS in Japan on July 14th
but this still needs quite a bit of work.
thx!
hofrat
* Determining patch impact on a specific config
From: Greg KH @ 2016-08-17 14:17 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > >
> > > > Hi !
> > > >
> > > > For a given patch I would like to find out if it impacts a
> > > > given configuration or not. Now of course one could compile the
> > > > kernel for the configuration prior to the patch, then apply the
> > > > patch and recompile to find out if there is an impact, but I am
> > > > looking for a smarter solution. Checking files alone
> > > > unfortunately will not do it: due to ifdefs and friends, make
> > > > would detect a change and recompile even if the affected code
> > > > area is actually dropped by the preprocessor.
> > > >
> > > > What I'm trying to find out is how many of the stable
> > > > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > > > whole exercise is intended for some statistical analysis of bugs
> > > > in linux-stable.
> >
> > Also, are you going to be analyzing the bugs in the stable trees, or
> > the ones we just happen to fix?
> >
> > Note, that's not always the same thing :)
> >
> What we have been looking at first is the stable fixes
> for which the bug-introducing commit is known via a Fixes: tag. That is only
> a first approximation but it correlates very well with the
> overall stable fix rates. And from the regression analysis
> of the stable fix rates over versions one can then estimate the
> residual bugs if one knows the distribution of the bug
> survival times - which in turn can be estimated based on the
> bug-fixes that have Fixes: tags.
That is all relying on the Fixes: tags, which are not used evenly across
the kernel at all. Heck, there are still major subsystems that NEVER
mark a single patch for the stable trees, let alone adding Fixes: tags.
Same thing goes for most cpu architectures.
So be careful about what you are trying to measure, it might just be not
what you are assuming it is...
Also note that LWN.net already published an article based on the fixes:
tags and tracking that in stable releases.
> I don't know yet how robust these models will be in the end
> but from what we have until now I do think we can come up
> with quite sound predictions for the residual faults in the
> kernel.
Based on what I know about how stable patches are picked and applied, I
think you will find it is totally incorrect. But hey, what do I know?
:)
> Some early results were presented at ALS in Japan on July 14th
> but this still needs quite a bit of work.
Have a pointer to that presentation?
thanks,
greg k-h
* Determining patch impact on a specific config
From: Nicholas Mc Guire @ 2016-08-17 14:49 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 04:17:19PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> > On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > > >
> > > > > Hi !
> > > > >
> > > > > For a given patch I would like to find out if it impacts a
> > > > > given configuration or not. Now of course one could compile the
> > > > > kernel for the configuration prior to the patch, then apply the
> > > > > patch and recompile to find out if there is an impact, but I am
> > > > > looking for a smarter solution. Checking files alone
> > > > > unfortunately will not do it: due to ifdefs and friends, make
> > > > > would detect a change and recompile even if the affected code
> > > > > area is actually dropped by the preprocessor.
> > > > >
> > > > > What I'm trying to find out is how many of the stable
> > > > > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > > > > whole exercise is intended for some statistical analysis of bugs
> > > > > in linux-stable.
> > >
> > > Also, are you going to be analyzing the bugs in the stable trees, or
> > > the ones we just happen to fix?
> > >
> > > Note, that's not always the same thing :)
> > >
> > What we have been looking at first is the stable fixes
> > for which the bug-introducing commit is known via a Fixes: tag. That is only
> > a first approximation but it correlates very well with the
> > overall stable fix rates. And from the regression analysis
> > of the stable fix rates over versions one can then estimate the
> > residual bugs if one knows the distribution of the bug
> > survival times - which in turn can be estimated based on the
> > bug-fixes that have Fixes: tags.
>
> That is all relying on the Fixes: tags, which are not used evenly across
> the kernel at all. Heck, there are still major subsystems that NEVER
> mark a single patch for the stable trees, let alone adding Fixes: tags.
> Same thing goes for most cpu architectures.
Well for the config we studied it was not that bad
4.4 - 4.4.13 stable bug-fix commits
(1643 fix commits in total, 589 of them with a Fixes: tag)
             % of total     % of Fixes:-     % with Fixes: tag
subsystem    fix commits    tagged commits   within the subsystem
kernel       3.89%          4.75%            43.7%
mm           1.82%          2.17%            53.3%
block        0.36%          0.84%            83.3%!
fs           8.76%          4.92%*           20.1%*
net          9.31%          12.56%           48.3%
drivers      47.96%         49.23%           36.8%
include      6.87%          19.18%           28.3%*
arch/x86     4.50%          12.56%           33.7%
(Note that the percentages here do not add up
to 100% because we just picked out x86 and did not
include all subsystems, e.g. lib is missing.)
So fs is significantly below and include a bit - block is
hard to say simply because there were only 6 stable fixes, of
which 5 had Fixes: tags, so that sample is too small.
Correlating the overall stable-fixes distribution over sublevels
with the stable fixes that have a Fixes: tag gives me an R^2 of 0.76,
so that does show that using Fixes: tags for any trending
is reasonable. As noted, we are looking at statistical properties
to come up with expected values, nothing more.
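[A sketch of how such a breakdown can be derived from the git metadata - an
assumed approach, not the exact scripts used here; the repo path and range are
placeholders: bucket each stable fix commit by the top-level directory it
touches and count how many carry a Fixes: tag.]

    #!/usr/bin/env python3
    # Sketch: per-subsystem share of stable fix commits and of Fixes:-tagged commits.
    # REPO is a placeholder; buckets are only top-level directories, so arch/x86
    # would need an extra split to match the table above.
    import subprocess
    from collections import Counter

    REPO = "/path/to/linux-stable"
    RANGE = "v4.4..v4.4.13"

    shas = subprocess.run(["git", "-C", REPO, "rev-list", "--no-merges", RANGE],
                          capture_output=True, text=True, check=True).stdout.split()

    total, tagged = Counter(), Counter()
    n_tagged = 0
    for sha in shas:
        show = subprocess.run(["git", "-C", REPO, "show", "--name-only",
                               "--format=%B%x00", sha],
                              capture_output=True, text=True, check=True).stdout
        body, _, files = show.partition("\x00")
        has_fixes = any(l.lower().startswith("fixes:") for l in body.splitlines())
        n_tagged += has_fixes
        for d in {f.split("/")[0] for f in files.split() if "/" in f}:
            total[d] += 1
            if has_fixes:
                tagged[d] += 1

    print("%-12s %10s %11s %11s" % ("subsystem", "of total", "of tagged", "in subsys"))
    for d, c in total.most_common():
        print("%-12s %9.2f%% %10.2f%% %10.1f%%"
              % (d, 100.0 * c / len(shas), 100.0 * tagged[d] / max(n_tagged, 1),
                 100.0 * tagged[d] / c))

[A commit touching several top-level directories is counted once per directory
here, so the output is only an approximation of the breakdown above.]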
>
> So be careful about what you are trying to measure, it might just be not
> what you are assuming it is...
An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
series represent the overall stable fixes quite well.
>
> Also note that LWN.net already published an article based on the fixes:
> tags and tracking that in stable releases.
OK, I will go dig for that - I have not stumbled across it yet - actually
I did check lwn.net for Fixes:-tag-related info and found some patches
noted - specifically Doc patches.
>
> > I don't know yet how robust these models will be in the end
> > but from what we have until now I do think we can come up
> > with quite sound predictions for the residual faults in the
> > kernel.
>
> Based on what I know about how stable patches are picked and applied, I
> think you will find it is totally incorrect. But hey, what do I know?
> :)
Well, if I look at the overall stable-fixes development - not just those
with Fixes: tags - I get very clear trends if we look at stable fixes
over sublevels (linear model using a gamma distribution):
ver    intercept    slope        p-value     DoF   AIC
3.2    4.2233783    0.0059133    < 2e-16     79    2714.8
3.4    3.9778258    -0.0005657   0.164 *     110   4488
3.10   4.3841885    -0.0085419   < 2e-16     98    2147.1
3.12   4.7146752    -0.0014718   0.0413      58    1696.9
3.14   4.6159638    -0.0131122   < 2e-16     70    2124.8
3.18   4.671178     -0.006517    7.34e-5     34    1881.2
4.1    4.649701     -0.004211    0.09        25    1231.8
4.4    5.049331     -0.039307    7.69e-11    12    571.48
So while the confidence levels of some (notably 3.4) are not
that exciting, the overall trend does look reasonably established:
the slope is turning negative - indicating that the
number of stable fixes per sublevel systematically decreases
with sublevels, which does indicate a stable development process.
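[For the curious, a minimal sketch of fitting such a model, assuming a Gamma
GLM with a log link over the sublevel number; the counts below are made-up
placeholders, not the data behind the table.]

    #!/usr/bin/env python3
    # Sketch: Gamma GLM of fix-commit counts per stable sublevel (log link assumed).
    import numpy as np
    import statsmodels.api as sm

    # placeholder data: sublevel index and number of fix commits in that sublevel
    sublevel = np.arange(1, 14)          # 4.4.1 .. 4.4.13
    fixes = np.array([180, 160, 150, 155, 140, 130, 125, 120, 118, 110, 105, 100, 95])

    X = sm.add_constant(sublevel)        # intercept + slope
    model = sm.GLM(fixes, X, family=sm.families.Gamma(link=sm.families.links.Log()))
    res = model.fit()

    print(res.params)     # intercept, slope (on the log scale)
    print(res.pvalues)    # p-values as in the table above
    print(res.aic)        # AIC; degrees of freedom are in res.df_resid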
>
> > Some early results were presented at ALS in Japan on July 14th
> > but this still needs quite a bit of work.
>
> Have a pointer to that presentation?
>
They probably are somewhere on the ALS site - but I just dropped
them to our web-server at
http://www.opentech.at/Statistics.pdf and
http://www.opentech.at/TechSummary.pdf
This is quite a rough summary - so if anyone wants the actual data
or R commands used - let me know - no issue with sharing this and having
people tell me that I'm totally wrong :)
thx!
hofrat
* Determining patch impact on a specific config
From: Greg KH @ 2016-08-17 15:39 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 02:49:22PM +0000, Nicholas Mc Guire wrote:
> On Wed, Aug 17, 2016 at 04:17:19PM +0200, Greg KH wrote:
> > On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> > > On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > > > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > > > >
> > > > > > Hi !
> > > > > >
> > > > > > For a given patch I would like to find out if it impacts a
> > > > > > given configuration or not. Now of course one could compile the
> > > > > > kernel for the configuration prior to the patch, then apply the
> > > > > > patch and recompile to find out if there is an impact, but I am
> > > > > > looking for a smarter solution. Checking files alone
> > > > > > unfortunately will not do it: due to ifdefs and friends, make
> > > > > > would detect a change and recompile even if the affected code
> > > > > > area is actually dropped by the preprocessor.
> > > > > >
> > > > > > What I'm trying to find out is how many of the stable
> > > > > > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > > > > > whole exercise is intended for some statistical analysis of bugs
> > > > > > in linux-stable.
> > > >
> > > > Also, are you going to be analyzing the bugs in the stable trees, or
> > > > the ones we just happen to fix?
> > > >
> > > > Note, that's not always the same thing :)
> > > >
> > > What we have been looking at first is the stable fixes
> > > for which the bug-introducing commit is known via a Fixes: tag. That is only
> > > a first approximation but it correlates very well with the
> > > overall stable fix rates. And from the regression analysis
> > > of the stable fix rates over versions one can then estimate the
> > > residual bugs if one knows the distribution of the bug
> > > survival times - which in turn can be estimated based on the
> > > bug-fixes that have Fixes: tags.
> >
> > That is all relying on the Fixes: tags, which are not used evenly across
> > the kernel at all. Heck, there are still major subsystems that NEVER
> > mark a single patch for the stable trees, let alone adding Fixes: tags.
> > Same thing goes for most cpu architectures.
>
> Well for the config we studied it was not that bad
>
> 4.4 - 4.4.13 stable bug-fix commits
> (1643 fix commits in total, 589 of them with a Fixes: tag)
>              % of total     % of Fixes:-     % with Fixes: tag
> subsystem    fix commits    tagged commits   within the subsystem
> kernel       3.89%          4.75%            43.7%
> mm           1.82%          2.17%            53.3%
> block        0.36%          0.84%            83.3%!
> fs           8.76%          4.92%*           20.1%*
> net          9.31%          12.56%           48.3%
> drivers      47.96%         49.23%           36.8%
> include      6.87%          19.18%           28.3%*
> arch/x86     4.50%          12.56%           33.7%
> (Note that the percentages here do not add up
> to 100% because we just picked out x86 and did not
> include all subsystems, e.g. lib is missing.)
>
> So fs is significantly below and include a bit - block is
> hard to say simply because there were only 6 stable fixes, of
> which 5 had Fixes: tags, so that sample is too small.
> Correlating the overall stable-fixes distribution over sublevels
> with the stable fixes that have a Fixes: tag gives me an R^2 of 0.76,
> so that does show that using Fixes: tags for any trending
> is reasonable. As noted, we are looking at statistical properties
> to come up with expected values, nothing more.
But you aren't comparing that to the number of changes that are
happening in a "real" release. If you do that, you will see the
subsystems that never mark things for stable, which you totally miss
here, right?
For example, where are the driver subsystems that everyone relies on
that are changing upstream, yet have no stable fixes? What about the
filesystems that even more people rely on, yet have no stable fixes?
Are those code bases just so good and solid that there are no bugs to be
fixed? (hint, no...)
So because of that, you can't use the information about what I apply to
stable trees as an indication that those are the only parts of the
kernel that have bugs to be fixed.
> > So be careful about what you are trying to measure, it might just be not
> > what you are assuming it is...
>
> An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
> series represent the overall stable fixes quite well.
"overall stable fixes". Not "overall kernel fixes", two very different
things, please don't confuse the two.
And because of that, I would state that "overall stable fixes" number
really doesn't mean much to a user of the kernel.
> > > I don't know yet how robust these models will be in the end
> > > but from what we have until now I do think we can come up
> > > with quite sound predictions for the residual faults in the
> > > kernel.
> >
> > Based on what I know about how stable patches are picked and applied, I
> > think you will find it is totally incorrect. But hey, what do I know?
> > :)
>
> Well, if I look at the overall stable-fixes development - not just those
> with Fixes: tags - I get very clear trends if we look at stable fixes
> over sublevels (linear model using a gamma distribution):
>
> ver    intercept    slope        p-value     DoF   AIC
> 3.2    4.2233783    0.0059133    < 2e-16     79    2714.8
> 3.4    3.9778258    -0.0005657   0.164 *     110   4488
> 3.10   4.3841885    -0.0085419   < 2e-16     98    2147.1
> 3.12   4.7146752    -0.0014718   0.0413      58    1696.9
> 3.14   4.6159638    -0.0131122   < 2e-16     70    2124.8
> 3.18   4.671178     -0.006517    7.34e-5     34    1881.2
> 4.1    4.649701     -0.004211    0.09        25    1231.8
> 4.4    5.049331     -0.039307    7.69e-11    12    571.48
>
> So while the confidence levels of some (notably 3.4) are not
> that exciting, the overall trend does look reasonably established:
> the slope is turning negative - indicating that the
> number of stable fixes per sublevel systematically decreases
> with sublevels, which does indicate a stable development process.
I don't understand. Not everyone uses "fixes:" so you really can't
use that as an indication of anything. I know I never do for any patch
that I write.
Over time, more people are using the "fixes:" tag, but then that messes
with your numbers because you can't compare the work we did this year
with the work we did last year.
Also, our rate of change has increased, and the number of stable patches
being tagged has increased, based on me going around and kicking
maintainers. Again, because of that you can't compare year to year at
all.
There's also the "bias" of the long-term and stable maintainer to skew
the patches they review and work to get applied based on _why_ they are
maintaining a specific tree. I know I do that for the trees I maintain,
and know the other stable developers do the same. But those reasons are
different, so you can't compare what is done to one tree vs. another one
very well at all because of that bias.
So don't compare 3.10 to 3.4 or 3.2 and expect even the motivation to be
identical to what is going on for that tree.
> > > Some early results were presented at ALS in Japan on July 14th
> > > but this still needs quite a bit of work.
> >
> > Have a pointer to that presentation?
> >
> They probably are somewhere on the ALS site - but I just dropped
> them to our web-server at
> http://www.opentech.at/Statistics.pdf and
> http://www.opentech.at/TechSummary.pdf
>
> This is quite a rough summary - so if anyone wants the actual data
> or R commands used - let me know - no issue with sharing this and having
> people tell me that I'm totally wrong :)
Interesting, I'll go read them when I get the chance.
But I will make a meta-observation, it's "interesting" that people go
and do analysis of development processes like this, yet never actually
talk to the people doing the work about how they do it, nor how they
could possibly improve it based on their analysis.
We aren't just people to be researched, we can change if asked.
And remember, I _always_ ask for help with the stable development
process, I have huge areas that I know need work to improve, just no one
ever provides that help...
And how is this at all a kernelnewbies question/topic? That's even
odder to me...
sorry for the rant,
greg k-h
* Determining patch impact on a specific config
From: Nicholas Mc Guire @ 2016-08-17 16:50 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 05:39:27PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 02:49:22PM +0000, Nicholas Mc Guire wrote:
> > On Wed, Aug 17, 2016 at 04:17:19PM +0200, Greg KH wrote:
> > > On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> > > > On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > > > > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > > > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > > > > >
> > > > > > > Hi !
> > > > > > >
> > > > > > > For a given patch I would like to find out if it impacts a
> > > > > > > given configuration or not. Now of course one could compile the
> > > > > > > kernel for the configuration prior to the patch, then apply the
> > > > > > > patch and recompile to find out if there is an impact, but I am
> > > > > > > looking for a smarter solution. Checking files alone
> > > > > > > unfortunately will not do it: due to ifdefs and friends, make
> > > > > > > would detect a change and recompile even if the affected code
> > > > > > > area is actually dropped by the preprocessor.
> > > > > > >
> > > > > > > What I'm trying to find out is how many of the stable
> > > > > > > fixes of, e.g., 4.4-4.4.14 would have impacted a given configuration - the
> > > > > > > whole exercise is intended for some statistical analysis of bugs
> > > > > > > in linux-stable.
> > > > >
> > > > > Also, are you going to be analyzing the bugs in the stable trees, or
> > > > > the ones we just happen to fix?
> > > > >
> > > > > Note, that's not always the same thing :)
> > > > >
> > > > What we have been looking at first is the stable fixes
> > > > for which the bug-introducing commit is known via a Fixes: tag. That is only
> > > > a first approximation but it correlates very well with the
> > > > overall stable fix rates. And from the regression analysis
> > > > of the stable fix rates over versions one can then estimate the
> > > > residual bugs if one knows the distribution of the bug
> > > > survival times - which in turn can be estimated based on the
> > > > bug-fixes that have Fixes: tags.
> > >
> > > That is all relying on the Fixes: tags, which are not used evenly across
> > > the kernel at all. Heck, there are still major subsystems that NEVER
> > > mark a single patch for the stable trees, let alone adding Fixes: tags.
> > > Same thing goes for most cpu architectures.
> >
> > Well for the config we studied it was not that bad
> >
> > 4.4 - 4.4.13 stable bug-fix commits
> > (1643 fix commits in total, 589 of them with a Fixes: tag)
> >              % of total     % of Fixes:-     % with Fixes: tag
> > subsystem    fix commits    tagged commits   within the subsystem
> > kernel       3.89%          4.75%            43.7%
> > mm           1.82%          2.17%            53.3%
> > block        0.36%          0.84%            83.3%!
> > fs           8.76%          4.92%*           20.1%*
> > net          9.31%          12.56%           48.3%
> > drivers      47.96%         49.23%           36.8%
> > include      6.87%          19.18%           28.3%*
> > arch/x86     4.50%          12.56%           33.7%
> > (Note that the percentages here do not add up
> > to 100% because we just picked out x86 and did not
> > include all subsystems, e.g. lib is missing.)
> >
> > So fs is significantly below and include a bit - block is
> > hard to say simply because there were only 6 stable fixes, of
> > which 5 had Fixes: tags, so that sample is too small.
> > Correlating the overall stable-fixes distribution over sublevels
> > with the stable fixes that have a Fixes: tag gives me an R^2 of 0.76,
> > so that does show that using Fixes: tags for any trending
> > is reasonable. As noted, we are looking at statistical properties
> > to come up with expected values, nothing more.
>
> But you aren't comparing that to the number of changes that are
> happening in a "real" release. If you do that, you will see the
> subsystems that never mark things for stable, which you totally miss
> here, right?
We are not looking at the run-up to 4.4 here; we are looking at
the fixes that go into 4.4.1++, and for those we look at all
commits in linux-stable. So that should cover ALL subsystems
for which bugs were discovered and fixed (either in 4.4.X or
ported from other 4.X findings).
>
> For example, where are the driver subsystems that everyone relies on
> that are changing upstream, yet have no stable fixes? What about the
> filesystems that even more people rely on, yet have no stable fixes?
> Are those code bases just so good and solid that there are no bugs to be
> fixed? (hint, no...)
That is not what we are claiming - the model here is that
operation uncovers bugs and the critical bugs are being
fixed in stable releases. That there are more fixes and lots
of cleanups that go into stable is clear, but with respect to
the usability of the kernel we do assume that if a bug in
driver X is found that results in this driver being unusable
or destabilizing the kernel, it would be fixed in the stable
fixes as well (which is also visible in the close to 50% of
fixes being in drivers) - now if that assumption is overly
naive then you are right, and the assessment will not hold.
>
> So because of that, you can't use the information about what I apply to
> stable trees as an indication that those are the only parts of the
> kernel that have bugs to be fixed.
So a critical bug discovered in 4.7 that is also found
to apply to, say, 4.4.14 would *not* be fixed in the 4.4.15 stable
release?
>
> > > So be careful about what you are trying to measure, it might just be not
> > > what you are assuming it is...
> >
> > An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
> > series represent the overall stable fixes quite well.
>
> "overall stable fixes". Not "overall kernel fixes", two very different
> things, please don't confuse the two.
I'm not - we are looking at stable fixes, not kernel fixes, the
reason for that simply being that for kernel fixes it is not
possible to say if they are bug-fixes or optimizations/enhancements
- at least not in any automated way.
The focus on stable dot releases and their fixes was chosen
* because it is manageable
* because we assume that critical bugs discovered will be fixed
* and because there are no optimizations or added features
>
> And because of that, I would state that "overall stable fixes" number
> really doesn't mean much to a user of the kernel.
It does for those that are using some LTS release, and it says
something about the probability of a bug in a stable release
being detected. Or would you say that 4.4.13 is not to be
expected to be better off than 4.4.1? From the data we have
looked at so far - the lifetime of a bug in -stable as well as the
discovery rate of bugs in sublevel releases - it seems
clear that the reliability of the kernel over
sublevel releases is increasing, and that this can be utilized
to select a kernel version more suitable for HA or critical
systems based on trending/analysis.
>
> > > > I don't know yet how robust these models will be in the end
> > > > but from what we have until now I do think we can come up
> > > > with quite sound predictions for the residual faults in the
> > > > kernel.
> > >
> > > Based on what I know about how stable patches are picked and applied, I
> > > think you will find it is totally incorrect. But hey, what do I know?
> > > :)
> >
> > Well, if I look at the overall stable-fixes development - not just those
> > with Fixes: tags - I get very clear trends if we look at stable fixes
> > over sublevels (linear model using a gamma distribution):
> >
> > ver    intercept    slope        p-value     DoF   AIC
> > 3.2    4.2233783    0.0059133    < 2e-16     79    2714.8
> > 3.4    3.9778258    -0.0005657   0.164 *     110   4488
> > 3.10   4.3841885    -0.0085419   < 2e-16     98    2147.1
> > 3.12   4.7146752    -0.0014718   0.0413      58    1696.9
> > 3.14   4.6159638    -0.0131122   < 2e-16     70    2124.8
> > 3.18   4.671178     -0.006517    7.34e-5     34    1881.2
> > 4.1    4.649701     -0.004211    0.09        25    1231.8
> > 4.4    5.049331     -0.039307    7.69e-11    12    571.48
> >
> > So while the confidence levels of some (notably 3.4) are not
> > that exciting, the overall trend does look reasonably established:
> > the slope is turning negative - indicating that the
> > number of stable fixes per sublevel systematically decreases
> > with sublevels, which does indicate a stable development process.
>
> I don't understand. Not everyone uses "fixes:" so you really can't
> use that as an indication of anything. I know I never do for any patch
> that I write.
This is not using Fixes: - this is over all stable sublevel release
fix-commits - so the overall number of commits in the sublevel
releases is systematically going down with sublevels (sublevels
themselves being of course a convoluted parameter representing
testing/field-usage/review/etc.).
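[A sketch of how the per-sublevel counts behind such a trend can be collected -
an assumed approach; the repo path is a placeholder: count the non-merge
commits between consecutive stable tags.]

    #!/usr/bin/env python3
    # Sketch: number of fix commits per stable sublevel release, e.g. for 4.4.y.
    # Assumes a linux-stable clone with the v4.4.* tags in REPO (placeholder).
    # Note: each count also includes the "Linux 4.4.x" release commit itself.
    import subprocess

    REPO = "/path/to/linux-stable"
    BASE = "v4.4"
    N = 13   # look at 4.4.1 .. 4.4.13

    def count(rng):
        out = subprocess.run(["git", "-C", REPO, "rev-list", "--no-merges",
                              "--count", rng],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    prev = BASE
    for i in range(1, N + 1):
        tag = "%s.%d" % (BASE, i)
        print("%-10s %4d fix commits" % (tag, count("%s..%s" % (prev, tag))))
        prev = tag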
>
> Over time, more people are using the "fixes:" tag, but then that messes
> with your numbers because you can't compare the work we did this year
> with the work we did last year.
Sure, why not? You must look at relative usage and correlation
of the tags - currently about 36% of the stable commits in the
dot-releases (sublevels) are a usable basis - if the use of
Fixes: increases, all the better - it just means we are moving
towards an R^2 of 1 - results stay comparable; it just means
that the confidence intervals for the current data are wider
than for the data of next year.
>
> Also, our rate of change has increased, and the number of stable patches
> being tagged has increased, based on me going around and kicking
> maintainers. Again, because of that you can't compare year to year at
> all.
Why not? We are not selecting a specific class of bugs in any
way - the Fixes: tags are nearly randomly distributed across the
effective fixes in stable - it may be a bit biased because some
maintainer does not like Fixes: tags and her subsystem is
significantly more complex/more buggy/better tested/etc. than
the average subsystem - so we would get a bit of a bias into it
all - but that does not invalidate the results.
You can ask the voters in 3 states who they will elect president
and this will give you a less accurate result than if you ask in
all 50 states, but if you factor that uncertainty into the
result it is perfectly valid and stays comparable to other results.
I'm not saying that you can simply compare numeric values for
2016 with those from 2017, but you can compare the trends and
the expectations if you model the uncertainties.
Note that we have a huge advantage here - we can make predictions
from models - say, predict 4.4.16 - and then actually check our models.
Now if there are really significant changes, like the task struct
being redone, then that may have a large impact, and the assumption
that the convoluted parameter "sublevel" describes a more or
less stable development might be less correct - it will not be
completely wrong - and consequently the prediction quality will
suffer - but does that invalidate the approach?
>
> There's also the "bias" of the long-term and stable maintainer to skew
> the patches they review and work to get applied based on _why_ they are
> maintaining a specific tree. I know I do that for the trees I maintain,
> and know the other stable developers do the same. But those reasons are
> different, so you can't compare what is done to one tree vs. another one
> very well at all because of that bias.
If the procedures applied do not "jump" but evolve, then bias is
not an issue - you can find many factors that will increase the
uncertainty of any such prediction - but if the parameters, which
are all convoluted - be it by personal preferences of maintainers,
selection of a specific FS in mainline distributions, etc. - still
represent the overall development, and as long as the bias, as you
called it, does not flip-flop from 4.4.6 to 4.4.7, we do not care
too much.
>
> So don't compare 3.10 to 3.4 or 3.2 and expect even the motivation to be
> identical to what is going on for that tree.
>
No expectation of anything being constant - we simply say that the number
of fixes was going up with sublevels in 3.2 - now it is going down -
and has since then shown improved trends, with 4.4 showing a
robust negative coupling (declining bug-fixes). This is valid
because it is generally *not* the maintainers that discover the
bugs - it's the users/testers/reviewers. I doubt that maintainers
would reject a critical bug-fix provided to them due to personal
bias.
> > > > Some early results were presented at ALS in Japan on July 14th
> > > > but this still needs quite a bit of work.
> > >
> > > Have a pointer to that presentation?
> > >
> > They probably are somewhere on the ALS site - but I just dropped
> > them to our web-server at
> > http://www.opentech.at/Statistics.pdf and
> > http://www.opentech.at/TechSummary.pdf
> >
> > This is quite a rough summary - so if anyone wants the actual data
> > or R commands used - let me know - no issue with sharing this and having
> > people tell me that I'm totally wrong :)
>
> Interesting, I'll go read them when I get the chance.
>
> But I will make a meta-observation, it's "interesting" that people go
> and do analysis of development processes like this, yet never actually
> talk to the people doing the work about how they do it, nor how they
> could possibly improve it based on their analysis.
I do talk to the people - I've been doing this quite a bit - one of
the reasons for hopping over to ALS was precisely that. We have been
publishing our stuff all along, including any findings, patches
etc.
BUT: I'm not going to go to LinuxCon and claim that I know how
to do better - not based on the preliminary data we have now.
Once we think we have something solid, I'll be most happy to sit
down and listen.
>
> We aren't just people to be researched, we can change if asked.
> And remember, I _always_ ask for help with the stable development
> process, I have huge areas that I know need work to improve, just no one
> ever provides that help...
And we are doing our best to support that - be it by documentation
fixes, compliance analysis, type-safety analysis and the appropriate
patches I've been pestering maintainers with.
But you do have to give us the time to get SOLID data first
and NOT rush conclusions - as you pointed out here yourself,
some of the assumptions we are making might well be wrong, so
what kind of suggestions do you expect here?
First get the data
-> make a model
-> deduce your analysis/sample/experiments
-> write it all up and present it to the community
-> get the feedback and fix the model
and if after that some significant findings are left - THEN
we will show up at LinuxCon and try to find someone to listen
to what we think we have to say...
>
> And how is this at all a kernelnewbies question/topic? That's even
> odder to me...
Well, this question is not - but the first one was - I simply could not come up
with a reasonable way to figure out the impact of a patch on a
given config, and that did sound to me like it would be a kernelnewbie
question...
>
> sorry for the rant,
>
Rants at that level are most welcome - I'll put some of the
concerns raised on my TODO list for our next round of data analysis.
thx!
hofrat
* Determining patch impact on a specific config
From: Greg KH @ 2016-08-17 17:34 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 04:50:30PM +0000, Nicholas Mc Guire wrote:
> > But you aren't comparing that to the number of changes that are
> > happening in a "real" release. If you do that, you will see the
> > subsystems that never mark things for stable, which you totally miss
> > here, right?
>
> We are not looking at the run-up to 4.4 here; we are looking at
> the fixes that go into 4.4.1++, and for those we look at all
> commits in linux-stable. So that should cover ALL subsystems
> for which bugs were discovered and fixed (either in 4.4.X or
> ported from other 4.X findings).
No, because (see below)...
> > For example, where are the driver subsystems that everyone relies on
> > that are changing upstream, yet have no stable fixes? What about the
> > filesystems that even more people rely on, yet have no stable fixes?
> > Are those code bases just so good and solid that there are no bugs to be
> > fixed? (hint, no...)
>
> That is not what we are claiming - the model here is that
> operation uncovers bugs and the critical bugs are being
> fixed in stable releases. That there are more fixes and lots
> of cleanups that go into stable is clear, but with respect to
> the usability of the kernel we do assume that if a bug in
> driver X is found that results in this driver being unusable
> or destabilizing the kernel, it would be fixed in the stable
> fixes as well (which is also visible in the close to 50% of
> fixes being in drivers) - now if that assumption is overly
> naive then you are right, and the assessment will not hold.
No, that's not how bugs normally get found/fixed. They aren't found in
older kernels for the most part, they are found in the "latest" kernel
and then sometimes tagged that they should be backported.
All of the automated testing/debugging that we have going on to fix
issues are on the latest kernel release, not the older releases. We
might get lucky and get bug reports from a good distro like Debian or
Fedora that is running the latest stable kernel, but usually those
reports are "backport these fixes to the stable kernel please" as the
fixes have already been made by the community.
But this does depend on the subsystem/area as well. Some arches don't
even test every kernel, they only wake up once a year and start sending
patches in. Some subsystems are the same (scsi was known for this...)
So things are all over the place.
Also, you have the subsystems and arches that are just quiet for
long stretches of time (like scsi used to be), where patches would queue
up for many releases before they finally got merged. Some arches only
send patches in every other year for things that are more complex than
build-breakage bugs because they just don't care.
So please, don't assume that the patches I apply to a LTS kernel are due
to someone noticing it in that kernel. It's almost never the case, but
of course there are exceptions.
Again, I think you are trying to attribute a pattern to something that
doesn't have it, based on how I have been seeing kernel development work
over the years.
> > So because of that, you can't use the information about what I apply to
> > stable trees as an indication that those are the only parts of the
> > kernel that have bugs to be fixed.
>
> So a critical bug discovered in 4.7 that is also found
> to apply to, say, 4.4.14 would *not* be fixed in the 4.4.15 stable
> release?
Maybe, depends on the subsystem. I know some specific ones that the
answer to that would be no. And that's the subsystem maintainers
choice, I can't tell him to do extra work just because I decided to
maintain a specific kernel version for longer than expected.
> > > > So be careful about what you are trying to measure, it might just be not
> > > > what you are assuming it is...
> > >
> > > An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
> > > series represent the overall stable fixes quite well.
> >
> > "overall stable fixes". Not "overall kernel fixes", two very different
> > things, please don't confuse the two.
>
> I'm not - we are looking at stable fixes, not kernel fixes, the
> reason for that simply being that for kernel fixes it is not
> possible to say if they are bug-fixes or optimizations/enhancements
> - at least not in any automated way.
I agree, it's hard, if not impossible to do that :)
> The focus on stable dot releases and their fixes was chosen
> * because it is manageable
> * because we assume that critical bugs discovered will be fixed
> * and because there are no optimizations or added features
The first one makes this easier for you, the second and third are not
always true. There have been big patchsets merged into longterm
stable kernel releases that were done because they were "optimizations"
and the maintainer of that subsystem and I discussed it and deemed it
was a valid thing to accept. This happens every 6 months or so if you
look closely. The mm subsystem is known for this :)
And as for #2, again, I'm at the whim of the subsystem maintainer to
mark the patches as such. And again, this does not happen for all
subsystems.
> > And because of that, I would state that "overall stable fixes" number
> > really doesn't mean much to a user of the kernel.
>
> It does for those that are using some LTS release, and it says
> something about the probability of a bug in a stable release
> being detected. Or would you say that 4.4.13 is not to be
> expected to be better off than 4.4.1?
Yes, I would hope that it is better, otherwise why would I have accepted
the patches to create that kernel? :)
But you can't make the claim that all bugs that are being found are
being added to the stable kernels, and especially not the lts kernels.
> From the data we have
> looked at so far - the lifetime of a bug in -stable as well as the
> discovery rate of bugs in sublevel releases - it seems
> clear that the reliability of the kernel over
> sublevel releases is increasing, and that this can be utilized
> to select a kernel version more suitable for HA or critical
> systems based on trending/analysis.
That's good, I'm glad we aren't regressing. But the only way you can be
sure to get all fixes is to always use the latest kernel release.
That's all the kernel developers will ever guarantee.
"critical" and HA systems had better be updating to newer kernel
releases as they have all of the fixes in it that they need. There
shouldn't be any "fear" of changing to a new kernel any more than they
should fear moving to a new .y stable release.
> > Over time, more people are using the "fixes:" tag, but then that messes
> > with your numbers because you can't compare the work we did this year
> > with the work we did last year.
>
> Sure, why not? You must look at relative usage and correlation
> of the tags - currently about 36% of the stable commits in the
> dot-releases (sublevels) are a usable basis - if the use of
> Fixes: increases, all the better - it just means we are moving
> towards an R^2 of 1 - results stay comparable; it just means
> that the confidence intervals for the current data are wider
> than for the data of next year.
Depends on how you describe "confidence" levels, but sure, I'll take
your word for it :)
> > Also, our rate of change has increased, and the number of stable patches
> > being tagged has increased, based on me going around and kicking
> > maintainers. Again, because of that you can't compare year to year at
> > all.
>
> Why not? We are not selecting a specific class of bugs in any
> way - the Fixes: tags are nearly randomly distributed across the
> effective fixes in stable - it may be a bit biased because some
> maintainer does not like Fixes: tags and her subsystem is
> significantly more complex/more buggy/better tested/etc. than
> the average subsystem - so we would get a bit of a bias into it
> all - but that does not invalidate the results.
> You can ask the voters in 3 states who they will elect president
> and this will give you a less accurate result than if you ask in
> all 50 states, but if you factor that uncertainty into the
> result it is perfectly valid and stays comparable to other results.
>
> I'm not saying that you can simply compare numeric values for
> 2016 with those from 2017, but you can compare the trends and
> the expectations if you model the uncertainties.
Ok, fair enough. As long as we continue to do better I'll be happy.
> Note that we have a huge advantage here - we can make predictions
> from models - say, predict 4.4.16 - and then actually check our models.
That's good, and is what I've been telling people that they should be
doing for a long time. Someone actually went and ran regression tests
on all 3.10.y kernel releases and found no regressions for their
hardware platform. That's a good thing to see.
> Now if there are really significant changes, like the task struct
> being redone, then that may have a large impact, and the assumption
> that the convoluted parameter "sublevel" describes a more or
> less stable development might be less correct - it will not be
> completely wrong - and consequently the prediction quality will
> suffer - but does that invalidate the approach?
I don't know, you tell me :)
> > There's also the "bias" of the long-term and stable maintainer to skew
> > the patches they review and work to get applied based on _why_ they are
> > maintaining a specific tree. I know I do that for the trees I maintain,
> > and know the other stable developers do the same. But those reasons are
> > different, so you can't compare what is done to one tree vs. another one
> > very well at all because of that bias.
>
> If the procedures applied do not "jump" but evolve, then bias is
> not an issue - you can find many factors that will increase the
> uncertainty of any such prediction - but if the parameters, which
> are all convoluted - be it by personal preferences of maintainers,
> selection of a specific FS in mainline distributions, etc. - still
> represent the overall development, and as long as the bias, as you
> called it, does not flip-flop from 4.4.6 to 4.4.7, we do not care
> too much.
Ok, but I don't think the users of those kernels will like that, as you
can't represent bias in your numbers and perhaps a whole class of users
is being ignored for one specific LTS release. Then they would get no
bugfixes for their areas :(
> > > > > Some early results were presented at ALS in Japan on July 14th
> > > > > but this still needs quite a bit of work.
> > > >
> > > > Have a pointer to that presentation?
> > > >
> > > They probably are somewhere on the ALS site - but I just dropped
> > > them to our web-server at
> > > http://www.opentech.at/Statistics.pdf and
> > > http://www.opentech.at/TechSummary.pdf
> > >
> > > This is quite a rough summary - so if anyone wants the actual data
> > > or R commands used - let me know - no issue with sharing this and having
> > > people tell me that I'm totally wrong :)
> >
> > Interesting, I'll go read them when I get the chance.
> >
> > But I will make a meta-observation, it's "interesting" that people go
> > and do analysis of development processes like this, yet never actually
> > talk to the people doing the work about how they do it, nor how they
> > could possibly improve it based on their analysis.
>
> I do talk to the people - I've been doing this quite a bit - one of
> the reasons for hopping over to ALS was precisely that. We have been
> publishing our stuff all along, including any findings, patches
> etc.
What long-term stable kernel maintainer have you talked to?
Not me :)
> BUT: I'm not going to go to LinuxCon and claim that I know how
> to do better - not based on the preliminary data we have now.
>
> Once we think we have something solid, I'll be most happy to sit
> down and listen.
>
> >
> > We aren't just people to be researched, we can change if asked.
> > And remember, I _always_ ask for help with the stable development
> > process, I have huge areas that I know need work to improve, just no one
> > ever provides that help...
>
> And we are doing our best to support that - be it by documentation
> fixes, compliance analysis, type-safety analysis and the appropriate
> patches I've been pestering maintainers with.
You have? As a subsystem maintainer I haven't seen anything like this,
I guess no one relies on my subsystems :)
> But you do have to give us the time to get SOLID data first
> and NOT rush conclusions - as you pointed out here yourself,
> some of the assumptions we are making might well be wrong, so
> what kind of suggestions do you expect here?
> First get the data
> -> make a model
> -> deduce your analysis/sample/experiments
> -> write it all up and present it to the community
> -> get the feedback and fix the model
> and if after that some significant findings are left - THEN
> we will show up at LinuxCon and try to find someone to listen
> to what we think we have to say...
No need to go to LinuxCon, email works. And lots of us go to much
better conferences as well (Plumbers, Kernel Recipes, FOSDEM, etc.) :)
thanks,
greg k-h
* Determining patch impact on a specific config
From: Nicholas Mc Guire @ 2016-08-17 18:48 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 07:34:02PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 04:50:30PM +0000, Nicholas Mc Guire wrote:
> > > But you aren't comparing that to the number of changes that are
> > > happening in a "real" release. If you do that, you will see the
> > > subsystems that never mark things for stable, which you totally miss
> > > here, right?
> >
> > We are not looking at the run-up to 4.4 here; we are looking at
> > the fixes that go into 4.4.1++, and for those we look at all
> > commits in linux-stable. So that should cover ALL subsystems
> > for which bugs were discovered and fixed (either in 4.4.X or
> > ported from other 4.X findings).
>
> No, because (see below)...
>
> > > For example, where are the driver subsystems that everyone relies on
> > > that are changing upstream, yet have no stable fixes? What about the
> > > filesystems that even more people rely on, yet have no stable fixes?
> > > Are those code bases just so good and solid that there are no bugs to be
> > > fixed? (hint, no...)
> >
> > That is not what we are claiming - the model here is that
> > operation uncovers bugs and the critical bugs are being
> > fixed in stable releases. That there are more fixes and lots
> > of cleanups that go into stable is clear, but with respect to
> > the usability of the kernel we do assume that if a bug in
> > driver X is found that results in this driver being unusable
> > or destabilizing the kernel, it would be fixed in the stable
> > fixes as well (which is also visible in the close to 50% of
> > fixes being in drivers) - now if that assumption is overly
> > naive then you are right, and the assessment will not hold.
>
> No, that's not how bugs normally get found/fixed. They aren't found in
> older kernels for the most part, they are found in the "latest" kernel
> and then sometimes tagged that they should be backported.
>
> All of the automated testing/debugging that we have going on to fix
> issues are on the latest kernel release, not the older releases. We
Well, our QA Farm at OSADL does do long-term testing of specific versions.
Carsten Emde calls this freeze-and-grow: stick to one specific
version and "monitor" HEAD, but don't jump to each new release. In fact,
I would assume that an HA system would be better off with some simplistic
kernel version selection like the following (a rough sketch of the first
check follows the list):
1) is it an LTS?
2) did it make it into some mainstream distro?
3) how many bugs surfaced over time in those distros?
4) did it make it through a few sublevels that show decreasing trends?
...
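[A rough sketch of step 1), assuming kernel.org's releases.json feed and its
"moniker"/"version" fields; the other steps would need distro and bug-tracker
data that is not scripted here.]

    #!/usr/bin/env python3
    # Sketch: check whether a given kernel series is currently a longterm (LTS) series.
    # Assumes https://www.kernel.org/releases.json with "releases"/"moniker"/"version".
    import json
    import urllib.request

    SERIES = "4.4"   # series to check, e.g. for an HA deployment decision

    with urllib.request.urlopen("https://www.kernel.org/releases.json") as f:
        data = json.load(f)

    longterm = [r["version"] for r in data["releases"] if r.get("moniker") == "longterm"]
    is_lts = any(v == SERIES or v.startswith(SERIES + ".") for v in longterm)
    print("longterm series:", ", ".join(longterm))
    print(SERIES, "is an LTS" if is_lts else "is NOT an LTS (or no longer maintained)")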
All the automated build-bots/kernelci etc. are nice but they
do not replace field-data - we need both.
> might get lucky and get bug reports from a good distro like Debian or
> Fedora that is running the latest stable kernel, but usually those
> reports are "backport these fixes to the stable kernel please" as the
> fixes have already been made by the community.
>
> But this does depend on the subsystem/area as well. Some arches don't
> even test every kernel, they only wake up once a year and start sending
> patches in. Some subsystems are the same (scsi was known for this...)
> So things are all over the place.
Well, that means some of the possible issues are known - and some might
be backed by meta-data - whether that is much of an incentive to fix the
process I do not know; let's see.
>
> Also, you have the subsystems and arches that are just quiet for
> long stretches of time (like scsi used to be), where patches would queue
> up for many releases before they finally got merged. Some arches only
> send patches in every other year for things that are more complex than
> build-breakage bugs because they just don't care.
>
> So please, don't assume that the patches I apply to a LTS kernel are due
> to someone noticing it in that kernel. It's almost never the case, but
> of course there are exceptions.
>
> Again, I think you are trying to attribute a pattern to something that
> doesn't have it, based on how I have been seeing kernel development work
> over the years.
>
> > > So because of that, you can't use the information about what I apply to
> > > stable trees as an indication that those are the only parts of the
> > > kernel that have bugs to be fixed.
> >
> > So a critical bug discovered in 4.7 that is also found
> > to apply to, say, 4.4.14 would *not* be fixed in the 4.4.15 stable
> > release?
>
> Maybe, depends on the subsystem. I know some specific ones that the
> answer to that would be no. And that's the subsystem maintainers
> choice, I can't tell him to do extra work just because I decided to
> maintain a specific kernel version for longer than expected.
Oops... OK, that's bad - that messes up a bit what we had been expecting.
Then we will need to include monitoring of HEAD, basically to fix things by
backporting on our own in case it is not done. It seems that I was
a bit naive on that one.
>
> > > > > So be careful about what you are trying to measure, it might just be not
> > > > > what you are assuming it is...
> > > >
> > > > An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
> > > > series represent the overall stable fixes quite well.
> > >
> > > "overall stable fixes". Not "overall kernel fixes", two very different
> > > things, please don't confuse the two.
> >
> > I'm not - we are looking at stable fixes - not kernel fixes, the
> > reason for that simply being that for kernel fixes it is not
> > possible to say if they are bug-fixes or optimizations/enhancements
> > - at least not in any automated way.
>
> I agree, it's hard, if not impossible to do that :)
...well then I'll chicken out and go for the meager but possible.
>
> > The focus on stable dot releases and their fixes was chosen
> > * because it is manageable
> > * because we assume that critical bugs discovered will be fixed
> > * and because there are no optimizations or added features
>
> The first one makes this easier for you, the second and third are not
> always true. There have been big patchsets merged into longterm
> stable kernel releases that were done because they were "optimizations"
> and the maintainer of that subsystem and I discussed it and deemed it
> was a valid thing to accept. This happens every 6 months or so if you
> look closely. The mm subsystem is known for this :)
so major mm subsystem optimizations will go in in the middle of an
LTS between "random" sublevel releases ? At least for 4.4-4.4.13 I was not
able to pin-point such a change (based on files-changed/lines-added/removed).
Could you point me to one or the other ? It would help to see why we missed it.
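For reference, the kind of scan I mean is roughly the following (quick
Python sketch, run inside a linux-stable checkout; the rev range and the
top-20 cut-off are just examples):

# list the commits with the largest diffstat in a stable range
import subprocess

out = subprocess.run(
    ["git", "log", "--numstat", "--format=commit %h %s", "v4.4..v4.4.13"],
    capture_output=True, text=True, check=True).stdout

stats = []  # [lines changed, files touched, "hash subject"]
for line in out.splitlines():
    if line.startswith("commit "):
        stats.append([0, 0, line[len("commit "):]])
    elif "\t" in line and stats:
        added, removed, _path = line.split("\t", 2)
        stats[-1][1] += 1
        for n in (added, removed):      # numstat prints "-" for binary files
            stats[-1][0] += int(n) if n.isdigit() else 0

for total, files, subject in sorted(stats, reverse=True)[:20]:
    print("%6d lines %3d files  %s" % (total, files, subject))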
>
> And as for #2, again, I'm at the whim of the subsystem maintainer to
> mark the patches as such. And again, this does not happen for all
> subsystems.
>
> > > And because of that, I would state that "overall stable fixes" number
> > > really doesn't mean much to a user of the kernel.
> >
> > It does for those that are using some LTS release and it says
> > something about the probability of a bug in a stable release
> > being detected. Or would you say that a 4.4.13 is not to be
> > expected to be better off than 4.4.1 ?
>
> Yes, I would hope that it is better, otherwise why would I have accepted
> the patches to create that kernel? :)
>
> But you can't make the claim that all bugs that are being found are
> being added to the stable kernels, and especially not the lts kernels.
>
I'm not making such a claim - we are just trying to estimate residual
bugs in the kernel for a given (defined) configuration based on the
git meta-data. We know that the kernel has bugs - but we can classify
their severity, estimate their distribution, estimate the residual bugs,
and from that estimate the overall criticality of the kernel in a quantitative
way (with modeled/quantified uncertainty).
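The harvesting side of that is nothing magic - roughly the following
(Python sketch; the rev range is an example, and the classification and
modelling on top of it are not shown here):

# collect the commits in a stable range that carry a Fixes: tag
# (run inside a linux-stable checkout)
import re
import subprocess

FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})", re.I | re.M)

def commits_with_fixes(rev_range):
    # %x00/%x01 act as field/record separators to keep parsing simple
    out = subprocess.run(
        ["git", "log", "--no-merges", "--format=%H%x00%s%x00%b%x01", rev_range],
        capture_output=True, text=True, check=True).stdout
    found = []
    for record in out.split("\x01"):
        if "\x00" not in record:
            continue
        sha, subject, body = record.lstrip("\n").split("\x00", 2)
        m = FIXES_RE.search(body)
        if m:
            found.append((sha, subject, m.group(1)))
    return found

tagged = commits_with_fixes("v4.4..v4.4.14")
print(len(tagged), "commits in the range carry a Fixes: tag")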
> > From the data we have
> > looked at so far: life-time of a bug in -stable as well as with
> > respect to the discovery rate of bugs in sublevel releases
> > it seems clear that the reliability of the kernel over
> > sublevel releases is increasing and that this can be utilized
> > to select a kernel version more suitable for HA or critical
> > systems based on trending/analysis.
>
> That's good, I'm glad we aren't regressing. But the only way you can be
> sure to get all fixes is to always use the latest kernel release.
> That's all the kernel developers will ever guarantee.
>
> "critical" and HA systems had better be updating to newer kernel
> releases as they have all of the fixes in it that they need. There
> shouldn't be any "fear" of changing to a new kernel any more than they
> should fear moving to a new .y stable release.
That would be nice - but it's not doable - not as soon as you need a
certification for such a system. Dot releases have the key advantage of not
including feature changes or significant redesign - so the testing and
field-data as well as analysis (like ftrace campaigns/code-coverage/LTP/etc.)
stay valid to a large extent. From what we have been reviewing for mainstream
hardware we also did not see that backporting was *not* happening for a quite
constrained/small configuration.
>
> > > Over time, more people are using the "fixes:" tag, but then that messes
> > > with your numbers because you can't compare the work we did this year
> > > with the work we did last year.
> >
> > sure, why not ? You must look at relative usage and correlation
> > of the tags - currently about 36% of the stable commits in the
> > dot-releases (sublevels) are a usable basis - if the use of
> > Fixes: increases, all the better - it just means we are moving
> > towards an R^2 of 1 - results stay comparable, it just means
> > that the confidence intervals for the current data are wider
> > than for the data of next year.
>
> Depends on how you describe "confidence" levels, but sure, I'll take
> your word for it :)
We are trying to put numeric values on artefacts of development so that they
are comparable - and with confidence here we do mean formal confidence
levels (p-values, AIC values, hypothesis/significance testing at
defined levels) - trying to get away from "gut-feeling" only - but let's see
if we do better or just produce "formalized gut-feeling"...
>
> > > Also, our rate of change has increased, and the number of stable patches
> > > being tagged has increased, based on me going around and kicking
> > > maintainers. Again, because of that you can't compare year to year at
> > > all.
> >
> > why not ? We are not selecting a specific class of bugs in any
> > way - the Fixes are neatly randomly distributed across the
> > effective fixes in stable - it may be a bit biased because some
> > maintainer does not like Fixes: tags and her subsystem is
> > significantly more complex/more buggy/better tested/etc. than
> > the average subsystem - so we would get a bit of a bias into it
> > all - but that does not invalidate the results.
> > You can ask the voters in 3 states who they will elect president
> > and this will give you a less accurate result than if you ask in
> > all 50 states, but if you factor that uncertainty into the
> > result it is perfectly valid and stays comparable to other results.
> >
> > I'm not saying that you can simply compare numeric values for
> > 2016 with those from 2017, but you can compare the trends and
> > the expectations if you model uncertainties.
>
> Ok, fair enough. As long as we continue to do better I'll be happy.
>
> > Note that we have a huge advantage here - we can make predictions
> > from models - say predict 4.4.16 and then actually check our models
>
> That's good, and is what I've been telling people that they should be
> doing for a long time. Someone actually went and ran regression tests
> on all 3.10.y kernel releases and found no regressions for their
> hardware platform. That's a good thing to see.
and just like you can do that on code to detect null pointers or whatnot,
you can do it on development processes to detect systematic problems there,
like infinite fix cycles or accumulation of fix-fix commits indicating a
possibly broken design rather than "just" broken code.
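A crude example of such a process-level check (Python sketch, run inside a
kernel checkout; the rev range is an example, and "fix-of-a-fix" here simply
means the commit named in a Fixes: tag itself carries a Fixes: tag):

# spot fix-of-a-fix chains in a stable range
# slow (one git call per commit) but fine for a few hundred commits
import re
import subprocess

FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})", re.I | re.M)

def fixes_target(rev):
    # return the commit id named in a Fixes: tag of 'rev', or None
    body = subprocess.run(["git", "show", "-s", "--format=%b", rev],
                          capture_output=True, text=True).stdout
    m = FIXES_RE.search(body)
    return m.group(1) if m else None

revs = subprocess.run(["git", "rev-list", "--no-merges", "v4.4..v4.4.14"],
                      capture_output=True, text=True, check=True).stdout.split()

chains = 0
for rev in revs:
    target = fixes_target(rev)
    if target and fixes_target(target):   # the fixed commit was itself a fix
        chains += 1
        print("fix-of-a-fix:", rev[:12])
print(chains, "fix-of-a-fix commits in the range")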
>
> > Now if there are really significant changes like the task struct
> > being redone then that may have a large impact and the assumption
> > that the convoluted parameter "sublevel" is describing a more or
> > less stable development might be less correct - it will not be
> > completely wrong - and consequently the prediction quality will
> > suffer - but does that invalidate the approach ?
>
> I don't know, you tell me :)
ok - will answer that by - say end of 2017 +/- 1 year with a probability of...
>
> > > There's also the "bias" of the long-term and stable maintainer to skew
> > > the patches they review and work to get applied based on _why_ they are
> > > maintaining a specific tree. I know I do that for the trees I maintain,
> > > and know the other stable developers do the same. But those reasons are
> > > different, so you can't compare what is done to one tree vs. another one
> > > very well at all because of that bias.
> >
> > If the procedures applied do not "jump" but evolve then bias is
> > not an issue - you can find many factors that will increase the
> > uncertainty of any such prediction - but if the parameters, which
> > all are convoluted - be it by personal preferences of maintainers,
> > selection of a specific FS in mainline distributions, etc. - still
> > represent the overall development, and as long as your bias, as you
> > called it, does not flip-flop from 4.4.6 to 4.4.7, we do not care
> > too much.
>
> Ok, but I don't think the users of those kernels will like that, as you
> can't represent bias in your numbers and perhaps a whole class of users
> is being ignored for one specific LTS release. Then they would get no
> bugfixes for their areas :(
I think we are looking at it from different angles here - our intent is to
uncover high-level faults in the development life-cycle - things that start
going off track, like fix-rates going up, complexity metrics jumping, or bug
ages changing statistically significantly. E.g. ext4 has kind of popped out
as a problem case - we can't yet really say much about why, but from what I've
been looking at it seems that the problem goes all the way back to the initial
release as a copy of ext3 rather than a clean re-implementation/re-design.
(This conclusion may well be wrong - it's based on the observation that
ext4 stable fixes are seemingly not stabilizing.)
So yes - we might be missing a whole subsystem or arch - we can not
do much about that - but we can detect some types of high-level faults
in the development and possibly address them by tools (like static code
checkers or git meta-data harvesting, etc.) or fixes to the process, and
in this sense it will profit those, for us, hidden users.
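(For the ext4 observation, the underlying numbers are basically per-sublevel
commit counts touching fs/ext4, gathered roughly like this - Python sketch,
versions are examples, the trend statistics are done separately on top:)

# per-sublevel count of non-merge commits touching a path (e.g. fs/ext4)
# run inside a linux-stable checkout
import subprocess

def count_commits(old, new, path):
    out = subprocess.run(
        ["git", "rev-list", "--count", "--no-merges",
         "%s..%s" % (old, new), "--", path],
        capture_output=True, text=True, check=True).stdout
    return int(out.strip())

tags = ["v4.4"] + ["v4.4.%d" % n for n in range(1, 15)]
for old, new in zip(tags, tags[1:]):
    print(new, count_commits(old, new, "fs/ext4"))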
>
> > > > > > Some early results where presented at ALS in Japan on July 14th
> > > > > > but this still needs quite a bit of work.
> > > > >
> > > > > Have a pointer to that presentation?
> > > > >
> > > > They probably are somewhere on the ALS site - but I just dropped
> > > > them to our web-server at
> > > > http://www.opentech.at/Statistics.pdf and
> > > > http://www.opentech.at/TechSummary.pdf
> > > >
> > > > This is quite a rough summary - so if anyone wants the actual data
> > > > or R commands used - let me know - no issue with sharing this and having
> > > > people tell me that I'm totally wrong :)
> > >
> > > Interesting, I'll go read them when I get the chance.
> > >
> > > But I will make a meta-observation, it's "interesting" that people go
> > > and do analysis of development processes like this, yet never actually
> > > talk to the people doing the work about how they do it, nor how they
> > > could possibly improve it based on their analysis.
> >
> > I do talk to the people - I've been doing this quite a bit - one of
> > the reasons for hopping over to ALS was precisely that. We have been
> > publishing our stuff all along, including any findings, patches,
> > etc.
>
> What long-term stable kernel maintainer have you talked to?
>
> Not me :)
I actually did talk to you in Duesseldorf at the LISI session (I think
that was its name) about LTS kernels for safety-related automotive systems,
but I did not discuss the statistical stuff at that time as it was not
really ready yet. But as noted before - first we need solid data
so that we can actually reasonably uncover high-level issues (like
type mismatches and the whole linux kernel type system/mess - developers
or subsystems with particular issues like lack of reviewed-by or
whatever).
>
> > BUT: I'm not going to go to LinuxCon and claim that I know how
> > to do better - not based on the preliminary data we have now.
> >
> > Once we think we have something solid - I'll be most happy to sit
> > down and listen.
> >
> > >
> > > We aren't just people to be researched, we can change if asked.
> > > And remember, I _always_ ask for help with the stable development
> > > process, I have huge areas that I know need work to improve, just no one
> > > ever provides that help...
> >
> > And we are doing our best to support that - be it by documentation
> > fixes, compliance analysis, type safety analysis and appropriate
> > patches I've been pestering maintainers with.
>
> You have? As a subsystem maintainer I haven't seen anything like this,
> I guess no one relies on my subsystems :)
Actually a number of them went to you and showed up in stable review
patch series in the past. Nothing wild and big - just cleanup
patches (type fixes, completion API fixes, doc fixes) - and some did
go into backports like 3.14-stable (from 3.19, I think).
At least I do have a dozen or so "Applied, thanks" from your email address
or your "friendly semi-automated patch-bot".
>
> > But you do have to give us the time to have SOLID data first
> > and NOT rush conclusions - as you pointed out here yourself,
> > some of the assumptions we are making might well be wrong, so
> > what kind of suggestions do you expect here ?
> > First get the data
> > -> make a model
> > -> deduce your analysis/sample/experiments
> > -> write it all up and present it to the community
> > -> get the feedback and fix the model
> > and if after that some significant findings are left - THEN
> > we will show up at LinuxCon and try to find someone to listen
> > to what we think we have to say...
>
> No need to go to LinuxCon, email works. And lots of us go to much
> better conferences as well (Plumbers, Kernel Recipes, FOSDEM, etc.) :)
>
Well, early findings were presented this year at FOSDEM, and
at Plumbers in Duesseldorf we presented the certification
approach as well - but that was still at a very early stage,
in the context of the SIL2LinuxMP certification project at
OSADL (http://www.osadl.org/SIL2).
We are more than happy to present findings and rake in some more rants ...
if e-mail is preferred - all the better.
thx!
hofrat
^ permalink raw reply [flat|nested] 12+ messages in thread
* Determining patch impact on a specific config
2016-08-17 18:48 ` Nicholas Mc Guire
@ 2016-08-18 7:35 ` Greg KH
2016-08-18 7:38 ` Greg KH
1 sibling, 0 replies; 12+ messages in thread
From: Greg KH @ 2016-08-18 7:35 UTC (permalink / raw)
To: kernelnewbies
Ok, just one response to this ever-growing thread, I think Nicholas and I
need to sit down over a beverage and work this out in person
instead of boring everyone on the list...
On Wed, Aug 17, 2016 at 06:48:53PM +0000, Nicholas Mc Guire wrote:
> I think we are looking at it from different angles here - our intent is to
> uncover high-level faults in the development life-cycle - things that start
> going off track, like fix-rates going up, complexity metrics jumping, or bug
> ages changing statistically significantly. E.g. ext4 has kind of popped out
> as a problem case - we can't yet really say much about why, but from what I've
> been looking at it seems that the problem goes all the way back to the initial
> release as a copy of ext3 rather than a clean re-implementation/re-design.
> (This conclusion may well be wrong - it's based on the observation that
> ext4 stable fixes are seemingly not stabilizing.)
Ah, here's an example of what I was trying to say before.
ext4 is not any more (or less) "unstable", or buggier, than other
filesystems. It's just that its developers and maintainer do send and
mark patches for the stable kernel trees. The other filesystem
developers do not at all.
Yes, some bugfixes trickle in for other filesystems, but that's a very
rare occasion, and one major filesystem has the explicit rule that they
will NOT send any bugfixes for stable kernels because they just don't
want to worry about it and they want their users to always use the
latest kernel release to get the needed fixes.
So don't think that ext4 is more "unstable" than others, in fact, one
could argue that the ext4 developers are making their users _more_
stable than the other filesystems just because of these fixes :)
I think you might want to try to figure out how to look at the fixes
that go into Linus's kernel tree in order to try to get a better overall
picture. Yes, it's a firehose, but that's what we have data analysis
tools for :)
good luck!
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Determining patch impact on a specific config
2016-08-17 18:48 ` Nicholas Mc Guire
2016-08-18 7:35 ` Greg KH
@ 2016-08-18 7:38 ` Greg KH
1 sibling, 0 replies; 12+ messages in thread
From: Greg KH @ 2016-08-18 7:38 UTC (permalink / raw)
To: kernelnewbies
On Wed, Aug 17, 2016 at 06:48:53PM +0000, Nicholas Mc Guire wrote:
> > The first one makes this easier for you, the second and third are not
> > always true. There have been big patchsets merged into longterm
> > stable kernel releases that were done because they were "optimizations"
> > and the maintainer of that subsystem and I discussed it and deemed it
> > was a valid thing to accept. This happens every 6 months or so if you
> > look closely. The mm subsystem is known for this :)
>
> so major mm subsystem optimizations will go in in the middle of an
> LTS between "random" sublevel releases ? At least for 4.4-4.4.13 I was not
> able to pin-point such a change (based on files-changed/lines-added/removed).
> Could you point me to one or the other ? It would help to see why we missed it.
4.4 hasn't been around long enough for this to happen yet, I think it
happened in 4.1, or maybe 3.14, or possibly 3.10, can't remember, but it
should be obvious by the changelogs.
thanks,
greg "just one more email!" k-h
^ permalink raw reply [flat|nested] 12+ messages in thread