[Lustre-devel] broader Lustre testing

* [Lustre-devel] broader Lustre testing
       [not found] <A58D7719023A5D47A6D3BEAF8C0BE67509EA751B@CFWEX01.americas.cray.com>
@ 2012-07-12 19:37 ` Nathan Rutman
  2012-07-12 20:25   ` [Lustre-devel] [cdwg] " Andreas Dilger
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nathan Rutman @ 2012-07-12 19:37 UTC (permalink / raw)
  To: lustre-devel

On Jul 12, 2012, at 7:30 AM, John Carrier wrote:

> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released.  Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official. 

Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
These questions, I think, should not need to be asked.  Which version of MacOS should I use?  The latest one, period.  Why can't Lustre do the same thing?  The answer I think lies in testing, which becomes a chicken and egg problem.   I'm only going to use a "stable" release, which is the release which was tested with my applications.  I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released.  Hopefully it even ran on a big system like Hyperion.  (Do we learn anything more about running acc-sm on other big systems?  Probably not much.)  But it certainly wasn't tested with my application, because I didn't test it.  Because it wasn't released yet.  Chicken and egg.  Only after enough others make the leap am I willing to.
So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.  To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day".  Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day.  Make sure your data is backed up.  Make sure it's not in the middle of a critical week-long run.  Accept that you might have to re-run it tomorrow in the worst case.  Report any problems you have.
What you get out of it is a much more stable Master, and an end to the question of "which version should I run".  When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications.  More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches.  Latest and greatest and most stable.

Pipe dream?

* [Lustre-devel] [cdwg] broader Lustre testing
  2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
@ 2012-07-12 20:25   ` Andreas Dilger
  2012-07-12 20:57   ` Christopher J. Morrone
  2012-07-20 14:13   ` James A Simmons
  2 siblings, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2012-07-12 20:25 UTC (permalink / raw)
  To: lustre-devel

On 2012-07-12, at 1:37 PM, Nathan Rutman wrote:
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released.  Even if a Community member has no
>> interest in using a feature release in production, early testing with
>> pre-release versions of feature releases will help identify
>> instabilities created by the new feature with their workloads and
>> hardware before the release is official. 
> 
> 
> Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
> 
> These questions, I think, should not need to be asked.  Which version of MacOS should I use?  The latest one, period.

Interesting...   I _don't_ run the latest version of MacOS, and I distinctly recall people having a variety of issues with 10.7.0 when it was released.  Does that mean the MacOS testing was insufficient?  Partly, but it is unrealistic to test every possible usage pattern, so testing has to be "optimized" to cover the most common use cases in order to be finished within both time and cost constraints.

> Why can't Lustre do the same thing?  The answer I think lies in testing, which becomes a chicken and egg problem.   I'm only going to use a "stable" release, which is the release which was tested with my applications.  I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released.  Hopefully it even ran on a big system like Hyperion.  (Do we learn anything more about running acc-sm on other big systems?  Probably not much.)

Right.  I don't think that acc-sm is the end-all in testing frameworks, and I freely admit that there is a lot more testing that could be done, both in scale and in the types of loads that are used.  The acceptance-small.sh script is intended to be an "optimized" test set that can run in a few hours to give some reasonable confidence in a particular change.

>  But it certainly wasn't tested with my application, because I didn't test it.  Because it wasn't released yet.  Chicken and egg.  Only after enough others make the leap am I willing to.

There are all kinds of other load/stress tests (including applications) that can/should be run after the "basic" tests have been run to find new defects.  When those defects are found they should be distilled down to a simple and specific test that gets added to the regular regression suite.  I think it is this kind of testing that is needed moving forward.

> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.

I would caveat this to say - only test on tags which we know to be at least reasonably stable, since a lot of testing time will be wasted otherwise.

> To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day".  Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day.  Make sure your data is backed up.  Make sure it's not in the middle of a critical week-long run.  Accept that you might have to re-run it tomorrow in the worst case.  Report any problems you have.

I'm not sure that users will be willing to do this, though some "friendly" users are known to make the leap onto new systems in order to get early/free CPU cycles on new clusters.

There are also "feature tests" that need to be run at scale to validate new features, to ensure they are functional at scale, don't impact performance, and to expose the kinds of race conditions that only scale testing provides.

> What you get out of it is a much more stable Master, and an end to the question of "which version should I run".  When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications.  More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches.  Latest and greatest and most stable.
> 
> Pipe dream?

I hope not.  When I see users taking a specific release of Lustre, testing it, and then applying a patch series to their branch, the unfortunate result is more effort for the user (vendor/site, not end users) to maintain their patches, and more effort for support to determine if some _other_ bug is already fixed, or to debug a problem that appears only with a specific combination of patches applied, and then craft a different fix for that branch than the mainline.

A better use case would be for users to start testing _before_ a major release is made, find/fix bugs, and merge the fixes into mainline, so when it is released in a maintenance release it will already be quite stable.  This keeps the user patchset much smaller, and everyone will benefit from fixes from other testing before the release, and hopefully find fewer bugs in the field.  It also avoids the issue of each user testing some cross-product of patches, and not really leveraging each other's testing.  Then, any bugs found in the field go into the maintenance branch and master, but there is much less of a need to "test" the maintenance branch, since the changes there should be relatively small.

I think this is a reasonable approach, given that we no longer land features on maintenance branches.  That means the risk of following maintenance releases is much smaller than it was in the 1.6 and 1.8 days (1.8.x only really entered "maintenance" mode with 1.8.6 or so).

We've been trying to follow this model with LLNL.  One issue is that 2.1.0 didn't really receive as much up-front testing as it could have, so it is getting more fixes than it should.  We are working hard to land all of the LLNL (and other) bugfix patches into master and the next 2.1.x release.

There is a parallel effort to test orion (2.4 development branch) so that by the time 2.4 rolls around (including features that are not in master or orion yet) it will be relatively stable and does not need its own "test effort".

Are we at this nirvana yet?  Not quite, but I think we are closer than ever before, and we have the chance to get there with a coordinated effort of the community.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/

* [Lustre-devel] [cdwg] broader Lustre testing
  2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
  2012-07-12 20:25   ` [Lustre-devel] [cdwg] " Andreas Dilger
@ 2012-07-12 20:57   ` Christopher J. Morrone
  2012-07-16 18:31     ` Roman Grigoryev
  2012-07-20 14:13   ` James A Simmons
  2 siblings, 1 reply; 6+ messages in thread
From: Christopher J. Morrone @ 2012-07-12 20:57 UTC (permalink / raw)
  To: lustre-devel

On 07/12/2012 12:37 PM, Nathan Rutman wrote:
>
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>
> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released.  Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official.
>
>
> Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
> These questions, I think, should not need to be asked.  Which version of MacOS should I use?  The latest one, period.  Why can't Lustre do the same thing?

Because we're an open source project where all of our dirty laundry is 
in the public.  I'm sure that Apple has all kinds of internal deadlines 
and testing tags and things that we don't see on the outside world 
because it is a closed-source proprietary product with vast resources to
develop and test internally.

The every-six-months cadence is a good thing in my opinion.  It forces us
developers to regularly address the stability of the changes we are 
introducing.  It provides a clear, explicit time in the schedule for 
developers to stop writing new bugs, and focus their effort on fixing bugs.

I believe that the maintenance branch _is_ the place that you go when 
the question is "which version should I use"?  We just need to have a 
decent web page that says "Want Lustre? Here's the latest stable 
release!"  We need to increase exposure of the maintenance releases, and
hide the "feature" releases off on a developers page.

> The answer I think lies in testing, which becomes a chicken and egg problem.   I'm only going to use a "stable" release, which is the release which was tested with my applications.  I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released.  Hopefully it even ran on a big system like Hyperion.  (Do we learn anything more about running acc-sm on other big systems?  Probably not much.)  But it certainly wasn't tested with my application, because I didn't test it.  Because it wasn't released yet.  Chicken and egg.  Only after enough others make the leap am I willing to.
> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.  To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day".  Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day.  Make sure your data is backed up.  Make sure it's not in the middle of a critical week-long run.  Accept that you might have to re-run it tomorrow in the worst case.  Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the question of "which version should I run".  When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications.  More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches.  Latest and greatest and most stable.

We can do a great deal more testing, and find a seriously large number
of bugs that we have been missing by getting more testing personnel
allocated to Lustre.  I think that's the major gap in Lustre right now.

One day every two months is, I think, insufficient for validating any
software product, let alone something as complex as Lustre.  Not that I 
am opposed to the idea.  If you can arrange that, go for it!  But that 
isn't good enough by itself by a long shot.

We need full time personnel working on testing lustre.  I would think 
that all of the vendors out there selling products to customers would 
already have a lot of experience testing hardware, and other software
bits.  Let's apply some of that know-how to Lustre!

And I think these testing personnel need to be made known to the 
community, so they can talk to each other, so that developers can guide 
their efforts, so we know what our testing coverage looks like, etc.

Testing needs to be a CONTINUAL process, not just something we do at the 
end for a specific release number.  By the time we tag 2.4, it should 
already have been tested so frequently all along the master development 
cycle that the final testing will start to look like a formality to us. 
  We should still do it, of course, but we should have confidence long 
before that happens.

LLNL is trying to do that with the master branch as it moves to 2.4. 
Our coverage is mainly on zfs backends for now, but as the rest of orion 
lands on master, and Sequoia goes into limited production use we'll have 
both zfs and ldiskfs filesystems in our testbed, and test regularly all 
the way up to, and beyond, 2.4.

The gaps in testing are NOT all an issue of insufficient scale testing,
although there is admittedly a constant issue there.  We need much 
better testing at small scale as well.

And let me be really clear: when I say testing, I mean a real human 
being thinking up new tests all of the time.  Looking at logs all of the 
time (so even when the test app succeeded, we'll catch the timeouts and 
reconnections and things that should not be happening, and are symptoms 
of bugs).  Powering things off randomly.  Literally pulling cables out 
while an evil, pathologically bad IO workload is running.

We need real people to test all of the things that it is really easy for 
a human to do, and would take years for developers to automate with any 
reliability.

The automated regression suite that we use is great.  We should continue 
to improve that over time.  But I would contend that it is not, and
never will be, sufficient to tell us if Lustre is stable.

I would argue that the regression tests are, in fact, a very low bar.
And Lustre is just too complicated, networks are too complicated, we
have too few developers, to ever come up with an automated suite with
anything but a relatively low confidence level in the stability of the
software.

And human testers are given a very different set of goals than
developers.  A developer's job is to make things work.  A tester's is to 
do whatever they can to break it.  And then create a good report of how 
they broke it so the developers can fix it.

I also agree that I don't want to continue in this mode of "we'll only 
run it when LLNL/ORNL runs it and says it's good".  So we need more human
testers.

And to get back to the topic of making every single release a "stable" 
release:  That ignores the fact that we have roughly a decade of 
seriously buggy, undocumented code that we're dealing with.  It just 
will not happen.  Period.  We have to accept that and move forward.

We can strive from this point on to make every release better than the 
last.  But developers are human.  Every time we add new features, we're 
going to add new bugs.  We'll also fix bugs.  But we're going to add new 
ones as well.

So we deal with that by having "maintenance" releases.  The maintenance 
release is maintained for a "long" period of time, but adds NO new
features.  No new support for new kernels.  No fantastic new performance 
improvements.  Just bug fixes.

The maintenance release is what vendors should build products upon, 
because that is where we'll land only bug fixes.  So it is far more 
likely to only improve with time, whereas "master" (and therefore the 
"feature" releases which are just tags on master every 6 months), will 
also introduce destabilizing new features.

We'll endeavour to make the new features as stable as we are capable of
doing, and we can do better if we have more testers, but we have to be 
pragmatic.

"Every tag should be completely stable" is impossible.  "Every tag on 
the maintenance branch should be more stable than the last" is an 
achievable goal.

Chris

* [Lustre-devel] [cdwg] broader Lustre testing
  2012-07-12 20:57   ` Christopher J. Morrone
@ 2012-07-16 18:31     ` Roman Grigoryev
  0 siblings, 0 replies; 6+ messages in thread
From: Roman Grigoryev @ 2012-07-16 18:31 UTC (permalink / raw)
  To: lustre-devel

Hi Christopher,

.....
>
> The automated regression suite that we use is great.  We should continue
> to improve that over time.  But I would contend that it is not, and
> never will be, sufficient to tell us if Lustre is stable.
>
> I would argue that the regression tests are, in fact, a very low bar.
> And Lustre is just too complicated, networks are too complicated, we
> have too few developers, to ever come up with an automated suite with
> anything but a relatively low confidence level in the stability of the
> software.
>
> And human testers are given a very different set of goals than
> developers.  A developer's job is to make things work.  A tester's is to
> do whatever they can to break it.  And then create a good report of how
> they broke it so the developers can fix it.
>
.............

Just to support your statement that executing the automated regression
suite (acc-small) alone is not enough to assess quality, I would like to
share the coverage summary which we got:

958 tests were executed

             Hit      Total     Coverage
Lines:       79691    128935    61.8 %
Functions:   6206     7935      78.2 %
Branches:    49287    113914    43.3 %
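
As a quick cross-check of the summary above, the percentages can be recomputed from the Hit and Total columns. This is only a minimal sketch: the counts are copied from the table, while the dictionary layout and the `percent` helper are illustrative names, not part of any Lustre tooling.

```python
# Hit and Total counts copied from the coverage summary above.
coverage = {
    "Lines": (79691, 128935),
    "Functions": (6206, 7935),
    "Branches": (49287, 113914),
}


def percent(hit, total):
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * hit / total, 1)


# Recompute the Coverage column from the raw counts.
summary = {name: percent(hit, total) for name, (hit, total) in coverage.items()}

# Branches that no test exercised at all.
branch_hit, branch_total = coverage["Branches"]
uncovered_branches = branch_total - branch_hit

for name, pct in summary.items():
    print(f"{name}: {pct} %")
print(f"Branches never taken: {uncovered_branches}")
```

The recomputed values agree with the table, and the branch row is the telling one: more than half of the branches were never taken by any of the 958 tests.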

Thanks,
	Roman

* [Lustre-devel] [cdwg] broader Lustre testing
  2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
  2012-07-12 20:25   ` [Lustre-devel] [cdwg] " Andreas Dilger
  2012-07-12 20:57   ` Christopher J. Morrone
@ 2012-07-20 14:13   ` James A Simmons
  2012-07-20 18:20     ` Nathan Rutman
  2 siblings, 1 reply; 6+ messages in thread
From: James A Simmons @ 2012-07-20 14:13 UTC (permalink / raw)
  To: lustre-devel

On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
> 
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
> 
> > A more strategic solution is to do more testing of a feature release
> > candidate _before_ it is released.  Even if a Community member has
> > no
> > interest in using a feature release in production, early testing
> > with
> > pre-release versions of feature releases will help identify
> > instabilities created by the new feature with their workloads and
> > hardware before the release is official. 

...
> So, it seems, we need to test pre-release versions of Lustre, aka
> Master, with my applications.  To that end, how willing are people to
> set aside a day, say once every two months, to be "filesystem beta
> day".  Scientists, run your codes, users, do your normal work, but
> bear in mind there may be filesystem instabilities on that day.  Make
> sure your data is backed up.  Make sure it's not in the middle of a
> critical week-long run.  Accept that you might have to re-run it
> tomorrow in the worst case.  Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the
> question of "which version should I run".  When released, you have
> confidence that you can move up, get the great new features and
> performance, and it runs your applications.  More people are on the
> same release, so it sees even more testing. The maintenance branch is
> always the latest branch, you can pull in point releases with more bug
> fixes with ease. No more rolling your own Lustre with Frankenstein
> sets of patches.  Latest and greatest and most stable.
> 
> 
> Pipe dream?

Since people are now moving to help test out the current master branch
for Whamcloud, I'd like to propose posting a general summary of the testing
results people are seeing. I personally finished a first run at
testing 2.2.91 this last week and would gladly share the results. Anyone
else care to share? :-)

* [Lustre-devel] [cdwg] broader Lustre testing
  2012-07-20 14:13   ` James A Simmons
@ 2012-07-20 18:20     ` Nathan Rutman
  0 siblings, 0 replies; 6+ messages in thread
From: Nathan Rutman @ 2012-07-20 18:20 UTC (permalink / raw)
  To: lustre-devel


On Jul 20, 2012, at 7:13 AM, James A Simmons wrote:

> On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
>> 
>> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> 
>>> A more strategic solution is to do more testing of a feature release
>>> candidate _before_ it is released.  Even if a Community member has
>>> no
>>> interest in using a feature release in production, early testing
>>> with
>>> pre-release versions of feature releases will help identify
>>> instabilities created by the new feature with their workloads and
>>> hardware before the release is official. 
> 
> ...
>> So, it seems, we need to test pre-release versions of Lustre, aka
>> Master, with my applications.  To that end, how willing are people to
>> set aside a day, say once every two months, to be "filesystem beta
>> day".  Scientists, run your codes, users, do your normal work, but
>> bear in mind there may be filesystem instabilities on that day.  Make
>> sure your data is backed up.  Make sure it's not in the middle of a
>> critical week-long run.  Accept that you might have to re-run it
>> tomorrow in the worst case.  Report any problems you have.
>> What you get out of it is a much more stable Master, and an end to the
>> question of "which version should I run".  When released, you have
>> confidence that you can move up, get the great new features and
>> performance, and it runs your applications.  More people are on the
>> same release, so it sees even more testing. The maintenance branch is
>> always the latest branch, you can pull in point releases with more bug
>> fixes with ease. No more rolling your own Lustre with Frankenstein
>> sets of patches.  Latest and greatest and most stable.
>> 
>> 
>> Pipe dream?
> 
> Since people are now moving to help test out the current master branch
> for Whamcloud, I'd like to propose posting a general summary of the testing
> results people are seeing. I personally finished a first run at
> testing 2.2.91 this last week and would gladly share the results. Anyone
> else care to share? :-)
> 
> 

I started a page on the OpenSFS Wiki for everyone to share their test results in a free-form format.  Note that the Wiki itself is still in its infancy - I call on the community to help populate it.