* [Lustre-devel] broader Lustre testing
[not found] <A58D7719023A5D47A6D3BEAF8C0BE67509EA751B@CFWEX01.americas.cray.com>
@ 2012-07-12 19:37 ` Nathan Rutman
2012-07-12 20:25 ` [Lustre-devel] [cdwg] " Andreas Dilger
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Nathan Rutman @ 2012-07-12 19:37 UTC
To: lustre-devel
On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released. Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official.
Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing? The answer I think lies in testing, which becomes a chicken and egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more about running acc-sm on other big systems? Probably not much.) But it certainly wasn't tested with my application, because I didn't test it. Because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.
So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not in the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.
What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.
Pipe dream?
* [Lustre-devel] [cdwg] broader Lustre testing
2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
@ 2012-07-12 20:25 ` Andreas Dilger
2012-07-12 20:57 ` Christopher J. Morrone
2012-07-20 14:13 ` James A Simmons
2 siblings, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2012-07-12 20:25 UTC
To: lustre-devel
On 2012-07-12, at 1:37 PM, Nathan Rutman wrote:
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. Even if a Community member has no
>> interest in using a feature release in production, early testing with
>> pre-release versions of feature releases will help identify
>> instabilities created by the new feature with their workloads and
>> hardware before the release is official.
>
>
> Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
>
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period.
Interesting... I _don't_ run the latest version of MacOS, and I distinctly recall people having a variety of issues with 10.7.0 when it was released. Does that mean the MacOS testing was insufficient? Partly, but it is unrealistic to test every possible usage pattern, so testing has to be "optimized" to cover the most common use cases in order to be finished within both time and cost constraints.
> Why can't Lustre do the same thing? The answer I think lies in testing, which becomes a chicken and egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more about running acc-sm on other big systems? Probably not much.)
Right. I don't think that acc-sm is the end-all in testing frameworks, and I freely admit that there is a lot more testing that could be done, both in scale and in the types of loads that are used. The acceptance-small.sh script is intended to be an "optimized" test set that can run in a few hours to give some reasonable confidence in a particular change.
> But it certainly wasn't tested with my application, because I didn't test it. Because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.
There are all kinds of other load/stress tests (including applications) that can/should be run after the "basic" tests have been run to find new defects. When those defects are found they should be distilled down to a simple and specific test that gets added to the regular regression suite. I think it is this kind of testing that is needed moving forward.
> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.
I would caveat this to say - only test on tags which we know to be at least reasonably stable, since a lot of testing time will be wasted otherwise.
> To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day". Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not in the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.
I'm not sure that users will be willing to do this, though some "friendly" users are known to make the leap onto new systems in order to get early/free CPU cycles on new clusters.
There are also "feature tests" that need to be run at scale to validate new features: to ensure they are functional at scale, that they don't impact performance, and to expose the kinds of race conditions that only scale testing provokes.
> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.
>
> Pipe dream?
I hope not. When I see users taking a specific release of Lustre, testing it, and then applying a patch series to their branch, the unfortunate result is more effort for the user (vendor/site, not end users) to maintain their patches, and more effort for support to determine if some _other_ bug is already fixed, or to debug a problem that appears only with a specific combination of patches applied, and then craft a different fix for that branch than the mainline.
A better use case would be for users to start testing _before_ a major release is made, find/fix bugs, and merge the fixes into mainline, so when it is released in a maintenance release it will already be quite stable. This keeps the user patchset much smaller, and everyone will benefit from fixes from other testing before the release, and hopefully find fewer bugs in the field. It also avoids the issue of each user testing some cross-product of patches, and not really leveraging each other's testing. Then, any bugs found in the field go into the maintenance branch and master, but there is much less of a need to "test" the maintenance branch, since the changes there should be relatively small.
I think this is a reasonable approach, given that we no longer land features on maintenance branches. That means the risk of following maintenance releases is much smaller than it was in the 1.6 and 1.8 days (1.8.x only really entered "maintenance" mode with 1.8.6 or so).
We've been trying to follow this model with LLNL. One issue is that 2.1.0 didn't really receive as much up-front testing as it could have, so it is getting more fixes than it should. We are working hard to land all of the LLNL (and other) bugfix patches into master and the next 2.1.x release.
There is a parallel effort to test orion (2.4 development branch) so that by the time 2.4 rolls around (including features that are not in master or orion yet) it will be relatively stable and does not need its own "test effort".
Are we at this nirvana yet? Not quite, but I think we are closer than ever before, and we have the chance to get there with a coordinated effort of the community.
Cheers, Andreas
--
Andreas Dilger Whamcloud, Inc.
Principal Lustre Engineer http://www.whamcloud.com/
* [Lustre-devel] [cdwg] broader Lustre testing
2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
2012-07-12 20:25 ` [Lustre-devel] [cdwg] " Andreas Dilger
@ 2012-07-12 20:57 ` Christopher J. Morrone
2012-07-16 18:31 ` Roman Grigoryev
2012-07-20 14:13 ` James A Simmons
2 siblings, 1 reply; 6+ messages in thread
From: Christopher J. Morrone @ 2012-07-12 20:57 UTC
To: lustre-devel
On 07/12/2012 12:37 PM, Nathan Rutman wrote:
>
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>
> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released. Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official.
>
>
> Taking a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing?
Because we're an open source project where all of our dirty laundry is
in the public. I'm sure that Apple has all kinds of internal deadlines
and testing tags and things that we don't see on the outside world
because it is a closed-source proprietary product with vast resources to
develop and test internally.
The every-six-month cadence is a good thing in my opinion. It forces us
developers to regularly address the stability of the changes we are
introducing. It provides a clear, explicit time in the schedule for
developers to stop writing new bugs, and focus their effort on fixing bugs.
I believe that the maintenance branch _is_ the place that you go when
the question is "which version should I use"? We just need to have a
decent web page that says "Want Lustre? Here's the latest stable
release!" We need to increase exposure of the maintenance releases, and
hide the "feature" releases off on a developers' page.
> The answer I think lies in testing, which becomes a chicken and egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't be released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more about running acc-sm on other big systems? Probably not much.) But it certainly wasn't tested with my application, because I didn't test it. Because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.
> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day". Scientists, run your codes, users, do your normal work, but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not in the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When released, you have confidence that you can move up, get the great new features and performance, and it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.
We can do a great deal more testing, and find a seriously large amount
of bugs that we have been missing by getting more testing personnel
allocated to Lustre. I think that's the major gap in Lustre right now.
One day every two months is, I think, insufficient for validating any
software product, let alone something as complex as Lustre. Not that I
am opposed to the idea. If you can arrange that, go for it! But that
isn't good enough by itself by a long shot.
We need full-time personnel working on testing Lustre. I would think
that all of the vendors out there selling products to customers would
already have a lot of experience testing hardware, and other software
bits. Let's apply some of that know-how to Lustre!
And I think these testing personnel need to be made known to the
community, so they can talk to each other, so that developers can guide
their efforts, so we know what our testing coverage looks like, etc.
Testing needs to be a CONTINUAL process, not just something we do at the
end for a specific release number. By the time we tag 2.4, it should
already have been tested so frequently all along the master development
cycle that the final testing will start to look like a formality to us.
We should still do it, of course, but we should have confidence long
before that happens.
LLNL is trying to do that with the master branch as it moves to 2.4.
Our coverage is mainly on zfs backends for now, but as the rest of orion
lands on master, and Sequoia goes into limited production use we'll have
both zfs and ldiskfs filesystems in our testbed, and test regularly all
the way up to, and beyond, 2.4.
The gaps in testing are NOT all an issue of insufficient scale testing,
although there is admittedly a constant issue there. We need much
better testing at small scale as well.
And let me be really clear: when I say testing, I mean a real human
being thinking up new tests all of the time. Looking at logs all of the
time (so even when the test app succeeded, we'll catch the timeouts and
reconnections and things that should not be happening, and are symptoms
of bugs). Powering things off randomly. Literally pulling cables out
while an evil, pathologically bad IO workload is running.
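The log review described above can be partly assisted by a script that flags suspicious lines even when the test application itself reports success; it does not replace a person reading the logs. What follows is only a minimal sketch, assuming plain-text logs. The pattern strings, the `flag_lines` helper, and the sample log lines are hypothetical placeholders for illustration, not actual Lustre messages or tooling.

```python
import re

# Hypothetical patterns; substitute the messages your own logs actually contain.
SUSPICIOUS = [re.compile(p, re.IGNORECASE) for p in (r"timed? ?out", r"reconnect")]

def flag_lines(lines):
    """Return (line_number, text) for each line matching a suspicious pattern."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(pattern.search(text) for pattern in SUSPICIOUS):
            hits.append((number, text.rstrip("\n")))
    return hits

# Made-up sample log: the job exits successfully, yet two lines deserve a look.
sample_log = [
    "job 17 started",
    "request timed out, resending",
    "reconnecting to server",
    "job 17 finished: exit status 0",
]

flagged = flag_lines(sample_log)
for number, text in flagged:
    print(f"line {number}: {text}")
```

A tester would still need to judge whether each flagged line is a symptom of a bug; the script only narrows down where to look.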
We need real people to test all of the things that it is really easy for
a human to do, and would take years for developers to automate with any
reliability.
The automated regression suite that we use is great. We should continue
to improve that over time. But I would contend that it is not, and
never will be, sufficient to tell us if Lustre is stable.
I would argue that the regression tests are, in fact, a very low bar.
And Lustre is just too complicated, networks are too complicated, we
have too few developers, to ever come up with an automated suite with
anything but a relatively low confidence level in the stability of the
software.
And human testers are given a very different set of goals than
developers. A developer's job is to make things work. A tester's is to
do whatever they can to break it. And then create a good report of how
they broke it so the developers can fix it.
I also agree that I don't want to continue in this mode of "we'll only
run it when LLNL/ORNL runs it and says it's good". So we need more human
testers.
And to get back to the topic of making every single release a "stable"
release: That ignores the fact that we have roughly a decade of
seriously buggy, undocumented code that we're dealing with. It just
will not happen. Period. We have to accept that and move forward.
We can strive from this point on to make every release better than the
last. But developers are human. Every time we add new features, we're
going to add new bugs. We'll also fix bugs. But we're going to add new
ones as well.
So we deal with that by having "maintenance" releases. The maintenance
release is maintained for a "long" period of time, but adds NO new
features. No new support for new kernels. No fantastic new performance
improvements. Just bug fixes.
The maintenance release is what vendors should build products upon,
because that is where we'll land only bug fixes. So it is far more
likely to only improve with time, whereas "master" (and therefore the
"feature" releases which are just tags on master every 6 months), will
also introduce destabilizing new features.
We'll endeavour to make the new features as stable as we are capable of
doing, and we can do better if we have more testers, but we have to be
pragmatic.
"Every tag should be completely stable" is impossible. "Every tag on
the maintenance branch should be more stable than the last" is an
achievable goal.
Chris
* [Lustre-devel] [cdwg] broader Lustre testing
2012-07-12 20:57 ` Christopher J. Morrone
@ 2012-07-16 18:31 ` Roman Grigoryev
0 siblings, 0 replies; 6+ messages in thread
From: Roman Grigoryev @ 2012-07-16 18:31 UTC
To: lustre-devel
Hi Christopher,
.....
>
> The automated regression suite that we use is great. We should continue
> to improve that over time. But I would contend that it is not, and
> never will be, sufficient to tell us if Lustre is stable.
>
> I would argue that the regression tests are, in fact, a very low bar.
> And Lustre is just too complicated, networks are too complicated, we
> have too few developers, to ever come up with an automated suite with
> anything but a relatively low confidence level in the stability of the
> software.
>
> And human testers are given a very different set of goals than
> developers. A developer's job is to make things work. A tester's is to
> do whatever they can to break it. And then create a good report of how
> they broke it so the developers can fix it.
>
.............
As evidence for your statement that executing the automated regression
suite (acc-small) alone is not enough to assess quality, I would like to
share the coverage summary we obtained:
958 tests were executed
            Hit     Total    Coverage
Lines:      79691   128935   61.8 %
Functions:  6206    7935     78.2 %
Branches:   49287   113914   43.3 %
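The percentages in the summary can be re-derived from the hit and total counts. The sketch below does that arithmetic and also prints how many lines, functions, and branches were never exercised. The counts are copied from the table above; the `percent` and `report` helper names are illustrative and are not part of any coverage tool.

```python
# Hit and total counts copied from the coverage summary above.
coverage = {
    "Lines": (79691, 128935),
    "Functions": (6206, 7935),
    "Branches": (49287, 113914),
}

def percent(hit, total):
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * hit / total, 1)

def report(data):
    """Map each metric name to its coverage percentage."""
    return {name: percent(hit, total) for name, (hit, total) in data.items()}

for name, pct in report(coverage).items():
    hit, total = coverage[name]
    print(f"{name:<10} {hit:>6} / {total:>6}  {pct:5.1f} %  ({total - hit} not hit)")
```

Read this way, more than half of the branches (64627 of 113914) are never taken by the 958 tests.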
Thanks,
Roman
* [Lustre-devel] [cdwg] broader Lustre testing
2012-07-12 19:37 ` [Lustre-devel] broader Lustre testing Nathan Rutman
2012-07-12 20:25 ` [Lustre-devel] [cdwg] " Andreas Dilger
2012-07-12 20:57 ` Christopher J. Morrone
@ 2012-07-20 14:13 ` James A Simmons
2012-07-20 18:20 ` Nathan Rutman
2 siblings, 1 reply; 6+ messages in thread
From: James A Simmons @ 2012-07-20 14:13 UTC
To: lustre-devel
On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
>
> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>
> > A more strategic solution is to do more testing of a feature release
> > candidate _before_ it is released. Even if a Community member has
> > no
> > interest in using a feature release in production, early testing
> > with
> > pre-release versions of feature releases will help identify
> > instabilities created by the new feature with their workloads and
> > hardware before the release is official.
...
> So, it seems, we need to test pre-release versions of Lustre, aka
> Master, with my applications. To that end, how willing are people to
> set aside a day, say once every two months, to be "filesystem beta
> day". Scientists, run your codes, users, do your normal work, but
> bear in mind there may be filesystem instabilities on that day. Make
> sure your data is backed up. Make sure it's not in the middle of a
> critical week-long run. Accept that you might have to re-run it
> tomorrow in the worst case. Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the
> question of "which version should I run". When released, you have
> confidence that you can move up, get the great new features and
> performance, and it runs your applications. More people are on the
> same release, so it sees even more testing. The maintenance branch is
> always the latest branch, you can pull in point releases with more bug
> fixes with ease. No more rolling your own Lustre with Frankenstein
> sets of patches. Latest and greatest and most stable.
>
>
> Pipe dream?
Since people are now moving to help test out the current master branch
for Whamcloud, I'd like to propose posting a general summary of the testing
results people are seeing. I personally finished a first run at
testing 2.2.91 this last week and would gladly share the results. Anyone
else care to share? :-)
* [Lustre-devel] [cdwg] broader Lustre testing
2012-07-20 14:13 ` James A Simmons
@ 2012-07-20 18:20 ` Nathan Rutman
0 siblings, 0 replies; 6+ messages in thread
From: Nathan Rutman @ 2012-07-20 18:20 UTC
To: lustre-devel
On Jul 20, 2012, at 7:13 AM, James A Simmons wrote:
> On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
>>
>> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>>
>>> A more strategic solution is to do more testing of a feature release
>>> candidate _before_ it is released. Even if a Community member has
>>> no
>>> interest in using a feature release in production, early testing
>>> with
>>> pre-release versions of feature releases will help identify
>>> instabilities created by the new feature with their workloads and
>>> hardware before the release is official.
>
> ...
>> So, it seems, we need to test pre-release versions of Lustre, aka
>> Master, with my applications. To that end, how willing are people to
>> set aside a day, say once every two months, to be "filesystem beta
>> day". Scientists, run your codes, users, do your normal work, but
>> bear in mind there may be filesystem instabilities on that day. Make
>> sure your data is backed up. Make sure it's not in the middle of a
>> critical week-long run. Accept that you might have to re-run it
>> tomorrow in the worst case. Report any problems you have.
>> What you get out of it is a much more stable Master, and an end to the
>> question of "which version should I run". When released, you have
>> confidence that you can move up, get the great new features and
>> performance, and it runs your applications. More people are on the
>> same release, so it sees even more testing. The maintenance branch is
>> always the latest branch, you can pull in point releases with more bug
>> fixes with ease. No more rolling your own Lustre with Frankenstein
>> sets of patches. Latest and greatest and most stable.
>>
>>
>> Pipe dream?
>
> Since people are now moving to help test out the current master branch
> for Whamcloud, I'd like to propose posting a general summary of the testing
> results people are seeing. I personally finished a first run at
> testing 2.2.91 this last week and would gladly share the results. Anyone
> else care to share? :-)
>
>
I started a page on the OpenSFS Wiki for everyone to share their test results in a free-form format. Note that the Wiki itself is still in its infancy - I call on the community to help populate it.