public inbox for linux-xfs@vger.kernel.org
* [RFC] xfstests: define an INTENSITY level for testing
@ 2010-01-21 21:36 Alex Elder
  2010-01-23 11:59 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Elder @ 2010-01-21 21:36 UTC (permalink / raw)
  To: xfs

I've often felt it would be nice if testing could be
done to a specified level of intensity.  That way,
for example, I could perform a full suite of tests
but have them just do very basic stuff, so that I
get coverage but without having to wait as long as
is required for a hard-core test.  Similarly, before
a release I'd like to run tests exhaustively, to
make sure things get really exercised.

Right now there is a "quick" group defined for xfstests,
but what I'm talking about is more of a parameter applied
to all tests so that certain functions could be lightly
tested that might not otherwise be covered by one of the
"quick" ones.  We might even be able to get rid of the
"quick" group.  And an inherently long-running test
might make itself not run if the intensity level was
not high enough.

So I propose we define a global, set in common.rc, as an
integer 0 < INTENSITY <= 100 that defines how hard each
test should push.  An INTENSITY of 100 would cause all
tests to do their most exhaustive and/or demanding
exercises; an INTENSITY of 1 would do very superficial
testing.  The default might be 50.

Tests can simply ignore the INTENSITY value, and initially
that will be the case for most tests.  It may not even make
sense for a given test to have its activity scaled by this
setting.  Once we define it though, tests can be adapted
to make use of it where possible.
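To sketch what that adaptation might look like, a shared helper
(hypothetical; not part of the patch below) could normalize the
scaling so that the default INTENSITY of 50 preserves a test's
current workload:

```shell
# Hypothetical helper sketch: scale a test's baseline workload by
# INTENSITY so the default of 50 leaves today's numbers unchanged.
_scale_by_intensity()
{
	# $1 is the value the test uses at the default INTENSITY of 50
	expr "$1" \* "${INTENSITY:-50}" / 50
}

nops=$(_scale_by_intensity 1000)	# 1000 at default, 2000 at INTENSITY=100
```

Tests that don't care about the setting would simply never call
the helper.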

Below is a patch that shows how such a feature might be
used for tests 104 and 109.

					-Alex

---
 104    |    2 +-
 109    |    2 +-
 common |   37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 37 insertions(+), 4 deletions(-)

Index: b/104
===================================================================
--- a/104
+++ b/104
@@ -58,7 +58,7 @@ _fill_scratch()
 _stress_scratch()
 {
 	procs=3
-	nops=1000
+	nops=$(expr 20 \* "${INTENSITY}")	# Default 1000
 	# -w ensures that the only ops are ones which cause write I/O
 	$FSSTRESS_PROG -d $SCRATCH_MNT -w -p $procs -n $nops $FSSTRESS_AVOID > /dev/null &
 }
Index: b/109
===================================================================
--- a/109
+++ b/109
@@ -96,7 +96,7 @@ umount $SCRATCH_DEV 2>/dev/null
 _scratch_mount
 
 # see if faststart is possible (and requested)
-files=2000
+files=$(expr 40 \* "${INTENSITY}")	# Default 2000
 faststart=""
 if [ -n "$FASTSTART" -a -f $SCRATCH_MNT/f0 ]; then
 	faststart="-N"	# causes us to skip the mkfs step
Index: b/common
===================================================================
--- a/common
+++ b/common
@@ -42,6 +42,8 @@ sortme=false
 expunge=true
 have_test_arg=false
 randomize=false
+iflag=false
+export INTENSITY=50
 rm -f $tmp.list $tmp.tmp $tmp.sed
 
 # Autodetect fs type based on what's on $TEST_DEV
@@ -54,8 +56,26 @@ fi
 
 for r
 do
-
-    if $group
+    if $iflag
+    then
+	# make sure next arg is a number
+    	if [ $(expr match "$r" '[0-9]\+' = length "$r") = 1 ]
+	then
+	    # and it is in range
+	    if [ "$r" -gt 0 -a "$r" -le 100 ]
+	    then
+	    	INTENSITY=$r
+	    else
+		echo "Intensity \"$r\" is out of range (must be 1-100)"
+		exit 1
+	    fi
+	else
+	    echo "Intensity \"$r\" is not a (valid) number"
+	    exit 1
+	fi
+	iflag=false
+	continue
+    elif $group
     then
 	# arg after -g
 	group_list=`sed -n <group -e 's/$/ /' -e "/^[0-9][0-9][0-9].* $r /"'{
@@ -132,6 +152,7 @@ check options
     -q			quick [deprecated]
     -T			output timestamps
     -r 			randomize test order
+    -i <intensity>	set intensity level (1-100, default 50)
     
 testlist options
     -g group[,group...]	include tests from these groups
@@ -162,6 +183,12 @@ testlist options
 	    xpand=false
 	    ;;
 
+	-i)	# -i intensity ... set testing intensity
+	    shift
+	    iflag=true
+	    xpand=false
+	    ;;
+
 	-l)	# line mode for diff, was default before
 	    diff="diff"
 	    xpand=false
@@ -268,6 +295,12 @@ BEGIN	{ for (t='$start'; t<='$end'; t++)
 
 done
 
+if $iflag
+then
+    echo "No intensity value specified with -i"
+    exit 1
+fi
+
 if [ -s $tmp.list ]
 then
     # found some valid test numbers ... this is good

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [RFC] xfstests: define an INTENSITY level for testing
  2010-01-21 21:36 [RFC] xfstests: define an INTENSITY level for testing Alex Elder
@ 2010-01-23 11:59 ` Dave Chinner
  2010-01-26 21:39   ` Alex Elder
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2010-01-23 11:59 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Jan 21, 2010 at 03:36:07PM -0600, Alex Elder wrote:
> I've often felt it would be nice if testing could be
> done to a specified level of intensity.  That way,
> for example, I could perform a full suite of tests
> but have them just do very basic stuff, so that I
> get coverage but without having to wait as long as
> is required for a hard-core test.  Similarly, before
> a release I'd like to run tests exhaustively, to
> make sure things get really exercised.

At first glance, this sounds like a good idea to control
the runtime of test runs.

However, after thinking about it for a while and reflecting on
the approach the QA group in ASG (long live ASG!) took for release
testing, I have a few concerns about using the concept in xfstests.

> Right now there is a "quick" group defined for xfstests,
> but what I'm talking about is more of a parameter applied
> to all tests so that certain functions could be lightly
> tested that might not otherwise be covered by one of the
> "quick" ones.  We might even be able to get rid of the
> "quick" group.  And an inherently long-running test
> might make itself not run if the intensity level was
> not high enough.

IIRC we introduced the "quick" group as a way to provide developers
sufficient coverage to flush out major bugs in patches quickly, not
provide complete test coverage. i.e. to speed up the development
process, not speed up or improve the QA process. Patches still need
to pass the "auto" group tests without regressions before being
posted for review....

> So I propose we define a global, set in common.rc, as an
> integer 0 < INTENSITY <= 100 that defines how hard each
> test should push.  An INTENSITY of 100 would cause all
> tests to do their most exhaustive and/or demanding
> exercises; an INTENSITY of 1 would do very superficial
> testing.  The default might be 50.

How would you solve the problem that "intensity" is very dependent
on the system the tests are being run on? e.g. Something run on an
SSD is going to run far faster than the same test on a UML instance
on a slow laptop disk, even though they run at the same "intensity"
level.

Another concern I have is that "intensity" might have different causes
on different systems.  e.g. on UML, it is forking new processes that
causes the massive slowdowns (300ms for a fork+exec on a 2GHz
Athlon64), not the amount of IO. Hence changing the number of files
or IOPS won't really change the runtime of tests significantly if
the problem is that the test runs "expr" 100,000 times, e.g.:

http://git.kernel.org/?p=fs/xfs/xfstests-dev.git;a=commit;h=e714acc0ef37031b9a5a522703f2832f139c22e0

> Tests can simply ignore the INTENSITY value, and initially
> that will be the case for most tests.  It may not even make
> sense for a given test to have its activity scaled by this
> setting.  Once we define it though, tests can be adapted
> to make use of it where possible.
> 
> Below is a patch that shows how such a feature might be
> used for tests 104 and 109.

/me looks at the changes

I think this is the wrong fix for decreasing test 104's runtime.
The fsstress processes only need to run while the grows are in
progress; once they are complete, the fsstress processes
can be killed rather than waited for. Using kill then wait
would reduce the runtime without potentially compromising the
test - if the number of ops is too low then fsstress doesn't run
long enough to effectively load up the filesystem during the grow
process to trigger the deadlock conditions.
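The kill-then-wait approach can be sketched with a stand-in
background workload (a sleep loop here; in test 104 it would be the
$FSSTRESS_PROG invocation from _stress_scratch):

```shell
# Stand-in for the fsstress load; test 104 would run $FSSTRESS_PROG here.
workload() { while :; do sleep 1; done; }

workload &
pid=$!

# ... the xfs_growfs loop would run to completion here ...

kill "$pid" 2>/dev/null			# stop the load once the grows are done
wait "$pid" 2>/dev/null || true		# reap it; exit status reflects the signal
```

The load then runs exactly as long as the grows do, independent of
how many ops were requested.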

For 109 I think changing the number of files compromises the initial
conditions required to trigger the deadlock on kernels <= 2.6.18.
It's an ENOSPC test on a 160MB filesystem, and the number of files it
uses is for fragmenting free space sufficiently to trigger
out-of-order AG locking when ENOSPC in an AG occurs. Changing the
number of files results in different freespace fragmentation
patterns and hence may not trigger the deadlock condition....

----

Stepping back and looking at this from an overall QA coverage point
of view, it seems to me that you are trying to make xfstests be
something that it is not intended to be. You want "exhaustive" test
coverage before a release, but xfstests has never been a vehicle
for exhaustive testing. That is, xfstests is really designed to
provide maximal code coverage with some load and stress tests
thrown in, but it is not intended to be the only testing mechanism
for the filesystem.

It might be instructive to go back and look at what the old SGI ASG
(long live ASG!) test group were doing (I hope it was archived!).
They were running xfstests on multiple platforms (x86_64, PPC and
ia64) for code coverage but not stress. To improve coverage, every
second xfstest run used a different set of non-default mkfs and
mount options to exercise different code paths (e.g. blocksize <
pagesize, directory block size > page size, etc) which otherwise
would not be tested.

There were separate test plans, procedures, processes and scripts to
execute long running stress and load tests. These were run as part
of the QA validation prior to major releases (the angle you appear
to be coming from, Alex) rather than day-to-day testing of the
current dev kernels.

More importantly, the load/stress tests weren't aimed at specific
XFS features (already handled by xfstests) - instead they were high
level tests aimed at trying to break the system.  e.g. one of the
stress tests was running tens of local processes creating and
destroying large and small files simultaneously with NFS clients
doing the same thing on the same filesystem whilst turning quotas on
and off randomly and running concurrent filesystem snapshots and
then mounting and running filesystem checks on the snapshots to
ensure they were consistent.  These tests would run for up to a week
at a time, so it takes dedicated resources to run this sort of
testing.

For load point tests, similar tests were run but the number of
processes creating load were varied over time so that the system
load varied between almost idle to almost 100% to ensure that
there weren't problems that light or medium loads exposed. Once
again these were long running tests on multiple platforms.

----

In my experience, exhaustive testing requires a combination of
testing from low level point tests (xfstests) all the way up to high
level system level integration tests. The methods and test processes
for these are different, as the focus of each test is different.

Hence I agree with your intent and reasoning behind intensity level
based stress testing, but I think that xfstests is not the right
sort of test suite to use for this type of testing. I think we'd
do better to try to recover some of the high level stress tests
and processes from the corpse of ASG than to try to use xfstests
for this....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: [RFC] xfstests: define an INTENSITY level for testing
  2010-01-23 11:59 ` Dave Chinner
@ 2010-01-26 21:39   ` Alex Elder
  0 siblings, 0 replies; 3+ messages in thread
From: Alex Elder @ 2010-01-26 21:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave Chinner wrote:
> On Thu, Jan 21, 2010 at 03:36:07PM -0600, Alex Elder wrote:
>> I've often felt it would be nice if testing could be
>> done to a specified level of intensity.  That way,
>> for example, I could perform a full suite of tests
>> but have them just do very basic stuff, so that I
>> get coverage but without having to wait as long as
>> is required for a hard-core test.  Similarly, before
>> a release I'd like to run tests exhaustively, to
>> make sure things get really exercised.
> 
> At first glance, this sounds like a good idea to control
> the runtime of test runs.

I know this is all a bit long, but I think it's a good
discussion.  I have some responses below.  I think you
and I are in pretty close agreement on the role of
tests, and maybe philosophy toward testing code...

> However, after thinking about it for a while and reflecting on
> the approach the QA group in ASG (long live ASG!) took for release
> testing, I have a few concerns about using the concept in xfstests.
> 
>> Right now there is a "quick" group defined for xfstests,
>> but what I'm talking about is more of a parameter applied
>> to all tests so that certain functions could be lightly
>> tested that might not otherwise be covered by one of the
>> "quick" ones.  We might even be able to get rid of the
>> "quick" group.  And an inherently long-running test
>> might make itself not run if the intensity level was
>> not high enough.
> 
> IIRC we introduced the "quick" group as a way to provide developers
> sufficient coverage to flush out major bugs in patches quickly, not
> provide complete test coverage. i.e. to speed up the development
> process, not speed up or improve the QA process. Patches still need
> to pass the "auto" group tests without regressions before being
> posted for review....
> 
>> So I propose we define a global, set in common.rc, as an
>> integer 0 < INTENSITY <= 100 that defines how hard each
>> test should push.  An INTENSITY of 100 would cause all
>> tests to do their most exhaustive and/or demanding
>> exercises; an INTENSITY of 1 would do very superficial
>> testing.  The default might be 50.
> 
> How would you solve the problem that "intensity" is very dependent
> on the system the tests are being run on? e.g. Something run on an
> SSD is going to run far faster than the same test on a UML instance
> on a slow laptop disk, even though they run at the same "intensity"
> level.

There is an unlimited number of ways one could define "stress" or
"intensity" of a test--each particular test will be doing something
very specific.  No single parameter (like "intensity") could possibly
capture all of them.

That being said, my purpose is to define a single knob with
an approximate definition, to be interpreted as appropriate for
each test.  You're right, in some setups (e.g., under UML) the
intensity setting might have undesirable results--and the person
who decides to make a test key on the intensity setting should
take that into account.  (By the same argument the value/meaning/
result of a given test--intense or not--may change depending on
the setup, so we're already faced with that issue.)

> Another concern I have is that "intensity" might have different causes
> on different systems.  e.g. on UML, it is forking new processes that
> causes the massive slowdowns (300ms for a fork+exec on a 2GHz
> Athlon64), not the amount of IO. Hence changing the number of files
> or IOPS won't really change the runtime of tests significantly if
> the problem is that the test runs "expr" 100,000 times, e.g.:
> 
> http://git.kernel.org/?p=fs/xfs/xfstests-dev.git;a=commit;h=e714acc0ef37031b9a5a522703f2832f139c22e0

The meaning of "intensity" should be defined by the focus
of what is being tested, not how long that takes.  I.e., if
you're trying to test lots of concurrent I/O, then higher
intensity should mean doing *more* concurrent I/O, regardless
of the particular system under test.  Adjusting run time is
admittedly one of the goals, but it shouldn't really be taken
as the meaning for this setting.  (An example below elaborates
on this a bit more.)

If you want to limit runtime, then some other knob might be
defined for that (but I don't advocate that).
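To make that concrete: if concurrency is the property a test
exercises, a (hypothetical) conversion would scale the worker count
rather than op counts or runtime:

```shell
# Hypothetical sketch: INTENSITY scales the amount of concurrent I/O
# (the thing under test), not the wall-clock runtime.
some_io_worker() { :; }	# stand-in; a real test would run dd/fsstress here

procs=$(expr \( "${INTENSITY:-50}" + 9 \) / 10)	# 1 worker per 10 points
for i in $(seq 1 "$procs"); do
	some_io_worker &
done
wait
```

On a slow machine each worker takes longer, but the degree of
concurrency being tested is the same at a given INTENSITY.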

>> Tests can simply ignore the INTENSITY value, and initially
>> that will be the case for most tests.  It may not even make
>> sense for a given test to have its activity scaled by this
>> setting.  Once we define it though, tests can be adapted
>> to make use of it where possible.
>> 
>> Below is a patch that shows how such a feature might be
>> used for tests 104 and 109.
> 
> /me looks at the changes

I didn't really make this clear in my initial post, but I
sort of contrived those two examples for the purpose of
demonstration.  I looked for relatively long-running
tests, and found a simple thing that could be tweaked.
I make no claim that the examples were the correct way
to use this concept.

> I think this is the wrong fix for decreasing test 104's runtime.
> The fsstress processes only need to run while the grows are in
> progress; once they are complete, the fsstress processes
> can be killed rather than waited for. Using kill then wait
> would reduce the runtime without potentially compromising the
> test - if the number of ops is too low then fsstress doesn't run
> long enough to effectively load up the filesystem during the grow
> process to trigger the deadlock conditions.

That's likely a better way to do it.  Using an intensity level
may not even make sense in some circumstances.  (And again,
runtime isn't the goal.)

> For 109 I think changing the number of files compromises the initial
> conditions required to trigger the deadlock on kernels <= 2.6.18.
> It's an ENOSPC test on a 160MB filesystem, and the number of files it
> uses is for fragmenting free space sufficiently to trigger
> out-of-order AG locking when ENOSPC in an AG occurs. Changing the
> number of files results in different freespace fragmentation
> patterns and hence may not trigger the deadlock condition....

So this is a bad example, but it does demonstrate the
sort of thing that could be done.  Each test really would
need to be looked at individually, and if used at all,
intensity scaling would be done in a way that makes
sense for the test.

> ----
> 
> Stepping back and looking at this from an overall QA coverage point
> of view, it seems to me that you are trying to make xfstests be
> something that it is not intended to be. You want "exhaustive" test
> coverage before a release, but xfstests has never been a vehicle
> for exhaustive testing. That is, xfstests is really designed to
> provide maximal code coverage with some load and stress tests
> thrown in, but it is not intended to be the only testing mechanism
> for the filesystem.

I totally agree that xfstests is not sufficient for exhaustive
tests.  There needs to be a sort of meta-layer of testing that
covers option combinations, platforms, grouping of things
concurrently, etc. (which you kind of get into, below).

Still, I think this "intensity" (or whatever you might call it)
can be a useful concept--even if it's used "only" as you describe.


> It might be instructive to go back and look at what the old SGI ASG
> (long live ASG!) test group were doing (I hope it was archived!).
> They were running xfstests on multiple platforms (x86_64, PPC and
> ia64) for code coverage but not stress. To improve coverage, every
> second xfstest run used a different set of non-default mkfs and
> mount options to exercise different code paths (e.g. blocksize <
> pagesize, directory block size > page size, etc) which otherwise
> would not be tested.

I'm doing some of this stuff now and am working on expanding it
as I can.  I do not have the hardware or even software setup that
was present in ASG (long live ASG!) but I wish I did, and intend
to keep building on what I have.  We have QA people to run some
things too, and I'm hoping we can focus their efforts more on
pushing limits, doing layered testing (like concurrent tests,
for example) and perhaps much larger systems than I typically
have.

> There were separate test plans, procedures, processes and scripts to
> execute long running stress and load tests. These were run as part
> of the QA validation prior to major releases (the angle you appear
> to be coming from, Alex) rather than day-to-day testing of the
> current dev kernels.

Actually, I'm coming at it more from the other end, but the point
is still there.  I want to ensure that developers can do their
day-to-day testing in a reasonable time.  We will be adding more
and more tests (which is good!) and in time, scaling things *back*
might become more important.  Still, as long as you're writing a
test, it would be good, while you're thinking about it, to lay out
both how one might do a minimal but sufficient test and how one
might really do something extreme.

> More importantly, the load/stress tests weren't aimed at specific
> XFS features (already handled by xfstests) - instead they were high
> level tests aimed at trying to break the system.  e.g. one of the
> stress tests was running tens of local processes creating and
> destroying large and small files simultaneously with NFS clients
> doing the same thing on the same filesystem whilst turning quotas on
> and off randomly and running concurrent filesystem snapshots and
> then mounting and running filesystem checks on the snapshots to
> ensure they were consistent.  These tests would run for up to a week
> at a time, so it takes dedicated resources to run this sort of
> testing.
> 
> For load point tests, similar tests were run but the number of
> processes creating load were varied over time so that the system
> load varied between almost idle to almost 100% to ensure that
> there weren't problems that light or medium loads exposed. Once
> again these were long running tests on multiple platforms.

All of the above are great.

> ----
> 
> In my experience, exhaustive testing requires a combination of
> testing from low level point tests (xfstests) all the way up to high
> level system level integration tests. The methods and test processes
> for these are different, as the focus of each test is different.
> 
> Hence I agree with your intent and reasoning behind intensity level
> based stress testing, but I think that xfstests is not the right
> sort of test suite to use for this type of testing. I think we'd
> do better to try to recover some of the high level stress tests
> and processes from the corpse of ASG than to try to use xfstests
> for this....

To reiterate, I agree that xfstests alone aren't enough.  They're
a tool for doing single tests, one at a time, on a single system
and more or less a single file system.

However, I still think there's *some* value to this concept, even
for xfstests.

Here's another example to consider.  Suppose a change goes into
XFS, and all of a sudden test 306 is failing intermittently.  I
put in a fix, and now test 306 hasn't failed in a long time.  But
to be really sure, I'd like to do whatever test 306 is doing, but
even more so.  In that case, I'd like to be able to run the
EXTREME version of test 306, and if it passes that, I have more
confidence that my fix has eliminated the problem.  I don't really
care if it takes a lot longer to run this test; I'm looking for
a more rigorous form of the test than a normal xfstests run
might do.

Yes, I could go look at the test and figure out myself how to
make it more extreme, but that's missing the point.  If there
were an intensity level supported by that test already, I could
have some assurance that using it would give me just the kind
of hard-core version of the test I'm looking for.

I really appreciate your thoughtful response.  And you should
know that even though I haven't replicated what the ASG test
people were doing (long live ASG!) I'm working toward getting
at least some of what they did back again.

Thanks.

					-Alex


> Cheers,
> 
> Dave.

