Re: performance testing

* Re: performance testing
       [not found]           ` <4680CE9B.1030602@bull.net>
@ 2007-06-26  9:48             ` Valerie Clement
  2007-06-26 10:36               ` Girish Shilamkar
  0 siblings, 1 reply; 11+ messages in thread
From: Valerie Clement @ 2007-06-26  9:48 UTC (permalink / raw)
  To: Valerie Clement, Alex Tomas, Andreas Dilger; +Cc: ext4 development

Valerie Clement wrote:
> Alex Tomas wrote:
>> Jean noel Cordenner wrote:
>>> The last patch queue concerns the  2.6.22-rc4 kernel, so we took the
>>> previous ext4 patch queue including the modifications suggested by 
>>> dmitriy:
>>> http://article.gmane.org/gmane.comp.file-systems.ext4/2291
>>> This solves the oops problem, but after a while the system hangs. We are
>>> still trying to find where the bug is.
>>> When we remove all the patches until booked-page-flag.patch in the
>>> series, the system still hangs. When using another filesystem, or
>>> without any patches, it works.
>>
>> any details? backtraces? dmesg?
> 
> We are trying to get some traces, but with 2.6.22-rc5 and 2.6.22-rc6
> kernels, the serial console isn't working on our systems (x86_64), and
> neither are the magic SysRq keys. Strange...
> 
> When the system hangs, no messages are logged.
> It seems that the hangs only occur with ext4 FS when applying the 
> patches of the current git tree.
> We are trying now to find which patch is faulty.
> 
>    Valérie
> 

It seems that the faulty patch is "ext4-journal_chksum-2.6.20.patch".
Looking at the patch, I think the following change is not correct:

@@ -116,21 +120,36 @@ static int journal_write_commit_record(j

     bh = jh2bh(descriptor);

-   /* AKPM: buglet - add `i' to tmp! */
     for (i = 0; i < bh->b_size; i += 512) {
-       journal_header_t *tmp = (journal_header_t*)bh->b_data;
+       struct commit_header *tmp = (struct commit_header*)bh->b_data +
+                                           i;

Shouldn't it be :

   struct commit_header *tmp = (struct commit_header*)(bh->b_data + i);
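
The difference comes from operator precedence: a cast binds more tightly
than `+`, so in the patched line `i` is counted in units of
sizeof(struct commit_header) instead of bytes. A minimal standalone sketch
of the two expressions follows. `struct demo_header` is a hypothetical
16-byte stand-in, not the real commit_header layout, and the helper names
are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte stand-in; NOT the real struct commit_header layout. */
struct demo_header {
    uint32_t magic;
    uint32_t blocktype;
    uint32_t sequence;
    uint32_t checksum;
};

/* Expression as written in the patch: cast first, then add.
 * The addition is pointer arithmetic on struct demo_header *,
 * so i is scaled by sizeof(struct demo_header). */
static size_t offset_cast_then_add(char *data, size_t i)
{
    struct demo_header *tmp = (struct demo_header *)data + i;
    return (size_t)((char *)tmp - data);
}

/* Proposed fix: add i bytes first, then cast the result. */
static size_t offset_add_then_cast(char *data, size_t i)
{
    struct demo_header *tmp = (struct demo_header *)(data + i);
    return (size_t)((char *)tmp - data);
}
```

With i = 512 and a buffer large enough to hold both results, the first
helper returns 512 * sizeof(struct demo_header) and the second returns 512.
For i = 0 the two agree.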

Valérie

* Re: performance testing
  2007-06-26  9:48             ` performance testing Valerie Clement
@ 2007-06-26 10:36               ` Girish Shilamkar
  2007-06-26 11:03                 ` Valerie Clement
  0 siblings, 1 reply; 11+ messages in thread
From: Girish Shilamkar @ 2007-06-26 10:36 UTC (permalink / raw)
  To: Valerie Clement; +Cc: Alex Tomas, Andreas Dilger, ext4 development

On Tue, 2007-06-26 at 11:48 +0200, Valerie Clement wrote:
> Shouldn't it be :
> 
>    struct commit_header *tmp = (struct commit_header*)(bh->b_data + i);
> 
Ohhh, yes you are right. This is the correct thing to do.
The patch that is being used has some endianness bugs. I sent an
updated patch for 2.6.22-rc5 on 19 June. I think we should be using
the updated patch, otherwise it may break on big-endian machines.
Even the updated patch has this bug, i.e. "struct commit_header *tmp
=...."

Thanks,
Girish.

* Re: performance testing
  2007-06-26 10:36               ` Girish Shilamkar
@ 2007-06-26 11:03                 ` Valerie Clement
  0 siblings, 0 replies; 11+ messages in thread
From: Valerie Clement @ 2007-06-26 11:03 UTC (permalink / raw)
  To: Girish Shilamkar; +Cc: Alex Tomas, Andreas Dilger, ext4 development

Girish Shilamkar wrote:
> On Tue, 2007-06-26 at 11:48 +0200, Valerie Clement wrote:
>> Shouldn't it be :
>>
>>    struct commit_header *tmp = (struct commit_header*)(bh->b_data + i);
>>
> Ohhh, yes you are right. This is the correct thing to do.
> The patch that is being used has some endianness bugs. I sent an
> updated patch for 2.6.22-rc5 on 19 June. I think we should be using
> the updated patch, otherwise it may break on big-endian machines.
> Even the updated patch has this bug, i.e. "struct commit_header *tmp
> =...."
> 
> Thanks,
> Girish.
> 
Hi Girish,
I made the change on my system and tested it. Good news, everything 
seems to work well now. Could you please make and post the patch?
Thanks,
   Valérie

* Performance testing
@ 2014-09-17  8:48 Jan Tulak
  2014-09-17  9:36 ` Dmitry Monakhov
  2014-09-18  0:36 ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Tulak @ 2014-09-17  8:48 UTC (permalink / raw)
  To: fstests; +Cc: Lukas Czerner

Hi,

I have begun to work on a set of performance tests. I think it would
be useful to have a standard set because, as far as I know, there is
very little performance testing, and each of the few tests someone
does is unique. I want to propose my ideas before I really start
writing it, to head off possible complications.

Mixing performance with regression tests wouldn't be a good idea, so I
thought about creating another category on the main level of tests
(something like xfstests/tests/performance). Or would it be better to
put it into an entirely new directory, like xfstests/performance?

From the beginning there would be some basic test cases, like sync/async
read and write. Hopefully more natural cases, like a database server
would be added later. For the IO testing, I want to use FIO for the
specific workflow and eventually iozone for the basic synthetic tests.

What I'm not sure about is how a comparison between different versions could
be done, because I don't see any infrastructure within fstests for
cross-version comparison. (What would it do with regression tests
anyway...) So I wonder if it should be done in this set at all. So the
set would only print the measured values. Some other tool (which can be
also included, but is not directly part of the performance tests set)
could then be used to compare and/or plot graphs.

Comments and questions? :-)

Jan Tulak

* Re: Performance testing
  2014-09-17  8:48 Performance testing Jan Tulak
@ 2014-09-17  9:36 ` Dmitry Monakhov
  2014-09-25 15:08   ` Jan Tulak
  2014-09-18  0:36 ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Dmitry Monakhov @ 2014-09-17  9:36 UTC (permalink / raw)
  To: Jan Tulak, fstests; +Cc: Lukas Czerner

On Wed, 17 Sep 2014 10:48:44 +0200, Jan Tulak <jtulak@redhat.com> wrote:
> Hi,
> 
> I have begun to work on a set of performance tests. I think it would
> be useful to have a standard set because, as far as I know, there is
> very little performance testing, and each of the few tests someone
> does is unique. I want to propose my ideas before I really start
> writing it, to head off possible complications.
> 
> Mixing performance with regression tests wouldn't be a good idea, so I
> thought about creating another category on the main level of tests
> (something like xfstests/tests/performance). Or would it be better to
> put it into an entirely new directory, like xfstests/performance?
> 
> From the beginning there would be some basic test cases, like sync/async
> read and write. Hopefully more natural cases, like a database server
> would be added later. For the IO testing, I want to use FIO for the
> specific workflow and eventually iozone for the basic synthetic tests.
> 
> What I'm not sure about is how a comparison between different versions could
> be done, because I don't see any infrastructure within fstests for
> cross-version comparison. (What would it do with regression tests
> anyway...) So I wonder if it should be done in this set at all. So the
> set would only print the measured values. Some other tool (which can be
> also included, but is not directly part of the performance tests set)
> could then be used to compare and/or plot graphs.
> 
> Comments and questions? :-)
This kind of functionality is already implemented in autotest via perf keyval:
http://autotest.readthedocs.org/en/latest/main/local/Keyval.html
A test case may produce any number of keyvals, which will be
automatically stored in a standard database for later comparison.
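
To illustrate the general idea only: a test could emit each measurement as
a "key=value" line that a harness collects afterwards. The sketch below is
a generic formatter under that assumption. It is not autotest's API or its
on-disk keyval format, and the function name and the three-decimal
precision are arbitrary choices.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format one measurement as a "key=value\n" line into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the line did not fit into len bytes.
 * Hypothetical helper; not part of autotest or xfstests.
 */
static int format_keyval(char *buf, size_t len, const char *key, double value)
{
    int n = snprintf(buf, len, "%s=%.3f\n", key, value);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

A test would call this once per metric and append the line to a results
file; what consumes that file is left to the external tool.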
> 
> Jan Tulak
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Performance testing
  2014-09-17  8:48 Performance testing Jan Tulak
  2014-09-17  9:36 ` Dmitry Monakhov
@ 2014-09-18  0:36 ` Dave Chinner
  2014-09-25 15:03   ` Jan Tulak
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-09-18  0:36 UTC (permalink / raw)
  To: Jan Tulak; +Cc: fstests, Lukas Czerner

On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> Hi,
> 
> I have begun to work on a set of performance tests. I think it would
> be useful to have a standard set because, as far as I know, there is
> very little performance testing, and each of the few tests someone
> does is unique. I want to propose my ideas before I really start
> writing it, to head off possible complications.

Great idea, but I'm missing some context about your ultimate goal
here: how is this performance testing going to be used?

My focus for xfstests is mainly for it to be useful to filesystem
developers who are developing new features and fixing bugs, so my
comments come from the point of view of "will it make my life as a
filesystem developer easier?" rather than a "we need a
performance test suite" perspective.

> Mixing performance with regression tests wouldn't be a good idea, so I
> thought about creating another category on the main level of tests
> (something like xfstests/tests/performance). Or would it be better to
> put it into an entirely new directory, like xfstests/performance?

That depends. What infrastructure do you actually need from the
xfstests harness? How much commonality are you expecting to use
here? If there's no commonality (i.e. it's a completely separate set
of infrastructure that only shares SCRATCH_DEV/SCRATCH_MNT) then I'd
have to question whether xfstests is the right place for this
functionality.

However, if it leverages all the same test template and execution
methods, then having it as just another test subgroup (i.e. in
tests/performance) would be the right way to approach this.

> From the beginning there would be some basic test cases, like sync/async
> read and write. Hopefully more natural cases, like a database server
> would be added later.

IMO, if we do add performance tests to xfstests, then the focus
would definitely need to be on performance regression tests, not
"performance benchmark" (aka benchmarketing) tests. If you want
"performance benchmarks" then openbenchmarking.org is probably a
better place to start as that is what it is designed for and
already has everything you've mentioned.

So I'll focus on performance regression testing. Performance
regression testing involves a lot more than just "run benchmark,
save and compare results". It's once the "compare results" phase
says "regression found" that the functionality of the test really
matters to the filesystem developer. i.e. the tests need to be useful
for *analysis of the regression*.

Hence things like "database server benchmark" don't really belong in
a performance regression test suite because they can't be used to
isolate regressions. Further, they tend to be susceptible to changes
in performance being caused by changes outside filesystem and
storage layers. Hence they lead to wild goose chases more often than
they point to a real filesystem or IO regression.

> For the IO testing, I want to use FIO for the
> specific workflow and eventually iozone for the basic synthetic tests.

I think that the initial focus for performance regression tests
would need to be more on simple micro-benchmarks (e.g. read, write,
create, remove, etc).  I'd much prefer to see simple, targeted
benchmarks that are easily understood just by looking at the
xfstests code. e.g.  a patchset made unlink go fast, but slowed down
file create. Or that we sped up single threaded creates, but
destroyed multithreaded create scalability. Or that we sped up small
directories at the expense of large directories.  These things
can all be measured individually (and quickly) and because they
tend to measure a single aspect of filesystem performance they
can be used directly for regression analysis.
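
As a rough illustration of what such a single-aspect measurement could look
like, here is a minimal C sketch that times file creation and unlink as two
separate phases, so a change that speeds up one at the cost of the other
shows up directly. The function name, the file-name pattern and the choice
of CLOCK_MONOTONIC are assumptions made for this sketch. It is not taken
from xfstests, fio or fsmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_s(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Create nfiles empty files in dir, then unlink them, timing each phase
 * on its own. Returns 0 on success, -1 on the first failing syscall
 * (files created so far are left behind in that case).
 */
static int bench_create_unlink(const char *dir, int nfiles,
                               double *create_s, double *unlink_s)
{
    char path[4096];
    struct timespec t0, t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/perfdemo_%08d", dir, i);
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/perfdemo_%08d", dir, i);
        if (unlink(path) != 0)
            return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    *create_s = elapsed_s(&t0, &t1);
    *unlink_s = elapsed_s(&t1, &t2);
    return 0;
}
```

In a real test the directory would be on the filesystem under test (for
example under SCRATCH_MNT), and the two timings would be reported
separately rather than summed.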

Many of these sorts of tests can be written into the existing
xfstests infrastructure without needing significant external
dependencies - fio and fsmark cover most of the microbenchmarks that
would be necessary. I already have quite a few scripts that I use to
run fsmark tests that could easily be wrapped with xfstests
templates....

As for IOZone, well, I'd suggest you don't bother with IOZone(*)
because we can do far better with bash, dd and fio....

> What I'm not sure about is how a comparison between different versions could
> be done, because I don't see any infrastructure within fstests for
> cross-version comparison. (What would it do with regression tests
> anyway...) So I wonder if it should be done in this set at all. So the
> set would only print the measured values. Some other tool (which can be
> also included, but is not directly part of the performance tests set)
> could then be used to compare and/or plot graphs.

I don't think that storing results long term or comparing results is
something xfstests should directly care about. It is architected to
defer that to some external tool for post processing.  i.e. xfstests
is used to run the tests and generate results, not do long term
storage or analysis of those results.

I see no issues with including scripts to do result processing
across multiple RESULT_DIRs within xfstests itself, but the
infrastructure still has to be architected so it can be externally
controllable and usable by external test infrastructure.

Cheers,

Dave.

(*) IOZone is pretty much useless for performance regression
*detection*, let alone be useful for analysis of regressions.
Run-to-run variation of +/-10% is not uncommon or unexpected - it
has a very low precision.
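
To make the point about precision concrete: a difference between two sets
of runs only means something if it is larger than the run-to-run spread of
the runs themselves. The sketch below is one crude way to express that
check. The struct, the function names and the (max - min) / mean spread
measure are assumptions chosen for illustration, not a method used by
xfstests or IOZone.

```c
#include <stddef.h>

/* Summary of repeated runs of one benchmark (hypothetical type). */
struct run_stats {
    double mean;        /* arithmetic mean of the samples */
    double rel_spread;  /* (max - min) / mean, 0 if mean is 0 */
};

/* Reduce n samples to their mean and relative spread. */
static struct run_stats summarize(const double *v, size_t n)
{
    struct run_stats s = { 0.0, 0.0 };

    if (n == 0)
        return s;

    double sum = 0.0, lo = v[0], hi = v[0];
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    s.mean = sum / (double)n;
    if (s.mean != 0.0)
        s.rel_spread = (hi - lo) / s.mean;
    return s;
}

/*
 * Return 1 if the relative change in the mean is larger than the
 * larger of the two relative spreads, 0 otherwise.
 */
static int exceeds_noise(struct run_stats base, struct run_stats patched)
{
    if (base.mean == 0.0)
        return 0;

    double delta = (patched.mean - base.mean) / base.mean;
    if (delta < 0.0)
        delta = -delta;

    double noise = base.rel_spread > patched.rel_spread
                       ? base.rel_spread : patched.rel_spread;
    return delta > noise;
}
```

With a tool whose runs vary by +/-10%, the spread is about 0.2, so any
change smaller than roughly 20% of the mean would be reported as
indistinguishable from noise by this check.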

It requires extremely stable clocks for its timing to be accurate,
which means you have to be very careful about the hardware you use.
This also rules out testing in VMs as the timing is simply too
variable to be useful for accurate measurement. It also means that
it's very difficult to reproduce the same results across multiple
machines.

Worse is the fact that it is also extremely sensitive to userspace
and kernel CPU cache footprint changes. Hence a change that affects
the CPU cache residency of the IOZone data buffer will have far more
effect on the result than the actual algorithmic change to
filesystem and IO subsystem that led to the CPU cache footprint
change. Hence the same test on two different machines that only
differ by CPU can give very different results - one might say
"faster", the other can say "slower".

It's just not a reliable tool for IO performance measurement, which
is kinda sad because that is its sole purpose in life.....
-- 
Dave Chinner
david@fromorbit.com

* Re: Performance testing
  2014-09-18  0:36 ` Dave Chinner
@ 2014-09-25 15:03   ` Jan Tulak
  2014-09-27  0:47     ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Tulak @ 2014-09-25 15:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, Lukas Czerner

On Thu, 2014-09-18 at 10:36 +1000, Dave Chinner wrote:
> On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> > Hi,
> > 
> > I have begun to work on a set of performance tests. I think it would
> > be useful to have a standard set because, as far as I know, there is
> > very little performance testing, and each of the few tests someone
> > does is unique. I want to propose my ideas before I really start
> > writing it, to head off possible complications.
> 
> Great idea, but I'm missing some context about your ultimate goal
> here: how is this performance testing going to be used?
> 
> My focus for xfstests is mainly for it to be useful to filesystem
> developers who are developing new features and fixing bugs, so my
> comments come from the point of view of "will it make my life as a
> filesystem developer easier?" rather than a "we need a
> performance test suite" perspective.

I think there is a place for two kinds of these tests. One is a quick
suite that should not run for more than a few minutes and can be used
whenever sending a patch (or on a daily basis...) for a quick check, to
catch bad things as soon as possible.

The other set would be deeper and more complex, and used between
versions rather than for single patches. This one could maybe be
independent, but for consistency, I think it is better to keep both sets
together.

> 
> > Mixing performance with regression tests wouldn't be a good idea, so I
> > thought about creating another category on the main level of tests
> > (something like xfstests/tests/performance). Or would it be better to
> > put it into an entirely new directory, like xfstests/performance?
> 
> That depends. What infrastructure do you actually need from the
> xfstests harness? How much commonality are you expecting to use
> here? If there's no commonality (i.e. it's a completely separate set
> of infrastructure that only shares SCRATCH_DEV/SCRATCH_MNT) then I'd
> have to question whether xfstests is the right place for this
> functionality.
> 
> However, if it leverages all the same test template and execution
> methods, then having it as just another test subgroup (i.e. in
> tests/performance) would be the right way to approach this.

Yes, this is my intention: use the test template and so on, so it
would look like any other test, except that it would measure performance.

> > From the beginning there would be some basic test cases, like sync/async
> > read and write. Hopefully more natural cases, like a database server
> > would be added later.
> 
> IMO, if we do add performance tests to xfstests, then the focus
> would definitely need to be on performance regression tests, not
> "performance benchmark" (aka benchmarketing) tests. If you want
> "performance benchmarks" then openbenchmarking.org is probably a
> better place to start as that is what it is designed for and
> already has everything you've mentioned.
> 
> So I'll focus on performance regression testing. Performance
> regression testing involves a lot more than just "run benchmark,
> save and compare results". It's once the "compare results" phase
> says "regression found" that the functionality of the test really
> matters to the filesystem developer. i.e. the tests need to be useful
> for *analysis of the regression*.
> 
> Hence things like "database server benchmark" don't really belong in
> a performance regression test suite because they can't be used to
> isolate regressions. Further, they tend to be susceptible to changes
> in performance being caused by changes outside filesystem and
> storage layers. Hence they lead to wild goose chases more often than
> they point to a real filesystem or IO regression.
> 
This is not needed in the quick suite, but testing only simple
read/write will not find regressions that appear only in more
complicated situations. If the only thing changed is the filesystem,
any big difference in results can be attributed to the filesystem
change. Right?

I do not expect everyone to run this test suite all
day, but it could notify us about regressions between versions of a
filesystem.

> 
> > For the IO testing, I want to use FIO for the
> > specific workflow and eventually iozone for the basic synthetic tests.
> 
> I think that the initial focus for performance regression tests
> would need to be more on simple micro-benchmarks (e.g. read, write,
> create, remove, etc).  I'd much prefer to see simple, targeted
> benchmarks that are easily understood just by looking at the
> xfstests code. e.g.  a patchset made unlink go fast, but slowed down
> file create. Or that we sped up single threaded creates, but
> destroyed multithreaded create scalability. Or that we sped up small
> directories at the expense of large directories.  These things
> can all be measured individually (and quickly) and because they
> tend to measure a single aspect of filesystem performance they
> can be used directly for regression analysis.
> 
> Many of these sorts of tests can be written into the existing
> xfstests infrastructure without needing significant external
> dependencies - fio and fsmark cover most of the microbenchmarks that
> would be necessary. I already have quite a few scripts that I use to
> run fsmark tests that could easily be wrapped with xfstests
> templates....

The initial focus should really be on this, I agree. Creating this
quick and small suite should not take a long time. If you have
something, it could be useful once I create some kind of template
for performance tests.

> 
> As for IOZone, well, I'd suggest you don't bother with IOZone(*)
> because we can do far better with bash, dd and fio....

Thanks for the info. I have done only some brief experiments with IOZone
so far, so these things eluded me.

> 
> > What I'm not sure about is how a comparison between different versions could
> > be done, because I don't see any infrastructure within fstests for
> > cross-version comparison. (What would it do with regression tests
> > anyway...) So I wonder if it should be done in this set at all. So the
> > set would only print the measured values. Some other tool (which can be
> > also included, but is not directly part of the performance tests set)
> > could then be used to compare and/or plot graphs.
> 
> I don't think that storing results long term or comparing results is
> something xfstests should directly care about. It is architected to
> defer that to some external tool for post processing.  i.e. xfstests
> is used to run the tests and generate results, not do long term
> storage or analysis of those results.
> 
> I see no issues with including scripts to do result processing
> across multiple RESULT_DIRs within xfstests itself, but the
> infrastructure still has to be architected so it can be externally
> controllable and usable by external test infrastructure.

I expected something like this, so it shouldn't be a big trouble. What I
see as a good way: first create some small tests. Then, once they
work as intended, I can work on the external tool for managing the
results, rather than creating the tool first. That will also give me
more time to find a good solution. (From what I see, there is already some
work on autotest running xfstests, so maybe it will need just a little
work to add the new tests.)

I hope I answered everything. :-)

> 
> Cheers,
> 
> Dave.
> 
> (*) IOZone is pretty much useless for performance regression
> *detection*, let alone be useful for analysis of regressions.
> Run-to-run variation of +/-10% is not uncommon or unexpected - it
> has a very low precision.
> 
> It requires extremely stable clocks for its timing to be accurate,
> which means you have to be very careful about the hardware you use.
> This also rules out testing in VMs as the timing is simply too
> variable to be useful for accurate measurement. It also means that
> it's very difficult to reproduce the same results across multiple
> machines.
> 
> Worse is the fact that it is also extremely sensitive to userspace
> and kernel CPU cache footprint changes. Hence a change that affects
> the CPU cache residency of the IOZone data buffer will have far more
> effect on the result than the actual algorithmic change to
> filesystem and IO subsystem that led to the CPU cache footprint
> change. Hence the same test on two different machines that only
> differ by CPU can give very different results - one might say
> "faster", the other can say "slower".
> 
> It's just not a reliable tool for IO performance measurement, which
> is kinda sad because that is its sole purpose in life.....


Cheers,
Jan


* Re: Performance testing
  2014-09-17  9:36 ` Dmitry Monakhov
@ 2014-09-25 15:08   ` Jan Tulak
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Tulak @ 2014-09-25 15:08 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: fstests, Lukas Czerner

On Wed, 2014-09-17 at 13:36 +0400, Dmitry Monakhov wrote:
> On Wed, 17 Sep 2014 10:48:44 +0200, Jan Tulak <jtulak@redhat.com> wrote:
> > 
> > Comments and questions? :-)
> This kind of functionality is already implemented in autotest via perf keyval:
> http://autotest.readthedocs.org/en/latest/main/local/Keyval.html
> A test case may produce any number of keyvals, which will be
> automatically stored in a standard database for later comparison.

Thanks for the info. I think Autotest could be used as the upper layer,
for saving and comparing the results. But I will need to look more into
how it would work.

Cheers,
Jan Tulak

> > 
> > Jan Tulak
> > 
> > 



* Re: Performance testing
  2014-09-25 15:03   ` Jan Tulak
@ 2014-09-27  0:47     ` Dave Chinner
  2014-09-29 16:07       ` Jan Tulak
  2014-10-25 17:10       ` Jan Tulak
  0 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2014-09-27  0:47 UTC (permalink / raw)
  To: Jan Tulak; +Cc: fstests, Lukas Czerner

[-- Attachment #1: Type: text/plain, Size: 8024 bytes --]

On Thu, Sep 25, 2014 at 05:03:40PM +0200, Jan Tulak wrote:
> On Thu, 2014-09-18 at 10:36 +1000, Dave Chinner wrote:
> > On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> > > Hi,
> > > 
> > > I have begun to work on a set of performance tests. I think it would
> > > be useful to have a standard set because, as far as I know, there is
> > > very little performance testing, and each of the few tests someone
> > > does is unique. I want to propose my ideas before I really start
> > > writing it, to head off possible complications.
> > 
> > Great idea, but I'm missing some context about your ultimate goal
> > here: how is this performance testing going to be used?
> > 
> > My focus for xfstests is mainly for it to be useful to filesystem
> > developers who are developing new features and fixing bugs, so my
> > comments come from the point of view of "will it make my life as a
> > filesystem developer easier?" rather than a "we need a
> > performance test suite" perspective.
> 
> I think there is a place for two kinds of these tests. One is a quick
> suite that should not run for more than a few minutes and can be used
> whenever sending a patch (or on a daily basis...) for a quick check, to
> catch bad things as soon as possible.
> 
> The other set would be deeper and more complex, and used between
> versions rather than for single patches. This one could maybe be
> independent, but for consistency, I think it is better to keep both sets
> together.

I'm not yet convinced, but keep talking ;)

> > > From the beginning there would be some basic test cases, like sync/async
> > > read and write. Hopefully more natural cases, like a database server
> > > would be added later.
> > 
> > IMO, if we do add performance tests to xfstests, then the focus
> > would definitely need to be on performance regression tests, not
> > "performance benchmark" (aka benchmarketing) tests. If you want
> > "performance benchmarks" then openbenchmarking.org is probably a
> > better place to start as that is what it is designed for and
> > already has everything you've mentioned.
> > 
> > So I'll focus on performance regression testing. Performance
> > regression testing involves a lot more than just "run benchmark,
> > save and compare results". It's once the "compare results" phase
> > says "regression found" that the functionality of the test really
> > matters to the filesystem developer. i.e. the tests need to be useful
> > for *analysis of the regression*.
> > 
> > Hence things like "database server benchmark" don't really belong in
> > a performance regression test suite because they can't be used to
> > isolate regressions. Further, they tend to be susceptible to changes
> > in performance being caused by changes outside filesystem and
> > storage layers. Hence they lead to wild goose chases more often than
> > they point to a real filesystem or IO regression.
> > 
> This is not needed in the quick suite, but testing only simple
> read/write will not find regressions that appear only in more
> complicated situations. If the only thing changed is the filesystem,
> any big difference in results can be attributed to the filesystem
> change. Right?

No. A change in a filesystem can cause things like more context
switches to occur due to additional serialisation on a sleeping
lock. A change of context switch behaviour can expose issues in
other subsystems, like the scheduler, or even bugs in the locking
code. This happens more frequently than you think...

> I do not expect everyone to run this test suite all
> day, but it could notify us about regressions between versions of a
> filesystem.

It's rare that developers run tests directly comparing released
versions of the kernel. We'll compare "unpatched vs patched" in
back-to-back tests, so the tests we do run need to cover a good
portion of the performance matrix in a useful fashion....

> > > For the IO testing, I want to use FIO for the
> > > specific workflow and eventually iozone for the basic synthetic tests.
> > 
> > I think that the initial focus for performance regression tests
> > would need to be more on simple micro-benchmarks (e.g. read, write,
> > create, remove, etc).  I'd much prefer to see simple, targeted
> > benchmarks that are easily understood just by looking at the
> > xfstests code. e.g.  a patchset made unlink go fast, but slowed down
> > file create. Or that we sped up single threaded creates, but
> > destroyed multithreaded create scalability. Or that we sped up small
> > directories at the expense of large directories.  These things
> > can all be measured individually (and quickly) and because they
> > tend to measure a single aspect of filesystem performance they
> > can be used directly for regression analysis.
> > 
> > Many of these sorts of tests can be written into the existing
> > xfstests infrastructure without needing significant external
> > dependencies - fio and fsmark cover most of the microbenchmarks that
> > would be necessary. I already have quite a few scripts that I use to
> > run fsmark tests that could easily be wrapped with xfstests
> > templates....
> 
> The initial focus should really be on this, I agree. Creating this
> quick and small suite should not take a long time. If you have
> something, it could be useful once I create some kind of template
> for performance tests.

I've attached an example script I use to run a file creation
micro-benchmark.

What is important here is that once the files are created, I then
run several more performance tests on the filesystem - xfs_repair
performance, bulkstat performance, find and ls -R performance, and
finally unlink performance.

So it's really 5 or 6 tests in one. We are going to need to be able
to support such "sub-test" categories so that we don't waste lots of
time having to create filesystem pre-conditions for various
micro-benchmarks. Any ideas on how we could group tests like this
so they are run sequentially as a group, but also can be run
individually and correctly invoking the setup test if the filesystem
is not in the correct state?
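
One possible shape for this, sketched in C purely to make the question
concrete: each sub-test goes through a wrapper that checks a precondition
flag and runs the shared setup only when the flag is not set. Run as a
group, the setup happens once; run alone, a sub-test triggers it itself.
Every name below is hypothetical, the sub-test bodies are placeholders, and
nothing here reflects how xfstests actually groups tests.

```c
#include <stddef.h>

/* Hypothetical record of whether the filesystem holds the test files. */
struct fs_state {
    int populated;   /* 1 once the setup step has created the files */
    int setup_runs;  /* how many times setup had to run */
};

struct subtest {
    const char *name;
    int (*run)(struct fs_state *st);  /* returns 0 on success */
};

/* Placeholder for the expensive step, e.g. the file creation run. */
static int setup_populate(struct fs_state *st)
{
    st->populated = 1;
    st->setup_runs++;
    return 0;
}

/* Placeholder sub-test that only reads the populated filesystem. */
static int subtest_walk(struct fs_state *st)
{
    (void)st;
    return 0;
}

/* Placeholder sub-test that consumes the precondition by removing files. */
static int subtest_unlink(struct fs_state *st)
{
    st->populated = 0;
    return 0;
}

/* Run one sub-test, invoking setup first only if the state is missing. */
static int run_subtest(struct fs_state *st, const struct subtest *t)
{
    if (!st->populated && setup_populate(st) != 0)
        return -1;
    return t->run(st);
}

/* Run sub-tests in order; setup is paid once unless a sub-test undoes it. */
static int run_group(struct fs_state *st, const struct subtest *tests, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (run_subtest(st, &tests[i]) != 0)
            return -1;
    return 0;
}
```

Ordering the destructive sub-test (unlink) last keeps the group to a single
setup run, while running any sub-test on its own afterwards repopulates the
filesystem first.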

> > > What I'm not sure about is how a comparison between different versions could
> > > be done, because I don't see any infrastructure within fstests for
> > > cross-version comparison. (What would it do with regression tests
> > > anyway...) So I wonder if it should be done in this set at all. So the
> > > set would only print the measured values. Some other tool (which can be
> > > also included, but is not directly part of the performance tests set)
> > > could then be used to compare and/or plot graphs.
> > 
> > I don't think that storing results long term or comparing results is
> > something xfstests should directly care about. It is architected to
> > defer that to some external tool for post processing.  i.e. xfstests
> > is used to run the tests and generate results, not do long term
> > storage or analysis of those results.
> > 
> > I see no issues with including scripts to do result processing
> > across multiple RESULT_DIRs within xfstests itself, but the
> > infrastructure still has to be architected so it can be externally
> > controllable and usable by external test infrastructure.
> 
> I expected something like this, so it shouldn't be much trouble. What I
> see as a good way: first create some small tests. Then, once they
> work as intended, I can work on the external tool for managing the
> results, rather than creating the tool first. That will also give me
> more time to find a good solution. (From what I see, there is already some
> work on autotest running xfstests, so maybe it will need just a little
> work to add the new tests.)

Yes, that seems like the sensible approach to take.

FWIW, I'm pretty sure most developers run xfstests directly, so I'd
concentrate on making reporting work well for this case first, then
concentrate on what extra functionality external harnesses like
autotest require....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


[-- Attachment #2: fsmark-50-test-xfs.sh --]
[-- Type: application/x-sh, Size: 2026 bytes --]

[-- Attachment #3: walk-scratch.sh --]
[-- Type: application/x-sh, Size: 1113 bytes --]


* Re: Performance testing
  2014-09-27  0:47     ` Dave Chinner
@ 2014-09-29 16:07       ` Jan Tulak
  2014-10-25 17:10       ` Jan Tulak
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Tulak @ 2014-09-29 16:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, Lukas Czerner

On Sat, 2014-09-27 at 10:47 +1000, Dave Chinner wrote:
> On Thu, Sep 25, 2014 at 05:03:40PM +0200, Jan Tulak wrote:
> > On Thu, 2014-09-18 at 10:36 +1000, Dave Chinner wrote:
> > > On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> > > ...
> > This is not needed in the quick suite, but just testing simple
> > read/write will not find regression appearing during some more
> > complicated situations. If the only thing changed is the filesystem,
> > any big difference in results can be attributed to the filesystem
> > change. Right? 
> 
> No. A change in a filesystem can cause things like more context
> switches to occur due to additional serialisation on a sleeping
> lock. A change of context switch behaviour can expose issues in
> other subsystems, like the scheduler or even bugs in the locking
> code. This happens more frequently than you think...
> 
> > I do not expect everyone will run this test suite all
> > day, but it could alert us to regressions between versions of a
> > filesystem.
> 
> It's rare that developers run tests directly comparing released
> versions of the kernel. We'll compare "unpatched vs patched" in
> back-to-back tests, so the tests we do run need to cover a good
> portion of the performance matrix in a useful fashion....
> 

I can't argue about that. :-)

> > > > ...
> > > ...
> > 
> > The initial focus should really aim at this, I agree. Creating this
> > quick and small suite should not take a long time. If you have
> > something, that could be useful once I will create some kind of template
> > for performance tests.
> 
> I've attached an example script I use to run a file creation
> micro-benchmark.
> 
> What is important here is that once the files are created, I then
> run several more performance tests on the filesystem - xfs_repair
> performance, bulkstat performance, find and ls -R performance, and
> finally unlink performance.
> 
> So it's really 5 or 6 tests in one. We are going to need to be able
> to support such "sub-test" categories so that we don't waste lots of
> time having to create filesystem pre-conditions for various
> micro-benchmarks. Any ideas on how we could group tests like this
> so they are run sequentially as a group, but also can be run
> individually and correctly invoking the setup test if the filesystem
> is not in the correct state?

One idea I thought about, and for which I don't see big problems in the
existing infrastructure, is to add another level for tests. So there could
be something like tests/something/001/001. Not specifying the last level
would mean "every test in a sub-level".

From what I saw, that would need some small changes in the ./check script
and a mechanism for the sub-levels to share their environment. The
mechanism could work as follows:
In a sub-level there would be an init script, invoked by the tests and
used for setting up the filesystem and other things. After the first run, it
would export a variable with its own file path/group name. On subsequent
tests, the variable would already have the specified value, so no
initialization would be done again until another sub-level is entered.

An alternative to this mechanism would be to really check whether the
filesystem is in the required state, but this seems to me like a test in a
test (in a test...) :-)
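
A minimal sketch of the variable-based guard, to make the idea
concrete. SUBLEVEL_READY, sublevel_init and run_test are hypothetical
names and nothing here exists in ./check today; the echo lines stand in
for the real setup and test work.

```shell
#!/bin/sh
# Sketch of the proposed guard. SUBLEVEL_READY, sublevel_init and run_test
# are hypothetical names, not part of the existing ./check script.

SUBLEVEL_READY=""
SETUP_RUNS=0

# Init script for one sub-level: prepare the filesystem only if the guard
# variable does not already name this sub-level.
sublevel_init() {
    level=$1
    if [ "$SUBLEVEL_READY" = "$level" ]; then
        return 0            # already prepared by an earlier test
    fi
    echo "setting up $level"
    SETUP_RUNS=$((SETUP_RUNS + 1))
    SUBLEVEL_READY=$level
    export SUBLEVEL_READY
}

# Every test calls the init script of its own sub-level first.
run_test() {
    sublevel_init "${1%/*}"     # tests/something/001/002 -> tests/something/001
    echo "running $1"
}

run_test tests/something/001/001
run_test tests/something/001/002     # same sub-level: no second setup
run_test tests/something/002/001     # new sub-level: setup runs again
```

One thing to keep in mind: an exported variable is only inherited by
child processes, so the guard has to be set in the shell that launches
the tests (here everything runs in one shell), not in a child of it.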

> 
> > > > ...
> > > > ...
> > > 
> > > ...
> > > 
> > 
> > I expected something like this, so it shouldn't be much trouble. What I
> > see as a good way: first create some small tests. Then, once they
> > work as intended, I can work on the external tool for managing the
> > results, rather than creating the tool first. That will also give me
> > more time to find a good solution. (From what I see, there is already some
> > work on autotest running xfstests, so maybe it will need just a little
> > work to add the new tests.)
> 
> Yes, that seems like the sensible approach to take.
> 
> FWIW, I'm pretty sure most developers run xfstests directly, so I'd
> concentrate on making reporting work well for this case first, then
> concentrate on what extra functionality external harnesses like
> autotest require....

Yes, exactly.

Cheers,
Jan



* Re: Performance testing
  2014-09-27  0:47     ` Dave Chinner
  2014-09-29 16:07       ` Jan Tulak
@ 2014-10-25 17:10       ` Jan Tulak
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Tulak @ 2014-10-25 17:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, Lukas Czerner

Hi,

I have recently done some work on pre-conditions that let tests share some
setup (like making the FS almost full; I'm using the word
"environment" as the name for it). It is not yet fully implemented, but I
have found some things where I'm uncertain which solution to use.

So first, how it works:
There is a new file "environment" on the same level as the "group" file in
the tests directory. This file has the same syntax as "group". For setting
up the environment for a test, there is also a file named
"environment-NAME", where NAME is the name of the setup from the first
file - something like a group name.

Tests are sorted by environment name, so if they do not need to
reset the environment pre-conditions, the setup script is run just once.
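
Roughly, the ordering could look like the sketch below. It assumes the
"environment" file uses "<test> <name> [<name>...]" lines with '#'
comments, as "group" does; the test numbers, the environment names
(almost-full, empty) and the helper plan are made up for illustration,
and the echo lines stand in for running the setup scripts and tests.

```shell
#!/bin/sh
# Sketch: order tests by environment so each setup script runs once.
# ENVFILE contents and the environment names are invented for the example.

ENVFILE=$(mktemp)
trap 'rm -f "$ENVFILE"' EXIT
cat > "$ENVFILE" <<'EOF'
# test  environments
001 almost-full
002 empty
003 almost-full empty
EOF

# Emit one "<environment> <test>" pair per line, sorted by environment.
plan() {
    awk '!/^#/ && NF >= 2 { for (i = 2; i <= NF; i++) print $i, $1 }' "$1" | sort
}

SETUPS=0
current=""
PLAN=$(plan "$ENVFILE")

# Walk the sorted plan; run a setup only when the environment changes.
while read -r envname t; do
    if [ "$envname" != "$current" ]; then
        echo "run environment-$envname"
        SETUPS=$((SETUPS + 1))
        current=$envname
    fi
    echo "  run test $t"
done <<EOF
$PLAN
EOF
```

Here test 003 is listed in two environments, so it appears twice in the
plan and is run once under each of them.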

Here is the part I'm unsure about: currently I have the environments on the
same level as groups, but that means they can't be shared between (for
example) general/001 and xfs/001 tests. I can move them somewhere else, so
that either the full environment settings or just the setup scripts can be
shared between the categories. Is there a strong reason to choose one
option over the other? If not, I think I will keep the "environment" file on
the same level as "group", but will share the setup scripts.

Also, because one test can be in multiple environments, the test can be
run multiple times. Because of this, I wonder if all of this functionality
should have to be enabled explicitly, via a flag to the ./check script (no
environment preparation unless requested). Nothing changes
if a test is not explicitly placed into the "environment" file.

Thanks.

Cheers,
Jan

On Sat, 2014-09-27 at 10:47 +1000, Dave Chinner wrote:
> On Thu, Sep 25, 2014 at 05:03:40PM +0200, Jan Tulak wrote:
> > On Thu, 2014-09-18 at 10:36 +1000, Dave Chinner wrote:
> > > On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> > > > Hi,
> > > > 
> > > > I have begun to work on a set of performance tests. I think it would
> > > > be useful to have some standard set, because as far as I know, there is
> > > > only a little performance testing, and each of the few tests someone
> > > > does is unique. I want to propose my ideas before I really start to
> > > > write it, to fix possible complications.
> > > 
> > > Great idea, but I'm missing some context about your ultimate goal
> > > here: how is this performance testing going to be used?
> > > 
> > > My focus for xfstests is mainly for it to be useful to filesystem
> > > developers who are developing new features and fixing bugs, so my
> > > comments come from the point of view of "will it make my life as a
> > > filesystem developer easier?" rather than a "we need a
> > > performance test suite" perspective.
> > 
> > I think there is a place for two kinds of these tests. One is a quick
> > suite that should not run for more than a few minutes and can be used
> > whenever sending a patch (or on a daily basis...) for a quick check, to
> > catch bad things as soon as possible. 
> > 
> > The other set would be deeper and more complex, and used
> > between versions rather than for single patches. This one could maybe be
> > independent, but for consistency, I think it is better to keep both sets
> > together.
> 
> I'm not yet convinced, but keep talking ;)
> 
> > > > From the beginning there would be some basic test cases, like sync/async
> > > > read and write. Hopefully more natural cases, like a database server
> > > > would be added later.
> > > 
> > > IMO, if we do add performance tests to xfstests, then the focus
> > > would definitely need to be on performance regression tests, not
> > > "performance benchmark" (aka benchmarketing) tests. If you want
> > > "performance benchmarks" then openbenchmarking.org is probably a
> > > better place to start as that is what it is designed for and
> > > already has everything you've mentioned.
> > > 
> > > So I'll focus on performance regression testing. Performance
> > > regression testing involves a lot more than just "run benchmark,
> > > save and compare results". It's once the "compare results" phase
> > > says "regression found" that the functionality of the test really
> > > matters to the filesystem developer. i.e. the tests need to be useful
> > > for *analysis of the regression*.
> > > 
> > > Hence things like "database server benchmark" don't really belong in
> > > a performance regression test suite because they can't be used to
> > > isolate regressions. Further, they tend to be susceptible to changes
> > > in performance being caused by changes outside filesystem and
> > > storage layers. Hence they lead to wild goose chases more often than
> > > they point to a real filesystem or IO regression.
> > > 
> > This is not needed in the quick suite, but just testing simple
> > read/write will not find regression appearing during some more
> > complicated situations. If the only thing changed is the filesystem,
> > any big difference in results can be attributed to the filesystem
> > change. Right? 
> 
> No. A change in a filesystem can cause things like more context
> switches to occur due to additional serialisation on a sleeping
> lock. A change of context switch behaviour can expose issues in
> other subsystems, like the scheduler or even bugs in the locking
> code. This happens more frequently than you think...
> 
> > I do not expect everyone will run this test suite all
> > day, but it could alert us to regressions between versions of a
> > filesystem.
> 
> It's rare that developers run tests directly comparing released
> versions of the kernel. We'll compare "unpatched vs patched" in
> back-to-back tests, so the tests we do run need to cover a good
> portion of the performance matrix in a useful fashion....
> 
> > > > For the IO testing, I want to use FIO for the
> > > > specific workflow and eventually iozone for the basic synthetic tests.
> > > 
> > > I think that the initial focus for performance regression tests
> > > would need to be more on simple micro-benchmarks (e.g. read, write,
> > > create, remove, etc).  I'd much prefer to see simple, targeted
> > > benchmarks that are easily understood just by looking at the
> > > xfstests code. e.g.  a patchset made unlink go fast, but slowed down
> > > file create. Or that we sped up single threaded creates, but
> > > destroyed multithreaded create scalability. Or that we sped up small
> > > directories at the expense of large directories.  These things
> > > can all be measured individually (and quickly) and because they
> > > tend to measure a single aspect of filesystem performance they
> > > can be used directly for regression analysis.
> > > 
> > > Many of these sorts of tests can be written into the existing
> > > xfstests infrastructure without needing significant external
> > > dependencies - fio and fsmark cover most of the microbenchmarks that
> > > would be necessary. I already have quite a few scripts that I use to
> > > run fsmark tests that could easily be wrapped with xfstests
> > > templates....
> > 
> > The initial focus should really aim at this, I agree. Creating this
> > quick and small suite should not take a long time. If you have
> > something, that could be useful once I will create some kind of template
> > for performance tests.
> 
> I've attached an example script I use to run a file creation
> micro-benchmark.
> 
> What is important here is that once the files are created, I then
> run several more performance tests on the filesystem - xfs_repair
> performance, bulkstat performance, find and ls -R performance, and
> finally unlink performance.
> 
> So it's really 5 or 6 tests in one. We are going to need to be able
> to support such "sub-test" categories so that we don't waste lots of
> time having to create filesystem pre-conditions for various
> micro-benchmarks. Any ideas on how we could group tests like this
> so they are run sequentially as a group, but also can be run
> individually and correctly invoking the setup test if the filesystem
> is not in the correct state?
> 
> > > > What I'm not sure is how a comparison between different versions could
> > > > be done, because I don't see any infrastructure within fstests for
> > > > cross-version comparison. (What would it do with regression tests
> > > > anyway...) So I wonder if it should be done in this set at all. So the
> > > > set would only print the measured values. Some other tool (which can be
> > > > also included, but is not directly part of the performance tests set)
> > > > could then be used to compare and/or plot graphs.
> > > 
> > > I don't think that storing results long term or comparing results is
> > > something xfstests should directly care about. It is architected to
> > > defer that to some external tool for post processing.  i.e. xfstests
> > > is used to run the tests and generate results, not do long term
> > > storage or analysis of those results.
> > > 
> > > I see no issues with including scripts to do result processing
> > > across multiple RESULT_DIRs within xfstests itself, but the
> > > infrastructure still has to be architected so it can be externally
> > > controllable and usable by external test infrastructure.
> > 
> > I expected something like this, so it shouldn't be much trouble. What I
> > see as a good way: first create some small tests. Then, once they
> > work as intended, I can work on the external tool for managing the
> > results, rather than creating the tool first. That will also give me
> > more time to find a good solution. (From what I see, there is already some
> > work on autotest running xfstests, so maybe it will need just a little
> > work to add the new tests.)
> 
> Yes, that seems like the sensible approach to take.
> 
> FWIW, I'm pretty sure most developers run xfstests directly, so I'd
> concentrate on making reporting work well for this case first, then
> concentrate on what extra functionality external harnesses like
> autotest require....
> 
> Cheers,
> 
> Dave.




Thread overview: 11+ messages
2014-09-17  8:48 Performance testing Jan Tulak
2014-09-17  9:36 ` Dmitry Monakhov
2014-09-25 15:08   ` Jan Tulak
2014-09-18  0:36 ` Dave Chinner
2014-09-25 15:03   ` Jan Tulak
2014-09-27  0:47     ` Dave Chinner
2014-09-29 16:07       ` Jan Tulak
2014-10-25 17:10       ` Jan Tulak
     [not found] <467BE4C0.2020203@bull.net>
     [not found] ` <1182541578.9939.3.camel@localhost.localdomain>
     [not found]   ` <467C99F5.6060603@clusterfs.com>
     [not found]     ` <1182755567.4067.1.camel@localhost.localdomain>
     [not found]       ` <467FE8A5.4030508@bull.net>
     [not found]         ` <467FEAF7.7060902@clusterfs.com>
     [not found]           ` <4680CE9B.1030602@bull.net>
2007-06-26  9:48             ` performance testing Valerie Clement
2007-06-26 10:36               ` Girish Shilamkar
2007-06-26 11:03                 ` Valerie Clement
