public inbox for linux-kernel@vger.kernel.org
* Response testing on 2.6.0-test11 variants
@ 2003-12-17 20:51 Bill Davidsen
  2003-12-18 13:33 ` William Lee Irwin III
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Davidsen @ 2003-12-17 20:51 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1202 bytes --]

These are some results from variants of 2.6.0-test11 with my
responsiveness test. The responsiveness test checks how well the
system runs a small process which uses minimal resources (think more
or df) after the system has been doing other things for a while, 20
sec by default. This gives a hint of how well the system will respond
if you let a shell window sit and then type a command, or look at a
web page for a bit and page down.

Details are in the attached README; the values of interest are the
ratio of the response time with some given load to the response time
with no load. All data were taken on a freshly booted system in a
single xterm window, with results scrolling to the screen as well as
a file.

The two ratios are the ratio of the raw averages, and the ratio of
the averages of all points within one S.D. of the median. The second
reduces the effect of a few bad results and seems more typical. Feel
free to use either or roll your own.

I included the values for a 2.4.20 kernel as well, for reference. I
did rerun the tests; 2.6 kernels really do behave like that.

Test machine is a P-II 350, 96MB RAM, two dog-slow drives, with the temp
space on one so old it won't do DMA.
-- 
bill davidsen

[-- Attachment #2: Type: TEXT/PLAIN, Size: 9038 bytes --]


                       The resp1 response benchmark
                              Bill Davidsen
                             davidsen@tmr.com


       Introduction


       The resp1 benchmark is intended to measure system response
       to trivial interactions when under reproducible loads. The
       intent is to see how a system will respond to small requests
       such as ls or uncovering a window in a window manager
       environment. This will hopefully give some insight into how
       the system "feels" under load. See "How it works" for
       details.


       Using the benchmark


       I use the benchmark to compare Linux kernels and tuning
       parameters. I boot a system in single user mode with memory
       limited to 256MB, and run the benchmark. I capture the
       output by running with the script command, and I usually run
       with the -v (verbose) option. The output is formatted such
       that you can get the base results by
          grep '^ ' script.output
       since all optional output is non-blank in column one.


       The output data is the low and high response time (in  ms.),
       the  median and the average, the standard deviation, and the
       ratio of the average to the average of the noload  reference
       data.

       After  running  the  benchmark  you  might get a report like
       this:

         Starting 1 CPU run with 124 MB RAM, minimum 5 data points at 20 sec intervals

                              _____________ delay ms. ____________
                  Test        low       high    median     average     S.D.    ratio
                noload    178.743    233.056    184.435    192.526    0.023    1.000
            smallwrite    189.085    322.913    232.981    237.789    0.044    1.235
            largewrite    187.308   1997.804    243.612    576.542    0.688    2.995
               cpuload    178.544    259.127    179.017    196.635    0.035    1.021
             spawnload    178.450    258.380    178.761    194.615    0.036    1.011
              8ctx-mem    178.899   1571.854    182.922    463.700    0.620    2.409
              2ctx-mem    178.917   5611.114    185.104   1540.410    2.351    8.001


       Alternative format


       Newer versions of resp1 produce a slightly different output,
       deleting the S.D. (still in output with the -v option) and
       adding the ratio of the 1sdavg of the test run to the run
       with no load. The 1sdavg is defined as the average of all
       points within one S.D. of the median, and tends to ignore
       very large or small values which happen rarely. Some people
       feel this is more useful.


         Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

                              _____________ delay ms. ____________     ___ Ratio ___
                  Test        low       high    median     average      raw     S.D.
                noload    242.375    252.947    245.237    246.150    1.000    1.000
            smallwrite    331.007   1840.243    344.869    746.631    3.033    1.933
            largewrite    313.113   8067.703    922.303   2504.440   10.174    4.549
               cpuload    362.457    467.524    369.683    387.730    1.575    1.502
             spawnload    312.127    420.631    322.017    337.934    1.373    1.296
              8ctx-mem   4918.723  16471.092  12337.373  11201.385   45.506   57.219
              2ctx-mem  11951.979  19299.983  15779.539  15656.751   63.607   64.023
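       The 1sdavg and the two report ratios can be sketched in
       Python (a sketch of the definitions above, not the resp1
       source, which is C; the function names here are mine):

```python
import statistics

def one_sd_avg(samples):
    """The "1sdavg": average of all points that lie within one
    standard deviation of the median, damping rare outliers."""
    med = statistics.median(samples)
    sd = statistics.stdev(samples)
    kept = [x for x in samples if abs(x - med) <= sd]
    return sum(kept) / len(kept)

def ratios(load, noload):
    """The two ratios from the report: raw ratio of averages, and
    the ratio of the 1sdavg of the load run to the noload run."""
    raw = (sum(load) / len(load)) / (sum(noload) / len(noload))
    sd_ratio = one_sd_avg(load) / one_sd_avg(noload)
    return raw, sd_ratio
```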



       What does it mean?


       Everyone can point to some value and say it is the "one real
       value"  which  shows  how  well  the configuration works. In
       truth you can look at the ratio  to  see  what  the  overall
       effect  is,  or  the  high if you want to avoid "worst case"
       issues, or the ratio of the median to the noload median,  or
       whatever  else  you  think  reflects  how  the configuration
       really feels. Bear in  mind  this  benchmark  is  trying  to
       identify just that, not best throughput or whatever else.


       There  are a few things you can always identify. First, if a
       configuration has a large spread  between  the  average  and
       median  it  will  feel  uneven, and second if the ratio is a
       very large value, and I've seen values in the hundreds, with
       that  type of load your computer will go forth and conjugate
       the verb "suck."


       About the tests


       These are the loads which are run  to  test  response.  Each
       generates demands on one or two resources.


          o noload
            This  just runs the response test a few times to get an
            idea of how fast the system can  respond  when  it  has
            nothing  better  to do. Hopefully the process will stay
            in memory and only the CPU time for  the  system  calls
            will  be  evident.  In the real world that's not always
            the case, of course.


          o smallwrite
            This allocates a buffer of 1MB and  writes  a  file  in
            buffer  size  writes.   The file size is five times the
            system  memory,  which  puts  some  pressure   on   the
            buffering  logic  as  well  as the storage performance.
            With the verbose option on the overall size of the data
            write will be reported, in case this is useful.


          o largewrite
            This  is  just like smallwrite, except it uses a buffer
            which is (memsize-20)/2 MB in size. Does the o/s handle
            large  writes  better  than  small? Worse? Does the o/s
            actually swap pages  of  the  buffer  from  which  it's
            writing in order to create disk cache? Some kernels are
            hundreds of times worse than noload with this test.


          o cpuload
            This just generates a  bunch  of  CPU-bound  processes,
            Ncpu+1  of  them, and they beat the CPU. By having more
            processes than CPUs and using floating  point  load,  I
            can damage CPU affinity and thrash the cache.


          o spawnload
            This repeatedly forks a process which fork/exec's a
            shell, which runs a command. This creates a great deal
            of process creation and cleanup. With the verbose flag
            the number of loops per second is reported, each loop
            representing two or three forks and two execs.


          o 8ctx-mem
            This  creates  Ncpu+2  processes  which  each  allocate
            memory such that the total  memory  is  about  120%  of
            physical.  Then  they  pass a token in a circle using a
            SysV message queue, causing  context  switching.  There
            are  eight  trips  around the loop for each access to a
            new 2k page, and Ncpu tokens running around the circle.
            With  the  -v  option  the  child  processes report the
            circles per second rate,  which  might  mean  something
            about the IPC and context switching.


          o 2ctx-mem
            This  is  just like the 8ctx-mem test, but runs through
            memory with a 1k stride, giving  a  new  page  accessed
            every  other  trip  of  the  token  around  the process
            circle. Very exciting on a big SMP machine if you  like
            big numbers coming out of vmstat.
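       The smallwrite and largewrite loads above both reduce to
       streaming a file much larger than RAM through a fixed-size
       buffer; only the buffer size differs. A minimal Python
       sketch (the original is C; the helper name and the tiny
       sizes used for testing are illustrative):

```python
def chunked_write(path, total_bytes, buf_bytes):
    """Write total_bytes to path in buf_bytes-sized writes.
    Sketch of the smallwrite/largewrite loads, not the resp1
    source: resp1 uses a 1MB buffer (smallwrite) or a buffer of
    (memsize-20)/2 MB (largewrite), with a total of 5x system
    memory, which keeps the page cache turning over."""
    buf = b"\0" * buf_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(buf_bytes, total_bytes - written)
            f.write(buf[:n])
            written += n
    return written
```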

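       The token-passing structure of the two ctx-mem loads can be
       sketched as a ring. This is an illustration of the shape
       only: resp1 uses separate processes, SysV message queues,
       memory sized to about 120% of physical so the buffers
       actually page, and Ncpu circulating tokens; here threads,
       Queue objects, and a single token stand in, and all names
       are mine:

```python
import queue
import threading

def ring_worker(mem_bytes, stride, inbox, outbox, trips, finished):
    buf = bytearray(mem_bytes)   # resp1 sizes these to ~120% of RAM total
    pos = 0
    for _ in range(trips):
        token = inbox.get()      # wait for the token (resp1: msgrcv)
        buf[pos % mem_bytes] = 1 # touch memory; a small stride means a
        pos += stride            # new page only every few trips around
        outbox.put(token)        # pass the token on (resp1: msgsnd)
    finished.append(trips)       # list.append is atomic under the GIL

def run_ring(nworkers, mem_bytes, stride, trips):
    qs = [queue.Queue() for _ in range(nworkers)]
    finished = []
    workers = [threading.Thread(
                   target=ring_worker,
                   args=(mem_bytes, stride, qs[i],
                         qs[(i + 1) % nworkers], trips, finished))
               for i in range(nworkers)]
    for w in workers:
        w.start()
    qs[0].put("token")           # inject one token into the ring
    for w in workers:
        w.join()
    return len(finished)         # workers that completed all trips
```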


       How it works


       The main program starts a reference load as a child process,
       then waits until the load "warms up," then enters a loop  of
       sleeping  and  doing  a  small interaction. This consists of
       scanning an array which is allocated at startup and  may  be
       swapped, and allocating and scanning an array which requires
       free pages from the system. The scans are  byte-by-byte,  so
       they use a small amount of CPU time, similar to small system
       commands.  The  difference  between  the  noload  and   load
       response time is reported in ms.
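       The measurement loop described above can be sketched like
       this (a Python stand-in for the C original; the sizes and
       names are illustrative):

```python
import time

def probe(resident, alloc_bytes):
    """Time one resp1-style interaction: scan a long-lived array
    byte-by-byte (it may have been swapped out under load), then
    allocate and scan a fresh array (which needs free pages from
    the system). Returns the delay in ms."""
    t0 = time.perf_counter()
    checksum = 0
    for b in resident:               # touch the possibly-swapped array
        checksum += b
    fresh = bytearray(alloc_bytes)   # demand free pages
    for b in fresh:                  # touch the newly allocated pages
        checksum += b
    return (time.perf_counter() - t0) * 1000.0

# Allocated once at startup, as resp1 does; it may be paged out
# while the load child runs.
resident = bytearray(64 * 1024)
```

       The main program alternates sleeping with calls like
       probe(resident, 64 * 1024) while the load child runs, then
       reports the statistics shown earlier.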
[-- Attachment #3: Type: TEXT/PLAIN, Size: 4124 bytes --]

2.4.20.out
  Starting 1 CPU run with 93 MB RAM, minimum 5 data points at 20 sec intervals
 
                       _____________ delay ms. ____________     ___ Ratio ___
           Test        low       high    median     average      raw     S.D.
         noload    229.901    282.459    233.232    243.299    1.000    1.000
     smallwrite    404.531   5640.602    458.907   1935.987    7.957    1.871
     largewrite    363.251  14025.380    397.479   4361.725   17.927    1.652
        cpuload    559.621    698.061    597.186    615.075    2.528    2.545
      spawnload    719.436    880.521    782.215    783.246    3.219    3.250
       8ctx-mem    583.971   8523.321    726.046   3272.639   13.451    2.827
       2ctx-mem    653.283   7927.200   1012.727   2675.856   10.998    5.837
2.6.0-test11.out
  Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals
 
                       _____________ delay ms. ____________     ___ Ratio ___
           Test        low       high    median     average      raw     S.D.
         noload    247.634    287.550    249.879    257.679    1.000    1.000
     smallwrite   1316.429   5721.928   2389.766   3220.307   12.497    8.452
     largewrite   2113.308   8078.336   4381.649   4257.728   16.523   13.199
        cpuload    274.599    482.308    279.741    332.361    1.290    1.178
      spawnload    272.836    402.664    279.448    312.193    1.212    1.104
       8ctx-mem   5308.450  18760.567  11352.899  11576.995   44.928   45.659
       2ctx-mem   8290.388  17804.397  16221.722  14460.405   56.118   63.957
2.6.0-test11-bk12.out
  Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals
 
                       _____________ delay ms. ____________     ___ Ratio ___
           Test        low       high    median     average      raw     S.D.
         noload    246.138    295.824    250.364    261.131    1.000    1.000
     smallwrite    283.588   7725.815   2287.066   3286.297   12.585    5.110
     largewrite   1026.081   5734.609   3395.009   3436.063   13.158   13.721
        cpuload    271.338    589.871    286.339    342.194    1.310    1.110
      spawnload    271.747    388.543    287.548    304.779    1.167    1.124
       8ctx-mem   4082.105  15076.516  11357.832  10469.028   40.091   51.557
       2ctx-mem  10019.499  15834.343  12848.258  12858.138   49.240   50.796
2.6.0-test11-wli-2.out
  Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals
 
                       _____________ delay ms. ____________     ___ Ratio ___
           Test        low       high    median     average      raw     S.D.
         noload    241.308    507.529    251.228    314.148    1.000    1.000
     smallwrite    819.019   4922.349   1273.123   2052.233    6.533    5.389
     largewrite   1781.760   9437.622   3838.493   4939.853   15.725   13.092
        cpuload    269.542    562.006    277.177    352.739    1.123    1.213
      spawnload    265.993    462.221    274.887    309.579    0.985    1.096
       8ctx-mem   7499.271  24126.753  16028.938  15420.571   49.087   68.186
       2ctx-mem  12701.278  39539.505  26427.740  25699.607   81.807  102.920
2.6.0-test11-mm1.out
  Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals
 
                       _____________ delay ms. ____________     ___ Ratio ___
           Test        low       high    median     average      raw     S.D.
         noload    248.326    407.161    253.513    287.390    1.000    1.000
     smallwrite    325.208  10172.099   3572.895   4920.817   17.122   14.015
     largewrite    241.369   4797.435   2816.604   2229.296    7.757   10.942
        cpuload    265.870    423.674    274.597    308.643    1.074    1.087
      spawnload    265.519    471.692    278.004    315.864    1.099    1.076
       8ctx-mem   4934.075   7982.318   6947.914   6705.179   23.331   27.765
       2ctx-mem   4532.178  18488.828   6557.832   8481.727   29.513   23.228
