public inbox for linux-kernel@vger.kernel.org
* VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
@ 2001-10-16 12:16 rwhron
  2001-10-17  0:12 ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: rwhron @ 2001-10-16 12:16 UTC (permalink / raw)
  To: linux-kernel, ltp-list


Summary:

Wall clock time for this test has dropped dramatically (which
is good) over the last 3 Andrea Arcangeli patched kernels.
mp3blaster sounds less pleasant though.


Test:

Run a loop of 10 iterations of the Linux Test Project's "mtest01 -p80 -w".
This test attempts to allocate 80% of virtual memory and write to
each page.  Simultaneously, listen to mp3blaster.


Reboot before each test.

Hardware:
Athlon 1333
512 MB RAM
1024 MB swap


I've shown the last 2 results in a previous message.  But the
side-by-side comparison is pretty exciting.

2.4.13pre3aa1

Averages for 10 mtest01 runs
bytes allocated:                    1240045977
User time (seconds):                2.106
System time (seconds):              2.738
Elapsed (wall clock) time:          39.408
Percent of CPU this job got:        11.70
Major (requiring I/O) page faults:  110.0
Minor (reclaiming a frame) faults:  303527.4

2.4.13pre2aa1

Averages for 10 mtest01 runs
bytes allocated:                    1245184000
User time (seconds):                2.050
System time (seconds):              2.874
Elapsed (wall clock) time:          49.513
Percent of CPU this job got:        9.70
Major (requiring I/O) page faults:  115.6
Minor (reclaiming a frame) faults:  304781.9

2.4.12aa1

Averages for 10 mtest01 runs
bytes allocated:                    1253362892
User time (seconds):                2.099
System time (seconds):              2.823
Elapsed (wall clock) time:          64.109
Percent of CPU this job got:        7.50
Major (requiring I/O) page faults:  135.2
Minor (reclaiming a frame) faults:  306779.8


The rest of the results below are just from 2.4.13pre3aa1.

mtest01 passes each time with the expected 1.2 gigabytes of
memory allocated:

PASS ... 1215299584 bytes allocated.
PASS ... 1242562560 bytes allocated.
PASS ... 1240465408 bytes allocated.
PASS ... 1241513984 bytes allocated.
PASS ... 1244659712 bytes allocated.
PASS ... 1241513984 bytes allocated.
PASS ... 1245708288 bytes allocated.
PASS ... 1242562560 bytes allocated.
PASS ... 1243611136 bytes allocated.
PASS ... 1242562560 bytes allocated.

mp3blaster sounds much less pleasant as the wall clock time for the VM
test improves.  With 2.4.13pre3aa1, mp3blaster stutters through almost
the entire run.  The last 3-4 seconds of each iteration sound good though
(the highest vmstat swpd value and the next 2 low values).  This "sounds
good" stretch may actually be the first 3-4 seconds of the test.


vmstat 1 output for 1 iteration:

vmstat output starts towards the end of one iteration, goes through a complete cycle,
then the beginning of another.

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  6  1 685252   1548   1188   1136  72 23192   380 23192  419   310   3   8  89
 0  6  1 707740   1648   1196   1140  44 20532   412 20552  368   335   5   9  86
 0  4  0 725628   3624   1176   1152  32 20264   512 20264  343   300   2  10  88

 mp3blaster sounds good

 1  4  0 738192   3312   1216   1908 516 11276  1528 11276  467   435   3   4  93
 2  0  0  15928 387480   1264   3148 352   0  1632     0  477   686  19  24  56
 2  0  0  15756 122780   1280   3172   0   0    24    24  285   563  35  65   0

 mp3blaster stutters until the end of test iteration.

 3  3  0  47424   3788   1172   1412 860 40228   892 40236  789   819  12  23  66
 0  5  1  90244   1656   1184   1416 1032 39568  1076 39572  653   425   6   5  89
 1  3  0 129592   3744   1176   1416 236 40960   276 40988  588   432   5   8  87
 0  2  1 159260   3584   1172   1540 132 27676   300 27680  396   270   7   9  84
 0  5  1 187764   2572   1184   1416 312 29632   368 29636  534   448   5   7  88
 0  5  1 218844   1648   1176   1416 220 31268   256 31272  560   486   5   7  89
 0  2  1 242820   2548   1172   1416 124 24576   168 24600  419   376   3   8  89
 1  1  1 280660   3052   1176   1416  60 36352   116 36356  554   439   3  10  87
 0  3  1 325164   2036   1176   1416  40 44832    76 44836  586   467   4  10  86
 0  3  1 350204   1660   1172   1420  44 25824    88 25852  432   319   3  12  85
 1  2  1 396728   3564   1184   1416  72 45780   120 45784  637   528   3  12  86
 0  3  1 423816   3572   1180   1416  48 27020    80 27024  420   361   2  14  84
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  3  2 467284   1644   1200   1420  52 42816   100 42832  627   482   5  10  85
 0  3  1 490292   2040   1180   1420  32 23648    40 23656  344   242   6  12  82
 0  3  1 512764   1660   1172    956 292 23604   340 23604  426   273   5  14  81
 0  3  1 539844   2108   1184    968  56 26728   316 26728  463   338   3  11  86
 0  3  1 563852   2036   1184    976  56 23500   512 23500  440   357   3  11  86
 1  2  1 579656   1908   1172   1004 332 17356  1324 17380  411   352   3   8  89
 1  3  1 605720   1652   1184   1024  48 24656   516 24656  456   375   0  11  89
 0  6  1 627676   3176   1200   1028  64 21432   316 21436  386   283   3   9  88
 1  3  0 642980   3804   1180   1048  56 16376   888 16376  356   280   4   7  89
 2  3  1 661348   1776   1180   1064 312 18816   724 18860  390   340   2   9  89
 0  4  1 686888   2148   1184   1256  68 23992   848 23992  443   359   3   9  88
 1  4  1 705276   1896   1188   1116  44 19880   836 19880  431   331   2   6  92

 mp3blaster sounds good

 0  5  1 724676   1652   1192   1124 240 18388  1084 18388  371   336   2  13  85
 1  4  1  16348 491352   1220   1796 512 12108  1332 12132  393   403   2  11  87
 2  2  0  15872 489360   1264   3004 732   0  1984     0  472   691   2   3  95
 3  0  0  15692 266700   1284   3168 116   0   296     0  344   639  41  46  14

 mp3blaster begins to stutter again

 2  0  0  14604   4480   1284   3196 316   0   344     0  300   587  46  54   0
 0  4  0  32952   3572   1176   1464 372 21932   392 21944  393   313   9  12  79


vmstat 1 output from 2.4.12aa1 and 2.4.13pre2aa1 is in previous messages.  Subject
is something like VM Test on {kernel versions}.  Two separate tiny email threads.

-- 
Randy Hron


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-16 12:16 VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1) rwhron
@ 2001-10-17  0:12 ` Andrea Arcangeli
  2001-10-17  1:32   ` Beau Kuiper
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2001-10-17  0:12 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list

On Tue, Oct 16, 2001 at 08:16:39AM -0400, rwhron@earthlink.net wrote:
> 
> Summary:
> 
> Wall clock time for this test has dropped dramatically (which
> is good) over the last 3 Andrea Arcangeli patched kernels.

:) I worked the last two days to make it faster under swap, it's nice to
see that your tests also confirm that.  I'm only scared it swaps too
much when swap is not needed, but if this turns out to be the case it will
be very easy to fix. And a very minor bit of very seldom background
pagetable scanning shouldn't hurt anyway. So far on my desktop it seems
not to swap too much.

> mp3blaster sounds less pleasant though.

A (very) optimistic theory could be that the increase in swap
throughput is decreasing the bandwidth available to read the mp3 8). Do
you swap on the same physical disk where you keep the mp3? But it may be
that I'm blocking too easily waiting for I/O completion instead, or that
the mp3blast routines needed for the playback have been swapped out;
dunno with only this info. You can rule out the "mp3blast is being
swapped out" theory by running mp3blast after an mlockall. And you can avoid
the disk bandwidth problem by putting the mp3 on a separate disk.

So far I have received very good feedback about 2.4.13pre3aa1 [also Luigi's
and Mario's problems went away completely] (I'm also happy myself on my
machine). It may need further tuning but I'd hope it's only a matter of
changing some lines of code.

>  3  3  0  47424   3788   1172   1412 860 40228   892 40236  789   819  12  23  66
>  0  5  1  90244   1656   1184   1416 1032 39568  1076 39572  653   425   6   5  89

those swapins could be due to mp3blast getting swapped out
continuously while it sleeps.  It's not easy for the VM to understand it has
to stay in cache, and it makes sense that it gets swapped out faster, the
faster the swap rate is. Could you also make sure to run mp3blast at
-20 priority and the swap-hog at +19 priority, just in case?

thanks for feedback!

Andrea


* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  0:12 ` Andrea Arcangeli
@ 2001-10-17  1:32   ` Beau Kuiper
  2001-10-17  2:09     ` Andrea Arcangeli
  2001-10-17  2:31   ` Andrea Arcangeli
  2001-10-17  3:59   ` rwhron
  2 siblings, 1 reply; 12+ messages in thread
From: Beau Kuiper @ 2001-10-17  1:32 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: rwhron, linux-kernel, ltp-list

On Wed, 17 Oct 2001, Andrea Arcangeli wrote:

> On Tue, Oct 16, 2001 at 08:16:39AM -0400, rwhron@earthlink.net wrote:
> >
> > Summary:
> >
> > Wall clock time for this test has dropped dramatically (which
> > is good) over the last 3 Andrea Arcangeli patched kernels.
>
> :) I worked the last two days to make it faster under swap, it's nice to
> see that your tests also confirm that.  I'm only scared it swaps too
> much when swap is not needed, but if this turns out to be the case it will
> be very easy to fix. And a very minor bit of very seldom background
> pagetable scanning shouldn't hurt anyway. So far on my desktop it seems
> not to swap too much.

Swapping too much probably has a lot to do with a particular hard drive
and its performance. Is there any way of adding a configurable option (via
sysctl) to allow administrators to tune how aggressively the kernel
swaps out data vs. throwing out the disk cache?  (So if it is set to
aggressive, the kernel will try hard to use swap to free up
memory, and if it is set to conservative it will try to free disk cache (to
a limit) instead of swapping stuff out to free memory.)

Beau Kuiper
kuib-kl@ljbc.wa.edu.au




* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  1:32   ` Beau Kuiper
@ 2001-10-17  2:09     ` Andrea Arcangeli
  2001-10-18 19:36       ` bill davidsen
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2001-10-17  2:09 UTC (permalink / raw)
  To: Beau Kuiper; +Cc: rwhron, linux-kernel, ltp-list

On Wed, Oct 17, 2001 at 09:32:12AM +0800, Beau Kuiper wrote:
> On Wed, 17 Oct 2001, Andrea Arcangeli wrote:
> 
> > On Tue, Oct 16, 2001 at 08:16:39AM -0400, rwhron@earthlink.net wrote:
> > >
> > > Summary:
> > >
> > > Wall clock time for this test has dropped dramatically (which
> > > is good) over the last 3 Andrea Arcangeli patched kernels.
> >
> > :) I worked the last two days to make it faster under swap, it's nice to
> > see that your tests also confirm that.  I'm only scared it swaps too
> > much when swap is not needed, but if this turns out to be the case it will
> > be very easy to fix. And a very minor bit of very seldom background
> > pagetable scanning shouldn't hurt anyway. So far on my desktop it seems
> > not to swap too much.
> 
> Swapping too much probably has a lot to do with a particular hard drive
> and its performance. Is there any way of adding a configurable option (via
> sysctl) to allow administrators to tune how aggressively the kernel
> swaps out data vs. throwing out the disk cache?  (So if it is set to
> aggressive, the kernel will try hard to use swap to free up
> memory, and if it is set to conservative it will try to free disk cache (to
> a limit) instead of swapping stuff out to free memory.)

I could add a sysctl to control that. In short the change consists of
making the DEF_PRIORITY in mm/vmscan.c a variable rather than a
preprocessor #define. That's the "ratio" number I was talking about in
the last email to Rik, and if you read ac/mm/vmscan.c you'll find it
there too indeed.

That's basically the only number that I left in the code; everything
else should be completely dynamic behaviour. Anyway, this number
isn't critical either; as said, it shouldn't make a huge difference,
but yes, it could be tunable.

However, one of the reasons I didn't do that is that I still believe the VM
should be autotuning and provide behaviour based on concepts, not on
random tweaking. But I cannot imagine at the moment how to make even
such a fixed number go away :), so at the moment it could make some
sense to make it a sysctl.

The probe of the cache that lets me start swapouts before we have really
failed to shrink the cache doesn't sound like random tweaking either to
me (maybe I'm biased 8); it instead allows us to free memory and swap out at
the very same time, and this seems beneficial.

Andrea


* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  0:12 ` Andrea Arcangeli
  2001-10-17  1:32   ` Beau Kuiper
@ 2001-10-17  2:31   ` Andrea Arcangeli
  2001-10-17  4:48     ` rwhron
  2001-10-17  3:59   ` rwhron
  2 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2001-10-17  2:31 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list

On Wed, Oct 17, 2001 at 02:12:42AM +0200, Andrea Arcangeli wrote:
> >  3  3  0  47424   3788   1172   1412 860 40228   892 40236  789   819  12  23  66
> >  0  5  1  90244   1656   1184   1416 1032 39568  1076 39572  653   425   6   5  89
> 
> those swapins could be due to mp3blast getting swapped out
> continuously while it sleeps.  It's not easy for the VM to understand it has

I noticed that another thing that changed between vanilla 2.4.13pre2 and
2.4.13pre3 is the setting of page_cluster on machines with lots of RAM.

You'll now find the page_cluster set to 6, that means "1 << 6 << 12"
bytes will be paged in at each major fault, while previously only "1 <<
4 << 12" bytes were paged in.

So I'd suggest to try again after "echo 4 > /proc/sys/vm/page-cluster"
to see if it makes any difference.

Andrea


* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  0:12 ` Andrea Arcangeli
  2001-10-17  1:32   ` Beau Kuiper
  2001-10-17  2:31   ` Andrea Arcangeli
@ 2001-10-17  3:59   ` rwhron
  2 siblings, 0 replies; 12+ messages in thread
From: rwhron @ 2001-10-17  3:59 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel, ltp-list

On Wed, Oct 17, 2001 at 02:12:42AM +0200, Andrea Arcangeli wrote:
> On Tue, Oct 16, 2001 at 08:16:39AM -0400, rwhron@earthlink.net wrote:
> > 
> > Wall clock time for this test has dropped dramatically (which
> > is good) over the last 3 Andrea Arcangeli patched kernels.
> 
> > mp3blaster sounds less pleasant though.
> 
> A (very) optimistic theory could be that the increase in swap
> throughput is decreasing the bandwidth available to read the mp3 8). Do
> you swap on the same physical disk where you keep the mp3? But it may be
> that I'm blocking too easily waiting for I/O completion instead, or that
> the mp3blast routines needed for the playback have been swapped out,

That theory makes sense.  2.4.13-pre3aa1 seems more aggressive at 
making memory (swap) available to memory (swap) hogs.  2.4.12aa1
would be aggressive from swpd (small) to about 130000 on this machine.
2.4.13-pre2aa1 was aggressive until swpd reached around 280000 on this machine, 
and 2.4.13-pre3aa1 is aggressive as long as swap is needed.

I say "aggressive" based on when mp3blaster starts to sputter.

The mp3 is on the same disk as swap and everything else.  

> dunno with only this info. You can rule out the "mp3blast is being
> swapped out" theory by running mp3blast after an mlockall. And you can avoid
> the disk bandwidth problem by putting the mp3 on a separate disk.

I didn't find a user mlockall program on freshmeat or icewalkers.


> >  3  3  0  47424   3788   1172   1412 860 40228   892 40236  789   819  12  23  66
> >  0  5  1  90244   1656   1184   1416 1032 39568  1076 39572  653   425   6   5  89
> 
> those swapins could be due to mp3blast getting swapped out
> continuously while it sleeps.  It's not easy for the VM to understand it has
> to stay in cache, and it makes sense that it gets swapped out faster, the
> faster the swap rate is. Could you also make sure to run mp3blast at
> -20 priority and the swap-hog at +19 priority, just in case?

I did 3 tests using "nice".  

1) nothing niced
2) mp3blaster not nice
3) mtest01 very nice, and mp3blaster not nice

mp3blaster uses about 11 seconds of CPU time to play a 3 minute mp3 on this machine.

Here is a bit of ps with mtest01 very nice, and mp3blaster un-nice

    F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
  000 S 18008 15643    93  0  59 -20    -  7455 nanosl tty3     00:00:00 mp3blaster
  002 S 18008 15644 15643  0  59 -20    -  7455 do_pol tty3     00:00:00 mp3blaster
  002 S 18008 15645 15644  1  59 -20    -  7455 nanosl tty3     00:00:15 mp3blaster
  002 S 18008 15710 15644  0  59 -20    -  7455 end    tty3     00:00:00 mp3blaster
  004 S     0 15711    91  0  79  19    -   530 wait4  tty1     00:00:00 mmtest
  004 S     0 15714 15711  0  79  19    -   331 nanosl tty1     00:00:00 vmstat
  000 R 18008 15717    98  5  75   0    -   727 -      tty8     00:00:00 ps
  000 S     0 15718 15711  0  79  19    -   318 wait4  tty1     00:00:00 time
  000 R     0 15719 15718  0  79  19    -  4686 -      tty1     00:00:00 mtest01

Changing nice values didn't really have any effect on mp3blaster's sound quality.

mp3blaster not nice, mtest01 very nice

Averages for 10 mtest01 runs
bytes allocated:                    1238577971
User time (seconds):                2.062
System time (seconds):              2.715
Elapsed (wall clock) time:          40.606
Percent of CPU this job got:        11.50
Major (requiring I/O) page faults:  108.3
Minor (reclaiming a frame) faults:  303169.0

mp3blaster not nice

Averages for 10 mtest01 runs
bytes allocated:                    1221800755
User time (seconds):                2.059
System time (seconds):              2.697
Elapsed (wall clock) time:          37.597
Percent of CPU this job got:        12.10
Major (requiring I/O) page faults:  115.2
Minor (reclaiming a frame) faults:  299073.0

no nice processes

Averages for 10 mtest01 runs
bytes allocated:                    1240045977
User time (seconds):                2.106
System time (seconds):              2.738
Elapsed (wall clock) time:          39.408
Percent of CPU this job got:        11.70
Major (requiring I/O) page faults:  110.0
Minor (reclaiming a frame) faults:  303527.4

Note the total test time is around 400 seconds (wall clock * 10).
The mp3 would play just over 120 seconds by the time mtest01 completed
10 iterations.


I did a fourth run with strace -p 15645 (the mp3blaster PID using the most CPU time).

read(6, "\20Ks\303\303\222\236o\272\231\177\32\316\360\341\314z"..., 4096) = 4096
nanosleep({0, 200000}, NULL)            = 0     (5 calls to nanosleep)
time([1003288001])                      = 1003288001
nanosleep({0, 200000}, NULL)            = 0     (21 calls to nanosleep)
read(6, "\356$\365\274)\332\336\277c\375\356>+\234\307q\213\6\4"..., 4096) = 4096


When not running mtest01, strace is like this:

read(6, "\317W\234\311i\230\273\221\276J5\245\310A\251\226C?\202"..., 4096) = 4096
nanosleep({0, 200000}, NULL)            = 0     (4 calls to nanosleep)
time([1003287905])                      = 1003287905
nanosleep({0, 200000}, NULL)            = 0     (3 calls to nanosleep)
read(6, "$Q\17\357aL\264\301e\357S\370h{4\322L\246\344\273y\232"..., 4096) = 4096


Oddly, it appears there are more calls to nanosleep when mp3blaster is sputtering
(and fighting for I/O or memory?).


> thanks for feedback!
> 
> Andrea

My pleasure!

-- 
Randy Hron



* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  2:31   ` Andrea Arcangeli
@ 2001-10-17  4:48     ` rwhron
  2001-10-17 16:27       ` Linus Torvalds
  2001-10-18 19:45       ` Bill Davidsen
  0 siblings, 2 replies; 12+ messages in thread
From: rwhron @ 2001-10-17  4:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: rwhron, linux-kernel, ltp-list

On Wed, Oct 17, 2001 at 04:31:03AM +0200, Andrea Arcangeli wrote:
> I noticed that another thing that changed between vanilla 2.4.13pre2 and
> 2.4.13pre3 is the setting of page_cluster on machines with lots of RAM.
> 
> You'll now find the page_cluster set to 6, that means "1 << 6 << 12"
> bytes will be paged in at each major fault, while previously only "1 <<
> 4 << 12" bytes were paged in.
> 
> So I'd suggest to try again after "echo 4 > /proc/sys/vm/page-cluster"
> to see if it makes any difference.
> 
> Andrea

You Rule!

The tweak to page-cluster is basically magic for this test.

With page-cluster=4, the mp3blaster sputtered like 2.4.13pre2aa1.
Better, but not beautiful.

Real beauty happens with page-cluster=2.  There is virtually no sputter.  
And the wall clock time is a little better than 2.4.13pre2aa1!

I don't know what page-cluster size is best for everything, but 
2.4.12aa1 (which was very good IMHO) sputtered about 10 seconds per 
iteration, and each iteration took 64 seconds.

2.4.13pre3aa1 with no sputters:  48 seconds.

Amazing!

Also, interactive "feel" is much better too.  This test would
really brutalize keyboard response.  With 2.4.13-pre3aa1 and 
page-cluster=2, the box is still usable.  (for more than listening
to mp3's  :))

page-cluster = 6

Averages for 10 mtest01 runs
bytes allocated:                    1236166246
User time (seconds):                2.299
System time (seconds):              2.951
Elapsed (wall clock) time:          41.969
Percent of CPU this job got:        12.00
Major (requiring I/O) page faults:  113.5
Minor (reclaiming a frame) faults:  302580.3

page-cluster = 4

Averages for 10 mtest01 runs
bytes allocated:                    1237529395
User time (seconds):                2.097
System time (seconds):              2.788
Elapsed (wall clock) time:          49.394
Percent of CPU this job got:        9.50
Major (requiring I/O) page faults:  120.3
Minor (reclaiming a frame) faults:  302914.1

page-cluster = 2

Averages for 10 mtest01 runs
bytes allocated:                    1239521689
User time (seconds):                2.051
System time (seconds):              2.785
Elapsed (wall clock) time:          47.878
Percent of CPU this job got:        9.80
Major (requiring I/O) page faults:  114.0
Minor (reclaiming a frame) faults:  303399.7

The wall clock time went up somewhat from page-cluster=6. 
Here is where we were before:

2.4.13-pre2aa1

Averages for 10 mtest01 runs
bytes allocated:                    1245184000
User time (seconds):                2.050
System time (seconds):              2.874
Elapsed (wall clock) time:          49.513
Percent of CPU this job got:        9.70
Major (requiring I/O) page faults:  115.6
Minor (reclaiming a frame) faults:  304781.9

2.4.12aa1

Averages for 10 mtest01 runs
bytes allocated:                    1253362892
User time (seconds):                2.099
System time (seconds):              2.823
Elapsed (wall clock) time:          64.109
Percent of CPU this job got:        7.50
Major (requiring I/O) page faults:  135.2
Minor (reclaiming a frame) faults:  306779.8


-- 
Randy Hron



* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17 16:27       ` Linus Torvalds
@ 2001-10-17 15:19         ` Marcelo Tosatti
  0 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2001-10-17 15:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On Wed, 17 Oct 2001, Linus Torvalds wrote:

> In article <20011017004839.A15996@earthlink.net>, <rwhron@earthlink.net> wrote:
> >> 
> >> So I'd suggest to try again after "echo 4 > /proc/sys/vm/page-cluster"
> >> to see if it makes any difference.
> >> 
> >> Andrea
> >
> >You Rule!
> >
> >The tweak to page-cluster is basically magic for this test.
> >
> >With page-cluster=4, the mp3blaster sputtered like 2.4.13pre2aa1.
> >Better, but not beautiful.
> >
> >Real beauty happens with page-cluster=2.  There is virtually no sputter.  
> >And the wall clock time is a little better than 2.4.13pre2aa1!
> 
> This is good information.
> 
> The problem is that "page-cluster" is actually used for two different
> things: it's used for mmap page-in clustering, and it's used for swap
> page-in clustering, and they probably have rather different behaviours.

It's also used to limit the number of in-flight swapouts. That different
meanings thingie sucks: I would say we need to separate that :)



* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  4:48     ` rwhron
@ 2001-10-17 16:27       ` Linus Torvalds
  2001-10-17 15:19         ` Marcelo Tosatti
  2001-10-18 19:45       ` Bill Davidsen
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2001-10-17 16:27 UTC (permalink / raw)
  To: linux-kernel

In article <20011017004839.A15996@earthlink.net>, <rwhron@earthlink.net> wrote:
>> 
>> So I'd suggest to try again after "echo 4 > /proc/sys/vm/page-cluster"
>> to see if it makes any difference.
>> 
>> Andrea
>
>You Rule!
>
>The tweak to page-cluster is basically magic for this test.
>
>With page-cluster=4, the mp3blaster sputtered like 2.4.13pre2aa1.
>Better, but not beautiful.
>
>Real beauty happens with page-cluster=2.  There is virtually no sputter.  
>And the wall clock time is a little better than 2.4.13pre2aa1!

This is good information.

The problem is that "page-cluster" is actually used for two different
things: it's used for mmap page-in clustering, and it's used for swap
page-in clustering, and they probably have rather different behaviours.

Setting page-cluster to 2 means that both mmap and page-in will cluster
only four pages, which might slow down mmap throughput when not swapping
(and make program loading in particular slow down under disk load).  At
the same time it's probably perfectly fine for swapping - I think
Marcelo eventually wants to re-do the swapin read-clustering anyway. 

And wall-clock time apparently did decrease with page-clustering
lowered, although personally I like latency more than throughput so that
doesn't really bother me.

However, I'd really like to know whether it is mmap or swap clustering
that matters more, so it would be interesting to hear what happens if
you remove the "swapin_readahead(entry)" line in mm/memory.c (in
do_swap_page()).  Does a large page-cluster value still make matters
worse when it's disabled for swapping? (In other words: does
page-cluster actually hurt for mmap too, or is the problem strictly
related to swapping?)

Willing to test your load?

	Thanks,
		Linus


* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  2:09     ` Andrea Arcangeli
@ 2001-10-18 19:36       ` bill davidsen
  2001-10-18 23:27         ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: bill davidsen @ 2001-10-18 19:36 UTC (permalink / raw)
  To: linux-kernel

In article <20011017040907.A2380@athlon.random> andrea@suse.de wrote:
>On Wed, Oct 17, 2001 at 09:32:12AM +0800, Beau Kuiper wrote:

>> Swapping too much probably has a lot to do with a particular hard drive
>> and its performance. Is there any way of adding a configurable option (via
>> sysctl) to allow administrators to tune how aggressively the kernel
>> swaps out data vs. throwing out the disk cache?  (So if it is set to
>> aggressive, the kernel will try hard to use swap to free up
>> memory, and if it is set to conservative it will try to free disk cache (to
>> a limit) instead of swapping stuff out to free memory.)
>
>I could add a sysctl to control that. In short the change consists of
>making the DEF_PRIORITY in mm/vmscan.c a variable rather than a
>preprocessor #define. That's the "ratio" number I was talking about in
>the last email to Rik, and if you read ac/mm/vmscan.c you'll find it
>there too indeed.

I think that would give people a sense of control.

>That's basically the only number that I left in the code; everything
>else should be completely dynamic behaviour. Anyway, this number
>isn't critical either; as said, it shouldn't make a huge difference,
>but yes, it could be tunable.
>
>However, one of the reasons I didn't do that is that I still believe the VM
>should be autotuning and provide behaviour based on concepts, not on
>random tweaking. But I cannot imagine at the moment how to make even
>such a fixed number go away :), so at the moment it could make some
>sense to make it a sysctl.

  I think it's desirable for the VM to run well on autopilot in as many
cases as possible, because Linux is going to be used by some very
non-technical users. However, it is also strong in small machines, old
PCs, embedded uses, etc. These would benefit from tuning the ratio of
swap and buffer use, and also from being able to specify a large
available page pool, for applications which suddenly need memory which
is a large percentage of physical memory.

| The probe of the cache that lets me start swapouts before we have really
| failed to shrink the cache doesn't sound like random tweaking either to
| me (maybe I'm biased 8); it instead allows us to free memory and swap out at
| the very same time, and this seems beneficial.

  If it can work well in common cases for average users, and still allow
tuning by people who have special needs, I'm all for it. I'm sure you
understand the problems with self-tuning VM as well as anyone; I just
want to suggest that for uncommon situations you provide a way for
knowledgeable users to handle special situations which need info not
available to the VM otherwise.

-- 
bill davidsen <davidsen@tmr.com>
  His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.


* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-17  4:48     ` rwhron
  2001-10-17 16:27       ` Linus Torvalds
@ 2001-10-18 19:45       ` Bill Davidsen
  1 sibling, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2001-10-18 19:45 UTC (permalink / raw)
  To: linux-kernel

In article <20011017004839.A15996@earthlink.net> rwhron@earthlink.net wrote:
>On Wed, Oct 17, 2001 at 04:31:03AM +0200, Andrea Arcangeli wrote:
>> I noticed that another thing that changed between vanilla 2.4.13pre2 and
>> 2.4.13pre3 is the setting of page_cluster on machines with lots of RAM.
>> 
>> You'll now find the page_cluster set to 6, that means "1 << 6 << 12"
>> bytes will be paged in at each major fault, while previously only "1 <<
>> 4 << 12" bytes were paged in.
>> 
>> So I'd suggest to try again after "echo 4 > /proc/sys/vm/page-cluster"
>> to see if it makes any difference.
>> 
>> Andrea
>
>You Rule!
>
>The tweak to page-cluster is basically magic for this test.

Out of curiousity, did you play with the 'preempt' patch at all?

-- 
bill davidsen <davidsen@tmr.com>
  His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1)
  2001-10-18 19:36       ` bill davidsen
@ 2001-10-18 23:27         ` Andrea Arcangeli
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2001-10-18 23:27 UTC (permalink / raw)
  To: bill davidsen; +Cc: linux-kernel

On Thu, Oct 18, 2001 at 03:36:21PM -0400, bill davidsen wrote:
> In article <20011017040907.A2380@athlon.random> andrea@suse.de wrote:
> >On Wed, Oct 17, 2001 at 09:32:12AM +0800, Beau Kuiper wrote:
> 
> >> Swapping too much probably has a lot to do with a particular hard drive
> >> and its performance. Is there any way of adding a configurable option (via
> >> sysctl) to allow administrators to tune how aggressively the kernel
> >> swaps out data versus throwing out the disk cache? If it is set to
> >> aggressive, the kernel will try hard to use swap to free up memory;
> >> if it is set to conservative, it will try to free disk cache (to
> >> a limit) instead of swapping stuff out to free memory.
> >
> >I could add a sysctl to control that. In short, the change consists of
> >making DEF_PRIORITY in mm/vmscan.c a variable rather than a
> >preprocessor #define. That's the "ratio" number I was talking about in
> >the last email to Rik; if you read ac/mm/vmscan.c you'll find it
> >there too.
> 
> I think that would give people a sense of control.

ok, I added three sysctls:

andrea@laser:/misc/andrea-athlon > ls /proc/sys/vm/vm_*             
/proc/sys/vm/vm_balance_ratio  /proc/sys/vm/vm_mapped_ratio /proc/sys/vm/vm_scan_ratio
andrea@laser:/misc/andrea-athlon > 

with some commentary in the source code:

/*
 * The "vm_scan_ratio" is how much of the queues we will scan
 * in one go. A value of 6 for vm_scan_ratio implies that we'll
 * scan 1/6 of the inactive list during a normal aging round.
 */
int vm_scan_ratio = 8;

/*
 * The "vm_mapped_ratio" controls when to start early-paging, we probe
 * the inactive list during shrink_cache() and if there are too many
 * mapped unfreeable pages we have an indication that we'd better
 * start paging. The bigger vm_mapped_ratio is, the earlier the
 * machine will run into swapping activity.
 */
int vm_mapped_ratio = 32;

/*
 * The "vm_balance_ratio" controls the balance between active and
 * inactive cache. The bigger vm_balance_ratio is, the easier the
 * active cache will grow, because we'll rotate the active list
 * slowly. A value of 4 means we'll go towards a balance of
 * 1/5 of the cache being inactive.
 */
int vm_balance_ratio = 16;

I'm still testing though, so it's not guaranteed that the above will
remain the same :).

>   If it can work well in common cases for average users, and still allow
> tuning by people who have special needs, I'm all for it. I'm sure you
> understand the problems with self-tuning VM as well as anyone; I just
> want to suggest that for uncommon situations you provide a way for
> knowledgeable users to handle special situations which need info not
> otherwise available to the VM.

ok. Another argument is that by making those sysctls tunable, people can
test and report the best numbers for their workloads.
Those would be fixed numbers anyway; they're magic values, not perfect
ones. They tend to do the right thing, and changing them slightly
isn't going to make a big difference if the machine has enough RAM for
doing its work.

Andrea

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2001-10-18 23:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-16 12:16 VM test on 2.4.13-pre3aa1 (compared to 2.4.12-aa1 and 2.4.13-pre2aa1) rwhron
2001-10-17  0:12 ` Andrea Arcangeli
2001-10-17  1:32   ` Beau Kuiper
2001-10-17  2:09     ` Andrea Arcangeli
2001-10-18 19:36       ` bill davidsen
2001-10-18 23:27         ` Andrea Arcangeli
2001-10-17  2:31   ` Andrea Arcangeli
2001-10-17  4:48     ` rwhron
2001-10-17 16:27       ` Linus Torvalds
2001-10-17 15:19         ` Marcelo Tosatti
2001-10-18 19:45       ` Bill Davidsen
2001-10-17  3:59   ` rwhron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox