public inbox for linux-kernel@vger.kernel.org
* VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)
@ 2001-10-28 17:07 rwhron
  2001-10-29  0:47 ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: rwhron @ 2001-10-28 17:07 UTC (permalink / raw)
  To: linux-kernel, ltp-list


Summary:	2.4.14pre3aa2 gave oom errors not seen in 2.4.14pre3aa1.

Test:	Usual scripts to execute mtest01 and mmap001.  
	Listen to long mp3 with mp3blaster.

mtest01 -p 80 -w
================

2.4.14pre3aa1

Averages for 10 mtest01 runs
bytes allocated:                    1246232576
User time (seconds):                2.105
System time (seconds):              2.773
Elapsed (wall clock) time:          59.503
Percent of CPU this job got:        7.80
Major (requiring I/O) page faults:  132.8
Minor (reclaiming a frame) faults:  305043.1

2.4.14pre3aa2

Averages for 10 mtest01 runs
bytes allocated:                    1254201753
User time (seconds):                2.211
System time (seconds):              2.794
Elapsed (wall clock) time:          65.176
Percent of CPU this job got:        7.20
Major (requiring I/O) page faults:  129.7
Minor (reclaiming a frame) faults:  306988.9


mmap001 -m 500000
=================

This test worked on 2.4.14pre3aa1, but on 2.4.14pre3aa2, each
iteration was terminated by signal 9.  Also an irc client I
had running was killed.  

/var/log/kern.log had these messages:

Oct 28 11:50:24 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Oct 28 11:50:24 rushmore kernel: VM: killing process mmap001
Oct 28 11:51:07 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Oct 28 11:51:07 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Oct 28 11:51:08 rushmore last message repeated 3 times
Oct 28 11:51:08 rushmore kernel: VM: killing process bx
Oct 28 11:51:09 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Oct 28 11:51:12 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Oct 28 11:51:13 rushmore last message repeated 2 times
Oct 28 11:51:13 rushmore kernel: VM: killing process mmap001
Oct 28 11:51:47 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Oct 28 11:51:47 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Oct 28 11:51:47 rushmore kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Oct 28 11:51:47 rushmore kernel: VM: killing process mmap001


Hardware:
AMD Athlon 1333
512 MB RAM
1024 MB swap.

-- 
Randy Hron


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)
  2001-10-28 17:07 VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1) rwhron
@ 2001-10-29  0:47 ` Andrea Arcangeli
  2001-10-29  2:45   ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2001-10-29  0:47 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list

On Sun, Oct 28, 2001 at 12:07:21PM -0500, rwhron@earthlink.net wrote:
> 
> Summary:	2.4.14pre3aa2 gave oom errors not seen in 2.4.14pre3aa1.
> 
> Test:	Usual scripts to execute mtest01 and mmap001.  
> 	Listen to long mp3 with mp3blaster.
> 
> mtest01 -p 80 -w
> ================
> 
> 2.4.14pre3aa1
> 
> Averages for 10 mtest01 runs
> bytes allocated:                    1246232576
> User time (seconds):                2.105
> System time (seconds):              2.773
> Elapsed (wall clock) time:          59.503
> Percent of CPU this job got:        7.80
> Major (requiring I/O) page faults:  132.8
> Minor (reclaiming a frame) faults:  305043.1
> 
> 2.4.14pre3aa2
> 
> Averages for 10 mtest01 runs
> bytes allocated:                    1254201753
> User time (seconds):                2.211
> System time (seconds):              2.794
> Elapsed (wall clock) time:          65.176
> Percent of CPU this job got:        7.20
> Major (requiring I/O) page faults:  129.7
> Minor (reclaiming a frame) faults:  306988.9

I'm looking into optimizing this test. While it probably doesn't affect
the above numbers, given that the bytes allocated are quite similar, the
benchmark is not reliable: if you want to use it as a benchmark you
should apply this patch first, to make sure you compare apples to apples
(not to oranges). For example, without these fixes it allocates only
20 MB of ram here, so it cannot swap out despite my using -p 80, because
it considers only the free swap and free memory, while on any real load
most of the free memory will be allocated as cache most of the time.

In short, the benchmark measures how fast we can push stuff to disk
(just the swapouts, not the swapins).

Index: mtest01.c
===================================================================
RCS file: /cvsroot/ltp/ltp/testcases/kernel/mem/mtest01/mtest01.c,v
retrieving revision 1.1
diff -u -r1.1 mtest01.c
--- mtest01.c	2001/08/27 22:15:12	1.1
+++ mtest01.c	2001/10/29 00:32:08
@@ -69,24 +69,9 @@
   }
 
   if(maxpercent) {
-    unsigned long int D, C;
     sysinfo(&sstats);
-    maxbytes = ((float)maxpercent/100)*(sstats.totalram+sstats.totalswap) - ((sstats.totalram+sstats.totalswap)-(sstats.freeram+sstats.freeswap));
-    /* Total memory needed to reach maxpercent */
-    D = ((float)maxpercent/100)*(sstats.totalram+sstats.totalswap);
-
-    /* Total memory already used */
-    C = (sstats.totalram+sstats.totalswap)-(sstats.freeram+sstats.freeswap);
-
-    /* Are we already using more than maxpercent? */
-    if(C>D) {
-      printf("More memory than the maximum amount you specified is already being used\n");
-      exit(1);
-    }
-
-    /* set maxbytes to the extra amount we want to allocate */
-    maxbytes = D-C;
-    printf("Filling up %d%%  of ram which is %lud bytes\n",maxpercent,maxbytes);
+    maxbytes = ((float)maxpercent/100)*(sstats.totalram+sstats.totalswap);
+    printf("Filling up %d of ram which is %lu bytes\n",maxpercent,maxbytes);
   }
 
   bytecount=chunksize;


As for the mmap001 failures, they're due to the max_mapped logic (the
one that was supposed to improve the swapout): I break the loop, but I
don't account for the fact that I haven't scanned all of
nr_inactive/vm_scan_ratio. It triggers only with mmap001 because with
mmap001 the lru gets filled by mapped pages.

thanks for the feedback,

Andrea


* Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)
  2001-10-29  0:47 ` Andrea Arcangeli
@ 2001-10-29  2:45   ` Andrea Arcangeli
  2001-10-29  3:29     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2001-10-29  2:45 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list, Linus Torvalds

On Mon, Oct 29, 2001 at 01:47:15AM +0100, Andrea Arcangeli wrote:
> I'm looking into optimizing this test. While it probably doesn't affect

OK, here is some feedback from my side on this swap bandwidth benchmark.

hardware: 4G of ram and 2G of swap, 4-way (provided by osdlab.org, thanks!)

2.4.14pre3aa3 (not yet released; it should fix the oom failure with
mmap001, though I haven't tested that yet)

I started 3 swap-bench-pause tasks, which put the machine into this state
(4G allocated and 450M swapped out, all tasks stopped):

 0  0  0 450012   4100    136   3512   0   0     0     0  113    30   0   0 100
 0  0  0 450012   4084    136   3512   0   0     0     0  117    70   0   0 100

Now I started a copy of 'mtest01 -b $[1024*1024*512] -w', which swaps out
an additional 512 MB.

PASS ... 536870912 bytes allocated.

real    1m9.254s
user    0m9.240s
sys     0m2.410s

Repeat the same again (kill the previous swap-bench-pause tasks, start
them again, wait to return to 450M swapped out with all ram in anonymous
memory, and finally run mtest01 again). OK, ready again to start the
benchmark (with my previous patch applied, though it doesn't matter
since I have to use -b anyway: on a 4G+2G machine the -p logic
overflows :)

 0  0  0 450564   4724    160   3324   0   0     0     0  104     6   0   0 100
 0  0  0 450564   4720    160   3324   0   0     0     0  103     6   0   0 100

andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w     
PASS ... 536870912 bytes allocated.

real    1m8.655s
user    0m9.370s
sys     0m2.250s
andrea@dev4-000:~> 

last lines of vmstat 1 while the benchmark finishes.

 0  1  1 932696   2720    156   3324   0 7424     4  7428  187   115   4   6  89
 0  1  1 941784   2816    156   3324   0 8732     0  8732  187   132   4   3  93
 0  1  1 949464   4840    156   3324   0 7648     0  7648  188   120   3   2  95
 0  1  1 956888   3584    156   3324   0 7612     0  7612  185   114   5   1  94
 0  1  1 964824   2816    156   3324   0 7776    16  7776  191   134   4   1  95
 0  1  0 972120   4736    156   3324   0 7512     0  7512  188   120   3   1  95
 0  0  0 975828 529852    160   3344   0 3728    28  3728  154    81   3   4  93
 0  0  0 975828 529852    160   3344   0   0     0     0  103     6   0   0 100

Everything ran smoothly (it's not practical to play mp3s remotely [or
rather, to check that they don't skip :], but the vmstat beats also give
interactive feedback and they were fine, none skipped).

So, in short: 1m 9s to swap out exactly 512M, with 4.5G worth of address
space mapped in memory. System time used is 2 seconds and user time is 9
seconds.

Now try again with vanilla 2.4.14pre3, without a single patch applied:

Started the three swap-bench-pause tasks to swap out the 500 MB and keep
the 4G of ram in anon memory. Very bad vmstat responsiveness: where
2.4.14pre3aa3 never skipped a beat, here it hangs all the time for
several seconds.

 3  0  0      0 682632   1844  15732   0   0     0   128  129    11   1  74  25
 3  0  0      0 356876   1844  15732   0   0     0     0  103     5   1  74  25
 3  0  0      0  30992   1844  15732   0   0     0     0  103     5   1  75  25
 3  0  1 183852   2920    132   7260   0 10328     0 10832  576  1924   0  53  47
 5  0  1 247024   3656    132   7272   0 85056     0 85016 2549  4044   0   4  96
 2  1  0 248668   2752    140   7260   0 8348     8  8360  280   328   0  14  86
 3  0  0 309060   2388    132   7268   0 72872     0 72848 2194  3532   0   6  94
 4  0  1 380220   3284    132   7264   0 68736     4 68740 2089  3842   0   7  93
 0  2  1 383428   2436    136   7260   0 14780     4 14780  448   473   0   7  93
 3  0  0 448324   2176    132   7264   0 59576     0 59576 1777  2819   0   6  94
 2  0  1 459588   4332    136   7256   0 2920     4  2920  243   417   0  30  70
 0  2  0 461124   2568    136   7244   0 5952     0  5952  189   197   0  11  89
 2  0  0 527044   2248    132   7244   0 70004     0 70004 2100  2796   0   5  95
 3  0  0 595140   2176    140   7244   0 72024     8 72028 2220  3682   0   7  93
 2  0  1 597828   3480    136   7252   0 10964     4 10964  292   283   0   1  99
 0  0  0 597948   3476    136   7244   0   0     0     0  104    89   0   7  93
 0  0  0 597948   3476    136   7244   0   0     0     0  103     6   0   0 100
 0  0  0 597948   3476    136   7244   0   0     0     0  103     6   0   0 100
 0  0  0 597948   3472    136   7244   0   0     0     0  103     8   0   0 100

Now with 512M in swap and 4G in anon memory we're ready to start the real benchmark:

andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w
PASS ... 536870912 bytes allocated.

real    1m44.269s
user    0m9.050s
sys     0m6.420s
andrea@dev4-000:~> 

Here are the last vmstat lines during the benchmark:

 0  1  1 1051388   4064    136   7092   0 8132     0  8132  191   161   1   1  98
 1  0  0 1063548   2240    136   7092   0 3636     0  3636  144   285  12   5  83
 0  4  1 1100796   2592    136   7092   0 63572     0 63564 1795  2068   1   3  96
 0  1  1 1102972   3600    136   7092   0 8220     0  8220  186   162   5   3  92
 0  1  1 1105404   4112    136   7092   0 7760     0  7760  186   154   5   3  92
 0  1  1 1107964   3460    136   7092   0 8056     0  8056  186   156   6   2  93
 0  2  1 1133180   2564    136   7092   0 48904     0 48900 1284  1543   0   1  99
 0  0  0 1091836 529988    140   7096   0 2072     8  2072  145   147   4   7  89
 0  0  0 1091836 529980    140   7096   0   0     0     0  107     8   0   0 100
 0  0  0 1091836 529980    140   7096   0   0     0     0  103     6   0   0 100
 0  0  0 1091836 529980    140   7096   0   0     0     0  103     6   0   0 100

vmstat skips beats during the benchmark too.

Repeat the whole thing again: basically the same results, and the same
vmstat skips during swapout activity:

andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w
PASS ... 536870912 bytes allocated.

real    1m39.406s
user    0m8.920s
sys     0m7.070s
andrea@dev4-000:~> 

So 2.4.14pre3aa3 takes 1m 9s to swap out 512 MB, while vanilla
2.4.14pre3 takes 1m 40s. On top of that, my tree doesn't skip a single
beat during the preparation of the benchmark or during the benchmark
itself, while mainline hangs all the time for several seconds. The
system time of vanilla 2.4.14pre3 is also 3 times larger than in
2.4.14pre3aa3. So I have no doubt my current tree is swapping out much
faster than vanilla 2.4.14pre3, at least on large-memory boxes (as said,
this machine has 4G), and as far as I can tell the offender in mainline
is the design decision to put anon pages in the lru, which wastes cpu
due to complexity problems.

I'm not sure why you got slower results with pre3aa1/pre3aa2 than with
pre3 mainline; OTOH I also made some very interesting changes in this
aa3, so maybe aa2 and aa1 were slower for other reasons (I didn't
benchmark them the above way).

And if pre3aa3 swaps out 44% faster than vanilla pre3 on a 4G box, I
expect it to swap out at least 80% faster on a 16G box running the same
test (with 16G in anon memory and 512M swapped out, then starting the
same benchmark; I haven't tested that though).

Andrea


* Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)
  2001-10-29  2:45   ` Andrea Arcangeli
@ 2001-10-29  3:29     ` Andrea Arcangeli
  2001-10-29  3:57       ` 2.4.14pre3aa3 [was Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)] Andrea Arcangeli
  2001-10-29  4:24       ` VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1) Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2001-10-29  3:29 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list, Linus Torvalds

On Mon, Oct 29, 2001 at 03:45:46AM +0100, Andrea Arcangeli wrote:
> andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w     
> PASS ... 536870912 bytes allocated.
> 
> real    1m8.655s
> user    0m9.370s
> sys     0m2.250s
> andrea@dev4-000:~> 

A new exciting result on exactly the same test (4G anon mem + 512M
swapped out, then started the benchmark):

andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w
PASS ... 536870912 bytes allocated.

real    0m40.473s
user    0m9.290s
sys     0m3.860s
andrea@dev4-000:~> 

(mainline takes 1m 40s, 1 minute more for the same thing)

I guess I cheated this time though :), see the _only_ change I made to
speed up from 68/69 seconds to exactly 40 seconds:

--- 2.4.14pre3aa2/mm/page_io.c.~1~	Tue May  1 19:35:33 2001
+++ 2.4.14pre3aa2/mm/page_io.c	Mon Oct 29 03:58:23 2001
@@ -43,10 +43,12 @@
 	struct inode *swapf = 0;
 	int wait = 0;
 
+#if 0
 	/* Don't allow too many pending pages in flight.. */
 	if ((rw == WRITE) && atomic_read(&nr_async_pages) >
 			pager_daemon.swap_cluster * (1 << page_cluster))
 		wait = 1;
+#endif
 
 	if (rw == READ) {
 		ClearPageUptodate(page);
@@ -75,10 +77,12 @@
 	} else {
 		return 0;
 	}
+#if 0
  	if (!wait) {
  		SetPageDecrAfter(page);
  		atomic_inc(&nr_async_pages);
  	}
+#endif
 
  	/* block_size == PAGE_SIZE/zones_used */
  	brw_page(rw, page, dev, zones, block_size);

I found we were hurt by not being able to use the full I/O pipeline for
swapouts the way we do for writes.

Now it swaps out constantly and regularly at 12.8 Mbyte/sec (still
smooth); the write throttling happens at the PG_launder layer, as for
MAP_SHARED.

hdparm -t on the swap partition says 27 Mbyte/sec, but that's
unrealistic, at least for writes: a cp flood runs at most at 17
Mbyte/sec on the scsi disk we also swap out to.  Without the above
change it swaps out at 7.5 Mbyte/sec instead of 12.8 Mbyte/sec.  12.8
Mbyte/sec seems acceptable, also considering that the pagetable walking
etc. is more costly than a straight generic_file_write + balance_dirty.

I'm aware of the implications of the above: we may empty the pfmemalloc
pool, but that should mostly just cause some sched_yields, and it still
runs stable during this test at least. I'd rather fix any such places to
sched_yield than run at 7.5 Mbyte/sec.

But my strongest argument that this isn't a cheat (and that it's
backwards compatible, not too risky for 2.4) is that we already don't
use nr_async_pages during pageout to disk of MAP_SHARED segments, so why
should we use it for pageout of anonymous memory, which doesn't even
need to pass through the fs (in most setups with a proper swap
partition)? As long as MAP_SHARED is correct, page_io.c shouldn't need
it either. Comments?

Andrea


* 2.4.14pre3aa3 [was Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)]
  2001-10-29  3:29     ` Andrea Arcangeli
@ 2001-10-29  3:57       ` Andrea Arcangeli
  2001-10-30  0:10         ` rwhron
  2001-10-29  4:24       ` VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1) Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2001-10-29  3:57 UTC (permalink / raw)
  To: rwhron; +Cc: linux-kernel, ltp-list, Linus Torvalds

On Mon, Oct 29, 2001 at 04:29:38AM +0100, Andrea Arcangeli wrote:
> On Mon, Oct 29, 2001 at 03:45:46AM +0100, Andrea Arcangeli wrote:
> > andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w     
> > PASS ... 536870912 bytes allocated.
> > 
> > real    1m8.655s
> > user    0m9.370s
> > sys     0m2.250s
> > andrea@dev4-000:~> 
> 
> new exciting result on exactly the same test (4Ganon mem +512m swap,
> then started the bench):
> 
> andrea@dev4-000:~> time ./mtest01 -b $[1024*1024*512] -w
> PASS ... 536870912 bytes allocated.
> 
> real    0m40.473s
> user    0m9.290s
> sys     0m3.860s
> andrea@dev4-000:~> 
> 
> (mainline takes 1m 40s, 1 minute more for the same thing)
> 
> I guess I cheated this time though :), see the _only_ change that I did to
> speedup from 68/69 seconds to exactly 40 seconds:

I uploaded a new 2.4.14pre3aa3 patchkit with this and the other changes
included (again, under the rule that if this is wrong, MAP_SHARED was
just broken in the first place), so you may want to give it a spin and
see if it goes better now. Of course, also make sure you are running a
reliable benchmark by always using the -b option, or apply my patch to
fix the breakage of the benchmark's -p option. Background load matters
too, so no updatedb or netscape or whatever: just mp3blaster and the
benchmark with -b or with the fixed -p, so we make sure to compare
apples to apples :). thanks!

(This isn't very well tested [my desktop still runs 2.4.14pre3aa2], but
the 4-way 4G+2G box at osdlab is under heavy swapout load and doesn't
complain yet:

 1  2  0 449880   4984    164   1644 7092 5760  7092  5760  371   528  25   2  73
 1  2  0 449696   4724    164   1644 6912 6528  6912  6528  356   505  25   2  74
 1  2  1 450656   5504    168   1644 6112 6912  6116  6912  329   452  25   2  73
 1  2  1 450340   5248    164   1644 6924 6400  6924  6404  362   511  25   1  74
 1  2  1 450248   5188    164   1644 6632 6400  6632  6400  350   492  25   1  74
 1  2  1 449988   4900    164   1644 6544 6144  6544  6144  349   485  25   1  74
 1  2  0 449340   4288    164   1644 6432 5600  6432  5600  345   479  25   2  73
 1  2  2 450296   5432    168   1644 5632 6432  5636  6432  321   432  25   1  74
 2  1  0 449924   4652    164   1644 6028 5360  6028  5364  330   440  19   8  73
 1  2  0 450276   5264    164   1644 6036 6400  6036  6400  339   476  25   1  74
 1  2  1 450744   5428    164   1644 6696 6928  6696  6928  356   496  25   1  74
 1  2  1 450328   5244    164   1644 6848 6268  6848  6268  353   514  25   2  73
 1  2  2 450156   4924    168   1644 6912 6652  6916  6652  358   510  25   1  74
 1  2  0 449876   4608    164   1644 6180 5640  6180  5644  337   464  25   1  74
 1  2  0 450152   4976    164   1644 6284 6516  6284  6516  338   468  25   2  73
 1  2  1 451188   5184    164   1648 6560 6624  6560  6624  347   474  25   1  74
 1  2  1 448620   3480    164   1644 6748 4780  6748  4780  353   481  25  18  57
 1  2  0 449520   4348    164   1644 5868 6528  5868  6528  330   450  25   0  75
 1  2  0 450460   5012    164   1644 6240 7016  6240  7016  343   459  25   1  74
 1  2  0 450524   5028    164   1644 6616 6552  6616  6552  353   484  25   1  74
 1  2  0 449928   4620    164   1644 6952 6136  6952  6136  365   511  25   2  73
 1  2  0 450036   4936    164   1644 6336 6408  6336  6408  341   460  25   3  72
 1  2  1 450560   5156    168   1644 6172 6400  6176  6400  335   466  25   1  74

again the sane 12/13 Mbyte/sec of bandwidth, instead of the previous 7 Mbyte/sec)

The oom mmap001 failures should be cured (untested though); as said, the
bug was pretty obvious once I got the report, thanks!

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.14pre3aa3.bz2
	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.14pre3aa3/

Only in 2.4.14pre3aa2: 00_files_struct_rcu-2.4.10-04-3
Only in 2.4.14pre3aa3: 00_files_struct_rcu-2.4.10-04-4

	Fixed missing var initialization.

Only in 2.4.14pre3aa2: 10_vm-6
Only in 2.4.14pre3aa3: 10_vm-7
Only in 2.4.14pre3aa3: 10_vm-7.1

	Further vm changes, should fix mmap001 failures and improve
	swapout performance.

Only in 2.4.14pre3aa3: 52_u64-1

	Minor compile fix for uml.

BTW, you can still tweak the /proc/sys/vm/vm_* parameters; there's
updated commentary in mm/vmscan.c. The default values should be sane. As
usual, a one-unit change isn't going to make a relevant difference;
those numbers don't need to be perfect.

Andrea


* Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)
  2001-10-29  3:29     ` Andrea Arcangeli
  2001-10-29  3:57       ` 2.4.14pre3aa3 [was Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)] Andrea Arcangeli
@ 2001-10-29  4:24       ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2001-10-29  4:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: rwhron, linux-kernel, ltp-list


On Mon, 29 Oct 2001, Andrea Arcangeli wrote:
>
> I guess I cheated this time though :), see the _only_ change that I did to
> speedup from 68/69 seconds to exactly 40 seconds:

That's not really cheating - I think it's the right thing to do.

The whole "synchronous wait" thing is for historical reasons, and for VM's
that didn't throttle on their own. I actually think that _not_ waiting is
the right thing, because the new VM throttles when it thinks it needs to,
so other waiting is just going to hurt.

As long as interactive behaviour is fine under load, removing these hacks
is a _good_ thing, not a cheat.

		Linus



* Re: 2.4.14pre3aa3 [was Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)]
  2001-10-29  3:57       ` 2.4.14pre3aa3 [was Re: VM test on 2.4.14pre3aa2 (compared to 2.4.14pre3aa1)] Andrea Arcangeli
@ 2001-10-30  0:10         ` rwhron
  0 siblings, 0 replies; 7+ messages in thread
From: rwhron @ 2001-10-30  0:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, ltp-list, Linus Torvalds

On Mon, Oct 29, 2001 at 04:57:47AM +0100, Andrea Arcangeli wrote:
> BTW, you can still tweak the /proc/sys/vm/vm_* parameters; there's
> updated commentary in mm/vmscan.c. The default values should be sane. As
> usual, a one-unit change isn't going to make a relevant difference;
> those numbers don't need to be perfect.
> 
> Andrea

I haven't tried the patched version of the LTP tests yet.
This is with 2.4.14-pre3aa4.

Summary:

mtest01
2.4.14-pre3	Elapsed (wall clock) time:          30.517
2.4.14-pre3aa2	Elapsed (wall clock) time:          65.176
2.4.14-pre3aa4	Elapsed (wall clock) time:          37.277

mmap001
2.4.14-pre3	Elapsed (wall clock seconds) time:  171.45
2.4.14-pre3aa2	terminated with signal 9 
2.4.14-pre3aa4	Elapsed (wall clock seconds) time:  170.47

2.4.14-pre3 had the best interactive feel and mp3 sound.

Test:	"mtest01 -p 80 -w" and "mmap001 -m 500000"
	Play mp3 sampled at 128k with mp3blaster.
	Light bitchx use (2 sessions), lynx, 52k link to net.
	vmstat 8, iostat 10, no X.  (typical load for these tests).
	Change page-cluster from default (3) to 2.


2.4.14pre3aa4 page-cluster=3

mp3 played 275 seconds of 373 second run.

Averages for 10 mtest01 runs
bytes allocated:                    1247805440
User time (seconds):                2.105
System time (seconds):              2.893
Elapsed (wall clock) time:          37.277
Percent of CPU this job got:        12.80
Major (requiring I/O) page faults:  131.9
Minor (reclaiming a frame) faults:  305428.0

mp3 played 800 seconds of 850 second run.

Average for 5 mmap001 runs
bytes allocated:                    2048000000
User time (seconds):                19.502
System time (seconds):              14.312
Elapsed (wall clock seconds) time:  170.47
Percent of CPU this job got:        19.20
Major (requiring I/O) page faults:  500166.4
Minor (reclaiming a frame) faults:  44.2


2.4.14pre3aa4 page-cluster=2

mp3 played 295 seconds of 388 second run

Averages for 10 mtest01 runs
bytes allocated:                    1249692876
User time (seconds):                2.092
System time (seconds):              2.845
Elapsed (wall clock) time:          38.753
Percent of CPU this job got:        12.40
Major (requiring I/O) page faults:  131.9
Minor (reclaiming a frame) faults:  305888.8

mp3 played 810 seconds of 855 second run

Average for 5 mmap001 runs
bytes allocated:                    2048000000
User time (seconds):                19.420
System time (seconds):              14.878
Elapsed (wall clock seconds) time:  170.96
Percent of CPU this job got:        19.60
Major (requiring I/O) page faults:  500167.0
Minor (reclaiming a frame) faults:  41.2

Hardware:
AMD 1333 Athlon
512 MB RAM
1024 MB swap.

-- 
Randy Hron


