public inbox for linux-kernel@vger.kernel.org
* IO degradation in 2.4.17-pre2 vs. 2.4.16
@ 2001-12-01 19:16 Jason Holmes
  2001-12-01 21:34 ` Andrew Morton
  2001-12-03 19:22 ` Marcelo Tosatti
  0 siblings, 2 replies; 11+ messages in thread
From: Jason Holmes @ 2001-12-01 19:16 UTC (permalink / raw)
  To: linux-kernel

I saw in a previous thread that the interactivity improvements in
2.4.17-pre2 had some adverse effect on IO throughput and since I was
already evaluating 2.4.16 for a somewhat large fileserving project, I
threw 2.4.17-pre2 on to see what has changed.  Throughput while serving
a large number of clients is important to me, so my tests have included
using dbench to try to see how things scale as clients increase.
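
The scaling runs below were of roughly this shape (a sketch, not the
exact harness; dbench flags and the filesystem under test may have
differed):

```shell
# One dbench pass per client count; each pass prints a line like
# "Throughput X MB/sec (...)  N procs".  The guard lets the loop
# finish cleanly on machines without dbench installed.
for n in 1 2 4 8 16 32 64 128; do
    if command -v dbench >/dev/null 2>&1; then
        dbench "$n"
    fi
done
```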

2.4.16:

Throughput 116.098 MB/sec (NB=145.123 MB/sec  1160.98 MBit/sec)  1 procs
Throughput 206.604 MB/sec (NB=258.255 MB/sec  2066.04 MBit/sec)  2 procs
Throughput 210.364 MB/sec (NB=262.955 MB/sec  2103.64 MBit/sec)  4 procs
Throughput 213.397 MB/sec (NB=266.747 MB/sec  2133.97 MBit/sec)  8 procs
Throughput 210.989 MB/sec (NB=263.736 MB/sec  2109.89 MBit/sec)  16
procs
Throughput 138.713 MB/sec (NB=173.391 MB/sec  1387.13 MBit/sec)  32
procs
Throughput 117.729 MB/sec (NB=147.162 MB/sec  1177.29 MBit/sec)  64
procs
Throughput 66.7354 MB/sec (NB=83.4193 MB/sec  667.354 MBit/sec)  128
procs

2.4.17-pre2:

Throughput 96.2302 MB/sec (NB=120.288 MB/sec  962.302 MBit/sec)  1 procs
Throughput 226.679 MB/sec (NB=283.349 MB/sec  2266.79 MBit/sec)  2 procs
Throughput 223.955 MB/sec (NB=279.944 MB/sec  2239.55 MBit/sec)  4 procs
Throughput 224.533 MB/sec (NB=280.666 MB/sec  2245.33 MBit/sec)  8 procs
Throughput 153.672 MB/sec (NB=192.09 MB/sec  1536.72 MBit/sec)  16 procs
Throughput 91.3464 MB/sec (NB=114.183 MB/sec  913.464 MBit/sec)  32
procs
Throughput 64.876 MB/sec (NB=81.095 MB/sec  648.76 MBit/sec)  64 procs
Throughput 54.9774 MB/sec (NB=68.7217 MB/sec  549.774 MBit/sec)  128
procs

Throughput 136.522 MB/sec (NB=170.652 MB/sec  1365.22 MBit/sec)  1 procs
Throughput 223.682 MB/sec (NB=279.603 MB/sec  2236.82 MBit/sec)  2 procs
Throughput 222.806 MB/sec (NB=278.507 MB/sec  2228.06 MBit/sec)  4 procs
Throughput 224.427 MB/sec (NB=280.534 MB/sec  2244.27 MBit/sec)  8 procs
Throughput 152.286 MB/sec (NB=190.358 MB/sec  1522.86 MBit/sec)  16
procs
Throughput 92.044 MB/sec (NB=115.055 MB/sec  920.44 MBit/sec)  32 procs
Throughput 78.0881 MB/sec (NB=97.6101 MB/sec  780.881 MBit/sec)  64
procs
Throughput 66.1573 MB/sec (NB=82.6966 MB/sec  661.573 MBit/sec)  128
procs

Throughput 117.95 MB/sec (NB=147.438 MB/sec  1179.5 MBit/sec)  1 procs
Throughput 212.469 MB/sec (NB=265.586 MB/sec  2124.69 MBit/sec)  2 procs
Throughput 214.763 MB/sec (NB=268.453 MB/sec  2147.63 MBit/sec)  4 procs
Throughput 214.007 MB/sec (NB=267.509 MB/sec  2140.07 MBit/sec)  8 procs
Throughput 96.6572 MB/sec (NB=120.821 MB/sec  966.572 MBit/sec)  16
procs
Throughput 48.1342 MB/sec (NB=60.1677 MB/sec  481.342 MBit/sec)  32
procs
Throughput 71.3444 MB/sec (NB=89.1806 MB/sec  713.444 MBit/sec)  64
procs
Throughput 59.258 MB/sec (NB=74.0724 MB/sec  592.58 MBit/sec)  128 procs

I have included three runs for 2.4.17-pre2 to show how inconsistent its
results are; 2.4.16 didn't have this problem to this extent.  bonnie++
numbers seem largely unchanged between kernels, coming in around:

      ------Sequential Output------ --Sequential Input- --Random-
      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
2512M 14348  81 49495  26 24438  16 16040  96 55006  15 373.7   1
      ------Sequential Create------ --------Random Create--------
      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
   16  3087  99 +++++ +++ +++++ +++  3175 100 +++++ +++ 11042 100

The test machine is an IBM 342 with 2 1.26 GHz P3 processors and 1.25 GB
of RAM.  The above numbers were generated off of 1 10K RPM SCSI disk
hanging off of an Adaptec aic7899 controller.

--
Jason Holmes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-01 19:16 Jason Holmes
@ 2001-12-01 21:34 ` Andrew Morton
  2001-12-01 22:35   ` Jason Holmes
  2001-12-03 19:22 ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2001-12-01 21:34 UTC (permalink / raw)
  To: Jason Holmes; +Cc: linux-kernel

Jason Holmes wrote:
> 
> I saw in a previous thread that the interactivity improvements in
> 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> already evaluating 2.4.16 for a somewhat large fileserving project, I
> threw 2.4.17-pre2 on to see what has changed.  Throughput while serving
> a large number of clients is important to me, so my tests have included
> using dbench to try to see how things scale as clients increase.
> 
> 2.4.16:
> 
> ...
> Throughput 210.989 MB/sec (NB=263.736 MB/sec  2109.89 MBit/sec)  16 procs
> ...
> 
> 2.4.17-pre2:
> 
> ...
> Throughput 153.672 MB/sec (NB=192.09 MB/sec  1536.72 MBit/sec)  16 procs
> ...

This is expected, and tunable.

The thing about dbench is this:  it creates files and then it
quickly deletes them.  It is really, really important to understand
this!

If the kernel allows processes to fill all of memory with dirty
data and to *not* start IO on that data, then this really helps
dbench, because when the delete comes along, that data gets tossed
away and is never written.

If you have enough memory, an entire dbench run can be performed
and it will do no disk IO at all.
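
A toy illustration of that create-then-delete pattern (not dbench
itself; path and size here are arbitrary):

```shell
# Write some data, then delete the file before the kernel begins
# writeout: the dirty pages are simply discarded and the data never
# needs to reach the disk.
dd if=/dev/zero of=/tmp/shortlived bs=1M count=8 2>/dev/null
rm /tmp/shortlived
```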
 
The 2.4.17-pre2 change meant that the kernel starts writeout of
dirty data earlier, and will cause the writer to block, to
prevent it from filling all memory with write() data.  This is
how the kernel is actually supposed to work, but it wasn't working
right, and the mistake benefitted dbench.  The net effect is that
a dbench run does a lot more IO.

If your normal operating workload creates files and does *not* 
delete them within a few seconds, then the -pre2 change won't
make much difference at all, as your bonnie++ figures show.

If your normal operating workloads _does_ involve very short-lived
files then you can optimise for that load by increasing the
kernel's dirty buffer thresholds:

mnm:/home/akpm> cat /proc/sys/vm/bdflush 
40      0       0       0       500     3000    60      0       0
^^                                              ^^
  nfract                                          nfract_sync

These two numbers are percentages.

nfract: percentage of physical memory at which a write()r will
start writeout.

nfract_sync: percentage of physical memory at which a write()r
will block on some writeout (writer throttling).
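
In byte terms, using the default values above on a hypothetical
1.25 GB machine (back-of-the-envelope only):

```shell
# nfract=40 and nfract_sync=60 are the defaults shown above; mem_kb
# is a hypothetical 1.25 GB of RAM, expressed in kB.
nfract=40; nfract_sync=60
mem_kb=$((1280 * 1024))
echo "writeout starts at ~$((mem_kb * nfract / 100)) kB of dirty data"
echo "writers block at  ~$((mem_kb * nfract_sync / 100)) kB of dirty data"
```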

You'll find that running

	echo 80 0 0 0 500 3000 90 0 0  > /proc/sys/vm/bdflush

will boost your dbench throughput muchly.

dbench is a good stability and stress tester.  It is not a good
benchmark, and it is not representative of most real-world
workloads.

-


* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-01 21:34 ` Andrew Morton
@ 2001-12-01 22:35   ` Jason Holmes
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Holmes @ 2001-12-01 22:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> 
> Jason Holmes wrote:
> >
> > I saw in a previous thread that the interactivity improvements in
> > 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> > already evaluating 2.4.16 for a somewhat large fileserving project, I
> > threw 2.4.17-pre2 on to see what has changed.  Throughput while serving
> > a large number of clients is important to me, so my tests have included
> > using dbench to try to see how things scale as clients increase.
> >
> > 2.4.16:
> >
> > ...
> > Throughput 210.989 MB/sec (NB=263.736 MB/sec  2109.89 MBit/sec)  16 procs
> > ...
> >
> > 2.4.17-pre2:
> >
> > ...
> > Throughput 153.672 MB/sec (NB=192.09 MB/sec  1536.72 MBit/sec)  16 procs
> > ...
> 
> This is expected, and tunable.
> 
> The thing about dbench is this:  it creates files and then it
> quickly deletes them.  It is really, really important to understand
> this!
> 
> If the kernel allows processes to fill all of memory with dirty
> data and to *not* start IO on that data, then this really helps
> dbench, because when the delete comes along, that data gets tossed
> away and is never written.
> 
> If you have enough memory, an entire dbench run can be performed
> and it will do no disk IO at all.

Yeah, I was basically treating the lower process runs (<64) as in-memory
performance and the higher process runs as a mix (since, for example,
the 128 run deals with ~8 GB of data and I only have 1.25 GB of RAM).

> ...
>
> You'll find that running
> 
>         echo 80 0 0 0 500 3000 90 0 0  > /proc/sys/vm/bdflush
> 
> will boost your dbench throughput muchly.

Yeah, actually, I've been sorta "brute-forcing" the bdflush and
max-readahead space (or the part of it that I chose for a start) over
the past few days for bonnie++ and dbench.  The idea was to use these
quicker-running benchmarks to get a general idea of good values to use
and then zero in on the final values with longer, more real-world load. 
I was thinking that bonnie++ would at least give me an idea of
sequential read/write performance for files larger than RAM (one part of
the typical workload I see is moving large files out to multiple [32-64
or so] machines at the same time) and that dbench would give me an idea
of performance for many small read/write operations, both for cached and
on-disk data (another aspect of the workload I see is reading/writing
many small files from multiple machines, such as postprocessing the
results of some large computational run).  Oh, I don't think I actually
mentioned that I'm looking to tune fileservers here for medium-sized
(100-200 node) computational clusters and that in the end there will be
something much more powerful than a single SCSI disk in the backend.
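
The brute-force pass was along these lines (a hypothetical sketch:
the real value lists and harness differed, and the tag format just
mirrors the labels in the table below):

```shell
# Sweep nfract (field 1), interval (field 5) and nfract_sync (field 7)
# of /proc/sys/vm/bdflush plus max-readahead, running dbench at each
# point.  Requires root on a 2.4 kernel; the remaining bdflush fields
# are held at zero and age_buffer (field 6) at 1000.
for nfract in 30 50 70; do
  for interval in 100 500 900; do
    for nfract_sync in 50 70 90; do
      for ra in 63 255 1023 2047; do
        if [ -w /proc/sys/vm/bdflush ]; then
            echo "$nfract 0 0 0 $interval 1000 $nfract_sync 0 0" \
                > /proc/sys/vm/bdflush
            echo "$ra" > /proc/sys/vm/max-readahead
        fi
        if command -v dbench >/dev/null 2>&1; then
            dbench 128 | awk \
                -v tag="$nfract-$interval-1000-$nfract_sync-$ra" \
                '/^Throughput/ {print tag, $2, "MB/s"}'
        fi
      done
    done
  done
done
```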

FWIW, the top 10 bdflush/max-readahead combinations for dbench (sorted
by 128 processes) that I've seen so far are:

                             16        32        64       128
                       --------  --------  --------  --------
70-900-1000-90-2047     208.056   159.598   144.721   122.514
30-100-1000-50-127      113.829   101.820   110.699   120.017
70-500-1000-90-2047     209.547   150.172   142.556   115.979
30-300-1000-90-63       108.862   118.443   109.060   112.901
30-100-1000-50-63       113.904    96.648   113.969   112.021
50-700-1000-90-63       208.062   137.579   134.504   111.656
30-500-1000-50-255      111.955    97.373   115.360   111.004
30-100-1000-70-1023     115.110    99.823   122.720   110.016
70-300-1000-90-1023     220.096   169.194   160.025   109.753
70-700-1000-90-255      208.468   146.202   140.098   109.618

(with the numbers on the left being
nfract-interval-age_buffer-nfract_sync-max_readahead, the column entries
being the non-adjusted MB/s that dbench reports, and the columns being
the number of processes).  Unfortunately, these are a bit bunk because I
haven't run the tests enough times to average the results to remove
variance between runs.

If you have any suggestions on better ways than dbench to somewhat
quickly simulate performance for many clients hitting a fileserver at
the same time, I'd love to hear them.

Thanks,

--
Jason Holmes


* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-01 19:16 Jason Holmes
  2001-12-01 21:34 ` Andrew Morton
@ 2001-12-03 19:22 ` Marcelo Tosatti
  2001-12-03 23:32   ` Jason Holmes
  2001-12-11 22:37   ` Bill Davidsen
  1 sibling, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2001-12-03 19:22 UTC (permalink / raw)
  To: Jason Holmes; +Cc: linux-kernel



Jason,

Yes, throughput-only tests will have their numbers degraded by the
change applied in 2.4.17-pre2.

The whole thing is just about tradeoffs: interactivity vs. throughput.

I'm not going to destroy interactivity for end users to get beautiful
dbench numbers.

And about your clients: Don't you think they want some kind of
decent latency on their side? 

Anyway, thanks for your report! 

On Sat, 1 Dec 2001, Jason Holmes wrote:

> I saw in a previous thread that the interactivity improvements in
> 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> already evaluating 2.4.16 for a somewhat large fileserving project, I
> threw 2.4.17-pre2 on to see what has changed.  Throughput while serving
> a large number of clients is important to me, so my tests have included
> using dbench to try to see how things scale as clients increase.
> 
> 2.4.16:
> 
> Throughput 116.098 MB/sec (NB=145.123 MB/sec  1160.98 MBit/sec)  1 procs
> Throughput 206.604 MB/sec (NB=258.255 MB/sec  2066.04 MBit/sec)  2 procs
> Throughput 210.364 MB/sec (NB=262.955 MB/sec  2103.64 MBit/sec)  4 procs
> Throughput 213.397 MB/sec (NB=266.747 MB/sec  2133.97 MBit/sec)  8 procs
> Throughput 210.989 MB/sec (NB=263.736 MB/sec  2109.89 MBit/sec)  16
> procs
> Throughput 138.713 MB/sec (NB=173.391 MB/sec  1387.13 MBit/sec)  32
> procs
> Throughput 117.729 MB/sec (NB=147.162 MB/sec  1177.29 MBit/sec)  64
> procs
> Throughput 66.7354 MB/sec (NB=83.4193 MB/sec  667.354 MBit/sec)  128
> procs
> 
> 2.4.17-pre2:
> 
> Throughput 96.2302 MB/sec (NB=120.288 MB/sec  962.302 MBit/sec)  1 procs
> Throughput 226.679 MB/sec (NB=283.349 MB/sec  2266.79 MBit/sec)  2 procs
> Throughput 223.955 MB/sec (NB=279.944 MB/sec  2239.55 MBit/sec)  4 procs
> Throughput 224.533 MB/sec (NB=280.666 MB/sec  2245.33 MBit/sec)  8 procs
> Throughput 153.672 MB/sec (NB=192.09 MB/sec  1536.72 MBit/sec)  16 procs
> Throughput 91.3464 MB/sec (NB=114.183 MB/sec  913.464 MBit/sec)  32
> procs
> Throughput 64.876 MB/sec (NB=81.095 MB/sec  648.76 MBit/sec)  64 procs
> Throughput 54.9774 MB/sec (NB=68.7217 MB/sec  549.774 MBit/sec)  128
> procs
> 
> Throughput 136.522 MB/sec (NB=170.652 MB/sec  1365.22 MBit/sec)  1 procs
> Throughput 223.682 MB/sec (NB=279.603 MB/sec  2236.82 MBit/sec)  2 procs
> Throughput 222.806 MB/sec (NB=278.507 MB/sec  2228.06 MBit/sec)  4 procs
> Throughput 224.427 MB/sec (NB=280.534 MB/sec  2244.27 MBit/sec)  8 procs
> Throughput 152.286 MB/sec (NB=190.358 MB/sec  1522.86 MBit/sec)  16
> procs
> Throughput 92.044 MB/sec (NB=115.055 MB/sec  920.44 MBit/sec)  32 procs
> Throughput 78.0881 MB/sec (NB=97.6101 MB/sec  780.881 MBit/sec)  64
> procs
> Throughput 66.1573 MB/sec (NB=82.6966 MB/sec  661.573 MBit/sec)  128
> procs
> 
> Throughput 117.95 MB/sec (NB=147.438 MB/sec  1179.5 MBit/sec)  1 procs
> Throughput 212.469 MB/sec (NB=265.586 MB/sec  2124.69 MBit/sec)  2 procs
> Throughput 214.763 MB/sec (NB=268.453 MB/sec  2147.63 MBit/sec)  4 procs
> Throughput 214.007 MB/sec (NB=267.509 MB/sec  2140.07 MBit/sec)  8 procs
> Throughput 96.6572 MB/sec (NB=120.821 MB/sec  966.572 MBit/sec)  16
> procs
> Throughput 48.1342 MB/sec (NB=60.1677 MB/sec  481.342 MBit/sec)  32
> procs
> Throughput 71.3444 MB/sec (NB=89.1806 MB/sec  713.444 MBit/sec)  64
> procs
> Throughput 59.258 MB/sec (NB=74.0724 MB/sec  592.58 MBit/sec)  128 procs
> 
> I have included three runs for 2.4.17-pre2 to show how inconsistent its
> results are; 2.4.16 didn't have this problem to this extent.  bonnie++
> numbers seem largely unchanged between kernels, coming in around:
> 
>       ------Sequential Output------ --Sequential Input- --Random-
>       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>  Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> 2512M 14348  81 49495  26 24438  16 16040  96 55006  15 373.7   1
>       ------Sequential Create------ --------Random Create--------
>       -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>    16  3087  99 +++++ +++ +++++ +++  3175 100 +++++ +++ 11042 100
> 
> The test machine is an IBM 342 with 2 1.26 GHz P3 processors and 1.25 GB
> of RAM.  The above numbers were generated off of 1 10K RPM SCSI disk
> hanging off of an Adaptec aic7899 controller.
> 
> --
> Jason Holmes
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-03 19:22 ` Marcelo Tosatti
@ 2001-12-03 23:32   ` Jason Holmes
  2001-12-11 22:37   ` Bill Davidsen
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Holmes @ 2001-12-03 23:32 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

Sure, I wasn't protesting the patch or anything; I was just passing my
observations along.  I also couldn't care less about dbench numbers for
the sake of dbench numbers; I was just using it and other simple
benchmarks as stepping stones to try to figure out what effect bdflush
and max_readahead settings actually have on the way the system
performs.  After the simple benchmarks narrowed things down, I would've
run more exhaustive benchmarks, then some large MM5 runs (including
setup, takedown, post-processing into graphs, etc), enough Gaussian jobs
to create 200 GB or so of scratch files, a hundred or so BLAST jobs
against a centralized database, all or part of these at the same time,
etc, the typical stuff that I see running.  If I were to start out with
the real workload it'd take years.

The thing is, everywhere I read about tweaking filesystem performance
someone has some magic number to throw into bdflush.  There's never any
justification for it and it's 9 times out of 10 for a "server" system,
whatever that is.  Some recommendations are for values larger than
fs/buffer.c allows, some are wacko recommending 100/100 for
nfract/nfract_sync, some want 5000 or 6000 for nfract_sync, which seems
somehow wrong for a percentage (perhaps older kernels didn't have a
percentage there or something).  There are even different bdflush
numbers between 2.4.13-pre2, 2.4.17, and 2.4.17-pre1aa1.  I was just
looking for a way to profile the way the different settings affect
system performance under a variety of conditions and dbench seemed like
a way to get the 'many clients / many small files' aspect of it all. 
Who knows, maybe the default numbers are the best compromise or maybe
the continuing vm tweaks will make any results from a previous kernel
invalid for a current kernel or maybe the bdflush tweaking isn't really
worth it at all and I'm better off getting on with mucking about with
larger hardware and parallel filesystems.  At least I learned that I
really do want a larger max_readahead number.
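
A guard in that spirit (a hypothetical helper of mine, not from any
kernel documentation; 2.4's fs/buffer.c enforces its own per-field
min/max tables, whose exact bounds vary between versions):

```shell
# Reject the "5000 for nfract_sync" style advice before it ever
# reaches /proc: both fields are percentages, so anything outside
# 0-100 is nonsense regardless of kernel version.
set_bdflush_fracts() {
    nfract=$1; nfract_sync=$2
    if [ "$nfract" -lt 0 ] || [ "$nfract" -gt 100 ] ||
       [ "$nfract_sync" -lt 0 ] || [ "$nfract_sync" -gt 100 ]; then
        echo "bdflush: nfract fields are percentages (0-100)" >&2
        return 1
    fi
    echo "$nfract 0 0 0 500 3000 $nfract_sync 0 0" > /proc/sys/vm/bdflush
}
```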

As for interactivity, if the changes have any effect on the number of
"NFS server blah not responding" messages I get, I'll be more than
happy.

Thanks,

--
Jason Holmes

Marcelo Tosatti wrote:
> 
> Jason,
> 
> Yes, throughput-only tests will have their numbers degraded by the
> change applied in 2.4.17-pre2.
>
> The whole thing is just about tradeoffs: interactivity vs. throughput.
>
> I'm not going to destroy interactivity for end users to get beautiful
> dbench numbers.
> 
> And about your clients: Don't you think they want some kind of
> decent latency on their side?
> 
> Anyway, thanks for your report!
> 
> On Sat, 1 Dec 2001, Jason Holmes wrote:
> 
> > I saw in a previous thread that the interactivity improvements in
> > 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> > already evaluating 2.4.16 for a somewhat large fileserving project, I
> > threw 2.4.17-pre2 on to see what has changed.  Throughput while serving
> > a large number of clients is important to me, so my tests have included
> > using dbench to try to see how things scale as clients increase.
> >
> > 2.4.16:
> >
> > Throughput 116.098 MB/sec (NB=145.123 MB/sec  1160.98 MBit/sec)  1 procs
> > Throughput 206.604 MB/sec (NB=258.255 MB/sec  2066.04 MBit/sec)  2 procs
> > Throughput 210.364 MB/sec (NB=262.955 MB/sec  2103.64 MBit/sec)  4 procs
> > Throughput 213.397 MB/sec (NB=266.747 MB/sec  2133.97 MBit/sec)  8 procs
> > Throughput 210.989 MB/sec (NB=263.736 MB/sec  2109.89 MBit/sec)  16
> > procs
> > Throughput 138.713 MB/sec (NB=173.391 MB/sec  1387.13 MBit/sec)  32
> > procs
> > Throughput 117.729 MB/sec (NB=147.162 MB/sec  1177.29 MBit/sec)  64
> > procs
> > Throughput 66.7354 MB/sec (NB=83.4193 MB/sec  667.354 MBit/sec)  128
> > procs
> >
> > 2.4.17-pre2:
> >
> > Throughput 96.2302 MB/sec (NB=120.288 MB/sec  962.302 MBit/sec)  1 procs
> > Throughput 226.679 MB/sec (NB=283.349 MB/sec  2266.79 MBit/sec)  2 procs
> > Throughput 223.955 MB/sec (NB=279.944 MB/sec  2239.55 MBit/sec)  4 procs
> > Throughput 224.533 MB/sec (NB=280.666 MB/sec  2245.33 MBit/sec)  8 procs
> > Throughput 153.672 MB/sec (NB=192.09 MB/sec  1536.72 MBit/sec)  16 procs
> > Throughput 91.3464 MB/sec (NB=114.183 MB/sec  913.464 MBit/sec)  32
> > procs
> > Throughput 64.876 MB/sec (NB=81.095 MB/sec  648.76 MBit/sec)  64 procs
> > Throughput 54.9774 MB/sec (NB=68.7217 MB/sec  549.774 MBit/sec)  128
> > procs
> >
> > Throughput 136.522 MB/sec (NB=170.652 MB/sec  1365.22 MBit/sec)  1 procs
> > Throughput 223.682 MB/sec (NB=279.603 MB/sec  2236.82 MBit/sec)  2 procs
> > Throughput 222.806 MB/sec (NB=278.507 MB/sec  2228.06 MBit/sec)  4 procs
> > Throughput 224.427 MB/sec (NB=280.534 MB/sec  2244.27 MBit/sec)  8 procs
> > Throughput 152.286 MB/sec (NB=190.358 MB/sec  1522.86 MBit/sec)  16
> > procs
> > Throughput 92.044 MB/sec (NB=115.055 MB/sec  920.44 MBit/sec)  32 procs
> > Throughput 78.0881 MB/sec (NB=97.6101 MB/sec  780.881 MBit/sec)  64
> > procs
> > Throughput 66.1573 MB/sec (NB=82.6966 MB/sec  661.573 MBit/sec)  128
> > procs
> >
> > Throughput 117.95 MB/sec (NB=147.438 MB/sec  1179.5 MBit/sec)  1 procs
> > Throughput 212.469 MB/sec (NB=265.586 MB/sec  2124.69 MBit/sec)  2 procs
> > Throughput 214.763 MB/sec (NB=268.453 MB/sec  2147.63 MBit/sec)  4 procs
> > Throughput 214.007 MB/sec (NB=267.509 MB/sec  2140.07 MBit/sec)  8 procs
> > Throughput 96.6572 MB/sec (NB=120.821 MB/sec  966.572 MBit/sec)  16
> > procs
> > Throughput 48.1342 MB/sec (NB=60.1677 MB/sec  481.342 MBit/sec)  32
> > procs
> > Throughput 71.3444 MB/sec (NB=89.1806 MB/sec  713.444 MBit/sec)  64
> > procs
> > Throughput 59.258 MB/sec (NB=74.0724 MB/sec  592.58 MBit/sec)  128 procs
> >
> > I have included three runs for 2.4.17-pre2 to show how inconsistent its
> > results are; 2.4.16 didn't have this problem to this extent.  bonnie++
> > numbers seem largely unchanged between kernels, coming in around:
> >
> >       ------Sequential Output------ --Sequential Input- --Random-
> >       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> >  Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > 2512M 14348  81 49495  26 24438  16 16040  96 55006  15 373.7   1
> >       ------Sequential Create------ --------Random Create--------
> >       -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> > files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >    16  3087  99 +++++ +++ +++++ +++  3175 100 +++++ +++ 11042 100
> >
> > The test machine is an IBM 342 with 2 1.26 GHz P3 processors and 1.25 GB
> > of RAM.  The above numbers were generated off of 1 10K RPM SCSI disk
> > hanging off of an Adaptec aic7899 controller.
> >
> > --
> > Jason Holmes


* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-11 22:37   ` IO degradation in 2.4.17-pre2 vs. 2.4.16 Dan Maas
@ 2001-12-11 21:44     ` Marcelo Tosatti
  2001-12-11 23:14       ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2001-12-11 21:44 UTC (permalink / raw)
  To: Dan Maas; +Cc: Bill Davidsen, linux-kernel



On Tue, 11 Dec 2001, Dan Maas wrote:

> > > Yes, throughput-only tests will have their numbers degraded by the
> > > change applied in 2.4.17-pre2.
> > >
> > > The whole thing is just about tradeoffs: interactivity vs. throughput.
> > >
> > > I'm not going to destroy interactivity for end users to get beautiful
> > > dbench numbers.
> >
> > Latency is more of an issue for end user machines.
> 
> Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?

That would be the best thing to do, yes.



* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-03 19:22 ` Marcelo Tosatti
  2001-12-03 23:32   ` Jason Holmes
@ 2001-12-11 22:37   ` Bill Davidsen
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Davidsen @ 2001-12-11 22:37 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, 4 Dec 2001, Marcelo Tosatti wrote:

> Yes, throughput-only tests will have their numbers degraded by the
> change applied in 2.4.17-pre2.
>
> The whole thing is just about tradeoffs: interactivity vs. throughput.
>
> I'm not going to destroy interactivity for end users to get beautiful
> dbench numbers.
> 
> And about your clients: Don't you think they want some kind of
> decent latency on their side? 

It depends on the machine. For a server the thing you need to feed clients
is throughput. I don't see how feeding the data slower is going to be GOOD
for latency. Particularly servers which push a lot of data, like mail and
news or certain web sites, need to push it now.

Latency is more of an issue for end user machines.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
       [not found] ` <fa.jlqjvuv.348ign@ifi.uio.no>
@ 2001-12-11 22:37   ` Dan Maas
  2001-12-11 21:44     ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Maas @ 2001-12-11 22:37 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

> > Yes, throughput-only tests will have their numbers degraded by the
> > change applied in 2.4.17-pre2.
> >
> > The whole thing is just about tradeoffs: interactivity vs. throughput.
> >
> > I'm not going to destroy interactivity for end users to get beautiful
> > dbench numbers.
>
> Latency is more of an issue for end user machines.

Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?

Dan



* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-11 21:44     ` Marcelo Tosatti
@ 2001-12-11 23:14       ` Alan Cox
  2001-12-12  0:23         ` Andrew Morton
  2001-12-12  0:52         ` J Sloan
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Cox @ 2001-12-11 23:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Dan Maas, Bill Davidsen, linux-kernel

> > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> That would be the best thing to do, yes.

/proc/sys not CONFIG_..


* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-11 23:14       ` Alan Cox
@ 2001-12-12  0:23         ` Andrew Morton
  2001-12-12  0:52         ` J Sloan
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2001-12-12  0:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Marcelo Tosatti, Dan Maas, Bill Davidsen, linux-kernel

Alan Cox wrote:
> 
> > > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> > That would be the best thing to do, yes.
> 
> /proc/sys not CONFIG_..

/proc/sys/vm/bdflush, to be precise.

I thought we discussed all this?

-


* Re: IO degradation in 2.4.17-pre2 vs. 2.4.16
  2001-12-11 23:14       ` Alan Cox
  2001-12-12  0:23         ` Andrew Morton
@ 2001-12-12  0:52         ` J Sloan
  1 sibling, 0 replies; 11+ messages in thread
From: J Sloan @ 2001-12-12  0:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Marcelo Tosatti, Dan Maas, Bill Davidsen, linux-kernel

Alan Cox wrote:

> > > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> > That would be the best thing to do, yes.
>
> /proc/sys not CONFIG_..

YES!

Much much preferable...

cu

jjs


