public inbox for linux-kernel@vger.kernel.org
* Re: O(1) scheduler gives big boost to tbench 192
@ 2002-05-06  8:20 rwhron
  2002-05-06 16:42 ` Andrea Arcangeli
  0 siblings, 1 reply; 24+ messages in thread
From: rwhron @ 2002-05-06  8:20 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel

> BTW, Randy, I've seen my tree run slower with tiobench; that's probably
> because I made the elevator anti-starvation logic more aggressive than
> mainline and the other kernel trees (to help interactive usage). Could
> you try to run tiobench on -aa after elvtune -r 8192 -w 16384
> /dev/hd[abcd] to verify? Thanks for the great benchmarking effort.

I will have results on the big machine in a couple days.  On the 
small machine, elvtune increases tiobench sequential reads by
30-50%, and lowers worst case latency a little.

More -aa at:
http://home.earthlink.net/~rwhron/kernel/aa.html

> And for the reason fork is faster in -aa that's partly thanks to the
> reschedule-child-first logic, that can be easily merged in mainline,
> it's just in 2.5.

Is that part of the parent_timeslice patch?  parent_timeslice helped 
fork a little when I tried isolating patches to find what
makes fork faster in -aa.  It is more than one patch as far as 
I can tell.

On uniprocessor, in the unixbench execl test, all -aa kernels going back 
at least to 2.4.15aa1 are about 20% faster than other trees, even those 
like jam and akpm's split VM.  Fork in -aa on a more "real world" 
test (autoconf build) is about 8-10% faster than other kernel trees.

On the quad Xeon, with its bigger L2 cache, the autoconf (fork test)
difference between mainline and -aa is smaller.  The -aa based VMs in
aa, jam, and mainline have about a 15% edge over the rmap VM in ac and
rmap.  jam has a slight advantage for the autoconf build, possibly
because of the O(1) effect, which is more likely to show up since more
processes execute on the 4 way box.

More quad Xeon at:
http://home.earthlink.net/~rwhron/kernel/bigbox.html


-- 
Randy Hron


^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: O(1) scheduler gives big boost to tbench 192
@ 2002-05-20 12:46 rwhron
  0 siblings, 0 replies; 24+ messages in thread
From: rwhron @ 2002-05-20 12:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: kravetz, jamagallon, rml

> On Tue, May 07, 2002 at 04:39:34PM -0700, Robert Love wrote:
> > It is just for pipes we previously used sync, no?

On Tue, 7 May 2002 16:48:57 -0700, Mike Kravetz wrote
> That's the only thing I know of that used it.

> I'd really like to know if there are any real workloads that
> benefited from this feature, rather than just some benchmark.
> I can do some research, but was hoping someone on this list
> might remember.  If there is a valid workload, I'll propose
> a patch.  

On Mon, 13 May 2002 02:06:31 +0200, J.A. Magallon wrote: 
> - Re-introduction of wake_up_sync to make pipes run fast again. No idea
> whether this is useful or not; that is the point, to test it

2.4.19-pre8-jam2 showed slightly better performance on the quad Xeon
for most benchmarks with 25-wake_up_sync backed out.  However, it's
not clear to me that 25-wake_up_sync was the proper patch to back out
for this test, as there wasn't a dramatic change in Pipe latency or
bandwidth without it.  

There was a >300% improvement in lmbench Pipe bandwidth and latency 
comparing pre8-jam2 to pre7-jam6.  

Average of 25 lmbench runs on jam2 kernels, 12 on the others:
2.4.19-pre8-jam2-nowuos (backed out 25-wake_up_sync patch)

*Local* Communication latencies in microseconds - smaller is better
                                 AF     
kernel                   Pipe   UNIX   
-----------------------  -----  -----  
2.4.19-pre7-jam6         29.51  42.37  
2.4.19-pre8              10.73  29.94  
2.4.19-pre8-aa2          12.45  29.53  
2.4.19-pre8-ac1          35.39  45.59  
2.4.19-pre8-jam2          7.70  15.27  
2.4.19-pre8-jam2-nowuos   7.74  14.93  


*Local* Communication bandwidths in MB/s - bigger is better
                                   AF  
kernel                    Pipe    UNIX 
-----------------------  ------  ------
2.4.19-pre7-jam6          66.41  260.39
2.4.19-pre8              468.57  273.32
2.4.19-pre8-aa2          418.09  273.59
2.4.19-pre8-ac1          110.62  241.06
2.4.19-pre8-jam2         545.66  233.68
2.4.19-pre8-jam2-nowuos  544.57  246.53

The kernel build test, which applies patches through a pipe
and compiles with -pipe, didn't reflect an improvement.

kernel                   average  min_time  max_time  runs  notes
2.4.19-pre7-jam6           237.0       235       239     3  All successful
2.4.19-pre8                239.7       238       241     3  All successful
2.4.19-pre8-aa2            237.7       237       238     3  All successful
2.4.19-pre8-ac1            239.3       238       241     3  All successful
2.4.19-pre8-jam2           240.0       238       241     3  All successful
2.4.19-pre8-jam2-nowuos    238.7       236       241     3  All successful

I don't know how much of the kernel build test is dependent on
pipe performance.  There is probably a better "real world"
measurement.  

On a single processor box, there was an improvement on kernel build
between pre7-jam6 and pre8-jam2.  That was only on one sample though.

Xeon page:
http://home.earthlink.net/~rwhron/kernel/bigbox.html

Latest on uniproc:
http://home.earthlink.net/~rwhron/kernel/latest.html

-- 
Randy Hron


* Re: O(1) scheduler gives big boost to tbench 192
@ 2002-05-08 16:39 Bill Davidsen
  0 siblings, 0 replies; 24+ messages in thread
From: Bill Davidsen @ 2002-05-08 16:39 UTC (permalink / raw)
  To: Linux-Kernel Mailing List

Forgive me if you feel I've clipped too much from your posting; I'm trying
to capture the points made by various folks without responding to each
message.

---------- Forwarded message ----------
From: Mike Kravetz <kravetz@us.ibm.com>
Date: Tue, 7 May 2002 15:13:56 -0700

I have experimented with reintroducing '__wake_up_sync' support
into the O(1) scheduler.  The modifications are limited to the
'try_to_wake_up' routine, as they were before.  If the 'synchronous'
flag is set, then 'try_to_wake_up' tries to put the awakened task
on the same runqueue as the caller without forcing a reschedule.
If the task is not already on a runqueue, this is easy.  If it is,
we give up.  The result: the previous bandwidth numbers are restored.

BEFORE
------
Pipe latency:    6.5185 microseconds
Pipe bandwidth: 86.35 MB/sec

AFTER
-----
Pipe latency:     6.5723 microseconds
Pipe bandwidth: 540.13 MB/sec

---------- Forwarded message ----------
From: Andrea Arcangeli <andrea@suse.de>

So my hypothesis about the sync wakeup in the email below has proven to be right:

	http://marc.theaimsgroup.com/?l=linux-kernel&m=102050009725367&w=2

Many thanks for verifying this.

Personally, if the two tasks end up blocking waiting on each other, then
I prefer them to be on the same cpu. That was the whole point of the
optimization. If the pipe buffer is large enough not to require the reader
or writer to block, then we don't do the sync wakeup just now (there's a
detail with the reader that may block simply because the writer is slow
at writing, but it probably doesn't matter much). There are many cases
where a PAGE_SIZE of buffer gets filled in much less than a timeslice,
and for all those cases rescheduling the two tasks one after the other
on the same cpu is a win, just like the benchmark shows.  Think of the
normal pipes we do from the shell, like a "| grep something"; they are
very common and they all want to be handled as sync wakeups.  In
short, when loads of data pass through the pipe at max bandwidth, the
sync wakeup is a definitive win. If the pipe never gets filled, the
writer never does a sync wakeup; the write call just returns
asynchronously. But of course the pipe doesn't get filled because it's not a
max-bandwidth scenario, and so the producer and the consumer are allowed
to scale to multiple cpus by the design of the workload.

Comments?

I would like it if you could pass over your changes to the O(1)
scheduler to resurrect the sync-wakeup.

---------- Forwarded message ----------
From: Mike Kravetz <kravetz@us.ibm.com>
Date: Tue, 7 May 2002 15:43:22 -0700

I'm not sure if 'synchronous' is still being passed all the way
down to try_to_wake_up in your tree (since it was removed in 2.5).
This is based off a back port of O(1) to 2.4.18 that Robert Love
did.  The rest of try_to_wake_up (the normal/common path) remains
the same.

---------- Forwarded message ----------
From: Robert Love <rml@tech9.net>
Date: 07 May 2002 16:39:34 -0700

Hm, interesting.  When Ingo removed the sync variants of wake_up he did
it believing the load balancer would handle the case.  Apparently, at
least in this case, that assumption was wrong.

I agree with your earlier statement, though - this benchmark may be a
case where it shows up negatively but in general the balancing is
preferred.  I can think of plenty of workloads where that is the case. 
I also wonder if over time the load balancer would end up putting the
tasks on the same CPU.  That is something the quick pipe benchmark would
not show.

---------- Forwarded message ----------
From: Mike Kravetz <kravetz@us.ibm.com>
Date: Tue, 7 May 2002 16:48:57 -0700

On Tue, May 07, 2002 at 04:39:34PM -0700, Robert Love wrote:
> It is just for pipes we previously used sync, no?

That's the only thing I know of that used it.

I'd really like to know if there are any real workloads that
benefited from this feature, rather than just some benchmark.
I can do some research, but was hoping someone on this list
might remember.  If there is a valid workload, I'll propose
a patch.  However, I don't think we should be adding patches/
features just to help some benchmark that is unrelated to
real world use.

==== start original material ====

Got to change mailers...

Consider the command line:
  grep pattern huge_log_file | cut -f1-2,5,7 | sed 's/stuff/things/' |
  tee extract.tmp | less

Ideally I would like the pipes to run as fast as possible since I'm
waiting for results, using cache and one CPU where that is best, and using
all the CPUs needed if the machine is SMP and processing is complex. I
believe that the original code came closer to that ideal than the recent
code, and obviously I think the example is "valid workload" since I do
stuff like that every time I look for/at server problems.

I believe the benchmark shows a performance issue which will occur in
normal usage.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: O(1) scheduler gives big boost to tbench 192
@ 2002-05-03 16:37 John Hawkes
  0 siblings, 0 replies; 24+ messages in thread
From: John Hawkes @ 2002-05-03 16:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: rwhron

From: <rwhron@earthlink.net>
...
> tbench 192 is an anomaly test too.  AIM looks like a nice
> "mixed" bench.  Do you have any scripts for it?  I'd like 
> to use AIM too.

Try http://www.caldera.com/developers/community/contrib/aim.html for a tarball
with everything you'll need.

The "Multiuser Shared System Mix" (aka "workfile.shared") is the one I use.
You'll need several disk spindles to keep it compute-bound, though.  Several
of the disk subtests, especially the sync_* tests, quickly drive one or two
spindles to their max transaction rates, and from that point AIM7 will be
I/O-bound and produce a largely idle system, which isn't very interesting if
you're trying to examine CPU scheduler performance with high process counts.

One thing you can do is to comment-out the three sync_* tests in the
workfile.shared configuration file, and then watch your idle time with
something like vmstat.  Experiment with commenting-out more disk subtests,
like creat-clo, disk_cp, and disk_src, one by one, until AIM7 becomes
compute-bound.
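The comment-out step can be scripted. A minimal sketch, assuming the AIM7 workfile lists one subtest per line (the real workfile.shared format may carry weights alongside each name, so adjust the pattern to match):

```shell
# Stand-in workfile (format assumed: one subtest name per line),
# then comment out the sync_* disk subtests as suggested above.
printf 'disk_rd\nsync_disk_rw\nsync_disk_wrt\nsync_disk_cp\n' > workfile.shared
sed -i 's/^sync_/# sync_/' workfile.shared
cat workfile.shared
# While AIM7 runs against the edited workfile, watch idle time with:
#   vmstat 5
```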

John Hawkes

* Re: O(1) scheduler gives big boost to tbench 192
@ 2002-05-03 13:38 rwhron
  2002-05-03 20:29 ` Gerrit Huizenga
  2002-05-07 22:13 ` Mike Kravetz
  0 siblings, 2 replies; 24+ messages in thread
From: rwhron @ 2002-05-03 13:38 UTC (permalink / raw)
  To: gh; +Cc: linux-kernel, alan

> > > Rumor is that on some workloads MQ outperforms O(1), but it
> > > may be that the latest (post K3?) O(1) is catching up?

Is MQ based on the Davide Libenzi scheduler? 
(a version of Davide's scheduler is in the -aa tree).

> > I'd be interested to know what workloads ?

> AIM on large CPU count machines was the most significant I had heard
> about.  Haven't measured recently on database load - we made a cut to
> O(1) some time back for simplicity.  Supposedly volanomark was doing
> better for a while but again we haven't cut back to MQ in quite a while;
> trying instead to refine O(1).  Volanomark is something of a scheduling
> anomaly though - sender/receiver timing on loopback affects scheduling
> decisions and overall throughput in ways that may or may not be consistent
> with real workloads.  AIM is probably a better workload for "real life"
> random scheduling testing.

tbench 192 is an anomaly test too.  AIM looks like a nice
"mixed" bench.  Do you have any scripts for it?  I'd like 
to use AIM too.

A side effect of O(1) in ac2 and jam6 on the 4 way box is a decrease 
in pipe bandwidth and an increase in pipe latency, as measured by lmbench:

kernel                    Pipe bandwidth in MB/s - bigger is better
-----------------------  ------
2.4.16                   383.93
2.4.19-pre3aa2           316.88
2.4.19-pre5              385.56
2.4.19-pre5-aa1          345.93
2.4.19-pre5-aa1-2g-hio   371.87
2.4.19-pre5-aa1-3g-hio   355.97
2.4.19-pre7              462.80
2.4.19-pre7-aa1          382.90
2.4.19-pre7-ac2           85.66
2.4.19-pre7-jam6          66.41
2.4.19-pre7-rl           464.60
2.4.19-pre7-rmap13       453.24

kernel                   Pipe latency in microseconds - smaller is better
-----------------------  -----
2.4.16                   12.73
2.4.19-pre3aa2           13.58
2.4.19-pre5              12.98
2.4.19-pre5-aa1          13.46
2.4.19-pre5-aa1-2g-hio   12.83
2.4.19-pre5-aa1-3g-hio   13.08
2.4.19-pre7              10.71
2.4.19-pre7-aa1          13.32
2.4.19-pre7-ac2          31.95
2.4.19-pre7-jam6         29.51
2.4.19-pre7-rl           10.71
2.4.19-pre7-rmap13       10.75

More at:
http://home.earthlink.net/~rwhron/kernel/bigbox.html

-- 
Randy Hron


* O(1) scheduler gives big boost to tbench 192
@ 2002-05-02 21:36 rwhron
  2002-05-03  0:09 ` Gerrit Huizenga
  0 siblings, 1 reply; 24+ messages in thread
From: rwhron @ 2002-05-02 21:36 UTC (permalink / raw)
  To: linux-kernel

On an OSDL 4 way x86 box the O(1) scheduler effect 
becomes obvious as the run queue gets large.  

2.4.19-pre7-ac2 and 2.4.19-pre7-jam6 have the O(1) scheduler.  

At 192 processes, O(1) shows about 340% improvement in throughput.
The dyn-sched in -aa appears to be somewhat improved over the
standard scheduler.

Numbers are in MB/second.

tbench 192 processes
2.4.16                    29.39
2.4.17                    29.70
2.4.19-pre5               29.01
2.4.19-pre5-aa1           29.22
2.4.19-pre5-aa1-2g-hio    29.94
2.4.19-pre5-aa1-3g-hio    28.66
2.4.19-pre7               29.93
2.4.19-pre7-aa1           32.75
2.4.19-pre7-ac2          103.98
2.4.19-pre7-rmap13        29.46
2.4.19-pre7-jam6         104.98
2.4.19-pre7-rl            29.74

At 64 processes, O(1) helps a little.  ac2 and jam6 have
the highest numbers here too.

tbench 64 processes
2.4.16                    101.99
2.4.17                    103.49
2.4.19-pre5-aa1           102.43
2.4.19-pre5-aa1-2g-hio    104.30
2.4.19-pre5-aa1-3g-hio    104.60
2.4.19-pre7               100.86
2.4.19-pre7-aa1           101.76
2.4.19-pre7-ac2           105.89
2.4.19-pre7-rmap13        100.94
2.4.19-pre7-rl             99.65
2.4.19-pre7-jam6          108.23

I've seen some benefit on a uniprocessor box running tbench 32 
for kernels with O(1).  Hmm, have to try tbench 192 on uniproc 
and see if the difference is all scheduler overhead.

I'm putting together a page with more results on this machine.
It will be growing at:
http://home.earthlink.net/~rwhron/kernel/bigbox.html

-- 
Randy Hron



Thread overview: 24+ messages
2002-05-06  8:20 O(1) scheduler gives big boost to tbench 192 rwhron
2002-05-06 16:42 ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2002-05-20 12:46 rwhron
2002-05-08 16:39 Bill Davidsen
2002-05-03 16:37 John Hawkes
2002-05-03 13:38 rwhron
2002-05-03 20:29 ` Gerrit Huizenga
2002-05-04  8:13   ` Andrea Arcangeli
2002-05-07 22:13 ` Mike Kravetz
2002-05-07 22:44   ` Alan Cox
2002-05-07 22:43     ` Mike Kravetz
2002-05-07 23:39       ` Robert Love
2002-05-07 23:48         ` Mike Kravetz
2002-05-08 15:34           ` Jussi Laako
2002-05-08 16:31             ` Robert Love
2002-05-08 17:02               ` Mike Kravetz
2002-05-09  0:26                 ` Jussi Laako
2002-05-08  8:50   ` Andrea Arcangeli
2002-05-09 23:18     ` Mike Kravetz
2002-05-02 21:36 rwhron
2002-05-03  0:09 ` Gerrit Huizenga
2002-05-02 23:17   ` J.A. Magallon
2002-05-03  0:14   ` Alan Cox
2002-05-03  1:08     ` Gerrit Huizenga
