Re: [PATCH][plugsched 0/28] Pluggable cpu scheduler framework

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@elte.hu>
To: Pavel Machek <pavel@ucw.cz>
Cc: Con Kolivas <kernel@kolivas.org>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	Peter Williams <pwil3058@bigpond.net.au>,
	William Lee Irwin III <wli@holomorphy.com>,
	Alexander Nyberg <alexn@dsv.su.se>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH][plugsched 0/28] Pluggable cpu scheduler framework
Date: Mon, 1 Nov 2004 12:41:24 +0100	[thread overview]
Message-ID: <20041101114124.GA31458@elte.hu> (raw)
In-Reply-To: <20041031233313.GB6909@elf.ucw.cz>

* Pavel Machek <pavel@ucw.cz> wrote:

> You are changing 
> 
> some_functions()
> 
> into
> 
> something->function()
> 
> no? I do not think that is 0 overhead...

my main worry with this approach is not really overhead but the impact
on scheduler development. Right now there is a Linux scheduler that
every developer (small-workload and large-workload people) tries to make
as good as possible. Historically and fundamentally, scheduler
development and feedback has always been a 'scarce resource' - the
feedback cycle is (necessarily) long and there are alot of specialized
cases to take care of, which slowly dribble in with time.

firstly, if someone wants a different or specialized scheduler there's
no problem even under the current model, and it has happened before. We
made the scheduler itself easily 'rip-out-able' in 2.6 by decreasing the
junction points between the scheduler and the rest of the system. Also,
the current scheduler is no way cast into stone, we could easily end up
having a different interactivity code within the scheduler, as a result
of the various 'get rid of the two arrays' efforts currently underway.
But i very much do not support making the 'junction points' at the wrong
place.

But more importantly, in the current model, people who care about
'fringe' workloads (embedded and high-end) are 'forced' to improve the
core scheduler if they want to see their problems solved by mainline.
They are forced to think about issues, to generalize problems and to
solve them so that the large picture is still right. This worked pretty
well in the past and works well today. It is painful in terms of getting
stuff integrated but it works.

Scheduler domains was and is a prime example of this concept in the
works: load-balancing was a difficult issue that kept (some of) us
uneasy for years and then a nice generic framework came along that
replaced the old code, made both small boxes and large boxes possible.
As a bonus it also solved the 'HT scheduling' issue almost for free.
Sched-domains is nice for both the low-end and the high-end - it enables
512 CPU single-system-image systems supported by (almost-) vanilla 2.6
kernel. What more can we ask for?

I am 100% sure that we'd not have sched-domains today had we gone for a
'plugin' model say 2-3 years ago. It's always hard to predict 'what if'
scenarios but here's my guess: we'd have a NUMA scheduler, a separate
SMP scheduler, a number of UP schedulers and embedded schedulers, and
say HT would be supported in different ways by the SMP and NUMA
schedulers.

or to give another example: we emphatically do not allow 'dynamic
syscalls' in Linux, albeit for years we've been hammered with how
enterprise-ready Linux would be from them. In reality, without 'dynamic
syscalls' all the 'fringe functionality' people have to think harder and
have to integrate their stuff into the current
syscalls/drivers/subsystems.

the process scheduler is i think a similar piece of technology: we want
to make it _harder_ for specialized workloads to be handled in some
'specialized' way, because those precise workloads do show up in other
workloads too, in a different manner. A fix made for NUMA or real-time
purposes can easily make a difference for desktop workloads. Often
'specialized' is an excluse for a 'fundamentally broken, limited hack',
especially in the scheduler world.

I believe that by compartmenting in the wrong way [*] we kill the
natural integration effects. We'd end up with 5 (or 20) bad generic
schedulers that happen to work in one precise workload only, but there
would not be enough push to build one good generic scheduler, because
the people who are now forced to care about the Linux scheduler would be
content about their specialized schedulers. Yes, it would be easier to
make a specialized scheduler work well in that precise workload (because
the developer can make the 'this is only for this parcticular workload'
excuse), and this approach may satisfy the embedded and high-end needs
in a quicker way. So i consider scheduler plugins as the STREAMS
equivalent of scheduling and i am not very positive about it. Just like
STREAMS, i consider 'scheduler plugins' as the easy but deceptive and
wrong way out of current problems, which will create much worse problems
than the ones it tries to solve.

	Ingo

( [*] how is this different from say the IO scheduler plugin
architecture?  Just compare the two, it's two very different things.
Firstly, the timescale is very different - the process scheduler cares
about microseconds, the IO scheduler's domain is milliseconds. Also, IO
scheduling is fundamentally per-device and often there is good
per-device workload isolation so picking an IO scheduler per queue makes
much more sense than say picking a scheduler per CPU ... There are other
differences too, such as complexity and isolation from the rest of the
system. )

next prev parent reply	other threads:[~2004-11-01 11:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-30 14:32 [PATCH][plugsched 0/28] Pluggable cpu scheduler framework Con Kolivas
2004-10-31 23:33 ` Pavel Machek
2004-10-31 23:37   ` Con Kolivas
2004-11-01  1:42   ` William Lee Irwin III
2004-11-01 11:41   ` Ingo Molnar [this message]
2004-11-01 13:08     ` Kasper Sandberg
2004-11-01 13:32     ` Con Kolivas
2004-11-01 14:23       ` Nick Piggin
2004-11-01 17:54     ` Jesse Barnes
2004-11-02 21:28     ` Matthias Urlichs
2004-11-02 22:30       ` Peter Chubb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041101114124.GA31458@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@osdl.org \
    --cc=alexn@dsv.su.se \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pavel@ucw.cz \
    --cc=pwil3058@bigpond.net.au \
    --cc=torvalds@osdl.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox