public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Paul Jackson <pj@sgi.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	vatsa@linux.vnet.ibm.com, dhaval@linux.vnet.ibm.com,
	nickpiggin@yahoo.com.au, ebiederm@xmission.com,
	akpm@linux-foundation.org, sgrubb@redhat.com,
	rostedt@goodmis.org, ghaskins@novell.com,
	dmitry.adamushko@gmail.com, tong.n.li@intel.com,
	tglx@linutronix.de, menage@google.com, rientjes@google.com
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing
Date: Tue, 29 Jan 2008 12:31:24 +0100
Message-ID: <1201606284.28547.114.camel@lappy>
In-Reply-To: <20080129051353.4628c9eb.pj@sgi.com>


On Tue, 2008-01-29 at 05:13 -0600, Paul Jackson wrote:
> Peter wrote:
> > Thanks for the link. Yes I think your last suggestion of creating
> > rt-domains ( http://lkml.org/lkml/2007/10/23/419 ) is a good one.
> 
> We now have a per-cpuset Boolean flag file called 'sched_load_balance'.

SD_LOAD_BALANCE, right?

> In the default case, this flag is set on, and the kernel does its
> usual load balancing across all CPUs in that cpuset.  This means, under
> the covers, that there exists some sched domain such that all CPUs in
> that cpuset are in that same sched domain.  That sched domain might
> contain additional CPUs from outside that cpuset as well.  Indeed,
> in the default vanilla configuration, that sched domain contains all
> CPUs in the system.
> 
> If we turn the sched_load_balance flag off for some cpuset, we are
> telling the kernel it's ok not to load balance on the CPUs in that
> cpuset (unless those CPUs are in some other cpuset that needed load
> balancing anyway.)
> 
> This 'sched_load_balance' flag is, thus far, "the" cpuset hook
> supporting realtime.  One can use it to configure a system so that
> the kernel does not do normal load balancing on select CPUs, such
> as those CPUs dedicated to realtime use.

Ah, here I disagree. It is possible to do (hard) realtime scheduling
over multiple cpus; the only drawback is that it requires a very strong
load-balancer, which makes it unsuitable for a large number of cpus.

( of course, having a strong rt load balancer on a large cpuset does no
  harm, as long as there are no rt tasks to balance )

So if we have a system like so:

             __A__
            /  |  \
          B1  B2  B3
          /\
         /  \
       C1   C2

A  comprises cpus 0-127, !SD_LOAD_BALANCE

B1 comprises cpus 0-63, !SD_LOAD_BALANCE
B2 comprises cpus 64-119
B3                120-127

C1                0-3
C2                5-63

We end up with 4 disjoint load-balanced sets.

I would then attach the rt balance information to: C1, C2, B2, B3.

If, for example, B1 were load-balanced, we'd have only 3 disjoint
sets left: B1, B2 and B3, and the rt balance data would be attached there.
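
To make the counting above concrete, here is a rough sketch in Python of
how the disjoint load-balanced sets fall out of such a hierarchy. It is
only an illustration, not the kernel's code: the Cpuset class, the
load_balance field (standing in for the per-cpuset flag) and the
function names are all made up for this example, and it assumes that
overlapping load-balanced cpusets get merged into one set.

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    # Hypothetical model of a cpuset; not a kernel structure.
    name: str
    cpus: frozenset
    load_balance: bool = True  # stands in for the per-cpuset flag
    children: list = field(default_factory=list)


def cpu_range(first, last):
    """Inclusive cpu range, e.g. cpu_range(0, 3) is cpus 0-3."""
    return frozenset(range(first, last + 1))


def walk(cs):
    """Visit a cpuset and all of its descendants, parents first."""
    yield cs
    for child in cs.children:
        yield from walk(child)


def balanced_sets(root):
    """Return a list of (names, cpus), one entry per disjoint set.

    Assumption: every cpuset with load_balance set contributes its
    cpus, and sets that share a cpu are merged into one.
    """
    merged = []
    for cs in walk(root):
        if not cs.load_balance or not cs.cpus:
            continue
        names, cpus = [cs.name], set(cs.cpus)
        rest = []
        for other_names, other_cpus in merged:
            if cpus & other_cpus:
                # Overlap: fold the existing set into the new one.
                names = other_names + names
                cpus |= other_cpus
            else:
                rest.append((other_names, other_cpus))
        merged = rest + [(names, cpus)]
    return merged


def build_example(b1_balanced=False):
    """The A / B1-B3 / C1-C2 tree from the diagram above."""
    c1 = Cpuset("C1", cpu_range(0, 3))
    c2 = Cpuset("C2", cpu_range(5, 63))
    b1 = Cpuset("B1", cpu_range(0, 63), b1_balanced, [c1, c2])
    b2 = Cpuset("B2", cpu_range(64, 119))
    b3 = Cpuset("B3", cpu_range(120, 127))
    return Cpuset("A", cpu_range(0, 127), False, [b1, b2, b3])


if __name__ == "__main__":
    for b1_balanced in (False, True):
        sets = balanced_sets(build_example(b1_balanced))
        print(b1_balanced, [(n, min(c), max(c)) for n, c in sets])
```

With the ranges exactly as written above, cpu 4 lands in no
load-balanced set in the first case, since C1 stops at 3 and C2 starts
at 5.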

> It sounds like Peter is reminding us that we really have three choices
> for handling a given CPU's load balancing:
>  1) normal kernel scheduler load balancing,
>  2) RT load balancing, or
>  3) no load balancing whatsoever.
> 
> If that's the case (if we really need choice 3) then a single Boolean
> flag, such as sched_load_balance, is not sufficient to select from
> the three choices, and it might make sense to add a second per-cpuset
> Boolean flag, say "sched_rt_balance", default off, which if turned on,
> enabled choice 2.
> 
> If that's not the case (we only need choices 1 and 2) then -logically-
> we could overload the meaning of the current sched_load_balance,
> to mean, if turned off, not only to stop doing normal balancing, but
> to further mean that we should commence RT balancing.  However bits
> aren't -that- precious here, and this sounds unnecessarily confusing.
> 
> So ... would a new per-cpuset Boolean flag such as sched_rt_balance be
> appropriate and sufficient to mark those cpusets whose set of CPUs
> required RT balancing?

So, I don't think we need that. I think we can make do with the single
flag; we just need to find these disjoint sets and stick our rt-domain
there.
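
As a rough illustration of what "stick our rt-domain there" could look
like, the Python sketch below gives every disjoint set one rt-domain
(here just an index) and limits rt balancing for a cpu to the other
cpus in its own set. The function names and the idea of a "push
candidate" lookup are assumptions made for this example, not existing
kernel interfaces.

```python
def attach_rt_domains(disjoint_sets):
    """Map each cpu to the index of the disjoint set it belongs to.

    Raises ValueError if the sets overlap, since the scheme only makes
    sense when every cpu belongs to at most one set.
    """
    rt_domain_of = {}
    for idx, cpus in enumerate(disjoint_sets):
        for cpu in cpus:
            if cpu in rt_domain_of:
                raise ValueError(f"cpu {cpu} appears in two sets")
            rt_domain_of[cpu] = idx
    return rt_domain_of


def rt_push_candidates(cpu, disjoint_sets, rt_domain_of):
    """Cpus an rt task on `cpu` could be balanced to (hypothetical)."""
    idx = rt_domain_of.get(cpu)
    if idx is None:
        # Cpu is in no load-balanced set: no balancing of any kind.
        return set()
    return set(disjoint_sets[idx]) - {cpu}


# The four sets from the example: C1, C2, B2, B3.
EXAMPLE_SETS = [
    set(range(0, 4)),
    set(range(5, 64)),
    set(range(64, 120)),
    set(range(120, 128)),
]
EXAMPLE_MAP = attach_rt_domains(EXAMPLE_SETS)
```

In this sketch a cpu that falls outside every load-balanced set simply
has no rt-domain, which is how the single flag would also cover the
"no load balancing whatsoever" case.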


