From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Paul Jackson <pj@sgi.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	vatsa@linux.vnet.ibm.com, dhaval@linux.vnet.ibm.com,
	nickpiggin@yahoo.com.au, ebiederm@xmission.com,
	akpm@linux-foundation.org, sgrubb@redhat.com,
	rostedt@goodmis.org, ghaskins@novell.com,
	dmitry.adamushko@gmail.com, tong.n.li@intel.com,
	tglx@linutronix.de, menage@google.com, rientjes@google.com
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing
Date: Tue, 29 Jan 2008 12:31:24 +0100
Message-ID: <1201606284.28547.114.camel@lappy>
In-Reply-To: <20080129051353.4628c9eb.pj@sgi.com>


On Tue, 2008-01-29 at 05:13 -0600, Paul Jackson wrote:
> Peter wrote:
> > Thanks for the link. Yes I think your last suggestion of creating
> > rt-domains ( http://lkml.org/lkml/2007/10/23/419 ) is a good one.
> 
> We now have a per-cpuset Boolean flag file called 'sched_load_balance'.

SD_LOAD_BALANCE, right?

> In the default case, this flag is set on, and the kernel does its
> usual load balancing across all CPUs in that cpuset.  This means, under
> the covers, that there exists some sched domain such that all CPUs in
> that cpuset are in that same sched domain.  That sched domain might
> contain additional CPUs from outside that cpuset as well.  Indeed,
> in the default vanilla configuration, that sched domain contains all
> CPUs in the system.
> 
> If we turn the sched_load_balance flag off for some cpuset, we are
> telling the kernel it's ok not to load balance on the CPUs in that
> cpuset (unless those CPUs are in some other cpuset that needed load
> balancing anyway.)
> 
> This 'sched_load_balance' flag is, thus far, "the" cpuset hook
> supporting realtime.  One can use it to configure a system so that
> the kernel does not do normal load balancing on select CPUs, such
> as those CPUs dedicated to realtime use.

Ah, here I disagree: it is possible to do (hard) realtime scheduling
over multiple cpus. The only drawback is that it requires a very strong
load-balancer, which makes it unsuitable for a large number of cpus.

( of course, having a strong rt load balancer on a large cpuset does no
  harm, as long as there are no rt tasks to balance )

So if we have a system like so:

             __A__
            /  |  \
          B1  B2  B3
          /\
         /  \
       C1   C2

A  comprises cpus 0-127, !SD_LOAD_BALANCE

B1 comprises cpus 0-63, !SD_LOAD_BALANCE
B2 comprises cpus 64-119
B3 comprises cpus 120-127

C1 comprises cpus 0-3
C2 comprises cpus 5-63

We end up with 4 disjoint load-balanced sets.

I would then attach the rt balance information to: C1, C2, B2, B3.

If, for example, B1 were load-balanced, we'd only have 3 disjoint
sets left: B1, B2 and B3, and the rt balance data would be attached there.
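
To make the walk over the hierarchy concrete, here is a minimal sketch in
Python (not kernel code). The names Cpuset, balanced_sets and example_tree
are hypothetical, and merging overlapping sets is an assumption about how
overlapping load-balanced cpusets would have to be treated; the example
tree above has no overlaps, so that step is a no-op there.

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    """Hypothetical stand-in for a cpuset: a name, a cpu mask, a flag."""
    name: str
    cpus: frozenset
    load_balance: bool = True          # models sched_load_balance
    children: list = field(default_factory=list)


def cpu_range(lo, hi):
    """Inclusive cpu range, e.g. cpu_range(0, 3) -> {0, 1, 2, 3}."""
    return frozenset(range(lo, hi + 1))


def balanced_sets(root):
    """Return [(names, cpus), ...], one entry per disjoint balanced set."""
    # Top-down walk: a balanced cpuset covers its whole subtree, so we
    # only descend into children of cpusets that have balancing off.
    found = []
    stack = [root]
    while stack:
        cs = stack.pop()
        if cs.load_balance:
            found.append(([cs.name], set(cs.cpus)))
        else:
            stack.extend(cs.children)

    # Assumption: balanced cpusets that share a cpu end up in one set.
    merged = []
    for names, cpus in found:
        rest = []
        for m_names, m_cpus in merged:
            if m_cpus & cpus:
                names, cpus = m_names + names, cpus | m_cpus
            else:
                rest.append((m_names, m_cpus))
        rest.append((names, cpus))
        merged = rest
    return sorted(merged, key=lambda s: sorted(s[1]))


def example_tree(b1_balanced=False):
    """Build the A / B1-B3 / C1-C2 hierarchy from the example above."""
    c1 = Cpuset("C1", cpu_range(0, 3))
    c2 = Cpuset("C2", cpu_range(5, 63))
    b1 = Cpuset("B1", cpu_range(0, 63), b1_balanced, [c1, c2])
    b2 = Cpuset("B2", cpu_range(64, 119))
    b3 = Cpuset("B3", cpu_range(120, 127))
    return Cpuset("A", cpu_range(0, 127), False, [b1, b2, b3])


if __name__ == "__main__":
    for names, cpus in balanced_sets(example_tree()):
        print("+".join(names), min(cpus), "-", max(cpus))
```

Each entry returned by balanced_sets() is a place where the rt balance
data would be attached; calling example_tree(b1_balanced=True) gives the
three-set case instead.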

> It sounds like Peter is reminding us that we really have three choices
> for handling a given CPU's load balancing:
>  1) normal kernel scheduler load balancing,
>  2) RT load balancing, or
>  3) no load balancing whatsoever.
> 
> If that's the case (if we really need choice 3) then a single Boolean
> flag, such as sched_load_balance, is not sufficient to select from
> the three choices, and it might make sense to add a second per-cpuset
> Boolean flag, say "sched_rt_balance", default off, which if turned on,
> enabled choice 2.
> 
> If that's not the case (we only need choices 1 and 2) then -logically-
> we could overload the meaning of the current sched_load_balance,
> to mean, if turned off, not only to stop doing normal balancing, but
> to further mean that we should commence RT balancing.  However bits
> aren't -that- precious here, and this sounds unnecessarily confusing.
> 
> So ... would a new per-cpuset Boolean flag such as sched_rt_balance be
> appropriate and sufficient to mark those cpusets whose set of CPUs
> requires RT balancing?

So, I don't think we need that. I think we can make do with the single
flag; we just need to find these disjoint sets and stick our rt-domain there.


