From: Matthew Dobson <colpatch@us.ibm.com>
To: Paul Jackson <pj@sgi.com>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
Simon.Derr@bull.net, pwil3058@bigpond.net.au,
frankeh@watson.ibm.com, dipankar@in.ibm.com,
Andrew Morton <akpm@osdl.org>,
ckrm-tech@lists.sourceforge.net, efocht@hpce.nec.com,
LSE Tech <lse-tech@lists.sourceforge.net>,
hch@infradead.org, steiner@sgi.com,
Jesse Barnes <jbarnes@sgi.com>,
sylvain.jeaugey@bull.net, djh@sgi.com,
LKML <linux-kernel@vger.kernel.org>, Andi Kleen <ak@suse.de>,
sivanich@sgi.com
Subject: Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
Date: Wed, 06 Oct 2004 15:59:41 -0700
Message-ID: <1097103580.4907.84.camel@arrakis>
In-Reply-To: <20041005190852.7b1fd5b5.pj@sgi.com>

On Tue, 2004-10-05 at 19:08, Paul Jackson wrote:
> Martin writes:
> > I agree with the basic partitioning stuff - and see a need for that. The
> > non-exclusive stuff I think is fairly obscure, and unnecessary complexity
> > at this point, as 90% of it is covered by CKRM. It's Andrew and Linus's
> > decision, but that's my input.
>
> Now you're trying to marginalize non-exclusive cpusets as a fringe
> requirement. Thanks a bunch ;).
>
> Instead of requiring complete exclusion for all cpusets, and pointing to
> the current 'exclusive' flag as the wrong flag at the wrong place at the
> wrong time (sorry - my radio is tuned to the V.P. debate in the
> background) how about we be clear about what sort of exclusion the
> schedulers, the allocators and here the resource manager (CKRM) require.
I think what Martin is trying to say, in his oh so eloquent way, is that
the difference between 'non-exclusive' cpusets and, say, CKRM
taskclasses isn't very clear. It seems to me that non-exclusive cpusets
are little more than a convenient way to group tasks. I'm not saying
that isn't useful functionality, but I am saying that cpusets seem like
the wrong way to go about it.
> I can envision dividing a machine into a few large, quite separate,
> 'soft' partitions, where each such partition is represented by a subtree
> of the cpuset hierarchy, and where there is no overlap of CPUs, Memory
> Nodes or tasks between the 'soft' partitions, even though there is a
> possibly richly nested cpuset (cpu and memory affinity) structure within
> any given 'soft' partition.
>
> Nothing would cross 'soft' partition boundaries. So far as CPUs, Memory
> Nodes, Tasks and their Affinity, the 'soft' partitions would be
> separate, isolated, and non-overlapping.
Ok. These imaginary 'soft' partitions sound much like what I expected
'exclusive' cpusets to be based on the terminology. They also sound
exactly like what I am trying to implement through my sched_domains
work.
> Each such 'soft' partition could host a separate instance (domain) of
> the scheduler, allocator, and resource manager. Any such domain would
> know what set of CPUs, Memory Nodes and Tasks it was managing, and would
> have complete and sole control of the scheduling, allocation or resource
> sharing of those entities.
I don't know that these partitions would necessarily need their own
scheduler, allocator and resource manager, or if we would just make the
current scheduler, allocator and resource manager aware of these
boundaries. In either case, that is an implementation detail not to be
agonized over now.
> But also within a 'soft' partition, there would be finer grain placement,
> finer grain CPU and Memory affinity, whether by the current tasks
> cpus_allowed and mems_allowed, or by some improved mechanism that the
> schedulers, allocators and resource managers could better deal with.
>
> There _has_ to be. Even if cpusets, sched_setaffinity, mbind, and
> set_mempolicy all disappeared tomorrow, you still have the per-cpu
> kernel threads that have to be placed to a tighter specification than
> the whole of such a 'soft' partition.
Agreed. I'm not proposing that we rip out sched_set/getaffinity, mbind,
etc. What I'm saying is that tasks should not *default* to using these
mechanisms because, at least in their current incarnations, our
scheduler and allocator are written in such a way that these mechanisms
are secondary. The assumption is that the scheduler/allocator can
schedule/allocate wherever they choose. The scheduler does look at
these bindings and if they contradict the decision made we deal with
that after the fact. The allocator has longer code paths and more logic
to deal with if there are bindings in place. So our options are to
either:

  1) find a way to not have to rely on these mechanisms for most/all tasks
     in the system, or
  2) rewrite the scheduler/allocator to deal with these bindings up front,
     and take them into consideration early in the scheduling/allocating
     process.
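
To make the difference between "deal with bindings after the fact" and "deal with bindings up front" concrete, here is a toy sketch. It is not kernel code and does not reflect how the real scheduler is structured: the per-CPU load table, the function names pick_then_fix and filter_up_front, and the "least-loaded CPU wins" policy are all assumptions made purely for illustration.

```python
from typing import Dict, Set


def pick_then_fix(load: Dict[int, int], cpus_allowed: Set[int]) -> int:
    """Hypothetical model of the 'after the fact' approach.

    Pick the least-loaded CPU as if no binding existed, then correct the
    choice only if it contradicts the task's cpus_allowed set.
    """
    choice = min(load, key=lambda cpu: load[cpu])
    if choice not in cpus_allowed:
        # The first decision was wasted work; redo it within the binding.
        choice = min(cpus_allowed, key=lambda cpu: load[cpu])
    return choice


def filter_up_front(load: Dict[int, int], cpus_allowed: Set[int]) -> int:
    """Hypothetical model of the 'up front' approach.

    Restrict the candidate CPUs to cpus_allowed before making any decision.
    """
    candidates = [cpu for cpu in load if cpu in cpus_allowed]
    return min(candidates, key=lambda cpu: load[cpu])
```

With load = {0: 3, 1: 0, 2: 2, 3: 1} and cpus_allowed = {2, 3}, both functions settle on CPU 3. The first one gets there only after choosing CPU 1 and throwing that choice away, which is the extra work that grows as more tasks carry bindings.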
> Could you or some appropriate CKRM guru please try to tell me what
> isolation you actually need for CKRM. Matthew or Peter please do the
> same for the schedulers.
>
> In particular, do you need to prohibit any finer grained placement
> within a particular domain, or not. I believe not. Is it not the case
> that what you really need is that the cpusets that correspond to one of
> your domains (my 'soft' partitions, above) be isolated from any other
> such 'soft' partition? Is it not the case that further, finer grained
> placement within such an isolated 'soft' partition is acceptable? Sure
> better be. Indeed, that's pretty much what we have now, with what
> amounts to a single domain covering the entire system.
I must also plead ignorance to the gritty details of CKRM. It would
seem to me, from discussions on this thread, that CKRM could be made to
deal with 'isolated' domains, 'soft' partitions, or 'exclusive' cpusets
without TOO much headache. Basically, it would just mean telling CKRM that
the tasks in this group are sharing CPU time from a pool of 4 CPUs, rather
than all 16 CPUs in the system. Hubertus? As far as supporting fine grained
binding inside domains, that should definitely be supported in any
solution worthy of acceptance. CKRM, to the best of my knowledge,
currently deals with cpus_allowed, and there's no reason to think that
it wouldn't be able to deal with cpus_allowed in the multiple domain
case.
> Instead of throwing out half of cpusets on claims that it conflicts
> with the requirements of the schedulers, resource managers or (not yet
> raised) the allocators, please be more clear as to what the actual
> requirements are.
That's not really the reason that I was arguing against half of
cpusets. My argument is not related to CKRM's requirements, as I really
don't know what those are! :) My argument is that I don't see what
non-exclusive cpusets buy us. If all we're looking for is basic
task-grouping functionality, I'm quite certain that we can implement
that in a much more light-weight way that doesn't conflict with the
scheduler's decision making process. In fact, for non-exclusive
cpusets, I'd say that we can probably implement that type of
task-grouping in a non-intrusive way that will complement the scheduler
and possibly even improve performance by giving the scheduler a hint
about which tasks should be scheduled together. Using cpus_allowed is
not the way to do that. cpus_allowed should be reserved for what it was
originally meant for: specifying a *strict* subset of CPUs that a task
is restricted to running on.
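
For reference, this is roughly what that strict use looks like from userspace. It is a minimal sketch, assuming Linux and Python's os.sched_getaffinity / os.sched_setaffinity wrappers around the affinity syscalls; the helper name restrict_to is made up for this example and is not part of any existing interface.

```python
import os
from typing import Iterable, Set


def restrict_to(cpus: Iterable[int]) -> Set[int]:
    """Bind the caller (pid 0) to a strict subset of its allowed CPUs.

    Hypothetical helper: it only ever narrows the current affinity mask,
    and refuses a request that shares no CPU with it.
    """
    current = os.sched_getaffinity(0)
    wanted = set(cpus) & current
    if not wanted:
        raise ValueError("requested CPUs are outside the current affinity mask")
    os.sched_setaffinity(0, wanted)
    return set(os.sched_getaffinity(0))
```

Calling restrict_to({0}) on a machine where CPU 0 is currently permitted leaves the caller runnable on CPU 0 only. That is a hard restriction, not a grouping hint, which is the distinction being drawn above.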
-Matt