From: Shailabh Nagar <nagar@watson.ibm.com>
To: Paul Jackson <pj@sgi.com>
Cc: Erich Focht <efocht@hpce.nec.com>,
mbligh@aracnet.com, lse-tech@lists.sourceforge.net,
akpm@osdl.org, hch@infradead.org, steiner@sgi.com,
jbarnes@sgi.com, sylvain.jeaugey@bull.net, djh@sgi.com,
linux-kernel@vger.kernel.org, colpatch@us.ibm.com,
Simon.Derr@bull.net, ak@suse.de, sivanich@sgi.com
Subject: Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
Date: Sun, 08 Aug 2004 15:58:14 -0400
Message-ID: <411685D6.5040405@watson.ibm.com>
In-Reply-To: <20040806231013.2b6c44df.pj@sgi.com>

Paul Jackson wrote:
> Erich Focht wrote:
>
>>we (NEC) are also a potential user of this patch
>
>
> Good - welcome.
>
>
>
>>I think cpusets and CKRM should be
>>made to come together. One of CKRM's user interfaces is a filesystem
>>with the file-tree representing the class hierarchy. It's the same for
>>cpusets.
>
>
> Hmmm ... this suggestion worries me, for a couple of reasons.
>
> Just because cpusets and CKRM both have a hierarchy represented in a
> file system doesn't mean it is, or can be, the same file system. Not
> all trees are the same.
>
> Perhaps someone more expert in CKRM can help here. The cpuset hierarchy
> has some strict semantics:
> 1) Any cpuset's CPUs and Memory must be a subset of its parent's.
> 2) A cpuset may be exclusive for CPU or Memory only if its parent is.
> 3) A CPU or Memory exclusive cpuset may not overlap its siblings.
>
> See the routine kernel/cpuset.c:validate_change() for the exact
> coding of these rules.
>
> If we followed your suggestion, Erich, would these rules still hold?
> I can't imagine that the CKRM folks have any existing hierarchies with
> these particular rules. They would need to if we went this way.

As CKRM stands today, we wouldn't be able to impose these constraints,
for exactly the reasons you point out. The other controllers would not
forbid moving a task into a CKRM class in violation of the above rules,
but this controller (CKRM's version of cpusets) would. Currently, on a
task move, CKRM's core calls per-controller callbacks so that each
controller can update its controller-specific per-class objects. But
controllers can't prevent such a move.
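
To make the three cpuset rules quoted above concrete, here is a rough
userspace model of them. It is only a sketch and not the code in
kernel/cpuset.c:validate_change(); the names Cpuset, add_child and
violated_rules are made up for illustration, and rule 3 is read as
"two siblings may not overlap if either of them is exclusive", which
is an assumption on my part.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Cpuset:
    """Hypothetical in-memory stand-in for one cpuset directory."""
    name: str
    cpus: Set[int]
    mems: Set[int]
    cpu_exclusive: bool = False
    mem_exclusive: bool = False
    parent: Optional["Cpuset"] = field(default=None, repr=False)
    children: List["Cpuset"] = field(default_factory=list, repr=False)


def add_child(parent: Cpuset, child: Cpuset) -> Cpuset:
    """Link child under parent and return the child."""
    child.parent = parent
    parent.children.append(child)
    return child


def violated_rules(cs: Cpuset) -> Set[int]:
    """Return the numbers (1, 2, 3) of the quoted rules that cs breaks."""
    broken: Set[int] = set()
    parent = cs.parent
    if parent is None:
        return broken  # the root has no parent or siblings to check against
    # Rule 1: CPUs and Memory must be a subset of the parent's.
    if not (cs.cpus <= parent.cpus and cs.mems <= parent.mems):
        broken.add(1)
    # Rule 2: exclusive only if the parent is exclusive for that resource.
    if (cs.cpu_exclusive and not parent.cpu_exclusive) or (
        cs.mem_exclusive and not parent.mem_exclusive
    ):
        broken.add(2)
    # Rule 3: an exclusive cpuset may not overlap its siblings
    # (assumed symmetric: either side being exclusive forbids overlap).
    for sib in parent.children:
        if sib is cs:
            continue
        cpu_clash = (cs.cpu_exclusive or sib.cpu_exclusive) and (cs.cpus & sib.cpus)
        mem_clash = (cs.mem_exclusive or sib.mem_exclusive) and (cs.mems & sib.mems)
        if cpu_clash or mem_clash:
            broken.add(3)
    return broken


root = Cpuset("root", set(range(8)), {0, 1}, cpu_exclusive=True, mem_exclusive=True)
a = add_child(root, Cpuset("a", {0, 1, 2, 3}, {0}, cpu_exclusive=True))
b = add_child(root, Cpuset("b", {3, 4}, {1}))
print(violated_rules(b))  # {3}: CPU 3 overlaps the CPU-exclusive sibling "a"
```

A controller that could veto changes would run a check of this kind
before accepting a new class configuration.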

However, one of the CKRM changes suggested at the Kernel Summit was to
split up the controllers and not have them bundled within a "core" class,
as we call it. In this model, each task would belong directly to some
controller-specific class.

If CKRM were to adopt this change, one *potential* (but not necessary)
consequence would be multiple hierarchies, one per controller, exposed
to the user. For example, instead of /rcfs/taskclass/<sameclasstree>, we
would have /rcfs/cpu/<oneclasstree>, /rcfs/mem/<anotherclasstree>, etc.

In such a scenario, it would be more logical for the controller to
constrain memberships (i.e. task moves, class share settings while it is
part of a hierarchy, etc.), and it would be easy for cpusets to get its
semantics.
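
A toy model of that split may help: each controller keeps its own
task-to-class mapping and is allowed to reject a move. This is purely
illustrative and is not CKRM code; Controller, MoveRejected, may_join
and the class names are all hypothetical.

```python
from typing import Callable, Dict


class MoveRejected(Exception):
    """Raised when a controller vetoes a task move."""


class Controller:
    """One controller owning its own class hierarchy (flattened here)."""

    def __init__(
        self,
        name: str,
        may_join: Callable[[int, str], bool] = lambda pid, cls: True,
    ) -> None:
        self.name = name
        self.may_join = may_join          # controller-specific membership rule
        self.class_of: Dict[int, str] = {}  # pid -> class name

    def move(self, pid: int, cls: str) -> None:
        # Unlike a notify-only callback, the controller can refuse here.
        if not self.may_join(pid, cls):
            raise MoveRejected(f"{self.name}: pid {pid} may not join {cls}")
        self.class_of[pid] = cls


# A cpuset-like controller that refuses classes with no CPUs assigned.
cpu_classes = {"interactive": {0, 1}, "drained": set()}
cpu = Controller("cpu", lambda pid, cls: bool(cpu_classes.get(cls)))
mem = Controller("mem")  # accepts every move

cpu.move(1234, "interactive")
mem.move(1234, "bulk")  # same task, different class in another hierarchy
try:
    cpu.move(1234, "drained")
except MoveRejected as err:
    print(err)
```

The point of the sketch is only that membership rules live in the
controller rather than in a shared core, so one controller's
constraints do not have to hold for the others.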
>
> On the flip side, what additional rules, if any, would CKRM impose
> on this hierarchy?

Currently, we impose rules on the shares that one can set (a child cannot
have more than its parent, sibling shares should add up, etc.), and we've
discussed, but not yet implemented, some limit on how deep the common
hierarchy would go.
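
As a rough illustration of those share rules, the check below walks a
class tree and reports violations. It is a sketch under assumptions,
not CKRM's implementation: shares are modelled as single integers,
"sibling shares should add up" is taken to mean that the children's
total may not exceed the parent's share, and max_depth stands in for
the depth limit that has only been discussed. All names are
hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def check_shares(
    children: Dict[str, List[str]],
    shares: Dict[str, int],
    root: str,
    max_depth: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (class, problem) pairs for every rule broken in the tree."""
    problems: List[Tuple[str, str]] = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if max_depth is not None and depth > max_depth:
            problems.append((node, "too deep"))
        kids = children.get(node, [])
        for kid in kids:
            # A child cannot have more than its parent.
            if shares[kid] > shares[node]:
                problems.append((kid, "exceeds parent"))
            stack.append((kid, depth + 1))
        # Assumed meaning of "add up": siblings together fit in the parent.
        if sum(shares[k] for k in kids) > shares[node]:
            problems.append((node, "children oversubscribed"))
    return problems


tree = {"root": ["a", "b"], "a": ["a1"]}
shares = {"root": 100, "a": 60, "b": 50, "a1": 70}
for cls, problem in sorted(check_shares(tree, shares, "root")):
    print(cls, problem)
```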
>
> The other reason that this suggestion worries me is a bit more
> philosophical. I'm sure that for all the other, well known,
> resources that CKRM manages, no one is proposing replacing whatever
> existing names and mechanisms exist for those resources, such as
> bandwidth, compute cycles, memory, ... Rather I presume that CKRM
> provides an additional resource management layer on top of the
> existing resources, which retain their classic names and apparatus.
>
> What you seem to be suggesting here, especially with this nice
> picture from your next post:
>
> The files in cpusets are:
> - cpus: list of CPUs in that cpuset
> - mems: list of Memory Nodes in that cpuset
> - cpu_exclusive flag: is cpu placement exclusive?
> - mem_exclusive flag: is memory placement exclusive?
> - tasks: list of tasks (by pid) attached to that cpuset
> The files in a CKRM class directory:
> - stats : statistics (not needed for cpusets)
> - shares : could contain cpus, mems, cpu_exclusive, mem_exclusive
> - members : same as reading /dev/cpusets/.../tasks
> - target : same as writing /dev/cpusets/.../tasks
>
> Changing the "shares" would mean something like
> echo "cpus +6-10" > .../shares
>
> would remove the cpuset specific interface forever, leaving it only
> visible via a more generic "shares, members, target" interface suitable
> for abstract resource management.
>
> I am afraid that this would make it harder for new users of cpusets to
> figure them out. Just cpusets by themselves add a new and strange layer
> of abstraction, that will require a little bit of head scratching (as
> Martin Bligh can testify to, from recent experience ;) for those
> administering and managing the big iron where cpusets will be useful.
>
> To add yet another layer of abstractions on top of that, from the CKRM
> world, might send quite a few users into mental overload, doing the
> usual stupid things we all do when we have given up on understanding and
> are just thrashing about, trying to get something to work.
>
> I think we are onto something useful here, the hierarchical organizing
> of compute resources of CPU and Memory, which will become increasingly
> relevant in the coming years, with bigger machines and more complex
> compute and memory architectures.
>
> I'd hate to see cpusets hidden behind resource management terms from day
> one.

Yup, that's a valid concern. In the current round of CKRM redesign,
we're considering whether controllers should be allowed to export their
own interface (in a sense) by accepting different kinds of share
settings. That is already true today of the "stats" and "config"
virtual files, which don't have any CKRM-imposed semantics. Only "shares"
has a CKRM-defined set of values, not all of which are useful to, or
will be implemented by, a given controller. We're debating whether to
make that one controller-dependent too. If that happens, it'll make
things somewhat better for cpusets. But I'm not sure we'd want to go so
far as to allow controllers to define which virtual files they export.
We do that today for the classification engine because it is an entirely
different beast, but the controllers are similar to one another.

> And, looking at it from the CKRM side (not sure I can, I'll try ...)
> would it not seem a bit odd to a CKRM user that just one of the resource
> types managed, these cpusets, had no apparent existence outside of the
> CKRM hierarchy, unlike all the other resources, which existed a priori,
> and, I presume, continue their independent existence?
From just the viewpoint of cpusets (not adding mem), it seems to be
quite similar to what CKRM's other controllers are doing - grouping a
per-task control (in your case, sched_setaffinity) using hierarchical
sets.
>
> Obviously, I could use a little CKRM expertise here.
>
> But my inclination is to continue to view these two projects as separate,
> with the potential that CKRM will someday add cpusets to the resource types
> that it can manage.

Umm... I'm quite sure you mean you'll contribute code to do that,
right? :-)

It looks like the interface issue is the main one from both projects'
point of view. Hopefully things will become clearer in the next week or
so when ckrm-tech thrashes out the Kernel Summit suggestion (it has other
ramifications besides the interface).

-- Shailabh
>
> Thank-you.
>