From: Paul Jackson <pj@sgi.com>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: pwil3058@bigpond.net.au, frankeh@watson.ibm.com,
dipankar@in.ibm.com, akpm@osdl.org,
ckrm-tech@lists.sourceforge.net, efocht@hpce.nec.com,
lse-tech@lists.sourceforge.net, hch@infradead.org,
steiner@sgi.com, jbarnes@sgi.com, sylvain.jeaugey@bull.net,
djh@sgi.com, linux-kernel@vger.kernel.org, colpatch@us.ibm.com,
Simon.Derr@bull.net, ak@suse.de, sivanich@sgi.com
Subject: Re: [ckrm-tech] Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
Date: Tue, 5 Oct 2004 02:17:36 -0700
Message-ID: <20041005021736.40f51b33.pj@sgi.com>
In-Reply-To: <13000000.1096928155@flay>
Martin wrote:
> Let me make one thing clear: I don't work on CKRM ;-)
ok ...
Indeed, unless I'm not recognizing someone's expertise properly, there
seems to be a shortage of the CKRM experts on this thread.
Who am I missing ...
> However, the non-dedicated stuff seems much more debateable, and where
> the overlap with CKRM stuff seems possible to me. Do the people showing
> up at random with smaller parallel jobs REALLY, REALLY care about the
> physical layout of the machine? I suspect not, it's not the highly tuned
> syncopated rhythm stuff you describe above. The "give me 1.5 CPUs worth
> of bandwidth please" model of CKRM makes much more sense to me.
It will vary. In shops that are doing a lot of highly parallel work,
such as with OpenMP or MPI, many smaller parallel jobs will also be
placement sensitive. The performance of such jobs is hugely sensitive
to their placement and scheduling on dedicated CPUs and Memory, one per
active thread.
These shops will often use a batch scheduler or workload manager, such
as PBS or LSF, to manage their jobs. PBS and LSF make a business of
defining various sized cpusets to fit the queued jobs, and running each
job in a dedicated cpuset. Their value comes from obtaining high
utilization, and optimum repeatable runtimes, on a varied input job
stream, especially of placement sensitive jobs. The feature set of
cpusets was driven as much as anything by what was required to support a
port of PBS or LSF.
> I'd argue the interface of specifying physical resources is a bit
> clunky for non-dedicated stuff.
Likely so - the interface is expected to be wrapped with a user level
'cpuset' library, which converts it to a 'C' friendly model. And that
in turn is expected to be wrapped with a port of LSF or PBS, which
converts placement back to something that the customer finds familiar
and useful for managing their varied job mix.
I don't expect admins at HPC shops to spend much time poking around the
/dev/cpuset file system, though it is a nice way to look around and
figure out how things work.
The /dev/cpuset pseudo file system api was chosen because it was
convenient for small scale work, learning and experimentation, because
it was a natural for the hierarchical name space with permissions that I
required, and because it was convenient to leverage existing vfs
structure in the kernel.
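
As a rough illustration of why a file system fits a hierarchical name space, here is a minimal Python sketch that models nested cpusets as nested directories in a scratch directory. It does not touch a real /dev/cpuset; the function names and the per-cpuset file names 'cpus' and 'mems' are assumptions made for this example only.

```python
import os
import tempfile


def make_cpuset(root, name, cpus, mems):
    """Model a child cpuset as a subdirectory holding two small text files.

    The file names 'cpus' and 'mems' are assumed for illustration.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    for fname, value in (("cpus", cpus), ("mems", mems)):
        with open(os.path.join(path, fname), "w") as f:
            f.write(value + "\n")
    return path


def read_cpuset(path):
    """Read back the CPU and Memory Node lists recorded for one cpuset."""
    result = {}
    for fname in ("cpus", "mems"):
        with open(os.path.join(path, fname)) as f:
            result[fname] = f.read().strip()
    return result


if __name__ == "__main__":
    # A scratch directory stands in for the mount point.
    with tempfile.TemporaryDirectory() as root:
        batch = make_cpuset(root, "batch", "0-7", "0-3")
        job = make_cpuset(batch, "job42", "0-3", "0-1")
        print(os.path.relpath(job, root), read_cpuset(job))
```

Because each cpuset is a directory, nesting, naming and ordinary file permissions come for free, which is the convenience described above.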
> So personally what I'd like is to have a unified interface
> ...
> Not sure if that's exactly what Andrew was hoping
> for, or the rest of you either ;-)
Well, not what I'm pushing for, that's for sure.
We really have two different mechanisms here:
1) A placement mechanism, explicitly specifying what CPUs and Memory
Nodes are allowed, and
2) A sharing mechanism, specifying what proportion of fungible
resources such as cpu cycles, page faults, i/o requests a particular
subset (class) of the user population is to receive.
If you look at the very lowest level hooks for cpusets and CKRM, you
will see the essential difference:
1) cpusets hooks the scheduler to prohibit scheduling on a CPU that
is not allowed, and the allocator to prohibit obtaining memory
on a Node that is not allowed.
2) CKRM hooks these and other places to throttle tasks by inserting
small delays, so as to obtain the requested share or percentage,
per class of user, of the rate of usage of fungible resources.
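
The two kinds of hook can be contrasted with a toy model in Python. This is only a sketch of the decision each hook makes, not kernel code: the function names, the set-of-integers representation and the fractional share are all assumptions made for illustration.

```python
def cpu_allowed(allowed_cpus, cpu):
    """Placement hook: a yes/no answer for one specific, named CPU."""
    return cpu in allowed_cpus


def needs_throttle(used_by_class, used_total, share):
    """Sharing hook: True when a class has consumed more than its share.

    'share' is a fraction between 0 and 1 of an anonymous, fungible
    resource; which particular cycles were used does not matter.
    """
    if used_total == 0:
        return False
    return used_by_class / used_total > share


if __name__ == "__main__":
    job_cpus = {0, 1, 2, 3}
    print(cpu_allowed(job_cpus, 2), cpu_allowed(job_cpus, 5))
    print(needs_throttle(30, 100, 0.25), needs_throttle(20, 100, 0.25))
```

The first function never delays anything, it only forbids; the second never forbids anything, it only slows a class down once it is over its share.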
The specific details which must be passed back and forth across the
boundary between the kernel and user-space for these two mechanisms are
simply different. One controls which of a list of enumerable finite
non-substitutable resources may or may not be used, and the other
controls what share of other anonymous, fungible resources may be used.
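
To make the difference in what crosses the boundary concrete, here is a small Python sketch. It assumes, purely for illustration, that a placement request arrives as a text list such as "0-3,8" naming individual CPUs, while a sharing request is a single number; the parser name and the list syntax are assumptions of this example.

```python
def parse_id_list(text):
    """Turn a list such as '0-3,8' into the set of resource ids it names."""
    ids = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # An inclusive range of ids, low to high.
            low, high = part.split("-", 1)
            ids.update(range(int(low), int(high) + 1))
        else:
            ids.add(int(part))
    return ids


if __name__ == "__main__":
    a = parse_id_list("0-1")
    b = parse_id_list("2-3")
    # Same amount of CPU, but not the same request: identity matters.
    print(len(a) == len(b), a == b)
    # A share is just a quantity; any two equal shares are interchangeable.
    print(0.25 == 0.25)
```

Two placement requests of equal size are still different requests, whereas two equal shares are indistinguishable, which is why the two payloads do not naturally collapse into one.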
Looking for a unified interface is a false economy in my view, and I
am suspicious that such a search reflects a failure to recognize the
essential differences between the two mechanisms.
> The whole discussion about multiple sched-domains, etc, we had earlier
> is kind of just an implementation thing, but is a crapload easier to do
> something efficient here if the bits caring about that stuff are only
> dealing with dedicated resource partitions.
Yes - much easier. I suspect that someday I will have to add to cpusets
the ability to provide, for select cpusets, the additional guarantees
(sole and exclusive ownership of all the CPUs, Memory Nodes, Tasks and
affinity masks therein) which a scheduler or allocator that's trying to
be smart requires to avoid going crazy. Not all cpusets need this - but
those cpusets which define the scope of a scheduler or allocator domain
would sure like it. Whatever my exclusive flag means now, I'm sure we
all agree that it is too weak to meet this particular requirement.
> OK, now my email is getting as long as yours, so I'll stop ;-) ;-)
That would be tragic indeed. Good thing you stopped.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373