From: Paul Jackson <pj@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: hch@infradead.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.13-rc3-mm1 (ckrm)
Date: Sun, 17 Jul 2005 08:20:00 -0700
Message-ID: <20050717082000.349b391f.pj@sgi.com>
In-Reply-To: <20050715131610.25c25c15.akpm@osdl.org>
Andrew, replying to Christoph, about CKRM:
> What, in your opinion, makes it "obviously unmergeable"?
Thanks to some earlier discussions on the relation of CKRM with
cpusets, I've spent some time looking at CKRM. I'm not Christoph,
but perhaps my notes will be of some use in this matter.
CKRM is big, it's difficult for us mere mortals to understand, and it
has attracted only limited review - inadequate review in proportion
to its size and impact. Sometime last year I tried, and failed, to
explain to the folks doing it some of what I found difficult to grasp
about CKRM. See further an email thread entitled:
Classes: 1) what are they, 2) what is their name?
http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191
on the ckrm-tech@lists.sourceforge.net email list between Aug 14 and
Aug 27, 2004.
As to its size, CKRM is in a 2.6.5 variant of SuSE that I happen to be
building just now for other reasons. The source files that have 'ckrm'
in the pathname, _not_ counting Doc files, total 13044 lines of text.
The CONFIG_CKRM* config options add 144 Kbytes to the kernel text.
The CKRM patches in 2.6.13-rc3-mm1 are similar in size. These patch
files total 14367 lines of text.
It is somewhat intrusive in the areas it controls; for example, it adds
some large ifdefs in kernel/sched.c.
The sched hooks may well impact the cost of maintaining the sched code,
which is always a hotbed of Linux kernel development. However, others
who work in that area will have to speak to that concern.
I tried just now to read through the ckrm hooks in fork, to see
what sort of impact they might have on scalability on large systems.
But I gave up after a couple of layers of indirection. I saw several
atomic counters and a couple of spinlocks that I suspect (though I am
not at all sure) lie on the main fork code path. I'd be surprised if
these didn't impact scalability. Earlier, according to my notes, I saw
mention of lmbench results in the OLS 2004 slides indicating a cost of
several percent of available CPU cycles.
A feature of this size and impact needs to attract a fair bit of
discussion, either because it is essential to a variety of people or
because it is intriguing in some other way.
I suspect that the main problem is that this patch is not a mainstream
kernel feature that will gain multiple uses, but rather provides
support for a specific vendor middleware product used by that
vendor and a few closely allied vendors. If it were smaller or
less intrusive, such as a driver, this would not be a big problem.
That's not the case.
The threshold of what is sufficient review needs to be set rather high
for such a patch, quite a bit higher than the level I believe it has
reached so far. It will not be easy for them to obtain that level of
review until they get better at arousing the sustained interest of
other kernel developers.
There may well be multiple end users and applications depending on
CKRM, but I have not been able to identify how many separate vendors
provide middleware that depends on CKRM. I am guessing that only one
vendor has a serious middleware software product that provides full
CKRM support. Acceptance of CKRM would be easier if multiple competing
middleware vendors were using it. It is also a concern that CKRM
is not really usable for its primary intended purpose except if it
is accompanied by this corresponding middleware, which I presume is
proprietary code. I'd like to see a persuasive case that CKRM is
useful and used on production systems not running substantial sole
sourced proprietary middleware.
The development and maintenance costs so far of CKRM appear (to
this outsider) to have been substantial, which suggests that the
maintenance costs of CKRM once in the kernel would be non-trivial.
Given the size of the project, its impact on kernel code, and the
rather limited degree to which developers outside of the CKRM project
have participated in CKRM's development or review, this could either
leave the Linux kernel overly dependent on one vendor for maintaining
CKRM, or place an undue maintenance burden on other kernel developers.
CKRM is in part a generalization and descendant of what I call fair
share schedulers. For example, the fork hooks for CKRM include a
forkrates controller, to slow down the rate of forking of tasks using
too many resources.
No doubt the CKRM experts are already familiar with these, but for
the possible benefit of other readers:
UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883
SHARE II -- A User Administration and Resource Control System for UNIX
http://www.c-side.com/c/papers/lisa-91.html
Solaris Resource Manager White Paper
http://wwws.sun.com/software/resourcemgr/wp-mixed/
ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm
A Fair Share Scheduler, J. Kay and P. Lauder
Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.
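For readers who have not met a rate controller before, the forkrates
idea mentioned above can be pictured as a per-class token bucket. The
sketch below is an assumption for illustration only: it is not taken
from the CKRM patches, the names (fork_bucket, fork_bucket_try_fork and
so on) are hypothetical, and time is an abstract monotonic tick count
rather than any real kernel clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-class fork-rate limiter using a token bucket.
 * Illustrative only: not CKRM code; all names are made up.
 */
struct fork_bucket {
	uint64_t tokens;    /* forks allowed right now without delay */
	uint64_t capacity;  /* largest burst of forks permitted */
	uint64_t rate;      /* tokens added per tick */
	uint64_t last_tick; /* tick of the most recent refill */
};

void fork_bucket_init(struct fork_bucket *b, uint64_t capacity,
		      uint64_t rate, uint64_t now)
{
	b->tokens = capacity;   /* start with a full bucket */
	b->capacity = capacity;
	b->rate = rate;
	b->last_tick = now;
}

/* Credit tokens for ticks elapsed since the last refill, capped at capacity. */
static void fork_bucket_refill(struct fork_bucket *b, uint64_t now)
{
	uint64_t elapsed, room, add;

	assert(now >= b->last_tick);    /* tick counter must be monotonic */
	elapsed = now - b->last_tick;
	room = b->capacity - b->tokens;
	/* Compare before multiplying so elapsed * rate cannot overflow. */
	if (b->rate != 0 && elapsed > room / b->rate)
		add = room;
	else
		add = elapsed * b->rate;
	b->tokens += add;
	b->last_tick = now;
}

/* Returns true if the class may fork now, consuming one token. */
bool fork_bucket_try_fork(struct fork_bucket *b, uint64_t now)
{
	fork_bucket_refill(b, now);
	if (b->tokens == 0)
		return false;   /* over its fork rate: caller would delay or refuse */
	b->tokens--;
	return true;
}
```

A hook in the fork path would call fork_bucket_try_fork() once per
attempted fork and delay or refuse the fork when it returns false. The
bucket permits short bursts of up to capacity forks while bounding the
long-run rate to rate forks per tick.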
The documentation that I've noticed (likely I've missed something)
doesn't do an adequate job of making the case - providing the
motivation and context essential to understanding this patch set.
Because CKRM provides an infrastructure for multiple controllers
(limiting forks, memory allocation and network rates) and multiple
classifiers and policies, its critical interfaces have rather
generic and abstract names. This makes it difficult for others to
approach CKRM, reducing the rate of peer review by other Linux kernel
developers, which is perhaps the key impediment to acceptance of CKRM.
If anything, CKRM tends to be a little too abstract.
Inclusion of diffstat output would help convey to others the scope
of the patchset.
My notes from many months ago indicate something about a 128 CPU
limit in CKRM. I don't know why, nor whether it still applies. It is
certainly smaller than the CPU counts of the systems I care about.
A major restructuring of this patch set could be considered. This
might involve making the metric tools (that monitor memory, fork
and network usage rates per task) separate patches useful for other
purposes. It might also turn the rate limiters in fork, alloc and
network i/o into separately useful patches. I mean here genuinely useful
and understandable in their own right, independent of some abstract
CKRM framework.
Though hints have been dropped, I have not seen any public effort to
integrate CKRM with either cpusets or scheduler domains or process
accounting. By this I don't mean recoding cpusets using the CKRM
infrastructure; that proposal received _extensive_ consideration
earlier, and I am as certain as ever that it made no sense. Rather I
could imagine the CKRM folks extending cpusets to manage resources
on a per-cpuset basis, not just on a per-task or task class basis.
Similarly, it might make sense to use CKRM to manage resources on
a per-sched domain basis, and to integrate the resource tracking
of CKRM with the resource tracking needs of system accounting.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401