[patch 0/8] CKRM: Core patch set

* [patch 0/8] CKRM:   Core patch set
@ 2005-03-30  2:52 gh
  2005-03-30  6:05 ` Paul Jackson
  0 siblings, 1 reply; 15+ messages in thread
From: gh @ 2005-03-30  2:52 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: ckrm-tech

--

This is the core patch set for CKRM, with almost all review comments
applied (there are a few we are still working on, mostly cosmetic).
This set has been extensively regression tested on IA32,
x86-64/EM64T, and PPC64, with various CKRM CONFIG options on and
off, using both regression tests and CKRM's functional tests.

I believe this set is ready for additional testing in -mm.  We
have an additional 4 patch sets that will follow this (classification
engines, memory controller, IO controller, updated network controller).

Continued comments are welcome; once we have patches for the last
of the cleanups, we are hoping we'll have sufficient testing to be
able to push this towards mainline.

gerrit

* Re: [patch 0/8] CKRM:   Core patch set
  2005-03-30  2:52 [patch 0/8] CKRM: Core patch set gh
@ 2005-03-30  6:05 ` Paul Jackson
  2005-03-30  7:03   ` Gerrit Huizenga
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Paul Jackson @ 2005-03-30  6:05 UTC (permalink / raw)
  To: gh; +Cc: akpm, linux-kernel, ckrm-tech

gerrit wrote:
> This is the core patch set for CKRM

Welcome.

Newcomers to CKRM might want to start reading these patches with "[patch
8/8] CKRM:  Documentation".  Starting with patch 0/8 or 1/8 will be
difficult, at least if you're as dim-witted as I am.

Even the documentation included in patch 8/8 is missing the motivation
and context essential to understanding this patch set.  It might have
helped if the Introduction text at http://ckrm.sourceforge.net/ had been
included in some form, as part of patch 0/8.  I'm just a little penguin
here (lkml), but from what I can tell by watching how things work,
you're going to have to "make the case" -- explain what this is, how
it's put together, and why it's needed.  This is a sizable patch, in
lines of code, in hooks in critical places, and in amount of "new
concepts."  I presume (unless you've managed to bribe or blackmail some
big penguin) you're going to have to convince some others that this is
worth having.  I for one am a CKRM skeptic, so won't be much help to you
in that quest.  Good luck.

I don't see any performance numbers, either on small systems, or
scalability on large systems.  Certainly this patch does not fall under
the "obviously no performance impact" exclusion.

Here's a combined diffstat showing how much code is added by these
patches, and where.  Some of the patches have individual diffstats; some
don't seem to.

 Documentation/ckrm/TODO          |   17
 Documentation/ckrm/ckrm_basics   |   66 ++
 Documentation/ckrm/core_usage    |   72 +++
 Documentation/ckrm/crbce         |   33 +
 Documentation/ckrm/installation  |   70 +++
 Documentation/ckrm/rbce_basics   |   67 ++
 Documentation/ckrm/rbce_usage    |   98 ++++
 fs/Makefile                      |    1
 fs/exec.c                        |    2
 fs/proc/array.c                  |   18
 fs/proc/base.c                   |   17
 fs/proc/internal.h               |    1
 fs/rcfs/Makefile                 |    9
 fs/rcfs/dir.c                    |  220 +++++++++
 fs/rcfs/inode.c                  |  160 ++++++
 fs/rcfs/magic.c                  |  517 ++++++++++++++++++++++
 fs/rcfs/rootdir.c                |  220 +++++++++
 fs/rcfs/socket_fs.c              |  280 ++++++++++++
 fs/rcfs/super.c                  |  291 ++++++++++++
 fs/rcfs/tc_magic.c               |   93 ++++
 include/linux/ckrm_ce.h          |   95 ++++
 include/linux/ckrm_events.h      |  230 +++++++++-
 include/linux/ckrm_net.h         |   42 +
 include/linux/ckrm_rc.h          |  345 +++++++++++++++
 include/linux/ckrm_tc.h          |   46 ++
 include/linux/ckrm_tsk.h         |   35 +
 include/linux/rcfs.h             |  116 ++++-
 include/linux/sched.h            |  105 ++++
 include/linux/taskdelays.h       |   35 +
 include/net/sock.h               |    3
 include/net/tcp.h                |    4
 init/Kconfig                     |   68 ++
 init/main.c                      |    2
 kernel/Makefile                  |    1
 kernel/ckrm/Makefile             |   14
 kernel/ckrm/ckrm.c               |  892 +++++++++++++++++++++++++++++++++++++++
 kernel/ckrm/ckrm_events.c        |   86 +++
 kernel/ckrm/ckrm_numtasks.c      |  522 ++++++++++++++++++++++
 kernel/ckrm/ckrm_numtasks_stub.c |   53 ++
 kernel/ckrm/ckrm_sockc.c         |  559 ++++++++++++++++++++++++
 kernel/ckrm/ckrm_tc.c            |  745 ++++++++++++++++++++++++++++++++
 kernel/ckrm/ckrmutils.c          |  188 ++++++++
 kernel/exit.c                    |    3
 kernel/fork.c                    |   12
 kernel/sched.c                   |   20
 kernel/sys.c                     |   11
 mm/memory.c                      |   10
 net/ipv4/tcp_ipv4.c              |    5
 48 files changed, 6460 insertions(+), 39 deletions(-)

A couple of nits:

 1) Instead of disabling routines with #defines:
         #define numtasks_put_ref(core_class)  do {} while (0)
    one can do it with static inlines, preserving more compiler
    checking.

 2) I take it that the following constitutes the 'documentation'
    for what is in /proc/<pid>/delay.  Perhaps I missed something.

	+	res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
	+		       (unsigned int) get_delay(task,runs),
	+		       (uint64_t) get_delay(task,runcpu_total),
	+		       (uint64_t) get_delay(task,waitcpu_total),
	+		       (unsigned int) get_delay(task,num_iowaits),
	+		       (uint64_t) get_delay(task,iowait_total),
	+		       (unsigned int) get_delay(task,num_memwaits),
	+		       (uint64_t) get_delay(task,mem_iowait_total)

 3) Typo in init/Kconfig "atleast":

    If you say Y here, enable the Resource Class File System and atleast

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30  6:05 ` Paul Jackson
@ 2005-03-30  7:03   ` Gerrit Huizenga
  2005-03-30 16:57     ` Dave Hansen
  2005-03-30 20:55   ` Diego Calleja
  2005-03-30 20:55   ` Diego Calleja
  2 siblings, 1 reply; 15+ messages in thread
From: Gerrit Huizenga @ 2005-03-30  7:03 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, linux-kernel, ckrm-tech, kashyapv

On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote:
> gerrit wrote:
> > This is the core patch set for CKRM
> 
> Welcome.

 Hi Paul.

> Newcomers to CKRM might want to start reading these patches with "[patch
> 8/8] CKRM:  Documentation".  Starting with patch 0/8 or 1/8 will be
> difficult, at least if you're as dimm witted as I am.
> 
> Even the documentation included in patch 8/8 is missing the motivation
> and context essential to understanding this patch set.  It might have
> helped if the Introduction text at http://ckrm.sourceforge.net/ had been
> included in some form, as part of patch 0/8.  I'm just a little penguin
> here (lkml), but from what I can tell by watching how things work,
> you're going to have to "make the case" -- explain what this is, how
> it's put together, and why it's needed.  This is a sizable patch, in
> lines of code, in hooks in critical places, and in amount of "new
> concepts."  I presume (unless you've managed to bribe or blackmail some
> big penguin) you're going to have to convince some others that this is
> worth having.  I for one am a CKRM skeptic, so won't be much help to you
> in that quest.  Good luck.

 Good point on including the pointer to the web site.  As you probably
 noticed, there is a history of the design, papers presented, etc.
 Also, Jonathan Corbet did a nice write up from the discussion at the
 2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/
 which may be of use.

 The OLS and LinuxTag papers are archived at the site that you pointed
 to and there will be a tutorial on configuring, using and writing
 controllers for CKRM at OLS this year.  You may also want to see the
 previous postings of this code to LKML for more background.

 In short, CKRM provides very basic desktop-to-server workload management
 capabilities similar to those provided by most of the old-fashioned
 operating systems.  The code provides a fairly simple mechanism for
 adding controllers for any resource type.  It is currently widely
 deployed by PlanetLab and is part of Novell/SuSE's distro, and the
 capabilities are requested by a fair number of Linux users and
 customers.

> I don't see any performance numbers, either on small systems, or
> scalability on large systems.  Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.

 Fair point.  We have been running some of the smaller benchmarks but
 have not yet had a chance to do any kind of performance comparison
 based on the current code.  However, when configured out, it will
 have zero impact.  Performance analysis of the code with CONFIG_CKRM
 set to y but no rules configured is planned for the very near
 future.

> A couple of nits:
> 
>  1) Instead of disabling routines with #defines:
>          #define numtasks_put_ref(core_class)  do {} while (0)
>     one can do it with static inlines, preserving more compiler
>     checking.

 Yeah - that works well in some cases but it turns out to not do so
 well when an argument to a function refers to a structure element
 which is not configured in.  In that case, the compiler emits a
 reference to an undefined structure value in the case of the static
 inline, where otherwise the entire set of code is pre-processed
 away.  I think we've gone through the code and used the correct
 balance of static inlines and #define constructs as appropriate.
 If we've missed any, I'm more than willing to accept a patch to
 correct a specific instance.
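
 The trade-off described above can be sketched in a few lines.  This is
 a minimal user-space illustration, not code from the patch set:
 CONFIG_DEMO_NUMTASKS, struct demo_task, struct demo_class and
 demo_put_ref() are all hypothetical names.  With the option undefined,
 the macro stub throws away its argument text before the compiler sees
 it, so the reference to the configured-out member is harmless; a
 static inline stub would make the same call site fail to compile.

```c
/* Hypothetical stand-in for a kernel config option.  Leave it undefined
 * to build the "feature configured out" variant. */
/* #define CONFIG_DEMO_NUMTASKS 1 */

struct demo_class {
	int refs;
};

struct demo_task {
	int pid;
#ifdef CONFIG_DEMO_NUMTASKS
	struct demo_class *numtasks_class;	/* only exists when configured in */
#endif
};

#ifdef CONFIG_DEMO_NUMTASKS
/* Real implementation: type-checked like any other function. */
static inline void demo_put_ref(struct demo_class *cls)
{
	cls->refs--;
}
#else
/* Macro stub: the preprocessor discards the argument, so the call site
 * may mention a member that does not exist in this configuration. */
#define demo_put_ref(cls) do { } while (0)

/* A static inline stub such as
 *	static inline void demo_put_ref(struct demo_class *cls) { }
 * would still have its argument compiled, and the call below would fail
 * because struct demo_task has no member named numtasks_class here. */
#endif

int demo_exit_task(struct demo_task *t)
{
	demo_put_ref(t->numtasks_class);	/* vanishes when configured out */
	return t->pid;
}
```

 Where the argument is valid in every configuration, the static inline
 form keeps the type checking that the macro gives up.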

>  2) I take it that the following constitutes the 'documentation'
>     for what is in /proc/<pid>/delay.  Perhaps I missed something.
> 
> 	+	res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
> 	+		       (unsigned int) get_delay(task,runs),
> 	+		       (uint64_t) get_delay(task,runcpu_total),
> 	+		       (uint64_t) get_delay(task,waitcpu_total),
> 	+		       (unsigned int) get_delay(task,num_iowaits),
> 	+		       (uint64_t) get_delay(task,iowait_total),
> 	+		       (unsigned int) get_delay(task,num_memwaits),
> 	+		       (uint64_t) get_delay(task,mem_iowait_total)

 The code is the documentation?  :)

 There is probably some documentation on /proc/<pid>/ in general and
 we'll see if we can get it updated appropriately.  Vivek?
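
 Until that is written up, the field order can at least be read off the
 format string quoted above.  A minimal user-space sketch of a parser
 for one such line follows; the struct and function names are
 hypothetical, the member names are copied from the get_delay() calls,
 and nothing here says what units the totals are in.

```c
#include <stdio.h>

/* Field names mirror the get_delay() arguments in the quoted sprintf();
 * their units are not documented in this thread. */
struct delay_stats {
	unsigned int runs;
	unsigned long long runcpu_total;
	unsigned long long waitcpu_total;
	unsigned int num_iowaits;
	unsigned long long iowait_total;
	unsigned int num_memwaits;
	unsigned long long mem_iowait_total;
};

/* Parse one "%u %llu %llu %u %llu %u %llu" line.
 * Returns 0 on success, -1 if fewer than seven fields were read. */
int parse_delay_line(const char *line, struct delay_stats *out)
{
	int n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
		       &out->runs,
		       &out->runcpu_total,
		       &out->waitcpu_total,
		       &out->num_iowaits,
		       &out->iowait_total,
		       &out->num_memwaits,
		       &out->mem_iowait_total);

	return n == 7 ? 0 : -1;
}
```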

>  3) Typo in init/Kconfig "atleast":
> 
>     If you say Y here, enable the Resource Class File System and atleast

 Got it - thanks!  Someone liked the new word "atleast" - at least
 three occurrences removed.

 Oh - and uniformly updated diffstats - I probably missed some when
 I was playing with quilt originally.

gerrit

* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30  7:03   ` Gerrit Huizenga
@ 2005-03-30 16:57     ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2005-03-30 16:57 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Paul Jackson, Andrew Morton, Linux Kernel Mailing List, ckrm-tech,
	Vivek Kashyap [imap]

On Tue, 2005-03-29 at 23:03 -0800, Gerrit Huizenga wrote:
> The code provides a fairly simple mechanism for adding controllers for
> any resource type

Last time I saw the memory controller, it was 3000 lines.  Doesn't seem
too simple to me. :)

Can you post some of the additional controllers that you've been working
on to the appropriate mailing lists, like linux-mm?  If the subject
experts get a good look at the controllers, it's quite possible that
some comments will cascade back to the core, don't you think?

-- Dave

* Re: [patch 0/8] CKRM: Core patch set
       [not found] <1112201599.11490.6.camel@localhost>
@ 2005-03-30 17:25 ` Gerrit Huizenga
  2005-03-30 18:48   ` Chandra Seetharaman
  0 siblings, 1 reply; 15+ messages in thread
From: Gerrit Huizenga @ 2005-03-30 17:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Paul Jackson, Andrew Morton, Linux Kernel Mailing List, ckrm-tech,
	Vivek Kashyap [imap], sekharan

On Wed, 30 Mar 2005 08:53:19 PST, Dave Hansen wrote:
> On Tue, 2005-03-29 at 23:03 -0800, Gerrit Huizenga wrote:
> > The code provides a fairly simple mechanism for adding controllers for
> > any resource type
> 
> Last time I saw the memory controller, it was 3000 lines.  Doesn't seem
> too simple to me. :)

 Chandra, Dave's suggestion for the memory controller makes a lot of
 sense.  Can you post the current code, ported to the patch set that
 I just posted, to linux-mm for comment?

> Can you post some of the additional controllers that you've been working
> on to the appropriate mailing lists, like linux-mm?  If the subject
> experts get a good look at the controllers, it's quite possible that
> some comments will cascade back to the core, don't you think?

 You can access the various current controllers via the ckrm-tech
 archives from sf.net/projects/ckrm today.

 However, if there are additional changes to the core, I'd like to
 see them as patches built on top of this core set.  Resending the
 modified core each time makes it hard for people to see what has
 changed from release to release, where individual patches will help
 track modifications better.

gerrit

* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30 17:25 ` Gerrit Huizenga
@ 2005-03-30 18:48   ` Chandra Seetharaman
  0 siblings, 0 replies; 15+ messages in thread
From: Chandra Seetharaman @ 2005-03-30 18:48 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Dave Hansen, Paul Jackson, Andrew Morton,
	Linux Kernel Mailing List, ckrm-tech, Vivek Kashyap [imap]

On Wed, Mar 30, 2005 at 09:25:44AM -0800, Gerrit Huizenga wrote:
> 
> On Wed, 30 Mar 2005 08:53:19 PST, Dave Hansen wrote:
> > On Tue, 2005-03-29 at 23:03 -0800, Gerrit Huizenga wrote:
> > > The code provides a fairly simple mechanism for adding controllers for
> > > any resource type
> > 
> > Last time I saw the memory controller, it was 3000 lines.  Doesn't seem
> > too simple to me. :)
>  
>  Chandra, Dave's suggestion for the memory controller makes a lot of
>  sense.  Can you post the current code, ported to the patch set that
>  I just posted, to linux-mm for comment?

Yes, it is in the plans. Within a couple of days I will post the memory
controller against this patchset, and will crosspost to linux-mm then.

* Re: [patch 0/8] CKRM:   Core patch set
  2005-03-30  6:05 ` Paul Jackson
  2005-03-30  7:03   ` Gerrit Huizenga
@ 2005-03-30 20:55   ` Diego Calleja
  2005-03-30 21:29     ` Gerrit Huizenga
                       ` (2 more replies)
  2005-03-30 20:55   ` Diego Calleja
  2 siblings, 3 replies; 15+ messages in thread
From: Diego Calleja @ 2005-03-30 20:55 UTC (permalink / raw)
  To: Paul Jackson; +Cc: gh, akpm, linux-kernel, ckrm-tech

On Tue, 29 Mar 2005 22:05:30 -0800,
Paul Jackson <pj@engr.sgi.com> wrote:

> worth having.  I for one am a CKRM skeptic, so won't be much help to you
> in that quest.  Good luck.
> 
> I don't see any performance numbers, either on small systems, or
> scalability on large systems.  Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.

I'm one of those people who also think that CKRM tries to do too many things, and
although my opinion doesn't count for a lot, I'll try to explain myself anyway :)

One of the things I personally don't like about CKRM is how it handles "CPU resources".
The goal of CKRM seems to be "control what % of the CPU a process can get", but the
number of concepts created to achieve that is too large and too complex. For
"CPU resources", I think that there are much simpler and better solutions. For example,
instead of what CKRM proposes, I propose a simpler concept: "attaching" GIDs to a
niceness level.

Say we "attach" group foo to nice level -5. All users who belong to group foo will have
permission to renice themselves to nice -5. If instead group foo has been
attached at nice level 15, all processes from users who belong to foo will run at 15,
and they won't be able to renice themselves even to the default priority (0).

This should be very easy to implement and, more importantly, it would probably have
zero performance impact at runtime. CKRM touches hot paths in the scheduler;
I think this would touch only a few non-critical places, because we'd just be using an existing
concept.
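
One possible reading of the "attach a GID to a nice level" idea, as a small user-space
sketch rather than kernel code. The table contents, the GIDs and every function name here
are hypothetical, and treating an unattached group as having a floor of 0 is a simplifying
assumption, not existing kernel behaviour:

```c
#include <stddef.h>

#define NICE_MAX 19	/* weakest priority a task can be given */

/* One "attachment": members of gid may not go below nice_floor. */
struct gid_nice {
	unsigned int gid;
	int nice_floor;
};

/* Hypothetical policy table: group 100 may renice down to -5,
 * group 200 is pinned at 15 or weaker. */
static const struct gid_nice gid_nice_table[] = {
	{ 100, -5 },
	{ 200, 15 },
};

/* Lowest nice value the group may use; 0 if the group is not attached. */
int gid_nice_floor(unsigned int gid)
{
	size_t i;

	for (i = 0; i < sizeof(gid_nice_table) / sizeof(gid_nice_table[0]); i++)
		if (gid_nice_table[i].gid == gid)
			return gid_nice_table[i].nice_floor;
	return 0;
}

/* Nice value a new process starts with: the default (0), unless the
 * group is attached at a weaker level. */
int gid_initial_nice(unsigned int gid)
{
	int lowest = gid_nice_floor(gid);

	return lowest > 0 ? lowest : 0;
}

/* Clamp a renice request to the range the group is allowed to use. */
int gid_clamp_nice(unsigned int gid, int requested)
{
	int lowest = gid_nice_floor(gid);

	if (requested < lowest)
		requested = lowest;
	if (requested > NICE_MAX)
		requested = NICE_MAX;
	return requested;
}
```

With this table, a member of group 100 asking for nice -10 ends up at -5, and a member of
group 200 asking for nice 0 stays at 15.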

Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
think that such a level of detail is unnecessary. Instead we let the kernel use its
standard internal mechanisms to do the dirty job, based on the distinction between
standard nice levels. (And we could get that level of detail just by modifying the
scheduler algorithm and adding a range of -50...0...50 nice levels ;)

For CPU resources, we already have nice levels. The existing algorithms can already
handle priorities with them. CKRM's alternative seems to be to add a second scheduling
algorithm, which in super-hot paths like the ones in sched.c will probably have a
performance impact. In my very humble opinion, we should reuse existing UNIX
concepts and combine them to achieve some of the goals CKRM tries to achieve in
a much simpler (unixy ;) way.

* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30 20:55   ` Diego Calleja
@ 2005-03-30 21:29     ` Gerrit Huizenga
  2005-03-30 22:29       ` Diego Calleja
  2005-03-30 22:11     ` Shailabh Nagar
  2005-03-31  0:25     ` Chandra Seetharaman
  2 siblings, 1 reply; 15+ messages in thread
From: Gerrit Huizenga @ 2005-03-30 21:29 UTC (permalink / raw)
  To: Diego Calleja; +Cc: Paul Jackson, akpm, linux-kernel, ckrm-tech

On Wed, 30 Mar 2005 22:55:05 +0200, Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <pj@engr.sgi.com> wrote:
> 
> 
> > worth having.  I for one am a CKRM skeptic, so won't be much help to you
> > in that quest.  Good luck.
> >
> > I don't see any performance numbers, either on small systems, or
> > scalability on large systems.  Certainly this patch does not fall under
> > the "obviously no performance impact" exclusion.
> 
> I'm one of those people who also thinks that CKRM tries to do too much things, and
> although my opinion doesn't counts a lot, I'll try to explain myself anyway :)
>
> One of the things I personally don't like about CKRM its how it handles "CPU resources".
> The goal of CKRM seems to be "control how much % a process can get get", but the
> amount of concepts created to achieve that is too huge and too complex. For the
> "CPU resources", I think that there're much simpler and better solutions. For example,
> instead what CRKM proposes I propose a simpler concept: "attaching" GIDs to a
> niceness level.

Well, the current code and the stacked up patch sets don't currently
include a CPU resource controller, although the SuSE distro version does.
We've pulled back on that for the time being since the scheduler has
been under so much revision lately.  However, resource utilization at the
priority level does not allow you to say "OpenOffice can have up to 30%
of my CPU, my email client is guaranteed to get at least 5%, and Firefox +
Java apps get no more than 50% of my machine, and my CD player gets 10%".
Niceness levels provide none of that level of resource control.  Also,
GIDs have no utility on a desktop machine, other than possibly to separate
background tasks like updatedb from all my real time apps.
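
The difference between "up to" (a limit) and "at least" (a guarantee) in that
example can be sketched as follows.  This is only an illustration of the two
notions, not CKRM's interface: the struct, the function names and the way the
percentages are combined are all assumptions, and "my CD player gets 10%" is
read here as both a guarantee and a limit of 10%.

```c
#include <stddef.h>

/* Hypothetical per-class CPU policy, in percent of the machine. */
struct cpu_class {
	const char *name;
	int guarantee_pct;	/* "is guaranteed to get at least" */
	int limit_pct;		/* "can have up to" / "no more than" */
};

/* The desktop example from the text, under the reading stated above. */
static const struct cpu_class demo_classes[] = {
	{ "openoffice",    0,  30 },
	{ "email",         5, 100 },
	{ "firefox+java",  0,  50 },
	{ "cdplayer",     10,  10 },
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Guarantees are only meaningful if they do not add up to more than 100%. */
int guarantees_fit(const struct cpu_class *classes, size_t n)
{
	int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += classes[i].guarantee_pct;
	return total <= 100;
}

/* CPU a class receives when it asks for demand_pct and spare_pct of the
 * machine is available beyond its own guarantee: never more than its
 * limit, never less than its guarantee if it wants that much. */
int grant_pct(const struct cpu_class *c, int demand_pct, int spare_pct)
{
	int want = min_int(demand_pct, c->limit_pct);
	int base = min_int(want, c->guarantee_pct);
	int extra = min_int(want - base, spare_pct);

	return base + extra;
}
```

A single nice value per task carries neither number, which is the point being
made above.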

> Say, we "attach" group foo to nice level -5. All users who belong to group foo will have
> permissions to renice themselves to nice -5. If instead of that, group foo has been
> attached at nice level 15, all processes from users who belong to foo will be run at 15,
> and they won't be able to renice themselves even to the default priority (0)

 Again, great for multiuser systems if you just want people to be prioritized
 as opposed to work.  But more often on larger multiuser systems, you want various
 services to have priorities.  For instance, a web server may be allowed some
 rate of incoming connections or some amount of CPU bandwidth; a database may
 have memory limits, CPU limits (or be allowed "at least" some percentage, possibly
 while also being kept from taking over the entire machine), and IO limits in terms
 of the amount of disk traffic.  These limits may allow various clients or web servers
 to make progress without getting drowned out by some large server which
 wants to consume 100% of CPU or all of available memory.

> This should be very easy to implement, and what's more important, it'd probably have
> zero performance impact at runtime - CRKM touches hot paths in the scheduler
> I think, this would just touch a few non-critical places - because we'd just use a existing
> concept.

 The CPU controller is not currently in the patches being brought forward to LKML.

> Sure, this can't guarantee that a group will get reserved exactly 57% of  the CPU, but I
> think that such level of detail is unnecesary - instead we let the kernel uses the
> standard internal mechanisms to do the dirty job based in the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)

 Also, with various implementations of the scheduler, the nice levels have been
 either studiously ignored or sometimes at the other extreme there has been a
 more clear stairstepping of nice levels.  Relying on predictability here based
 on the current algorithm is not a great formula for success, nor does it address
 the needs of most desktop or server users in any simple/easy to use way.

> For the CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM alternative seems to be to add a second scheduling
> algorithm which in super-hot paths like the ones from sched.c are, it will probably have a
> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.

 I'd love to see patches which could be validated by folks like the PlanetLab
 folks, for instance.  I don't believe it is possible to get the level of machine
 partitioning/virtualization that CKRM provides with this overly simple prioritization
 scheme.

gerrit

* Re: [ckrm-tech] Re: [patch 0/8] CKRM:   Core patch set
  2005-03-30 20:55   ` Diego Calleja
  2005-03-30 21:29     ` Gerrit Huizenga
@ 2005-03-30 22:11     ` Shailabh Nagar
  2005-03-31  0:25     ` Chandra Seetharaman
  2 siblings, 0 replies; 15+ messages in thread
From: Shailabh Nagar @ 2005-03-30 22:11 UTC (permalink / raw)
  To: Diego Calleja; +Cc: Paul Jackson, gh, akpm, linux-kernel, ckrm-tech

Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <pj@engr.sgi.com> wrote:
> 
> 
> 
>>worth having.  I for one am a CKRM skeptic, so won't be much help to you
>>in that quest.  Good luck.
>>
>>I don't see any performance numbers, either on small systems, or
>>scalability on large systems.  Certainly this patch does not fall under
>>the "obviously no performance impact" exclusion.
> 
> 
> I'm one of those people who also thinks that CKRM tries to do too much things, and
> although my opinion doesn't counts a lot, I'll try to explain myself anyway :)
> 
> One of the things I personally don't like about CKRM its how it handles "CPU resources".
> The goal of CKRM seems to be "control how much % a process can get get", but the
> amount of concepts created to achieve that is too huge and too complex. 

Certainly there's scope for improvement in the implementation of the CPU 
controller but the solution you propose works by redefining the problem.

> For the
> "CPU resources", I think that there're much simpler and better solutions. For example,
> instead what CRKM proposes I propose a simpler concept: "attaching" GIDs to a 
> niceness level.

Doing performance isolation at the granularity of users and groups may
be useful but is not enough for workload management needs. There, it is
essential that a (a) flexible and (b) dynamic grouping of processes be
controllable in its resource consumption as an aggregate. Tying that
grouping to users/groups will not suffice.

CKRM's definition of class can be made equivalent to a user or group but 
not vice versa. Hence the more generic classes are being used, rather 
than reusing groups/users.
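
The "equivalent to a user or group but not vice versa" point can be sketched
like this. It is a user-space illustration only: the struct, the rule
functions and the command names are hypothetical and say nothing about how
CKRM's classification engine is actually written.

```c
#include <string.h>

/* The few task attributes this sketch classifies on. */
struct task_info {
	unsigned int uid;
	unsigned int gid;
	const char *comm;	/* command name */
};

/* A classification rule maps a task to a class id. */
typedef int (*classify_fn)(const struct task_info *t);

/* A class scheme that simply reproduces Unix groups. */
int classify_by_gid(const struct task_info *t)
{
	return (int)t->gid;
}

/* A scheme groups cannot express: split by what the task is running,
 * regardless of who owns it. */
int classify_by_comm(const struct task_info *t)
{
	if (strcmp(t->comm, "httpd") == 0)
		return 1;
	if (strcmp(t->comm, "postgres") == 0)
		return 2;
	return 0;
}

/* Do two tasks land in the same class under the given rule? */
int same_class(classify_fn rule, const struct task_info *a,
	       const struct task_info *b)
{
	return rule(a) == rule(b);
}
```

Two tasks owned by the same group always share a class under
classify_by_gid(), while classify_by_comm() can still tell a web server from
a database started by that group.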

Also, our earlier prototype for the CPU controller had shown a 
0.14-0.63us overhead which remained constant with increasing number of 
processes. While we don't have measurements for later versions, the 
overhead figures are by no means unacceptably high if one values the 
additional generality of CKRM's class (over groups/users).

> 
> Say, we "attach" group foo to nice level -5. All users who belong to group foo will have
> permissions to renice themselves to nice -5. If instead of that, group foo has been
> attached at nice level 15, all processes from users who belong to foo will be run at 15,
> and they won't be able to renice themselves even to the default priority (0)
>
> This should be very easy to implement, and what's more important, it'd probably have
> zero performance impact at runtime - CRKM touches hot paths in the scheduler
> I think, this would just touch a few non-critical places - because we'd just use a existing
> concept.

> Sure, this can't guarantee that a group will get reserved exactly 57% of  the CPU, but I
> think that such level of detail is unnecesary 

For desktop users, perhaps. For server workload management, this level 
of detail is necessary. As stated earlier, CKRM's design satisfies both.

> - instead we let the kernel uses the
> standard internal mechanisms to do the dirty job based in the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)
> 
> For the CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM alternative seems to be to add a second scheduling
> algorithm which in super-hot paths like the ones from sched.c are, it will probably have a
> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.

Not that other Unixes' design decisions should influence Linux, but every
other enterprise UNIX has some equivalent of CKRM's classes available. 
So the design is far from being non-unixy :-)

-- Shailabh

* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30 21:29     ` Gerrit Huizenga
@ 2005-03-30 22:29       ` Diego Calleja
  2005-03-31  1:32         ` Paul Jackson
  0 siblings, 1 reply; 15+ messages in thread
From: Diego Calleja @ 2005-03-30 22:29 UTC (permalink / raw)
  To: Gerrit Huizenga; +Cc: pj, akpm, linux-kernel, ckrm-tech

On Wed, 30 Mar 2005 13:29:53 -0800,
Gerrit Huizenga <gh@us.ibm.com> wrote:

> been under so much revision lately.  However, resource utilization at the
> priority level does not allow you to say "OpenOffice can have up to 30%
> of my CPU, my email client is guaranteed to get at least 5%, and Firefox +
> Java apps get no more than 50% of my machine, and my CD player gets 10%".
> Niceness levels provide none of that level of resource control.  Also,

Users can launch tasks and renice them to the lowest priority levels..., with the highest
priority being set by the administrator... I've always thought it's GNOME/KDE's fault for launching
the apps at the same nice level as the panel and the window manager. Despite
that, my "design" wouldn't achieve such fine-grained control, no. I'd argue that not
many people need that, but then I shouldn't tell people what they need (and anyway
the previous proposal would be so powerful for its simplicity that it might be worth doing
anyway).

>  I'd love to see patches which could be validated by folks like the PlanetLab
>  folks, for instance.  I don't believe it is possible to get the level of machine
>  partitioning/virtualization that CKRM provides with this overly simple prioritization
>  scheme.

I realize that CKRM provides much broader functionality; the alternative I was proposing
was just for CPU resources (and would probably work well for IO bandwidth with CFQ).
I realize that things like "partitioning memory resources" are a whole different problem.

But I certainly think that CKRM is far too complex. The docs I've read spend all their time
describing things like classes, class inheritance, the classification engine, resource
schedulers, resource scheduler configuration and so on. I must admit I've not read too
much about CKRM. I had to stop because I couldn't understand it; everything is far too
complex for my little mind, and I'm saying this because I bet I'm not the only one here
who can't understand it.


* Re: [ckrm-tech] Re: [patch 0/8] CKRM:   Core patch set
  2005-03-30 20:55   ` Diego Calleja
  2005-03-30 21:29     ` Gerrit Huizenga
  2005-03-30 22:11     ` Shailabh Nagar
@ 2005-03-31  0:25     ` Chandra Seetharaman
  2 siblings, 0 replies; 15+ messages in thread
From: Chandra Seetharaman @ 2005-03-31  0:25 UTC (permalink / raw)
  To: Diego Calleja; +Cc: Paul Jackson, gh, akpm, linux-kernel, ckrm-tech

On Wed, Mar 30, 2005 at 10:55:05PM +0200, Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <pj@engr.sgi.com> wrote:
> 
> 
> > worth having.  I for one am a CKRM skeptic, so won't be much help to you
> > in that quest.  Good luck.
> > 
> > I don't see any performance numbers, either on small systems, or
> > scalability on large systems.  Certainly this patch does not fall under
> > the "obviously no performance impact" exclusion.
> 
> I'm one of those people who also think that CKRM tries to do too many things, and
> although my opinion doesn't count for a lot, I'll try to explain myself anyway :)
> 
> One of the things I personally don't like about CKRM is how it handles "CPU resources".
> The goal of CKRM seems to be "control what % of the CPU a process can get", but the
> number of concepts created to achieve that is too large and too complex. For
> "CPU resources", I think there are much simpler and better solutions. For example,
> instead of what CKRM proposes, I propose a simpler concept: "attaching" GIDs to a
> niceness level.
> 
> Say, we "attach" group foo to nice level -5. All users who belong to group foo will have
> permissions to renice themselves to nice -5. If instead of that, group foo has been
> attached at nice level 15, all processes from users who belong to foo will be run at 15,
> and they won't be able to renice themselves even to the default priority (0)
> 
> This should be very easy to implement and, more importantly, it would probably have
> zero performance impact at runtime. CKRM touches hot paths in the scheduler,
> I think; this would just touch a few non-critical places, because we'd just be using an
> existing concept.

Your design is a nice, simple way to take priority-based scheduling
to the next level.
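
The proposal quoted above can be sketched roughly as follows. This is an
illustration only: the table GROUP_NICE, the functions nice_floor and
effective_nice, the group name "batch", and the rule that the most favourable
group wins are all assumptions, not anything in the kernel or in CKRM.

```python
# Hypothetical table: group name -> attached nice level (not a real kernel interface).
GROUP_NICE = {"foo": -5, "batch": 15}

DEFAULT_FLOOR = 0            # assumption: users in no attached group cannot go below nice 0
NICE_MIN, NICE_MAX = -20, 19


def nice_floor(groups):
    """Lowest nice value (highest priority) a user in `groups` may request."""
    floors = [GROUP_NICE[g] for g in groups if g in GROUP_NICE]
    # Assumption: with several attached groups, the most favourable one wins.
    return min(floors) if floors else DEFAULT_FLOOR


def effective_nice(groups, requested=0):
    """Clamp a requested nice value to the floor for these groups."""
    requested = max(NICE_MIN, min(NICE_MAX, requested))
    return max(requested, nice_floor(groups))
```

With this table, a member of foo asking for nice -10 ends up at -5, and a
member of batch asking for the default 0 ends up at 15.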

What CKRM provides, by contrast, is resource management and monitoring, which
is more than prioritizing groups of users for scheduling.

It allows one to manage/monitor different groups of applications (whether
related or unrelated).

With CKRM, you can provide resource control support for features
like UML and virtual servers to make them more controllable
domains (the term "domain" used loosely) in terms of resource management.

> 
> Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
> think that such a level of detail is unnecessary. Instead we let the kernel use its
> standard internal mechanisms to do the dirty job, based on the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)
> 
> For CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM's alternative seems to be to add a second scheduling
> algorithm which, in super-hot paths like the ones in sched.c, will probably have a

One clarification: CKRM is the infrastructure. What you are referring to
is the CPU controller (which is a module to manage the CPU resource), which
can be replaced by a simpler one (like the one you propose) or turned
off if needed.

That is one of the advantages the architecture provides: it removes
resource-specific details from the core functionality CKRM provides, so that it
remains flexible (in choosing the resources you want to control) and
easily expandable (to support additional resources).
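
A minimal sketch of that separation, assuming a registry-style design: the
names Core, register, unregister, set_share and RecordingController are
hypothetical and are not CKRM's actual interfaces. The point is only that the
core dispatches by resource name and knows nothing about what a controller does.

```python
class Core:
    """Resource-agnostic core: it only maps resource names to controllers."""

    def __init__(self):
        self.controllers = {}

    def register(self, resource, controller):
        # Plug in (or replace) the controller for one resource.
        self.controllers[resource] = controller

    def unregister(self, resource):
        # Turn control of this resource off; other resources are unaffected.
        self.controllers.pop(resource, None)

    def set_share(self, class_name, resource, share):
        controller = self.controllers.get(resource)
        if controller is None:
            return False  # no controller loaded for this resource
        controller.set_share(class_name, share)
        return True


class RecordingController:
    """Stand-in controller that just records the shares it is given."""

    def __init__(self):
        self.shares = {}

    def set_share(self, class_name, share):
        self.shares[class_name] = share
```

Swapping the CPU controller for a simpler one is then a second call to
register("cpu", ...), and turning it off is unregister("cpu").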

> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* Re: [patch 0/8] CKRM: Core patch set
  2005-03-30 22:29       ` Diego Calleja
@ 2005-03-31  1:32         ` Paul Jackson
  2005-03-31  3:30           ` Gerrit Huizenga
  2005-03-31 14:46           ` [ckrm-tech] " Shailabh Nagar
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Jackson @ 2005-03-31  1:32 UTC (permalink / raw)
  To: Diego Calleja; +Cc: gh, akpm, linux-kernel, ckrm-tech

Diego wrote:
> I bet I'm not the only one here
> who can't understand it either.....

You're not alone.

See an email thread entitled:

    Classes: 1) what are they, 2) what is their name?
    http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191

on the ckrm-tech@lists.sourceforge.net email list between Aug 14 and Aug
27, 2004, where I did my best to encourage the CKRM project to address
this problem.  To no avail.

Apparently, to some of the smartest amongst us, who got to hear
live presentations describing CKRM, it makes sense and is worthy
of serious consideration.

For myself, of more ordinary intelligence and working just from the
documentation and an occasional glance at the code, it has been a
difficult proposal to understand, with a rather large patch requiring
some non-trivial kernel hooks.

A question for the CKRM developers:

    What middleware packages, outside the kernel, exist or are
    in the works that will rely on CKRM?

    CKRM (like another project near and dear to me, cpusets)
    strikes me as a "middleware foundation" facility, intended
    to provide the essential kernel support required for some
    serious enterprise software.  So perhaps in addition to
    asking what end-users (of a combined kernel-middleware
    platform) exist, we should also be asking who will be
    directly using CKRM - directly layering middleware on top
    of it.

    The details don't matter much and may have to remain
    obscured in the competitive fog.  But the presence of
    multiple groups lobbying for the same kernel infrastructure,
    as an apparent basis for competing middleware products,
    would I think weigh in CKRM's favor.

My impression, which may not align with how the CKRM developers view
things, is that CKRM is descended from what have been called fair-share
schedulers.  The following comes from the above email thread.

No doubt the CKRM experts are already familiar with these, but for the
possible benefit of other readers:

  UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
  http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883

  SHARE II -- A User Administration and Resource Control System for UNIX
  http://www.c-side.com/c/papers/lisa-91.html

  Solaris Resource Manager White Paper
  http://wwws.sun.com/software/resourcemgr/wp-mixed/

  ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
  http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm

  A Fair Share Scheduler, J. Kay and P. Lauder
  Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: [patch 0/8] CKRM: Core patch set
  2005-03-31  1:32         ` Paul Jackson
@ 2005-03-31  3:30           ` Gerrit Huizenga
  2005-03-31 14:46           ` [ckrm-tech] " Shailabh Nagar
  1 sibling, 0 replies; 15+ messages in thread
From: Gerrit Huizenga @ 2005-03-31  3:30 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Diego Calleja, akpm, linux-kernel, ckrm-tech

On Wed, 30 Mar 2005 17:32:32 PST, Paul Jackson wrote:
> A question for the CKRM developers:
> 
>     What middleware packages, outside the kernel, exist or are
>     in the works that will rely on CKRM?

 Primarily, CKRM classes can be instantiated today by simple
 echoes into the /rcfs filesystem.  There isn't a big need for
 a complex middleware package to set up and use CKRM.

 However, there are some tools under way to provide a small CLI
 to help with administration for those who want it.  There
 are also some pretty minimal rc scripts under way, driven by a simple
 config file, to ensure that classes are configured at boot time
 and/or saved and restored across reboots.

>     CKRM (like another project near and dear to me, cpusets)
>     strikes me as a "middleware foundation" facility, intended
>     to provide the essential kernel support required for some
>     serious enterprise software.  So perhaps in addition to
>     asking what end-users (of a combined kernel-middleware
>     platform) exist, we should also be asking who will be
>     directly using CKRM - directly layering middleware on top
>     of it.

 I'm sure you could plug this into some existing workload management
 tools - lots of companies have them for managing other OS's.  Getting
 them to manage Linux with CKRM should be pretty simple for any of
 them if you really want that sort of thing.

>     The details don't matter much and may have to remain
>     obscured in the competitive fog.  But the presence of
>     multiple groups lobbying for the same kernel infrastructure,
>     as an apparent basis for competing middleware products,
>     would I think weigh in CKRM's favor.

> My impression, which may not align with how the CKRM developers view
> things, is that CKRM is descended from what have been called fair-share
> schedulers.  The following comes from the above email thread.

 CKRM is about ways of managing kernel resources - CPU would just be
 one of these.  Fairshare scheduling is similar in some respects to
 what a scheduler might need to do for such a capability.  But that
 isn't part of the code being put forward now or the set that is
 getting finalized on ckrm-tech for mainline right now.  Definitely
 useful, but a bit more challenging for getting a mainline mergeable
 version.

 BTW, one of your comments was that the word "class" was confusing.
 This may stem from the fact that there have been two approaches
 with the word "class" in them in CKRM.

 The first was that a class would be a set of resource upper/lower limits
 such as CPU, memory, number of tasks, getrlimit style resource limits,
 IO bandwidth, network connections, etc. that would be applied to some
 set of tasks.

 At last year's kernel summit, Linus suggested that classes should
 be unique to each resource, e.g. a task could be a member of a
 memory class, mem-A; a CPU resource class cpu-B, an IO resource
 class io-C.  So, now a class is specific to a resource and a task
 is effectively a member of a number of distinct and otherwise
 independent resource classes.

 The current code embodies the second definition of class, which
 provides some more useful independence of resources (they don't all
 need to tie into a common class infrastructure, which made the code
 a little more intertangled).
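
 The second definition can be pictured as a per-task map from resource to
 class. The sketch below is illustrative only: Task, assign and the classes
 field are hypothetical names, not CKRM's actual data structures; the class
 names mem-A, cpu-B and io-C are the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    pid: int
    # One independent entry per resource: resource name -> class name.
    classes: dict = field(default_factory=dict)


def assign(task, resource, class_name):
    """Put `task` into `class_name` for one resource, leaving the others alone."""
    task.classes[resource] = class_name


t = Task(pid=1234)
assign(t, "mem", "mem-A")
assign(t, "cpu", "cpu-B")
assign(t, "io", "io-C")

# Moving the task to another CPU class does not touch its memory or IO class.
assign(t, "cpu", "cpu-D")
```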

 With the current core code, a task is put into a particular resource
 class simply by echoes in the corresponding rcfs directory structure
 for that resource.

 A soon-to-be-posted updated patch provides a simple and a more
 interesting classification engine, which allows you to specify rules
 about which processes are associated with which resource classes.
 E.g. all tasks with a particular uid can be put in the
 "oracle_mem_pig" class or all tasks with a particular gid may be
 put into the "video" scheduler class.  The classification engine allows
 for some more complex rules which are applied at task creation
 time, or at a few other points such as a change of real or effective
 uid/gid.
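
 As a rough illustration of such rules (not the actual classification engine
 or its rule syntax, neither of which is shown in this mail), the sketch below
 matches task attributes against a rule list. RULES, classify, the uid 1001,
 the gid 44, and the mapping of the "video" scheduler class to a "cpu"
 resource are assumptions made for the example.

```python
# Hypothetical rule table: each rule matches on task attributes and names
# the resource class the task should join for one resource.
RULES = [
    {"match": {"uid": 1001}, "resource": "mem", "class": "oracle_mem_pig"},
    {"match": {"gid": 44}, "resource": "cpu", "class": "video"},
]


def classify(task_attrs, rules=RULES):
    """Return {resource: class}, taking the first matching rule per resource.

    Would be called at task creation and again when uid/gid changes.
    """
    result = {}
    for rule in rules:
        if all(task_attrs.get(k) == v for k, v in rule["match"].items()):
            result.setdefault(rule["resource"], rule["class"])
    return result
```

 Re-running classify after a uid or gid change gives the new set of classes
 for the task.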

 In some respects, this provides for a *very* lightweight form of
 virtualization, by restricting a working set of tasks to a limited
 set of resources, without the hard boundaries of a UML or Xen style
 virtual machine.  This also allows protection for some workloads
 in the face of bursty traffic or workloads which are otherwise content
 to consume your entire machine, to the exclusion of all other activities
 on the machine.

gerrit


* Re: [ckrm-tech] Re: [patch 0/8] CKRM: Core patch set
  2005-03-31  1:32         ` Paul Jackson
  2005-03-31  3:30           ` Gerrit Huizenga
@ 2005-03-31 14:46           ` Shailabh Nagar
  1 sibling, 0 replies; 15+ messages in thread
From: Shailabh Nagar @ 2005-03-31 14:46 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Diego Calleja, gh, akpm, linux-kernel, ckrm-tech

Paul Jackson wrote:
> Diego wrote:
> 
>>I bet I'm not the only one here
>>who can't understand it either.....
> 
> 
> You're not alone.
> 
> See an email thread entitled:
> 
>     Classes: 1) what are they, 2) what is their name?
>     http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191
> 
> on the ckrm-tech@lists.sourceforge.net email list between Aug 14 and Aug
> 27, 2004, where I did my best to encourage the CKRM project to address
> this problem.  To no avail.

That is not really a fair categorization of the thread. Hubertus and I 
did try to explain what CKRM classes are. As the last parts of the 
thread show, it was the choice of names that you disagreed with.

> Apparently, to some of the smartest amongst us, who got to hear
> live presentations describing CKRM, it makes sense and is worthy
> of serious consideration.

Except for the Kernel Summit talk (slides of which were very brief),
you have access to the very same presentations on the ckrm website.

> For myself, of more ordinary intelligence and working just from the
> documentation and an occasional glance at the code, it has been a
> difficult proposal to understand, with a rather large patch requiring
> some non-trivial kernel hooks.


Have you read Section 2 of the paper at
	http://ckrm.sourceforge.net/downloads/ckrm-ols04-paper.pdf ?

There the terms class, classtype, resource controllers and 
classification engine have all been explained. If you continue to have 
trouble understanding what these mean, we'd be happy to go over it once 
more. Perhaps we should try a twiki type site or come up with a specific 
set of doubts that need to be addressed.


> A question for the CKRM developers:
> 
>     What middleware packages, outside the kernel, exist or are
>     in the works that will rely on CKRM?
>     
>     CKRM (like another project near and dear to me, cpusets)
>     strikes me as a "middleware foundation" facility, intended
>     to provide the essential kernel support required for some
>     serious enterprise software.  So perhaps in addition to
>     asking what end-users (of a combined kernel-middleware
>     platform) exist, we should also be asking who will be
>     directly using CKRM - directly layering middleware on top
>     of it.
>     
>     The details don't matter much and may have to remain
>     obscured in the competitive fog.  But the presence of
>     multiple groups lobbying for the same kernel infrastructure,
>     as an apparent basis for competing middleware products,
>     would I think weigh in CKRM's favor.

Undoubtedly so. However, workload management middleware developers don't 
seem to have a history of actively participating in LKML for useful 
features, so it's left to the likes of us to determine what *would* be 
useful and then go build it if it makes sense and is acceptable to the 
community.


> My impression, which may not align with how the CKRM developers view
> things, is that CKRM is descended from what have been called fair-share
> schedulers.  The following comes from the above email thread.

Doing fair-share scheduling is indeed the ultimate goal of CKRM. But
using that characterization *alone* will not, in my opinion, be
sufficient to explain what classes, classtypes, etc. are.

> No doubt the CKRM experts are already familiar with these, but for the
> possible benefit of other readers:
> 
>   UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
>   http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883
> 
>   SHARE II -- A User Administration and Resource Control System for UNIX
>   http://www.c-side.com/c/papers/lisa-91.html
> 
>   Solaris Resource Manager White Paper
>   http://wwws.sun.com/software/resourcemgr/wp-mixed/
> 
>   ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
>   http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm
> 
>   A Fair Share Scheduler, J. Kay and P. Lauder
>   Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.

Thanks for the links. Yes, some of these are useful in understanding the 
utility of fair-share scheduling and may even help in creating better 
"controllers" in CKRM-speak.


-- Shailabh




end of thread, other threads:[~2005-03-31 14:45 UTC | newest]

Thread overview: 15+ messages
2005-03-30  2:52 [patch 0/8] CKRM: Core patch set gh
2005-03-30  6:05 ` Paul Jackson
2005-03-30  7:03   ` Gerrit Huizenga
2005-03-30 16:57     ` Dave Hansen
2005-03-30 20:55   ` Diego Calleja
2005-03-30 21:29     ` Gerrit Huizenga
2005-03-30 22:29       ` Diego Calleja
2005-03-31  1:32         ` Paul Jackson
2005-03-31  3:30           ` Gerrit Huizenga
2005-03-31 14:46           ` [ckrm-tech] " Shailabh Nagar
2005-03-30 22:11     ` Shailabh Nagar
2005-03-31  0:25     ` Chandra Seetharaman
2005-03-30 20:55   ` Diego Calleja
     [not found] <1112201599.11490.6.camel@localhost>
2005-03-30 17:25 ` Gerrit Huizenga
2005-03-30 18:48   ` Chandra Seetharaman
