Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
@ 2006-04-21 19:07 Al Boldi
  2006-04-21 22:04 ` Matt Helsley
  2006-04-21 22:09 ` Chandra Seetharaman
  0 siblings, 2 replies; 30+ messages in thread
From: Al Boldi @ 2006-04-21 19:07 UTC (permalink / raw)
  To: linux-kernel

Chandra Seetharaman wrote:
> On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > CKRM has gone through a major overhaul by removing some of the
> > > complexity, cutting down on features and moving portions to userspace.
> >
> > What do you want done with these patches?  Do you think they are ready
> > for mainline?  -mm?  Or, are you just posting here for comments?
>
> We think it is ready for -mm. But, want to go through a review cycle in
> lkml before i request Andrew for that.

IMHO, it would be a good idea to decouple the current implementation and 
reconnect them via an open mapper/wrapper to allow a more flexible/open 
approach to resource management, which may ease its transition into 
mainline, due to a step-by-step instead of an all-or-none approach.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 19:07 [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Al Boldi
@ 2006-04-21 22:04 ` Matt Helsley
       [not found]   ` <200604220708.40018.a1426z@gawab.com>
  2006-04-21 22:09 ` Chandra Seetharaman
  1 sibling, 1 reply; 30+ messages in thread
From: Matt Helsley @ 2006-04-21 22:04 UTC (permalink / raw)
  To: Al Boldi; +Cc: LKML

On Fri, 2006-04-21 at 22:07 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the
> > > > complexity, cutting down on features and moving portions to userspace.
> > >
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> >
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> IMHO, it would be a good idea to decouple the current implementation and 
> reconnect them via an open mapper/wrapper to allow a more flexible/open 
> approach to resource management, which may ease its transition into 
> mainline, due to a step-by-step instead of an all-or-none approach.
> 
> Thanks!
> 
> --
> Al

Hi Al,

	I'm sorry, I don't understand what you're suggesting. Could you please
elaborate on how you think it should be decoupled?

Thanks,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 30+ messages in thread

[parent not found: <200604220708.40018.a1426z@gawab.com>]

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
       [not found]   ` <200604220708.40018.a1426z@gawab.com>
@ 2006-04-22  5:46     ` Chandra Seetharaman
  2006-04-22 20:40       ` Al Boldi
  0 siblings, 1 reply; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  5:46 UTC (permalink / raw)
  To: Al Boldi; +Cc: Matt Helsley, LKML, Andrew Morton

On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:

> i.e: it should be possible to run the RCs w/o CKRM.
> 
> The current design pins the RCs on CKRM, when in fact this is not necessary.  
> One way to decouple them, could be to pin them against pid, thus allowing an 
> RC to leverage the pid hierarchy w/o the need for CKRM.  And only when finer 
> RM control is necessary, should CKRM come into play, by dynamically 
> adjusting the RC to achieve the desired effect.

This model works well in universities, where you associate some resource
when a student logs in, or a virtualised environment (like a UML or
vserver), where you attach resource to the root process.

It doesn't work with web servers, database servers etc.,, where the main
application will be forking tasks for different set of end users. In
that case you have to group tasks that are not related to one another
and attach resources to them.

Having a unified interface gives the system administrator ability to
group the tasks as they see them in real life (a department or important
transactions or just critical apps in a desktop).

It also has the added advantage that the resource controller writer do
not have to spend their time in coming up with an interface for their
controller. On the other hand if they do, the user finally ends up with
multiple interface (/proc, sysfs, configfs, /dev etc.,) to do their
resource management.

> 
> Thanks!
> 
> --
> Al
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:46     ` Chandra Seetharaman
@ 2006-04-22 20:40       ` Al Boldi
  2006-04-23  2:33         ` Matt Helsley
  0 siblings, 1 reply; 30+ messages in thread
From: Al Boldi @ 2006-04-22 20:40 UTC (permalink / raw)
  To: sekharan; +Cc: Matt Helsley, LKML, Andrew Morton

Chandra Seetharaman wrote:
> On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > i.e: it should be possible to run the RCs w/o CKRM.
> >
> > The current design pins the RCs on CKRM, when in fact this is not
> > necessary. One way to decouple them, could be to pin them against pid,
> > thus allowing an RC to leverage the pid hierarchy w/o the need for CKRM.
> >  And only when finer RM control is necessary, should CKRM come into
> > play, by dynamically adjusting the RC to achieve the desired effect.
>
> This model works well in universities, where you associate some resource
> when a student logs in, or a virtualised environment (like a UML or
> vserver), where you attach resource to the root process.
>
> It doesn't work with web servers, database servers etc.,, where the main
> application will be forking tasks for different set of end users. In
> that case you have to group tasks that are not related to one another
> and attach resources to them.
>
> Having a unified interface gives the system administrator ability to
> group the tasks as they see them in real life (a department or important
> transactions or just critical apps in a desktop).

So, why drag this unified interface around when it is only needed in certain 
models.  The underlying interface via pid comes for free and should be 
leveraged as such to yield a low overhead implementation.  Then maybe, when 
a more complex model is involved should CKRM come into play.

> It also has the added advantage that the resource controller writer do
> not have to spend their time in coming up with an interface for their
> controller. On the other hand if they do, the user finally ends up with
> multiple interface (/proc, sysfs, configfs, /dev etc.,) to do their
> resource management.

So, maybe what is needed is an abstract parent RC that implements this 
interface and lets the child RCs implement the specifics, and allows CKRM to 
connect to the parent RC to allow finer RM control when a specific model 
requires it.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22 20:40       ` Al Boldi
@ 2006-04-23  2:33         ` Matt Helsley
  2006-04-23 11:22           ` Al Boldi
  0 siblings, 1 reply; 30+ messages in thread
From: Matt Helsley @ 2006-04-23  2:33 UTC (permalink / raw)
  To: Al Boldi; +Cc: sekharan, LKML, Andrew Morton

On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > i.e: it should be possible to run the RCs w/o CKRM.
> > >
> > > The current design pins the RCs on CKRM, when in fact this is not
> > > necessary. One way to decouple them, could be to pin them against pid,
> > > thus allowing an RC to leverage the pid hierarchy w/o the need for CKRM.
> > >  And only when finer RM control is necessary, should CKRM come into
> > > play, by dynamically adjusting the RC to achieve the desired effect.
> >
> > This model works well in universities, where you associate some resource
> > when a student logs in, or a virtualised environment (like a UML or
> > vserver), where you attach resource to the root process.
> >
> > It doesn't work with web servers, database servers etc.,, where the main
> > application will be forking tasks for different set of end users. In
> > that case you have to group tasks that are not related to one another
> > and attach resources to them.
> >
> > Having a unified interface gives the system administrator ability to
> > group the tasks as they see them in real life (a department or important
> > transactions or just critical apps in a desktop).
> 
> So, why drag this unified interface around when it is only needed in certain 
> models.  The underlying interface via pid comes for free and should be 
> leveraged as such to yield a low overhead implementation.  Then maybe, when 
> a more complex model is involved should CKRM come into play.

Assuming I'm not misinterpretting your brief description above:

	The interface "via pid" does not come for free. You'd essentially
attach the shares structures to the task and implement inheritance and
hierarchy of those shares during fork -- hardly lower overhead when you
consider that in most cases the number of tasks is going to be much
larger than the number of classes. Furthermore this would mean
duplicating the loops in ckrm_alloc_class, ckrm_free_class,
ckrm_register_controller, and ckrm_unregister_controller. I suspect the
loops would be deeper, more complex, execute more frequently, and have a
much wider performance impact when you consider that we'd be dealing
with the task struct directly instead of a class. The class structure
effectively factors most of the loops out of the fork() and exit() paths
and into mkdir() rmdir() calls that create and remove classes. The
remaining loops in fork() and exit() paths are proportional to the
number of resource controllers -- currently limitedto 8 by
CKRM_MAX_RES_CTLRS.

	Classes also have an advantage when it comes to administrating resource
management -- they are created and destroyed by an administrator and
hence are easier to control. In contrast, the resource management
decisions associated purely with tasks would disappear with the task. In
many cases a task would be too short-lived for an administrator to
manually intervene even if swarms of these tasks are created. Having
this orthogonal hierarchy gives us the opportunity to manage all of
these situations via a common interface and factors out overhead from
the per-task solution you seem to be advocating.

I'm willing to discuss your ideas without patches but I think patches
(even if incomplete) would be clearer.

> > It also has the added advantage that the resource controller writer do
> > not have to spend their time in coming up with an interface for their
> > controller. On the other hand if they do, the user finally ends up with
> > multiple interface (/proc, sysfs, configfs, /dev etc.,) to do their
> > resource management.
> 
> So, maybe what is needed is an abstract parent RC that implements this 
> interface and lets the child RCs implement the specifics, and allows CKRM to 
> connect to the parent RC to allow finer RM control when a specific model 
> requires it.

	I'm not sure what advantage that would give compared to CKRM as it
stands now -- it sounds much more complex. Could you give an example of
what kind of interfaces you're suggesting?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23  2:33         ` Matt Helsley
@ 2006-04-23 11:22           ` Al Boldi
  2006-04-24 18:23             ` Chandra Seetharaman
  0 siblings, 1 reply; 30+ messages in thread
From: Al Boldi @ 2006-04-23 11:22 UTC (permalink / raw)
  To: Matt Helsley; +Cc: sekharan, LKML, Andrew Morton

Matt Helsley wrote:
> On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> > Chandra Seetharaman wrote:
> > > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > > i.e: it should be possible to run the RCs w/o CKRM.
> > > >
> > > > The current design pins the RCs on CKRM, when in fact this is not
> > > > necessary. One way to decouple them, could be to pin them against
> > > > pid, thus allowing an RC to leverage the pid hierarchy w/o the need
> > > > for CKRM. And only when finer RM control is necessary, should CKRM
> > > > come into play, by dynamically adjusting the RC to achieve the
> > > > desired effect.
> > >
> > > This model works well in universities, where you associate some
> > > resource when a student logs in, or a virtualised environment (like a
> > > UML or vserver), where you attach resource to the root process.
> > >
> > > It doesn't work with web servers, database servers etc.,, where the
> > > main application will be forking tasks for different set of end users.
> > > In that case you have to group tasks that are not related to one
> > > another and attach resources to them.
> > >
> > > Having a unified interface gives the system administrator ability to
> > > group the tasks as they see them in real life (a department or
> > > important transactions or just critical apps in a desktop).
> >
> > So, why drag this unified interface around when it is only needed in
> > certain models.  The underlying interface via pid comes for free and
> > should be leveraged as such to yield a low overhead implementation. 
> > Then maybe, when a more complex model is involved should CKRM come into
> > play.
>
> Assuming I'm not misinterpretting your brief description above:
>
> 	The interface "via pid" does not come for free. You'd essentially
> attach the shares structures to the task and implement inheritance and
> hierarchy of those shares during fork

No, attach the shares struct to the parent RC, and allow it to take advantage 
of the free pid hierarchy.

> I'm willing to discuss your ideas without patches but I think patches
> (even if incomplete) would be clearer.

The discussion here is more about design rather than implementation.

> > > It also has the added advantage that the resource controller writer do
> > > not have to spend their time in coming up with an interface for their
> > > controller. On the other hand if they do, the user finally ends up
> > > with multiple interface (/proc, sysfs, configfs, /dev etc.,) to do
> > > their resource management.
> >
> > So, maybe what is needed is an abstract parent RC that implements this
> > interface and lets the child RCs implement the specifics, and allows
> > CKRM to connect to the parent RC to allow finer RM control when a
> > specific model requires it.
>
> 	I'm not sure what advantage that would give compared to CKRM as it
> stands now -- it sounds much more complex. Could you give an example of
> what kind of interfaces you're suggesting?

Nothing wrong w/ CKRM per se, other than its monolithic approach.

The suggestion here would be to modularize CKRM by removing dependencies, 
effectively splitting CKRM into 3 parts:

	  RM --- RC parent (no-op)
		/ | \
	      RC child (ntask, cpu, mem, .....)

So it could be possible to:
1. Load the RC parent to provide for simple stats based on pid-hierarchy.
2. Load an RC child for rc-enforcement.
3. Load the RM for finer control across different tasks by way of an 
orthogonal hierarchy.

Although this may look more complex than the monolithic approach, it is in 
fact lots simpler, due to its "division of labor" approach.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23 11:22           ` Al Boldi
@ 2006-04-24 18:23             ` Chandra Seetharaman
  0 siblings, 0 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-24 18:23 UTC (permalink / raw)
  To: Al Boldi; +Cc: Matt Helsley, LKML, Andrew Morton

On Sun, 2006-04-23 at 14:22 +0300, Al Boldi wrote:
> Matt Helsley wrote:
> > On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> > > Chandra Seetharaman wrote:
> > > > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > > > i.e: it should be possible to run the RCs w/o CKRM.
> > > > >
> > > > > The current design pins the RCs on CKRM, when in fact this is not
> > > > > necessary. One way to decouple them, could be to pin them against
> > > > > pid, thus allowing an RC to leverage the pid hierarchy w/o the need
> > > > > for CKRM. And only when finer RM control is necessary, should CKRM
> > > > > come into play, by dynamically adjusting the RC to achieve the
> > > > > desired effect.
> > > >
> > > > This model works well in universities, where you associate some
> > > > resource when a student logs in, or a virtualised environment (like a
> > > > UML or vserver), where you attach resource to the root process.
> > > >
> > > > It doesn't work with web servers, database servers etc.,, where the
> > > > main application will be forking tasks for different set of end users.
> > > > In that case you have to group tasks that are not related to one
> > > > another and attach resources to them.
> > > >
> > > > Having a unified interface gives the system administrator ability to
> > > > group the tasks as they see them in real life (a department or
> > > > important transactions or just critical apps in a desktop).
> > >
> > > So, why drag this unified interface around when it is only needed in
> > > certain models.  The underlying interface via pid comes for free and

The "pid tree" approach will not allow ISPs to provide workload
management capabilities inside a virual server also.

> > > should be leveraged as such to yield a low overhead implementation.

As you can see "pid based resource control" does not lead to a low
overhead implementation.

>  
> > > Then maybe, when a more complex model is involved should CKRM come into
> > > play.
> >
> > Assuming I'm not misinterpretting your brief description above:
> >
> > 	The interface "via pid" does not come for free. You'd essentially
> > attach the shares structures to the task and implement inheritance and
> > hierarchy of those shares during fork
> 
> No, attach the shares struct to the parent RC, and allow it to take advantage 
> of the free pid hierarchy.
> 
> > I'm willing to discuss your ideas without patches but I think patches
> > (even if incomplete) would be clearer.
> 
> The discussion here is more about design rather than implementation.

"pid based RC" does sound as easily understandable design. But IMHO, we
should consider how the implementation will be (in this context
comparing it with CKRM). As Matt pointed it may not be any less
complex. 
> 
> > > > It also has the added advantage that the resource controller writer do
> > > > not have to spend their time in coming up with an interface for their
> > > > controller. On the other hand if they do, the user finally ends up
> > > > with multiple interface (/proc, sysfs, configfs, /dev etc.,) to do
> > > > their resource management.
> > >
> > > So, maybe what is needed is an abstract parent RC that implements this
> > > interface and lets the child RCs implement the specifics, and allows
> > > CKRM to connect to the parent RC to allow finer RM control when a
> > > specific model requires it.
> >
> > 	I'm not sure what advantage that would give compared to CKRM as it
> > stands now -- it sounds much more complex. Could you give an example of
> > what kind of interfaces you're suggesting?
> 
> Nothing wrong w/ CKRM per se, other than its monolithic approach.
> 
> The suggestion here would be to modularize CKRM by removing dependencies, 
> effectively splitting CKRM into 3 parts:
> 
> 	  RM --- RC parent (no-op)
> 		/ | \
> 	      RC child (ntask, cpu, mem, .....)
> 

If the "RC parent" is _only_ going to allow attaching resource shares
with a "pid hierarchy", then how can an RM attach unrelated tasks to any
resource share ?

CKRM brings in grouping of unrelated tasks, which IMO is not possible
with the "pid tree" approach. On the other hand, CKRM takes care of
different scenarios.

> So it could be possible to:
> 1. Load the RC parent to provide for simple stats based on pid-hierarchy.
> 2. Load an RC child for rc-enforcement.
> 3. Load the RM for finer control across different tasks by way of an 
> orthogonal hierarchy.
> 
> Although this may look more complex than the monolithic approach, it is in 
> fact lots simpler, due to its "division of labor" approach.
> 
> Thanks!
> 
> --
> Al
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 19:07 [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Al Boldi
  2006-04-21 22:04 ` Matt Helsley
@ 2006-04-21 22:09 ` Chandra Seetharaman
  1 sibling, 0 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-21 22:09 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel

On Fri, 2006-04-21 at 22:07 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the
> > > > complexity, cutting down on features and moving portions to userspace.
> > >
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> >
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> IMHO, it would be a good idea to decouple the current implementation and 
> reconnect them via an open mapper/wrapper to allow a more flexible/open 

I am not understanding your comment, can you please elaborate.

> approach to resource management, which may ease its transition into 
> mainline, due to a step-by-step instead of an all-or-none approach.

BTW, the design does allow step by step approach to resource management.
You can add individual resource control one at a time, or even turn on
only the resources you are interested in.

> 
> Thanks!
> 
> --
> Al
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC] [PATCH 00/12] CKRM after a major overhaul
@ 2006-04-21  2:24 sekharan
  2006-04-21 14:49 ` [ckrm-tech] " Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

CKRM has gone through a major overhaul by removing some of the complexity,
cutting down on features and moving portions to userspace.

Diffstat of this patchset (including the numtasks controller that follows)
is:

	23 files changed, 2475 insertions(+), 5 deletions(-)

including Documentaion and comments.

This patchset will be followed with two controllers:
	- a simple controller, numtasks to control number of tasks
	- CPU controller, to control CPU resource.
--

Brief Intro for CKRM:

Class-based Kernel Resource Management (CKRM) enables control of system
resource usage and monitoring of resource usage through user-defined
groups of tasks called classes.

Class is a group of tasks that is grouped by the administrator.

By assigning tasks to classes, administrators can monitor and bound the
resource usage of any system resource with a resource controller.
Resources amenable to such control include CPU ticks, physical pages,
disk I/O bandwidth, number of open file handles, and number of tasks to
name a few.

Userspace interfaces with CKRM through a configfs subsystem: Resource Control
File System (RCFS). Users create and delete classes simply by issuing mkdir
or rmdir commands. Once created the user may set the resource
share of a class and alter the group of tasks bound to those classes by
writing to files in the class directory. Similarly, to monitor the
subsequent resource utilization of the class, users read files in the class
directory.

Users control different resource shares of a class independent of other
resource. In other words, CPU share of class can be very different from
memory share and that of I/O share.

Resource controllers implement a small set of functions that respond to
changes in resource shares, class creation/deletion and class membership.
Given a class and its shares the controller then manages resource usage
of tasks in that class. For instance, a CPU resource controller might
manipulate the timeslice of each task according to its class' remaining
CPU share.

--

Patch Descriptions:

This set of patches implements classes, resource controller registration,
and the RCFS interface. Subsequent sets of patches add specific resource
controllers.

More details are available in the doumentation patch.

Patch descriptions:
01/12: ckrm_core
	- Provides register/unregister functions for a controller

02/12: ckrm_core_class_support
	- Provides functions to alloc and free a user defined class
	- Provides utility functions to walk through the class hierarchy

03/12: ckrm_core_handle_shares
	- Provides functions to set/get shares of a class
	- Defines a teardown function that is intended to be called when
	  user disables CKRM (by umount of configfs or rmmod of rcfs)

04/12: ckrm_tasksupport
	- Adds logic to support adding/removing task to/from a class
	- Provides an interface to set a task's class

05/12: ckrm_tasksupport_fork_exit_init
	- Initializes and clears ckrm specific information at fork() and
	  exit()
	- Inititalizes ckrm (called from start_kernel)

06/12: ckrm_tasksupport_procsupport
	- Adds an interface in /proc to get the class name of a task.

07/12 - ckrm_configfs_rcfs
	Creates configfs interface(RCFS) for managing CKRM.
	Hooks up with configfs. Provides functions for creating and
	deleting classes.

08/12 - ckrm_configfs_rcfs_attr_support
	Adds the basic attribute store and show functions.

09/12 - 04ckrm_configfs_rcfs_stats
	Adds attr_store and attr_show support for stats file.

10/12 - ckrm_configfs_rcfs_shares
	Adds attr_store and attr_show support for shares file.

11/12 - ckrm_configfs_rcfs_members
	Adds attr_store and attr_show support for members file.

12/12 - ckrm_docs
	Documentation describing important CKRM elements such as classes,
	shares, controllers, and the interface provided to userspace via RCFS

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21  2:24 sekharan
@ 2006-04-21 14:49 ` Dave Hansen
  2006-04-21 16:58   ` Chandra Seetharaman
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2006-04-21 14:49 UTC (permalink / raw)
  To: sekharan; +Cc: linux-kernel, ckrm-tech

On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> CKRM has gone through a major overhaul by removing some of the complexity,
> cutting down on features and moving portions to userspace.

What do you want done with these patches?  Do you think they are ready
for mainline?  -mm?  Or, are you just posting here for comments?

-- Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 14:49 ` [ckrm-tech] " Dave Hansen
@ 2006-04-21 16:58   ` Chandra Seetharaman
  2006-04-21 22:57     ` Andrew Morton
  0 siblings, 1 reply; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-21 16:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > CKRM has gone through a major overhaul by removing some of the complexity,
> > cutting down on features and moving portions to userspace.
> 
> What do you want done with these patches?  Do you think they are ready
> for mainline?  -mm?  Or, are you just posting here for comments?
> 

We think it is ready for -mm. But, want to go through a review cycle in
lkml before i request Andrew for that.

Thanks for asking,

chandra
> -- Dave
> 
> 
> 
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 16:58   ` Chandra Seetharaman
@ 2006-04-21 22:57     ` Andrew Morton
  2006-04-22  1:48       ` Chandra Seetharaman
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2006-04-21 22:57 UTC (permalink / raw)
  To: sekharan; +Cc: haveblue, linux-kernel, ckrm-tech

Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>
> On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > CKRM has gone through a major overhaul by removing some of the complexity,
> > > cutting down on features and moving portions to userspace.
> > 
> > What do you want done with these patches?  Do you think they are ready
> > for mainline?  -mm?  Or, are you just posting here for comments?
> > 
> 
> We think it is ready for -mm. But, want to go through a review cycle in
> lkml before i request Andrew for that.

>From a quick scan, the overall code quality is probably the best I've seen
for an initial submission of this magnitude.  I had a few minor issues and
questions, but it'd need a couple of hours to go through it all.

So.  Send 'em over when you're ready.

I have one concern.  If we merge this framework into mainline then we'd
(quite reasonably) expect to see an ongoing dribble of new controllers
being submitted.  But we haven't seen those controllers yet.  So there is a
risk that you'll submit a major new controller (most likely a net or memory
controller) and it will provoke a reviewer revolt.  We'd then be in a
situation of cant-go-forward, cant-go-backward.

It would increase the comfort level if we could see what the major
controllers look like before committing.  But that's unreasonable.

Could I ask that you briefly enumerate

a) which controllers you think we'll need in the forseeable future

b) what they need to do

c) pointer to prototype code if poss

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 22:57     ` Andrew Morton
@ 2006-04-22  1:48       ` Chandra Seetharaman
  2006-04-22  2:13         ` Andrew Morton
  2006-04-24  1:47         ` Hirokazu Takahashi
  0 siblings, 2 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  1:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 15:57 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the complexity,
> > > > cutting down on features and moving portions to userspace.
> > > 
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> > > 
> > 
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> From a quick scan, the overall code quality is probably the best I've seen
> for an initial submission of this magnitude.  I had a few minor issues and

Thanks, and thanks to all that helped.

> questions, but it'd need a couple of hours to go through it all.
> 
> So.  Send 'em over when you're ready.

Great. I will wait for couple of days for comments and then send them
your way.

> 
> I have one concern.  If we merge this framework into mainline then we'd
> (quite reasonably) expect to see an ongoing dribble of new controllers
> being submitted.  But we haven't seen those controllers yet.  So there is a
> risk that you'll submit a major new controller (most likely a net or memory
> controller) and it will provoke a reviewer revolt.  We'd then be in a
> situation of cant-go-forward, cant-go-backward.
> 

I totally understand your concern.

CKRM's design is not tied with a specific implementation of a
controller. It allows hooking up different controllers for the same
resource. If a controller is considered complex, it can cut some of the
features and be made simpler. Or a simpler controller can replace an
earlier complex controller without affecting the user interface. 

This flexibility feature reduces the "cant-go-forward, cant-go-back"
problem, somewhat.

FYI, we found out that managing network resources was not falling into
this task based model and we had to invent complex layering to
accommodate it. So, we dropped our plans for network support. 

One can write controller for any resource that can be accounted at task
level. The corresponding subsystem stakeholders can ensure that it is
clean, and at acceptable level.

> It would increase the comfort level if we could see what the major
> controllers look like before committing.  But that's unreasonable.

You might have seen the CPU controller (different implementation than
what we had earlier) and the numtasks controller (can prevent fork
bombs) that followed this patchset.

>
> Could I ask that you briefly enumerate
> 
> a) which controllers you think we'll need in the forseeable future
> 

Our main object is to provide resource control for the hardware
resources: CPU, I/O and memory.

We have already posted the CPU controller.

We have two implementations of memory controller and a I/O controller. 

Memory controller is understandably more complex and controversial, and
that is the reason we haven't posted it this time around (we are looking
at ways to simplify the design and hence the complexity). Both the
memory controllers has been posted to linux-mm.

I/O controller is based on CFQ-scheduler.

> b) what they need to do

Both memory controllers provide control for LRU lists.

 - One maintains the active/inactive lists per class for each zone. It
   is of order O(1). Current code is little complex. We are looking at
   ways to simplify it.

 - Another creates pseudo zones under each zones (by splitting the 
   number of pages available in a zone) and attaches them with
   each class.

I/O Controller that we are working on is based on CFQ scheduler and
provides bandwidth control.  
> 
> c) pointer to prototype code if poss

Both the memory controllers are fully functional. We need to trim them
down.

active/inactive list per class memory controller:
http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download

pzone based memory controller:
http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2

i/o controller: This controller is not ported to the framework posted,
but can be taken for a prototype version. New version would be simpler
though.

http://prdownloads.sourceforge.net/ckrm/io_rc.tar.bz2?download

Thanks & Regards,

chandra
> 
> Thanks.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  1:48       ` Chandra Seetharaman
@ 2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
                             ` (3 more replies)
  2006-04-24  1:47         ` Hirokazu Takahashi
  1 sibling, 4 replies; 30+ messages in thread
From: Andrew Morton @ 2006-04-22  2:13 UTC (permalink / raw)
  To: sekharan; +Cc: haveblue, linux-kernel, ckrm-tech

Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>
> > 
> > c) pointer to prototype code if poss
> 
> Both the memory controllers are fully functional. We need to trim them
> down.
> 
> active/inactive list per class memory controller:
> http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download

Oh my gosh.  That converts memory reclaim from per-zone LRU to
per-CKRM-class LRU.  If configured.

This is huge.  It means that we have basically two quite different versions
of memory reclaim to test and maintain.   This is a problem.

(I hope that's the before-we-added-comments version of the patch btw).

> pzone based memory controller:
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2

>From a super-quick scan that looks saner.  Is it effective?  Is this the
way you're planning on proceeding?

This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
Just that it covers a group of mm's and not just the one mm?

Do you attempt to manage just pagecache?  So if class A tries to read 10GB
from disk, does that get more aggressively reclaimed based on class A's
resource limits?

This all would have been more comfortable if done on top of the 2.4
kernel's virtual scanner.

(btw, using the term "class" to identify a group of tasks isn't very
comfortable - it's an instance, not a class...)

Worried.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
@ 2006-04-22  2:20           ` Matt Helsley
  2006-04-22  2:33             ` Andrew Morton
  2006-04-22  5:28           ` Chandra Seetharaman
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Matt Helsley @ 2006-04-22  2:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chandra S. Seetharaman, Dave Hansen, LKML, CKRM-Tech

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:

<snip> (I'll let those more familiar with the memory controller efforts
comment on those concerns)

> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

Yes, I can see how this would be uncomfortable. How about replacing
"class" with "resource group"?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:20           ` Matt Helsley
@ 2006-04-22  2:33             ` Andrew Morton
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2006-04-22  2:33 UTC (permalink / raw)
  To: Matt Helsley; +Cc: sekharan, haveblue, linux-kernel, ckrm-tech

Matt Helsley <matthltc@us.ibm.com> wrote:
>
> > (btw, using the term "class" to identify a group of tasks isn't very
>  > comfortable - it's an instance, not a class...)
> 
>  Yes, I can see how this would be uncomfortable. How about replacing
>  "class" with "resource group"?

Much more comfortable, thanks ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
@ 2006-04-22  5:28           ` Chandra Seetharaman
  2006-04-24  1:10             ` KUROSAWA Takahiro
  2006-04-24  5:18             ` Hirokazu Takahashi
  2006-04-23  6:52           ` Paul Jackson
  2006-04-28  1:58           ` Chandra Seetharaman
  3 siblings, 2 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  5:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: haveblue, linux-kernel, ckrm-tech, Valerie Clement,
	Takahiro Kurosawa

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > > 
> > > c) pointer to prototype code if poss
> > 
> > Both the memory controllers are fully functional. We need to trim them
> > down.
> > 
> > active/inactive list per class memory controller:
> > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> 
> Oh my gosh.  That converts memory reclaim from per-zone LRU to
> per-CKRM-class LRU.  If configured.

Yes. We originally had an implementation that would use the existing
per-zone LRU, but the reclamation path was O(n), where n is the number
of classes. So, we moved towards a O(1) algorithm.

> 
> This is huge.  It means that we have basically two quite different versions
> of memory reclaim to test and maintain.   This is a problem.

Understood, will work and come up with an acceptable memory controller.
> 
> (I hope that's the before-we-added-comments version of the patch btw).

Yes, indeed :). As I told earlier this patch is not ready for lkml or -
mm yet.
> 
> > pzone based memory controller:
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> 
> From a super-quick scan that looks saner.  Is it effective?  Is this the
> way you're planning on proceeding?
> 

Yes, it is effective, and the reclamation is O(1) too. It has couple of
problems by design, (1) doesn't handle shared pages and (2) doesn't
provide support for both min_shares and max_shares.

> This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> Just that it covers a group of mm's and not just the one mm?

Yes, that is the core object of ckrm, associate resources to a group of
tasks.

> 
> Do you attempt to manage just pagecache?  So if class A tries to read 10GB
> from disk, does that get more aggressively reclaimed based on class A's
> resource limits?

Yes, it would get more aggressively reclaimed. But, if you have the I/O
controller also configured appropriately only class A will be affected.

> 
> This all would have been more comfortable if done on top of the 2.4
> kernel's virtual scanner.
> 
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

We could go with "Resource Group" as Matt suggested.
> 
> 

Valerie, KUROSAWA, Please free to add any more details.
> Worried.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:28           ` Chandra Seetharaman
@ 2006-04-24  1:10             ` KUROSAWA Takahiro
  2006-04-24  4:39               ` Kirill Korotaev
  2006-04-24  5:18             ` Hirokazu Takahashi
  1 sibling, 1 reply; 30+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  1:10 UTC (permalink / raw)
  To: sekharan
  Cc: akpm, haveblue, linux-kernel, ckrm-tech,
	" Valerie.Clement"

On Fri, 21 Apr 2006 22:28:45 -0700
Chandra Seetharaman <sekharan@us.ibm.com> wrote:

> > > pzone based memory controller:
> > > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> > 
> > From a super-quick scan that looks saner.  Is it effective?  Is this the
> > way you're planning on proceeding?
> 
> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> problems by design, (1) doesn't handle shared pages and (2) doesn't
> provide support for both min_shares and max_shares.

Right.  I wanted to show proof-of-cencept of the pzone based controller
and implemented minimal features necessary as the memory controller.
So, the pzone based controller still needs development and some cleanup.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  1:10             ` KUROSAWA Takahiro
@ 2006-04-24  4:39               ` Kirill Korotaev
  2006-04-24  5:41                 ` KUROSAWA Takahiro
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill Korotaev @ 2006-04-24  4:39 UTC (permalink / raw)
  To: KUROSAWA Takahiro
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement

>>>> pzone based memory controller:
>>>> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
>>> From a super-quick scan that looks saner.  Is it effective?  Is this the
>>> way you're planning on proceeding?
>> Yes, it is effective, and the reclamation is O(1) too. It has couple of
>> problems by design, (1) doesn't handle shared pages and (2) doesn't
>> provide support for both min_shares and max_shares.
> 
> Right.  I wanted to show proof-of-cencept of the pzone based controller
> and implemented minimal features necessary as the memory controller.
> So, the pzone based controller still needs development and some cleanup.
Just out of curiosity, how it was meassured that it is effective?
How does it work when there is a global memory shortage in the system?

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  4:39               ` Kirill Korotaev
@ 2006-04-24  5:41                 ` KUROSAWA Takahiro
  2006-04-24  6:45                   ` Kirill Korotaev
  0 siblings, 1 reply; 30+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  5:41 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement

On Mon, 24 Apr 2006 08:39:52 +0400
Kirill Korotaev <dev@openvz.org> wrote:

> >>>> pzone based memory controller:
> >>>> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> >>> From a super-quick scan that looks saner.  Is it effective?  Is this the
> >>> way you're planning on proceeding?
> >> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> >> problems by design, (1) doesn't handle shared pages and (2) doesn't
> >> provide support for both min_shares and max_shares.
> > 
> > Right.  I wanted to show proof-of-cencept of the pzone based controller
> > and implemented minimal features necessary as the memory controller.
> > So, the pzone based controller still needs development and some cleanup.
> Just out of curiosity, how it was meassured that it is effective?

I don't have any benchmark numbers yet, so I can't explain the
effectiveness with numbers.  I've been looking for the way to
measure the cost of pzones correctly, but I've not found it out yet.

> How does it work when there is a global memory shortage in the system?

I guess you are referring to the situation that global memory is running
out but there are free pages in pzones.  These free pages in pzones are
handled as reserved for pzone users and not used even in global memory 
shortage.

Thanks,
-- 
KUROSAWA, Takahiro

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  5:41                 ` KUROSAWA Takahiro
@ 2006-04-24  6:45                   ` Kirill Korotaev
  2006-04-24  7:12                     ` KUROSAWA Takahiro
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill Korotaev @ 2006-04-24  6:45 UTC (permalink / raw)
  To: KUROSAWA Takahiro
  Cc: Kirill Korotaev, sekharan, akpm, haveblue, linux-kernel,
	ckrm-tech, Valerie.Clement, devel

>>>>Yes, it is effective, and the reclamation is O(1) too. It has couple of
>>>>problems by design, (1) doesn't handle shared pages and (2) doesn't
>>>>provide support for both min_shares and max_shares.
>>>
>>>Right.  I wanted to show proof-of-cencept of the pzone based controller
>>>and implemented minimal features necessary as the memory controller.
>>>So, the pzone based controller still needs development and some cleanup.
>>
>>Just out of curiosity, how it was meassured that it is effective?
> 
> 
> I don't have any benchmark numbers yet, so I can't explain the
> effectiveness with numbers.  I've been looking for the way to
> measure the cost of pzones correctly, but I've not found it out yet.
> 
> 
>>How does it work when there is a global memory shortage in the system?
> 
> 
> I guess you are referring to the situation that global memory is running
> out but there are free pages in pzones.  These free pages in pzones are
> handled as reserved for pzone users and not used even in global memory 
> shortage.
ok. Let me explain what I mean.
Imagine the situation with global memory shortage. In kernel, there are 
threads which do some job behalf the user, e.g. kjournald, loop etc. If 
the user has some pzone memory, but these threads fail to do their job 
some nasty things can happen (ext3 problems, deadlocks, OOM etc.)
If such behaviour is ok for you, then great. But did you consider it?

Also, I can't understand how it works with OOM killer. If pzones has 
enough memory, but there is a global shortage, who will be killed?

Thanks,
Kirill


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  6:45                   ` Kirill Korotaev
@ 2006-04-24  7:12                     ` KUROSAWA Takahiro
  0 siblings, 0 replies; 30+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  7:12 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement, devel

On Mon, 24 Apr 2006 10:45:59 +0400
Kirill Korotaev <dev@openvz.org> wrote:

> >>>>Yes, it is effective, and the reclamation is O(1) too. It has couple of
> >>>>problems by design, (1) doesn't handle shared pages and (2) doesn't
> >>>>provide support for both min_shares and max_shares.
> >>>
> >>>Right.  I wanted to show proof-of-cencept of the pzone based controller
> >>>and implemented minimal features necessary as the memory controller.
> >>>So, the pzone based controller still needs development and some cleanup.
> >>
> >>Just out of curiosity, how it was meassured that it is effective?
> > 
> > I don't have any benchmark numbers yet, so I can't explain the
> > effectiveness with numbers.  I've been looking for the way to
> > measure the cost of pzones correctly, but I've not found it out yet.
> > 
> >>How does it work when there is a global memory shortage in the system?
> > 
> > I guess you are referring to the situation that global memory is running
> > out but there are free pages in pzones.  These free pages in pzones are
> > handled as reserved for pzone users and not used even in global memory 
> > shortage.
> ok. Let me explain what I mean.
> Imagine the situation with global memory shortage. In kernel, there are 
> threads which do some job behalf the user, e.g. kjournald, loop etc. If 
> the user has some pzone memory, but these threads fail to do their job 
> some nasty things can happen (ext3 problems, deadlocks, OOM etc.)
> If such behaviour is ok for you, then great. But did you consider it?
> 
> Also, I can't understand how it works with OOM killer. If pzones has 
> enough memory, but there is a global shortage, who will be killed?

I understand.
IMHO, only the system processes should use global memory.
User processes that may cause such memory shortage should be 
enclosed in pzones first.

Thanks,

-- 
KUROSAWA, Takahiro

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:28           ` Chandra Seetharaman
  2006-04-24  1:10             ` KUROSAWA Takahiro
@ 2006-04-24  5:18             ` Hirokazu Takahashi
  2006-04-25  1:42               ` Chandra Seetharaman
  1 sibling, 1 reply; 30+ messages in thread
From: Hirokazu Takahashi @ 2006-04-24  5:18 UTC (permalink / raw)
  To: sekharan
  Cc: akpm, haveblue, linux-kernel, ckrm-tech,
	" Valerie.Clement", kurosawa

Hi Chandra, 

> > > > c) pointer to prototype code if poss
> > > 
> > > Both the memory controllers are fully functional. We need to trim them
> > > down.
> > > 
> > > active/inactive list per class memory controller:
> > > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> > 
> > Oh my gosh.  That converts memory reclaim from per-zone LRU to
> > per-CKRM-class LRU.  If configured.
> 
> Yes. We originally had an implementation that would use the existing
> per-zone LRU, but the reclamation path was O(n), where n is the number
> of classes. So, we moved towards a O(1) algorithm.
> 
> > 
> > This is huge.  It means that we have basically two quite different versions
> > of memory reclaim to test and maintain.   This is a problem.
> 
> Understood, will work and come up with an acceptable memory controller.
> > 
> > (I hope that's the before-we-added-comments version of the patch btw).
> 
> Yes, indeed :). As I told earlier this patch is not ready for lkml or -
> mm yet.
> > 
> > > pzone based memory controller:
> > > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> > 
> > From a super-quick scan that looks saner.  Is it effective?  Is this the
> > way you're planning on proceeding?
> > 
> 
> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> problems by design, (1) doesn't handle shared pages and (2) doesn't
> provide support for both min_shares and max_shares.

I'm not sure all of them have to be managed under ckrm_core and rcfs
in kernel.

These functions you mentioned can be implemented in user space
to minimize the overhead in usual VM operations because it isn't
expected quick response to resize it. It is a bit different from
that of CPU resource.

You don't need to invent everything. I think you can reuse what
NUMA team is doing instead. This approach may not fit in your rcfs,
though.

> > This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> > Just that it covers a group of mm's and not just the one mm?
> 
> Yes, that is the core object of ckrm, associate resources to a group of
> tasks.

Thanks,
Hirokazu Takahahsi.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  5:18             ` Hirokazu Takahashi
@ 2006-04-25  1:42               ` Chandra Seetharaman
  0 siblings, 0 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-25  1:42 UTC (permalink / raw)
  To: Hirokazu Takahashi
  Cc: akpm, haveblue, linux-kernel, ckrm-tech, Valerie.Clement,
	kurosawa

On Mon, 2006-04-24 at 14:18 +0900, Hirokazu Takahashi wrote:
> Hi Chandra, 
<snip>
> > Yes, it is effective, and the reclamation is O(1) too. It has couple of
> > problems by design, (1) doesn't handle shared pages and (2) doesn't
> > provide support for both min_shares and max_shares.
> 
> I'm not sure all of them have to be managed under ckrm_core and rcfs
> in kernel.
> 
> These functions you mentioned can be implemented in user space
> to minimize the overhead in usual VM operations because it isn't
> expected quick response to resize it. It is a bit different from
> that of CPU resource.

Agree, that is where the additional complexity arise from.

If the user can achieve the same results with user space solution that
would be good too. 

Thanks

chandra

> You don't need to invent everything. I think you can reuse what
> NUMA team is doing instead. This approach may not fit in your rcfs,
> though.
> 
> > > This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> > > Just that it covers a group of mm's and not just the one mm?
> > 
> > Yes, that is the core object of ckrm, associate resources to a group of
> > tasks.
> 
> Thanks,
> Hirokazu Takahahsi.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
  2006-04-22  5:28           ` Chandra Seetharaman
@ 2006-04-23  6:52           ` Paul Jackson
  2006-04-23  9:31             ` Matt Helsley
  2006-04-28  1:58           ` Chandra Seetharaman
  3 siblings, 1 reply; 30+ messages in thread
From: Paul Jackson @ 2006-04-23  6:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: sekharan, haveblue, linux-kernel, ckrm-tech

Andrew wrote:
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

Bless you.  I objected to the term 'class' a long time ago, but failed
to advance my case in a successful fashion.

Matt replied:
> "resource group"?

Nice.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23  6:52           ` Paul Jackson
@ 2006-04-23  9:31             ` Matt Helsley
  0 siblings, 0 replies; 30+ messages in thread
From: Matt Helsley @ 2006-04-23  9:31 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andrew Morton, sekharan, haveblue, linux-kernel, ckrm-tech

On Sat, 2006-04-22 at 23:52 -0700, Paul Jackson wrote:
> Andrew wrote:
> > (btw, using the term "class" to identify a group of tasks isn't very
> > comfortable - it's an instance, not a class...)
> 
> Bless you.  I objected to the term 'class' a long time ago, but failed
> to advance my case in a successful fashion.

	Well, I wouldn't say you were entirely unsuccessful. I distinctly
remembered your case and I tried to think of suitable names during the
recent changes. Please take a look at the latest set of patches and see
if you think the names are clearer.

> Matt replied:
> > "resource group"?
> 
> Nice.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
                             ` (2 preceding siblings ...)
  2006-04-23  6:52           ` Paul Jackson
@ 2006-04-28  1:58           ` Chandra Seetharaman
  2006-04-28  6:07             ` Kirill Korotaev
  3 siblings, 1 reply; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-28  1:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > > 
> > > c) pointer to prototype code if poss
> > 
> > Both the memory controllers are fully functional. We need to trim them
> > down.
> > 
> > active/inactive list per class memory controller:
> > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> 
> Oh my gosh.  That converts memory reclaim from per-zone LRU to
> per-CKRM-class LRU.  If configured.
> 
> This is huge.  It means that we have basically two quite different versions
> of memory reclaim to test and maintain.   This is a problem.
> 
> (I hope that's the before-we-added-comments version of the patch btw).
> 
> > pzone based memory controller:
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> 
> From a super-quick scan that looks saner.  Is it effective?  Is this the
> way you're planning on proceeding?
> 
> This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> Just that it covers a group of mm's and not just the one mm?
> 
> Do you attempt to manage just pagecache?  So if class A tries to read 10GB
> from disk, does that get more aggressively reclaimed based on class A's
> resource limits?
> 
> This all would have been more comfortable if done on top of the 2.4
> kernel's virtual scanner.
> 
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)
> 
> 
> Worried.

The object of this infrastructure is to get a unified interface for
resource management, irrespective of the resource that is being managed.

As I mentioned in my earlier email, subsystem experts are the ones who
will finally decide what type resource controller they will accept. With
VM experts' direction and advice, i am positive that we will get an
excellent memory controller (as well as other controllers).

As you might have noticed, we have gone through major changes to come to
community's acceptance levels. We are now making use of all possible
features (kref, process event connector, configfs, module parameter,
kzalloc) in this infrastructure.

Having a CPU controller, two memory controllers, an I/O controller and a
numtasks controller proves that the infrastructure does handle major
resources nicely and is also capable of managing virtual resources.

Hope i reduced your worries (at least some :).

regards,

chandra
> 
> 
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-28  1:58           ` Chandra Seetharaman
@ 2006-04-28  6:07             ` Kirill Korotaev
  2006-04-28 17:57               ` Chandra Seetharaman
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill Korotaev @ 2006-04-28  6:07 UTC (permalink / raw)
  To: sekharan; +Cc: Andrew Morton, haveblue, linux-kernel, ckrm-tech

>>Worried.
> The object of this infrastructure is to get a unified interface for
> resource management, irrespective of the resource that is being managed.
> 
> As I mentioned in my earlier email, subsystem experts are the ones who
> will finally decide what type resource controller they will accept. With
> VM experts' direction and advice, i am positive that we will get an
> excellent memory controller (as well as other controllers).
> 
> As you might have noticed, we have gone through major changes to come to
> community's acceptance levels. We are now making use of all possible
> features (kref, process event connector, configfs, module parameter,
> kzalloc) in this infrastructure.
> 
> Having a CPU controller, two memory controllers, an I/O controller and a
> numtasks controller proves that the infrastructure does handle major
> resources nicely and is also capable of managing virtual resources.
> 
> Hope i reduced your worries (at least some :).
Not all :) Let me explain.

Until you provided something more complex then numtasks, this 
infrastructure is pure theory. For example, in your infrastracture, when 
you will add memory resource controller with data sharing, you will face 
that changing CKRM class of the tasks is almost impossible in a suitable 
way. Another possible situation: hierarchical classes with shared memory 
are even more complicated thing.

In both cases you can end up with a poor/complicated/slow solution or 
dropping some of your infrastructre features (changing class on the fly, 
hierarchy) or which is worse IMHO with incosistency between controllers 
and interfaces.

Thanks,
Kirill


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-28  6:07             ` Kirill Korotaev
@ 2006-04-28 17:57               ` Chandra Seetharaman
  0 siblings, 0 replies; 30+ messages in thread
From: Chandra Seetharaman @ 2006-04-28 17:57 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: Andrew Morton, haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-28 at 10:07 +0400, Kirill Korotaev wrote:
> >>Worried.
> > The object of this infrastructure is to get a unified interface for
> > resource management, irrespective of the resource that is being managed.
> > 
> > As I mentioned in my earlier email, subsystem experts are the ones who
> > will finally decide what type resource controller they will accept. With
> > VM experts' direction and advice, i am positive that we will get an
> > excellent memory controller (as well as other controllers).
> > 
> > As you might have noticed, we have gone through major changes to come to
> > community's acceptance levels. We are now making use of all possible
> > features (kref, process event connector, configfs, module parameter,
> > kzalloc) in this infrastructure.
> > 
> > Having a CPU controller, two memory controllers, an I/O controller and a
> > numtasks controller proves that the infrastructure does handle major
> > resources nicely and is also capable of managing virtual resources.
> > 
> > Hope i reduced your worries (at least some :).
> Not all :) Let me explain.
> 
> Until you provided something more complex then numtasks, this 
> infrastructure is pure theory. For example, in your infrastracture, when 
> you will add memory resource controller with data sharing, you will face 
> that changing CKRM class of the tasks is almost impossible in a suitable 

I do not see a problem here, there could be 2 solutions:
 - do not account shared pages against the resource group(put them in
   the default resource group (as some other OSs do)).
 - when you are moving the task to a different class, calculate the
   resource group's usage depending on how many users are using a 
   specific page.
> way. Another possible situation: hierarchical classes with shared memory 
> are even more complicated thing.

Hierarchy is not an issue. Resource controller can calculate the
absolute number of resources (say no. of pages in this case) when the
shares are assigned and then treat all resource groups as flat.

> 
> In both cases you can end up with a poor/complicated/slow solution or 
> dropping some of your infrastructre features (changing class on the fly, 
> hierarchy) or which is worse IMHO with incosistency between controllers 
> and interfaces.

I am not convinced (based on the above explanations).
> 
> Thanks,
> Kirill
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  1:48       ` Chandra Seetharaman
  2006-04-22  2:13         ` Andrew Morton
@ 2006-04-24  1:47         ` Hirokazu Takahashi
  2006-04-24 20:42           ` Shailabh Nagar
  1 sibling, 1 reply; 30+ messages in thread
From: Hirokazu Takahashi @ 2006-04-24  1:47 UTC (permalink / raw)
  To: sekharan; +Cc: akpm, haveblue, linux-kernel, ckrm-tech

Hi Chandra,

> > Could I ask that you briefly enumerate
> > 
> > a) which controllers you think we'll need in the forseeable future
> > 
> 
> Our main object is to provide resource control for the hardware
> resources: CPU, I/O and memory.
> 
> We have already posted the CPU controller.
> 
> We have two implementations of memory controller and a I/O controller. 
> 
> Memory controller is understandably more complex and controversial, and
> that is the reason we haven't posted it this time around (we are looking
> at ways to simplify the design and hence the complexity). Both the
> memory controllers has been posted to linux-mm.
> 
> I/O controller is based on CFQ-scheduler.
> 
> > b) what they need to do

	(snip)

> I/O Controller that we are working on is based on CFQ scheduler and
> provides bandwidth control.  
> > 
> > c) pointer to prototype code if poss

	(snip)

> i/o controller: This controller is not ported to the framework posted,
> but can be taken for a prototype version. New version would be simpler
> though.

I think controlling I/O bandwidth is right way to go.

However, I think you need to change the design of the controller a bit.
A lot of I/O requests processes issue will be handled by other contexts.
There are AIO, journaling, pdflush and vmscan, which some kernel threads
treat instead of the processes.

The current design looks not to care about this.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  1:47         ` Hirokazu Takahashi
@ 2006-04-24 20:42           ` Shailabh Nagar
  0 siblings, 0 replies; 30+ messages in thread
From: Shailabh Nagar @ 2006-04-24 20:42 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech

Hirokazu Takahashi wrote:

>  
>
>>i/o controller: This controller is not ported to the framework posted,
>>but can be taken for a prototype version. New version would be simpler
>>though.
>>    
>>
>
>I think controlling I/O bandwidth is right way to go.
>  
>
Thanks. Obviously we agree heartily :-)

>However, I think you need to change the design of the controller a bit.
>A lot of I/O requests processes issue will be handled by other contexts.
>There are AIO, journaling, pdflush and vmscan, which some kernel threads
>treat instead of the processes.
>
>The current design looks not to care about this.
>  
>
Yes. The current design, which builds directly on top of the CFQ 
scheduler, does not attempt to treat kernel
threads specially in order to account the I/O they're doing on behalf of 
others properly. This was mainly because
of the desire to keep the controller simple.

I suspect pdflush and vmscan I/O is never going to be properly 
attributable and journaling may be possible but
unlikely to be worth it given the risks of throttling it ?  AIO is 
likely to be something we can address if there is
consensus that one is willing to pay the price of tracking the source 
through the I/O submission layers.

I suppose this would be a good time to dust off the I/O controller and 
post it so discussions can become more
concrete.

But as always, changes in the design and implementation are always 
welcome....

Regards,
Shailabh


>Thanks,
>Hirokazu Takahashi.
>  
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2006-04-28 17:57 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-21 19:07 [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Al Boldi
2006-04-21 22:04 ` Matt Helsley
     [not found]   ` <200604220708.40018.a1426z@gawab.com>
2006-04-22  5:46     ` Chandra Seetharaman
2006-04-22 20:40       ` Al Boldi
2006-04-23  2:33         ` Matt Helsley
2006-04-23 11:22           ` Al Boldi
2006-04-24 18:23             ` Chandra Seetharaman
2006-04-21 22:09 ` Chandra Seetharaman
  -- strict thread matches above, loose matches on Subject: below --
2006-04-21  2:24 sekharan
2006-04-21 14:49 ` [ckrm-tech] " Dave Hansen
2006-04-21 16:58   ` Chandra Seetharaman
2006-04-21 22:57     ` Andrew Morton
2006-04-22  1:48       ` Chandra Seetharaman
2006-04-22  2:13         ` Andrew Morton
2006-04-22  2:20           ` Matt Helsley
2006-04-22  2:33             ` Andrew Morton
2006-04-22  5:28           ` Chandra Seetharaman
2006-04-24  1:10             ` KUROSAWA Takahiro
2006-04-24  4:39               ` Kirill Korotaev
2006-04-24  5:41                 ` KUROSAWA Takahiro
2006-04-24  6:45                   ` Kirill Korotaev
2006-04-24  7:12                     ` KUROSAWA Takahiro
2006-04-24  5:18             ` Hirokazu Takahashi
2006-04-25  1:42               ` Chandra Seetharaman
2006-04-23  6:52           ` Paul Jackson
2006-04-23  9:31             ` Matt Helsley
2006-04-28  1:58           ` Chandra Seetharaman
2006-04-28  6:07             ` Kirill Korotaev
2006-04-28 17:57               ` Chandra Seetharaman
2006-04-24  1:47         ` Hirokazu Takahashi
2006-04-24 20:42           ` Shailabh Nagar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox