public inbox for linux-kernel@vger.kernel.org
* [RFC] [PATCH 00/12] CKRM after a major overhaul
@ 2006-04-21  2:24 sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 01/12] Register/Unregister interface for Controllers sekharan
                   ` (12 more replies)
  0 siblings, 13 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC
  To: linux-kernel, ckrm-tech; +Cc: sekharan

CKRM has gone through a major overhaul by removing some of the complexity,
cutting down on features and moving portions to userspace.

Diffstat of this patchset (including the numtasks controller that follows)
is:

	23 files changed, 2475 insertions(+), 5 deletions(-)

including Documentation and comments.

This patchset will be followed with two controllers:
	- a simple controller, numtasks to control number of tasks
	- CPU controller, to control CPU resource.
--

Brief Intro for CKRM:

Class-based Kernel Resource Management (CKRM) enables control of system
resource usage and monitoring of resource usage through user-defined
groups of tasks called classes.

A class is a group of tasks defined by the administrator.

By assigning tasks to classes, administrators can monitor and bound the
resource usage of any system resource with a resource controller.
Resources amenable to such control include CPU ticks, physical pages,
disk I/O bandwidth, number of open file handles, and number of tasks, to
name a few.

Userspace interfaces with CKRM through a configfs subsystem: Resource Control
File System (RCFS). Users create and delete classes simply by issuing mkdir
or rmdir commands. Once a class is created, the user may set its resource
shares and change the set of tasks bound to it by writing to files in the
class directory. Similarly, to monitor the subsequent resource utilization
of the class, users read files in the class directory.

Users control each resource share of a class independently of the other
resources. In other words, the CPU share of a class can be very different
from its memory share or its I/O share.

Resource controllers implement a small set of functions that respond to
changes in resource shares, class creation/deletion and class membership.
Given a class and its shares the controller then manages resource usage
of tasks in that class. For instance, a CPU resource controller might
manipulate the timeslice of each task according to its class' remaining
CPU share.
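
To make the controller contract above concrete, here is a minimal userspace
sketch of a callback table in the spirit of a task-count controller. All
names (demo_shares, demo_controller, numtasks_*) and the over-limit policy
are hypothetical and chosen for illustration only; they are not the
interface or behaviour of the numtasks controller in the follow-up patches.

```c
#include <assert.h>
#include <stddef.h>

/* Per-class, per-controller state (hypothetical stand-in for ckrm_shares). */
struct demo_shares {
	int max_tasks;	/* limit set by the administrator */
	int nr_tasks;	/* tasks currently charged to the class */
	int over_limit;	/* recomputed when shares or membership change */
};

/* Reduced, hypothetical analogue of a controller's callback table. */
struct demo_controller {
	const char *name;
	void (*shares_changed)(struct demo_shares *);
	void (*move_task)(struct demo_shares *from, struct demo_shares *to);
};

/* Re-evaluate the class against its (possibly new) limit. */
static void numtasks_shares_changed(struct demo_shares *s)
{
	s->over_limit = s->nr_tasks > s->max_tasks;
}

/* Uncharge the old class and charge the new one; either may be NULL. */
static void numtasks_move_task(struct demo_shares *from,
			       struct demo_shares *to)
{
	if (from) {
		assert(from->nr_tasks > 0);
		from->nr_tasks--;
		numtasks_shares_changed(from);
	}
	if (to) {
		to->nr_tasks++;
		numtasks_shares_changed(to);
	}
}

static const struct demo_controller demo_numtasks = {
	.name = "numtasks",
	.shares_changed = numtasks_shares_changed,
	.move_task = numtasks_move_task,
};
```

The core would only ever call through the table, so the policy (here, a
simple over-limit flag) stays private to the controller.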

--

Patch Descriptions:

This set of patches implements classes, resource controller registration,
and the RCFS interface. Subsequent sets of patches add specific resource
controllers.

More details are available in the documentation patch.

Patch descriptions:
01/12: ckrm_core
	- Provides register/unregister functions for a controller

02/12: ckrm_core_class_support
	- Provides functions to alloc and free a user defined class
	- Provides utility functions to walk through the class hierarchy

03/12: ckrm_core_handle_shares
	- Provides functions to set/get shares of a class
	- Defines a teardown function that is intended to be called when
	  user disables CKRM (by umount of configfs or rmmod of rcfs)

04/12: ckrm_tasksupport
	- Adds logic to support adding/removing task to/from a class
	- Provides an interface to set a task's class

05/12: ckrm_tasksupport_fork_exit_init
	- Initializes and clears ckrm specific information at fork() and
	  exit()
	- Initializes ckrm (called from start_kernel)

06/12: ckrm_tasksupport_procsupport
	- Adds an interface in /proc to get the class name of a task.

07/12: ckrm_configfs_rcfs
	- Creates the configfs interface (RCFS) for managing CKRM
	- Hooks up with configfs
	- Provides functions for creating and deleting classes

08/12: ckrm_configfs_rcfs_attr_support
	- Adds the basic attribute store and show functions

09/12: 04ckrm_configfs_rcfs_stats
	- Adds attr_store and attr_show support for the stats file

10/12: ckrm_configfs_rcfs_shares
	- Adds attr_store and attr_show support for the shares file

11/12: ckrm_configfs_rcfs_members
	- Adds attr_store and attr_show support for the members file

12/12: ckrm_docs
	- Documentation describing important CKRM elements such as classes,
	  shares, controllers, and the interface provided to userspace via RCFS

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* [RFC] [PATCH 01/12] Register/Unregister interface for Controllers
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 02/12] Class creation/deletion sekharan
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC
  To: linux-kernel, ckrm-tech; +Cc: sekharan

01/12 - ckrm_core

This patch defines data structures for defining a class and resource
controller.
Provides register/unregister functions for a controller.
Provides utility functions to get a controller's data structure.
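
As an illustration of the register/lookup/unregister life cycle described
above, here is a userspace sketch that models the slot table and the
per-controller reference count. The demo_* names are hypothetical, and the
sketch deliberately leaves out the locking that the patch performs under
res_ctlrs_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CTLRS 8	/* mirrors CKRM_MAX_RES_CTLRS */

struct demo_ctlr {
	const char *name;
	int id;		/* slot index, filled in on registration */
	int count;	/* references held by users of the controller */
};

static struct demo_ctlr *demo_slots[DEMO_MAX_CTLRS];

/* Claim the first free slot; -ENOSPC when the table is full. */
static int demo_register(struct demo_ctlr *c)
{
	int i;

	if (!c)
		return -EINVAL;
	for (i = 0; i < DEMO_MAX_CTLRS; i++) {
		if (!demo_slots[i]) {
			demo_slots[i] = c;
			c->id = i;
			c->count = 0;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Look up by name and take a reference on success. */
static struct demo_ctlr *demo_get_by_name(const char *name)
{
	int i;

	for (i = 0; i < DEMO_MAX_CTLRS; i++) {
		if (demo_slots[i] && !strcmp(name, demo_slots[i]->name)) {
			demo_slots[i]->count++;
			return demo_slots[i];
		}
	}
	return NULL;
}

static void demo_put(struct demo_ctlr *c)
{
	assert(c->count > 0);
	c->count--;
}

/* Refuse to remove a controller that is still referenced. */
static int demo_unregister(struct demo_ctlr *c)
{
	if (!c || c->id < 0 || c->id >= DEMO_MAX_CTLRS ||
	    demo_slots[c->id] != c)
		return -EINVAL;
	if (c->count > 0)
		return -EBUSY;
	demo_slots[c->id] = NULL;
	c->id = DEMO_MAX_CTLRS;	/* invalid id, like CKRM_NO_RES_ID */
	return 0;
}
```

The point of the model is the ordering: a lookup pins the controller, and
removal only succeeds once every such reference has been dropped.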
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Hubertus Franke <frankeh@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Gerrit Huizenga <gh@us.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 include/linux/ckrm.h     |   76 +++++++++++++++++
 include/linux/ckrm_rc.h  |   67 ++++++++++++++
 init/Kconfig             |   14 +++
 kernel/Makefile          |    1 
 kernel/ckrm/Makefile     |    1 
 kernel/ckrm/ckrm.c       |  210 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/ckrm/ckrm_local.h |   13 ++
 7 files changed, 382 insertions(+)

Index: linux2617-rc2/include/linux/ckrm.h
===================================================================
--- /dev/null
+++ linux2617-rc2/include/linux/ckrm.h
@@ -0,0 +1,76 @@
+/*
+ *  ckrm.h - Header file to be used by Class-based Kernel Resource
+ *  Management (CKRM).
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003, 2004
+ *		(C) Shailabh Nagar,  IBM Corp. 2003, 2004
+ *		(C) Chandra Seetharaman, IBM Corp. 2003, 2004, 2005
+ *
+ * Provides data structures, macros and kernel APIs
+ *
+ * More details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef _LINUX_CKRM_H
+#define _LINUX_CKRM_H
+
+#ifdef CONFIG_CKRM
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+
+#define CKRM_SHARE_UNCHANGED	(-1)	/* implicitly specified by userspace,
+					 * never stored in a class' shares
+					 * struct, and never displayed */
+#define CKRM_SHARE_UNSUPPORTED	(-2)	/* If the resource controller doesn't
+					 * support user changing a shares value
+					 * it sets the corresponding share
+					 * value to UNSUPPORTED when it returns
+					 * the newly allocated shares data
+					 * structure */
+#define CKRM_SHARE_DONT_CARE	(-3)
+
+#define CKRM_SHARE_DEFAULT_DIVISOR 	(100)
+
+#define CKRM_MAX_RES_CTLRS	8	/* max # of resource controllers */
+
+#define CKRM_NO_CLASS		NULL
+#define CKRM_NO_SHARE		NULL
+#define CKRM_NO_RES_ID		CKRM_MAX_RES_CTLRS /* Invalid ID */
+
+/*
+ * Share quantities are a child's fraction of the parent's resource
+ * specified by a divisor in the parent and a dividend in the child.
+ *
+ * Shares are represented as a relative quantity between parent and child
+ * to simplify locking when propagating modifications to the shares of a
+ * class. Only the parent and the children of the modified class need to be
+ * locked.
+*/
+struct ckrm_shares {
+};
+
+/*
+ * Class is the grouping of tasks with shares of each resource that has
+ * registered a resource controller (see include/linux/ckrm_rc.h).
+ */
+struct ckrm_class {
+	int depth;		/* depth of this class. root == 0 */
+	spinlock_t class_lock;	/* protects task_list, shares and children
+				 * When grabbing class_lock in a hierarchy,
+				 * grab parent's class_lock first.
+				 * If resource controller uses a class
+				 * specific lock, grab class_lock before
+				 * grabbing resource specific lock */
+	struct ckrm_shares *shares[CKRM_MAX_RES_CTLRS];/* resource shares */
+	struct list_head class_list;	/* entry in list of all classes */
+};
+
+#endif /* CONFIG_CKRM */
+#endif /* _LINUX_CKRM_H */
Index: linux2617-rc2/include/linux/ckrm_rc.h
===================================================================
--- /dev/null
+++ linux2617-rc2/include/linux/ckrm_rc.h
@@ -0,0 +1,67 @@
+/*
+ *  ckrm_rc.h - Header file to be used by Resource controllers of CKRM
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003
+ *		(C) Shailabh Nagar,  IBM Corp. 2003
+ *		(C) Chandra Seetharaman, IBM Corp. 2003, 2004, 2005
+ *		(C) Vivek Kashyap , IBM Corp. 2004
+ *
+ * Provides data structures, macros and kernel API of CKRM for
+ * resource controllers.
+ *
+ * More details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef _LINUX_CKRM_RC_H
+#define _LINUX_CKRM_RC_H
+
+#include <linux/ckrm.h>
+
+struct ckrm_controller {
+	const char *name;
+	int depth_supported;/* maximum hierarchy supported by controller */
+	unsigned int ctlr_id;
+
+	/*
+	 * Keeps number of references to this controller structure. kref
+	 * does not work as we want to be able to allow removal of a
+	 * controller even when some classes are defined.
+	 */
+	atomic_t count;
+
+	/*
+	 * Allocate a new shares struct for this resource controller.
+	 * Called when registering a resource controller with pre-existing
+	 * classes and when new classes are created by the user.
+	 */
+	struct ckrm_shares *(*alloc_shares_struct)(struct ckrm_class *);
+	/* Corresponding free of shares struct for this resource controller */
+	void (*free_shares_struct)(struct ckrm_shares *);
+
+	/* Notifies the controller when the shares are changed */
+	void (*shares_changed)(struct ckrm_shares *);
+
+	/* resource statistics */
+	ssize_t (*show_stats)(struct ckrm_shares *, char *, size_t);
+	int (*reset_stats)(struct ckrm_shares *, const char *);
+
+	/*
+	 * move_task is called when a task moves from one class to another.
+	 * The first parameter is the task that is moving, the second
+	 * is the resource specific shares of the previous class the task
+	 * was in, and the third is the shares of the class the task has
+	 * moved to.
+	 */
+	void (*move_task)(struct task_struct *, struct ckrm_shares *,
+				struct ckrm_shares *);
+};
+
+extern int ckrm_register_controller(struct ckrm_controller *);
+extern int ckrm_unregister_controller(struct ckrm_controller *);
+#endif /* _LINUX_CKRM_RC_H */
Index: linux2617-rc2/kernel/ckrm/ckrm.c
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/ckrm.c
@@ -0,0 +1,210 @@
+/* ckrm.c - Class-based Kernel Resource Management (CKRM)
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003, 2004
+ *		(C) Shailabh Nagar, IBM Corp. 2003, 2004
+ *		(C) Chandra Seetharaman, IBM Corp. 2003, 2004, 2005
+ *		(C) Vivek Kashyap, IBM Corp. 2004
+ *		(C) Matt Helsley, IBM Corp. 2006
+ *
+ * Provides kernel API of CKRM for in-kernel,per-resource controllers
+ * (one each for cpu, memory and io).
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/ckrm_rc.h>
+
+static struct ckrm_controller *ckrm_controllers[CKRM_MAX_RES_CTLRS];
+/* res_ctlrs_lock protects ckrm_controllers array and count in controllers*/
+static spinlock_t res_ctlrs_lock = SPIN_LOCK_UNLOCKED;
+
+static LIST_HEAD(ckrm_classes);/* link all classes */
+static rwlock_t ckrm_class_lock = RW_LOCK_UNLOCKED; /* protects ckrm_classes */
+
+/* Must be called with res_ctlrs_lock held */
+static inline int is_ctlr_id_valid(unsigned int ctlr_id)
+{
+	return ((ctlr_id < CKRM_MAX_RES_CTLRS) &&
+			(ckrm_controllers[ctlr_id] != NULL));
+}
+
+struct ckrm_controller *ckrm_get_controller_by_id(unsigned int ctlr_id)
+{
+	/*
+	 * inc of controller[i].count has to be atomically done with
+	 * checking the array ckrm_controllers, as remove_controller()
+	 * checks controller[i].count and clears ckrm_controllers[i]
+	 * atomically, under res_ctlrs_lock.
+	 */
+	spin_lock(&res_ctlrs_lock);
+	if (!is_ctlr_id_valid(ctlr_id)) {
+		spin_unlock(&res_ctlrs_lock);
+		return NULL;
+	}
+	atomic_inc(&ckrm_controllers[ctlr_id]->count);
+	spin_unlock(&res_ctlrs_lock);
+	return ckrm_controllers[ctlr_id];
+}
+
+struct ckrm_controller *ckrm_get_controller_by_name(const char *name)
+{
+	struct ckrm_controller *ctlr;
+	unsigned int i;
+
+	spin_lock(&res_ctlrs_lock);
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++, ctlr = NULL) {
+		ctlr = ckrm_controllers[i];
+		if (!ctlr)
+			continue;
+		if (!strcmp(name, ctlr->name)) {
+			atomic_inc(&ckrm_controllers[i]->count);
+			break;
+		}
+	}
+	spin_unlock(&res_ctlrs_lock);
+	return ctlr;
+}
+
+static void ckrm_get_controller(struct ckrm_controller *ctlr)
+{
+	atomic_inc(&ctlr->count);
+}
+
+void ckrm_put_controller(struct ckrm_controller *ctlr)
+{
+	atomic_dec(&ctlr->count);
+}
+
+/* Allocate resource specific information for a class */
+static void do_alloc_shares_struct(struct ckrm_class *class,
+					struct ckrm_controller *ctlr)
+{
+	if (class->shares[ctlr->ctlr_id]) /* already allocated */
+		return;
+
+	if (class->depth > ctlr->depth_supported)
+		return;
+
+	class->shares[ctlr->ctlr_id] = ctlr->alloc_shares_struct(class);
+	if (class->shares[ctlr->ctlr_id] != NULL)
+		ckrm_get_controller(ctlr);
+}
+
+/* Free up the given resource specific information in a class */
+static void do_free_shares_struct(struct ckrm_class *class,
+						struct ckrm_controller *ctlr)
+{
+	struct ckrm_shares *shares = class->shares[ctlr->ctlr_id];
+
+	/* No shares alloced previously */
+	if (shares == NULL)
+		return;
+
+	spin_lock(&class->class_lock);
+	class->shares[ctlr->ctlr_id] = NULL;
+	spin_unlock(&class->class_lock);
+	ctlr->free_shares_struct(shares);
+	ckrm_put_controller(ctlr); /* Drop reference acquired in do_alloc */
+}
+
+static int add_controller(struct ckrm_controller *ctlr)
+{
+	int ctlr_id, ret = -ENOSPC;
+
+	spin_lock(&res_ctlrs_lock);
+	for (ctlr_id = 0; ctlr_id < CKRM_MAX_RES_CTLRS; ctlr_id++)
+		if (ckrm_controllers[ctlr_id] == NULL) {
+			ckrm_controllers[ctlr_id] = ctlr;
+			ret = ctlr_id;
+			break;
+		}
+	spin_unlock(&res_ctlrs_lock);
+	return ret;
+}
+
+/*
+ * Interface for registering a resource controller.
+ *
+ * Returns 0 on success, -errno for failure.
+ * Fills ctlr->ctlr_id with a valid controller id on success.
+ */
+int ckrm_register_controller(struct ckrm_controller *ctlr)
+{
+	int ret;
+	struct ckrm_class *class;
+
+	if (!ctlr)
+		return -EINVAL;
+
+	/* Make sure there is an alloc and a free */
+	if (!ctlr->alloc_shares_struct || !ctlr->free_shares_struct)
+		return -EINVAL;
+
+	ret = add_controller(ctlr);
+
+	if (ret < 0)
+		return ret;
+
+	ctlr->ctlr_id = ret;
+
+	atomic_set(&ctlr->count, 0);
+
+	/*
+	 * Run through all classes and create the controller specific data
+	 * structures.
+	 */
+	read_lock(&ckrm_class_lock);
+	list_for_each_entry(class, &ckrm_classes, class_list)
+		do_alloc_shares_struct(class, ctlr);
+	read_unlock(&ckrm_class_lock);
+	return 0;
+}
+
+static int remove_controller(struct ckrm_controller *ctlr)
+{
+	spin_lock(&res_ctlrs_lock);
+	if (atomic_read(&ctlr->count) > 0) {
+		spin_unlock(&res_ctlrs_lock);
+		return -EBUSY;
+	}
+
+	ckrm_controllers[ctlr->ctlr_id] = NULL;
+	ctlr->ctlr_id = CKRM_NO_RES_ID;
+	spin_unlock(&res_ctlrs_lock);
+	return 0;
+}
+
+/*
+ * Unregistering resource controller.
+ *
+ * Returns 0 on success, -errno for failure.
+ */
+int ckrm_unregister_controller(struct ckrm_controller *ctlr)
+{
+	struct ckrm_class *class;
+
+	if (!ctlr)
+		return -EINVAL;
+
+	if (ckrm_get_controller_by_id(ctlr->ctlr_id) != ctlr)
+		return -EINVAL;
+
+	/* free shares structs for this resource from all the classes */
+	read_lock(&ckrm_class_lock);
+	list_for_each_entry_reverse(class, &ckrm_classes, class_list)
+		do_free_shares_struct(class, ctlr);
+	read_unlock(&ckrm_class_lock);
+
+	ckrm_put_controller(ctlr);
+	return remove_controller(ctlr);
+}
+
+EXPORT_SYMBOL_GPL(ckrm_register_controller);
+EXPORT_SYMBOL_GPL(ckrm_unregister_controller);
Index: linux2617-rc2/kernel/Makefile
===================================================================
--- linux2617-rc2.orig/kernel/Makefile
+++ linux2617-rc2/kernel/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
 obj-$(CONFIG_RELAY) += relay.o
+obj-$(CONFIG_CKRM) += ckrm/
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux2617-rc2/kernel/ckrm/Makefile
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/Makefile
@@ -0,0 +1 @@
+obj-y = ckrm.o
Index: linux2617-rc2/init/Kconfig
===================================================================
--- linux2617-rc2.orig/init/Kconfig
+++ linux2617-rc2/init/Kconfig
@@ -150,6 +150,20 @@ config BSD_PROCESS_ACCT_V3
 	  for processing it. A preliminary version of these tools is available
 	  at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>.
 
+menu "Class Based Kernel Resource Management"
+
+config CKRM
+	bool "Class Based Kernel Resource Management Core"
+	depends on EXPERIMENTAL
+	help
+	  Class-based Kernel Resource Management is a framework for controlling
+	  and monitoring resource allocation of user-defined groups of tasks.
+	  For more information, please visit http://ckrm.sf.net.
+
+	  If you say Y here, enable the Resource Class File System and at least
+	  one of the resource controllers below. Say N if you are unsure.
+
+endmenu
 config SYSCTL
 	bool "Sysctl support"
 	---help---
Index: linux2617-rc2/kernel/ckrm/ckrm_local.h
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/ckrm_local.h
@@ -0,0 +1,13 @@
+/*
+ * Contains function declarations that are local to ckrm core.
+ * NOT to be included by controllers.
+ */
+
+#include <linux/ckrm_rc.h>
+
+extern struct ckrm_controller *ckrm_get_controller_by_name(const char *);
+extern struct ckrm_controller *ckrm_get_controller_by_id(unsigned int);
+extern void ckrm_put_controller(struct ckrm_controller *);
+extern struct ckrm_class *ckrm_alloc_class(struct ckrm_class *, const char *);
+extern int ckrm_free_class(struct ckrm_class *);
+extern void ckrm_release_class(struct kref *);



* [RFC] [PATCH 02/12] Class creation/deletion
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 01/12] Register/Unregister interface for Controllers sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 03/12] Share Handling sekharan
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC
  To: linux-kernel, ckrm-tech; +Cc: sekharan

02/12 - ckrm_core_class_support

Provides functions to alloc and free a user defined class.
Provides a utility macro to walk through the class hierarchy.
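
For readers who want the shape of the hierarchy without reading the diff,
here is a simplified userspace model of class allocation, child iteration
and the "no children" rule on free. The demo_* names are hypothetical, and
the model uses a plain singly linked child list instead of the list_head,
kref and locking used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_class {
	const char *name;
	int depth;			/* root == 0 */
	struct demo_class *parent;
	struct demo_class *first_child;
	struct demo_class *next_sibling;
};

static struct demo_class demo_root = { "root", 0, NULL, NULL, NULL };

/* Create a class one level below its parent and link it in. */
static struct demo_class *demo_alloc_class(struct demo_class *parent,
					   const char *name)
{
	struct demo_class *c;

	assert(parent != NULL);	/* the patch uses BUG_ON() here */
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->name = name;
	c->depth = parent->depth + 1;
	c->parent = parent;
	c->next_sibling = parent->first_child;
	parent->first_child = c;
	return c;
}

/* Walk the direct children, as for_each_child() does in the patch. */
static int demo_count_children(const struct demo_class *parent)
{
	const struct demo_class *c;
	int n = 0;

	for (c = parent->first_child; c; c = c->next_sibling)
		n++;
	return n;
}

/* A class with children cannot be freed; the root is never freed. */
static int demo_free_class(struct demo_class *c)
{
	struct demo_class **pp;

	if (!c->parent)
		return -EINVAL;
	if (c->first_child)
		return -EBUSY;
	for (pp = &c->parent->first_child; *pp; pp = &(*pp)->next_sibling) {
		if (*pp == c) {
			*pp = c->next_sibling;
			break;
		}
	}
	free(c);
	return 0;
}
```

Freeing therefore has to proceed from the leaves upward, which matches the
-EBUSY check on a non-empty children list in ckrm_free_class().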
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Hubertus Franke <frankeh@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Gerrit Huizenga <gh@us.ibm.com>
Signed-Off-By: Vivek Kashyap <kashyapv@us.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 include/linux/ckrm.h    |    8 ++
 include/linux/ckrm_rc.h |    9 ++
 kernel/ckrm/ckrm.c      |  171 ++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 175 insertions(+), 13 deletions(-)

Index: linux2617-rc2/include/linux/ckrm.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm.h
+++ linux2617-rc2/include/linux/ckrm.h
@@ -61,6 +61,8 @@ struct ckrm_shares {
  * registered a resource controller (see include/linux/ckrm_rc.h).
  */
 struct ckrm_class {
+	const char *name;
+	struct kref ref;
 	int depth;		/* depth of this class. root == 0 */
 	spinlock_t class_lock;	/* protects task_list, shares and children
 				 * When grabbing class_lock in a hierarchy,
@@ -70,6 +72,12 @@ struct ckrm_class {
 				 * grabbing resource specific lock */
 	struct ckrm_shares *shares[CKRM_MAX_RES_CTLRS];/* resource shares */
 	struct list_head class_list;	/* entry in list of all classes */
+
+	struct list_head task_list;	/* this class's tasks */
+
+	struct ckrm_class *parent;
+	struct list_head siblings;	/* entry in list of siblings */
+	struct list_head children;	/* head of children */
 };
 
 #endif /* CONFIG_CKRM */
Index: linux2617-rc2/kernel/ckrm/ckrm.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm.c
+++ linux2617-rc2/kernel/ckrm/ckrm.c
@@ -19,14 +19,32 @@
  */
 
 #include <linux/module.h>
-#include <linux/ckrm_rc.h>
+#include "ckrm_local.h"
 
 static struct ckrm_controller *ckrm_controllers[CKRM_MAX_RES_CTLRS];
 /* res_ctlrs_lock protects ckrm_controllers array and count in controllers*/
 static spinlock_t res_ctlrs_lock = SPIN_LOCK_UNLOCKED;
 
 static LIST_HEAD(ckrm_classes);/* link all classes */
-static rwlock_t ckrm_class_lock = RW_LOCK_UNLOCKED; /* protects ckrm_classes */
+static int ckrm_num_classes;	/* Number of user defined classes */
+static rwlock_t ckrm_class_lock = RW_LOCK_UNLOCKED;
+			/* protects ckrm_classes list and ckrm_num_classes */
+
+struct ckrm_class ckrm_default_class = {
+	.task_list = LIST_HEAD_INIT(ckrm_default_class.task_list),
+	.class_lock = SPIN_LOCK_UNLOCKED,
+	.name = "task",
+	.class_list = LIST_HEAD_INIT(ckrm_default_class.class_list),
+	.siblings = LIST_HEAD_INIT(ckrm_default_class.siblings),
+	.children = LIST_HEAD_INIT(ckrm_default_class.children),
+};
+
+/* Must be called with parent's class_lock held */
+static inline void ckrm_remove_child(struct ckrm_class *child)
+{
+	list_del(&child->siblings);
+	child->parent = CKRM_NO_CLASS;
+}
 
 /* Must be called with res_ctlrs_lock held */
 static inline int is_ctlr_id_valid(unsigned int ctlr_id)
@@ -97,6 +115,55 @@ static void do_alloc_shares_struct(struc
 		ckrm_get_controller(ctlr);
 }
 
+static void ckrm_class_init(struct ckrm_class *class)
+{
+	class->class_lock = SPIN_LOCK_UNLOCKED;
+	kref_init(&class->ref);
+	INIT_LIST_HEAD(&class->task_list);
+	INIT_LIST_HEAD(&class->children);
+	INIT_LIST_HEAD(&class->siblings);
+}
+
+struct ckrm_class *ckrm_alloc_class(struct ckrm_class *parent,
+						const char *name)
+{
+	int i;
+	struct ckrm_class *class;
+
+	BUG_ON(parent == NULL);
+
+	kref_get(&parent->ref);
+	class = kzalloc(sizeof(struct ckrm_class), GFP_KERNEL);
+	if (!class) {
+		kref_put(&parent->ref, ckrm_release_class);
+		return NULL;
+	}
+	ckrm_class_init(class);
+	class->name = name;
+	class->depth = parent->depth + 1;
+
+	/* Add to parent */
+	spin_lock(&parent->class_lock);
+	class->parent = parent;
+	list_add(&class->siblings, &parent->children);
+	spin_unlock(&parent->class_lock);
+
+	write_lock(&ckrm_class_lock);
+	list_add_tail(&class->class_list, &ckrm_classes);
+	ckrm_num_classes++;
+	write_unlock(&ckrm_class_lock);
+
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++) {
+		struct ckrm_controller *ctlr = ckrm_get_controller_by_id(i);
+		if (!ctlr)
+			continue;
+		do_alloc_shares_struct(class, ctlr);
+		ckrm_put_controller(ctlr);
+	}
+
+	return class;
+}
+
 /* Free up the given resource specific information in a class */
 static void do_free_shares_struct(struct ckrm_class *class,
 						struct ckrm_controller *ctlr)
@@ -114,6 +181,59 @@ static void do_free_shares_struct(struct
 	ckrm_put_controller(ctlr); /* Drop reference acquired in do_alloc */
 }
 
+/*
+ * Release a class
+ *  requires that all tasks were previously reassigned to another class
+ *
+ * Called by kref_put() when the last reference to the class is dropped.
+ */
+void ckrm_release_class(struct kref *kref)
+{
+	int i;
+	struct ckrm_class *class = container_of(kref,
+				struct ckrm_class, ref);
+	struct ckrm_class *parent = class->parent;
+
+	BUG_ON(ckrm_is_class_root(class));
+
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++) {
+		struct ckrm_controller *ctlr = ckrm_get_controller_by_id(i);
+		if (!ctlr)
+			continue;
+		do_free_shares_struct(class, ctlr);
+		ckrm_put_controller(ctlr);
+	}
+
+	/* Remove this class from the list of all classes */
+	write_lock(&ckrm_class_lock);
+	list_del(&class->class_list);
+	ckrm_num_classes--;
+	write_unlock(&ckrm_class_lock);
+
+	/* remove from parent */
+	spin_lock(&parent->class_lock);
+	list_del(&class->siblings);
+	class->parent = CKRM_NO_CLASS;
+	spin_unlock(&parent->class_lock);
+
+	kref_put(&parent->ref, ckrm_release_class);
+	kfree(class);
+}
+
+int ckrm_free_class(struct ckrm_class *class)
+{
+	BUG_ON(ckrm_is_class_root(class));
+	spin_lock(&class->class_lock);
+	if (!list_empty(&class->children)) {
+		spin_unlock(&class->class_lock);
+		return -EBUSY;
+	}
+	spin_unlock(&class->class_lock);
+	kref_put(&class->ref, ckrm_release_class);
+	return 0;
+}
+
+
 static int add_controller(struct ckrm_controller *ctlr)
 {
 	int ctlr_id, ret = -ENOSPC;
@@ -128,7 +248,6 @@ static int add_controller(struct ckrm_co
 	spin_unlock(&res_ctlrs_lock);
 	return ret;
 }
-
 /*
  * Interface for registering a resource controller.
  *
@@ -138,7 +257,7 @@ static int add_controller(struct ckrm_co
 int ckrm_register_controller(struct ckrm_controller *ctlr)
 {
 	int ret;
-	struct ckrm_class *class;
+	struct ckrm_class *class, *prev_class;
 
 	if (!ctlr)
 		return -EINVAL;
@@ -160,10 +279,20 @@ int ckrm_register_controller(struct ckrm
 	 * Run through all classes and create the controller specific data
 	 * structures.
 	 */
-	read_lock(&ckrm_class_lock);
-	list_for_each_entry(class, &ckrm_classes, class_list)
-		do_alloc_shares_struct(class, ctlr);
-	read_unlock(&ckrm_class_lock);
+	prev_class = NULL;
+  	read_lock(&ckrm_class_lock);
+	list_for_each_entry(class, &ckrm_classes, class_list) {
+		kref_get(&class->ref);
+		read_unlock(&ckrm_class_lock);
+  		do_alloc_shares_struct(class, ctlr);
+		if (prev_class)
+			kref_put(&prev_class->ref, ckrm_release_class);
+		prev_class = class;
+		read_lock(&ckrm_class_lock);
+	}
+  	read_unlock(&ckrm_class_lock);
+	if (prev_class)
+		kref_put(&prev_class->ref, ckrm_release_class);
 	return 0;
 }
 
@@ -188,7 +317,7 @@ static int remove_controller(struct ckrm
  */
 int ckrm_unregister_controller(struct ckrm_controller *ctlr)
 {
-	struct ckrm_class *class;
+	struct ckrm_class *class, *prev_class;
 
 	if (!ctlr)
 		return -EINVAL;
@@ -197,10 +326,20 @@ int ckrm_unregister_controller(struct ck
 		return -EINVAL;
 
 	/* free shares structs for this resource from all the classes */
-	read_lock(&ckrm_class_lock);
-	list_for_each_entry_reverse(class, &ckrm_classes, class_list)
-		do_free_shares_struct(class, ctlr);
-	read_unlock(&ckrm_class_lock);
+	prev_class = NULL;
+  	read_lock(&ckrm_class_lock);
+	list_for_each_entry_reverse(class, &ckrm_classes, class_list) {
+		kref_get(&class->ref);
+		read_unlock(&ckrm_class_lock);
+  		do_free_shares_struct(class, ctlr);
+		if (prev_class)
+			kref_put(&prev_class->ref, ckrm_release_class);
+		prev_class = class;
+		read_lock(&ckrm_class_lock);
+	}
+  	read_unlock(&ckrm_class_lock);
+	if (prev_class)
+		kref_put(&prev_class->ref, ckrm_release_class);
 
 	ckrm_put_controller(ctlr);
 	return remove_controller(ctlr);
@@ -208,3 +347,9 @@ int ckrm_unregister_controller(struct ck
 
 EXPORT_SYMBOL_GPL(ckrm_register_controller);
 EXPORT_SYMBOL_GPL(ckrm_unregister_controller);
+EXPORT_SYMBOL_GPL(ckrm_alloc_class);
+EXPORT_SYMBOL_GPL(ckrm_free_class);
+EXPORT_SYMBOL_GPL(ckrm_default_class);
+EXPORT_SYMBOL_GPL(ckrm_get_controller_by_name);
+EXPORT_SYMBOL_GPL(ckrm_get_controller_by_id);
+EXPORT_SYMBOL_GPL(ckrm_put_controller);
Index: linux2617-rc2/include/linux/ckrm_rc.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm_rc.h
+++ linux2617-rc2/include/linux/ckrm_rc.h
@@ -64,4 +64,13 @@ struct ckrm_controller {
 
 extern int ckrm_register_controller(struct ckrm_controller *);
 extern int ckrm_unregister_controller(struct ckrm_controller *);
+extern struct ckrm_class ckrm_default_class;
+static inline int ckrm_is_class_root(const struct ckrm_class* class)
+{
+	return (class == &ckrm_default_class);
+}
+
+#define for_each_child(child, parent)	\
+	list_for_each_entry(child, &(parent)->children, siblings)
+
 #endif /* _LINUX_CKRM_RC_H */



* [RFC] [PATCH 03/12] Share Handling
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 01/12] Register/Unregister interface for Controllers sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 02/12] Class creation/deletion sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 04/12] Add task logic to class sekharan
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC
  To: linux-kernel, ckrm-tech; +Cc: sekharan

03/12 - ckrm_core_handle_shares

Provides functions to set/get shares of a specific resource of a class.
Defines a teardown function that is intended to be called when the user
disables CKRM (by umount of RCFS).
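
The header comment in include/linux/ckrm.h describes a share as a child's
fraction of the parent's resource: a dividend in the child over a divisor in
the parent. The following userspace sketch shows one way to read that
arithmetic. The demo_* names are hypothetical, and treating a symbolic
(negative) min_shares as "no guarantee" is an assumption made for this
example, not something the patch states.

```c
#include <assert.h>

/* Only the two fields needed for the arithmetic; names follow the patch. */
struct demo_shares {
	int min_shares;			/* dividend, set in the child */
	int child_shares_divisor;	/* divisor, set in the parent, >= 1 */
};

/*
 * Resource units guaranteed to a child, given the units guaranteed to its
 * parent: parent_units * child->min_shares / parent->child_shares_divisor.
 */
static long demo_child_guarantee(long parent_units,
				 const struct demo_shares *parent,
				 const struct demo_shares *child)
{
	assert(parent->child_shares_divisor >= 1);
	if (child->min_shares < 0)	/* symbolic value: assumed no guarantee */
		return 0;
	return parent_units * child->min_shares /
		parent->child_shares_divisor;
}
```

Because each class is expressed relative to its parent only, the absolute
guarantee of a nested class is obtained by applying the function once per
level, starting from the root.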
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Hubertus Franke <frankeh@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Gerrit Huizenga <gh@us.ibm.com>
Signed-off-by: MAEDA Naoaki <maeda.naoaki@jp.fujitsu.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 include/linux/ckrm.h      |   14 ++
 include/linux/ckrm_rc.h   |   10 +
 kernel/ckrm/Makefile      |    2 
 kernel/ckrm/ckrm.c        |   24 ++++
 kernel/ckrm/ckrm_local.h  |    6 +
 kernel/ckrm/ckrm_shares.c |  242 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 297 insertions(+), 1 deletion(-)

Index: linux2617-rc2/include/linux/ckrm.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm.h
+++ linux2617-rc2/include/linux/ckrm.h
@@ -54,6 +54,20 @@
  * locked.
 */
 struct ckrm_shares {
+	/* shares only set by userspace */
+	int min_shares; /* minimum fraction of parent's resources allowed */
+	int max_shares; /* maximum fraction of parent's resources allowed */
+	int child_shares_divisor; /* >= 1, may not be DONT_CARE */
+
+	/*
+	 * share values invisible to userspace.  adjusted when userspace
+	 * sets shares
+	 */
+	int unused_min_shares;
+		/* 0 <= unused_min_shares <= (child_shares_divisor -
+		 * 			Sum of min_shares of children)
+		 */
+	int cur_max_shares; /* max(children's max_shares). need better name */
 };
 
 /*
Index: linux2617-rc2/kernel/ckrm/ckrm_shares.c
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/ckrm_shares.c
@@ -0,0 +1,242 @@
+/*
+ * ckrm_shares.c - Share management functions for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman,  IBM Corp. 2003, 2004, 2005, 2006
+ *		(C) Hubertus Franke,  IBM Corp. 2004
+ *		(C) Matt Helsley,  IBM Corp. 2006
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/errno.h>
+#include <linux/ckrm_rc.h>
+
+/*
+ * Share values can be quantitative (quantity of memory for instance) or
+ * symbolic. The symbolic value DONT_CARE allows for any quantity of a resource
+ * to be substituted in its place. The symbolic value UNCHANGED is only used
+ * when setting share values and means that the old value should be used.
+ */
+
+/* Is the share a quantity (as opposed to "symbols" DONT_CARE or UNCHANGED) */
+static inline int is_share_quantitative(int share)
+{
+	return (share >= 0);
+}
+
+static inline int is_share_symbolic(int share)
+{
+	return !is_share_quantitative(share);
+}
+
+static inline int is_share_valid(int share)
+{
+	return ((share == CKRM_SHARE_DONT_CARE) ||
+			(share == CKRM_SHARE_UNSUPPORTED) ||
+			is_share_quantitative(share));
+}
+
+static inline int did_share_change(int share)
+{
+	return (share != CKRM_SHARE_UNCHANGED);
+}
+
+static inline int change_supported(int share)
+{
+	return (share != CKRM_SHARE_UNSUPPORTED);
+}
+
+/*
+ * Caller is responsible for protecting 'parent'
+ * Caller is responsible for making sure that the sum of sibling min_shares
+ * doesn't exceed parent's total min_shares.
+ */
+static inline void ckrm_child_min_shares_changed(struct ckrm_shares *parent,
+				   int child_cur_min_shares,
+				   int child_new_min_shares)
+{
+	if (is_share_quantitative(child_new_min_shares))
+		parent->unused_min_shares -= child_new_min_shares;
+	if (is_share_quantitative(child_cur_min_shares))
+		parent->unused_min_shares += child_cur_min_shares;
+}
+
+/*
+ * Set parent's cur_max_shares to the largest 'max_shares' of all
+ * of its children.
+ */
+static inline void ckrm_set_cur_max_shares(struct ckrm_class *parent,
+					struct ckrm_controller *ctlr)
+{
+	int max_shares = 0;
+	struct ckrm_class *child = NULL;
+	struct ckrm_shares *child_shares, *parent_shares;
+
+	for_each_child(child, parent) {
+		child_shares = ckrm_get_controller_shares(child, ctlr);
+		max_shares = max(max_shares, child_shares->max_shares);
+	}
+
+	parent_shares = ckrm_get_controller_shares(parent, ctlr);
+	parent_shares->cur_max_shares = max_shares;
+}
+
+/*
+ * Return -EINVAL if the child's shares violate self-consistency or
+ * parent-imposed restrictions. Otherwise return 0.
+ *
+ * This involves checking the child's shares against its parent's and
+ * against each other (userspace can't be trusted).
+ */
+static inline int are_shares_valid(struct ckrm_shares *child,
+				   struct ckrm_shares *parent,
+				   int current_usage,
+				   int min_shares_increase)
+{
+	/*
+	 * CHILD <-> PARENT validation
+	 * Increases in child's min_shares or max_shares can't exceed
+	 * limitations imposed by the parent class.
+	 * Only validate this if we have a parent.
+	 */
+	if (parent &&
+	    ((is_share_quantitative(child->min_shares) &&
+	      (min_shares_increase > parent->unused_min_shares)) ||
+	     (is_share_quantitative(child->max_shares) &&
+	      (child->max_shares > parent->child_shares_divisor))))
+		return -EINVAL;
+
+	/* CHILD validation: is min valid */
+	if (!is_share_valid(child->min_shares))
+		return -EINVAL;
+
+	/* CHILD validation: is max valid */
+	if (!is_share_valid(child->max_shares))
+		return -EINVAL;
+
+	/*
+	 * CHILD validation: is divisor quantitative & current_usage
+	 * is not more than the new divisor
+	 */
+	if (!is_share_quantitative(child->child_shares_divisor) ||
+			(current_usage > child->child_shares_divisor))
+		return -EINVAL;
+
+	/*
+	 * CHILD validation: is the new child_shares_divisor large
+	 * enough to accommodate the largest max_shares of any of my children
+	 */
+	if (child->child_shares_divisor < child->cur_max_shares)
+		return -EINVAL;
+
+	/* CHILD validation: min <= max */
+	if (is_share_quantitative(child->min_shares) &&
+			is_share_quantitative(child->max_shares) &&
+			(child->min_shares > child->max_shares))
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
+ * Set the resource shares of a child class given the new shares
+ * specified by userspace, the child's current shares, and the parent class'
+ * shares.
+ *
+ * Caller is responsible for holding class->lock of child and parent
+ * classes to protect the shares structures passed to this function.
+ */
+static int ckrm_set_shares(const struct ckrm_shares *new,
+		    struct ckrm_shares *child_shares,
+    		    struct ckrm_shares *parent_shares)
+{
+	int rc, current_usage, min_shares_increase;
+	struct ckrm_shares final_shares;
+
+	BUG_ON(!new || !child_shares);
+
+	final_shares = *child_shares;
+	if (did_share_change(new->child_shares_divisor) &&
+			change_supported(child_shares->child_shares_divisor))
+		final_shares.child_shares_divisor = new->child_shares_divisor;
+	if (did_share_change(new->min_shares) &&
+			change_supported(child_shares->min_shares))
+		final_shares.min_shares = new->min_shares;
+	if (did_share_change(new->max_shares) &&
+			change_supported(child_shares->max_shares))
+		final_shares.max_shares = new->max_shares;
+
+	current_usage = child_shares->child_shares_divisor -
+	    		 child_shares->unused_min_shares;
+	min_shares_increase = final_shares.min_shares;
+	if (is_share_quantitative(child_shares->min_shares))
+		min_shares_increase -= child_shares->min_shares;
+
+	rc = are_shares_valid(&final_shares, parent_shares, current_usage,
+   			      min_shares_increase);
+	if (rc)
+		return rc; /* new shares would violate restrictions */
+
+	if (did_share_change(new->child_shares_divisor))
+		final_shares.unused_min_shares =
+			(final_shares.child_shares_divisor - current_usage);
+	*child_shares = final_shares;
+	return 0;
+}
+
+int ckrm_set_controller_shares(struct ckrm_class *class,
+					struct ckrm_controller *ctlr,
+					const struct ckrm_shares *new_shares)
+{
+	struct ckrm_shares *shares, *parent_shares;
+	int prev_min, prev_max, rc;
+
+	if (!ctlr->shares_changed)
+		return -EINVAL;
+
+	shares = ckrm_get_controller_shares(class, ctlr);
+	if (!shares)
+		return -EINVAL;
+
+	prev_min = shares->min_shares;
+	prev_max = shares->max_shares;
+
+	if (!ckrm_is_class_root(class))
+		spin_lock(&class->parent->class_lock);
+	spin_lock(&class->class_lock);
+	parent_shares = ckrm_get_controller_shares(class->parent, ctlr);
+	rc = ckrm_set_shares(new_shares, shares, parent_shares);
+	spin_unlock(&class->class_lock);
+
+	if (rc || ckrm_is_class_root(class))
+		goto done;
+
+	/* Notify parent about changes in my shares */
+	ckrm_child_min_shares_changed(parent_shares, prev_min,
+				      shares->min_shares);
+	if (prev_max != shares->max_shares)
+		ckrm_set_cur_max_shares(class->parent, ctlr);
+
+done:
+	if (!ckrm_is_class_root(class))
+		spin_unlock(&class->parent->class_lock);
+	if (!rc)
+		ctlr->shares_changed(shares);
+	return rc;
+}
+
+void ckrm_set_shares_to_default(struct ckrm_class *class,
+						struct ckrm_controller *ctlr)
+{
+	struct ckrm_shares shares = {
+		.min_shares = CKRM_SHARE_DONT_CARE,
+		.max_shares = CKRM_SHARE_DONT_CARE,
+		.child_shares_divisor = CKRM_SHARE_DEFAULT_DIVISOR,
+	};
+	ckrm_set_controller_shares(class, ctlr, &shares);
+}
+
Index: linux2617-rc2/kernel/ckrm/Makefile
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/Makefile
+++ linux2617-rc2/kernel/ckrm/Makefile
@@ -1 +1 @@
-obj-y = ckrm.o
+obj-y = ckrm.o ckrm_shares.o
Index: linux2617-rc2/kernel/ckrm/ckrm.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm.c
+++ linux2617-rc2/kernel/ckrm/ckrm.c
@@ -174,6 +174,8 @@ static void do_free_shares_struct(struct
 	if (shares == NULL)
 		return;
 
+	ckrm_set_shares_to_default(class, ctlr);
+
 	spin_lock(&class->class_lock);
 	class->shares[ctlr->ctlr_id] = NULL;
 	spin_unlock(&class->class_lock);
@@ -345,6 +347,26 @@ int ckrm_unregister_controller(struct ck
 	return remove_controller(ctlr);
 }
 
+/*
+ * Bring the state of CKRM to the initial state.
+ * Only shares of the default class need to be changed to default values.
+ * At this point no user-defined classes should exist.
+ */
+void ckrm_teardown(void)
+{
+	int i;
+	struct ckrm_controller *ctlr;
+
+	BUG_ON(ckrm_num_classes != 0);
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++) {
+		ctlr = ckrm_get_controller_by_id(i);
+		if (ctlr) {
+			ckrm_set_shares_to_default(&ckrm_default_class, ctlr);
+			ckrm_put_controller(ctlr);
+		}
+	}
+}
+
 EXPORT_SYMBOL_GPL(ckrm_register_controller);
 EXPORT_SYMBOL_GPL(ckrm_unregister_controller);
 EXPORT_SYMBOL_GPL(ckrm_alloc_class);
@@ -353,3 +375,5 @@ EXPORT_SYMBOL_GPL(ckrm_default_class);
 EXPORT_SYMBOL_GPL(ckrm_get_controller_by_name);
 EXPORT_SYMBOL_GPL(ckrm_get_controller_by_id);
 EXPORT_SYMBOL_GPL(ckrm_put_controller);
+EXPORT_SYMBOL_GPL(ckrm_set_controller_shares);
+EXPORT_SYMBOL_GPL(ckrm_teardown);
Index: linux2617-rc2/include/linux/ckrm_rc.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm_rc.h
+++ linux2617-rc2/include/linux/ckrm_rc.h
@@ -73,4 +73,14 @@ static inline int ckrm_is_class_root(con
 #define for_each_child(child, parent)	\
 	list_for_each_entry(child, &parent->children, siblings)
 
+/* Get controller specific shares structure for the given class */
+static inline struct ckrm_shares *ckrm_get_controller_shares(
+			struct ckrm_class *class, struct ckrm_controller *ctlr)
+{
+	if (class && ctlr)
+		return class->shares[ctlr->ctlr_id];
+	else
+		return CKRM_NO_SHARE;
+}
+
 #endif /* _LINUX_CKRM_RC_H */
Index: linux2617-rc2/kernel/ckrm/ckrm_local.h
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_local.h
+++ linux2617-rc2/kernel/ckrm/ckrm_local.h
@@ -11,3 +11,9 @@ extern void ckrm_put_controller(struct c
 extern struct ckrm_class *ckrm_alloc_class(struct ckrm_class *, const char *);
 extern int ckrm_free_class(struct ckrm_class *);
 extern void ckrm_release_class(struct kref *);
+extern int ckrm_set_controller_shares(struct ckrm_class *,
+			struct ckrm_controller *, const struct ckrm_shares *);
+/* Set the shares for the given class and resource to default values */
+extern void ckrm_set_shares_to_default(struct ckrm_class *,
+						struct ckrm_controller *);
+extern void ckrm_teardown(void);

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* [RFC] [PATCH 04/12] Add task logic to class
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (2 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 03/12] Share Handling sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 05/12] Init and clear class info in task sekharan
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

04/12 - ckrm_tasksupport

Adds logic to support adding a task to, and removing a task from, a class.
Provides an interface to set a task's class.
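The detach, attach and reference-drop sequence used when a task changes class can be sketched as a single-threaded userspace model. All names here (`class_model`, `task_model`, `model_setclass`) are hypothetical, plain integers stand in for krefs, and the locking, the retry loop and the controller notification of the real code are deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 'refs' stands in for the kref, 'ntasks' for task_list. */
struct class_model {
	int refs;
	int ntasks;
};

struct task_model {
	struct class_model *cls;	/* NULL models CKRM_NO_CLASS */
};

/* Take the task out of its current class, if any, and return that class. */
static struct class_model *detach(struct task_model *t)
{
	struct class_model *old = t->cls;

	if (old) {
		old->ntasks--;
		t->cls = NULL;
	}
	return old;
}

static void attach(struct task_model *t, struct class_model *newcls)
{
	assert(t->cls == NULL);
	t->cls = newcls;
	newcls->ntasks++;
}

/*
 * The caller has already taken a reference on newcls (if non-NULL).
 * That reference is kept by the task on success and dropped here if
 * the task turned out to have no class to move from.
 */
static void model_setclass(struct task_model *t, struct class_model *newcls)
{
	struct class_model *old = detach(t);

	if (old == NULL) {
		if (newcls)
			newcls->refs--;
		return;
	}
	if (newcls)
		attach(t, newcls);
	old->refs--;	/* the task no longer pins its old class */
}
```

Passing NULL as the new class models the exit path: the task leaves its class and releases its reference without joining another.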
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 include/linux/sched.h    |    4 +
 kernel/ckrm/Makefile     |    2 
 kernel/ckrm/ckrm.c       |   24 +++++++
 kernel/ckrm/ckrm_local.h |    1 
 kernel/ckrm/ckrm_task.c  |  144 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 174 insertions(+), 1 deletion(-)

Index: linux2617-rc2/include/linux/sched.h
===================================================================
--- linux2617-rc2.orig/include/linux/sched.h
+++ linux2617-rc2/include/linux/sched.h
@@ -888,6 +888,10 @@ struct task_struct {
 	 * cache last used pipe for splice
 	 */
 	struct pipe_inode_info *splice_pipe;
+#ifdef CONFIG_CKRM
+	struct ckrm_class *class;
+	struct list_head member_list; /* list of tasks in class */
+#endif /* CONFIG_CKRM */
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
Index: linux2617-rc2/kernel/ckrm/ckrm_task.c
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/ckrm_task.c
@@ -0,0 +1,144 @@
+/* ckrm_task.c - Class-based Kernel Resource Management (CKRM)
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003,2004
+ *		(C) Shailabh Nagar,  IBM Corp. 2003
+ *		(C) Chandra Seetharaman,  IBM Corp. 2003, 2004, 2005
+ *		(C) Vivek Kashyap,	IBM Corp. 2004
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+#include <linux/sched.h>
+#include <linux/module.h>
+#include "ckrm_local.h"
+
+static inline struct ckrm_class *remove_from_old_class(struct task_struct *tsk)
+{
+	struct ckrm_class *class;
+
+retry:
+	class = tsk->class;
+	if (class == CKRM_NO_CLASS)
+		goto done;
+
+	spin_lock(&class->class_lock);
+	if (class != tsk->class) { /* lost the race, retry */
+		spin_unlock(&class->class_lock);
+		goto retry;
+	}
+	/* take out of old class */
+	list_del_init(&tsk->member_list);
+	tsk->class = CKRM_NO_CLASS;
+	spin_unlock(&class->class_lock);
+done:
+	return class;
+}
+
+static void move_to_new_class(struct task_struct *tsk,
+				struct ckrm_class *newclass)
+{
+	BUG_ON(!list_empty(&tsk->member_list));
+	BUG_ON(tsk->class != CKRM_NO_CLASS);
+
+	spin_lock(&newclass->class_lock);
+	tsk->class = newclass;
+	list_add(&tsk->member_list, &newclass->task_list);
+	spin_unlock(&newclass->class_lock);
+}
+
+static void notify_res_ctlrs(struct task_struct *tsk,
+		struct ckrm_class *oldclass, struct ckrm_class *newclass)
+{
+	int i;
+	struct ckrm_controller *ctlr;
+	struct ckrm_shares *old_shares, *new_shares;
+
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++) {
+		ctlr = ckrm_get_controller_by_id(i);
+		if (ctlr == NULL)
+			continue;
+		if (ctlr->move_task) {
+			old_shares = ckrm_get_controller_shares(oldclass, ctlr);
+			new_shares = ckrm_get_controller_shares(newclass, ctlr);
+			ctlr->move_task(tsk, old_shares, new_shares);
+		}
+		ckrm_put_controller(ctlr);
+	}
+}
+
+/*
+ * Change the class of the given task to "newclass"
+ *
+ * Caller is responsible to make sure the task structure stays put
+ * through this function.
+ *
+ * This function should be called without holding class_lock of
+ * newclass and tsk->class.
+ *
+ * Called with a reference to the new class held. The reference is
+ * dropped only when the task is assigned to a different class
+ * or when the task exits.
+ */
+static void ckrm_setclass_internal(struct task_struct *tsk,
+				struct ckrm_class *newclass)
+{
+	struct ckrm_class *oldclass;
+
+retry:
+	oldclass = remove_from_old_class(tsk);
+
+	/* The task is either exiting or is moving to a different class. */
+	if (oldclass == CKRM_NO_CLASS) {
+		/* In the exit path, must succeed */
+		if (newclass == CKRM_NO_CLASS)
+			goto retry;
+		kref_put(&newclass->ref, ckrm_release_class);
+		return;
+	}
+
+	/*
+	 * notify resource controllers before we actually set the class
+	 * in the task to avoid a race with notify_res_ctlrs being called
+	 * from another ckrm_setclass_internal.
+	 */
+	notify_res_ctlrs(tsk, oldclass, newclass);
+	if (newclass != CKRM_NO_CLASS)
+		move_to_new_class(tsk, newclass);
+	kref_put(&oldclass->ref, ckrm_release_class);
+}
+
+/*
+ * Set class of the task associated with pid to class.
+ * returns 0 on success, -errno on error.
+ */
+int ckrm_setclass(pid_t pid, struct ckrm_class *class)
+{
+	int rc = 0;
+	struct task_struct *tsk;
+
+	read_lock(&tasklist_lock);
+	tsk = find_task_by_pid(pid);
+	if (tsk == NULL) {
+		read_unlock(&tasklist_lock);
+		return -ESRCH; /* pid not found */
+	}
+	get_task_struct(tsk);
+	read_unlock(&tasklist_lock);
+
+	/* Check permissions */
+	if ((!capable(CAP_SYS_NICE)) &&
+		(!capable(CAP_SYS_RESOURCE)) && (current->user != tsk->user))
+		rc = -EPERM;
+	else {
+		kref_get(&class->ref);
+		ckrm_setclass_internal(tsk, class);
+	}
+	put_task_struct(tsk);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ckrm_setclass);
Index: linux2617-rc2/kernel/ckrm/Makefile
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/Makefile
+++ linux2617-rc2/kernel/ckrm/Makefile
@@ -1 +1 @@
-obj-y = ckrm.o ckrm_shares.o
+obj-y = ckrm.o ckrm_shares.o ckrm_task.o
Index: linux2617-rc2/kernel/ckrm/ckrm.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm.c
+++ linux2617-rc2/kernel/ckrm/ckrm.c
@@ -251,6 +251,26 @@ static int add_controller(struct ckrm_co
 	return ret;
 }
 /*
+ * Helper function to move all tasks in a class to/from the registering
+ * /unregistering resource controller.
+ *
+ * Assumes ctlr is valid and class is initialized with resource
+ * controller's shares.
+ */
+static void move_tasks(struct ckrm_class *class, struct ckrm_controller *ctlr,
+		struct ckrm_shares *from, struct ckrm_shares *to)
+{
+	struct task_struct *tsk;
+
+	if (!ctlr->move_task)
+		return;
+	spin_lock(&class->class_lock);
+	list_for_each_entry(tsk, &class->task_list, member_list)
+		ctlr->move_task(tsk, from, to);
+	spin_unlock(&class->class_lock);
+}
+
+/*
  * Interface for registering a resource controller.
  *
  * Returns the 0 on success, -errno for failure.
@@ -287,6 +307,8 @@ int ckrm_register_controller(struct ckrm
 		kref_get(&class->ref);
 		read_unlock(&ckrm_class_lock);
   		do_alloc_shares_struct(class, ctlr);
+		move_tasks(class, ctlr, CKRM_NO_SHARE,
+					class->shares[ctlr->ctlr_id]);
 		if (prev_class)
 			kref_put(&prev_class->ref, ckrm_release_class);
 		prev_class = class;
@@ -333,6 +355,8 @@ int ckrm_unregister_controller(struct ck
 	list_for_each_entry_reverse(class, &ckrm_classes, class_list) {
 		kref_get(&class->ref);
 		read_unlock(&ckrm_class_lock);
+		move_tasks(class, ctlr, class->shares[ctlr->ctlr_id],
+							CKRM_NO_SHARE);
   		do_free_shares_struct(class, ctlr);
 		if (prev_class)
 			kref_put(&prev_class->ref, ckrm_release_class);
Index: linux2617-rc2/kernel/ckrm/ckrm_local.h
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_local.h
+++ linux2617-rc2/kernel/ckrm/ckrm_local.h
@@ -17,3 +17,4 @@ extern int ckrm_set_controller_shares(st
 extern void ckrm_set_shares_to_default(struct ckrm_class *,
 						struct ckrm_controller *);
 extern void ckrm_teardown(void);
+extern int ckrm_setclass(pid_t, struct ckrm_class *);

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* [RFC] [PATCH 05/12] Init and clear class info in task
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (3 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 04/12] Add task logic to class sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 06/12] Add proc interface to get class info of task sekharan
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

05/12 - ckrm_tasksupport_fork_exit_init

Initializes and clears CKRM-specific information in a task at fork() and
exit().
Initializes CKRM (called from start_kernel).
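The inheritance rule applied at fork() (a process takes its real parent's class, a thread takes its group leader's class) can be modelled in a few lines. This is a hedged sketch: `fork_task` and `inherited_class` are hypothetical names, an integer id stands in for the class pointer, and "is a thread group leader" is modelled as the task pointing at itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the task_struct fields involved. */
struct fork_task {
	struct fork_task *real_parent;
	struct fork_task *group_leader;
	int class_id;
};

/* Pick the class a freshly forked task should start in. */
static int inherited_class(const struct fork_task *t)
{
	assert(t != NULL);
	if (t->group_leader == t)	/* a new process */
		return t->real_parent->class_id;
	return t->group_leader->class_id;	/* a new thread */
}
```

A consequence visible in the model: if a process has been moved to another class after it was forked, threads it creates later follow the process, not the process's parent.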
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>

 include/linux/ckrm.h     |    7 ++++++
 init/main.c              |    2 +
 kernel/ckrm/ckrm.c       |   11 +++++++++
 kernel/ckrm/ckrm_local.h |    1 
 kernel/ckrm/ckrm_task.c  |   52 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c            |    2 +
 kernel/fork.c            |    2 +
 7 files changed, 77 insertions(+)

Index: linux2617-rc2/kernel/exit.c
===================================================================
--- linux2617-rc2.orig/kernel/exit.c
+++ linux2617-rc2/kernel/exit.c
@@ -35,6 +35,7 @@
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
+#include <linux/ckrm.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -731,6 +732,7 @@ static void exit_notify(struct task_stru
 	struct task_struct *t;
 	struct list_head ptrace_dead, *_p, *_n;
 
+	ckrm_clear_task(tsk);
 	if (signal_pending(tsk) && !(tsk->signal->flags & SIGNAL_GROUP_EXIT)
 	    && !thread_group_empty(tsk)) {
 		/*
Index: linux2617-rc2/kernel/fork.c
===================================================================
--- linux2617-rc2.orig/kernel/fork.c
+++ linux2617-rc2/kernel/fork.c
@@ -44,6 +44,7 @@
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/cn_proc.h>
+#include <linux/ckrm.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1214,6 +1215,7 @@ static task_t *copy_process(unsigned lon
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	ckrm_init_task(p);
 	proc_fork_connector(p);
 	return p;
 
Index: linux2617-rc2/init/main.c
===================================================================
--- linux2617-rc2.orig/init/main.c
+++ linux2617-rc2/init/main.c
@@ -47,6 +47,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/ckrm.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -541,6 +542,7 @@ asmlinkage void __init start_kernel(void
 	proc_root_init();
 #endif
 	cpuset_init();
+	ckrm_init();
 
 	check_bugs();
 
Index: linux2617-rc2/kernel/ckrm/ckrm.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm.c
+++ linux2617-rc2/kernel/ckrm/ckrm.c
@@ -231,6 +231,7 @@ int ckrm_free_class(struct ckrm_class *c
 		return -EBUSY;
 	}
 	spin_unlock(&class->class_lock);
+	ckrm_move_tasks_to_parent(class);
 	kref_put(&class->ref, ckrm_release_class);
 	return 0;
 }
@@ -391,6 +392,16 @@ void ckrm_teardown(void)
 	}
 }
 
+void ckrm_init(void)
+{
+	write_lock(&ckrm_class_lock);
+	list_add_tail(&ckrm_default_class.class_list, &ckrm_classes);
+	write_unlock(&ckrm_class_lock);
+	kref_init(&ckrm_default_class.ref);
+	init_task.class = &ckrm_default_class;
+	ckrm_init_task(&init_task);
+}
+
 EXPORT_SYMBOL_GPL(ckrm_register_controller);
 EXPORT_SYMBOL_GPL(ckrm_unregister_controller);
 EXPORT_SYMBOL_GPL(ckrm_alloc_class);
Index: linux2617-rc2/kernel/ckrm/ckrm_task.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_task.c
+++ linux2617-rc2/kernel/ckrm/ckrm_task.c
@@ -141,4 +141,56 @@ int ckrm_setclass(pid_t pid, struct ckrm
 	put_task_struct(tsk);
 	return rc;
 }
+
+void ckrm_init_task(struct task_struct *tsk)
+{
+	struct ckrm_class *class;
+
+	/*
+	 * processes inherit their class from their real parent, and
+	 * threads inherit class from their process.
+	 */
+	if (thread_group_leader(tsk))
+		class = tsk->real_parent->class;
+	else
+		class = tsk->group_leader->class;
+
+	tsk->class = CKRM_NO_CLASS;
+	INIT_LIST_HEAD(&tsk->member_list);
+
+	BUG_ON(class == NULL);
+	kref_get(&class->ref);
+	move_to_new_class(tsk, class);
+	notify_res_ctlrs(tsk, CKRM_NO_CLASS, class);
+}
+
+void ckrm_clear_task(struct task_struct *tsk)
+{
+	ckrm_setclass_internal(tsk, CKRM_NO_CLASS);
+}
+
+/*
+ * Move all tasks in the given class to its parent.
+ */
+void ckrm_move_tasks_to_parent(struct ckrm_class *class)
+{
+	kref_get(&class->ref);
+
+next_task:
+	spin_lock(&class->class_lock);
+	if (!list_empty(&class->task_list)) {
+		struct task_struct *tsk =
+			list_entry(class->task_list.next,
+				struct task_struct, member_list);
+		get_task_struct(tsk);
+		spin_unlock(&class->class_lock);
+		kref_get(&class->parent->ref);
+		ckrm_setclass_internal(tsk, class->parent);
+		put_task_struct(tsk);
+		goto next_task;
+	}
+	spin_unlock(&class->class_lock);
+	kref_put(&class->ref, ckrm_release_class);
+}
+
 EXPORT_SYMBOL_GPL(ckrm_setclass);
Index: linux2617-rc2/include/linux/ckrm.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm.h
+++ linux2617-rc2/include/linux/ckrm.h
@@ -94,5 +94,12 @@ struct ckrm_class {
 	struct list_head children;	/* head of children */
 };
 
+extern void ckrm_init_task(struct task_struct *);
+extern void ckrm_clear_task(struct task_struct *);
+extern void ckrm_init(void);
+#else /* CONFIG_CKRM */
+static inline void ckrm_init_task(struct task_struct *tsk) { }
+static inline void ckrm_clear_task(struct task_struct *tsk) { }
+static inline void ckrm_init(void) { }
 #endif /* CONFIG_CKRM */
 #endif /* _LINUX_CKRM_H */
Index: linux2617-rc2/kernel/ckrm/ckrm_local.h
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_local.h
+++ linux2617-rc2/kernel/ckrm/ckrm_local.h
@@ -18,3 +18,4 @@ extern void ckrm_set_shares_to_default(s
 						struct ckrm_controller *);
 extern void ckrm_teardown(void);
 extern int ckrm_setclass(pid_t, struct ckrm_class *);
+extern void ckrm_move_tasks_to_parent(struct ckrm_class *);

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* [RFC] [PATCH 06/12] Add proc interface to get class info of task
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (4 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 05/12] Init and clear class info in task sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 07/12] Configfs based filesystem user interface - RCFS sekharan
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

06/12: ckrm_tasksupport_procsupport

Adds an interface in /proc to get the class name of a task.
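From userspace, the new entry could be read roughly as follows. This is a sketch under assumptions: the path /proc/<pid>/ckrm_class follows from the "ckrm_class" entry added to tgid_base_stuff in the diff below, the single-line "/<name>" format follows from proc_ckrm_class_show(), and the helper names `ckrm_class_path` and `read_class_line` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/proc/<pid>/ckrm_class" in buf. Returns 0 if it fits, -1 if not. */
static int ckrm_class_path(pid_t pid, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/%ld/ckrm_class", (long)pid);

	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Read the one-line class name from an already opened file and strip
 * the trailing newline. Returns 0 on success, -1 on a read error.
 */
static int read_class_line(FILE *f, char *buf, size_t len)
{
	assert(f != NULL && buf != NULL && len > 0);
	if (!fgets(buf, (int)len, f))
		return -1;
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would fopen() the path produced by the first helper and hand the stream to the second; on a kernel without this patch the fopen() simply fails.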
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: MAEDA Naoaki <maeda.naoaki@jp.fujitsu.com>

 fs/proc/base.c          |   19 +++++++++++++++++++
 include/linux/ckrm.h    |    2 ++
 kernel/ckrm/ckrm_task.c |   32 +++++++++++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

Index: linux2617-rc2/fs/proc/base.c
===================================================================
--- linux2617-rc2.orig/fs/proc/base.c
+++ linux2617-rc2/fs/proc/base.c
@@ -70,6 +70,7 @@
 #include <linux/ptrace.h>
 #include <linux/seccomp.h>
 #include <linux/cpuset.h>
+#include <linux/ckrm.h>
 #include <linux/audit.h>
 #include <linux/poll.h>
 #include "internal.h"
@@ -115,6 +116,9 @@ enum pid_directory_inos {
 #ifdef CONFIG_CPUSETS
 	PROC_TGID_CPUSET,
 #endif
+#ifdef CONFIG_CKRM
+	PROC_TGID_CKRM_CLASS,
+#endif
 #ifdef CONFIG_SECURITY
 	PROC_TGID_ATTR,
 	PROC_TGID_ATTR_CURRENT,
@@ -156,6 +160,9 @@ enum pid_directory_inos {
 #ifdef CONFIG_CPUSETS
 	PROC_TID_CPUSET,
 #endif
+#ifdef CONFIG_CKRM
+	PROC_TID_CKRM_CLASS,
+#endif
 #ifdef CONFIG_SECURITY
 	PROC_TID_ATTR,
 	PROC_TID_ATTR_CURRENT,
@@ -219,6 +226,9 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_CPUSETS
 	E(PROC_TGID_CPUSET,    "cpuset",  S_IFREG|S_IRUGO),
 #endif
+#ifdef CONFIG_CKRM
+	E(PROC_TGID_CKRM_CLASS,"ckrm_class",S_IFREG|S_IRUGO),
+#endif
 	E(PROC_TGID_OOM_SCORE, "oom_score",S_IFREG|S_IRUGO),
 	E(PROC_TGID_OOM_ADJUST,"oom_adj", S_IFREG|S_IRUGO|S_IWUSR),
 #ifdef CONFIG_AUDITSYSCALL
@@ -261,6 +271,9 @@ static struct pid_entry tid_base_stuff[]
 #ifdef CONFIG_CPUSETS
 	E(PROC_TID_CPUSET,     "cpuset",  S_IFREG|S_IRUGO),
 #endif
+#ifdef CONFIG_CKRM
+	E(PROC_TID_CKRM_CLASS, "ckrm_class",S_IFREG|S_IRUGO),
+#endif
 	E(PROC_TID_OOM_SCORE,  "oom_score",S_IFREG|S_IRUGO),
 	E(PROC_TID_OOM_ADJUST, "oom_adj", S_IFREG|S_IRUGO|S_IWUSR),
 #ifdef CONFIG_AUDITSYSCALL
@@ -1814,6 +1827,12 @@ static struct dentry *proc_pident_lookup
 			inode->i_fop = &proc_cpuset_operations;
 			break;
 #endif
+#ifdef CONFIG_CKRM
+		case PROC_TID_CKRM_CLASS:
+		case PROC_TGID_CKRM_CLASS:
+			inode->i_fop = &proc_ckrm_class_operations;
+			break;
+#endif
 		case PROC_TID_OOM_SCORE:
 		case PROC_TGID_OOM_SCORE:
 			inode->i_fop = &proc_info_file_operations;
Index: linux2617-rc2/kernel/ckrm/ckrm_task.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_task.c
+++ linux2617-rc2/kernel/ckrm/ckrm_task.c
@@ -13,7 +13,8 @@
  * (at your option) any later version.
  *
  */
-#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
 #include <linux/module.h>
 #include "ckrm_local.h"
 
@@ -193,4 +194,33 @@ next_task:
 	kref_put(&class->ref, ckrm_release_class);
 }
 
+static int proc_ckrm_class_show(struct seq_file *m, void *v)
+{
+	struct task_struct *tsk = m->private;
+	struct ckrm_class *class = tsk->class;
+
+	if (!class)
+		return -EINVAL;
+
+	kref_get(&class->ref);
+	seq_puts(m, "/");
+	if (!ckrm_is_class_root(class))
+		seq_puts(m, class->name);
+	seq_putc(m, '\n');
+	kref_put(&class->ref, ckrm_release_class);
+	return 0;
+}
+
+static int ckrm_class_open(struct inode *inode, struct file *file)
+{
+	struct task_struct *tsk = PROC_I(inode)->task;
+	return single_open(file, proc_ckrm_class_show, tsk);
+}
+
+struct file_operations proc_ckrm_class_operations = {
+	.open		= ckrm_class_open,
+	.read		= seq_read,
+	.llseek 	= seq_lseek,
+	.release	= single_release,
+};
 EXPORT_SYMBOL_GPL(ckrm_setclass);
Index: linux2617-rc2/include/linux/ckrm.h
===================================================================
--- linux2617-rc2.orig/include/linux/ckrm.h
+++ linux2617-rc2/include/linux/ckrm.h
@@ -94,6 +94,8 @@ struct ckrm_class {
 	struct list_head children;	/* head of children */
 };
 
+extern struct file_operations proc_ckrm_class_operations;
+
 extern void ckrm_init_task(struct task_struct *);
 extern void ckrm_clear_task(struct task_struct *);
 extern void ckrm_init(void);

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* [RFC] [PATCH 07/12] Configfs based filesystem user interface - RCFS
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (5 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 06/12] Add proc interface to get class info of task sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:24 ` [RFC] [PATCH 08/12] Add attribute support to RCFS sekharan
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

07/12 - ckrm_configfs_rcfs

Create the filesystem(RCFS) for managing CKRM. Hooks up with configfs.
Provides functions for creating and deleting classes.
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 init/Kconfig            |   12 +++
 kernel/ckrm/Makefile    |    1 
 kernel/ckrm/ckrm_rcfs.c |  160 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 173 insertions(+)

Index: linux2617-rc2/init/Kconfig
===================================================================
--- linux2617-rc2.orig/init/Kconfig
+++ linux2617-rc2/init/Kconfig
@@ -163,6 +163,18 @@ config CKRM
 	  If you say Y here, enable the Resource Class File System and at least
 	  one of the resource controllers below. Say N if you are unsure.
 
+config CKRM_RCFS
+	tristate "Resource Control File System (User API for CKRM)"
+	depends on CKRM
+	select CONFIGFS_FS
+	default m
+	help
+	  RCFS is the filesystem API for CKRM. Compiling it as a module permits
+	  users to only load RCFS if they intend to use CKRM.
+
+	  Say M if unsure, Y to save on module loading. N doesn't make sense
+	  when CKRM has been configured.
+
 endmenu
 config SYSCTL
 	bool "Sysctl support"
Index: linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
===================================================================
--- /dev/null
+++ linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
@@ -0,0 +1,160 @@
+/*
+ * kernel/ckrm/ckrm_rcfs.c
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2005
+ *               Chandra Seetharaman,   IBM Corp. 2005, 2006
+ *
+ * Configfs based Resource class filesystem (rcfs) serving the
+ * user interface to Class-based Kernel Resource Management (CKRM).
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the  GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ */
+#include <linux/module.h>
+#include <linux/configfs.h>
+#include "ckrm_local.h"
+
+static struct configfs_subsystem rcfs_subsys;
+static struct config_item_type rcfs_class_type;
+
+struct rcfs_class {
+	char *name;
+	struct ckrm_class *core;
+	struct config_group group;
+};
+
+static inline struct rcfs_class *group_to_rcfs_class(struct config_group *grp)
+{
+	return container_of(grp, struct rcfs_class, group);
+}
+
+static inline struct rcfs_class *item_to_rcfs_class(struct config_item *item)
+{
+	return group_to_rcfs_class(to_config_group(item));
+}
+
+static inline struct ckrm_class *group_to_ckrm_class(struct config_group *grp)
+{
+	struct rcfs_class *rclass;
+	/*
+	 * A configfs wrinkle forces us to treat the root group as a special
+	 * case instead of wrapping the group in a struct rcfs_class like all
+	 * other groups.
+	 */
+	if (grp == &rcfs_subsys.su_group)
+		return &ckrm_default_class;
+	rclass = group_to_rcfs_class(grp);
+	return rclass->core;
+}
+
+static inline struct ckrm_class *item_to_ckrm_class(struct config_item *item)
+{
+	return group_to_ckrm_class(to_config_group(item));
+}
+
+/*
+ * This is the function that is called when a 'mkdir' command
+ * is issued under our filesystem
+ */
+static struct config_group *make_rcfs_class(struct config_group *group,
+					const char *name)
+{
+	struct rcfs_class *rclass, *rc_par;
+	struct ckrm_class *core, *parent;
+	char *new_name = NULL, *par_name = NULL;
+	int par_sz = 0;
+
+	rclass = kzalloc(sizeof(struct rcfs_class), GFP_KERNEL);
+	if (!rclass)
+		return NULL;
+
+	parent = group_to_ckrm_class(group);
+
+	if (parent != &ckrm_default_class) {
+		rc_par = group_to_rcfs_class(group);
+		par_name = rc_par->name;
+		par_sz = strlen(par_name);
+	}
+	new_name = kmalloc(par_sz + strlen(name) + 2, GFP_KERNEL);
+	if (!new_name)
+		goto noname;
+	if (par_name)
+		sprintf(new_name, "%s/%s", par_name, name);
+	else
+		sprintf(new_name, "%s", name);
+
+	core = ckrm_alloc_class(parent, new_name);
+	if (!core)
+		goto nocore;
+	rclass->core = core;
+	rclass->name = new_name;
+
+	config_group_init_type_name(&rclass->group, name, &rcfs_class_type);
+	return &rclass->group;
+
+nocore:
+	kfree(new_name);
+noname:
+	kfree(rclass);
+	return NULL;
+}
+
+/*
+ * This is the function that is called when a 'rmdir' command
+ * is issued under our filesystem
+ */
+static void rcfs_class_release_item(struct config_item *item)
+{
+	struct rcfs_class *rclass = item_to_rcfs_class(item);
+
+	ckrm_free_class(rclass->core);
+	kfree(rclass->name);
+	kfree(rclass);
+}
+
+static struct configfs_item_operations rcfs_class_item_ops = {
+	.release	= rcfs_class_release_item,
+};
+
+static struct configfs_group_operations rcfs_class_group_ops = {
+	.make_group     = make_rcfs_class,
+};
+
+static struct config_item_type rcfs_class_type = {
+	.ct_owner	= THIS_MODULE,
+	.ct_item_ops    = &rcfs_class_item_ops,
+	.ct_group_ops	= &rcfs_class_group_ops,
+};
+
+static struct configfs_subsystem rcfs_subsys = {
+	.su_group = {
+		.cg_item = {
+			.ci_namebuf = "ckrm",
+			.ci_type = &rcfs_class_type,
+		}
+	}
+};
+
+static int __init rcfs_init(void)
+{
+	config_group_init(&rcfs_subsys.su_group);
+	init_MUTEX(&rcfs_subsys.su_sem);
+	return configfs_register_subsystem(&rcfs_subsys);
+}
+
+static void __exit rcfs_exit(void)
+{
+	configfs_unregister_subsystem(&rcfs_subsys);
+	ckrm_teardown();
+}
+
+late_initcall(rcfs_init);
+module_exit(rcfs_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("RCFS - Provides an interface to classes and allows control "
+		   "of their resource usage");
Index: linux2617-rc2/kernel/ckrm/Makefile
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/Makefile
+++ linux2617-rc2/kernel/ckrm/Makefile
@@ -1 +1,2 @@
 obj-y = ckrm.o ckrm_shares.o ckrm_task.o
+obj-$(CONFIG_CKRM_RCFS) += ckrm_rcfs.o

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------
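
For reviewers following the class-naming step in make_rcfs_class() above,
here is a minimal userspace sketch of how the full class name is built from
the parent's name and the new directory name. It is an illustration only:
join_class_name() is a hypothetical helper that is not part of the patch,
and malloc()/snprintf() stand in for kmalloc()/sprintf().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace illustration only (hypothetical helper, not in the patch).
 * Builds "<parent>/<name>", or just "<name>" when parent is NULL, which
 * stands in for a class created directly under the root class.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *join_class_name(const char *parent, const char *name)
{
	size_t par_sz = parent ? strlen(parent) : 0;
	/* +2 covers the '/' separator and the terminating NUL */
	size_t len = par_sz + strlen(name) + 2;
	char *full = malloc(len);

	if (!full)
		return NULL;
	if (parent)
		snprintf(full, len, "%s/%s", parent, name);
	else
		snprintf(full, len, "%s", name);
	return full;
}
```

Under these assumptions, a class c2 created inside c1 would be named
"c1/c2", while a class created directly under the root keeps its bare name.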


* [RFC] [PATCH 08/12] Add attribute support to RCFS
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (6 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 07/12] Configfs based filesystem user interface - RCFS sekharan
@ 2006-04-21  2:24 ` sekharan
  2006-04-21  2:25 ` [RFC] [PATCH 09/12] Add stats file " sekharan
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:24 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

08/12 - ckrm_configfs_rcfs_attr_support

Adds the basic attribute store and show functions.
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 kernel/ckrm/ckrm_rcfs.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 50 insertions(+), 1 deletion(-)

Index: linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_rcfs.c
+++ linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
@@ -14,13 +14,21 @@
  * as published by the Free Software Foundation.
  *
  */
+#include <linux/ctype.h>
 #include <linux/module.h>
 #include <linux/configfs.h>
+#include <linux/parser.h>
 #include "ckrm_local.h"
 
 static struct configfs_subsystem rcfs_subsys;
 static struct config_item_type rcfs_class_type;
 
+struct class_attribute {
+	struct configfs_attribute configfs_attr;
+	ssize_t (*show)(struct ckrm_class *, char *);
+	int (*store)(struct ckrm_class *, const char *);
+};
+
 struct rcfs_class {
 	char *name;
 	struct ckrm_class *core;
@@ -56,6 +64,40 @@ static inline struct ckrm_class *item_to
 	return group_to_ckrm_class(to_config_group(item));
 }
 
+static ssize_t rcfs_attr_show(struct config_item *item,
+	      		      struct configfs_attribute *attr, char *buf)
+{
+	struct class_attribute *class_attr;
+	struct ckrm_class *class = item_to_ckrm_class(item);
+
+	class_attr = container_of(attr, struct class_attribute, configfs_attr);
+	return class_attr->show(class, buf);
+}
+
+static ssize_t rcfs_attr_store(struct config_item *item,
+			       struct configfs_attribute *attr, const char *buf,
+			       size_t count)
+{
+	char *filtered_buf, *p;
+	ssize_t rc;
+	struct class_attribute *class_attr;
+	struct ckrm_class *class = item_to_ckrm_class(item);
+
+	class_attr = container_of(attr, struct class_attribute, configfs_attr);
+	filtered_buf = kzalloc(count + 1, GFP_KERNEL);
+	if (!filtered_buf)
+		return -ENOMEM;
+	strncpy(filtered_buf, buf, count);
+	for (p = filtered_buf; isprint(*p); ++p)
+		;
+	*p = '\0';
+	rc = class_attr->store(class, filtered_buf);
+	kfree(filtered_buf);
+	if (rc)
+		return rc;
+	return count;
+}
+
 /*
  * This is the function that is called when a 'mkdir' command
  * is issued under our filesystem
@@ -117,17 +159,24 @@ static void rcfs_class_release_item(stru
 }
 
 static struct configfs_item_operations rcfs_class_item_ops = {
-	.release	= rcfs_class_release_item,
+	.release		= rcfs_class_release_item,
+	.show_attribute		= rcfs_attr_show,
+	.store_attribute	= rcfs_attr_store,
 };
 
 static struct configfs_group_operations rcfs_class_group_ops = {
 	.make_group     = make_rcfs_class,
 };
 
+static struct configfs_attribute *class_attrs[] = {
+	NULL
+};
+
 static struct config_item_type rcfs_class_type = {
 	.ct_owner	= THIS_MODULE,
 	.ct_item_ops    = &rcfs_class_item_ops,
 	.ct_group_ops	= &rcfs_class_group_ops,
+	.ct_attrs       = class_attrs
 };
 
 static struct configfs_subsystem rcfs_subsys = {

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------
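
The input filtering done by rcfs_attr_store() above can be sketched in
userspace as follows. This is only an illustration: filter_printable() is a
hypothetical name, and calloc()/memcpy() stand in for kzalloc()/strncpy().
The idea is to copy the written bytes into a zero-filled buffer and cut the
string at the first non-printable character, so a trailing newline from
echo never reaches the store callback.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace illustration only (hypothetical helper, not in the patch).
 * Copies the first 'count' bytes of 'buf' and truncates the copy at the
 * first non-printable character. 'buf' must hold at least 'count' bytes.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *filter_printable(const char *buf, size_t count)
{
	char *out = calloc(count + 1, 1);	/* out[count] stays NUL */
	char *p;

	if (!out)
		return NULL;
	memcpy(out, buf, count);
	/* isprint() needs an unsigned char value; NUL ends the scan too */
	for (p = out; isprint((unsigned char)*p); ++p)
		;
	*p = '\0';
	return out;
}
```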


* [RFC] [PATCH 09/12] Add stats file support to RCFS
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (7 preceding siblings ...)
  2006-04-21  2:24 ` [RFC] [PATCH 08/12] Add attribute support to RCFS sekharan
@ 2006-04-21  2:25 ` sekharan
  2006-04-21  2:25 ` [RFC] [PATCH 10/12] Add shares " sekharan
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:25 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

09/12 - ckrm_configfs_rcfs_stats

Adds attr_store and attr_show support for stats file.
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 kernel/ckrm/ckrm_rcfs.c |  114 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 112 insertions(+), 2 deletions(-)

Index: linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_rcfs.c
+++ linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
@@ -20,8 +20,104 @@
 #include <linux/parser.h>
 #include "ckrm_local.h"
 
-static struct configfs_subsystem rcfs_subsys;
-static struct config_item_type rcfs_class_type;
+#define CKRM_NAME_LEN 20
+
+#define RES_STRING "res"
+
+static ssize_t show_stats(struct ckrm_class *class, char *buf)
+{
+	int i, j = 0, rc = 0;
+	ssize_t buf_size = PAGE_SIZE-1; /* allow only PAGE_SIZE # of bytes */
+	struct ckrm_controller *ctlr;
+	struct ckrm_shares *shares;
+
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++, j = 0) {
+		if (buf_size <= 0)
+			break;
+		ctlr = ckrm_get_controller_by_id(i);
+		if (!ctlr)
+			 continue;
+		shares = ckrm_get_controller_shares(class, ctlr);
+		if (shares && ctlr->show_stats)
+			j = ctlr->show_stats(shares, buf, buf_size);
+		ckrm_put_controller(ctlr);
+		rc += j;
+		buf += j;
+		buf_size -= j;
+	}
+	if (i < CKRM_MAX_RES_CTLRS)
+		rc = -ENOSPC;
+	return rc;
+}
+
+enum parse_token_t {
+	parse_res_type, parse_err
+};
+
+static match_table_t parse_tokens = {
+	{parse_res_type, RES_STRING"=%s"},
+	{parse_err, NULL}
+};
+
+static int ckrm_stats_parse(const char *options,
+				char **resname, char **remaining_line)
+{
+	char *p, *str;
+	int rc = -EINVAL;
+
+	if (!options)
+		return -EINVAL;
+
+	while ((p = strsep((char **)&options, ",")) != NULL) {
+		substring_t args[MAX_OPT_ARGS];
+		int token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, parse_tokens, args);
+		if (token == parse_res_type) {
+			*resname = match_strdup(args);
+			str = options ? (char *)options : "";
+			*remaining_line = kmalloc(strlen(str) + 1, GFP_KERNEL);
+			if (*remaining_line == NULL) {
+				kfree(*resname);
+				*resname = NULL;
+				rc = -ENOMEM;
+			} else {
+				strcpy(*remaining_line, str);
+				rc = 0;
+			}
+			break;
+		}
+	}
+	return rc;
+}
+
+static int reset_stats(struct ckrm_class *class, const char *str)
+{
+	int rc;
+	char *resname = NULL, *statstr = NULL;
+	struct ckrm_controller *ctlr;
+	struct ckrm_shares *shares;
+
+	rc = ckrm_stats_parse(str, &resname, &statstr);
+	if (rc)
+		return rc;
+
+	ctlr = ckrm_get_controller_by_name(resname);
+	if (!ctlr) {
+		rc = -EINVAL;
+		goto done;
+	}
+	shares = ckrm_get_controller_shares(class, ctlr);
+	if (shares && ctlr->reset_stats)
+		rc = ctlr->reset_stats(shares, statstr);
+	ckrm_put_controller(ctlr);
+done:
+	kfree(resname);
+	kfree(statstr);
+	return rc;
+}
 
 struct class_attribute {
 	struct configfs_attribute configfs_attr;
@@ -29,6 +125,19 @@ struct class_attribute {
 	int (*store)(struct ckrm_class *, const char *);
 };
 
+struct class_attribute stats_attr = {
+	.configfs_attr = {
+		.ca_name = "stats",
+		.ca_owner = THIS_MODULE,
+		.ca_mode = S_IRUGO | S_IWUSR
+	},
+	.show = show_stats,
+	.store = reset_stats
+};
+
+static struct configfs_subsystem rcfs_subsys;
+static struct config_item_type rcfs_class_type;
+
 struct rcfs_class {
 	char *name;
 	struct ckrm_class *core;
@@ -169,6 +278,7 @@ static struct configfs_group_operations 
 };
 
 static struct configfs_attribute *class_attrs[] = {
+	&stats_attr.configfs_attr,
 	NULL
 };
 

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------
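
The parsing done for writes to the stats file above (a "res=<name>" option,
with the rest of the line handed to the controller) can be sketched in
userspace as below. This is a simplified illustration, not the patch's
code: split_res_option() is a hypothetical helper, it uses plain string
functions instead of match_token(), and it assumes "res=" is the first
option on the line. When nothing follows the resource name, the remainder
is an empty string, so nothing past the terminating NUL is ever read.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace illustration only (hypothetical helper, not in the patch).
 * Splits "res=<name>[,<rest>]" into two malloc()ed strings.
 * Returns 0 on success, -1 on malformed input or allocation failure;
 * on failure both output pointers are set to NULL.
 */
int split_res_option(const char *line, char **resname, char **rest)
{
	static const char key[] = "res=";
	const size_t keylen = sizeof(key) - 1;
	const char *comma, *tail;
	size_t namelen;

	*resname = NULL;
	*rest = NULL;
	if (!line || strncmp(line, key, keylen) != 0)
		return -1;
	line += keylen;
	comma = strchr(line, ',');
	namelen = comma ? (size_t)(comma - line) : strlen(line);
	if (namelen == 0)
		return -1;
	/* no comma means an empty remainder, never a read past the NUL */
	tail = comma ? comma + 1 : "";

	*resname = malloc(namelen + 1);
	*rest = malloc(strlen(tail) + 1);
	if (!*resname || !*rest) {
		free(*resname);
		free(*rest);
		*resname = NULL;
		*rest = NULL;
		return -1;
	}
	memcpy(*resname, line, namelen);
	(*resname)[namelen] = '\0';
	strcpy(*rest, tail);
	return 0;
}
```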


* [RFC] [PATCH 10/12] Add shares file support to RCFS
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (8 preceding siblings ...)
  2006-04-21  2:25 ` [RFC] [PATCH 09/12] Add stats file " sekharan
@ 2006-04-21  2:25 ` sekharan
  2006-04-21  2:25 ` [RFC] [PATCH 11/12] Add members " sekharan
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:25 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

10/12 - ckrm_configfs_rcfs_shares

Adds attr_store and attr_show support for shares file.
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 kernel/ckrm/ckrm_rcfs.c |  136 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+)

Index: linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_rcfs.c
+++ linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
@@ -23,6 +23,9 @@
 #define CKRM_NAME_LEN 20
 
 #define RES_STRING "res"
+#define MIN_SHARES_STRING "min_shares"
+#define MAX_SHARES_STRING "max_shares"
+#define CHILD_SHARES_DIVISOR_STRING "child_shares_divisor"
 
 static ssize_t show_stats(struct ckrm_class *class, char *buf)
 {
@@ -119,6 +122,128 @@ done:
 	return rc;
 }
 
+
+enum share_token_t {
+	MIN_SHARES_TOKEN,
+	MAX_SHARES_TOKEN,
+	CHILD_SHARES_DIVISOR_TOKEN,
+	RESOURCE_TYPE_TOKEN,
+	ERROR_TOKEN
+};
+
+/* Token matching for parsing input to this magic file */
+static match_table_t shares_tokens = {
+	{RESOURCE_TYPE_TOKEN, RES_STRING"=%s"},
+	{MIN_SHARES_TOKEN, MIN_SHARES_STRING"=%d"},
+	{MAX_SHARES_TOKEN, MAX_SHARES_STRING"=%d"},
+	{CHILD_SHARES_DIVISOR_TOKEN, CHILD_SHARES_DIVISOR_STRING"=%d"},
+	{ERROR_TOKEN, NULL}
+};
+
+static int shares_parse(const char *options, char **resname,
+					struct ckrm_shares *shares)
+{
+	char *p;
+	int option, rc = -EINVAL;
+
+	*resname = NULL;
+	if (!options)
+		goto done;
+	while ((p = strsep((char **)&options, ",")) != NULL) {
+		substring_t args[MAX_OPT_ARGS];
+		int token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, shares_tokens, args);
+		switch (token) {
+		case RESOURCE_TYPE_TOKEN:
+			if (*resname)
+				goto done;
+			*resname = match_strdup(args);
+			break;
+		case MIN_SHARES_TOKEN:
+			if (match_int(args, &option))
+				goto done;
+			shares->min_shares = option;
+			break;
+		case MAX_SHARES_TOKEN:
+			if (match_int(args, &option))
+				goto done;
+			shares->max_shares = option;
+			break;
+		case CHILD_SHARES_DIVISOR_TOKEN:
+			if (match_int(args, &option))
+				goto done;
+			shares->child_shares_divisor = option;
+			break;
+		default:
+			goto done;
+		}
+	}
+	rc = 0;
+done:
+	if (rc) {
+		kfree(*resname);
+		*resname = NULL;
+	}
+	return rc;
+}
+
+static int set_shares(struct ckrm_class *class, const char *str)
+{
+	char *resname = NULL;
+	int rc;
+	struct ckrm_controller *ctlr;
+	struct ckrm_shares shares = {
+		.min_shares = CKRM_SHARE_UNCHANGED,
+		.max_shares = CKRM_SHARE_UNCHANGED,
+		.child_shares_divisor = CKRM_SHARE_UNCHANGED,
+	};
+
+	rc = shares_parse(str, &resname, &shares);
+	if (!rc) {
+		ctlr = ckrm_get_controller_by_name(resname);
+		if (ctlr) {
+			rc = ckrm_set_controller_shares(class, ctlr, &shares);
+			ckrm_put_controller(ctlr);
+		} else
+			rc = -EINVAL;
+		kfree(resname);
+	}
+	return rc;
+}
+
+static ssize_t show_shares(struct ckrm_class *class, char *buf)
+{
+	int i;
+	ssize_t j, rc = 0, bufsize = PAGE_SIZE;
+	struct ckrm_shares *shares;
+	struct ckrm_controller *ctlr;
+
+	for (i = 0; i < CKRM_MAX_RES_CTLRS; i++) {
+		ctlr = ckrm_get_controller_by_id(i);
+		if (!ctlr)
+			continue;
+		shares = ckrm_get_controller_shares(class, ctlr);
+		if (shares) {
+			if (bufsize <= 0)
+				break;
+			j = snprintf(buf, bufsize, "%s=%s,%s=%d,%s=%d,%s=%d\n",
+				RES_STRING, ctlr->name,
+				MIN_SHARES_STRING, shares->min_shares,
+				MAX_SHARES_STRING, shares->max_shares,
+				CHILD_SHARES_DIVISOR_STRING,
+				shares->child_shares_divisor);
+			rc += j; buf += j; bufsize -= j;
+		}
+		ckrm_put_controller(ctlr);
+	}
+	if (i < CKRM_MAX_RES_CTLRS)
+		rc = -ENOSPC;
+	return rc;
+}
+
 struct class_attribute {
 	struct configfs_attribute configfs_attr;
 	ssize_t (*show)(struct ckrm_class *, char *);
@@ -135,6 +260,16 @@ struct class_attribute stats_attr = {
 	.store = reset_stats
 };
 
+struct class_attribute shares_attr = {
+	.configfs_attr = {
+		.ca_name = "shares",
+		.ca_owner = THIS_MODULE,
+		.ca_mode = S_IRUGO | S_IWUSR
+	},
+	.show = show_shares,
+	.store = set_shares
+};
+
 static struct configfs_subsystem rcfs_subsys;
 static struct config_item_type rcfs_class_type;
 
@@ -279,6 +414,7 @@ static struct configfs_group_operations 
 
 static struct configfs_attribute *class_attrs[] = {
 	&stats_attr.configfs_attr,
+	&shares_attr.configfs_attr,
 	NULL
 };
 

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------
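
To make the shares file input format above easier to review, here is a
rough userspace sketch of the same comma-separated parsing. It is not the
patch's implementation: parse_shares() and struct shares_req are
hypothetical, plain string functions replace match_token(), and
SHARE_UNCHANGED is a placeholder whose value (-1) is an assumption, since
the definition of CKRM_SHARE_UNCHANGED is not part of this patch. Fields
that are not mentioned keep the "unchanged" sentinel, and "res=" is treated
as mandatory.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SHARE_UNCHANGED (-1)	/* assumed placeholder value, see above */

struct shares_req {		/* hypothetical, for illustration only */
	char res[32];
	int min_shares;
	int max_shares;
	int child_shares_divisor;
};

/* Parse a whole decimal integer; reject empty or trailing garbage. */
static int parse_int(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/* Returns 0 on success, -1 on any malformed or unknown option. */
int parse_shares(const char *line, struct shares_req *req)
{
	char copy[256];
	char *tok, *next, *val;
	int *field;

	if (!line || strlen(line) >= sizeof(copy))
		return -1;
	strcpy(copy, line);
	req->res[0] = '\0';
	req->min_shares = SHARE_UNCHANGED;
	req->max_shares = SHARE_UNCHANGED;
	req->child_shares_divisor = SHARE_UNCHANGED;

	for (tok = copy; tok; tok = next) {
		next = strchr(tok, ',');
		if (next)
			*next++ = '\0';
		if (!*tok)
			continue;	/* skip empty fields */
		val = strchr(tok, '=');
		if (!val)
			return -1;
		*val++ = '\0';
		if (strcmp(tok, "res") == 0) {
			/* only one non-empty resource name is accepted */
			if (req->res[0] || !*val ||
			    strlen(val) >= sizeof(req->res))
				return -1;
			strcpy(req->res, val);
			continue;
		}
		if (strcmp(tok, "min_shares") == 0)
			field = &req->min_shares;
		else if (strcmp(tok, "max_shares") == 0)
			field = &req->max_shares;
		else if (strcmp(tok, "child_shares_divisor") == 0)
			field = &req->child_shares_divisor;
		else
			return -1;
		if (parse_int(val, field))
			return -1;
	}
	return req->res[0] ? 0 : -1;
}
```

With this sketch, "res=numtasks,max_shares=20" sets only max_shares and
leaves the other two fields at the sentinel.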


* [RFC] [PATCH 11/12] Add members file support to RCFS
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (9 preceding siblings ...)
  2006-04-21  2:25 ` [RFC] [PATCH 10/12] Add shares " sekharan
@ 2006-04-21  2:25 ` sekharan
  2006-04-21  2:25 ` [RFC] [PATCH 12/12] Documentation for CKRM sekharan
  2006-04-21 14:49 ` [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Dave Hansen
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:25 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

11/12 - ckrm_configfs_rcfs_members

Adds attr_store and attr_show support for members file.
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 kernel/ckrm/ckrm_rcfs.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 49 insertions(+)

Index: linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
===================================================================
--- linux2617-rc2.orig/kernel/ckrm/ckrm_rcfs.c
+++ linux2617-rc2/kernel/ckrm/ckrm_rcfs.c
@@ -244,6 +244,43 @@ static ssize_t show_shares(struct ckrm_c
 	return rc;
 }
 
+/*
+ * Given a buffer with a pid in it, add the task with that pid to the class.
+ * Ignores entire buffer after the first pid is parsed.
+ */
+static int add_member(struct ckrm_class *class, const char *str)
+{
+	pid_t pid;
+
+	pid = (pid_t) simple_strtol(str, NULL, 0);
+	if (pid <= 0)
+		return -EINVAL; /* Not a valid pid */
+	return ckrm_setclass(pid, class);
+}
+
+/*
+ * Lists pids of tasks that belong to the given class.
+ */
+static ssize_t show_members(struct ckrm_class *class, char *buf)
+{
+	ssize_t i, rc = 0, bufsize = PAGE_SIZE;
+	struct task_struct *tsk;
+
+	spin_lock(&class->class_lock);
+	list_for_each_entry(tsk, &class->task_list, member_list) {
+		if (bufsize <= 0) {
+			rc = -ENOSPC;
+			break;
+		}
+		if (!tsk->pid)	/* Ignore swappers */
+			continue;
+		i = snprintf(buf, bufsize, "%ld\n", (long)tsk->pid);
+		buf += i; rc += i; bufsize -= i;
+	}
+	spin_unlock(&class->class_lock);
+	return rc;
+}
+
 struct class_attribute {
 	struct configfs_attribute configfs_attr;
 	ssize_t (*show)(struct ckrm_class *, char *);
@@ -270,6 +307,17 @@ struct class_attribute shares_attr = {
 	.store = set_shares
 };
 
+struct class_attribute members_attr = {
+	.configfs_attr = {
+		.ca_name = "members",
+		.ca_owner = THIS_MODULE,
+		.ca_mode = S_IRUGO | S_IWUSR
+	},
+	.show = show_members,
+	.store = add_member
+};
+
+
 static struct configfs_subsystem rcfs_subsys;
 static struct config_item_type rcfs_class_type;
 
@@ -415,6 +463,7 @@ static struct configfs_group_operations 
 static struct configfs_attribute *class_attrs[] = {
 	&stats_attr.configfs_attr,
 	&shares_attr.configfs_attr,
+	&members_attr.configfs_attr,
 	NULL
 };
 

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------
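
One detail worth noting for the pid listing in show_members() above:
snprintf() returns the length it would have written, which can exceed the
space left in the buffer. The userspace sketch below shows one way to
append pids to a fixed-size buffer without advancing past its end. It is an
illustration under my own assumptions, not the patch's code, and
append_pid() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace illustration only (hypothetical helper, not in the patch).
 * Appends "<pid>\n" at buf[*used] in a buffer of 'size' bytes.
 * Returns 0 on success; returns -1 and leaves the buffer holding only
 * complete entries when the new entry does not fit.
 */
int append_pid(char *buf, size_t size, size_t *used, long pid)
{
	size_t left;
	int n;

	if (*used >= size)
		return -1;
	left = size - *used;
	n = snprintf(buf + *used, left, "%ld\n", pid);
	if (n < 0 || (size_t)n >= left) {
		buf[*used] = '\0';	/* drop the truncated entry */
		return -1;
	}
	*used += (size_t)n;
	return 0;
}
```

A caller would stop iterating and report an out-of-space error as soon as
append_pid() returns -1.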


* [RFC] [PATCH 12/12] Documentation for CKRM
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (10 preceding siblings ...)
  2006-04-21  2:25 ` [RFC] [PATCH 11/12] Add members " sekharan
@ 2006-04-21  2:25 ` sekharan
  2006-04-21 14:49 ` [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Dave Hansen
  12 siblings, 0 replies; 43+ messages in thread
From: sekharan @ 2006-04-21  2:25 UTC (permalink / raw)
  To: linux-kernel, ckrm-tech; +Cc: sekharan

12/12 - ckrm_docs

Documentation describing important CKRM elements such as classes, shares,
controllers, and the interface provided to userspace via RCFS
--

Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-Off-By: Hubertus Franke <frankeh@us.ibm.com>
Signed-Off-By: Shailabh Nagar <nagar@watson.ibm.com>
Signed-Off-By: Gerrit Huizenga <gh@us.ibm.com>
Signed-Off-By: Vivek Kashyap <kashyapv@us.ibm.com>
Signed-Off-By: Matt Helsley <matthltc@us.ibm.com>

 Documentation/ckrm/ckrm_basics  |   65 ++++++++++++++++++++++++++++++++++++++++
 Documentation/ckrm/ckrm_install |   54 +++++++++++++++++++++++++++++++++
 Documentation/ckrm/ckrm_usage   |   52 ++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+)

Index: linux-2.6.16/Documentation/ckrm/ckrm_basics
===================================================================
--- /dev/null
+++ linux-2.6.16/Documentation/ckrm/ckrm_basics
@@ -0,0 +1,65 @@
+CKRM Basics
+-------------
+A brief review of CKRM concepts and terminology will help make installation
+and testing easier. For more details, please visit http://ckrm.sf.net.
+
+Concept:
+A user defines a class, associates some amount of resources with the class, and
+associates tasks with the class. Tasks belonging to that class will be
+bound by the amount of resources that are assigned to that class.
+
+RCFS depicts a CKRM class as a directory. Hierarchy of classes can be
+created in which children of a class share resources allotted to
+the parent. Tasks can be classified to any class which is at any level.
+There is no correlation between parent-child relationship of tasks and
+the parent-child relationship of classes they belong to.
+
+During fork(), class is inherited by a task. A privileged user can
+reassign a task to any class.
+
+Characteristics of a class can be accessed/changed through the following
+files under the directory representing the class:
+
+shares:  allows changing shares of different resources managed by the class
+stats:   shows statistics associated with each resource managed by the class
+members: allows assignment of tasks to a class and shows tasks that are
+         assigned to a class.
+
+Resource allocation of a class is proportional to the amount of resources
+available to the class's parent.
+Resource allocation for a class is controlled by the parameters:
+
+min_shares: Minimum amount of shares that can be allocated by a class. A
+            special value of DONT_CARE(-3) means that there is no minimum
+	    shares of a resource specified. This class may not get
+	    any resource if the system is running short on resources.
+max_shares: Specifies the maximum amount of resource that is allowed to be
+           allocated by a class. A special value DONT_CARE(-3) means
+	   that no specific limit is specified; this class can get all
+	   the resources available.
+child_shares_divisor: total guarantee that is allowed among the children of
+           this class. In other words, the sum of "guarantee"s of all
+	   children of this class cannot exceed this number.
+
+Any of these parameters can have a special value, UNSUPPORTED(-2) meaning
+that the specific controller does not support this parameter. User
+request to change the value will be ignored.
+
+None of these parameters are absolute, nor do they have any units associated with
+them. These are just numbers that are used to calculate the absolute number
+of resource available for a specific class.
+
+In order to make them independent of the type of resource and handle
+complexities like hotplug none of these parameters have units associated
+with them. Furthermore they are not percentages. They are called shares
+because an appropriate analogy would be shares in a stock market.
+
+The absolute amount (for example no. of tasks) of minimum shares available
+for a class is calculated as:
+
+	absolute minimum shares = (parent's absolute amount of resource) *
+			(class's min_shares / parent's child_shares_divisor)
+
+Maximum shares is also calculated in the same way.
+
+Root class is allocated all the resources available in the system. In other
Index: linux-2.6.16/Documentation/ckrm/ckrm_install
===================================================================
--- /dev/null
+++ linux-2.6.16/Documentation/ckrm/ckrm_install
@@ -0,0 +1,54 @@
+Kernel installation
+------------------------------
+
+<kernver> = version of mainline Linux kernel
+<ckrmver> = version of CKRM
+
+Note: It is expected that CKRM versions will change often. Hence once
+a CKRM version has been released for some <kernver>, it will only be made
+available for future <kernver>'s until the next CKRM version is released.
+
+Patches released will specify which version of kernel source that patchset
+is released against.
+
+Core patches will be released in two formats
+	1. set of patches with a series file (to be used with quilt)
+	2. a single patch that is inclusive of all the core patches.
+
+Controller patches will be released as a set. An exception would be the
+numtasks controller which would be released as part of the core patchset.
+
+1. Patch
+
+    Apply ckrm-single-<ckrmver>.patch to a mainline kernel
+    tree with version <kernver>.
+
+2. Configure
+
+Select appropriate configuration options:
+
+   Enable configfs filesystem:
+   File systems --->
+     Pseudo filesystems --->
+       <M> Userspace-driven configuration filesystem (EXPERIMENTAL)
+
+   Enable CKRM components:
+   General Setup --->
+     Class Based Kernel Resource Management --->
+        [*] Class Based Kernel Resource Management Core
+        <M> Resource Class File System (User API)
+        [*]     Number of Tasks Resource Manager
+
+
+3. Build, boot the kernel
+
+4. Enable rcfs
+
+    # insmod <patchedtree>/fs/configfs/configfs.ko # if compiled as module
+    # insmod <patchedtree>/kernel/ckrm/ckrm_rcfs.ko # if compiled as module
+    # mount -t configfs none /config
+
+    This will create the directory /config/ckrm which is the root of classes.
+
+5. Work with class hierarchy as explained in the file ckrm_usage
+
Index: linux-2.6.16/Documentation/ckrm/ckrm_usage
===================================================================
--- /dev/null
+++ linux-2.6.16/Documentation/ckrm/ckrm_usage
@@ -0,0 +1,52 @@
+Usage of CKRM
+-------------
+
+1. Create a class
+
+   # mkdir /config/ckrm/c1
+   creates a class named c1.
+
+The newly created class directory is automatically populated by magic files
+shares, stats, and members.
+
+2. View default shares of a class
+
+   # cat /config/ckrm/c1/shares
+   res=numtasks,min_shares=-3,max_shares=-3,child_shares_divisor=100
+
+   Above is the default value set for resources that have controllers
+   registered with CKRM.
+
+3. Change shares of a specific resource in a class
+
+   One or more of the following fields can/must be specified
+       res=<res_name> #mandatory
+       min_shares=<number>
+       max_shares=<number>
+       child_shares_divisor=<number>
+   e.g.
+	# echo "res=numtasks,max_shares=20" > /config/ckrm/c1/shares
+
+   If any of these parameters are not specified, the current value will be
+   retained.
+
+4. Reclassify a task
+
+   Write the pid of the process to the destination class' members file
+   # echo 1004 > /config/ckrm/c1/members
+
+5. Get a list of tasks assigned to a class
+
+   # cat /config/ckrm/c1/members
+   lists pids of tasks belonging to c1
+
+6. Get statistics of different resources of a class
+
+   # cat /config/ckrm/c1/stats
+   shows c1's statistics for each registered resource controller.
+
+7. Configuration settings for controllers
+   Configuration values for controllers are available through module
+   parameter interfaces. Consult the controller specific documents for
+   details. For example, numtasks has it available through
+   /sys/module/ckrm_numtasks/parameters.

-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
                   ` (11 preceding siblings ...)
  2006-04-21  2:25 ` [RFC] [PATCH 12/12] Documentation for CKRM sekharan
@ 2006-04-21 14:49 ` Dave Hansen
  2006-04-21 16:58   ` Chandra Seetharaman
  12 siblings, 1 reply; 43+ messages in thread
From: Dave Hansen @ 2006-04-21 14:49 UTC (permalink / raw)
  To: sekharan; +Cc: linux-kernel, ckrm-tech

On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> CKRM has gone through a major overhaul by removing some of the complexity,
> cutting down on features and moving portions to userspace.

What do you want done with these patches?  Do you think they are ready
for mainline?  -mm?  Or, are you just posting here for comments?

-- Dave



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 14:49 ` [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Dave Hansen
@ 2006-04-21 16:58   ` Chandra Seetharaman
  2006-04-21 22:57     ` Andrew Morton
  0 siblings, 1 reply; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-21 16:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > CKRM has gone through a major overhaul by removing some of the complexity,
> > cutting down on features and moving portions to userspace.
> 
> What do you want done with these patches?  Do you think they are ready
> for mainline?  -mm?  Or, are you just posting here for comments?
> 

We think it is ready for -mm. But, want to go through a review cycle in
lkml before i request Andrew for that.

Thanks for asking,

chandra
> -- Dave
> 
> 
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------




* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
@ 2006-04-21 19:07 Al Boldi
  2006-04-21 22:04 ` Matt Helsley
  2006-04-21 22:09 ` Chandra Seetharaman
  0 siblings, 2 replies; 43+ messages in thread
From: Al Boldi @ 2006-04-21 19:07 UTC (permalink / raw)
  To: linux-kernel

Chandra Seetharaman wrote:
> On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > CKRM has gone through a major overhaul by removing some of the
> > > complexity, cutting down on features and moving portions to userspace.
> >
> > What do you want done with these patches?  Do you think they are ready
> > for mainline?  -mm?  Or, are you just posting here for comments?
>
> We think it is ready for -mm. But, want to go through a review cycle in
> lkml before i request Andrew for that.

IMHO, it would be a good idea to decouple the current implementation and 
reconnect them via an open mapper/wrapper to allow a more flexible/open 
approach to resource management, which may ease its transition into 
mainline, due to a step-by-step instead of an all-or-none approach.

Thanks!

--
Al



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 19:07 Al Boldi
@ 2006-04-21 22:04 ` Matt Helsley
       [not found]   ` <200604220708.40018.a1426z@gawab.com>
  2006-04-21 22:09 ` Chandra Seetharaman
  1 sibling, 1 reply; 43+ messages in thread
From: Matt Helsley @ 2006-04-21 22:04 UTC (permalink / raw)
  To: Al Boldi; +Cc: LKML

On Fri, 2006-04-21 at 22:07 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the
> > > > complexity, cutting down on features and moving portions to userspace.
> > >
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> >
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> IMHO, it would be a good idea to decouple the current implementation and 
> reconnect them via an open mapper/wrapper to allow a more flexible/open 
> approach to resource management, which may ease its transition into 
> mainline, due to a step-by-step instead of an all-or-none approach.
> 
> Thanks!
> 
> --
> Al

Hi Al,

	I'm sorry, I don't understand what you're suggesting. Could you please
elaborate on how you think it should be decoupled?

Thanks,
	-Matt Helsley



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 19:07 Al Boldi
  2006-04-21 22:04 ` Matt Helsley
@ 2006-04-21 22:09 ` Chandra Seetharaman
  1 sibling, 0 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-21 22:09 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel

On Fri, 2006-04-21 at 22:07 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the
> > > > complexity, cutting down on features and moving portions to userspace.
> > >
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> >
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> IMHO, it would be a good idea to decouple the current implementation and 
> reconnect them via an open mapper/wrapper to allow a more flexible/open 

I don't understand your comment; can you please elaborate?

> approach to resource management, which may ease its transition into 
> mainline, due to a step-by-step instead of an all-or-none approach.

BTW, the design does allow a step-by-step approach to resource management.
You can add individual resource controllers one at a time, or even turn on
only the resources you are interested in.

> 
> Thanks!
> 
> --
> Al
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------




* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 16:58   ` Chandra Seetharaman
@ 2006-04-21 22:57     ` Andrew Morton
  2006-04-22  1:48       ` Chandra Seetharaman
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2006-04-21 22:57 UTC (permalink / raw)
  To: sekharan; +Cc: haveblue, linux-kernel, ckrm-tech

Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>
> On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > CKRM has gone through a major overhaul by removing some of the complexity,
> > > cutting down on features and moving portions to userspace.
> > 
> > What do you want done with these patches?  Do you think they are ready
> > for mainline?  -mm?  Or, are you just posting here for comments?
> > 
> 
> We think it is ready for -mm. But, want to go through a review cycle in
> lkml before i request Andrew for that.

From a quick scan, the overall code quality is probably the best I've seen
for an initial submission of this magnitude.  I had a few minor issues and
questions, but it'd need a couple of hours to go through it all.

So.  Send 'em over when you're ready.

I have one concern.  If we merge this framework into mainline then we'd
(quite reasonably) expect to see an ongoing dribble of new controllers
being submitted.  But we haven't seen those controllers yet.  So there is a
risk that you'll submit a major new controller (most likely a net or memory
controller) and it will provoke a reviewer revolt.  We'd then be in a
situation of cant-go-forward, cant-go-backward.

It would increase the comfort level if we could see what the major
controllers look like before committing.  But that's unreasonable.

Could I ask that you briefly enumerate

a) which controllers you think we'll need in the foreseeable future

b) what they need to do

c) pointer to prototype code if poss

Thanks.


* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-21 22:57     ` Andrew Morton
@ 2006-04-22  1:48       ` Chandra Seetharaman
  2006-04-22  2:13         ` Andrew Morton
  2006-04-24  1:47         ` Hirokazu Takahashi
  0 siblings, 2 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  1:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 15:57 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > On Fri, 2006-04-21 at 07:49 -0700, Dave Hansen wrote:
> > > On Thu, 2006-04-20 at 19:24 -0700, sekharan@us.ibm.com wrote:
> > > > CKRM has gone through a major overhaul by removing some of the complexity,
> > > > cutting down on features and moving portions to userspace.
> > > 
> > > What do you want done with these patches?  Do you think they are ready
> > > for mainline?  -mm?  Or, are you just posting here for comments?
> > > 
> > 
> > We think it is ready for -mm. But, want to go through a review cycle in
> > lkml before i request Andrew for that.
> 
> From a quick scan, the overall code quality is probably the best I've seen
> for an initial submission of this magnitude.  I had a few minor issues and

Thanks, and thanks to all that helped.

> questions, but it'd need a couple of hours to go through it all.
> 
> So.  Send 'em over when you're ready.

Great. I will wait a couple of days for comments and then send them
your way.

> 
> I have one concern.  If we merge this framework into mainline then we'd
> (quite reasonably) expect to see an ongoing dribble of new controllers
> being submitted.  But we haven't seen those controllers yet.  So there is a
> risk that you'll submit a major new controller (most likely a net or memory
> controller) and it will provoke a reviewer revolt.  We'd then be in a
> situation of cant-go-forward, cant-go-backward.
> 

I totally understand your concern.

CKRM's design is not tied to a specific implementation of a
controller. It allows hooking up different controllers for the same
resource. If a controller is considered too complex, some of its
features can be cut to make it simpler, or a simpler controller can
replace an earlier complex one without affecting the user interface.

This flexibility reduces the "cant-go-forward, cant-go-back"
problem, somewhat.

FYI, we found out that managing network resources did not fit this
task-based model, and we had to invent complex layering to
accommodate it. So, we dropped our plans for network support.

One can write a controller for any resource that can be accounted at the
task level. The corresponding subsystem stakeholders can ensure that it is
clean and at an acceptable level.

> It would increase the comfort level if we could see what the major
> controllers look like before committing.  But that's unreasonable.

You might have seen the CPU controller (different implementation than
what we had earlier) and the numtasks controller (can prevent fork
bombs) that followed this patchset.
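
As a rough illustration of the fork-bomb point (a sketch only, not the
posted numtasks code; the struct, field and function names below are
made up, and treating a negative limit as "no limit" is an assumption):
a task-count controller charges the class on every fork and refuses the
fork once the class is at its limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class state for a task-count controller. */
struct numtasks_class {
	int cnt_cur;	/* tasks currently charged to the class */
	int cnt_max;	/* upper bound; negative means "no limit" here */
};

/* Fork path: returns 0 if the new task may be created, -1 otherwise. */
int numtasks_charge(struct numtasks_class *c)
{
	assert(c != NULL);
	if (c->cnt_max >= 0 && c->cnt_cur >= c->cnt_max)
		return -1;	/* class is full: the fork would fail */
	c->cnt_cur++;
	return 0;
}

/* Exit path: give the slot back to the class. */
void numtasks_uncharge(struct numtasks_class *c)
{
	assert(c != NULL);
	if (c->cnt_cur > 0)
		c->cnt_cur--;
}
```

A runaway forker in such a class can only fill that class up to its
bound; tasks in other classes are not charged against it.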

>
> Could I ask that you briefly enumerate
> 
> a) which controllers you think we'll need in the foreseeable future
> 

Our main objective is to provide resource control for the hardware
resources: CPU, I/O and memory.

We have already posted the CPU controller.

We have two implementations of a memory controller and an I/O controller.

The memory controller is understandably more complex and controversial,
and that is the reason we haven't posted it this time around (we are
looking at ways to simplify the design and hence the complexity). Both
memory controllers have been posted to linux-mm.

The I/O controller is based on the CFQ scheduler.

> b) what they need to do

Both memory controllers provide control for LRU lists.

 - One maintains the active/inactive lists per class for each zone. It
   is of order O(1). The current code is a little complex. We are looking
   at ways to simplify it.

 - The other creates pseudo zones under each zone (by splitting the
   number of pages available in a zone) and attaches them to
   each class.

The I/O controller that we are working on is based on the CFQ scheduler
and provides bandwidth control.
> 
> c) pointer to prototype code if poss

Both the memory controllers are fully functional. We need to trim them
down.

active/inactive list per class memory controller:
http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download

pzone based memory controller:
http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2

I/O controller: This controller is not ported to the framework posted,
but can be taken as a prototype version. The new version would be
simpler, though.

http://prdownloads.sourceforge.net/ckrm/io_rc.tar.bz2?download


Thanks & Regards,

chandra
> 
> Thanks.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------




* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  1:48       ` Chandra Seetharaman
@ 2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
                             ` (3 more replies)
  2006-04-24  1:47         ` Hirokazu Takahashi
  1 sibling, 4 replies; 43+ messages in thread
From: Andrew Morton @ 2006-04-22  2:13 UTC (permalink / raw)
  To: sekharan; +Cc: haveblue, linux-kernel, ckrm-tech

Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>
> > 
> > c) pointer to prototype code if poss
> 
> Both the memory controllers are fully functional. We need to trim them
> down.
> 
> active/inactive list per class memory controller:
> http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download

Oh my gosh.  That converts memory reclaim from per-zone LRU to
per-CKRM-class LRU.  If configured.

This is huge.  It means that we have basically two quite different versions
of memory reclaim to test and maintain.   This is a problem.

(I hope that's the before-we-added-comments version of the patch btw).

> pzone based memory controller:
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2

From a super-quick scan that looks saner.  Is it effective?  Is this the
way you're planning on proceeding?

This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
Just that it covers a group of mm's and not just the one mm?

Do you attempt to manage just pagecache?  So if class A tries to read 10GB
from disk, does that get more aggressively reclaimed based on class A's
resource limits?

This all would have been more comfortable if done on top of the 2.4
kernel's virtual scanner.

(btw, using the term "class" to identify a group of tasks isn't very
comfortable - it's an instance, not a class...)


Worried.


* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
@ 2006-04-22  2:20           ` Matt Helsley
  2006-04-22  2:33             ` Andrew Morton
  2006-04-22  5:28           ` Chandra Seetharaman
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Matt Helsley @ 2006-04-22  2:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chandra S. Seetharaman, Dave Hansen, LKML, CKRM-Tech

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:

<snip> (I'll let those more familiar with the memory controller efforts
comment on those concerns)

> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

Yes, I can see how this would be uncomfortable. How about replacing
"class" with "resource group"?

Cheers,
	-Matt Helsley



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:20           ` Matt Helsley
@ 2006-04-22  2:33             ` Andrew Morton
  0 siblings, 0 replies; 43+ messages in thread
From: Andrew Morton @ 2006-04-22  2:33 UTC (permalink / raw)
  To: Matt Helsley; +Cc: sekharan, haveblue, linux-kernel, ckrm-tech

Matt Helsley <matthltc@us.ibm.com> wrote:
>
> > (btw, using the term "class" to identify a group of tasks isn't very
>  > comfortable - it's an instance, not a class...)
> 
>  Yes, I can see how this would be uncomfortable. How about replacing
>  "class" with "resource group"?

Much more comfortable, thanks ;)


* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
@ 2006-04-22  5:28           ` Chandra Seetharaman
  2006-04-24  1:10             ` KUROSAWA Takahiro
  2006-04-24  5:18             ` Hirokazu Takahashi
  2006-04-23  6:52           ` Paul Jackson
  2006-04-28  1:58           ` Chandra Seetharaman
  3 siblings, 2 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  5:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: haveblue, linux-kernel, ckrm-tech, Valerie Clement,
	Takahiro Kurosawa

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > > 
> > > c) pointer to prototype code if poss
> > 
> > Both the memory controllers are fully functional. We need to trim them
> > down.
> > 
> > active/inactive list per class memory controller:
> > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> 
> Oh my gosh.  That converts memory reclaim from per-zone LRU to
> per-CKRM-class LRU.  If configured.

Yes. We originally had an implementation that would use the existing
per-zone LRU, but the reclamation path was O(n), where n is the number
of classes. So, we moved towards an O(1) algorithm.

> 
> This is huge.  It means that we have basically two quite different versions
> of memory reclaim to test and maintain.   This is a problem.

Understood. We will work on coming up with an acceptable memory controller.
> 
> (I hope that's the before-we-added-comments version of the patch btw).

Yes, indeed :). As I said earlier, this patch is not ready for lkml or
-mm yet.
> 
> > pzone based memory controller:
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> 
> From a super-quick scan that looks saner.  Is it effective?  Is this the
> way you're planning on proceeding?
> 

Yes, it is effective, and the reclamation is O(1) too. It has a couple of
problems by design: (1) it doesn't handle shared pages and (2) it doesn't
provide support for both min_shares and max_shares.

> This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> Just that it covers a group of mm's and not just the one mm?

Yes, that is the core objective of CKRM: associating resources with a
group of tasks.

> 
> Do you attempt to manage just pagecache?  So if class A tries to read 10GB
> from disk, does that get more aggressively reclaimed based on class A's
> resource limits?

Yes, it would get more aggressively reclaimed. But, if you have the I/O
controller also configured appropriately only class A will be affected.

> 
> This all would have been more comfortable if done on top of the 2.4
> kernel's virtual scanner.
> 
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

We could go with "Resource Group" as Matt suggested.
> 
> 

Valerie, KUROSAWA, please feel free to add any more details.
> Worried.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------




* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
       [not found]   ` <200604220708.40018.a1426z@gawab.com>
@ 2006-04-22  5:46     ` Chandra Seetharaman
  2006-04-22 20:40       ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-22  5:46 UTC (permalink / raw)
  To: Al Boldi; +Cc: Matt Helsley, LKML, Andrew Morton

On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:

> i.e: it should be possible to run the RCs w/o CKRM.
> 
> The current design pins the RCs on CKRM, when in fact this is not necessary.  
> One way to decouple them, could be to pin them against pid, thus allowing an 
> RC to leverage the pid hierarchy w/o the need for CKRM.  And only when finer 
> RM control is necessary, should CKRM come into play, by dynamically 
> adjusting the RC to achieve the desired effect.

This model works well in universities, where you associate some resource
when a student logs in, or a virtualised environment (like a UML or
vserver), where you attach resource to the root process.

It doesn't work with web servers, database servers, etc., where the main
application will be forking tasks for different sets of end users. In
that case you have to group tasks that are not related to one another
and attach resources to them.

Having a unified interface gives the system administrator the ability to
group the tasks as they see them in real life (a department or important
transactions or just critical apps in a desktop).

It also has the added advantage that resource controller writers do
not have to spend their time coming up with an interface for their
controller. On the other hand, if they do, the user finally ends up with
multiple interfaces (/proc, sysfs, configfs, /dev, etc.) to do their
resource management.

> 
> Thanks!
> 
> --
> Al
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------




* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:46     ` Chandra Seetharaman
@ 2006-04-22 20:40       ` Al Boldi
  2006-04-23  2:33         ` Matt Helsley
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-22 20:40 UTC (permalink / raw)
  To: sekharan; +Cc: Matt Helsley, LKML, Andrew Morton

Chandra Seetharaman wrote:
> On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > i.e: it should be possible to run the RCs w/o CKRM.
> >
> > The current design pins the RCs on CKRM, when in fact this is not
> > necessary. One way to decouple them, could be to pin them against pid,
> > thus allowing an RC to leverage the pid hierarchy w/o the need for CKRM.
> >  And only when finer RM control is necessary, should CKRM come into
> > play, by dynamically adjusting the RC to achieve the desired effect.
>
> This model works well in universities, where you associate some resource
> when a student logs in, or a virtualised environment (like a UML or
> vserver), where you attach resource to the root process.
>
> It doesn't work with web servers, database servers etc.,, where the main
> application will be forking tasks for different set of end users. In
> that case you have to group tasks that are not related to one another
> and attach resources to them.
>
> Having a unified interface gives the system administrator ability to
> group the tasks as they see them in real life (a department or important
> transactions or just critical apps in a desktop).

So, why drag this unified interface around when it is only needed in certain 
models?  The underlying interface via pid comes for free and should be 
leveraged as such to yield a low overhead implementation.  Only when a more 
complex model is involved should CKRM come into play.

> It also has the added advantage that the resource controller writer do
> not have to spend their time in coming up with an interface for their
> controller. On the other hand if they do, the user finally ends up with
> multiple interface (/proc, sysfs, configfs, /dev etc.,) to do their
> resource management.

So, maybe what is needed is an abstract parent RC that implements this 
interface and lets the child RCs implement the specifics, and allows CKRM to 
connect to the parent RC to allow finer RM control when a specific model 
requires it.

Thanks!

--
Al



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22 20:40       ` Al Boldi
@ 2006-04-23  2:33         ` Matt Helsley
  2006-04-23 11:22           ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Matt Helsley @ 2006-04-23  2:33 UTC (permalink / raw)
  To: Al Boldi; +Cc: sekharan, LKML, Andrew Morton

On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> Chandra Seetharaman wrote:
> > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > i.e: it should be possible to run the RCs w/o CKRM.
> > >
> > > The current design pins the RCs on CKRM, when in fact this is not
> > > necessary. One way to decouple them, could be to pin them against pid,
> > > thus allowing an RC to leverage the pid hierarchy w/o the need for CKRM.
> > >  And only when finer RM control is necessary, should CKRM come into
> > > play, by dynamically adjusting the RC to achieve the desired effect.
> >
> > This model works well in universities, where you associate some resource
> > when a student logs in, or a virtualised environment (like a UML or
> > vserver), where you attach resource to the root process.
> >
> > It doesn't work with web servers, database servers etc.,, where the main
> > application will be forking tasks for different set of end users. In
> > that case you have to group tasks that are not related to one another
> > and attach resources to them.
> >
> > Having a unified interface gives the system administrator ability to
> > group the tasks as they see them in real life (a department or important
> > transactions or just critical apps in a desktop).
> 
> So, why drag this unified interface around when it is only needed in certain 
> models.  The underlying interface via pid comes for free and should be 
> leveraged as such to yield a low overhead implementation.  Then maybe, when 
> a more complex model is involved should CKRM come into play.

Assuming I'm not misinterpreting your brief description above:

	The interface "via pid" does not come for free. You'd essentially
attach the shares structures to the task and implement inheritance and
hierarchy of those shares during fork -- hardly lower overhead when you
consider that in most cases the number of tasks is going to be much
larger than the number of classes. Furthermore this would mean
duplicating the loops in ckrm_alloc_class, ckrm_free_class,
ckrm_register_controller, and ckrm_unregister_controller. I suspect the
loops would be deeper, more complex, execute more frequently, and have a
much wider performance impact when you consider that we'd be dealing
with the task struct directly instead of a class. The class structure
effectively factors most of the loops out of the fork() and exit() paths
and into mkdir() rmdir() calls that create and remove classes. The
remaining loops in fork() and exit() paths are proportional to the
number of resource controllers -- currently limited to 8 by
CKRM_MAX_RES_CTLRS.
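
To make the cost argument concrete, here is a minimal userspace sketch.
It is not CKRM code: struct res_class, struct controller and the
functions below are invented for illustration, and MAX_RES_CTLRS merely
stands in for CKRM_MAX_RES_CTLRS. The point it shows is that when
per-controller state hangs off the class, the fork path only walks a
fixed-size controller table, however many tasks or classes exist.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES_CTLRS 8	/* stands in for CKRM_MAX_RES_CTLRS */

/* Hypothetical class: per-controller state lives here, not in the task. */
struct res_class {
	long tasks[MAX_RES_CTLRS];
};

/* Hypothetical controller with a single "task joined the class" hook. */
struct controller {
	const char *name;
	void (*task_added)(struct res_class *cls, int slot);
};

static struct controller *registered[MAX_RES_CTLRS];

/* Returns the slot the controller was placed in, or -1 if the table is full. */
int register_controller(struct controller *c)
{
	int i;

	assert(c != NULL);
	for (i = 0; i < MAX_RES_CTLRS; i++) {
		if (registered[i] == NULL) {
			registered[i] = c;
			return i;
		}
	}
	return -1;
}

/* Example hook: count the task against the controller's slot in the class. */
void count_task(struct res_class *cls, int slot)
{
	cls->tasks[slot]++;
}

/*
 * Fork path: one pass over the controller table. Returns the number of
 * hooks invoked, which can never exceed MAX_RES_CTLRS.
 */
int class_fork(struct res_class *cls)
{
	int i, calls = 0;

	assert(cls != NULL);
	for (i = 0; i < MAX_RES_CTLRS; i++) {
		if (registered[i] && registered[i]->task_added) {
			registered[i]->task_added(cls, i);
			calls++;
		}
	}
	return calls;
}
```

With two controllers registered, every call to class_fork() does two
hook calls, whether it is the first task in the class or the thousandth.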

	Classes also have an advantage when it comes to administrating resource
management -- they are created and destroyed by an administrator and
hence are easier to control. In contrast, the resource management
decisions associated purely with tasks would disappear with the task. In
many cases a task would be too short-lived for an administrator to
manually intervene even if swarms of these tasks are created. Having
this orthogonal hierarchy gives us the opportunity to manage all of
these situations via a common interface and factors out overhead from
the per-task solution you seem to be advocating.

I'm willing to discuss your ideas without patches but I think patches
(even if incomplete) would be clearer.

> > It also has the added advantage that the resource controller writer do
> > not have to spend their time in coming up with an interface for their
> > controller. On the other hand if they do, the user finally ends up with
> > multiple interface (/proc, sysfs, configfs, /dev etc.,) to do their
> > resource management.
> 
> So, maybe what is needed is an abstract parent RC that implements this 
> interface and lets the child RCs implement the specifics, and allows CKRM to 
> connect to the parent RC to allow finer RM control when a specific model 
> requires it.

	I'm not sure what advantage that would give compared to CKRM as it
stands now -- it sounds much more complex. Could you give an example of
what kind of interfaces you're suggesting?

Cheers,
	-Matt Helsley



* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
  2006-04-22  2:20           ` Matt Helsley
  2006-04-22  5:28           ` Chandra Seetharaman
@ 2006-04-23  6:52           ` Paul Jackson
  2006-04-23  9:31             ` Matt Helsley
  2006-04-28  1:58           ` Chandra Seetharaman
  3 siblings, 1 reply; 43+ messages in thread
From: Paul Jackson @ 2006-04-23  6:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: sekharan, haveblue, linux-kernel, ckrm-tech

Andrew wrote:
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)

Bless you.  I objected to the term 'class' a long time ago, but failed
to advance my case in a successful fashion.

Matt replied:
> "resource group"?

Nice.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23  6:52           ` Paul Jackson
@ 2006-04-23  9:31             ` Matt Helsley
  0 siblings, 0 replies; 43+ messages in thread
From: Matt Helsley @ 2006-04-23  9:31 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andrew Morton, sekharan, haveblue, linux-kernel, ckrm-tech

On Sat, 2006-04-22 at 23:52 -0700, Paul Jackson wrote:
> Andrew wrote:
> > (btw, using the term "class" to identify a group of tasks isn't very
> > comfortable - it's an instance, not a class...)
> 
> Bless you.  I objected to the term 'class' a long time ago, but failed
> to advance my case in a successful fashion.

	Well, I wouldn't say you were entirely unsuccessful. I distinctly
remembered your case and I tried to think of suitable names during the
recent changes. Please take a look at the latest set of patches and see
if you think the names are clearer.

> Matt replied:
> > "resource group"?
> 
> Nice.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23  2:33         ` Matt Helsley
@ 2006-04-23 11:22           ` Al Boldi
  2006-04-24 18:23             ` Chandra Seetharaman
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-23 11:22 UTC (permalink / raw)
  To: Matt Helsley; +Cc: sekharan, LKML, Andrew Morton

Matt Helsley wrote:
> On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> > Chandra Seetharaman wrote:
> > > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > > i.e: it should be possible to run the RCs w/o CKRM.
> > > >
> > > > The current design pins the RCs on CKRM, when in fact this is not
> > > > necessary. One way to decouple them, could be to pin them against
> > > > pid, thus allowing an RC to leverage the pid hierarchy w/o the need
> > > > for CKRM. And only when finer RM control is necessary, should CKRM
> > > > come into play, by dynamically adjusting the RC to achieve the
> > > > desired effect.
> > >
> > > This model works well in universities, where you associate some
> > > resource when a student logs in, or a virtualised environment (like a
> > > UML or vserver), where you attach resource to the root process.
> > >
> > > It doesn't work with web servers, database servers etc.,, where the
> > > main application will be forking tasks for different set of end users.
> > > In that case you have to group tasks that are not related to one
> > > another and attach resources to them.
> > >
> > > Having a unified interface gives the system administrator ability to
> > > group the tasks as they see them in real life (a department or
> > > important transactions or just critical apps in a desktop).
> >
> > So, why drag this unified interface around when it is only needed in
> > certain models.  The underlying interface via pid comes for free and
> > should be leveraged as such to yield a low overhead implementation. 
> > Then maybe, when a more complex model is involved should CKRM come into
> > play.
>
> Assuming I'm not misinterpreting your brief description above:
>
> 	The interface "via pid" does not come for free. You'd essentially
> attach the shares structures to the task and implement inheritance and
> hierarchy of those shares during fork

No, attach the shares struct to the parent RC, and allow it to take advantage 
of the free pid hierarchy.

> I'm willing to discuss your ideas without patches but I think patches
> (even if incomplete) would be clearer.

The discussion here is more about design rather than implementation.

> > > It also has the added advantage that the resource controller writer do
> > > not have to spend their time in coming up with an interface for their
> > > controller. On the other hand if they do, the user finally ends up
> > > with multiple interface (/proc, sysfs, configfs, /dev etc.,) to do
> > > their resource management.
> >
> > So, maybe what is needed is an abstract parent RC that implements this
> > interface and lets the child RCs implement the specifics, and allows
> > CKRM to connect to the parent RC to allow finer RM control when a
> > specific model requires it.
>
> 	I'm not sure what advantage that would give compared to CKRM as it
> stands now -- it sounds much more complex. Could you give an example of
> what kind of interfaces you're suggesting?

Nothing wrong w/ CKRM per se, other than its monolithic approach.

The suggestion here would be to modularize CKRM by removing dependencies, 
effectively splitting CKRM into 3 parts:

	  RM --- RC parent (no-op)
		/ | \
	      RC child (ntask, cpu, mem, .....)

So it could be possible to:
1. Load the RC parent to provide for simple stats based on pid-hierarchy.
2. Load an RC child for rc-enforcement.
3. Load the RM for finer control across different tasks by way of an 
orthogonal hierarchy.

Although this may look more complex than the monolithic approach, it is in
fact a lot simpler, due to its "division of labor" approach.
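
As an illustration of steps 1 and 2 above, here is a minimal user-space
sketch in Python (not kernel code). It assumes the "RC parent" only needs a
pid -> parent pid map to produce per-subtree stats, and that an "RC child"
such as ntask compares those stats against a limit. The function names and
the limit table are hypothetical and do not come from any posted patch.

```python
from collections import defaultdict


def subtree_task_counts(parent_of):
    """Count, for each pid, the tasks in the subtree rooted at that pid.

    parent_of maps pid -> parent pid (None for the root). This stands in
    for the pid hierarchy that a no-op "RC parent" could report stats on.
    """
    counts = defaultdict(int)
    for pid in parent_of:
        # Charge the task to itself and to every ancestor up to the root.
        node = pid
        while node is not None:
            counts[node] += 1
            node = parent_of.get(node)
    return dict(counts)


def over_limit(parent_of, limits):
    """Return the pids whose subtree exceeds a (hypothetical) ntask limit."""
    counts = subtree_task_counts(parent_of)
    return sorted(pid for pid, limit in limits.items()
                  if counts.get(pid, 0) > limit)
```

The sketch covers only the stats and enforcement layers; the RM step, which
groups tasks across the pid hierarchy, is deliberately left out.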

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:28           ` Chandra Seetharaman
@ 2006-04-24  1:10             ` KUROSAWA Takahiro
  2006-04-24  4:39               ` Kirill Korotaev
  2006-04-24  5:18             ` Hirokazu Takahashi
  1 sibling, 1 reply; 43+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  1:10 UTC (permalink / raw)
  To: sekharan
  Cc: akpm, haveblue, linux-kernel, ckrm-tech,
	" Valerie.Clement"

On Fri, 21 Apr 2006 22:28:45 -0700
Chandra Seetharaman <sekharan@us.ibm.com> wrote:

> > > pzone based memory controller:
> > > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> > 
> > From a super-quick scan that looks saner.  Is it effective?  Is this the
> > way you're planning on proceeding?
> 
> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> problems by design, (1) doesn't handle shared pages and (2) doesn't
> provide support for both min_shares and max_shares.

Right.  I wanted to show a proof-of-concept of the pzone based controller
and implemented the minimal features necessary for a memory controller.
So, the pzone based controller still needs development and some cleanup.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  1:48       ` Chandra Seetharaman
  2006-04-22  2:13         ` Andrew Morton
@ 2006-04-24  1:47         ` Hirokazu Takahashi
  2006-04-24 20:42           ` Shailabh Nagar
  1 sibling, 1 reply; 43+ messages in thread
From: Hirokazu Takahashi @ 2006-04-24  1:47 UTC (permalink / raw)
  To: sekharan; +Cc: akpm, haveblue, linux-kernel, ckrm-tech

Hi Chandra,

> > Could I ask that you briefly enumerate
> > 
> > a) which controllers you think we'll need in the forseeable future
> > 
> 
> Our main object is to provide resource control for the hardware
> resources: CPU, I/O and memory.
> 
> We have already posted the CPU controller.
> 
> We have two implementations of memory controller and a I/O controller. 
> 
> Memory controller is understandably more complex and controversial, and
> that is the reason we haven't posted it this time around (we are looking
> at ways to simplify the design and hence the complexity). Both the
> memory controllers has been posted to linux-mm.
> 
> I/O controller is based on CFQ-scheduler.
> 
> > b) what they need to do

	(snip)

> I/O Controller that we are working on is based on CFQ scheduler and
> provides bandwidth control.  
> > 
> > c) pointer to prototype code if poss

	(snip)

> i/o controller: This controller is not ported to the framework posted,
> but can be taken for a prototype version. New version would be simpler
> though.

I think controlling I/O bandwidth is the right way to go.

However, I think you need to change the design of the controller a bit.
Many of the I/O requests that processes issue will be handled in other
contexts: AIO, journaling, pdflush and vmscan are handled by kernel
threads rather than by the issuing processes.

The current design does not seem to take this into account.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  1:10             ` KUROSAWA Takahiro
@ 2006-04-24  4:39               ` Kirill Korotaev
  2006-04-24  5:41                 ` KUROSAWA Takahiro
  0 siblings, 1 reply; 43+ messages in thread
From: Kirill Korotaev @ 2006-04-24  4:39 UTC (permalink / raw)
  To: KUROSAWA Takahiro
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement

>>>> pzone based memory controller:
>>>> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
>>> From a super-quick scan that looks saner.  Is it effective?  Is this the
>>> way you're planning on proceeding?
>> Yes, it is effective, and the reclamation is O(1) too. It has couple of
>> problems by design, (1) doesn't handle shared pages and (2) doesn't
>> provide support for both min_shares and max_shares.
> 
> Right.  I wanted to show a proof-of-concept of the pzone based controller
> and implemented the minimal features necessary for a memory controller.
> So, the pzone based controller still needs development and some cleanup.
Just out of curiosity, how was it measured that it is effective?
How does it work when there is a global memory shortage in the system?

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  5:28           ` Chandra Seetharaman
  2006-04-24  1:10             ` KUROSAWA Takahiro
@ 2006-04-24  5:18             ` Hirokazu Takahashi
  2006-04-25  1:42               ` Chandra Seetharaman
  1 sibling, 1 reply; 43+ messages in thread
From: Hirokazu Takahashi @ 2006-04-24  5:18 UTC (permalink / raw)
  To: sekharan
  Cc: akpm, haveblue, linux-kernel, ckrm-tech,
	" Valerie.Clement", kurosawa

Hi Chandra, 

> > > > c) pointer to prototype code if poss
> > > 
> > > Both the memory controllers are fully functional. We need to trim them
> > > down.
> > > 
> > > active/inactive list per class memory controller:
> > > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> > 
> > Oh my gosh.  That converts memory reclaim from per-zone LRU to
> > per-CKRM-class LRU.  If configured.
> 
> Yes. We originally had an implementation that would use the existing
> per-zone LRU, but the reclamation path was O(n), where n is the number
> of classes. So, we moved towards a O(1) algorithm.
> 
> > 
> > This is huge.  It means that we have basically two quite different versions
> > of memory reclaim to test and maintain.   This is a problem.
> 
> Understood, will work and come up with an acceptable memory controller.
> > 
> > (I hope that's the before-we-added-comments version of the patch btw).
> 
> Yes, indeed :). As I told earlier this patch is not ready for lkml or -
> mm yet.
> > 
> > > pzone based memory controller:
> > > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> > 
> > From a super-quick scan that looks saner.  Is it effective?  Is this the
> > way you're planning on proceeding?
> > 
> 
> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> problems by design, (1) doesn't handle shared pages and (2) doesn't
> provide support for both min_shares and max_shares.

I'm not sure all of them have to be managed under ckrm_core and rcfs
in the kernel.

The functions you mentioned can be implemented in user space to
minimize the overhead in usual VM operations, because resizing is not
expected to need a quick response. In that respect it differs a bit
from the CPU resource.

You don't need to invent everything. I think you can reuse what the
NUMA team is doing instead. This approach may not fit in your rcfs,
though.

> > This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> > Just that it covers a group of mm's and not just the one mm?
> 
> Yes, that is the core object of ckrm, associate resources to a group of
> tasks.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  4:39               ` Kirill Korotaev
@ 2006-04-24  5:41                 ` KUROSAWA Takahiro
  2006-04-24  6:45                   ` Kirill Korotaev
  0 siblings, 1 reply; 43+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  5:41 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement

On Mon, 24 Apr 2006 08:39:52 +0400
Kirill Korotaev <dev@openvz.org> wrote:

> >>>> pzone based memory controller:
> >>>> http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> >>> From a super-quick scan that looks saner.  Is it effective?  Is this the
> >>> way you're planning on proceeding?
> >> Yes, it is effective, and the reclamation is O(1) too. It has couple of
> >> problems by design, (1) doesn't handle shared pages and (2) doesn't
> >> provide support for both min_shares and max_shares.
> > 
> > Right.  I wanted to show a proof-of-concept of the pzone based controller
> > and implemented the minimal features necessary for a memory controller.
> > So, the pzone based controller still needs development and some cleanup.
> Just out of curiosity, how was it measured that it is effective?

I don't have any benchmark numbers yet, so I can't explain the
effectiveness with numbers.  I've been looking for the way to
measure the cost of pzones correctly, but I've not found it out yet.

> How does it work when there is a global memory shortage in the system?

I guess you are referring to the situation that global memory is running
out but there are free pages in pzones.  These free pages in pzones are
handled as reserved for pzone users and not used even in global memory 
shortage.
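
To make the reservation behaviour described above concrete, here is a small
Python model (a sketch, not the pzone patch itself). It assumes one global
pool plus named pzone pools, and that neither side borrows from the other;
the second half of that assumption (a pzone not falling back to the global
pool) is not stated in this thread. The class and method names are
hypothetical.

```python
class PzoneAllocator:
    """Toy model: a global page pool plus per-pzone reserved pools."""

    def __init__(self, global_pages, pzones):
        self.global_free = global_pages
        self.pzone_free = dict(pzones)  # pzone name -> free pages

    def alloc(self, pages, pzone=None):
        """Allocate from the named pzone, or from the global pool if None.

        Returns True on success and False on failure.
        """
        if pzone is None:
            if pages > self.global_free:
                # Free pages sitting in pzones stay reserved for their
                # users, so a global shortage is not relieved by them.
                return False
            self.global_free -= pages
            return True
        if pages > self.pzone_free[pzone]:
            # Assumption: no fallback from a pzone to the global pool.
            return False
        self.pzone_free[pzone] -= pages
        return True
```

In this model a global allocation can fail while a pzone still has free
pages, which is the situation the question above is about.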

Thanks,
-- 
KUROSAWA, Takahiro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  5:41                 ` KUROSAWA Takahiro
@ 2006-04-24  6:45                   ` Kirill Korotaev
  2006-04-24  7:12                     ` KUROSAWA Takahiro
  0 siblings, 1 reply; 43+ messages in thread
From: Kirill Korotaev @ 2006-04-24  6:45 UTC (permalink / raw)
  To: KUROSAWA Takahiro
  Cc: Kirill Korotaev, sekharan, akpm, haveblue, linux-kernel,
	ckrm-tech, Valerie.Clement, devel

>>>>Yes, it is effective, and the reclamation is O(1) too. It has couple of
>>>>problems by design, (1) doesn't handle shared pages and (2) doesn't
>>>>provide support for both min_shares and max_shares.
>>>
>>>Right.  I wanted to show a proof-of-concept of the pzone based controller
>>>and implemented the minimal features necessary for a memory controller.
>>>So, the pzone based controller still needs development and some cleanup.
>>
>>Just out of curiosity, how was it measured that it is effective?
> 
> 
> I don't have any benchmark numbers yet, so I can't explain the
> effectiveness with numbers.  I've been looking for the way to
> measure the cost of pzones correctly, but I've not found it out yet.
> 
> 
>>How does it work when there is a global memory shortage in the system?
> 
> 
> I guess you are referring to the situation that global memory is running
> out but there are free pages in pzones.  These free pages in pzones are
> handled as reserved for pzone users and not used even in global memory 
> shortage.
ok. Let me explain what I mean.
Imagine a situation with a global memory shortage. In the kernel, there are
threads which do some job on behalf of the user, e.g. kjournald, loop etc. If
the user has some pzone memory, but these threads fail to do their job,
some nasty things can happen (ext3 problems, deadlocks, OOM etc.)
If such behaviour is ok for you, then great. But did you consider it?

Also, I can't understand how it works with the OOM killer. If pzones have
enough memory, but there is a global shortage, who will be killed?

Thanks,
Kirill


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  6:45                   ` Kirill Korotaev
@ 2006-04-24  7:12                     ` KUROSAWA Takahiro
  0 siblings, 0 replies; 43+ messages in thread
From: KUROSAWA Takahiro @ 2006-04-24  7:12 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech,
	Valerie.Clement, devel

On Mon, 24 Apr 2006 10:45:59 +0400
Kirill Korotaev <dev@openvz.org> wrote:

> >>>>Yes, it is effective, and the reclamation is O(1) too. It has couple of
> >>>>problems by design, (1) doesn't handle shared pages and (2) doesn't
> >>>>provide support for both min_shares and max_shares.
> >>>
> >>>Right.  I wanted to show a proof-of-concept of the pzone based controller
> >>>and implemented the minimal features necessary for a memory controller.
> >>>So, the pzone based controller still needs development and some cleanup.
> >>
> >>Just out of curiosity, how was it measured that it is effective?
> > 
> > I don't have any benchmark numbers yet, so I can't explain the
> > effectiveness with numbers.  I've been looking for the way to
> > measure the cost of pzones correctly, but I've not found it out yet.
> > 
> >>How does it work when there is a global memory shortage in the system?
> > 
> > I guess you are referring to the situation that global memory is running
> > out but there are free pages in pzones.  These free pages in pzones are
> > handled as reserved for pzone users and not used even in global memory 
> > shortage.
> ok. Let me explain what I mean.
> Imagine a situation with a global memory shortage. In the kernel, there are
> threads which do some job on behalf of the user, e.g. kjournald, loop etc. If
> the user has some pzone memory, but these threads fail to do their job,
> some nasty things can happen (ext3 problems, deadlocks, OOM etc.)
> If such behaviour is ok for you, then great. But did you consider it?
> 
> Also, I can't understand how it works with the OOM killer. If pzones have
> enough memory, but there is a global shortage, who will be killed?

I understand.
IMHO, only the system processes should use global memory.
User processes that may cause such memory shortage should be 
enclosed in pzones first.

Thanks,

-- 
KUROSAWA, Takahiro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-23 11:22           ` Al Boldi
@ 2006-04-24 18:23             ` Chandra Seetharaman
  0 siblings, 0 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-24 18:23 UTC (permalink / raw)
  To: Al Boldi; +Cc: Matt Helsley, LKML, Andrew Morton

On Sun, 2006-04-23 at 14:22 +0300, Al Boldi wrote:
> Matt Helsley wrote:
> > On Sat, 2006-04-22 at 23:40 +0300, Al Boldi wrote:
> > > Chandra Seetharaman wrote:
> > > > On Sat, 2006-04-22 at 07:08 +0300, Al Boldi wrote:
> > > > > i.e: it should be possible to run the RCs w/o CKRM.
> > > > >
> > > > > The current design pins the RCs on CKRM, when in fact this is not
> > > > > necessary. One way to decouple them, could be to pin them against
> > > > > pid, thus allowing an RC to leverage the pid hierarchy w/o the need
> > > > > for CKRM. And only when finer RM control is necessary, should CKRM
> > > > > come into play, by dynamically adjusting the RC to achieve the
> > > > > desired effect.
> > > >
> > > > This model works well in universities, where you associate some
> > > > resource when a student logs in, or a virtualised environment (like a
> > > > UML or vserver), where you attach resource to the root process.
> > > >
> > > > It doesn't work with web servers, database servers etc.,, where the
> > > > main application will be forking tasks for different set of end users.
> > > > In that case you have to group tasks that are not related to one
> > > > another and attach resources to them.
> > > >
> > > > Having a unified interface gives the system administrator ability to
> > > > group the tasks as they see them in real life (a department or
> > > > important transactions or just critical apps in a desktop).
> > >
> > > So, why drag this unified interface around when it is only needed in
> > > certain models.  The underlying interface via pid comes for free and

The "pid tree" approach will not allow ISPs to provide workload
management capabilities inside a virtual server either.

> > > should be leveraged as such to yield a low overhead implementation.

As you can see "pid based resource control" does not lead to a low
overhead implementation.

>  
> > > Then maybe, when a more complex model is involved should CKRM come into
> > > play.
> >
> > Assuming I'm not misinterpreting your brief description above:
> >
> > 	The interface "via pid" does not come for free. You'd essentially
> > attach the shares structures to the task and implement inheritance and
> > hierarchy of those shares during fork
> 
> No, attach the shares struct to the parent RC, and allow it to take advantage 
> of the free pid hierarchy.
> 
> > I'm willing to discuss your ideas without patches but I think patches
> > (even if incomplete) would be clearer.
> 
> The discussion here is more about design rather than implementation.

"pid based RC" does sound like an easily understandable design. But IMHO, we
should also consider what the implementation will look like (in this context,
comparing it with CKRM). As Matt pointed out, it may not be any less
complex.
> 
> > > > It also has the added advantage that the resource controller writer do
> > > > not have to spend their time in coming up with an interface for their
> > > > controller. On the other hand if they do, the user finally ends up
> > > > with multiple interface (/proc, sysfs, configfs, /dev etc.,) to do
> > > > their resource management.
> > >
> > > So, maybe what is needed is an abstract parent RC that implements this
> > > interface and lets the child RCs implement the specifics, and allows
> > > CKRM to connect to the parent RC to allow finer RM control when a
> > > specific model requires it.
> >
> > 	I'm not sure what advantage that would give compared to CKRM as it
> > stands now -- it sounds much more complex. Could you give an example of
> > what kind of interfaces you're suggesting?
> 
> Nothing wrong w/ CKRM per se, other than its monolithic approach.
> 
> The suggestion here would be to modularize CKRM by removing dependencies, 
> effectively splitting CKRM into 3 parts:
> 
> 	  RM --- RC parent (no-op)
> 		/ | \
> 	      RC child (ntask, cpu, mem, .....)
> 

If the "RC parent" is _only_ going to allow attaching resource shares
with a "pid hierarchy", then how can an RM attach unrelated tasks to any
resource share ?

CKRM brings in grouping of unrelated tasks, which IMO is not possible
with the "pid tree" approach. On the other hand, CKRM takes care of
different scenarios.
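
The point about unrelated tasks can be checked with a short Python sketch.
It assumes, for illustration only, that a group in the "pid tree" approach
is exactly the set of tasks in the subtree below one pid; the helper names
are hypothetical.

```python
def subtree(parent_of, root):
    """Return the set of pids in the subtree rooted at root."""
    children = {}
    for pid, parent in parent_of.items():
        children.setdefault(parent, []).append(pid)
    seen, stack = set(), [root]
    while stack:
        pid = stack.pop()
        seen.add(pid)
        stack.extend(children.get(pid, []))
    return seen


def expressible_as_subtree(parent_of, members):
    """True if the member set is exactly the subtree of one of its members."""
    members = set(members)
    return any(subtree(parent_of, pid) == members for pid in members)
```

For a server with pid 100 that forks workers 101, 102 and 103, the set
{101, 103} (say, the workers serving one customer) is not a subtree, so it
needs an explicit task-to-group mapping rather than the pid hierarchy.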

> So it could be possible to:
> 1. Load the RC parent to provide for simple stats based on pid-hierarchy.
> 2. Load an RC child for rc-enforcement.
> 3. Load the RM for finer control across different tasks by way of an 
> orthogonal hierarchy.
> 
> Although this may look more complex than the monolithic approach, it is in 
> fact lots simpler, due to its "division of labor" approach.
> 
> Thanks!
> 
> --
> Al
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  1:47         ` Hirokazu Takahashi
@ 2006-04-24 20:42           ` Shailabh Nagar
  0 siblings, 0 replies; 43+ messages in thread
From: Shailabh Nagar @ 2006-04-24 20:42 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: sekharan, akpm, haveblue, linux-kernel, ckrm-tech

Hirokazu Takahashi wrote:

>  
>
>>i/o controller: This controller is not ported to the framework posted,
>>but can be taken for a prototype version. New version would be simpler
>>though.
>>    
>>
>
>I think controlling I/O bandwidth is the right way to go.
>  
>
Thanks. Obviously we agree heartily :-)

>However, I think you need to change the design of the controller a bit.
>Many of the I/O requests that processes issue will be handled in other
>contexts: AIO, journaling, pdflush and vmscan are handled by kernel
>threads rather than by the issuing processes.
>
>The current design does not seem to take this into account.
>  
>
Yes. The current design, which builds directly on top of the CFQ
scheduler, does not attempt to treat kernel threads specially in order
to account the I/O they're doing on behalf of others properly. This was
mainly because of the desire to keep the controller simple.

I suspect pdflush and vmscan I/O is never going to be properly
attributable and journaling may be possible but unlikely to be worth it
given the risks of throttling it?  AIO is likely to be something we can
address if there is consensus that one is willing to pay the price of
tracking the source through the I/O submission layers.
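
A rough Python sketch of the accounting difference follows. It is a toy
model rather than the CFQ-based controller: each request is assumed to carry
both the task that submitted it and the task on whose behalf it was issued,
and the `track_origin` switch, tuple layout and class names are all
hypothetical.

```python
from collections import Counter


def account_io(requests, class_of, track_origin):
    """Sum bytes of I/O per class.

    requests: iterable of (submitter, origin, nbytes) tuples.
    class_of: task name -> class name; tasks not listed are "unclassified".
    track_origin: charge the originating task instead of the submitter.
    """
    usage = Counter()
    for submitter, origin, nbytes in requests:
        task = origin if track_origin else submitter
        usage[class_of.get(task, "unclassified")] += nbytes
    return dict(usage)
```

Without origin tracking, writes flushed by a kernel thread are charged to no
class at all; with it, they are charged to the class that dirtied the data,
at the cost of carrying the origin through every submission layer.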

I suppose this would be a good time to dust off the I/O controller and
post it so discussions can become more concrete.

But as always, changes in the design and implementation are
welcome....

Regards,
Shailabh


>Thanks,
>Hirokazu Takahashi.
>  
>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-24  5:18             ` Hirokazu Takahashi
@ 2006-04-25  1:42               ` Chandra Seetharaman
  0 siblings, 0 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-25  1:42 UTC (permalink / raw)
  To: Hirokazu Takahashi
  Cc: akpm, haveblue, linux-kernel, ckrm-tech, Valerie.Clement,
	kurosawa

On Mon, 2006-04-24 at 14:18 +0900, Hirokazu Takahashi wrote:
> Hi Chandra, 
<snip>
> > Yes, it is effective, and the reclamation is O(1) too. It has couple of
> > problems by design, (1) doesn't handle shared pages and (2) doesn't
> > provide support for both min_shares and max_shares.
> 
> I'm not sure all of them have to be managed under ckrm_core and rcfs
> in the kernel.
> 
> The functions you mentioned can be implemented in user space to
> minimize the overhead in usual VM operations, because resizing is not
> expected to need a quick response. In that respect it differs a bit
> from the CPU resource.

Agreed, that is where the additional complexity arises from.

If the user can achieve the same results with a user-space solution, that
would be good too.

Thanks

chandra

> You don't need to invent everything. I think you can reuse what the
> NUMA team is doing instead. This approach may not fit in your rcfs,
> though.
> 
> > > This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> > > Just that it covers a group of mm's and not just the one mm?
> > 
> > Yes, that is the core object of ckrm, associate resources to a group of
> > tasks.
> 
> Thanks,
> Hirokazu Takahashi.
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-22  2:13         ` Andrew Morton
                             ` (2 preceding siblings ...)
  2006-04-23  6:52           ` Paul Jackson
@ 2006-04-28  1:58           ` Chandra Seetharaman
  2006-04-28  6:07             ` Kirill Korotaev
  3 siblings, 1 reply; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-28  1:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-21 at 19:13 -0700, Andrew Morton wrote:
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> >
> > > 
> > > c) pointer to prototype code if poss
> > 
> > Both the memory controllers are fully functional. We need to trim them
> > down.
> > 
> > active/inactive list per class memory controller:
> > http://prdownloads.sourceforge.net/ckrm/mem_rc-f0.4-2615-v2.tz?download
> 
> Oh my gosh.  That converts memory reclaim from per-zone LRU to
> per-CKRM-class LRU.  If configured.
> 
> This is huge.  It means that we have basically two quite different versions
> of memory reclaim to test and maintain.   This is a problem.
> 
> (I hope that's the before-we-added-comments version of the patch btw).
> 
> > pzone based memory controller:
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=113867467006531&w=2
> 
> From a super-quick scan that looks saner.  Is it effective?  Is this the
> way you're planning on proceeding?
> 
> This requirement is basically a glorified RLIMIT_RSS manager, isn't it? 
> Just that it covers a group of mm's and not just the one mm?
> 
> Do you attempt to manage just pagecache?  So if class A tries to read 10GB
> from disk, does that get more aggressively reclaimed based on class A's
> resource limits?
> 
> This all would have been more comfortable if done on top of the 2.4
> kernel's virtual scanner.
> 
> (btw, using the term "class" to identify a group of tasks isn't very
> comfortable - it's an instance, not a class...)
> 
> 
> Worried.

The object of this infrastructure is to get a unified interface for
resource management, irrespective of the resource that is being managed.

As I mentioned in my earlier email, subsystem experts are the ones who
will finally decide what type of resource controller they will accept. With
VM experts' direction and advice, I am positive that we will get an
excellent memory controller (as well as other controllers).

As you might have noticed, we have gone through major changes to meet the
community's acceptance levels. We are now making use of all possible
features (kref, process event connector, configfs, module parameter,
kzalloc) in this infrastructure.

Having a CPU controller, two memory controllers, an I/O controller and a
numtasks controller proves that the infrastructure does handle major
resources nicely and is also capable of managing virtual resources.

Hope I reduced your worries (at least some :).

regards,

chandra
> 
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-28  1:58           ` Chandra Seetharaman
@ 2006-04-28  6:07             ` Kirill Korotaev
  2006-04-28 17:57               ` Chandra Seetharaman
  0 siblings, 1 reply; 43+ messages in thread
From: Kirill Korotaev @ 2006-04-28  6:07 UTC (permalink / raw)
  To: sekharan; +Cc: Andrew Morton, haveblue, linux-kernel, ckrm-tech

>>Worried.
> The object of this infrastructure is to get a unified interface for
> resource management, irrespective of the resource that is being managed.
> 
> As I mentioned in my earlier email, subsystem experts are the ones who
> will finally decide what type resource controller they will accept. With
> VM experts' direction and advice, i am positive that we will get an
> excellent memory controller (as well as other controllers).
> 
> As you might have noticed, we have gone through major changes to come to
> community's acceptance levels. We are now making use of all possible
> features (kref, process event connector, configfs, module parameter,
> kzalloc) in this infrastructure.
> 
> Having a CPU controller, two memory controllers, an I/O controller and a
> numtasks controller proves that the infrastructure does handle major
> resources nicely and is also capable of managing virtual resources.
> 
> Hope I reduced your worries (at least some :).
Not all :) Let me explain.

Until you provide something more complex than numtasks, this
infrastructure is pure theory. For example, in your infrastructure, when
you add a memory resource controller with data sharing, you will find
that changing the CKRM class of a task is almost impossible in a suitable
way. Another possible situation: hierarchical classes with shared memory
are an even more complicated thing.

In both cases you can end up with a poor/complicated/slow solution, or
dropping some of your infrastructure features (changing class on the fly,
hierarchy), or, which is worse IMHO, with inconsistency between controllers
and interfaces.

Thanks,
Kirill


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul
  2006-04-28  6:07             ` Kirill Korotaev
@ 2006-04-28 17:57               ` Chandra Seetharaman
  0 siblings, 0 replies; 43+ messages in thread
From: Chandra Seetharaman @ 2006-04-28 17:57 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: Andrew Morton, haveblue, linux-kernel, ckrm-tech

On Fri, 2006-04-28 at 10:07 +0400, Kirill Korotaev wrote:
> >>Worried.
> > The object of this infrastructure is to get a unified interface for
> > resource management, irrespective of the resource that is being managed.
> > 
> > As I mentioned in my earlier email, subsystem experts are the ones who
> > will finally decide what type resource controller they will accept. With
> > VM experts' direction and advice, i am positive that we will get an
> > excellent memory controller (as well as other controllers).
> > 
> > As you might have noticed, we have gone through major changes to come to
> > community's acceptance levels. We are now making use of all possible
> > features (kref, process event connector, configfs, module parameter,
> > kzalloc) in this infrastructure.
> > 
> > Having a CPU controller, two memory controllers, an I/O controller and a
> > numtasks controller proves that the infrastructure does handle major
> > resources nicely and is also capable of managing virtual resources.
> > 
> > Hope I reduced your worries (at least some :).
> Not all :) Let me explain.
> 
> Until you provide something more complex than numtasks, this 
> infrastructure is pure theory. For example, in your infrastructure, when 
> you add a memory resource controller with data sharing, you will find 
> that changing the CKRM class of a task is almost impossible in a suitable 

I do not see a problem here; there are two possible solutions:
 - do not account shared pages against the resource group (put them in
   the default resource group, as some other OSes do).
 - when you are moving the task to a different class, calculate the
   resource group's usage depending on how many users are using a 
   specific page.
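
As a rough illustration of the second option, here is a minimal sketch (not
CKRM code; the structure, the fixed-point CHARGE_UNIT and all function names
are hypothetical). It assumes each shared page tracks how many users it has
in each resource group and charges each group its proportional fraction, so
a class change only needs to adjust two counters per page:

```c
#include <assert.h>
#include <stddef.h>

#define NGROUPS     2     /* hypothetical: two resource groups */
#define CHARGE_UNIT 1000  /* one fully charged page == 1000 units */

/* Hypothetical per-page bookkeeping: users of this page in each group. */
struct shared_page {
	unsigned int users[NGROUPS];
};

/* Fraction of one page (in CHARGE_UNITs) charged to 'group'. */
static unsigned int page_charge(const struct shared_page *pg, int group)
{
	unsigned int total = 0;

	for (int i = 0; i < NGROUPS; i++)
		total += pg->users[i];
	if (total == 0)
		return 0;
	return pg->users[group] * CHARGE_UNIT / total;
}

/* Usage of 'group' summed over an array of shared pages. */
static unsigned int group_usage(const struct shared_page *pages, size_t n,
				int group)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += page_charge(&pages[i], group);
	return sum;
}

/* A task mapping 'pg' moves from group 'from' to group 'to'. */
static void move_user(struct shared_page *pg, int from, int to)
{
	assert(pg->users[from] > 0);
	pg->users[from]--;
	pg->users[to]++;
}
```

A real controller would keep running per-group totals instead of rescanning
pages; the rescan here is only to keep the sketch short.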
> way. Another possible situation: hierarchical classes with shared memory 
> are an even more complicated thing.

Hierarchy is not an issue. The resource controller can calculate the
absolute amount of the resource (say, the number of pages in this case)
when the shares are assigned and then treat all resource groups as flat.
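
A minimal sketch of that idea, assuming each class stores its share as a
numerator over a denominator kept in its parent (struct rclass and its
field names are hypothetical, not the actual CKRM share structure):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class node; not the CKRM data structure. */
struct rclass {
	struct rclass *parent;    /* NULL for the root class */
	unsigned long share;      /* this class's share of its parent */
	unsigned long total;      /* denominator for the children's shares */
	unsigned long abs_pages;  /* absolute limit, computed below */
};

/*
 * Convert a relative share into an absolute page count. The parent's
 * abs_pages must already be up to date, so callers walk the tree from
 * the root downwards whenever shares are (re)assigned.
 */
static void assign_absolute(struct rclass *c, unsigned long system_pages)
{
	if (!c->parent) {
		c->abs_pages = system_pages;
		return;
	}
	assert(c->parent->total > 0);
	assert(c->share <= c->parent->total);
	c->abs_pages = c->parent->abs_pages * c->share / c->parent->total;
}
```

Once every class has its abs_pages, limit checks can compare usage against
that single number without walking the hierarchy.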

> 
> In both cases you can end up with a poor/complicated/slow solution, or 
> dropping some of your infrastructure features (changing class on the fly, 
> hierarchy), or, which is worse IMHO, with inconsistency between controllers 
> and interfaces.

I am not convinced (based on the above explanations).
> 
> Thanks,
> Kirill
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2006-04-28 17:57 UTC | newest]

Thread overview: 43+ messages
2006-04-21  2:24 [RFC] [PATCH 00/12] CKRM after a major overhaul sekharan
2006-04-21  2:24 ` [RFC] [PATCH 01/12] Register/Unregister interface for Controllers sekharan
2006-04-21  2:24 ` [RFC] [PATCH 02/12] Class creation/deletion sekharan
2006-04-21  2:24 ` [RFC] [PATCH 03/12] Share Handling sekharan
2006-04-21  2:24 ` [RFC] [PATCH 04/12] Add task logic to class sekharan
2006-04-21  2:24 ` [RFC] [PATCH 05/12] Init and clear class info in task sekharan
2006-04-21  2:24 ` [RFC] [PATCH 06/12] Add proc interface to get class info of task sekharan
2006-04-21  2:24 ` [RFC] [PATCH 07/12] Configfs based filesystem user interface - RCFS sekharan
2006-04-21  2:24 ` [RFC] [PATCH 08/12] Add attribute support to RCFS sekharan
2006-04-21  2:25 ` [RFC] [PATCH 09/12] Add stats file " sekharan
2006-04-21  2:25 ` [RFC] [PATCH 10/12] Add shares " sekharan
2006-04-21  2:25 ` [RFC] [PATCH 11/12] Add members " sekharan
2006-04-21  2:25 ` [RFC] [PATCH 12/12] Documentation for CKRM sekharan
2006-04-21 14:49 ` [ckrm-tech] [RFC] [PATCH 00/12] CKRM after a major overhaul Dave Hansen
2006-04-21 16:58   ` Chandra Seetharaman
2006-04-21 22:57     ` Andrew Morton
2006-04-22  1:48       ` Chandra Seetharaman
2006-04-22  2:13         ` Andrew Morton
2006-04-22  2:20           ` Matt Helsley
2006-04-22  2:33             ` Andrew Morton
2006-04-22  5:28           ` Chandra Seetharaman
2006-04-24  1:10             ` KUROSAWA Takahiro
2006-04-24  4:39               ` Kirill Korotaev
2006-04-24  5:41                 ` KUROSAWA Takahiro
2006-04-24  6:45                   ` Kirill Korotaev
2006-04-24  7:12                     ` KUROSAWA Takahiro
2006-04-24  5:18             ` Hirokazu Takahashi
2006-04-25  1:42               ` Chandra Seetharaman
2006-04-23  6:52           ` Paul Jackson
2006-04-23  9:31             ` Matt Helsley
2006-04-28  1:58           ` Chandra Seetharaman
2006-04-28  6:07             ` Kirill Korotaev
2006-04-28 17:57               ` Chandra Seetharaman
2006-04-24  1:47         ` Hirokazu Takahashi
2006-04-24 20:42           ` Shailabh Nagar
  -- strict thread matches above, loose matches on Subject: below --
2006-04-21 19:07 Al Boldi
2006-04-21 22:04 ` Matt Helsley
     [not found]   ` <200604220708.40018.a1426z@gawab.com>
2006-04-22  5:46     ` Chandra Seetharaman
2006-04-22 20:40       ` Al Boldi
2006-04-23  2:33         ` Matt Helsley
2006-04-23 11:22           ` Al Boldi
2006-04-24 18:23             ` Chandra Seetharaman
2006-04-21 22:09 ` Chandra Seetharaman
