* [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 5:01 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 5:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
A simple hard-wall hierarhcy support for res_counter.
Changelog v2->v3
- changed the name and arguments of functions.
- rewrote to be read easily.
- named as HardWall hierarchy.
This implements following model
- A cgroup's tree means hierarchy of resource.
- All child's resource is moved from its parents.
- The resource moved to children is charged as parent's usage.
- The resource moves when child->limit is changed.
- The sum of resource for children and its own usage is limited by "limit".
This implies
- No dynamic automatic hierarhcy balancing in the kernel.
- Each resource is isolated completely.
- The kernel just supports resource-move-at-change-in-limit.
- The user (middle-ware) is responsible to make hierarhcy balanced well.
Good balance can be achieved by changing limit from user land.
Background:
Recently, there are popular resource isolation technique widely used,
i.e. Hardware-Virtualization. We can do hierarchical resource isolation
by using cgroup on it. But supporting hierarchy management in croups
has some advantages of performance, unity and costs of management.
There are good resource management in other OSs, they support some kind of
hierarchical resource management. We wonder what kind of hierarchy policy
is good for Linux. And there is an another point. Hierarchical system can be
implemented by the kernel and user-land co-operation. So, there are various
choices to do in the kernel. Doing all in the kernel or export some proper
interfaces to the user-land. Middle-wares are tend to be used for management.
I hope there will be Open Source one.
At supporting hierarchy in cgroup, several aspects of characteristics of
policy of hierarchy can be considered. Some needs automatic balancing
between several groups.
- fairness ... how fairness is kept under policy
- performance ... should be _fast_. multi-level resource balancing tend
to use much amount of CPU and can cause soft lockup.
- predictability ... resource management are usually used for resource
isolation. the kernel must not break the isolation and
predictability of users against application's progress.
- flexibility ... some sophisticated dynamic resource balancing with
soft-limit is welcomed when the user doesn't want strict
resource isolation or when the user cannot estimate how much
they want correctly.
Hard Wall Hierarchy.
This patch implements a hard-wall model of hierarchy for resources.
Works well for users who want strict resource isolation.
This model allows the move of resource only between a parent and its children.
The resource is moved to a child when it declares the amount of resources to be
used. (by limit)
Automatic resource balancing is not supported in this code.
(But users can do non-automatic by changing limit dynamically.)
- fairness ... good. no resource sharing. works as specified by users.
- performance ... good. each resources are capsuled to its own level.
- predictability ... good. resources are completely isolated. balancing only
occurs at the event of changes in limit.
- flexibility ... bad. no flexibility and scheduling in the kernel level.
need middle-ware's help.
Considerations:
- This implementation uses "limit" == "current_available_resource".
This should be revisited when Soft-Limit one is implemented.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Documentation/controllers/resource_counter.txt | 41 +++++++++
include/linux/res_counter.h | 90 +++++++++++++++++++-
kernel/res_counter.c | 112 +++++++++++++++++++++++--
3 files changed, 235 insertions(+), 8 deletions(-)
Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
===================================================================
--- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
+++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
@@ -38,6 +38,16 @@ struct res_counter {
* the number of unsuccessful attempts to consume the resource
*/
unsigned long long failcnt;
+
+ /*
+ * hierarchy support: the parent of this resource.
+ */
+ struct res_counter *parent;
+ /*
+ * the amount of resources assigned to children.
+ */
+ unsigned long long for_children;
+
/*
* the lock to protect all of the above.
* the routines below consider this to be IRQ-safe
@@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
ssize_t res_counter_read(struct res_counter *counter, int member,
const char __user *buf, size_t nbytes, loff_t *pos,
int (*read_strategy)(unsigned long long val, char *s));
+
+/*
+ * An interface for setting res_counter's member (ex. limit)
+ * the new parameter is passed by *buf and translated by write_strategy().
+ * Then, it is applied to member under the control of set_strategy().
+ * If write_strategy() and set_strategy() can be NULL. see res_counter.c
+ */
+
ssize_t res_counter_write(struct res_counter *counter, int member,
- const char __user *buf, size_t nbytes, loff_t *pos,
- int (*write_strategy)(char *buf, unsigned long long *val));
+ const char __user *buf, size_t nbytes, loff_t *pos,
+ int (*write_strategy)(char *buf, unsigned long long *val),
+ int (*set_strategy)(struct res_counter *res, unsigned long long val,
+ int what),
+ );
/*
* the field descriptors. one for each member of res_counter
@@ -76,15 +97,33 @@ enum {
RES_MAX_USAGE,
RES_LIMIT,
RES_FAILCNT,
+ RES_FOR_CHILDREN,
};
/*
* helpers for accounting
*/
+/*
+ * initialize res_counter.
+ * @counter : the counter
+ *
+ * initialize res_counter and set default limit to very big value(unlimited)
+ */
+
void res_counter_init(struct res_counter *counter);
/*
+ * initialize res_counter under hierarchy.
+ * @counter : the counter
+ * @parent : the parent of the counter
+ *
+ * initialize res_counter and set default limit to 0. and set "parent".
+ */
+void res_counter_init_hierarchy(struct res_counter *counter,
+ struct res_counter *parent);
+
+/*
* charge - try to consume more resource.
*
* @counter: the counter
@@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
cnt->failcnt = 0;
spin_unlock_irqrestore(&cnt->lock, flags);
}
+
+/**
+ * Move resources from a parent to a child.
+ * At success,
+ * parent->usage += val.
+ * parent->for_children += val.
+ * child->limit += val.
+ *
+ * @child: an entity to set res->limit. The parent is child->parent.
+ * @val: the amount of resource to be moved.
+ * @callback: called when the parent's free resource is not enough to be moved.
+ * this can be NULL if no callback is necessary.
+ * @retry: limit for the number of trying to callback.
+ * -1 means infinite loop. At each retry, yield() is called.
+ * Returns 0 at success, !0 at failure.
+ *
+ * The callback returns 0 at success, !0 at failure.
+ *
+ */
+
+int res_counter_move_resource(struct res_counter *child,
+ unsigned long long val,
+ int (*callback)(struct res_counter *res, unsigned long long val),
+ int retry);
+
+
+/**
+ * Return resource to its parent.
+ * At success,
+ * parent->usage -= val.
+ * parent->for_children -= val.
+ * child->limit -= val.
+ *
+ * @child: entry to resize. The parent is child->parent.
+ * @val : How much does child repay to parent ? -1 means 'all'
+ * @callback: A callback for decreasing resource usage of child before
+ * returning. If NULL, just deceases child's limit.
+ * @retry: # of retries at calling callback for freeing resource.
+ * -1 means infinite loop. At each retry, yield() is called.
+ * Returns 0 at success.
+ */
+
+int res_counter_return_resource(struct res_counter *child,
+ unsigned long long val,
+ int (*callback)(struct res_counter *res, unsigned long long val),
+ int retry);
+
#endif
Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
===================================================================
--- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
+++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
@@ -44,6 +44,13 @@ to work with it.
Protects changes of the above values.
+ f. struct res_counter *parent
+
+ Parent res_counter under hierarchy.
+
+ g. unsigned long long for_children
+
+ Resources assigned to children. This is included in usage.
2. Basic accounting routines
@@ -179,3 +186,37 @@ counter fields. They are recommended to
still can help with it).
c. Compile and run :)
+
+
+6. Hierarchy
+ a. No Hierarchy
+ each cgroup can use its own private resource.
+
+ b. Hard-wall Hierarhcy
+ A simple hierarchical tree system for resource isolation.
+ Allows moving resources only between a parent and its children.
+ A parent can move its resource to children and remember the amount to
+ for_children member. A child can get new resource only from its parent.
+ Limit of a child is the amount of resource which is moved from its parent.
+
+ When add "val" to a child,
+ parent->usage += val
+ parent->for_children += val
+ child->limit += val
+ When a child returns its resource
+ parent->usage -= val
+ parent->for_children -= val
+ child->limit -= val.
+
+ This implements resource isolation among each group. This works very well
+ when you want to use strict resource isolation.
+
+ Usage Hint:
+ This seems for static resource assignment but dynamic resource re-assignment
+ can be done by resetting "limit" of groups. When you consider "limit" as
+ the amount of allowed _current_ resource, a sophisticated resource management
+ system based on strict resource isolation can be implemented.
+
+c. Soft-wall Hierarchy
+ TBD.
+
Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
===================================================================
--- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
+++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
@@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
counter->limit = (unsigned long long)LLONG_MAX;
}
+void res_counter_init_hierarchy(struct res_counter *counter,
+ struct res_counter *parent)
+{
+ spin_lock_init(&counter->lock);
+ counter->limit = 0;
+ counter->parent = parent;
+}
+
int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
{
if (counter->usage + val > counter->limit) {
@@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
return &counter->limit;
case RES_FAILCNT:
return &counter->failcnt;
+ case RES_FOR_CHILDREN:
+ return &counter->for_children;
};
BUG();
@@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
ssize_t res_counter_write(struct res_counter *counter, int member,
const char __user *userbuf, size_t nbytes, loff_t *pos,
- int (*write_strategy)(char *st_buf, unsigned long long *val))
+ int (*write_strategy)(char *st_buf, unsigned long long *val),
+ int (*set_strategy)(struct res_counter *res,
+ unsigned long long val, int what))
{
int ret;
char *buf, *end;
@@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
if (*end != '\0')
goto out_free;
}
- spin_lock_irqsave(&counter->lock, flags);
- val = res_counter_member(counter, member);
- *val = tmp;
- spin_unlock_irqrestore(&counter->lock, flags);
- ret = nbytes;
+ if (set_strategy) {
+ ret = set_strategy(res, tmp, member);
+ if (!ret)
+ ret = nbytes;
+ } else {
+ spin_lock_irqsave(&counter->lock, flags);
+ val = res_counter_member(counter, member);
+ *val = tmp;
+ spin_unlock_irqrestore(&counter->lock, flags);
+ ret = nbytes;
+ }
out_free:
kfree(buf);
out:
return ret;
}
+
+
+int res_counter_move_resource(struct res_counter *child,
+ unsigned long long val,
+ int (*callback)(struct res_counter *res, unsigned long long val),
+ int retry)
+{
+ struct res_counter *parent = child->parent;
+ unsigned long flags;
+
+ BUG_ON(!parent);
+
+ while (1) {
+ spin_lock_irqsave(&parent->lock, flags);
+ if (parent->usage + val < parent->limit) {
+ parent->for_children += val;
+ parent->usage += val;
+ break;
+ }
+ spin_unlock_irqrestore(&parent->lock, flags);
+
+ if (!retry || !callback)
+ goto failed;
+ /* -1 means infinite loop */
+ if (retry != -1)
+ --retry;
+ yield();
+ callback(parent, val);
+ }
+ spin_unlock_irqrestore(&parent->lock, flags);
+
+ spin_lock_irqsave(&child->lock, flags);
+ child->limit += val;
+ spin_unlock_irqrestore(&child->lock, flags);
+ return 0;
+fail:
+ return 1;
+}
+
+
+int res_counter_return_resource(struct res_counter *child,
+ unsigned long long val,
+ int (*callback)(struct res_counter *res, unsigned long long val),
+ int retry)
+{
+ unsigned long flags;
+ struct res_counter *parent = child->parent;
+
+ BUG_ON(!parent);
+
+ while (1) {
+ spin_lock_irqsave(&child->lock, flags);
+ if (val == (unsigned long long) -1) {
+ val = child->limit;
+ child->limit = 0;
+ break;
+ } else if (child->usage <= child->limit - val) {
+ child->limit -= val;
+ break;
+ }
+ spin_unlock_irqrestore(&child->lock, flags);
+
+ if (!retry)
+ goto fail;
+ /* -1 means infinite loop */
+ if (retry != -1)
+ --retry;
+ yield();
+ callback(parent, val);
+ }
+ spin_unlock_irqrestore(&child->lock, flags);
+
+ spin_lock_irqsave(&parent->lock, flags);
+ BUG_ON(parent->for_children < val);
+ BUG_ON(parent->usage < val);
+ parent->for_children -= val;
+ parent->usage -= val;
+ spin_unlock_irqrestore(&parent->lock, flags);
+ return 0;
+fail:
+ return 1;
+}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 5:01 ` KAMEZAWA Hiroyuki
@ 2008-06-04 6:54 ` Li Zefan
-1 siblings, 0 replies; 52+ messages in thread
From: Li Zefan @ 2008-06-04 6:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -44,6 +44,13 @@ to work with it.
> Protects changes of the above values.
>
>
> + f. struct res_counter *parent
> +
> + Parent res_counter under hierarchy.
> +
> + g. unsigned long long for_children
> +
> + Resources assigned to children. This is included in usage.
Since parent and for_children are also protected by res_count->lock,
the above text should appear before 'e. spinlock_t lock'.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 6:54 ` Li Zefan
0 siblings, 0 replies; 52+ messages in thread
From: Li Zefan @ 2008-06-04 6:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -44,6 +44,13 @@ to work with it.
> Protects changes of the above values.
>
>
> + f. struct res_counter *parent
> +
> + Parent res_counter under hierarchy.
> +
> + g. unsigned long long for_children
> +
> + Resources assigned to children. This is included in usage.
Since parent and for_children are also protected by res_count->lock,
the above text should appear before 'e. spinlock_t lock'.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 6:54 ` Li Zefan
@ 2008-06-04 7:03 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 7:03 UTC (permalink / raw)
To: Li Zefan
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 04 Jun 2008 14:54:12 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -44,6 +44,13 @@ to work with it.
> > Protects changes of the above values.
> >
> >
> > + f. struct res_counter *parent
> > +
> > + Parent res_counter under hierarchy.
> > +
> > + g. unsigned long long for_children
> > +
> > + Resources assigned to children. This is included in usage.
>
> Since parent and for_children are also protected by res_count->lock,
> the above text should appear before 'e. spinlock_t lock'.
>
ok.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 7:03 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 7:03 UTC (permalink / raw)
To: Li Zefan
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 04 Jun 2008 14:54:12 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -44,6 +44,13 @@ to work with it.
> > Protects changes of the above values.
> >
> >
> > + f. struct res_counter *parent
> > +
> > + Parent res_counter under hierarchy.
> > +
> > + g. unsigned long long for_children
> > +
> > + Resources assigned to children. This is included in usage.
>
> Since parent and for_children are also protected by res_count->lock,
> the above text should appear before 'e. spinlock_t lock'.
>
ok.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 5:01 ` KAMEZAWA Hiroyuki
@ 2008-06-04 7:20 ` YAMAMOTO Takashi
-1 siblings, 0 replies; 52+ messages in thread
From: YAMAMOTO Takashi @ 2008-06-04 7:20 UTC (permalink / raw)
To: kamezawa.hiroyu; +Cc: linux-mm, linux-kernel, balbir, menage, xemul
> ssize_t res_counter_write(struct res_counter *counter, int member,
> - const char __user *buf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *buf, unsigned long long *val));
> + const char __user *buf, size_t nbytes, loff_t *pos,
> + int (*write_strategy)(char *buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> + int what),
this comma seems surplus.
> + );
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + callback(parent, val);
s/parent/child/
YAMAMOTO Takashi
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 7:20 ` YAMAMOTO Takashi
0 siblings, 0 replies; 52+ messages in thread
From: YAMAMOTO Takashi @ 2008-06-04 7:20 UTC (permalink / raw)
To: kamezawa.hiroyu; +Cc: linux-mm, linux-kernel, balbir, menage, xemul
> ssize_t res_counter_write(struct res_counter *counter, int member,
> - const char __user *buf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *buf, unsigned long long *val));
> + const char __user *buf, size_t nbytes, loff_t *pos,
> + int (*write_strategy)(char *buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> + int what),
this comma seems surplus.
> + );
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + callback(parent, val);
s/parent/child/
YAMAMOTO Takashi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 7:20 ` YAMAMOTO Takashi
@ 2008-06-04 7:32 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 7:32 UTC (permalink / raw)
To: YAMAMOTO Takashi; +Cc: linux-mm, linux-kernel, balbir, menage, xemul
On Wed, 4 Jun 2008 16:20:48 +0900 (JST)
yamamoto@valinux.co.jp (YAMAMOTO Takashi) wrote:
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > - const char __user *buf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *buf, unsigned long long *val));
> > + const char __user *buf, size_t nbytes, loff_t *pos,
> > + int (*write_strategy)(char *buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> > + int what),
>
> this comma seems surplus.
>
Ouch, I thought I fixed this...maybe patch reflesh trouble. Thanks.
> > + );
>
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry)
> > +{
>
> > + callback(parent, val);
>
> s/parent/child/
>
Hmm..yes.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 7:32 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 7:32 UTC (permalink / raw)
To: YAMAMOTO Takashi; +Cc: linux-mm, linux-kernel, balbir, menage, xemul
On Wed, 4 Jun 2008 16:20:48 +0900 (JST)
yamamoto@valinux.co.jp (YAMAMOTO Takashi) wrote:
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > - const char __user *buf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *buf, unsigned long long *val));
> > + const char __user *buf, size_t nbytes, loff_t *pos,
> > + int (*write_strategy)(char *buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> > + int what),
>
> this comma seems surplus.
>
Ouch, I thought I fixed this...maybe patch reflesh trouble. Thanks.
> > + );
>
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry)
> > +{
>
> > + callback(parent, val);
>
> s/parent/child/
>
Hmm..yes.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 5:01 ` KAMEZAWA Hiroyuki
@ 2008-06-04 8:59 ` Paul Menage
-1 siblings, 0 replies; 52+ messages in thread
From: Paul Menage @ 2008-06-04 8:59 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, yamamoto@valinux.co.jp
On Tue, Jun 3, 2008 at 10:01 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
I think that the hierarchy/reclaim handling that you currently have in
the memory controller should be here; the memory controller should
just be able to pass a reference to try_to_free_mem_cgroup_pages() and
have everything else handled by res_counter.
Paul
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 8:59 ` Paul Menage
0 siblings, 0 replies; 52+ messages in thread
From: Paul Menage @ 2008-06-04 8:59 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, yamamoto@valinux.co.jp
On Tue, Jun 3, 2008 at 10:01 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
I think that the hierarchy/reclaim handling that you currently have in
the memory controller should be here; the memory controller should
just be able to pass a reference to try_to_free_mem_cgroup_pages() and
have everything else handled by res_counter.
Paul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 8:59 ` Paul Menage
@ 2008-06-04 9:18 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 9:18 UTC (permalink / raw)
To: Paul Menage
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 4 Jun 2008 01:59:31 -0700
"Paul Menage" <menage@google.com> wrote:
> On Tue, Jun 3, 2008 at 10:01 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > int ret;
> > char *buf, *end;
> > @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> > if (*end != '\0')
> > goto out_free;
> > }
> > - spin_lock_irqsave(&counter->lock, flags);
> > - val = res_counter_member(counter, member);
> > - *val = tmp;
> > - spin_unlock_irqrestore(&counter->lock, flags);
> > - ret = nbytes;
> > + if (set_strategy) {
> > + ret = set_strategy(res, tmp, member);
> > + if (!ret)
> > + ret = nbytes;
> > + } else {
> > + spin_lock_irqsave(&counter->lock, flags);
> > + val = res_counter_member(counter, member);
> > + *val = tmp;
> > + spin_unlock_irqrestore(&counter->lock, flags);
> > + ret = nbytes;
> > + }
>
> I think that the hierarchy/reclaim handling that you currently have in
> the memory controller should be here; the memory controller should
> just be able to pass a reference to try_to_free_mem_cgroup_pages() and
> have everything else handled by res_counter.
>
Sounds reasonable. I'll re-design the whole AMAP. I think I can do more.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-04 9:18 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04 9:18 UTC (permalink / raw)
To: Paul Menage
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 4 Jun 2008 01:59:31 -0700
"Paul Menage" <menage@google.com> wrote:
> On Tue, Jun 3, 2008 at 10:01 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > int ret;
> > char *buf, *end;
> > @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> > if (*end != '\0')
> > goto out_free;
> > }
> > - spin_lock_irqsave(&counter->lock, flags);
> > - val = res_counter_member(counter, member);
> > - *val = tmp;
> > - spin_unlock_irqrestore(&counter->lock, flags);
> > - ret = nbytes;
> > + if (set_strategy) {
> > + ret = set_strategy(res, tmp, member);
> > + if (!ret)
> > + ret = nbytes;
> > + } else {
> > + spin_lock_irqsave(&counter->lock, flags);
> > + val = res_counter_member(counter, member);
> > + *val = tmp;
> > + spin_unlock_irqrestore(&counter->lock, flags);
> > + ret = nbytes;
> > + }
>
> I think that the hierarchy/reclaim handling that you currently have in
> the memory controller should be here; the memory controller should
> just be able to pass a reference to try_to_free_mem_cgroup_pages() and
> have everything else handled by res_counter.
>
Sounds reasonable. I'll re-design the whole AMAP. I think I can do more.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 5:01 ` KAMEZAWA Hiroyuki
@ 2008-06-09 9:48 ` Balbir Singh
-1 siblings, 0 replies; 52+ messages in thread
From: Balbir Singh @ 2008-06-09 9:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
KAMEZAWA Hiroyuki wrote:
> A simple hard-wall hierarhcy support for res_counter.
>
> Changelog v2->v3
> - changed the name and arguments of functions.
> - rewrote to be read easily.
> - named as HardWall hierarchy.
>
> This implements following model
> - A cgroup's tree means hierarchy of resource.
> - All child's resource is moved from its parents.
> - The resource moved to children is charged as parent's usage.
> - The resource moves when child->limit is changed.
> - The sum of resource for children and its own usage is limited by "limit".
>
> This implies
> - No dynamic automatic hierarhcy balancing in the kernel.
> - Each resource is isolated completely.
> - The kernel just supports resource-move-at-change-in-limit.
> - The user (middle-ware) is responsible to make hierarhcy balanced well.
We'd definitely like to see a user level tool/application as a demo of how this
can be achieved.
> Good balance can be achieved by changing limit from user land.
>
>
> Background:
> Recently, there are popular resource isolation technique widely used,
> i.e. Hardware-Virtualization. We can do hierarchical resource isolation
> by using cgroup on it. But supporting hierarchy management in croups
> has some advantages of performance, unity and costs of management.
>
> There are good resource management in other OSs, they support some kind of
> hierarchical resource management. We wonder what kind of hierarchy policy
> is good for Linux. And there is an another point. Hierarchical system can be
> implemented by the kernel and user-land co-operation. So, there are various
> choices to do in the kernel. Doing all in the kernel or export some proper
> interfaces to the user-land. Middle-wares are tend to be used for management.
> I hope there will be Open Source one.
>
> At supporting hierarchy in cgroup, several aspects of characteristics of
> policy of hierarchy can be considered. Some needs automatic balancing
> between several groups.
>
> - fairness ... how fairness is kept under policy
>
> - performance ... should be _fast_. multi-level resource balancing tend
> to use much amount of CPU and can cause soft lockup.
>
> - predictability ... resource management are usually used for resource
> isolation. the kernel must not break the isolation and
> predictability of users against application's progress.
>
> - flexibility ... some sophisticated dynamic resource balancing with
> soft-limit is welcomed when the user doesn't want strict
> resource isolation or when the user cannot estimate how much
> they want correctly.
Soft limits has been on my plate for a while now. I'll take a crack at it. At
the moment the statistics is a bit of a worry, since users/administrators need
good statistics to take further action.
>
> Hard Wall Hierarchy.
>
> This patch implements a hard-wall model of hierarchy for resources.
> Works well for users who want strict resource isolation.
>
> This model allows the move of resource only between a parent and its children.
> The resource is moved to a child when it declares the amount of resources to be
> used. (by limit)
The other reason for preferring a shares based approach is that, the it will be
more in line with the CPU controllers interfaces.
> Automatic resource balancing is not supported in this code.
> (But users can do non-automatic by changing limit dynamically.)
>
> - fairness ... good. no resource sharing. works as specified by users.
> - performance ... good. each resources are capsuled to its own level.
> - predictability ... good. resources are completely isolated. balancing only
> occurs at the event of changes in limit.
> - flexibility ... bad. no flexibility and scheduling in the kernel level.
> need middle-ware's help.
>
> Considerations:
> - This implementation uses "limit" == "current_available_resource".
> This should be revisited when Soft-Limit one is implemented.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
> Documentation/controllers/resource_counter.txt | 41 +++++++++
> include/linux/res_counter.h | 90 +++++++++++++++++++-
> kernel/res_counter.c | 112 +++++++++++++++++++++++--
> 3 files changed, 235 insertions(+), 8 deletions(-)
>
> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> @@ -38,6 +38,16 @@ struct res_counter {
> * the number of unsuccessful attempts to consume the resource
> */
> unsigned long long failcnt;
> +
> + /*
> + * hierarchy support: the parent of this resource.
> + */
> + struct res_counter *parent;
> + /*
> + * the amount of resources assigned to children.
> + */
> + unsigned long long for_children;
> +
I would prefer to use a better name, lent_out? reserved_for_children?
borrowed_by_children?
> /*
> * the lock to protect all of the above.
> * the routines below consider this to be IRQ-safe
> @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
> ssize_t res_counter_read(struct res_counter *counter, int member,
> const char __user *buf, size_t nbytes, loff_t *pos,
> int (*read_strategy)(unsigned long long val, char *s));
> +
> +/*
> + * An interface for setting res_counter's member (ex. limit)
> + * the new parameter is passed by *buf and translated by write_strategy().
> + * Then, it is applied to member under the control of set_strategy().
> + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
> + */
> +
> ssize_t res_counter_write(struct res_counter *counter, int member,
> - const char __user *buf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *buf, unsigned long long *val));
> + const char __user *buf, size_t nbytes, loff_t *pos,
> + int (*write_strategy)(char *buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> + int what),
> + );
>
> /*
> * the field descriptors. one for each member of res_counter
> @@ -76,15 +97,33 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_FOR_CHILDREN,
RES_BORROWED? RES_BORROWED_BY_CHILDREN?
> };
>
> /*
> * helpers for accounting
> */
>
> +/*
> + * initialize res_counter.
> + * @counter : the counter
> + *
> + * initialize res_counter and set default limit to very big value(unlimited)
> + */
> +
> void res_counter_init(struct res_counter *counter);
>
> /*
> + * initialize res_counter under hierarchy.
> + * @counter : the counter
> + * @parent : the parent of the counter
> + *
> + * initialize res_counter and set default limit to 0. and set "parent".
> + */
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent);
> +
> +/*
> * charge - try to consume more resource.
> *
> * @counter: the counter
> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> cnt->failcnt = 0;
> spin_unlock_irqrestore(&cnt->lock, flags);
> }
> +
> +/**
> + * Move resources from a parent to a child.
> + * At success,
> + * parent->usage += val.
> + * parent->for_children += val.
> + * child->limit += val.
> + *
> + * @child: an entity to set res->limit. The parent is child->parent.
> + * @val: the amount of resource to be moved.
> + * @callback: called when the parent's free resource is not enough to be moved.
> + * this can be NULL if no callback is necessary.
> + * @retry: limit for the number of trying to callback.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success, !0 at failure.
> + *
> + * The callback returns 0 at success, !0 at failure.
> + *
> + */
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> +
> +/**
> + * Return resource to its parent.
> + * At success,
> + * parent->usage -= val.
> + * parent->for_children -= val.
> + * child->limit -= val.
> + *
> + * @child: entry to resize. The parent is child->parent.
> + * @val : How much does child repay to parent ? -1 means 'all'
> + * @callback: A callback for decreasing resource usage of child before
> + * returning. If NULL, just deceases child's limit.
> + * @retry: # of retries at calling callback for freeing resource.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success.
> + */
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> #endif
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -44,6 +44,13 @@ to work with it.
> Protects changes of the above values.
>
>
> + f. struct res_counter *parent
> +
> + Parent res_counter under hierarchy.
> +
> + g. unsigned long long for_children
> +
> + Resources assigned to children. This is included in usage.
>
> 2. Basic accounting routines
>
> @@ -179,3 +186,37 @@ counter fields. They are recommended to
> still can help with it).
>
> c. Compile and run :)
> +
> +
> +6. Hierarchy
> + a. No Hierarchy
> + each cgroup can use its own private resource.
> +
> + b. Hard-wall Hierarhcy
> + A simple hierarchical tree system for resource isolation.
> + Allows moving resources only between a parent and its children.
> + A parent can move its resource to children and remember the amount to
> + for_children member. A child can get new resource only from its parent.
> + Limit of a child is the amount of resource which is moved from its parent.
> +
OK, after reading this I am totally sure I want a shares based interface. Limits
are not shared like this.
A child and a parent should both be capable of having a limit of 1G, but they
could use different shares factors to govern, how much each children will get.
Doing it this way, breaks limit semantics.
> + When add "val" to a child,
> + parent->usage += val
> + parent->for_children += val
> + child->limit += val
> + When a child returns its resource
> + parent->usage -= val
> + parent->for_children -= val
> + child->limit -= val.
> +
> + This implements resource isolation among each group. This works very well
> + when you want to use strict resource isolation.
> +
> + Usage Hint:
> + This seems for static resource assignment but dynamic resource re-assignment
> + can be done by resetting "limit" of groups. When you consider "limit" as
> + the amount of allowed _current_ resource, a sophisticated resource management
> + system based on strict resource isolation can be implemented.
> +
> +c. Soft-wall Hierarchy
> + TBD.
> +
> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> counter->limit = (unsigned long long)LLONG_MAX;
> }
>
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent)
> +{
> + spin_lock_init(&counter->lock);
> + counter->limit = 0;
> + counter->parent = parent;
> +}
> +
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> {
> if (counter->usage + val > counter->limit) {
> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_FOR_CHILDREN:
> + return &counter->for_children;
> };
>
> BUG();
> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>
> ssize_t res_counter_write(struct res_counter *counter, int member,
> const char __user *userbuf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *st_buf, unsigned long long *val))
> + int (*write_strategy)(char *st_buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res,
> + unsigned long long val, int what))
> {
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
I'm afraid, I don't understand the set_strategy and it's purpose.
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
> out_free:
> kfree(buf);
> out:
> return ret;
> }
> +
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + struct res_counter *parent = child->parent;
> + unsigned long flags;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&parent->lock, flags);
> + if (parent->usage + val < parent->limit) {
> + parent->for_children += val;
> + parent->usage += val;
> + break;
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + if (!retry || !callback)
> + goto failed;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
I don't like the idea of spinning in an infinite loop, I would prefer to fail
things instead of burning CPU cycles.
> + yield();
> + callback(parent, val);
This code is not very understandable. Why do we yield before callback?
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + spin_lock_irqsave(&child->lock, flags);
> + child->limit += val;
> + spin_unlock_irqrestore(&child->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> +
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + unsigned long flags;
> + struct res_counter *parent = child->parent;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&child->lock, flags);
> + if (val == (unsigned long long) -1) {
> + val = child->limit;
> + child->limit = 0;
> + break;
> + } else if (child->usage <= child->limit - val) {
> + child->limit -= val;
> + break;
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + if (!retry)
> + goto fail;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
Ditto comments as above.
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + spin_lock_irqsave(&parent->lock, flags);
> + BUG_ON(parent->for_children < val);
> + BUG_ON(parent->usage < val);
> + parent->for_children -= val;
> + parent->usage -= val;
> + spin_unlock_irqrestore(&parent->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-09 9:48 ` Balbir Singh
0 siblings, 0 replies; 52+ messages in thread
From: Balbir Singh @ 2008-06-09 9:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
KAMEZAWA Hiroyuki wrote:
> A simple hard-wall hierarhcy support for res_counter.
>
> Changelog v2->v3
> - changed the name and arguments of functions.
> - rewrote to be read easily.
> - named as HardWall hierarchy.
>
> This implements following model
> - A cgroup's tree means hierarchy of resource.
> - All child's resource is moved from its parents.
> - The resource moved to children is charged as parent's usage.
> - The resource moves when child->limit is changed.
> - The sum of resource for children and its own usage is limited by "limit".
>
> This implies
> - No dynamic automatic hierarhcy balancing in the kernel.
> - Each resource is isolated completely.
> - The kernel just supports resource-move-at-change-in-limit.
> - The user (middle-ware) is responsible to make hierarhcy balanced well.
We'd definitely like to see a user level tool/application as a demo of how this
can be achieved.
> Good balance can be achieved by changing limit from user land.
>
>
> Background:
> Recently, there are popular resource isolation technique widely used,
> i.e. Hardware-Virtualization. We can do hierarchical resource isolation
> by using cgroup on it. But supporting hierarchy management in croups
> has some advantages of performance, unity and costs of management.
>
> There are good resource management in other OSs, they support some kind of
> hierarchical resource management. We wonder what kind of hierarchy policy
> is good for Linux. And there is an another point. Hierarchical system can be
> implemented by the kernel and user-land co-operation. So, there are various
> choices to do in the kernel. Doing all in the kernel or export some proper
> interfaces to the user-land. Middle-wares are tend to be used for management.
> I hope there will be Open Source one.
>
> At supporting hierarchy in cgroup, several aspects of characteristics of
> policy of hierarchy can be considered. Some needs automatic balancing
> between several groups.
>
> - fairness ... how fairness is kept under policy
>
> - performance ... should be _fast_. multi-level resource balancing tend
> to use much amount of CPU and can cause soft lockup.
>
> - predictability ... resource management are usually used for resource
> isolation. the kernel must not break the isolation and
> predictability of users against application's progress.
>
> - flexibility ... some sophisticated dynamic resource balancing with
> soft-limit is welcomed when the user doesn't want strict
> resource isolation or when the user cannot estimate how much
> they want correctly.
Soft limits has been on my plate for a while now. I'll take a crack at it. At
the moment the statistics is a bit of a worry, since users/administrators need
good statistics to take further action.
>
> Hard Wall Hierarchy.
>
> This patch implements a hard-wall model of hierarchy for resources.
> Works well for users who want strict resource isolation.
>
> This model allows the move of resource only between a parent and its children.
> The resource is moved to a child when it declares the amount of resources to be
> used. (by limit)
The other reason for preferring a shares based approach is that, the it will be
more in line with the CPU controllers interfaces.
> Automatic resource balancing is not supported in this code.
> (But users can do non-automatic by changing limit dynamically.)
>
> - fairness ... good. no resource sharing. works as specified by users.
> - performance ... good. each resources are capsuled to its own level.
> - predictability ... good. resources are completely isolated. balancing only
> occurs at the event of changes in limit.
> - flexibility ... bad. no flexibility and scheduling in the kernel level.
> need middle-ware's help.
>
> Considerations:
> - This implementation uses "limit" == "current_available_resource".
> This should be revisited when Soft-Limit one is implemented.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
> Documentation/controllers/resource_counter.txt | 41 +++++++++
> include/linux/res_counter.h | 90 +++++++++++++++++++-
> kernel/res_counter.c | 112 +++++++++++++++++++++++--
> 3 files changed, 235 insertions(+), 8 deletions(-)
>
> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> @@ -38,6 +38,16 @@ struct res_counter {
> * the number of unsuccessful attempts to consume the resource
> */
> unsigned long long failcnt;
> +
> + /*
> + * hierarchy support: the parent of this resource.
> + */
> + struct res_counter *parent;
> + /*
> + * the amount of resources assigned to children.
> + */
> + unsigned long long for_children;
> +
I would prefer to use a better name, lent_out? reserved_for_children?
borrowed_by_children?
> /*
> * the lock to protect all of the above.
> * the routines below consider this to be IRQ-safe
> @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
> ssize_t res_counter_read(struct res_counter *counter, int member,
> const char __user *buf, size_t nbytes, loff_t *pos,
> int (*read_strategy)(unsigned long long val, char *s));
> +
> +/*
> + * An interface for setting res_counter's member (ex. limit)
> + * the new parameter is passed by *buf and translated by write_strategy().
> + * Then, it is applied to member under the control of set_strategy().
> + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
> + */
> +
> ssize_t res_counter_write(struct res_counter *counter, int member,
> - const char __user *buf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *buf, unsigned long long *val));
> + const char __user *buf, size_t nbytes, loff_t *pos,
> + int (*write_strategy)(char *buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> + int what),
> + );
>
> /*
> * the field descriptors. one for each member of res_counter
> @@ -76,15 +97,33 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_FOR_CHILDREN,
RES_BORROWED? RES_BORROWED_BY_CHILDREN?
> };
>
> /*
> * helpers for accounting
> */
>
> +/*
> + * initialize res_counter.
> + * @counter : the counter
> + *
> + * initialize res_counter and set default limit to very big value(unlimited)
> + */
> +
> void res_counter_init(struct res_counter *counter);
>
> /*
> + * initialize res_counter under hierarchy.
> + * @counter : the counter
> + * @parent : the parent of the counter
> + *
> + * initialize res_counter and set default limit to 0. and set "parent".
> + */
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent);
> +
> +/*
> * charge - try to consume more resource.
> *
> * @counter: the counter
> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> cnt->failcnt = 0;
> spin_unlock_irqrestore(&cnt->lock, flags);
> }
> +
> +/**
> + * Move resources from a parent to a child.
> + * At success,
> + * parent->usage += val.
> + * parent->for_children += val.
> + * child->limit += val.
> + *
> + * @child: an entity to set res->limit. The parent is child->parent.
> + * @val: the amount of resource to be moved.
> + * @callback: called when the parent's free resource is not enough to be moved.
> + * this can be NULL if no callback is necessary.
> + * @retry: limit for the number of trying to callback.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success, !0 at failure.
> + *
> + * The callback returns 0 at success, !0 at failure.
> + *
> + */
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> +
> +/**
> + * Return resource to its parent.
> + * At success,
> + * parent->usage -= val.
> + * parent->for_children -= val.
> + * child->limit -= val.
> + *
> + * @child: entry to resize. The parent is child->parent.
> + * @val : How much does child repay to parent ? -1 means 'all'
> + * @callback: A callback for decreasing resource usage of child before
> + * returning. If NULL, just deceases child's limit.
> + * @retry: # of retries at calling callback for freeing resource.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success.
> + */
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> #endif
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -44,6 +44,13 @@ to work with it.
> Protects changes of the above values.
>
>
> + f. struct res_counter *parent
> +
> + Parent res_counter under hierarchy.
> +
> + g. unsigned long long for_children
> +
> + Resources assigned to children. This is included in usage.
>
> 2. Basic accounting routines
>
> @@ -179,3 +186,37 @@ counter fields. They are recommended to
> still can help with it).
>
> c. Compile and run :)
> +
> +
> +6. Hierarchy
> + a. No Hierarchy
> + each cgroup can use its own private resource.
> +
> + b. Hard-wall Hierarhcy
> + A simple hierarchical tree system for resource isolation.
> + Allows moving resources only between a parent and its children.
> + A parent can move its resource to children and remember the amount to
> + for_children member. A child can get new resource only from its parent.
> + Limit of a child is the amount of resource which is moved from its parent.
> +
OK, after reading this I am totally sure I want a shares based interface. Limits
are not shared like this.
A child and a parent should both be capable of having a limit of 1G, but they
could use different shares factors to govern, how much each children will get.
Doing it this way, breaks limit semantics.
> + When add "val" to a child,
> + parent->usage += val
> + parent->for_children += val
> + child->limit += val
> + When a child returns its resource
> + parent->usage -= val
> + parent->for_children -= val
> + child->limit -= val.
> +
> + This implements resource isolation among each group. This works very well
> + when you want to use strict resource isolation.
> +
> + Usage Hint:
> + This seems for static resource assignment but dynamic resource re-assignment
> + can be done by resetting "limit" of groups. When you consider "limit" as
> + the amount of allowed _current_ resource, a sophisticated resource management
> + system based on strict resource isolation can be implemented.
> +
> +c. Soft-wall Hierarchy
> + TBD.
> +
> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> counter->limit = (unsigned long long)LLONG_MAX;
> }
>
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent)
> +{
> + spin_lock_init(&counter->lock);
> + counter->limit = 0;
> + counter->parent = parent;
> +}
> +
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> {
> if (counter->usage + val > counter->limit) {
> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_FOR_CHILDREN:
> + return &counter->for_children;
> };
>
> BUG();
> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>
> ssize_t res_counter_write(struct res_counter *counter, int member,
> const char __user *userbuf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *st_buf, unsigned long long *val))
> + int (*write_strategy)(char *st_buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res,
> + unsigned long long val, int what))
> {
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
I'm afraid, I don't understand the set_strategy and it's purpose.
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
> out_free:
> kfree(buf);
> out:
> return ret;
> }
> +
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + struct res_counter *parent = child->parent;
> + unsigned long flags;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&parent->lock, flags);
> + if (parent->usage + val < parent->limit) {
> + parent->for_children += val;
> + parent->usage += val;
> + break;
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + if (!retry || !callback)
> + goto failed;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
I don't like the idea of spinning in an infinite loop, I would prefer to fail
things instead of burning CPU cycles.
> + yield();
> + callback(parent, val);
This code is not very understandable. Why do we yield before callback?
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + spin_lock_irqsave(&child->lock, flags);
> + child->limit += val;
> + spin_unlock_irqrestore(&child->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> +
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + unsigned long flags;
> + struct res_counter *parent = child->parent;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&child->lock, flags);
> + if (val == (unsigned long long) -1) {
> + val = child->limit;
> + child->limit = 0;
> + break;
> + } else if (child->usage <= child->limit - val) {
> + child->limit -= val;
> + break;
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + if (!retry)
> + goto fail;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
Ditto comments as above.
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + spin_lock_irqsave(&parent->lock, flags);
> + BUG_ON(parent->for_children < val);
> + BUG_ON(parent->usage < val);
> + parent->for_children -= val;
> + parent->usage -= val;
> + spin_unlock_irqrestore(&parent->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-09 9:48 ` Balbir Singh
@ 2008-06-09 10:20 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-09 10:20 UTC (permalink / raw)
To: balbir
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
On Mon, 09 Jun 2008 15:18:47 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > A simple hard-wall hierarhcy support for res_counter.
> >
> > Changelog v2->v3
> > - changed the name and arguments of functions.
> > - rewrote to be read easily.
> > - named as HardWall hierarchy.
> >
> > This implements following model
> > - A cgroup's tree means hierarchy of resource.
> > - All child's resource is moved from its parents.
> > - The resource moved to children is charged as parent's usage.
> > - The resource moves when child->limit is changed.
> > - The sum of resource for children and its own usage is limited by "limit".
> >
> > This implies
> > - No dynamic automatic hierarhcy balancing in the kernel.
> > - Each resource is isolated completely.
> > - The kernel just supports resource-move-at-change-in-limit.
> > - The user (middle-ware) is responsible to make hierarhcy balanced well.
>
> We'd definitely like to see a user level tool/application as a demo of how this
> can be achieved.
>
I don't have one, now. I'll write one when I have time. Need now ?
Hmm...maybe I(we) need some more patches to implement useful statistics,
notifier to middlewares.
> > Good balance can be achieved by changing limit from user land.
> >
> >
> > Background:
> > Recently, there are popular resource isolation technique widely used,
> > i.e. Hardware-Virtualization. We can do hierarchical resource isolation
> > by using cgroup on it. But supporting hierarchy management in croups
> > has some advantages of performance, unity and costs of management.
> >
> > There are good resource management in other OSs, they support some kind of
> > hierarchical resource management. We wonder what kind of hierarchy policy
> > is good for Linux. And there is an another point. Hierarchical system can be
> > implemented by the kernel and user-land co-operation. So, there are various
> > choices to do in the kernel. Doing all in the kernel or export some proper
> > interfaces to the user-land. Middle-wares are tend to be used for management.
> > I hope there will be Open Source one.
> >
> > At supporting hierarchy in cgroup, several aspects of characteristics of
> > policy of hierarchy can be considered. Some needs automatic balancing
> > between several groups.
> >
> > - fairness ... how fairness is kept under policy
> >
> > - performance ... should be _fast_. multi-level resource balancing tend
> > to use much amount of CPU and can cause soft lockup.
> >
> > - predictability ... resource management are usually used for resource
> > isolation. the kernel must not break the isolation and
> > predictability of users against application's progress.
> >
> > - flexibility ... some sophisticated dynamic resource balancing with
> > soft-limit is welcomed when the user doesn't want strict
> > resource isolation or when the user cannot estimate how much
> > they want correctly.
>
> Soft limits has been on my plate for a while now. I'll take a crack at it. At
> the moment the statistics is a bit of a worry, since users/administrators need
> good statistics to take further action.
>
Yes, statistics is not enough now.
> >
> > Hard Wall Hierarchy.
> >
> > This patch implements a hard-wall model of hierarchy for resources.
> > Works well for users who want strict resource isolation.
> >
> > This model allows the move of resource only between a parent and its children.
> > The resource is moved to a child when it declares the amount of resources to be
> > used. (by limit)
>
> The other reason for preferring a shares based approach is that, the it will be
> more in line with the CPU controllers interfaces.
>
You have to think of the major difference of tha nature of CPU and Memory.
We have to reclaim the resource with some feedbacks among sevral cgroups.
But ok, if it's can be implemented in simple way.
I have no objections if cost is very low. My concern is only performance.
(and maintenance)
> > Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> > +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> > @@ -38,6 +38,16 @@ struct res_counter {
> > * the number of unsuccessful attempts to consume the resource
> > */
> > unsigned long long failcnt;
> > +
> > + /*
> > + * hierarchy support: the parent of this resource.
> > + */
> > + struct res_counter *parent;
> > + /*
> > + * the amount of resources assigned to children.
> > + */
> > + unsigned long long for_children;
> > +
>
> I would prefer to use a better name, lent_out? reserved_for_children?
> borrowed_by_children?
>
ok. use other names.
> > /*
> > * the lock to protect all of the above.
> > * the routines below consider this to be IRQ-safe
> > @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
> > ssize_t res_counter_read(struct res_counter *counter, int member,
> > const char __user *buf, size_t nbytes, loff_t *pos,
> > int (*read_strategy)(unsigned long long val, char *s));
> > +
> > +/*
> > + * An interface for setting res_counter's member (ex. limit)
> > + * the new parameter is passed by *buf and translated by write_strategy().
> > + * Then, it is applied to member under the control of set_strategy().
> > + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
> > + */
> > +
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > - const char __user *buf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *buf, unsigned long long *val));
> > + const char __user *buf, size_t nbytes, loff_t *pos,
> > + int (*write_strategy)(char *buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> > + int what),
> > + );
> >
> > /*
> > * the field descriptors. one for each member of res_counter
> > @@ -76,15 +97,33 @@ enum {
> > RES_MAX_USAGE,
> > RES_LIMIT,
> > RES_FAILCNT,
> > + RES_FOR_CHILDREN,
>
> RES_BORROWED? RES_BORROWED_BY_CHILDREN?
>
ok, again.
> > };
> >
> > /*
> > * helpers for accounting
> > */
> >
> > +/*
> > + * initialize res_counter.
> > + * @counter : the counter
> > + *
> > + * initialize res_counter and set default limit to very big value(unlimited)
> > + */
> > +
> > void res_counter_init(struct res_counter *counter);
> >
> > /*
> > + * initialize res_counter under hierarchy.
> > + * @counter : the counter
> > + * @parent : the parent of the counter
> > + *
> > + * initialize res_counter and set default limit to 0. and set "parent".
> > + */
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent);
> > +
> > +/*
> > * charge - try to consume more resource.
> > *
> > * @counter: the counter
> > @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> > cnt->failcnt = 0;
> > spin_unlock_irqrestore(&cnt->lock, flags);
> > }
> > +
> > +/**
> > + * Move resources from a parent to a child.
> > + * At success,
> > + * parent->usage += val.
> > + * parent->for_children += val.
> > + * child->limit += val.
> > + *
> > + * @child: an entity to set res->limit. The parent is child->parent.
> > + * @val: the amount of resource to be moved.
> > + * @callback: called when the parent's free resource is not enough to be moved.
> > + * this can be NULL if no callback is necessary.
> > + * @retry: limit for the number of trying to callback.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success, !0 at failure.
> > + *
> > + * The callback returns 0 at success, !0 at failure.
> > + *
> > + */
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > +
> > +/**
> > + * Return resource to its parent.
> > + * At success,
> > + * parent->usage -= val.
> > + * parent->for_children -= val.
> > + * child->limit -= val.
> > + *
> > + * @child: entry to resize. The parent is child->parent.
> > + * @val : How much does child repay to parent ? -1 means 'all'
> > + * @callback: A callback for decreasing resource usage of child before
> > + * returning. If NULL, just deceases child's limit.
> > + * @retry: # of retries at calling callback for freeing resource.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success.
> > + */
> > +
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > #endif
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -44,6 +44,13 @@ to work with it.
> > Protects changes of the above values.
> >
> >
> > + f. struct res_counter *parent
> > +
> > + Parent res_counter under hierarchy.
> > +
> > + g. unsigned long long for_children
> > +
> > + Resources assigned to children. This is included in usage.
> >
> > 2. Basic accounting routines
> >
> > @@ -179,3 +186,37 @@ counter fields. They are recommended to
> > still can help with it).
> >
> > c. Compile and run :)
> > +
> > +
> > +6. Hierarchy
> > + a. No Hierarchy
> > + each cgroup can use its own private resource.
> > +
> > + b. Hard-wall Hierarhcy
> > + A simple hierarchical tree system for resource isolation.
> > + Allows moving resources only between a parent and its children.
> > + A parent can move its resource to children and remember the amount to
> > + for_children member. A child can get new resource only from its parent.
> > + Limit of a child is the amount of resource which is moved from its parent.
> > +
>
> OK, after reading this I am totally sure I want a shares based interface. Limits
> are not shared like this.
>
> A child and a parent should both be capable of having a limit of 1G, but they
> could use different shares factors to govern, how much each children will get.
> Doing it this way, breaks limit semantics.
>
Not easy to use in my point of view. Can we use 'share' in proper way
on no-swap machine ?
>
> > + When add "val" to a child,
> > + parent->usage += val
> > + parent->for_children += val
> > + child->limit += val
> > + When a child returns its resource
> > + parent->usage -= val
> > + parent->for_children -= val
> > + child->limit -= val.
> > +
> > + This implements resource isolation among each group. This works very well
> > + when you want to use strict resource isolation.
> > +
> > + Usage Hint:
> > + This seems for static resource assignment but dynamic resource re-assignment
> > + can be done by resetting "limit" of groups. When you consider "limit" as
> > + the amount of allowed _current_ resource, a sophisticated resource management
> > + system based on strict resource isolation can be implemented.
> > +
> > +c. Soft-wall Hierarchy
> > + TBD.
> > +
> > Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> > +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> > @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> > counter->limit = (unsigned long long)LLONG_MAX;
> > }
> >
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent)
> > +{
> > + spin_lock_init(&counter->lock);
> > + counter->limit = 0;
> > + counter->parent = parent;
> > +}
> > +
> > int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> > {
> > if (counter->usage + val > counter->limit) {
> > @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> > return &counter->limit;
> > case RES_FAILCNT:
> > return &counter->failcnt;
> > + case RES_FOR_CHILDREN:
> > + return &counter->for_children;
> > };
> >
> > BUG();
> > @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
> >
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > const char __user *userbuf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *st_buf, unsigned long long *val))
> > + int (*write_strategy)(char *st_buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res,
> > + unsigned long long val, int what))
> > {
> > int ret;
> > char *buf, *end;
> > @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> > if (*end != '\0')
> > goto out_free;
> > }
> > - spin_lock_irqsave(&counter->lock, flags);
> > - val = res_counter_member(counter, member);
> > - *val = tmp;
> > - spin_unlock_irqrestore(&counter->lock, flags);
> > - ret = nbytes;
> > + if (set_strategy) {
> > + ret = set_strategy(res, tmp, member);
>
>
> I'm afraid, I don't understand the set_strategy and it's purpose.
>
Sorry. I'm now rewritten and removed this.
> > + if (!ret)
> > + ret = nbytes;
> > + } else {
> > + spin_lock_irqsave(&counter->lock, flags);
> > + val = res_counter_member(counter, member);
> > + *val = tmp;
> > + spin_unlock_irqrestore(&counter->lock, flags);
> > + ret = nbytes;
> > + }
> > out_free:
> > kfree(buf);
> > out:
> > return ret;
> > }
> > +
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry)
> > +{
> > + struct res_counter *parent = child->parent;
> > + unsigned long flags;
> > +
> > + BUG_ON(!parent);
> > +
> > + while (1) {
> > + spin_lock_irqsave(&parent->lock, flags);
> > + if (parent->usage + val < parent->limit) {
> > + parent->for_children += val;
> > + parent->usage += val;
> > + break;
> > + }
> > + spin_unlock_irqrestore(&parent->lock, flags);
> > +
> > + if (!retry || !callback)
> > + goto failed;
> > + /* -1 means infinite loop */
> > + if (retry != -1)
> > + --retry;
>
> I don't like the idea of spinning in an infinite loop, I would prefer to fail
> things instead of burning CPU cycles.
>
ok, will remove "-1" case.
> > + yield();
> > + callback(parent, val);
>
> This code is not very understandable. Why do we yield before callback?
>
yield() after callback() means that res_counter's state will be
far different from the state after callback.
So, we have to yield before call back and check res_coutner sooner.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-09 10:20 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-09 10:20 UTC (permalink / raw)
To: balbir
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
On Mon, 09 Jun 2008 15:18:47 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > A simple hard-wall hierarhcy support for res_counter.
> >
> > Changelog v2->v3
> > - changed the name and arguments of functions.
> > - rewrote to be read easily.
> > - named as HardWall hierarchy.
> >
> > This implements following model
> > - A cgroup's tree means hierarchy of resource.
> > - All child's resource is moved from its parents.
> > - The resource moved to children is charged as parent's usage.
> > - The resource moves when child->limit is changed.
> > - The sum of resource for children and its own usage is limited by "limit".
> >
> > This implies
> > - No dynamic automatic hierarhcy balancing in the kernel.
> > - Each resource is isolated completely.
> > - The kernel just supports resource-move-at-change-in-limit.
> > - The user (middle-ware) is responsible to make hierarhcy balanced well.
>
> We'd definitely like to see a user level tool/application as a demo of how this
> can be achieved.
>
I don't have one, now. I'll write one when I have time. Need now ?
Hmm...maybe I(we) need some more patches to implement useful statistics,
notifier to middlewares.
> > Good balance can be achieved by changing limit from user land.
> >
> >
> > Background:
> > Recently, there are popular resource isolation technique widely used,
> > i.e. Hardware-Virtualization. We can do hierarchical resource isolation
> > by using cgroup on it. But supporting hierarchy management in croups
> > has some advantages of performance, unity and costs of management.
> >
> > There are good resource management in other OSs, they support some kind of
> > hierarchical resource management. We wonder what kind of hierarchy policy
> > is good for Linux. And there is an another point. Hierarchical system can be
> > implemented by the kernel and user-land co-operation. So, there are various
> > choices to do in the kernel. Doing all in the kernel or export some proper
> > interfaces to the user-land. Middle-wares are tend to be used for management.
> > I hope there will be Open Source one.
> >
> > At supporting hierarchy in cgroup, several aspects of characteristics of
> > policy of hierarchy can be considered. Some needs automatic balancing
> > between several groups.
> >
> > - fairness ... how fairness is kept under policy
> >
> > - performance ... should be _fast_. multi-level resource balancing tend
> > to use much amount of CPU and can cause soft lockup.
> >
> > - predictability ... resource management are usually used for resource
> > isolation. the kernel must not break the isolation and
> > predictability of users against application's progress.
> >
> > - flexibility ... some sophisticated dynamic resource balancing with
> > soft-limit is welcomed when the user doesn't want strict
> > resource isolation or when the user cannot estimate how much
> > they want correctly.
>
> Soft limits has been on my plate for a while now. I'll take a crack at it. At
> the moment the statistics is a bit of a worry, since users/administrators need
> good statistics to take further action.
>
Yes, statistics is not enough now.
> >
> > Hard Wall Hierarchy.
> >
> > This patch implements a hard-wall model of hierarchy for resources.
> > Works well for users who want strict resource isolation.
> >
> > This model allows the move of resource only between a parent and its children.
> > The resource is moved to a child when it declares the amount of resources to be
> > used. (by limit)
>
> The other reason for preferring a shares based approach is that, the it will be
> more in line with the CPU controllers interfaces.
>
You have to think of the major difference of tha nature of CPU and Memory.
We have to reclaim the resource with some feedbacks among sevral cgroups.
But ok, if it's can be implemented in simple way.
I have no objections if cost is very low. My concern is only performance.
(and maintenance)
> > Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> > +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> > @@ -38,6 +38,16 @@ struct res_counter {
> > * the number of unsuccessful attempts to consume the resource
> > */
> > unsigned long long failcnt;
> > +
> > + /*
> > + * hierarchy support: the parent of this resource.
> > + */
> > + struct res_counter *parent;
> > + /*
> > + * the amount of resources assigned to children.
> > + */
> > + unsigned long long for_children;
> > +
>
> I would prefer to use a better name, lent_out? reserved_for_children?
> borrowed_by_children?
>
ok. use other names.
> > /*
> > * the lock to protect all of the above.
> > * the routines below consider this to be IRQ-safe
> > @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
> > ssize_t res_counter_read(struct res_counter *counter, int member,
> > const char __user *buf, size_t nbytes, loff_t *pos,
> > int (*read_strategy)(unsigned long long val, char *s));
> > +
> > +/*
> > + * An interface for setting res_counter's member (ex. limit)
> > + * the new parameter is passed by *buf and translated by write_strategy().
> > + * Then, it is applied to member under the control of set_strategy().
> > + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
> > + */
> > +
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > - const char __user *buf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *buf, unsigned long long *val));
> > + const char __user *buf, size_t nbytes, loff_t *pos,
> > + int (*write_strategy)(char *buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res, unsigned long long val,
> > + int what),
> > + );
> >
> > /*
> > * the field descriptors. one for each member of res_counter
> > @@ -76,15 +97,33 @@ enum {
> > RES_MAX_USAGE,
> > RES_LIMIT,
> > RES_FAILCNT,
> > + RES_FOR_CHILDREN,
>
> RES_BORROWED? RES_BORROWED_BY_CHILDREN?
>
ok, again.
> > };
> >
> > /*
> > * helpers for accounting
> > */
> >
> > +/*
> > + * initialize res_counter.
> > + * @counter : the counter
> > + *
> > + * initialize res_counter and set default limit to very big value(unlimited)
> > + */
> > +
> > void res_counter_init(struct res_counter *counter);
> >
> > /*
> > + * initialize res_counter under hierarchy.
> > + * @counter : the counter
> > + * @parent : the parent of the counter
> > + *
> > + * initialize res_counter and set default limit to 0. and set "parent".
> > + */
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent);
> > +
> > +/*
> > * charge - try to consume more resource.
> > *
> > * @counter: the counter
> > @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> > cnt->failcnt = 0;
> > spin_unlock_irqrestore(&cnt->lock, flags);
> > }
> > +
> > +/**
> > + * Move resources from a parent to a child.
> > + * At success,
> > + * parent->usage += val.
> > + * parent->for_children += val.
> > + * child->limit += val.
> > + *
> > + * @child: an entity to set res->limit. The parent is child->parent.
> > + * @val: the amount of resource to be moved.
> > + * @callback: called when the parent's free resource is not enough to be moved.
> > + * this can be NULL if no callback is necessary.
> > + * @retry: limit for the number of trying to callback.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success, !0 at failure.
> > + *
> > + * The callback returns 0 at success, !0 at failure.
> > + *
> > + */
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > +
> > +/**
> > + * Return resource to its parent.
> > + * At success,
> > + * parent->usage -= val.
> > + * parent->for_children -= val.
> > + * child->limit -= val.
> > + *
> > + * @child: entry to resize. The parent is child->parent.
> > + * @val : How much does child repay to parent ? -1 means 'all'
> > + * @callback: A callback for decreasing resource usage of child before
> > + * returning. If NULL, just deceases child's limit.
> > + * @retry: # of retries at calling callback for freeing resource.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success.
> > + */
> > +
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > #endif
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -44,6 +44,13 @@ to work with it.
> > Protects changes of the above values.
> >
> >
> > + f. struct res_counter *parent
> > +
> > + Parent res_counter under hierarchy.
> > +
> > + g. unsigned long long for_children
> > +
> > + Resources assigned to children. This is included in usage.
> >
> > 2. Basic accounting routines
> >
> > @@ -179,3 +186,37 @@ counter fields. They are recommended to
> > still can help with it).
> >
> > c. Compile and run :)
> > +
> > +
> > +6. Hierarchy
> > + a. No Hierarchy
> > + each cgroup can use its own private resource.
> > +
> > + b. Hard-wall Hierarhcy
> > + A simple hierarchical tree system for resource isolation.
> > + Allows moving resources only between a parent and its children.
> > + A parent can move its resource to children and remember the amount to
> > + for_children member. A child can get new resource only from its parent.
> > + Limit of a child is the amount of resource which is moved from its parent.
> > +
>
> OK, after reading this I am totally sure I want a shares based interface. Limits
> are not shared like this.
>
> A child and a parent should both be capable of having a limit of 1G, but they
> could use different shares factors to govern, how much each children will get.
> Doing it this way, breaks limit semantics.
>
Not easy to use in my point of view. Can we use 'share' in proper way
on no-swap machine ?
>
> > + When add "val" to a child,
> > + parent->usage += val
> > + parent->for_children += val
> > + child->limit += val
> > + When a child returns its resource
> > + parent->usage -= val
> > + parent->for_children -= val
> > + child->limit -= val.
> > +
> > + This implements resource isolation among each group. This works very well
> > + when you want to use strict resource isolation.
> > +
> > + Usage Hint:
> > + This seems for static resource assignment but dynamic resource re-assignment
> > + can be done by resetting "limit" of groups. When you consider "limit" as
> > + the amount of allowed _current_ resource, a sophisticated resource management
> > + system based on strict resource isolation can be implemented.
> > +
> > +c. Soft-wall Hierarchy
> > + TBD.
> > +
> > Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> > +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> > @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> > counter->limit = (unsigned long long)LLONG_MAX;
> > }
> >
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent)
> > +{
> > + spin_lock_init(&counter->lock);
> > + counter->limit = 0;
> > + counter->parent = parent;
> > +}
> > +
> > int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> > {
> > if (counter->usage + val > counter->limit) {
> > @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> > return &counter->limit;
> > case RES_FAILCNT:
> > return &counter->failcnt;
> > + case RES_FOR_CHILDREN:
> > + return &counter->for_children;
> > };
> >
> > BUG();
> > @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
> >
> > ssize_t res_counter_write(struct res_counter *counter, int member,
> > const char __user *userbuf, size_t nbytes, loff_t *pos,
> > - int (*write_strategy)(char *st_buf, unsigned long long *val))
> > + int (*write_strategy)(char *st_buf, unsigned long long *val),
> > + int (*set_strategy)(struct res_counter *res,
> > + unsigned long long val, int what))
> > {
> > int ret;
> > char *buf, *end;
> > @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> > if (*end != '\0')
> > goto out_free;
> > }
> > - spin_lock_irqsave(&counter->lock, flags);
> > - val = res_counter_member(counter, member);
> > - *val = tmp;
> > - spin_unlock_irqrestore(&counter->lock, flags);
> > - ret = nbytes;
> > + if (set_strategy) {
> > + ret = set_strategy(res, tmp, member);
>
>
> I'm afraid, I don't understand the set_strategy and it's purpose.
>
Sorry. I'm now rewritten and removed this.
> > + if (!ret)
> > + ret = nbytes;
> > + } else {
> > + spin_lock_irqsave(&counter->lock, flags);
> > + val = res_counter_member(counter, member);
> > + *val = tmp;
> > + spin_unlock_irqrestore(&counter->lock, flags);
> > + ret = nbytes;
> > + }
> > out_free:
> > kfree(buf);
> > out:
> > return ret;
> > }
> > +
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry)
> > +{
> > + struct res_counter *parent = child->parent;
> > + unsigned long flags;
> > +
> > + BUG_ON(!parent);
> > +
> > + while (1) {
> > + spin_lock_irqsave(&parent->lock, flags);
> > + if (parent->usage + val < parent->limit) {
> > + parent->for_children += val;
> > + parent->usage += val;
> > + break;
> > + }
> > + spin_unlock_irqrestore(&parent->lock, flags);
> > +
> > + if (!retry || !callback)
> > + goto failed;
> > + /* -1 means infinite loop */
> > + if (retry != -1)
> > + --retry;
>
> I don't like the idea of spinning in an infinite loop, I would prefer to fail
> things instead of burning CPU cycles.
>
ok, will remove "-1" case.
> > + yield();
> > + callback(parent, val);
>
> This code is not very understandable. Why do we yield before callback?
>
yield() after callback() means that res_counter's state will be
far different from the state after callback.
So, we have to yield before call back and check res_coutner sooner.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-09 10:20 ` KAMEZAWA Hiroyuki
@ 2008-06-09 10:37 ` Balbir Singh
-1 siblings, 0 replies; 52+ messages in thread
From: Balbir Singh @ 2008-06-09 10:37 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
KAMEZAWA Hiroyuki wrote:
> On Mon, 09 Jun 2008 15:18:47 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>> KAMEZAWA Hiroyuki wrote:
>>> A simple hard-wall hierarhcy support for res_counter.
>>>
>>> Changelog v2->v3
>>> - changed the name and arguments of functions.
>>> - rewrote to be read easily.
>>> - named as HardWall hierarchy.
>>>
>>> This implements following model
>>> - A cgroup's tree means hierarchy of resource.
>>> - All child's resource is moved from its parents.
>>> - The resource moved to children is charged as parent's usage.
>>> - The resource moves when child->limit is changed.
>>> - The sum of resource for children and its own usage is limited by "limit".
>>>
>>> This implies
>>> - No dynamic automatic hierarhcy balancing in the kernel.
>>> - Each resource is isolated completely.
>>> - The kernel just supports resource-move-at-change-in-limit.
>>> - The user (middle-ware) is responsible to make hierarhcy balanced well.
>> We'd definitely like to see a user level tool/application as a demo of how this
>> can be achieved.
>>
> I don't have one, now. I'll write one when I have time. Need now ?
> Hmm...maybe I(we) need some more patches to implement useful statistics,
> notifier to middlewares.
>
Yes, we need more useful statistics.
>
>
>>> Good balance can be achieved by changing limit from user land.
>>>
>>>
>>> Background:
>>> Recently, there are popular resource isolation technique widely used,
>>> i.e. Hardware-Virtualization. We can do hierarchical resource isolation
>>> by using cgroup on it. But supporting hierarchy management in croups
>>> has some advantages of performance, unity and costs of management.
>>>
>>> There are good resource management in other OSs, they support some kind of
>>> hierarchical resource management. We wonder what kind of hierarchy policy
>>> is good for Linux. And there is an another point. Hierarchical system can be
>>> implemented by the kernel and user-land co-operation. So, there are various
>>> choices to do in the kernel. Doing all in the kernel or export some proper
>>> interfaces to the user-land. Middle-wares are tend to be used for management.
>>> I hope there will be Open Source one.
>>>
>>> At supporting hierarchy in cgroup, several aspects of characteristics of
>>> policy of hierarchy can be considered. Some needs automatic balancing
>>> between several groups.
>>>
>>> - fairness ... how fairness is kept under policy
>>>
>>> - performance ... should be _fast_. multi-level resource balancing tend
>>> to use much amount of CPU and can cause soft lockup.
>>>
>>> - predictability ... resource management are usually used for resource
>>> isolation. the kernel must not break the isolation and
>>> predictability of users against application's progress.
>>>
>>> - flexibility ... some sophisticated dynamic resource balancing with
>>> soft-limit is welcomed when the user doesn't want strict
>>> resource isolation or when the user cannot estimate how much
>>> they want correctly.
>> Soft limits has been on my plate for a while now. I'll take a crack at it. At
>> the moment the statistics is a bit of a worry, since users/administrators need
>> good statistics to take further action.
>>
> Yes, statistics is not enough now.
>
>
>
>>> Hard Wall Hierarchy.
>>>
>>> This patch implements a hard-wall model of hierarchy for resources.
>>> Works well for users who want strict resource isolation.
>>>
>>> This model allows the move of resource only between a parent and its children.
>>> The resource is moved to a child when it declares the amount of resources to be
>>> used. (by limit)
>> The other reason for preferring a shares based approach is that, the it will be
>> more in line with the CPU controllers interfaces.
>>
>
> You have to think of the major difference of tha nature of CPU and Memory.
> We have to reclaim the resource with some feedbacks among sevral cgroups.
> But ok, if it's can be implemented in simple way.
> I have no objections if cost is very low. My concern is only performance.
> (and maintenance)
>
True, I don't see hierarchy as adding too much additional cost.
>
>>> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
>>> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
>>> @@ -38,6 +38,16 @@ struct res_counter {
>>> * the number of unsuccessful attempts to consume the resource
>>> */
>>> unsigned long long failcnt;
>>> +
>>> + /*
>>> + * hierarchy support: the parent of this resource.
>>> + */
>>> + struct res_counter *parent;
>>> + /*
>>> + * the amount of resources assigned to children.
>>> + */
>>> + unsigned long long for_children;
>>> +
>> I would prefer to use a better name, lent_out? reserved_for_children?
>> borrowed_by_children?
>>
> ok. use other names.
>
>
>
>>> /*
>>> * the lock to protect all of the above.
>>> * the routines below consider this to be IRQ-safe
>>> @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
>>> ssize_t res_counter_read(struct res_counter *counter, int member,
>>> const char __user *buf, size_t nbytes, loff_t *pos,
>>> int (*read_strategy)(unsigned long long val, char *s));
>>> +
>>> +/*
>>> + * An interface for setting res_counter's member (ex. limit)
>>> + * the new parameter is passed by *buf and translated by write_strategy().
>>> + * Then, it is applied to member under the control of set_strategy().
>>> + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
>>> + */
>>> +
>>> ssize_t res_counter_write(struct res_counter *counter, int member,
>>> - const char __user *buf, size_t nbytes, loff_t *pos,
>>> - int (*write_strategy)(char *buf, unsigned long long *val));
>>> + const char __user *buf, size_t nbytes, loff_t *pos,
>>> + int (*write_strategy)(char *buf, unsigned long long *val),
>>> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
>>> + int what),
>>> + );
>>>
>>> /*
>>> * the field descriptors. one for each member of res_counter
>>> @@ -76,15 +97,33 @@ enum {
>>> RES_MAX_USAGE,
>>> RES_LIMIT,
>>> RES_FAILCNT,
>>> + RES_FOR_CHILDREN,
>> RES_BORROWED? RES_BORROWED_BY_CHILDREN?
>>
> ok, again.
>
>>> };
>>>
>>> /*
>>> * helpers for accounting
>>> */
>>>
>>> +/*
>>> + * initialize res_counter.
>>> + * @counter : the counter
>>> + *
>>> + * initialize res_counter and set default limit to very big value(unlimited)
>>> + */
>>> +
>>> void res_counter_init(struct res_counter *counter);
>>>
>>> /*
>>> + * initialize res_counter under hierarchy.
>>> + * @counter : the counter
>>> + * @parent : the parent of the counter
>>> + *
>>> + * initialize res_counter and set default limit to 0. and set "parent".
>>> + */
>>> +void res_counter_init_hierarchy(struct res_counter *counter,
>>> + struct res_counter *parent);
>>> +
>>> +/*
>>> * charge - try to consume more resource.
>>> *
>>> * @counter: the counter
>>> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
>>> cnt->failcnt = 0;
>>> spin_unlock_irqrestore(&cnt->lock, flags);
>>> }
>>> +
>>> +/**
>>> + * Move resources from a parent to a child.
>>> + * At success,
>>> + * parent->usage += val.
>>> + * parent->for_children += val.
>>> + * child->limit += val.
>>> + *
>>> + * @child: an entity to set res->limit. The parent is child->parent.
>>> + * @val: the amount of resource to be moved.
>>> + * @callback: called when the parent's free resource is not enough to be moved.
>>> + * this can be NULL if no callback is necessary.
>>> + * @retry: limit for the number of trying to callback.
>>> + * -1 means infinite loop. At each retry, yield() is called.
>>> + * Returns 0 at success, !0 at failure.
>>> + *
>>> + * The callback returns 0 at success, !0 at failure.
>>> + *
>>> + */
>>> +
>>> +int res_counter_move_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry);
>>> +
>>> +
>>> +/**
>>> + * Return resource to its parent.
>>> + * At success,
>>> + * parent->usage -= val.
>>> + * parent->for_children -= val.
>>> + * child->limit -= val.
>>> + *
>>> + * @child: entry to resize. The parent is child->parent.
>>> + * @val : How much does child repay to parent ? -1 means 'all'
>>> + * @callback: A callback for decreasing resource usage of child before
>>> + * returning. If NULL, just deceases child's limit.
>>> + * @retry: # of retries at calling callback for freeing resource.
>>> + * -1 means infinite loop. At each retry, yield() is called.
>>> + * Returns 0 at success.
>>> + */
>>> +
>>> +int res_counter_return_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry);
>>> +
>>> #endif
>>> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
>>> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
>>> @@ -44,6 +44,13 @@ to work with it.
>>> Protects changes of the above values.
>>>
>>>
>>> + f. struct res_counter *parent
>>> +
>>> + Parent res_counter under hierarchy.
>>> +
>>> + g. unsigned long long for_children
>>> +
>>> + Resources assigned to children. This is included in usage.
>>>
>>> 2. Basic accounting routines
>>>
>>> @@ -179,3 +186,37 @@ counter fields. They are recommended to
>>> still can help with it).
>>>
>>> c. Compile and run :)
>>> +
>>> +
>>> +6. Hierarchy
>>> + a. No Hierarchy
>>> + each cgroup can use its own private resource.
>>> +
>>> + b. Hard-wall Hierarhcy
>>> + A simple hierarchical tree system for resource isolation.
>>> + Allows moving resources only between a parent and its children.
>>> + A parent can move its resource to children and remember the amount to
>>> + for_children member. A child can get new resource only from its parent.
>>> + Limit of a child is the amount of resource which is moved from its parent.
>>> +
>> OK, after reading this I am totally sure I want a shares based interface. Limits
>> are not shared like this.
>>
>> A child and a parent should both be capable of having a limit of 1G, but they
>> could use different shares factors to govern, how much each children will get.
>> Doing it this way, breaks limit semantics.
>>
> Not easy to use in my point of view. Can we use 'share' in proper way
> on no-swap machine ?
>
Not sure I understand your question. Share represents the share of available
resources.
>
>>> + When add "val" to a child,
>>> + parent->usage += val
>>> + parent->for_children += val
>>> + child->limit += val
>>> + When a child returns its resource
>>> + parent->usage -= val
>>> + parent->for_children -= val
>>> + child->limit -= val.
>>> +
>>> + This implements resource isolation among each group. This works very well
>>> + when you want to use strict resource isolation.
>>> +
>>> + Usage Hint:
>>> + This seems for static resource assignment but dynamic resource re-assignment
>>> + can be done by resetting "limit" of groups. When you consider "limit" as
>>> + the amount of allowed _current_ resource, a sophisticated resource management
>>> + system based on strict resource isolation can be implemented.
>>> +
>>> +c. Soft-wall Hierarchy
>>> + TBD.
>>> +
>>> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
>>> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
>>> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
>>> counter->limit = (unsigned long long)LLONG_MAX;
>>> }
>>>
>>> +void res_counter_init_hierarchy(struct res_counter *counter,
>>> + struct res_counter *parent)
>>> +{
>>> + spin_lock_init(&counter->lock);
>>> + counter->limit = 0;
>>> + counter->parent = parent;
>>> +}
>>> +
>>> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>>> {
>>> if (counter->usage + val > counter->limit) {
>>> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
>>> return &counter->limit;
>>> case RES_FAILCNT:
>>> return &counter->failcnt;
>>> + case RES_FOR_CHILDREN:
>>> + return &counter->for_children;
>>> };
>>>
>>> BUG();
>>> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>>>
>>> ssize_t res_counter_write(struct res_counter *counter, int member,
>>> const char __user *userbuf, size_t nbytes, loff_t *pos,
>>> - int (*write_strategy)(char *st_buf, unsigned long long *val))
>>> + int (*write_strategy)(char *st_buf, unsigned long long *val),
>>> + int (*set_strategy)(struct res_counter *res,
>>> + unsigned long long val, int what))
>>> {
>>> int ret;
>>> char *buf, *end;
>>> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
>>> if (*end != '\0')
>>> goto out_free;
>>> }
>>> - spin_lock_irqsave(&counter->lock, flags);
>>> - val = res_counter_member(counter, member);
>>> - *val = tmp;
>>> - spin_unlock_irqrestore(&counter->lock, flags);
>>> - ret = nbytes;
>>> + if (set_strategy) {
>>> + ret = set_strategy(res, tmp, member);
>>
>> I'm afraid, I don't understand the set_strategy and it's purpose.
>>
> Sorry. I'm now rewritten and removed this.
>
>
OK
>
>>> + if (!ret)
>>> + ret = nbytes;
>>> + } else {
>>> + spin_lock_irqsave(&counter->lock, flags);
>>> + val = res_counter_member(counter, member);
>>> + *val = tmp;
>>> + spin_unlock_irqrestore(&counter->lock, flags);
>>> + ret = nbytes;
>>> + }
>>> out_free:
>>> kfree(buf);
>>> out:
>>> return ret;
>>> }
>>> +
>>> +
>>> +int res_counter_move_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry)
>>> +{
>>> + struct res_counter *parent = child->parent;
>>> + unsigned long flags;
>>> +
>>> + BUG_ON(!parent);
>>> +
>>> + while (1) {
>>> + spin_lock_irqsave(&parent->lock, flags);
>>> + if (parent->usage + val < parent->limit) {
>>> + parent->for_children += val;
>>> + parent->usage += val;
>>> + break;
>>> + }
>>> + spin_unlock_irqrestore(&parent->lock, flags);
>>> +
>>> + if (!retry || !callback)
>>> + goto failed;
>>> + /* -1 means infinite loop */
>>> + if (retry != -1)
>>> + --retry;
>> I don't like the idea of spinning in an infinite loop, I would prefer to fail
>> things instead of burning CPU cycles.
>>
> ok, will remove "-1" case.
>
>
>
>
>>> + yield();
>>> + callback(parent, val);
>> This code is not very understandable. Why do we yield before callback?
>>
>
> yield() after callback() means that res_counter's state will be
> far different from the state after callback.
> So, we have to yield before call back and check res_coutner sooner.
>
But does yield() get us any guarantees of seeing the state change?
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-09 10:37 ` Balbir Singh
0 siblings, 0 replies; 52+ messages in thread
From: Balbir Singh @ 2008-06-09 10:37 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, menage@google.com, xemul@openvz.org,
yamamoto@valinux.co.jp
KAMEZAWA Hiroyuki wrote:
> On Mon, 09 Jun 2008 15:18:47 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>> KAMEZAWA Hiroyuki wrote:
>>> A simple hard-wall hierarhcy support for res_counter.
>>>
>>> Changelog v2->v3
>>> - changed the name and arguments of functions.
>>> - rewrote to be read easily.
>>> - named as HardWall hierarchy.
>>>
>>> This implements following model
>>> - A cgroup's tree means hierarchy of resource.
>>> - All child's resource is moved from its parents.
>>> - The resource moved to children is charged as parent's usage.
>>> - The resource moves when child->limit is changed.
>>> - The sum of resource for children and its own usage is limited by "limit".
>>>
>>> This implies
>>> - No dynamic automatic hierarhcy balancing in the kernel.
>>> - Each resource is isolated completely.
>>> - The kernel just supports resource-move-at-change-in-limit.
>>> - The user (middle-ware) is responsible to make hierarhcy balanced well.
>> We'd definitely like to see a user level tool/application as a demo of how this
>> can be achieved.
>>
> I don't have one, now. I'll write one when I have time. Need now ?
> Hmm...maybe I(we) need some more patches to implement useful statistics,
> notifier to middlewares.
>
Yes, we need more useful statistics.
>
>
>>> Good balance can be achieved by changing limit from user land.
>>>
>>>
>>> Background:
>>> Recently, there are popular resource isolation technique widely used,
>>> i.e. Hardware-Virtualization. We can do hierarchical resource isolation
>>> by using cgroup on it. But supporting hierarchy management in croups
>>> has some advantages of performance, unity and costs of management.
>>>
>>> There are good resource management in other OSs, they support some kind of
>>> hierarchical resource management. We wonder what kind of hierarchy policy
>>> is good for Linux. And there is an another point. Hierarchical system can be
>>> implemented by the kernel and user-land co-operation. So, there are various
>>> choices to do in the kernel. Doing all in the kernel or export some proper
>>> interfaces to the user-land. Middle-wares are tend to be used for management.
>>> I hope there will be Open Source one.
>>>
>>> At supporting hierarchy in cgroup, several aspects of characteristics of
>>> policy of hierarchy can be considered. Some needs automatic balancing
>>> between several groups.
>>>
>>> - fairness ... how fairness is kept under policy
>>>
>>> - performance ... should be _fast_. multi-level resource balancing tend
>>> to use much amount of CPU and can cause soft lockup.
>>>
>>> - predictability ... resource management are usually used for resource
>>> isolation. the kernel must not break the isolation and
>>> predictability of users against application's progress.
>>>
>>> - flexibility ... some sophisticated dynamic resource balancing with
>>> soft-limit is welcomed when the user doesn't want strict
>>> resource isolation or when the user cannot estimate how much
>>> they want correctly.
>> Soft limits has been on my plate for a while now. I'll take a crack at it. At
>> the moment the statistics is a bit of a worry, since users/administrators need
>> good statistics to take further action.
>>
> Yes, statistics is not enough now.
>
>
>
>>> Hard Wall Hierarchy.
>>>
>>> This patch implements a hard-wall model of hierarchy for resources.
>>> Works well for users who want strict resource isolation.
>>>
>>> This model allows the move of resource only between a parent and its children.
>>> The resource is moved to a child when it declares the amount of resources to be
>>> used. (by limit)
>> The other reason for preferring a shares based approach is that, the it will be
>> more in line with the CPU controllers interfaces.
>>
>
> You have to think of the major difference of tha nature of CPU and Memory.
> We have to reclaim the resource with some feedbacks among sevral cgroups.
> But ok, if it's can be implemented in simple way.
> I have no objections if cost is very low. My concern is only performance.
> (and maintenance)
>
True, I don't see hierarchy as adding too much additional cost.
>
>>> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
>>> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
>>> @@ -38,6 +38,16 @@ struct res_counter {
>>> * the number of unsuccessful attempts to consume the resource
>>> */
>>> unsigned long long failcnt;
>>> +
>>> + /*
>>> + * hierarchy support: the parent of this resource.
>>> + */
>>> + struct res_counter *parent;
>>> + /*
>>> + * the amount of resources assigned to children.
>>> + */
>>> + unsigned long long for_children;
>>> +
>> I would prefer to use a better name, lent_out? reserved_for_children?
>> borrowed_by_children?
>>
> ok. use other names.
>
>
>
>>> /*
>>> * the lock to protect all of the above.
>>> * the routines below consider this to be IRQ-safe
>>> @@ -63,9 +73,20 @@ u64 res_counter_read_u64(struct res_coun
>>> ssize_t res_counter_read(struct res_counter *counter, int member,
>>> const char __user *buf, size_t nbytes, loff_t *pos,
>>> int (*read_strategy)(unsigned long long val, char *s));
>>> +
>>> +/*
>>> + * An interface for setting res_counter's member (ex. limit)
>>> + * the new parameter is passed by *buf and translated by write_strategy().
>>> + * Then, it is applied to member under the control of set_strategy().
>>> + * If write_strategy() and set_strategy() can be NULL. see res_counter.c
>>> + */
>>> +
>>> ssize_t res_counter_write(struct res_counter *counter, int member,
>>> - const char __user *buf, size_t nbytes, loff_t *pos,
>>> - int (*write_strategy)(char *buf, unsigned long long *val));
>>> + const char __user *buf, size_t nbytes, loff_t *pos,
>>> + int (*write_strategy)(char *buf, unsigned long long *val),
>>> + int (*set_strategy)(struct res_counter *res, unsigned long long val,
>>> + int what),
>>> + );
>>>
>>> /*
>>> * the field descriptors. one for each member of res_counter
>>> @@ -76,15 +97,33 @@ enum {
>>> RES_MAX_USAGE,
>>> RES_LIMIT,
>>> RES_FAILCNT,
>>> + RES_FOR_CHILDREN,
>> RES_BORROWED? RES_BORROWED_BY_CHILDREN?
>>
> ok, again.
>
>>> };
>>>
>>> /*
>>> * helpers for accounting
>>> */
>>>
>>> +/*
>>> + * initialize res_counter.
>>> + * @counter : the counter
>>> + *
>>> + * initialize res_counter and set default limit to very big value(unlimited)
>>> + */
>>> +
>>> void res_counter_init(struct res_counter *counter);
>>>
>>> /*
>>> + * initialize res_counter under hierarchy.
>>> + * @counter : the counter
>>> + * @parent : the parent of the counter
>>> + *
>>> + * initialize res_counter and set default limit to 0. and set "parent".
>>> + */
>>> +void res_counter_init_hierarchy(struct res_counter *counter,
>>> + struct res_counter *parent);
>>> +
>>> +/*
>>> * charge - try to consume more resource.
>>> *
>>> * @counter: the counter
>>> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
>>> cnt->failcnt = 0;
>>> spin_unlock_irqrestore(&cnt->lock, flags);
>>> }
>>> +
>>> +/**
>>> + * Move resources from a parent to a child.
>>> + * At success,
>>> + * parent->usage += val.
>>> + * parent->for_children += val.
>>> + * child->limit += val.
>>> + *
>>> + * @child: an entity to set res->limit. The parent is child->parent.
>>> + * @val: the amount of resource to be moved.
>>> + * @callback: called when the parent's free resource is not enough to be moved.
>>> + * this can be NULL if no callback is necessary.
>>> + * @retry: limit for the number of trying to callback.
>>> + * -1 means infinite loop. At each retry, yield() is called.
>>> + * Returns 0 at success, !0 at failure.
>>> + *
>>> + * The callback returns 0 at success, !0 at failure.
>>> + *
>>> + */
>>> +
>>> +int res_counter_move_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry);
>>> +
>>> +
>>> +/**
>>> + * Return resource to its parent.
>>> + * At success,
>>> + * parent->usage -= val.
>>> + * parent->for_children -= val.
>>> + * child->limit -= val.
>>> + *
>>> + * @child: entry to resize. The parent is child->parent.
>>> + * @val : How much does child repay to parent ? -1 means 'all'
>>> + * @callback: A callback for decreasing resource usage of child before
>>> + * returning. If NULL, just deceases child's limit.
>>> + * @retry: # of retries at calling callback for freeing resource.
>>> + * -1 means infinite loop. At each retry, yield() is called.
>>> + * Returns 0 at success.
>>> + */
>>> +
>>> +int res_counter_return_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry);
>>> +
>>> #endif
>>> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
>>> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
>>> @@ -44,6 +44,13 @@ to work with it.
>>> Protects changes of the above values.
>>>
>>>
>>> + f. struct res_counter *parent
>>> +
>>> + Parent res_counter under hierarchy.
>>> +
>>> + g. unsigned long long for_children
>>> +
>>> + Resources assigned to children. This is included in usage.
>>>
>>> 2. Basic accounting routines
>>>
>>> @@ -179,3 +186,37 @@ counter fields. They are recommended to
>>> still can help with it).
>>>
>>> c. Compile and run :)
>>> +
>>> +
>>> +6. Hierarchy
>>> + a. No Hierarchy
>>> + each cgroup can use its own private resource.
>>> +
>>> + b. Hard-wall Hierarhcy
>>> + A simple hierarchical tree system for resource isolation.
>>> + Allows moving resources only between a parent and its children.
>>> + A parent can move its resource to children and remember the amount to
>>> + for_children member. A child can get new resource only from its parent.
>>> + Limit of a child is the amount of resource which is moved from its parent.
>>> +
>> OK, after reading this I am totally sure I want a shares based interface. Limits
>> are not shared like this.
>>
>> A child and a parent should both be capable of having a limit of 1G, but they
>> could use different shares factors to govern, how much each children will get.
>> Doing it this way, breaks limit semantics.
>>
> Not easy to use in my point of view. Can we use 'share' in proper way
> on no-swap machine ?
>
Not sure I understand your question. Share represents the share of available
resources.
>
>>> + When add "val" to a child,
>>> + parent->usage += val
>>> + parent->for_children += val
>>> + child->limit += val
>>> + When a child returns its resource
>>> + parent->usage -= val
>>> + parent->for_children -= val
>>> + child->limit -= val.
>>> +
>>> + This implements resource isolation among each group. This works very well
>>> + when you want to use strict resource isolation.
>>> +
>>> + Usage Hint:
>>> + This seems for static resource assignment but dynamic resource re-assignment
>>> + can be done by resetting "limit" of groups. When you consider "limit" as
>>> + the amount of allowed _current_ resource, a sophisticated resource management
>>> + system based on strict resource isolation can be implemented.
>>> +
>>> +c. Soft-wall Hierarchy
>>> + TBD.
>>> +
>>> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
>>> ===================================================================
>>> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
>>> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
>>> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
>>> counter->limit = (unsigned long long)LLONG_MAX;
>>> }
>>>
>>> +void res_counter_init_hierarchy(struct res_counter *counter,
>>> + struct res_counter *parent)
>>> +{
>>> + spin_lock_init(&counter->lock);
>>> + counter->limit = 0;
>>> + counter->parent = parent;
>>> +}
>>> +
>>> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>>> {
>>> if (counter->usage + val > counter->limit) {
>>> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
>>> return &counter->limit;
>>> case RES_FAILCNT:
>>> return &counter->failcnt;
>>> + case RES_FOR_CHILDREN:
>>> + return &counter->for_children;
>>> };
>>>
>>> BUG();
>>> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>>>
>>> ssize_t res_counter_write(struct res_counter *counter, int member,
>>> const char __user *userbuf, size_t nbytes, loff_t *pos,
>>> - int (*write_strategy)(char *st_buf, unsigned long long *val))
>>> + int (*write_strategy)(char *st_buf, unsigned long long *val),
>>> + int (*set_strategy)(struct res_counter *res,
>>> + unsigned long long val, int what))
>>> {
>>> int ret;
>>> char *buf, *end;
>>> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
>>> if (*end != '\0')
>>> goto out_free;
>>> }
>>> - spin_lock_irqsave(&counter->lock, flags);
>>> - val = res_counter_member(counter, member);
>>> - *val = tmp;
>>> - spin_unlock_irqrestore(&counter->lock, flags);
>>> - ret = nbytes;
>>> + if (set_strategy) {
>>> + ret = set_strategy(res, tmp, member);
>>
>> I'm afraid, I don't understand the set_strategy and it's purpose.
>>
> Sorry. I'm now rewritten and removed this.
>
>
OK
>
>>> + if (!ret)
>>> + ret = nbytes;
>>> + } else {
>>> + spin_lock_irqsave(&counter->lock, flags);
>>> + val = res_counter_member(counter, member);
>>> + *val = tmp;
>>> + spin_unlock_irqrestore(&counter->lock, flags);
>>> + ret = nbytes;
>>> + }
>>> out_free:
>>> kfree(buf);
>>> out:
>>> return ret;
>>> }
>>> +
>>> +
>>> +int res_counter_move_resource(struct res_counter *child,
>>> + unsigned long long val,
>>> + int (*callback)(struct res_counter *res, unsigned long long val),
>>> + int retry)
>>> +{
>>> + struct res_counter *parent = child->parent;
>>> + unsigned long flags;
>>> +
>>> + BUG_ON(!parent);
>>> +
>>> + while (1) {
>>> + spin_lock_irqsave(&parent->lock, flags);
>>> + if (parent->usage + val < parent->limit) {
>>> + parent->for_children += val;
>>> + parent->usage += val;
>>> + break;
>>> + }
>>> + spin_unlock_irqrestore(&parent->lock, flags);
>>> +
>>> + if (!retry || !callback)
>>> + goto failed;
>>> + /* -1 means infinite loop */
>>> + if (retry != -1)
>>> + --retry;
>> I don't like the idea of spinning in an infinite loop, I would prefer to fail
>> things instead of burning CPU cycles.
>>
> ok, will remove "-1" case.
>
>
>
>
>>> + yield();
>>> + callback(parent, val);
>> This code is not very understandable. Why do we yield before callback?
>>
>
> yield() after callback() means that res_counter's state will be
> far different from the state after callback.
> So, we have to yield before call back and check res_coutner sooner.
>
But does yield() get us any guarantees of seeing the state change?
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-04 5:01 ` KAMEZAWA Hiroyuki
@ 2008-06-11 23:24 ` Randy Dunlap
-1 siblings, 0 replies; 52+ messages in thread
From: Randy Dunlap @ 2008-06-11 23:24 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 4 Jun 2008 14:01:53 +0900 KAMEZAWA Hiroyuki wrote:
> A simple hard-wall hierarhcy support for res_counter.
>
> Changelog v2->v3
> - changed the name and arguments of functions.
> - rewrote to be read easily.
> - named as HardWall hierarchy.
> ---
> Documentation/controllers/resource_counter.txt | 41 +++++++++
> include/linux/res_counter.h | 90 +++++++++++++++++++-
> kernel/res_counter.c | 112 +++++++++++++++++++++++--
> 3 files changed, 235 insertions(+), 8 deletions(-)
>
> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> @@ -76,15 +97,33 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_FOR_CHILDREN,
> };
>
> /*
> * helpers for accounting
> */
>
> +/*
> + * initialize res_counter.
> + * @counter : the counter
> + *
> + * initialize res_counter and set default limit to very big value(unlimited)
> + */
> +
> void res_counter_init(struct res_counter *counter);
For these non-static (non-private) functions, please use kernel-doc notation
(see Documentation/kernel-doc-nano-HOWTO.txt and/or examples in other source files).
Also, we prefer for the function documentation to be above its definition (implementation)
rather than above its declaration, so the kernel-doc should be moved to .c files
instead of living in .h files.
>
> /*
> + * initialize res_counter under hierarchy.
> + * @counter : the counter
> + * @parent : the parent of the counter
> + *
> + * initialize res_counter and set default limit to 0. and set "parent".
> + */
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent);
> +
> +/*
> * charge - try to consume more resource.
> *
> * @counter: the counter
> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> cnt->failcnt = 0;
> spin_unlock_irqrestore(&cnt->lock, flags);
> }
> +
> +/**
> + * Move resources from a parent to a child.
> + * At success,
> + * parent->usage += val.
> + * parent->for_children += val.
> + * child->limit += val.
> + *
> + * @child: an entity to set res->limit. The parent is child->parent.
> + * @val: the amount of resource to be moved.
> + * @callback: called when the parent's free resource is not enough to be moved.
> + * this can be NULL if no callback is necessary.
> + * @retry: limit for the number of trying to callback.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success, !0 at failure.
> + *
> + * The callback returns 0 at success, !0 at failure.
> + *
> + */
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> +
> +/**
> + * Return resource to its parent.
> + * At success,
> + * parent->usage -= val.
> + * parent->for_children -= val.
> + * child->limit -= val.
> + *
> + * @child: entry to resize. The parent is child->parent.
> + * @val : How much does child repay to parent ? -1 means 'all'
> + * @callback: A callback for decreasing resource usage of child before
> + * returning. If NULL, just deceases child's limit.
> + * @retry: # of retries at calling callback for freeing resource.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success.
> + */
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> #endif
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -179,3 +186,37 @@ counter fields. They are recommended to
> still can help with it).
>
> c. Compile and run :)
> +
> +
> +6. Hierarchy
> + a. No Hierarchy
> + each cgroup can use its own private resource.
> +
> + b. Hard-wall Hierarhcy
> + A simple hierarchical tree system for resource isolation.
> + Allows moving resources only between a parent and its children.
> + A parent can move its resource to children and remember the amount to
> + for_children member. A child can get new resource only from its parent.
> + Limit of a child is the amount of resource which is moved from its parent.
> +
> + When add "val" to a child,
> + parent->usage += val
> + parent->for_children += val
> + child->limit += val
> + When a child returns its resource
> + parent->usage -= val
> + parent->for_children -= val
> + child->limit -= val.
> +
> + This implements resource isolation among each group. This works very well
> + when you want to use strict resource isolation.
> +
> + Usage Hint:
> + This seems for static resource assignment but dynamic resource re-assignment
seems to be?
> + can be done by resetting "limit" of groups. When you consider "limit" as
> + the amount of allowed _current_ resource, a sophisticated resource management
> + system based on strict resource isolation can be implemented.
> +
> +c. Soft-wall Hierarchy
> + TBD.
> +
> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> counter->limit = (unsigned long long)LLONG_MAX;
> }
>
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent)
> +{
> + spin_lock_init(&counter->lock);
> + counter->limit = 0;
> + counter->parent = parent;
> +}
> +
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> {
> if (counter->usage + val > counter->limit) {
> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_FOR_CHILDREN:
> + return &counter->for_children;
> };
>
> BUG();
> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>
> ssize_t res_counter_write(struct res_counter *counter, int member,
> const char __user *userbuf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *st_buf, unsigned long long *val))
> + int (*write_strategy)(char *st_buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res,
> + unsigned long long val, int what))
> {
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
> out_free:
> kfree(buf);
> out:
> return ret;
> }
> +
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + struct res_counter *parent = child->parent;
> + unsigned long flags;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&parent->lock, flags);
> + if (parent->usage + val < parent->limit) {
> + parent->for_children += val;
> + parent->usage += val;
> + break;
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + if (!retry || !callback)
> + goto failed;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + spin_lock_irqsave(&child->lock, flags);
> + child->limit += val;
> + spin_unlock_irqrestore(&child->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> +
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + unsigned long flags;
> + struct res_counter *parent = child->parent;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&child->lock, flags);
> + if (val == (unsigned long long) -1) {
> + val = child->limit;
> + child->limit = 0;
> + break;
> + } else if (child->usage <= child->limit - val) {
> + child->limit -= val;
> + break;
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + if (!retry)
> + goto fail;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + spin_lock_irqsave(&parent->lock, flags);
> + BUG_ON(parent->for_children < val);
> + BUG_ON(parent->usage < val);
> + parent->for_children -= val;
> + parent->usage -= val;
> + spin_unlock_irqrestore(&parent->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> --
---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-11 23:24 ` Randy Dunlap
0 siblings, 0 replies; 52+ messages in thread
From: Randy Dunlap @ 2008-06-11 23:24 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 4 Jun 2008 14:01:53 +0900 KAMEZAWA Hiroyuki wrote:
> A simple hard-wall hierarhcy support for res_counter.
>
> Changelog v2->v3
> - changed the name and arguments of functions.
> - rewrote to be read easily.
> - named as HardWall hierarchy.
> ---
> Documentation/controllers/resource_counter.txt | 41 +++++++++
> include/linux/res_counter.h | 90 +++++++++++++++++++-
> kernel/res_counter.c | 112 +++++++++++++++++++++++--
> 3 files changed, 235 insertions(+), 8 deletions(-)
>
> Index: temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/include/linux/res_counter.h
> +++ temp-2.6.26-rc2-mm1/include/linux/res_counter.h
> @@ -76,15 +97,33 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_FOR_CHILDREN,
> };
>
> /*
> * helpers for accounting
> */
>
> +/*
> + * initialize res_counter.
> + * @counter : the counter
> + *
> + * initialize res_counter and set default limit to very big value(unlimited)
> + */
> +
> void res_counter_init(struct res_counter *counter);
For these non-static (non-private) functions, please use kernel-doc notation
(see Documentation/kernel-doc-nano-HOWTO.txt and/or examples in other source files).
Also, we prefer for the function documentation to be above its definition (implementation)
rather than above its declaration, so the kernel-doc should be moved to .c files
instead of living in .h files.
>
> /*
> + * initialize res_counter under hierarchy.
> + * @counter : the counter
> + * @parent : the parent of the counter
> + *
> + * initialize res_counter and set default limit to 0. and set "parent".
> + */
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent);
> +
> +/*
> * charge - try to consume more resource.
> *
> * @counter: the counter
> @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> cnt->failcnt = 0;
> spin_unlock_irqrestore(&cnt->lock, flags);
> }
> +
> +/**
> + * Move resources from a parent to a child.
> + * At success,
> + * parent->usage += val.
> + * parent->for_children += val.
> + * child->limit += val.
> + *
> + * @child: an entity to set res->limit. The parent is child->parent.
> + * @val: the amount of resource to be moved.
> + * @callback: called when the parent's free resource is not enough to be moved.
> + * this can be NULL if no callback is necessary.
> + * @retry: limit for the number of trying to callback.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success, !0 at failure.
> + *
> + * The callback returns 0 at success, !0 at failure.
> + *
> + */
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> +
> +/**
> + * Return resource to its parent.
> + * At success,
> + * parent->usage -= val.
> + * parent->for_children -= val.
> + * child->limit -= val.
> + *
> + * @child: entry to resize. The parent is child->parent.
> + * @val : How much does child repay to parent ? -1 means 'all'
> + * @callback: A callback for decreasing resource usage of child before
> + * returning. If NULL, just deceases child's limit.
> + * @retry: # of retries at calling callback for freeing resource.
> + * -1 means infinite loop. At each retry, yield() is called.
> + * Returns 0 at success.
> + */
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry);
> +
> #endif
> Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> @@ -179,3 +186,37 @@ counter fields. They are recommended to
> still can help with it).
>
> c. Compile and run :)
> +
> +
> +6. Hierarchy
> + a. No Hierarchy
> + each cgroup can use its own private resource.
> +
> + b. Hard-wall Hierarhcy
> + A simple hierarchical tree system for resource isolation.
> + Allows moving resources only between a parent and its children.
> + A parent can move its resource to children and remember the amount to
> + for_children member. A child can get new resource only from its parent.
> + Limit of a child is the amount of resource which is moved from its parent.
> +
> + When add "val" to a child,
> + parent->usage += val
> + parent->for_children += val
> + child->limit += val
> + When a child returns its resource
> + parent->usage -= val
> + parent->for_children -= val
> + child->limit -= val.
> +
> + This implements resource isolation among each group. This works very well
> + when you want to use strict resource isolation.
> +
> + Usage Hint:
> + This seems for static resource assignment but dynamic resource re-assignment
seems to be?
> + can be done by resetting "limit" of groups. When you consider "limit" as
> + the amount of allowed _current_ resource, a sophisticated resource management
> + system based on strict resource isolation can be implemented.
> +
> +c. Soft-wall Hierarchy
> + TBD.
> +
> Index: temp-2.6.26-rc2-mm1/kernel/res_counter.c
> ===================================================================
> --- temp-2.6.26-rc2-mm1.orig/kernel/res_counter.c
> +++ temp-2.6.26-rc2-mm1/kernel/res_counter.c
> @@ -20,6 +20,14 @@ void res_counter_init(struct res_counter
> counter->limit = (unsigned long long)LLONG_MAX;
> }
>
> +void res_counter_init_hierarchy(struct res_counter *counter,
> + struct res_counter *parent)
> +{
> + spin_lock_init(&counter->lock);
> + counter->limit = 0;
> + counter->parent = parent;
> +}
> +
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> {
> if (counter->usage + val > counter->limit) {
> @@ -74,6 +82,8 @@ res_counter_member(struct res_counter *c
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_FOR_CHILDREN:
> + return &counter->for_children;
> };
>
> BUG();
> @@ -104,7 +114,9 @@ u64 res_counter_read_u64(struct res_coun
>
> ssize_t res_counter_write(struct res_counter *counter, int member,
> const char __user *userbuf, size_t nbytes, loff_t *pos,
> - int (*write_strategy)(char *st_buf, unsigned long long *val))
> + int (*write_strategy)(char *st_buf, unsigned long long *val),
> + int (*set_strategy)(struct res_counter *res,
> + unsigned long long val, int what))
> {
> int ret;
> char *buf, *end;
> @@ -133,13 +145,101 @@ ssize_t res_counter_write(struct res_cou
> if (*end != '\0')
> goto out_free;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> - val = res_counter_member(counter, member);
> - *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> - ret = nbytes;
> + if (set_strategy) {
> + ret = set_strategy(res, tmp, member);
> + if (!ret)
> + ret = nbytes;
> + } else {
> + spin_lock_irqsave(&counter->lock, flags);
> + val = res_counter_member(counter, member);
> + *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + ret = nbytes;
> + }
> out_free:
> kfree(buf);
> out:
> return ret;
> }
> +
> +
> +int res_counter_move_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + struct res_counter *parent = child->parent;
> + unsigned long flags;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&parent->lock, flags);
> + if (parent->usage + val < parent->limit) {
> + parent->for_children += val;
> + parent->usage += val;
> + break;
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + if (!retry || !callback)
> + goto failed;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
> + }
> + spin_unlock_irqrestore(&parent->lock, flags);
> +
> + spin_lock_irqsave(&child->lock, flags);
> + child->limit += val;
> + spin_unlock_irqrestore(&child->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> +
> +
> +int res_counter_return_resource(struct res_counter *child,
> + unsigned long long val,
> + int (*callback)(struct res_counter *res, unsigned long long val),
> + int retry)
> +{
> + unsigned long flags;
> + struct res_counter *parent = child->parent;
> +
> + BUG_ON(!parent);
> +
> + while (1) {
> + spin_lock_irqsave(&child->lock, flags);
> + if (val == (unsigned long long) -1) {
> + val = child->limit;
> + child->limit = 0;
> + break;
> + } else if (child->usage <= child->limit - val) {
> + child->limit -= val;
> + break;
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + if (!retry)
> + goto fail;
> + /* -1 means infinite loop */
> + if (retry != -1)
> + --retry;
> + yield();
> + callback(parent, val);
> + }
> + spin_unlock_irqrestore(&child->lock, flags);
> +
> + spin_lock_irqsave(&parent->lock, flags);
> + BUG_ON(parent->for_children < val);
> + BUG_ON(parent->usage < val);
> + parent->for_children -= val;
> + parent->usage -= val;
> + spin_unlock_irqrestore(&parent->lock, flags);
> + return 0;
> +fail:
> + return 1;
> +}
> --
---
~Randy
'"Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
2008-06-11 23:24 ` Randy Dunlap
@ 2008-06-12 4:59 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-12 4:59 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 11 Jun 2008 16:24:27 -0700
Randy Dunlap <randy.dunlap@oracle.com> wrote:
> > /*
> > * helpers for accounting
> > */
> >
> > +/*
> > + * initialize res_counter.
> > + * @counter : the counter
> > + *
> > + * initialize res_counter and set default limit to very big value(unlimited)
> > + */
> > +
> > void res_counter_init(struct res_counter *counter);
>
> For these non-static (non-private) functions, please use kernel-doc notation
> (see Documentation/kernel-doc-nano-HOWTO.txt and/or examples in other source files).
> Also, we prefer for the function documentation to be above its definition (implementation)
> rather than above its declaration, so the kernel-doc should be moved to .c files
> instead of living in .h files.
>
Ah, sorry. I'll do so in the next version. Maybe I should move other comments
in res_counter.h to res_counter.c
>
> >
> > /*
> > + * initialize res_counter under hierarchy.
> > + * @counter : the counter
> > + * @parent : the parent of the counter
> > + *
> > + * initialize res_counter and set default limit to 0. and set "parent".
> > + */
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent);
> > +
> > +/*
> > * charge - try to consume more resource.
> > *
> > * @counter: the counter
> > @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> > cnt->failcnt = 0;
> > spin_unlock_irqrestore(&cnt->lock, flags);
> > }
> > +
> > +/**
> > + * Move resources from a parent to a child.
> > + * At success,
> > + * parent->usage += val.
> > + * parent->for_children += val.
> > + * child->limit += val.
> > + *
> > + * @child: an entity to set res->limit. The parent is child->parent.
> > + * @val: the amount of resource to be moved.
> > + * @callback: called when the parent's free resource is not enough to be moved.
> > + * this can be NULL if no callback is necessary.
> > + * @retry: limit for the number of trying to callback.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success, !0 at failure.
> > + *
> > + * The callback returns 0 at success, !0 at failure.
> > + *
> > + */
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > +
> > +/**
> > + * Return resource to its parent.
> > + * At success,
> > + * parent->usage -= val.
> > + * parent->for_children -= val.
> > + * child->limit -= val.
> > + *
> > + * @child: entry to resize. The parent is child->parent.
> > + * @val : How much does child repay to parent ? -1 means 'all'
> > + * @callback: A callback for decreasing resource usage of child before
> > + * returning. If NULL, just deceases child's limit.
> > + * @retry: # of retries at calling callback for freeing resource.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success.
> > + */
> > +
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > #endif
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -179,3 +186,37 @@ counter fields. They are recommended to
> > still can help with it).
> >
> > c. Compile and run :)
> > +
> > +
> > +6. Hierarchy
> > + a. No Hierarchy
> > + each cgroup can use its own private resource.
> > +
> > + b. Hard-wall Hierarhcy
> > + A simple hierarchical tree system for resource isolation.
> > + Allows moving resources only between a parent and its children.
> > + A parent can move its resource to children and remember the amount to
> > + for_children member. A child can get new resource only from its parent.
> > + Limit of a child is the amount of resource which is moved from its parent.
> > +
> > + When add "val" to a child,
> > + parent->usage += val
> > + parent->for_children += val
> > + child->limit += val
> > + When a child returns its resource
> > + parent->usage -= val
> > + parent->for_children -= val
> > + child->limit -= val.
> > +
> > + This implements resource isolation among each group. This works very well
> > + when you want to use strict resource isolation.
> > +
> > + Usage Hint:
> > + This seems for static resource assignment but dynamic resource re-assignment
>
> seems to be?
>
will fix.
Thank you for review!
Thanks,
-Kame
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
@ 2008-06-12 4:59 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 52+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-12 4:59 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-mm@kvack.org, LKML, balbir@linux.vnet.ibm.com,
menage@google.com, xemul@openvz.org, yamamoto@valinux.co.jp
On Wed, 11 Jun 2008 16:24:27 -0700
Randy Dunlap <randy.dunlap@oracle.com> wrote:
> > /*
> > * helpers for accounting
> > */
> >
> > +/*
> > + * initialize res_counter.
> > + * @counter : the counter
> > + *
> > + * initialize res_counter and set default limit to very big value(unlimited)
> > + */
> > +
> > void res_counter_init(struct res_counter *counter);
>
> For these non-static (non-private) functions, please use kernel-doc notation
> (see Documentation/kernel-doc-nano-HOWTO.txt and/or examples in other source files).
> Also, we prefer for the function documentation to be above its definition (implementation)
> rather than above its declaration, so the kernel-doc should be moved to .c files
> instead of living in .h files.
>
Ah, sorry. I'll do so in the next version. Maybe I should move other comments
in res_counter.h to res_counter.c
>
> >
> > /*
> > + * initialize res_counter under hierarchy.
> > + * @counter : the counter
> > + * @parent : the parent of the counter
> > + *
> > + * initialize res_counter and set default limit to 0. and set "parent".
> > + */
> > +void res_counter_init_hierarchy(struct res_counter *counter,
> > + struct res_counter *parent);
> > +
> > +/*
> > * charge - try to consume more resource.
> > *
> > * @counter: the counter
> > @@ -153,4 +192,51 @@ static inline void res_counter_reset_fai
> > cnt->failcnt = 0;
> > spin_unlock_irqrestore(&cnt->lock, flags);
> > }
> > +
> > +/**
> > + * Move resources from a parent to a child.
> > + * At success,
> > + * parent->usage += val.
> > + * parent->for_children += val.
> > + * child->limit += val.
> > + *
> > + * @child: an entity to set res->limit. The parent is child->parent.
> > + * @val: the amount of resource to be moved.
> > + * @callback: called when the parent's free resource is not enough to be moved.
> > + * this can be NULL if no callback is necessary.
> > + * @retry: limit for the number of trying to callback.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success, !0 at failure.
> > + *
> > + * The callback returns 0 at success, !0 at failure.
> > + *
> > + */
> > +
> > +int res_counter_move_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > +
> > +/**
> > + * Return resource to its parent.
> > + * At success,
> > + * parent->usage -= val.
> > + * parent->for_children -= val.
> > + * child->limit -= val.
> > + *
> > + * @child: entry to resize. The parent is child->parent.
> > + * @val : How much does child repay to parent ? -1 means 'all'
> > + * @callback: A callback for decreasing resource usage of child before
> > + * returning. If NULL, just deceases child's limit.
> > + * @retry: # of retries at calling callback for freeing resource.
> > + * -1 means infinite loop. At each retry, yield() is called.
> > + * Returns 0 at success.
> > + */
> > +
> > +int res_counter_return_resource(struct res_counter *child,
> > + unsigned long long val,
> > + int (*callback)(struct res_counter *res, unsigned long long val),
> > + int retry);
> > +
> > #endif
> > Index: temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/Documentation/controllers/resource_counter.txt
> > +++ temp-2.6.26-rc2-mm1/Documentation/controllers/resource_counter.txt
> > @@ -179,3 +186,37 @@ counter fields. They are recommended to
> > still can help with it).
> >
> > c. Compile and run :)
> > +
> > +
> > +6. Hierarchy
> > + a. No Hierarchy
> > + each cgroup can use its own private resource.
> > +
> > + b. Hard-wall Hierarhcy
> > + A simple hierarchical tree system for resource isolation.
> > + Allows moving resources only between a parent and its children.
> > + A parent can move its resource to children and remember the amount to
> > + for_children member. A child can get new resource only from its parent.
> > + Limit of a child is the amount of resource which is moved from its parent.
> > +
> > + When add "val" to a child,
> > + parent->usage += val
> > + parent->for_children += val
> > + child->limit += val
> > + When a child returns its resource
> > + parent->usage -= val
> > + parent->for_children -= val
> > + child->limit -= val.
> > +
> > + This implements resource isolation among each group. This works very well
> > + when you want to use strict resource isolation.
> > +
> > + Usage Hint:
> > + This seems for static resource assignment but dynamic resource re-assignment
>
> seems to be?
>
will fix.
Thank you for review!
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 52+ messages in thread