* [RFC][ only for review ] memory controller bacground reclaim [0/5]
@ 2007-11-28 8:49 KAMEZAWA Hiroyuki
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:49 UTC (permalink / raw)
To: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org
Cc: yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Hi, this set is for memory controller background reclaim.
Merged YAMAMOTO-san's version onto 2.6.23-rc3-mm1 + my NUMA patch.
And splitted to several sets.
Major changes from his one is
- use kthread instead of work_queue
- adjust high/low watermark when limit changes automatically
and set default value. (a user can specify his own later.)
This version is just for review. (I'll rebase this to the next -mm.)
Patch description
[1/5] ... just a fix
[2/5] ... add set/get interface to res_counter
[3/5] ... add high/low watermark to res_counter
[4/5] ... add high/low watermark value to memory cgroup
[5/5] ... add background reclaim to memory cgroup
Any comments are welcome.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 8:51 ` KAMEZAWA Hiroyuki
[not found] ` <20071128175135.c42adecc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:52 ` [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter KAMEZAWA Hiroyuki
` (4 subsequent siblings)
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:51 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
spinlock is necessary when someone changes res_counter value.
splited out from YAMAMOTO's background page reclaim for memory cgroup set.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
kernel/res_counter.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-27 14:07:44.000000000 +0900
+++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-27 14:09:40.000000000 +0900
@@ -98,7 +98,7 @@
{
int ret;
char *buf, *end;
- unsigned long long tmp, *val;
+ unsigned long long flags, tmp, *val;
buf = kmalloc(nbytes + 1, GFP_KERNEL);
ret = -ENOMEM;
@@ -121,9 +121,10 @@
if (*end != '\0')
goto out_free;
}
-
+ spin_lock_irqsave(&counter->lock, flags);
val = res_counter_member(counter, member);
*val = tmp;
+ spin_unlock_irqrestore(&counter->lock, flags);
ret = nbytes;
out_free:
kfree(buf);
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:51 ` [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification KAMEZAWA Hiroyuki
@ 2007-11-28 8:52 ` KAMEZAWA Hiroyuki
[not found] ` <20071128175239.e20ec09d.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:54 ` [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter KAMEZAWA Hiroyuki
` (3 subsequent siblings)
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:52 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
At implmenting high/low watermark in res_counter, it will be better to
adjust high/low value when limit changes. (or don't allow user to specify
high/low value)
This patch adds *internal* interface to modify resource value.
(If there are only limit/usage/failcnt, these routines are not necessary but..)
And will be used later.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
include/linux/res_counter.h | 7 +++++++
kernel/res_counter.c | 19 +++++++++++++++++++
2 files changed, 26 insertions(+)
Index: linux-2.6.24-rc3-mm1/include/linux/res_counter.h
===================================================================
--- linux-2.6.24-rc3-mm1.orig/include/linux/res_counter.h 2007-11-28 14:18:21.000000000 +0900
+++ linux-2.6.24-rc3-mm1/include/linux/res_counter.h 2007-11-28 14:18:33.000000000 +0900
@@ -59,6 +59,13 @@
int (*write_strategy)(char *buf, unsigned long long *val));
/*
+ * A routine for set/get limitation value from kernel internal code.
+ * res->lock should be held before call this.
+ */
+unsigned long long res_counter_get(struct res_counter *counter, int member);
+void res_counter_set(struct res_counter *conter, int member,
+ unsigned long long val);
+/*
* the field descriptors. one for each member of res_counter
*/
Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-28 14:18:21.000000000 +0900
+++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-28 14:18:33.000000000 +0900
@@ -75,6 +75,25 @@
return NULL;
}
+unsigned long long res_counter_get(struct res_counter *res, int member)
+{
+ unsigned long long *val;
+
+ val = res_counter_member(res, member);
+
+ return *val;
+}
+
+void res_counter_set(struct res_counter *res, int member,
+ unsigned long long newval)
+{
+ unsigned long long *val;
+
+ val = res_counter_member(res, member);
+ *val = newval;
+ return;
+}
+
ssize_t res_counter_read(struct res_counter *counter, int member,
const char __user *userbuf, size_t nbytes, loff_t *pos,
int (*read_strategy)(unsigned long long val, char *st_buf))
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:51 ` [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification KAMEZAWA Hiroyuki
2007-11-28 8:52 ` [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter KAMEZAWA Hiroyuki
@ 2007-11-28 8:54 ` KAMEZAWA Hiroyuki
[not found] ` <20071128175408.1ee479f3.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:56 ` [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller KAMEZAWA Hiroyuki
` (2 subsequent siblings)
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
This patch adds high/low watermark parameter to res_counter.
splitted out from YAMAMOTO's background page reclaim for memory cgroup set.
Changes:
* added param watermark_state this allows status check without lock.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
include/linux/res_counter.h | 27 +++++++++++++++++++++++++
kernel/res_counter.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 72 insertions(+), 1 deletion(-)
Index: linux-2.6.24-rc3-mm1/include/linux/res_counter.h
===================================================================
--- linux-2.6.24-rc3-mm1.orig/include/linux/res_counter.h 2007-11-28 14:33:19.000000000 +0900
+++ linux-2.6.24-rc3-mm1/include/linux/res_counter.h 2007-11-28 16:37:32.000000000 +0900
@@ -19,6 +19,12 @@
* the helpers described beyond
*/
+enum watermark_state {
+ RES_WATERMARK_BELOW_LOW,
+ RES_WATERMARK_ABOVE_LOW,
+ RES_WATERMARK_ABOVE_HIGH,
+};
+
struct res_counter {
/*
* the current resource consumption level
@@ -33,10 +39,19 @@
*/
unsigned long long failcnt;
/*
+ * Watermarks
+ * Should be changed automatically when the limit is changed and
+ * keep low < high < limit.
+ */
+ unsigned long long high_watermark;
+ unsigned long long low_watermark;
+ /*
* the lock to protect all of the above.
* the routines below consider this to be IRQ-safe
*/
spinlock_t lock;
+ /* can be read without lock */
+ enum watermark_state watermark_state;
};
/*
@@ -73,6 +88,8 @@
RES_USAGE,
RES_LIMIT,
RES_FAILCNT,
+ RES_HIGH_WATERMARK,
+ RES_LOW_WATERMARK,
};
/*
@@ -131,4 +148,17 @@
return ret;
}
+/*
+ * Helper function for implementing high/low watermark to resource controller.
+ */
+static inline bool res_counter_below_low_watermark(struct res_counter *cnt)
+{
+ return (cnt->watermark_state == RES_WATERMARK_BELOW_LOW);
+}
+
+static inline bool res_counter_above_high_watermark(struct res_counter *cnt)
+{
+ return (cnt->watermark_state == RES_WATERMARK_ABOVE_HIGH);
+}
+
#endif
Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-28 14:33:19.000000000 +0900
+++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-28 16:33:20.000000000 +0900
@@ -17,6 +17,9 @@
{
spin_lock_init(&counter->lock);
counter->limit = (unsigned long long)LLONG_MAX;
+ counter->low_watermark = (unsigned long long)LLONG_MAX;
+ counter->high_watermark = (unsigned long long)LLONG_MAX;
+ counter->watermark_state = RES_WATERMARK_BELOW_LOW;
}
int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
@@ -27,6 +30,15 @@
}
counter->usage += val;
+
+ if (counter->usage > counter->high_watermark) {
+ counter->watermark_state = RES_WATERMARK_ABOVE_HIGH;
+ return 0;
+ }
+
+ if (counter->usage > counter->low_watermark)
+ counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
+
return 0;
}
@@ -47,6 +59,13 @@
val = counter->usage;
counter->usage -= val;
+
+ if (counter->usage < counter->low_watermark) {
+ counter->watermark_state = RES_WATERMARK_BELOW_LOW;
+ return;
+ }
+ if (counter->usage < counter->high_watermark)
+ counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
}
void res_counter_uncharge(struct res_counter *counter, unsigned long val)
@@ -69,6 +88,10 @@
return &counter->limit;
case RES_FAILCNT:
return &counter->failcnt;
+ case RES_HIGH_WATERMARK:
+ return &counter->high_watermark;
+ case RES_LOW_WATERMARK:
+ return &counter->low_watermark;
};
BUG();
@@ -117,7 +140,7 @@
{
int ret;
char *buf, *end;
- unsigned long long flags, tmp, *val;
+ unsigned long long flags, tmp, *val, limit, low, high;
buf = kmalloc(nbytes + 1, GFP_KERNEL);
ret = -ENOMEM;
@@ -141,6 +164,26 @@
goto out_free;
}
spin_lock_irqsave(&counter->lock, flags);
+ /*
+ * High/Low watermark should be changed automatically AMAP.
+ */
+ switch (member) {
+ case RES_HIGH_WATERMARK:
+ limit = res_counter_get(counter, RES_LIMIT);
+ if (tmp > limit)
+ goto out_free;
+ low = res_counter_get(counter, RES_LOW_WATERMARK);
+ if (tmp <= low)
+ goto out_free;
+ break;
+ case RES_LOW_WATERMARK:
+ high= res_counter_get(counter, RES_HIGH_WATERMARK);
+ if (tmp >= high)
+ goto out_free;
+ break;
+ default:
+ break;
+ }
val = res_counter_member(counter, member);
*val = tmp;
spin_unlock_irqrestore(&counter->lock, flags);
@@ -150,3 +193,4 @@
out:
return ret;
}
+
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
` (2 preceding siblings ...)
2007-11-28 8:54 ` [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter KAMEZAWA Hiroyuki
@ 2007-11-28 8:56 ` KAMEZAWA Hiroyuki
[not found] ` <20071128175607.37df2187.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:57 ` [RFC][ only for review ] memory controller bacground reclaim [5/5] KAMEZAWA Hiroyuki
2007-11-29 11:53 ` [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?) KAMEZAWA Hiroyuki
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:56 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
High Low watermark for page reclaiming in memory cgroup(1)
High-Low value here is implemented for support background reclaim.
- If USAGE is bigger than high watermark, background reclaim starts.
- If USAGE is lower than low watermark, background reclaim stops.
Each value is represented in bytes.
(Not configurable now, but can be easily.)
The routine which reclaim is in the next patch. This patch just adds
high/low watermark value.
Changes from YAMAMOTO-san's version
- I like automatic adjustment of high/low watermark.
So, I added a code to modify high/low watermark every time the limit
changes. A user can custorimize it by hand later.
(I think 93%/97% is not so bad value. need investigation.)
TODO:
- update documentation.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 57 insertions(+), 6 deletions(-)
Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-28 16:30:44.000000000 +0900
+++ linux-2.6.24-rc3-mm1/mm/memcontrol.c 2007-11-28 16:44:57.000000000 +0900
@@ -108,16 +108,12 @@
struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
};
+
/*
* The memory controller data structure. The memory controller controls both
* page cache and RSS per cgroup. We would eventually like to provide
* statistics based on the statistics developed by Rik Van Riel for clock-pro,
* to help the administrator determine what knobs to tune.
- *
- * TODO: Add a water mark for the memory controller. Reclaim will begin when
- * we hit the water mark. May be even add a low water mark, such that
- * no reclaim occurs from a cgroup at it's low water mark, this is
- * a feature that will be implemented much later in the future.
*/
struct mem_cgroup {
struct cgroup_subsys_state css;
@@ -162,6 +158,15 @@
#define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache */
#define PAGE_CGROUP_FLAG_ACTIVE (0x2) /* page is active in this cgroup */
+/*
+ * Default value for high-low watermark for start/stop background reclaim.
+ * represented in percent here. can be customized by user later.
+ * This is applied always limit change.
+ */
+#define DEFAULT_WATERMARK_PERCENT_LOW (93ULL)
+#define DEFAULT_WATERMARK_PERCENT_HIGH (97ULL)
+
+
static inline int page_cgroup_nid(struct page_cgroup *pc)
{
return page_to_nid(pc->page);
@@ -926,6 +931,29 @@
return 0;
}
+static void mem_cgroup_init_watermark(struct cgroup *cont)
+{
+ struct mem_cgroup *mem;
+ unsigned long long val, high, low;
+ unsigned long flags;
+
+ mem = mem_cgroup_from_cont(cont);
+ spin_lock_irqsave(&mem->res.lock, flags);
+ val = res_counter_get(&mem->res, RES_LIMIT);
+ if (val == (unsigned long long) LLONG_MAX) {
+ low = (unsigned long long) LLONG_MAX;
+ high = (unsigned long long) LLONG_MAX;
+ } else {
+ low = val * DEFAULT_WATERMARK_PERCENT_LOW / 100ULL;
+ high = val * DEFAULT_WATERMARK_PERCENT_HIGH / 100ULL;
+ }
+ res_counter_set(&mem->res, RES_LOW_WATERMARK, low);
+ res_counter_set(&mem->res, RES_HIGH_WATERMARK, high);
+ spin_unlock_irqrestore(&mem->res.lock, flags);
+ return;
+}
+
+
static ssize_t mem_cgroup_read(struct cgroup *cont,
struct cftype *cft, struct file *file,
char __user *userbuf, size_t nbytes, loff_t *ppos)
@@ -944,6 +972,17 @@
mem_cgroup_write_strategy);
}
+static ssize_t mem_cgroup_write_limit(struct cgroup *cont, struct cftype *cft,
+ struct file *file, const char __user *userbuf,
+ size_t nbytes, loff_t *ppos)
+{
+ ssize_t ret;
+ ret = mem_cgroup_write(cont, cft, file, userbuf, nbytes, ppos);
+ if (ret > 0)
+ mem_cgroup_init_watermark(cont);
+ return ret;
+}
+
static ssize_t mem_control_type_write(struct cgroup *cont,
struct cftype *cft, struct file *file,
const char __user *userbuf,
@@ -1088,7 +1127,7 @@
{
.name = "limit_in_bytes",
.private = RES_LIMIT,
- .write = mem_cgroup_write,
+ .write = mem_cgroup_write_limit,
.read = mem_cgroup_read,
},
{
@@ -1102,6 +1141,18 @@
.read = mem_control_type_read,
},
{
+ .name = "low_watermark_in_bytes",
+ .private = RES_LOW_WATERMARK,
+ .write = mem_cgroup_write,
+ .read = mem_cgroup_read,
+ },
+ {
+ .name = "high_watermark_in_bytes",
+ .private = RES_HIGH_WATERMARK,
+ .write = mem_cgroup_write,
+ .read = mem_cgroup_read,
+ },
+ {
.name = "force_empty",
.write = mem_force_empty_write,
.read = mem_force_empty_read,
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][ only for review ] memory controller bacground reclaim [5/5]
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
` (3 preceding siblings ...)
2007-11-28 8:56 ` [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller KAMEZAWA Hiroyuki
@ 2007-11-28 8:57 ` KAMEZAWA Hiroyuki
[not found] ` <20071128175713.4e9b8fff.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-29 11:53 ` [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?) KAMEZAWA Hiroyuki
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-28 8:57 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Create a daemon which does background page reclaim.
This daemon
* starts when usage > high_watermark
* stops when usage < low_watermark.
Because kthread_run() cannot be used when init_mem_cgroup is initialized(Sigh),
thread for init_mem_cgroup is invoked later by initcall.
Changes from YAMAMOTO-san's version
* use kthread instead of workqueue.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
mm/memcontrol.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-28 16:44:57.000000000 +0900
+++ linux-2.6.24-rc3-mm1/mm/memcontrol.c 2007-11-28 17:21:57.000000000 +0900
@@ -30,6 +30,8 @@
#include <linux/spinlock.h>
#include <linux/fs.h>
#include <linux/seq_file.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
#include <asm/uaccess.h>
@@ -122,6 +124,13 @@
*/
struct res_counter res;
/*
+ * background reclaim
+ */
+ struct {
+ wait_queue_head_t waitq;
+ struct task_struct *thread;
+ } daemon;
+ /*
* Per cgroup active and inactive list, similar to the
* per zone LRU lists.
*/
@@ -401,6 +410,17 @@
}
}
+static void
+mem_cgroup_schedule_reclaim(struct mem_cgroup *mem)
+{
+ if (!unlikely(mem->daemon.thread))
+ return;
+ if (!waitqueue_active(&mem->daemon.waitq))
+ return;
+ wake_up_interruptible(&mem->daemon.waitq);
+}
+
+
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
{
int ret;
@@ -677,6 +697,8 @@
mem_cgroup_out_of_memory(mem, GFP_KERNEL);
goto free_pc;
}
+ if (res_counter_above_high_watermark(&mem->res))
+ mem_cgroup_schedule_reclaim(mem);
atomic_set(&pc->ref_cnt, 1);
pc->mem_cgroup = mem;
@@ -832,6 +854,38 @@
}
/*
+ * Background page reclaim daeom for memory controller.
+ */
+
+static int mem_cgroup_reclaim_daemon(void *data)
+{
+ DEFINE_WAIT(wait);
+ struct mem_cgroup *mem = data;
+
+ css_get(&mem->css);
+ set_freezable();
+
+ while (!kthread_should_stop()) {
+ prepare_to_wait(&mem->daemon.waitq, &wait, TASK_INTERRUPTIBLE);
+
+ if (res_counter_below_low_watermark(&mem->res)) {
+ if (!kthread_should_stop()) {
+ schedule();
+ try_to_freeze();
+ }
+ finish_wait(&mem->daemon.waitq, &wait);
+ continue;
+ }
+ finish_wait(&mem->daemon.waitq, &wait);
+ try_to_free_mem_cgroup_pages(mem, GFP_HIGHUSER_MOVABLE);
+ }
+
+ css_put(&mem->css);
+
+ return 0;
+}
+
+/*
* This routine traverse page_cgroup in given list and drop them all.
* This routine ignores page_cgroup->ref_cnt.
* *And* this routine doesn't reclaim page itself, just removes page_cgroup.
@@ -1045,6 +1100,7 @@
{
struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
int ret;
+
ret = mem_cgroup_force_empty(mem);
if (!ret)
ret = nbytes;
@@ -1188,6 +1244,16 @@
static struct mem_cgroup init_mem_cgroup;
+static int __init mem_cgroup_reclaim_init(void)
+{
+ init_mem_cgroup.daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
+ &init_mem_cgroup, "memcontd");
+ if (IS_ERR(init_mem_cgroup.daemon.thread))
+ BUG();
+ return 0;
+}
+late_initcall(mem_cgroup_reclaim_init);
+
static struct cgroup_subsys_state *
mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -1212,6 +1278,17 @@
if (alloc_mem_cgroup_per_zone_info(mem, node))
goto free_out;
+ /* Memory Reclaim Daemon per cgroup */
+ init_waitqueue_head(&mem->daemon.waitq);
+ if (mem != &init_mem_cgroup) {
+ /* Complicated...but we cannot call kthread create here..*/
+ /* init call will later assign kthread */
+ mem->daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
+ mem, "memcontd");
+ if (IS_ERR(mem->daemon.thread))
+ goto free_out;
+ }
+
return &mem->css;
free_out:
for_each_node_state(node, N_POSSIBLE)
@@ -1226,6 +1303,7 @@
{
struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
mem_cgroup_force_empty(mem);
+ kthread_stop(mem->daemon.thread);
}
static void mem_cgroup_destroy(struct cgroup_subsys *ss,
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [5/5]
[not found] ` <20071128175713.4e9b8fff.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 11:06 ` Pavel Emelyanov
[not found] ` <474D4BAE.7090407-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-12-01 7:16 ` Paul Menage
1 sibling, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 11:06 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> Create a daemon which does background page reclaim.
>
> This daemon
> * starts when usage > high_watermark
> * stops when usage < low_watermark.
>
> Because kthread_run() cannot be used when init_mem_cgroup is initialized(Sigh),
> thread for init_mem_cgroup is invoked later by initcall.
>
> Changes from YAMAMOTO-san's version
> * use kthread instead of workqueue.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
>
>
>
> mm/memcontrol.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 79 insertions(+)
>
> Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-28 16:44:57.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/mm/memcontrol.c 2007-11-28 17:21:57.000000000 +0900
> @@ -30,6 +30,8 @@
> #include <linux/spinlock.h>
> #include <linux/fs.h>
> #include <linux/seq_file.h>
> +#include <linux/kthread.h>
> +#include <linux/freezer.h>
>
> #include <asm/uaccess.h>
>
> @@ -122,6 +124,13 @@
> */
> struct res_counter res;
> /*
> + * background reclaim
> + */
> + struct {
> + wait_queue_head_t waitq;
> + struct task_struct *thread;
> + } daemon;
Does this HAS to be a struct?
> + /*
> * Per cgroup active and inactive list, similar to the
> * per zone LRU lists.
> */
> @@ -401,6 +410,17 @@
> }
> }
>
> +static void
> +mem_cgroup_schedule_reclaim(struct mem_cgroup *mem)
> +{
> + if (!unlikely(mem->daemon.thread))
> + return;
> + if (!waitqueue_active(&mem->daemon.waitq))
> + return;
> + wake_up_interruptible(&mem->daemon.waitq);
> +}
> +
> +
> int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> {
> int ret;
> @@ -677,6 +697,8 @@
> mem_cgroup_out_of_memory(mem, GFP_KERNEL);
> goto free_pc;
> }
> + if (res_counter_above_high_watermark(&mem->res))
> + mem_cgroup_schedule_reclaim(mem);
>
> atomic_set(&pc->ref_cnt, 1);
> pc->mem_cgroup = mem;
> @@ -832,6 +854,38 @@
> }
>
> /*
> + * Background page reclaim daeom for memory controller.
> + */
> +
> +static int mem_cgroup_reclaim_daemon(void *data)
> +{
> + DEFINE_WAIT(wait);
> + struct mem_cgroup *mem = data;
> +
> + css_get(&mem->css);
Won't this prevent the css from being removed?
> + set_freezable();
> +
> + while (!kthread_should_stop()) {
> + prepare_to_wait(&mem->daemon.waitq, &wait, TASK_INTERRUPTIBLE);
> +
> + if (res_counter_below_low_watermark(&mem->res)) {
> + if (!kthread_should_stop()) {
> + schedule();
> + try_to_freeze();
> + }
> + finish_wait(&mem->daemon.waitq, &wait);
> + continue;
> + }
> + finish_wait(&mem->daemon.waitq, &wait);
> + try_to_free_mem_cgroup_pages(mem, GFP_HIGHUSER_MOVABLE);
> + }
> +
> + css_put(&mem->css);
> +
> + return 0;
> +}
> +
> +/*
> * This routine traverse page_cgroup in given list and drop them all.
> * This routine ignores page_cgroup->ref_cnt.
> * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
> @@ -1045,6 +1100,7 @@
> {
> struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> int ret;
> +
This hunk is not needed :)
> ret = mem_cgroup_force_empty(mem);
> if (!ret)
> ret = nbytes;
> @@ -1188,6 +1244,16 @@
>
> static struct mem_cgroup init_mem_cgroup;
>
> +static int __init mem_cgroup_reclaim_init(void)
> +{
> + init_mem_cgroup.daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
> + &init_mem_cgroup, "memcontd");
> + if (IS_ERR(init_mem_cgroup.daemon.thread))
> + BUG();
> + return 0;
> +}
> +late_initcall(mem_cgroup_reclaim_init);
> +
> static struct cgroup_subsys_state *
> mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> {
> @@ -1212,6 +1278,17 @@
> if (alloc_mem_cgroup_per_zone_info(mem, node))
> goto free_out;
>
> + /* Memory Reclaim Daemon per cgroup */
> + init_waitqueue_head(&mem->daemon.waitq);
> + if (mem != &init_mem_cgroup) {
> + /* Complicated...but we cannot call kthread create here..*/
> + /* init call will later assign kthread */
> + mem->daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
> + mem, "memcontd");
> + if (IS_ERR(mem->daemon.thread))
> + goto free_out;
goto free_mem_cgroup_per_zone_info()?
> + }
> +
> return &mem->css;
> free_out:
> for_each_node_state(node, N_POSSIBLE)
> @@ -1226,6 +1303,7 @@
> {
> struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> mem_cgroup_force_empty(mem);
> + kthread_stop(mem->daemon.thread);
> }
>
> static void mem_cgroup_destroy(struct cgroup_subsys *ss,
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification
[not found] ` <20071128175135.c42adecc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 11:08 ` Pavel Emelyanov
[not found] ` <474D4C2F.8020701-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 11:08 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> spinlock is necessary when someone changes res_counter value.
> splited out from YAMAMOTO's background page reclaim for memory cgroup set.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
>
>
> kernel/res_counter.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-27 14:07:44.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-27 14:09:40.000000000 +0900
> @@ -98,7 +98,7 @@
> {
> int ret;
> char *buf, *end;
> - unsigned long long tmp, *val;
> + unsigned long long flags, tmp, *val;
Flags for irqsave should be unsigned long. No?
> buf = kmalloc(nbytes + 1, GFP_KERNEL);
> ret = -ENOMEM;
> @@ -121,9 +121,10 @@
> if (*end != '\0')
> goto out_free;
> }
> -
> + spin_lock_irqsave(&counter->lock, flags);
> val = res_counter_member(counter, member);
> *val = tmp;
> + spin_unlock_irqrestore(&counter->lock, flags);
> ret = nbytes;
> out_free:
> kfree(buf);
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter
[not found] ` <20071128175239.e20ec09d.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 11:09 ` Pavel Emelyanov
[not found] ` <474D4C66.2080303-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 11:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> At implmenting high/low watermark in res_counter, it will be better to
> adjust high/low value when limit changes. (or don't allow user to specify
> high/low value)
>
> This patch adds *internal* interface to modify resource value.
> (If there are only limit/usage/failcnt, these routines are not necessary but..)
> And will be used later.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>
> include/linux/res_counter.h | 7 +++++++
> kernel/res_counter.c | 19 +++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> Index: linux-2.6.24-rc3-mm1/include/linux/res_counter.h
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/include/linux/res_counter.h 2007-11-28 14:18:21.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/include/linux/res_counter.h 2007-11-28 14:18:33.000000000 +0900
> @@ -59,6 +59,13 @@
> int (*write_strategy)(char *buf, unsigned long long *val));
>
> /*
> + * A routine for set/get limitation value from kernel internal code.
> + * res->lock should be held before call this.
> + */
> +unsigned long long res_counter_get(struct res_counter *counter, int member);
> +void res_counter_set(struct res_counter *conter, int member,
> + unsigned long long val);
> +/*
> * the field descriptors. one for each member of res_counter
> */
>
> Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-28 14:18:21.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-28 14:18:33.000000000 +0900
> @@ -75,6 +75,25 @@
> return NULL;
> }
>
> +unsigned long long res_counter_get(struct res_counter *res, int member)
> +{
> + unsigned long long *val;
> +
> + val = res_counter_member(res, member);
> +
> + return *val;
> +}
> +
> +void res_counter_set(struct res_counter *res, int member,
> + unsigned long long newval)
> +{
> + unsigned long long *val;
> +
> + val = res_counter_member(res, member);
> + *val = newval;
You put locking here in the res_counter_write() (patch #1). Why
is it missed here?
> + return;
> +}
> +
> ssize_t res_counter_read(struct res_counter *counter, int member,
> const char __user *userbuf, size_t nbytes, loff_t *pos,
> int (*read_strategy)(unsigned long long val, char *st_buf))
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <20071128175408.1ee479f3.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 11:12 ` Pavel Emelyanov
[not found] ` <474D4D35.9060603-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 2:56 ` YAMAMOTO Takashi
1 sibling, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 11:12 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> This patch adds high/low watermark parameter to res_counter.
> splitted out from YAMAMOTO's background page reclaim for memory cgroup set.
>
> Changes:
> * added param watermark_state this allows status check without lock.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
>
> include/linux/res_counter.h | 27 +++++++++++++++++++++++++
> kernel/res_counter.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 72 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.24-rc3-mm1/include/linux/res_counter.h
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/include/linux/res_counter.h 2007-11-28 14:33:19.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/include/linux/res_counter.h 2007-11-28 16:37:32.000000000 +0900
> @@ -19,6 +19,12 @@
> * the helpers described beyond
> */
>
> +enum watermark_state {
> + RES_WATERMARK_BELOW_LOW,
> + RES_WATERMARK_ABOVE_LOW,
> + RES_WATERMARK_ABOVE_HIGH,
> +};
> +
> struct res_counter {
> /*
> * the current resource consumption level
> @@ -33,10 +39,19 @@
> */
> unsigned long long failcnt;
> /*
> + * Watermarks
> + * Should be changed automatically when the limit is changed and
> + * keep low < high < limit.
> + */
> + unsigned long long high_watermark;
> + unsigned long long low_watermark;
> + /*
> * the lock to protect all of the above.
> * the routines below consider this to be IRQ-safe
> */
> spinlock_t lock;
> + /* can be read without lock */
> + enum watermark_state watermark_state;
> };
>
> /*
> @@ -73,6 +88,8 @@
> RES_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_HIGH_WATERMARK,
> + RES_LOW_WATERMARK,
I'd prefer some shorter names. Like RES_HWMARK and RES_LWMARK.
> };
>
> /*
> @@ -131,4 +148,17 @@
> return ret;
> }
>
> +/*
> + * Helper function for implementing high/low watermark to resource controller.
> + */
> +static inline bool res_counter_below_low_watermark(struct res_counter *cnt)
> +{
> + return (cnt->watermark_state == RES_WATERMARK_BELOW_LOW);
> +}
> +
> +static inline bool res_counter_above_high_watermark(struct res_counter *cnt)
> +{
> + return (cnt->watermark_state == RES_WATERMARK_ABOVE_HIGH);
> +}
> +
> #endif
> Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-28 14:33:19.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-28 16:33:20.000000000 +0900
> @@ -17,6 +17,9 @@
> {
> spin_lock_init(&counter->lock);
> counter->limit = (unsigned long long)LLONG_MAX;
> + counter->low_watermark = (unsigned long long)LLONG_MAX;
> + counter->high_watermark = (unsigned long long)LLONG_MAX;
> + counter->watermark_state = RES_WATERMARK_BELOW_LOW;
> }
>
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> @@ -27,6 +30,15 @@
> }
>
> counter->usage += val;
> +
> + if (counter->usage > counter->high_watermark) {
> + counter->watermark_state = RES_WATERMARK_ABOVE_HIGH;
> + return 0;
> + }
"else" would look much better here :)
> + if (counter->usage > counter->low_watermark)
> + counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
> +
> return 0;
> }
>
> @@ -47,6 +59,13 @@
> val = counter->usage;
>
> counter->usage -= val;
> +
> + if (counter->usage < counter->low_watermark) {
> + counter->watermark_state = RES_WATERMARK_BELOW_LOW;
> + return;
> + }
and here
> + if (counter->usage < counter->high_watermark)
> + counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
> }
>
> void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> @@ -69,6 +88,10 @@
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_HIGH_WATERMARK:
> + return &counter->high_watermark;
> + case RES_LOW_WATERMARK:
> + return &counter->low_watermark;
> };
>
> BUG();
> @@ -117,7 +140,7 @@
> {
> int ret;
> char *buf, *end;
> - unsigned long long flags, tmp, *val;
> + unsigned long long flags, tmp, *val, limit, low, high;
>
> buf = kmalloc(nbytes + 1, GFP_KERNEL);
> ret = -ENOMEM;
> @@ -141,6 +164,26 @@
> goto out_free;
> }
> spin_lock_irqsave(&counter->lock, flags);
> + /*
> + * High/Low watermark should be changed automatically AMAP.
> + */
> + switch (member) {
> + case RES_HIGH_WATERMARK:
> + limit = res_counter_get(counter, RES_LIMIT);
> + if (tmp > limit)
> + goto out_free;
> + low = res_counter_get(counter, RES_LOW_WATERMARK);
> + if (tmp <= low)
> + goto out_free;
> + break;
> + case RES_LOW_WATERMARK:
> + high= res_counter_get(counter, RES_HIGH_WATERMARK);
> + if (tmp >= high)
> + goto out_free;
> + break;
Why there's no checks for limit? Smth like
case RES_LIMIT:
high = res_counter_get(counter, RES_HIGH_WATERMARK);
if (tmp < high)
goto out_free;
> + default:
> + break;
> + }
> val = res_counter_member(counter, member);
> *val = tmp;
> spin_unlock_irqrestore(&counter->lock, flags);
> @@ -150,3 +193,4 @@
> out:
> return ret;
> }
> +
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <20071128175607.37df2187.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-28 11:20 ` Pavel Emelyanov
[not found] ` <474D4F01.4070705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-28 12:20 ` Pavel Emelyanov
2007-12-01 7:09 ` Paul Menage
2 siblings, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 11:20 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> High Low watermark for page reclaiming in memory cgroup(1)
>
> High-Low value here is implemented for support background reclaim.
>
> - If USAGE is bigger than high watermark, background reclaim starts.
> - If USAGE is lower than low watermark, background reclaim stops.
>
> Each value is represented in bytes.
> (Not configurable now, but can be easily.)
> The routine which reclaim is in the next patch. This patch just adds
> high/low watermark value.
>
> Changes from YAMAMOTO-san's version
> - I like automatic adjustment of high/low watermark.
> So, I added a code to modify high/low watermark every time the limit
> changes. A user can custorimize it by hand later.
> (I think 93%/97% is not so bad value. need investigation.)
>
> TODO:
> - update documentation.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>
> mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 57 insertions(+), 6 deletions(-)
>
> Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-28 16:30:44.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/mm/memcontrol.c 2007-11-28 16:44:57.000000000 +0900
> @@ -108,16 +108,12 @@
> struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
> };
>
> +
> /*
> * The memory controller data structure. The memory controller controls both
> * page cache and RSS per cgroup. We would eventually like to provide
> * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> * to help the administrator determine what knobs to tune.
> - *
> - * TODO: Add a water mark for the memory controller. Reclaim will begin when
> - * we hit the water mark. May be even add a low water mark, such that
> - * no reclaim occurs from a cgroup at it's low water mark, this is
> - * a feature that will be implemented much later in the future.
> */
> struct mem_cgroup {
> struct cgroup_subsys_state css;
> @@ -162,6 +158,15 @@
> #define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache */
> #define PAGE_CGROUP_FLAG_ACTIVE (0x2) /* page is active in this cgroup */
>
> +/*
> + * Default value for high-low watermark for start/stop background reclaim.
> + * represented in percent here. can be customized by user later.
> + * This is applied always limit change.
> + */
> +#define DEFAULT_WATERMARK_PERCENT_LOW (93ULL)
> +#define DEFAULT_WATERMARK_PERCENT_HIGH (97ULL)
> +
> +
> static inline int page_cgroup_nid(struct page_cgroup *pc)
> {
> return page_to_nid(pc->page);
> @@ -926,6 +931,29 @@
> return 0;
> }
>
> +static void mem_cgroup_init_watermark(struct cgroup *cont)
> +{
> + struct mem_cgroup *mem;
> + unsigned long long val, high, low;
> + unsigned long flags;
> +
> + mem = mem_cgroup_from_cont(cont);
> + spin_lock_irqsave(&mem->res.lock, flags);
> + val = res_counter_get(&mem->res, RES_LIMIT);
> + if (val == (unsigned long long) LLONG_MAX) {
> + low = (unsigned long long) LLONG_MAX;
> + high = (unsigned long long) LLONG_MAX;
> + } else {
> + low = val * DEFAULT_WATERMARK_PERCENT_LOW / 100ULL;
> + high = val * DEFAULT_WATERMARK_PERCENT_HIGH / 100ULL;
> + }
> + res_counter_set(&mem->res, RES_LOW_WATERMARK, low);
> + res_counter_set(&mem->res, RES_HIGH_WATERMARK, high);
> + spin_unlock_irqrestore(&mem->res.lock, flags);
> + return;
> +}
> +
> +
> static ssize_t mem_cgroup_read(struct cgroup *cont,
> struct cftype *cft, struct file *file,
> char __user *userbuf, size_t nbytes, loff_t *ppos)
> @@ -944,6 +972,17 @@
> mem_cgroup_write_strategy);
> }
>
> +static ssize_t mem_cgroup_write_limit(struct cgroup *cont, struct cftype *cft,
> + struct file *file, const char __user *userbuf,
> + size_t nbytes, loff_t *ppos)
> +{
> + ssize_t ret;
> + ret = mem_cgroup_write(cont, cft, file, userbuf, nbytes, ppos);
> + if (ret > 0)
> + mem_cgroup_init_watermark(cont);
No, please, no! I'd be very disappointed if I tune high and low watermarks
carefully and then they are silently re-set after I tune the limit. Better
(see my comment to patch #3) return -EINVAL in case I try to set limit
below hwmark. Please :)
> + return ret;
> +}
> +
> static ssize_t mem_control_type_write(struct cgroup *cont,
> struct cftype *cft, struct file *file,
> const char __user *userbuf,
> @@ -1088,7 +1127,7 @@
> {
> .name = "limit_in_bytes",
> .private = RES_LIMIT,
> - .write = mem_cgroup_write,
> + .write = mem_cgroup_write_limit,
> .read = mem_cgroup_read,
> },
> {
> @@ -1102,6 +1141,18 @@
> .read = mem_control_type_read,
> },
> {
> + .name = "low_watermark_in_bytes",
> + .private = RES_LOW_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
> + {
> + .name = "high_watermark_in_bytes",
> + .private = RES_HIGH_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
> + {
> .name = "force_empty",
> .write = mem_force_empty_write,
> .read = mem_force_empty_read,
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <20071128175607.37df2187.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:20 ` Pavel Emelyanov
@ 2007-11-28 12:20 ` Pavel Emelyanov
[not found] ` <474D5D1A.4070409-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-12-01 7:09 ` Paul Menage
2 siblings, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2007-11-28 12:20 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
KAMEZAWA Hiroyuki wrote:
> High Low watermark for page reclaiming in memory cgroup(1)
>
> High-Low value here is implemented for support background reclaim.
>
> - If USAGE is bigger than high watermark, background reclaim starts.
> - If USAGE is lower than low watermark, background reclaim stops.
>
> Each value is represented in bytes.
> (Not configurable now, but can be easily.)
> The routine which reclaim is in the next patch. This patch just adds
> high/low watermark value.
>
> Changes from YAMAMOTO-san's version
> - I like automatic adjustment of high/low watermark.
> So, I added a code to modify high/low watermark every time the limit
> changes. A user can custorimize it by hand later.
> (I think 93%/97% is not so bad value. need investigation.)
>
> TODO:
> - update documentation.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
>
> mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 57 insertions(+), 6 deletions(-)
>
> Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-28 16:30:44.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/mm/memcontrol.c 2007-11-28 16:44:57.000000000 +0900
> @@ -108,16 +108,12 @@
> struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
> };
>
> +
> /*
> * The memory controller data structure. The memory controller controls both
> * page cache and RSS per cgroup. We would eventually like to provide
> * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> * to help the administrator determine what knobs to tune.
> - *
> - * TODO: Add a water mark for the memory controller. Reclaim will begin when
> - * we hit the water mark. May be even add a low water mark, such that
> - * no reclaim occurs from a cgroup at it's low water mark, this is
> - * a feature that will be implemented much later in the future.
> */
> struct mem_cgroup {
> struct cgroup_subsys_state css;
> @@ -162,6 +158,15 @@
> #define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache */
> #define PAGE_CGROUP_FLAG_ACTIVE (0x2) /* page is active in this cgroup */
>
> +/*
> + * Default value for high-low watermark for start/stop background reclaim.
> + * represented in percent here. can be customized by user later.
> + * This is applied always limit change.
> + */
> +#define DEFAULT_WATERMARK_PERCENT_LOW (93ULL)
> +#define DEFAULT_WATERMARK_PERCENT_HIGH (97ULL)
> +
> +
> static inline int page_cgroup_nid(struct page_cgroup *pc)
> {
> return page_to_nid(pc->page);
> @@ -926,6 +931,29 @@
> return 0;
> }
>
> +static void mem_cgroup_init_watermark(struct cgroup *cont)
> +{
> + struct mem_cgroup *mem;
> + unsigned long long val, high, low;
> + unsigned long flags;
> +
> + mem = mem_cgroup_from_cont(cont);
> + spin_lock_irqsave(&mem->res.lock, flags);
> + val = res_counter_get(&mem->res, RES_LIMIT);
> + if (val == (unsigned long long) LLONG_MAX) {
> + low = (unsigned long long) LLONG_MAX;
> + high = (unsigned long long) LLONG_MAX;
> + } else {
> + low = val * DEFAULT_WATERMARK_PERCENT_LOW / 100ULL;
> + high = val * DEFAULT_WATERMARK_PERCENT_HIGH / 100ULL;
BTW, I tried to compile such a code:
unsigned long long x, y;
y = ...;
x = y / 100ULL;
(similar to yours) and that's what I got:
kernel/built-in.o: In function `xxx':
: undefined reference to `__udivdi3'
It looks like i386 doesn't have any support for ULL divisions.
It doesn't have it in CPU, and I thought that it was some-how
emulated, but it is not...
Did I miss something?
> + }
> + res_counter_set(&mem->res, RES_LOW_WATERMARK, low);
> + res_counter_set(&mem->res, RES_HIGH_WATERMARK, high);
> + spin_unlock_irqrestore(&mem->res.lock, flags);
> + return;
> +}
> +
> +
> static ssize_t mem_cgroup_read(struct cgroup *cont,
> struct cftype *cft, struct file *file,
> char __user *userbuf, size_t nbytes, loff_t *ppos)
> @@ -944,6 +972,17 @@
> mem_cgroup_write_strategy);
> }
>
> +static ssize_t mem_cgroup_write_limit(struct cgroup *cont, struct cftype *cft,
> + struct file *file, const char __user *userbuf,
> + size_t nbytes, loff_t *ppos)
> +{
> + ssize_t ret;
> + ret = mem_cgroup_write(cont, cft, file, userbuf, nbytes, ppos);
> + if (ret > 0)
> + mem_cgroup_init_watermark(cont);
> + return ret;
> +}
> +
> static ssize_t mem_control_type_write(struct cgroup *cont,
> struct cftype *cft, struct file *file,
> const char __user *userbuf,
> @@ -1088,7 +1127,7 @@
> {
> .name = "limit_in_bytes",
> .private = RES_LIMIT,
> - .write = mem_cgroup_write,
> + .write = mem_cgroup_write_limit,
> .read = mem_cgroup_read,
> },
> {
> @@ -1102,6 +1141,18 @@
> .read = mem_control_type_read,
> },
> {
> + .name = "low_watermark_in_bytes",
> + .private = RES_LOW_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
> + {
> + .name = "high_watermark_in_bytes",
> + .private = RES_HIGH_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
> + {
> .name = "force_empty",
> .write = mem_force_empty_write,
> .read = mem_force_empty_read,
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification
[not found] ` <474D4C2F.8020701-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:14 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:14 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 14:08:31 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> KAMEZAWA Hiroyuki wrote:
> > spinlock is necessary when someone changes res_counter value.
> > splited out from YAMAMOTO's background page reclaim for memory cgroup set.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> > From: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
> >
> >
> > kernel/res_counter.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.24-rc3-mm1/kernel/res_counter.c
> > ===================================================================
> > --- linux-2.6.24-rc3-mm1.orig/kernel/res_counter.c 2007-11-27 14:07:44.000000000 +0900
> > +++ linux-2.6.24-rc3-mm1/kernel/res_counter.c 2007-11-27 14:09:40.000000000 +0900
> > @@ -98,7 +98,7 @@
> > {
> > int ret;
> > char *buf, *end;
> > - unsigned long long tmp, *val;
> > + unsigned long long flags, tmp, *val;
>
> Flags for irqsave should be unsigned long. No?
>
yes. It must be.
Thanks, I'll fix.
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter
[not found] ` <474D4C66.2080303-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:16 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:16 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 14:09:26 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > +void res_counter_set(struct res_counter *res, int member,
> > + unsigned long long newval)
> > +{
> > + unsigned long long *val;
> > +
> > + val = res_counter_member(res, member);
> > + *val = newval;
>
> You put locking here in the res_counter_write() (patch #1). Why
> is it missed here?
>
Sorry, my mistake.
I'll drop this patch because automatic value adjustment patch will be dropped.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <474D4D35.9060603-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:18 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:18 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 14:12:53 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > /*
> > @@ -73,6 +88,8 @@
> > RES_USAGE,
> > RES_LIMIT,
> > RES_FAILCNT,
> > + RES_HIGH_WATERMARK,
> > + RES_LOW_WATERMARK,
>
> I'd prefer some shorter names. Like RES_HWMARK and RES_LWMARK.
>
Hmm, ok.
> > counter->usage += val;
> > +
> > + if (counter->usage > counter->high_watermark) {
> > + counter->watermark_state = RES_WATERMARK_ABOVE_HIGH;
> > + return 0;
> > + }
>
> "else" would look much better here :)
I agree and will fix. thanks.
>
> > + if (counter->usage > counter->low_watermark)
> > + counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
> > +
> > return 0;
> > }
> >
> > @@ -47,6 +59,13 @@
> > val = counter->usage;
> >
> > counter->usage -= val;
> > +
> > + if (counter->usage < counter->low_watermark) {
> > + counter->watermark_state = RES_WATERMARK_BELOW_LOW;
> > + return;
> > + }
>
> and here
>
here, too.
> > + if (counter->usage < counter->high_watermark)
> > + counter->watermark_state = RES_WATERMARK_ABOVE_LOW;
> > }
> >
> > void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> > @@ -69,6 +88,10 @@
> > return &counter->limit;
> > case RES_FAILCNT:
> > return &counter->failcnt;
> > + case RES_HIGH_WATERMARK:
> > + return &counter->high_watermark;
> > + case RES_LOW_WATERMARK:
> > + return &counter->low_watermark;
> > };
> >
> > BUG();
> > @@ -117,7 +140,7 @@
> > {
> > int ret;
> > char *buf, *end;
> > - unsigned long long flags, tmp, *val;
> > + unsigned long long flags, tmp, *val, limit, low, high;
> >
> > buf = kmalloc(nbytes + 1, GFP_KERNEL);
> > ret = -ENOMEM;
> > @@ -141,6 +164,26 @@
> > goto out_free;
> > }
> > spin_lock_irqsave(&counter->lock, flags);
> > + /*
> > + * High/Low watermark should be changed automatically AMAP.
> > + */
> > + switch (member) {
> > + case RES_HIGH_WATERMARK:
> > + limit = res_counter_get(counter, RES_LIMIT);
> > + if (tmp > limit)
> > + goto out_free;
> > + low = res_counter_get(counter, RES_LOW_WATERMARK);
> > + if (tmp <= low)
> > + goto out_free;
> > + break;
> > + case RES_LOW_WATERMARK:
> > + high= res_counter_get(counter, RES_HIGH_WATERMARK);
> > + if (tmp >= high)
> > + goto out_free;
> > + break;
>
> Why there's no checks for limit? Smth like
>
> case RES_LIMIT:
> high = res_counter_get(counter, RES_HIGH_WATERMARK);
> if (tmp < high)
> goto out_free;
>
Ok, I'll drop automatic adjustment patch and add check for LIMIT.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <474D5D1A.4070409-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:20 ` KAMEZAWA Hiroyuki
[not found] ` <20071129102044.8087386c.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:20 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 15:20:42 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > + mem = mem_cgroup_from_cont(cont);
> > + spin_lock_irqsave(&mem->res.lock, flags);
> > + val = res_counter_get(&mem->res, RES_LIMIT);
> > + if (val == (unsigned long long) LLONG_MAX) {
> > + low = (unsigned long long) LLONG_MAX;
> > + high = (unsigned long long) LLONG_MAX;
> > + } else {
> > + low = val * DEFAULT_WATERMARK_PERCENT_LOW / 100ULL;
> > + high = val * DEFAULT_WATERMARK_PERCENT_HIGH / 100ULL;
>
> BTW, I tried to compile such a code:
>
> unsigned long long x, y;
> y = ...;
> x = y / 100ULL;
>
> (similar to yours) and that's what I got:
>
> kernel/built-in.o: In function `xxx':
> : undefined reference to `__udivdi3'
>
> It looks like i386 doesn't have any support for ULL divisions.
> It doesn't have it in CPU, and I thought that it was some-how
> emulated, but it is not...
>
> Did I miss something?
>
Ah, I didn't try i386...
But I'll drop this automatic watermark adjustment part.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [5/5]
[not found] ` <474D4BAE.7090407-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:26 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:26 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 14:06:22 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > + struct {
> > + wait_queue_head_t waitq;
> > + struct task_struct *thread;
> > + } daemon;
>
> Does this HAS to be a struct?
>
No, but for shorter name of member.
(I'd like to add another waitq for avoiding contention in reclaiming,
If I can.)
> > + * Background page reclaim daeom for memory controller.
> > + */
> > +
> > +static int mem_cgroup_reclaim_daemon(void *data)
> > +{
> > + DEFINE_WAIT(wait);
> > + struct mem_cgroup *mem = data;
> > +
> > + css_get(&mem->css);
>
> Won't this prevent the css from being removed?
>
This refcnt is dropped when this thread dies.
pre_destroy handler handles this at rmdir().
> > struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> > int ret;
> > +
>
> This hunk is not needed :)
>
yes :)
> > ret = mem_cgroup_force_empty(mem);
> > if (!ret)
> > ret = nbytes;
> > @@ -1188,6 +1244,16 @@
> >
> > static struct mem_cgroup init_mem_cgroup;
> >
> > +static int __init mem_cgroup_reclaim_init(void)
> > +{
> > + init_mem_cgroup.daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
> > + &init_mem_cgroup, "memcontd");
> > + if (IS_ERR(init_mem_cgroup.daemon.thread))
> > + BUG();
> > + return 0;
> > +}
> > +late_initcall(mem_cgroup_reclaim_init);
> > +
> > static struct cgroup_subsys_state *
> > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > {
> > @@ -1212,6 +1278,17 @@
> > if (alloc_mem_cgroup_per_zone_info(mem, node))
> > goto free_out;
> >
> > + /* Memory Reclaim Daemon per cgroup */
> > + init_waitqueue_head(&mem->daemon.waitq);
> > + if (mem != &init_mem_cgroup) {
> > + /* Complicated...but we cannot call kthread create here..*/
> > + /* init call will later assign kthread */
> > + mem->daemon.thread = kthread_run(mem_cgroup_reclaim_daemon,
> > + mem, "memcontd");
> > + if (IS_ERR(mem->daemon.thread))
> > + goto free_out;
>
> goto free_mem_cgroup_per_zone_info()?
>
Ah.. there may be some bugs. will fix soon.
Thank you for all your comments.
Regards,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <474D4F01.4070705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-11-29 1:27 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 1:27 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 14:20:33 +0300
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > +static ssize_t mem_cgroup_write_limit(struct cgroup *cont, struct cftype *cft,
> > + struct file *file, const char __user *userbuf,
> > + size_t nbytes, loff_t *ppos)
> > +{
> > + ssize_t ret;
> > + ret = mem_cgroup_write(cont, cft, file, userbuf, nbytes, ppos);
> > + if (ret > 0)
> > + mem_cgroup_init_watermark(cont);
>
> No, please, no! I'd be very disappointed if I tune high and low watermarks
> carefully and then they are silently re-set after I tune the limit. Better
> (see my comment to patch #3) return -EINVAL in case I try to set limit
> below hwmark. Please :)
>
Ok, I'll drop this.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <20071128175408.1ee479f3.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:12 ` Pavel Emelyanov
@ 2007-11-29 2:56 ` YAMAMOTO Takashi
[not found] ` <20071129025609.0B7111CFE7C-Pcsii4f/SVk@public.gmane.org>
1 sibling, 1 reply; 31+ messages in thread
From: YAMAMOTO Takashi @ 2007-11-29 2:56 UTC (permalink / raw)
To: kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
Cc: containers-qjLDD68F18O7TbgM5vRIOg, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
xemul-GEFAQzZX7r8dnm+yROfE0A
> This patch adds high/low watermark parameter to res_counter.
> splitted out from YAMAMOTO's background page reclaim for memory cgroup set.
thanks.
> + * Watermarks
> + * Should be changed automatically when the limit is changed and
> + * keep low < high < limit.
> + */
> + unsigned long long high_watermark;
> + unsigned long long low_watermark;
> + /*
do you have any specific reason not to allow low == high == limit?
if you want to ensure low < high < limit, it's better to make the
default values below meet the condition as well.
to me, it seems weird to prevent users from making these values back to
the default.
YAMAMOTO Takashi
> @@ -17,6 +17,9 @@
> {
> spin_lock_init(&counter->lock);
> counter->limit = (unsigned long long)LLONG_MAX;
> + counter->low_watermark = (unsigned long long)LLONG_MAX;
> + counter->high_watermark = (unsigned long long)LLONG_MAX;
> + counter->watermark_state = RES_WATERMARK_BELOW_LOW;
> }
>
> int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <20071129025609.0B7111CFE7C-Pcsii4f/SVk@public.gmane.org>
@ 2007-11-29 3:24 ` KAMEZAWA Hiroyuki
[not found] ` <20071129122402.101c5fbc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 3:24 UTC (permalink / raw)
To: YAMAMOTO Takashi
Cc: containers-qjLDD68F18O7TbgM5vRIOg, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
xemul-GEFAQzZX7r8dnm+yROfE0A
On Thu, 29 Nov 2007 11:56:08 +0900 (JST)
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org (YAMAMOTO Takashi) wrote:
> > This patch adds high/low watermark parameter to res_counter.
> > splitted out from YAMAMOTO's background page reclaim for memory cgroup set.
>
> thanks.
>
> > + * Watermarks
> > + * Should be changed automatically when the limit is changed and
> > + * keep low < high < limit.
> > + */
> > + unsigned long long high_watermark;
> > + unsigned long long low_watermark;
> > + /*
>
> do you have any specific reason not to allow low == high == limit?
No.
> if you want to ensure low < high < limit, it's better to make the
> default values below meet the condition as well.
Hmm, I see.
> to me, it seems weird to prevent users from making these values back to
> the default.
>
will fix.
LLONG_MAX-1 for high
LLONG_MAX-2 for low ...?
BTW, it should be "low + PAGE_SIZE < high + PAGE_SIZE < limit" ...?
Does anyone have good idea ?
Thanks,
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter
[not found] ` <20071129122402.101c5fbc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-29 3:36 ` YAMAMOTO Takashi
0 siblings, 0 replies; 31+ messages in thread
From: YAMAMOTO Takashi @ 2007-11-29 3:36 UTC (permalink / raw)
To: kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
Cc: containers-qjLDD68F18O7TbgM5vRIOg, menage-hpIqsD4AKlfQT0dZR+AlfA,
xemul-GEFAQzZX7r8dnm+yROfE0A,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
> > to me, it seems weird to prevent users from making these values back to
> > the default.
> >
> will fix.
> LLONG_MAX-1 for high
> LLONG_MAX-2 for low ...?
imo it's better to simply allow low == high == limit.
> BTW, it should be "low + PAGE_SIZE < high + PAGE_SIZE < limit" ...?
it shouldn't, unless you are going to throw away the res_counter abstraction.
YAMAMOTO Takashi
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?)
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
` (4 preceding siblings ...)
2007-11-28 8:57 ` [RFC][ only for review ] memory controller bacground reclaim [5/5] KAMEZAWA Hiroyuki
@ 2007-11-29 11:53 ` KAMEZAWA Hiroyuki
[not found] ` <20071129205324.f9e7ab4e.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
5 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-29 11:53 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A, menage-hpIqsD4AKlfQT0dZR+AlfA,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Wed, 28 Nov 2007 17:49:23 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> Hi, this set is for memory controller background reclaim.
>
> Merged YAMAMOTO-san's version onto 2.6.23-rc3-mm1 + my NUMA patch.
> And splitted to several sets.
>
> Major changes from his one is
> - use kthread instead of work_queue
> - adjust high/low watermark when limit changes automatically
> and set default value. (a user can specify his own later.)
>
FYI rather than RFC.
I wrote attached patch and run kernbench on 8CPU/2Node NUMA/ia64.
It does make -j 32.
Memory limitation was 800M. Low/High watermark here was 750M/780M.
== These numbers are stable to some extent.==
2.6.24-rc3-mm2: (Limit: 800M)
Average Optimal -j 32 Load Run:
Elapsed Time 358.933---------------------------(*)
User Time 1069.63
System Time 140.667
Percent CPU 337.333
Context Switches 220821
Sleeps 196912
2.6.24-rc3-mm2 + throttle (Limit:800M)
Average Optimal -j 32 Load Run:
Elapsed Time 266.697---------------------------(*)
User Time 1105.39
System Time 124.423
Percent CPU 471.667
Context Switches 251797
Sleeps 231038
2.6.24-rc3-mm2 + throttle + High/Low watermark.
(low:750M High:780M Limit:800M)
Average Optimal -j 32 Load Run:
Elapsed Time 266.844---------------------------(*)
User Time 1112.9
System Time 112.273
Percent CPU 473.667
Context Switches 251795
Sleeps 220339
==
Seems throttling reclaim has some good effect (for kernbench).
Does anyone have an idea for throttling reclaiming of memory controller ?
Thanks,
-Kame
==
Just and Experimental. add throttling at starting direct reclaim.
mm/memcontrol.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
Index: linux-2.6.24-rc3-mm2/mm/memcontrol.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/mm/memcontrol.c
+++ linux-2.6.24-rc3-mm2/mm/memcontrol.c
@@ -133,6 +133,7 @@ struct mem_cgroup {
unsigned long control_type; /* control RSS or RSS+Pagecache */
int prev_priority; /* for recording reclaim priority */
+ atomic_t reclaimers; /* # of processes which calls reclaim */
/*
* statistics.
*/
@@ -635,14 +636,27 @@ retry:
*/
while (res_counter_charge(&mem->res, PAGE_SIZE)) {
bool is_atomic = gfp_mask & GFP_ATOMIC;
+ int limit;
/*
* We cannot reclaim under GFP_ATOMIC, fail the charge
*/
if (is_atomic)
goto noreclaim;
-
- if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
- continue;
+ /*
+ * Experimental:
+ * Obviously, we need some throttling here.
+ */
+ limit = num_online_cpus()/4 + 1;
+ if (atomic_add_unless(&mem->reclaimers, 1, limit)) {
+ if (try_to_free_mem_cgroup_pages(mem, gfp_mask)) {
+ atomic_dec(&mem->reclaimers);
+ continue;
+ }
+ atomic_dec(&mem->reclaimers);
+ } else {
+ yield();
+ nr_retries++;
+ }
/*
* try_to_free_mem_cgroup_pages() might not give us a full
@@ -1168,7 +1182,7 @@ mem_cgroup_create(struct cgroup_subsys *
mem->control_type = MEM_CGROUP_TYPE_ALL;
memset(&mem->info, 0, sizeof(mem->info));
-
+ atomic_set(&mem->reclaimers, 0);
for_each_node_state(node, N_POSSIBLE)
if (alloc_mem_cgroup_per_zone_info(mem, node))
goto free_out;
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?)
[not found] ` <20071129205324.f9e7ab4e.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-29 14:42 ` Balbir Singh
[not found] ` <474ECFEB.9090202-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Balbir Singh @ 2007-11-29 14:42 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
Lee Schermerhorn,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA, Pavel Emelianov
KAMEZAWA Hiroyuki wrote:
> On Wed, 28 Nov 2007 17:49:23 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
>
>> Hi, this set is for memory controller background reclaim.
>>
>> Merged YAMAMOTO-san's version onto 2.6.23-rc3-mm1 + my NUMA patch.
>> And splitted to several sets.
>>
>> Major changes from his one is
>> - use kthread instead of work_queue
>> - adjust high/low watermark when limit changes automatically
>> and set default value. (a user can specify his own later.)
>>
> FYI rather than RFC.
>
> I wrote attached patch and run kernbench on 8CPU/2Node NUMA/ia64.
> It does make -j 32.
>
> Memory limitation was 800M. Low/High watermark here was 750M/780M.
>
> == These numbers are stable to some extent.==
> 2.6.24-rc3-mm2: (Limit: 800M)
> Average Optimal -j 32 Load Run:
> Elapsed Time 358.933---------------------------(*)
> User Time 1069.63
> System Time 140.667
> Percent CPU 337.333
> Context Switches 220821
> Sleeps 196912
>
> 2.6.24-rc3-mm2 + throttle (Limit:800M)
> Average Optimal -j 32 Load Run:
> Elapsed Time 266.697---------------------------(*)
> User Time 1105.39
> System Time 124.423
> Percent CPU 471.667
> Context Switches 251797
> Sleeps 231038
>
> 2.6.24-rc3-mm2 + throttle + High/Low watermark.
> (low:750M High:780M Limit:800M)
> Average Optimal -j 32 Load Run:
> Elapsed Time 266.844---------------------------(*)
> User Time 1112.9
> System Time 112.273
> Percent CPU 473.667
> Context Switches 251795
> Sleeps 220339
> ==
>
Looks good to me, was there any impact on memory.failcnt?
> Seems throttling reclaim has some good effect (for kernbench).
> Does anyone have an idea for throttling reclaiming of memory controller ?
>
In the past I've run workloads of apache+geronimo+open trade, I've run
linear sequential memory access tests, kernbench, lmbench, database
benchmarks (DOTS, pgbench, etc). I think Lee Schermerhorn has a very
interesting setup (that I need to learn to replicate).
> Thanks,
> -Kame
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <20071129102044.8087386c.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2007-11-29 19:55 ` Oren Laadan
[not found] ` <474F194C.1000401-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2007-11-29 19:55 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA, Pavel Emelyanov
KAMEZAWA Hiroyuki wrote:
> On Wed, 28 Nov 2007 15:20:42 +0300
> Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
>>> + mem = mem_cgroup_from_cont(cont);
>>> + spin_lock_irqsave(&mem->res.lock, flags);
>>> + val = res_counter_get(&mem->res, RES_LIMIT);
>>> + if (val == (unsigned long long) LLONG_MAX) {
>>> + low = (unsigned long long) LLONG_MAX;
>>> + high = (unsigned long long) LLONG_MAX;
>>> + } else {
>>> + low = val * DEFAULT_WATERMARK_PERCENT_LOW / 100ULL;
>>> + high = val * DEFAULT_WATERMARK_PERCENT_HIGH / 100ULL;
>> BTW, I tried to compile such a code:
>>
>> unsigned long long x, y;
>> y = ...;
>> x = y / 100ULL;
>>
>> (similar to yours) and that's what I got:
>>
>> kernel/built-in.o: In function `xxx':
>> : undefined reference to `__udivdi3'
>>
>> It looks like i386 doesn't have any support for ULL divisions.
>> It doesn't have it in CPU, and I thought that it was some-how
>> emulated, but it is not...
>>
>> Did I miss something?
>>
> Ah, I didn't try i386...
> But I'll drop this automatic watermark adjustment part.
FYI, if you do need it, you can do long long division on i386 using
the macro "do_div()", defined in "include/asm-i386/div64.h".
Oren.
> Thanks,
> -Kame
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <474F194C.1000401-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2007-11-30 0:26 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-30 0:26 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA, Pavel Emelyanov
On Thu, 29 Nov 2007 14:55:56 -0500
Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:
> >> It looks like i386 doesn't have any support for ULL divisions.
> >> It doesn't have it in CPU, and I thought that it was some-how
> >> emulated, but it is not...
> >>
> >> Did I miss something?
> >>
> > Ah, I didn't try i386...
> > But I'll drop this automatic watermark adjustment part.
>
> FYI, if you do need it, you can do long long division on i386 using
> the macro "do_div()", defined in "include/asm-i386/div64.h".
>
Thank you.
I'll check it.
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?)
[not found] ` <474ECFEB.9090202-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2007-11-30 0:29 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-11-30 0:29 UTC (permalink / raw)
To: balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
Lee Schermerhorn,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
menage-hpIqsD4AKlfQT0dZR+AlfA, Pavel Emelianov
On Thu, 29 Nov 2007 20:12:51 +0530
Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
> > == These numbers are stable to some extent.==
> > 2.6.24-rc3-mm2: (Limit: 800M)
> > Average Optimal -j 32 Load Run:
> > Elapsed Time 358.933---------------------------(*)
> > User Time 1069.63
> > System Time 140.667
> > Percent CPU 337.333
> > Context Switches 220821
> > Sleeps 196912
> >
> > 2.6.24-rc3-mm2 + throttle (Limit:800M)
> > Average Optimal -j 32 Load Run:
> > Elapsed Time 266.697---------------------------(*)
> > User Time 1105.39
> > System Time 124.423
> > Percent CPU 471.667
> > Context Switches 251797
> > Sleeps 231038
> >
> > 2.6.24-rc3-mm2 + throttle + High/Low watermark.
> > (low:750M High:780M Limit:800M)
> > Average Optimal -j 32 Load Run:
> > Elapsed Time 266.844---------------------------(*)
> > User Time 1112.9
> > System Time 112.273
> > Percent CPU 473.667
> > Context Switches 251795
> > Sleeps 220339
> > ==
> >
>
> Looks good to me, was there any impact on memory.failcnt?
>
This version o patch doesn't care it. (I'll fix.)
I just wanted to ask someone has (another) throttling patch or idea.
> > Seems throttling reclaim has some good effect (for kernbench).
> > Does anyone have an idea for throttling reclaiming of memory controller ?
> >
>
> In the past I've run workloads of apache+geronimo+open trade, I've run
> linear sequential memory access tests, kernbench, lmbench, database
> benchmarks (DOTS, pgbench, etc). I think Lee Schermerhorn has a very
> interesting setup (that I need to learn to replicate).
>
Ok, thanks.
I will reflesh and post new one.
Regards
-Kame
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <20071128175607.37df2187.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:20 ` Pavel Emelyanov
2007-11-28 12:20 ` Pavel Emelyanov
@ 2007-12-01 7:09 ` Paul Menage
[not found] ` <6599ad830711302309p3a68828fjec6793bc9d854a1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2 siblings, 1 reply; 31+ messages in thread
From: Paul Menage @ 2007-12-01 7:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Nov 28, 2007 12:56 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> {
> + .name = "low_watermark_in_bytes",
> + .private = RES_LOW_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
> + {
> + .name = "high_watermark_in_bytes",
> + .private = RES_HIGH_WATERMARK,
> + .write = mem_cgroup_write,
> + .read = mem_cgroup_read,
> + },
From a style point of view, I dislike having the "in_bytes" suffix
tacked on to all the memory controller filenames.
If people really want this to be self-documenting, how about we allow
cgroup control files to specify metadata, which would be presented to
the user via an auto-generated "api" file.
As an example, the addition above might then look something like:
{
.name = "low_watermark",
.units = "bytes",
.description = "usage below which background reclaim stops",
.write = mem_cgroup_write,
.read = mem_cgroup_read,
}
which would correspond to a line in the "mem.api" auto-generated control file as
low_watermark: usage below which background reclaim stops (bytes)
Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [5/5]
[not found] ` <20071128175713.4e9b8fff.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:06 ` Pavel Emelyanov
@ 2007-12-01 7:16 ` Paul Menage
1 sibling, 0 replies; 31+ messages in thread
From: Paul Menage @ 2007-12-01 7:16 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A,
balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
On Nov 28, 2007 12:57 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> @@ -1226,6 +1303,7 @@
> {
> struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> mem_cgroup_force_empty(mem);
> + kthread_stop(mem->daemon.thread);
> }
>
This means that if the user tries to delete the cgroup (hence calling
the predestroy handlers) but fails (e.g. because some other subsystem
is still busy) then the cgroup's reclaim daemon will have exited but
the cgroup will still exist. I think you should wait until the actual
destroy call, and synchronize with the exiting reclaim daemon.
Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <6599ad830711302309p3a68828fjec6793bc9d854a1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-12-01 10:45 ` Balbir Singh
[not found] ` <47513B50.8090003-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Balbir Singh @ 2007-12-01 10:45 UTC (permalink / raw)
To: Paul Menage
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A
Paul Menage wrote:
> On Nov 28, 2007 12:56 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
>> {
>> + .name = "low_watermark_in_bytes",
>> + .private = RES_LOW_WATERMARK,
>> + .write = mem_cgroup_write,
>> + .read = mem_cgroup_read,
>> + },
>> + {
>> + .name = "high_watermark_in_bytes",
>> + .private = RES_HIGH_WATERMARK,
>> + .write = mem_cgroup_write,
>> + .read = mem_cgroup_read,
>> + },
>
> From a style point of view, I dislike having the "in_bytes" suffix
> tacked on to all the memory controller filenames.
>
The idea came from Andrew, when he suggested it.
Please see /proc/sys/vm. We have files like min_free_kbyes, I think it's
a good idea to tell the user what units are used.
> If people really want this to be self-documenting, how about we allow
> cgroup control files to specify metadata, which would be presented to
> the user via an auto-generated "api" file.
>
> As an example, the addition above might then look something like:
>
> {
> .name = "low_watermark",
> .units = "bytes",
> .description = "usage below which background reclaim stops",
> .write = mem_cgroup_write,
> .read = mem_cgroup_read,
> }
>
> which would correspond to a line in the "mem.api" auto-generated control file as
>
The user is expected to cat "memory.api" in order to figure out how to
use the file?
> low_watermark: usage below which background reclaim stops (bytes)
>
> Paul
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <47513B50.8090003-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2007-12-01 16:55 ` Paul Menage
[not found] ` <6599ad830712010855j7967ddeau1a558474de4eea19-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Paul Menage @ 2007-12-01 16:55 UTC (permalink / raw)
To: balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A
On Dec 1, 2007 2:45 AM, Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
> >
> > which would correspond to a line in the "mem.api" auto-generated control file as
> >
>
> The user is expected to cat "memory.api" in order to figure out how to
> use the file?
>
Yes, the very first time they try to use a particular cgroup, if
they've not bothered to read any documentation about it. There has to
be some documentation *somewhere*, and putting some of the basic bits
at a point where they're immediately accessible could be an advantage.
Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller
[not found] ` <6599ad830712010855j7967ddeau1a558474de4eea19-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-12-01 17:16 ` Balbir Singh
0 siblings, 0 replies; 31+ messages in thread
From: Balbir Singh @ 2007-12-01 17:16 UTC (permalink / raw)
To: Paul Menage
Cc: containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
xemul-GEFAQzZX7r8dnm+yROfE0A
Paul Menage wrote:
> On Dec 1, 2007 2:45 AM, Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
>>> which would correspond to a line in the "mem.api" auto-generated control file as
>>>
>> The user is expected to cat "memory.api" in order to figure out how to
>> use the file?
>>
>
> Yes, the very first time they try to use a particular cgroup, if
> they've not bothered to read any documentation about it. There has to
> be some documentation *somewhere*, and putting some of the basic bits
> at a point where they're immediately accessible could be an advantage.
>
> Paul
I worry about the kernel size bloat due to adding documentation in the
kernel. I don't think the documentation belongs to the kernel image, but
using file names like /proc/sys/vm/min_free_kbytes is a much better idea.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2007-12-01 17:16 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-28 8:49 [RFC][ only for review ] memory controller bacground reclaim [0/5] KAMEZAWA Hiroyuki
[not found] ` <20071128174923.1f54f53f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 8:51 ` [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification KAMEZAWA Hiroyuki
[not found] ` <20071128175135.c42adecc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:08 ` Pavel Emelyanov
[not found] ` <474D4C2F.8020701-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:14 ` KAMEZAWA Hiroyuki
2007-11-28 8:52 ` [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter KAMEZAWA Hiroyuki
[not found] ` <20071128175239.e20ec09d.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:09 ` Pavel Emelyanov
[not found] ` <474D4C66.2080303-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:16 ` KAMEZAWA Hiroyuki
2007-11-28 8:54 ` [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter KAMEZAWA Hiroyuki
[not found] ` <20071128175408.1ee479f3.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:12 ` Pavel Emelyanov
[not found] ` <474D4D35.9060603-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:18 ` KAMEZAWA Hiroyuki
2007-11-29 2:56 ` YAMAMOTO Takashi
[not found] ` <20071129025609.0B7111CFE7C-Pcsii4f/SVk@public.gmane.org>
2007-11-29 3:24 ` KAMEZAWA Hiroyuki
[not found] ` <20071129122402.101c5fbc.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-29 3:36 ` YAMAMOTO Takashi
2007-11-28 8:56 ` [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller KAMEZAWA Hiroyuki
[not found] ` <20071128175607.37df2187.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:20 ` Pavel Emelyanov
[not found] ` <474D4F01.4070705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:27 ` KAMEZAWA Hiroyuki
2007-11-28 12:20 ` Pavel Emelyanov
[not found] ` <474D5D1A.4070409-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:20 ` KAMEZAWA Hiroyuki
[not found] ` <20071129102044.8087386c.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-29 19:55 ` Oren Laadan
[not found] ` <474F194C.1000401-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2007-11-30 0:26 ` KAMEZAWA Hiroyuki
2007-12-01 7:09 ` Paul Menage
[not found] ` <6599ad830711302309p3a68828fjec6793bc9d854a1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-12-01 10:45 ` Balbir Singh
[not found] ` <47513B50.8090003-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2007-12-01 16:55 ` Paul Menage
[not found] ` <6599ad830712010855j7967ddeau1a558474de4eea19-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-12-01 17:16 ` Balbir Singh
2007-11-28 8:57 ` [RFC][ only for review ] memory controller bacground reclaim [5/5] KAMEZAWA Hiroyuki
[not found] ` <20071128175713.4e9b8fff.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-28 11:06 ` Pavel Emelyanov
[not found] ` <474D4BAE.7090407-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-29 1:26 ` KAMEZAWA Hiroyuki
2007-12-01 7:16 ` Paul Menage
2007-11-29 11:53 ` [RFC][ only for review ] memory controller bacground reclaim [0/5] (Does anyone have an idea about throttling ?) KAMEZAWA Hiroyuki
[not found] ` <20071129205324.f9e7ab4e.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2007-11-29 14:42 ` Balbir Singh
[not found] ` <474ECFEB.9090202-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2007-11-30 0:29 ` KAMEZAWA Hiroyuki
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.