* [RFC PATCH 1/4] Disable "organizing cgroups over soft limit in a RB-Tree"
2011-05-12 18:47 [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Ying Han
@ 2011-05-12 18:47 ` Ying Han
2011-05-12 18:47 ` [RFC PATCH 2/4] Organize memcgs over soft limit in round-robin Ying Han
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Ying Han @ 2011-05-12 18:47 UTC (permalink / raw)
To: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai
Cc: linux-mm
The current soft_limit implementation is based on a per-zone RB tree, where
only the cgroup exceeding its soft_limit by the largest amount is selected for reclaim.
The problems are:
1. It takes no account of how many pages this cgroup actually has allocated on
the zone; the RB tree is indexed only by the cgroup's (usage - soft_limit).
2. It makes little sense to reclaim from only one cgroup rather than from all
cgroups in proportion to their excess. Proportional reclaim is required for fairness.
3. The target of soft limit reclaim is to bring one cgroup's usage under its
soft_limit, whereas the target under global memory pressure is to reclaim
pages until the zone is above its high_wmark.
So the current soft_limit reclaim falls well short of the efficiency
requirement. Following the discussion in the LSF MM session, we agreed to
organize the memcgs in a round-robin fashion instead. This patch reverts the
current RB-tree implementation.
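For illustration, here is a tiny standalone C sketch (not kernel code; the
struct and names are made up) of why keying the per-zone tree purely on
usage - soft_limit is problematic: the memcg with the largest global excess
keeps being picked for a zone even if it has almost no pages on that zone.

#include <stdio.h>

/* Illustrative stand-in for a per-zone memcg record. */
struct toy_memcg {
	const char *name;
	long usage, soft_limit;
	long pages_on_this_zone;	/* ignored by the RB-tree key */
};

/* The RB tree is keyed on this value; rb_last() returns its maximum. */
static long excess_key(const struct toy_memcg *m)
{
	return m->usage - m->soft_limit;
}

int main(void)
{
	struct toy_memcg a = { "A", 9000, 1000, 10 };	/* huge excess, few pages here */
	struct toy_memcg b = { "B", 2000, 1000, 4000 };	/* small excess, most pages here */

	/* Per-zone reclaim would still pick A for this zone. */
	printf("picked: %s\n", excess_key(&a) > excess_key(&b) ? a.name : b.name);
	return 0;
}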
Signed-off-by: Ying Han <yinghan@google.com>
---
mm/memcontrol.c | 304 +------------------------------------------------------
1 files changed, 1 insertions(+), 303 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da1fb2b..9da3ecf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -34,7 +34,6 @@
#include <linux/rcupdate.h>
#include <linux/limits.h>
#include <linux/mutex.h>
-#include <linux/rbtree.h>
#include <linux/slab.h>
#include <linux/swap.h>
#include <linux/swapops.h>
@@ -137,10 +136,6 @@ struct mem_cgroup_per_zone {
unsigned long count[NR_LRU_LISTS];
struct zone_reclaim_stat reclaim_stat;
- struct rb_node tree_node; /* RB tree node */
- unsigned long long usage_in_excess;/* Set to the value by which */
- /* the soft limit is exceeded*/
- bool on_tree;
struct mem_cgroup *mem; /* Back pointer, we cannot */
/* use container_of */
};
@@ -155,26 +150,6 @@ struct mem_cgroup_lru_info {
struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
};
-/*
- * Cgroups above their limits are maintained in a RB-Tree, independent of
- * their hierarchy representation
- */
-
-struct mem_cgroup_tree_per_zone {
- struct rb_root rb_root;
- spinlock_t lock;
-};
-
-struct mem_cgroup_tree_per_node {
- struct mem_cgroup_tree_per_zone rb_tree_per_zone[MAX_NR_ZONES];
-};
-
-struct mem_cgroup_tree {
- struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES];
-};
-
-static struct mem_cgroup_tree soft_limit_tree __read_mostly;
-
struct mem_cgroup_threshold {
struct eventfd_ctx *eventfd;
u64 threshold;
@@ -384,164 +359,6 @@ page_cgroup_zoneinfo(struct mem_cgroup *mem, struct page *page)
return mem_cgroup_zoneinfo(mem, nid, zid);
}
-static struct mem_cgroup_tree_per_zone *
-soft_limit_tree_node_zone(int nid, int zid)
-{
- return &soft_limit_tree.rb_tree_per_node[nid]->rb_tree_per_zone[zid];
-}
-
-static struct mem_cgroup_tree_per_zone *
-soft_limit_tree_from_page(struct page *page)
-{
- int nid = page_to_nid(page);
- int zid = page_zonenum(page);
-
- return &soft_limit_tree.rb_tree_per_node[nid]->rb_tree_per_zone[zid];
-}
-
-static void
-__mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
- struct mem_cgroup_per_zone *mz,
- struct mem_cgroup_tree_per_zone *mctz,
- unsigned long long new_usage_in_excess)
-{
- struct rb_node **p = &mctz->rb_root.rb_node;
- struct rb_node *parent = NULL;
- struct mem_cgroup_per_zone *mz_node;
-
- if (mz->on_tree)
- return;
-
- mz->usage_in_excess = new_usage_in_excess;
- if (!mz->usage_in_excess)
- return;
- while (*p) {
- parent = *p;
- mz_node = rb_entry(parent, struct mem_cgroup_per_zone,
- tree_node);
- if (mz->usage_in_excess < mz_node->usage_in_excess)
- p = &(*p)->rb_left;
- /*
- * We can't avoid mem cgroups that are over their soft
- * limit by the same amount
- */
- else if (mz->usage_in_excess >= mz_node->usage_in_excess)
- p = &(*p)->rb_right;
- }
- rb_link_node(&mz->tree_node, parent, p);
- rb_insert_color(&mz->tree_node, &mctz->rb_root);
- mz->on_tree = true;
-}
-
-static void
-__mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
- struct mem_cgroup_per_zone *mz,
- struct mem_cgroup_tree_per_zone *mctz)
-{
- if (!mz->on_tree)
- return;
- rb_erase(&mz->tree_node, &mctz->rb_root);
- mz->on_tree = false;
-}
-
-static void
-mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
- struct mem_cgroup_per_zone *mz,
- struct mem_cgroup_tree_per_zone *mctz)
-{
- spin_lock(&mctz->lock);
- __mem_cgroup_remove_exceeded(mem, mz, mctz);
- spin_unlock(&mctz->lock);
-}
-
-
-static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
-{
- unsigned long long excess;
- struct mem_cgroup_per_zone *mz;
- struct mem_cgroup_tree_per_zone *mctz;
- int nid = page_to_nid(page);
- int zid = page_zonenum(page);
- mctz = soft_limit_tree_from_page(page);
-
- /*
- * Necessary to update all ancestors when hierarchy is used.
- * because their event counter is not touched.
- */
- for (; mem; mem = parent_mem_cgroup(mem)) {
- mz = mem_cgroup_zoneinfo(mem, nid, zid);
- excess = res_counter_soft_limit_excess(&mem->res);
- /*
- * We have to update the tree if mz is on RB-tree or
- * mem is over its softlimit.
- */
- if (excess || mz->on_tree) {
- spin_lock(&mctz->lock);
- /* if on-tree, remove it */
- if (mz->on_tree)
- __mem_cgroup_remove_exceeded(mem, mz, mctz);
- /*
- * Insert again. mz->usage_in_excess will be updated.
- * If excess is 0, no tree ops.
- */
- __mem_cgroup_insert_exceeded(mem, mz, mctz, excess);
- spin_unlock(&mctz->lock);
- }
- }
-}
-
-static void mem_cgroup_remove_from_trees(struct mem_cgroup *mem)
-{
- int node, zone;
- struct mem_cgroup_per_zone *mz;
- struct mem_cgroup_tree_per_zone *mctz;
-
- for_each_node_state(node, N_POSSIBLE) {
- for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- mz = mem_cgroup_zoneinfo(mem, node, zone);
- mctz = soft_limit_tree_node_zone(node, zone);
- mem_cgroup_remove_exceeded(mem, mz, mctz);
- }
- }
-}
-
-static struct mem_cgroup_per_zone *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
-{
- struct rb_node *rightmost = NULL;
- struct mem_cgroup_per_zone *mz;
-
-retry:
- mz = NULL;
- rightmost = rb_last(&mctz->rb_root);
- if (!rightmost)
- goto done; /* Nothing to reclaim from */
-
- mz = rb_entry(rightmost, struct mem_cgroup_per_zone, tree_node);
- /*
- * Remove the node now but someone else can add it back,
- * we will to add it back at the end of reclaim to its correct
- * position in the tree.
- */
- __mem_cgroup_remove_exceeded(mz->mem, mz, mctz);
- if (!res_counter_soft_limit_excess(&mz->mem->res) ||
- !css_tryget(&mz->mem->css))
- goto retry;
-done:
- return mz;
-}
-
-static struct mem_cgroup_per_zone *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
-{
- struct mem_cgroup_per_zone *mz;
-
- spin_lock(&mctz->lock);
- mz = __mem_cgroup_largest_soft_limit_node(mctz);
- spin_unlock(&mctz->lock);
- return mz;
-}
-
/*
* Implementation Note: reading percpu statistics for memcg.
*
@@ -727,7 +544,6 @@ static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
__mem_cgroup_target_update(mem, MEM_CGROUP_TARGET_THRESH);
if (unlikely(__memcg_event_check(mem,
MEM_CGROUP_TARGET_SOFTLIMIT))){
- mem_cgroup_update_tree(mem, page);
__mem_cgroup_target_update(mem,
MEM_CGROUP_TARGET_SOFTLIMIT);
}
@@ -3373,95 +3189,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
gfp_t gfp_mask,
unsigned long *total_scanned)
{
- unsigned long nr_reclaimed = 0;
- struct mem_cgroup_per_zone *mz, *next_mz = NULL;
- unsigned long reclaimed;
- int loop = 0;
- struct mem_cgroup_tree_per_zone *mctz;
- unsigned long long excess;
- unsigned long nr_scanned;
-
- if (order > 0)
- return 0;
-
- mctz = soft_limit_tree_node_zone(zone_to_nid(zone), zone_idx(zone));
- /*
- * This loop can run a while, specially if mem_cgroup's continuously
- * keep exceeding their soft limit and putting the system under
- * pressure
- */
- do {
- if (next_mz)
- mz = next_mz;
- else
- mz = mem_cgroup_largest_soft_limit_node(mctz);
- if (!mz)
- break;
-
- nr_scanned = 0;
- reclaimed = mem_cgroup_hierarchical_reclaim(mz->mem, zone,
- gfp_mask,
- MEM_CGROUP_RECLAIM_SOFT,
- &nr_scanned);
- nr_reclaimed += reclaimed;
- *total_scanned += nr_scanned;
-
- spin_lock(&mctz->lock);
-
- /*
- * If we failed to reclaim anything from this memory cgroup
- * it is time to move on to the next cgroup
- */
- next_mz = NULL;
- if (!reclaimed) {
- do {
- /*
- * Loop until we find yet another one.
- *
- * By the time we get the soft_limit lock
- * again, someone might have aded the
- * group back on the RB tree. Iterate to
- * make sure we get a different mem.
- * mem_cgroup_largest_soft_limit_node returns
- * NULL if no other cgroup is present on
- * the tree
- */
- next_mz =
- __mem_cgroup_largest_soft_limit_node(mctz);
- if (next_mz == mz)
- css_put(&next_mz->mem->css);
- else /* next_mz == NULL or other memcg */
- break;
- } while (1);
- }
- __mem_cgroup_remove_exceeded(mz->mem, mz, mctz);
- excess = res_counter_soft_limit_excess(&mz->mem->res);
- /*
- * One school of thought says that we should not add
- * back the node to the tree if reclaim returns 0.
- * But our reclaim could return 0, simply because due
- * to priority we are exposing a smaller subset of
- * memory to reclaim from. Consider this as a longer
- * term TODO.
- */
- /* If excess == 0, no tree ops */
- __mem_cgroup_insert_exceeded(mz->mem, mz, mctz, excess);
- spin_unlock(&mctz->lock);
- css_put(&mz->mem->css);
- loop++;
- /*
- * Could not reclaim anything and there are no more
- * mem cgroups to try or we seem to be looping without
- * reclaiming anything.
- */
- if (!nr_reclaimed &&
- (next_mz == NULL ||
- loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
- break;
- } while (!nr_reclaimed);
- if (next_mz)
- css_put(&next_mz->mem->css);
- return nr_reclaimed;
+ return 0;
}
/*
@@ -4525,8 +4253,6 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
mz = &pn->zoneinfo[zone];
for_each_lru(l)
INIT_LIST_HEAD(&mz->lists[l]);
- mz->usage_in_excess = 0;
- mz->on_tree = false;
mz->mem = mem;
}
return 0;
@@ -4580,7 +4306,6 @@ static void __mem_cgroup_free(struct mem_cgroup *mem)
{
int node;
- mem_cgroup_remove_from_trees(mem);
free_css_id(&mem_cgroup_subsys, &mem->css);
for_each_node_state(node, N_POSSIBLE)
@@ -4635,31 +4360,6 @@ static void __init enable_swap_cgroup(void)
}
#endif
-static int mem_cgroup_soft_limit_tree_init(void)
-{
- struct mem_cgroup_tree_per_node *rtpn;
- struct mem_cgroup_tree_per_zone *rtpz;
- int tmp, node, zone;
-
- for_each_node_state(node, N_POSSIBLE) {
- tmp = node;
- if (!node_state(node, N_NORMAL_MEMORY))
- tmp = -1;
- rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, tmp);
- if (!rtpn)
- return 1;
-
- soft_limit_tree.rb_tree_per_node[node] = rtpn;
-
- for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- rtpz = &rtpn->rb_tree_per_zone[zone];
- rtpz->rb_root = RB_ROOT;
- spin_lock_init(&rtpz->lock);
- }
- }
- return 0;
-}
-
static struct cgroup_subsys_state * __ref
mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -4681,8 +4381,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
enable_swap_cgroup();
parent = NULL;
root_mem_cgroup = mem;
- if (mem_cgroup_soft_limit_tree_init())
- goto free_out;
for_each_possible_cpu(cpu) {
struct memcg_stock_pcp *stock =
&per_cpu(memcg_stock, cpu);
--
1.7.3.1
* [RFC PATCH 2/4] Organize memcgs over soft limit in round-robin.
2011-05-12 18:47 [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Ying Han
2011-05-12 18:47 ` [RFC PATCH 1/4] Disable "organizing cgroups over soft limit in a RB-Tree" Ying Han
@ 2011-05-12 18:47 ` Ying Han
2011-05-12 18:47 ` [RFC PATCH 3/4] Implementation of soft_limit reclaim " Ying Han
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Ying Han @ 2011-05-12 18:47 UTC (permalink / raw)
To: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai
Cc: linux-mm
Based on the discussion at LSF, we came up with a design where the memcgs are
kept on a linked list and reclaim happens in a round-robin fashion. We build a
per-zone memcg list that links the mem_cgroup_per_zone of every memcg which has
exceeded its soft_limit and has memory allocated on the zone. The steps are
(see the sketch after this list):
1. A memcg is examined and inserted once per 1024 increments of
mem_cgroup_commit_charge().
2. Under global memory pressure, we iterate the list and try to reclaim a
target number of pages from each memcg.
3. After reclaiming from a memcg, move it to the tail of the list.
4. Remove a memcg from the list once its usage drops below its soft_limit.
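A rough userspace sketch of the list maintenance described above, using a toy
doubly-linked list; all names here are illustrative, and the per-zone indexing,
locking and css reference counting of the real patch below are omitted.

#include <stdio.h>
#include <stdbool.h>

/* Toy node standing in for mem_cgroup_per_zone. */
struct toy_memcg {
	const char *name;
	long usage, soft_limit;
	bool on_list;
	struct toy_memcg *prev, *next;
};

static struct toy_memcg head = { "head", 0, 0, true, &head, &head };

static void list_insert_tail(struct toy_memcg *m)
{
	m->prev = head.prev;
	m->next = &head;
	head.prev->next = m;
	head.prev = m;
	m->on_list = true;
}

static void list_remove(struct toy_memcg *m)
{
	m->prev->next = m->next;
	m->next->prev = m->prev;
	m->on_list = false;
}

/* Steps 1/4: called periodically, e.g. every 1024 charge commits. */
static void update_list(struct toy_memcg *m)
{
	if (m->usage > m->soft_limit && !m->on_list)
		list_insert_tail(m);		/* over its limit: queue it */
	else if (m->usage <= m->soft_limit && m->on_list)
		list_remove(m);			/* back under its limit: drop it */
}

/* Steps 2/3: reclaim from the head, then rotate it to the tail. */
static void reclaim_one(long batch)
{
	struct toy_memcg *m = head.next;

	if (m == &head)
		return;				/* nothing is over its soft limit */
	m->usage -= batch;			/* pretend we reclaimed 'batch' pages */
	list_remove(m);
	if (m->usage > m->soft_limit)
		list_insert_tail(m);		/* still over: back of the queue */
	printf("reclaimed %ld from %s, usage now %ld\n", batch, m->name, m->usage);
}

int main(void)
{
	struct toy_memcg a = { "A", 1300, 1000, false, NULL, NULL };

	update_list(&a);
	reclaim_one(200);	/* A stays on the list (1100 > 1000) */
	reclaim_one(200);	/* A drops below its soft_limit and is removed */
	return 0;
}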
Signed-off-by: Ying Han <yinghan@google.com>
---
mm/memcontrol.c | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 159 insertions(+), 0 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9da3ecf..1360de6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -136,6 +136,9 @@ struct mem_cgroup_per_zone {
unsigned long count[NR_LRU_LISTS];
struct zone_reclaim_stat reclaim_stat;
+ struct list_head soft_limit_list;
+ unsigned long long usage_in_excess;
+ bool on_list;
struct mem_cgroup *mem; /* Back pointer, we cannot */
/* use container_of */
};
@@ -150,6 +153,25 @@ struct mem_cgroup_lru_info {
struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
};
+/*
+ * Cgroups above their limits are maintained in a link-list, independent of
+ * their hierarchy representation
+ */
+struct mem_cgroup_list_per_zone {
+ struct list_head list;
+ spinlock_t lock;
+};
+
+struct mem_cgroup_list_per_node {
+ struct mem_cgroup_list_per_zone list_per_zone[MAX_NR_ZONES];
+};
+
+struct mem_cgroup_list {
+ struct mem_cgroup_list_per_node *list_per_node[MAX_NUMNODES];
+};
+
+static struct mem_cgroup_list soft_limit_list __read_mostly;
+
struct mem_cgroup_threshold {
struct eventfd_ctx *eventfd;
u64 threshold;
@@ -359,6 +381,112 @@ page_cgroup_zoneinfo(struct mem_cgroup *mem, struct page *page)
return mem_cgroup_zoneinfo(mem, nid, zid);
}
+static struct mem_cgroup_list_per_zone *
+soft_limit_list_node_zone(int nid, int zid)
+{
+ return &soft_limit_list.list_per_node[nid]->list_per_zone[zid];
+}
+
+static struct mem_cgroup_list_per_zone *
+soft_limit_list_from_page(struct page *page)
+{
+ int nid = page_to_nid(page);
+ int zid = page_zonenum(page);
+
+ return &soft_limit_list.list_per_node[nid]->list_per_zone[zid];
+}
+
+static void
+__mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
+ struct mem_cgroup_per_zone *mz,
+ struct mem_cgroup_list_per_zone *mclz,
+ unsigned long long new_usage_in_excess)
+{
+ if (mz->on_list)
+ return;
+
+ mz->usage_in_excess = new_usage_in_excess;
+ if (!mz->usage_in_excess)
+ return;
+
+ list_add(&mz->soft_limit_list, &mclz->list);
+ mz->on_list = true;
+}
+
+static void
+mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
+ struct mem_cgroup_per_zone *mz,
+ struct mem_cgroup_list_per_zone *mclz,
+ unsigned long long new_usage_in_excess)
+{
+ spin_lock(&mclz->lock);
+ __mem_cgroup_insert_exceeded(mem, mz, mclz, new_usage_in_excess);
+ spin_unlock(&mclz->lock);
+}
+
+static void
+__mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
+ struct mem_cgroup_per_zone *mz,
+ struct mem_cgroup_list_per_zone *mclz)
+{
+ if (!mz->on_list)
+ return;
+
+ if (list_empty(&mclz->list))
+ return;
+
+ list_del(&mz->soft_limit_list);
+ mz->on_list = false;
+}
+
+static void
+mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
+ struct mem_cgroup_per_zone *mz,
+ struct mem_cgroup_list_per_zone *mclz)
+{
+
+ spin_lock(&mclz->lock);
+ __mem_cgroup_remove_exceeded(mem, mz, mclz);
+ spin_unlock(&mclz->lock);
+}
+
+static void
+mem_cgroup_update_list(struct mem_cgroup *mem, struct page *page)
+{
+ unsigned long long excess;
+ struct mem_cgroup_per_zone *mz;
+ struct mem_cgroup_list_per_zone *mclz;
+ int nid = page_to_nid(page);
+ int zid = page_zonenum(page);
+ mclz = soft_limit_list_from_page(page);
+
+ for (; mem; mem = parent_mem_cgroup(mem)) {
+ mz = mem_cgroup_zoneinfo(mem, nid, zid);
+ excess = res_counter_soft_limit_excess(&mem->res);
+
+ if (excess)
+ mem_cgroup_insert_exceeded(mem, mz, mclz, excess);
+ else
+ mem_cgroup_remove_exceeded(mem, mz, mclz);
+ }
+}
+
+static void
+mem_cgroup_remove_from_lists(struct mem_cgroup *mem)
+{
+ int node, zone;
+ struct mem_cgroup_per_zone *mz;
+ struct mem_cgroup_list_per_zone *mclz;
+
+ for_each_node_state(node, N_POSSIBLE) {
+ for (zone = 0; zone < MAX_NR_ZONES; zone++) {
+ mz = mem_cgroup_zoneinfo(mem, node, zone);
+ mclz = soft_limit_list_node_zone(node, zone);
+ mem_cgroup_remove_exceeded(mem, mz, mclz);
+ }
+ }
+}
+
/*
* Implementation Note: reading percpu statistics for memcg.
*
@@ -544,6 +672,7 @@ static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
__mem_cgroup_target_update(mem, MEM_CGROUP_TARGET_THRESH);
if (unlikely(__memcg_event_check(mem,
MEM_CGROUP_TARGET_SOFTLIMIT))){
+ mem_cgroup_update_list(mem, page);
__mem_cgroup_target_update(mem,
MEM_CGROUP_TARGET_SOFTLIMIT);
}
@@ -4253,6 +4382,8 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
mz = &pn->zoneinfo[zone];
for_each_lru(l)
INIT_LIST_HEAD(&mz->lists[l]);
+ mz->usage_in_excess = 0;
+ mz->on_list = false;
mz->mem = mem;
}
return 0;
@@ -4306,6 +4437,7 @@ static void __mem_cgroup_free(struct mem_cgroup *mem)
{
int node;
+ mem_cgroup_remove_from_lists(mem);
free_css_id(&mem_cgroup_subsys, &mem->css);
for_each_node_state(node, N_POSSIBLE)
@@ -4360,6 +4492,31 @@ static void __init enable_swap_cgroup(void)
}
#endif
+static int mem_cgroup_soft_limit_list_init(void)
+{
+ struct mem_cgroup_list_per_node *rlpn;
+ struct mem_cgroup_list_per_zone *rlpz;
+ int tmp, node, zone;
+
+ for_each_node_state(node, N_POSSIBLE) {
+ tmp = node;
+ if (!node_state(node, N_NORMAL_MEMORY))
+ tmp = -1;
+ rlpn = kzalloc_node(sizeof(*rlpn), GFP_KERNEL, tmp);
+ if (!rlpn)
+ return 1;
+
+ soft_limit_list.list_per_node[node] = rlpn;
+
+ for (zone = 0; zone < MAX_NR_ZONES; zone++) {
+ rlpz = &rlpn->list_per_zone[zone];
+ INIT_LIST_HEAD(&rlpz->list);
+ spin_lock_init(&rlpz->lock);
+ }
+ }
+ return 0;
+}
+
static struct cgroup_subsys_state * __ref
mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -4381,6 +4538,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
enable_swap_cgroup();
parent = NULL;
root_mem_cgroup = mem;
+ if (mem_cgroup_soft_limit_list_init())
+ goto free_out;
for_each_possible_cpu(cpu) {
struct memcg_stock_pcp *stock =
&per_cpu(memcg_stock, cpu);
--
1.7.3.1
* [RFC PATCH 3/4] Implementation of soft_limit reclaim in round-robin.
2011-05-12 18:47 [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Ying Han
2011-05-12 18:47 ` [RFC PATCH 1/4] Disable "organizing cgroups over soft limit in a RB-Tree" Ying Han
2011-05-12 18:47 ` [RFC PATCH 2/4] Organize memcgs over soft limit in round-robin Ying Han
@ 2011-05-12 18:47 ` Ying Han
2011-05-12 18:47 ` [RFC PATCH 4/4] Add some debugging stats Ying Han
2011-05-13 0:40 ` [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Rik van Riel
4 siblings, 0 replies; 7+ messages in thread
From: Ying Han @ 2011-05-12 18:47 UTC (permalink / raw)
To: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai
Cc: linux-mm
This patch re-implements the soft_limit reclaim function so that it picks
the next memcg to reclaim from in a round-robin fashion.
For each memcg we do hierarchical reclaim and check zone_wmark_ok() after each
iteration. There is a per-memcg rate limit on how many pages to scan, based on
how far the memcg exceeds its soft_limit.
This patch is a first step in switching from RB-tree based reclaim to
linked-list based reclaim; further improvement of the per-memcg soft_limit
reclaim algorithm is needed next.
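A condensed, standalone C sketch of the pass described above; the names are
illustrative only, and the real locking, css reference counting and
zone_watermark_ok_safe() checks appear in the diff below. The excess >> 2 cap
is the per-memcg rate limit mentioned in the text.

#include <stdio.h>

/* Toy memcg standing in for mem_cgroup_per_zone (illustrative only). */
struct toy_memcg {
	const char *name;
	long usage, soft_limit;
};

/* Pretend reclaim: free up to 'budget' pages, return how many were freed. */
static long reclaim_from(struct toy_memcg *m, long budget)
{
	long got = budget < m->usage ? budget : m->usage;

	m->usage -= got;
	return got;
}

/*
 * One soft-limit pass: visit each over-limit memcg once, cap its reclaim at a
 * quarter of its excess (the "excess >> 2" rate limit), and stop as soon as
 * the zone-level target is met.
 */
static long soft_limit_pass(struct toy_memcg *memcgs, int n,
			    long *zone_free, long zone_target)
{
	long total = 0;
	int i;

	for (i = 0; i < n; i++) {
		long excess = memcgs[i].usage - memcgs[i].soft_limit;
		long got;

		if (excess <= 0)
			continue;		/* under its soft limit, skip it */
		got = reclaim_from(&memcgs[i], excess >> 2);
		total += got;
		*zone_free += got;
		printf("%s: reclaimed %ld\n", memcgs[i].name, got);
		if (*zone_free >= zone_target)
			break;			/* zone watermark satisfied, stop early */
	}
	return total;
}

int main(void)
{
	struct toy_memcg cgs[] = {
		{ "A", 5000, 1000 },	/* excess 4000 -> budget 1000 */
		{ "B", 3000, 1000 },	/* excess 2000 -> budget 500  */
		{ "C", 1200, 1000 },	/* excess 200  -> budget 50   */
	};
	long zone_free = 0;

	soft_limit_pass(cgs, 3, &zone_free, 1400);
	printf("zone_free now %ld\n", zone_free);
	return 0;
}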
Some test results:
Test 1:
Here I have three memcgs, each reading a 20g file, on a 32g system (no swap).
Meanwhile a program under root pins 18g of anon pages. The hard_limit and
soft_limit are listed as container (hard_limit, soft_limit):
root: 18g anon pages w/o swap
A (20g, 2g):
soft_kswapd_steal 4265600
soft_kswapd_scan 4265600
B (20g, 2g):
soft_kswapd_steal 4265600
soft_kswapd_scan 4265600
C (20g, 2g):
soft_kswapd_steal 4083904
soft_kswapd_scan 4083904
vmstat:
kswapd_steal 12617255
99.9% steal (12615104 of 12617255 pages came from soft_limit reclaim)
These two stats show how often zone_wmark_ok() was satisfied by soft_limit
reclaim vs. per-zone reclaim:
kswapd_zone_wmark_ok 1974
kswapd_soft_limit_zone_wmark_ok 1969
Test 2:
Here are the same memcgs, but each doing a 20g file write.
root: 18g anon pages w/o swap
A (20g, 2g):
soft_kswapd_steal 4718336
soft_kswapd_scan 4718336
B (20g, 2g):
soft_kswapd_steal 4710304
soft_kswapd_scan 4710304
C (20g, 3g):
soft_kswapd_steal 2933406
soft_kswapd_scan 5797460
kswapd_steal 15958486
77% steal (12362046 of 15958486 pages came from soft_limit reclaim)
kswapd_zone_wmark_ok 2517
kswapd_soft_limit_zone_wmark_ok 2405
TODO:
1. We would like to do better targeted reclaim by calculating the target
nr_to_scan per memcg, in particular combining the current heuristics with the
soft_limit excess. How much weight should the soft_limit excess carry, and
should that ratio be user configurable?
2. As decided at LSF, we also need a second per-zone list of memcgs that are
under their soft_limit. This is needed to do zone balancing without a global
LRU. We shouldn't scan the second list unless the first list is exhausted.
Signed-off-by: Ying Han <yinghan@google.com>
---
include/linux/memcontrol.h | 3 +-
mm/memcontrol.c | 119 ++++++++++++++++++++++++++++++++++++++++++-
mm/vmscan.c | 25 +++++-----
3 files changed, 131 insertions(+), 16 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6a0cffd..c7fcb26 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -145,7 +145,8 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
}
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask,
+ gfp_t gfp_mask, int end_zone,
+ unsigned long balance_gap,
unsigned long *total_scanned);
u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1360de6..b87ccc8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1093,6 +1093,19 @@ unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
return MEM_CGROUP_ZSTAT(mz, lru);
}
+unsigned long mem_cgroup_zone_reclaimable_pages(struct mem_cgroup_per_zone *mz)
+{
+ unsigned long total = 0;
+
+ if (nr_swap_pages) {
+ total += MEM_CGROUP_ZSTAT(mz, LRU_INACTIVE_ANON);
+ total += MEM_CGROUP_ZSTAT(mz, LRU_ACTIVE_ANON);
+ }
+ total += MEM_CGROUP_ZSTAT(mz, LRU_INACTIVE_FILE);
+ total += MEM_CGROUP_ZSTAT(mz, LRU_ACTIVE_FILE);
+ return total;
+}
+
struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
struct zone *zone)
{
@@ -1528,7 +1541,14 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
return ret;
total += ret;
if (check_soft) {
- if (!res_counter_soft_limit_excess(&root_mem->res))
+ /*
+ * We want to be fair to each memcg in soft_limit reclaim,
+ * based on its excess. excess >> 2 is neither so large
+ * that we reclaim too much, nor so small that we keep
+ * coming back to reclaim from this cgroup.
+ */
+ if (!res_counter_soft_limit_excess(&root_mem->res) ||
+ total >= (excess >> 2))
return total;
} else if (mem_cgroup_margin(root_mem))
return 1 + total;
@@ -3314,11 +3334,104 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
return ret;
}
+static struct mem_cgroup_per_zone *
+__mem_cgroup_next_soft_limit_node(struct mem_cgroup_list_per_zone *mclz)
+{
+ struct mem_cgroup_per_zone *mz;
+
+retry:
+ mz = NULL;
+ if (list_empty(&mclz->list))
+ goto done;
+
+ mz = list_entry(mclz->list.prev, struct mem_cgroup_per_zone,
+ soft_limit_list);
+
+ __mem_cgroup_remove_exceeded(mz->mem, mz, mclz);
+ if (!res_counter_soft_limit_excess(&mz->mem->res) ||
+ !mem_cgroup_zone_reclaimable_pages(mz) ||
+ !css_tryget(&mz->mem->css))
+ goto retry;
+done:
+ return mz;
+}
+
+static struct mem_cgroup_per_zone *
+mem_cgroup_next_soft_limit_node(struct mem_cgroup_list_per_zone *mclz)
+{
+ struct mem_cgroup_per_zone *mz;
+
+ spin_lock(&mclz->lock);
+ mz = __mem_cgroup_next_soft_limit_node(mclz);
+ spin_unlock(&mclz->lock);
+ return mz;
+}
+
unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
- gfp_t gfp_mask,
+ gfp_t gfp_mask, int end_zone,
+ unsigned long balance_gap,
unsigned long *total_scanned)
{
- return 0;
+ unsigned long nr_reclaimed = 0;
+ unsigned long reclaimed;
+ struct mem_cgroup_per_zone *mz;
+ struct mem_cgroup_list_per_zone *mclz;
+ unsigned long long excess;
+ unsigned long nr_scanned;
+ int loop = 0;
+
+ /*
+ * memcg reclaim doesn't support lumpy.
+ */
+ if (order > 0)
+ return 0;
+
+ mclz = soft_limit_list_node_zone(zone_to_nid(zone), zone_idx(zone));
+ /*
+ * Start from the head of the list.
+ */
+ while (!list_empty(&mclz->list)) {
+ mz = mem_cgroup_next_soft_limit_node(mclz);
+ if (!mz)
+ break;
+
+ nr_scanned = 0;
+ reclaimed = mem_cgroup_hierarchical_reclaim(mz->mem, zone,
+ gfp_mask,
+ MEM_CGROUP_RECLAIM_SOFT,
+ &nr_scanned);
+ nr_reclaimed += reclaimed;
+ *total_scanned += nr_scanned;
+
+ spin_lock(&mclz->lock);
+
+ __mem_cgroup_remove_exceeded(mz->mem, mz, mclz);
+ /*
+ * Add it back to the list even if the amount reclaimed
+ * is zero, as long as the memcg is still above its
+ * soft_limit. Lots of pages could suddenly become
+ * reclaimable again.
+ */
+ excess = res_counter_soft_limit_excess(&mz->mem->res);
+ __mem_cgroup_insert_exceeded(mz->mem, mz, mclz, excess);
+
+ spin_unlock(&mclz->lock);
+ css_put(&mz->mem->css);
+ loop++;
+
+ if (zone_watermark_ok_safe(zone, order,
+ high_wmark_pages(zone) + balance_gap,
+ end_zone, 0)) {
+ break;
+ }
+
+ if (loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS ||
+ *total_scanned > nr_reclaimed + nr_reclaimed / 2)
+ break;
+
+ }
+
+ return nr_reclaimed;
}
/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 96789e0..9d79070 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2421,18 +2421,6 @@ loop_again:
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;
- sc.nr_scanned = 0;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
- total_scanned += nr_soft_scanned;
-
/*
* We put equal pressure on every zone, unless
* one zone has way too many pages free
@@ -2445,6 +2433,19 @@ loop_again:
(zone->present_pages +
KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
KSWAPD_ZONE_BALANCE_GAP_RATIO);
+ sc.nr_scanned = 0;
+
+ nr_soft_scanned = 0;
+ /*
+ * Call soft limit reclaim before calling shrink_zone.
+ */
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
+ order, sc.gfp_mask,
+ end_zone, balance_gap,
+ &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+ total_scanned += nr_soft_scanned;
+
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone) + balance_gap,
end_zone, 0))
--
1.7.3.1
* [RFC PATCH 4/4] Add some debugging stats
2011-05-12 18:47 [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Ying Han
` (2 preceding siblings ...)
2011-05-12 18:47 ` [RFC PATCH 3/4] Implementation of soft_limit reclaim " Ying Han
@ 2011-05-12 18:47 ` Ying Han
2011-05-13 0:40 ` [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Rik van Riel
4 siblings, 0 replies; 7+ messages in thread
From: Ying Han @ 2011-05-12 18:47 UTC (permalink / raw)
To: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai
Cc: linux-mm
This patch is not intended for inclusion; it only adds debugging stats.
It adds counters for memcgs being inserted into and removed from the list, and
counters for how often zone_wmark_ok() is fulfilled by soft_limit reclaim.
Signed-off-by: Ying Han <yinghan@google.com>
---
include/linux/memcontrol.h | 14 ++++++++++++++
include/linux/vm_event_item.h | 1 +
mm/memcontrol.c | 23 +++++++++++++++++++++++
mm/vmscan.c | 3 ++-
mm/vmstat.c | 2 ++
5 files changed, 42 insertions(+), 1 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c7fcb26..d97aa1c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -121,6 +121,10 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
extern int do_swap_account;
#endif
+/* background reclaim stats */
+void mem_cgroup_list_insert(struct mem_cgroup *memcg, int val);
+void mem_cgroup_list_remove(struct mem_cgroup *memcg, int val);
+
static inline bool mem_cgroup_disabled(void)
{
if (mem_cgroup_subsys.disabled)
@@ -363,6 +367,16 @@ static inline
void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
{
}
+
+static inline void mem_cgroup_list_insert(struct mem_cgroup *memcg,
+ int val)
+{
+}
+
+static inline void mem_cgroup_list_remove(struct mem_cgroup *memcg,
+ int val)
+{
+}
#endif /* CONFIG_CGROUP_MEM_CONT */
#if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 03b90cdc..f226bfd 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -35,6 +35,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
+ KSWAPD_ZONE_WMARK_OK, KSWAPD_SOFT_LIMIT_ZONE_WMARK_OK,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
#ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b87ccc8..bd7c481 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -103,6 +103,8 @@ enum mem_cgroup_events_index {
/* soft reclaim in direct reclaim */
MEM_CGROUP_EVENTS_SOFT_DIRECT_SCAN, /* # of pages scanned from */
/* soft reclaim in direct reclaim */
+ MEM_CGROUP_EVENTS_LIST_INSERT,
+ MEM_CGROUP_EVENTS_LIST_REMOVE,
MEM_CGROUP_EVENTS_NSTATS,
};
/*
@@ -411,6 +413,7 @@ __mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
list_add(&mz->soft_limit_list, &mclz->list);
mz->on_list = true;
+ mem_cgroup_list_insert(mem, 1);
}
static void
@@ -437,6 +440,7 @@ __mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
list_del(&mz->soft_limit_list);
mz->on_list = false;
+ mem_cgroup_list_remove(mem, 1);
}
static void
@@ -550,6 +554,16 @@ void mem_cgroup_pgmajfault(struct mem_cgroup *mem, int val)
this_cpu_add(mem->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT], val);
}
+void mem_cgroup_list_insert(struct mem_cgroup *mem, int val)
+{
+ this_cpu_add(mem->stat->events[MEM_CGROUP_EVENTS_LIST_INSERT], val);
+}
+
+void mem_cgroup_list_remove(struct mem_cgroup *mem, int val)
+{
+ this_cpu_add(mem->stat->events[MEM_CGROUP_EVENTS_LIST_REMOVE], val);
+}
+
static unsigned long mem_cgroup_read_events(struct mem_cgroup *mem,
enum mem_cgroup_events_index idx)
{
@@ -3422,6 +3436,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
if (zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone) + balance_gap,
end_zone, 0)) {
+ count_vm_events(KSWAPD_SOFT_LIMIT_ZONE_WMARK_OK, 1);
break;
}
@@ -3838,6 +3853,8 @@ enum {
MCS_SOFT_KSWAPD_SCAN,
MCS_SOFT_DIRECT_STEAL,
MCS_SOFT_DIRECT_SCAN,
+ MCS_LIST_INSERT,
+ MCS_LIST_REMOVE,
MCS_INACTIVE_ANON,
MCS_ACTIVE_ANON,
MCS_INACTIVE_FILE,
@@ -3866,6 +3883,8 @@ struct {
{"soft_kswapd_scan", "total_soft_kswapd_scan"},
{"soft_direct_steal", "total_soft_direct_steal"},
{"soft_direct_scan", "total_soft_direct_scan"},
+ {"list_insert", "total_list_insert"},
+ {"list_remove", "total_list_remove"},
{"inactive_anon", "total_inactive_anon"},
{"active_anon", "total_active_anon"},
{"inactive_file", "total_inactive_file"},
@@ -3906,6 +3925,10 @@ mem_cgroup_get_local_stat(struct mem_cgroup *mem, struct mcs_total_stat *s)
s->stat[MCS_PGFAULT] += val;
val = mem_cgroup_read_events(mem, MEM_CGROUP_EVENTS_PGMAJFAULT);
s->stat[MCS_PGMAJFAULT] += val;
+ val = mem_cgroup_read_events(mem, MEM_CGROUP_EVENTS_LIST_INSERT);
+ s->stat[MCS_LIST_INSERT] += val;
+ val = mem_cgroup_read_events(mem, MEM_CGROUP_EVENTS_LIST_REMOVE);
+ s->stat[MCS_LIST_REMOVE] += val;
/* per zone stat */
val = mem_cgroup_get_local_zonestat(mem, LRU_INACTIVE_ANON);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9d79070..fc3da68 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2492,11 +2492,12 @@ loop_again:
zone_clear_flag(zone, ZONE_CONGESTED);
if (i <= *classzone_idx)
balanced += zone->present_pages;
+ count_vm_events(KSWAPD_ZONE_WMARK_OK, 1);
}
-
}
if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
break; /* kswapd: all done */
+
/*
* OK, kswapd is getting into trouble. Take a nap, then take
* another pass across the zones.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a2b7344..2b3a7e5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -922,6 +922,8 @@ const char * const vmstat_text[] = {
"kswapd_low_wmark_hit_quickly",
"kswapd_high_wmark_hit_quickly",
"kswapd_skip_congestion_wait",
+ "kswapd_zone_wmark_ok",
+ "kswapd_soft_limit_zone_wmark_ok",
"pageoutrun",
"allocstall",
--
1.7.3.1
* Re: [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention
2011-05-12 18:47 [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Ying Han
` (3 preceding siblings ...)
2011-05-12 18:47 ` [RFC PATCH 4/4] Add some debugging stats Ying Han
@ 2011-05-13 0:40 ` Rik van Riel
2011-05-13 0:54 ` Ying Han
4 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2011-05-13 0:40 UTC (permalink / raw)
To: Ying Han
Cc: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai, linux-mm,
Michel Lespinasse
On 05/12/2011 02:47 PM, Ying Han wrote:
> TODO:
> a) there was a question on how to do zone balancing w/o global LRU. This could be
> solved by building another cgroup list per-zone, where we also link cgroups under
> their soft_limit. We won't scan the list unless the first list being exhausted and
> the free pages is still under the high_wmark.
> b). one of the tricky part is to calculate the target nr_to_scan for each cgroup,
> especially combining the current heuristics with soft_limit exceeds. it depends how
> much weight we need to put on the second. One way is to make the ratio to be user
> configurable.
Johannes addresses these in his patch series.
> Ying Han (4):
> Disable "organizing cgroups over soft limit in a RB-Tree"
> Organize memcgs over soft limit in round-robin.
> Implementation of soft_limit reclaim in round-robin.
> Add some debugging stats
Looks like you also have some things Johannes doesn't have.
It may be good for the two patch series you have to get
merged into one series, before stuff gets merged upstream.
--
All rights reversed
* Re: [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention
2011-05-13 0:40 ` [RFC PATCH 0/4] memcg: revisit soft_limit reclaim on contention Rik van Riel
@ 2011-05-13 0:54 ` Ying Han
0 siblings, 0 replies; 7+ messages in thread
From: Ying Han @ 2011-05-13 0:54 UTC (permalink / raw)
To: Rik van Riel
Cc: Johannes Weiner, KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura,
Balbir Singh, Tejun Heo, Pavel Emelyanov, KAMEZAWA Hiroyuki,
Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter,
Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai, linux-mm,
Michel Lespinasse
On Thu, May 12, 2011 at 5:40 PM, Rik van Riel <riel@redhat.com> wrote:
> On 05/12/2011 02:47 PM, Ying Han wrote:
>
>> TODO:
>> a) there was a question on how to do zone balancing w/o global LRU. This
>> could be solved by building another cgroup list per-zone, where we also
>> link cgroups under their soft_limit. We won't scan the list unless the
>> first list being exhausted and the free pages is still under the
>> high_wmark.
>>
>> b). one of the tricky part is to calculate the target nr_to_scan for each
>> cgroup, especially combining the current heuristics with soft_limit
>> exceeds. it depends how much weight we need to put on the second. One way
>> is to make the ratio to be user configurable.
>
> Johannes addresses these in his patch series.

That would be great, I am reading through his patch and apparently not
getting there yet :)

>> Ying Han (4):
>>   Disable "organizing cgroups over soft limit in a RB-Tree"
>>   Organize memcgs over soft limit in round-robin.
>>   Implementation of soft_limit reclaim in round-robin.
>>   Add some debugging stats
>
> Looks like you also have some things Johannes doesn't have.
>
> It may be good for the two patch series you have to get
> merged into one series, before stuff gets merged upstream.

Yes, that is my motivation for posting the patches here :)

--Ying

> --
> All rights reversed