linux-mm.kvack.org archive mirror
* [PATCH V2 0/2] Provide more precise dump info for memcg-oom
@ 2012-11-07  8:40 Sha Zhengju
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-07  8:40 UTC (permalink / raw)
  To: linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm, rientjes
  Cc: linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>


When a memcg OOM happens, the memcg-related information currently dumped
is of limited use for debugging. These patches provide more detailed memcg page
statistics and also take hierarchy into consideration.
The previous, more primitive version can be found here: https://lkml.org/lkml/2012/7/30/179.

Change log:
	1. some modification towards hierarchy
	2. rework dump_tasks
	3. rebased on Michal's mm tree since-3.6  

Any comments are welcome. :)


Sha Zhengju (2):
	memcg-oom-provide-more-precise-dump-info-while-memcg.patch
	oom-rework-dump_tasks-to-optimize-memcg-oom-situatio.patch

 include/linux/memcontrol.h |    7 ++++
 include/linux/oom.h        |    2 +
 mm/memcontrol.c            |   85 +++++++++++++++++++++++++++++++++++++++-----
 mm/oom_kill.c              |   61 +++++++++++++++++++------------
 4 files changed, 122 insertions(+), 33 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07  8:40 [PATCH V2 0/2] Provide more precise dump info for memcg-oom Sha Zhengju
@ 2012-11-07  8:41 ` Sha Zhengju
  2012-11-07 18:02   ` David Rientjes
                     ` (2 more replies)
  2012-11-07  8:41 ` [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation Sha Zhengju
  2012-11-07 19:31 ` [PATCH V2 0/2] Provide more precise dump info for memcg-oom Andrew Morton
  2 siblings, 3 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-07  8:41 UTC (permalink / raw)
  To: linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm, rientjes
  Cc: linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

Currently, when a memcg OOM happens, the OOM dump messages still reflect global
state and provide little useful information for users. This patch prints memcg
page statistics targeted at the memcg OOM instead.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/memcontrol.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 mm/oom_kill.c   |    6 +++-
 2 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0eab7d5..2df5e72 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
 	"pgmajfault",
 };
 
+static const char * const mem_cgroup_lru_names[] = {
+	"inactive_anon",
+	"active_anon",
+	"inactive_file",
+	"active_file",
+	"unevictable",
+};
+
 /*
  * Per memcg event counter is incremented at every pagein/pageout. With THP,
  * it will be incremated by the number of pages. This counter is used for
@@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
 	spin_unlock_irqrestore(&memcg->move_lock, *flags);
 }
 
+#define K(x) ((x) << (PAGE_SHIFT-10))
+static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *mi;
+	unsigned int i;
+
+	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {
+		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+				continue;
+			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
+				K(mem_cgroup_read_stat(memcg, i)));
+		}
+
+		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
+			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
+				mem_cgroup_read_events(memcg, i));
+
+		for (i = 0; i < NR_LRU_LISTS; i++)
+			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
+				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
+	} else {
+
+		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+			long long val = 0;
+
+			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+				continue;
+			for_each_mem_cgroup_tree(mi, memcg)
+				val += mem_cgroup_read_stat(mi, i);
+			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
+		}
+
+		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+			unsigned long long val = 0;
+
+			for_each_mem_cgroup_tree(mi, memcg)
+				val += mem_cgroup_read_events(mi, i);
+			printk(KERN_CONT "%s:%llu ",
+				mem_cgroup_events_names[i], val);
+		}
+
+		for (i = 0; i < NR_LRU_LISTS; i++) {
+			unsigned long long val = 0;
+
+			for_each_mem_cgroup_tree(mi, memcg)
+				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
+			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
+		}
+	}
+	printk(KERN_CONT "\n");
+}
 /**
- * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
  * @memcg: The memory cgroup that went over limit
  * @p: Task that is going to be killed
  *
@@ -1569,6 +1628,8 @@ done:
 		res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
 		res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
 		res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
+
+	mem_cgroup_print_oom_stat(memcg);
 }
 
 /*
@@ -5195,14 +5256,6 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 }
 #endif /* CONFIG_NUMA */
 
-static const char * const mem_cgroup_lru_names[] = {
-	"inactive_anon",
-	"active_anon",
-	"inactive_file",
-	"active_file",
-	"unevictable",
-};
-
 static inline void mem_cgroup_lru_names_not_uptodate(void)
 {
 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7e9e911..4b8a6dd 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -421,8 +421,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
-	mem_cgroup_print_oom_info(memcg, p);
-	show_mem(SHOW_MEM_FILTER_NODES);
+	if (memcg)
+		mem_cgroup_print_oom_info(memcg, p);
+	else
+		show_mem(SHOW_MEM_FILTER_NODES);
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(memcg, nodemask);
 }
-- 
1.7.6.1



* [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation
  2012-11-07  8:40 [PATCH V2 0/2] Provide more precise dump info for memcg-oom Sha Zhengju
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
@ 2012-11-07  8:41 ` Sha Zhengju
  2012-11-07 18:07   ` David Rientjes
  2012-11-07 22:34   ` Michal Hocko
  2012-11-07 19:31 ` [PATCH V2 0/2] Provide more precise dump info for memcg-oom Andrew Morton
  2 siblings, 2 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-07  8:41 UTC (permalink / raw)
  To: linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm, rientjes
  Cc: linux-kernel, Sha Zhengju

From: Sha Zhengju <handai.szj@taobao.com>

When a memcg OOM happens, don't scan all system tasks to dump the memory state
of eligible tasks; instead, iterate only over the processes attached to the OOM
memcg, which also avoids taking the RCU lock.


Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/memcontrol.h |    7 +++++
 include/linux/oom.h        |    2 +
 mm/memcontrol.c            |   14 +++++++++++
 mm/oom_kill.c              |   55 ++++++++++++++++++++++++++-----------------
 4 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c91e3c1..4322ca8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -122,6 +122,8 @@ unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
 void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
+extern void dump_tasks_memcg(const struct mem_cgroup *memcg,
+					const nodemask_t *nodemask);
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
 					struct page *newpage);
 
@@ -337,6 +339,11 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
 }
 
+static inline void
+dump_tasks_memcg(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
+{
+}
+
 static inline void mem_cgroup_begin_update_page_stat(struct page *page,
 					bool *locked, unsigned long *flags)
 {
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 20b5c46..9ba3344 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -57,6 +57,8 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 		unsigned long totalpages, const nodemask_t *nodemask,
 		bool force_kill);
 
+extern inline void dump_per_task(struct task_struct *p,
+				const nodemask_t *nodemask);
 extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *mask, bool force_kill);
 extern int register_oom_notifier(struct notifier_block *nb);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2df5e72..fe648f8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1665,6 +1665,20 @@ static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
 	return min(limit, memsw);
 }
 
+void dump_tasks_memcg(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+	struct cgroup *cgroup = memcg->css.cgroup;
+
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		dump_per_task(task, nodemask);
+	}
+
+	cgroup_iter_end(cgroup, &it);
+}
+
 static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 				     int order)
 {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4b8a6dd..aaf6237 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -367,6 +367,32 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 	return chosen;
 }
 
+inline void dump_per_task(struct task_struct *p, const nodemask_t *nodemask)
+{
+	struct task_struct *task;
+
+	if (oom_unkillable_task(p, NULL, nodemask))
+		return;
+
+	task = find_lock_task_mm(p);
+	if (!task) {
+		/*
+		 * This is a kthread or all of p's threads have already
+		 * detached their mm's.  There's no need to report
+		 * them; they can't be oom killed anyway.
+		 */
+		return;
+	}
+
+	pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
+		task->pid, from_kuid(&init_user_ns, task_uid(task)),
+		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
+		task->mm->nr_ptes,
+		get_mm_counter(task->mm, MM_SWAPENTS),
+		task->signal->oom_score_adj, task->comm);
+	task_unlock(task);
+}
+
 /**
  * dump_tasks - dump current memory state of all system tasks
  * @memcg: current's memory controller, if constrained
@@ -381,32 +407,17 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
 {
 	struct task_struct *p;
-	struct task_struct *task;
 
 	pr_info("[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name\n");
-	rcu_read_lock();
-	for_each_process(p) {
-		if (oom_unkillable_task(p, memcg, nodemask))
-			continue;
-
-		task = find_lock_task_mm(p);
-		if (!task) {
-			/*
-			 * This is a kthread or all of p's threads have already
-			 * detached their mm's.  There's no need to report
-			 * them; they can't be oom killed anyway.
-			 */
-			continue;
-		}
 
-		pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
-			task->pid, from_kuid(&init_user_ns, task_uid(task)),
-			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
-			task->mm->nr_ptes,
-			get_mm_counter(task->mm, MM_SWAPENTS),
-			task->signal->oom_score_adj, task->comm);
-		task_unlock(task);
+	if (memcg) {
+		dump_tasks_memcg(memcg, nodemask);
+		return;
 	}
+
+	rcu_read_lock();
+	for_each_process(p)
+		dump_per_task(p, nodemask);
 	rcu_read_unlock();
 }
 
-- 
1.7.6.1



* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
@ 2012-11-07 18:02   ` David Rientjes
  2012-11-08 12:37     ` Sha Zhengju
  2012-11-07 22:17   ` Michal Hocko
  2012-11-08  9:07   ` Kamezawa Hiroyuki
  2 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2012-11-07 18:02 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm, linux-kernel,
	Sha Zhengju

On Wed, 7 Nov 2012, Sha Zhengju wrote:

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0eab7d5..2df5e72 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
>  	"pgmajfault",
>  };
>  
> +static const char * const mem_cgroup_lru_names[] = {
> +	"inactive_anon",
> +	"active_anon",
> +	"inactive_file",
> +	"active_file",
> +	"unevictable",
> +};
> +
>  /*
>   * Per memcg event counter is incremented at every pagein/pageout. With THP,
>   * it will be incremated by the number of pages. This counter is used for
> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>  	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>  }
>  
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *mi;
> +	unsigned int i;
> +
> +	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {
> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],

This printk isn't continuing any previous printk, so using KERN_CONT here 
will require a short header to be printed first ("Memcg: "?) with 
KERN_INFO before the iterations.

> +				K(mem_cgroup_read_stat(memcg, i)));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +				mem_cgroup_read_events(memcg, i));
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++)
> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
> +	} else {
> +

Spurious newline.

Eek, is there really no way to avoid this if-conditional and just use 
for_each_mem_cgroup_tree() for everything and use

	mem_cgroup_iter_break(memcg, iter);
	break;

for !memcg->use_hierarchy?

> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			long long val = 0;
> +
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_stat(mi, i);
> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_events(mi, i);
> +			printk(KERN_CONT "%s:%llu ",
> +				mem_cgroup_events_names[i], val);
> +		}
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> +		}
> +	}
> +	printk(KERN_CONT "\n");
> +}
>  /**
> - * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>   * @memcg: The memory cgroup that went over limit
>   * @p: Task that is going to be killed
>   *
> @@ -1569,6 +1628,8 @@ done:
>  		res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
>  		res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
>  		res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
> +
> +	mem_cgroup_print_oom_stat(memcg);

I think this should be folded into mem_cgroup_print_oom_info(), I don't 
see a need for a new function.

>  }
>  
>  /*
> @@ -5195,14 +5256,6 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>  }
>  #endif /* CONFIG_NUMA */
>  
> -static const char * const mem_cgroup_lru_names[] = {
> -	"inactive_anon",
> -	"active_anon",
> -	"inactive_file",
> -	"active_file",
> -	"unevictable",
> -};
> -
>  static inline void mem_cgroup_lru_names_not_uptodate(void)
>  {
>  	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7e9e911..4b8a6dd 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -421,8 +421,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  	cpuset_print_task_mems_allowed(current);
>  	task_unlock(current);
>  	dump_stack();
> -	mem_cgroup_print_oom_info(memcg, p);
> -	show_mem(SHOW_MEM_FILTER_NODES);
> +	if (memcg)
> +		mem_cgroup_print_oom_info(memcg, p);

mem_cgroup_print_oom_info() already returns immediately for !memcg, so I'm 
not sure why this change is made.

> +	else
> +		show_mem(SHOW_MEM_FILTER_NODES);

Well that's disappointing if memcg == root_mem_cgroup, we'd probably like 
to know the global memory state to determine what the problem is.

>  	if (sysctl_oom_dump_tasks)
>  		dump_tasks(memcg, nodemask);
>  }



* Re: [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation
  2012-11-07  8:41 ` [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation Sha Zhengju
@ 2012-11-07 18:07   ` David Rientjes
  2012-11-07 22:34   ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2012-11-07 18:07 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm, linux-kernel,
	Sha Zhengju

On Wed, 7 Nov 2012, Sha Zhengju wrote:

> From: Sha Zhengju <handai.szj@taobao.com>
> 
> If memcg oom happening, don't scan all system tasks to dump memory state of
> eligible tasks, instead we iterates only over the process attached to the oom
> memcg and avoid the rcu lock.
> 

Avoiding the rcu lock isn't actually that impressive here; the cgroup 
iterator will use its own lock for that memcg.

> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/memcontrol.h |    7 +++++
>  include/linux/oom.h        |    2 +
>  mm/memcontrol.c            |   14 +++++++++++
>  mm/oom_kill.c              |   55 ++++++++++++++++++++++++++-----------------
>  4 files changed, 56 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index c91e3c1..4322ca8 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -122,6 +122,8 @@ unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
>  void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
>  extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
>  					struct task_struct *p);
> +extern void dump_tasks_memcg(const struct mem_cgroup *memcg,
> +					const nodemask_t *nodemask);

Shouldn't need the nodemask parameter, just have dump_tasks_memcg() pass 
NULL to dump_per_task(), we won't be isolating to tasks with mempolicies 
restricted to a particular set of nodes since we're in the memcg oom path 
here, not the global page allocator oom path.

>  extern void mem_cgroup_replace_page_cache(struct page *oldpage,
>  					struct page *newpage);
>  
> @@ -337,6 +339,11 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>  {
>  }
>  
> +static inline void
> +dump_tasks_memcg(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
> +{
> +}
> +
>  static inline void mem_cgroup_begin_update_page_stat(struct page *page,
>  					bool *locked, unsigned long *flags)
>  {
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 20b5c46..9ba3344 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -57,6 +57,8 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>  		unsigned long totalpages, const nodemask_t *nodemask,
>  		bool force_kill);
>  
> +extern inline void dump_per_task(struct task_struct *p,
> +				const nodemask_t *nodemask);

This is a global symbol, so dump_per_task() doesn't make a lot of sense: 
it would need to be prefixed with "oom_" so perhaps oom_dump_task() is 
better?

>  extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  		int order, nodemask_t *mask, bool force_kill);
>  extern int register_oom_notifier(struct notifier_block *nb);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2df5e72..fe648f8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1665,6 +1665,20 @@ static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
>  	return min(limit, memsw);
>  }
>  
> +void dump_tasks_memcg(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
> +{
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +	struct cgroup *cgroup = memcg->css.cgroup;
> +
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it))) {
> +		dump_per_task(task, nodemask);
> +	}
> +
> +	cgroup_iter_end(cgroup, &it);
> +}
> +
>  static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  				     int order)
>  {
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4b8a6dd..aaf6237 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -367,6 +367,32 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  	return chosen;
>  }
>  
> +inline void dump_per_task(struct task_struct *p, const nodemask_t *nodemask)

No inline.

> +{
> +	struct task_struct *task;
> +
> +	if (oom_unkillable_task(p, NULL, nodemask))
> +		return;
> +
> +	task = find_lock_task_mm(p);
> +	if (!task) {
> +		/*
> +		 * This is a kthread or all of p's threads have already
> +		 * detached their mm's.  There's no need to report
> +		 * them; they can't be oom killed anyway.
> +		 */
> +		return;
> +	}
> +
> +	pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
> +		task->pid, from_kuid(&init_user_ns, task_uid(task)),
> +		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> +		task->mm->nr_ptes,
> +		get_mm_counter(task->mm, MM_SWAPENTS),
> +		task->signal->oom_score_adj, task->comm);
> +	task_unlock(task);
> +}
> +
>  /**
>   * dump_tasks - dump current memory state of all system tasks
>   * @memcg: current's memory controller, if constrained
> @@ -381,32 +407,17 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>  	struct task_struct *p;
> -	struct task_struct *task;
>  
>  	pr_info("[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name\n");
> -	rcu_read_lock();
> -	for_each_process(p) {
> -		if (oom_unkillable_task(p, memcg, nodemask))
> -			continue;
> -
> -		task = find_lock_task_mm(p);
> -		if (!task) {
> -			/*
> -			 * This is a kthread or all of p's threads have already
> -			 * detached their mm's.  There's no need to report
> -			 * them; they can't be oom killed anyway.
> -			 */
> -			continue;
> -		}
>  
> -		pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
> -			task->pid, from_kuid(&init_user_ns, task_uid(task)),
> -			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> -			task->mm->nr_ptes,
> -			get_mm_counter(task->mm, MM_SWAPENTS),
> -			task->signal->oom_score_adj, task->comm);
> -		task_unlock(task);
> +	if (memcg) {
> +		dump_tasks_memcg(memcg, nodemask);
> +		return;
>  	}
> +
> +	rcu_read_lock();
> +	for_each_process(p)
> +		dump_per_task(p, nodemask);
>  	rcu_read_unlock();
>  }
>  



* Re: [PATCH V2 0/2] Provide more precise dump info for memcg-oom
  2012-11-07  8:40 [PATCH V2 0/2] Provide more precise dump info for memcg-oom Sha Zhengju
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
  2012-11-07  8:41 ` [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation Sha Zhengju
@ 2012-11-07 19:31 ` Andrew Morton
  2 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2012-11-07 19:31 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, mhocko, kamezawa.hiroyu, rientjes,
	linux-kernel, Sha Zhengju

On Wed,  7 Nov 2012 16:40:02 +0800
Sha Zhengju <handai.szj@gmail.com> wrote:

> When memcg oom is happening the current memcg related dump information
> is limited for debugging. The patches provide more detailed memcg page statistics
> and also take hierarchy into consideration.

Within the changelogs, please include a sample of the proposed output
so we can properly review the proposal.

Also it would be useful to provide some justification for the decisions
in this patch: which data is displayed and, particularly, which is not?
Why is the displayed information useful to developers, etc.



* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
  2012-11-07 18:02   ` David Rientjes
@ 2012-11-07 22:17   ` Michal Hocko
  2012-11-08 12:45     ` Sha Zhengju
  2012-11-08  9:07   ` Kamezawa Hiroyuki
  2 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2012-11-07 22:17 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, kamezawa.hiroyu, akpm, rientjes, linux-kernel,
	Sha Zhengju

On Wed 07-11-12 16:41:36, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
> 
> Current, when a memcg oom is happening the oom dump messages is still global
> state and provides few useful info for users. This patch prints more pointed
> memcg page statistics for memcg-oom.
> 
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/memcontrol.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>  mm/oom_kill.c   |    6 +++-
>  2 files changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0eab7d5..2df5e72 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>  	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>  }
>  
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *mi;
> +	unsigned int i;
> +
> +	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {
> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> +				K(mem_cgroup_read_stat(memcg, i)));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +				mem_cgroup_read_events(memcg, i));
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++)
> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
> +	} else {
> +
> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			long long val = 0;
> +
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_stat(mi, i);
> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_events(mi, i);
> +			printk(KERN_CONT "%s:%llu ",
> +				mem_cgroup_events_names[i], val);
> +		}
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> +		}
> +	}

This is just plain ugly. for_each_mem_cgroup_tree is use_hierarchy aware
and there is no need for if (use_hierarchy) part.
memcg != root_mem_cgroup test doesn't make much sense as well because we
call that a global oom killer ;)
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation
  2012-11-07  8:41 ` [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation Sha Zhengju
  2012-11-07 18:07   ` David Rientjes
@ 2012-11-07 22:34   ` Michal Hocko
  2012-11-08 13:58     ` Sha Zhengju
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2012-11-07 22:34 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, kamezawa.hiroyu, akpm, rientjes, linux-kernel,
	Sha Zhengju

On Wed 07-11-12 16:41:59, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
> 
> If memcg oom happening, don't scan all system tasks to dump memory state of
> eligible tasks, instead we iterates only over the process attached to the oom
> memcg and avoid the rcu lock.

you have replaced the rcu lock with css_set_lock, which is, well, heavier
than rcu. Besides that the patch is not correct because you have excluded
all tasks that are from subgroups because you iterate only through the
top level one.
I am not sure the whole optimization would be a win even if implemented
correctly. Well, we scan through more tasks currently and most of them
are not relevant but then you would need to exclude task_in_mem_cgroup
from oom_unkillable_task and that would be more code churn than the
win.

> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/linux/memcontrol.h |    7 +++++
>  include/linux/oom.h        |    2 +
>  mm/memcontrol.c            |   14 +++++++++++
>  mm/oom_kill.c              |   55 ++++++++++++++++++++++++++-----------------
>  4 files changed, 56 insertions(+), 22 deletions(-)
> 
[...]
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 20b5c46..9ba3344 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -57,6 +57,8 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>  		unsigned long totalpages, const nodemask_t *nodemask,
>  		bool force_kill);
>  
> +extern inline void dump_per_task(struct task_struct *p,
> +				const nodemask_t *nodemask);
>  extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  		int order, nodemask_t *mask, bool force_kill);
>  extern int register_oom_notifier(struct notifier_block *nb);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2df5e72..fe648f8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1665,6 +1665,20 @@ static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
>  	return min(limit, memsw);
>  }
>  
> +void dump_tasks_memcg(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
> +{
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +	struct cgroup *cgroup = memcg->css.cgroup;
> +
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it))) {
> +		dump_per_task(task, nodemask);
> +	}
> +
> +	cgroup_iter_end(cgroup, &it);
> +}
> +
>  static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  				     int order)
>  {
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4b8a6dd..aaf6237 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -367,6 +367,32 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  	return chosen;
>  }
>  
> +inline void dump_per_task(struct task_struct *p, const nodemask_t *nodemask)
> +{
> +	struct task_struct *task;
> +
> +	if (oom_unkillable_task(p, NULL, nodemask))
> +		return;
> +
> +	task = find_lock_task_mm(p);
> +	if (!task) {
> +		/*
> +		 * This is a kthread or all of p's threads have already
> +		 * detached their mm's.  There's no need to report
> +		 * them; they can't be oom killed anyway.
> +		 */
> +		return;
> +	}
> +
> +	pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
> +		task->pid, from_kuid(&init_user_ns, task_uid(task)),
> +		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> +		task->mm->nr_ptes,
> +		get_mm_counter(task->mm, MM_SWAPENTS),
> +		task->signal->oom_score_adj, task->comm);
> +	task_unlock(task);
> +}
> +
>  /**
>   * dump_tasks - dump current memory state of all system tasks
>   * @memcg: current's memory controller, if constrained
> @@ -381,32 +407,17 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>  	struct task_struct *p;
> -	struct task_struct *task;
>  
>  	pr_info("[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name\n");
> -	rcu_read_lock();
> -	for_each_process(p) {
> -		if (oom_unkillable_task(p, memcg, nodemask))
> -			continue;
> -
> -		task = find_lock_task_mm(p);
> -		if (!task) {
> -			/*
> -			 * This is a kthread or all of p's threads have already
> -			 * detached their mm's.  There's no need to report
> -			 * them; they can't be oom killed anyway.
> -			 */
> -			continue;
> -		}
>  
> -		pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu         %5d %s\n",
> -			task->pid, from_kuid(&init_user_ns, task_uid(task)),
> -			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> -			task->mm->nr_ptes,
> -			get_mm_counter(task->mm, MM_SWAPENTS),
> -			task->signal->oom_score_adj, task->comm);
> -		task_unlock(task);
> +	if (memcg) {
> +		dump_tasks_memcg(memcg, nodemask);
> +		return;
>  	}
> +
> +	rcu_read_lock();
> +	for_each_process(p)
> +		dump_per_task(p, nodemask);
>  	rcu_read_unlock();
>  }
>  
> -- 
> 1.7.6.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
  2012-11-07 18:02   ` David Rientjes
  2012-11-07 22:17   ` Michal Hocko
@ 2012-11-08  9:07   ` Kamezawa Hiroyuki
  2012-11-08 13:12     ` Sha Zhengju
  2 siblings, 1 reply; 14+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08  9:07 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: linux-mm, cgroups, mhocko, akpm, rientjes, linux-kernel,
	Sha Zhengju

(2012/11/07 17:41), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
> 
> Currently, when a memcg oom is happening, the oom dump messages still show
> global state and provide little useful info for users. This patch prints more
> targeted memcg page statistics for memcg-oom.
> 
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>   mm/memcontrol.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>   mm/oom_kill.c   |    6 +++-
>   2 files changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0eab7d5..2df5e72 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
>   	"pgmajfault",
>   };
>   
> +static const char * const mem_cgroup_lru_names[] = {
> +	"inactive_anon",
> +	"active_anon",
> +	"inactive_file",
> +	"active_file",
> +	"unevictable",
> +};
> +

Are these the same strings as used by show_free_areas()?


>   /*
>    * Per memcg event counter is incremented at every pagein/pageout. With THP,
>    * it will be incremated by the number of pages. This counter is used for
> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>   	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>   }
>   
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *mi;
> +	unsigned int i;
> +
> +	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {

Why do you need to have this condition check ?

> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> +				K(mem_cgroup_read_stat(memcg, i)));

Hm, how about using the same style as show_free_areas()?

> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +				mem_cgroup_read_events(memcg, i));
> +

I don't think EVENTS info is useful for oom.

> +		for (i = 0; i < NR_LRU_LISTS; i++)
> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));

How far does the format of your new information differ from the usual oom
output? Could you show a sample and the difference in the changelog?

Of course, I prefer both of them has similar format.





> +	} else {
> +
> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			long long val = 0;
> +
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_stat(mi, i);
> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_events(mi, i);
> +			printk(KERN_CONT "%s:%llu ",
> +				mem_cgroup_events_names[i], val);
> +		}
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> +		}
> +	}
> +	printk(KERN_CONT "\n");
> +}




>   /**
> - * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>    * @memcg: The memory cgroup that went over limit
>    * @p: Task that is going to be killed
>    *
> @@ -1569,6 +1628,8 @@ done:
>   		res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
>   		res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
>   		res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
> +
> +	mem_cgroup_print_oom_stat(memcg);
>   }

Please put this directly in print_oom_info().



Thanks,
-Kame


* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07 18:02   ` David Rientjes
@ 2012-11-08 12:37     ` Sha Zhengju
  2012-11-08 12:44       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Sha Zhengju @ 2012-11-08 12:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Sha Zhengju, linux-mm, cgroups, mhocko, kamezawa.hiroyu, akpm,
	linux-kernel

On 11/08/2012 02:02 AM, David Rientjes wrote:
> On Wed, 7 Nov 2012, Sha Zhengju wrote:
>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 0eab7d5..2df5e72 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
>>   	"pgmajfault",
>>   };
>>
>> +static const char * const mem_cgroup_lru_names[] = {
>> +	"inactive_anon",
>> +	"active_anon",
>> +	"inactive_file",
>> +	"active_file",
>> +	"unevictable",
>> +};
>> +
>>   /*
>>    * Per memcg event counter is incremented at every pagein/pageout. With THP,
>>    * it will be incremated by the number of pages. This counter is used for
>> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>>   	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>>   }
>>
>> +#define K(x) ((x)<<  (PAGE_SHIFT-10))
>> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
>> +{
>> +	struct mem_cgroup *mi;
>> +	unsigned int i;
>> +
>> +	if (!memcg->use_hierarchy&&  memcg != root_mem_cgroup) {
>> +		for (i = 0; i<  MEM_CGROUP_STAT_NSTATS; i++) {
>> +			if (i == MEM_CGROUP_STAT_SWAP&&  !do_swap_account)
>> +				continue;
>> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> This printk isn't continuing any previous printk, so using KERN_CONT here
> will require a short header to be printed first ("Memcg: "?) with
> KERN_INFO before the iterations.
>

Yep...I think I lost it while rebasing... sorry for the stupid mistake.

>> +				K(mem_cgroup_read_stat(memcg, i)));
>> +		}
>> +
>> +		for (i = 0; i<  MEM_CGROUP_EVENTS_NSTATS; i++)
>> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
>> +				mem_cgroup_read_events(memcg, i));
>> +
>> +		for (i = 0; i<  NR_LRU_LISTS; i++)
>> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
>> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
>> +	} else {
>> +
> Spurious newline.
>
> Eek, is there really no way to avoid this if-conditional and just use
> for_each_mem_cgroup_tree() for everything and use
>
> 	mem_cgroup_iter_break(memcg, iter);
> 	break;
>
> for !memcg->use_hierarchy?
>

I'm embarrassed that I sent this chunk out yesterday...
Yes, the if-part above is obviously unneeded; for_each_mem_cgroup_tree
already handles the hierarchy.


>> +		for (i = 0; i<  MEM_CGROUP_STAT_NSTATS; i++) {
>> +			long long val = 0;
>> +
>> +			if (i == MEM_CGROUP_STAT_SWAP&&  !do_swap_account)
>> +				continue;
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_read_stat(mi, i);
>> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
>> +		}
>> +
>> +		for (i = 0; i<  MEM_CGROUP_EVENTS_NSTATS; i++) {
>> +			unsigned long long val = 0;
>> +
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_read_events(mi, i);
>> +			printk(KERN_CONT "%s:%llu ",
>> +				mem_cgroup_events_names[i], val);
>> +		}
>> +
>> +		for (i = 0; i<  NR_LRU_LISTS; i++) {
>> +			unsigned long long val = 0;
>> +
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
>> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
>> +		}
>> +	}
>> +	printk(KERN_CONT "\n");
>> +}
>>   /**
>> - * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>>    * @memcg: The memory cgroup that went over limit
>>    * @p: Task that is going to be killed
>>    *
>> @@ -1569,6 +1628,8 @@ done:
>>   		res_counter_read_u64(&memcg->kmem, RES_USAGE)>>  10,
>>   		res_counter_read_u64(&memcg->kmem, RES_LIMIT)>>  10,
>>   		res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
>> +
>> +	mem_cgroup_print_oom_stat(memcg);
> I think this should be folded into mem_cgroup_print_oom_info(), I don't
> see a need for a new function.
>
>>   }
>>
>>   /*
>> @@ -5195,14 +5256,6 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>>   }
>>   #endif /* CONFIG_NUMA */
>>
>> -static const char * const mem_cgroup_lru_names[] = {
>> -	"inactive_anon",
>> -	"active_anon",
>> -	"inactive_file",
>> -	"active_file",
>> -	"unevictable",
>> -};
>> -
>>   static inline void mem_cgroup_lru_names_not_uptodate(void)
>>   {
>>   	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 7e9e911..4b8a6dd 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -421,8 +421,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>>   	cpuset_print_task_mems_allowed(current);
>>   	task_unlock(current);
>>   	dump_stack();
>> -	mem_cgroup_print_oom_info(memcg, p);
>> -	show_mem(SHOW_MEM_FILTER_NODES);
>> +	if (memcg)
>> +		mem_cgroup_print_oom_info(memcg, p);
> mem_cgroup_print_oom_info() already returns immediately for !memcg, so I'm
> not sure why this change is made.
>

Here the if-else check aims to print distinct messages for the memcg and
non-memcg cases. IMHO, the global state is of little actual use for a
memcg-oom, so why not drop it?
Although mem_cgroup_print_oom_info() already checks for !memcg, the
if-statement avoids one function call and saves the already deep oom call
stack a little.

>> +	else
>> +		show_mem(SHOW_MEM_FILTER_NODES);
> Well that's disappointing if memcg == root_mem_cgroup, we'd probably like
> to know the global memory state to determine what the problem is.
>

I am really wondering whether there is any case that can pass root_mem_cgroup
down here. This is called by the global or memcg oom killer, and the global
oom path sets memcg=NULL directly instead of root_mem_cgroup. Besides, the
root memcg does not go through charging, so there is no chance to call
mem_cgroup_out_of_memory() for root cgroup tasks.



Thanks,
Sha


>>   	if (sysctl_oom_dump_tasks)
>>   		dump_tasks(memcg, nodemask);
>>   }
> .
>


* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-08 12:37     ` Sha Zhengju
@ 2012-11-08 12:44       ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2012-11-08 12:44 UTC (permalink / raw)
  To: Sha Zhengju
  Cc: David Rientjes, linux-mm, cgroups, kamezawa.hiroyu, akpm,
	linux-kernel

On Thu 08-11-12 20:37:45, Sha Zhengju wrote:
> On 11/08/2012 02:02 AM, David Rientjes wrote:
> >On Wed, 7 Nov 2012, Sha Zhengju wrote:
[..]
> >>+	else
> >>+		show_mem(SHOW_MEM_FILTER_NODES);
> >Well that's disappointing if memcg == root_mem_cgroup, we'd probably like
> >to know the global memory state to determine what the problem is.
> >
> 
> I really wondering if there is any case that can pass
> root_mem_cgroup down here.

No it cannot because the root cgroup doesn't have any limit so we cannot
trigger memcg oom killer.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-07 22:17   ` Michal Hocko
@ 2012-11-08 12:45     ` Sha Zhengju
  0 siblings, 0 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-08 12:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, cgroups, kamezawa.hiroyu, akpm, rientjes, linux-kernel,
	Sha Zhengju

On 11/08/2012 06:17 AM, Michal Hocko wrote:
> On Wed 07-11-12 16:41:36, Sha Zhengju wrote:
>> From: Sha Zhengju<handai.szj@taobao.com>
>>
>> Currently, when a memcg oom is happening, the oom dump messages still show
>> global state and provide little useful info for users. This patch prints more
>> targeted memcg page statistics for memcg-oom.
>>
>> Signed-off-by: Sha Zhengju<handai.szj@taobao.com>
>> Cc: Michal Hocko<mhocko@suse.cz>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: David Rientjes<rientjes@google.com>
>> Cc: Andrew Morton<akpm@linux-foundation.org>
>> ---
>>   mm/memcontrol.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>>   mm/oom_kill.c   |    6 +++-
>>   2 files changed, 66 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 0eab7d5..2df5e72 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
> [...]
>> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>>   	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>>   }
>>
>> +#define K(x) ((x)<<  (PAGE_SHIFT-10))
>> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
>> +{
>> +	struct mem_cgroup *mi;
>> +	unsigned int i;
>> +
>> +	if (!memcg->use_hierarchy&&  memcg != root_mem_cgroup) {
>> +		for (i = 0; i<  MEM_CGROUP_STAT_NSTATS; i++) {
>> +			if (i == MEM_CGROUP_STAT_SWAP&&  !do_swap_account)
>> +				continue;
>> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
>> +				K(mem_cgroup_read_stat(memcg, i)));
>> +		}
>> +
>> +		for (i = 0; i<  MEM_CGROUP_EVENTS_NSTATS; i++)
>> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
>> +				mem_cgroup_read_events(memcg, i));
>> +
>> +		for (i = 0; i<  NR_LRU_LISTS; i++)
>> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
>> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
>> +	} else {
>> +
>> +		for (i = 0; i<  MEM_CGROUP_STAT_NSTATS; i++) {
>> +			long long val = 0;
>> +
>> +			if (i == MEM_CGROUP_STAT_SWAP&&  !do_swap_account)
>> +				continue;
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_read_stat(mi, i);
>> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
>> +		}
>> +
>> +		for (i = 0; i<  MEM_CGROUP_EVENTS_NSTATS; i++) {
>> +			unsigned long long val = 0;
>> +
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_read_events(mi, i);
>> +			printk(KERN_CONT "%s:%llu ",
>> +				mem_cgroup_events_names[i], val);
>> +		}
>> +
>> +		for (i = 0; i<  NR_LRU_LISTS; i++) {
>> +			unsigned long long val = 0;
>> +
>> +			for_each_mem_cgroup_tree(mi, memcg)
>> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
>> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
>> +		}
>> +	}
> This is just plain ugly. for_each_mem_cgroup_tree is use_hierarchy aware
> and there is no need for if (use_hierarchy) part.
> memcg != root_mem_cgroup test doesn't make much sense as well because we
> call that a global oom killer ;)

Yes... I bitterly regret the patch... The else-part using
for_each_mem_cgroup_tree is enough for the hierarchy case. I'll send an
updated one later.
Sorry for the noise. : (


Thanks,
Sha



* Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
  2012-11-08  9:07   ` Kamezawa Hiroyuki
@ 2012-11-08 13:12     ` Sha Zhengju
  0 siblings, 0 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-08 13:12 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-mm, cgroups, mhocko, akpm, rientjes, linux-kernel,
	Sha Zhengju

On 11/08/2012 05:07 PM, Kamezawa Hiroyuki wrote:
> (2012/11/07 17:41), Sha Zhengju wrote:
>> From: Sha Zhengju <handai.szj@taobao.com>
>>
>> Currently, when a memcg oom is happening, the oom dump messages still show
>> global state and provide little useful info for users. This patch prints more
>> targeted memcg page statistics for memcg-oom.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>   mm/memcontrol.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>>   mm/oom_kill.c   |    6 +++-
>>   2 files changed, 66 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 0eab7d5..2df5e72 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
>>   	"pgmajfault",
>>   };
>>   
>> +static const char * const mem_cgroup_lru_names[] = {
>> +	"inactive_anon",
>> +	"active_anon",
>> +	"inactive_file",
>> +	"active_file",
>> +	"unevictable",
>> +};
>> +
> Is this for the same strings with show_free_areas() ?
>

I just moved the declaration here from the bottom of the source file so that
the later use compiles without error.

>>   /*
>>    * Per memcg event counter is incremented at every pagein/pageout. With THP,
>>    * it will be incremated by the number of pages. This counter is used for
>> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>>   	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>>   }
>>   
>> +#define K(x) ((x) << (PAGE_SHIFT-10))
>> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
>> +{
>> +	struct mem_cgroup *mi;
>> +	unsigned int i;
>> +
>> +	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {
> Why do you need to have this condition check ?
>

Yes, the check is unnecessary... I'll remove it in the next version.

>> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> +				continue;
>> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
>> +				K(mem_cgroup_read_stat(memcg, i)));
> Hm, how about using the same style with show_free_areas() ?
>

I'm also trying to do so. show_free_areas() prints the memory-related info in
two styles: one in page units and the other in KB (I have no idea why they
are distinguished), but I think the KB format is more readable.
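The KB conversion under discussion is just the patch's K() macro, a shift from page units to kilobytes. A userspace illustration, with PAGE_SHIFT hardcoded to 12 (4KB pages) purely for the demo:

```c
#include <assert.h>

/* Hardcoded for the demo; a real 4KB-page kernel defines this as 12. */
#define PAGE_SHIFT 12

/*
 * The K() helper from the patch: a page count shifted left by
 * PAGE_SHIFT - 10 yields the same quantity in kilobytes, matching the
 * KB style of show_free_areas().
 */
#define K(x) ((x) << (PAGE_SHIFT - 10))
```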


>> +		}
>> +
>> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
>> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
>> +				mem_cgroup_read_events(memcg, i));
>> +
> I don't think EVENTS info is useful for oom.
>

It seems you're right. : )

>> +		for (i = 0; i < NR_LRU_LISTS; i++)
>> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
>> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));
> How far does your new information has different format than usual oom ?
> Could you show a sample and difference in changelog ?
>
> Of course, I prefer both of them has similar format.
>
>
>
The new memcg-oom info leaves the global state out and prints the memcg
statistics instead, which is more concise. I'll add a sample in the
changelog next time. Thanks for reminding me!


Thanks,
Sha


* Re: [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation
  2012-11-07 22:34   ` Michal Hocko
@ 2012-11-08 13:58     ` Sha Zhengju
  0 siblings, 0 replies; 14+ messages in thread
From: Sha Zhengju @ 2012-11-08 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, cgroups, kamezawa.hiroyu, akpm, rientjes, linux-kernel,
	Sha Zhengju

On 11/08/2012 06:34 AM, Michal Hocko wrote:
> On Wed 07-11-12 16:41:59, Sha Zhengju wrote:
>> From: Sha Zhengju<handai.szj@taobao.com>
>>
>> When a memcg oom is happening, don't scan all system tasks to dump the memory
>> state of eligible tasks; instead, iterate only over the processes attached to
>> the oom memcg and avoid the rcu lock.
> you have replaced rcu lock by css_set_lock which is, well, heavier than
> rcu. Besides that the patch is not correct because you have excluded
> all tasks that are from subgroups because you iterate only through the
> top level one.
> I am not sure the whole optimization would be a win even if implemented
> correctly. Well, we scan through more tasks currently and most of them
> are not relevant but then you would need to exclude task_in_mem_cgroup
> from oom_unkillable_task and that would be more code churn than the
> win.

Thanks for your and David's advice.
This piece was trying to save some expense while dumping memcg tasks, but it
fails to scan subgroups because it iterates only the top-level cgroup. I
agree with your cost-vs-win opinion, so I've decided to give up on this
one. : )


Thanks,
Sha



Thread overview: 14+ messages
2012-11-07  8:40 [PATCH V2 0/2] Provide more precise dump info for memcg-oom Sha Zhengju
2012-11-07  8:41 ` [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening Sha Zhengju
2012-11-07 18:02   ` David Rientjes
2012-11-08 12:37     ` Sha Zhengju
2012-11-08 12:44       ` Michal Hocko
2012-11-07 22:17   ` Michal Hocko
2012-11-08 12:45     ` Sha Zhengju
2012-11-08  9:07   ` Kamezawa Hiroyuki
2012-11-08 13:12     ` Sha Zhengju
2012-11-07  8:41 ` [PATCH 2/2] oom: rework dump_tasks to optimize memcg-oom situation Sha Zhengju
2012-11-07 18:07   ` David Rientjes
2012-11-07 22:34   ` Michal Hocko
2012-11-08 13:58     ` Sha Zhengju
2012-11-07 19:31 ` [PATCH V2 0/2] Provide more precise dump info for memcg-oom Andrew Morton
