From: Sha Zhengju <handai.szj@gmail.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, akpm@linux-foundation.org,
	rientjes@google.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] memcg, oom: provide more precise dump info while memcg oom happening
Date: Fri, 09 Nov 2012 20:09:29 +0800
Message-ID: <509CF279.1080602@gmail.com>
In-Reply-To: <20121109105040.GA5006@dhcp22.suse.cz>

On 11/09/2012 06:50 PM, Michal Hocko wrote:
> On Fri 09-11-12 18:23:07, Sha Zhengju wrote:
>> On 11/09/2012 12:25 AM, Michal Hocko wrote:
>>> On Thu 08-11-12 23:52:47, Sha Zhengju wrote:
> [...]
>>>> +	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>>>> +		long long val = 0;
>>>> +		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>>>> +			continue;
>>>> +		for_each_mem_cgroup_tree(mi, memcg)
>>>> +			val += mem_cgroup_read_stat(mi, i);
>>>> +		printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < NR_LRU_LISTS; i++) {
>>>> +		unsigned long long val = 0;
>>>> +
>>>> +		for_each_mem_cgroup_tree(mi, memcg)
>>>> +			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
>>>> +		printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
>>>> +	}
>>>> +	printk(KERN_CONT "\n");
>>> This is nice and simple; I am just wondering whether it is enough. Say
>>> that you have a deeper hierarchy and there is a safety limit at its
>>> root:
>>>          A (limit)
>>>         /|\
>>>        B C D
>>>            |\
>>>            E F
>>>
>>> and we trigger an OOM on A's limit. Now we know that something blew
>>> up, but we do not know what it was. Wouldn't it be better to swap the for
>>> and for_each_mem_cgroup_tree loops? Then we would see the whole
>>> hierarchy and could potentially point at the group which misbehaves.
>>> Memory cgroup stats for A/: ...
>>> Memory cgroup stats for A/B/: ...
>>> Memory cgroup stats for A/C/: ...
>>> Memory cgroup stats for A/D/: ...
>>> Memory cgroup stats for A/D/E/: ...
>>> Memory cgroup stats for A/D/F/: ...
>>>
>>> Would it still fit in with your use case?
>>> [...]
>> We haven't used such complicated hierarchies yet, but it sounds like a
>> good suggestion. :)
>> Hierarchy is a little complex to use in our experience, and the
>> three cgroups involved in a memcg OOM can be different: the memcg of
>> the invoker, the memcg of the killed task, and the memcg that went
>> over its limit. Suppose a process in B triggers the OOM and a victim
>> in root A is selected to be killed; we may also want to know the memcg
>> stats local to the A cgroup (excluding B, C and D). So besides the
>> hierarchy info, is it acceptable to also print the local root node
>> stats, as I did in the V1 version
>> (https://lkml.org/lkml/2012/7/30/179)?
> Ohh, I probably wasn't clear enough. I didn't suggest cumulative
> numbers. Only per group. So it would be something like:
>
> 	for_each_mem_cgroup_tree(mi, memcg) {
> 		printk("Memory cgroup stats for %s", memcg_name);
> 		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> 			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> 				continue;
> 			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i],
> 				K(mem_cgroup_read_stat(mi, i)));
> 		}
> 		for (i = 0; i < NR_LRU_LISTS; i++)
> 			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i],
> 				K(mem_cgroup_nr_lru_pages(mi, BIT(i))));
>
> 		printk(KERN_CONT "\n");
> 	}
>

Now I see your point and understand the code above. It's smarter than
what I had in mind before.
Thanks for explaining!
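
To make the difference between the two loop orders concrete, here is a
minimal userspace sketch. It is only an illustration under stated
assumptions: struct group, subtree_sum() and dump_per_group() are
hypothetical stand-ins, not kernel code, and the stat names and values are
made up. subtree_sum() mirrors the cumulative V3 approach, while
dump_per_group() prints one line per group, as in the suggested loop.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
#define MAX_CHILDREN 4
#define NSTATS 3

struct group {
	const char *path;                  /* e.g. "A/D/E/" */
	long long stat_kb[NSTATS];         /* local (non-hierarchical) counters, in KB */
	struct group *child[MAX_CHILDREN];
	int nr_children;
};

/* Illustrative stat names only. */
static const char *const stat_names[NSTATS] = { "cache", "rss", "swap" };

/* Cumulative view: sum one stat over the whole subtree rooted at g. */
static long long subtree_sum(const struct group *g, int stat)
{
	long long val = g->stat_kb[stat];

	for (int c = 0; c < g->nr_children; c++)
		val += subtree_sum(g->child[c], stat);
	return val;
}

/*
 * Per-group view: print one line of local stats for every group in the
 * subtree (pre-order). Returns the number of lines printed.
 */
static int dump_per_group(const struct group *g, FILE *out)
{
	int lines = 1;

	fprintf(out, "Memory cgroup stats for %s:", g->path);
	for (int i = 0; i < NSTATS; i++)
		fprintf(out, " %s:%lldKB", stat_names[i], g->stat_kb[i]);
	fputc('\n', out);

	for (int c = 0; c < g->nr_children; c++)
		lines += dump_per_group(g->child[c], out);
	return lines;
}
```

With the A/B/C/D/E/F tree from the example, subtree_sum() on A gives a
single total per stat, so a runaway group such as A/D/F/ cannot be told
apart from its siblings. dump_per_group() on A prints six lines, one per
group, and the outlier is visible directly.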

>> Another thing I'm hesitating about is NUMA stats; the output seems
>> to keep growing....
> NUMA stats are basically per-node, per-zone LRU data, and the
> for(NR_LRU_LISTS) loop can easily be extended to cover that.

Yes, the numa_stat cgroup file already does this work. I'll add the NUMA
stats if you don't think that is inappropriate.
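
As a rough userspace sketch of what extending the LRU loop to per-node data
could look like: struct node_lru, pages_to_kb(), lru_total_kb() and
dump_lru_per_node() are hypothetical names for illustration, the 4 KiB page
size is an assumption, and the output layout is made up rather than taken
from any patch.

```c
#include <stdio.h>

#define NR_LRU 5

/* LRU list names, written out here for illustration. */
static const char *const lru_names[NR_LRU] = {
	"inactive_anon", "active_anon", "inactive_file",
	"active_file", "unevictable"
};

/* Hypothetical per-node LRU page counts for one group. */
struct node_lru {
	unsigned long long pages[NR_LRU];
};

/* Pages to KB, ASSUMING 4 KiB pages. */
static unsigned long long pages_to_kb(unsigned long long pages)
{
	return pages << 2;
}

/* Total for one LRU list across all nodes, in KB. */
static unsigned long long lru_total_kb(const struct node_lru *nodes,
				       int nr_nodes, int lru)
{
	unsigned long long sum = 0;

	for (int nid = 0; nid < nr_nodes; nid++)
		sum += nodes[nid].pages[lru];
	return pages_to_kb(sum);
}

/* The plain LRU loop, extended with an inner per-node loop. */
static void dump_lru_per_node(const struct node_lru *nodes, int nr_nodes,
			      FILE *out)
{
	for (int i = 0; i < NR_LRU; i++) {
		fprintf(out, "%s:%lluKB", lru_names[i],
			lru_total_kb(nodes, nr_nodes, i));
		for (int nid = 0; nid < nr_nodes; nid++)
			fprintf(out, " N%d=%lluKB", nid,
				pages_to_kb(nodes[nid].pages[i]));
		fputc('\n', out);
	}
}
```

This prints one line per LRU list, which is also why the concern about the
growing amount of output applies: every additional node adds a column to
each line.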


