* [PATCH 1/6] memcg: use global stat directly for root memcg usage
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
@ 2013-03-12 10:08 ` Sha Zhengju
2013-03-13 1:05 ` Kamezawa Hiroyuki
2013-03-12 10:09 ` [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats Sha Zhengju
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:08 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
Since mem_cgroup_recursive_stat(root_mem_cgroup, INDEX) will sum up
all memcg stats without regard to root's use_hierarchy, we may use
global stats instead for simplicity.
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 669d16a..735cd41 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4987,11 +4987,11 @@ static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
return res_counter_read_u64(&memcg->memsw, RES_USAGE);
}
- val = mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_CACHE);
- val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_RSS);
+ val = global_page_state(NR_FILE_PAGES);
+ val += global_page_state(NR_ANON_PAGES);
if (swap)
- val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_SWAP);
+ val += total_swap_pages - atomic_long_read(&nr_swap_pages);
return val << PAGE_SHIFT;
}
--
1.7.9.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/6] memcg: use global stat directly for root memcg usage
2013-03-12 10:08 ` [PATCH 1/6] memcg: use global stat directly for root memcg usage Sha Zhengju
@ 2013-03-13 1:05 ` Kamezawa Hiroyuki
2013-03-13 8:50 ` Sha Zhengju
0 siblings, 1 reply; 13+ messages in thread
From: Kamezawa Hiroyuki @ 2013-03-13 1:05 UTC (permalink / raw)
To: Sha Zhengju
Cc: cgroups, linux-mm, mhocko, glommer, akpm, mgorman, Sha Zhengju
(2013/03/12 19:08), Sha Zhengju wrote:
> Since mem_cgroup_recursive_stat(root_mem_cgroup, INDEX) will sum up
> all memcg stats without regard to root's use_hierarchy, we may use
> global stats instead for simplicity.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
> mm/memcontrol.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 669d16a..735cd41 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4987,11 +4987,11 @@ static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> return res_counter_read_u64(&memcg->memsw, RES_USAGE);
> }
>
> - val = mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_CACHE);
> - val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_RSS);
> + val = global_page_state(NR_FILE_PAGES);
> + val += global_page_state(NR_ANON_PAGES);
>
you missed NR_ANON_TRANSPARENT_HUGEPAGES
> if (swap)
> - val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_SWAP);
> + val += total_swap_pages - atomic_long_read(&nr_swap_pages);
>
Doesn't this double count mapped SwapCache? Did you see Costa's attempt from a week ago?
Thanks,
-Kame
^ permalink raw reply [flat|nested] 13+ messages in thread
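The two points raised in this exchange (anonymous transparent huge pages missing from the sum, and swap usage derived from global counters) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct global_counters`, `root_usage_bytes()`, `SKETCH_PAGE_SHIFT` and `SKETCH_HPAGE_NR` are hypothetical names, not kernel symbols, and the huge page counter is assumed to be kept in units of huge pages. The sketch does not settle the SwapCache double-counting question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration: 4 KiB pages, 2 MiB huge pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_HPAGE_NR   512

/* Hypothetical snapshot of the global counters that the patch reads. */
struct global_counters {
	uint64_t nr_file_pages;    /* stands in for NR_FILE_PAGES */
	uint64_t nr_anon_pages;    /* stands in for NR_ANON_PAGES */
	uint64_t nr_anon_thp;      /* NR_ANON_TRANSPARENT_HUGEPAGES, assumed in huge-page units */
	uint64_t total_swap_pages; /* all configured swap */
	uint64_t free_swap_pages;  /* stands in for nr_swap_pages (free swap) */
};

/* Root usage in bytes: file + anon + THP, plus used swap when requested. */
static uint64_t root_usage_bytes(const struct global_counters *g, bool swap)
{
	uint64_t val = g->nr_file_pages + g->nr_anon_pages;

	/* Convert huge pages to base pages before adding them in. */
	val += g->nr_anon_thp * SKETCH_HPAGE_NR;
	if (swap)
		val += g->total_swap_pages - g->free_swap_pages;
	return val << SKETCH_PAGE_SHIFT;
}
```

Whether swap should be added on top of the page counts at all depends on how SwapCache pages are treated, which is the open question in the review.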
* Re: [PATCH 1/6] memcg: use global stat directly for root memcg usage
2013-03-13 1:05 ` Kamezawa Hiroyuki
@ 2013-03-13 8:50 ` Sha Zhengju
0 siblings, 0 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-13 8:50 UTC (permalink / raw)
To: Kamezawa Hiroyuki
Cc: cgroups, linux-mm, mhocko, glommer, akpm, mgorman, Sha Zhengju
On Wed, Mar 13, 2013 at 9:05 AM, Kamezawa Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2013/03/12 19:08), Sha Zhengju wrote:
>> Since mem_cgroup_recursive_stat(root_mem_cgroup, INDEX) will sum up
>> all memcg stats without regard to root's use_hierarchy, we may use
>> global stats instead for simplicity.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> ---
>> mm/memcontrol.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 669d16a..735cd41 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -4987,11 +4987,11 @@ static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>> return res_counter_read_u64(&memcg->memsw, RES_USAGE);
>> }
>>
>> - val = mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_CACHE);
>> - val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_RSS);
>> + val = global_page_state(NR_FILE_PAGES);
>> + val += global_page_state(NR_ANON_PAGES);
>>
> you missed NR_ANON_TRANSPARENT_HUGEPAGES
right..
>
>> if (swap)
>> - val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_SWAP);
>> + val += total_swap_pages - atomic_long_read(&nr_swap_pages);
>>
> Doesn't this double count mapped SwapCache? Did you see Costa's attempt from a week ago?
Yeah, I'm unsure how to handle swapcache. I've replied in that thread. :)
Thanks,
Sha
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
2013-03-12 10:08 ` [PATCH 1/6] memcg: use global stat directly for root memcg usage Sha Zhengju
@ 2013-03-12 10:09 ` Sha Zhengju
2013-03-13 1:12 ` Kamezawa Hiroyuki
2013-03-20 7:07 ` Glauber Costa
2013-03-12 10:10 ` [PATCH 3/6] memcg: Don't account root memcg MEM_CGROUP_STAT_FILE_MAPPED stats Sha Zhengju
` (3 subsequent siblings)
5 siblings, 2 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:09 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
If memcg is enabled and no non-root memcg exists, all allocated pages
belong to root_mem_cgroup and go through the root memcg statistics
routines, which adds overhead.
So for the sake of performance, we can stop accounting root memcg stats
for MEM_CGROUP_STAT_CACHE/RSS and instead handle the root memcg specially
in memcg_stat_show(): since root memcg stats are no longer accounted, the
root_mem_cgroup->stat numbers are actually 0, so we derive them from the
global state and the stats of all other memcgs. That is, for the root memcg:
nr(MEM_CGROUP_STAT_CACHE) = global_page_state(NR_FILE_PAGES) -
sum_of_all_memcg(MEM_CGROUP_STAT_CACHE);
RSS pages are accounted in a similar way.
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 50 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 34 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 735cd41..e89204f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -958,26 +958,27 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
{
preempt_disable();
- /*
- * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
- * counted as CACHE even if it's on ANON LRU.
- */
- if (anon)
- __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
- nr_pages);
- else
- __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
- nr_pages);
-
/* pagein of a big page is an event. So, ignore page size */
if (nr_pages > 0)
__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
- else {
+ else
__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
- nr_pages = -nr_pages; /* for event */
- }
- __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
+ __this_cpu_add(memcg->stat->nr_page_events,
+ nr_pages < 0 ? -nr_pages : nr_pages);
+
+ if (!mem_cgroup_is_root(memcg)) {
+ /*
+ * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
+ * counted as CACHE even if it's on ANON LRU.
+ */
+ if (anon)
+ __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
+ nr_pages);
+ else
+ __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
+ nr_pages);
+ }
preempt_enable();
}
@@ -5445,12 +5446,24 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
struct mem_cgroup *mi;
unsigned int i;
+ enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES};
+ long root_stat[MEM_CGROUP_STAT_NSTATS] = {0};
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ long val = 0;
+
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
+
+ if (mem_cgroup_is_root(memcg) && (i == MEM_CGROUP_STAT_CACHE
+ || i == MEM_CGROUP_STAT_RSS)) {
+ val = global_page_state(global_stat[i]) -
+ mem_cgroup_recursive_stat(memcg, i);
+ root_stat[i] = val = val < 0 ? 0 : val;
+ } else
+ val = mem_cgroup_read_stat(memcg, i);
seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
- mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
+ val * PAGE_SIZE);
}
for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
@@ -5478,6 +5491,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
continue;
for_each_mem_cgroup_tree(mi, memcg)
val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
+
+ /* Adding local stats of root memcg */
+ if (mem_cgroup_is_root(memcg))
+ val += root_stat[i] * PAGE_SIZE;
+
seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
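The derivation in the changelog above (root value = global counter minus the sum over all other memcgs, clamped at zero) can be sketched in plain C. This is a user-space illustration only: `root_local_stat()` and its array argument are hypothetical, standing in for global_page_state() and the per-memcg counters. The clamp is presumably needed because the two sides are read at different moments and may briefly disagree.

```c
#include <stddef.h>

/*
 * Derive the root group's local count for one statistic:
 * global total minus the sum over all non-root groups,
 * clamped at zero so a transient imbalance never shows up negative.
 */
static long root_local_stat(long global_total, const long *nonroot, size_t n)
{
	long sum = 0;
	long val;
	size_t i;

	for (i = 0; i < n; i++)
		sum += nonroot[i];

	val = global_total - sum;
	return val < 0 ? 0 : val;
}
```

The subtraction is done in a signed type on purpose, so that the comparison against zero is meaningful.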
* Re: [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats
2013-03-12 10:09 ` [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats Sha Zhengju
@ 2013-03-13 1:12 ` Kamezawa Hiroyuki
2013-03-13 9:09 ` Sha Zhengju
2013-03-20 7:07 ` Glauber Costa
1 sibling, 1 reply; 13+ messages in thread
From: Kamezawa Hiroyuki @ 2013-03-13 1:12 UTC (permalink / raw)
To: Sha Zhengju
Cc: cgroups, linux-mm, mhocko, glommer, akpm, mgorman, Sha Zhengju
(2013/03/12 19:09), Sha Zhengju wrote:
> If memcg is enabled and no non-root memcg exists, all allocated pages
> belong to root_mem_cgroup and go through root memcg statistics routines
> which brings some overheads.
>
> So for the sake of performance, we can give up accounting stats of root
> memcg for MEM_CGROUP_STAT_CACHE/RSS and instead we pay special attention
> to memcg_stat_show() while showing root memcg numbers:
> as we don't account root memcg stats anymore, the root_mem_cgroup->stat
> numbers are actually 0. So we fake these numbers by using stats of global
> state and all other memcg. That is for root memcg:
>
> nr(MEM_CGROUP_STAT_CACHE) = global_page_state(NR_FILE_PAGES) -
> sum_of_all_memcg(MEM_CGROUP_STAT_CACHE);
>
> Rss pages accounting are in the similar way.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
> mm/memcontrol.c | 50 ++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 735cd41..e89204f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -958,26 +958,27 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
> {
> preempt_disable();
>
> - /*
> - * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
> - * counted as CACHE even if it's on ANON LRU.
> - */
> - if (anon)
> - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
> - nr_pages);
> - else
> - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
> - nr_pages);
> -
> /* pagein of a big page is an event. So, ignore page size */
> if (nr_pages > 0)
> __this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
> - else {
> + else
> __this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
> - nr_pages = -nr_pages; /* for event */
> - }
>
> - __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
> + __this_cpu_add(memcg->stat->nr_page_events,
> + nr_pages < 0 ? -nr_pages : nr_pages);
> +
> + if (!mem_cgroup_is_root(memcg)) {
> + /*
> + * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
> + * counted as CACHE even if it's on ANON LRU.
> + */
> + if (anon)
> + __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
> + nr_pages);
> + else
> + __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
> + nr_pages);
> + }
Hmm. I don't like adding this check to this fast path. IIUC, with Costa's patch, root memcg
will not make any charges at all and will never call this function. I prefer his approach to
this patch.
Thanks,
-Kame
>
> preempt_enable();
> }
> @@ -5445,12 +5446,24 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> struct mem_cgroup *mi;
> unsigned int i;
> + enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES};
> + long root_stat[MEM_CGROUP_STAT_NSTATS] = {0};
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + long val = 0;
> +
> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> continue;
> +
> + if (mem_cgroup_is_root(memcg) && (i == MEM_CGROUP_STAT_CACHE
> + || i == MEM_CGROUP_STAT_RSS)) {
> + val = global_page_state(global_stat[i]) -
> + mem_cgroup_recursive_stat(memcg, i);
> + root_stat[i] = val = val < 0 ? 0 : val;
> + } else
> + val = mem_cgroup_read_stat(memcg, i);
> seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> - mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> + val * PAGE_SIZE);
> }
>
> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> @@ -5478,6 +5491,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> continue;
> for_each_mem_cgroup_tree(mi, memcg)
> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> +
> + /* Adding local stats of root memcg */
> + if (mem_cgroup_is_root(memcg))
> + val += root_stat[i] * PAGE_SIZE;
> +
> seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
> }
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats
2013-03-13 1:12 ` Kamezawa Hiroyuki
@ 2013-03-13 9:09 ` Sha Zhengju
0 siblings, 0 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-13 9:09 UTC (permalink / raw)
To: Kamezawa Hiroyuki
Cc: cgroups, linux-mm, mhocko, glommer, akpm, mgorman, Sha Zhengju
On Wed, Mar 13, 2013 at 9:12 AM, Kamezawa Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2013/03/12 19:09), Sha Zhengju wrote:
>> If memcg is enabled and no non-root memcg exists, all allocated pages
>> belong to root_mem_cgroup and go through root memcg statistics routines
>> which brings some overheads.
>>
>> So for the sake of performance, we can give up accounting stats of root
>> memcg for MEM_CGROUP_STAT_CACHE/RSS and instead we pay special attention
>> to memcg_stat_show() while showing root memcg numbers:
>> as we don't account root memcg stats anymore, the root_mem_cgroup->stat
>> numbers are actually 0. So we fake these numbers by using stats of global
>> state and all other memcg. That is for root memcg:
>>
>> nr(MEM_CGROUP_STAT_CACHE) = global_page_state(NR_FILE_PAGES) -
>> sum_of_all_memcg(MEM_CGROUP_STAT_CACHE);
>>
>> Rss pages accounting are in the similar way.
>>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> ---
>> mm/memcontrol.c | 50 ++++++++++++++++++++++++++++++++++----------------
>> 1 file changed, 34 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 735cd41..e89204f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -958,26 +958,27 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
>> {
>> preempt_disable();
>>
>> - /*
>> - * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
>> - * counted as CACHE even if it's on ANON LRU.
>> - */
>> - if (anon)
>> - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
>> - nr_pages);
>> - else
>> - __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
>> - nr_pages);
>> -
>> /* pagein of a big page is an event. So, ignore page size */
>> if (nr_pages > 0)
>> __this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
>> - else {
>> + else
>> __this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
>> - nr_pages = -nr_pages; /* for event */
>> - }
>>
>> - __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
>> + __this_cpu_add(memcg->stat->nr_page_events,
>> + nr_pages < 0 ? -nr_pages : nr_pages);
>> +
>> + if (!mem_cgroup_is_root(memcg)) {
>> + /*
>> + * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
>> + * counted as CACHE even if it's on ANON LRU.
>> + */
>> + if (anon)
>> + __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
>> + nr_pages);
>> + else
>> + __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_CACHE],
>> + nr_pages);
>> + }
>
> Hmm. I don't like to add this check to this fast path. IIUC, with Costa's patch, root memcg
> will not make any charges at all and never call this function. I like his one rather than
Yes, but I think that one still has other problems, such as the handling
of PGPGIN/PGPGOUT and threshold events. I'd prefer to improve this as a
start.
Thanks,
Sha
> this patching.
>
> Thanks,
> -Kame
>
>
>>
>> preempt_enable();
>> }
>> @@ -5445,12 +5446,24 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>> struct mem_cgroup *mi;
>> unsigned int i;
>> + enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES};
>> + long root_stat[MEM_CGROUP_STAT_NSTATS] = {0};
>>
>> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>> + long val = 0;
>> +
>> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> continue;
>> +
>> + if (mem_cgroup_is_root(memcg) && (i == MEM_CGROUP_STAT_CACHE
>> + || i == MEM_CGROUP_STAT_RSS)) {
>> + val = global_page_state(global_stat[i]) -
>> + mem_cgroup_recursive_stat(memcg, i);
>> + root_stat[i] = val = val < 0 ? 0 : val;
>> + } else
>> + val = mem_cgroup_read_stat(memcg, i);
>> seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
>> - mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
>> + val * PAGE_SIZE);
>> }
>>
>> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
>> @@ -5478,6 +5491,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>> continue;
>> for_each_mem_cgroup_tree(mi, memcg)
>> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
>> +
>> + /* Adding local stats of root memcg */
>> + if (mem_cgroup_is_root(memcg))
>> + val += root_stat[i] * PAGE_SIZE;
>> +
>> seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
>> }
>>
>>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats
2013-03-12 10:09 ` [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats Sha Zhengju
2013-03-13 1:12 ` Kamezawa Hiroyuki
@ 2013-03-20 7:07 ` Glauber Costa
1 sibling, 0 replies; 13+ messages in thread
From: Glauber Costa @ 2013-03-20 7:07 UTC (permalink / raw)
To: Sha Zhengju
Cc: cgroups, linux-mm, mhocko, kamezawa.hiroyu, akpm, mgorman,
Sha Zhengju
On 03/12/2013 02:09 PM, Sha Zhengju wrote:
> If memcg is enabled and no non-root memcg exists, all allocated pages
> belong to root_mem_cgroup and go through root memcg statistics routines
> which brings some overheads.
>
> So for the sake of performance, we can give up accounting stats of root
> memcg for MEM_CGROUP_STAT_CACHE/RSS and instead we pay special attention
> to memcg_stat_show() while showing root memcg numbers:
> as we don't account root memcg stats anymore, the root_mem_cgroup->stat
> numbers are actually 0. So we fake these numbers by using stats of global
> state and all other memcg. That is for root memcg:
>
> nr(MEM_CGROUP_STAT_CACHE) = global_page_state(NR_FILE_PAGES) -
> sum_of_all_memcg(MEM_CGROUP_STAT_CACHE);
>
> Rss pages accounting are in the similar way.
>
Well,
The problem is that statistics are not the only cause of overhead. We
will still incur the whole charging operation, and the same for
uncharge. There is also memory overhead from page_cgroup, etc.
So my view is that this patch is far from complete.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/6] memcg: Don't account root memcg MEM_CGROUP_STAT_FILE_MAPPED stats
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
2013-03-12 10:08 ` [PATCH 1/6] memcg: use global stat directly for root memcg usage Sha Zhengju
2013-03-12 10:09 ` [PATCH 2/6] memcg: Don't account root memcg CACHE/RSS stats Sha Zhengju
@ 2013-03-12 10:10 ` Sha Zhengju
2013-03-12 10:10 ` [PATCH 4/6] memcg: Don't account root memcg swap stats Sha Zhengju
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:10 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
Similar to root memcg's CACHE/RSS, to improve performance we don't account
the root memcg stats updated by mem_cgroup_update_page_stat() (currently
only MEM_CGROUP_STAT_FILE_MAPPED).
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e89204f..24ce5e6d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2277,6 +2277,10 @@ void mem_cgroup_update_page_stat(struct page *page,
return;
memcg = pc->mem_cgroup;
+
+ if (mem_cgroup_is_root(memcg))
+ return;
+
if (unlikely(!memcg || !PageCgroupUsed(pc)))
return;
@@ -5446,7 +5450,8 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
struct mem_cgroup *mi;
unsigned int i;
- enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES};
+ enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES,
+ NR_FILE_MAPPED};
long root_stat[MEM_CGROUP_STAT_NSTATS] = {0};
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
@@ -5455,8 +5460,7 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
- if (mem_cgroup_is_root(memcg) && (i == MEM_CGROUP_STAT_CACHE
- || i == MEM_CGROUP_STAT_RSS)) {
+ if (mem_cgroup_is_root(memcg) && (i != MEM_CGROUP_STAT_SWAP)) {
val = global_page_state(global_stat[i]) -
mem_cgroup_recursive_stat(memcg, i);
root_stat[i] = val = val < 0 ? 0 : val;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
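The patch above indexes global_stat[] directly with the memcg stat index, so the lookup relies on both lists being kept in the same order. One way to make that pairing explicit is an array with designated initializers. The following is a user-space sketch only; every `SK_`/`sketch_` name is a hypothetical stand-in for the kernel enums, and the numeric values are arbitrary.

```c
/* Hypothetical stand-ins for the memcg stat indices. */
enum sketch_memcg_stat {
	SK_STAT_CACHE,
	SK_STAT_RSS,
	SK_STAT_FILE_MAPPED,
	SK_STAT_SWAP,
	SK_STAT_NSTATS
};

/* Hypothetical stand-ins for the global zone counters; values are arbitrary. */
enum sketch_zone_item {
	SK_NR_FILE_PAGES = 10,
	SK_NR_ANON_PAGES,
	SK_NR_FILE_MAPPED
};

/* Map each memcg stat to its global counterpart; -1 means "no zone counter". */
static const int sketch_global_stat[SK_STAT_NSTATS] = {
	[SK_STAT_CACHE]       = SK_NR_FILE_PAGES,
	[SK_STAT_RSS]         = SK_NR_ANON_PAGES,
	[SK_STAT_FILE_MAPPED] = SK_NR_FILE_MAPPED,
	[SK_STAT_SWAP]        = -1,
};

/* Look up the global counter for a memcg stat index. */
static int sketch_global_item(enum sketch_memcg_stat idx)
{
	return sketch_global_stat[idx];
}
```

With this layout, reordering either enum does not silently change which global counter a stat is compared against, and the swap entry is marked as having no zone counter instead of reading past the end of the array.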
* [PATCH 4/6] memcg: Don't account root memcg swap stats
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
` (2 preceding siblings ...)
2013-03-12 10:10 ` [PATCH 3/6] memcg: Don't account root memcg MEM_CGROUP_STAT_FILE_MAPPED stats Sha Zhengju
@ 2013-03-12 10:10 ` Sha Zhengju
2013-03-12 10:11 ` [PATCH 5/6] memcg: Don't account root memcg PGFAULT/PGMAJFAULT events Sha Zhengju
2013-03-12 10:11 ` [PATCH 6/6] memcg: disable memcg page stat accounting Sha Zhengju
5 siblings, 0 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:10 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
Similar to root memcg's CACHE/RSS, to improve performance we don't
account the root memcg's swap stats.
For the root memcg, memcg_stat_show() then reports:
nr(MEM_CGROUP_STAT_SWAP) = total_swap_pages - nr_swap_pages
- sum_of_all_memcg(MEM_CGROUP_STAT_SWAP);
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 24ce5e6d..b73758e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -934,7 +934,9 @@ static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
bool charge)
{
int val = (charge) ? 1 : -1;
- this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val);
+
+ if (!mem_cgroup_is_root(memcg))
+ this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val);
}
static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
@@ -5460,10 +5462,13 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
- if (mem_cgroup_is_root(memcg) && (i != MEM_CGROUP_STAT_SWAP)) {
- val = global_page_state(global_stat[i]) -
- mem_cgroup_recursive_stat(memcg, i);
- root_stat[i] = val = val < 0 ? 0 : val;
+ if (mem_cgroup_is_root(memcg)) {
+ if (i == MEM_CGROUP_STAT_SWAP)
+ val = total_swap_pages -
+ atomic_long_read(&nr_swap_pages);
+ else
+ val = global_page_state(global_stat[i]);
+ val = val - mem_cgroup_recursive_stat(memcg, i);
} else
val = mem_cgroup_read_stat(memcg, i);
seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
--
1.7.9.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/6] memcg: Don't account root memcg PGFAULT/PGMAJFAULT events
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
` (3 preceding siblings ...)
2013-03-12 10:10 ` [PATCH 4/6] memcg: Don't account root memcg swap stats Sha Zhengju
@ 2013-03-12 10:11 ` Sha Zhengju
2013-03-12 10:11 ` [PATCH 6/6] memcg: disable memcg page stat accounting Sha Zhengju
5 siblings, 0 replies; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:11 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
Handle root memcg PGFAULT/PGMAJFAULT events in a similar way, so that
for the root memcg:
nr(MEM_CGROUP_EVENTS_PGFAULT/PGMAJFAULT) = global_event_states -
sum_of_all_memcg(MEM_CGROUP_EVENTS_PGFAULT/PGMAJFAULT);
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b73758e..cea4b02 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -53,6 +53,7 @@
#include <linux/page_cgroup.h>
#include <linux/cpu.h>
#include <linux/oom.h>
+#include <linux/vmstat.h>
#include "internal.h"
#include <net/sock.h>
#include <net/ip.h>
@@ -1252,6 +1253,10 @@ void __mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
rcu_read_lock();
memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
+
+ if (mem_cgroup_is_root(memcg))
+ goto out;
+
if (unlikely(!memcg))
goto out;
@@ -4983,6 +4988,18 @@ static unsigned long mem_cgroup_recursive_stat(struct mem_cgroup *memcg,
return val;
}
+static unsigned long mem_cgroup_recursive_events(struct mem_cgroup *memcg,
+ enum mem_cgroup_events_index idx)
+{
+ struct mem_cgroup *iter;
+ unsigned long val = 0;
+
+ for_each_mem_cgroup_tree(iter, memcg)
+ val += mem_cgroup_read_events(iter, idx);
+
+ return val;
+}
+
static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
{
u64 val;
@@ -5455,6 +5472,7 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
enum zone_stat_item global_stat[] = {NR_FILE_PAGES, NR_ANON_PAGES,
NR_FILE_MAPPED};
long root_stat[MEM_CGROUP_STAT_NSTATS] = {0};
+ unsigned long root_events[MEM_CGROUP_EVENTS_NSTATS] = {0};
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
long val = 0;
@@ -5475,9 +5493,30 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
val * PAGE_SIZE);
}
- for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
- seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i],
- mem_cgroup_read_events(memcg, i));
+ for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+ unsigned long val = 0;
+
+ if (mem_cgroup_is_root(memcg) &&
+ ((i == MEM_CGROUP_EVENTS_PGFAULT) ||
+ i == MEM_CGROUP_EVENTS_PGMAJFAULT)) {
+ int cpu;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+ if (i == MEM_CGROUP_EVENTS_PGFAULT)
+ val += this->event[PGFAULT];
+ else
+ val += this->event[PGMAJFAULT];
+ }
+ put_online_cpus();
+
+ val = val - mem_cgroup_recursive_events(memcg, i);
+ root_events[i] = val = val < 0 ? 0 : val;
+ } else
+ val = mem_cgroup_read_events(memcg, i);
+ seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i], val);
+ }
for (i = 0; i < NR_LRU_LISTS; i++)
seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i],
@@ -5513,6 +5552,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
for_each_mem_cgroup_tree(mi, memcg)
val += mem_cgroup_read_events(mi, i);
+
+ /* Adding local events of root memcg */
+ if (mem_cgroup_is_root(memcg))
+ val += root_events[i];
+
seq_printf(m, "total_%s %llu\n",
mem_cgroup_events_names[i], val);
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
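The event formula in the changelog above (sum the global per-CPU event counts, then subtract what the memcgs have accounted) can be modelled in user space as follows. This is a sketch only: `sum_percpu_events()` and `root_local_events()` are hypothetical helpers, and the per-CPU state is represented by a plain array. Because the values are unsigned, the sketch compares before subtracting; a check of the form `val < 0` can never be true for an unsigned type.

```c
#include <stddef.h>

/* Sum one event counter across a hypothetical array of per-CPU values. */
static unsigned long sum_percpu_events(const unsigned long *percpu, size_t ncpus)
{
	unsigned long sum = 0;
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		sum += percpu[cpu];
	return sum;
}

/*
 * Root-local events = global events - events accounted to memcgs,
 * saturating at zero. The comparison happens before the subtraction
 * so that an unsigned underflow cannot occur.
 */
static unsigned long root_local_events(unsigned long global_total,
				       unsigned long memcg_total)
{
	return global_total > memcg_total ? global_total - memcg_total : 0;
}
```

For example, per-CPU counts of 5, 7 and 3 give a global total of 15; with 4 events accounted to non-root memcgs, the root memcg would be shown 11.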
* [PATCH 6/6] memcg: disable memcg page stat accounting
2013-03-12 10:06 [PATCH 0/6] memcg: bypass root memcg page stat accounting Sha Zhengju
` (4 preceding siblings ...)
2013-03-12 10:11 ` [PATCH 5/6] memcg: Don't account root memcg PGFAULT/PGMAJFAULT events Sha Zhengju
@ 2013-03-12 10:11 ` Sha Zhengju
2013-03-20 7:09 ` Glauber Costa
5 siblings, 1 reply; 13+ messages in thread
From: Sha Zhengju @ 2013-03-12 10:11 UTC (permalink / raw)
To: cgroups, linux-mm
Cc: mhocko, kamezawa.hiroyu, glommer, akpm, mgorman, Sha Zhengju
Use a jump label to patch the memcg page stat accounting code in or
out depending on whether it is in use: when the first non-root memcg
comes to life the code is patched in; otherwise it is patched out.
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
---
include/linux/memcontrol.h | 23 +++++++++++++++++++++++
mm/memcontrol.c | 34 +++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d6183f0..99dca91 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -42,6 +42,14 @@ struct mem_cgroup_reclaim_cookie {
};
#ifdef CONFIG_MEMCG
+
+extern struct static_key memcg_in_use_key;
+
+static inline bool mem_cgroup_in_use(void)
+{
+ return static_key_false(&memcg_in_use_key);
+}
+
/*
* All "charge" functions with gfp_mask should use GFP_KERNEL or
* (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
@@ -145,6 +153,10 @@ static inline void mem_cgroup_begin_update_page_stat(struct page *page,
{
if (mem_cgroup_disabled())
return;
+
+ if (!mem_cgroup_in_use())
+ return;
+
rcu_read_lock();
*locked = false;
if (atomic_read(&memcg_moving))
@@ -158,6 +170,10 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
{
if (mem_cgroup_disabled())
return;
+
+ if (!mem_cgroup_in_use())
+ return;
+
if (*locked)
__mem_cgroup_end_update_page_stat(page, flags);
rcu_read_unlock();
@@ -189,6 +205,9 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
{
if (mem_cgroup_disabled())
return;
+ if (!mem_cgroup_in_use())
+ return;
+
__mem_cgroup_count_vm_event(mm, idx);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -201,6 +220,10 @@ void mem_cgroup_print_bad_page(struct page *page);
#endif
#else /* CONFIG_MEMCG */
struct mem_cgroup;
+static inline bool mem_cgroup_in_use(void)
+{
+ return false;
+}
static inline int mem_cgroup_newpage_charge(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cea4b02..4e08347 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -562,6 +562,14 @@ enum res_type {
*/
static DEFINE_MUTEX(memcg_create_mutex);
+/* static_key used for marking memcg in use or not. We use this jump label to
+ * patch memcg page stat accounting code in or out.
+ * The key will be increased when non-root memcg is created, and be decreased
+ * when memcg is destroyed.
+ */
+struct static_key memcg_in_use_key;
+EXPORT_SYMBOL(memcg_in_use_key);
+
static void mem_cgroup_get(struct mem_cgroup *memcg);
static void mem_cgroup_put(struct mem_cgroup *memcg);
@@ -707,10 +715,21 @@ static void disarm_kmem_keys(struct mem_cgroup *memcg)
}
#endif /* CONFIG_MEMCG_KMEM */
+static void disarm_inuse_keys(void)
+{
+ static_key_slow_dec(&memcg_in_use_key);
+}
+
+static void arm_inuse_keys(void)
+{
+ static_key_slow_inc(&memcg_in_use_key);
+}
+
static void disarm_static_keys(struct mem_cgroup *memcg)
{
disarm_sock_keys(memcg);
disarm_kmem_keys(memcg);
+ disarm_inuse_keys();
}
static void drain_all_stock_async(struct mem_cgroup *memcg);
@@ -936,6 +955,9 @@ static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
{
int val = (charge) ? 1 : -1;
+ if (!mem_cgroup_in_use())
+ return;
+
if (!mem_cgroup_is_root(memcg))
this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val);
}
@@ -970,6 +992,11 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
__this_cpu_add(memcg->stat->nr_page_events,
nr_pages < 0 ? -nr_pages : nr_pages);
+ if (!mem_cgroup_in_use()) {
+ preempt_enable();
+ return;
+ }
+
if (!mem_cgroup_is_root(memcg)) {
/*
* Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
@@ -2278,11 +2305,13 @@ void mem_cgroup_update_page_stat(struct page *page,
{
struct mem_cgroup *memcg;
struct page_cgroup *pc = lookup_page_cgroup(page);
- unsigned long uninitialized_var(flags);
if (mem_cgroup_disabled())
return;
+ if (!mem_cgroup_in_use())
+ return;
+
memcg = pc->mem_cgroup;
if (mem_cgroup_is_root(memcg))
@@ -6414,6 +6443,9 @@ mem_cgroup_css_online(struct cgroup *cont)
}
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
+ if (!error)
+ arm_inuse_keys();
+
mutex_unlock(&memcg_create_mutex);
if (error) {
/*
--
1.7.9.5
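The counting behaviour this patch relies on can be sketched in userspace. This is a minimal model under stated assumptions, not kernel code: struct model_key and every model_* name are hypothetical stand-ins, and a plain integer replaces the real static_key, so no instructions are actually patched. It only shows the intended gating: stat updates are skipped until a non-root memcg comes online and are skipped again once the last one is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct static_key: a bare counter. */
struct model_key {
	int enabled;
};

/* Zero means only the root memcg exists. */
static struct model_key model_in_use_key;

/* Counts how many stat updates actually ran. */
static long model_stat_updates;

/* Models mem_cgroup_in_use(): true once any non-root memcg is online. */
static bool model_in_use(void)
{
	return model_in_use_key.enabled > 0;
}

/* Models arm_inuse_keys(), called when a non-root memcg comes online. */
static void model_arm(void)
{
	model_in_use_key.enabled++;
}

/* Models disarm_inuse_keys(), called when a memcg is freed. */
static void model_disarm(void)
{
	assert(model_in_use_key.enabled > 0);
	model_in_use_key.enabled--;
}

/* Models a page stat update guarded by the in-use check. */
static void model_update_page_stat(void)
{
	if (!model_in_use())
		return;
	model_stat_updates++;
}
```

Calling model_update_page_stat() before model_arm() leaves model_stat_updates untouched; the same call between model_arm() and model_disarm() increments it.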
* Re: [PATCH 6/6] memcg: disable memcg page stat accounting
2013-03-12 10:11 ` [PATCH 6/6] memcg: disable memcg page stat accounting Sha Zhengju
@ 2013-03-20 7:09 ` Glauber Costa
0 siblings, 0 replies; 13+ messages in thread
From: Glauber Costa @ 2013-03-20 7:09 UTC (permalink / raw)
To: Sha Zhengju
Cc: cgroups, linux-mm, mhocko, kamezawa.hiroyu, akpm, mgorman,
Sha Zhengju
On 03/12/2013 02:11 PM, Sha Zhengju wrote:
> Use a jump label to patch the memcg page stat accounting code
> out while it is not in use. The code is patched in when the first
> non-root memcg comes to life; otherwise it stays patched out.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> ---
> include/linux/memcontrol.h | 23 +++++++++++++++++++++++
> mm/memcontrol.c | 34 +++++++++++++++++++++++++++++++++-
> 2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d6183f0..99dca91 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -42,6 +42,14 @@ struct mem_cgroup_reclaim_cookie {
> };
>
> #ifdef CONFIG_MEMCG
> +
> +extern struct static_key memcg_in_use_key;
> +
> +static inline bool mem_cgroup_in_use(void)
> +{
> + return static_key_false(&memcg_in_use_key);
> +}
> +
I believe the big advantage of the approach I've taken, including this
test in mem_cgroup_disabled(), is that we patch out a lot of things for
free.
We just need to be careful because some code expected that decision to
be permanent and now that status can change.
But I would still advocate for that.
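For illustration only, folding the test into mem_cgroup_disabled() might look roughly like the userspace sketch below. All model_* names are hypothetical, a boolean stands in for the boot-time disable, and a plain counter stands in for the static key. The point is that every existing caller of the disabled check is gated without adding a second test at each site, and that the answer is no longer permanent because it follows the number of non-root memcgs.

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time "memcg disabled" setting. */
static bool model_boot_disabled;

/* Hypothetical model of the static key count: non-root memcgs online. */
static int model_nonroot_memcgs;

/* Counts how many charges were actually accounted. */
static long model_charges;

/*
 * Models mem_cgroup_disabled() with the in-use test folded in.
 * The result can now change at runtime, which callers that assumed
 * a permanent decision would have to tolerate.
 */
static bool model_mem_cgroup_disabled(void)
{
	if (model_boot_disabled)
		return true;
	return model_nonroot_memcgs == 0;
}

/* Models any existing call site that already checks the disabled state. */
static void model_charge(void)
{
	if (model_mem_cgroup_disabled())
		return;
	model_charges++;
}
```

With this shape, model_charge() does nothing while model_nonroot_memcgs is zero and starts accounting as soon as it becomes non-zero, with no change to the call site itself.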