* [PATCH v2 1/6] memcg: refactor swap_cgroup_swapon()
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 9:15 ` Lord Glauber Costa of Sealand
2013-01-29 13:41 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, cgroups
Refactor swap_cgroup_swapon() to set up the number of pages only, and
move the rest into swap_cgroup_prepare(), so that the latter can be used
for allocating buffers when creating the first non-root memcg.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Sha Zhengju <handai.szj@taobao.com>
---
mm/page_cgroup.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3..c945254 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -360,6 +360,9 @@ static int swap_cgroup_prepare(int type)
unsigned long idx, max;
ctrl = &swap_cgroup_ctrl[type];
+ ctrl->map = vzalloc(ctrl->length * sizeof(void *));
+ if (!ctrl->map)
+ goto nomem;
for (idx = 0; idx < ctrl->length; idx++) {
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
@@ -368,11 +371,13 @@ static int swap_cgroup_prepare(int type)
ctrl->map[idx] = page;
}
return 0;
+
not_enough_page:
max = idx;
for (idx = 0; idx < max; idx++)
__free_page(ctrl->map[idx]);
-
+ ctrl->map = NULL;
+nomem:
return -ENOMEM;
}
@@ -460,8 +465,6 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
int swap_cgroup_swapon(int type, unsigned long max_pages)
{
- void *array;
- unsigned long array_size;
unsigned long length;
struct swap_cgroup_ctrl *ctrl;
@@ -469,23 +472,15 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
return 0;
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
- array_size = length * sizeof(void *);
-
- array = vzalloc(array_size);
- if (!array)
- goto nomem;
ctrl = &swap_cgroup_ctrl[type];
mutex_lock(&swap_cgroup_mutex);
ctrl->length = length;
- ctrl->map = array;
spin_lock_init(&ctrl->lock);
if (swap_cgroup_prepare(type)) {
/* memory shortage */
- ctrl->map = NULL;
ctrl->length = 0;
mutex_unlock(&swap_cgroup_mutex);
- vfree(array);
goto nomem;
}
mutex_unlock(&swap_cgroup_mutex);
--
1.7.9.5
* Re: [PATCH v2 1/6] memcg: refactor swap_cgroup_swapon()
2013-01-28 10:54 ` [PATCH v2 1/6] memcg: refactor swap_cgroup_swapon() Jeff Liu
@ 2013-01-29 9:15 ` Lord Glauber Costa of Sealand
2013-01-29 13:41 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 9:15 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, cgroups
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> Refector swap_cgroup_swapon() to setup the number of pages only, and
> move the rest to swap_cgroup_prepare(), so that the later can be used
> for allocating buffers when creating the first non-root memcg.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
In itself this seems just like a healthy reshuffling to me, so it looks good.
* Re: [PATCH v2 1/6] memcg: refactor swap_cgroup_swapon()
2013-01-28 10:54 ` [PATCH v2 1/6] memcg: refactor swap_cgroup_swapon() Jeff Liu
2013-01-29 9:15 ` Lord Glauber Costa of Sealand
@ 2013-01-29 13:41 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 13:41 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:30, Jeff Liu wrote:
> Refector swap_cgroup_swapon() to setup the number of pages only, and
> move the rest to swap_cgroup_prepare(), so that the later can be used
> for allocating buffers when creating the first non-root memcg.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
Makes sense and it even saves some lines.
Acked-by: Michal Hocko <mhocko@suse.cz>
>
> ---
> mm/page_cgroup.c | 17 ++++++-----------
> 1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 6d757e3..c945254 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -360,6 +360,9 @@ static int swap_cgroup_prepare(int type)
> unsigned long idx, max;
>
> ctrl = &swap_cgroup_ctrl[type];
> + ctrl->map = vzalloc(ctrl->length * sizeof(void *));
> + if (!ctrl->map)
> + goto nomem;
>
> for (idx = 0; idx < ctrl->length; idx++) {
> page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> @@ -368,11 +371,13 @@ static int swap_cgroup_prepare(int type)
> ctrl->map[idx] = page;
> }
> return 0;
> +
> not_enough_page:
> max = idx;
> for (idx = 0; idx < max; idx++)
> __free_page(ctrl->map[idx]);
> -
> + ctrl->map = NULL;
> +nomem:
> return -ENOMEM;
> }
>
> @@ -460,8 +465,6 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
>
> int swap_cgroup_swapon(int type, unsigned long max_pages)
> {
> - void *array;
> - unsigned long array_size;
> unsigned long length;
> struct swap_cgroup_ctrl *ctrl;
>
> @@ -469,23 +472,15 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
> return 0;
>
> length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
> - array_size = length * sizeof(void *);
> -
> - array = vzalloc(array_size);
> - if (!array)
> - goto nomem;
>
> ctrl = &swap_cgroup_ctrl[type];
> mutex_lock(&swap_cgroup_mutex);
> ctrl->length = length;
> - ctrl->map = array;
> spin_lock_init(&ctrl->lock);
> if (swap_cgroup_prepare(type)) {
> /* memory shortage */
> - ctrl->map = NULL;
> ctrl->length = 0;
> mutex_unlock(&swap_cgroup_mutex);
> - vfree(array);
> goto nomem;
> }
> mutex_unlock(&swap_cgroup_mutex);
> --
> 1.7.9.5
>
--
Michal Hocko
SUSE Labs
* [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 10:18 ` Lord Glauber Costa of Sealand
2013-01-29 14:13 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, handai.szj
The root memcg is special with respect to swap cgroups since we only do
tracking for it but cannot set limits against it. In order to facilitate the
coming delayed allocation mechanism for swap cgroup structures, we can bypass
the default swap statistics for the root memcg and derive them from the
global stats instead, as below:
root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
In this way, the swap cgroup tracking infrastructure returns an invalid
CSS ID (generally 0) while only the root memcg is alive, that is, while no
swap cgroup structures have been allocated yet. As a result, the per-page
swapin/swapout stats accounted against the root memcg should be ZERO.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09255ec..afe5e86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5231,12 +5231,34 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
struct mem_cgroup *mi;
unsigned int i;
+ long long root_swap_stat = 0;
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
- if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
- continue;
+ long val = 0;
+
+ if (i != MEM_CGROUP_STAT_SWAP)
+ val = mem_cgroup_read_stat(memcg, i);
+ else {
+ if (!do_swap_account)
+ continue;
+ if (!mem_cgroup_is_root(memcg))
+ val = mem_cgroup_read_stat(memcg, i);
+ else {
+ /*
+ * The corresponding stat number of swap for
+ * root_mem_cgroup is 0 since we don't account
+ * it in any case. Instead, we can fake the
+ * root number via: total_swap_pages -
+ * nr_swap_pages - total_swap_pages_of_all_memcg
+ */
+ for_each_mem_cgroup(mi)
+ val += mem_cgroup_read_stat(mi, i);
+ val = root_swap_stat = (total_swap_pages -
+ nr_swap_pages - val);
+ }
+ }
seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
- mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
+ val * PAGE_SIZE);
}
for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
@@ -5260,8 +5282,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
long long val = 0;
- if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
- continue;
+ if (i == MEM_CGROUP_STAT_SWAP) {
+ if (!do_swap_account)
+ continue;
+ val += root_swap_stat * PAGE_SIZE;
+ }
for_each_mem_cgroup_tree(mi, memcg)
val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
--
1.7.9.5
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-28 10:54 ` [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg Jeff Liu
@ 2013-01-29 10:18 ` Lord Glauber Costa of Sealand
2013-01-31 6:18 ` Jeff Liu
2013-01-29 14:13 ` Michal Hocko
1 sibling, 1 reply; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 10:18 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, handai.szj
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> Root memcg with swap cgroup is special since we only do tracking but can
> not set limits against it. In order to facilitate the implementation of
> the coming swap cgroup structures delay allocation mechanism, we can bypass
> the default swap statistics upon the root memcg and figure it out through
> the global stats instead as below:
>
I am sorry if this was already discussed before, but:
> root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
>
Shouldn't it *at least* be dependent on use_hierarchy?
I don't see why root_memcg won't be always total_swap_pages -
nr_swap_pages, since the root memcg is always viewed as a superset of
the others, AFAIR.
Even if it is not the general case (which again, I really believe it
is), it certainly is the case for hierarchy enabled setups.
Also, I truly don't understand what is the business of
root_memcg_swap_stat in non-root memcgs.
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-29 10:18 ` Lord Glauber Costa of Sealand
@ 2013-01-31 6:18 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-31 6:18 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand; +Cc: linux-mm, Michal Hocko, handai.szj
Hi Glauber,
Sorry for my late response!
On 01/29/2013 06:18 PM, Lord Glauber Costa of Sealand wrote:
> On 01/28/2013 02:54 PM, Jeff Liu wrote:
>> Root memcg with swap cgroup is special since we only do tracking but can
>> not set limits against it. In order to facilitate the implementation of
>> the coming swap cgroup structures delay allocation mechanism, we can bypass
>> the default swap statistics upon the root memcg and figure it out through
>> the global stats instead as below:
>>
> I am sorry if this is was already discussed before, but:
>> root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
>> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
>>
>
> Shouldn't it *at least* be dependent on use_hierarchy?
Yeah, actually use_hierarchy only affects the total swap numbers. When
computing memcg_total_swap_stats, the original for_each_mem_cgroup_tree()
path already handles the hierarchy and we do not change its behavior here.
What we do is add the root's local swap stat to the total (since the root
statistics are not accounted anymore, the per-cpu data is ZERO).
>
> I don't see why root_memcg won't be always total_swap_pages -
> nr_swap_pages, since the root memcg is always viewed as a superset of
> the others, AFAIR.
>
> Even if it is not the general case (which again, I really believe it
> is), it certainly is the case for hierarchy enabled setups.
I'm also a little confused by hierarchy recently, especially since the
corresponding behavior of root is different... But we may have to make the
modification based on the current implementation, as the plan for getting
rid of use_hierarchy is moving slowly, as Michal mentioned.
>
> Also, I truly don't understand what is the business of
> root_memcg_swap_stat in non-root memcgs.
root_memcg_swap_stat represents the swap numbers belonging to the root mem
cgroup. As its stats are 0 now, we need to fake it with the help of the other
memcgs, that is: global_used_swap - swap_used_by_other_memcgs (non-root
memcgs).
Thanks,
-Jeff
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-28 10:54 ` [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg Jeff Liu
2013-01-29 10:18 ` Lord Glauber Costa of Sealand
@ 2013-01-29 14:13 ` Michal Hocko
2013-01-30 16:01 ` Jeff Liu
1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 14:13 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, handai.szj
On Mon 28-01-13 18:54:38, Jeff Liu wrote:
> Root memcg with swap cgroup is special since we only do tracking but can
> not set limits against it. In order to facilitate the implementation of
> the coming swap cgroup structures delay allocation mechanism, we can bypass
> the default swap statistics upon the root memcg and figure it out through
> the global stats instead as below:
>
> root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
How do you protect against races with swap{in,out}? Or are they tolerable?
> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
I am not sure I understand and if I do then it is not true:
root (swap = 10M, use_hierarchy = 0/1)
 \
  A (swap = 1M, use_hierarchy = 1)
   \
    B (swap = 2M)
total for A is 3M regardless of what root has "accounted" while
total for root should be 10 for use_hierarchy = 0 and 13 for the other
case (this is btw. broken in the tree already now because
for_each_mem_cgroup_tree resp. mem_cgroup_iter doesn't honor
use_hierarchy for the root cgroup - this is a separate topic though).
> In this way, we'll return an invalid CSS_ID(generally, it's 0) at swap
> cgroup related tracking infrastructures if only the root memcg is alive.
> That is to say, we have not yet allocate swap cgroup structures.
> As a result, the per pages swapin/swapout stats number agains the root
> memcg shoud be ZERO.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
>
> ---
> mm/memcontrol.c | 35 ++++++++++++++++++++++++++++++-----
> 1 file changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 09255ec..afe5e86 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5231,12 +5231,34 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> struct mem_cgroup *mi;
> unsigned int i;
> + long long root_swap_stat = 0;
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> - continue;
> + long val = 0;
> +
> + if (i != MEM_CGROUP_STAT_SWAP)
> + val = mem_cgroup_read_stat(memcg, i);
> + else {
> + if (!do_swap_account)
> + continue;
> + if (!mem_cgroup_is_root(memcg))
> + val = mem_cgroup_read_stat(memcg, i);
> + else {
> + /*
> + * The corresponding stat number of swap for
> + * root_mem_cgroup is 0 since we don't account
> + * it in any case. Instead, we can fake the
> + * root number via: total_swap_pages -
> + * nr_swap_pages - total_swap_pages_of_all_memcg
> + */
> + for_each_mem_cgroup(mi)
> + val += mem_cgroup_read_stat(mi, i);
> + val = root_swap_stat = (total_swap_pages -
> + nr_swap_pages - val);
> + }
This calls for a helper.
> + }
> seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> - mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> + val * PAGE_SIZE);
> }
>
> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> @@ -5260,8 +5282,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> long long val = 0;
>
> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> - continue;
> + if (i == MEM_CGROUP_STAT_SWAP) {
> + if (!do_swap_account)
> + continue;
> + val += root_swap_stat * PAGE_SIZE;
> + }
This doesn't seem right because you are adding root swap amount to _all_
groups. This should be done only if (memcg == root_mem_cgroup).
> for_each_mem_cgroup_tree(mi, memcg)
> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-29 14:13 ` Michal Hocko
@ 2013-01-30 16:01 ` Jeff Liu
2013-01-30 16:29 ` Michal Hocko
0 siblings, 1 reply; 33+ messages in thread
From: Jeff Liu @ 2013-01-30 16:01 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, handai.szj
On 01/29/2013 10:13 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:38, Jeff Liu wrote:
>> Root memcg with swap cgroup is special since we only do tracking but can
>> not set limits against it. In order to facilitate the implementation of
>> the coming swap cgroup structures delay allocation mechanism, we can bypass
>> the default swap statistics upon the root memcg and figure it out through
>> the global stats instead as below:
>>
>> root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
>
> How do you protect from races with swap{in,out}? Or they are tolerable?
To be honest, I had not previously taken races with swapin/out into consideration.
Yes, this patch can introduce a small error since it has to iterate over each memcg,
which adds a bit of overhead depending on how many memcgs are configured.
However, considering our current implementation of swap statistics, we account when
the swap cache is uncharged, but it is possible that the swap slot was already
allocated before that. That is to say, there is an inconsistent window in the swap
accounting stats IMHO. As a figure shown to humans, I think it can be tolerated to
some extent. :)
>
>> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
>
> I am not sure I understand and if I do then it is not true:
> root (swap = 10M, use_hierarchy = 0/1)
> \
> A (swap = 1M, use_hierarchy = 1)
> \
> B (swap = 2M)
>
> total for A is 3M regardless of what root has "accounted" while
> total for root should be 10 for use_hierarchy = 0 and 13 for the other
I am not sure I catch your point, but I think the total for root should be 13 no matter
whether use_hierarchy is 0 or 1, and the current patch does just that.
Originally, for_each_mem_cgroup_tree(iter, memcg) gathers statistics by iterating over
all the child memcgs, including the memcg itself. But now, as we don't account the
root memcg swap statistics anymore (hence the stats are 0), we need to add the local swap
stats of the root memcg itself (10M) to memcg_total_swap_stats. So we don't actually
change the way memcg_total_swap_stats is accounted.
> case (this is btw. broken in the tree already now because
> for_each_mem_cgroup_tree resp. mem_cgroup_iter doesn't honor
> use_hierarchy for the root cgroup - this is a separate topic though).
Yes, I noticed that for_each_mem_cgroup_tree(), resp. mem_cgroup_iter(), doesn't
take root->use_hierarchy into consideration, as it has the following logic:
	if (!root->use_hierarchy && root != root_mem_cgroup) {
		if (prev)
			return NULL;
		return root;
	}
As I don't change for_each_mem_cgroup_tree(), it is in accordance with the original
behavior.
>> In this way, we'll return an invalid CSS_ID(generally, it's 0) at swap
>> cgroup related tracking infrastructures if only the root memcg is alive.
>> That is to say, we have not yet allocate swap cgroup structures.
>> As a result, the per pages swapin/swapout stats number agains the root
>> memcg shoud be ZERO.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>>
>> ---
>> mm/memcontrol.c | 35 ++++++++++++++++++++++++++++++-----
>> 1 file changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 09255ec..afe5e86 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5231,12 +5231,34 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>> struct mem_cgroup *mi;
>> unsigned int i;
>> + long long root_swap_stat = 0;
>>
>> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> - continue;
>> + long val = 0;
>> +
>> + if (i != MEM_CGROUP_STAT_SWAP)
>> + val = mem_cgroup_read_stat(memcg, i);
>> + else {
>> + if (!do_swap_account)
>> + continue;
>
>
>> + if (!mem_cgroup_is_root(memcg))
>> + val = mem_cgroup_read_stat(memcg, i);
>> + else {
>> + /*
>> + * The corresponding stat number of swap for
>> + * root_mem_cgroup is 0 since we don't account
>> + * it in any case. Instead, we can fake the
>> + * root number via: total_swap_pages -
>> + * nr_swap_pages - total_swap_pages_of_all_memcg
>> + */
>> + for_each_mem_cgroup(mi)
>> + val += mem_cgroup_read_stat(mi, i);
>> + val = root_swap_stat = (total_swap_pages -
>> + nr_swap_pages - val);
>> + }
>
> This calls for a helper.
Yes, Sir.
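Something like the following, perhaps (an untested sketch only; the helper
name is made up for illustration):
	/*
	 * Fake the root memcg swap usage from the global counters, since
	 * swap is no longer accounted against the root memcg directly.
	 */
	static long mem_cgroup_root_swap_stat(void)
	{
		struct mem_cgroup *mi;
		long val = 0;
		for_each_mem_cgroup(mi)
			val += mem_cgroup_read_stat(mi, MEM_CGROUP_STAT_SWAP);
		return total_swap_pages - nr_swap_pages - val;
	}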
>
>> + }
>> seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
>> - mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
>> + val * PAGE_SIZE);
>> }
>>
>> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
>> @@ -5260,8 +5282,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>> long long val = 0;
>>
>> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> - continue;
>> + if (i == MEM_CGROUP_STAT_SWAP) {
>> + if (!do_swap_account)
>> + continue;
>> + val += root_swap_stat * PAGE_SIZE;
>> + }
>
> This doesn't seem right because you are adding root swap amount to _all_
> groups. This should be done only if (memcg == root_mem_cgroup).
Ah, I'm too dumb!
Thanks,
-Jeff
>
>> for_each_mem_cgroup_tree(mi, memcg)
>> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
>> seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-30 16:01 ` Jeff Liu
@ 2013-01-30 16:29 ` Michal Hocko
2013-01-31 4:00 ` Jeff Liu
0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-30 16:29 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, handai.szj
On Thu 31-01-13 00:01:28, Jeff Liu wrote:
> On 01/29/2013 10:13 PM, Michal Hocko wrote:
> > On Mon 28-01-13 18:54:38, Jeff Liu wrote:
> >> Root memcg with swap cgroup is special since we only do tracking
> >> but can not set limits against it. In order to facilitate
> >> the implementation of the coming swap cgroup structures delay
> >> allocation mechanism, we can bypass the default swap statistics
> >> upon the root memcg and figure it out through the global stats
> >> instead as below:
> >>
> >> root_memcg_swap_stat: total_swap_pages - nr_swap_pages -
> >> used_swap_pages_of_all_memcgs
> >
> > How do you protect from races with swap{in,out}? Or they are
> > tolerable?
> To be honest, I previously have not taken race with swapin/out into
> consideration.
>
> Yes, this patch would cause a little error since it has to iterate
> each memcg which can introduce a bit overhead based on how many memcgs
> are configured.
>
> However, considering our current implementation of swap statistics, we
> do account when swap cache is uncharged, but it is possible that the
> swap slot is already allocated before that.
I am not sure I follow you here. I was merely interested in races with
swapping activity while the value is calculated. The errors, or let's say
imprecision, shouldn't be big, but it would be good to think about how big
they can be and how they can be reduced (e.g. what if we start accounting
for root once another group exists - this would solve the problem of
delayed allocation and the imprecision as well).
> That is to say, there is a inconsistent window in swap accounting stats IMHO.
> As a figure shows to human, I think it can be tolerated to some extents. :)
> >
> >> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
> >
> > I am not sure I understand and if I do then it is not true:
> > root (swap = 10M, use_hierarchy = 0/1)
> > \
> > A (swap = 1M, use_hierarchy = 1)
> > \
> > B (swap = 2M)
> >
> > total for A is 3M regardless of what root has "accounted" while
> > total for root should be 10 for use_hierarchy = 0 and 13 for the
> > other
>
> I am not sure I catch your point, but I think the total for root
> should be 13 no matter use_hierarchy = 0 or 1, and the current patch
> is just doing that.
I do not see any reason to make root different wrt. other roots of
hierarchy. Anyway this is not important right now.
> Originally, for_each_mem_cgroup_tree(iter, memcg) does statistics by
> iterating all those children memcgs including the memcg itself. But
> now, as we don't account the root memcg swap statistics anymore(hence
> the stats is 0), we need to add the local swap stats of root memcg
> itself(10M) to the memcg_total_swap_stats. So actually we don't
> change the way of accounting memcg_total_swap_stats.
I guess you are talking about the total_ numbers. And yes, your patch
doesn't change that.
> > case (this is btw. broken in the tree already now because
> > for_each_mem_cgroup_tree resp. mem_cgroup_iter doesn't honor
> > use_hierarchy for the root cgroup - this is a separate topic
> > though).
>
> Yes, I noticed that the for_each_mem_cgroup_tree() resp,
> mem_cgroup_iter() don't take the root->use_hierarchy into
> consideration, as it has the following logic:
> if (!root->use_hierarchy && root != root_mem_cgroup) {
> if (prev)
> return NULL;
> return root;
> }
>
> As i don't change the for_each_mem_cgroup_tree(), so it is in
> accordance with the original behavior.
True, and I was just mentioning that because I noticed it only during
the review. It wasn't meant to dispute your patch. Sorry if this wasn't
clear enough. We have had that behavior for ages and nobody has complained,
so it is probably not worth fixing (especially when use_hierarchy is on the
way out very sloooooowly).
[...]
Thanks
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
2013-01-30 16:29 ` Michal Hocko
@ 2013-01-31 4:00 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-31 4:00 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, handai.szj
On 01/31/2013 12:29 AM, Michal Hocko wrote:
> On Thu 31-01-13 00:01:28, Jeff Liu wrote:
>> On 01/29/2013 10:13 PM, Michal Hocko wrote:
>>> On Mon 28-01-13 18:54:38, Jeff Liu wrote:
>>>> Root memcg with swap cgroup is special since we only do tracking
>>>> but can not set limits against it. In order to facilitate
>>>> the implementation of the coming swap cgroup structures delay
>>>> allocation mechanism, we can bypass the default swap statistics
>>>> upon the root memcg and figure it out through the global stats
>>>> instead as below:
>>>>
>>>> root_memcg_swap_stat: total_swap_pages - nr_swap_pages -
>>>> used_swap_pages_of_all_memcgs
>>>
>>> How do you protect from races with swap{in,out}? Or they are
>>> tolerable?
>
>> To be honest, I previously have not taken race with swapin/out into
>> consideration.
>>
>> Yes, this patch would cause a little error since it has to iterate
>> each memcg which can introduce a bit overhead based on how many memcgs
>> are configured.
>>
>> However, considering our current implementation of swap statistics, we
>> do account when swap cache is uncharged, but it is possible that the
>> swap slot is already allocated before that.
>
> I am not sure I follow you here. I was merely interested in races while
> there is a swapping activity while the value is calculated. The errors,
> or let's say imprecision, shouldn't be big but it would be good to think
> how big it can be and how it can be reduced
...
> (e.g. what if we start
> accounting for root once there is another group existing - this would
> solve the problem of delayed allocation and the imprecision as well).
At first, I also tried to account for root memcg swap in this way, i.e.
return the root_memcg css_id from swap_cgroup_cmpxchg() and
swap_cgroup_record() if no child memcg exists; however, it
caused a kernel panic with a deadlock...
If both functions return 0 in this case, there is no panic, but the root
memcg swap accounting number is obviously wrong, so that's why I chose the
current idea of bypassing it...
I need some time to dig into the details anyway.
>
>> That is to say, there is a inconsistent window in swap accounting stats IMHO.
>> As a figure shows to human, I think it can be tolerated to some extents. :)
>>>
>>>> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
>>>
>>> I am not sure I understand and if I do then it is not true:
>>> root (swap = 10M, use_hierarchy = 0/1)
>>> \
>>> A (swap = 1M, use_hierarchy = 1)
>>> \
>>> B (swap = 2M)
>>>
>>> total for A is 3M regardless of what root has "accounted" while
>>> total for root should be 10 for use_hierarchy = 0 and 13 for the
>>> other
>>
>> I am not sure I catch your point, but I think the total for root
>> should be 13 no matter use_hierarchy = 0 or 1, and the current patch
>> is just doing that.
>
> I do not see any reason to make root different wrt. other roots of
> hierarchy. Anyway this is not important right now.
>
>> Originally, for_each_mem_cgroup_tree(iter, memcg) does statistics by
>> iterating all those children memcgs including the memcg itself. But
>> now, as we don't account the root memcg swap statistics anymore(hence
>> the stats is 0), we need to add the local swap stats of root memcg
>> itself(10M) to the memcg_total_swap_stats. So actually we don't
>> change the way of accounting memcg_total_swap_stats.
>
> I guess you are talking about tatal_ numbers. And yes, your patch
> doesn't change that.
Yes.
Thank you!
-Jeff
>
>>> case (this is btw. broken in the tree already now because
>>> for_each_mem_cgroup_tree resp. mem_cgroup_iter doesn't honor
>>> use_hierarchy for the root cgroup - this is a separate topic
>>> though).
>>
>> Yes, I noticed that the for_each_mem_cgroup_tree() resp,
>> mem_cgroup_iter() don't take the root->use_hierarchy into
>> consideration, as it has the following logic:
>> if (!root->use_hierarchy && root != root_mem_cgroup) {
>> if (prev)
>> return NULL;
>> return root;
>> }
>>
>> As i don't change the for_each_mem_cgroup_tree(), so it is in
>> accordance with the original behavior.
>
> True, and I was just mentioning that as I noticed this only during
> the review. It wasn't meant to dispute your patch. Sorry if this wasn't
> clear enough. We have that behavior for ages and nobody complained so it
> is probably not worth fixing (especially when use_hierarchy is on the
> way out very sloooooowly).
>
> [...]
>
> Thanks
>
* [PATCH v2 3/6] memcg: introduce memsw_accounting_users
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 9:46 ` Lord Glauber Costa of Sealand
2013-01-29 14:24 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, cgroups
As we don't account the swap stat number for root_mem_cgroup anymore,
we can just return an invalid CSS ID if no non-root memcg is alive.
Also, introduce memsw_accounting_users to track the number of
active non-root memcgs.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Sha Zhengju <handai.szj@taobao.com>
---
mm/page_cgroup.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index c945254..189fbf5 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -336,6 +336,8 @@ struct swap_cgroup {
};
#define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
+static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
+
/*
* SwapCgroup implements "lookup" and "exchange" operations.
* In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
@@ -389,6 +391,9 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
struct page *mappage;
struct swap_cgroup *sc;
+ if (!atomic_read(&memsw_accounting_users))
+ return NULL;
+
ctrl = &swap_cgroup_ctrl[swp_type(ent)];
if (ctrlp)
*ctrlp = ctrl;
@@ -416,6 +421,8 @@ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
unsigned short retval;
sc = lookup_swap_cgroup(ent, &ctrl);
+ if (!sc)
+ return 0;
spin_lock_irqsave(&ctrl->lock, flags);
retval = sc->id;
@@ -443,6 +450,8 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
unsigned long flags;
sc = lookup_swap_cgroup(ent, &ctrl);
+ if (!sc)
+ return 0;
spin_lock_irqsave(&ctrl->lock, flags);
old = sc->id;
@@ -460,7 +469,9 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
*/
unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
{
- return lookup_swap_cgroup(ent, NULL)->id;
+ struct swap_cgroup *sc = lookup_swap_cgroup(ent, NULL);
+
+ return sc ? sc->id : 0;
}
int swap_cgroup_swapon(int type, unsigned long max_pages)
@@ -471,6 +482,9 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
if (!do_swap_account)
return 0;
+ if (!atomic_read(&memsw_accounting_users))
+ return 0;
+
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
ctrl = &swap_cgroup_ctrl[type];
--
1.7.9.5
* Re: [PATCH v2 3/6] memcg: introduce memsw_accounting_users
2013-01-28 10:54 ` [PATCH v2 3/6] memcg: introduce memsw_accounting_users Jeff Liu
@ 2013-01-29 9:46 ` Lord Glauber Costa of Sealand
2013-01-29 10:52 ` Jeff Liu
2013-01-29 14:26 ` Michal Hocko
2013-01-29 14:24 ` Michal Hocko
1 sibling, 2 replies; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 9:46 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, cgroups
Hi,
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> As we don't account the swap stat number for the root_mem_cgroup anymore,
> here we can just return an invalid CSS ID if there is no non-root memcg
> is alive. Also, introduce memsw_accounting_users to track the number of
> active non-root memcgs.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
> ---
> mm/page_cgroup.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index c945254..189fbf5 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -336,6 +336,8 @@ struct swap_cgroup {
> };
> #define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
>
> +static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
> +
I am not seeing this being incremented or decremented. I can only guess
that it comes in later patches. However, it is clearly used as a
global reference counter.
This is precisely one of the use cases static branches solve very
neatly. Did you consider using them?
True, they will help a lot more when we are touching hot paths, and swap
is hardly a hot path.
However, since one of the main complaints about memcg has been that we
inflict "death by a thousand cuts", maybe it wouldn't hurt, everything
else being the same.
Michal and others, do you have any feelings here?
* Re: [PATCH v2 3/6] memcg: introduce memsw_accounting_users
2013-01-29 9:46 ` Lord Glauber Costa of Sealand
@ 2013-01-29 10:52 ` Jeff Liu
2013-01-29 14:26 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 10:52 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand; +Cc: linux-mm, Michal Hocko, cgroups
On 01/29/2013 05:46 PM, Lord Glauber Costa of Sealand wrote:
> Hi,
>
> On 01/28/2013 02:54 PM, Jeff Liu wrote:
>> As we don't account the swap stat number for the root_mem_cgroup anymore,
>> here we can just return an invalid CSS ID if there is no non-root memcg
>> is alive. Also, introduce memsw_accounting_users to track the number of
>> active non-root memcgs.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Sha Zhengju <handai.szj@taobao.com>
>>
>> ---
>> mm/page_cgroup.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index c945254..189fbf5 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -336,6 +336,8 @@ struct swap_cgroup {
>> };
>> #define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
>>
>> +static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
>> +
>
> I am not seeing this being incremented or decremented. I can only guess
> that it comes in later patches. However, they are clearly used as a
> global reference counter.
>
> This is precisely one of the use cases static branches solve very
> neatly. Did you consider using them?
Yep, I'll try to replace it with the static branches if there are no
objections from the others. :)
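Roughly along these lines, I suppose (just an untested sketch against the
jump label API; the key name is only a placeholder):
	/* sketch: a static key instead of the atomic counter */
	static struct static_key memsw_users_key = STATIC_KEY_INIT_FALSE;
	/* bumped when a non-root memcg is created, dropped when it is destroyed */
	static_key_slow_inc(&memsw_users_key);
	static_key_slow_dec(&memsw_users_key);
	/* fast path check, e.g. in lookup_swap_cgroup() */
	if (!static_key_false(&memsw_users_key))
		return NULL;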
Thanks,
-Jeff
>
> True, they will help a lot more when we are touching hot paths, and swap
> is hardly a hot path.
>
> However, since one of the main complaints about memcg has been that we
> inflict "death by a thousand cuts", maybe it wouldn't hurt everything
> else being the same.
>
> Michal and others, do you have any feelings here?
>
* Re: [PATCH v2 3/6] memcg: introduce memsw_accounting_users
2013-01-29 9:46 ` Lord Glauber Costa of Sealand
2013-01-29 10:52 ` Jeff Liu
@ 2013-01-29 14:26 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 14:26 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand; +Cc: Jeff Liu, linux-mm, cgroups
On Tue 29-01-13 13:46:33, Glauber Costa wrote:
> Hi,
>
> On 01/28/2013 02:54 PM, Jeff Liu wrote:
> > As we don't account the swap stat number for the root_mem_cgroup anymore,
> > here we can just return an invalid CSS ID if there is no non-root memcg
> > is alive. Also, introduce memsw_accounting_users to track the number of
> > active non-root memcgs.
> >
> > Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> > CC: Glauber Costa <glommer@parallels.com>
> > CC: Michal Hocko <mhocko@suse.cz>
> > CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > CC: Johannes Weiner <hannes@cmpxchg.org>
> > CC: Mel Gorman <mgorman@suse.de>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: Sha Zhengju <handai.szj@taobao.com>
> >
> > ---
> > mm/page_cgroup.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > index c945254..189fbf5 100644
> > --- a/mm/page_cgroup.c
> > +++ b/mm/page_cgroup.c
> > @@ -336,6 +336,8 @@ struct swap_cgroup {
> > };
> > #define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
> >
> > +static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
> > +
>
> I am not seeing this being incremented or decremented. I can only guess
> that it comes in later patches. However, they are clearly used as a
> global reference counter.
>
> This is precisely one of the use cases static branches solve very
> neatly. Did you consider using them?
>
> True, they will help a lot more when we are touching hot paths, and swap
> is hardly a hot path.
>
> However, since one of the main complaints about memcg has been that we
> inflict "death by a thousand cuts", maybe it wouldn't hurt everything
> else being the same.
>
> Michal and others, do you have any feelings here?
I would leave a static branch change to a separate patch. Make it work
first and only then care about how it looks.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 3/6] memcg: introduce memsw_accounting_users
2013-01-28 10:54 ` [PATCH v2 3/6] memcg: introduce memsw_accounting_users Jeff Liu
2013-01-29 9:46 ` Lord Glauber Costa of Sealand
@ 2013-01-29 14:24 ` Michal Hocko
2013-01-29 15:16 ` Jeff Liu
1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 14:24 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:40, Jeff Liu wrote:
> As we don't account the swap stat number for the root_mem_cgroup anymore,
> here we can just return an invalid CSS ID if there is no non-root memcg
> is alive. Also, introduce memsw_accounting_users to track the number of
> active non-root memcgs.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
> ---
> mm/page_cgroup.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index c945254..189fbf5 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -336,6 +336,8 @@ struct swap_cgroup {
> };
> #define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
>
> +static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
Nobody manipulates this number. I suspect the next patch will do that,
but it is generally better to have the users of an introduced counter
in the same patch. For one thing, this patch would introduce a regression
because no pages would be accounted at this stage (for example during
a git bisect).
> +
> /*
> * SwapCgroup implements "lookup" and "exchange" operations.
> * In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
> @@ -389,6 +391,9 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
> struct page *mappage;
> struct swap_cgroup *sc;
>
> + if (!atomic_read(&memsw_accounting_users))
> + return NULL;
> +
> ctrl = &swap_cgroup_ctrl[swp_type(ent)];
> if (ctrlp)
> *ctrlp = ctrl;
> @@ -416,6 +421,8 @@ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
> unsigned short retval;
>
> sc = lookup_swap_cgroup(ent, &ctrl);
> + if (!sc)
> + return 0;
>
> spin_lock_irqsave(&ctrl->lock, flags);
> retval = sc->id;
> @@ -443,6 +450,8 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
> unsigned long flags;
>
> sc = lookup_swap_cgroup(ent, &ctrl);
> + if (!sc)
> + return 0;
>
> spin_lock_irqsave(&ctrl->lock, flags);
> old = sc->id;
> @@ -460,7 +469,9 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
> */
> unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
> {
> - return lookup_swap_cgroup(ent, NULL)->id;
> + struct swap_cgroup *sc = lookup_swap_cgroup(ent, NULL);
> +
> + return sc ? sc->id : 0;
> }
>
> int swap_cgroup_swapon(int type, unsigned long max_pages)
> @@ -471,6 +482,9 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
> if (!do_swap_account)
> return 0;
>
> + if (!atomic_read(&memsw_accounting_users))
> + return 0;
> +
> length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
>
> ctrl = &swap_cgroup_ctrl[type];
> --
> 1.7.9.5
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 3/6] memcg: introduce memsw_accounting_users
2013-01-29 14:24 ` Michal Hocko
@ 2013-01-29 15:16 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 15:16 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, cgroups
On 01/29/2013 10:24 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:40, Jeff Liu wrote:
>> As we don't account the swap stat number for the root_mem_cgroup anymore,
>> here we can just return an invalid CSS ID if there is no non-root memcg
>> is alive. Also, introduce memsw_accounting_users to track the number of
>> active non-root memcgs.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Sha Zhengju <handai.szj@taobao.com>
>>
>> ---
>> mm/page_cgroup.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index c945254..189fbf5 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -336,6 +336,8 @@ struct swap_cgroup {
>> };
>> #define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
>>
>> +static atomic_t memsw_accounting_users = ATOMIC_INIT(0);
>
> Nobody manipulates whith this number. I suspect the next patch will do
> but it is generally better to have also users of the introduced counters
> in the same patch. For one thing this patch would introduce a regression
> because no pages would be accounted at this stage (for example during
> git bisect).
Ah, I'll revise it accordingly.
Thanks,
-Jeff
>
>> +
>> /*
>> * SwapCgroup implements "lookup" and "exchange" operations.
>> * In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
>> @@ -389,6 +391,9 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
>> struct page *mappage;
>> struct swap_cgroup *sc;
>>
>> + if (!atomic_read(&memsw_accounting_users))
>> + return NULL;
>> +
>> ctrl = &swap_cgroup_ctrl[swp_type(ent)];
>> if (ctrlp)
>> *ctrlp = ctrl;
>> @@ -416,6 +421,8 @@ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
>> unsigned short retval;
>>
>> sc = lookup_swap_cgroup(ent, &ctrl);
>> + if (!sc)
>> + return 0;
>>
>> spin_lock_irqsave(&ctrl->lock, flags);
>> retval = sc->id;
>> @@ -443,6 +450,8 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
>> unsigned long flags;
>>
>> sc = lookup_swap_cgroup(ent, &ctrl);
>> + if (!sc)
>> + return 0;
>>
>> spin_lock_irqsave(&ctrl->lock, flags);
>> old = sc->id;
>> @@ -460,7 +469,9 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
>> */
>> unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
>> {
>> - return lookup_swap_cgroup(ent, NULL)->id;
>> + struct swap_cgroup *sc = lookup_swap_cgroup(ent, NULL);
>> +
>> + return sc ? sc->id : 0;
>> }
>>
>> int swap_cgroup_swapon(int type, unsigned long max_pages)
>> @@ -471,6 +482,9 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
>> if (!do_swap_account)
>> return 0;
>>
>> + if (!atomic_read(&memsw_accounting_users))
>> + return 0;
>> +
>> length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
>>
>> ctrl = &swap_cgroup_ctrl[type];
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
* [PATCH v2 4/6] memcg: export nr_swap_files
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 9:47 ` Lord Glauber Costa of Sealand
2013-01-29 14:31 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, cgroups
Export nr_swapfiles, which will be used for initializing and destroying
swap cgroup structures in the coming patch.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Sha Zhengju <handai.szj@taobao.com>
---
include/linux/swap.h | 1 +
mm/swapfile.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 68df9c1..6de44c9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -348,6 +348,7 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
/* linux/mm/swapfile.c */
extern long nr_swap_pages;
extern long total_swap_pages;
+extern unsigned int nr_swapfiles;
extern void si_swapinfo(struct sysinfo *);
extern swp_entry_t get_swap_page(void);
extern swp_entry_t get_swap_page_of_type(int);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e97a0e5..89cebcf 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -46,7 +46,7 @@ static void free_swap_count_continuations(struct swap_info_struct *);
static sector_t map_swap_entry(swp_entry_t, struct block_device**);
DEFINE_SPINLOCK(swap_lock);
-static unsigned int nr_swapfiles;
+unsigned int nr_swapfiles;
long nr_swap_pages;
long total_swap_pages;
static int least_priority;
--
1.7.9.5
* Re: [PATCH v2 4/6] memcg: export nr_swap_files
2013-01-28 10:54 ` [PATCH v2 4/6] memcg: export nr_swap_files Jeff Liu
@ 2013-01-29 9:47 ` Lord Glauber Costa of Sealand
2013-01-29 14:31 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 9:47 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, cgroups
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> Export nr_swap_files which would be used for initializing and destorying
> swap cgroup structures in the coming patch.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
Well, nothing fancy here.
Acked-by: Glauber Costa <glommer@parallels.com>
* Re: [PATCH v2 4/6] memcg: export nr_swap_files
2013-01-28 10:54 ` [PATCH v2 4/6] memcg: export nr_swap_files Jeff Liu
2013-01-29 9:47 ` Lord Glauber Costa of Sealand
@ 2013-01-29 14:31 ` Michal Hocko
2013-01-29 15:17 ` Jeff Liu
1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 14:31 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:46, Jeff Liu wrote:
> Export nr_swapfiles, which will be used for initializing and destroying
> swap cgroup structures in the coming patch.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
You are missing a definition for !CONFIG_SWAP
After this is fixed you can add my
Acked-by: Michal Hocko <mhocko@suse.cz>
>
> ---
> include/linux/swap.h | 1 +
> mm/swapfile.c | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 68df9c1..6de44c9 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -348,6 +348,7 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> /* linux/mm/swapfile.c */
> extern long nr_swap_pages;
> extern long total_swap_pages;
> +extern unsigned int nr_swapfiles;
> extern void si_swapinfo(struct sysinfo *);
> extern swp_entry_t get_swap_page(void);
> extern swp_entry_t get_swap_page_of_type(int);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index e97a0e5..89cebcf 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -46,7 +46,7 @@ static void free_swap_count_continuations(struct swap_info_struct *);
> static sector_t map_swap_entry(swp_entry_t, struct block_device**);
>
> DEFINE_SPINLOCK(swap_lock);
> -static unsigned int nr_swapfiles;
> +unsigned int nr_swapfiles;
> long nr_swap_pages;
> long total_swap_pages;
> static int least_priority;
> --
> 1.7.9.5
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 4/6] memcg: export nr_swap_files
2013-01-29 14:31 ` Michal Hocko
@ 2013-01-29 15:17 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 15:17 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, cgroups
On 01/29/2013 10:31 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:46, Jeff Liu wrote:
>> Export nr_swapfiles, which will be used for initializing and destroying
>> swap cgroup structures in the coming patch.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Sha Zhengju <handai.szj@taobao.com>
>
> You are missing a definition for !CONFIG_SWAP
Yep, I missed "#define nr_swapfiles 0U" in this case.
Thanks,
-Jeff
>
> After this is fixed you can add my
> Acked-by: Michal Hocko <mhocko@suse.cz>
>
>>
>> ---
>> include/linux/swap.h | 1 +
>> mm/swapfile.c | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 68df9c1..6de44c9 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -348,6 +348,7 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
>> /* linux/mm/swapfile.c */
>> extern long nr_swap_pages;
>> extern long total_swap_pages;
>> +extern unsigned int nr_swapfiles;
>> extern void si_swapinfo(struct sysinfo *);
>> extern swp_entry_t get_swap_page(void);
>> extern swp_entry_t get_swap_page_of_type(int);
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index e97a0e5..89cebcf 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -46,7 +46,7 @@ static void free_swap_count_continuations(struct swap_info_struct *);
>> static sector_t map_swap_entry(swp_entry_t, struct block_device**);
>>
>> DEFINE_SPINLOCK(swap_lock);
>> -static unsigned int nr_swapfiles;
>> +unsigned int nr_swapfiles;
>> long nr_swap_pages;
>> long total_swap_pages;
>> static int least_priority;
>> --
>> 1.7.9.5
>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 9:57 ` Lord Glauber Costa of Sealand
2013-01-29 14:56 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, cgroups
Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
non-root memcg is created and deallocate them when the last non-root memcg is gone.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Sha Zhengju <handai.szj@taobao.com>
---
include/linux/page_cgroup.h | 12 +++++
mm/page_cgroup.c | 108 +++++++++++++++++++++++++++++++++++++++----
2 files changed, 110 insertions(+), 10 deletions(-)
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 777a524..1255cc9 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -113,6 +113,8 @@ extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);
extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent);
extern int swap_cgroup_swapon(int type, unsigned long max_pages);
extern void swap_cgroup_swapoff(int type);
+extern int swap_cgroup_init(void);
+extern void swap_cgroup_free(void);
#else
static inline
@@ -138,6 +140,16 @@ static inline void swap_cgroup_swapoff(int type)
return;
}
+static inline int swap_cgroup_init(void)
+{
+ return 0;
+}
+
+static inline void swap_cgroup_free(void)
+{
+ return;
+}
+
#endif /* CONFIG_MEMCG_SWAP */
#endif /* !__GENERATING_BOUNDS_H */
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 189fbf5..0ebd127 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -362,14 +362,28 @@ static int swap_cgroup_prepare(int type)
unsigned long idx, max;
ctrl = &swap_cgroup_ctrl[type];
+ if (!ctrl->length) {
+ /*
+ * Bypass the buffer allocation if the corresponding swap
+ * partition/file was turned off.
+ */
+ pr_debug("couldn't allocate swap_cgroup on a disabled swap "
+ "partition or file, index: %d\n", type);
+ return 0;
+ }
+
ctrl->map = vzalloc(ctrl->length * sizeof(void *));
- if (!ctrl->map)
+ if (!ctrl->map) {
+ ctrl->length = 0;
goto nomem;
+ }
for (idx = 0; idx < ctrl->length; idx++) {
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
- if (!page)
+ if (!page) {
+ ctrl->length = 0;
goto not_enough_page;
+ }
ctrl->map[idx] = page;
}
return 0;
@@ -383,6 +397,32 @@ nomem:
return -ENOMEM;
}
+/*
+ * free buffer for swap_cgroup.
+ */
+static void swap_cgroup_teardown(int type)
+{
+ struct page **map;
+ unsigned long length;
+ struct swap_cgroup_ctrl *ctrl;
+
+ ctrl = &swap_cgroup_ctrl[type];
+ map = ctrl->map;
+ length = ctrl->length;
+ ctrl->map = NULL;
+ ctrl->length = 0;
+
+ if (map) {
+ unsigned long i;
+ for (i = 0; i < length; i++) {
+ struct page *page = map[i];
+ if (page)
+ __free_page(page);
+ }
+ vfree(map);
+ }
+}
+
static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
struct swap_cgroup_ctrl **ctrlp)
{
@@ -474,6 +514,56 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
return sc ? sc->id : 0;
}
+/*
+ * Allocate swap cgroup accounting structures when the first non-root
+ * memcg is created.
+ */
+int swap_cgroup_init(void)
+{
+ unsigned int type;
+
+ if (!do_swap_account)
+ return 0;
+
+ if (atomic_add_return(1, &memsw_accounting_users) != 1)
+ return 0;
+
+ mutex_lock(&swap_cgroup_mutex);
+ for (type = 0; type < nr_swapfiles; type++) {
+ if (swap_cgroup_prepare(type) < 0) {
+ mutex_unlock(&swap_cgroup_mutex);
+ goto nomem;
+ }
+ }
+ mutex_unlock(&swap_cgroup_mutex);
+ return 0;
+
+nomem:
+ pr_info("couldn't allocate enough memory for swap_cgroup "
+ "while creating non-root memcg.\n");
+ return -ENOMEM;
+}
+
+/*
+ * Deallocate swap cgroup accounting structures on the last non-root
+ * memcg removal.
+ */
+void swap_cgroup_free(void)
+{
+ unsigned int type;
+
+ if (!do_swap_account)
+ return;
+
+ if (atomic_sub_return(1, &memsw_accounting_users))
+ return;
+
+ mutex_lock(&swap_cgroup_mutex);
+ for (type = 0; type < nr_swapfiles; type++)
+ swap_cgroup_teardown(type);
+ mutex_unlock(&swap_cgroup_mutex);
+}
+
int swap_cgroup_swapon(int type, unsigned long max_pages)
{
unsigned long length;
@@ -482,20 +572,18 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
if (!do_swap_account)
return 0;
- if (!atomic_read(&memsw_accounting_users))
- return 0;
-
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
ctrl = &swap_cgroup_ctrl[type];
mutex_lock(&swap_cgroup_mutex);
ctrl->length = length;
spin_lock_init(&ctrl->lock);
- if (swap_cgroup_prepare(type)) {
- /* memory shortage */
- ctrl->length = 0;
- mutex_unlock(&swap_cgroup_mutex);
- goto nomem;
+ if (atomic_read(&memsw_accounting_users)) {
+ if (swap_cgroup_prepare(type)) {
+ /* memory shortage */
+ mutex_unlock(&swap_cgroup_mutex);
+ goto nomem;
+ }
}
mutex_unlock(&swap_cgroup_mutex);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
2013-01-28 10:54 ` [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free() Jeff Liu
@ 2013-01-29 9:57 ` Lord Glauber Costa of Sealand
2013-01-29 10:21 ` Jeff Liu
2013-01-29 14:56 ` Michal Hocko
1 sibling, 1 reply; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 9:57 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, cgroups
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
> non-root memcg is created and deallocate them when the last non-root memcg is gone.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
Looks sane.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Only:
> #endif /* !__GENERATING_BOUNDS_H */
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 189fbf5..0ebd127 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -362,14 +362,28 @@ static int swap_cgroup_prepare(int type)
> unsigned long idx, max;
>
> ctrl = &swap_cgroup_ctrl[type];
> + if (!ctrl->length) {
> + /*
> + * Bypass the buffer allocation if the corresponding swap
> + * partition/file was turned off.
> + */
> + pr_debug("couldn't allocate swap_cgroup on a disabled swap "
> + "partition or file, index: %d\n", type);
> + return 0;
> + }
> +
> ctrl->map = vzalloc(ctrl->length * sizeof(void *));
> - if (!ctrl->map)
> + if (!ctrl->map) {
> + ctrl->length = 0;
Consider moving this assignment somewhere in the exit path, in the
labels region.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
2013-01-29 9:57 ` Lord Glauber Costa of Sealand
@ 2013-01-29 10:21 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 10:21 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand; +Cc: linux-mm, Michal Hocko, cgroups
On 01/29/2013 05:57 PM, Lord Glauber Costa of Sealand wrote:
> On 01/28/2013 02:54 PM, Jeff Liu wrote:
>> Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
>> non-root memcg is created and deallocate them when the last non-root memcg is gone.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Sha Zhengju <handai.szj@taobao.com>
>>
>
> Looks sane.
>
> Reviewed-by: Glauber Costa <glommer@parallels.com>
>
> Only:
>
>> #endif /* !__GENERATING_BOUNDS_H */
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index 189fbf5..0ebd127 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -362,14 +362,28 @@ static int swap_cgroup_prepare(int type)
>> unsigned long idx, max;
>>
>> ctrl = &swap_cgroup_ctrl[type];
>> + if (!ctrl->length) {
>> + /*
>> + * Bypass the buffer allocation if the corresponding swap
>> + * partition/file was turned off.
>> + */
>> + pr_debug("couldn't allocate swap_cgroup on a disabled swap "
>> + "partition or file, index: %d\n", type);
>> + return 0;
>> + }
>> +
>> ctrl->map = vzalloc(ctrl->length * sizeof(void *));
>> - if (!ctrl->map)
>> + if (!ctrl->map) {
>> + ctrl->length = 0;
>
> Consider moving this assignment somewhere in the exit path, in the
> labels region.
Nice point. Both "ctrl->length = 0" statements in this function should be
moved down to the exit path under the "nomem:" label.
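Roughly like this (untested sketch, keeping the partial-page cleanup under
the existing labels):

not_enough_page:
	while (idx--)
		__free_page(ctrl->map[idx]);
	vfree(ctrl->map);	/* release the page array as well */
	ctrl->map = NULL;
nomem:
	ctrl->length = 0;
	return -ENOMEM;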
Thanks,
-Jeff
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
2013-01-28 10:54 ` [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free() Jeff Liu
2013-01-29 9:57 ` Lord Glauber Costa of Sealand
@ 2013-01-29 14:56 ` Michal Hocko
2013-01-29 15:51 ` Jeff Liu
1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 14:56 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:47, Jeff Liu wrote:
> Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
> non-root memcg is created and deallocate them when the last non-root memcg is gone.
I think this deserves more words ;) At least it would be good to
describe contexts from which init and free might be called. What are the
locking rules.
Also swap_cgroup_destroy sounds more like a pair with swap_cgroup_init.
Please add the users of those functions here as well. It is much easier
to review.
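To be concrete, something as short as this above swap_cgroup_init() would
already help (just a sketch of what I mean, based on how this series uses it):

/*
 * swap_cgroup_init - allocate the swap_cgroup maps for all active swap types
 *
 * Called from process context when the first non-root memcg is created;
 * may sleep. swap_cgroup_mutex serializes it against swapon/swapoff.
 * Returns 0 on success, -ENOMEM if the maps cannot be allocated.
 */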
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
>
> ---
> include/linux/page_cgroup.h | 12 +++++
> mm/page_cgroup.c | 108 +++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 110 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 777a524..1255cc9 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -113,6 +113,8 @@ extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);
> extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent);
> extern int swap_cgroup_swapon(int type, unsigned long max_pages);
> extern void swap_cgroup_swapoff(int type);
> +extern int swap_cgroup_init(void);
> +extern void swap_cgroup_free(void);
> #else
>
> static inline
> @@ -138,6 +140,16 @@ static inline void swap_cgroup_swapoff(int type)
> return;
> }
>
> +static inline int swap_cgroup_init(void)
> +{
> + return 0;
> +}
> +
> +static inline void swap_cgroup_free(void)
> +{
> + return;
> +}
> +
> #endif /* CONFIG_MEMCG_SWAP */
>
> #endif /* !__GENERATING_BOUNDS_H */
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 189fbf5..0ebd127 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -362,14 +362,28 @@ static int swap_cgroup_prepare(int type)
> unsigned long idx, max;
>
> ctrl = &swap_cgroup_ctrl[type];
> + if (!ctrl->length) {
> + /*
> + * Bypass the buffer allocation if the corresponding swap
> + * partition/file was turned off.
> + */
> + pr_debug("couldn't allocate swap_cgroup on a disabled swap "
> + "partition or file, index: %d\n", type);
Do we really need to log this? I guess your scenario is:
swapon part1
swapon part2
swapon part3
swapoff part2
create first non-root cgroup
which is perfectly ok and I do not see any reason to log it.
> + return 0;
> + }
> +
> ctrl->map = vzalloc(ctrl->length * sizeof(void *));
> - if (!ctrl->map)
> + if (!ctrl->map) {
> + ctrl->length = 0;
> goto nomem;
> + }
>
> for (idx = 0; idx < ctrl->length; idx++) {
> page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> - if (!page)
> + if (!page) {
> + ctrl->length = 0;
> goto not_enough_page;
> + }
> ctrl->map[idx] = page;
> }
> return 0;
> @@ -383,6 +397,32 @@ nomem:
ctrl->length = 0 under this label would be probably nicer than keeping
it at two places.
> return -ENOMEM;
> }
>
> +/*
> + * free buffer for swap_cgroup.
> + */
> +static void swap_cgroup_teardown(int type)
> +{
> + struct page **map;
> + unsigned long length;
> + struct swap_cgroup_ctrl *ctrl;
> +
> + ctrl = &swap_cgroup_ctrl[type];
> + map = ctrl->map;
> + length = ctrl->length;
> + ctrl->map = NULL;
> + ctrl->length = 0;
> +
> + if (map) {
allocation path checks for ctrl->length so it would be good to unify
both. They are handling the same case (gone swap).
> + unsigned long i;
> + for (i = 0; i < length; i++) {
> + struct page *page = map[i];
> + if (page)
> + __free_page(page);
> + }
> + vfree(map);
> + }
> +}
> +
> static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
> struct swap_cgroup_ctrl **ctrlp)
> {
> @@ -474,6 +514,56 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
> return sc ? sc->id : 0;
> }
>
> +/*
> + * Allocate swap cgroup accounting structures when the first non-root
> + * memcg is created.
> + */
> +int swap_cgroup_init(void)
> +{
> + unsigned int type;
> +
> + if (!do_swap_account)
> + return 0;
> +
> + if (atomic_add_return(1, &memsw_accounting_users) != 1)
> + return 0;
> +
> + mutex_lock(&swap_cgroup_mutex);
> + for (type = 0; type < nr_swapfiles; type++) {
> + if (swap_cgroup_prepare(type) < 0) {
> + mutex_unlock(&swap_cgroup_mutex);
> + goto nomem;
> + }
> + }
You should clean up those types that were successful...
> + mutex_unlock(&swap_cgroup_mutex);
> + return 0;
> +
> +nomem:
> + pr_info("couldn't allocate enough memory for swap_cgroup "
> + "while creating non-root memcg.\n");
> + return -ENOMEM;
> +}
> +
> +/*
> + * Deallocate swap cgroup accounting structures on the last non-root
> + * memcg removal.
> + */
> +void swap_cgroup_free(void)
> +{
> + unsigned int type;
> +
> + if (!do_swap_account)
> + return;
> +
> + if (atomic_sub_return(1, &memsw_accounting_users))
> + return;
> +
> + mutex_lock(&swap_cgroup_mutex);
> + for (type = 0; type < nr_swapfiles; type++)
> + swap_cgroup_teardown(type);
> + mutex_unlock(&swap_cgroup_mutex);
> +}
> +
> int swap_cgroup_swapon(int type, unsigned long max_pages)
> {
> unsigned long length;
> @@ -482,20 +572,18 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
> if (!do_swap_account)
> return 0;
>
> - if (!atomic_read(&memsw_accounting_users))
> - return 0;
> -
> length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
>
> ctrl = &swap_cgroup_ctrl[type];
> mutex_lock(&swap_cgroup_mutex);
> ctrl->length = length;
> spin_lock_init(&ctrl->lock);
> - if (swap_cgroup_prepare(type)) {
> - /* memory shortage */
> - ctrl->length = 0;
> - mutex_unlock(&swap_cgroup_mutex);
> - goto nomem;
> + if (atomic_read(&memsw_accounting_users)) {
> + if (swap_cgroup_prepare(type)) {
> + /* memory shortage */
> + mutex_unlock(&swap_cgroup_mutex);
> + goto nomem;
> + }
> }
> mutex_unlock(&swap_cgroup_mutex);
>
> --
> 1.7.9.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
2013-01-29 14:56 ` Michal Hocko
@ 2013-01-29 15:51 ` Jeff Liu
2013-01-29 16:09 ` Michal Hocko
0 siblings, 1 reply; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 15:51 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, cgroups
On 01/29/2013 10:56 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:47, Jeff Liu wrote:
>> Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
>> non-root memcg is created and deallocate them when the last non-root memcg is gone.
>
> I think this deserves more words ;) At least it would be good to
> describe contexts from which init and free might be called. What are the
> locking rules.
> Also swap_cgroup_destroy sounds more like a pair with swap_cgroup_init.
Will improve the commit log as well as fix the naming. Btw, I named
it swap_cgroup_free() because we have mem_cgroup_free() corresponding
to mem_cgroup_init(). :)
>
> Please add the users of those functions here as well. It is much easier
> to review.
Sure.
>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Sha Zhengju <handai.szj@taobao.com>
>>
>> ---
>> include/linux/page_cgroup.h | 12 +++++
>> mm/page_cgroup.c | 108 +++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 110 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>> index 777a524..1255cc9 100644
>> --- a/include/linux/page_cgroup.h
>> +++ b/include/linux/page_cgroup.h
>> @@ -113,6 +113,8 @@ extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);
>> extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent);
>> extern int swap_cgroup_swapon(int type, unsigned long max_pages);
>> extern void swap_cgroup_swapoff(int type);
>> +extern int swap_cgroup_init(void);
>> +extern void swap_cgroup_free(void);
>> #else
>>
>> static inline
>> @@ -138,6 +140,16 @@ static inline void swap_cgroup_swapoff(int type)
>> return;
>> }
>>
>> +static inline int swap_cgroup_init(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline void swap_cgroup_free(void)
>> +{
>> + return;
>> +}
>> +
>> #endif /* CONFIG_MEMCG_SWAP */
>>
>> #endif /* !__GENERATING_BOUNDS_H */
>> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
>> index 189fbf5..0ebd127 100644
>> --- a/mm/page_cgroup.c
>> +++ b/mm/page_cgroup.c
>> @@ -362,14 +362,28 @@ static int swap_cgroup_prepare(int type)
>> unsigned long idx, max;
>>
>> ctrl = &swap_cgroup_ctrl[type];
>> + if (!ctrl->length) {
>> + /*
>> + * Bypass the buffer allocation if the corresponding swap
>> + * partition/file was turned off.
>> + */
>> + pr_debug("couldn't allocate swap_cgroup on a disabled swap "
>> + "partition or file, index: %d\n", type);
>
> Do we really need to log this? I guess your scenario is:
> swapon part1
> swapon part2
> swapon part3
> swapoff part2
> create first non-root cgroup
Yesss!!
>
> which is perfectly ok and I do not see any reason to log it.
okay, I'll remove this redundant debug info.
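So that hunk would shrink to just the early return, i.e. (sketch):

	ctrl = &swap_cgroup_ctrl[type];
	/* the corresponding swap partition/file was turned off */
	if (!ctrl->length)
		return 0;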
>
>> + return 0;
>> + }
>> +
>> ctrl->map = vzalloc(ctrl->length * sizeof(void *));
>> - if (!ctrl->map)
>> + if (!ctrl->map) {
>> + ctrl->length = 0;
>> goto nomem;
>> + }
>>
>> for (idx = 0; idx < ctrl->length; idx++) {
>> page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> - if (!page)
>> + if (!page) {
>> + ctrl->length = 0;
>> goto not_enough_page;
>> + }
>> ctrl->map[idx] = page;
>> }
>> return 0;
>> @@ -383,6 +397,32 @@ nomem:
>
> ctrl->length = 0 under this label would be probably nicer than keeping
> it at two places.
Will take care of it.
>
>> return -ENOMEM;
>> }
>>
>> +/*
>> + * free buffer for swap_cgroup.
>> + */
>> +static void swap_cgroup_teardown(int type)
>> +{
>> + struct page **map;
>> + unsigned long length;
>> + struct swap_cgroup_ctrl *ctrl;
>> +
>> + ctrl = &swap_cgroup_ctrl[type];
>> + map = ctrl->map;
>> + length = ctrl->length;
>> + ctrl->map = NULL;
>> + ctrl->length = 0;
>> +
>> + if (map) {
>
> allocation path checks for ctrl->length so it would be good to unify
> both. They are handling the same case (gone swap).
yes, will fix it accordingly.
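Maybe along these lines, so both sides key off ctrl->length (sketch only,
to be folded into the next round; runs under swap_cgroup_mutex as before):

static void swap_cgroup_teardown(int type)
{
	struct swap_cgroup_ctrl *ctrl = &swap_cgroup_ctrl[type];
	unsigned long idx;

	if (!ctrl->length)
		return;		/* swap was turned off, nothing was allocated */

	for (idx = 0; idx < ctrl->length; idx++)
		if (ctrl->map[idx])
			__free_page(ctrl->map[idx]);
	vfree(ctrl->map);
	ctrl->map = NULL;
	ctrl->length = 0;
}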
>
>> + unsigned long i;
>> + for (i = 0; i < length; i++) {
>> + struct page *page = map[i];
>> + if (page)
>> + __free_page(page);
>> + }
>> + vfree(map);
>> + }
>> +}
>> +
>> static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
>> struct swap_cgroup_ctrl **ctrlp)
>> {
>> @@ -474,6 +514,56 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
>> return sc ? sc->id : 0;
>> }
>>
>> +/*
>> + * Allocate swap cgroup accounting structures when the first non-root
>> + * memcg is created.
>> + */
>> +int swap_cgroup_init(void)
>> +{
>> + unsigned int type;
>> +
>> + if (!do_swap_account)
>> + return 0;
>> +
>> + if (atomic_add_return(1, &memsw_accounting_users) != 1)
>> + return 0;
>> +
>> + mutex_lock(&swap_cgroup_mutex);
>> + for (type = 0; type < nr_swapfiles; type++) {
>> + if (swap_cgroup_prepare(type) < 0) {
>> + mutex_unlock(&swap_cgroup_mutex);
>> + goto nomem;
>> + }
>> + }
>
> You should clean up those types that were successful...
Oops, those types should be deallocated...
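Something like the below should do it, I think (untested sketch, reusing
swap_cgroup_teardown() from this patch):

	mutex_lock(&swap_cgroup_mutex);
	for (type = 0; type < nr_swapfiles; type++) {
		if (swap_cgroup_prepare(type) < 0) {
			/* unwind the types that already got their buffers */
			while (type--)
				swap_cgroup_teardown(type);
			mutex_unlock(&swap_cgroup_mutex);
			goto nomem;
		}
	}
	mutex_unlock(&swap_cgroup_mutex);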
Thanks,
-Jeff
>
>> + mutex_unlock(&swap_cgroup_mutex);
>> + return 0;
>> +
>> +nomem:
>> + pr_info("couldn't allocate enough memory for swap_cgroup "
>> + "while creating non-root memcg.\n");
>> + return -ENOMEM;
>> +}
>> +
>> +/*
>> + * Deallocate swap cgroup accounting structures on the last non-root
>> + * memcg removal.
>> + */
>> +void swap_cgroup_free(void)
>> +{
>> + unsigned int type;
>> +
>> + if (!do_swap_account)
>> + return;
>> +
>> + if (atomic_sub_return(1, &memsw_accounting_users))
>> + return;
>> +
>> + mutex_lock(&swap_cgroup_mutex);
>> + for (type = 0; type < nr_swapfiles; type++)
>> + swap_cgroup_teardown(type);
>> + mutex_unlock(&swap_cgroup_mutex);
>> +}
>> +
>> int swap_cgroup_swapon(int type, unsigned long max_pages)
>> {
>> unsigned long length;
>> @@ -482,20 +572,18 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
>> if (!do_swap_account)
>> return 0;
>>
>> - if (!atomic_read(&memsw_accounting_users))
>> - return 0;
>> -
>> length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
>>
>> ctrl = &swap_cgroup_ctrl[type];
>> mutex_lock(&swap_cgroup_mutex);
>> ctrl->length = length;
>> spin_lock_init(&ctrl->lock);
>> - if (swap_cgroup_prepare(type)) {
>> - /* memory shortage */
>> - ctrl->length = 0;
>> - mutex_unlock(&swap_cgroup_mutex);
>> - goto nomem;
>> + if (atomic_read(&memsw_accounting_users)) {
>> + if (swap_cgroup_prepare(type)) {
>> + /* memory shortage */
>> + mutex_unlock(&swap_cgroup_mutex);
>> + goto nomem;
>> + }
>> }
>> mutex_unlock(&swap_cgroup_mutex);
>>
>> --
>> 1.7.9.5
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 5/6] memcg: introduce swap_cgroup_init()/swap_cgroup_free()
2013-01-29 15:51 ` Jeff Liu
@ 2013-01-29 16:09 ` Michal Hocko
0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 16:09 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Tue 29-01-13 23:51:42, Jeff Liu wrote:
> On 01/29/2013 10:56 PM, Michal Hocko wrote:
> > On Mon 28-01-13 18:54:47, Jeff Liu wrote:
> >> Introduce swap_cgroup_init()/swap_cgroup_free() to allocate buffers when the first
> >> non-root memcg is created and deallocate them when the last non-root memcg is gone.
> >
> > I think this deserves more words ;) At least it would be good to
> > describe contexts from which init and free might be called. What are the
> > locking rules.
> > Also swap_cgroup_destroy sounds more like a pair with swap_cgroup_init.
> Will improve the commit log as well as fix the naming. Btw, I named
> it swap_cgroup_free() because we have mem_cgroup_free() corresponding
> to mem_cgroup_init(). :)
I do see mem_cgroup_alloc and __mem_cgroup_free. Anyway, this is not that
important. Consistent naming is not a hard rule; it is nice to have, though.
So take these renaming suggestions as hints rather than something you
definitely _have_ to do.
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg
@ 2013-01-28 10:54 ` Jeff Liu
2013-01-29 9:59 ` Lord Glauber Costa of Sealand
2013-01-29 15:11 ` Michal Hocko
0 siblings, 2 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-28 10:54 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Glauber Costa, cgroups
Initialize swap_cgroup structures when creating a non-root memcg;
swap_cgroup_init() will be called multiple times but only allocates
the buffers for the first non-root memcg.
Free the swap_cgroup structures correspondingly on the last non-root memcg
removal.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Glauber Costa <glommer@parallels.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Sha Zhengju <handai.szj@taobao.com>
---
mm/memcontrol.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index afe5e86..031d242 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5998,6 +5998,7 @@ static void free_work(struct work_struct *work)
memcg = container_of(work, struct mem_cgroup, work_freeing);
__mem_cgroup_free(memcg);
+ swap_cgroup_free();
}
static void free_rcu(struct rcu_head *rcu_head)
@@ -6116,6 +6117,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
INIT_WORK(&stock->work, drain_local_stock);
}
} else {
+ if (swap_cgroup_init())
+ goto free_out;
parent = mem_cgroup_from_cont(cont->parent);
memcg->use_hierarchy = parent->use_hierarchy;
memcg->oom_kill_disable = parent->oom_kill_disable;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg
2013-01-28 10:54 ` [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg Jeff Liu
@ 2013-01-29 9:59 ` Lord Glauber Costa of Sealand
2013-01-29 10:27 ` Jeff Liu
2013-01-29 15:11 ` Michal Hocko
1 sibling, 1 reply; 33+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-29 9:59 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Michal Hocko, cgroups
On 01/28/2013 02:54 PM, Jeff Liu wrote:
> static void free_rcu(struct rcu_head *rcu_head)
> @@ -6116,6 +6117,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
> INIT_WORK(&stock->work, drain_local_stock);
> }
> } else {
> + if (swap_cgroup_init())
> + goto free_out;
> parent = mem_cgroup_from_cont(cont->parent);
> memcg->use_hierarchy = parent->use_hierarchy;
> memcg->oom_kill_disable = parent->oom_kill_disable;
Be aware that this will conflict with latest -mm where those were moved
to css_online().
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg
2013-01-29 9:59 ` Lord Glauber Costa of Sealand
@ 2013-01-29 10:27 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 10:27 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand; +Cc: linux-mm, Michal Hocko, cgroups
On 01/29/2013 05:59 PM, Lord Glauber Costa of Sealand wrote:
> On 01/28/2013 02:54 PM, Jeff Liu wrote:
>> static void free_rcu(struct rcu_head *rcu_head)
>> @@ -6116,6 +6117,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>> INIT_WORK(&stock->work, drain_local_stock);
>> }
>> } else {
>> + if (swap_cgroup_init())
>> + goto free_out;
>> parent = mem_cgroup_from_cont(cont->parent);
>> memcg->use_hierarchy = parent->use_hierarchy;
>> memcg->oom_kill_disable = parent->oom_kill_disable;
> Be aware that this will conflict with latest -mm where those were moved
> to css_online().
Thanks for the kind reminder, I will work out the next round of posts based
on the latest -mm or Michal's tree.
-Jeff
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg
2013-01-28 10:54 ` [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg Jeff Liu
2013-01-29 9:59 ` Lord Glauber Costa of Sealand
@ 2013-01-29 15:11 ` Michal Hocko
1 sibling, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 15:11 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:52, Jeff Liu wrote:
> Initialize swap_cgroup structures when creating a non-root memcg;
> swap_cgroup_init() will be called multiple times but only allocates
> the buffers for the first non-root memcg.
>
> Free the swap_cgroup structures correspondingly on the last non-root memcg
> removal.
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Sha Zhengju <handai.szj@taobao.com>
OK, looks good to me. Except for the outdated tree you are based on.
You should hook into mem_cgroup_css_online for the creation path.
Please fold this into the previous patch. It won't make it harder to
review and we will have all users of init/free in one patch.
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index afe5e86..031d242 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5998,6 +5998,7 @@ static void free_work(struct work_struct *work)
>
> memcg = container_of(work, struct mem_cgroup, work_freeing);
> __mem_cgroup_free(memcg);
> + swap_cgroup_free();
> }
>
> static void free_rcu(struct rcu_head *rcu_head)
> @@ -6116,6 +6117,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
> INIT_WORK(&stock->work, drain_local_stock);
> }
> } else {
> + if (swap_cgroup_init())
> + goto free_out;
> parent = mem_cgroup_from_cont(cont->parent);
> memcg->use_hierarchy = parent->use_hierarchy;
> memcg->oom_kill_disable = parent->oom_kill_disable;
> --
> 1.7.9.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 0/6] memcg: disable swap cgroup allocation at swapon
2013-01-28 10:54 [PATCH v2 0/6] memcg: disable swap cgroup allocation at swapon Jeff Liu
` (5 preceding siblings ...)
2013-01-28 10:54 ` [PATCH v2 6/6] memcg: init/free swap cgroup structures upon create/free child memcg
@ 2013-01-29 15:15 ` Michal Hocko
2013-01-29 16:50 ` Jeff Liu
6 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2013-01-29 15:15 UTC (permalink / raw)
To: Jeff Liu; +Cc: linux-mm, Glauber Costa, cgroups
On Mon 28-01-13 18:54:27, Jeff Liu wrote:
> Hello,
>
> Here is the v2 patch set for disabling swap_cgroup structures allocation
> per swapon.
>
> In the initial version, one big issue is that I missed the swap tracking
> for the root memcg, thanks to Michal for pointing it out. :)
>
> In order to solve it, the easiest approach I can think of is to bypass the root
> memcg swap accounting entirely and figure it out with some global stats,
> which means that we always return 0 at the root memcg swap charge/uncharge stage, and
> this is inspired by another proposal from Zhengju:
> "memcg: Don't account root memcg page statistics -- https://lkml.org/lkml/2013/1/2/71"
>
> Besides that, another major fix is to deallocate the swap accounting structures on the last
> non-root memcg removal, after all references to it are gone, rather than doing it in
> mem_cgroup_destroy().
>
> Any comments are welcome!
Could you also post your testing methodology and results, please? It
would also be really great if you could sum up the memory savings.
Anyway, thanks, this second version looks really promising.
> v1->v2:
> - Refactor swap_cgroup_swapon()/swap_cgroup_prepare() so that the latter can be
> used for allocating buffers upon the first non-root memcg creation.
> - Bypass root memcg swap statistics, using the global stats to figure it out instead.
> - Export nr_swapfiles, which is used when creating/freeing swap_cgroup
> - Deallocate swap accounting structures on the last non-root memcg removal
>
> Old patch set:
> v1:
> http://marc.info/?l=linux-mm&m=135461016823964&w=2
>
>
> Thanks,
> -Jeff
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 0/6] memcg: disable swap cgroup allocation at swapon
2013-01-29 15:15 ` [PATCH v2 0/6] memcg: disable swap cgroup allocation at swapon Michal Hocko
@ 2013-01-29 16:50 ` Jeff Liu
0 siblings, 0 replies; 33+ messages in thread
From: Jeff Liu @ 2013-01-29 16:50 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Glauber Costa, cgroups
On 01/29/2013 11:15 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:27, Jeff Liu wrote:
>> Hello,
>>
>> Here is the v2 patch set for disabling swap_cgroup structures allocation
>> per swapon.
>>
>> In the initial version, one big issue is that I missed the swap tracking
>> for the root memcg, thanks to Michal for pointing it out. :)
>>
>> In order to solve it, the easiest approach I can think of is to bypass the root
>> memcg swap accounting entirely and figure it out with some global stats,
>> which means that we always return 0 at the root memcg swap charge/uncharge stage, and
>> this is inspired by another proposal from Zhengju:
>> "memcg: Don't account root memcg page statistics -- https://lkml.org/lkml/2013/1/2/71"
>>
>> Besides that, another major fix is to deallocate the swap accounting structures on the last
>> non-root memcg removal, after all references to it are gone, rather than doing it in
>> mem_cgroup_destroy().
>>
>> Any comments are welcome!
>
> Could you also post your testing methodology and results, please? It
> would also be really great if you could sum up the memory savings.
Sure, I'll post that info in the next round, and will revisit [PATCH v2
2/6] and answer the comments from you and Glauber since my head is not
very clear now. :)
>
> Anyway, thanks, this second version looks really promising.
Thanks for your kind review, have a nice day!
-Jeff
>
>> v1->v2:
>> - Refactor swap_cgroup_swapon()/swap_cgroup_prepare() so that the latter can be
>> used for allocating buffers upon the first non-root memcg creation.
>> - Bypass root memcg swap statistics, using the global stats to figure it out instead.
>> - Export nr_swapfiles, which is used when creating/freeing swap_cgroup
>> - Deallocate swap accounting structures on the last non-root memcg removal
>>
>> Old patch set:
>> v1:
>> http://marc.info/?l=linux-mm&m=135461016823964&w=2
>>
>>
>> Thanks,
>> -Jeff
>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread