* [PATCH v4 1/9] Revert "memcg: avoid dangling reference count in creation failure."
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
@ 2013-06-14 1:53 ` Li Zefan
2013-06-14 1:53 ` [PATCH v4 2/9] memcg, kmem: fix reference count handling on the error path Li Zefan
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
From: Michal Hocko <mhocko@suse.cz>
This reverts commit e4715f01be697a3730c78f8ffffb595591d6a88c
mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2e851f4..0bacc0d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6332,8 +6332,6 @@ mem_cgroup_css_online(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
}
return error;
}
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 2/9] memcg, kmem: fix reference count handling on the error path
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
2013-06-14 1:53 ` [PATCH v4 1/9] Revert "memcg: avoid dangling reference count in creation failure." Li Zefan
@ 2013-06-14 1:53 ` Li Zefan
2013-06-14 1:54 ` [PATCH v4 3/9] memcg: use css_get() in sock_update_memcg() Li Zefan
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
From: Michal Hocko <mhocko@suse.cz>
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
fails. This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem that
should clean up after itself so this patch moves mem_cgroup_put over
there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.
The easiest way then would be to simply remove mem_cgrroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.
Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0bacc0d..b5ec4da 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6325,14 +6325,6 @@ mem_cgroup_css_online(struct cgroup *cont)
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- }
return error;
}
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 3/9] memcg: use css_get() in sock_update_memcg()
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
2013-06-14 1:53 ` [PATCH v4 1/9] Revert "memcg: avoid dangling reference count in creation failure." Li Zefan
2013-06-14 1:53 ` [PATCH v4 2/9] memcg, kmem: fix reference count handling on the error path Li Zefan
@ 2013-06-14 1:54 ` Li Zefan
2013-06-14 1:54 ` [PATCH v4 4/9] memcg: don't use mem_cgroup_get() when creating a kmemcg cache Li Zefan
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Use css_get/css_put instead of mem_cgroup_get/put.
Note, if at the same time someone is moving @current to a different
cgroup and removing the old cgroup, css_tryget() may return false,
and sock->sk_cgrp won't be initialized, which is fine.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b5ec4da..a8e8d2a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -561,15 +561,15 @@ void sock_update_memcg(struct sock *sk)
*/
if (sk->sk_cgrp) {
BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
- mem_cgroup_get(sk->sk_cgrp->memcg);
+ css_get(&sk->sk_cgrp->memcg->css);
return;
}
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
cg_proto = sk->sk_prot->proto_cgroup(memcg);
- if (!mem_cgroup_is_root(memcg) && memcg_proto_active(cg_proto)) {
- mem_cgroup_get(memcg);
+ if (!mem_cgroup_is_root(memcg) &&
+ memcg_proto_active(cg_proto) && css_tryget(&memcg->css)) {
sk->sk_cgrp = cg_proto;
}
rcu_read_unlock();
@@ -583,7 +583,7 @@ void sock_release_memcg(struct sock *sk)
struct mem_cgroup *memcg;
WARN_ON(!sk->sk_cgrp->memcg);
memcg = sk->sk_cgrp->memcg;
- mem_cgroup_put(memcg);
+ css_put(&sk->sk_cgrp->memcg->css);
}
}
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 4/9] memcg: don't use mem_cgroup_get() when creating a kmemcg cache
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (2 preceding siblings ...)
2013-06-14 1:54 ` [PATCH v4 3/9] memcg: use css_get() in sock_update_memcg() Li Zefan
@ 2013-06-14 1:54 ` Li Zefan
2013-06-14 1:54 ` [PATCH v4 5/9] memcg: use css_get/put when charging/uncharging kmem Li Zefan
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Use css_get()/css_put() instead of mem_cgroup_get()/mem_cgroup_put().
There are two things being done in the current code:
First, we acquired a css_ref to make sure that the underlying cgroup
would not go away. That is a short lived reference, and it is put as
soon as the cache is created.
At this point, we acquire a long-lived per-cache memcg reference count
to guarantee that the memcg will still be alive.
so it is:
enqueue: css_get
create : memcg_get, css_put
destroy: memcg_put
So we only need to get rid of the memcg_get, change the memcg_put to
css_put, and get rid of the now extra css_put.
(This changelog is mostly written by Glauber)
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a8e8d2a..466c595 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3252,7 +3252,7 @@ void memcg_release_cache(struct kmem_cache *s)
list_del(&s->memcg_params->list);
mutex_unlock(&memcg->slab_caches_mutex);
- mem_cgroup_put(memcg);
+ css_put(&memcg->css);
out:
kfree(s->memcg_params);
}
@@ -3412,16 +3412,18 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
mutex_lock(&memcg_cache_mutex);
new_cachep = cachep->memcg_params->memcg_caches[idx];
- if (new_cachep)
+ if (new_cachep) {
+ css_put(&memcg->css);
goto out;
+ }
new_cachep = kmem_cache_dup(memcg, cachep);
if (new_cachep == NULL) {
new_cachep = cachep;
+ css_put(&memcg->css);
goto out;
}
- mem_cgroup_get(memcg);
atomic_set(&new_cachep->memcg_params->nr_pages , 0);
cachep->memcg_params->memcg_caches[idx] = new_cachep;
@@ -3509,8 +3511,6 @@ static void memcg_create_cache_work_func(struct work_struct *w)
cw = container_of(w, struct create_work, work);
memcg_create_kmem_cache(cw->memcg, cw->cachep);
- /* Drop the reference gotten when we enqueued. */
- css_put(&cw->memcg->css);
kfree(cw);
}
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 5/9] memcg: use css_get/put when charging/uncharging kmem
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (3 preceding siblings ...)
2013-06-14 1:54 ` [PATCH v4 4/9] memcg: don't use mem_cgroup_get() when creating a kmemcg cache Li Zefan
@ 2013-06-14 1:54 ` Li Zefan
2013-06-28 22:59 ` Andrew Morton
2013-06-14 1:55 ` [PATCH v4 6/9] memcg: use css_get/put for swap memcg Li Zefan
` (4 subsequent siblings)
9 siblings, 1 reply; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Use css_get/put instead of mem_cgroup_get/put.
We can't do a simple replacement, because here mem_cgroup_put()
is called during mem_cgroup_css_free(), while mem_cgroup_css_free()
won't be called until css refcnt goes down to 0.
Instead we increment css refcnt in mem_cgroup_css_offline(), and
then check if there's still kmem charges. If not, css refcnt will
be decremented immediately, otherwise the refcnt will be released
after the last kmem allocation is uncahred.
v3:
- changed wmb() to smp_wmb(), and moved it to memcg_kmem_mark_dead(),
and added comment.
v2:
- added wmb() in kmem_cgroup_css_offline(), pointed out by Michal
- revised comments as suggested by Michal
- fixed to check if kmem is activated in kmem_cgroup_css_offline()
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
---
mm/memcontrol.c | 70 ++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 45 insertions(+), 25 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 466c595..8f20a9c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -416,6 +416,11 @@ static void memcg_kmem_clear_activated(struct mem_cgroup *memcg)
static void memcg_kmem_mark_dead(struct mem_cgroup *memcg)
{
+ /*
+ * We need to call css_get() first, because memcg_uncharge_kmem()
+ * will call css_put() if it sees the memcg is dead.
+ */
+ smp_wmb();
if (test_bit(KMEM_ACCOUNTED_ACTIVE, &memcg->kmem_account_flags))
set_bit(KMEM_ACCOUNTED_DEAD, &memcg->kmem_account_flags);
}
@@ -3060,8 +3065,16 @@ static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
if (res_counter_uncharge(&memcg->kmem, size))
return;
+ /*
+ * Releases a reference taken in kmem_cgroup_css_offline in case
+ * this last uncharge is racing with the offlining code or it is
+ * outliving the memcg existence.
+ *
+ * The memory barrier imposed by test&clear is paired with the
+ * explicit one in memcg_kmem_mark_dead().
+ */
if (memcg_kmem_test_and_clear_dead(memcg))
- mem_cgroup_put(memcg);
+ css_put(&memcg->css);
}
void memcg_cache_list_add(struct mem_cgroup *memcg, struct kmem_cache *cachep)
@@ -5165,14 +5178,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
* starts accounting before all call sites are patched
*/
memcg_kmem_set_active(memcg);
-
- /*
- * kmem charges can outlive the cgroup. In the case of slab
- * pages, for instance, a page contain objects from various
- * processes, so it is unfeasible to migrate them away. We
- * need to reference count the memcg because of that.
- */
- mem_cgroup_get(memcg);
} else
ret = res_counter_set_limit(&memcg->kmem, val);
out:
@@ -5205,12 +5210,10 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
goto out;
/*
- * destroy(), called if we fail, will issue static_key_slow_inc() and
- * mem_cgroup_put() if kmem is enabled. We have to either call them
- * unconditionally, or clear the KMEM_ACTIVE flag. I personally find
- * this more consistent, since it always leads to the same destroy path
+ * __mem_cgroup_free() will issue static_key_slow_dec() because this
+ * memcg is active already. If the later initialization fails then the
+ * cgroup core triggers the cleanup so we do not have to do it here.
*/
- mem_cgroup_get(memcg);
static_key_slow_inc(&memcg_kmem_enabled_key);
mutex_lock(&set_limit_mutex);
@@ -5893,23 +5896,38 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
return mem_cgroup_sockets_init(memcg, ss);
}
-static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
+static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
{
- mem_cgroup_sockets_destroy(memcg);
+ if (!memcg_kmem_is_active(memcg))
+ return;
+
+ /*
+ * kmem charges can outlive the cgroup. In the case of slab
+ * pages, for instance, a page contain objects from various
+ * processes. As we prevent from taking a reference for every
+ * such allocation we have to be careful when doing uncharge
+ * (see memcg_uncharge_kmem) and here during offlining.
+ *
+ * The idea is that that only the _last_ uncharge which sees
+ * the dead memcg will drop the last reference. An additional
+ * reference is taken here before the group is marked dead
+ * which is then paired with css_put during uncharge resp. here.
+ *
+ * Although this might sound strange as this path is called from
+ * css_offline() when the referencemight have dropped down to 0
+ * and shouldn't be incremented anymore (css_tryget would fail)
+ * we do not have other options because of the kmem allocations
+ * lifetime.
+ */
+ css_get(&memcg->css);
memcg_kmem_mark_dead(memcg);
if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
return;
- /*
- * Charges already down to 0, undo mem_cgroup_get() done in the charge
- * path here, being careful not to race with memcg_uncharge_kmem: it is
- * possible that the charges went down to 0 between mark_dead and the
- * res_counter read, so in that case, we don't need the put
- */
if (memcg_kmem_test_and_clear_dead(memcg))
- mem_cgroup_put(memcg);
+ css_put(&memcg->css);
}
#else
static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
@@ -5917,7 +5935,7 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
return 0;
}
-static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
+static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
{
}
#endif
@@ -6350,6 +6368,8 @@ static void mem_cgroup_css_offline(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+ kmem_cgroup_css_offline(memcg);
+
mem_cgroup_invalidate_reclaim_iterators(memcg);
mem_cgroup_reparent_charges(memcg);
mem_cgroup_destroy_all_caches(memcg);
@@ -6359,7 +6379,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
- kmem_cgroup_destroy(memcg);
+ mem_cgroup_sockets_destroy(memcg);
mem_cgroup_put(memcg);
}
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v4 5/9] memcg: use css_get/put when charging/uncharging kmem
2013-06-14 1:54 ` [PATCH v4 5/9] memcg: use css_get/put when charging/uncharging kmem Li Zefan
@ 2013-06-28 22:59 ` Andrew Morton
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2013-06-28 22:59 UTC (permalink / raw)
To: Li Zefan
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
On Fri, 14 Jun 2013 09:54:57 +0800 Li Zefan <lizefan@huawei.com> wrote:
> Use css_get/put instead of mem_cgroup_get/put.
>
> We can't do a simple replacement, because here mem_cgroup_put()
> is called during mem_cgroup_css_free(), while mem_cgroup_css_free()
> won't be called until css refcnt goes down to 0.
>
> Instead we increment css refcnt in mem_cgroup_css_offline(), and
> then check if there's still kmem charges. If not, css refcnt will
> be decremented immediately, otherwise the refcnt will be released
> after the last kmem allocation is uncahred.
>
> ...
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -416,6 +416,11 @@ static void memcg_kmem_clear_activated(struct mem_cgroup *memcg)
>
> static void memcg_kmem_mark_dead(struct mem_cgroup *memcg)
> {
> + /*
> + * We need to call css_get() first, because memcg_uncharge_kmem()
> + * will call css_put() if it sees the memcg is dead.
> + */
> + smp_wmb();
> if (test_bit(KMEM_ACCOUNTED_ACTIVE, &memcg->kmem_account_flags))
> set_bit(KMEM_ACCOUNTED_DEAD, &memcg->kmem_account_flags);
> }
That comment is rather confusing and unhelpful. "We need to call
css_get", but we clearly *don't* call css_get(). I guess we want
--- a/mm/memcontrol.c~memcg-use-css_get-put-when-charging-uncharging-kmem-fix
+++ a/mm/memcontrol.c
@@ -407,7 +407,7 @@ static void memcg_kmem_clear_activated(s
static void memcg_kmem_mark_dead(struct mem_cgroup *memcg)
{
/*
- * We need to call css_get() first, because memcg_uncharge_kmem()
+ * Our caller must use css_get() first, because memcg_uncharge_kmem()
* will call css_put() if it sees the memcg is dead.
*/
smp_wmb();
_
But it's still not good.
- What is the smp_wmb() for? These barriers should always be
documented so readers can see what we're barriering against but this
one is floating around unexplained.
- What does memcg_uncharge_kmem() have to do with all this?
memcg_kmem_mark_dead() just does a set_bit() - what has that to do
with memcg_kmem_mark_dead().
So I dunno, it's all clear as mud and I hope we can do better.
> @@ -3060,8 +3065,16 @@ static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
> if (res_counter_uncharge(&memcg->kmem, size))
> return;
>
> + /*
> + * Releases a reference taken in kmem_cgroup_css_offline in case
> + * this last uncharge is racing with the offlining code or it is
> + * outliving the memcg existence.
> + *
> + * The memory barrier imposed by test&clear is paired with the
> + * explicit one in memcg_kmem_mark_dead().
> + */
OK, that clears things up a bit. A small bit.
This code is far too tricky :(
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 6/9] memcg: use css_get/put for swap memcg
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (4 preceding siblings ...)
2013-06-14 1:54 ` [PATCH v4 5/9] memcg: use css_get/put when charging/uncharging kmem Li Zefan
@ 2013-06-14 1:55 ` Li Zefan
2013-06-14 1:55 ` [PATCH v4 7/9] memcg: don't need to get a reference to the parent Li Zefan
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Use css_get/put instead of mem_cgroup_get/put. A simple replacement
will do.
The historical reason that memcg has its own refcnt instead of always
using css_get/put, is that cgroup couldn't be removed if there're still
css refs, so css refs can't be used as long-lived reference. The
situation has changed so that rmdir a cgroup will succeed regardless
css refs, but won't be freed until css refs goes down to 0.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8f20a9c..a63d9f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4213,12 +4213,12 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype,
unlock_page_cgroup(pc);
/*
* even after unlock, we have memcg->res.usage here and this memcg
- * will never be freed.
+ * will never be freed, so it's safe to call css_get().
*/
memcg_check_events(memcg, page);
if (do_swap_account && ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
mem_cgroup_swap_statistics(memcg, true);
- mem_cgroup_get(memcg);
+ css_get(&memcg->css);
}
/*
* Migration does not charge the res_counter for the
@@ -4330,7 +4330,7 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
/*
* record memcg information, if swapout && memcg != NULL,
- * mem_cgroup_get() was called in uncharge().
+ * css_get() was called in uncharge().
*/
if (do_swap_account && swapout && memcg)
swap_cgroup_record(ent, css_id(&memcg->css));
@@ -4361,7 +4361,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
if (!mem_cgroup_is_root(memcg))
res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
mem_cgroup_swap_statistics(memcg, false);
- mem_cgroup_put(memcg);
+ css_put(&memcg->css);
}
rcu_read_unlock();
}
@@ -4395,11 +4395,14 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
* This function is only called from task migration context now.
* It postpones res_counter and refcount handling till the end
* of task migration(mem_cgroup_clear_mc()) for performance
- * improvement. But we cannot postpone mem_cgroup_get(to)
- * because if the process that has been moved to @to does
- * swap-in, the refcount of @to might be decreased to 0.
+ * improvement. But we cannot postpone css_get(to) because if
+ * the process that has been moved to @to does swap-in, the
+ * refcount of @to might be decreased to 0.
+ *
+ * We are in attach() phase, so the cgroup is guaranteed to be
+ * alive, so we can just call css_get().
*/
- mem_cgroup_get(to);
+ css_get(&to->css);
return 0;
}
return -EINVAL;
@@ -6690,6 +6693,7 @@ static void __mem_cgroup_clear_mc(void)
{
struct mem_cgroup *from = mc.from;
struct mem_cgroup *to = mc.to;
+ int i;
/* we must uncharge all the leftover precharges from mc.to */
if (mc.precharge) {
@@ -6710,7 +6714,9 @@ static void __mem_cgroup_clear_mc(void)
if (!mem_cgroup_is_root(mc.from))
res_counter_uncharge(&mc.from->memsw,
PAGE_SIZE * mc.moved_swap);
- __mem_cgroup_put(mc.from, mc.moved_swap);
+
+ for (i = 0; i < mc.moved_swap; i++)
+ css_put(&mc.from->css);
if (!mem_cgroup_is_root(mc.to)) {
/*
@@ -6720,7 +6726,7 @@ static void __mem_cgroup_clear_mc(void)
res_counter_uncharge(&mc.to->res,
PAGE_SIZE * mc.moved_swap);
}
- /* we've already done mem_cgroup_get(mc.to) */
+ /* we've already done css_get(mc.to) */
mc.moved_swap = 0;
}
memcg_oom_recover(from);
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 7/9] memcg: don't need to get a reference to the parent
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (5 preceding siblings ...)
2013-06-14 1:55 ` [PATCH v4 6/9] memcg: use css_get/put for swap memcg Li Zefan
@ 2013-06-14 1:55 ` Li Zefan
2013-06-14 1:55 ` [PATCH v4 8/9] memcg: kill memcg refcnt Li Zefan
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
The cgroup core guarantees it's always safe to access the parent.
v2:
- added a comment in mem_cgroup_put() as suggested by Michal
- removed mem_cgroup_get(), otherwise gcc will warn that it's not used
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a63d9f1..443fb45 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -513,7 +513,6 @@ enum res_type {
*/
static DEFINE_MUTEX(memcg_create_mutex);
-static void mem_cgroup_get(struct mem_cgroup *memcg);
static void mem_cgroup_put(struct mem_cgroup *memcg);
static inline
@@ -6210,19 +6209,10 @@ static void free_rcu(struct rcu_head *rcu_head)
schedule_work(&memcg->work_freeing);
}
-static void mem_cgroup_get(struct mem_cgroup *memcg)
-{
- atomic_inc(&memcg->refcnt);
-}
-
static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
{
- if (atomic_sub_and_test(count, &memcg->refcnt)) {
- struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+ if (atomic_sub_and_test(count, &memcg->refcnt))
call_rcu(&memcg->rcu_freeing, free_rcu);
- if (parent)
- mem_cgroup_put(parent);
- }
}
static void mem_cgroup_put(struct mem_cgroup *memcg)
@@ -6325,12 +6315,9 @@ mem_cgroup_css_online(struct cgroup *cont)
res_counter_init(&memcg->kmem, &parent->kmem);
/*
- * We increment refcnt of the parent to ensure that we can
- * safely access it on res_counter_charge/uncharge.
- * This refcnt will be decremented when freeing this
- * mem_cgroup(see mem_cgroup_put).
+ * No need to take a reference to the parent because cgroup
+ * core guarantees its existence.
*/
- mem_cgroup_get(parent);
} else {
res_counter_init(&memcg->res, NULL);
res_counter_init(&memcg->memsw, NULL);
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 8/9] memcg: kill memcg refcnt
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (6 preceding siblings ...)
2013-06-14 1:55 ` [PATCH v4 7/9] memcg: don't need to get a reference to the parent Li Zefan
@ 2013-06-14 1:55 ` Li Zefan
2013-06-14 1:56 ` [PATCH v4 9/9] memcg: don't need to free memcg via RCU or workqueue Li Zefan
2013-06-19 1:29 ` [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Now memcg has the same life cycle as its corresponding cgroup.
Kill the useless refcnt.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 443fb45..7b622c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -303,8 +303,6 @@ struct mem_cgroup {
bool oom_lock;
atomic_t under_oom;
- atomic_t refcnt;
-
int swappiness;
/* OOM-Killer disable */
int oom_kill_disable;
@@ -513,8 +511,6 @@ enum res_type {
*/
static DEFINE_MUTEX(memcg_create_mutex);
-static void mem_cgroup_put(struct mem_cgroup *memcg);
-
static inline
struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
{
@@ -6209,17 +6205,6 @@ static void free_rcu(struct rcu_head *rcu_head)
schedule_work(&memcg->work_freeing);
}
-static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
-{
- if (atomic_sub_and_test(count, &memcg->refcnt))
- call_rcu(&memcg->rcu_freeing, free_rcu);
-}
-
-static void mem_cgroup_put(struct mem_cgroup *memcg)
-{
- __mem_cgroup_put(memcg, 1);
-}
-
/*
* Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
*/
@@ -6279,7 +6264,6 @@ mem_cgroup_css_alloc(struct cgroup *cont)
memcg->last_scanned_node = MAX_NUMNODES;
INIT_LIST_HEAD(&memcg->oom_notify);
- atomic_set(&memcg->refcnt, 1);
memcg->move_charge_at_immigrate = 0;
mutex_init(&memcg->thresholds_lock);
spin_lock_init(&memcg->move_lock);
@@ -6371,7 +6355,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
mem_cgroup_sockets_destroy(memcg);
- mem_cgroup_put(memcg);
+ call_rcu(&memcg->rcu_freeing, free_rcu);
}
#ifdef CONFIG_MMU
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 9/9] memcg: don't need to free memcg via RCU or workqueue
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (7 preceding siblings ...)
2013-06-14 1:55 ` [PATCH v4 8/9] memcg: kill memcg refcnt Li Zefan
@ 2013-06-14 1:56 ` Li Zefan
2013-06-19 1:29 ` [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-14 1:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko, Hugh Dickins
Now memcg has the same life cycle with its corresponding cgroup, and
a cgroup is freed via RCU and then mem_cgroup_css_free() will be
called in a work function, so we can simply call __mem_cgroup_free()
in mem_cgroup_css_free().
This actually reverts 59927fb984de1703c67bc640c3e522d8b5276c73
("memcg: free mem_cgroup by RCU to fix oops").
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 51 +++++----------------------------------------------
1 file changed, 5 insertions(+), 46 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b622c3..234f311 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -267,28 +267,10 @@ struct mem_cgroup {
/* vmpressure notifications */
struct vmpressure vmpressure;
- union {
- /*
- * the counter to account for mem+swap usage.
- */
- struct res_counter memsw;
-
- /*
- * rcu_freeing is used only when freeing struct mem_cgroup,
- * so put it into a union to avoid wasting more memory.
- * It must be disjoint from the css field. It could be
- * in a union with the res field, but res plays a much
- * larger part in mem_cgroup life than memsw, and might
- * be of interest, even at time of free, when debugging.
- * So share rcu_head with the less interesting memsw.
- */
- struct rcu_head rcu_freeing;
- /*
- * We also need some space for a worker in deferred freeing.
- * By the time we call it, rcu_freeing is no longer in use.
- */
- struct work_struct work_freeing;
- };
+ /*
+ * the counter to account for mem+swap usage.
+ */
+ struct res_counter memsw;
/*
* the counter to account for kernel memory usage.
@@ -6182,29 +6164,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
vfree(memcg);
}
-
-/*
- * Helpers for freeing a kmalloc()ed/vzalloc()ed mem_cgroup by RCU,
- * but in process context. The work_freeing structure is overlaid
- * on the rcu_freeing structure, which itself is overlaid on memsw.
- */
-static void free_work(struct work_struct *work)
-{
- struct mem_cgroup *memcg;
-
- memcg = container_of(work, struct mem_cgroup, work_freeing);
- __mem_cgroup_free(memcg);
-}
-
-static void free_rcu(struct rcu_head *rcu_head)
-{
- struct mem_cgroup *memcg;
-
- memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
- INIT_WORK(&memcg->work_freeing, free_work);
- schedule_work(&memcg->work_freeing);
-}
-
/*
* Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
*/
@@ -6355,7 +6314,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
mem_cgroup_sockets_destroy(memcg);
- call_rcu(&memcg->rcu_freeing, free_rcu);
+ __mem_cgroup_free(memcg);
}
#ifdef CONFIG_MMU
--
1.8.0.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup
2013-06-14 1:53 [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup Li Zefan
` (8 preceding siblings ...)
2013-06-14 1:56 ` [PATCH v4 9/9] memcg: don't need to free memcg via RCU or workqueue Li Zefan
@ 2013-06-19 1:29 ` Li Zefan
9 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-06-19 1:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner,
LKML, Cgroups, linux-mm, Michal Hocko
Hi Andrew, any chance for this patchset to be queued for 3.11?
On 2013/6/14 9:53, Li Zefan wrote:
> Hi Andrew,
>
> All the patches in this patchset has been acked by Michal and Kamezawa-san, and
> it's ready to be merged into -mm.
>
> I have another pending patchset that kills css_id, which depends on this one.
>
>
> Changes since v3:
> - rebased against mmotm 2013-06-06-16-19
> - changed wmb() to smp_wmb() and moved it to memcg_kmem_mark_dead() and added
> more comment.
>
> Changes since v2:
>
> - rebased against 3.10-rc1
> - collected some acks
> - the two memcg bug fixes has been merged into mainline
> - the cgroup core patch has been merged into mainline
>
> Changes since v1:
>
> - wrote better changelog and added acked-by and reviewed-by tags
> - revised some comments as suggested by Michal
> - added a wmb() in kmem_cgroup_css_offline(), pointed out by Michal
> - fixed a bug which causes a css_put() never be called
>
>
> Now memcg has its own refcnt, so when a cgroup is destroyed, the memcg can
> still be alive. This patchset converts memcg to always use css_get/put, so
> memcg will have the same life cycle as its corresponding cgroup.
>
> The historical reason that memcg didn't use css_get in some cases, is that
> cgroup couldn't be removed if there're still css refs. The situation has
> changed so that rmdir a cgroup will succeed regardless css refs, but won't
> be freed until css refs goes down to 0.
>
> Since the introduction of kmemcg, the memcg refcnt handling grows even more
> complicated. This patchset greately simplifies memcg's life cycle management.
>
> Also, after those changes, we can convert memcg to use cgroup->id, and then
> we can kill css_id.
>
> Li Zefan (7):
> memcg: use css_get() in sock_update_memcg()
> memcg: don't use mem_cgroup_get() when creating a kmemcg cache
> memcg: use css_get/put when charging/uncharging kmem
> memcg: use css_get/put for swap memcg
> memcg: don't need to get a reference to the parent
> memcg: kill memcg refcnt
> memcg: don't need to free memcg via RCU or workqueue
>
> Michal Hocko (2):
> Revert "memcg: avoid dangling reference count in creation failure."
> memcg, kmem: fix reference count handling on the error path
>
> mm/memcontrol.c | 208 +++++++++++++++++++++-----------------------------------
> 1 file changed, 77 insertions(+), 131 deletions(-)
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread