* [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-02 15:04 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-02 15:04 UTC (permalink / raw)
To: Glauber Costa
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Tue 02-04-13 18:33:30, Glauber Costa wrote:
> On 04/02/2013 06:28 PM, Michal Hocko wrote:
> > On Tue 02-04-13 18:20:56, Glauber Costa wrote:
> >> On 04/02/2013 06:16 PM, Michal Hocko wrote:
> >>> mem_cgroup_css_online
> >>> memcg_init_kmem
> >>> mem_cgroup_get # refcnt = 2
> >>> memcg_update_all_caches
> >>> memcg_update_cache_size # fails with ENOMEM
> >>
> >> Here is the thing: this one in kmem only happens for kmem enabled
> >> memcgs. For those, we tend to do a get once, and put only when the last
> >> kmem reference is gone.
> >>
> >> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
> >> the mem_cgroup_put() in css_free.
> >
> > So we need this, right?
> > ---
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..2ef875d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
> > return ret;
> > }
> > #endif /* CONFIG_MEMCG_KMEM */
> > @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
> >
> > error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> > mutex_unlock(&memcg_create_mutex);
> > - if (error) {
> > - /*
> > - * We call put now because our (and parent's) refcnts
> > - * are already in place. mem_cgroup_put() will internally
> > - * call __mem_cgroup_free, so return directly
> > - */
> > - mem_cgroup_put(memcg);
> > - if (parent->use_hierarchy)
> > - mem_cgroup_put(parent);
> > - }
> > return error;
> > }
> >
> >
> Yes, indeed you are very right - and thanks for looking at such depth.
So what about the patch bellow? It seems that I provoked all this mess
but my brain managed to push it away so I do not remember why I thought
the parent needs reference drop... It is "only" 3.9 thing fortunately.
---
>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
From: Li Zefan <lizefan@huawei.com>
Date: Tue, 2 Apr 2013 16:37:39 +0200
Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
fails
mem_cgroup_css_online is called with memcg with refcnt = 1 and it
expects that mem_cgroup_css_free will drop this last reference.
This doesn't hold when memcg_init_kmem fails though and a reference is
dropped for both memcg and its parent explicitly if it returns with an
error.
This is not correct for two reasons. Firstly mem_cgroup_put on parent is
excessive because mem_cgroup_put is hierarchy aware and secondly only
memcg_propagate_kmem takes an additional reference.
The first one is a real use-after-free bug introduced by e4715f01
(memcg: avoid dangling reference count in creation failure)
The later one is non-issue right now because the only implementation
of init_cgroup seems to be tcp_init_cgroup which doesn't fail
but it is better to make the error handling saner and move the
mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..cf9ba7e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
ret = memcg_update_cache_sizes(memcg);
mutex_unlock(&set_limit_mutex);
out:
+ if (ret)
+ mem_cgroup_put(memcg);
return ret;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
- }
+
return error;
}
--
1.7.10.4
--
1.7.10.4
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 66+ messages in thread* [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-02 15:04 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-02 15:04 UTC (permalink / raw)
To: Glauber Costa
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Tue 02-04-13 18:33:30, Glauber Costa wrote:
> On 04/02/2013 06:28 PM, Michal Hocko wrote:
> > On Tue 02-04-13 18:20:56, Glauber Costa wrote:
> >> On 04/02/2013 06:16 PM, Michal Hocko wrote:
> >>> mem_cgroup_css_online
> >>> memcg_init_kmem
> >>> mem_cgroup_get # refcnt = 2
> >>> memcg_update_all_caches
> >>> memcg_update_cache_size # fails with ENOMEM
> >>
> >> Here is the thing: this one in kmem only happens for kmem enabled
> >> memcgs. For those, we tend to do a get once, and put only when the last
> >> kmem reference is gone.
> >>
> >> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
> >> the mem_cgroup_put() in css_free.
> >
> > So we need this, right?
> > ---
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..2ef875d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
> > return ret;
> > }
> > #endif /* CONFIG_MEMCG_KMEM */
> > @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
> >
> > error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> > mutex_unlock(&memcg_create_mutex);
> > - if (error) {
> > - /*
> > - * We call put now because our (and parent's) refcnts
> > - * are already in place. mem_cgroup_put() will internally
> > - * call __mem_cgroup_free, so return directly
> > - */
> > - mem_cgroup_put(memcg);
> > - if (parent->use_hierarchy)
> > - mem_cgroup_put(parent);
> > - }
> > return error;
> > }
> >
> >
> Yes, indeed you are very right - and thanks for looking at such depth.
So what about the patch bellow? It seems that I provoked all this mess
but my brain managed to push it away so I do not remember why I thought
the parent needs reference drop... It is "only" 3.9 thing fortunately.
---
^ permalink raw reply [flat|nested] 66+ messages in thread[parent not found: <20130402150422.GB32520-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-02 15:04 ` Michal Hocko
(?)
@ 2013-04-03 3:49 ` Li Zefan
-1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 3:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
>>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Date: Tue, 2 Apr 2013 16:37:39 +0200
> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> fails
>
> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> expects that mem_cgroup_css_free will drop this last reference.
> This doesn't hold when memcg_init_kmem fails though and a reference is
> dropped for both memcg and its parent explicitly if it returns with an
> error.
>
> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> excessive because mem_cgroup_put is hierarchy aware and secondly only
> memcg_propagate_kmem takes an additional reference.
>
> The first one is a real use-after-free bug introduced by e4715f01
> (memcg: avoid dangling reference count in creation failure)
>
> The later one is non-issue right now because the only implementation
> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> but it is better to make the error handling saner and move the
> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> ---
> mm/memcontrol.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f608546..cf9ba7e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> ret = memcg_update_cache_sizes(memcg);
> mutex_unlock(&set_limit_mutex);
> out:
> + if (ret)
> + mem_cgroup_put(memcg);
Correct me if I'm wrong, but I think:
When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
to mem_cgroup_css_free() is called by cgroup core:
static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
kmem_cgroup_destroy(memcg);
mem_cgroup_put(memcg);
}
static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
{
mem_cgroup_sockets_destroy(memcg);
memcg_kmem_mark_dead(memcg);
if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
return;
if (memcg_kmem_test_and_clear_dead(memcg))
mem_cgroup_put(memcg); <------- !!!!!!!!!
}
> return ret;
> }
> #endif /* CONFIG_MEMCG_KMEM */
> @@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> +
> return error;
> }
>
>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 3:49 ` Li Zefan
0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 3:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
>>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> From: Li Zefan <lizefan@huawei.com>
> Date: Tue, 2 Apr 2013 16:37:39 +0200
> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> fails
>
> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> expects that mem_cgroup_css_free will drop this last reference.
> This doesn't hold when memcg_init_kmem fails though and a reference is
> dropped for both memcg and its parent explicitly if it returns with an
> error.
>
> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> excessive because mem_cgroup_put is hierarchy aware and secondly only
> memcg_propagate_kmem takes an additional reference.
>
> The first one is a real use-after-free bug introduced by e4715f01
> (memcg: avoid dangling reference count in creation failure)
>
> The later one is non-issue right now because the only implementation
> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> but it is better to make the error handling saner and move the
> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f608546..cf9ba7e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> ret = memcg_update_cache_sizes(memcg);
> mutex_unlock(&set_limit_mutex);
> out:
> + if (ret)
> + mem_cgroup_put(memcg);
Correct me if I'm wrong, but I think:
When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
to mem_cgroup_css_free() is called by cgroup core:
static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
kmem_cgroup_destroy(memcg);
mem_cgroup_put(memcg);
}
static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
{
mem_cgroup_sockets_destroy(memcg);
memcg_kmem_mark_dead(memcg);
if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
return;
if (memcg_kmem_test_and_clear_dead(memcg))
mem_cgroup_put(memcg); <------- !!!!!!!!!
}
> return ret;
> }
> #endif /* CONFIG_MEMCG_KMEM */
> @@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> +
> return error;
> }
>
>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 3:49 ` Li Zefan
0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 3:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
>>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> From: Li Zefan <lizefan@huawei.com>
> Date: Tue, 2 Apr 2013 16:37:39 +0200
> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> fails
>
> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> expects that mem_cgroup_css_free will drop this last reference.
> This doesn't hold when memcg_init_kmem fails though and a reference is
> dropped for both memcg and its parent explicitly if it returns with an
> error.
>
> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> excessive because mem_cgroup_put is hierarchy aware and secondly only
> memcg_propagate_kmem takes an additional reference.
>
> The first one is a real use-after-free bug introduced by e4715f01
> (memcg: avoid dangling reference count in creation failure)
>
> The later one is non-issue right now because the only implementation
> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> but it is better to make the error handling saner and move the
> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f608546..cf9ba7e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> ret = memcg_update_cache_sizes(memcg);
> mutex_unlock(&set_limit_mutex);
> out:
> + if (ret)
> + mem_cgroup_put(memcg);
Correct me if I'm wrong, but I think:
When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
to mem_cgroup_css_free() is called by cgroup core:
static void mem_cgroup_css_free(struct cgroup *cont)
{
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
kmem_cgroup_destroy(memcg);
mem_cgroup_put(memcg);
}
static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
{
mem_cgroup_sockets_destroy(memcg);
memcg_kmem_mark_dead(memcg);
if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
return;
if (memcg_kmem_test_and_clear_dead(memcg))
mem_cgroup_put(memcg); <------- !!!!!!!!!
}
> return ret;
> }
> #endif /* CONFIG_MEMCG_KMEM */
> @@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - if (parent->use_hierarchy)
> - mem_cgroup_put(parent);
> - }
> +
> return error;
> }
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread[parent not found: <515BA6C9.2000704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 3:49 ` Li Zefan
(?)
@ 2013-04-03 7:43 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 7:43 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >> Yes, indeed you are very right - and thanks for looking at such depth.
> >
> > So what about the patch bellow? It seems that I provoked all this mess
> > but my brain managed to push it away so I do not remember why I thought
> > the parent needs reference drop... It is "only" 3.9 thing fortunately.
> > ---
> >>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> > From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > Date: Tue, 2 Apr 2013 16:37:39 +0200
> > Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> > fails
> >
> > mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> > expects that mem_cgroup_css_free will drop this last reference.
> > This doesn't hold when memcg_init_kmem fails though and a reference is
> > dropped for both memcg and its parent explicitly if it returns with an
> > error.
> >
> > This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> > excessive because mem_cgroup_put is hierarchy aware and secondly only
> > memcg_propagate_kmem takes an additional reference.
> >
> > The first one is a real use-after-free bug introduced by e4715f01
> > (memcg: avoid dangling reference count in creation failure)
> >
> > The later one is non-issue right now because the only implementation
> > of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> > but it is better to make the error handling saner and move the
> > mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >
> > Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> > ---
> > mm/memcontrol.c | 13 +++----------
> > 1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..cf9ba7e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
>
> Correct me if I'm wrong, but I think:
>
> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> to mem_cgroup_css_free() is called by cgroup core:
>
> static void mem_cgroup_css_free(struct cgroup *cont)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> kmem_cgroup_destroy(memcg);
>
> mem_cgroup_put(memcg);
> }
>
> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> {
> mem_cgroup_sockets_destroy(memcg);
>
> memcg_kmem_mark_dead(memcg);
>
> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> return;
>
> if (memcg_kmem_test_and_clear_dead(memcg))
> mem_cgroup_put(memcg); <------- !!!!!!!!!
> }
But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
error path.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 7:43 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 7:43 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >> Yes, indeed you are very right - and thanks for looking at such depth.
> >
> > So what about the patch bellow? It seems that I provoked all this mess
> > but my brain managed to push it away so I do not remember why I thought
> > the parent needs reference drop... It is "only" 3.9 thing fortunately.
> > ---
> >>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> > From: Li Zefan <lizefan@huawei.com>
> > Date: Tue, 2 Apr 2013 16:37:39 +0200
> > Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> > fails
> >
> > mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> > expects that mem_cgroup_css_free will drop this last reference.
> > This doesn't hold when memcg_init_kmem fails though and a reference is
> > dropped for both memcg and its parent explicitly if it returns with an
> > error.
> >
> > This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> > excessive because mem_cgroup_put is hierarchy aware and secondly only
> > memcg_propagate_kmem takes an additional reference.
> >
> > The first one is a real use-after-free bug introduced by e4715f01
> > (memcg: avoid dangling reference count in creation failure)
> >
> > The later one is non-issue right now because the only implementation
> > of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> > but it is better to make the error handling saner and move the
> > mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >
> > Signed-off-by: Li Zefan <lizefan@huawei.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> > mm/memcontrol.c | 13 +++----------
> > 1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..cf9ba7e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
>
> Correct me if I'm wrong, but I think:
>
> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> to mem_cgroup_css_free() is called by cgroup core:
>
> static void mem_cgroup_css_free(struct cgroup *cont)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> kmem_cgroup_destroy(memcg);
>
> mem_cgroup_put(memcg);
> }
>
> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> {
> mem_cgroup_sockets_destroy(memcg);
>
> memcg_kmem_mark_dead(memcg);
>
> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> return;
>
> if (memcg_kmem_test_and_clear_dead(memcg))
> mem_cgroup_put(memcg); <------- !!!!!!!!!
> }
But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
error path.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 7:43 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 7:43 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >> Yes, indeed you are very right - and thanks for looking at such depth.
> >
> > So what about the patch bellow? It seems that I provoked all this mess
> > but my brain managed to push it away so I do not remember why I thought
> > the parent needs reference drop... It is "only" 3.9 thing fortunately.
> > ---
> >>From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> > From: Li Zefan <lizefan@huawei.com>
> > Date: Tue, 2 Apr 2013 16:37:39 +0200
> > Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> > fails
> >
> > mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> > expects that mem_cgroup_css_free will drop this last reference.
> > This doesn't hold when memcg_init_kmem fails though and a reference is
> > dropped for both memcg and its parent explicitly if it returns with an
> > error.
> >
> > This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> > excessive because mem_cgroup_put is hierarchy aware and secondly only
> > memcg_propagate_kmem takes an additional reference.
> >
> > The first one is a real use-after-free bug introduced by e4715f01
> > (memcg: avoid dangling reference count in creation failure)
> >
> > The later one is non-issue right now because the only implementation
> > of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> > but it is better to make the error handling saner and move the
> > mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >
> > Signed-off-by: Li Zefan <lizefan@huawei.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> > mm/memcontrol.c | 13 +++----------
> > 1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..cf9ba7e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> > ret = memcg_update_cache_sizes(memcg);
> > mutex_unlock(&set_limit_mutex);
> > out:
> > + if (ret)
> > + mem_cgroup_put(memcg);
>
> Correct me if I'm wrong, but I think:
>
> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> to mem_cgroup_css_free() is called by cgroup core:
>
> static void mem_cgroup_css_free(struct cgroup *cont)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> kmem_cgroup_destroy(memcg);
>
> mem_cgroup_put(memcg);
> }
>
> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> {
> mem_cgroup_sockets_destroy(memcg);
>
> memcg_kmem_mark_dead(memcg);
>
> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> return;
>
> if (memcg_kmem_test_and_clear_dead(memcg))
> mem_cgroup_put(memcg); <------- !!!!!!!!!
> }
But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
error path.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 7:43 ` Michal Hocko
@ 2013-04-03 7:49 ` Li Zefan
-1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 7:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 2013/4/3 15:43, Michal Hocko wrote:
> On Wed 03-04-13 11:49:29, Li Zefan wrote:
>>>> Yes, indeed you are very right - and thanks for looking at such depth.
>>>
>>> So what about the patch bellow? It seems that I provoked all this mess
>>> but my brain managed to push it away so I do not remember why I thought
>>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
>>> ---
>>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
>>> From: Li Zefan <lizefan@huawei.com>
>>> Date: Tue, 2 Apr 2013 16:37:39 +0200
>>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
>>> fails
>>>
>>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
>>> expects that mem_cgroup_css_free will drop this last reference.
>>> This doesn't hold when memcg_init_kmem fails though and a reference is
>>> dropped for both memcg and its parent explicitly if it returns with an
>>> error.
>>>
>>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
>>> excessive because mem_cgroup_put is hierarchy aware and secondly only
>>> memcg_propagate_kmem takes an additional reference.
>>>
>>> The first one is a real use-after-free bug introduced by e4715f01
>>> (memcg: avoid dangling reference count in creation failure)
>>>
>>> The later one is non-issue right now because the only implementation
>>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
>>> but it is better to make the error handling saner and move the
>>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>>>
>>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>>> Signed-off-by: Michal Hocko <mhocko@suse.cz>
>>> ---
>>> mm/memcontrol.c | 13 +++----------
>>> 1 file changed, 3 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..cf9ba7e 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>
>> Correct me if I'm wrong, but I think:
>>
>> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
>> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
>> to mem_cgroup_css_free() is called by cgroup core:
>>
>> static void mem_cgroup_css_free(struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>
>> kmem_cgroup_destroy(memcg);
>>
>> mem_cgroup_put(memcg);
>> }
>>
>> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
>> {
>> mem_cgroup_sockets_destroy(memcg);
>>
>> memcg_kmem_mark_dead(memcg);
>>
>> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
>> return;
>>
>> if (memcg_kmem_test_and_clear_dead(memcg))
>> mem_cgroup_put(memcg); <------- !!!!!!!!!
>> }
>
> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> error path.
>
But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
Am I missing something?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 7:49 ` Li Zefan
0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 7:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 2013/4/3 15:43, Michal Hocko wrote:
> On Wed 03-04-13 11:49:29, Li Zefan wrote:
>>>> Yes, indeed you are very right - and thanks for looking at such depth.
>>>
>>> So what about the patch bellow? It seems that I provoked all this mess
>>> but my brain managed to push it away so I do not remember why I thought
>>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
>>> ---
>>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
>>> From: Li Zefan <lizefan@huawei.com>
>>> Date: Tue, 2 Apr 2013 16:37:39 +0200
>>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
>>> fails
>>>
>>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
>>> expects that mem_cgroup_css_free will drop this last reference.
>>> This doesn't hold when memcg_init_kmem fails though and a reference is
>>> dropped for both memcg and its parent explicitly if it returns with an
>>> error.
>>>
>>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
>>> excessive because mem_cgroup_put is hierarchy aware and secondly only
>>> memcg_propagate_kmem takes an additional reference.
>>>
>>> The first one is a real use-after-free bug introduced by e4715f01
>>> (memcg: avoid dangling reference count in creation failure)
>>>
>>> The later one is non-issue right now because the only implementation
>>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
>>> but it is better to make the error handling saner and move the
>>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
>>>
>>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>>> Signed-off-by: Michal Hocko <mhocko@suse.cz>
>>> ---
>>> mm/memcontrol.c | 13 +++----------
>>> 1 file changed, 3 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..cf9ba7e 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>
>> Correct me if I'm wrong, but I think:
>>
>> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
>> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
>> to mem_cgroup_css_free() is called by cgroup core:
>>
>> static void mem_cgroup_css_free(struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>
>> kmem_cgroup_destroy(memcg);
>>
>> mem_cgroup_put(memcg);
>> }
>>
>> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
>> {
>> mem_cgroup_sockets_destroy(memcg);
>>
>> memcg_kmem_mark_dead(memcg);
>>
>> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
>> return;
>>
>> if (memcg_kmem_test_and_clear_dead(memcg))
>> mem_cgroup_put(memcg); <------- !!!!!!!!!
>> }
>
> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> error path.
>
But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
Am I missing something?
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 7:49 ` Li Zefan
@ 2013-04-03 8:18 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:18 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 15:49:06, Li Zefan wrote:
> On 2013/4/3 15:43, Michal Hocko wrote:
> > On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >>>> Yes, indeed you are very right - and thanks for looking at such depth.
> >>>
> >>> So what about the patch bellow? It seems that I provoked all this mess
> >>> but my brain managed to push it away so I do not remember why I thought
> >>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> >>> ---
> >>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> >>> From: Li Zefan <lizefan@huawei.com>
> >>> Date: Tue, 2 Apr 2013 16:37:39 +0200
> >>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> >>> fails
> >>>
> >>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> >>> expects that mem_cgroup_css_free will drop this last reference.
> >>> This doesn't hold when memcg_init_kmem fails though and a reference is
> >>> dropped for both memcg and its parent explicitly if it returns with an
> >>> error.
> >>>
> >>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> >>> excessive because mem_cgroup_put is hierarchy aware and secondly only
> >>> memcg_propagate_kmem takes an additional reference.
> >>>
> >>> The first one is a real use-after-free bug introduced by e4715f01
> >>> (memcg: avoid dangling reference count in creation failure)
> >>>
> >>> The later one is non-issue right now because the only implementation
> >>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> >>> but it is better to make the error handling saner and move the
> >>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >>>
> >>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> >>> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> >>> ---
> >>> mm/memcontrol.c | 13 +++----------
> >>> 1 file changed, 3 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index f608546..cf9ba7e 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> >>> ret = memcg_update_cache_sizes(memcg);
> >>> mutex_unlock(&set_limit_mutex);
> >>> out:
> >>> + if (ret)
> >>> + mem_cgroup_put(memcg);
> >>
> >> Correct me if I'm wrong, but I think:
> >>
> >> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> >> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> >> to mem_cgroup_css_free() is called by cgroup core:
> >>
> >> static void mem_cgroup_css_free(struct cgroup *cont)
> >> {
> >> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> >>
> >> kmem_cgroup_destroy(memcg);
> >>
> >> mem_cgroup_put(memcg);
> >> }
> >>
> >> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> >> {
> >> mem_cgroup_sockets_destroy(memcg);
> >>
> >> memcg_kmem_mark_dead(memcg);
> >>
> >> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> >> return;
> >>
> >> if (memcg_kmem_test_and_clear_dead(memcg))
> >> mem_cgroup_put(memcg); <------- !!!!!!!!!
> >> }
> >
> > But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> > error path.
> >
>
> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> Am I missing something?
>
Dang. You are right! Glauber, is there any reason why
memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
KMEM_ACCOUNTED_MASK?
This all is very confusing to say the least.
Anyway, this all means that Li's first patch is correct. I am not sure I
like it though. I think that the refcount cleanup should be done as
close to where it has been taken as possible otherwise we will end up in
this "chase the nasty details" again and again. There are definitely two
bugs here. The one introduced by e4715f01 and the other one introduced
even earlier (I haven't checked that history yet). I think we should do
something like the 2 follow up patches but if you guys think that the smaller
patch from Li is more appropriate then I will not block it.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:18 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:18 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 15:49:06, Li Zefan wrote:
> On 2013/4/3 15:43, Michal Hocko wrote:
> > On Wed 03-04-13 11:49:29, Li Zefan wrote:
> >>>> Yes, indeed you are very right - and thanks for looking at such depth.
> >>>
> >>> So what about the patch bellow? It seems that I provoked all this mess
> >>> but my brain managed to push it away so I do not remember why I thought
> >>> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> >>> ---
> >>> >From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
> >>> From: Li Zefan <lizefan@huawei.com>
> >>> Date: Tue, 2 Apr 2013 16:37:39 +0200
> >>> Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
> >>> fails
> >>>
> >>> mem_cgroup_css_online is called with memcg with refcnt = 1 and it
> >>> expects that mem_cgroup_css_free will drop this last reference.
> >>> This doesn't hold when memcg_init_kmem fails though and a reference is
> >>> dropped for both memcg and its parent explicitly if it returns with an
> >>> error.
> >>>
> >>> This is not correct for two reasons. Firstly mem_cgroup_put on parent is
> >>> excessive because mem_cgroup_put is hierarchy aware and secondly only
> >>> memcg_propagate_kmem takes an additional reference.
> >>>
> >>> The first one is a real use-after-free bug introduced by e4715f01
> >>> (memcg: avoid dangling reference count in creation failure)
> >>>
> >>> The later one is non-issue right now because the only implementation
> >>> of init_cgroup seems to be tcp_init_cgroup which doesn't fail
> >>> but it is better to make the error handling saner and move the
> >>> mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.
> >>>
> >>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> >>> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> >>> ---
> >>> mm/memcontrol.c | 13 +++----------
> >>> 1 file changed, 3 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index f608546..cf9ba7e 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> >>> ret = memcg_update_cache_sizes(memcg);
> >>> mutex_unlock(&set_limit_mutex);
> >>> out:
> >>> + if (ret)
> >>> + mem_cgroup_put(memcg);
> >>
> >> Correct me if I'm wrong, but I think:
> >>
> >> When memcg_propagate_kmem() calls mem_cgroup_get(), it's because the kmemcg
> >> is active by inheritance. Then when memcg_update_cache_sizes() fails, leading
> >> to mem_cgroup_css_free() is called by cgroup core:
> >>
> >> static void mem_cgroup_css_free(struct cgroup *cont)
> >> {
> >> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> >>
> >> kmem_cgroup_destroy(memcg);
> >>
> >> mem_cgroup_put(memcg);
> >> }
> >>
> >> static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
> >> {
> >> mem_cgroup_sockets_destroy(memcg);
> >>
> >> memcg_kmem_mark_dead(memcg);
> >>
> >> if (res_counter_read_u64(&memcg->kmem, RES_USAGE) != 0)
> >> return;
> >>
> >> if (memcg_kmem_test_and_clear_dead(memcg))
> >> mem_cgroup_put(memcg); <------- !!!!!!!!!
> >> }
> >
> > But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> > error path.
> >
>
> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> Am I missing something?
>
Dang. You are right! Glauber, is there any reason why
memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
KMEM_ACCOUNTED_MASK?
This all is very confusing to say the least.
Anyway, this all means that Li's first patch is correct. I am not sure I
like it though. I think that the refcount cleanup should be done as
close to where it has been taken as possible otherwise we will end up in
this "chase the nasty details" again and again. There are definitely two
bugs here. The one introduced by e4715f01 and the other one introduced
even earlier (I haven't checked that history yet). I think we should do
something like the 2 follow up patches but if you guys think that the smaller
patch from Li is more appropriate then I will not block it.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 8:18 ` Michal Hocko
(?)
@ 2013-04-03 8:30 ` Glauber Costa
-1 siblings, 0 replies; 66+ messages in thread
From: Glauber Costa @ 2013-04-03 8:30 UTC (permalink / raw)
To: Michal Hocko
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 04/03/2013 12:18 PM, Michal Hocko wrote:
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
Yes, it is.
In kmemcg we need to differentiate between "active" and "activated"
states due to static branches management. This is only important in the
first activation, to make sure the static branches patching are
synchronized.
From this point on, the ACTIVE flag is the one we should be looking at.
Again, I fully agree it is complicated, but being that a property of the
static branches (we tried to fix it in the static branches itself but
without a lot of luck, since by their design they patch one site at a
time). I tried to overcome this by testing handcrafted errors and
documenting the states as well as I could.
But that not always work *that* well. Maybe we can use the results of
this discussion to document the tear down process a bit more?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:30 ` Glauber Costa
0 siblings, 0 replies; 66+ messages in thread
From: Glauber Costa @ 2013-04-03 8:30 UTC (permalink / raw)
To: Michal Hocko
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 04/03/2013 12:18 PM, Michal Hocko wrote:
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
Yes, it is.
In kmemcg we need to differentiate between "active" and "activated"
states due to static branches management. This is only important in the
first activation, to make sure the static branches patching are
synchronized.
>From this point on, the ACTIVE flag is the one we should be looking at.
Again, I fully agree it is complicated, but being that a property of the
static branches (we tried to fix it in the static branches itself but
without a lot of luck, since by their design they patch one site at a
time). I tried to overcome this by testing handcrafted errors and
documenting the states as well as I could.
But that not always work *that* well. Maybe we can use the results of
this discussion to document the tear down process a bit more?
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:30 ` Glauber Costa
0 siblings, 0 replies; 66+ messages in thread
From: Glauber Costa @ 2013-04-03 8:30 UTC (permalink / raw)
To: Michal Hocko
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 04/03/2013 12:18 PM, Michal Hocko wrote:
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
Yes, it is.
In kmemcg we need to differentiate between "active" and "activated"
states due to static branches management. This is only important in the
first activation, to make sure the static branches patching are
synchronized.
>From this point on, the ACTIVE flag is the one we should be looking at.
Again, I fully agree it is complicated, but being that a property of the
static branches (we tried to fix it in the static branches itself but
without a lot of luck, since by their design they patch one site at a
time). I tried to overcome this by testing handcrafted errors and
documenting the states as well as I could.
But that not always work *that* well. Maybe we can use the results of
this discussion to document the tear down process a bit more?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <20130403081843.GC14384-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 8:18 ` Michal Hocko
(?)
@ 2013-04-03 8:37 ` Li Zefan
-1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 8:37 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg
>>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
>>> error path.
>>>
>>
>> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
>> Am I missing something?
>>
>
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
>
> Anyway, this all means that Li's first patch is correct. I am not sure I
> like it though. I think that the refcount cleanup should be done as
> close to where it has been taken as possible otherwise we will end up in
> this "chase the nasty details" again and again. There are definitely two
> bugs here. The one introduced by e4715f01 and the other one introduced
> even earlier (I haven't checked that history yet). I think we should do
> something like the 2 follow up patches but if you guys think that the smaller
> patch from Li is more appropriate then I will not block it.
>
Or we can queue my patch for 3.9, and then see if we want to change the
tear down process, and if yes we make the change for 3.10.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:37 ` Li Zefan
0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 8:37 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
>>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
>>> error path.
>>>
>>
>> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
>> Am I missing something?
>>
>
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
>
> Anyway, this all means that Li's first patch is correct. I am not sure I
> like it though. I think that the refcount cleanup should be done as
> close to where it has been taken as possible otherwise we will end up in
> this "chase the nasty details" again and again. There are definitely two
> bugs here. The one introduced by e4715f01 and the other one introduced
> even earlier (I haven't checked that history yet). I think we should do
> something like the 2 follow up patches but if you guys think that the smaller
> patch from Li is more appropriate then I will not block it.
>
Or we can queue my patch for 3.9, and then see if we want to change the
tear down process, and if yes we make the change for 3.10.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:37 ` Li Zefan
0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-04-03 8:37 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
>>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
>>> error path.
>>>
>>
>> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
>> Am I missing something?
>>
>
> Dang. You are right! Glauber, is there any reason why
> memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> KMEM_ACCOUNTED_MASK?
>
> This all is very confusing to say the least.
>
> Anyway, this all means that Li's first patch is correct. I am not sure I
> like it though. I think that the refcount cleanup should be done as
> close to where it has been taken as possible otherwise we will end up in
> this "chase the nasty details" again and again. There are definitely two
> bugs here. The one introduced by e4715f01 and the other one introduced
> even earlier (I haven't checked that history yet). I think we should do
> something like the 2 follow up patches but if you guys think that the smaller
> patch from Li is more appropriate then I will not block it.
>
Or we can queue my patch for 3.9, and then see if we want to change the
tear down process, and if yes we make the change for 3.10.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-03 8:37 ` Li Zefan
@ 2013-04-03 8:50 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:50 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 16:37:53, Li Zefan wrote:
> >>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> >>> error path.
> >>>
> >>
> >> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> >> Am I missing something?
> >>
> >
> > Dang. You are right! Glauber, is there any reason why
> > memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> > KMEM_ACCOUNTED_MASK?
> >
> > This all is very confusing to say the least.
> >
> > Anyway, this all means that Li's first patch is correct. I am not sure I
> > like it though. I think that the refcount cleanup should be done as
> > close to where it has been taken as possible otherwise we will end up in
> > this "chase the nasty details" again and again. There are definitely two
> > bugs here. The one introduced by e4715f01 and the other one introduced
> > even earlier (I haven't checked that history yet). I think we should do
> > something like the 2 follow up patches but if you guys think that the smaller
> > patch from Li is more appropriate then I will not block it.
> >
>
> Or we can queue my patch for 3.9, and then see if we want to change the
> tear down process, and if yes we make the change for 3.10.
OK, I thought it would be easier but I always end up with something
similar to your patch. So feel free to add my Acked-by and parts of my
changelog that fit (namely obvious bug introduced by e4715f01 and
documentnation of the clean-up path). I have a split up version in case
others like it more - will follow.
Thanks!
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:50 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:50 UTC (permalink / raw)
To: Li Zefan
Cc: Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On Wed 03-04-13 16:37:53, Li Zefan wrote:
> >>> But memcg_update_cache_sizes calls memcg_kmem_clear_activated on the
> >>> error path.
> >>>
> >>
> >> But memcg_kmem_mark_dead() checks the ACCOUNT flag not the ACCOUNTED flag.
> >> Am I missing something?
> >>
> >
> > Dang. You are right! Glauber, is there any reason why
> > memcg_kmem_mark_dead checks only KMEM_ACCOUNTED_ACTIVE rather than
> > KMEM_ACCOUNTED_MASK?
> >
> > This all is very confusing to say the least.
> >
> > Anyway, this all means that Li's first patch is correct. I am not sure I
> > like it though. I think that the refcount cleanup should be done as
> > close to where it has been taken as possible otherwise we will end up in
> > this "chase the nasty details" again and again. There are definitely two
> > bugs here. The one introduced by e4715f01 and the other one introduced
> > even earlier (I haven't checked that history yet). I think we should do
> > something like the 2 follow up patches but if you guys think that the smaller
> > patch from Li is more appropriate then I will not block it.
> >
>
> Or we can queue my patch for 3.9, and then see if we want to change the
> tear down process, and if yes we make the change for 3.10.
OK, I thought it would be easier but I always end up with something
similar to your patch. So feel free to add my Acked-by and parts of my
changelog that fit (namely obvious bug introduced by e4715f01 and
documentnation of the clean-up path). I have a split up version in case
others like it more - will follow.
Thanks!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 1/2] Revert "memcg: avoid dangling reference count in creation failure."
2013-04-03 8:50 ` Michal Hocko
@ 2013-04-03 8:53 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:53 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
This reverts commit e4715f01be697a3730c78f8ffffb595591d6a88c
mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..6de6d70 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6424,8 +6424,6 @@ mem_cgroup_css_online(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
}
return error;
}
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 1/2] Revert "memcg: avoid dangling reference count in creation failure."
@ 2013-04-03 8:53 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:53 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
This reverts commit e4715f01be697a3730c78f8ffffb595591d6a88c
mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..6de6d70 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6424,8 +6424,6 @@ mem_cgroup_css_online(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
- if (parent->use_hierarchy)
- mem_cgroup_put(parent);
}
return error;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path
2013-04-03 8:53 ` Michal Hocko
@ 2013-04-03 8:53 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:53 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
fails. This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem that
should clean up after itself so this patch moves mem_cgroup_put over
there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.
The easiest way then would be to simply remove mem_cgrroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6de6d70..65b2850 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- }
return error;
}
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 66+ messages in thread* [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path
@ 2013-04-03 8:53 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 8:53 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
fails. This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem that
should clean up after itself so this patch moves mem_cgroup_put over
there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.
The easiest way then would be to simply remove mem_cgrroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6de6d70..65b2850 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
mutex_unlock(&memcg_create_mutex);
- if (error) {
- /*
- * We call put now because our (and parent's) refcnts
- * are already in place. mem_cgroup_put() will internally
- * call __mem_cgroup_free, so return directly
- */
- mem_cgroup_put(memcg);
- }
return error;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 66+ messages in thread[parent not found: <1364979234-16427-2-git-send-email-mhocko-AlSwsSmVLrQ@public.gmane.org>]
* Re: [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path
2013-04-03 8:53 ` Michal Hocko
(?)
@ 2013-04-03 9:48 ` Michal Hocko
-1 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 9:48 UTC (permalink / raw)
To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Wed 03-04-13 10:53:54, Michal Hocko wrote:
> mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
> fails. This is not correct because only memcg_propagate_kmem takes an
> additional reference while mem_cgroup_sockets_init is allowed to fail as
> well (although no current implementation fails) but it doesn't take any
> reference. This all suggests that it should be memcg_propagate_kmem that
> should clean up after itself so this patch moves mem_cgroup_put over
> there.
> Unfortunately this is not that easy (as pointed out by Li Zefan) because
> memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
> is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
> memcg_propagate_kmem fails so the additional reference is dropped in
> that case in kmem_cgroup_destroy which means that the reference would be
> dropped two times.
>
> The easiest way then would be to simply remove mem_cgrroup_put from
> mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
> thing.
Forgot to mention that this one could be marked for stable for 3.8
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> ---
> mm/memcontrol.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6de6d70..65b2850 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - }
> return error;
> }
>
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path
@ 2013-04-03 9:48 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 9:48 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
On Wed 03-04-13 10:53:54, Michal Hocko wrote:
> mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
> fails. This is not correct because only memcg_propagate_kmem takes an
> additional reference while mem_cgroup_sockets_init is allowed to fail as
> well (although no current implementation fails) but it doesn't take any
> reference. This all suggests that it should be memcg_propagate_kmem that
> should clean up after itself so this patch moves mem_cgroup_put over
> there.
> Unfortunately this is not that easy (as pointed out by Li Zefan) because
> memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
> is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
> memcg_propagate_kmem fails so the additional reference is dropped in
> that case in kmem_cgroup_destroy which means that the reference would be
> dropped two times.
>
> The easiest way then would be to simply remove mem_cgrroup_put from
> mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
> thing.
Forgot to mention that this one could be marked for stable for 3.8
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6de6d70..65b2850 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - }
> return error;
> }
>
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH 2/2] memcg, kmem: clean up reference count handling on the error path
@ 2013-04-03 9:48 ` Michal Hocko
0 siblings, 0 replies; 66+ messages in thread
From: Michal Hocko @ 2013-04-03 9:48 UTC (permalink / raw)
To: linux-mm
Cc: Li Zefan, Glauber Costa, Johannes Weiner, KAMEZAWA Hiroyuki,
linux-kernel, cgroups
On Wed 03-04-13 10:53:54, Michal Hocko wrote:
> mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem
> fails. This is not correct because only memcg_propagate_kmem takes an
> additional reference while mem_cgroup_sockets_init is allowed to fail as
> well (although no current implementation fails) but it doesn't take any
> reference. This all suggests that it should be memcg_propagate_kmem that
> should clean up after itself so this patch moves mem_cgroup_put over
> there.
> Unfortunately this is not that easy (as pointed out by Li Zefan) because
> memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it
> is marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
> memcg_propagate_kmem fails so the additional reference is dropped in
> that case in kmem_cgroup_destroy which means that the reference would be
> dropped two times.
>
> The easiest way then would be to simply remove mem_cgrroup_put from
> mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
> thing.
Forgot to mention that this one could be marked for stable for 3.8
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6de6d70..65b2850 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6417,14 +6417,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> mutex_unlock(&memcg_create_mutex);
> - if (error) {
> - /*
> - * We call put now because our (and parent's) refcnts
> - * are already in place. mem_cgroup_put() will internally
> - * call __mem_cgroup_free, so return directly
> - */
> - mem_cgroup_put(memcg);
> - }
> return error;
> }
>
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
2013-04-02 15:04 ` Michal Hocko
@ 2013-04-03 8:08 ` Glauber Costa
-1 siblings, 0 replies; 66+ messages in thread
From: Glauber Costa @ 2013-04-03 8:08 UTC (permalink / raw)
To: Michal Hocko
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 04/02/2013 07:04 PM, Michal Hocko wrote:
> On Tue 02-04-13 18:33:30, Glauber Costa wrote:
>> On 04/02/2013 06:28 PM, Michal Hocko wrote:
>>> On Tue 02-04-13 18:20:56, Glauber Costa wrote:
>>>> On 04/02/2013 06:16 PM, Michal Hocko wrote:
>>>>> mem_cgroup_css_online
>>>>> memcg_init_kmem
>>>>> mem_cgroup_get # refcnt = 2
>>>>> memcg_update_all_caches
>>>>> memcg_update_cache_size # fails with ENOMEM
>>>>
>>>> Here is the thing: this one in kmem only happens for kmem enabled
>>>> memcgs. For those, we tend to do a get once, and put only when the last
>>>> kmem reference is gone.
>>>>
>>>> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
>>>> the mem_cgroup_put() in css_free.
>>>
>>> So we need this, right?
>>> ---
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..2ef875d 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>> return ret;
>>> }
>>> #endif /* CONFIG_MEMCG_KMEM */
>>> @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>>>
>>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>> mutex_unlock(&memcg_create_mutex);
>>> - if (error) {
>>> - /*
>>> - * We call put now because our (and parent's) refcnts
>>> - * are already in place. mem_cgroup_put() will internally
>>> - * call __mem_cgroup_free, so return directly
>>> - */
>>> - mem_cgroup_put(memcg);
>>> - if (parent->use_hierarchy)
>>> - mem_cgroup_put(parent);
>>> - }
>>> return error;
>>> }
>>>
>>>
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
Li being fine with it, I am fine with it.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 66+ messages in thread* Re: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
@ 2013-04-03 8:08 ` Glauber Costa
0 siblings, 0 replies; 66+ messages in thread
From: Glauber Costa @ 2013-04-03 8:08 UTC (permalink / raw)
To: Michal Hocko
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
linux-mm
On 04/02/2013 07:04 PM, Michal Hocko wrote:
> On Tue 02-04-13 18:33:30, Glauber Costa wrote:
>> On 04/02/2013 06:28 PM, Michal Hocko wrote:
>>> On Tue 02-04-13 18:20:56, Glauber Costa wrote:
>>>> On 04/02/2013 06:16 PM, Michal Hocko wrote:
>>>>> mem_cgroup_css_online
>>>>> memcg_init_kmem
>>>>> mem_cgroup_get # refcnt = 2
>>>>> memcg_update_all_caches
>>>>> memcg_update_cache_size # fails with ENOMEM
>>>>
>>>> Here is the thing: this one in kmem only happens for kmem enabled
>>>> memcgs. For those, we tend to do a get once, and put only when the last
>>>> kmem reference is gone.
>>>>
>>>> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
>>>> the mem_cgroup_put() in css_free.
>>>
>>> So we need this, right?
>>> ---
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index f608546..2ef875d 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>>> ret = memcg_update_cache_sizes(memcg);
>>> mutex_unlock(&set_limit_mutex);
>>> out:
>>> + if (ret)
>>> + mem_cgroup_put(memcg);
>>> return ret;
>>> }
>>> #endif /* CONFIG_MEMCG_KMEM */
>>> @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
>>>
>>> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>> mutex_unlock(&memcg_create_mutex);
>>> - if (error) {
>>> - /*
>>> - * We call put now because our (and parent's) refcnts
>>> - * are already in place. mem_cgroup_put() will internally
>>> - * call __mem_cgroup_free, so return directly
>>> - */
>>> - mem_cgroup_put(memcg);
>>> - if (parent->use_hierarchy)
>>> - mem_cgroup_put(parent);
>>> - }
>>> return error;
>>> }
>>>
>>>
>> Yes, indeed you are very right - and thanks for looking at such depth.
>
> So what about the patch bellow? It seems that I provoked all this mess
> but my brain managed to push it away so I do not remember why I thought
> the parent needs reference drop... It is "only" 3.9 thing fortunately.
> ---
Li being fine with it, I am fine with it.
^ permalink raw reply [flat|nested] 66+ messages in thread