From: Vladimir Davydov <vdavydov@parallels.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, glommer@gmail.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
devel@openvz.org
Subject: Re: [PATCH RESEND -mm 01/12] memcg: flush cache creation works before memcg cache destruction
Date: Tue, 18 Mar 2014 13:28:48 +0400 [thread overview]
Message-ID: <532811D0.5070203@parallels.com> (raw)
In-Reply-To: <20140318085532.GB3191@dhcp22.suse.cz>
On 03/18/2014 12:55 PM, Michal Hocko wrote:
> On Tue 18-03-14 12:14:37, Vladimir Davydov wrote:
>> On 03/17/2014 08:07 PM, Michal Hocko wrote:
>>> On Thu 13-03-14 19:06:39, Vladimir Davydov wrote:
>>>> When we get to memcg cache destruction, either from the root cache
>>>> destruction path or when turning memcg offline, there still might be
>>>> memcg cache creation works pending that was scheduled before we
>>>> initiated destruction. We need to flush them before starting to destroy
>>>> memcg caches, otherwise we can get a leaked kmem cache or, even worse,
>>>> an attempt to use after free.
>>> How can we use-after-free? Even if there is a pending work item to
>>> create a new cache then we keep the css reference for the memcg and
>>> release it from the worker (memcg_create_cache_work_func). So although
>>> this can race with memcg offlining the memcg itself will be still alive.
>> There are actually two issues:
>>
>> 1) When we destroy a root cache using kmem_cache_destroy(), we should
>> ensure all pending memcg creation works for this root cache are over,
>> otherwise a work could be executed after the root cache is destroyed
>> resulting in use-after-free.
> Dunno, but this sounds backwards to me. If we are using a root cache for
> a new child creation then the child should make sure that the root
> doesn't go away, no? Cannot we take a reference to the root cache before
> we schedule memcg_create_cache_work_func?
Yeah, that would work of course. We already have kmem_cache::refcount,
which is currently used for alias handling, and I guess we could reuse
it here. We would only have to make it atomic, because we can't take the
slab_mutex in memcg_kmem_get_cache(), but it shouldn't be a problem.
> But I admit that the root cache concept is not entirely clear to me.
>
>> 2) Memcg offline. In this case use-after-free is impossible in a memcg
>> creation work handler, because, as you mentioned, the work holds the css
>> reference. However, we still have to synchronize against pending
>> requests, otherwise a work handler can be executed after we destroyed
>> the caches corresponding to the memcg being offlined resulting in a
>> kmem_cache leak.
> If that is a case then we should come up with a proper synchronization
> because synchronization by workqueues and explicit flushing and
> canceling is really bad.
Would be something like this suitable as proper synchronization:
mem_cgroup_destroy_all_caches():
/* currently we don't take the slab_mutex here,
* so we'd have to add this line */
take slab_mutex
mark the memcg dead
schedule the memcg's caches destruction
release slab_mutex
kmem_cache_create_memcg():
take slab_mutex
if memcg is not dead, then create a cache
release slab_mutex
?
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Davydov <vdavydov@parallels.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: <akpm@linux-foundation.org>, <hannes@cmpxchg.org>,
<glommer@gmail.com>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <devel@openvz.org>
Subject: Re: [PATCH RESEND -mm 01/12] memcg: flush cache creation works before memcg cache destruction
Date: Tue, 18 Mar 2014 13:28:48 +0400 [thread overview]
Message-ID: <532811D0.5070203@parallels.com> (raw)
In-Reply-To: <20140318085532.GB3191@dhcp22.suse.cz>
On 03/18/2014 12:55 PM, Michal Hocko wrote:
> On Tue 18-03-14 12:14:37, Vladimir Davydov wrote:
>> On 03/17/2014 08:07 PM, Michal Hocko wrote:
>>> On Thu 13-03-14 19:06:39, Vladimir Davydov wrote:
>>>> When we get to memcg cache destruction, either from the root cache
>>>> destruction path or when turning memcg offline, there still might be
>>>> memcg cache creation works pending that was scheduled before we
>>>> initiated destruction. We need to flush them before starting to destroy
>>>> memcg caches, otherwise we can get a leaked kmem cache or, even worse,
>>>> an attempt to use after free.
>>> How can we use-after-free? Even if there is a pending work item to
>>> create a new cache then we keep the css reference for the memcg and
>>> release it from the worker (memcg_create_cache_work_func). So although
>>> this can race with memcg offlining the memcg itself will be still alive.
>> There are actually two issues:
>>
>> 1) When we destroy a root cache using kmem_cache_destroy(), we should
>> ensure all pending memcg creation works for this root cache are over,
>> otherwise a work could be executed after the root cache is destroyed
>> resulting in use-after-free.
> Dunno, but this sounds backwards to me. If we are using a root cache for
> a new child creation then the child should make sure that the root
> doesn't go away, no? Cannot we take a reference to the root cache before
> we schedule memcg_create_cache_work_func?
Yeah, that would work of course. We already have kmem_cache::refcount,
which is currently used for alias handling, and I guess we could reuse
it here. We would only have to make it atomic, because we can't take the
slab_mutex in memcg_kmem_get_cache(), but it shouldn't be a problem.
> But I admit that the root cache concept is not entirely clear to me.
>
>> 2) Memcg offline. In this case use-after-free is impossible in a memcg
>> creation work handler, because, as you mentioned, the work holds the css
>> reference. However, we still have to synchronize against pending
>> requests, otherwise a work handler can be executed after we destroyed
>> the caches corresponding to the memcg being offlined resulting in a
>> kmem_cache leak.
> If that is a case then we should come up with a proper synchronization
> because synchronization by workqueues and explicit flushing and
> canceling is really bad.
Would be something like this suitable as proper synchronization:
mem_cgroup_destroy_all_caches():
/* currently we don't take the slab_mutex here,
* so we'd have to add this line */
take slab_mutex
mark the memcg dead
schedule the memcg's caches destruction
release slab_mutex
kmem_cache_create_memcg():
take slab_mutex
if memcg is not dead, then create a cache
release slab_mutex
?
Thanks.
next prev parent reply other threads:[~2014-03-18 9:28 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-13 15:06 [PATCH RESEND -mm 00/12] kmemcg reparenting Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 01/12] memcg: flush cache creation works before memcg cache destruction Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-17 16:07 ` Michal Hocko
2014-03-17 16:07 ` Michal Hocko
2014-03-18 8:14 ` Vladimir Davydov
2014-03-18 8:14 ` Vladimir Davydov
2014-03-18 8:55 ` Michal Hocko
2014-03-18 8:55 ` Michal Hocko
2014-03-18 9:28 ` Vladimir Davydov [this message]
2014-03-18 9:28 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 02/12] memcg: fix race in memcg cache destruction path Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-17 16:42 ` Michal Hocko
2014-03-17 16:42 ` Michal Hocko
2014-03-18 8:19 ` Vladimir Davydov
2014-03-18 8:19 ` Vladimir Davydov
2014-03-18 10:01 ` Michal Hocko
2014-03-18 10:01 ` Michal Hocko
2014-03-18 12:14 ` Vladimir Davydov
2014-03-18 12:14 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 03/12] memcg: fix root vs memcg cache destruction race Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 04/12] memcg: move slab caches list/mutex init to memcg creation Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 05/12] memcg: add pointer from memcg_cache_params to cache Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 06/12] memcg: keep all children of each root cache on a list Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 07/12] memcg: rework slab charging Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 08/12] memcg: do not charge kmalloc_large allocations Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 09/12] fork: do not charge thread_info to kmemcg Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 10/12] memcg: kill GFP_KMEMCG and stuff Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 11/12] memcg: reparent slab on css offline Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
2014-03-13 15:06 ` [PATCH RESEND -mm 12/12] slub: make sure all memcg caches have unique names on sysfs Vladimir Davydov
2014-03-13 15:06 ` Vladimir Davydov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=532811D0.5070203@parallels.com \
--to=vdavydov@parallels.com \
--cc=akpm@linux-foundation.org \
--cc=devel@openvz.org \
--cc=glommer@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.