* Race in memcg kmem?
@ 2013-12-10 13:59 Vladimir Davydov
From: Vladimir Davydov @ 2013-12-10 13:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, Johannes Weiner, LKML,
	cgroups-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Balbir Singh, KAMEZAWA Hiroyuki

Hi,

Looking through the per-memcg kmem_cache initialization code, I have a
bad feeling that it is prone to a race. Before getting to fixing it, I'd
like to ensure this race is not only in my imagination. Here it goes.

We keep per-memcg kmem caches in the memcg_params field of each root
cache. The memcg_params array is grown dynamically by
memcg_update_cache_size(). I guess that if this function is executed
concurrently with memcg_create_kmem_cache() we can get a race resulting
in a memory leak.

-- memcg_create_kmem_cache(memcg, cachep) --
creates a new kmem_cache corresponding to a memcg and assigns it to the
root cache; called in the background - it is OK to have several such
functions trying to create a cache for the same memcg concurrently, but
only one of them should succeed.
@cachep is the root cache
@memcg is the memcg we want to create a cache for.

The function:

A1) ensures there is no cache corresponding to the memcg (if there is
one, we have nothing to do):
    idx = memcg_cache_id(memcg);
    if (cachep->memcg_params->memcg_caches[idx])
        goto out;

A2) creates and assigns a new cache:
    new_cachep = kmem_cache_dup(memcg, cachep);
    // init new_cachep
    cachep->memcg_params->memcg_caches[idx] = new_cachep;
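As a reading aid, the root-cache/per-memcg-cache relationship and steps
A1-A2 can be modeled in plain userspace C. This is a hedged sketch, not
the kernel code: the struct fields, make_root_cache() and
memcg_create_cache() are simplified stand-ins for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct kmem_cache;

/* One slot per kmem-active memcg; the real struct has more fields. */
struct memcg_cache_params {
    size_t nr_groups;
    struct kmem_cache *memcg_caches[];
};

struct kmem_cache {
    const char *name;
    struct memcg_cache_params *memcg_params; /* set on root caches only */
};

static struct kmem_cache *make_root_cache(size_t nr_groups)
{
    struct kmem_cache *root = calloc(1, sizeof(*root));
    root->name = "root";
    root->memcg_params = calloc(1, sizeof(struct memcg_cache_params) +
                                   nr_groups * sizeof(struct kmem_cache *));
    root->memcg_params->nr_groups = nr_groups;
    return root;
}

/* A1 + A2 rolled together: bail out if a cache already exists for the
 * memcg's index, otherwise create one and publish it in the array. */
static struct kmem_cache *memcg_create_cache(struct kmem_cache *root, int idx)
{
    if (root->memcg_params->memcg_caches[idx])          /* A1 */
        return root->memcg_params->memcg_caches[idx];

    struct kmem_cache *new_cachep = calloc(1, sizeof(*new_cachep)); /* A2 */
    new_cachep->name = "root:memcg";
    root->memcg_params->memcg_caches[idx] = new_cachep;
    return new_cachep;
}
```

When the check and the assignment run unsynchronized against a resize
of memcg_params, the race described below becomes possible.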


-- memcg_update_cache_size(s, num_groups) --
grows s->memcg_params to accommodate data for num_groups memcgs
@s is the root cache whose memcg_params we want to grow
@num_groups is the new number of kmem-active cgroups (defines the new
size of memcg_params array).

The function:

B1) allocates and installs a new memcg_params array:
    cur_params = s->memcg_params;
    s->memcg_params = kzalloc(size, GFP_KERNEL);

B2) copies per-memcg cache ptrs from the old memcg_params array to the
new one:
    for (i = 0; i < memcg_limited_groups_array_size; i++) {
        if (!cur_params->memcg_caches[i])
            continue;
        s->memcg_params->memcg_caches[i] =
                    cur_params->memcg_caches[i];
    }

B3) frees the old array:
    kfree(cur_params);
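Steps B1-B3 can likewise be sketched in userspace C. Illustrative only:
struct memcg_params and update_cache_size() are simplified stand-ins,
with the per-memcg caches modeled as opaque pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's memcg_cache_params. */
struct memcg_params {
    size_t size;
    void *memcg_caches[];
};

static size_t memcg_limited_groups_array_size = 2;

/* B1-B3 in one function: swap in a bigger zeroed array, copy the old
 * entries across, free the old array. */
static void update_cache_size(struct memcg_params **pparams, size_t num_groups)
{
    struct memcg_params *cur_params = *pparams;                     /* B1 */
    struct memcg_params *new_params =
        calloc(1, sizeof(*new_params) + num_groups * sizeof(void *));
    new_params->size = num_groups;
    *pparams = new_params;

    for (size_t i = 0; i < memcg_limited_groups_array_size; i++) {  /* B2 */
        if (!cur_params->memcg_caches[i])
            continue;
        new_params->memcg_caches[i] = cur_params->memcg_caches[i];
    }

    free(cur_params);                                               /* B3 */
    memcg_limited_groups_array_size = num_groups;
}
```

Note the window this model makes visible: the new, still-empty array is
published at B1, before the old entries are copied over at B2.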


Since these two functions do not share any mutexes, we can get the
following race:

Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg
cache has already been created by another thread, so this function
should do nothing.

Cpu0    Cpu1
----    ----
        B1
A1              we haven't initialized memcg_params yet so Cpu0 will
                proceed to A2 to alloc and assign a new cache
A2
        B2      Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2
                - a memory leak?
        B3
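The interleaving above can be replayed deterministically in a
single-threaded userland model. This is an illustration of the claimed
leak, not kernel code; replay_race() and the array sizes are made up.

```c
#include <assert.h>
#include <stdlib.h>

#define OLD_SIZE 2
#define NEW_SIZE 4
#define IDX 1

/* Replays the interleaving B1, A1, A2, B2, B3 in one thread; returns 1
 * if the cache created at A2 ends up unreachable (leaked). */
static int replay_race(void)
{
    void **old_array = calloc(OLD_SIZE, sizeof(void *));
    int existing_cache, duplicate_cache;

    /* Precondition: another thread already created the memcg cache. */
    old_array[IDX] = &existing_cache;
    void **params = old_array;                 /* s->memcg_params */

    /* B1: Cpu1 publishes a fresh, zeroed array. */
    void **cur_params = params;
    params = calloc(NEW_SIZE, sizeof(void *));

    /* A1: Cpu0 checks the NEW array, sees NULL, wrongly proceeds. */
    assert(params[IDX] == NULL);

    /* A2: Cpu0 creates and installs a duplicate cache. */
    params[IDX] = &duplicate_cache;

    /* B2: Cpu1 copies the old entries over, clobbering Cpu0's pointer. */
    for (int i = 0; i < OLD_SIZE; i++)
        if (cur_params[i])
            params[i] = cur_params[i];

    /* B3: Cpu1 frees the old array. */
    free(cur_params);

    /* The duplicate installed at A2 is no longer referenced anywhere. */
    int leaked = (params[IDX] != &duplicate_cache);
    free(params);
    return leaked;
}
```

After B2 nothing references the cache installed at A2, which is exactly
the leak described.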

I'd like to add that even if I'm right about the race, it is not that
critical, because memcg_update_cache_sizes() is called very rarely.


BTW, it seems to me that the way we update memcg_params in
memcg_update_cache_size() makes cache_from_memcg_idx() prone to
use-after-free:

> static inline struct kmem_cache *
> cache_from_memcg_idx(struct kmem_cache *s, int idx)
> {
>     if (!s->memcg_params)
>         return NULL;
>     return s->memcg_params->memcg_caches[idx];
> }

This is equivalent to

1) struct memcg_cache_params *params = s->memcg_params;
2) return params->memcg_caches[idx];

If memcg_update_cache_size() is executed between steps 1 and 2 on
another CPU, at step 2 we will dereference memcg_params that has already
been freed. This is very unlikely, but still possible. Perhaps we
should free the old memcg_params only after a synchronize_rcu()?
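The hazard in the two-step decomposition can be modeled in userspace.
To keep the example well-defined, kfree() is replaced by a `freed` flag
(names are illustrative); in the kernel, the reader's second step would
touch memory that has really been freed. With rcu_read_lock() around
the two steps and a synchronize_rcu() (or kfree_rcu()) before the free,
the old array would be guaranteed to outlive the reader.

```c
#include <assert.h>
#include <stdlib.h>

struct memcg_params {
    int freed;                 /* stand-in for kfree() */
    void *memcg_caches[4];
};

/* Reader step 1: load s->memcg_params into a local. */
static struct memcg_params *reader_step1(struct memcg_params **s)
{
    return *s;
}

/* The updater running between the reader's two steps: install a new
 * array and free the old one (B1 + B3 of memcg_update_cache_size). */
static void updater(struct memcg_params **s, struct memcg_params *fresh)
{
    struct memcg_params *old = *s;
    *s = fresh;
    old->freed = 1;            /* kfree(cur_params) in the real code */
}
```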

Thanks.


* Re: Race in memcg kmem?
@ 2013-12-10 23:13   ` Glauber Costa
From: Glauber Costa @ 2013-12-10 23:13 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Michal Hocko, Johannes Weiner, LKML,
	cgroups-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Balbir Singh, KAMEZAWA Hiroyuki

On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
<vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
> Hi,
>
> Looking through the per-memcg kmem_cache initialization code, I have a
> bad feeling that it is prone to a race. Before getting to fixing it, I'd
> like to ensure this race is not only in my imagination. Here it goes.
>
> We keep per-memcg kmem caches in the memcg_params field of each root
> cache. The memcg_params array is grown dynamically by
> memcg_update_cache_size(). I guess that if this function is executed
> concurrently with memcg_create_kmem_cache() we can get a race resulting
> in a memory leak.
>
Ok, let's see.

> -- memcg_create_kmem_cache(memcg, cachep) --
> creates a new kmem_cache corresponding to a memcg and assigns it to the
> root cache; called in the background - it is OK to have several such
> functions trying to create a cache for the same memcg concurrently, but
> only one of them should succeed.

Yes.

> @cachep is the root cache
> @memcg is the memcg we want to create a cache for.
>
> The function:
>
> A1) ensures there is no cache corresponding to the memcg (if there is
> one, we have nothing to do):
>     idx = memcg_cache_id(memcg);
>     if (cachep->memcg_params[idx])
>         goto out;
>
> A2) creates and assigns a new cache:
>     new_cachep = kmem_cache_dup(memcg, cachep);
Please note this cannot proceed in parallel with memcg_update_cache_size(),
because they both take the slab mutex.

>     // init new_cachep
>     cachep->memcg_params->memcg_caches[idx] = new_cachep;
>
>
> -- memcg_update_cache_size(s, num_groups) --
> grows s->memcg_params to accommodate data for num_groups memcgs
> @s is the root cache whose memcg_params we want to grow
> @num_groups is the new number of kmem-active cgroups (defines the new
> size of memcg_params array).
>
> The function:
>
> B1) allocates and installs a new memcg_params array:
>     cur_params = s->memcg_params;
>     s->memcg_params = kzalloc(size, GFP_KERNEL);
>
> B2) copies per-memcg cache ptrs from the old memcg_params array to the
> new one:
>     for (i = 0; i < memcg_limited_groups_array_size; i++) {
>         if (!cur_params->memcg_caches[i])
>             continue;
>         s->memcg_params->memcg_caches[i] =
>                     cur_params->memcg_caches[i];
>     }
>
> B3) frees the old array:
>     kfree(cur_params);
>
>
> Since these two functions do not share any mutexes, we can get the

They do share a mutex, the slab mutex.

> following race:
>
> Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg
> cache has already been created by another thread, so this function
> should do nothing.
>
> Cpu0    Cpu1
> ----    ----
>         B1
> A1              we haven't initialized memcg_params yet so Cpu0 will
>                 proceed to A2 to alloc and assign a new cache
> A2
>         B2      Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2
>                 - a memory leak?
>         B3
>
> I'd like to add that even if I'm right about the race, it is not that
> critical, because memcg_update_cache_sizes() is called very rarely.
>
Every race is critical.

So, I am a bit lost by your description. Get back to me if I got
anything wrong, but I think the point you're missing is that all heavy
slab operations take the slab_mutex underneath, and that includes cache
creation and update.


>
> BTW, it seems to me that the way we update memcg_params in
> memcg_update_cache_size() makes cache_from_memcg_idx() prone to
> use-after-free:
>
>> static inline struct kmem_cache *
>> cache_from_memcg_idx(struct kmem_cache *s, int idx)
>> {
>>     if (!s->memcg_params)
>>         return NULL;
>>     return s->memcg_params->memcg_caches[idx];
>> }
>
> This is equivalent to
>
> 1) struct memcg_cache_params *params = s->memcg_params;
> 2) return params->memcg_caches[idx];
>
> If memcg_update_cache_size() is executed between steps 1 and 2 on
> another CPU, at step 2 we will dereference memcg_params that has already
> been freed. This is very unlikely, but still possible. Perhaps, we
> should free old memcg params only after a sync_rcu()?
>

You seem to be right on this one. Indeed, if my mind does not betray
me, that is how I freed the LRUs (with synchronize_rcu()).


* Re: Race in memcg kmem?
@ 2013-12-11  6:22       ` Vladimir Davydov
From: Vladimir Davydov @ 2013-12-11  6:22 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, Johannes Weiner, LKML,
	cgroups-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Balbir Singh, KAMEZAWA Hiroyuki

On 12/11/2013 03:13 AM, Glauber Costa wrote:
> On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
> <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>> Hi,
>>
>> Looking through the per-memcg kmem_cache initialization code, I have a
>> bad feeling that it is prone to a race. Before getting to fixing it, I'd
>> like to ensure this race is not only in my imagination. Here it goes.
>>
>> We keep per-memcg kmem caches in the memcg_params field of each root
>> cache. The memcg_params array is grown dynamically by
>> memcg_update_cache_size(). I guess that if this function is executed
>> concurrently with memcg_create_kmem_cache() we can get a race resulting
>> in a memory leak.
>>
> Ok, let's see.
>
>> -- memcg_create_kmem_cache(memcg, cachep) --
>> creates a new kmem_cache corresponding to a memcg and assigns it to the
>> root cache; called in the background - it is OK to have several such
>> functions trying to create a cache for the same memcg concurrently, but
>> only one of them should succeed.
> Yes.
>
>> @cachep is the root cache
>> @memcg is the memcg we want to create a cache for.
>>
>> The function:
>>
>> A1) ensures there is no cache corresponding to the memcg (if there
>> is one, we have nothing to do):
>>     idx = memcg_cache_id(memcg);
>>     if (cachep->memcg_params[idx])
>>         goto out;
>>
>> A2) creates and assigns a new cache:
>>     new_cachep = kmem_cache_dup(memcg, cachep);
> Please note this cannot proceed in parallel with memcg_update_cache_size(),
> because they both take the slab mutex.

Oh, I see. memcg_create_kmem_cache() takes and releases the slab mutex
in kmem_cache_create_memcg(), which is called by kmem_cache_dup(). This
effectively works as a barrier that does not allow A2 to proceed until
memcg_update_cache_sizes() finishes, which makes the race implausible.
I did not notice that at first. Thanks for the clarification!
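The barrier effect of taking and releasing a mutex can be demonstrated
with a small pthreads sketch. This is a userland analogy, not the
kernel's slab_mutex; the names latecomer() and array_updated are made
up for illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static int array_updated;   /* state that is only modified under the mutex */
static int value_seen;

/* A latecomer that merely takes and releases the mutex: once it gets
 * the lock, any update performed under the mutex is guaranteed to have
 * completed and to be visible. */
static void *latecomer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&slab_mutex);
    pthread_mutex_unlock(&slab_mutex);
    value_seen = array_updated;
    return NULL;
}
```

Even though the latecomer never inspects the update itself, the
lock/unlock pair orders it after the updater's critical section.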

>
>>     // init new_cachep
>>     cachep->memcg_params->memcg_caches[idx] = new_cachep;
>>
>>
>> -- memcg_update_cache_size(s, num_groups) --
>> grows s->memcg_params to accommodate data for num_groups memcgs
>> @s is the root cache whose memcg_params we want to grow
>> @num_groups is the new number of kmem-active cgroups (defines the new
>> size of memcg_params array).
>>
>> The function:
>>
>> B1) allocates and installs a new memcg_params array:
>>     cur_params = s->memcg_params;
>>     s->memcg_params = kzalloc(size, GFP_KERNEL);
>>
>> B2) copies per-memcg cache ptrs from the old memcg_params array to the
>> new one:
>>     for (i = 0; i < memcg_limited_groups_array_size; i++) {
>>         if (!cur_params->memcg_caches[i])
>>             continue;
>>         s->memcg_params->memcg_caches[i] =
>>                     cur_params->memcg_caches[i];
>>     }
>>
>> B3) frees the old array:
>>     kfree(cur_params);
>>
>>
>> Since these two functions do not share any mutexes, we can get the
> They do share a mutex, the slab mutex.
>
>> following race:
>>
>> Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg
>> cache has already been created by another thread, so this function
>> should do nothing.
>>
>> Cpu0    Cpu1
>> ----    ----
>>         B1
>> A1              we haven't initialized memcg_params yet so Cpu0 will
>>                 proceed to A2 to alloc and assign a new cache
>> A2
>>         B2      Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2
>>                 - a memory leak?
>>         B3
>>
>> I'd like to add that even if I'm right about the race, it is not
>> that critical, because memcg_update_cache_sizes() is called very rarely.
>>
> Every race is critical.
>
> So, I am a bit lost by your description. Get back to me if I got
> anything wrong, but I think the point you're missing is that all heavy
> slab operations take the slab_mutex underneath, and that includes
> cache creation and update.
>
>
>> BTW, it seems to me that the way we update memcg_params in
>> memcg_update_cache_size() makes cache_from_memcg_idx() prone to
>> use-after-free:
>>
>>> static inline struct kmem_cache *
>>> cache_from_memcg_idx(struct kmem_cache *s, int idx)
>>> {
>>>     if (!s->memcg_params)
>>>         return NULL;
>>>     return s->memcg_params->memcg_caches[idx];
>>> }
>> This is equivalent to
>>
>> 1) struct memcg_cache_params *params = s->memcg_params;
>> 2) return params->memcg_caches[idx];
>>
>> If memcg_update_cache_size() is executed between steps 1 and 2 on
>> another CPU, at step 2 we will dereference memcg_params that has already
>> been freed. This is very unlikely, but still possible. Perhaps, we
>> should free old memcg params only after a sync_rcu()?
>>
> You seem to be right on this one. Indeed, if my mind does not betray
> me, that is how I freed the LRUs (with synchronize_rcu()).

Yes, you freed the LRUs only after a synchronize_rcu(); that's why the
way memcg_params is updated looks suspicious to me. I'll try to fix it then.

Thanks.


* Re: Race in memcg kmem?
@ 2013-12-12 13:21           ` Michal Hocko
  2013-12-12 13:39             ` Vladimir Davydov
From: Michal Hocko @ 2013-12-12 13:21 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Glauber Costa, Johannes Weiner, LKML,
	cgroups-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Balbir Singh, KAMEZAWA Hiroyuki

On Wed 11-12-13 10:22:06, Vladimir Davydov wrote:
> On 12/11/2013 03:13 AM, Glauber Costa wrote:
> > On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
[...]
> >> -- memcg_update_cache_size(s, num_groups) --
> >> grows s->memcg_params to accommodate data for num_groups memcgs
> >> @s is the root cache whose memcg_params we want to grow
> >> @num_groups is the new number of kmem-active cgroups (defines the new
> >> size of memcg_params array).
> >>
> >> The function:
> >>
> >> B1) allocates and installs a new memcg_params array:
> >>     cur_params = s->memcg_params;
> >>     s->memcg_params = kzalloc(size, GFP_KERNEL);
> >>
> >> B2) copies per-memcg cache ptrs from the old memcg_params array to the
> >> new one:
> >>     for (i = 0; i < memcg_limited_groups_array_size; i++) {
> >>         if (!cur_params->memcg_caches[i])
> >>             continue;
> >>         s->memcg_params->memcg_caches[i] =
> >>                     cur_params->memcg_caches[i];
> >>     }
> >>
> >> B3) frees the old array:
> >>     kfree(cur_params);
> >>
> >>
> >> Since these two functions do not share any mutexes, we can get the
> > They do share a mutex, the slab mutex.

Worth sticking in a lockdep_assert_held()?

> >
> >> following race:
> >>
> >> Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg
> >> cache has already been created by another thread, so this function
> >> should do nothing.
> >>
> >> Cpu0    Cpu1
> >> ----    ----
> >>         B1
> >> A1              we haven't initialized memcg_params yet so Cpu0 will
> >>                 proceed to A2 to alloc and assign a new cache
> >> A2
> >>         B2      Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2
> >>                 - a memory leak?
> >>         B3
> >>
> >> I'd like to add that even if I'm right about the race, it is not
> >> that critical, because memcg_update_cache_sizes() is called very rarely.
> >>
> > Every race is critical.
> >
> > So, I am a bit lost by your description. Get back to me if I got
> > anything wrong, but I think the point you're missing is that all
> > heavy slab operations take the slab_mutex underneath, and that
> > includes cache creation and update.
> >
> >
> >> BTW, it seems to me that the way we update memcg_params in
> >> memcg_update_cache_size() makes cache_from_memcg_idx() prone to
> >> use-after-free:
> >>
> >>> static inline struct kmem_cache *
> >>> cache_from_memcg_idx(struct kmem_cache *s, int idx)
> >>> {
> >>>     if (!s->memcg_params)
> >>>         return NULL;
> >>>     return s->memcg_params->memcg_caches[idx];
> >>> }
> >> This is equivalent to
> >>
> >> 1) struct memcg_cache_params *params = s->memcg_params;
> >> 2) return params->memcg_caches[idx];
> >>
> >> If memcg_update_cache_size() is executed between steps 1 and 2 on
> >> another CPU, at step 2 we will dereference memcg_params that has already
> >> been freed. This is very unlikely, but still possible. Perhaps, we
> >> should free old memcg params only after a sync_rcu()?
> >>
> > You seem to be right on this one. Indeed, if my mind does not
> > betray me, that is how I freed the LRUs (with synchronize_rcu()).
> 
> Yes, you freed LRUs only after a sync_rcu, that's why the way
> memcg_params is updated looks suspicious to me. I'll try to fix it then.

I have quickly glanced through cache_from_memcg_idx users and the
locking is quite inconsistent.
* mem_cgroup_slabinfo_read - memcg->slab_caches_mutex
* memcg_create_kmem_cache - memcg_cache_mutex
* kmem_cache_destroy_memcg_children - set_limit_mutex
* __memcg_kmem_get_cache - RCU read lock

I was afraid to go further...

memcg_update_cache_size() and kmem_cache_create_memcg() seem to be the
only ones that touch memcg_params, and they rely on the slab_mutex. The
latter touches a cache which is not visible yet, so it should be safe.

I didn't look closer at all the callers of cache_from_memcg_idx but I
guess we want to use RCU here.
-- 
Michal Hocko
SUSE Labs


* Re: Race in memcg kmem?
  2013-12-12 13:21           ` Michal Hocko
@ 2013-12-12 13:39             ` Vladimir Davydov
From: Vladimir Davydov @ 2013-12-12 13:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, Johannes Weiner, LKML, cgroups, devel,
	Balbir Singh, KAMEZAWA Hiroyuki

On 12/12/2013 05:21 PM, Michal Hocko wrote:
> On Wed 11-12-13 10:22:06, Vladimir Davydov wrote:
>> On 12/11/2013 03:13 AM, Glauber Costa wrote:
>>> On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
> [...]
>>>> -- memcg_update_cache_size(s, num_groups) --
>>>> grows s->memcg_params to accommodate data for num_groups memcgs
>>>> @s is the root cache whose memcg_params we want to grow
>>>> @num_groups is the new number of kmem-active cgroups (defines the new
>>>> size of memcg_params array).
>>>>
>>>> The function:
>>>>
>>>> B1) allocates and installs a new memcg_params array:
>>>>     cur_params = s->memcg_params;
>>>>     s->memcg_params = kzalloc(size, GFP_KERNEL);
>>>>
>>>> B2) copies per-memcg cache ptrs from the old memcg_params array to the
>>>> new one:
>>>>     for (i = 0; i < memcg_limited_groups_array_size; i++) {
>>>>         if (!cur_params->memcg_caches[i])
>>>>             continue;
>>>>         s->memcg_params->memcg_caches[i] =
>>>>                     cur_params->memcg_caches[i];
>>>>     }
>>>>
>>>> B3) frees the old array:
>>>>     kfree(cur_params);
>>>>
>>>>
>>>> Since these two functions do not share any mutexes, we can get the
>>> They do share a mutex, the slab mutex.
> Worth sticking in a lockdep_assert_held()?

AFAIU, lockdep_assert_held() is not applicable here:
memcg_create_kmem_cache() is called w/o the slab_mutex held, but it
calls kmem_cache_create_memcg(), which takes and releases this mutex,
working as a barrier. Placing lockdep_assert_held() into the latter
won't make things any clearer. IMO, we need a good, detailed comment in
memcg_create_kmem_cache() proving its correctness.

Thanks.


* Re: [Devel] Race in memcg kmem?
@ 2013-12-18  7:51                 ` Vladimir Davydov
From: Vladimir Davydov @ 2013-12-18  7:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, Balbir Singh, LKML, KAMEZAWA Hiroyuki,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On 12/12/2013 05:39 PM, Vladimir Davydov wrote:
> On 12/12/2013 05:21 PM, Michal Hocko wrote:
>> On Wed 11-12-13 10:22:06, Vladimir Davydov wrote:
>>> On 12/11/2013 03:13 AM, Glauber Costa wrote:
>>>> On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov
>> [...]
>>>>> -- memcg_update_cache_size(s, num_groups) --
>>>>> grows s->memcg_params to accommodate data for num_groups memcgs
>>>>> @s is the root cache whose memcg_params we want to grow
>>>>> @num_groups is the new number of kmem-active cgroups (defines the new
>>>>> size of memcg_params array).
>>>>>
>>>>> The function:
>>>>>
>>>>> B1) allocates and installs a new memcg_params array:
>>>>>     cur_params = s->memcg_params;
>>>>>     s->memcg_params = kzalloc(size, GFP_KERNEL);
>>>>>
>>>>> B2) copies per-memcg cache ptrs from the old memcg_params array to the
>>>>> new one:
>>>>>     for (i = 0; i < memcg_limited_groups_array_size; i++) {
>>>>>         if (!cur_params->memcg_caches[i])
>>>>>             continue;
>>>>>         s->memcg_params->memcg_caches[i] =
>>>>>                     cur_params->memcg_caches[i];
>>>>>     }
>>>>>
>>>>> B3) frees the old array:
>>>>>     kfree(cur_params);
>>>>>
>>>>>
>>>>> Since these two functions do not share any mutexes, we can get the
>>>> They do share a mutex, the slab mutex.
>> Worth sticking in a lockdep_assert_held()?
> AFAIU, lockdep_assert_held() is not applicable here:
> memcg_create_kmem_cache() is called w/o the slab_mutex held, but it
> calls kmem_cache_create_memcg(), which takes and releases this mutex,
> working as a barrier. Placing lockdep_assert_held() into the latter
> won't make things any clearer. IMO, we need a good, detailed comment
> in memcg_create_kmem_cache() proving its correctness.

After a bit of thinking on the comment explaining why the race is
impossible I seem to have found another one in these two functions.

Assume two threads schedule kmem_cache creation works for the same
kmem_cache of the same memcg from __memcg_kmem_get_cache(). One of the
works successfully creates it. The other work should then fail, but if
it interleaves with memcg_update_cache_size() as follows, it does not:

memcg_create_kmem_cache()                      memcg_update_cache_size()
(called w/o mutexes held)                      (called with slab_mutex held)
-------------------------                      -------------------------
mutex_lock(&memcg_cache_mutex)
                                               s->memcg_params = kzalloc(...)
new_cachep = cache_from_memcg_idx(cachep, idx)
// new_cachep == NULL => proceed to creation
                                               // initialize s->memcg_params;
                                               // sets s->memcg_params
                                               //     ->memcg_caches[idx]
new_cachep = kmem_cache_dup(memcg, cachep)
// nothing prevents kmem_cache_dup from
// succeeding so ...
cachep->memcg_params->memcg_caches[idx] = new_cachep
// we've overwritten an existing cache ptr!

slab_mutex won't help here...

Anyway, I'm going to move the check and initialization of
memcg_caches[idx] from memcg_create_kmem_cache() to
kmem_cache_create_memcg() under the slab_mutex, eliminating any
possibility of a race there. I will send the patch soon.
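The intended fix (doing the lookup and the installation under a single
mutex) can be sketched in userland C. get_or_create_cache() is a
hypothetical name and malloc() stands in for kmem_cache_dup(); this is
a sketch of the locking pattern, not the actual patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *memcg_caches[4];

/* Check and install under the same mutex that serializes resizes of
 * the array: the lookup and the assignment can no longer interleave
 * with an update, so at most one creation can ever win. */
static void *get_or_create_cache(int idx)
{
    void *cachep;

    pthread_mutex_lock(&slab_mutex);
    cachep = memcg_caches[idx];
    if (!cachep) {
        cachep = malloc(16);        /* stand-in for kmem_cache_dup() */
        memcg_caches[idx] = cachep;
    }
    pthread_mutex_unlock(&slab_mutex);
    return cachep;
}
```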

Thanks.

