All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov@parallels.com>
To: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@gmail.com>, Michal Hocko <mhocko@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, devel@openvz.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Pekka Enberg <penberg@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 4/6] memcg, slab: check and init memcg_cahes under slab_mutex
Date: Thu, 19 Dec 2013 13:21:46 +0400	[thread overview]
Message-ID: <52B2BAAA.40801@parallels.com> (raw)
In-Reply-To: <CAA6-i6r=hW+Y2+kdKME=GTWN6sCbi37kh4sX5dT3AKkatpQzGg@mail.gmail.com>

Hi, Christoph

We have a problem with memcg-vs-slab interactions. Currently we set the
pointer to a new kmem_cache in its parent's memcg_caches array inside
memcg_create_kmem_cache() (mm/memcontrol.c):

memcg_create_kmem_cache():
    new_cachep = cache_from_memcg_idx(cachep, idx);
    if (new_cachep)
        goto out;
    new_cachep = kmem_cache_dup(memcg, cachep);
    cachep->memcg_params->memcg_caches[idx] = new_cachep;

It seems to be prone to a race as explained in the comment to this
patch. To fix the race, we need to move the assignment of new_cachep to
memcg_caches[idx] to be called under the slab_mutex protection.

There are basically two ways of doing this:

1. Move the assignment to kmem_cache_create_memcg() defined in
mm/slab.c. This is how this patch handles it.
2. Move taking of the slab_mutex, along with some memcg-specific
initialization bits, from kmem_cache_create_memcg() to
memcg_create_kmem_cache().

The second way, although looks clearer, will break the convention not to
take the slab_mutex inside mm/memcontrol.c, Glauber tried to observe
while implementing kmemcg.

So the question is: what do you think about taking the slab_mutex
directly from mm/memcontrol.c w/o using helper functions (confusing call
paths, IMO)?

Thanks.

On 12/19/2013 12:00 PM, Glauber Costa wrote:
> On Thu, Dec 19, 2013 at 11:07 AM, Vladimir Davydov
> <vdavydov@parallels.com> wrote:
>> On 12/18/2013 09:41 PM, Michal Hocko wrote:
>>> On Wed 18-12-13 17:16:55, Vladimir Davydov wrote:
>>>> The memcg_params::memcg_caches array can be updated concurrently from
>>>> memcg_update_cache_size() and memcg_create_kmem_cache(). Although both
>>>> of these functions take the slab_mutex during their operation, the
>>>> latter checks if memcg's cache has already been allocated w/o taking the
>>>> mutex. This can result in a race as described below.
>>>>
>>>> Asume two threads schedule kmem_cache creation works for the same
>>>> kmem_cache of the same memcg from __memcg_kmem_get_cache(). One of the
>>>> works successfully creates it. Another work should fail then, but if it
>>>> interleaves with memcg_update_cache_size() as follows, it does not:
>>> I am not sure I understand the race. memcg_update_cache_size is called
>>> when we start accounting a new memcg or a child is created and it
>>> inherits accounting from the parent. memcg_create_kmem_cache is called
>>> when a new cache is first allocated from, right?
>> memcg_update_cache_size() is called when kmem accounting is activated
>> for a memcg, no matter how.
>>
>> memcg_create_kmem_cache() is scheduled from __memcg_kmem_get_cache().
>> It's OK to have a bunch of such methods trying to create the same memcg
>> cache concurrently, but only one of them should succeed.
>>
>>> Why cannot we simply take slab_mutex inside memcg_create_kmem_cache?
>>> it is running from the workqueue context so it should clash with other
>>> locks.
>> Hmm, Glauber's code never takes the slab_mutex inside memcontrol.c. I
>> have always been wondering why, because it could simplify flow paths
>> significantly (e.g. update_cache_sizes() -> update_all_caches() ->
>> update_cache_size() - from memcontrol.c to slab_common.c and back again
>> just to take the mutex).
>>
> Because that is a layering violation and exposes implementation
> details of the slab to
> the outside world. I agree this would make things a lot simpler, but
> please check with Christoph
> if this is acceptable before going forward.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Davydov <vdavydov@parallels.com>
To: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@gmail.com>, Michal Hocko <mhocko@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<cgroups@vger.kernel.org>, <devel@openvz.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Pekka Enberg <penberg@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 4/6] memcg, slab: check and init memcg_cahes under slab_mutex
Date: Thu, 19 Dec 2013 13:21:46 +0400	[thread overview]
Message-ID: <52B2BAAA.40801@parallels.com> (raw)
In-Reply-To: <CAA6-i6r=hW+Y2+kdKME=GTWN6sCbi37kh4sX5dT3AKkatpQzGg@mail.gmail.com>

Hi, Christoph

We have a problem with memcg-vs-slab interactions. Currently we set the
pointer to a new kmem_cache in its parent's memcg_caches array inside
memcg_create_kmem_cache() (mm/memcontrol.c):

memcg_create_kmem_cache():
    new_cachep = cache_from_memcg_idx(cachep, idx);
    if (new_cachep)
        goto out;
    new_cachep = kmem_cache_dup(memcg, cachep);
    cachep->memcg_params->memcg_caches[idx] = new_cachep;

It seems to be prone to a race as explained in the comment to this
patch. To fix the race, we need to move the assignment of new_cachep to
memcg_caches[idx] to be called under the slab_mutex protection.

There are basically two ways of doing this:

1. Move the assignment to kmem_cache_create_memcg() defined in
mm/slab.c. This is how this patch handles it.
2. Move taking of the slab_mutex, along with some memcg-specific
initialization bits, from kmem_cache_create_memcg() to
memcg_create_kmem_cache().

The second way, although looks clearer, will break the convention not to
take the slab_mutex inside mm/memcontrol.c, Glauber tried to observe
while implementing kmemcg.

So the question is: what do you think about taking the slab_mutex
directly from mm/memcontrol.c w/o using helper functions (confusing call
paths, IMO)?

Thanks.

On 12/19/2013 12:00 PM, Glauber Costa wrote:
> On Thu, Dec 19, 2013 at 11:07 AM, Vladimir Davydov
> <vdavydov@parallels.com> wrote:
>> On 12/18/2013 09:41 PM, Michal Hocko wrote:
>>> On Wed 18-12-13 17:16:55, Vladimir Davydov wrote:
>>>> The memcg_params::memcg_caches array can be updated concurrently from
>>>> memcg_update_cache_size() and memcg_create_kmem_cache(). Although both
>>>> of these functions take the slab_mutex during their operation, the
>>>> latter checks if memcg's cache has already been allocated w/o taking the
>>>> mutex. This can result in a race as described below.
>>>>
>>>> Asume two threads schedule kmem_cache creation works for the same
>>>> kmem_cache of the same memcg from __memcg_kmem_get_cache(). One of the
>>>> works successfully creates it. Another work should fail then, but if it
>>>> interleaves with memcg_update_cache_size() as follows, it does not:
>>> I am not sure I understand the race. memcg_update_cache_size is called
>>> when we start accounting a new memcg or a child is created and it
>>> inherits accounting from the parent. memcg_create_kmem_cache is called
>>> when a new cache is first allocated from, right?
>> memcg_update_cache_size() is called when kmem accounting is activated
>> for a memcg, no matter how.
>>
>> memcg_create_kmem_cache() is scheduled from __memcg_kmem_get_cache().
>> It's OK to have a bunch of such methods trying to create the same memcg
>> cache concurrently, but only one of them should succeed.
>>
>>> Why cannot we simply take slab_mutex inside memcg_create_kmem_cache?
>>> it is running from the workqueue context so it should clash with other
>>> locks.
>> Hmm, Glauber's code never takes the slab_mutex inside memcontrol.c. I
>> have always been wondering why, because it could simplify flow paths
>> significantly (e.g. update_cache_sizes() -> update_all_caches() ->
>> update_cache_size() - from memcontrol.c to slab_common.c and back again
>> just to take the mutex).
>>
> Because that is a layering violation and exposes implementation
> details of the slab to
> the outside world. I agree this would make things a lot simpler, but
> please check with Christoph
> if this is acceptable before going forward.

  parent reply	other threads:[~2013-12-19  9:21 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18 13:16 [PATCH 1/6] slab: cleanup kmem_cache_create_memcg() Vladimir Davydov
2013-12-18 13:16 ` Vladimir Davydov
2013-12-18 13:16 ` Vladimir Davydov
2013-12-18 13:16 ` [PATCH 2/6] memcg, slab: kmem_cache_create_memcg(): free memcg params on error Vladimir Davydov
2013-12-18 13:16   ` Vladimir Davydov
     [not found]   ` <9420ad797a2cfa14c23ad1ba6db615a2a51ffee0.1387372122.git.vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-18 17:06     ` Michal Hocko
2013-12-18 17:06       ` Michal Hocko
2013-12-18 17:06       ` Michal Hocko
     [not found]       ` <20131218170649.GC31080-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  6:32         ` Vladimir Davydov
2013-12-19  6:32           ` Vladimir Davydov
2013-12-19  6:32           ` Vladimir Davydov
2013-12-19  8:48           ` Michal Hocko
2013-12-19  8:48             ` Michal Hocko
2013-12-19  9:01             ` Vladimir Davydov
2013-12-19  9:01               ` Vladimir Davydov
2013-12-19  9:19               ` Michal Hocko
2013-12-19  9:19                 ` Michal Hocko
2013-12-18 13:16 ` [PATCH 3/6] memcg, slab: cleanup barrier usage when accessing memcg_caches Vladimir Davydov
2013-12-18 13:16   ` Vladimir Davydov
2013-12-18 17:14   ` Michal Hocko
2013-12-18 17:14     ` Michal Hocko
2013-12-19  6:37     ` Vladimir Davydov
2013-12-19  6:37       ` Vladimir Davydov
2013-12-19  9:10       ` Michal Hocko
2013-12-19  9:10         ` Michal Hocko
2013-12-19  9:16         ` Vladimir Davydov
2013-12-19  9:16           ` Vladimir Davydov
2013-12-19  9:21           ` Michal Hocko
2013-12-19  9:21             ` Michal Hocko
     [not found]             ` <20131219092137.GG9331-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  9:29               ` Vladimir Davydov
2013-12-19  9:29                 ` Vladimir Davydov
2013-12-19  9:29                 ` Vladimir Davydov
2013-12-19  9:36                 ` Michal Hocko
2013-12-19  9:36                   ` Michal Hocko
     [not found]                   ` <20131219093619.GA10855-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  9:53                     ` Vladimir Davydov
2013-12-19  9:53                       ` Vladimir Davydov
2013-12-19  9:53                       ` Vladimir Davydov
2013-12-18 13:16 ` [PATCH 4/6] memcg, slab: check and init memcg_cahes under slab_mutex Vladimir Davydov
2013-12-18 13:16   ` Vladimir Davydov
2013-12-18 17:41   ` Michal Hocko
2013-12-18 17:41     ` Michal Hocko
     [not found]     ` <20131218174105.GE31080-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  7:07       ` Vladimir Davydov
2013-12-19  7:07         ` Vladimir Davydov
2013-12-19  7:07         ` Vladimir Davydov
     [not found]         ` <52B29B2F.7050909-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-19  8:00           ` Glauber Costa
2013-12-19  8:00             ` Glauber Costa
2013-12-19  8:00             ` Glauber Costa
2013-12-19  9:12             ` Michal Hocko
2013-12-19  9:12               ` Michal Hocko
2013-12-19  9:17               ` Vladimir Davydov
2013-12-19  9:17                 ` Vladimir Davydov
2013-12-19  9:21             ` Vladimir Davydov [this message]
2013-12-19  9:21               ` Vladimir Davydov
2013-12-18 13:16 ` [PATCH 5/6] memcg: clear memcg_params after removing cache from memcg_slab_caches list Vladimir Davydov
2013-12-18 13:16   ` Vladimir Davydov
2013-12-18 13:16 ` [PATCH 6/6] memcg, slab: RCU protect memcg_params for root caches Vladimir Davydov
2013-12-18 13:16   ` Vladimir Davydov
2013-12-19  9:28   ` Michal Hocko
2013-12-19  9:28     ` Michal Hocko
2013-12-19  9:36     ` Vladimir Davydov
2013-12-19  9:36       ` Vladimir Davydov
     [not found]       ` <52B2BE2A.2080509-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-19  9:43         ` Michal Hocko
2013-12-19  9:43           ` Michal Hocko
2013-12-19  9:43           ` Michal Hocko
     [not found]           ` <20131219094333.GB10855-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  9:47             ` Vladimir Davydov
2013-12-19  9:47               ` Vladimir Davydov
2013-12-19  9:47               ` Vladimir Davydov
2013-12-19 10:06               ` Michal Hocko
2013-12-19 10:06                 ` Michal Hocko
2013-12-18 16:56 ` [PATCH 1/6] slab: cleanup kmem_cache_create_memcg() Michal Hocko
2013-12-18 16:56   ` Michal Hocko
     [not found]   ` <20131218165603.GB31080-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  6:31     ` Vladimir Davydov
2013-12-19  6:31       ` Vladimir Davydov
2013-12-19  6:31       ` Vladimir Davydov
2013-12-19  8:44       ` Michal Hocko
2013-12-19  8:44         ` Michal Hocko
     [not found]         ` <20131219084447.GA9331-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-12-19  8:51           ` Vladimir Davydov
2013-12-19  8:51             ` Vladimir Davydov
2013-12-19  8:51             ` Vladimir Davydov
2013-12-19  9:16             ` Michal Hocko
2013-12-19  9:16               ` Michal Hocko
     [not found] ` <6f02b2d079ffd0990ae335339c803337b13ecd8c.1387372122.git.vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-19  7:27   ` Pekka Enberg
2013-12-19  7:27     ` Pekka Enberg
2013-12-19  7:27     ` Pekka Enberg
2013-12-19  8:17 ` [Devel] " Vasily Averin
2013-12-19  8:17   ` Vasily Averin
     [not found]   ` <52B2AB7C.1010803-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-19  8:39     ` Vladimir Davydov
2013-12-19  8:39       ` Vladimir Davydov
2013-12-19  8:39       ` Vladimir Davydov
     [not found]       ` <52B2B0A4.8050009-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-12-19  9:26         ` Vasily Averin
2013-12-19  9:26           ` Vasily Averin
2013-12-19  9:26           ` Vasily Averin
2013-12-19  9:42           ` Vladimir Davydov
2013-12-19  9:42             ` Vladimir Davydov
2013-12-19  9:45           ` Michal Hocko
2013-12-19  9:45             ` Michal Hocko
2013-12-19 10:23           ` Pekka Enberg
2013-12-19 10:23             ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52B2BAAA.40801@parallels.com \
    --to=vdavydov@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=devel@openvz.org \
    --cc=glommer@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=penberg@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.