[PATCH] memcg: take reference before releasing rcu_read

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] memcg: take reference before releasing rcu_read_lock
@ 2013-03-29 10:28 Li Zefan
  2013-03-29 10:48 ` Glauber Costa
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Li Zefan @ 2013-03-29 10:28 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, KAMEZAWA Hiroyuki, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

The memcg is not referenced, so it can be destroyed at anytime right
after we exit rcu read section, so it's not safe to access it.

To fix this, we call css_tryget() to get a reference while we're still
in rcu read section.

This also removes a bogus comment above __memcg_create_cache_enqueue().

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bbe0742..01fe340 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3456,7 +3456,6 @@ static void memcg_create_cache_work_func(struct work_struct *w)
 
 /*
  * Enqueue the creation of a per-memcg kmem_cache.
- * Called with rcu_read_lock.
  */
 static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
 					 struct kmem_cache *cachep)
@@ -3464,12 +3463,8 @@ static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
 	struct create_work *cw;
 
 	cw = kmalloc(sizeof(struct create_work), GFP_NOWAIT);
-	if (cw == NULL)
-		return;
-
-	/* The corresponding put will be done in the workqueue. */
-	if (!css_tryget(&memcg->css)) {
-		kfree(cw);
+	if (cw == NULL) {
+		css_put(&memcg->css);
 		return;
 	}
 
@@ -3525,10 +3520,9 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(rcu_dereference(current->mm->owner));
-	rcu_read_unlock();
 
 	if (!memcg_can_account_kmem(memcg))
-		return cachep;
+		goto out;
 
 	idx = memcg_cache_id(memcg);
 
@@ -3537,29 +3531,38 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
 	 * code updating memcg_caches will issue a write barrier to match this.
 	 */
 	read_barrier_depends();
-	if (unlikely(cachep->memcg_params->memcg_caches[idx] == NULL)) {
-		/*
-		 * If we are in a safe context (can wait, and not in interrupt
-		 * context), we could be be predictable and return right away.
-		 * This would guarantee that the allocation being performed
-		 * already belongs in the new cache.
-		 *
-		 * However, there are some clashes that can arrive from locking.
-		 * For instance, because we acquire the slab_mutex while doing
-		 * kmem_cache_dup, this means no further allocation could happen
-		 * with the slab_mutex held.
-		 *
-		 * Also, because cache creation issue get_online_cpus(), this
-		 * creates a lock chain: memcg_slab_mutex -> cpu_hotplug_mutex,
-		 * that ends up reversed during cpu hotplug. (cpuset allocates
-		 * a bunch of GFP_KERNEL memory during cpuup). Due to all that,
-		 * better to defer everything.
-		 */
-		memcg_create_cache_enqueue(memcg, cachep);
-		return cachep;
+	if (likely(cachep->memcg_params->memcg_caches[idx])) {
+		cachep = cachep->memcg_params->memcg_caches[idx];
+		goto out;
 	}
 
-	return cachep->memcg_params->memcg_caches[idx];
+	/* The corresponding put will be done in the workqueue. */
+	if (!css_tryget(&memcg->css))
+		goto out;
+	rcu_read_unlock();
+
+	/*
+	 * If we are in a safe context (can wait, and not in interrupt
+	 * context), we could be be predictable and return right away.
+	 * This would guarantee that the allocation being performed
+	 * already belongs in the new cache.
+	 *
+	 * However, there are some clashes that can arrive from locking.
+	 * For instance, because we acquire the slab_mutex while doing
+	 * kmem_cache_dup, this means no further allocation could happen
+	 * with the slab_mutex held.
+	 *
+	 * Also, because cache creation issue get_online_cpus(), this
+	 * creates a lock chain: memcg_slab_mutex -> cpu_hotplug_mutex,
+	 * that ends up reversed during cpu hotplug. (cpuset allocates
+	 * a bunch of GFP_KERNEL memory during cpuup). Due to all that,
+	 * better to defer everything.
+	 */
+	memcg_create_cache_enqueue(memcg, cachep);
+	return cachep;
+out:
+	rcu_read_unlock();
+	return cachep;
 }
 EXPORT_SYMBOL(__memcg_kmem_get_cache);
 
-- 
1.8.0.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] memcg: take reference before releasing rcu_read_lock
  2013-03-29 10:28 [PATCH] memcg: take reference before releasing rcu_read_lock Li Zefan
@ 2013-03-29 10:48 ` Glauber Costa
  2013-03-30  0:35   ` Li Zefan
  2013-03-29 14:59 ` Michal Hocko
  2013-04-01  5:03 ` Kamezawa Hiroyuki
  2 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2013-03-29 10:48 UTC (permalink / raw)
  To: Li Zefan
  Cc: Michal Hocko, KAMEZAWA Hiroyuki, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

On 03/29/2013 02:28 PM, Li Zefan wrote:
> The memcg is not referenced, so it can be destroyed at anytime right
> after we exit rcu read section, so it's not safe to access it.
> 
> To fix this, we call css_tryget() to get a reference while we're still
> in rcu read section.
> 
> This also removes a bogus comment above __memcg_create_cache_enqueue().
> 
Out of curiosity, did you see that happening ?

Theoretically, the race you describe seem real, and the fix is sound.

Acked-by: Glauber Costa <glommer@parallels.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] memcg: take reference before releasing rcu_read_lock
  2013-03-29 10:48 ` Glauber Costa
@ 2013-03-30  0:35   ` Li Zefan
  2013-04-01  7:24     ` Glauber Costa
  0 siblings, 1 reply; 6+ messages in thread
From: Li Zefan @ 2013-03-30  0:35 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, KAMEZAWA Hiroyuki, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

On 2013/3/29 18:48, Glauber Costa wrote:
> On 03/29/2013 02:28 PM, Li Zefan wrote:
>> The memcg is not referenced, so it can be destroyed at anytime right
>> after we exit rcu read section, so it's not safe to access it.
>>
>> To fix this, we call css_tryget() to get a reference while we're still
>> in rcu read section.
>>
>> This also removes a bogus comment above __memcg_create_cache_enqueue().
>>
> Out of curiosity, did you see that happening ?
> 

Just by code inspection. This is not the only place you use RCU in this
wrong way. Remember the last patch I sent? ;)

> Theoretically, the race you describe seem real, and the fix is sound.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] memcg: take reference before releasing rcu_read_lock
  2013-03-30  0:35   ` Li Zefan
@ 2013-04-01  7:24     ` Glauber Costa
  0 siblings, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2013-04-01  7:24 UTC (permalink / raw)
  To: Li Zefan
  Cc: Michal Hocko, KAMEZAWA Hiroyuki, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

On 03/30/2013 04:35 AM, Li Zefan wrote:
> On 2013/3/29 18:48, Glauber Costa wrote:
>> On 03/29/2013 02:28 PM, Li Zefan wrote:
>>> The memcg is not referenced, so it can be destroyed at anytime right
>>> after we exit rcu read section, so it's not safe to access it.
>>>
>>> To fix this, we call css_tryget() to get a reference while we're still
>>> in rcu read section.
>>>
>>> This also removes a bogus comment above __memcg_create_cache_enqueue().
>>>
>> Out of curiosity, did you see that happening ?
>>
> 
> Just by code inspection. This is not the only place you use RCU in this
> wrong way. Remember the last patch I sent? ;)
> 
Indeed, that is what happens with miscomprehensions: the mistake tends
to be repeated. Thanks for your diligence with this.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] memcg: take reference before releasing rcu_read_lock
  2013-03-29 10:28 [PATCH] memcg: take reference before releasing rcu_read_lock Li Zefan
  2013-03-29 10:48 ` Glauber Costa
@ 2013-03-29 14:59 ` Michal Hocko
  2013-04-01  5:03 ` Kamezawa Hiroyuki
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2013-03-29 14:59 UTC (permalink / raw)
  To: Li Zefan
  Cc: Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

On Fri 29-03-13 18:28:57, Li Zefan wrote:
> The memcg is not referenced, so it can be destroyed at anytime right
> after we exit rcu read section, so it's not safe to access it.
> 
> To fix this, we call css_tryget() to get a reference while we're still
> in rcu read section.
> 
> This also removes a bogus comment above __memcg_create_cache_enqueue().
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>

Looks good to me.
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bbe0742..01fe340 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3456,7 +3456,6 @@ static void memcg_create_cache_work_func(struct work_struct *w)
>  
>  /*
>   * Enqueue the creation of a per-memcg kmem_cache.
> - * Called with rcu_read_lock.
>   */
>  static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>  					 struct kmem_cache *cachep)
> @@ -3464,12 +3463,8 @@ static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>  	struct create_work *cw;
>  
>  	cw = kmalloc(sizeof(struct create_work), GFP_NOWAIT);
> -	if (cw == NULL)
> -		return;
> -
> -	/* The corresponding put will be done in the workqueue. */
> -	if (!css_tryget(&memcg->css)) {
> -		kfree(cw);
> +	if (cw == NULL) {
> +		css_put(&memcg->css);
>  		return;
>  	}
>  
> @@ -3525,10 +3520,9 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>  
>  	rcu_read_lock();
>  	memcg = mem_cgroup_from_task(rcu_dereference(current->mm->owner));
> -	rcu_read_unlock();
>  
>  	if (!memcg_can_account_kmem(memcg))
> -		return cachep;
> +		goto out;
>  
>  	idx = memcg_cache_id(memcg);
>  
> @@ -3537,29 +3531,38 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>  	 * code updating memcg_caches will issue a write barrier to match this.
>  	 */
>  	read_barrier_depends();
> -	if (unlikely(cachep->memcg_params->memcg_caches[idx] == NULL)) {
> -		/*
> -		 * If we are in a safe context (can wait, and not in interrupt
> -		 * context), we could be be predictable and return right away.
> -		 * This would guarantee that the allocation being performed
> -		 * already belongs in the new cache.
> -		 *
> -		 * However, there are some clashes that can arrive from locking.
> -		 * For instance, because we acquire the slab_mutex while doing
> -		 * kmem_cache_dup, this means no further allocation could happen
> -		 * with the slab_mutex held.
> -		 *
> -		 * Also, because cache creation issue get_online_cpus(), this
> -		 * creates a lock chain: memcg_slab_mutex -> cpu_hotplug_mutex,
> -		 * that ends up reversed during cpu hotplug. (cpuset allocates
> -		 * a bunch of GFP_KERNEL memory during cpuup). Due to all that,
> -		 * better to defer everything.
> -		 */
> -		memcg_create_cache_enqueue(memcg, cachep);
> -		return cachep;
> +	if (likely(cachep->memcg_params->memcg_caches[idx])) {
> +		cachep = cachep->memcg_params->memcg_caches[idx];
> +		goto out;
>  	}
>  
> -	return cachep->memcg_params->memcg_caches[idx];
> +	/* The corresponding put will be done in the workqueue. */
> +	if (!css_tryget(&memcg->css))
> +		goto out;
> +	rcu_read_unlock();
> +
> +	/*
> +	 * If we are in a safe context (can wait, and not in interrupt
> +	 * context), we could be be predictable and return right away.
> +	 * This would guarantee that the allocation being performed
> +	 * already belongs in the new cache.
> +	 *
> +	 * However, there are some clashes that can arrive from locking.
> +	 * For instance, because we acquire the slab_mutex while doing
> +	 * kmem_cache_dup, this means no further allocation could happen
> +	 * with the slab_mutex held.
> +	 *
> +	 * Also, because cache creation issue get_online_cpus(), this
> +	 * creates a lock chain: memcg_slab_mutex -> cpu_hotplug_mutex,
> +	 * that ends up reversed during cpu hotplug. (cpuset allocates
> +	 * a bunch of GFP_KERNEL memory during cpuup). Due to all that,
> +	 * better to defer everything.
> +	 */
> +	memcg_create_cache_enqueue(memcg, cachep);
> +	return cachep;
> +out:
> +	rcu_read_unlock();
> +	return cachep;
>  }
>  EXPORT_SYMBOL(__memcg_kmem_get_cache);
>  
> -- 
> 1.8.0.2

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] memcg: take reference before releasing rcu_read_lock
  2013-03-29 10:28 [PATCH] memcg: take reference before releasing rcu_read_lock Li Zefan
  2013-03-29 10:48 ` Glauber Costa
  2013-03-29 14:59 ` Michal Hocko
@ 2013-04-01  5:03 ` Kamezawa Hiroyuki
  2 siblings, 0 replies; 6+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-01  5:03 UTC (permalink / raw)
  To: Li Zefan
  Cc: Glauber Costa, Michal Hocko, Johannes Weiner, LKML, Cgroups,
	linux-mm, Andrew Morton

(2013/03/29 19:28), Li Zefan wrote:
> The memcg is not referenced, so it can be destroyed at anytime right
> after we exit rcu read section, so it's not safe to access it.
> 
> To fix this, we call css_tryget() to get a reference while we're still
> in rcu read section.
> 
> This also removes a bogus comment above __memcg_create_cache_enqueue().
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-01  7:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-29 10:28 [PATCH] memcg: take reference before releasing rcu_read_lock Li Zefan
2013-03-29 10:48 ` Glauber Costa
2013-03-30  0:35   ` Li Zefan
2013-04-01  7:24     ` Glauber Costa
2013-03-29 14:59 ` Michal Hocko
2013-04-01  5:03 ` Kamezawa Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).