From: Glauber Costa <glommer@parallels.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
	Tejun Heo <tj@kernel.org>,
	kamezawa.hiroyu@jp.fujitsu.com,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
Date: Tue, 4 Dec 2012 12:05:21 +0400
Message-ID: <50BDAEC1.8040805@parallels.com>
In-Reply-To: <20121203173205.GI17093@dhcp22.suse.cz>

On 12/03/2012 09:32 PM, Michal Hocko wrote:
> On Fri 30-11-12 17:31:25, Glauber Costa wrote:
>> Although there is arguably some value in doing this per se, the main
>> goal of this patch is to make room for the locking changes to come.
>>
>> With all the value assignment from parent happening in a context where
>> our iterators can already be used, we can safely lock against value
>> change in some key values like use_hierarchy, without resorting to the
>> cgroup core at all.
> 
> I am sorry but I really do not get why online_css callback is more
> appropriate. Quite contrary. With this change iterators can see a group
> which is not fully initialized which calls for a problem (even though it
> is not one yet).

But it should be extremely easy to protect against this. It is just a
matter of not returning css that are not yet online in the iterator:
then we'll never see them until they are online. This also sounds a lot
more correct than returning css that are merely allocated.
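
A minimal userspace sketch of that filtering idea, assuming a flat list
and an explicit "online" flag. struct toy_css, toy_skip_offline() and
toy_count_visible() are hypothetical names for illustration only; this
is not the kernel iterator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a css; not the kernel's structure. */
struct toy_css {
	bool online;            /* set once the css_online-like step is done */
	struct toy_css *next;   /* flat list, for illustration only */
};

/* Return the first online entry at or after pos, or NULL if none. */
static struct toy_css *toy_skip_offline(struct toy_css *pos)
{
	while (pos && !pos->online)
		pos = pos->next;
	return pos;
}

/* Count the entries a walker would visit when offline ones are hidden. */
static int toy_count_visible(struct toy_css *head)
{
	struct toy_css *pos;
	int n = 0;

	for (pos = toy_skip_offline(head); pos; pos = toy_skip_offline(pos->next)) {
		assert(pos->online);   /* a walker never sees a half-built group */
		n++;
	}
	return n;
}
```

With three entries of which only the last is still offline, the walker
visits two; once the last one is marked online it visits all three.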


> Could you be more specific why we cannot keep the initialization in
> mem_cgroup_css_alloc? We can lock there as well, no?
> 
Because we need the parent's value of things like use_hierarchy and
oom_control not to change after it has been copied to a child.

If we do it in css_alloc, the iterators won't be working yet - nor will
the cgrp->children list, for that matter - and we risk a situation
where another thread thinks no children exist and flips use_hierarchy
to 1 (or oom_control, etc.) right after the children already got the
value of 0.
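
To make the interleaving concrete, here is a single-threaded replay of
it as a userspace sketch. Everything in it (struct toy_memcg,
nr_visible_children, toy_replay()) is a hypothetical stand-in, and the
"writer" is assumed to refuse the change whenever it can see a child.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy group; field names only echo the discussion. */
struct toy_memcg {
	bool use_hierarchy;
	int nr_visible_children;   /* what a concurrent writer can observe */
};

/* Writer: may flip the flag only while no child is visible. */
static int toy_set_use_hierarchy(struct toy_memcg *g, bool val)
{
	if (g->nr_visible_children > 0)
		return -1;   /* busy: a child already inherited the old value */
	g->use_hierarchy = val;
	return 0;
}

/*
 * Replay the interleaving in a fixed order.
 * Returns true if the child ends up disagreeing with its parent.
 */
static bool toy_replay(bool copy_after_visible)
{
	struct toy_memcg parent = { false, 0 };
	struct toy_memcg child = { false, 0 };
	int ret;

	if (copy_after_visible) {
		/* css_online-like: the child is visible before it copies */
		parent.nr_visible_children++;
		child.use_hierarchy = parent.use_hierarchy;
		ret = toy_set_use_hierarchy(&parent, true);
		assert(ret != 0);   /* the writer is turned away */
	} else {
		/* css_alloc-like: the child copies while still invisible */
		child.use_hierarchy = parent.use_hierarchy;
		ret = toy_set_use_hierarchy(&parent, true);
		assert(ret == 0);   /* the writer sees no children */
		parent.nr_visible_children++;
	}
	(void)ret;
	return child.use_hierarchy != parent.use_hierarchy;
}
```

toy_replay(false) ends with the child at 0 and the parent at 1, which is
the stale copy described above; toy_replay(true) keeps them in agreement.
A real implementation would still need a lock so that the copy and the
writer's check cannot overlap; the replay only fixes the ordering.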

The two other ways I see to solve this problem are:

1) lock in css_alloc and unlock in css_online, which Tejun already ruled
out as too damn ugly (and I can't possibly disagree)

2) have an alternate indication of emptiness that already works at
css_alloc time (like counting the number of children).
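
For reference, option 2 could look roughly like the userspace sketch
below. It is only an illustration under assumptions: struct toy_group,
nr_children and the toy_* helpers are hypothetical, and locking is left
out entirely (the counter update and the writer's check would have to
be serialized in any real version).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy group; not the kernel's struct mem_cgroup. */
struct toy_group {
	struct toy_group *parent;
	bool use_hierarchy;
	int nr_children;   /* bumped at allocation time, before any list linkage */
};

/* css_alloc-like step: account the child in the parent, then inherit. */
static void toy_alloc(struct toy_group *child, struct toy_group *parent)
{
	child->parent = parent;
	child->nr_children = 0;
	child->use_hierarchy = false;
	if (parent) {
		parent->nr_children++;
		child->use_hierarchy = parent->use_hierarchy;
	}
}

/* css_free-like step: drop the accounting again. */
static void toy_free(struct toy_group *child)
{
	if (child->parent) {
		assert(child->parent->nr_children > 0);
		child->parent->nr_children--;
	}
}

/* Writer: emptiness is judged by the counter, not by a children list. */
static int toy_set_use_hierarchy(struct toy_group *g, bool val)
{
	if (g->nr_children > 0)
		return -1;   /* refuse: a child may already hold the old value */
	g->use_hierarchy = val;
	return 0;
}
```

The cost of this variant is a second notion of "has children" that has
to be kept in sync with the real list, on every allocation and free.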

Since I don't share your concerns about the iterator showing incomplete
memcgs - trivial to fix, if not fixed already - I deemed my approach
preferable here.



>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> ---
>>  mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 35 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index d80b6b5..b6d352f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5023,12 +5023,40 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  			INIT_WORK(&stock->work, drain_local_stock);
>>  		}
>>  		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
>> -	} else {
>> -		parent = mem_cgroup_from_cont(cont->parent);
>> -		memcg->use_hierarchy = parent->use_hierarchy;
>> -		memcg->oom_kill_disable = parent->oom_kill_disable;
>> +
>> +		res_counter_init(&memcg->res, NULL);
>> +		res_counter_init(&memcg->memsw, NULL);
>>  	}
>>  
>> +	memcg->last_scanned_node = MAX_NUMNODES;
>> +	INIT_LIST_HEAD(&memcg->oom_notify);
>> +	atomic_set(&memcg->refcnt, 1);
>> +	memcg->move_charge_at_immigrate = 0;
>> +	mutex_init(&memcg->thresholds_lock);
>> +	spin_lock_init(&memcg->move_lock);
>> +
>> +	return &memcg->css;
>> +
>> +free_out:
>> +	__mem_cgroup_free(memcg);
>> +	return ERR_PTR(error);
>> +}
>> +
>> +static int
>> +mem_cgroup_css_online(struct cgroup *cont)
>> +{
>> +	struct mem_cgroup *memcg, *parent;
>> +	int error = 0;
>> +
>> +	if (!cont->parent)
>> +		return 0;
>> +
>> +	memcg = mem_cgroup_from_cont(cont);
>> +	parent = mem_cgroup_from_cont(cont->parent);
>> +
>> +	memcg->use_hierarchy = parent->use_hierarchy;
>> +	memcg->oom_kill_disable = parent->oom_kill_disable;
>> +
>>  	if (parent && parent->use_hierarchy) {
>>  		res_counter_init(&memcg->res, &parent->res);
>>  		res_counter_init(&memcg->memsw, &parent->memsw);
>> @@ -5050,15 +5078,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		if (parent && parent != root_mem_cgroup)
>>  			mem_cgroup_subsys.broken_hierarchy = true;
>>  	}
>> -	memcg->last_scanned_node = MAX_NUMNODES;
>> -	INIT_LIST_HEAD(&memcg->oom_notify);
>>  
>> -	if (parent)
>> -		memcg->swappiness = mem_cgroup_swappiness(parent);
>> -	atomic_set(&memcg->refcnt, 1);
>> -	memcg->move_charge_at_immigrate = 0;
>> -	mutex_init(&memcg->thresholds_lock);
>> -	spin_lock_init(&memcg->move_lock);
>> +	memcg->swappiness = mem_cgroup_swappiness(parent);
>>  
>>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>  	if (error) {
>> @@ -5068,12 +5089,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		 * call __mem_cgroup_free, so return directly
>>  		 */
>>  		mem_cgroup_put(memcg);
>> -		return ERR_PTR(error);
>>  	}
>> -	return &memcg->css;
>> -free_out:
>> -	__mem_cgroup_free(memcg);
>> -	return ERR_PTR(error);
>> +	return error;
>>  }
>>  
>>  static void mem_cgroup_css_offline(struct cgroup *cont)
>> @@ -5702,6 +5719,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
>>  	.name = "memory",
>>  	.subsys_id = mem_cgroup_subsys_id,
>>  	.css_alloc = mem_cgroup_css_alloc,
>> +	.css_online = mem_cgroup_css_online,
>>  	.css_offline = mem_cgroup_css_offline,
>>  	.css_free = mem_cgroup_css_free,
>>  	.can_attach = mem_cgroup_can_attach,
>> -- 
>> 1.7.11.7
>>
> 


