linux-mm.kvack.org archive mirror
* [PATCH 0/4] replace cgroup_lock with local lock in memcg
@ 2012-11-30 13:31 Glauber Costa
  2012-11-30 13:31 ` [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online Glauber Costa
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 13:31 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner,
	Michal Hocko

Hi,

In memcg, we use the cgroup_lock basically to synchronize against two events:
tasks being attached to a cgroup, and children being created under it.

For the problem of attaching tasks, I am using something similar to cpusets:
when task attaching starts, we flip a flag, "attach_in_progress", which is
flipped back down when it finishes. This way, all readers can know that a task
is joining the group and take action accordingly. With this, we can guarantee
that the behavior of move_charge_at_immigrate remains safe.
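
In code, the protocol boils down to this (condensed from patch 2, which also
covers the !CONFIG_MMU stubs and the cancel path):

static int mem_cgroup_can_attach(struct cgroup *cgroup,
				 struct cgroup_taskset *tset)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);

	/* readers seeing a non-zero count know a task is joining */
	memcg->attach_in_progress++;
	return __mem_cgroup_can_attach(memcg, tset);
}

static void mem_cgroup_move_task(struct cgroup *cgroup,
				 struct cgroup_taskset *tset)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);

	__mem_cgroup_move_task(memcg, tset);
	/* attach finished; writers may change the tunables again */
	memcg->attach_in_progress--;
}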

Protecting against children creation requires a bit more work. Here, the calls
to cgroup_lock() all live in handlers like mem_cgroup_hierarchy_write(), where
we change a hierarchy-related tunable in the group. For instance, the
use_hierarchy flag cannot be changed if the cgroup already has children.

Furthermore, those values are propagated from the parent to the child when a
new child is created. So if we don't lock like this, we can end up with the
following situation:

A                                   B
 memcg_css_alloc()                       mem_cgroup_hierarchy_write()
 copy use hierarchy from parent          change use hierarchy in parent
 finish creation.

This is mainly because during creation we are still not fully connected to the
css tree, so all the iterators and such that we could use will fail to show
that the group has children.

My observation is that all of creation can proceed in parallel with those
tasks, except value assignment. So what this patch series does is first move
all value assignment that depends on parent values from css_alloc to
css_online, where the iterators all work, and then lock only the value
assignment. This guarantees that parent and children always have consistent
values.
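
The resulting pattern, condensed from patches 3 and 4 (root handling and error
paths omitted), looks roughly like this:

static DEFINE_MUTEX(memcg_lock);

/* writer side: hierarchy-related tunables only change under memcg_lock,
 * and only while the group has no children */
static int mem_cgroup_hierarchy_write(struct cgroup *cont,
				      struct cftype *cft, u64 val)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
	int retval = 0;

	mutex_lock(&memcg_lock);
	if (!memcg_has_children(memcg))
		memcg->use_hierarchy = val;
	else
		retval = -EBUSY;
	mutex_unlock(&memcg_lock);
	return retval;
}

/* new children: parent values are copied in css_online, where the child
 * is already visible to the memcg iterators, under the same lock */
static int mem_cgroup_css_online(struct cgroup *cont)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
	struct mem_cgroup *parent = mem_cgroup_from_cont(cont->parent);

	mutex_lock(&memcg_lock);
	memcg->use_hierarchy = parent->use_hierarchy;
	memcg->oom_kill_disable = parent->oom_kill_disable;
	mutex_unlock(&memcg_lock);
	return 0;
}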

Glauber Costa (4):
  cgroup: warn about broken hierarchies only after css_online
  memcg: prevent changes to move_charge_at_immigrate during task attach
  memcg: split part of memcg creation to css_online
  memcg: replace cgroup_lock with memcg specific memcg_lock

 kernel/cgroup.c |  18 +++----
 mm/memcontrol.c | 164 +++++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 129 insertions(+), 53 deletions(-)

-- 
1.7.11.7


* [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
@ 2012-11-30 13:31 ` Glauber Costa
  2012-11-30 15:11   ` Tejun Heo
  2012-11-30 13:31 ` [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 13:31 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner,
	Michal Hocko, Glauber Costa

If everything goes right, it shouldn't really matter if we are spitting
this warning after css_alloc or css_online. If we fail in between,
there are some ill cases where we would previously see the message and
now we won't (like if the files fail to be created).

I believe it really shouldn't matter: this message is intended in spirit
to be shown when creation succeeds, but with insane settings.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 kernel/cgroup.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 56ed543..b35a10c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4151,15 +4151,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 			if (err)
 				goto err_free_all;
 		}
-
-		if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
-		    parent->parent) {
-			pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
-				   current->comm, current->pid, ss->name);
-			if (!strcmp(ss->name, "memory"))
-				pr_warning("cgroup: \"memory\" requires setting use_hierarchy to 1 on the root.\n");
-			ss->warned_broken_hierarchy = true;
-		}
 	}
 
 	/*
@@ -4188,6 +4179,15 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 		err = online_css(ss, cgrp);
 		if (err)
 			goto err_destroy;
+
+		if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
+		    parent->parent) {
+			pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
+				   current->comm, current->pid, ss->name);
+			if (!strcmp(ss->name, "memory"))
+				pr_warning("cgroup: \"memory\" requires setting use_hierarchy to 1 on the root.\n");
+			ss->warned_broken_hierarchy = true;
+		}
 	}
 
 	err = cgroup_populate_dir(cgrp, true, root->subsys_mask);
-- 
1.7.11.7


* [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach
  2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
  2012-11-30 13:31 ` [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online Glauber Costa
@ 2012-11-30 13:31 ` Glauber Costa
  2012-11-30 15:19   ` Tejun Heo
  2012-12-04  9:29   ` Michal Hocko
  2012-11-30 13:31 ` [PATCH 3/4] memcg: split part of memcg creation to css_online Glauber Costa
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 13:31 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner,
	Michal Hocko, Glauber Costa

Currently, we rely on the cgroup_lock() to prevent changes to
move_charge_at_immigrate during task migration. We can do something
similar to what cpuset is doing, and flip a flag to tell us if task
movement is taking place.

In theory, we could busy-loop waiting for that value to return to 0 - it
will eventually. But I judge that returning EAGAIN is not too much of a
problem, since file writers should already be checking for error codes
anyway.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 64 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 51 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index feba87d..d80b6b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -311,7 +311,13 @@ struct mem_cgroup {
 	 * Should we move charges of a task when a task is moved into this
 	 * mem_cgroup ? And what type of charges should we move ?
 	 */
-	unsigned long 	move_charge_at_immigrate;
+	unsigned long	move_charge_at_immigrate;
+        /*
+	 * Tasks are being attached to this memcg.  Used mostly to prevent
+	 * changes to move_charge_at_immigrate
+	 */
+        int attach_in_progress;
+
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
@@ -4114,6 +4120,7 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 					struct cftype *cft, u64 val)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	int ret = -EAGAIN;
 
 	if (val >= (1 << NR_MOVE_TYPE))
 		return -EINVAL;
@@ -4123,10 +4130,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 	 * inconsistent.
 	 */
 	cgroup_lock();
+	if (memcg->attach_in_progress)
+		goto out;
 	memcg->move_charge_at_immigrate = val;
+	ret = 0;
+out:
 	cgroup_unlock();
-
-	return 0;
+	return ret;
 }
 #else
 static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
@@ -5443,12 +5453,12 @@ static void mem_cgroup_clear_mc(void)
 	mem_cgroup_end_move(from);
 }
 
-static int mem_cgroup_can_attach(struct cgroup *cgroup,
-				 struct cgroup_taskset *tset)
+
+static int __mem_cgroup_can_attach(struct mem_cgroup *memcg,
+				   struct cgroup_taskset *tset)
 {
 	struct task_struct *p = cgroup_taskset_first(tset);
 	int ret = 0;
-	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
 
 	if (memcg->move_charge_at_immigrate) {
 		struct mm_struct *mm;
@@ -5482,8 +5492,8 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
 	return ret;
 }
 
-static void mem_cgroup_cancel_attach(struct cgroup *cgroup,
-				     struct cgroup_taskset *tset)
+static void __mem_cgroup_cancel_attach(struct mem_cgroup *memcg,
+				       struct cgroup_taskset *tset)
 {
 	mem_cgroup_clear_mc();
 }
@@ -5630,8 +5640,8 @@ retry:
 	up_read(&mm->mmap_sem);
 }
 
-static void mem_cgroup_move_task(struct cgroup *cont,
-				 struct cgroup_taskset *tset)
+static void __mem_cgroup_move_task(struct mem_cgroup *memcg,
+				   struct cgroup_taskset *tset)
 {
 	struct task_struct *p = cgroup_taskset_first(tset);
 	struct mm_struct *mm = get_task_mm(p);
@@ -5645,20 +5655,48 @@ static void mem_cgroup_move_task(struct cgroup *cont,
 		mem_cgroup_clear_mc();
 }
 #else	/* !CONFIG_MMU */
+static int __mem_cgroup_can_attach(struct mem_cgroup *memcg,
+				   struct cgroup_taskset *tset)
+{
+	return 0;
+}
+
+static void __mem_cgroup_cancel_attach(struct mem_cgroup *memcg,
+				       struct cgroup_taskset *tset)
+{
+}
+
+static void __mem_cgroup_move_task(struct mem_cgroup *memcg,
+				   struct cgroup_taskset *tset)
+{
+}
+#endif
 static int mem_cgroup_can_attach(struct cgroup *cgroup,
 				 struct cgroup_taskset *tset)
 {
-	return 0;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+
+	memcg->attach_in_progress++;
+	return __mem_cgroup_can_attach(memcg, tset);
 }
+
 static void mem_cgroup_cancel_attach(struct cgroup *cgroup,
 				     struct cgroup_taskset *tset)
 {
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+
+	__mem_cgroup_cancel_attach(memcg, tset);
+	memcg->attach_in_progress--;
 }
-static void mem_cgroup_move_task(struct cgroup *cont,
+
+static void mem_cgroup_move_task(struct cgroup *cgroup,
 				 struct cgroup_taskset *tset)
 {
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+
+	__mem_cgroup_move_task(memcg, tset);
+	memcg->attach_in_progress--;
 }
-#endif
 
 struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
-- 
1.7.11.7


* [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
  2012-11-30 13:31 ` [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online Glauber Costa
  2012-11-30 13:31 ` [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2012-11-30 13:31 ` Glauber Costa
  2012-12-03 17:32   ` Michal Hocko
  2012-11-30 13:31 ` [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
  2012-11-30 15:52 ` [PATCH 0/4] replace cgroup_lock with local lock in memcg Tejun Heo
  4 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 13:31 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner,
	Michal Hocko, Glauber Costa

Although there is arguably some value in doing this per se, the main
goal of this patch is to make room for the locking changes to come.

With all the value assignment from parent happening in a context where
our iterators can already be used, we can safely lock against value
change in some key values like use_hierarchy, without resorting to the
cgroup core at all.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d80b6b5..b6d352f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5023,12 +5023,40 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
 		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
-	} else {
-		parent = mem_cgroup_from_cont(cont->parent);
-		memcg->use_hierarchy = parent->use_hierarchy;
-		memcg->oom_kill_disable = parent->oom_kill_disable;
+
+		res_counter_init(&memcg->res, NULL);
+		res_counter_init(&memcg->memsw, NULL);
 	}
 
+	memcg->last_scanned_node = MAX_NUMNODES;
+	INIT_LIST_HEAD(&memcg->oom_notify);
+	atomic_set(&memcg->refcnt, 1);
+	memcg->move_charge_at_immigrate = 0;
+	mutex_init(&memcg->thresholds_lock);
+	spin_lock_init(&memcg->move_lock);
+
+	return &memcg->css;
+
+free_out:
+	__mem_cgroup_free(memcg);
+	return ERR_PTR(error);
+}
+
+static int
+mem_cgroup_css_online(struct cgroup *cont)
+{
+	struct mem_cgroup *memcg, *parent;
+	int error = 0;
+
+	if (!cont->parent)
+		return 0;
+
+	memcg = mem_cgroup_from_cont(cont);
+	parent = mem_cgroup_from_cont(cont->parent);
+
+	memcg->use_hierarchy = parent->use_hierarchy;
+	memcg->oom_kill_disable = parent->oom_kill_disable;
+
 	if (parent && parent->use_hierarchy) {
 		res_counter_init(&memcg->res, &parent->res);
 		res_counter_init(&memcg->memsw, &parent->memsw);
@@ -5050,15 +5078,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 		if (parent && parent != root_mem_cgroup)
 			mem_cgroup_subsys.broken_hierarchy = true;
 	}
-	memcg->last_scanned_node = MAX_NUMNODES;
-	INIT_LIST_HEAD(&memcg->oom_notify);
 
-	if (parent)
-		memcg->swappiness = mem_cgroup_swappiness(parent);
-	atomic_set(&memcg->refcnt, 1);
-	memcg->move_charge_at_immigrate = 0;
-	mutex_init(&memcg->thresholds_lock);
-	spin_lock_init(&memcg->move_lock);
+	memcg->swappiness = mem_cgroup_swappiness(parent);
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
 	if (error) {
@@ -5068,12 +5089,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 		 * call __mem_cgroup_free, so return directly
 		 */
 		mem_cgroup_put(memcg);
-		return ERR_PTR(error);
 	}
-	return &memcg->css;
-free_out:
-	__mem_cgroup_free(memcg);
-	return ERR_PTR(error);
+	return error;
 }
 
 static void mem_cgroup_css_offline(struct cgroup *cont)
@@ -5702,6 +5719,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
 	.subsys_id = mem_cgroup_subsys_id,
 	.css_alloc = mem_cgroup_css_alloc,
+	.css_online = mem_cgroup_css_online,
 	.css_offline = mem_cgroup_css_offline,
 	.css_free = mem_cgroup_css_free,
 	.can_attach = mem_cgroup_can_attach,
-- 
1.7.11.7


* [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
                   ` (2 preceding siblings ...)
  2012-11-30 13:31 ` [PATCH 3/4] memcg: split part of memcg creation to css_online Glauber Costa
@ 2012-11-30 13:31 ` Glauber Costa
  2012-12-03 17:15   ` Michal Hocko
  2012-11-30 15:52 ` [PATCH 0/4] replace cgroup_lock with local lock in memcg Tejun Heo
  4 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 13:31 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner,
	Michal Hocko, Glauber Costa

After the preparation work done in earlier patches, the
cgroup_lock can be trivially replaced with a memcg-specific lock in all
readers.

The writers, however, used to be called naturally under cgroup_lock, and now
we need to take the memcg_lock explicitly. These are the task-attach callbacks
and the parent-dependent value assignment in newly-created memcgs.

With this, all the calls to cgroup_lock outside cgroup core are gone.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b6d352f..fd7b5d3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3830,6 +3830,17 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
 	} while (res_counter_read_u64(&memcg->res, RES_USAGE) > 0);
 }
 
+static DEFINE_MUTEX(memcg_lock);
+
+/*
+ * must be called with memcg_lock held, unless the cgroup is guaranteed to be
+ * already dead (like in mem_cgroup_force_empty, for instance).
+ */
+static inline bool memcg_has_children(struct mem_cgroup *memcg)
+{
+	return mem_cgroup_count_children(memcg) != 1;
+}
+
 /*
  * Reclaims as many pages from the given memcg as possible and moves
  * the rest to the parent.
@@ -3842,7 +3853,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 	struct cgroup *cgrp = memcg->css.cgroup;
 
 	/* returns EBUSY if there is a task or if we come here twice. */
-	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
+	if (cgroup_task_count(cgrp) || memcg_has_children(memcg))
 		return -EBUSY;
 
 	/* we call try-to-free pages for make this cgroup empty */
@@ -3900,7 +3911,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 	if (parent)
 		parent_memcg = mem_cgroup_from_cont(parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_lock);
 
 	if (memcg->use_hierarchy == val)
 		goto out;
@@ -3915,7 +3926,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 	 */
 	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
 				(val == 1 || val == 0)) {
-		if (list_empty(&cont->children))
+		if (!memcg_has_children(memcg))
 			memcg->use_hierarchy = val;
 		else
 			retval = -EBUSY;
@@ -3923,7 +3934,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 		retval = -EINVAL;
 
 out:
-	cgroup_unlock();
+	mutex_unlock(&memcg_lock);
 
 	return retval;
 }
@@ -4129,13 +4140,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 	 * attach(), so we need cgroup lock to prevent this value from being
 	 * inconsistent.
 	 */
-	cgroup_lock();
+	mutex_lock(&memcg_lock);
 	if (memcg->attach_in_progress)
 		goto out;
 	memcg->move_charge_at_immigrate = val;
 	ret = 0;
 out:
-	cgroup_unlock();
+	mutex_unlock(&memcg_lock);
 	return ret;
 }
 #else
@@ -4314,18 +4325,18 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 
 	parent = mem_cgroup_from_cont(cgrp->parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_lock);
 
 	/* If under hierarchy, only empty-root can set this value */
 	if ((parent->use_hierarchy) ||
-	    (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
-		cgroup_unlock();
+	    (memcg->use_hierarchy && !memcg_has_children(memcg))) {
+		mutex_unlock(&memcg_lock);
 		return -EINVAL;
 	}
 
 	memcg->swappiness = val;
 
-	cgroup_unlock();
+	mutex_unlock(&memcg_lock);
 
 	return 0;
 }
@@ -4651,17 +4662,17 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
 
 	parent = mem_cgroup_from_cont(cgrp->parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_lock);
 	/* oom-kill-disable is a flag for subhierarchy. */
 	if ((parent->use_hierarchy) ||
-	    (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
-		cgroup_unlock();
+	    (memcg->use_hierarchy && memcg_has_children(memcg))) {
+		mutex_unlock(&memcg_lock);
 		return -EINVAL;
 	}
 	memcg->oom_kill_disable = val;
 	if (!val)
 		memcg_oom_recover(memcg);
-	cgroup_unlock();
+	mutex_unlock(&memcg_lock);
 	return 0;
 }
 
@@ -5051,6 +5062,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 	if (!cont->parent)
 		return 0;
 
+	mutex_lock(&memcg_lock);
 	memcg = mem_cgroup_from_cont(cont);
 	parent = mem_cgroup_from_cont(cont->parent);
 
@@ -5082,6 +5094,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 	memcg->swappiness = mem_cgroup_swappiness(parent);
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
+	mutex_unlock(&memcg_lock);
 	if (error) {
 		/*
 		 * We call put now because our (and parent's) refcnts
@@ -5693,7 +5706,10 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
 
+	mutex_lock(&memcg_lock);
 	memcg->attach_in_progress++;
+	mutex_unlock(&memcg_lock);
+
 	return __mem_cgroup_can_attach(memcg, tset);
 }
 
@@ -5703,7 +5719,9 @@ static void mem_cgroup_cancel_attach(struct cgroup *cgroup,
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
 
 	__mem_cgroup_cancel_attach(memcg, tset);
+	mutex_lock(&memcg_lock);
 	memcg->attach_in_progress--;
+	mutex_unlock(&memcg_lock);
 }
 
 static void mem_cgroup_move_task(struct cgroup *cgroup,
@@ -5712,7 +5730,9 @@ static void mem_cgroup_move_task(struct cgroup *cgroup,
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
 
 	__mem_cgroup_move_task(memcg, tset);
+	mutex_lock(&memcg_lock);
 	memcg->attach_in_progress--;
+	mutex_unlock(&memcg_lock);
 }
 
 struct cgroup_subsys mem_cgroup_subsys = {
-- 
1.7.11.7


* Re: [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 13:31 ` [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online Glauber Costa
@ 2012-11-30 15:11   ` Tejun Heo
  2012-11-30 15:13     ` Glauber Costa
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-11-30 15:11 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

On Fri, Nov 30, 2012 at 05:31:23PM +0400, Glauber Costa wrote:
> If everything goes right, it shouldn't really matter if we are spitting
> this warning after css_alloc or css_online. If we fail in between,
> there are some ill cases where we would previously see the message and
> now we won't (like if the files fail to be created).
> 
> I believe it really shouldn't matter: this message is intended in spirit
> to be shown when creation succeeds, but with insane settings.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>

Applied to cgroup/for-3.8.  Thanks!

-- 
tejun


* Re: [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 15:11   ` Tejun Heo
@ 2012-11-30 15:13     ` Glauber Costa
  2012-11-30 15:45       ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 15:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

On 11/30/2012 07:11 PM, Tejun Heo wrote:
> On Fri, Nov 30, 2012 at 05:31:23PM +0400, Glauber Costa wrote:
>> If everything goes right, it shouldn't really matter if we are spitting
>> this warning after css_alloc or css_online. If we fail in between,
>> there are some ill cases where we would previously see the message and
>> now we won't (like if the files fail to be created).
>>
>> I believe it really shouldn't matter: this message is intended in spirit
>> to be shown when creation succeeds, but with insane settings.
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
> 
> Applied to cgroup/for-3.8.  Thanks!
> 

We just need to be careful because when we merge it with morton's, more
bits will need converting.


* Re: [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach
  2012-11-30 13:31 ` [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2012-11-30 15:19   ` Tejun Heo
  2012-11-30 15:29     ` Glauber Costa
  2012-12-04  9:29   ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-11-30 15:19 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

Hello, Glauber.

On Fri, Nov 30, 2012 at 05:31:24PM +0400, Glauber Costa wrote:
> ---
>  mm/memcontrol.c | 64 +++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 51 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index feba87d..d80b6b5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -311,7 +311,13 @@ struct mem_cgroup {
>  	 * Should we move charges of a task when a task is moved into this
>  	 * mem_cgroup ? And what type of charges should we move ?
>  	 */
> -	unsigned long 	move_charge_at_immigrate;
> +	unsigned long	move_charge_at_immigrate;
> +        /*

Weird indentation (maybe spaces instead of a tab?)

> +	 * Tasks are being attached to this memcg.  Used mostly to prevent
> +	 * changes to move_charge_at_immigrate
> +	 */
> +        int attach_in_progress;

Ditto.

>  	/*
>  	 * set > 0 if pages under this cgroup are moving to other cgroup.
>  	 */
> @@ -4114,6 +4120,7 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  					struct cftype *cft, u64 val)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
> +	int ret = -EAGAIN;
>  
>  	if (val >= (1 << NR_MOVE_TYPE))
>  		return -EINVAL;
> @@ -4123,10 +4130,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  	 * inconsistent.
>  	 */
>  	cgroup_lock();
> +	if (memcg->attach_in_progress)
> +		goto out;
>  	memcg->move_charge_at_immigrate = val;
> +	ret = 0;
> +out:
>  	cgroup_unlock();
> -
> -	return 0;
> +	return ret;

Unsure whether this is a good behavior.  It's a bit nasty to fail for
internal temporary reasons like this.  If it ever causes a problem,
the occurrences are likely to be few and far between, making it difficult
to debug.  Can't you determine whether to immigrate in ->can_attach(),
record that decision on the css, and finish it in ->attach() according
to that?  There's no need to consult the config multiple times.
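
Roughly something like this (just a sketch; the move_charge_this_attach
field and the two helpers are made up here):

static int mem_cgroup_can_attach(struct cgroup *cgroup,
				 struct cgroup_taskset *tset)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);

	/* latch the tunable once; later writes cannot affect this attach */
	memcg->move_charge_this_attach = memcg->move_charge_at_immigrate;
	if (!memcg->move_charge_this_attach)
		return 0;
	return mem_cgroup_precharge_for_attach(memcg, tset);
}

static void mem_cgroup_move_task(struct cgroup *cgroup,
				 struct cgroup_taskset *tset)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);

	/* act on the value recorded in ->can_attach(), not the live one */
	if (memcg->move_charge_this_attach)
		mem_cgroup_move_charge_for_attach(memcg, tset);
}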

Thanks.

-- 
tejun


* Re: [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach
  2012-11-30 15:19   ` Tejun Heo
@ 2012-11-30 15:29     ` Glauber Costa
  0 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 15:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

>>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
>> +	int ret = -EAGAIN;
>>  
>>  	if (val >= (1 << NR_MOVE_TYPE))
>>  		return -EINVAL;
>> @@ -4123,10 +4130,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>>  	 * inconsistent.
>>  	 */
>>  	cgroup_lock();
>> +	if (memcg->attach_in_progress)
>> +		goto out;
>>  	memcg->move_charge_at_immigrate = val;
>> +	ret = 0;
>> +out:
>>  	cgroup_unlock();
>> -
>> -	return 0;
>> +	return ret;
> 
> Unsure whether this is a good behavior. 
to be honest, I am not so sure myself. I kinda leaned towards this after
some consideration, because the other solution I saw was basically busy
waiting...

> It's a bit nasty to fail for
> internal temporary reasons like this.  If it ever causes a problem,
> the occurrences are likely to be few and far between, making it difficult
> to debug.  Can't you determine whether to immigrate in ->can_attach(),
> record that decision on the css, and finish it in ->attach() according
> to that?  There's no need to consult the config multiple times.
> 
Well, yeah... that is an option too, and indeed better. Good call.

I will send it again soon.




* Re: [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 15:13     ` Glauber Costa
@ 2012-11-30 15:45       ` Tejun Heo
  2012-11-30 15:49         ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-11-30 15:45 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

Hello, Glauber.

On Fri, Nov 30, 2012 at 07:13:54PM +0400, Glauber Costa wrote:
> > Applied to cgroup/for-3.8.  Thanks!
> > 
> 
> We just need to be careful because when we merge it with morton's, more
> bits will need converting.

This one is in cgroup proper and I think it should be safe, right?
Other ones will be difficult.  Not sure how to handle them ATM.  An
easy way out would be deferring to the next merge window as it's so
close anyway.  Michal?

Thanks.

-- 
tejun


* Re: [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 15:45       ` Tejun Heo
@ 2012-11-30 15:49         ` Michal Hocko
  2012-11-30 15:57           ` Glauber Costa
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-11-30 15:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

On Fri 30-11-12 07:45:04, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Fri, Nov 30, 2012 at 07:13:54PM +0400, Glauber Costa wrote:
> > > Applied to cgroup/for-3.8.  Thanks!
> > > 
> > 
> > We just need to be careful because when we merge it with morton's, more
> > bits will need converting.
> 
> This one is in cgroup proper and I think it should be safe, right?
> Other ones will be difficult.  Not sure how to handle them ATM.  An
> easy way out would be deferring to the next merge window as it's so
> close anyway.  Michal?

yes, I think so as well. I guess the window will open soon.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 0/4] replace cgroup_lock with local lock in memcg
  2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
                   ` (3 preceding siblings ...)
  2012-11-30 13:31 ` [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2012-11-30 15:52 ` Tejun Heo
  2012-11-30 15:59   ` Glauber Costa
  4 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-11-30 15:52 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

Hey, Glauber.

I don't know enough about memcg to be acking this but overall it looks
pretty good to me.

On Fri, Nov 30, 2012 at 05:31:22PM +0400, Glauber Costa wrote:
> For the problem of attaching tasks, I am using something similar to cpusets:
> when task attaching starts, we flip a flag, "attach_in_progress", which is
> flipped back down when it finishes. This way, all readers can know that a task
> is joining the group and take action accordingly. With this, we can guarantee
> that the behavior of move_charge_at_immigrate remains safe.

Yeap, attach_in_progress is useful if there are some conditions which
shouldn't change between ->can_attach() and ->attach().  With the
immigrate thing gone, this no longer is necessary, right?

> Protecting against children creation requires a bit more work. Here, the calls
> to cgroup_lock() all live in handlers like mem_cgroup_hierarchy_write(), where
> we change a hierarchy-related tunable in the group. For instance, the
> use_hierarchy flag cannot be changed if the cgroup already has children.
> 
> Furthermore, those values are propagated from the parent to the child when a
> new child is created. So if we don't lock like this, we can end up with the
> following situation:
> 
> A                                   B
>  memcg_css_alloc()                       mem_cgroup_hierarchy_write()
>  copy use hierarchy from parent          change use hierarchy in parent
>  finish creation.
> 
> This is mainly because during creation we are still not fully connected to the
> css tree, so all the iterators and such that we could use will fail to show
> that the group has children.
> 
> My observation is that all of creation can proceed in parallel with those
> tasks, except value assignment. So what this patch series does is first move
> all value assignment that depends on parent values from css_alloc to
> css_online, where the iterators all work, and then lock only the value
> assignment. This guarantees that parent and children always have consistent
> values.

Right, exactly the reason ->css_online() exists.

Thanks a lot!

-- 
tejun


* Re: [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online
  2012-11-30 15:49         ` Michal Hocko
@ 2012-11-30 15:57           ` Glauber Costa
  0 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner

On 11/30/2012 07:49 PM, Michal Hocko wrote:
> On Fri 30-11-12 07:45:04, Tejun Heo wrote:
>> Hello, Glauber.
>>
>> On Fri, Nov 30, 2012 at 07:13:54PM +0400, Glauber Costa wrote:
>>>> Applied to cgroup/for-3.8.  Thanks!
>>>>
>>>
>>> We just need to be careful because when we merge it with morton's, more
>>> bits will need converting.
>>
>> This one is in cgroup proper and I think it should be safe, right?
>> Other ones will be difficult.  Not sure how to handle them ATM.  An
>> easy way out would be deferring to the next merge window as it's so
>> close anyway.  Michal?
> 
> yes, I think so as well. I guess the window will open soon.
> 
I vote for deferring as well.


* Re: [PATCH 0/4] replace cgroup_lock with local lock in memcg
  2012-11-30 15:52 ` [PATCH 0/4] replace cgroup_lock with local lock in memcg Tejun Heo
@ 2012-11-30 15:59   ` Glauber Costa
  0 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-11-30 15:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

On 11/30/2012 07:52 PM, Tejun Heo wrote:
> Hey, Glauber.
> 
> I don't know enough about memcg to be acking this but overall it looks
> pretty good to me.
> 
> On Fri, Nov 30, 2012 at 05:31:22PM +0400, Glauber Costa wrote:
>> For the problem of attaching tasks, I am using something similar to cpusets:
>> when task attaching starts, we flip a flag, "attach_in_progress", which is
>> flipped back down when it finishes. This way, all readers can know that a task
>> is joining the group and take action accordingly. With this, we can guarantee
>> that the behavior of move_charge_at_immigrate remains safe.
> 
> Yeap, attach_in_progress is useful if there are some conditions which
> shouldn't change between ->can_attach() and ->attach().  With the
> immigrate thing gone, this no longer is necessary, right?
> 

Yes and no. While it can help with immigrate, we still have kmem that
needs to be protected against tasks joining.

However, kmem is easier. If attach_in_progress is ever positive, it
means that a task is joining, and it is already unacceptable for kmem -
so we can fail right away.
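
Something along these lines, for example (just a sketch; the function and
helper names are made up and the kmem side is not part of this series):

static int memcg_activate_kmem(struct mem_cgroup *memcg)
{
	int ret;

	mutex_lock(&memcg_lock);
	if (memcg->attach_in_progress)
		ret = -EBUSY;	/* a task is joining: refuse right away */
	else
		ret = __memcg_activate_kmem(memcg);	/* made-up helper */
	mutex_unlock(&memcg_lock);
	return ret;
}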


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-11-30 13:31 ` [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2012-12-03 17:15   ` Michal Hocko
  2012-12-03 17:30     ` Michal Hocko
  2012-12-04  7:58     ` Glauber Costa
  0 siblings, 2 replies; 32+ messages in thread
From: Michal Hocko @ 2012-12-03 17:15 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Fri 30-11-12 17:31:26, Glauber Costa wrote:
[...]
> +/*
> + * must be called with memcg_lock held, unless the cgroup is guaranteed to be
> + * already dead (like in mem_cgroup_force_empty, for instance).
> + */
> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
> +{
> +	return mem_cgroup_count_children(memcg) != 1;
> +}

Why not just keep list_empty(&cgrp->children)? It is much simpler, much
more effective, and correct here as well, because the cgroup cannot vanish
while we are in the call, since all callers come from cgroup fs.

[...]
> @@ -3900,7 +3911,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
>  	if (parent)
>  		parent_memcg = mem_cgroup_from_cont(parent);
>  
> -	cgroup_lock();
> +	mutex_lock(&memcg_lock);
>  
>  	if (memcg->use_hierarchy == val)
>  		goto out;
> @@ -3915,7 +3926,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
>  	 */
>  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
>  				(val == 1 || val == 0)) {
> -		if (list_empty(&cont->children))
> +		if (!memcg_has_children(memcg))
>  			memcg->use_hierarchy = val;
>  		else
>  			retval = -EBUSY;

Nothing prevents a race when a task is on the way to being attached to
the group. This means that we might miss some charges on the way up to the
parent.

mem_cgroup_hierarchy_write
  					cgroup_attach_task
					  ss->can_attach() = mem_cgroup_can_attach
					    mutex_lock(&memcg_lock)
					    memcg->attach_in_progress++
					    mutex_unlock(&memcg_lock)
					    __mem_cgroup_can_attach
					      mem_cgroup_precharge_mc (*)
  mutex_lock(memcg_lock)
  memcg_has_children(memcg)==false
					  cgroup_task_migrate
  memcg->use_hierarchy = val;
					  ss->attach()

(*) All the charches here are not propagated upwards.

Fixable simply by testing attach_in_progress as well. The same applies
to all other cases so it would be much better to prepare a common helper
which does the whole magic.
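
Such a helper could be as simple as (sketch only; the name is made up):

/*
 * Return true if hierarchy-sensitive settings may be changed: the group
 * has neither children nor an attach in flight.  Caller holds memcg_lock.
 */
static bool memcg_can_change_hierarchy_settings(struct mem_cgroup *memcg)
{
	return !memcg->attach_in_progress && !memcg_has_children(memcg);
}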

[...]

Thanks
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-03 17:15   ` Michal Hocko
@ 2012-12-03 17:30     ` Michal Hocko
  2012-12-04  7:49       ` Glauber Costa
  2012-12-04  7:58     ` Glauber Costa
  1 sibling, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-03 17:30 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Mon 03-12-12 18:15:32, Michal Hocko wrote:
[...]
> > @@ -3915,7 +3926,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
> >  	 */
> >  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
> >  				(val == 1 || val == 0)) {
> > -		if (list_empty(&cont->children))
> > +		if (!memcg_has_children(memcg))
> >  			memcg->use_hierarchy = val;
> >  		else
> >  			retval = -EBUSY;
> 
> Nothing prevents a race when a task is on the way to being attached to
> the group. This means that we might miss some charges on the way up to the
> parent.
> 
> mem_cgroup_hierarchy_write
>   					cgroup_attach_task
> 					  ss->can_attach() = mem_cgroup_can_attach
> 					    mutex_lock(&memcg_lock)
> 					    memcg->attach_in_progress++
> 					    mutex_unlock(&memcg_lock)
> 					    __mem_cgroup_can_attach
> 					      mem_cgroup_precharge_mc (*)
>   mutex_lock(memcg_lock)
>   memcg_has_children(memcg)==false

Dohh, retard alert. I obviously mixed up tasks and children cgroups here.
Why did I think we also check for no tasks in the group? Ahh, because we
should, at least here; otherwise the parent could see more uncharges than
charges.
But that deserves a separate patch. Sorry for the confusion.

> 					  cgroup_task_migrate
>   memcg->use_hierarchy = val;
> 					  ss->attach()
> 
> (*) All the charches here are not propagated upwards.
> 
> Fixable simply by testing attach_in_progress as well. The same applies
> to all other cases so it would be much better to prepare a common helper
> which does the whole magic.
> 
> [...]
> 
> Thanks
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-11-30 13:31 ` [PATCH 3/4] memcg: split part of memcg creation to css_online Glauber Costa
@ 2012-12-03 17:32   ` Michal Hocko
  2012-12-04  8:05     ` Glauber Costa
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-03 17:32 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Fri 30-11-12 17:31:25, Glauber Costa wrote:
> Although there is arguably some value in doing this per se, the main
> goal of this patch is to make room for the locking changes to come.
> 
> With all the value assignment from parent happening in a context where
> our iterators can already be used, we can safely lock against value
> change in some key values like use_hierarchy, without resorting to the
> cgroup core at all.

I am sorry but I really do not get why online_css callback is more
appropriate. Quite contrary. With this change iterators can see a group
which is not fully initialized which calls for a problem (even though it
is not one yet).
Could you be more specific why we cannot keep the initialization in
mem_cgroup_css_alloc? We can lock there as well, no?

> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 35 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d80b6b5..b6d352f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5023,12 +5023,40 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  			INIT_WORK(&stock->work, drain_local_stock);
>  		}
>  		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
> -	} else {
> -		parent = mem_cgroup_from_cont(cont->parent);
> -		memcg->use_hierarchy = parent->use_hierarchy;
> -		memcg->oom_kill_disable = parent->oom_kill_disable;
> +
> +		res_counter_init(&memcg->res, NULL);
> +		res_counter_init(&memcg->memsw, NULL);
>  	}
>  
> +	memcg->last_scanned_node = MAX_NUMNODES;
> +	INIT_LIST_HEAD(&memcg->oom_notify);
> +	atomic_set(&memcg->refcnt, 1);
> +	memcg->move_charge_at_immigrate = 0;
> +	mutex_init(&memcg->thresholds_lock);
> +	spin_lock_init(&memcg->move_lock);
> +
> +	return &memcg->css;
> +
> +free_out:
> +	__mem_cgroup_free(memcg);
> +	return ERR_PTR(error);
> +}
> +
> +static int
> +mem_cgroup_css_online(struct cgroup *cont)
> +{
> +	struct mem_cgroup *memcg, *parent;
> +	int error = 0;
> +
> +	if (!cont->parent)
> +		return 0;
> +
> +	memcg = mem_cgroup_from_cont(cont);
> +	parent = mem_cgroup_from_cont(cont->parent);
> +
> +	memcg->use_hierarchy = parent->use_hierarchy;
> +	memcg->oom_kill_disable = parent->oom_kill_disable;
> +
>  	if (parent && parent->use_hierarchy) {
>  		res_counter_init(&memcg->res, &parent->res);
>  		res_counter_init(&memcg->memsw, &parent->memsw);
> @@ -5050,15 +5078,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  		if (parent && parent != root_mem_cgroup)
>  			mem_cgroup_subsys.broken_hierarchy = true;
>  	}
> -	memcg->last_scanned_node = MAX_NUMNODES;
> -	INIT_LIST_HEAD(&memcg->oom_notify);
>  
> -	if (parent)
> -		memcg->swappiness = mem_cgroup_swappiness(parent);
> -	atomic_set(&memcg->refcnt, 1);
> -	memcg->move_charge_at_immigrate = 0;
> -	mutex_init(&memcg->thresholds_lock);
> -	spin_lock_init(&memcg->move_lock);
> +	memcg->swappiness = mem_cgroup_swappiness(parent);
>  
>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>  	if (error) {
> @@ -5068,12 +5089,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  		 * call __mem_cgroup_free, so return directly
>  		 */
>  		mem_cgroup_put(memcg);
> -		return ERR_PTR(error);
>  	}
> -	return &memcg->css;
> -free_out:
> -	__mem_cgroup_free(memcg);
> -	return ERR_PTR(error);
> +	return error;
>  }
>  
>  static void mem_cgroup_css_offline(struct cgroup *cont)
> @@ -5702,6 +5719,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
>  	.name = "memory",
>  	.subsys_id = mem_cgroup_subsys_id,
>  	.css_alloc = mem_cgroup_css_alloc,
> +	.css_online = mem_cgroup_css_online,
>  	.css_offline = mem_cgroup_css_offline,
>  	.css_free = mem_cgroup_css_free,
>  	.can_attach = mem_cgroup_can_attach,
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-03 17:30     ` Michal Hocko
@ 2012-12-04  7:49       ` Glauber Costa
  0 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-12-04  7:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On 12/03/2012 09:30 PM, Michal Hocko wrote:
> On Mon 03-12-12 18:15:32, Michal Hocko wrote:
> [...]
>>> @@ -3915,7 +3926,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
>>>  	 */
>>>  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
>>>  				(val == 1 || val == 0)) {
>>> -		if (list_empty(&cont->children))
>>> +		if (!memcg_has_children(memcg))
>>>  			memcg->use_hierarchy = val;
>>>  		else
>>>  			retval = -EBUSY;
>>
>> Nothing prevents a race when a task is on the way to being attached to
>> the group. This means that we might miss some charges on the way up to the
>> parent.
>>
>> mem_cgroup_hierarchy_write
>>   					cgroup_attach_task
>> 					  ss->can_attach() = mem_cgroup_can_attach
>> 					    mutex_lock(&memcg_lock)
>> 					    memcg->attach_in_progress++
>> 					    mutex_unlock(&memcg_lock)
>> 					    __mem_cgroup_can_attach
>> 					      mem_cgroup_precharge_mc (*)
>>   mutex_lock(memcg_lock)
>>   memcg_has_children(memcg)==false
> 
> Dohh, retard alert. I obviously mixed up tasks and children cgroups here.
> Why did I think we also check for no tasks in the group? Ahh, because we
> should, at least here; otherwise the parent could see more uncharges than
> charges.
> But that deserves a separate patch. Sorry for the confusion.

That is okay, here is a beer for you: http://bit.ly/R3rkML


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-03 17:15   ` Michal Hocko
  2012-12-03 17:30     ` Michal Hocko
@ 2012-12-04  7:58     ` Glauber Costa
  2012-12-04  8:23       ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-12-04  7:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On 12/03/2012 09:15 PM, Michal Hocko wrote:
> On Fri 30-11-12 17:31:26, Glauber Costa wrote:
> [...]
>> +/*
>> + * must be called with memcg_lock held, unless the cgroup is guaranteed to be
>> + * already dead (like in mem_cgroup_force_empty, for instance).
>> + */
>> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
>> +{
>> +	return mem_cgroup_count_children(memcg) != 1;
>> +}
> 
> Why not just keep list_empty(&cgrp->children)? It is much simpler, much
> more effective, and correct here as well, because the cgroup cannot vanish
> while we are in the call, since all callers come from cgroup fs.
> 
Because it depends on cgroup's internal representation, which I think
we're better off not depending upon, even if this is not as serious a
case as the locking stuff. But also, technically, cgrp->children is
protected by the cgroup_lock(), whereas, since we'll hold the memcg_lock
during creation and also around the iterators, we cover everything with
the same lock.

That said, of course we don't need to do the full iteration here, and
mem_cgroup_count_children is indeed overkill. We could just as easily
verify whether any child exists - it is just an emptiness test after all.
But this is not in any fast path, and I just let code reuse win over
efficiency in this particular case - mem_cgroup_count_children already
existed...
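
For reference, an emptiness test that stops at the first child could look
like this (sketch only, reusing the existing for_each_mem_cgroup_tree() /
mem_cgroup_iter_break() helpers in memcontrol.c; not what the patch does):

static inline bool memcg_has_children(struct mem_cgroup *memcg)
{
	struct mem_cgroup *iter;

	for_each_mem_cgroup_tree(iter, memcg) {
		if (iter == memcg)	/* the walk includes the root group itself */
			continue;
		/* found a descendant: stop the walk and drop its reference */
		mem_cgroup_iter_break(memcg, iter);
		return true;
	}
	return false;
}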


* Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-12-03 17:32   ` Michal Hocko
@ 2012-12-04  8:05     ` Glauber Costa
  2012-12-04  8:17       ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-12-04  8:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On 12/03/2012 09:32 PM, Michal Hocko wrote:
> On Fri 30-11-12 17:31:25, Glauber Costa wrote:
>> Although there is arguably some value in doing this per se, the main
>> goal of this patch is to make room for the locking changes to come.
>>
>> With all the value assignment from parent happening in a context where
>> our iterators can already be used, we can safely lock against value
>> change in some key values like use_hierarchy, without resorting to the
>> cgroup core at all.
> 
> I am sorry but I really do not get why online_css callback is more
> appropriate. Quite contrary. With this change iterators can see a group
> which is not fully initialized which calls for a problem (even though it
> is not one yet).

But it should be extremely easy to protect against this. It is just a
matter of not returning css that are not yet online from the iterator: then
we'll never see them until they are online. This also sounds a lot more
correct than returning merely-allocated css.


> Could you be more specific why we cannot keep the initialization in
> mem_cgroup_css_alloc? We can lock there as well, no?
> 
Because we need the parent's value of things like use_hierarchy and
oom_control not to change after it has been copied to a child.

If we do it in css_alloc, the iterators won't be working yet - nor will
cgrp->children list, for that matter - and we will risk a situation
where another thread thinks no children exist, and flips use_hierarchy
to 1 (or oom_control, etc), right after the children already got the
value of 0.

The two other ways to solve this problem that I see, are:

1) lock in css_alloc and unlock in css_online, that tejun already ruled
out as too damn ugly (and I can't possibly disagree)

2) have an alternate indication of emptiness that is working since
css_alloc (like counting number of children).

Since I don't share your concerns about the iterator showing incomplete
memcgs - trivial to fix, if not fixed already - I deemed my approach
preferable here.



>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> ---
>>  mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 35 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index d80b6b5..b6d352f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5023,12 +5023,40 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  			INIT_WORK(&stock->work, drain_local_stock);
>>  		}
>>  		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
>> -	} else {
>> -		parent = mem_cgroup_from_cont(cont->parent);
>> -		memcg->use_hierarchy = parent->use_hierarchy;
>> -		memcg->oom_kill_disable = parent->oom_kill_disable;
>> +
>> +		res_counter_init(&memcg->res, NULL);
>> +		res_counter_init(&memcg->memsw, NULL);
>>  	}
>>  
>> +	memcg->last_scanned_node = MAX_NUMNODES;
>> +	INIT_LIST_HEAD(&memcg->oom_notify);
>> +	atomic_set(&memcg->refcnt, 1);
>> +	memcg->move_charge_at_immigrate = 0;
>> +	mutex_init(&memcg->thresholds_lock);
>> +	spin_lock_init(&memcg->move_lock);
>> +
>> +	return &memcg->css;
>> +
>> +free_out:
>> +	__mem_cgroup_free(memcg);
>> +	return ERR_PTR(error);
>> +}
>> +
>> +static int
>> +mem_cgroup_css_online(struct cgroup *cont)
>> +{
>> +	struct mem_cgroup *memcg, *parent;
>> +	int error = 0;
>> +
>> +	if (!cont->parent)
>> +		return 0;
>> +
>> +	memcg = mem_cgroup_from_cont(cont);
>> +	parent = mem_cgroup_from_cont(cont->parent);
>> +
>> +	memcg->use_hierarchy = parent->use_hierarchy;
>> +	memcg->oom_kill_disable = parent->oom_kill_disable;
>> +
>>  	if (parent && parent->use_hierarchy) {
>>  		res_counter_init(&memcg->res, &parent->res);
>>  		res_counter_init(&memcg->memsw, &parent->memsw);
>> @@ -5050,15 +5078,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		if (parent && parent != root_mem_cgroup)
>>  			mem_cgroup_subsys.broken_hierarchy = true;
>>  	}
>> -	memcg->last_scanned_node = MAX_NUMNODES;
>> -	INIT_LIST_HEAD(&memcg->oom_notify);
>>  
>> -	if (parent)
>> -		memcg->swappiness = mem_cgroup_swappiness(parent);
>> -	atomic_set(&memcg->refcnt, 1);
>> -	memcg->move_charge_at_immigrate = 0;
>> -	mutex_init(&memcg->thresholds_lock);
>> -	spin_lock_init(&memcg->move_lock);
>> +	memcg->swappiness = mem_cgroup_swappiness(parent);
>>  
>>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>  	if (error) {
>> @@ -5068,12 +5089,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		 * call __mem_cgroup_free, so return directly
>>  		 */
>>  		mem_cgroup_put(memcg);
>> -		return ERR_PTR(error);
>>  	}
>> -	return &memcg->css;
>> -free_out:
>> -	__mem_cgroup_free(memcg);
>> -	return ERR_PTR(error);
>> +	return error;
>>  }
>>  
>>  static void mem_cgroup_css_offline(struct cgroup *cont)
>> @@ -5702,6 +5719,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
>>  	.name = "memory",
>>  	.subsys_id = mem_cgroup_subsys_id,
>>  	.css_alloc = mem_cgroup_css_alloc,
>> +	.css_online = mem_cgroup_css_online,
>>  	.css_offline = mem_cgroup_css_offline,
>>  	.css_free = mem_cgroup_css_free,
>>  	.can_attach = mem_cgroup_can_attach,
>> -- 
>> 1.7.11.7
>>
> 


* Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-12-04  8:05     ` Glauber Costa
@ 2012-12-04  8:17       ` Michal Hocko
  2012-12-04  8:32         ` Glauber Costa
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-04  8:17 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Tue 04-12-12 12:05:21, Glauber Costa wrote:
> On 12/03/2012 09:32 PM, Michal Hocko wrote:
> > On Fri 30-11-12 17:31:25, Glauber Costa wrote:
> >> Although there is arguably some value in doing this per se, the main
> >> goal of this patch is to make room for the locking changes to come.
> >>
> >> With all the value assignment from parent happening in a context where
> >> our iterators can already be used, we can safely lock against value
> >> change in some key values like use_hierarchy, without resorting to the
> >> cgroup core at all.
> > 
> > I am sorry but I really do not get why online_css callback is more
> > appropriate. Quite contrary. With this change iterators can see a group
> > which is not fully initialized which calls for a problem (even though it
> > is not one yet).
> 
> But it should be extremely easy to protect against this. It is just a
> matter of not returning css that are not yet online from the iterator:
> then we'll never see them until they are online. This also sounds a lot
> more correct than returning merely-allocated css.

Yes but... Look at your other patch which relies on iterator when counting
children to find out if there is any available.
 
> > Could you be more specific why we cannot keep the initialization in
> > mem_cgroup_css_alloc? We can lock there as well, no?
> > 
> Because we need the parent values of things like use_hierarchy and
> oom_control not to change after they have been copied to a child.
> 
> If we do it in css_alloc, the iterators won't be working yet - nor will
> cgrp->children list, for that matter - and we will risk a situation
> where another thread thinks no children exist, and flips use_hierarchy
> to 1 (or oom_control, etc), right after the children already got the
> value of 0.

You are right. I must have been blind yesterday evening.
 
> The two other ways to solve this problem that I see, are:
> 
> 1) lock in css_alloc and unlock in css_online, that tejun already ruled
> out as too damn ugly (and I can't possibly disagree)

yes, it is really ugly

> 2) have an alternate indication of emptiness that is working since
> css_alloc (like counting number of children).

I do not think it is worth the complication.

> Since I don't share your concerns about the iterator showing incomplete
> memcgs - trivial to fix, if not fixed already - I deemed my approach
> preferable here.

Agreed.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04  7:58     ` Glauber Costa
@ 2012-12-04  8:23       ` Michal Hocko
  2012-12-04  8:31         ` Glauber Costa
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-04  8:23 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Tue 04-12-12 11:58:48, Glauber Costa wrote:
> On 12/03/2012 09:15 PM, Michal Hocko wrote:
> > On Fri 30-11-12 17:31:26, Glauber Costa wrote:
> > [...]
> >> +/*
> >> + * must be called with memcg_lock held, unless the cgroup is guaranteed to be
> >> + * already dead (like in mem_cgroup_force_empty, for instance).
> >> + */
> >> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
> >> +{
> >> +	return mem_cgroup_count_children(memcg) != 1;
> >> +}
> > 
> > Why not just keep list_empty(&cgrp->children) which is much simpler much
> > more effective and correct here as well because cgroup cannot vanish
> > while we are at the call because all callers come from cgroup fs?
> > 
> Because it depends on cgroup's internal representation, which I think
> we're better off not depending upon, even if this is not as serious a
> case as the locking stuff. But also, technically, cgrp->children is
> protected by the cgroup_lock(), while since we'll hold the memcg_lock
> during creation and also around the iterators, we cover everything with
> the same lock.

The list is RCU safe so we do not have to use cgroup_lock there for this
kind of test.

> That said, of course we don't need to do the full iteration here, and
> mem_cgroup_count_children is indeed overkill. We could just as easily
> verify if any child exist - it is just an emptiness test after all. But
> it is not living in any fast path, though, and I just assumed code reuse
> to win over efficiency in this particular case -
> mem_cgroup_count_children already existed...

Yes but the function name suggests a more generic usage and the test is
really overkill. Maybe we can get a generic cgroup helper,
cgroup_has_children, which would do the thing without exposing cgroup
internals. What do you think?
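
Just to illustrate what I have in mind - a minimal sketch, the name and
placement are of course made up and up for discussion:

	/* include/linux/cgroup.h - hypothetical helper */
	static inline bool cgroup_has_children(struct cgroup *cgrp)
	{
		bool ret;

		rcu_read_lock();
		ret = !list_empty(&cgrp->children);
		rcu_read_unlock();
		return ret;
	}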
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04  8:23       ` Michal Hocko
@ 2012-12-04  8:31         ` Glauber Costa
  2012-12-04  8:45           ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-12-04  8:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On 12/04/2012 12:23 PM, Michal Hocko wrote:
> On Tue 04-12-12 11:58:48, Glauber Costa wrote:
>> On 12/03/2012 09:15 PM, Michal Hocko wrote:
>>> On Fri 30-11-12 17:31:26, Glauber Costa wrote:
>>> [...]
>>>> +/*
>>>> + * must be called with memcg_lock held, unless the cgroup is guaranteed to be
>>>> + * already dead (like in mem_cgroup_force_empty, for instance).
>>>> + */
>>>> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
>>>> +{
>>>> +	return mem_cgroup_count_children(memcg) != 1;
>>>> +}
>>>
>>> Why not just keep list_empty(&cgrp->children) which is much simpler much
>>> more effective and correct here as well because cgroup cannot vanish
>>> while we are at the call because all callers come from cgroup fs?
>>>
>> Because it depends on cgroup's internal representation, which I think
>> we're better off not depending upon, even if this is not as serious a
>> case as the locking stuff. But also, technically, cgrp->children is
>> protected by the cgroup_lock(), while since we'll hold the memcg_lock
>> during creation and also around the iterators, we cover everything with
>> the same lock.
> 
> The list is RCU safe so we do not have to use cgroup_lock there for this
> kind of test.
> 
>> That said, of course we don't need to do the full iteration here, and
>> mem_cgroup_count_children is indeed overkill. We could just as easily
>> verify if any child exist - it is just an emptiness test after all. But
>> it is not living in any fast path, though, and I just assumed code reuse
>> to win over efficiency in this particular case -
>> mem_cgroup_count_children already existed...
> 
> Yes but the function name suggests a more generic usage and the test is
> really overkill. Maybe we can get a generic cgroup helper,
> cgroup_has_children, which would do the thing without exposing cgroup
> internals. What do you think?
> 
I will give it another round of thinking, but I still don't see the
reason for calling into the cgroup core for this. If you really dislike
doing a children count (I don't like it either, I just don't dislike it),
maybe we can do something like:

i = 0;
for_each_mem_cgroup_tree(iter, memcg) {
	if (i++ == 1)
		return false;
}
return true;


* Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-12-04  8:17       ` Michal Hocko
@ 2012-12-04  8:32         ` Glauber Costa
  2012-12-04  8:52           ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Glauber Costa @ 2012-12-04  8:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On 12/04/2012 12:17 PM, Michal Hocko wrote:
>> But it should be extremely easy to protect against this. It is just a
>> matter of not returning css that are not yet online from the iterator:
>> then we'll never see them until they are online. This also sounds a lot
>> more correct than returning merely-allocated css.
> Yes but... Look at your other patch which relies on iterator when counting
> children to find out if there is any available.
>  
And what is the problem with it ?

As I said: if the iterator will not return css that are not online, we
should not have a problem.


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04  8:31         ` Glauber Costa
@ 2012-12-04  8:45           ` Michal Hocko
  2012-12-04 14:52             ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-04  8:45 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Tue 04-12-12 12:31:31, Glauber Costa wrote:
> On 12/04/2012 12:23 PM, Michal Hocko wrote:
> > On Tue 04-12-12 11:58:48, Glauber Costa wrote:
> >> On 12/03/2012 09:15 PM, Michal Hocko wrote:
> >>> On Fri 30-11-12 17:31:26, Glauber Costa wrote:
> >>> [...]
> >>>> +/*
> >>>> + * must be called with memcg_lock held, unless the cgroup is guaranteed to be
> >>>> + * already dead (like in mem_cgroup_force_empty, for instance).
> >>>> + */
> >>>> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
> >>>> +{
> >>>> +	return mem_cgroup_count_children(memcg) != 1;
> >>>> +}
> >>>
> >>> Why not just keep list_empty(&cgrp->children) which is much simpler much
> >>> more effective and correct here as well because cgroup cannot vanish
> >>> while we are at the call because all callers come from cgroup fs?
> >>>
> >> Because it depends on cgroup's internal representation, which I think
> >> we're better off not depending upon, even if this is not as serious a
> >> case as the locking stuff. But also, technically, cgrp->children is
> >> protected by the cgroup_lock(), while since we'll hold the memcg_lock
> >> during creation and also around the iterators, we cover everything with
> >> the same lock.
> > 
> > The list is RCU safe so we do not have to use cgroup_lock there for this
> > kind of test.
> > 
> >> That said, of course we don't need to do the full iteration here, and
> >> mem_cgroup_count_children is indeed overkill. We could just as easily
> >> verify if any child exist - it is just an emptiness test after all. But
> >> it is not living in any fast path, though, and I just assumed code reuse
> >> to win over efficiency in this particular case -
> >> mem_cgroup_count_children already existed...
> > 
> > Yes but the function name suggests a more generic usage and the test is
> > really overkill. Maybe we can get a generic cgroup helper,
> > cgroup_has_children, which would do the thing without exposing cgroup
> > internals. What do you think?
> > 
> I will give it another round of thinking, but I still don't see the
> reason for calling into the cgroup core for this.

Because such a helper might be useful in general? I didn't check if
somebody does the same test elsewhere though.

> If you really dislike doing a children count (I don't like it either, I
> just don't dislike it), maybe we can do something like:
> 
> i = 0;
> for_each_mem_cgroup_tree(iter, memcg) {
> 	if (i++ == 1)
> 		return false;
> }
> return true;

I guess you meant:
i = 0;
for_each_mem_cgroup_tree(iter, memcg) {
	if (i++ == 1) {
		mem_cgroup_iter_break(memcg, iter);
		break;
	}
}
return i > 1;

which is still much more work than necessary. Not that this would be a
killer thing, it just hit my eyes. I think the easiest thing would be to
not fold this change into this patch and do it as a separate patch if
there is a real reason for it - e.g. the cgroup core would like to give
us a helper or they tell us _do_not_misuse_our_internals_.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 3/4] memcg: split part of memcg creation to css_online
  2012-12-04  8:32         ` Glauber Costa
@ 2012-12-04  8:52           ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-12-04  8:52 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Tue 04-12-12 12:32:17, Glauber Costa wrote:
> On 12/04/2012 12:17 PM, Michal Hocko wrote:
> >> But it should be extremely easy to protect against this. It is just a
> >> matter of not returning css that are not yet online from the iterator:
> >> then we'll never see them until they are online. This also sounds a lot
> >> more correct than returning merely-allocated css.
> > Yes but... Look at your other patch which relies on iterator when counting
> > children to find out if there is any available.
> >  
> And what is the problem with it ?

Bahh. Right you are, because the value is copied only at css_online
time.  So even if mem_cgroup_hierarchy_write didn't see any child
(because they are still offline) and managed to set use_hierarchy=1 with
some children already linked in, all would be fixed up in mem_cgroup_css_online.

P.S.
Hey mhocko stop saying crap.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach
  2012-11-30 13:31 ` [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
  2012-11-30 15:19   ` Tejun Heo
@ 2012-12-04  9:29   ` Michal Hocko
  1 sibling, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-12-04  9:29 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, Tejun Heo, kamezawa.hiroyu, Johannes Weiner

On Fri 30-11-12 17:31:24, Glauber Costa wrote:
> Currently, we rely on the cgroup_lock() to prevent changes to
> move_charge_at_immigrate during task migration. We can do something
> similar to what cpuset is doing, and flip a flag to tell us if task
> movement is taking place.
> 
> In theory, we could busy loop waiting for that value to return to 0 - it
> will eventually. But I am judging that returning EAGAIN is not too much
> of a problem, since file writers already should be checking for error
> codes anyway.

I think we should avoid returning EAGAIN because this is a behavior change.
Why not just loop, with a signal_pending test for breaking out and a small
sleep after seeing attach_in_progress > 0 and unlocking?
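
Something along these lines, maybe (only a rough sketch against this patch;
the 10ms back-off and the -EINTR are arbitrary choices on my side):

	cgroup_lock();
	while (memcg->attach_in_progress) {
		cgroup_unlock();
		if (signal_pending(current))
			return -EINTR;
		msleep(10);	/* arbitrary back-off */
		cgroup_lock();
	}
	memcg->move_charge_at_immigrate = val;
	cgroup_unlock();
	return 0;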

> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  mm/memcontrol.c | 64 +++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 51 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index feba87d..d80b6b5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -311,7 +311,13 @@ struct mem_cgroup {
>  	 * Should we move charges of a task when a task is moved into this
>  	 * mem_cgroup ? And what type of charges should we move ?
>  	 */
> -	unsigned long 	move_charge_at_immigrate;
> +	unsigned long	move_charge_at_immigrate;
> +        /*
> +	 * Tasks are being attached to this memcg.  Used mostly to prevent
> +	 * changes to move_charge_at_immigrate
> +	 */
> +        int attach_in_progress;
> +
>  	/*
>  	 * set > 0 if pages under this cgroup are moving to other cgroup.
>  	 */
> @@ -4114,6 +4120,7 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  					struct cftype *cft, u64 val)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
> +	int ret = -EAGAIN;
>  
>  	if (val >= (1 << NR_MOVE_TYPE))
>  		return -EINVAL;
> @@ -4123,10 +4130,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  	 * inconsistent.
>  	 */
>  	cgroup_lock();
> +	if (memcg->attach_in_progress)
> +		goto out;
>  	memcg->move_charge_at_immigrate = val;
> +	ret = 0;
> +out:
>  	cgroup_unlock();
> -
> -	return 0;
> +	return ret;
>  }
>  #else
>  static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
> @@ -5443,12 +5453,12 @@ static void mem_cgroup_clear_mc(void)
>  	mem_cgroup_end_move(from);
>  }
>  
> -static int mem_cgroup_can_attach(struct cgroup *cgroup,
> -				 struct cgroup_taskset *tset)
> +
> +static int __mem_cgroup_can_attach(struct mem_cgroup *memcg,
> +				   struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p = cgroup_taskset_first(tset);
>  	int ret = 0;
> -	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
>  
>  	if (memcg->move_charge_at_immigrate) {
>  		struct mm_struct *mm;
> @@ -5482,8 +5492,8 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
>  	return ret;
>  }
>  
> -static void mem_cgroup_cancel_attach(struct cgroup *cgroup,
> -				     struct cgroup_taskset *tset)
> +static void __mem_cgroup_cancel_attach(struct mem_cgroup *memcg,
> +				       struct cgroup_taskset *tset)
>  {
>  	mem_cgroup_clear_mc();
>  }
> @@ -5630,8 +5640,8 @@ retry:
>  	up_read(&mm->mmap_sem);
>  }
>  
> -static void mem_cgroup_move_task(struct cgroup *cont,
> -				 struct cgroup_taskset *tset)
> +static void __mem_cgroup_move_task(struct mem_cgroup *memcg,
> +				   struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p = cgroup_taskset_first(tset);
>  	struct mm_struct *mm = get_task_mm(p);
> @@ -5645,20 +5655,48 @@ static void mem_cgroup_move_task(struct cgroup *cont,
>  		mem_cgroup_clear_mc();
>  }
>  #else	/* !CONFIG_MMU */
> +static int __mem_cgroup_can_attach(struct mem_cgroup *memcg,
> +				   struct cgroup_taskset *tset)
> +{
> +	return 0;
> +}
> +
> +static void __mem_cgroup_cancel_attach(struct mem_cgroup *memcg,
> +				       struct cgroup_taskset *tset)
> +{
> +}
> +
> +static void __mem_cgroup_move_task(struct mem_cgroup *memcg,
> +				   struct cgroup_taskset *tset)
> +{
> +}
> +#endif
>  static int mem_cgroup_can_attach(struct cgroup *cgroup,
>  				 struct cgroup_taskset *tset)
>  {
> -	return 0;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
> +	memcg->attach_in_progress++;
> +	return __mem_cgroup_can_attach(memcg, tset);
>  }
> +
>  static void mem_cgroup_cancel_attach(struct cgroup *cgroup,
>  				     struct cgroup_taskset *tset)
>  {
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
> +	__mem_cgroup_cancel_attach(memcg, tset);
> +	memcg->attach_in_progress--;
>  }
> -static void mem_cgroup_move_task(struct cgroup *cont,
> +
> +static void mem_cgroup_move_task(struct cgroup *cgroup,
>  				 struct cgroup_taskset *tset)
>  {
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
> +	__mem_cgroup_move_task(memcg, tset);
> +	memcg->attach_in_progress--;
>  }
> -#endif
>  
>  struct cgroup_subsys mem_cgroup_subsys = {
>  	.name = "memory",
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04  8:45           ` Michal Hocko
@ 2012-12-04 14:52             ` Tejun Heo
  2012-12-04 15:14               ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-12-04 14:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

Hello, Michal, Glauber.

On Tue, Dec 04, 2012 at 09:45:44AM +0100, Michal Hocko wrote:
> Because such a helper might be useful in general? I didn't check if
> somebody does the same test elsewhere though.

The problem is that whether a cgroup has a child or not may differ
depending on the specific controller.  You can't tell whether
something exists or not at a given time without synchronization and
synchronization is per-controller.  IOW, if a controller cares about
when a cgroup comes online and goes offline, it should synchronize
those events in ->css_on/offline() and only consider cgroups marked
online as online.

> > If you really dislike doing a children count (I don't like as well, I
> > just don't dislike), maybe we can do something like:
> > 
> > i = 0;
> > for_each_mem_cgroup_tree(iter, memcg) {
> > 	if (i++ == 1)
> > 		return false;
> > }
> > return true;
> 
> I guess you meant:
> i = 0;
> for_each_mem_cgroup_tree(iter, memcg) {
> 	if (i++ == 1) {
> 		mem_cgroup_iter_break(memcg, iter);
> 		break;
> 	}
> }
> return i > 1;

Or sth like the following?

bool memcg_has_children(struct cgroup *cgrp)
{
	struct cgroup *pos;

	lockdep_assert_held(&memcg_lock);

	rcu_read_lock();
	cgroup_for_each_child(pos, cgrp) {
		if (memcg_is_online(pos)) {
			rcu_read_unlock();
			return true;
		}
	}
	rcu_read_unlock();
	return false;
}

-- 
tejun


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04 14:52             ` Tejun Heo
@ 2012-12-04 15:14               ` Michal Hocko
  2012-12-04 15:22                 ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-04 15:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

On Tue 04-12-12 06:52:21, Tejun Heo wrote:
> Hello, Michal, Glauber.
> 
> On Tue, Dec 04, 2012 at 09:45:44AM +0100, Michal Hocko wrote:
> > Because such a helper might be useful in general? I didn't check if
> > somebody does the same test elsewhere though.
> 
> The problem is that whether a cgroup has a child or not may differ
> depending on the specific controller.  You can't tell whether
> something exists or not at a given time without synchronization and
> synchronization is per-controller.  IOW, if a controller cares about
> when a cgroup comes online and goes offline, it should synchronize
> those events in ->css_on/offline() and only consider cgroups marked
> online as online.

OK, I read this as "generic helper doesn't make much sense". Then I
would just ask. Does cgroup core really care whether we do
list_empty test? Is this something that we have to care about in memcg
and should fix? If yes then just try to do it as simple as possible.

My primary objection was that the full hierarchy walk is overkill and
it doesn't fit into a patch which aims at a different task. So if the
cgroup side really cares about this abuse of cgroup internals then let's
fix it, but let's do it in a separate patch.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04 15:14               ` Michal Hocko
@ 2012-12-04 15:22                 ` Tejun Heo
  2012-12-05 14:35                   ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2012-12-04 15:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

Hello, Michal.

On Tue, Dec 04, 2012 at 04:14:20PM +0100, Michal Hocko wrote:
> OK, I read this as "generic helper doesn't make much sense". Then I
> would just ask. Does cgroup core really care whether we do
> list_empty test? Is this something that we have to care about in memcg
> and should fix? If yes then just try to do it as simple as possible.

The thing is, what does the test mean when it doesn't have proper
synchronization?  list_empty(&cgroup->children) doesn't really have a
precise meaning if you're not synchronized.  There could be cases
where such correct-most-of-the-time results are okay but depending on
stuff like that is a sure-fire way to subtle bugs.

So, my recommendation would be to bite the bullet and implement
properly synchronized on/offline state and teach the memcg iterator
about it so that memcg can definitively tell what's online and what's
not while holding memcg_mutex, and use such knowledge consistently.
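
For instance (a sketch only - the online flag and the exact locking are
made up here, the real thing would be whatever patch 3/4 ends up doing):

	/* hypothetical 'online' flag in struct mem_cgroup, flipped under memcg_mutex */

	static int mem_cgroup_css_online(struct cgroup *cont)
	{
		struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

		mutex_lock(&memcg_mutex);
		/* copy use_hierarchy, oom_kill_disable etc. from the parent here */
		memcg->online = true;
		mutex_unlock(&memcg_mutex);
		return 0;
	}

	static void mem_cgroup_css_offline(struct cgroup *cont)
	{
		struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

		mutex_lock(&memcg_mutex);
		memcg->online = false;
		mutex_unlock(&memcg_mutex);
		/* the existing offline work would follow */
	}

	/* the iterator and memcg_has_children then only trust memcg->online */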

Thanks.

-- 
tejun


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-04 15:22                 ` Tejun Heo
@ 2012-12-05 14:35                   ` Michal Hocko
  2012-12-05 14:41                     ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-12-05 14:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

On Tue 04-12-12 07:22:25, Tejun Heo wrote:
> Hello, Michal.
> 
> On Tue, Dec 04, 2012 at 04:14:20PM +0100, Michal Hocko wrote:
> > OK, I read this as "generic helper doesn't make much sense". Then I
> > would just ask. Does cgroup core really care whether we do
> > list_empty test? Is this something that we have to care about in memcg
> > and should fix? If yes then just try to do it as simple as possible.
> 
> The thing is, what does the test mean when it doesn't have proper
> synchronization?  list_empty(&cgroup->children) doesn't really have a
> precise meaning if you're not synchronized. 

For the cases where memcg uses this test it is perfectly valid, because
the only important information is whether there is a child group. We do
not care about its current state. The test is rather strict, because we
could set use_hierarchy to 1 even if there is a child which is not online
yet (after the value has been copied in css_online, of course). But do we
care about this race? If yes, patches with the use case are welcome.

> There could be cases where such correct-most-of-the-time results are
> okay but depending on stuff like that is a sure-fire way to subtle
> bugs.
> 
> So, my recommendation would be to bite the bullet and implement
> properly synchronized on/offline state and teach the memcg iterator
> about it so that memcg can definitively tell what's online and what's
> not while holding memcg_mutex, and use such knowledge consistently.

I would rather not complicate the iterators with even more logic but if
it turns out to be useful then why not.

I do not want to repeat myself but please focus on the cgroup_lock ->
memcg_lock conversion in this series and let's do other things separately
(if at all).

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock
  2012-12-05 14:35                   ` Michal Hocko
@ 2012-12-05 14:41                     ` Tejun Heo
  0 siblings, 0 replies; 32+ messages in thread
From: Tejun Heo @ 2012-12-05 14:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, cgroups, linux-mm, kamezawa.hiroyu,
	Johannes Weiner

Hello, Michal.

On Wed, Dec 05, 2012 at 03:35:37PM +0100, Michal Hocko wrote:
> On Tue 04-12-12 07:22:25, Tejun Heo wrote:
> > Hello, Michal.
> > 
> > On Tue, Dec 04, 2012 at 04:14:20PM +0100, Michal Hocko wrote:
> > > OK, I read this as "generic helper doesn't make much sense". Then I
> > > would just ask. Does cgroup core really care whether we do
> > > list_empty test? Is this something that we have to care about in memcg
> > > and should fix? If yes then just try to do it as simple as possible.
> > 
> > The thing is, what does the test mean when it doesn't have proper
> > synchronization?  list_empty(&cgroup->children) doesn't really have a
> > precise meaning if you're not synchronized. 
> 
> For the cases where memcg uses this test it is perfectly valid, because
> the only important information is whether there is a child group. We do
> not care about its current state. The test is rather strict, because we
> could set use_hierarchy to 1 even if there is a child which is not online
> yet (after the value has been copied in css_online, of course). But do we
> care about this race? If yes, patches with the use case are welcome.

Please just implement properly synchronized onlineness.  There is
absolutely *NO* reason not to do it.  It's gonna be error/race-prone
like hell if memcg keeps trying to dance around it.

> > There could be cases where such correct-most-of-the-time results are
> > okay but depending on stuff like that is a sure-fire way to subtle
> > bugs.
> > 
> > So, my recommendation would be to bite the bullet and implement
> > properly synchronized on/offline state and teach the memcg iterator
> > about it so that memcg can definitively tell what's online and what's
> > not while holding memcg_mutex, and use such knowledge consistently.
> 
> I would rather not complicate the iterators with even more logic but if
> it turns out to be useful then why not.

It's gonna be as simple as the following.  I doubt it's gonna add much
to the complexity.

	if (!memcg_online(pos))
		continue; // or goto next; whatever

Thanks.

-- 
tejun


end of thread

Thread overview: 32+ messages
2012-11-30 13:31 [PATCH 0/4] replace cgroup_lock with local lock in memcg Glauber Costa
2012-11-30 13:31 ` [PATCH 1/4] cgroup: warn about broken hierarchies only after css_online Glauber Costa
2012-11-30 15:11   ` Tejun Heo
2012-11-30 15:13     ` Glauber Costa
2012-11-30 15:45       ` Tejun Heo
2012-11-30 15:49         ` Michal Hocko
2012-11-30 15:57           ` Glauber Costa
2012-11-30 13:31 ` [PATCH 2/4] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
2012-11-30 15:19   ` Tejun Heo
2012-11-30 15:29     ` Glauber Costa
2012-12-04  9:29   ` Michal Hocko
2012-11-30 13:31 ` [PATCH 3/4] memcg: split part of memcg creation to css_online Glauber Costa
2012-12-03 17:32   ` Michal Hocko
2012-12-04  8:05     ` Glauber Costa
2012-12-04  8:17       ` Michal Hocko
2012-12-04  8:32         ` Glauber Costa
2012-12-04  8:52           ` Michal Hocko
2012-11-30 13:31 ` [PATCH 4/4] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
2012-12-03 17:15   ` Michal Hocko
2012-12-03 17:30     ` Michal Hocko
2012-12-04  7:49       ` Glauber Costa
2012-12-04  7:58     ` Glauber Costa
2012-12-04  8:23       ` Michal Hocko
2012-12-04  8:31         ` Glauber Costa
2012-12-04  8:45           ` Michal Hocko
2012-12-04 14:52             ` Tejun Heo
2012-12-04 15:14               ` Michal Hocko
2012-12-04 15:22                 ` Tejun Heo
2012-12-05 14:35                   ` Michal Hocko
2012-12-05 14:41                     ` Tejun Heo
2012-11-30 15:52 ` [PATCH 0/4] replace cgroup_lock with local lock in memcg Tejun Heo
2012-11-30 15:59   ` Glauber Costa
