linux-mm.kvack.org archive mirror
* [PATCH v2 0/7] replace cgroup_lock with local memcg lock
@ 2013-01-11  9:45 Glauber Costa
  2013-01-11  9:45 ` [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo

Hi,

In memcg, we use cgroup_lock basically to synchronize against the attachment
of new children to a cgroup, because we rely on the cgroup core to provide us
with this information.

We need to guarantee that our tunables are consistent upon child creation.
For those tunables, the calls to cgroup_lock() all live in handlers such as
mem_cgroup_hierarchy_write(), where we change a hierarchy-related tunable in
the group. For instance, the use_hierarchy flag cannot be changed if the
cgroup already has children.

Furthermore, those values are propagated from the parent to the child when a
new child is created. So if we don't lock like this, we can end up with the
following situation:

A                                   B
 memcg_css_alloc()                       mem_cgroup_hierarchy_write()
 copy use hierarchy from parent          change use hierarchy in parent
 finish creation.

This is mainly because, during creation, we are still not fully connected to
the css tree. So all the iterators and such that we could use will fail to
show that the group has children.

My observation is that all of creation can proceed in parallel with those
tasks, except for value assignment. So what this patch series does is first
move all value assignment that depends on parent values from css_alloc to
css_online, where the iterators all work, and then lock only the value
assignment. This guarantees that parent and children always have consistent
values. Together with an online test, which can be derived from the
observation that the refcount of an online memcg can be made to be always
positive, we should be able to synchronize on our side without the cgroup
lock.

*v2:
 - sanitize kmemcg assignment in light of the current locking change.
 - don't take locks when moving charges at immigrate; instead, cache the value during can_attach.

Glauber Costa (7):
  memcg: prevent changes to move_charge_at_immigrate during task attach
  memcg: split part of memcg creation to css_online
  memcg: provide online test for memcg
  memcg: fast hierarchy-aware child test.
  May god have mercy on my soul.
  memcg: replace cgroup_lock with memcg specific memcg_lock
  memcg: increment static branch right after limit set.

 mm/memcontrol.c | 194 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 117 insertions(+), 77 deletions(-)

-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 14:16   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 2/7] memcg: split part of memcg creation to css_online Glauber Costa
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

Currently, we rely on cgroup_lock() to prevent changes to
move_charge_at_immigrate during task migration. However, this is only
needed because the current strategy keeps re-checking this value throughout
the whole process. Since all we need is serialization, it is enough to
guarantee that whatever decision we made at the beginning of a specific
migration is respected throughout the process.

We can achieve this by simply saving the value in mc. Once that is done, no
locking is needed at all.
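The scheme can be sketched in plain userspace C. This is an illustrative model only, not kernel code: `struct group` and the helpers stand in for the kernel types, and a bitmask test replaces test_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model, NOT kernel code: "mc" snapshots the
 * destination group's tunable when the migration starts, so later checks
 * never re-read a value that may be changing concurrently. */

#define MOVE_CHARGE_TYPE_ANON 0
#define MOVE_CHARGE_TYPE_FILE 1

struct group {
	unsigned long move_charge_at_immigrate;
};

struct move_charge_state {
	struct group *from;
	struct group *to;
	unsigned long move_charge_at_immigrate;	/* snapshot from can_attach */
};

static struct move_charge_state mc;

/* taken once, at the beginning of a migration */
static void can_attach(struct group *from, struct group *to)
{
	mc.from = from;
	mc.to = to;
	mc.move_charge_at_immigrate = to->move_charge_at_immigrate;
}

/* later checks consult the snapshot, not the live tunable */
static bool move_anon(void)
{
	return mc.move_charge_at_immigrate & (1UL << MOVE_CHARGE_TYPE_ANON);
}
```

A write to the live tunable after can_attach() no longer affects the migration in flight, which is exactly why no lock is needed.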

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09255ec..18f4e76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -412,6 +412,7 @@ static struct move_charge_struct {
 	spinlock_t	  lock; /* for from, to */
 	struct mem_cgroup *from;
 	struct mem_cgroup *to;
+	unsigned long move_charge_at_immigrate;
 	unsigned long precharge;
 	unsigned long moved_charge;
 	unsigned long moved_swap;
@@ -425,13 +426,13 @@ static struct move_charge_struct {
 static bool move_anon(void)
 {
 	return test_bit(MOVE_CHARGE_TYPE_ANON,
-					&mc.to->move_charge_at_immigrate);
+					&mc.move_charge_at_immigrate);
 }
 
 static bool move_file(void)
 {
 	return test_bit(MOVE_CHARGE_TYPE_FILE,
-					&mc.to->move_charge_at_immigrate);
+					&mc.move_charge_at_immigrate);
 }
 
 /*
@@ -5146,15 +5147,14 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 
 	if (val >= (1 << NR_MOVE_TYPE))
 		return -EINVAL;
+
 	/*
-	 * We check this value several times in both in can_attach() and
-	 * attach(), so we need cgroup lock to prevent this value from being
-	 * inconsistent.
+	 * No kind of locking is needed in here, because ->can_attach() will
+	 * check this value once in the beginning of the process, and then carry
+	 * on with stale data. This means that changes to this value will only
+	 * affect task migrations starting after the change.
 	 */
-	cgroup_lock();
 	memcg->move_charge_at_immigrate = val;
-	cgroup_unlock();
-
 	return 0;
 }
 #else
@@ -6530,8 +6530,15 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
 	struct task_struct *p = cgroup_taskset_first(tset);
 	int ret = 0;
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+	unsigned long move_charge_at_immigrate;
 
-	if (memcg->move_charge_at_immigrate) {
+	/*
+	 * We are now committed to this value, whatever it is. Changes in this
+	 * tunable will only affect upcoming migrations, not the current one.
+	 * So we save it here and carry it along.
+	 */
+	move_charge_at_immigrate = memcg->move_charge_at_immigrate;
+	if (move_charge_at_immigrate) {
 		struct mm_struct *mm;
 		struct mem_cgroup *from = mem_cgroup_from_task(p);
 
@@ -6551,6 +6558,7 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
 			spin_lock(&mc.lock);
 			mc.from = from;
 			mc.to = memcg;
+			mc.move_charge_at_immigrate = move_charge_at_immigrate;
 			spin_unlock(&mc.lock);
 			/* We set mc.moving_task later */
 
-- 
1.7.11.7


* [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
  2013-01-11  9:45 ` [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 15:25   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 3/7] memcg: provide online test for memcg Glauber Costa
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

Although there is arguably some value in doing this per se, the main
goal of this patch is to make room for the locking changes to come.

With all the value assignment from the parent happening in a context where
our iterators can already be used, we can safely lock against changes to key
values like use_hierarchy, without resorting to the cgroup core at all.
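The split can be sketched in userspace C. This is an illustrative model only, not kernel code; the struct and helpers are stand-ins for the css machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative userspace model, NOT kernel code: css_alloc() sets only
 * parent-independent defaults; everything inherited from the parent is
 * deferred to css_online(), where the group is already visible to
 * iterators and can be serialized against tunable writes. */

struct group {
	struct group *parent;
	bool use_hierarchy;
	bool oom_kill_disable;
};

static void css_alloc(struct group *g)
{
	g->use_hierarchy = false;	/* parent-independent defaults only */
	g->oom_kill_disable = false;
}

static void css_online(struct group *g)
{
	if (!g->parent)
		return;
	/* parent-dependent values are copied here */
	g->use_hierarchy = g->parent->use_hierarchy;
	g->oom_kill_disable = g->parent->oom_kill_disable;
}
```

Because inheritance happens only in the online step, the window between allocation and onlining can run in parallel with tunable writes without producing inconsistent parent/child values.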

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 18f4e76..2229945 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6090,12 +6090,41 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 						&per_cpu(memcg_stock, cpu);
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
-	} else {
-		parent = mem_cgroup_from_cont(cont->parent);
-		memcg->use_hierarchy = parent->use_hierarchy;
-		memcg->oom_kill_disable = parent->oom_kill_disable;
+
+		res_counter_init(&memcg->res, NULL);
+		res_counter_init(&memcg->memsw, NULL);
+		res_counter_init(&memcg->kmem, NULL);
 	}
 
+	memcg->last_scanned_node = MAX_NUMNODES;
+	INIT_LIST_HEAD(&memcg->oom_notify);
+	atomic_set(&memcg->refcnt, 1);
+	memcg->move_charge_at_immigrate = 0;
+	mutex_init(&memcg->thresholds_lock);
+	spin_lock_init(&memcg->move_lock);
+
+	return &memcg->css;
+
+free_out:
+	__mem_cgroup_free(memcg);
+	return ERR_PTR(error);
+}
+
+static int
+mem_cgroup_css_online(struct cgroup *cont)
+{
+	struct mem_cgroup *memcg, *parent;
+	int error = 0;
+
+	if (!cont->parent)
+		return 0;
+
+	memcg = mem_cgroup_from_cont(cont);
+	parent = mem_cgroup_from_cont(cont->parent);
+
+	memcg->use_hierarchy = parent->use_hierarchy;
+	memcg->oom_kill_disable = parent->oom_kill_disable;
+
 	if (parent && parent->use_hierarchy) {
 		res_counter_init(&memcg->res, &parent->res);
 		res_counter_init(&memcg->memsw, &parent->memsw);
@@ -6120,15 +6149,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 		if (parent && parent != root_mem_cgroup)
 			mem_cgroup_subsys.broken_hierarchy = true;
 	}
-	memcg->last_scanned_node = MAX_NUMNODES;
-	INIT_LIST_HEAD(&memcg->oom_notify);
 
-	if (parent)
-		memcg->swappiness = mem_cgroup_swappiness(parent);
-	atomic_set(&memcg->refcnt, 1);
-	memcg->move_charge_at_immigrate = 0;
-	mutex_init(&memcg->thresholds_lock);
-	spin_lock_init(&memcg->move_lock);
+	memcg->swappiness = mem_cgroup_swappiness(parent);
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
 	if (error) {
@@ -6138,12 +6160,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 		 * call __mem_cgroup_free, so return directly
 		 */
 		mem_cgroup_put(memcg);
-		return ERR_PTR(error);
 	}
-	return &memcg->css;
-free_out:
-	__mem_cgroup_free(memcg);
-	return ERR_PTR(error);
+	return error;
 }
 
 static void mem_cgroup_css_offline(struct cgroup *cont)
@@ -6753,6 +6771,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
 	.subsys_id = mem_cgroup_subsys_id,
 	.css_alloc = mem_cgroup_css_alloc,
+	.css_online = mem_cgroup_css_online,
 	.css_offline = mem_cgroup_css_offline,
 	.css_free = mem_cgroup_css_free,
 	.can_attach = mem_cgroup_can_attach,
-- 
1.7.11.7


* [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
  2013-01-11  9:45 ` [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
  2013-01-11  9:45 ` [PATCH v2 2/7] memcg: split part of memcg creation to css_online Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 15:37   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 4/7] memcg: fast hierarchy-aware child test Glauber Costa
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

Since we are now splitting memcg creation in two parts, following the cgroup
standard, it would be helpful to be able to determine whether a created
memcg is already online.

We can do this by initially forcing the refcnt to 0, and waiting until the
last minute to flip it to 1. During the memcg's lifetime this value will
vary, but if it ever reaches 0 again, the memcg will be destroyed. We can
therefore be sure that any value different from 0 means that our group is
online.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2229945..2ac2808 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -475,6 +475,11 @@ enum res_type {
 static void mem_cgroup_get(struct mem_cgroup *memcg);
 static void mem_cgroup_put(struct mem_cgroup *memcg);
 
+static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
+{
+	return atomic_read(&memcg->refcnt) > 0;
+}
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
 {
@@ -6098,7 +6103,7 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
-	atomic_set(&memcg->refcnt, 1);
+	atomic_set(&memcg->refcnt, 0);
 	memcg->move_charge_at_immigrate = 0;
 	mutex_init(&memcg->thresholds_lock);
 	spin_lock_init(&memcg->move_lock);
@@ -6116,10 +6121,13 @@ mem_cgroup_css_online(struct cgroup *cont)
 	struct mem_cgroup *memcg, *parent;
 	int error = 0;
 
-	if (!cont->parent)
+	memcg = mem_cgroup_from_cont(cont);
+	if (!cont->parent) {
+		/* no need to lock, since this is the root cgroup */
+		atomic_set(&memcg->refcnt, 1);
 		return 0;
+	}
 
-	memcg = mem_cgroup_from_cont(cont);
 	parent = mem_cgroup_from_cont(cont->parent);
 
 	memcg->use_hierarchy = parent->use_hierarchy;
@@ -6151,6 +6159,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 	}
 
 	memcg->swappiness = mem_cgroup_swappiness(parent);
+	atomic_set(&memcg->refcnt, 1);
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
 	if (error) {
-- 
1.7.11.7


* [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
                   ` (2 preceding siblings ...)
  2013-01-11  9:45 ` [PATCH v2 3/7] memcg: provide online test for memcg Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 16:06   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 5/7] May god have mercy on my soul Glauber Costa
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

Currently, we use the cgroup-provided list of children to verify whether it
is safe to proceed with any value change that depends on the cgroup being
empty.

This is less than ideal, because it enforces a dependency on the cgroup core
that we would be better off without. The solution proposed here is to
iterate over the child cgroups and, if any is found that is already online,
bail out and return: we don't really care how many children we have, only
whether we have any.

This is also made hierarchy aware. IOW, cgroups with hierarchy disabled,
while they still exist, will be considered for the purposes of this
interface as having no children.
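A userspace sketch of the bail-at-first-online-child test: illustrative only, with a one-level child array standing in for the for_each_mem_cgroup_tree() subtree walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model, NOT kernel code. We bail out at the first
 * online child, and a group without use_hierarchy reports none. */

#define MAX_CHILDREN 4

struct group {
	bool use_hierarchy;
	bool online;
	int nr_children;
	struct group *children[MAX_CHILDREN];
};

static bool group_has_children(struct group *g)
{
	int i;

	if (!g->use_hierarchy)
		return false;	/* hierarchy disabled: treated as childless */

	for (i = 0; i < g->nr_children; i++)
		if (g->children[i]->online)
			return true;	/* bounce at the first one found */

	return false;
}
```

Note that a child that has been allocated but not yet onlined does not count, which is what makes this compatible with the alloc/online split of the previous patches.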

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ac2808..aa4e258 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4723,6 +4723,30 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
 }
 
 /*
+ * must be called with memcg_mutex held, unless the cgroup is guaranteed to be
+ * already dead (like in mem_cgroup_force_empty, for instance).  This is
+ * different from mem_cgroup_count_children, in the sense that we don't really
+ * care how many children we have; we only need to know whether we have any.
+ * It also counts any memcg without hierarchy as infertile, for that matter.
+ */
+static inline bool memcg_has_children(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+
+	if (!memcg->use_hierarchy)
+		return false;
+
+	/* bounce at first found */
+	for_each_mem_cgroup_tree(iter, memcg) {
+		if ((iter == memcg) || !mem_cgroup_online(iter))
+			continue;
+		return true;
+	}
+
+	return false;
+}
+
+/*
  * Reclaims as many pages from the given memcg as possible and moves
  * the rest to the parent.
  *
@@ -4807,7 +4831,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 	 */
 	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
 				(val == 1 || val == 0)) {
-		if (list_empty(&cont->children))
+		if (!memcg_has_children(memcg))
 			memcg->use_hierarchy = val;
 		else
 			retval = -EBUSY;
@@ -4924,8 +4948,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 	cgroup_lock();
 	mutex_lock(&set_limit_mutex);
 	if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
-		if (cgroup_task_count(cont) || (memcg->use_hierarchy &&
-						!list_empty(&cont->children))) {
+		if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
 			ret = -EBUSY;
 			goto out;
 		}
@@ -5341,8 +5364,7 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	cgroup_lock();
 
 	/* If under hierarchy, only empty-root can set this value */
-	if ((parent->use_hierarchy) ||
-	    (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
+	if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
 		cgroup_unlock();
 		return -EINVAL;
 	}
-- 
1.7.11.7


* [PATCH v2 5/7] May god have mercy on my soul.
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
                   ` (3 preceding siblings ...)
  2013-01-11  9:45 ` [PATCH v2 4/7] memcg: fast hierarchy-aware child test Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 16:07   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
  2013-01-11  9:45 ` [PATCH v2 7/7] memcg: increment static branch right after limit set Glauber Costa
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aa4e258..c024614 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2909,7 +2909,7 @@ int memcg_cache_id(struct mem_cgroup *memcg)
  * operation, because that is its main call site.
  *
  * But when we create a new cache, we can call this as well if its parent
- * is kmem-limited. That will have to hold set_limit_mutex as well.
+ * is kmem-limited. That will have to hold cgroup_lock as well.
  */
 int memcg_update_cache_sizes(struct mem_cgroup *memcg)
 {
@@ -2924,7 +2924,7 @@ int memcg_update_cache_sizes(struct mem_cgroup *memcg)
 	 * the beginning of this conditional), is no longer 0. This
 	 * guarantees only one process will set the following boolean
 	 * to true. We don't need test_and_set because we're protected
-	 * by the set_limit_mutex anyway.
+	 * by the cgroup_lock anyway.
 	 */
 	memcg_kmem_set_activated(memcg);
 
@@ -3265,9 +3265,9 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 	 *
 	 * Still, we don't want anyone else freeing memcg_caches under our
 	 * noses, which can happen if a new memcg comes to life. As usual,
-	 * we'll take the set_limit_mutex to protect ourselves against this.
+	 * we'll take the cgroup_lock to protect ourselves against this.
 	 */
-	mutex_lock(&set_limit_mutex);
+	cgroup_lock();
 	for (i = 0; i < memcg_limited_groups_array_size; i++) {
 		c = s->memcg_params->memcg_caches[i];
 		if (!c)
@@ -3290,7 +3290,7 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		cancel_work_sync(&c->memcg_params->destroy);
 		kmem_cache_destroy(c);
 	}
-	mutex_unlock(&set_limit_mutex);
+	cgroup_unlock();
 }
 
 struct create_work {
@@ -4946,7 +4946,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 	 * can also get rid of the use_hierarchy check.
 	 */
 	cgroup_lock();
-	mutex_lock(&set_limit_mutex);
 	if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
 		if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
 			ret = -EBUSY;
@@ -4971,7 +4970,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 	} else
 		ret = res_counter_set_limit(&memcg->kmem, val);
 out:
-	mutex_unlock(&set_limit_mutex);
 	cgroup_unlock();
 
 	/*
@@ -5029,9 +5027,9 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
 	mem_cgroup_get(memcg);
 	static_key_slow_inc(&memcg_kmem_enabled_key);
 
-	mutex_lock(&set_limit_mutex);
+	cgroup_lock();
 	ret = memcg_update_cache_sizes(memcg);
-	mutex_unlock(&set_limit_mutex);
+	cgroup_unlock();
 #endif
 out:
 	return ret;
-- 
1.7.11.7


* [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
                   ` (4 preceding siblings ...)
  2013-01-11  9:45 ` [PATCH v2 5/7] May god have mercy on my soul Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 16:21   ` Michal Hocko
  2013-01-11  9:45 ` [PATCH v2 7/7] memcg: increment static branch right after limit set Glauber Costa
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

After the preparation work done in the earlier patches, the cgroup_lock can
be trivially replaced with a memcg-specific lock. In every site where the
values involved were merely queried, this is a mechanical translation.

The sites where values are written, however, used to be naturally called
under cgroup_lock. This is the case, for instance, of the css_online
callback. For those, we now need to take the memcg_mutex explicitly.

With this, all calls to cgroup_lock outside the cgroup core are gone.
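A minimal userspace model of the scheme, with a pthread mutex standing in for memcg_mutex. Illustrative only; the names and the -1 error return are stand-ins for the kernel's memcg_mutex and -EBUSY.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative userspace model, NOT kernel code: a subsystem-local mutex
 * serializes tunable writers against the onlining of new children, which
 * is the guarantee cgroup_lock used to provide. */

static pthread_mutex_t memcg_mutex = PTHREAD_MUTEX_INITIALIZER;

struct group {
	int use_hierarchy;
	int nr_online_children;
};

static int hierarchy_write(struct group *g, int val)
{
	int ret = 0;

	pthread_mutex_lock(&memcg_mutex);
	if (g->nr_online_children)
		ret = -1;	/* -EBUSY in the kernel */
	else
		g->use_hierarchy = val;
	pthread_mutex_unlock(&memcg_mutex);
	return ret;
}

static void child_online(struct group *parent)
{
	pthread_mutex_lock(&memcg_mutex);
	/* the child would copy parent tunables here, race-free */
	parent->nr_online_children++;
	pthread_mutex_unlock(&memcg_mutex);
}
```

Because both the writer and the onlining path take the same mutex, a child can never observe a half-changed tunable and a tunable can never change once a child is online.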

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c024614..5f3adbc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -472,6 +472,8 @@ enum res_type {
 #define MEM_CGROUP_RECLAIM_SHRINK_BIT	0x1
 #define MEM_CGROUP_RECLAIM_SHRINK	(1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
 
+static DEFINE_MUTEX(memcg_mutex);
+
 static void mem_cgroup_get(struct mem_cgroup *memcg);
 static void mem_cgroup_put(struct mem_cgroup *memcg);
 
@@ -2909,7 +2911,7 @@ int memcg_cache_id(struct mem_cgroup *memcg)
  * operation, because that is its main call site.
  *
  * But when we create a new cache, we can call this as well if its parent
- * is kmem-limited. That will have to hold cgroup_lock as well.
+ * is kmem-limited. That will have to hold memcg_mutex as well.
  */
 int memcg_update_cache_sizes(struct mem_cgroup *memcg)
 {
@@ -2924,7 +2926,7 @@ int memcg_update_cache_sizes(struct mem_cgroup *memcg)
 	 * the beginning of this conditional), is no longer 0. This
 	 * guarantees only one process will set the following boolean
 	 * to true. We don't need test_and_set because we're protected
-	 * by the cgroup_lock anyway.
+	 * by the memcg_mutex anyway.
 	 */
 	memcg_kmem_set_activated(memcg);
 
@@ -3265,9 +3267,9 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 	 *
 	 * Still, we don't want anyone else freeing memcg_caches under our
 	 * noses, which can happen if a new memcg comes to life. As usual,
-	 * we'll take the cgroup_lock to protect ourselves against this.
+	 * we'll take the memcg_mutex to protect ourselves against this.
 	 */
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 	for (i = 0; i < memcg_limited_groups_array_size; i++) {
 		c = s->memcg_params->memcg_caches[i];
 		if (!c)
@@ -3290,7 +3292,7 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		cancel_work_sync(&c->memcg_params->destroy);
 		kmem_cache_destroy(c);
 	}
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 }
 
 struct create_work {
@@ -4816,7 +4818,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 	if (parent)
 		parent_memcg = mem_cgroup_from_cont(parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 
 	if (memcg->use_hierarchy == val)
 		goto out;
@@ -4839,7 +4841,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
 		retval = -EINVAL;
 
 out:
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 
 	return retval;
 }
@@ -4939,13 +4941,10 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 	 * After it first became limited, changes in the value of the limit are
 	 * of course permitted.
 	 *
-	 * Taking the cgroup_lock is really offensive, but it is so far the only
-	 * way to guarantee that no children will appear. There are plenty of
-	 * other offenders, and they should all go away. Fine grained locking
-	 * is probably the way to go here. When we are fully hierarchical, we
-	 * can also get rid of the use_hierarchy check.
+	 * We are protected by the memcg_mutex, so no other cgroups can appear
+	 * in the mean time.
 	 */
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 	if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
 		if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
 			ret = -EBUSY;
@@ -4970,7 +4969,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 	} else
 		ret = res_counter_set_limit(&memcg->kmem, val);
 out:
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 
 	/*
 	 * We are by now familiar with the fact that we can't inc the static
@@ -5027,9 +5026,9 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
 	mem_cgroup_get(memcg);
 	static_key_slow_inc(&memcg_kmem_enabled_key);
 
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 	ret = memcg_update_cache_sizes(memcg);
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 #endif
 out:
 	return ret;
@@ -5359,17 +5358,17 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 
 	parent = mem_cgroup_from_cont(cgrp->parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 
 	/* If under hierarchy, only empty-root can set this value */
 	if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
-		cgroup_unlock();
+		mutex_unlock(&memcg_mutex);
 		return -EINVAL;
 	}
 
 	memcg->swappiness = val;
 
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 
 	return 0;
 }
@@ -5695,7 +5694,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
 
 	parent = mem_cgroup_from_cont(cgrp->parent);
 
-	cgroup_lock();
+	mutex_lock(&memcg_mutex);
 	/* oom-kill-disable is a flag for subhierarchy. */
 	if ((parent->use_hierarchy) ||
 	    (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
@@ -5705,7 +5704,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
 	memcg->oom_kill_disable = val;
 	if (!val)
 		memcg_oom_recover(memcg);
-	cgroup_unlock();
+	mutex_unlock(&memcg_mutex);
 	return 0;
 }
 
@@ -6148,6 +6147,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 		return 0;
 	}
 
+	mutex_lock(&memcg_mutex);
 	parent = mem_cgroup_from_cont(cont->parent);
 
 	memcg->use_hierarchy = parent->use_hierarchy;
@@ -6182,6 +6182,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 	atomic_set(&memcg->refcnt, 1);
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
+	mutex_unlock(&memcg_mutex);
 	if (error) {
 		/*
 		 * We call put now because our (and parent's) refcnts
-- 
1.7.11.7


* [PATCH v2 7/7] memcg: increment static branch right after limit set.
  2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
                   ` (5 preceding siblings ...)
  2013-01-11  9:45 ` [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2013-01-11  9:45 ` Glauber Costa
  2013-01-18 16:23   ` Michal Hocko
  6 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-11  9:45 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, kamezawa.hiroyu, Johannes Weiner,
	Tejun Heo, Glauber Costa

We were deferring the kmemcg static branch increment to a later time due to
a nasty dependency between the cpu_hotplug lock, taken by the jump label
update, and the cgroup_lock.

Now that we no longer take the cgroup lock, we can save ourselves the
trouble.
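The ordering argument can be modeled in userspace C. Illustrative only: a plain counter stands in for the static key, and the names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model, NOT kernel code: the key is incremented
 * (call sites "patched") before the per-group active bit is set, so
 * accounting can never start while the branch is still unpatched. */

static int kmem_enabled_key;	/* stand-in for memcg_kmem_enabled_key */

struct group {
	bool active;
};

/* charge path: cheap global check first, then the per-group flag */
static bool should_account(struct group *g)
{
	return kmem_enabled_key > 0 && g->active;
}

static void set_kmem_limit(struct group *g)
{
	kmem_enabled_key++;	/* enable the branch first... */
	g->active = true;	/* ...then allow this group to account */
}
```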

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 mm/memcontrol.c | 31 +++++++------------------------
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5f3adbc..f87d6d2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4926,8 +4926,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 {
 	int ret = -EINVAL;
 #ifdef CONFIG_MEMCG_KMEM
-	bool must_inc_static_branch = false;
-
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 	/*
 	 * For simplicity, we won't allow this to be disabled.  It also can't
@@ -4958,7 +4956,13 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 			res_counter_set_limit(&memcg->kmem, RESOURCE_MAX);
 			goto out;
 		}
-		must_inc_static_branch = true;
+		static_key_slow_inc(&memcg_kmem_enabled_key);
+		/*
+		 * setting the active bit after the inc will guarantee no one
+		 * starts accounting before all call sites are patched
+		 */
+		memcg_kmem_set_active(memcg);
+
 		/*
 		 * kmem charges can outlive the cgroup. In the case of slab
 		 * pages, for instance, a page contain objects from various
@@ -4970,27 +4974,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
 		ret = res_counter_set_limit(&memcg->kmem, val);
 out:
 	mutex_unlock(&memcg_mutex);
-
-	/*
-	 * We are by now familiar with the fact that we can't inc the static
-	 * branch inside cgroup_lock. See disarm functions for details. A
-	 * worker here is overkill, but also wrong: After the limit is set, we
-	 * must start accounting right away. Since this operation can't fail,
-	 * we can safely defer it to here - no rollback will be needed.
-	 *
-	 * The boolean used to control this is also safe, because
-	 * KMEM_ACCOUNTED_ACTIVATED guarantees that only one process will be
-	 * able to set it to true;
-	 */
-	if (must_inc_static_branch) {
-		static_key_slow_inc(&memcg_kmem_enabled_key);
-		/*
-		 * setting the active bit after the inc will guarantee no one
-		 * starts accounting before all call sites are patched
-		 */
-		memcg_kmem_set_active(memcg);
-	}
-
 #endif
 	return ret;
 }
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach
  2013-01-11  9:45 ` [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2013-01-18 14:16   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 14:16 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:21, Glauber Costa wrote:
> Currently, we rely on the cgroup_lock() to prevent changes to
> move_charge_at_immigrate during task migration. However, this is only
> needed because the current strategy keeps checking this value throughout
> the whole process. Since all we need is serialization, it is enough to
> guarantee that whatever decision we made at the beginning of a specific
> migration is respected throughout the process.
> 
> We can achieve this by just saving it in mc. By doing this, no kind of
> locking is needed.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>

I would probably prefer a different name, so that when grepping the code
we know that move_charge_at_immigrate is a property of the cgroup while
the other, immigrate_flags (or whatever), is a temporary state. But
nothing serious.

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 09255ec..18f4e76 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -412,6 +412,7 @@ static struct move_charge_struct {
>  	spinlock_t	  lock; /* for from, to */
>  	struct mem_cgroup *from;
>  	struct mem_cgroup *to;
> +	unsigned long move_charge_at_immigrate;
>  	unsigned long precharge;
>  	unsigned long moved_charge;
>  	unsigned long moved_swap;
> @@ -425,13 +426,13 @@ static struct move_charge_struct {
>  static bool move_anon(void)
>  {
>  	return test_bit(MOVE_CHARGE_TYPE_ANON,
> -					&mc.to->move_charge_at_immigrate);
> +					&mc.move_charge_at_immigrate);
>  }
>  
>  static bool move_file(void)
>  {
>  	return test_bit(MOVE_CHARGE_TYPE_FILE,
> -					&mc.to->move_charge_at_immigrate);
> +					&mc.move_charge_at_immigrate);
>  }
>  
>  /*
> @@ -5146,15 +5147,14 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  
>  	if (val >= (1 << NR_MOVE_TYPE))
>  		return -EINVAL;
> +
>  	/*
> -	 * We check this value several times in both in can_attach() and
> -	 * attach(), so we need cgroup lock to prevent this value from being
> -	 * inconsistent.
> +	 * No kind of locking is needed in here, because ->can_attach() will
> +	 * check this value once in the beginning of the process, and then carry
> +	 * on with stale data. This means that changes to this value will only
> +	 * affect task migrations starting after the change.
>  	 */
> -	cgroup_lock();
>  	memcg->move_charge_at_immigrate = val;
> -	cgroup_unlock();
> -
>  	return 0;
>  }
>  #else
> @@ -6530,8 +6530,15 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
>  	struct task_struct *p = cgroup_taskset_first(tset);
>  	int ret = 0;
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +	unsigned long move_charge_at_immigrate;
>  
> -	if (memcg->move_charge_at_immigrate) {
> +	/*
> +	 * We are now committed to this value, whatever it is. Changes in this
> +	 * tunable will only affect upcoming migrations, not the current one.
> +	 * So we need to save it, and keep it going.
> +	 */
> +	move_charge_at_immigrate  = memcg->move_charge_at_immigrate;
> +	if (move_charge_at_immigrate) {
>  		struct mm_struct *mm;
>  		struct mem_cgroup *from = mem_cgroup_from_task(p);
>  
> @@ -6551,6 +6558,7 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
>  			spin_lock(&mc.lock);
>  			mc.from = from;
>  			mc.to = memcg;
> +			mc.move_charge_at_immigrate = move_charge_at_immigrate;
>  			spin_unlock(&mc.lock);
>  			/* We set mc.moving_task later */
>  
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-11  9:45 ` [PATCH v2 2/7] memcg: split part of memcg creation to css_online Glauber Costa
@ 2013-01-18 15:25   ` Michal Hocko
  2013-01-18 19:28     ` Glauber Costa
  2013-01-21  7:33     ` Glauber Costa
  0 siblings, 2 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 15:25 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:22, Glauber Costa wrote:
> Although there is arguably some value in doing this per se, the main

This begs the question of what the other reasons are, but I would just
leave it alone and focus on the code reshuffling.

> goal of this patch is to make room for the locking changes to come.
> 
> With all the value assignment from parent happening in a context where
> our iterators can already be used, we can safely lock against value
> change in some key values like use_hierarchy, without resorting to the
> cgroup core at all.

Sorry but I do not understand the above. Please be more specific here.
Why does the context matter, if it matters at all?

Maybe something like the below?
"
mem_cgroup_css_alloc is currently responsible for the complete
initialization of a newly created memcg. Cgroup core offers another
stage of initialization - css_online - which is called after the newly
created group is already linked to the cgroup hierarchy.
All attributes inherited from the parent group can be safely moved
into mem_cgroup_css_online because nobody can see the newly created
group yet. This also has the advantage that the parent can already see
the child group (via iterators) by the time we inherit values from it,
so it can take appropriate steps (e.g. disallow changing use_hierarchy,
etc.).

This patch is a preparatory work for later locking rework to get rid of
big cgroup lock from memory controller code.
"

> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 18f4e76..2229945 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6090,12 +6090,41 @@ mem_cgroup_css_alloc(struct cgroup *cont)

parent becomes unused (except for parent = NULL; in the root_cgroup
branch).

>  						&per_cpu(memcg_stock, cpu);
>  			INIT_WORK(&stock->work, drain_local_stock);
>  		}
> -	} else {
> -		parent = mem_cgroup_from_cont(cont->parent);
> -		memcg->use_hierarchy = parent->use_hierarchy;
> -		memcg->oom_kill_disable = parent->oom_kill_disable;
> +
> +		res_counter_init(&memcg->res, NULL);
> +		res_counter_init(&memcg->memsw, NULL);
> +		res_counter_init(&memcg->kmem, NULL);
>  	}
>  
	/*
	 * All memcg attributes which are not inherited throughout
	 * the hierarchy are initialized here
	 */
> +	memcg->last_scanned_node = MAX_NUMNODES;
> +	INIT_LIST_HEAD(&memcg->oom_notify);
> +	atomic_set(&memcg->refcnt, 1);
> +	memcg->move_charge_at_immigrate = 0;
> +	mutex_init(&memcg->thresholds_lock);
> +	spin_lock_init(&memcg->move_lock);
> +
> +	return &memcg->css;
> +
> +free_out:
> +	__mem_cgroup_free(memcg);
> +	return ERR_PTR(error);
> +}
> +

> +static int
> +mem_cgroup_css_online(struct cgroup *cont)
> +{
> +	struct mem_cgroup *memcg, *parent;
> +	int error = 0;
> +
> +	if (!cont->parent)
> +		return 0;
> +
> +	memcg = mem_cgroup_from_cont(cont);
> +	parent = mem_cgroup_from_cont(cont->parent);
> +
	/*
	 * Initialization of attributes which are inherited from parent.
	 */
> +	memcg->use_hierarchy = parent->use_hierarchy;
> +	memcg->oom_kill_disable = parent->oom_kill_disable;
> +

	/*
	 * Initialization of attributes which are linked with parent
	 * based on use_hierarchy.
	 */
>  	if (parent && parent->use_hierarchy) {

parent cannot be NULL.

>  		res_counter_init(&memcg->res, &parent->res);
>  		res_counter_init(&memcg->memsw, &parent->memsw);
> @@ -6120,15 +6149,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  		if (parent && parent != root_mem_cgroup)
>  			mem_cgroup_subsys.broken_hierarchy = true;
>  	}
> -	memcg->last_scanned_node = MAX_NUMNODES;
> -	INIT_LIST_HEAD(&memcg->oom_notify);
>  
> -	if (parent)
> -		memcg->swappiness = mem_cgroup_swappiness(parent);
> -	atomic_set(&memcg->refcnt, 1);
> -	memcg->move_charge_at_immigrate = 0;
> -	mutex_init(&memcg->thresholds_lock);
> -	spin_lock_init(&memcg->move_lock);
> +	memcg->swappiness = mem_cgroup_swappiness(parent);

Please move this up to oom_kill_disable and use_hierarchy
initialization.

	/*
	 * kmem initialization depends on memcg->res initialization
	 * because it relies on parent_mem_cgroup
	 */
>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>  	if (error) {
> @@ -6138,12 +6160,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  		 * call __mem_cgroup_free, so return directly
>  		 */
>  		mem_cgroup_put(memcg);

Hmm, this doesn't release parent for use_hierarchy. The bug is there
from before this patch. So it should go into a separate patch.

> -		return ERR_PTR(error);
>  	}
> -	return &memcg->css;
> -free_out:
> -	__mem_cgroup_free(memcg);
> -	return ERR_PTR(error);
> +	return error;
>  }
>  
>  static void mem_cgroup_css_offline(struct cgroup *cont)
> @@ -6753,6 +6771,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
>  	.name = "memory",
>  	.subsys_id = mem_cgroup_subsys_id,
>  	.css_alloc = mem_cgroup_css_alloc,
> +	.css_online = mem_cgroup_css_online,
>  	.css_offline = mem_cgroup_css_offline,
>  	.css_free = mem_cgroup_css_free,
>  	.can_attach = mem_cgroup_can_attach,
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-11  9:45 ` [PATCH v2 3/7] memcg: provide online test for memcg Glauber Costa
@ 2013-01-18 15:37   ` Michal Hocko
  2013-01-18 15:56     ` Michal Hocko
  2013-01-18 19:41     ` Glauber Costa
  0 siblings, 2 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 15:37 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:23, Glauber Costa wrote:
> Since we are now splitting the memcg creation in two parts, following
> the cgroup standard, it would be helpful to be able to determine if a
> created memcg is already online.
> 
> We can do this by initially forcing the refcnt to 0, and waiting until
> the last minute to flip it to 1.

Is this useful, though? What does it tell you? mem_cgroup_online can say
false even though half of the attributes have already been copied, for
example. I think it should be vice versa: it should mark the point when
we _start_ copying values. mem_cgroup_online is not the best name then,
of course. It depends on what it is going to be used for...

> During memcg's lifetime, this value
> will vary. But if it ever reaches 0 again, memcg will be destructed. We
> can therefore be sure that any value different than 0 will mean that
> our group is online.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  mm/memcontrol.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2229945..2ac2808 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -475,6 +475,11 @@ enum res_type {
>  static void mem_cgroup_get(struct mem_cgroup *memcg);
>  static void mem_cgroup_put(struct mem_cgroup *memcg);
>  
> +static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
> +{
> +	return atomic_read(&memcg->refcnt) > 0;
> +}
> +
>  static inline
>  struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>  {
> @@ -6098,7 +6103,7 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>  
>  	memcg->last_scanned_node = MAX_NUMNODES;
>  	INIT_LIST_HEAD(&memcg->oom_notify);
> -	atomic_set(&memcg->refcnt, 1);
> +	atomic_set(&memcg->refcnt, 0);

I would prefer a comment rather than an explicit atomic_set. The value
is zero already.

>  	memcg->move_charge_at_immigrate = 0;
>  	mutex_init(&memcg->thresholds_lock);
>  	spin_lock_init(&memcg->move_lock);
> @@ -6116,10 +6121,13 @@ mem_cgroup_css_online(struct cgroup *cont)
>  	struct mem_cgroup *memcg, *parent;
>  	int error = 0;
>
	
as I said above atomic_set(&memc->refcnt, 1) should be set here before
we start copying anything.

But maybe I have missed your intention and later patches in the series
will convince me...

> -	if (!cont->parent)
> +	memcg = mem_cgroup_from_cont(cont);
> +	if (!cont->parent) {
> +		/* no need to lock, since this is the root cgroup */
> +		atomic_set(&memcg->refcnt, 1);
>  		return 0;
> +	}
>  
> -	memcg = mem_cgroup_from_cont(cont);
>  	parent = mem_cgroup_from_cont(cont->parent);
>  
>  	memcg->use_hierarchy = parent->use_hierarchy;
> @@ -6151,6 +6159,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>  	}
>  
>  	memcg->swappiness = mem_cgroup_swappiness(parent);
> +	atomic_set(&memcg->refcnt, 1);
>  
>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>  	if (error) {
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-18 15:37   ` Michal Hocko
@ 2013-01-18 15:56     ` Michal Hocko
  2013-01-18 19:42       ` Glauber Costa
  2013-01-18 19:41     ` Glauber Costa
  1 sibling, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 15:56 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 18-01-13 16:37:15, Michal Hocko wrote:
> On Fri 11-01-13 13:45:23, Glauber Costa wrote:
> > Since we are now splitting the memcg creation in two parts, following
> > the cgroup standard, it would be helpful to be able to determine if a
> > created memcg is already online.
> > 
> > We can do this by initially forcing the refcnt to 0, and waiting until
> > the last minute to flip it to 1.
> 
> Is this useful, though? What does it tell you? mem_cgroup_online can say
> false even though half of the attributes have already been copied, for
> example. I think it should be vice versa: it should mark the point when
> we _start_ copying values. mem_cgroup_online is not the best name then,
> of course. It depends on what it is going to be used for...

And the later patch in the series shows that it is really not helpful on
its own. You need to rely on another lock for it to be helpful. This is
asking for trouble, and I do not think the win you get is really worth
it. All it gives you is basically that you can change an inheritable
attribute while your child is between css_alloc and css_online, and so
your attribute change doesn't fail if the child creation fails between
those two. Is this the case you want to handle? Does it really even
matter?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-11  9:45 ` [PATCH v2 4/7] memcg: fast hierarchy-aware child test Glauber Costa
@ 2013-01-18 16:06   ` Michal Hocko
  2013-01-21  7:58     ` Glauber Costa
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 16:06 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:24, Glauber Costa wrote:
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2ac2808..aa4e258 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4723,6 +4723,30 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
>  }
>  
>  /*
> + * must be called with memcg_mutex held, unless the cgroup is guaranteed to be

This one doesn't exist yet.

> + * already dead (like in mem_cgroup_force_empty, for instance).  This is
> + * different than mem_cgroup_count_children, in the sense that we don't really
> + * care how many children we have, we only need to know if we have any. It
> + * will also count any memcg without hierarchy as infertile, for that matter.
> + */
> +static inline bool memcg_has_children(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *iter;
> +
> +	if (!memcg->use_hierarchy)
> +		return false;
> +
> +	/* bounce at first found */
> +	for_each_mem_cgroup_tree(iter, memcg) {

This will not work. Consider what happens when you see a !online memcg:
mem_cgroup_iter will css_get the group that it returns and css_put it
when it visits the next one or finishes the loop. So your poor iter will
be released before it gets born. Not good.

> +		if ((iter == memcg) || !mem_cgroup_online(iter))
> +			continue;
> +		return true;

mem_cgroup_iter_break here if you _really_ insist on
for_each_mem_cgroup_tree.

I still think that the hammer is too big for what we need here.

> +	}
> +
> +	return false;
> +}
> +
> +/*
>   * Reclaims as many pages from the given memcg as possible and moves
>   * the rest to the parent.
>   *
[...]
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/7] May god have mercy on my soul.
  2013-01-11  9:45 ` [PATCH v2 5/7] May god have mercy on my soul Glauber Costa
@ 2013-01-18 16:07   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 16:07 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

Please merge this into the following patch.

On Fri 11-01-13 13:45:25, Glauber Costa wrote:
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  mm/memcontrol.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aa4e258..c024614 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2909,7 +2909,7 @@ int memcg_cache_id(struct mem_cgroup *memcg)
>   * operation, because that is its main call site.
>   *
>   * But when we create a new cache, we can call this as well if its parent
> - * is kmem-limited. That will have to hold set_limit_mutex as well.
> + * is kmem-limited. That will have to hold cgroup_lock as well.
>   */
>  int memcg_update_cache_sizes(struct mem_cgroup *memcg)
>  {
> @@ -2924,7 +2924,7 @@ int memcg_update_cache_sizes(struct mem_cgroup *memcg)
>  	 * the beginning of this conditional), is no longer 0. This
>  	 * guarantees only one process will set the following boolean
>  	 * to true. We don't need test_and_set because we're protected
> -	 * by the set_limit_mutex anyway.
> +	 * by the cgroup_lock anyway.
>  	 */
>  	memcg_kmem_set_activated(memcg);
>  
> @@ -3265,9 +3265,9 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  	 *
>  	 * Still, we don't want anyone else freeing memcg_caches under our
>  	 * noses, which can happen if a new memcg comes to life. As usual,
> -	 * we'll take the set_limit_mutex to protect ourselves against this.
> +	 * we'll take the cgroup_lock to protect ourselves against this.
>  	 */
> -	mutex_lock(&set_limit_mutex);
> +	cgroup_lock();
>  	for (i = 0; i < memcg_limited_groups_array_size; i++) {
>  		c = s->memcg_params->memcg_caches[i];
>  		if (!c)
> @@ -3290,7 +3290,7 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  		cancel_work_sync(&c->memcg_params->destroy);
>  		kmem_cache_destroy(c);
>  	}
> -	mutex_unlock(&set_limit_mutex);
> +	cgroup_unlock();
>  }
>  
>  struct create_work {
> @@ -4946,7 +4946,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  	 * can also get rid of the use_hierarchy check.
>  	 */
>  	cgroup_lock();
> -	mutex_lock(&set_limit_mutex);
>  	if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
>  		if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
>  			ret = -EBUSY;
> @@ -4971,7 +4970,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  	} else
>  		ret = res_counter_set_limit(&memcg->kmem, val);
>  out:
> -	mutex_unlock(&set_limit_mutex);
>  	cgroup_unlock();
>  
>  	/*
> @@ -5029,9 +5027,9 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>  	mem_cgroup_get(memcg);
>  	static_key_slow_inc(&memcg_kmem_enabled_key);
>  
> -	mutex_lock(&set_limit_mutex);
> +	cgroup_lock();
>  	ret = memcg_update_cache_sizes(memcg);
> -	mutex_unlock(&set_limit_mutex);
> +	cgroup_unlock();
>  #endif
>  out:
>  	return ret;
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock
  2013-01-11  9:45 ` [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2013-01-18 16:21   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 16:21 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:26, Glauber Costa wrote:
> After the preparation work done in earlier patches, the
> cgroup_lock can be trivially replaced with a memcg-specific lock. This
> is an automatic translation in every site where the values involved were
> queried.
> 
> The sites where values are written, however, used to be naturally called
> under cgroup_lock. This is the case, for instance, of the css_online
> callback. For those, we now need to explicitly take the memcg_mutex.
> 
> With this, all the calls to cgroup_lock outside cgroup core are gone.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>

Please add a doc for memcg_mutex

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c | 43 ++++++++++++++++++++++---------------------
>  1 file changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c024614..5f3adbc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -472,6 +472,8 @@ enum res_type {
>  #define MEM_CGROUP_RECLAIM_SHRINK_BIT	0x1
>  #define MEM_CGROUP_RECLAIM_SHRINK	(1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
>  
  /*
   * This lock protects TODO
   */
> +static DEFINE_MUTEX(memcg_mutex);
> +
>  static void mem_cgroup_get(struct mem_cgroup *memcg);
>  static void mem_cgroup_put(struct mem_cgroup *memcg);
>  
> @@ -2909,7 +2911,7 @@ int memcg_cache_id(struct mem_cgroup *memcg)
>   * operation, because that is its main call site.
>   *
>   * But when we create a new cache, we can call this as well if its parent
> - * is kmem-limited. That will have to hold cgroup_lock as well.
> + * is kmem-limited. That will have to hold memcg_mutex as well.
>   */
>  int memcg_update_cache_sizes(struct mem_cgroup *memcg)
>  {
> @@ -2924,7 +2926,7 @@ int memcg_update_cache_sizes(struct mem_cgroup *memcg)
>  	 * the beginning of this conditional), is no longer 0. This
>  	 * guarantees only one process will set the following boolean
>  	 * to true. We don't need test_and_set because we're protected
> -	 * by the cgroup_lock anyway.
> +	 * by the memcg_mutex anyway.
>  	 */
>  	memcg_kmem_set_activated(memcg);
>  
> @@ -3265,9 +3267,9 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  	 *
>  	 * Still, we don't want anyone else freeing memcg_caches under our
>  	 * noses, which can happen if a new memcg comes to life. As usual,
> -	 * we'll take the cgroup_lock to protect ourselves against this.
> +	 * we'll take the memcg_mutex to protect ourselves against this.
>  	 */
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  	for (i = 0; i < memcg_limited_groups_array_size; i++) {
>  		c = s->memcg_params->memcg_caches[i];
>  		if (!c)
> @@ -3290,7 +3292,7 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  		cancel_work_sync(&c->memcg_params->destroy);
>  		kmem_cache_destroy(c);
>  	}
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  }
>  
>  struct create_work {
> @@ -4816,7 +4818,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
>  	if (parent)
>  		parent_memcg = mem_cgroup_from_cont(parent);
>  
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  
>  	if (memcg->use_hierarchy == val)
>  		goto out;
> @@ -4839,7 +4841,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
>  		retval = -EINVAL;
>  
>  out:
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  
>  	return retval;
>  }
> @@ -4939,13 +4941,10 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  	 * After it first became limited, changes in the value of the limit are
>  	 * of course permitted.
>  	 *
> -	 * Taking the cgroup_lock is really offensive, but it is so far the only
> -	 * way to guarantee that no children will appear. There are plenty of
> -	 * other offenders, and they should all go away. Fine grained locking
> -	 * is probably the way to go here. When we are fully hierarchical, we
> -	 * can also get rid of the use_hierarchy check.
> +	 * We are protected by the memcg_mutex, so no other cgroups can appear
> +	 * in the meantime.
>  	 */
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  	if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
>  		if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
>  			ret = -EBUSY;
> @@ -4970,7 +4969,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  	} else
>  		ret = res_counter_set_limit(&memcg->kmem, val);
>  out:
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  
>  	/*
>  	 * We are by now familiar with the fact that we can't inc the static
> @@ -5027,9 +5026,9 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
>  	mem_cgroup_get(memcg);
>  	static_key_slow_inc(&memcg_kmem_enabled_key);
>  
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  	ret = memcg_update_cache_sizes(memcg);
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  #endif
>  out:
>  	return ret;
> @@ -5359,17 +5358,17 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
>  
>  	parent = mem_cgroup_from_cont(cgrp->parent);
>  
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  
>  	/* If under hierarchy, only empty-root can set this value */
>  	if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
> -		cgroup_unlock();
> +		mutex_unlock(&memcg_mutex);
>  		return -EINVAL;
>  	}
>  
>  	memcg->swappiness = val;
>  
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  
>  	return 0;
>  }
> @@ -5695,7 +5694,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
>  
>  	parent = mem_cgroup_from_cont(cgrp->parent);
>  
> -	cgroup_lock();
> +	mutex_lock(&memcg_mutex);
>  	/* oom-kill-disable is a flag for subhierarchy. */
>  	if ((parent->use_hierarchy) ||
>  	    (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
> @@ -5705,7 +5704,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
>  	memcg->oom_kill_disable = val;
>  	if (!val)
>  		memcg_oom_recover(memcg);
> -	cgroup_unlock();
> +	mutex_unlock(&memcg_mutex);
>  	return 0;
>  }
>  
> @@ -6148,6 +6147,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>  		return 0;
>  	}
>  
> +	mutex_lock(&memcg_mutex);
>  	parent = mem_cgroup_from_cont(cont->parent);
>  
>  	memcg->use_hierarchy = parent->use_hierarchy;
> @@ -6182,6 +6182,7 @@ mem_cgroup_css_online(struct cgroup *cont)
>  	atomic_set(&memcg->refcnt, 1);
>  
>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> +	mutex_unlock(&memcg_mutex);
>  	if (error) {
>  		/*
>  		 * We call put now because our (and parent's) refcnts
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] memcg: increment static branch right after limit set.
  2013-01-11  9:45 ` [PATCH v2 7/7] memcg: increment static branch right after limit set Glauber Costa
@ 2013-01-18 16:23   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-01-18 16:23 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Fri 11-01-13 13:45:27, Glauber Costa wrote:
> We were deferring the kmemcg static branch increment to a later time,
> due to a nasty dependency between the cpu_hotplug lock, taken by the
> jump label update, and the cgroup_lock.
> 
> Now we no longer take the cgroup lock, and we can save ourselves the
> trouble.

What a relief.

> Signed-off-by: Glauber Costa <glommer@parallels.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c | 31 +++++++------------------------
>  1 file changed, 7 insertions(+), 24 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5f3adbc..f87d6d2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4926,8 +4926,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  {
>  	int ret = -EINVAL;
>  #ifdef CONFIG_MEMCG_KMEM
> -	bool must_inc_static_branch = false;
> -
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>  	/*
>  	 * For simplicity, we won't allow this to be disabled.  It also can't
> @@ -4958,7 +4956,13 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  			res_counter_set_limit(&memcg->kmem, RESOURCE_MAX);
>  			goto out;
>  		}
> -		must_inc_static_branch = true;
> +		static_key_slow_inc(&memcg_kmem_enabled_key);
> +		/*
> +		 * setting the active bit after the inc will guarantee no one
> +		 * starts accounting before all call sites are patched
> +		 */
> +		memcg_kmem_set_active(memcg);
> +
>  		/*
>  		 * kmem charges can outlive the cgroup. In the case of slab
>  		 * pages, for instance, a page contain objects from various
> @@ -4970,27 +4974,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
>  		ret = res_counter_set_limit(&memcg->kmem, val);
>  out:
>  	mutex_unlock(&memcg_mutex);
> -
> -	/*
> -	 * We are by now familiar with the fact that we can't inc the static
> -	 * branch inside cgroup_lock. See disarm functions for details. A
> -	 * worker here is overkill, but also wrong: After the limit is set, we
> -	 * must start accounting right away. Since this operation can't fail,
> -	 * we can safely defer it to here - no rollback will be needed.
> -	 *
> -	 * The boolean used to control this is also safe, because
> -	 * KMEM_ACCOUNTED_ACTIVATED guarantees that only one process will be
> -	 * able to set it to true;
> -	 */
> -	if (must_inc_static_branch) {
> -		static_key_slow_inc(&memcg_kmem_enabled_key);
> -		/*
> -		 * setting the active bit after the inc will guarantee no one
> -		 * starts accounting before all call sites are patched
> -		 */
> -		memcg_kmem_set_active(memcg);
> -	}
> -
>  #endif
>  	return ret;
>  }
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-18 15:25   ` Michal Hocko
@ 2013-01-18 19:28     ` Glauber Costa
  2013-01-21  7:33     ` Glauber Costa
  1 sibling, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-18 19:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 07:25 AM, Michal Hocko wrote:
> On Fri 11-01-13 13:45:22, Glauber Costa wrote:
>> Although there is arguably some value in doing this per se, the main
> 
> This begs for asking what are the other reasons but I would just leave
> it alone and focus on the code reshuffling.
> 
Yes, Sir.

>> goal of this patch is to make room for the locking changes to come.
>>
>> With all the value assignment from parent happening in a context where
>> our iterators can already be used, we can safely lock against value
>> change in some key values like use_hierarchy, without resorting to the
>> cgroup core at all.
> 
> Sorry but I do not understand the above. Please be more specific here.
> Why the context matters if it matters at all.
> 
> Maybe something like the below?
> "
> mem_cgroup_css_alloc is currently responsible for the complete
> initialization of a newly created memcg. Cgroup core offers another
> stage of initialization - css_online - which is called after the newly
> created group is already linked to the cgroup hierarchy.
> All attributes inherited from the parent group can be safely moved
> into mem_cgroup_css_online because nobody can see the newly created
> group yet. This also has the advantage that the parent can already see
> the child group (via iterators) by the time we inherit values from it,
> so it can take appropriate steps (e.g. disallow changing use_hierarchy,
> etc.).
> 
> This patch is a preparatory work for later locking rework to get rid of
> big cgroup lock from memory controller code.
> "
> 
Well, I will look into merging some of it, but AFAIK, you are explaining
why is it safe (a good thing to do), while I was focusing on telling our
future readers why is it needed.

I'll try to rewrite for clarity

> 
> 	/*
> 	 * Initialization of attributes which are linked with parent
> 	 * based on use_hierarchy.
> 	 */
>>  	if (parent && parent->use_hierarchy) {
> 
> parent cannot be NULL.
> 
indeed.

>>  		res_counter_init(&memcg->res, &parent->res);
>>  		res_counter_init(&memcg->memsw, &parent->memsw);
>> @@ -6120,15 +6149,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		if (parent && parent != root_mem_cgroup)
>>  			mem_cgroup_subsys.broken_hierarchy = true;
>>  	}
>> -	memcg->last_scanned_node = MAX_NUMNODES;
>> -	INIT_LIST_HEAD(&memcg->oom_notify);
>>  
>> -	if (parent)
>> -		memcg->swappiness = mem_cgroup_swappiness(parent);
>> -	atomic_set(&memcg->refcnt, 1);
>> -	memcg->move_charge_at_immigrate = 0;
>> -	mutex_init(&memcg->thresholds_lock);
>> -	spin_lock_init(&memcg->move_lock);
>> +	memcg->swappiness = mem_cgroup_swappiness(parent);
> 
> Please move this up to oom_kill_disable and use_hierarchy
> initialization.
> 
Yes, Sir!

> 	/*
> 	 * kmem initialization depends on memcg->res initialization
> 	 * because it relies on parent_mem_cgroup
> 	 */
>>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
>>  	if (error) {
>> @@ -6138,12 +6160,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  		 * call __mem_cgroup_free, so return directly
>>  		 */
>>  		mem_cgroup_put(memcg);
> 
> Hmm, this doesn't release parent for use_hierarchy. The bug is there
> from before this patch. So it should go into a separate patch.
> 
Good catch.




* Re: [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-18 15:37   ` Michal Hocko
  2013-01-18 15:56     ` Michal Hocko
@ 2013-01-18 19:41     ` Glauber Costa
  1 sibling, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-18 19:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 07:37 AM, Michal Hocko wrote:
> On Fri 11-01-13 13:45:23, Glauber Costa wrote:
>> Since we are now splitting the memcg creation in two parts, following
>> the cgroup standard, it would be helpful to be able to determine if a
>> created memcg is already online.
>>
>> We can do this by initially forcing the refcnt to 0, and waiting until
>> the last minute to flip it to 1.
> 
> Is this useful, though? What does it tell you? mem_cgroup_online can say
> false even though half of the attributes have been already copied for
> example. I think it should be vice versa. It should mark the point when
> we _start_ copying values. mem_cgroup_online is not the best name then
> of course. It depends what it is going to be used for...
> 

I think you are right in the sense that setting it before copying any
fields is the correct behavior - thanks.

In this sense, this works as a commitment that we will have a complete
child, rather than a statement that we have a complete child.

>> During memcg's lifetime, this value
>> will vary. But if it ever reaches 0 again, memcg will be destructed. We
>> can therefore be sure that any value different than 0 will mean that
>> our group is online.
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> ---
>>  mm/memcontrol.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 2229945..2ac2808 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -475,6 +475,11 @@ enum res_type {
>>  static void mem_cgroup_get(struct mem_cgroup *memcg);
>>  static void mem_cgroup_put(struct mem_cgroup *memcg);
>>  
>> +static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
>> +{
>> +	return atomic_read(&memcg->refcnt) > 0;
>> +}
>> +
>>  static inline
>>  struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>>  {
>> @@ -6098,7 +6103,7 @@ mem_cgroup_css_alloc(struct cgroup *cont)
>>  
>>  	memcg->last_scanned_node = MAX_NUMNODES;
>>  	INIT_LIST_HEAD(&memcg->oom_notify);
>> -	atomic_set(&memcg->refcnt, 1);
>> +	atomic_set(&memcg->refcnt, 0);
> 
> I would prefer a comment rather than an explicit atomic_set. The value
> is zero already.
> 
Yes, Sir!

>>  	memcg->move_charge_at_immigrate = 0;
>>  	mutex_init(&memcg->thresholds_lock);
>>  	spin_lock_init(&memcg->move_lock);
>> @@ -6116,10 +6121,13 @@ mem_cgroup_css_online(struct cgroup *cont)
>>  	struct mem_cgroup *memcg, *parent;
>>  	int error = 0;
>>
> 	
> as I said above atomic_set(&memc->refcnt, 1) should be set here before
> we start copying anything.
> 
> But maybe I have missed your intention and later patches in the series
> will convince me...
> 

It went the other way around...


* Re: [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-18 15:56     ` Michal Hocko
@ 2013-01-18 19:42       ` Glauber Costa
  2013-01-18 19:43         ` Glauber Costa
  0 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-18 19:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 07:56 AM, Michal Hocko wrote:
> On Fri 18-01-13 16:37:15, Michal Hocko wrote:
>> On Fri 11-01-13 13:45:23, Glauber Costa wrote:
>>> Since we are now splitting the memcg creation in two parts, following
>>> the cgroup standard, it would be helpful to be able to determine if a
>>> created memcg is already online.
>>>
>>> We can do this by initially forcing the refcnt to 0, and waiting until
>>> the last minute to flip it to 1.
>>
>> Is this useful, though? What does it tell you? mem_cgroup_online can say
>> false even though half of the attributes have been already copied for
>> example. I think it should be vice versa. It should mark the point when
>> we _start_ copying values. mem_cgroup_online is not the best name then
>> of course. It depends what it is going to be used for...
> 
> And the later patch in the series shows that it is really not helpful on
> its own. You need to rely on other lock to be helpful.
No, no need not.

The lock is there to protect the other fields, especially the outer
iterator, not this one in particular.

> This calls for
> troubles and I do not think the win you get is really worth it. All it
> gives you is basically that you can change an inheritable attribute
> while your child is between css_alloc and css_online and so your
> attribute change doesn't fail if the child creation fails between those
> two. Is this the case you want to handle? Does it really even matter?
> 

I think it matters a lot. Aside from the before vs after discussion to
which I've already conceded, without this protection we can't guarantee
that we won't end up with an inconsistent value of the tunables between
parent and child.



* Re: [PATCH v2 3/7] memcg: provide online test for memcg
  2013-01-18 19:42       ` Glauber Costa
@ 2013-01-18 19:43         ` Glauber Costa
  0 siblings, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-18 19:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 11:42 AM, Glauber Costa wrote:
>> And the later patch in the series shows that it is really not helpful on
>> > its own. You need to rely on other lock to be helpful.
> No, no need not.
geee, what kind of phrase is that??


* Re: [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-18 15:25   ` Michal Hocko
  2013-01-18 19:28     ` Glauber Costa
@ 2013-01-21  7:33     ` Glauber Costa
  2013-01-21  8:38       ` Michal Hocko
  1 sibling, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-21  7:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 07:25 PM, Michal Hocko wrote:
>> -	spin_lock_init(&memcg->move_lock);
>> > +	memcg->swappiness = mem_cgroup_swappiness(parent);
> Please move this up to oom_kill_disable and use_hierarchy
> initialization.

One thing: wouldn't moving it to inside use_hierarchy be a change of
behavior here?


* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-18 16:06   ` Michal Hocko
@ 2013-01-21  7:58     ` Glauber Costa
  2013-01-21  8:34       ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-21  7:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/18/2013 08:06 PM, Michal Hocko wrote:
>> +	/* bounce at first found */
>> > +	for_each_mem_cgroup_tree(iter, memcg) {
> This will not work. Consider you will see a !online memcg. What happens?
> mem_cgroup_iter will css_get group that it returns and css_put it when
> it visits another one or finishes the loop. So your poor iter will be
> released before it gets born. Not good.
> 
Reading this again, I don't really follow. The iterator is not supposed
to put() anything it hasn't get()'d before, so we will never release the
group. Note that if it ever appears in here, the css refcnt is expected
to be at least 1 already.

The online test relies on the memcg refcnt, not on the css refcnt.

Actually, now that the value setting is all done in css_online, the css
refcnt should be enough to denote if the cgroup already has children,
without a memcg-specific test. The css refcnt is bumped somewhere
between alloc and online. Unless Tejun objects, I think I will just
get rid of the online test and rely on the fact that if the iterator
sees any children, we must already be online.


* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-21  7:58     ` Glauber Costa
@ 2013-01-21  8:34       ` Michal Hocko
  2013-01-21  8:41         ` Glauber Costa
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-01-21  8:34 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Mon 21-01-13 11:58:49, Glauber Costa wrote:
> On 01/18/2013 08:06 PM, Michal Hocko wrote:
> >> +	/* bounce at first found */
> >> > +	for_each_mem_cgroup_tree(iter, memcg) {
> > This will not work. Consider you will see a !online memcg. What happens?
> > mem_cgroup_iter will css_get group that it returns and css_put it when
> > it visits another one or finishes the loop. So your poor iter will be
> > released before it gets born. Not good.
> > 
> Reading this again, I don't really follow. The iterator is not supposed
> to put() anything it hasn't get()'d before, so we will never release the
> group. Note that if it ever appears in here, the css refcnt is expected
> to be at least 1 already.
> 
> The online test relies on the memcg refcnt, not on the css refcnt.

Bahh, yeah, sorry about the confusion. Damn, it's not the first time I
managed to mix those two...

> Actually, now that the value setting is all done in css_online, the css
> refcnt should be enough to denote if the cgroup already has children,
> without a memcg-specific test. The css refcnt is bumped somewhere
> between alloc and online. 

Yes, in init_cgroup_css.

> Unless Tejun objects, I think I will just get rid of the online
> test and rely on the fact that if the iterator sees any children, we
> must already be online.

Which means that we are back to list_empty(&cgrp->children) test, aren't
we. We just call it a different name. If you really insist on not using
children directly then do something like:
	struct cgroup *pos;

	if (!memcg->use_hierarchy)
		cgroup_for_each_child(pos, memcg->css.cgroup)
			return true;

	return false;

This still has an issue that a change (e.g. vm_swappiness) that requires
this check will fail even though the child creation fails after it is
made visible (e.g. during css_online).
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-21  7:33     ` Glauber Costa
@ 2013-01-21  8:38       ` Michal Hocko
  2013-01-21  8:42         ` Glauber Costa
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-01-21  8:38 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Mon 21-01-13 11:33:20, Glauber Costa wrote:
> On 01/18/2013 07:25 PM, Michal Hocko wrote:
> >> -	spin_lock_init(&memcg->move_lock);
> >> > +	memcg->swappiness = mem_cgroup_swappiness(parent);
> > Please move this up to oom_kill_disable and use_hierarchy
> > initialization.
> 
> One thing: wouldn't moving it to inside use_hierarchy be a change of
> behavior here?

I do not see how it would change the behavior. But maybe I wasn't clear
enough. I just wanted to make all three:
	memcg->use_hierarchy = parent->use_hierarchy;
	memcg->oom_kill_disable = parent->oom_kill_disable;
	memcg->swappiness = mem_cgroup_swappiness(parent);

in the same visual block so that we can split the function into three
parts. Inherited values which don't depend on use_hierarchy, those that
depend on use_hierarchy and the rest that depends on the previous
decisions (kmem e.g.).
Makes sense?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-21  8:34       ` Michal Hocko
@ 2013-01-21  8:41         ` Glauber Costa
  2013-01-21  9:15           ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2013-01-21  8:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/21/2013 12:34 PM, Michal Hocko wrote:
> On Mon 21-01-13 11:58:49, Glauber Costa wrote:
>> On 01/18/2013 08:06 PM, Michal Hocko wrote:
>>>> +	/* bounce at first found */
>>>>> +	for_each_mem_cgroup_tree(iter, memcg) {
>>> This will not work. Consider you will see a !online memcg. What happens?
>>> mem_cgroup_iter will css_get group that it returns and css_put it when
>>> it visits another one or finishes the loop. So your poor iter will be
>>> released before it gets born. Not good.
>>>
>> Reading this again, I don't really follow. The iterator is not supposed
>> to put() anything it hasn't get()'d before, so we will never release the
>> group. Note that if it ever appears in here, the css refcnt is expected
>> to be at least 1 already.
>>
>> The online test relies on the memcg refcnt, not on the css refcnt.
> 
> Bahh, yeah, sorry about the confusion. Damn, it's not the first time I
> managed to mix those two...
> 
You're excused. This time.

>> Actually, now that the value setting is all done in css_online, the css
>> refcnt should be enough to denote if the cgroup already has children,
>> without a memcg-specific test. The css refcnt is bumped somewhere
>> between alloc and online. 
> 
> Yes, in init_cgroup_css.
Yes, but that should not matter. We should not depend on anything more
specific than "between alloc and online".

> 
>> Unless Tejun objects, I think I will just get rid of the online
>> test and rely on the fact that if the iterator sees any children, we
>> must already be online.
> 
> Which means that we are back to list_empty(&cgrp->children) test, aren't
> we. 

As long as cgroup core keeps using a list, yes.
The css itself won't go away, regardless of the infrastructure cgroup
uses internally. So I do believe this is stronger long term.

Note that we have been arguing here about the css refcnt, but we don't
actually refer to it directly: we do css_get() and let cgroup core do
whatever it pleases internally.


> If you really insist on not using
> children directly then do something like:
> 	struct cgroup *pos;
> 
> 	if (!memcg->use_hierarchy)
> 		cgroup_for_each_child(pos, memcg->css.cgroup)
> 			return true;
> 
> 	return false;
> 
I don't oppose that.

> This still has an issue that a change (e.g. vm_swappiness) that requires
> this check will fail even though the child creation fails after it is
> made visible (e.g. during css_online).
> 
Is it a problem ?


* Re: [PATCH v2 2/7] memcg: split part of memcg creation to css_online
  2013-01-21  8:38       ` Michal Hocko
@ 2013-01-21  8:42         ` Glauber Costa
  0 siblings, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-21  8:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/21/2013 12:38 PM, Michal Hocko wrote:
> On Mon 21-01-13 11:33:20, Glauber Costa wrote:
>> On 01/18/2013 07:25 PM, Michal Hocko wrote:
>>>> -	spin_lock_init(&memcg->move_lock);
>>>>> +	memcg->swappiness = mem_cgroup_swappiness(parent);
>>> Please move this up to oom_kill_disable and use_hierarchy
>>> initialization.
>>
>> One thing: wouldn't moving it to inside use_hierarchy be a change of
>> behavior here?
> 
> I do not see how it would change the behavior. But maybe I wasn't clear
> enough. I just wanted to make all three:
> 	memcg->use_hierarchy = parent->use_hierarchy;
> 	memcg->oom_kill_disable = parent->oom_kill_disable;
> 	memcg->swappiness = mem_cgroup_swappiness(parent);
> 
> in the same visual block so that we can split the function into three
> parts. Inherited values which don't depend on use_hierarchy, those that
> depend on use_hierarchy and the rest that depends on the previous
> decisions (kmem e.g.).
> Makes sense?
> 
Yes. I misunderstood you, believing you wanted the swappiness assignment
to go inside the use_hierarchy block.


* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-21  8:41         ` Glauber Costa
@ 2013-01-21  9:15           ` Michal Hocko
  2013-01-21  9:19             ` Glauber Costa
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2013-01-21  9:15 UTC (permalink / raw)
  To: Glauber Costa
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On Mon 21-01-13 12:41:24, Glauber Costa wrote:
> On 01/21/2013 12:34 PM, Michal Hocko wrote:
[...]
> > If you really insist on not using
> > children directly then do something like:
> > 	struct cgroup *pos;
> > 
> > 	if (!memcg->use_hierarchy)
> > 		cgroup_for_each_child(pos, memcg->css.cgroup)
> > 			return true;
> > 
> > 	return false;
> > 
> I don't oppose that.

OK, I guess I could live with that ;)

> > This still has an issue that a change (e.g. vm_swappiness) that requires
> > this check will fail even though the child creation fails after it is
> > made visible (e.g. during css_online).
> > 
> Is it a problem ?

I thought you were considering this a problem. Quoting from patch 3/7
"
> This calls for troubles and I do not think the win you get is really
> worth it. All it gives you is basically that you can change an
> inheritable attribute while your child is between css_alloc and
> css_online and so your attribute change doesn't fail if the child
> creation fails between those two. Is this the case you want to
> handle? Does it really even matter?

I think it matters a lot. Aside from the before vs after discussion to
which I've already conceded, without this protection we can't guarantee
that we won't end up with an inconsistent value of the tunables between
parent and child.
"

Just to make it clear. I do not see this failure as a big problem. We
can fix it by adding an Online check later if somebody complains.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 4/7] memcg: fast hierarchy-aware child test.
  2013-01-21  9:15           ` Michal Hocko
@ 2013-01-21  9:19             ` Glauber Costa
  0 siblings, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2013-01-21  9:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, kamezawa.hiroyu, Johannes Weiner, Tejun Heo

On 01/21/2013 01:15 PM, Michal Hocko wrote:
> On Mon 21-01-13 12:41:24, Glauber Costa wrote:
>> On 01/21/2013 12:34 PM, Michal Hocko wrote:
> [...]
>>> If you really insist on not using
>>> children directly then do something like:
>>> 	struct cgroup *pos;
>>>
>>> 	if (!memcg->use_hierarchy)
>>> 		cgroup_for_each_child(pos, memcg->css.cgroup)
>>> 			return true;
>>>
>>> 	return false;
>>>
>> I don't oppose that.
> 
> OK, I guess I could live with that ;)
> 
>>> This still has an issue that a change (e.g. vm_swappiness) that requires
>>> this check will fail even though the child creation fails after it is
>>> made visible (e.g. during css_online).
>>>
>> Is it a problem ?
> 
> I thought you were considering this a problem. Quoting from patch 3/7
> "
>> This calls for troubles and I do not think the win you get is really
>> worth it. All it gives you is basically that you can change an
>> inheritable attribute while your child is between css_alloc and
>> css_online and so your attribute change doesn't fail if the child
>> creation fails between those two. Is this the case you want to
>> handle? Does it really even matter?
> 
> I think it matters a lot. Aside from the before vs after discussion to
> which I've already conceded, without this protection we can't guarantee
> that we won't end up with an inconsistent value of the tunables between
> parent and child.
> "
> 
> Just to make it clear. I do not see this failure as a big problem. We
> can fix it by adding an Online check later if somebody complains.
> 
I am talking here about groups that appear between the alloc and online
callbacks.

You mentioned child creation failures. Those are very different
scenarios. I am personally not much bothered if we deny a value change
during child creation because of that child, even if the allocation then
turns out to fail.

IOW: denying a value change because we falsely believe there is a child
is a lot less serious than allowing a value change due to our ignorance
of child existence and ending up with an inconsistent value set.


end of thread, other threads:[~2013-01-21  9:19 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-11  9:45 [PATCH v2 0/7] replace cgroup_lock with local memcg lock Glauber Costa
2013-01-11  9:45 ` [PATCH v2 1/7] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
2013-01-18 14:16   ` Michal Hocko
2013-01-11  9:45 ` [PATCH v2 2/7] memcg: split part of memcg creation to css_online Glauber Costa
2013-01-18 15:25   ` Michal Hocko
2013-01-18 19:28     ` Glauber Costa
2013-01-21  7:33     ` Glauber Costa
2013-01-21  8:38       ` Michal Hocko
2013-01-21  8:42         ` Glauber Costa
2013-01-11  9:45 ` [PATCH v2 3/7] memcg: provide online test for memcg Glauber Costa
2013-01-18 15:37   ` Michal Hocko
2013-01-18 15:56     ` Michal Hocko
2013-01-18 19:42       ` Glauber Costa
2013-01-18 19:43         ` Glauber Costa
2013-01-18 19:41     ` Glauber Costa
2013-01-11  9:45 ` [PATCH v2 4/7] memcg: fast hierarchy-aware child test Glauber Costa
2013-01-18 16:06   ` Michal Hocko
2013-01-21  7:58     ` Glauber Costa
2013-01-21  8:34       ` Michal Hocko
2013-01-21  8:41         ` Glauber Costa
2013-01-21  9:15           ` Michal Hocko
2013-01-21  9:19             ` Glauber Costa
2013-01-11  9:45 ` [PATCH v2 5/7] May god have mercy on my soul Glauber Costa
2013-01-18 16:07   ` Michal Hocko
2013-01-11  9:45 ` [PATCH v2 6/7] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
2013-01-18 16:21   ` Michal Hocko
2013-01-11  9:45 ` [PATCH v2 7/7] memcg: increment static branch right after limit set Glauber Costa
2013-01-18 16:23   ` Michal Hocko
