* [PATCH v4 1/6] memcg: prevent changes to move_charge_at_immigrate during task attach
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-29 0:11 ` Kamezawa Hiroyuki
2013-01-22 13:47 ` [PATCH v4 2/6] memcg: split part of memcg creation to css_online Glauber Costa
` (5 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
Currently, we rely on the cgroup_lock() to prevent changes to
move_charge_at_immigrate during task migration. However, this is only
needed because the current strategy keeps checking this value throughout
the whole process. Since all we need is serialization, one needs only to
guarantee that whatever decision we made in the beginning of a specific
migration is respected throughout the process.
We can achieve this by just saving it in mc. By doing this, no kind of
locking is needed.
[ v2: change flag name to avoid confusion ]
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09255ec..91d90a0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -398,8 +398,8 @@ static bool memcg_kmem_test_and_clear_dead(struct mem_cgroup *memcg)
/* Stuffs for move charges at task migration. */
/*
- * Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
- * left-shifted bitmap of these types.
+ * Types of charges to be moved. "move_charge_at_immitgrate" and
+ * "immigrate_flags" are treated as a left-shifted bitmap of these types.
*/
enum move_type {
MOVE_CHARGE_TYPE_ANON, /* private anonymous page and swap of it */
@@ -412,6 +412,7 @@ static struct move_charge_struct {
spinlock_t lock; /* for from, to */
struct mem_cgroup *from;
struct mem_cgroup *to;
+ unsigned long immigrate_flags;
unsigned long precharge;
unsigned long moved_charge;
unsigned long moved_swap;
@@ -424,14 +425,12 @@ static struct move_charge_struct {
static bool move_anon(void)
{
- return test_bit(MOVE_CHARGE_TYPE_ANON,
- &mc.to->move_charge_at_immigrate);
+ return test_bit(MOVE_CHARGE_TYPE_ANON, &mc.immigrate_flags);
}
static bool move_file(void)
{
- return test_bit(MOVE_CHARGE_TYPE_FILE,
- &mc.to->move_charge_at_immigrate);
+ return test_bit(MOVE_CHARGE_TYPE_FILE, &mc.immigrate_flags);
}
/*
@@ -5146,15 +5145,14 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
if (val >= (1 << NR_MOVE_TYPE))
return -EINVAL;
+
/*
- * We check this value several times in both in can_attach() and
- * attach(), so we need cgroup lock to prevent this value from being
- * inconsistent.
+ * No kind of locking is needed in here, because ->can_attach() will
+ * check this value once in the beginning of the process, and then carry
+ * on with stale data. This means that changes to this value will only
+ * affect task migrations starting after the change.
*/
- cgroup_lock();
memcg->move_charge_at_immigrate = val;
- cgroup_unlock();
-
return 0;
}
#else
@@ -6530,8 +6528,15 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
struct task_struct *p = cgroup_taskset_first(tset);
int ret = 0;
struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+ unsigned long move_charge_at_immigrate;
- if (memcg->move_charge_at_immigrate) {
+ /*
+ * We are now commited to this value whatever it is. Changes in this
+ * tunable will only affect upcoming migrations, not the current one.
+ * So we need to save it, and keep it going.
+ */
+ move_charge_at_immigrate = memcg->move_charge_at_immigrate;
+ if (move_charge_at_immigrate) {
struct mm_struct *mm;
struct mem_cgroup *from = mem_cgroup_from_task(p);
@@ -6551,6 +6556,7 @@ static int mem_cgroup_can_attach(struct cgroup *cgroup,
spin_lock(&mc.lock);
mc.from = from;
mc.to = memcg;
+ mc.immigrate_flags = move_charge_at_immigrate;
spin_unlock(&mc.lock);
/* We set mc.moving_task later */
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 1/6] memcg: prevent changes to move_charge_at_immigrate during task attach
2013-01-22 13:47 ` [PATCH v4 1/6] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2013-01-29 0:11 ` Kamezawa Hiroyuki
0 siblings, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:11 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> Currently, we rely on the cgroup_lock() to prevent changes to
> move_charge_at_immigrate during task migration. However, this is only
> needed because the current strategy keeps checking this value throughout
> the whole process. Since all we need is serialization, one needs only to
> guarantee that whatever decision we made in the beginning of a specific
> migration is respected throughout the process.
>
> We can achieve this by just saving it in mc. By doing this, no kind of
> locking is needed.
>
> [ v2: change flag name to avoid confusion ]
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 2/6] memcg: split part of memcg creation to css_online
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
2013-01-22 13:47 ` [PATCH v4 1/6] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-25 23:52 ` Andrew Morton
2013-01-29 0:12 ` Kamezawa Hiroyuki
2013-01-22 13:47 ` [PATCH v4 3/6] memcg: fast hierarchy-aware child test Glauber Costa
` (4 subsequent siblings)
6 siblings, 2 replies; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
This patch is a preparatory work for later locking rework to get rid of
big cgroup lock from memory controller code.
The memory controller uses some tunables to adjust its operation. Those
tunables are inherited from parent to children upon children
intialization. For most of them, the value cannot be changed after the
parent has a new children.
cgroup core splits initialization in two phases: css_alloc and css_online.
After css_alloc, the memory allocation and basic initialization are
done. But the new group is not yet visible anywhere, not even for cgroup
core code. It is only somewhere between css_alloc and css_online that it
is inserted into the internal children lists. Copying tunable values in
css_alloc will lead to inconsistent values: the children will copy the
old parent values, that can change between the copy and the moment in
which the groups is linked to any data structure that can indicate the
presence of children.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 61 ++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 39 insertions(+), 22 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91d90a0..6c72204 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6063,7 +6063,7 @@ err_cleanup:
static struct cgroup_subsys_state * __ref
mem_cgroup_css_alloc(struct cgroup *cont)
{
- struct mem_cgroup *memcg, *parent;
+ struct mem_cgroup *memcg;
long error = -ENOMEM;
int node;
@@ -6079,7 +6079,6 @@ mem_cgroup_css_alloc(struct cgroup *cont)
if (cont->parent == NULL) {
int cpu;
enable_swap_cgroup();
- parent = NULL;
if (mem_cgroup_soft_limit_tree_init())
goto free_out;
root_mem_cgroup = memcg;
@@ -6088,13 +6087,43 @@ mem_cgroup_css_alloc(struct cgroup *cont)
&per_cpu(memcg_stock, cpu);
INIT_WORK(&stock->work, drain_local_stock);
}
- } else {
- parent = mem_cgroup_from_cont(cont->parent);
- memcg->use_hierarchy = parent->use_hierarchy;
- memcg->oom_kill_disable = parent->oom_kill_disable;
+
+ res_counter_init(&memcg->res, NULL);
+ res_counter_init(&memcg->memsw, NULL);
+ res_counter_init(&memcg->kmem, NULL);
}
- if (parent && parent->use_hierarchy) {
+ memcg->last_scanned_node = MAX_NUMNODES;
+ INIT_LIST_HEAD(&memcg->oom_notify);
+ atomic_set(&memcg->refcnt, 1);
+ memcg->move_charge_at_immigrate = 0;
+ mutex_init(&memcg->thresholds_lock);
+ spin_lock_init(&memcg->move_lock);
+
+ return &memcg->css;
+
+free_out:
+ __mem_cgroup_free(memcg);
+ return ERR_PTR(error);
+}
+
+static int
+mem_cgroup_css_online(struct cgroup *cont)
+{
+ struct mem_cgroup *memcg, *parent;
+ int error = 0;
+
+ if (!cont->parent)
+ return 0;
+
+ memcg = mem_cgroup_from_cont(cont);
+ parent = mem_cgroup_from_cont(cont->parent);
+
+ memcg->use_hierarchy = parent->use_hierarchy;
+ memcg->oom_kill_disable = parent->oom_kill_disable;
+ memcg->swappiness = mem_cgroup_swappiness(parent);
+
+ if (parent->use_hierarchy) {
res_counter_init(&memcg->res, &parent->res);
res_counter_init(&memcg->memsw, &parent->memsw);
res_counter_init(&memcg->kmem, &parent->kmem);
@@ -6115,18 +6144,9 @@ mem_cgroup_css_alloc(struct cgroup *cont)
* much sense so let cgroup subsystem know about this
* unfortunate state in our controller.
*/
- if (parent && parent != root_mem_cgroup)
+ if (parent != root_mem_cgroup)
mem_cgroup_subsys.broken_hierarchy = true;
}
- memcg->last_scanned_node = MAX_NUMNODES;
- INIT_LIST_HEAD(&memcg->oom_notify);
-
- if (parent)
- memcg->swappiness = mem_cgroup_swappiness(parent);
- atomic_set(&memcg->refcnt, 1);
- memcg->move_charge_at_immigrate = 0;
- mutex_init(&memcg->thresholds_lock);
- spin_lock_init(&memcg->move_lock);
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
if (error) {
@@ -6136,12 +6156,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
- return ERR_PTR(error);
}
- return &memcg->css;
-free_out:
- __mem_cgroup_free(memcg);
- return ERR_PTR(error);
+ return error;
}
static void mem_cgroup_css_offline(struct cgroup *cont)
@@ -6751,6 +6767,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
.name = "memory",
.subsys_id = mem_cgroup_subsys_id,
.css_alloc = mem_cgroup_css_alloc,
+ .css_online = mem_cgroup_css_online,
.css_offline = mem_cgroup_css_offline,
.css_free = mem_cgroup_css_free,
.can_attach = mem_cgroup_can_attach,
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 2/6] memcg: split part of memcg creation to css_online
2013-01-22 13:47 ` [PATCH v4 2/6] memcg: split part of memcg creation to css_online Glauber Costa
@ 2013-01-25 23:52 ` Andrew Morton
2013-01-28 8:35 ` Lord Glauber Costa of Sealand
2013-01-29 0:12 ` Kamezawa Hiroyuki
1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2013-01-25 23:52 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu
On Tue, 22 Jan 2013 17:47:37 +0400
Glauber Costa <glommer@parallels.com> wrote:
> This patch is a preparatory work for later locking rework to get rid of
> big cgroup lock from memory controller code.
Is this complete? From my reading, the patch is also a bugfix. It
prevents stale tunable values from getting installed into new children?
> The memory controller uses some tunables to adjust its operation. Those
> tunables are inherited from parent to children upon children
> intialization. For most of them, the value cannot be changed after the
> parent has a new children.
>
> cgroup core splits initialization in two phases: css_alloc and css_online.
> After css_alloc, the memory allocation and basic initialization are
> done. But the new group is not yet visible anywhere, not even for cgroup
> core code. It is only somewhere between css_alloc and css_online that it
> is inserted into the internal children lists. Copying tunable values in
> css_alloc will lead to inconsistent values: the children will copy the
> old parent values, that can change between the copy and the moment in
> which the groups is linked to any data structure that can indicate the
> presence of children.
That describes the problem, but not the fix. Don't we need something
like "therefore move the propagation of tunables into the css_online
handler".
What remains unclear is how we prevent races during the operation of
the css_online handler. Suppose mem_cgroup_css_online() is
mid-execution and userspace comes in and starts modifying the parent's
tunables?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 2/6] memcg: split part of memcg creation to css_online
2013-01-25 23:52 ` Andrew Morton
@ 2013-01-28 8:35 ` Lord Glauber Costa of Sealand
0 siblings, 0 replies; 24+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28 8:35 UTC (permalink / raw)
To: Andrew Morton
Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu
On 01/26/2013 03:52 AM, Andrew Morton wrote:
> On Tue, 22 Jan 2013 17:47:37 +0400
> Glauber Costa <glommer@parallels.com> wrote:
>
>> This patch is a preparatory work for later locking rework to get rid of
>> big cgroup lock from memory controller code.
>
> Is this complete? From my reading, the patch is also a bugfix. It
> prevents stale tunable values from getting installed into new children?
>
No, it is not a bug fix. This used to be all protected by the cgroup
lock under the hood - we don't see it, but it is there from cgroup core.
Yes, this is ugly. But it is one of the very problems this patchset is
trying to get rid of =p
>> The memory controller uses some tunables to adjust its operation. Those
>> tunables are inherited from parent to children upon children
>> intialization. For most of them, the value cannot be changed after the
>> parent has a new children.
>>
>> cgroup core splits initialization in two phases: css_alloc and css_online.
>> After css_alloc, the memory allocation and basic initialization are
>> done. But the new group is not yet visible anywhere, not even for cgroup
>> core code. It is only somewhere between css_alloc and css_online that it
>> is inserted into the internal children lists. Copying tunable values in
>> css_alloc will lead to inconsistent values: the children will copy the
>> old parent values, that can change between the copy and the moment in
>> which the groups is linked to any data structure that can indicate the
>> presence of children.
>
> That describes the problem, but not the fix. Don't we need something
> like "therefore move the propagation of tunables into the css_online
> handler".
>
> What remains unclear is how we prevent races during the operation of
> the css_online handler. Suppose mem_cgroup_css_online() is
> mid-execution and userspace comes in and starts modifying the parent's
> tunables?
>
At this point, the very same old cgroup_lock() - since it is still
present. In a later patch, we will need the memcg mutex around the
assignments.
IOW, The figure looks a bit like:
css_alloc() --> cgroup_internal_datastructure_update -> css_online()
This is all protected by the cgroup_lock(). So at this point, wherever
we do those assignments, we're safe. When we move to local locking, the
situation changes. Assigning in css_alloc will mean that we'll have a
non-locked window where the assignment is made, but the cgroup does not
yet show up in the internal data structures - so the pertinence tests
will fail and the tunable values will be allowed to change.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 2/6] memcg: split part of memcg creation to css_online
2013-01-22 13:47 ` [PATCH v4 2/6] memcg: split part of memcg creation to css_online Glauber Costa
2013-01-25 23:52 ` Andrew Morton
@ 2013-01-29 0:12 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:12 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> This patch is a preparatory work for later locking rework to get rid of
> big cgroup lock from memory controller code.
>
> The memory controller uses some tunables to adjust its operation. Those
> tunables are inherited from parent to children upon children
> intialization. For most of them, the value cannot be changed after the
> parent has a new children.
>
> cgroup core splits initialization in two phases: css_alloc and css_online.
> After css_alloc, the memory allocation and basic initialization are
> done. But the new group is not yet visible anywhere, not even for cgroup
> core code. It is only somewhere between css_alloc and css_online that it
> is inserted into the internal children lists. Copying tunable values in
> css_alloc will lead to inconsistent values: the children will copy the
> old parent values, that can change between the copy and the moment in
> which the groups is linked to any data structure that can indicate the
> presence of children.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 3/6] memcg: fast hierarchy-aware child test.
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
2013-01-22 13:47 ` [PATCH v4 1/6] memcg: prevent changes to move_charge_at_immigrate during task attach Glauber Costa
2013-01-22 13:47 ` [PATCH v4 2/6] memcg: split part of memcg creation to css_online Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-25 23:59 ` Andrew Morton
2013-01-29 0:14 ` Kamezawa Hiroyuki
2013-01-22 13:47 ` [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
` (3 subsequent siblings)
6 siblings, 2 replies; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
Currently, we use cgroups' provided list of children to verify if it is
safe to proceed with any value change that is dependent on the cgroup
being empty.
This is less than ideal, because it enforces a dependency over cgroup
core that we would be better off without. The solution proposed here is
to iterate over the child cgroups and if any is found that is already
online, we bounce and return: we don't really care how many children we
have, only if we have any.
This is also made to be hierarchy aware. IOW, cgroups with hierarchy
disabled, while they still exist, will be considered for the purpose of
this interface as having no children.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 34 +++++++++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6c72204..6d3ad21 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4716,6 +4716,32 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
}
/*
+ * this mainly exists for tests during set of use_hierarchy. Since this is
+ * the very setting we are changing, the current hierarchy value is meaningless
+ */
+static inline bool __memcg_has_children(struct mem_cgroup *memcg)
+{
+ struct cgroup *pos;
+
+ /* bounce at first found */
+ cgroup_for_each_child(pos, memcg->css.cgroup)
+ return true;
+ return false;
+}
+
+/*
+ * must be called with cgroup_lock held, unless the cgroup is guaranteed to be
+ * already dead (like in mem_cgroup_force_empty, for instance). This is
+ * different than mem_cgroup_count_children, in the sense that we don't really
+ * care how many children we have, we only need to know if we have any. It is
+ * also count any memcg without hierarchy as infertile for that matter.
+ */
+static inline bool memcg_has_children(struct mem_cgroup *memcg)
+{
+ return memcg->use_hierarchy && __memcg_has_children(memcg);
+}
+
+/*
* Reclaims as many pages from the given memcg as possible and moves
* the rest to the parent.
*
@@ -4800,7 +4826,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
*/
if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
(val == 1 || val == 0)) {
- if (list_empty(&cont->children))
+ if (!__memcg_has_children(memcg))
memcg->use_hierarchy = val;
else
retval = -EBUSY;
@@ -4917,8 +4943,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
cgroup_lock();
mutex_lock(&set_limit_mutex);
if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
- if (cgroup_task_count(cont) || (memcg->use_hierarchy &&
- !list_empty(&cont->children))) {
+ if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
ret = -EBUSY;
goto out;
}
@@ -5334,8 +5359,7 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
cgroup_lock();
/* If under hierarchy, only empty-root can set this value */
- if ((parent->use_hierarchy) ||
- (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
+ if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
cgroup_unlock();
return -EINVAL;
}
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 3/6] memcg: fast hierarchy-aware child test.
2013-01-22 13:47 ` [PATCH v4 3/6] memcg: fast hierarchy-aware child test Glauber Costa
@ 2013-01-25 23:59 ` Andrew Morton
2013-01-28 8:30 ` Lord Glauber Costa of Sealand
2013-01-29 0:14 ` Kamezawa Hiroyuki
1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2013-01-25 23:59 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu
On Tue, 22 Jan 2013 17:47:38 +0400
Glauber Costa <glommer@parallels.com> wrote:
> Currently, we use cgroups' provided list of children to verify if it is
> safe to proceed with any value change that is dependent on the cgroup
> being empty.
>
> This is less than ideal, because it enforces a dependency over cgroup
> core that we would be better off without. The solution proposed here is
> to iterate over the child cgroups and if any is found that is already
> online, we bounce and return: we don't really care how many children we
> have, only if we have any.
>
> This is also made to be hierarchy aware. IOW, cgroups with hierarchy
> disabled, while they still exist, will be considered for the purpose of
> this interface as having no children.
The code comments are a bit unclear. Did this improve them?
--- a/mm/memcontrol.c~memcg-fast-hierarchy-aware-child-test-fix
+++ a/mm/memcontrol.c
@@ -4761,8 +4761,9 @@ static void mem_cgroup_reparent_charges(
}
/*
- * this mainly exists for tests during set of use_hierarchy. Since this is
- * the very setting we are changing, the current hierarchy value is meaningless
+ * This mainly exists for tests during the setting of set of use_hierarchy.
+ * Since this is the very setting we are changing, the current hierarchy value
+ * is meaningless
*/
static inline bool __memcg_has_children(struct mem_cgroup *memcg)
{
@@ -4775,11 +4776,11 @@ static inline bool __memcg_has_children(
}
/*
- * must be called with cgroup_lock held, unless the cgroup is guaranteed to be
- * already dead (like in mem_cgroup_force_empty, for instance). This is
- * different than mem_cgroup_count_children, in the sense that we don't really
- * care how many children we have, we only need to know if we have any. It is
- * also count any memcg without hierarchy as infertile for that matter.
+ * Must be called with cgroup_lock held, unless the cgroup is guaranteed to be
+ * already dead (in mem_cgroup_force_empty(), for instance). This is different
+ * from mem_cgroup_count_children(), in the sense that we don't really care how
+ * many children we have; we only need to know if we have any. It also counts
+ * any memcg without hierarchy as infertile.
*/
static inline bool memcg_has_children(struct mem_cgroup *memcg)
{
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 3/6] memcg: fast hierarchy-aware child test.
2013-01-25 23:59 ` Andrew Morton
@ 2013-01-28 8:30 ` Lord Glauber Costa of Sealand
0 siblings, 0 replies; 24+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28 8:30 UTC (permalink / raw)
To: Andrew Morton
Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu
On 01/26/2013 03:59 AM, Andrew Morton wrote:
> On Tue, 22 Jan 2013 17:47:38 +0400
> Glauber Costa <glommer@parallels.com> wrote:
>
>> Currently, we use cgroups' provided list of children to verify if it is
>> safe to proceed with any value change that is dependent on the cgroup
>> being empty.
>>
>> This is less than ideal, because it enforces a dependency over cgroup
>> core that we would be better off without. The solution proposed here is
>> to iterate over the child cgroups and if any is found that is already
>> online, we bounce and return: we don't really care how many children we
>> have, only if we have any.
>>
>> This is also made to be hierarchy aware. IOW, cgroups with hierarchy
>> disabled, while they still exist, will be considered for the purpose of
>> this interface as having no children.
>
> The code comments are a bit unclear. Did this improve them?
Both versions seem clear to me, so I'll go with yours as a tie breaker =p
One thing to keep in mind:
> - * must be called with cgroup_lock held, unless the cgroup is guaranteed to be
> - * already dead (like in mem_cgroup_force_empty, for instance). This is
> - * different than mem_cgroup_count_children, in the sense that we don't really
> - * care how many children we have, we only need to know if we have any. It is
> - * also count any memcg without hierarchy as infertile for that matter.
> + * Must be called with cgroup_lock held, unless the cgroup is guaranteed to be
> + * already dead (in mem_cgroup_force_empty(), for instance). This is different
> + * from mem_cgroup_count_children(), in the sense that we don't really care how
> + * many children we have; we only need to know if we have any. It also counts
> + * any memcg without hierarchy as infertile.
> */
In a later patch, I update this text to reflect the fact that the
memcg_mutex will now play this role instead of the cgroup_lock. So I am
just mentioning the cgroup lock here for temporary consistency. We need
to make sure that the later patch still applies, or we'll be left with a
bogus comment.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 3/6] memcg: fast hierarchy-aware child test.
2013-01-22 13:47 ` [PATCH v4 3/6] memcg: fast hierarchy-aware child test Glauber Costa
2013-01-25 23:59 ` Andrew Morton
@ 2013-01-29 0:14 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:14 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> Currently, we use cgroups' provided list of children to verify if it is
> safe to proceed with any value change that is dependent on the cgroup
> being empty.
>
> This is less than ideal, because it enforces a dependency over cgroup
> core that we would be better off without. The solution proposed here is
> to iterate over the child cgroups and if any is found that is already
> online, we bounce and return: we don't really care how many children we
> have, only if we have any.
>
> This is also made to be hierarchy aware. IOW, cgroups with hierarchy
> disabled, while they still exist, will be considered for the purpose of
> this interface as having no children.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
` (2 preceding siblings ...)
2013-01-22 13:47 ` [PATCH v4 3/6] memcg: fast hierarchy-aware child test Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-22 14:00 ` Michal Hocko
2013-01-29 0:16 ` Kamezawa Hiroyuki
2013-01-22 13:47 ` [PATCH v4 5/6] memcg: increment static branch right after limit set Glauber Costa
` (2 subsequent siblings)
6 siblings, 2 replies; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
After the preparation work done in earlier patches, the cgroup_lock can
be trivially replaced with a memcg-specific lock. This is an automatic
translation in every site the values involved were queried.
The sites were values are written, however, used to be naturally called
under cgroup_lock. This is the case for instance of the css_online
callback. For those, we now need to explicitly add the memcg lock.
With this, all the calls to cgroup_lock outside cgroup core are gone.
Signed-off-by: Glauber Costa <glommer@parallels.com>
---
mm/memcontrol.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6d3ad21..f5decb7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -470,6 +470,13 @@ enum res_type {
#define MEM_CGROUP_RECLAIM_SHRINK_BIT 0x1
#define MEM_CGROUP_RECLAIM_SHRINK (1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
+/*
+ * The memcg_create_mutex will be held whenever a new cgroup is created.
+ * As a consequence, any change that needs to protect against new child cgroups
+ * appearing has to hold it as well.
+ */
+static DEFINE_MUTEX(memcg_create_mutex);
+
static void mem_cgroup_get(struct mem_cgroup *memcg);
static void mem_cgroup_put(struct mem_cgroup *memcg);
@@ -4730,8 +4737,8 @@ static inline bool __memcg_has_children(struct mem_cgroup *memcg)
}
/*
- * must be called with cgroup_lock held, unless the cgroup is guaranteed to be
- * already dead (like in mem_cgroup_force_empty, for instance). This is
+ * must be called with memcg_create_mutex held, unless the cgroup is guaranteed
+ * to be already dead (like in mem_cgroup_force_empty, for instance). This is
* different than mem_cgroup_count_children, in the sense that we don't really
* care how many children we have, we only need to know if we have any. It is
* also count any memcg without hierarchy as infertile for that matter.
@@ -4811,7 +4818,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
if (parent)
parent_memcg = mem_cgroup_from_cont(parent);
- cgroup_lock();
+ mutex_lock(&memcg_create_mutex);
if (memcg->use_hierarchy == val)
goto out;
@@ -4834,7 +4841,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
retval = -EINVAL;
out:
- cgroup_unlock();
+ mutex_unlock(&memcg_create_mutex);
return retval;
}
@@ -4933,14 +4940,8 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
*
* After it first became limited, changes in the value of the limit are
* of course permitted.
- *
- * Taking the cgroup_lock is really offensive, but it is so far the only
- * way to guarantee that no children will appear. There are plenty of
- * other offenders, and they should all go away. Fine grained locking
- * is probably the way to go here. When we are fully hierarchical, we
- * can also get rid of the use_hierarchy check.
*/
- cgroup_lock();
+ mutex_lock(&memcg_create_mutex);
mutex_lock(&set_limit_mutex);
if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
@@ -4967,7 +4968,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
ret = res_counter_set_limit(&memcg->kmem, val);
out:
mutex_unlock(&set_limit_mutex);
- cgroup_unlock();
+ mutex_unlock(&memcg_create_mutex);
/*
* We are by now familiar with the fact that we can't inc the static
@@ -5356,17 +5357,17 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
parent = mem_cgroup_from_cont(cgrp->parent);
- cgroup_lock();
+ mutex_lock(&memcg_create_mutex);
/* If under hierarchy, only empty-root can set this value */
if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
- cgroup_unlock();
+ mutex_unlock(&memcg_create_mutex);
return -EINVAL;
}
memcg->swappiness = val;
- cgroup_unlock();
+ mutex_unlock(&memcg_create_mutex);
return 0;
}
@@ -5692,7 +5693,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
parent = mem_cgroup_from_cont(cgrp->parent);
- cgroup_lock();
+ mutex_lock(&memcg_create_mutex);
/* oom-kill-disable is a flag for subhierarchy. */
if ((parent->use_hierarchy) ||
(memcg->use_hierarchy && !list_empty(&cgrp->children))) {
@@ -5702,7 +5703,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
memcg->oom_kill_disable = val;
if (!val)
memcg_oom_recover(memcg);
- cgroup_unlock();
+ mutex_unlock(&memcg_create_mutex);
return 0;
}
@@ -6140,6 +6141,7 @@ mem_cgroup_css_online(struct cgroup *cont)
if (!cont->parent)
return 0;
+ mutex_lock(&memcg_create_mutex);
memcg = mem_cgroup_from_cont(cont);
parent = mem_cgroup_from_cont(cont->parent);
@@ -6173,6 +6175,7 @@ mem_cgroup_css_online(struct cgroup *cont)
}
error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
+ mutex_unlock(&memcg_create_mutex);
if (error) {
/*
* We call put now because our (and parent's) refcnts
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock
2013-01-22 13:47 ` [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2013-01-22 14:00 ` Michal Hocko
2013-01-29 0:16 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2013-01-22 14:00 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups, linux-mm, Tejun Heo, Johannes Weiner, kamezawa.hiroyu
On Tue 22-01-13 17:47:39, Glauber Costa wrote:
> After the preparation work done in earlier patches, the cgroup_lock can
> be trivially replaced with a memcg-specific lock. This is an automatic
> translation in every site the values involved were queried.
>
> The sites were values are written, however, used to be naturally called
> under cgroup_lock. This is the case for instance of the css_online
> callback. For those, we now need to explicitly add the memcg lock.
>
> With this, all the calls to cgroup_lock outside cgroup core are gone.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Thanks!
> ---
> mm/memcontrol.c | 37 ++++++++++++++++++++-----------------
> 1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6d3ad21..f5decb7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -470,6 +470,13 @@ enum res_type {
> #define MEM_CGROUP_RECLAIM_SHRINK_BIT 0x1
> #define MEM_CGROUP_RECLAIM_SHRINK (1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
>
> +/*
> + * The memcg_create_mutex will be held whenever a new cgroup is created.
> + * As a consequence, any change that needs to protect against new child cgroups
> + * appearing has to hold it as well.
> + */
> +static DEFINE_MUTEX(memcg_create_mutex);
> +
> static void mem_cgroup_get(struct mem_cgroup *memcg);
> static void mem_cgroup_put(struct mem_cgroup *memcg);
>
> @@ -4730,8 +4737,8 @@ static inline bool __memcg_has_children(struct mem_cgroup *memcg)
> }
>
> /*
> - * must be called with cgroup_lock held, unless the cgroup is guaranteed to be
> - * already dead (like in mem_cgroup_force_empty, for instance). This is
> + * must be called with memcg_create_mutex held, unless the cgroup is guaranteed
> + * to be already dead (like in mem_cgroup_force_empty, for instance). This is
> * different than mem_cgroup_count_children, in the sense that we don't really
> * care how many children we have, we only need to know if we have any. It is
> * also count any memcg without hierarchy as infertile for that matter.
> @@ -4811,7 +4818,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
> if (parent)
> parent_memcg = mem_cgroup_from_cont(parent);
>
> - cgroup_lock();
> + mutex_lock(&memcg_create_mutex);
>
> if (memcg->use_hierarchy == val)
> goto out;
> @@ -4834,7 +4841,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup *cont, struct cftype *cft,
> retval = -EINVAL;
>
> out:
> - cgroup_unlock();
> + mutex_unlock(&memcg_create_mutex);
>
> return retval;
> }
> @@ -4933,14 +4940,8 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
> *
> * After it first became limited, changes in the value of the limit are
> * of course permitted.
> - *
> - * Taking the cgroup_lock is really offensive, but it is so far the only
> - * way to guarantee that no children will appear. There are plenty of
> - * other offenders, and they should all go away. Fine grained locking
> - * is probably the way to go here. When we are fully hierarchical, we
> - * can also get rid of the use_hierarchy check.
> */
> - cgroup_lock();
> + mutex_lock(&memcg_create_mutex);
> mutex_lock(&set_limit_mutex);
> if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
> if (cgroup_task_count(cont) || memcg_has_children(memcg)) {
> @@ -4967,7 +4968,7 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
> ret = res_counter_set_limit(&memcg->kmem, val);
> out:
> mutex_unlock(&set_limit_mutex);
> - cgroup_unlock();
> + mutex_unlock(&memcg_create_mutex);
>
> /*
> * We are by now familiar with the fact that we can't inc the static
> @@ -5356,17 +5357,17 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
>
> parent = mem_cgroup_from_cont(cgrp->parent);
>
> - cgroup_lock();
> + mutex_lock(&memcg_create_mutex);
>
> /* If under hierarchy, only empty-root can set this value */
> if ((parent->use_hierarchy) || memcg_has_children(memcg)) {
> - cgroup_unlock();
> + mutex_unlock(&memcg_create_mutex);
> return -EINVAL;
> }
>
> memcg->swappiness = val;
>
> - cgroup_unlock();
> + mutex_unlock(&memcg_create_mutex);
>
> return 0;
> }
> @@ -5692,7 +5693,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
>
> parent = mem_cgroup_from_cont(cgrp->parent);
>
> - cgroup_lock();
> + mutex_lock(&memcg_create_mutex);
> /* oom-kill-disable is a flag for subhierarchy. */
> if ((parent->use_hierarchy) ||
> (memcg->use_hierarchy && !list_empty(&cgrp->children))) {
> @@ -5702,7 +5703,7 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
> memcg->oom_kill_disable = val;
> if (!val)
> memcg_oom_recover(memcg);
> - cgroup_unlock();
> + mutex_unlock(&memcg_create_mutex);
> return 0;
> }
>
> @@ -6140,6 +6141,7 @@ mem_cgroup_css_online(struct cgroup *cont)
> if (!cont->parent)
> return 0;
>
> + mutex_lock(&memcg_create_mutex);
> memcg = mem_cgroup_from_cont(cont);
> parent = mem_cgroup_from_cont(cont->parent);
>
> @@ -6173,6 +6175,7 @@ mem_cgroup_css_online(struct cgroup *cont)
> }
>
> error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> + mutex_unlock(&memcg_create_mutex);
> if (error) {
> /*
> * We call put now because our (and parent's) refcnts
> --
> 1.8.1
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock
2013-01-22 13:47 ` [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
2013-01-22 14:00 ` Michal Hocko
@ 2013-01-29 0:16 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:16 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> After the preparation work done in earlier patches, the cgroup_lock can
> be trivially replaced with a memcg-specific lock. This is an automatic
> translation in every site the values involved were queried.
>
> The sites were values are written, however, used to be naturally called
> under cgroup_lock. This is the case for instance of the css_online
> callback. For those, we now need to explicitly add the memcg lock.
>
> With this, all the calls to cgroup_lock outside cgroup core are gone.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 5/6] memcg: increment static branch right after limit set.
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
` (3 preceding siblings ...)
2013-01-22 13:47 ` [PATCH v4 4/6] memcg: replace cgroup_lock with memcg specific memcg_lock Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-29 0:18 ` Kamezawa Hiroyuki
2013-01-22 13:47 ` [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure Glauber Costa
2013-01-25 10:05 ` [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Lord Glauber Costa of Sealand
6 siblings, 1 reply; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
We were deferring the kmemcg static branch increment to a later time,
due to a nasty dependency between the cpu_hotplug lock, taken by the
jump label update, and the cgroup_lock.
Now we no longer take the cgroup lock, and we can save ourselves the
trouble.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 31 +++++++------------------------
1 file changed, 7 insertions(+), 24 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f5decb7..357324c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4926,8 +4926,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
{
int ret = -EINVAL;
#ifdef CONFIG_MEMCG_KMEM
- bool must_inc_static_branch = false;
-
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
/*
* For simplicity, we won't allow this to be disabled. It also can't
@@ -4956,7 +4954,13 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
res_counter_set_limit(&memcg->kmem, RESOURCE_MAX);
goto out;
}
- must_inc_static_branch = true;
+ static_key_slow_inc(&memcg_kmem_enabled_key);
+ /*
+ * setting the active bit after the inc will guarantee no one
+ * starts accounting before all call sites are patched
+ */
+ memcg_kmem_set_active(memcg);
+
/*
* kmem charges can outlive the cgroup. In the case of slab
* pages, for instance, a page contain objects from various
@@ -4969,27 +4973,6 @@ static int memcg_update_kmem_limit(struct cgroup *cont, u64 val)
out:
mutex_unlock(&set_limit_mutex);
mutex_unlock(&memcg_create_mutex);
-
- /*
- * We are by now familiar with the fact that we can't inc the static
- * branch inside cgroup_lock. See disarm functions for details. A
- * worker here is overkill, but also wrong: After the limit is set, we
- * must start accounting right away. Since this operation can't fail,
- * we can safely defer it to here - no rollback will be needed.
- *
- * The boolean used to control this is also safe, because
- * KMEM_ACCOUNTED_ACTIVATED guarantees that only one process will be
- * able to set it to true;
- */
- if (must_inc_static_branch) {
- static_key_slow_inc(&memcg_kmem_enabled_key);
- /*
- * setting the active bit after the inc will guarantee no one
- * starts accounting before all call sites are patched
- */
- memcg_kmem_set_active(memcg);
- }
-
#endif
return ret;
}
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 5/6] memcg: increment static branch right after limit set.
2013-01-22 13:47 ` [PATCH v4 5/6] memcg: increment static branch right after limit set Glauber Costa
@ 2013-01-29 0:18 ` Kamezawa Hiroyuki
0 siblings, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:18 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> We were deferring the kmemcg static branch increment to a later time,
> due to a nasty dependency between the cpu_hotplug lock, taken by the
> jump label update, and the cgroup_lock.
>
> Now we no longer take the cgroup lock, and we can save ourselves the
> trouble.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure.
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
` (4 preceding siblings ...)
2013-01-22 13:47 ` [PATCH v4 5/6] memcg: increment static branch right after limit set Glauber Costa
@ 2013-01-22 13:47 ` Glauber Costa
2013-01-22 14:00 ` Michal Hocko
2013-01-29 0:22 ` Kamezawa Hiroyuki
2013-01-25 10:05 ` [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Lord Glauber Costa of Sealand
6 siblings, 2 replies; 24+ messages in thread
From: Glauber Costa @ 2013-01-22 13:47 UTC (permalink / raw)
To: cgroups
Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu, Glauber Costa
When use_hierarchy is enabled, we acquire an extra reference count
in our parent during cgroup creation. We don't release it, though,
if any failure exist in the creation process.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Michal Hocko <mhocko@suse.cz>
---
mm/memcontrol.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 357324c..72a008e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6166,6 +6166,8 @@ mem_cgroup_css_online(struct cgroup *cont)
* call __mem_cgroup_free, so return directly
*/
mem_cgroup_put(memcg);
+ if (parent->use_hierarchy)
+ mem_cgroup_put(parent);
}
return error;
}
--
1.8.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure.
2013-01-22 13:47 ` [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure Glauber Costa
@ 2013-01-22 14:00 ` Michal Hocko
2013-01-29 0:22 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2013-01-22 14:00 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups, linux-mm, Tejun Heo, Johannes Weiner, kamezawa.hiroyu
On Tue 22-01-13 17:47:41, Glauber Costa wrote:
> When use_hierarchy is enabled, we acquire an extra reference count
> in our parent during cgroup creation. We don't release it, though,
> if any failure exist in the creation process.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Reported-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 357324c..72a008e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6166,6 +6166,8 @@ mem_cgroup_css_online(struct cgroup *cont)
> * call __mem_cgroup_free, so return directly
> */
> mem_cgroup_put(memcg);
> + if (parent->use_hierarchy)
> + mem_cgroup_put(parent);
> }
> return error;
> }
> --
> 1.8.1
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure.
2013-01-22 13:47 ` [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure Glauber Costa
2013-01-22 14:00 ` Michal Hocko
@ 2013-01-29 0:22 ` Kamezawa Hiroyuki
1 sibling, 0 replies; 24+ messages in thread
From: Kamezawa Hiroyuki @ 2013-01-29 0:22 UTC (permalink / raw)
To: Glauber Costa; +Cc: cgroups, linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner
(2013/01/22 22:47), Glauber Costa wrote:
> When use_hierarchy is enabled, we acquire an extra reference count
> in our parent during cgroup creation. We don't release it, though,
> if any failure exist in the creation process.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Reported-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking
2013-01-22 13:47 [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Glauber Costa
` (5 preceding siblings ...)
2013-01-22 13:47 ` [PATCH v4 6/6] memcg: avoid dangling reference count in creation failure Glauber Costa
@ 2013-01-25 10:05 ` Lord Glauber Costa of Sealand
2013-01-25 10:18 ` Michal Hocko
6 siblings, 1 reply; 24+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-25 10:05 UTC (permalink / raw)
To: cgroups; +Cc: linux-mm, Tejun Heo, Michal Hocko, Johannes Weiner,
kamezawa.hiroyu
On 01/22/2013 05:47 PM, Glauber Costa wrote:
> Hi,
>
> In memcg, we use the cgroup_lock basically to synchronize against
> attaching new children to a cgroup. We do this because we rely on cgroup core to
> provide us with this information.
>
> We need to guarantee that upon child creation, our tunables are consistent.
> For those, the calls to cgroup_lock() all live in handlers like
> mem_cgroup_hierarchy_write(), where we change a tunable in the group that is
> hierarchy-related. For instance, the use_hierarchy flag cannot be changed if
> the cgroup already have children.
>
> Furthermore, those values are propageted from the parent to the child when a
> new child is created. So if we don't lock like this, we can end up with the
> following situation:
>
> A B
> memcg_css_alloc() mem_cgroup_hierarchy_write()
> copy use hierarchy from parent change use hierarchy in parent
> finish creation.
>
> This is mainly because during create, we are still not fully connected to the
> css tree. So all iterators and the such that we could use, will fail to show
> that the group has children.
>
> My observation is that all of creation can proceed in parallel with those
> tasks, except value assignment. So what this patchseries does is to first move
> all value assignment that is dependent on parent values from css_alloc to
> css_online, where the iterators all work, and then we lock only the value
> assignment. This will guarantee that parent and children always have consistent
> values. Together with an online test, that can be derived from the observation
> that the refcount of an online memcg can be made to be always positive, we
> should be able to synchronize our side without the cgroup lock.
>
> *v4:
> - revert back to using the set_limit_mutex for kmemcg limit setting.
>
> *v3:
> - simplified test for presence of children, and no longer using refcnt for
> online testing
> - some cleanups as suggested by Michal
>
> *v2:
> - sanitize kmemcg assignment in the light of the current locking change.
> - don't grab locks on immigrate charges by caching the value during can_attach
>
> Glauber Costa (6):
> memcg: prevent changes to move_charge_at_immigrate during task attach
> memcg: split part of memcg creation to css_online
> memcg: fast hierarchy-aware child test.
> memcg: replace cgroup_lock with memcg specific memcg_lock
> memcg: increment static branch right after limit set.
> memcg: avoid dangling reference count in creation failure.
>
Tejun,
This applies ontop of your cpuset patches. Would you pick this (would be
my choice), or would you rather have it routed through somewhere mmish ?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking
2013-01-25 10:05 ` [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Lord Glauber Costa of Sealand
@ 2013-01-25 10:18 ` Michal Hocko
2013-01-25 10:27 ` Lord Glauber Costa of Sealand
0 siblings, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2013-01-25 10:18 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand
Cc: cgroups, linux-mm, Tejun Heo, Johannes Weiner, kamezawa.hiroyu
On Fri 25-01-13 14:05:04, Glauber Costa wrote:
[...]
> > Glauber Costa (6):
> > memcg: prevent changes to move_charge_at_immigrate during task attach
> > memcg: split part of memcg creation to css_online
> > memcg: fast hierarchy-aware child test.
> > memcg: replace cgroup_lock with memcg specific memcg_lock
> > memcg: increment static branch right after limit set.
> > memcg: avoid dangling reference count in creation failure.
> >
>
> Tejun,
>
> This applies ontop of your cpuset patches. Would you pick this (would be
> my choice), or would you rather have it routed through somewhere mmish ?
I would vote to -mm. Or is there any specific reason to have it in
cgroup tree? It doesn't touch any cgroup core parts, does it?
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking
2013-01-25 10:18 ` Michal Hocko
@ 2013-01-25 10:27 ` Lord Glauber Costa of Sealand
2013-01-25 17:37 ` Tejun Heo
0 siblings, 1 reply; 24+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-25 10:27 UTC (permalink / raw)
To: Michal Hocko
Cc: cgroups, linux-mm, Tejun Heo, Johannes Weiner, kamezawa.hiroyu,
Andrew Morton
On 01/25/2013 02:18 PM, Michal Hocko wrote:
> On Fri 25-01-13 14:05:04, Glauber Costa wrote:
> [...]
>>> Glauber Costa (6):
>>> memcg: prevent changes to move_charge_at_immigrate during task attach
>>> memcg: split part of memcg creation to css_online
>>> memcg: fast hierarchy-aware child test.
>>> memcg: replace cgroup_lock with memcg specific memcg_lock
>>> memcg: increment static branch right after limit set.
>>> memcg: avoid dangling reference count in creation failure.
>>>
>>
>> Tejun,
>>
>> This applies ontop of your cpuset patches. Would you pick this (would be
>> my choice), or would you rather have it routed through somewhere mmish ?
>
> I would vote to -mm. Or is there any specific reason to have it in
> cgroup tree? It doesn't touch any cgroup core parts, does it?
>
Copying Andrew (retroactively sorry you weren't directly CCd on this one
as well).
I depend on css_online and the cgroup generic iterator. If they are
already present @ -mm, then fine.
(looking now, they seem to be...)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking
2013-01-25 10:27 ` Lord Glauber Costa of Sealand
@ 2013-01-25 17:37 ` Tejun Heo
2013-01-26 0:03 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2013-01-25 17:37 UTC (permalink / raw)
To: Lord Glauber Costa of Sealand
Cc: Michal Hocko, cgroups, linux-mm, Johannes Weiner, kamezawa.hiroyu,
Andrew Morton
Hey,
On Fri, Jan 25, 2013 at 02:27:55PM +0400, Lord Glauber Costa of Sealand wrote:
> > I would vote to -mm. Or is there any specific reason to have it in
> > cgroup tree? It doesn't touch any cgroup core parts, does it?
> >
> Copying Andrew (retroactively sorry you weren't directly CCd on this one
> as well).
>
> I depend on css_online and the cgroup generic iterator. If they are
> already present @ -mm, then fine.
> (looking now, they seem to be...)
Yeah, they're all in cgroup/for-next so should be available in -mm, so
I think -mm probably is the better tree to route these.
Thanks!
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking
2013-01-25 17:37 ` Tejun Heo
@ 2013-01-26 0:03 ` Andrew Morton
0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2013-01-26 0:03 UTC (permalink / raw)
To: Tejun Heo
Cc: Lord Glauber Costa of Sealand, Michal Hocko, cgroups, linux-mm,
Johannes Weiner, kamezawa.hiroyu
On Fri, 25 Jan 2013 09:37:01 -0800
Tejun Heo <tj@kernel.org> wrote:
> Hey,
>
> On Fri, Jan 25, 2013 at 02:27:55PM +0400, Lord Glauber Costa of Sealand wrote:
> > > I would vote to -mm. Or is there any specific reason to have it in
> > > cgroup tree? It doesn't touch any cgroup core parts, does it?
> > >
> > Copying Andrew (retroactively sorry you weren't directly CCd on this one
> > as well).
> >
> > I depend on css_online and the cgroup generic iterator. If they are
> > already present @ -mm, then fine.
> > (looking now, they seem to be...)
>
> Yeah, they're all in cgroup/for-next so should be available in -mm, so
> I think -mm probably is the better tree to route these.
>
yep, grabbed, thanks.
The good changelogging and code commenting really help with review -
thanks for doing that. It's a shame so few people are interested in
reviewing them! (Hint).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 24+ messages in thread