* [patch 0/3 resend] memcg fixes for 3.12
@ 2013-10-30 21:55 Johannes Weiner
2013-10-30 21:55 ` [patch 1/3] mm: memcg: use proper memcg in limit bypass Johannes Weiner
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Johannes Weiner @ 2013-10-30 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel
Hi Andrew,
here are 3 more fixes that should go into 3.12.
#2 is a special case. The locking scheme is not new and
it's late in the cycle, but the context of this lock was
heavily changed after the OOM rewrite this merge window,
it would be good to have lockdep coverage in the release.
Thanks!
mm/memcontrol.c | 54 +++++++++++++++++++++++++-----------------------------
1 file changed, 25 insertions(+), 29 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [patch 1/3] mm: memcg: use proper memcg in limit bypass
2013-10-30 21:55 [patch 0/3 resend] memcg fixes for 3.12 Johannes Weiner
@ 2013-10-30 21:55 ` Johannes Weiner
2013-10-30 21:55 ` [patch 2/3] mm: memcg: lockdep annotation for memcg OOM lock Johannes Weiner
2013-10-30 21:55 ` [patch 3/3] mm: memcg: fix test for child groups Johannes Weiner
2 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2013-10-30 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel
84235de ("fs: buffer: move allocation failure loop into the
allocator") allowed __GFP_NOFAIL allocations to bypass the limit if
they fail to reclaim enough memory for the charge. Because the main
test case was on a 3.2-based system, this patch missed the fact that
on newer kernels the charge function needs to return root_mem_cgroup
when bypassing the limit, and not NULL. This will corrupt whatever
memory is at NULL + percpu pointer offset. Fix this quickly before
problems are reported.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 34d3ca9572d6..13a9c80d5708 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2765,10 +2765,10 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
*ptr = memcg;
return 0;
nomem:
- *ptr = NULL;
- if (gfp_mask & __GFP_NOFAIL)
- return 0;
- return -ENOMEM;
+ if (!(gfp_mask & __GFP_NOFAIL)) {
+ *ptr = NULL;
+ return -ENOMEM;
+ }
bypass:
*ptr = root_mem_cgroup;
return -EINTR;
--
1.8.4.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [patch 2/3] mm: memcg: lockdep annotation for memcg OOM lock
2013-10-30 21:55 [patch 0/3 resend] memcg fixes for 3.12 Johannes Weiner
2013-10-30 21:55 ` [patch 1/3] mm: memcg: use proper memcg in limit bypass Johannes Weiner
@ 2013-10-30 21:55 ` Johannes Weiner
2013-10-30 21:55 ` [patch 3/3] mm: memcg: fix test for child groups Johannes Weiner
2 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2013-10-30 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel
The memcg OOM lock is a mutex-type lock that is open-coded due to
memcg's special needs. Add annotations for lockdep coverage.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 13a9c80d5708..3e8cd0d9f716 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -54,6 +54,7 @@
#include <linux/page_cgroup.h>
#include <linux/cpu.h>
#include <linux/oom.h>
+#include <linux/lockdep.h>
#include "internal.h"
#include <net/sock.h>
#include <net/ip.h>
@@ -2046,6 +2047,12 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
return total;
}
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map memcg_oom_lock_dep_map = {
+ .name = "memcg_oom_lock",
+};
+#endif
+
static DEFINE_SPINLOCK(memcg_oom_lock);
/*
@@ -2083,7 +2090,8 @@ static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
}
iter->oom_lock = false;
}
- }
+ } else
+ mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_);
spin_unlock(&memcg_oom_lock);
@@ -2095,6 +2103,7 @@ static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
struct mem_cgroup *iter;
spin_lock(&memcg_oom_lock);
+ mutex_release(&memcg_oom_lock_dep_map, 1, _RET_IP_);
for_each_mem_cgroup_tree(iter, memcg)
iter->oom_lock = false;
spin_unlock(&memcg_oom_lock);
--
1.8.4.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [patch 3/3] mm: memcg: fix test for child groups
2013-10-30 21:55 [patch 0/3 resend] memcg fixes for 3.12 Johannes Weiner
2013-10-30 21:55 ` [patch 1/3] mm: memcg: use proper memcg in limit bypass Johannes Weiner
2013-10-30 21:55 ` [patch 2/3] mm: memcg: lockdep annotation for memcg OOM lock Johannes Weiner
@ 2013-10-30 21:55 ` Johannes Weiner
2 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2013-10-30 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel
When memcg code needs to know whether any given memcg has children, it
uses the cgroup child iteration primitives and returns true/false
depending on whether the iteration loop is executed at least once or
not.
Because a cgroup's list of children is RCU protected, these primitives
require the RCU read-lock to be held, which is not the case for all
memcg callers. This results in the following splat when e.g. enabling
hierarchy mode:
[ 3.683974] WARNING: CPU: 3 PID: 1 at /home/hannes/src/linux/linux/kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
[ 3.686266] CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9-dirty #18
[ 3.688616] Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
[ 3.690900] 0000000000000009 ffff88013227bdc8 ffffffff8173602f 0000000000000000
[ 3.693225] ffff88013227be00 ffffffff81090af8 0000000000000000 ffff88013220d000
[ 3.695606] ffff8800b6c50028 ffff88013220d000 0000000000000000 ffff88013227be10
[ 3.697950] Call Trace:
[ 3.700233] [<ffffffff8173602f>] dump_stack+0x54/0x74
[ 3.702503] [<ffffffff81090af8>] warn_slowpath_common+0x78/0xa0
[ 3.704764] [<ffffffff81090c0a>] warn_slowpath_null+0x1a/0x20
[ 3.707009] [<ffffffff81101173>] css_next_child+0xa3/0x160
[ 3.709255] [<ffffffff8118ae7b>] mem_cgroup_hierarchy_write+0x5b/0xa0
[ 3.711497] [<ffffffff810fe428>] cgroup_file_write+0x108/0x2a0
[ 3.713721] [<ffffffff8119b90d>] ? __sb_start_write+0xed/0x1b0
[ 3.715936] [<ffffffff811980fb>] ? vfs_write+0x1bb/0x1e0
[ 3.718155] [<ffffffff810b8d3f>] ? up_write+0x1f/0x40
[ 3.720356] [<ffffffff81197ffd>] vfs_write+0xbd/0x1e0
[ 3.722539] [<ffffffff8119820c>] SyS_write+0x4c/0xa0
[ 3.724685] [<ffffffff817400d2>] system_call_fastpath+0x16/0x1b
[ 3.726809] ---[ end trace ec33c7d4de043d06 ]---
In the memcg case, we only care about children when we are attempting
to modify inheritable attributes interactively. Racing with deletion
could mean a spurious -EBUSY, no problem. Racing with addition is
handled just fine as well through the memcg_create_mutex: if the child
group is not on the list after the mutex is acquired, it won't be
initialized from the parent's attributes until after the unlock.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 35 +++++++++++------------------------
1 file changed, 11 insertions(+), 24 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e8cd0d9f716..8804be1cb826 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4959,31 +4959,18 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
} while (usage > 0);
}
-/*
- * This mainly exists for tests during the setting of set of use_hierarchy.
- * Since this is the very setting we are changing, the current hierarchy value
- * is meaningless
- */
-static inline bool __memcg_has_children(struct mem_cgroup *memcg)
-{
- struct cgroup_subsys_state *pos;
-
- /* bounce at first found */
- css_for_each_child(pos, &memcg->css)
- return true;
- return false;
-}
-
-/*
- * Must be called with memcg_create_mutex held, unless the cgroup is guaranteed
- * to be already dead (as in mem_cgroup_force_empty, for instance). This is
- * from mem_cgroup_count_children(), in the sense that we don't really care how
- * many children we have; we only need to know if we have any. It also counts
- * any memcg without hierarchy as infertile.
- */
static inline bool memcg_has_children(struct mem_cgroup *memcg)
{
- return memcg->use_hierarchy && __memcg_has_children(memcg);
+ lockdep_assert_held(&memcg_create_mutex);
+ /*
+ * The lock does not prevent addition or deletion to the list
+ * of children, but it prevents a new child from being
+ * initialized based on this parent in css_online(), so it's
+ * enough to decide whether hierarchically inherited
+ * attributes can still be changed or not.
+ */
+ return memcg->use_hierarchy &&
+ !list_empty(&memcg->css.cgroup->children);
}
/*
@@ -5063,7 +5050,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css,
*/
if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
(val == 1 || val == 0)) {
- if (!__memcg_has_children(memcg))
+ if (list_empty(&memcg->css.cgroup->children))
memcg->use_hierarchy = val;
else
retval = -EBUSY;
--
1.8.4.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-11-01 14:06 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-30 21:55 [patch 0/3 resend] memcg fixes for 3.12 Johannes Weiner
2013-10-30 21:55 ` [patch 1/3] mm: memcg: use proper memcg in limit bypass Johannes Weiner
2013-10-30 21:55 ` [patch 2/3] mm: memcg: lockdep annotation for memcg OOM lock Johannes Weiner
2013-10-30 21:55 ` [patch 3/3] mm: memcg: fix test for child groups Johannes Weiner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).