* [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator
@ 2013-06-05 22:53 Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Johannes Weiner @ 2013-06-05 22:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Tejun Heo, Michal Hocko, linux-mm, cgroups, linux-kernel
The lockless reclaim hierarchy iterator currently has a misplaced
barrier that can lead to use-after-free crashes.
The reclaim hierarchy iterator consist of a sequence count and a
position pointer that are read and written locklessly, with memory
barriers enforcing ordering.
The write side sets the position pointer first, then updates the
sequence count to "publish" the new position. Likewise, the read side
must read the sequence count first, then the position. If the
sequence count is up to date, it's guaranteed that the position is up
to date as well:
writer: reader:
iter->position = position if iter->sequence == expected:
smp_wmb() smp_rmb()
iter->sequence = sequence position = iter->position
However, the read side barrier is currently misplaced, which can lead
to dereferencing stale position pointers that no longer point to valid
memory. Fix this.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org [3.10+]
---
mm/memcontrol.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010d6c1..e2cbb44 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
mz = mem_cgroup_zoneinfo(root, nid, zid);
iter = &mz->reclaim_iter[reclaim->priority];
- last_visited = iter->last_visited;
if (prev && reclaim->generation != iter->generation) {
iter->last_visited = NULL;
goto out_unlock;
@@ -1218,13 +1217,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
* is alive.
*/
dead_count = atomic_read(&root->dead_count);
- smp_rmb();
- last_visited = iter->last_visited;
- if (last_visited) {
- if ((dead_count != iter->last_dead_count) ||
- !css_tryget(&last_visited->css)) {
+ if (dead_count == iter->last_dead_count) {
+ smp_rmb();
+ last_visited = iter->last_visited;
+ if (last_visited &&
+ !css_tryget(&last_visited->css))
last_visited = NULL;
- }
}
}
--
1.8.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating
2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner
@ 2013-06-05 22:53 ` Johannes Weiner
2013-06-05 23:06 ` Tejun Heo
2013-06-06 8:29 ` Michal Hocko
2013-06-05 23:06 ` [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Tejun Heo
2013-06-06 8:28 ` Michal Hocko
2 siblings, 2 replies; 6+ messages in thread
From: Johannes Weiner @ 2013-06-05 22:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Tejun Heo, Michal Hocko, linux-mm, cgroups, linux-kernel
mem_cgroup_iter() is too hard to follow. Factor out the lockless
reclaim iterator loading and updating so it's easier to follow the big
picture.
Also document the iterator invalidation mechanism a bit more
extensively.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 86 ++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2cbb44..23a9236 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1148,6 +1148,58 @@ skip_node:
return NULL;
}
+static void mem_cgroup_iter_invalidate(struct mem_cgroup *root)
+{
+ /*
+ * When a group in the hierarchy below root is destroyed, the
+ * hierarchy iterator can no longer be trusted since it might
+ * have pointed to the destroyed group. Invalidate it.
+ */
+ atomic_inc(&root->dead_count);
+}
+
+static struct mem_cgroup *
+mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
+ struct mem_cgroup *root,
+ int *sequence)
+{
+ struct mem_cgroup *position = NULL;
+ /*
+ * A cgroup destruction happens in two stages: offlining and
+ * release. They are separated by a RCU grace period.
+ *
+ * If the iterator is valid, we may still race with an
+ * offlining. The RCU lock ensures the object won't be
+ * released, tryget will fail if we lost the race.
+ */
+ *sequence = atomic_read(&root->dead_count);
+ if (iter->last_dead_count == *sequence) {
+ smp_rmb();
+ position = iter->last_visited;
+ if (position && !css_tryget(&position->css))
+ position = NULL;
+ }
+ return position;
+}
+
+static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter,
+ struct mem_cgroup *last_visited,
+ struct mem_cgroup *new_position,
+ int sequence)
+{
+ if (last_visited)
+ css_put(&last_visited->css);
+ /*
+ * We store the sequence count from the time @last_visited was
+ * loaded successfully instead of rereading it here so that we
+ * don't lose destruction events in between. We could have
+ * raced with the destruction of @new_position after all.
+ */
+ iter->last_visited = new_position;
+ smp_wmb();
+ iter->last_dead_count = sequence;
+}
+
/**
* mem_cgroup_iter - iterate over memory cgroup hierarchy
* @root: hierarchy root
@@ -1171,7 +1223,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
{
struct mem_cgroup *memcg = NULL;
struct mem_cgroup *last_visited = NULL;
- unsigned long uninitialized_var(dead_count);
if (mem_cgroup_disabled())
return NULL;
@@ -1191,6 +1242,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
rcu_read_lock();
while (!memcg) {
struct mem_cgroup_reclaim_iter *uninitialized_var(iter);
+ int uninitialized_var(seq);
if (reclaim) {
int nid = zone_to_nid(reclaim->zone);
@@ -1204,37 +1256,13 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
goto out_unlock;
}
- /*
- * If the dead_count mismatches, a destruction
- * has happened or is happening concurrently.
- * If the dead_count matches, a destruction
- * might still happen concurrently, but since
- * we checked under RCU, that destruction
- * won't free the object until we release the
- * RCU reader lock. Thus, the dead_count
- * check verifies the pointer is still valid,
- * css_tryget() verifies the cgroup pointed to
- * is alive.
- */
- dead_count = atomic_read(&root->dead_count);
- if (dead_count == iter->last_dead_count) {
- smp_rmb();
- last_visited = iter->last_visited;
- if (last_visited &&
- !css_tryget(&last_visited->css))
- last_visited = NULL;
- }
+ last_visited = mem_cgroup_iter_load(iter, root, &seq);
}
memcg = __mem_cgroup_iter_next(root, last_visited);
if (reclaim) {
- if (last_visited)
- css_put(&last_visited->css);
-
- iter->last_visited = memcg;
- smp_wmb();
- iter->last_dead_count = dead_count;
+ mem_cgroup_iter_update(iter, last_visited, memcg, seq);
if (!memcg)
iter->generation++;
@@ -6319,14 +6347,14 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
struct mem_cgroup *parent = memcg;
while ((parent = parent_mem_cgroup(parent)))
- atomic_inc(&parent->dead_count);
+ mem_cgroup_iter_invalidate(parent);
/*
* if the root memcg is not hierarchical we have to check it
* explicitely.
*/
if (!root_mem_cgroup->use_hierarchy)
- atomic_inc(&root_mem_cgroup->dead_count);
+ mem_cgroup_iter_invalidate(root_mem_cgroup);
}
static void mem_cgroup_css_offline(struct cgroup *cont)
--
1.8.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator
2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
@ 2013-06-05 23:06 ` Tejun Heo
2013-06-06 8:28 ` Michal Hocko
2 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2013-06-05 23:06 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel
On Wed, Jun 05, 2013 at 06:53:45PM -0400, Johannes Weiner wrote:
> The lockless reclaim hierarchy iterator currently has a misplaced
> barrier that can lead to use-after-free crashes.
>
> The reclaim hierarchy iterator consist of a sequence count and a
> position pointer that are read and written locklessly, with memory
> barriers enforcing ordering.
>
> The write side sets the position pointer first, then updates the
> sequence count to "publish" the new position. Likewise, the read side
> must read the sequence count first, then the position. If the
> sequence count is up to date, it's guaranteed that the position is up
> to date as well:
>
> writer: reader:
> iter->position = position if iter->sequence == expected:
> smp_wmb() smp_rmb()
> iter->sequence = sequence position = iter->position
>
> However, the read side barrier is currently misplaced, which can lead
> to dereferencing stale position pointers that no longer point to valid
> memory. Fix this.
>
> Reported-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: stable@kernel.org [3.10+]
Reviewed-by: Tejun Heo <tj@kernel.org>
Oops, right, the references were reversed too.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
@ 2013-06-05 23:06 ` Tejun Heo
2013-06-06 8:29 ` Michal Hocko
1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2013-06-05 23:06 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel
On Wed, Jun 05, 2013 at 06:53:46PM -0400, Johannes Weiner wrote:
> mem_cgroup_iter() is too hard to follow. Factor out the lockless
> reclaim iterator loading and updating so it's easier to follow the big
> picture.
>
> Also document the iterator invalidation mechanism a bit more
> extensively.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator
2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
2013-06-05 23:06 ` [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Tejun Heo
@ 2013-06-06 8:28 ` Michal Hocko
2 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2013-06-06 8:28 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel
On Wed 05-06-13 18:53:45, Johannes Weiner wrote:
> The lockless reclaim hierarchy iterator currently has a misplaced
> barrier that can lead to use-after-free crashes.
>
> The reclaim hierarchy iterator consist of a sequence count and a
> position pointer that are read and written locklessly, with memory
> barriers enforcing ordering.
>
> The write side sets the position pointer first, then updates the
> sequence count to "publish" the new position. Likewise, the read side
> must read the sequence count first, then the position. If the
> sequence count is up to date, it's guaranteed that the position is up
> to date as well:
>
> writer: reader:
> iter->position = position if iter->sequence == expected:
> smp_wmb() smp_rmb()
> iter->sequence = sequence position = iter->position
>
> However, the read side barrier is currently misplaced, which can lead
> to dereferencing stale position pointers that no longer point to valid
> memory. Fix this.
>
> Reported-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: stable@kernel.org [3.10+]
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 010d6c1..e2cbb44 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>
> mz = mem_cgroup_zoneinfo(root, nid, zid);
> iter = &mz->reclaim_iter[reclaim->priority];
> - last_visited = iter->last_visited;
> if (prev && reclaim->generation != iter->generation) {
> iter->last_visited = NULL;
> goto out_unlock;
> @@ -1218,13 +1217,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> * is alive.
> */
> dead_count = atomic_read(&root->dead_count);
> - smp_rmb();
> - last_visited = iter->last_visited;
> - if (last_visited) {
> - if ((dead_count != iter->last_dead_count) ||
> - !css_tryget(&last_visited->css)) {
> + if (dead_count == iter->last_dead_count) {
> + smp_rmb();
> + last_visited = iter->last_visited;
> + if (last_visited &&
> + !css_tryget(&last_visited->css))
> last_visited = NULL;
> - }
> }
> }
>
> --
> 1.8.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
2013-06-05 23:06 ` Tejun Heo
@ 2013-06-06 8:29 ` Michal Hocko
1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2013-06-06 8:29 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel
On Wed 05-06-13 18:53:46, Johannes Weiner wrote:
> mem_cgroup_iter() is too hard to follow. Factor out the lockless
> reclaim iterator loading and updating so it's easier to follow the big
> picture.
>
> Also document the iterator invalidation mechanism a bit more
> extensively.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
I like this
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 86 ++++++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 57 insertions(+), 29 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e2cbb44..23a9236 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1148,6 +1148,58 @@ skip_node:
> return NULL;
> }
>
> +static void mem_cgroup_iter_invalidate(struct mem_cgroup *root)
> +{
> + /*
> + * When a group in the hierarchy below root is destroyed, the
> + * hierarchy iterator can no longer be trusted since it might
> + * have pointed to the destroyed group. Invalidate it.
> + */
> + atomic_inc(&root->dead_count);
> +}
> +
> +static struct mem_cgroup *
> +mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
> + struct mem_cgroup *root,
> + int *sequence)
> +{
> + struct mem_cgroup *position = NULL;
> + /*
> + * A cgroup destruction happens in two stages: offlining and
> + * release. They are separated by a RCU grace period.
> + *
> + * If the iterator is valid, we may still race with an
> + * offlining. The RCU lock ensures the object won't be
> + * released, tryget will fail if we lost the race.
> + */
> + *sequence = atomic_read(&root->dead_count);
> + if (iter->last_dead_count == *sequence) {
> + smp_rmb();
> + position = iter->last_visited;
> + if (position && !css_tryget(&position->css))
> + position = NULL;
> + }
> + return position;
> +}
> +
> +static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter,
> + struct mem_cgroup *last_visited,
> + struct mem_cgroup *new_position,
> + int sequence)
> +{
> + if (last_visited)
> + css_put(&last_visited->css);
> + /*
> + * We store the sequence count from the time @last_visited was
> + * loaded successfully instead of rereading it here so that we
> + * don't lose destruction events in between. We could have
> + * raced with the destruction of @new_position after all.
> + */
> + iter->last_visited = new_position;
> + smp_wmb();
> + iter->last_dead_count = sequence;
> +}
> +
> /**
> * mem_cgroup_iter - iterate over memory cgroup hierarchy
> * @root: hierarchy root
> @@ -1171,7 +1223,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> {
> struct mem_cgroup *memcg = NULL;
> struct mem_cgroup *last_visited = NULL;
> - unsigned long uninitialized_var(dead_count);
>
> if (mem_cgroup_disabled())
> return NULL;
> @@ -1191,6 +1242,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> rcu_read_lock();
> while (!memcg) {
> struct mem_cgroup_reclaim_iter *uninitialized_var(iter);
> + int uninitialized_var(seq);
>
> if (reclaim) {
> int nid = zone_to_nid(reclaim->zone);
> @@ -1204,37 +1256,13 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> goto out_unlock;
> }
>
> - /*
> - * If the dead_count mismatches, a destruction
> - * has happened or is happening concurrently.
> - * If the dead_count matches, a destruction
> - * might still happen concurrently, but since
> - * we checked under RCU, that destruction
> - * won't free the object until we release the
> - * RCU reader lock. Thus, the dead_count
> - * check verifies the pointer is still valid,
> - * css_tryget() verifies the cgroup pointed to
> - * is alive.
> - */
> - dead_count = atomic_read(&root->dead_count);
> - if (dead_count == iter->last_dead_count) {
> - smp_rmb();
> - last_visited = iter->last_visited;
> - if (last_visited &&
> - !css_tryget(&last_visited->css))
> - last_visited = NULL;
> - }
> + last_visited = mem_cgroup_iter_load(iter, root, &seq);
> }
>
> memcg = __mem_cgroup_iter_next(root, last_visited);
>
> if (reclaim) {
> - if (last_visited)
> - css_put(&last_visited->css);
> -
> - iter->last_visited = memcg;
> - smp_wmb();
> - iter->last_dead_count = dead_count;
> + mem_cgroup_iter_update(iter, last_visited, memcg, seq);
>
> if (!memcg)
> iter->generation++;
> @@ -6319,14 +6347,14 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
> struct mem_cgroup *parent = memcg;
>
> while ((parent = parent_mem_cgroup(parent)))
> - atomic_inc(&parent->dead_count);
> + mem_cgroup_iter_invalidate(parent);
>
> /*
> * if the root memcg is not hierarchical we have to check it
> * explicitely.
> */
> if (!root_mem_cgroup->use_hierarchy)
> - atomic_inc(&root_mem_cgroup->dead_count);
> + mem_cgroup_iter_invalidate(root_mem_cgroup);
> }
>
> static void mem_cgroup_css_offline(struct cgroup *cont)
> --
> 1.8.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-06-06 8:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
2013-06-05 23:06 ` Tejun Heo
2013-06-06 8:29 ` Michal Hocko
2013-06-05 23:06 ` [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Tejun Heo
2013-06-06 8:28 ` Michal Hocko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).