* [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator
@ 2013-06-05 22:53 Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Johannes Weiner @ 2013-06-05 22:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Tejun Heo, Michal Hocko, linux-mm, cgroups, linux-kernel
The lockless reclaim hierarchy iterator currently has a misplaced
barrier that can lead to use-after-free crashes.
The reclaim hierarchy iterator consist of a sequence count and a
position pointer that are read and written locklessly, with memory
barriers enforcing ordering.
The write side sets the position pointer first, then updates the
sequence count to "publish" the new position. Likewise, the read side
must read the sequence count first, then the position. If the
sequence count is up to date, it's guaranteed that the position is up
to date as well:
writer: reader:
iter->position = position if iter->sequence == expected:
smp_wmb() smp_rmb()
iter->sequence = sequence position = iter->position
However, the read side barrier is currently misplaced, which can lead
to dereferencing stale position pointers that no longer point to valid
memory. Fix this.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org [3.10+]
---
mm/memcontrol.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010d6c1..e2cbb44 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
mz = mem_cgroup_zoneinfo(root, nid, zid);
iter = &mz->reclaim_iter[reclaim->priority];
- last_visited = iter->last_visited;
if (prev && reclaim->generation != iter->generation) {
iter->last_visited = NULL;
goto out_unlock;
@@ -1218,13 +1217,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
* is alive.
*/
dead_count = atomic_read(&root->dead_count);
- smp_rmb();
- last_visited = iter->last_visited;
- if (last_visited) {
- if ((dead_count != iter->last_dead_count) ||
- !css_tryget(&last_visited->css)) {
+ if (dead_count == iter->last_dead_count) {
+ smp_rmb();
+ last_visited = iter->last_visited;
+ if (last_visited &&
+ !css_tryget(&last_visited->css))
last_visited = NULL;
- }
}
}
--
1.8.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 6+ messages in thread* [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating 2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner @ 2013-06-05 22:53 ` Johannes Weiner 2013-06-05 23:06 ` Tejun Heo [not found] ` <1370472826-29959-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> [not found] ` <1370472826-29959-1-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> 2013-06-06 8:28 ` Michal Hocko 2 siblings, 2 replies; 6+ messages in thread From: Johannes Weiner @ 2013-06-05 22:53 UTC (permalink / raw) To: Andrew Morton; +Cc: Tejun Heo, Michal Hocko, linux-mm, cgroups, linux-kernel mem_cgroup_iter() is too hard to follow. Factor out the lockless reclaim iterator loading and updating so it's easier to follow the big picture. Also document the iterator invalidation mechanism a bit more extensively. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 86 ++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 57 insertions(+), 29 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e2cbb44..23a9236 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1148,6 +1148,58 @@ skip_node: return NULL; } +static void mem_cgroup_iter_invalidate(struct mem_cgroup *root) +{ + /* + * When a group in the hierarchy below root is destroyed, the + * hierarchy iterator can no longer be trusted since it might + * have pointed to the destroyed group. Invalidate it. + */ + atomic_inc(&root->dead_count); +} + +static struct mem_cgroup * +mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter, + struct mem_cgroup *root, + int *sequence) +{ + struct mem_cgroup *position = NULL; + /* + * A cgroup destruction happens in two stages: offlining and + * release. They are separated by a RCU grace period. + * + * If the iterator is valid, we may still race with an + * offlining. The RCU lock ensures the object won't be + * released, tryget will fail if we lost the race. + */ + *sequence = atomic_read(&root->dead_count); + if (iter->last_dead_count == *sequence) { + smp_rmb(); + position = iter->last_visited; + if (position && !css_tryget(&position->css)) + position = NULL; + } + return position; +} + +static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter, + struct mem_cgroup *last_visited, + struct mem_cgroup *new_position, + int sequence) +{ + if (last_visited) + css_put(&last_visited->css); + /* + * We store the sequence count from the time @last_visited was + * loaded successfully instead of rereading it here so that we + * don't lose destruction events in between. We could have + * raced with the destruction of @new_position after all. + */ + iter->last_visited = new_position; + smp_wmb(); + iter->last_dead_count = sequence; +} + /** * mem_cgroup_iter - iterate over memory cgroup hierarchy * @root: hierarchy root @@ -1171,7 +1223,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, { struct mem_cgroup *memcg = NULL; struct mem_cgroup *last_visited = NULL; - unsigned long uninitialized_var(dead_count); if (mem_cgroup_disabled()) return NULL; @@ -1191,6 +1242,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, rcu_read_lock(); while (!memcg) { struct mem_cgroup_reclaim_iter *uninitialized_var(iter); + int uninitialized_var(seq); if (reclaim) { int nid = zone_to_nid(reclaim->zone); @@ -1204,37 +1256,13 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, goto out_unlock; } - /* - * If the dead_count mismatches, a destruction - * has happened or is happening concurrently. - * If the dead_count matches, a destruction - * might still happen concurrently, but since - * we checked under RCU, that destruction - * won't free the object until we release the - * RCU reader lock. Thus, the dead_count - * check verifies the pointer is still valid, - * css_tryget() verifies the cgroup pointed to - * is alive. - */ - dead_count = atomic_read(&root->dead_count); - if (dead_count == iter->last_dead_count) { - smp_rmb(); - last_visited = iter->last_visited; - if (last_visited && - !css_tryget(&last_visited->css)) - last_visited = NULL; - } + last_visited = mem_cgroup_iter_load(iter, root, &seq); } memcg = __mem_cgroup_iter_next(root, last_visited); if (reclaim) { - if (last_visited) - css_put(&last_visited->css); - - iter->last_visited = memcg; - smp_wmb(); - iter->last_dead_count = dead_count; + mem_cgroup_iter_update(iter, last_visited, memcg, seq); if (!memcg) iter->generation++; @@ -6319,14 +6347,14 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg) struct mem_cgroup *parent = memcg; while ((parent = parent_mem_cgroup(parent))) - atomic_inc(&parent->dead_count); + mem_cgroup_iter_invalidate(parent); /* * if the root memcg is not hierarchical we have to check it * explicitely. */ if (!root_mem_cgroup->use_hierarchy) - atomic_inc(&root_mem_cgroup->dead_count); + mem_cgroup_iter_invalidate(root_mem_cgroup); } static void mem_cgroup_css_offline(struct cgroup *cont) -- 1.8.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating 2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner @ 2013-06-05 23:06 ` Tejun Heo [not found] ` <1370472826-29959-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> 1 sibling, 0 replies; 6+ messages in thread From: Tejun Heo @ 2013-06-05 23:06 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel On Wed, Jun 05, 2013 at 06:53:46PM -0400, Johannes Weiner wrote: > mem_cgroup_iter() is too hard to follow. Factor out the lockless > reclaim iterator loading and updating so it's easier to follow the big > picture. > > Also document the iterator invalidation mechanism a bit more > extensively. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <1370472826-29959-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating [not found] ` <1370472826-29959-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> @ 2013-06-06 8:29 ` Michal Hocko 0 siblings, 0 replies; 6+ messages in thread From: Michal Hocko @ 2013-06-06 8:29 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Tejun Heo, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Wed 05-06-13 18:53:46, Johannes Weiner wrote: > mem_cgroup_iter() is too hard to follow. Factor out the lockless > reclaim iterator loading and updating so it's easier to follow the big > picture. > > Also document the iterator invalidation mechanism a bit more > extensively. > > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> I like this Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> > --- > mm/memcontrol.c | 86 ++++++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 57 insertions(+), 29 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index e2cbb44..23a9236 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1148,6 +1148,58 @@ skip_node: > return NULL; > } > > +static void mem_cgroup_iter_invalidate(struct mem_cgroup *root) > +{ > + /* > + * When a group in the hierarchy below root is destroyed, the > + * hierarchy iterator can no longer be trusted since it might > + * have pointed to the destroyed group. Invalidate it. > + */ > + atomic_inc(&root->dead_count); > +} > + > +static struct mem_cgroup * > +mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter, > + struct mem_cgroup *root, > + int *sequence) > +{ > + struct mem_cgroup *position = NULL; > + /* > + * A cgroup destruction happens in two stages: offlining and > + * release. They are separated by a RCU grace period. > + * > + * If the iterator is valid, we may still race with an > + * offlining. The RCU lock ensures the object won't be > + * released, tryget will fail if we lost the race. > + */ > + *sequence = atomic_read(&root->dead_count); > + if (iter->last_dead_count == *sequence) { > + smp_rmb(); > + position = iter->last_visited; > + if (position && !css_tryget(&position->css)) > + position = NULL; > + } > + return position; > +} > + > +static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter, > + struct mem_cgroup *last_visited, > + struct mem_cgroup *new_position, > + int sequence) > +{ > + if (last_visited) > + css_put(&last_visited->css); > + /* > + * We store the sequence count from the time @last_visited was > + * loaded successfully instead of rereading it here so that we > + * don't lose destruction events in between. We could have > + * raced with the destruction of @new_position after all. > + */ > + iter->last_visited = new_position; > + smp_wmb(); > + iter->last_dead_count = sequence; > +} > + > /** > * mem_cgroup_iter - iterate over memory cgroup hierarchy > * @root: hierarchy root > @@ -1171,7 +1223,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > { > struct mem_cgroup *memcg = NULL; > struct mem_cgroup *last_visited = NULL; > - unsigned long uninitialized_var(dead_count); > > if (mem_cgroup_disabled()) > return NULL; > @@ -1191,6 +1242,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > rcu_read_lock(); > while (!memcg) { > struct mem_cgroup_reclaim_iter *uninitialized_var(iter); > + int uninitialized_var(seq); > > if (reclaim) { > int nid = zone_to_nid(reclaim->zone); > @@ -1204,37 +1256,13 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > goto out_unlock; > } > > - /* > - * If the dead_count mismatches, a destruction > - * has happened or is happening concurrently. > - * If the dead_count matches, a destruction > - * might still happen concurrently, but since > - * we checked under RCU, that destruction > - * won't free the object until we release the > - * RCU reader lock. Thus, the dead_count > - * check verifies the pointer is still valid, > - * css_tryget() verifies the cgroup pointed to > - * is alive. > - */ > - dead_count = atomic_read(&root->dead_count); > - if (dead_count == iter->last_dead_count) { > - smp_rmb(); > - last_visited = iter->last_visited; > - if (last_visited && > - !css_tryget(&last_visited->css)) > - last_visited = NULL; > - } > + last_visited = mem_cgroup_iter_load(iter, root, &seq); > } > > memcg = __mem_cgroup_iter_next(root, last_visited); > > if (reclaim) { > - if (last_visited) > - css_put(&last_visited->css); > - > - iter->last_visited = memcg; > - smp_wmb(); > - iter->last_dead_count = dead_count; > + mem_cgroup_iter_update(iter, last_visited, memcg, seq); > > if (!memcg) > iter->generation++; > @@ -6319,14 +6347,14 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg) > struct mem_cgroup *parent = memcg; > > while ((parent = parent_mem_cgroup(parent))) > - atomic_inc(&parent->dead_count); > + mem_cgroup_iter_invalidate(parent); > > /* > * if the root memcg is not hierarchical we have to check it > * explicitely. > */ > if (!root_mem_cgroup->use_hierarchy) > - atomic_inc(&root_mem_cgroup->dead_count); > + mem_cgroup_iter_invalidate(root_mem_cgroup); > } > > static void mem_cgroup_css_offline(struct cgroup *cont) > -- > 1.8.3 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <1370472826-29959-1-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator [not found] ` <1370472826-29959-1-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> @ 2013-06-05 23:06 ` Tejun Heo 0 siblings, 0 replies; 6+ messages in thread From: Tejun Heo @ 2013-06-05 23:06 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Wed, Jun 05, 2013 at 06:53:45PM -0400, Johannes Weiner wrote: > The lockless reclaim hierarchy iterator currently has a misplaced > barrier that can lead to use-after-free crashes. > > The reclaim hierarchy iterator consist of a sequence count and a > position pointer that are read and written locklessly, with memory > barriers enforcing ordering. > > The write side sets the position pointer first, then updates the > sequence count to "publish" the new position. Likewise, the read side > must read the sequence count first, then the position. If the > sequence count is up to date, it's guaranteed that the position is up > to date as well: > > writer: reader: > iter->position = position if iter->sequence == expected: > smp_wmb() smp_rmb() > iter->sequence = sequence position = iter->position > > However, the read side barrier is currently misplaced, which can lead > to dereferencing stale position pointers that no longer point to valid > memory. Fix this. > > Reported-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> > Cc: stable-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org [3.10+] Reviewed-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Oops, right, the references were reversed too. Thanks. -- tejun ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator 2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner 2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner [not found] ` <1370472826-29959-1-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> @ 2013-06-06 8:28 ` Michal Hocko 2 siblings, 0 replies; 6+ messages in thread From: Michal Hocko @ 2013-06-06 8:28 UTC (permalink / raw) To: Johannes Weiner; +Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel On Wed 05-06-13 18:53:45, Johannes Weiner wrote: > The lockless reclaim hierarchy iterator currently has a misplaced > barrier that can lead to use-after-free crashes. > > The reclaim hierarchy iterator consist of a sequence count and a > position pointer that are read and written locklessly, with memory > barriers enforcing ordering. > > The write side sets the position pointer first, then updates the > sequence count to "publish" the new position. Likewise, the read side > must read the sequence count first, then the position. If the > sequence count is up to date, it's guaranteed that the position is up > to date as well: > > writer: reader: > iter->position = position if iter->sequence == expected: > smp_wmb() smp_rmb() > iter->sequence = sequence position = iter->position > > However, the read side barrier is currently misplaced, which can lead > to dereferencing stale position pointers that no longer point to valid > memory. Fix this. > > Reported-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > Cc: stable@kernel.org [3.10+] Acked-by: Michal Hocko <mhocko@suse.cz> > --- > mm/memcontrol.c | 12 +++++------- > 1 file changed, 5 insertions(+), 7 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 010d6c1..e2cbb44 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > > mz = mem_cgroup_zoneinfo(root, nid, zid); > iter = &mz->reclaim_iter[reclaim->priority]; > - last_visited = iter->last_visited; > if (prev && reclaim->generation != iter->generation) { > iter->last_visited = NULL; > goto out_unlock; > @@ -1218,13 +1217,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > * is alive. > */ > dead_count = atomic_read(&root->dead_count); > - smp_rmb(); > - last_visited = iter->last_visited; > - if (last_visited) { > - if ((dead_count != iter->last_dead_count) || > - !css_tryget(&last_visited->css)) { > + if (dead_count == iter->last_dead_count) { > + smp_rmb(); > + last_visited = iter->last_visited; > + if (last_visited && > + !css_tryget(&last_visited->css)) > last_visited = NULL; > - } > } > } > > -- > 1.8.3 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-06-06 8:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-05 22:53 [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Johannes Weiner
2013-06-05 22:53 ` [patch 2/2] mm: memcontrol: factor out reclaim iterator loading and updating Johannes Weiner
2013-06-05 23:06 ` Tejun Heo
[not found] ` <1370472826-29959-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-06-06 8:29 ` Michal Hocko
[not found] ` <1370472826-29959-1-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-06-05 23:06 ` [patch 1/2] mm: memcontrol: fix lockless reclaim hierarchy iterator Tejun Heo
2013-06-06 8:28 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox