All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>
Subject: Re: [RFC 2/5] memcg: rework mem_cgroup_iter to use cgroup iterators
Date: Thu, 15 Nov 2012 10:52:35 +0100	[thread overview]
Message-ID: <20121115095235.GC11990@dhcp22.suse.cz> (raw)
In-Reply-To: <50A46BB6.6070902@jp.fujitsu.com>

On Thu 15-11-12 13:12:38, KAMEZAWA Hiroyuki wrote:
> (2012/11/14 19:10), Michal Hocko wrote:
> >On Wed 14-11-12 09:20:03, KAMEZAWA Hiroyuki wrote:
> >>(2012/11/14 0:30), Michal Hocko wrote:
> >[...]
> >>>@@ -1096,30 +1096,64 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >>>   			mz = mem_cgroup_zoneinfo(root, nid, zid);
> >>>   			iter = &mz->reclaim_iter[reclaim->priority];
> >>>   			spin_lock(&iter->iter_lock);
> >>>+			last_visited = iter->last_visited;
> >>>   			if (prev && reclaim->generation != iter->generation) {
> >>>+				if (last_visited) {
> >>>+					mem_cgroup_put(last_visited);
> >>>+					iter->last_visited = NULL;
> >>>+				}
> >>>   				spin_unlock(&iter->iter_lock);
> >>>   				return NULL;
> >>>   			}
> >>>-			id = iter->position;
> >>>   		}
> >>>
> >>>   		rcu_read_lock();
> >>>-		css = css_get_next(&mem_cgroup_subsys, id + 1, &root->css, &id);
> >>>-		if (css) {
> >>>-			if (css == &root->css || css_tryget(css))
> >>>-				memcg = mem_cgroup_from_css(css);
> >>>-		} else
> >>>-			id = 0;
> >>>-		rcu_read_unlock();
> >>>+		/*
> >>>+		 * Root is not visited by cgroup iterators so it needs a special
> >>>+		 * treatment.
> >>>+		 */
> >>>+		if (!last_visited) {
> >>>+			css = &root->css;
> >>>+		} else {
> >>>+			struct cgroup *next_cgroup;
> >>>+
> >>>+			next_cgroup = cgroup_next_descendant_pre(
> >>>+					last_visited->css.cgroup,
> >>>+					root->css.cgroup);
> >>
> >>Maybe I miss something but.... last_visited is holded by memcg's refcnt.
> >>The cgroup pointed by css.cgroup is by cgroup's refcnt which can be freed
> >>before memcg is freed and last_visited->css.cgroup is out of RCU cycle.
> >>Is this safe ?
> >
> >Good spotted. You are right. What I need to do is to check that the
> >last_visited is alive and restart from the root if not. Something like
> >the bellow (incremental patch on top of this one) should help, right?
> >
> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >index 30efd7e..c0a91a3 100644
> >--- a/mm/memcontrol.c
> >+++ b/mm/memcontrol.c
> >@@ -1105,6 +1105,16 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >  				spin_unlock(&iter->iter_lock);
> >  				return NULL;
> >  			}
> >+			/*
> >+			 * memcg is still valid because we hold a reference but
> >+			 * its cgroup might have vanished in the meantime so
> >+			 * we have to double check it is alive and restart the
> >+			 * tree walk otherwise.
> >+			 */
> >+			if (last_visited && !css_tryget(&last_visited->css)) {
> >+				mem_cgroup_put(last_visited);
> >+				last_visited = NULL;
> >+			}
> >  		}
> >
> >  		rcu_read_lock();
> >@@ -1136,8 +1146,10 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >  		if (reclaim) {
> >  			struct mem_cgroup *curr = memcg;
> >
> >-			if (last_visited)
> >+			if (last_visited) {
> >+				css_put(&last_visited->css);
> >  				mem_cgroup_put(last_visited);
> >+			}
> >
> >  			if (css && !memcg)
> >  				curr = mem_cgroup_from_css(css);
> >
> 
> I think this will work.

Thanks for double checking. The updated patch:
---

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz>
To: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>
Subject: Re: [RFC 2/5] memcg: rework mem_cgroup_iter to use cgroup iterators
Date: Thu, 15 Nov 2012 10:52:35 +0100	[thread overview]
Message-ID: <20121115095235.GC11990@dhcp22.suse.cz> (raw)
In-Reply-To: <50A46BB6.6070902@jp.fujitsu.com>

On Thu 15-11-12 13:12:38, KAMEZAWA Hiroyuki wrote:
> (2012/11/14 19:10), Michal Hocko wrote:
> >On Wed 14-11-12 09:20:03, KAMEZAWA Hiroyuki wrote:
> >>(2012/11/14 0:30), Michal Hocko wrote:
> >[...]
> >>>@@ -1096,30 +1096,64 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >>>   			mz = mem_cgroup_zoneinfo(root, nid, zid);
> >>>   			iter = &mz->reclaim_iter[reclaim->priority];
> >>>   			spin_lock(&iter->iter_lock);
> >>>+			last_visited = iter->last_visited;
> >>>   			if (prev && reclaim->generation != iter->generation) {
> >>>+				if (last_visited) {
> >>>+					mem_cgroup_put(last_visited);
> >>>+					iter->last_visited = NULL;
> >>>+				}
> >>>   				spin_unlock(&iter->iter_lock);
> >>>   				return NULL;
> >>>   			}
> >>>-			id = iter->position;
> >>>   		}
> >>>
> >>>   		rcu_read_lock();
> >>>-		css = css_get_next(&mem_cgroup_subsys, id + 1, &root->css, &id);
> >>>-		if (css) {
> >>>-			if (css == &root->css || css_tryget(css))
> >>>-				memcg = mem_cgroup_from_css(css);
> >>>-		} else
> >>>-			id = 0;
> >>>-		rcu_read_unlock();
> >>>+		/*
> >>>+		 * Root is not visited by cgroup iterators so it needs a special
> >>>+		 * treatment.
> >>>+		 */
> >>>+		if (!last_visited) {
> >>>+			css = &root->css;
> >>>+		} else {
> >>>+			struct cgroup *next_cgroup;
> >>>+
> >>>+			next_cgroup = cgroup_next_descendant_pre(
> >>>+					last_visited->css.cgroup,
> >>>+					root->css.cgroup);
> >>
> >>Maybe I miss something but.... last_visited is holded by memcg's refcnt.
> >>The cgroup pointed by css.cgroup is by cgroup's refcnt which can be freed
> >>before memcg is freed and last_visited->css.cgroup is out of RCU cycle.
> >>Is this safe ?
> >
> >Good spotted. You are right. What I need to do is to check that the
> >last_visited is alive and restart from the root if not. Something like
> >the bellow (incremental patch on top of this one) should help, right?
> >
> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >index 30efd7e..c0a91a3 100644
> >--- a/mm/memcontrol.c
> >+++ b/mm/memcontrol.c
> >@@ -1105,6 +1105,16 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >  				spin_unlock(&iter->iter_lock);
> >  				return NULL;
> >  			}
> >+			/*
> >+			 * memcg is still valid because we hold a reference but
> >+			 * its cgroup might have vanished in the meantime so
> >+			 * we have to double check it is alive and restart the
> >+			 * tree walk otherwise.
> >+			 */
> >+			if (last_visited && !css_tryget(&last_visited->css)) {
> >+				mem_cgroup_put(last_visited);
> >+				last_visited = NULL;
> >+			}
> >  		}
> >
> >  		rcu_read_lock();
> >@@ -1136,8 +1146,10 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >  		if (reclaim) {
> >  			struct mem_cgroup *curr = memcg;
> >
> >-			if (last_visited)
> >+			if (last_visited) {
> >+				css_put(&last_visited->css);
> >  				mem_cgroup_put(last_visited);
> >+			}
> >
> >  			if (css && !memcg)
> >  				curr = mem_cgroup_from_css(css);
> >
> 
> I think this will work.

Thanks for double checking. The updated patch:
---
>From 3811de23a0306c229ce44dc655878431a4fdf449 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Tue, 13 Nov 2012 11:40:35 +0100
Subject: [PATCH] memcg: rework mem_cgroup_iter to use cgroup iterators

mem_cgroup_iter curently relies on css->id when walking down a group
hierarchy tree. This is really awkward because the tree walk depends on
the groups creation ordering. The only guarantee is that a parent node
is visited before its children.
Example
 1) mkdir -p a a/d a/b/c
 2) mkdir -a a/b/c a/d
Will create the same trees but the tree walks will be different:
 1) a, d, b, c
 2) a, b, c, d

574bd9f7 (cgroup: implement generic child / descendant walk macros) has
introduced generic cgroup tree walkers which provide either pre-order
or post-order tree walk. This patch converts css->id based iteration
to pre-order tree walk to keep the semantic with the original iterator
where parent is always visited before its subtree.

cgroup_for_each_descendant_pre suggests using post_create and
pre_destroy for proper synchronization with groups addidition resp.
removal. This implementation doesn't use those because a new memory
cgroup is fully initialized in mem_cgroup_create and css_tryget makes
sure that the group is alive when we encounter it by iterator.

If the reclaim cookie is used we need to store the last visited group
into the iterator so we have to be careful that it doesn't disappear in
the mean time. Elevated reference count on the memcg keeps it alive even
though the group have vanished in the meantime. This is checked by
css_tryget and if the group is gone the iteration is restarted from the
beginning.

V2
- iter->last_visited need to check whether the cgroup is still alive.
  Thanks to Kamezawa for pointing this out.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   74 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0fe5177..c0a91a3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -142,8 +142,8 @@ struct mem_cgroup_stat_cpu {
 };
 
 struct mem_cgroup_reclaim_iter {
-	/* css_id of the last scanned hierarchy member */
-	int position;
+	/* last scanned hierarchy member with elevated ref count */
+	struct mem_cgroup *last_visited;
 	/* scan generation, increased every round-trip */
 	unsigned int generation;
 	/* lock to protect the position and generation */
@@ -1064,7 +1064,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 				   struct mem_cgroup_reclaim_cookie *reclaim)
 {
 	struct mem_cgroup *memcg = NULL;
-	int id = 0;
+	struct mem_cgroup *last_visited = NULL;
 
 	if (mem_cgroup_disabled())
 		return NULL;
@@ -1073,7 +1073,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		root = root_mem_cgroup;
 
 	if (prev && !reclaim)
-		id = css_id(&prev->css);
+		last_visited = prev;
 
 	if (prev && prev != root)
 		css_put(&prev->css);
@@ -1086,7 +1086,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 
 	while (!memcg) {
 		struct mem_cgroup_reclaim_iter *uninitialized_var(iter);
-		struct cgroup_subsys_state *css;
+		struct cgroup_subsys_state *css = NULL;
 
 		if (reclaim) {
 			int nid = zone_to_nid(reclaim->zone);
@@ -1096,30 +1096,76 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 			mz = mem_cgroup_zoneinfo(root, nid, zid);
 			iter = &mz->reclaim_iter[reclaim->priority];
 			spin_lock(&iter->iter_lock);
+			last_visited = iter->last_visited;
 			if (prev && reclaim->generation != iter->generation) {
+				if (last_visited) {
+					mem_cgroup_put(last_visited);
+					iter->last_visited = NULL;
+				}
 				spin_unlock(&iter->iter_lock);
 				return NULL;
 			}
-			id = iter->position;
+			/*
+			 * memcg is still valid because we hold a reference but
+			 * its cgroup might have vanished in the meantime so
+			 * we have to double check it is alive and restart the
+			 * tree walk otherwise.
+			 */
+			if (last_visited && !css_tryget(&last_visited->css)) {
+				mem_cgroup_put(last_visited);
+				last_visited = NULL;
+			}
 		}
 
 		rcu_read_lock();
-		css = css_get_next(&mem_cgroup_subsys, id + 1, &root->css, &id);
-		if (css) {
-			if (css == &root->css || css_tryget(css))
-				memcg = mem_cgroup_from_css(css);
-		} else
-			id = 0;
-		rcu_read_unlock();
+		/*
+		 * Root is not visited by cgroup iterators so it needs a special
+		 * treatment.
+		 */
+		if (!last_visited) {
+			css = &root->css;
+		} else {
+			struct cgroup *next_cgroup;
+
+			next_cgroup = cgroup_next_descendant_pre(
+					last_visited->css.cgroup,
+					root->css.cgroup);
+			if (next_cgroup)
+				css = cgroup_subsys_state(next_cgroup,
+						mem_cgroup_subsys_id);
+		}
+
+		/*
+		 * Even if we find a group we have to make sure it is alive.
+		 * css && !memcg means that the groups should be skipped and
+		 * we should continue the tree walk.
+		 */
+		if (css == &root->css || (css && css_tryget(css)))
+			memcg = mem_cgroup_from_css(css);
 
 		if (reclaim) {
-			iter->position = id;
+			struct mem_cgroup *curr = memcg;
+
+			if (last_visited) {
+				css_put(&last_visited->css);
+				mem_cgroup_put(last_visited);
+			}
+
+			if (css && !memcg)
+				curr = mem_cgroup_from_css(css);
+			if (curr)
+				mem_cgroup_get(curr);
+			iter->last_visited = curr;
+
 			if (!css)
 				iter->generation++;
 			else if (!prev && memcg)
 				reclaim->generation = iter->generation;
 			spin_unlock(&iter->iter_lock);
+		} else if (css && !memcg) {
+			last_visited = mem_cgroup_from_css(css);
 		}
+		rcu_read_unlock();
 
 		if (prev && !css)
 			return NULL;
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2012-11-15  9:52 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-13 15:30 [RFC] rework mem_cgroup iterator Michal Hocko
2012-11-13 15:30 ` Michal Hocko
2012-11-13 15:30 ` [RFC 1/5] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko
2012-11-13 15:30   ` Michal Hocko
2012-11-14  0:03   ` Kamezawa Hiroyuki
2012-11-14  0:03     ` Kamezawa Hiroyuki
2012-11-13 15:30 ` [RFC 2/5] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2012-11-13 15:30   ` Michal Hocko
2012-11-13 16:14   ` Tejun Heo
2012-11-13 16:14     ` Tejun Heo
2012-11-14  8:51     ` Michal Hocko
2012-11-14  8:51       ` Michal Hocko
2012-11-14 18:52       ` Tejun Heo
2012-11-14 18:52         ` Tejun Heo
2012-11-15  9:51         ` Michal Hocko
2012-11-15  9:51           ` Michal Hocko
2012-11-15 14:47           ` Tejun Heo
2012-11-15 14:47             ` Tejun Heo
2012-11-15 15:12             ` Michal Hocko
2012-11-15 15:12               ` Michal Hocko
2012-11-15 15:31               ` Tejun Heo
2012-11-15 15:31                 ` Tejun Heo
2012-11-15 16:15                 ` Michal Hocko
2012-11-15 16:15                   ` Michal Hocko
2012-11-14  0:20   ` Kamezawa Hiroyuki
2012-11-14  0:20     ` Kamezawa Hiroyuki
2012-11-14 10:10     ` Michal Hocko
2012-11-14 10:10       ` Michal Hocko
2012-11-15  4:12       ` Kamezawa Hiroyuki
2012-11-15  4:12         ` Kamezawa Hiroyuki
2012-11-15  9:52         ` Michal Hocko [this message]
2012-11-15  9:52           ` Michal Hocko
2012-11-19 14:05       ` Michal Hocko
2012-11-19 14:05         ` Michal Hocko
2012-11-19 15:11   ` Michal Hocko
2012-11-19 15:11     ` Michal Hocko
2012-11-13 15:30 ` [RFC 3/5] memcg: simplify mem_cgroup_iter Michal Hocko
2012-11-13 15:30   ` Michal Hocko
2012-11-13 15:30 ` [RFC 4/5] memcg: clean up mem_cgroup_iter Michal Hocko
2012-11-13 15:30   ` Michal Hocko
2012-11-13 15:30 ` [RFC 5/5] cgroup: remove css_get_next Michal Hocko
2012-11-13 15:30   ` Michal Hocko
2012-11-14  0:13 ` [RFC] rework mem_cgroup iterator Kamezawa Hiroyuki
2012-11-14  0:13   ` Kamezawa Hiroyuki
2012-11-14  1:55 ` Li Zefan
2012-11-14  1:55   ` Li Zefan
2012-11-14  8:36   ` Michal Hocko
2012-11-14  8:36     ` Michal Hocko
2012-11-14 18:30     ` Tejun Heo
2012-11-14 18:30       ` Tejun Heo
2012-11-15  2:12   ` Kamezawa Hiroyuki
2012-11-15  2:12     ` Kamezawa Hiroyuki
2012-11-14 16:17 ` Glauber Costa
2012-11-14 16:17   ` Glauber Costa
2012-11-14  8:40   ` Michal Hocko
2012-11-14  8:40     ` Michal Hocko
2012-11-14 18:41   ` Tejun Heo
2012-11-14 18:41     ` Tejun Heo
2012-11-15  2:44     ` Glauber Costa
2012-11-15  2:44       ` Glauber Costa
2012-11-14 18:46       ` Tejun Heo
2012-11-14 18:46         ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121115095235.GC11990@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=glommer@parallels.com \
    --cc=hannes@cmpxchg.org \
    --cc=htejun@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.