public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, Tejun Heo <tj@kernel.org>
Subject: [PATCH 10/14] cgroup: iterate cgroup_subsys_states directly
Date: Fri,  9 May 2014 17:31:27 -0400	[thread overview]
Message-ID: <1399671091-23867-11-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1399671091-23867-1-git-send-email-tj@kernel.org>

Currently, css_next_child() is implemented as finding the next child
cgroup which has the css enabled, which used to be the only way to do
it as only cgroups participated in sibling lists and thus could be
iteratd.  This works as long as what's required during iteration is
not missing online csses; however, it turns out that there are use
cases where offlined but not yet released csses need to be iterated.
This is difficult to implement through cgroup iteration the unified
hierarchy as there may be multiple dying csses for the same subsystem
associated with single cgroup.

After the recent changes, the cgroup self and regular csses behave
identically in how they're linked and unlinked from the sibling lists
including assertion of CSS_RELEASED and css_next_child() can simply
switch to iterating csses directly.  This both simplifies the logic
and ensures that all visible non-released csses are included in the
iteration whether there are multiple dying csses for a subsystem or
not.

As all other iterators depend on css_next_child() for sibling
iteration, this changes behaviors of all css iterators.  Add and
update explanations on the css states which are included in traversal
to all iterators.

As css iteration could always contain offlined csses, this shouldn't
break any of the current users and new usages which need iteration of
all on and offline csses can make use of the new semantics.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/cgroup.h | 44 ++++++++++++++++++++---------------
 kernel/cgroup.c        | 62 ++++++++++++++++++++++++++++++--------------------
 2 files changed, 63 insertions(+), 43 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 634ecc1..58867ed 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -759,14 +759,14 @@ struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss);
  * @pos: the css * to use as the loop cursor
  * @parent: css whose children to walk
  *
- * Walk @parent's children.  Must be called under rcu_read_lock().  A child
- * css which hasn't finished ->css_online() or already has finished
- * ->css_offline() may show up during traversal and it's each subsystem's
- * responsibility to verify that each @pos is alive.
+ * Walk @parent's children.  Must be called under rcu_read_lock().
  *
- * If a subsystem synchronizes against the parent in its ->css_online() and
- * before starting iterating, a css which finished ->css_online() is
- * guaranteed to be visible in the future iterations.
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  *
  * It is allowed to temporarily drop RCU read lock during iteration.  The
  * caller is responsible for ensuring that @pos remains accessible until
@@ -789,17 +789,16 @@ css_rightmost_descendant(struct cgroup_subsys_state *pos);
  * @root: css whose descendants to walk
  *
  * Walk @root's descendants.  @root is included in the iteration and the
- * first node to be visited.  Must be called under rcu_read_lock().  A
- * descendant css which hasn't finished ->css_online() or already has
- * finished ->css_offline() may show up during traversal and it's each
- * subsystem's responsibility to verify that each @pos is alive.
+ * first node to be visited.  Must be called under rcu_read_lock().
  *
- * If a subsystem synchronizes against the parent in its ->css_online() and
- * before starting iterating, and synchronizes against @pos on each
- * iteration, any descendant css which finished ->css_online() is
- * guaranteed to be visible in the future iterations.
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  *
- * In other words, the following guarantees that a descendant can't escape
+ * For example, the following guarantees that a descendant can't escape
  * state updates of its ancestors.
  *
  * my_online(@css)
@@ -855,8 +854,17 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
  *
  * Similar to css_for_each_descendant_pre() but performs post-order
  * traversal instead.  @root is included in the iteration and the last
- * node to be visited.  Note that the walk visibility guarantee described
- * in pre-order walk doesn't apply the same to post-order walks.
+ * node to be visited.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
+ *
+ * Note that the walk visibility guarantee example described in pre-order
+ * walk doesn't apply the same to post-order walks.
  */
 #define css_for_each_descendant_post(pos, css)				\
 	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index eaea062..e0953f7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3088,21 +3088,25 @@ static int cgroup_task_count(const struct cgroup *cgrp)
 
 /**
  * css_next_child - find the next child of a given css
- * @pos_css: the current position (%NULL to initiate traversal)
- * @parent_css: css whose children to walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @parent: css whose children to walk
  *
- * This function returns the next child of @parent_css and should be called
+ * This function returns the next child of @parent and should be called
  * under either cgroup_mutex or RCU read lock.  The only requirement is
- * that @parent_css and @pos_css are accessible.  The next sibling is
- * guaranteed to be returned regardless of their states.
+ * that @parent and @pos are accessible.  The next sibling is guaranteed to
+ * be returned regardless of their states.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
-struct cgroup_subsys_state *
-css_next_child(struct cgroup_subsys_state *pos_css,
-	       struct cgroup_subsys_state *parent_css)
+struct cgroup_subsys_state *css_next_child(struct cgroup_subsys_state *pos,
+					   struct cgroup_subsys_state *parent)
 {
-	struct cgroup *pos = pos_css ? pos_css->cgroup : NULL;
-	struct cgroup *cgrp = parent_css->cgroup;
-	struct cgroup *next;
+	struct cgroup_subsys_state *next;
 
 	cgroup_assert_mutex_or_rcu_locked();
 
@@ -3127,27 +3131,21 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 	 * races against release and the race window is very small.
 	 */
 	if (!pos) {
-		next = list_entry_rcu(cgrp->self.children.next, struct cgroup, self.sibling);
-	} else if (likely(!(pos->self.flags & CSS_RELEASED))) {
-		next = list_entry_rcu(pos->self.sibling.next, struct cgroup, self.sibling);
+		next = list_entry_rcu(parent->children.next, struct cgroup_subsys_state, sibling);
+	} else if (likely(!(pos->flags & CSS_RELEASED))) {
+		next = list_entry_rcu(pos->sibling.next, struct cgroup_subsys_state, sibling);
 	} else {
-		list_for_each_entry_rcu(next, &cgrp->self.children, self.sibling)
-			if (next->self.serial_nr > pos->self.serial_nr)
+		list_for_each_entry_rcu(next, &parent->children, sibling)
+			if (next->serial_nr > pos->serial_nr)
 				break;
 	}
 
 	/*
 	 * @next, if not pointing to the head, can be dereferenced and is
-	 * the next sibling; however, it might have @ss disabled.  If so,
-	 * fast-forward to the next enabled one.
+	 * the next sibling.
 	 */
-	while (&next->self.sibling != &cgrp->self.children) {
-		struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss);
-
-		if (next_css)
-			return next_css;
-		next = list_entry_rcu(next->self.sibling.next, struct cgroup, self.sibling);
-	}
+	if (&next->sibling != &parent->children)
+		return next;
 	return NULL;
 }
 
@@ -3164,6 +3162,13 @@ css_next_child(struct cgroup_subsys_state *pos_css,
  * doesn't require the whole traversal to be contained in a single critical
  * section.  This function will return the correct next descendant as long
  * as both @pos and @root are accessible and @pos is a descendant of @root.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
 struct cgroup_subsys_state *
 css_next_descendant_pre(struct cgroup_subsys_state *pos,
@@ -3251,6 +3256,13 @@ css_leftmost_descendant(struct cgroup_subsys_state *pos)
  * section.  This function will return the correct next descendant as long
  * as both @pos and @cgroup are accessible and @pos is a descendant of
  * @cgroup.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal.  It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
 struct cgroup_subsys_state *
 css_next_descendant_post(struct cgroup_subsys_state *pos,
-- 
1.9.0


  parent reply	other threads:[~2014-05-09 21:31 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-09 21:31 [PATCHSET cgroup/for-3.16] cgroup: iterate cgroup_subsys_states directly Tejun Heo
2014-05-09 21:31 ` [PATCH 01/14] cgroup: remove css_parent() Tejun Heo
2014-05-11  1:47   ` David Miller
2014-05-11 13:02   ` Neil Horman
2014-05-12 13:16   ` Michal Hocko
2014-05-13 18:50   ` [PATCH v2 " Tejun Heo
2014-05-09 21:31 ` [PATCH 02/14] cgroup: remove pointless has tasks/children test from mem_cgroup_force_empty() Tejun Heo
2014-05-12 14:53   ` Michal Hocko
2014-05-12 14:58     ` [PATCH] memcg: deprecate memory.force_empty knob Michal Hocko
2014-05-12 15:00       ` Tejun Heo
2014-05-12 15:20         ` Michal Hocko
2014-05-12 15:25           ` Tejun Heo
2014-05-12 15:34             ` Michal Hocko
2014-05-13 13:16               ` Johannes Weiner
2014-05-13 15:09                 ` Michal Hocko
2014-05-12 14:59     ` [PATCH 02/14] cgroup: remove pointless has tasks/children test from mem_cgroup_force_empty() Tejun Heo
2014-05-12 15:21       ` Michal Hocko
2014-05-13 13:10     ` Johannes Weiner
2014-05-13 16:46     ` Tejun Heo
2014-05-13 18:51     ` [PATCH UPDATED 02/14] memcg: remove " Tejun Heo
2014-05-09 21:31 ` [PATCH 03/14] memcg: update memcg_has_children() to use css_next_child() Tejun Heo
2014-05-12 15:18   ` Michal Hocko
2014-05-13 16:53   ` [PATCH v2 " Tejun Heo
2014-05-09 21:31 ` [PATCH 04/14] device_cgroup: remove direct access to cgroup->children Tejun Heo
2014-05-13 12:56   ` Aristeu Rozanski
2014-05-14 12:52   ` Serge E. Hallyn
2014-05-09 21:31 ` [PATCH 05/14] cgroup: remove cgroup->parent Tejun Heo
2014-05-09 21:31 ` [PATCH 06/14] cgroup: move cgroup->sibling and ->children into cgroup_subsys_state Tejun Heo
2014-05-09 21:31 ` [PATCH 07/14] cgroup: link all cgroup_subsys_states in their sibling lists Tejun Heo
2014-05-09 21:31 ` [PATCH 08/14] cgroup: move cgroup->serial_nr into cgroup_subsys_state Tejun Heo
2014-05-09 21:31 ` [PATCH 09/14] cgroup: introduce CSS_RELEASED and reduce css iteration fallback window Tejun Heo
2014-05-16 16:07   ` [PATCH v2 " Tejun Heo
2014-05-09 21:31 ` Tejun Heo [this message]
2014-05-09 21:31 ` [PATCH 11/14] cgroup: use CSS_ONLINE instead of CGRP_DEAD Tejun Heo
2014-05-09 21:31 ` [PATCH 12/14] cgroup: convert cgroup_has_live_children() into css_has_online_children() Tejun Heo
2014-05-09 21:31 ` [PATCH 13/14] device_cgroup: use css_has_online_children() instead of has_children() Tejun Heo
2014-05-13 12:56   ` Aristeu Rozanski
2014-05-14 12:53   ` Serge E. Hallyn
2014-05-09 21:31 ` [PATCH 14/14] cgroup: implement css_tryget() Tejun Heo
2014-05-11  4:54   ` Johannes Weiner
2014-05-11 12:38     ` Tejun Heo
2014-05-16 16:07   ` [PATCH v2 " Tejun Heo
2014-05-13 16:59 ` [PATCHSET cgroup/for-3.16] cgroup: iterate cgroup_subsys_states directly Tejun Heo
2014-05-14  4:21 ` Li Zefan
2014-05-14 13:07   ` Tejun Heo
2014-05-16  1:28     ` Li Zefan
2014-05-16  1:29 ` Li Zefan
2014-05-16 16:08 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1399671091-23867-11-git-send-email-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox