All of lore.kernel.org
 help / color / mirror / Atom feed
From: Colin Cross <ccross@android.com>
To: linux-kernel@vger.kernel.org
Cc: Colin Cross <ccross@android.com>, Paul Menage <menage@google.com>,
	Li Zefan <lizf@cn.fujitsu.com>,
	containers@lists.linux-foundation.org
Subject: [PATCH] cgroup: Convert synchronize_rcu to call_rcu in cgroup_attach_task
Date: Sun, 21 Nov 2010 20:06:07 -0800	[thread overview]
Message-ID: <1290398767-15230-1-git-send-email-ccross@android.com> (raw)
In-Reply-To: <AANLkTikx6d0_VFtZ4zWQucRCf=vFt7N2M6=0jpnKasEE@mail.gmail.com>

The synchronize_rcu call in cgroup_attach_task can be very
expensive.  All fastpath accesses to task->cgroups that expect
task->cgroups not to change already use task_lock() or
cgroup_lock() to protect against updates, and, in cgroup.c,
only the CGROUP_DEBUG files have RCU read-side critical
sections.

sched.c uses RCU read-side-critical sections on task->cgroups,
but only to ensure that a dereference of task->cgroups does
not become invalid, not that it doesn't change.

This patch adds a function put_css_set_rcu, which delays the
put until after a grace period has elapsed.  This ensures that
any RCU read-side critical sections that dereferenced
task->cgroups in sched.c have completed before the css_set is
deleted.  The synchronize_rcu()/put_css_set() combo in
cgroup_attach_task() can then be replaced with
put_css_set_rcu().

Also converts the CGROUP_DEBUG files that access
current->cgroups to use task_lock(current) instead of
rcu_read_lock().

Signed-off-by: Colin Cross <ccross@android.com>

---

This version fixes the problems with the previous patch by
keeping the use of RCU in cgroup_attach_task, but allowing
cgroup_attach_task to return immediately by deferring the
final put_css_reg to an rcu callback.

 include/linux/cgroup.h |    4 +++
 kernel/cgroup.c        |   58 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..fd26218 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -287,6 +287,10 @@ struct css_set {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+
+	/* For RCU-delayed puts */
+	atomic_t delayed_put_count;
+	struct work_struct delayed_put_work;
 };
 
 /*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 66a416b..c7348e7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -298,7 +298,8 @@ static int cgroup_init_idr(struct cgroup_subsys *ss,
 
 /* css_set_lock protects the list of css_set objects, and the
  * chain of tasks off each css_set.  Nests outside task->alloc_lock
- * due to cgroup_iter_start() */
+ * due to cgroup_iter_start().  Never locked in irq context, so
+ * the non-irq variants of write_lock and read_lock are used. */
 static DEFINE_RWLOCK(css_set_lock);
 static int css_set_count;
 
@@ -396,6 +397,39 @@ static inline void put_css_set_taskexit(struct css_set *cg)
 	__put_css_set(cg, 1);
 }
 
+/* work function, executes in process context */
+static void __put_css_set_rcu(struct work_struct *work)
+{
+	struct css_set *cg;
+	cg = container_of(work, struct css_set, delayed_put_work);
+
+	while (atomic_add_unless(&cg->delayed_put_count, -1, 0))
+		put_css_set(cg);
+}
+
+/* rcu callback, executes in softirq context */
+static void _put_css_set_rcu(struct rcu_head *obj)
+{
+	struct css_set *cg = container_of(obj, struct css_set, rcu_head);
+
+	/* the rcu callback happens in softirq context, but css_set_lock
+	 * is not irq safe, so bounce to process context.
+	 */
+	schedule_work(&cg->delayed_put_work);
+}
+
+/* put_css_set_rcu - helper function to delay a put until after an rcu
+ * grace period
+ *
+ * free_css_set_rcu can never be called if there are outstanding
+ * put_css_set_rcu calls, so we can reuse cg->rcu_head.
+ */
+static inline void put_css_set_rcu(struct css_set *cg)
+{
+	if (atomic_inc_return(&cg->delayed_put_count) == 1)
+		call_rcu(&cg->rcu_head, _put_css_set_rcu);
+}
+
 /*
  * compare_css_sets - helper function for find_existing_css_set().
  * @cg: candidate css_set being tested
@@ -620,9 +654,11 @@ static struct css_set *find_css_set(
 	}
 
 	atomic_set(&res->refcount, 1);
+	atomic_set(&res->delayed_put_count, 0);
 	INIT_LIST_HEAD(&res->cg_links);
 	INIT_LIST_HEAD(&res->tasks);
 	INIT_HLIST_NODE(&res->hlist);
+	INIT_WORK(&res->delayed_put_work, __put_css_set_rcu);
 
 	/* Copy the set of subsystem state objects generated in
 	 * find_existing_css_set() */
@@ -725,9 +761,9 @@ static struct cgroup *task_cgroup_from_root(struct task_struct *task,
  * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
  * another.  It does so using cgroup_mutex, however there are
  * several performance critical places that need to reference
- * task->cgroup without the expense of grabbing a system global
+ * task->cgroups without the expense of grabbing a system global
  * mutex.  Therefore except as noted below, when dereferencing or, as
- * in cgroup_attach_task(), modifying a task'ss cgroup pointer we use
+ * in cgroup_attach_task(), modifying a task's cgroups pointer we use
  * task_lock(), which acts on a spinlock (task->alloc_lock) already in
  * the task_struct routinely used for such matters.
  *
@@ -1802,8 +1838,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 			ss->attach(ss, cgrp, oldcgrp, tsk, false);
 	}
 	set_bit(CGRP_RELEASABLE, &oldcgrp->flags);
-	synchronize_rcu();
-	put_css_set(cg);
+	put_css_set_rcu(cg);
 
 	/*
 	 * wake up rmdir() waiter. the rmdir should fail since the cgroup
@@ -3900,6 +3935,7 @@ int __init cgroup_init_early(void)
 	INIT_LIST_HEAD(&init_css_set.cg_links);
 	INIT_LIST_HEAD(&init_css_set.tasks);
 	INIT_HLIST_NODE(&init_css_set.hlist);
+	INIT_WORK(&init_css_set.delayed_put_work, __put_css_set_rcu);
 	css_set_count = 1;
 	init_cgroup_root(&rootnode);
 	root_count = 1;
@@ -4827,9 +4863,9 @@ static u64 current_css_set_refcount_read(struct cgroup *cont,
 {
 	u64 count;
 
-	rcu_read_lock();
+	task_lock(current);
 	count = atomic_read(&current->cgroups->refcount);
-	rcu_read_unlock();
+	task_unlock(current);
 	return count;
 }
 
@@ -4838,12 +4874,10 @@ static int current_css_set_cg_links_read(struct cgroup *cont,
 					 struct seq_file *seq)
 {
 	struct cg_cgroup_link *link;
-	struct css_set *cg;
 
 	read_lock(&css_set_lock);
-	rcu_read_lock();
-	cg = rcu_dereference(current->cgroups);
-	list_for_each_entry(link, &cg->cg_links, cg_link_list) {
+	task_lock(current);
+	list_for_each_entry(link, &current->cgroups->cg_links, cg_link_list) {
 		struct cgroup *c = link->cgrp;
 		const char *name;
 
@@ -4854,7 +4888,7 @@ static int current_css_set_cg_links_read(struct cgroup *cont,
 		seq_printf(seq, "Root %d group %s\n",
 			   c->root->hierarchy_id, name);
 	}
-	rcu_read_unlock();
+	task_unlock(current);
 	read_unlock(&css_set_lock);
 	return 0;
 }
-- 
1.7.3.1


  parent reply	other threads:[~2010-11-22  4:06 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-21  2:00 [PATCH] cgroup: Remove RCU from task->cgroups Colin Cross
     [not found] ` <1290304824-22722-1-git-send-email-ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>
2010-11-21 23:02   ` Colin Cross
2010-11-21 23:02 ` Colin Cross
     [not found]   ` <AANLkTikx6d0_VFtZ4zWQucRCf=vFt7N2M6=0jpnKasEE-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-22  4:06     ` [PATCH] cgroup: Convert synchronize_rcu to call_rcu in cgroup_attach_task Colin Cross
2010-11-22  4:06   ` Colin Cross [this message]
2010-11-23  8:14     ` Li Zefan
     [not found]       ` <4CEB77E0.10202-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2010-11-23  8:58         ` Colin Cross
2010-11-23  8:58       ` Colin Cross
2010-11-23 20:22         ` Colin Cross
     [not found]         ` <AANLkTimjpW6NZ6fEiVi0VzjkpQGVob4=VHsohXUiDQkJ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-23 20:22           ` Colin Cross
2010-11-24  1:24     ` Paul Menage
     [not found]       ` <AANLkTi=4-OgPUugnUBaqSU3oC=3wxTjAsOB_Ais3Or+i-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-24  1:43         ` [PATCH] cgroup: Remove call to synchronize_rcu " Colin Cross
2010-11-24  1:43           ` Colin Cross
     [not found]           ` <1290563018-2804-1-git-send-email-ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>
2010-11-24  2:29             ` Colin Cross
2011-01-22  1:17             ` Bryan Huntsman
2011-01-28  1:17             ` Bryan Huntsman
2010-11-24  2:29           ` Colin Cross
2011-01-22  1:17           ` Bryan Huntsman
     [not found]             ` <4D3A3024.9040402-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2011-01-22  2:04               ` Colin Cross
2011-01-22  2:04             ` Colin Cross
2011-01-28  1:17           ` Bryan Huntsman
2010-11-24  2:06         ` [PATCH] cgroup: Convert synchronize_rcu to call_rcu " Li Zefan
2010-11-24  2:06       ` Li Zefan
2010-11-24  2:10         ` Colin Cross
2010-11-24  5:37           ` [PATCH 1/2] cgroup: Set CGRP_RELEASABLE when adding to a cgroup Colin Cross
2010-11-24 23:54             ` Paul Menage
     [not found]               ` <AANLkTimFqJ+qPidS_81DKd7ExSxDG7GNi0gjcUEEq_7j-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-25  0:11                 ` Colin Cross
2010-11-25  0:11                   ` Colin Cross
     [not found]                   ` <AANLkTimwvP2Ey1gJ6AbbFNtDKjGZt4cwqL=08nGBa_PT-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-25  0:18                     ` Colin Cross
2010-11-25  0:21                     ` Paul Menage
2010-11-25  0:18                   ` Colin Cross
2010-11-25  0:21                   ` Paul Menage
     [not found]                     ` <AANLkTimJA52-GTM=AzS+tkOugrsi6Keh0_j87vK1BkGv-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-12-03  3:07                       ` Colin Cross
2010-12-03  3:07                     ` Colin Cross
     [not found]                       ` <AANLkTim67fLN+PYz-P0TM0QRmvQKP80tyXSNKNSZhFZ2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-12-17  0:54                         ` Paul Menage
2010-12-17  0:54                       ` Paul Menage
     [not found]                         ` <AANLkTinZarXbEyb1xfJWjG4gN2qhTVTXTdso4Cym5M9T-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-12-17  1:12                           ` Colin Cross
2010-12-17  1:12                         ` Colin Cross
     [not found]             ` <1290577024-12347-1-git-send-email-ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>
2010-11-24 23:54               ` Paul Menage
2011-01-28  1:17               ` Bryan Huntsman
2011-01-28  1:17             ` Bryan Huntsman
2011-01-28  1:30               ` Paul Menage
     [not found]                 ` <AANLkTin7B51maXHRH+FNmZ14bmWmEp9P2=2QTNqgq_Fi-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-01-28  1:48                   ` Michael Bohan
2011-01-28  1:48                 ` Michael Bohan
     [not found]               ` <4D42192C.9000701-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2011-01-28  1:30                 ` Paul Menage
2010-11-24  5:37           ` [PATCH 2/2] cgroup: Remove call to synchronize_rcu in cgroup_attach_task Colin Cross
2011-01-28  1:17             ` Bryan Huntsman
     [not found]             ` <1290577024-12347-2-git-send-email-ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>
2011-01-28  1:17               ` Bryan Huntsman
     [not found]           ` <AANLkTi=6nwDCdzDz7E2EaAw2pf3KUVjmKMRqGfz5zVhP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-11-24  5:37             ` [PATCH 1/2] cgroup: Set CGRP_RELEASABLE when adding to a cgroup Colin Cross
2010-11-24  5:37             ` [PATCH 2/2] cgroup: Remove call to synchronize_rcu in cgroup_attach_task Colin Cross
     [not found]         ` <4CEC7329.7070909-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2010-11-24  2:10           ` [PATCH] cgroup: Convert synchronize_rcu to call_rcu " Colin Cross
2010-11-24 18:58           ` Paul Menage
2010-11-24 18:58         ` Paul Menage
     [not found]     ` <1290398767-15230-1-git-send-email-ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>
2010-11-23  8:14       ` Li Zefan
2010-11-24  1:24       ` Paul Menage

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1290398767-15230-1-git-send-email-ccross@android.com \
    --to=ccross@android.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.