* [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem
@ 2015-05-13 20:35 Tejun Heo
[not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Tejun Heo @ 2015-05-13 20:35 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mingo-H+wXaHxf7aLQT0dZR+AlfA,
peterz-wEGCiKHe2LqWVfeAwA7xHQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Threadgroup locking was added because cgroup needs threadgroups to
stay stable across attach operations.  It was implemented as a generic
per-signal_struct locking mechanism so that other users which need
threadgroups to stay stable across blocking operations could use it
too; however, it hasn't grown any other use cases and is still
conditional on CONFIG_CGROUPS.
It turns out that a single percpu_rwsem would serve cgroup's use case
better, as it is lighter on thread operations and allows atomic
operations across multiple processes.  We could change the generic
threadgroup lock mechanism to do that, but it's pointless to put this
in core task code.  What's needed from the task code is a place which
subsystems requiring synchronization against threadgroup operations
can hook into to implement the necessary exclusion.
This patchset moves the specific locking details into cgroup proper,
leaving threadgroup_change_begin/end() as the section markers, and
then converts the cgroup-specific locking to use a percpu_rwsem; the
end state is sketched below.
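Roughly, the resulting split looks like this (a sketch assembled from
the patches for illustration, with CONFIG_CGROUPS=y; the actual hunks
are in the patches below):

/* include/linux/sched.h - generic hook points, no locking details */
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
        might_sleep();
        cgroup_threadgroup_change_begin(tsk);  /* empty stub if !CONFIG_CGROUPS */
}

/* cgroup side - after 0002, backed by one global percpu_rwsem */
struct percpu_rw_semaphore cgroup_threadgroup_rwsem;

static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
{
        percpu_down_read(&cgroup_threadgroup_rwsem);  /* cheap reader side */
}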
This patchset contains the following three patches.
0001-sched-cgroup-reorganize-threadgroup-locking.patch
0002-sched-cgroup-replace-signal_struct-group_rwsem-with-.patch
0003-cgroup-simplify-threadgroup-locking.patch
0001 moves threadgroup locking details into cgroup proper.
0002-0003 replace the per-signal_struct group_rwsem with a global
percpu_rwsem.
This patchset is on top of
cgroup/for-4.2 d0f702e648dc ("cgroup: fix some comment typos")
+ [1] [PATCH] cgroup: separate out include/linux/cgroup-defs.h
+ [2] [PATCH] cgroup: reorganize include/linux/cgroup.h
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-threadgroup-percpu-rwsem
diffstat follows. Thanks.
 include/linux/cgroup-defs.h |  33 ++++++++++++++++++++
 include/linux/init_task.h   |   8 ----
 include/linux/sched.h       |  65 ++++++++++------------------------------
 init/Kconfig                |   1 +
 kernel/cgroup.c             |  71 ++++++++++++++++----------------------------
 kernel/fork.c               |   4 --
 6 files changed, 78 insertions(+), 104 deletions(-)
--
tejun
[1] http://lkml.kernel.org/g/20150513193840.GC11388-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org
[2] http://lkml.kernel.org/g/20150513202416.GE11388-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org
^ permalink raw reply	[flat|nested] 14+ messages in thread

[parent not found: <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* [PATCH 1/3] sched, cgroup: reorganize threadgroup locking
  [not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2015-05-13 20:35   ` Tejun Heo
  [not found]     ` <1431549318-16756-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2015-05-18 16:34   ` [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
  2015-05-27  0:34   ` Tejun Heo
  2 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2015-05-13 20:35 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo

threadgroup_change_begin/end() are used to mark the beginning and end
of threadgroup modifying operations so that code paths which require a
threadgroup to stay stable across blocking operations can synchronize
against those sections using threadgroup_lock/unlock().

It's currently implemented as a general mechanism in sched.h using a
per-signal_struct rwsem; however, it never grew any non-cgroup use
cases and becomes a noop if !CONFIG_CGROUPS.  It turns out that cgroup
will be better served by a different synchronization scheme, and it is
a bit silly to keep cgroup-specific details in a general mechanism.
What is general here is identifying the places where threadgroups are
modified.

This patch restructures threadgroup locking so that
threadgroup_change_begin/end() become a place where subsystems which
need to synchronize against threadgroup changes can hook into.
cgroup_threadgroup_change_begin/end(), which operate on the
per-signal_struct rwsem, are created, and threadgroup_lock/unlock()
are moved to cgroup.c and made static.

This is pure reorganization and causes no functional changes.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
---
 include/linux/cgroup-defs.h | 10 +++++++++
 include/linux/sched.h       | 53 +++++++++++++++------------------------------
 kernel/cgroup.c             | 42 +++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 36 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 55f3120..1b8c938 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -14,6 +14,7 @@
 #include <linux/mutex.h>
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
+#include <linux/percpu-rwsem.h>
 #include <linux/workqueue.h>

 #ifdef CONFIG_CGROUPS
@@ -460,5 +461,14 @@ struct cgroup_subsys {
        unsigned int depends_on;
 };

+void cgroup_threadgroup_change_begin(struct task_struct *tsk);
+void cgroup_threadgroup_change_end(struct task_struct *tsk);
+
+#else  /* CONFIG_CGROUPS */
+
+static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
+static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
+
 #endif /* CONFIG_CGROUPS */

 #endif /* _LINUX_CGROUP_DEFS_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8222ae4..5ee2900 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -58,6 +58,7 @@ struct sched_param {
 #include <linux/uidgid.h>
 #include <linux/gfp.h>
 #include <linux/magic.h>
+#include <linux/cgroup-defs.h>

 #include <asm/processor.h>

@@ -2648,53 +2649,33 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
        spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
 }

-#ifdef CONFIG_CGROUPS
-static inline void threadgroup_change_begin(struct task_struct *tsk)
-{
-       down_read(&tsk->signal->group_rwsem);
-}
-static inline void threadgroup_change_end(struct task_struct *tsk)
-{
-       up_read(&tsk->signal->group_rwsem);
-}
-
 /**
- * threadgroup_lock - lock threadgroup
- * @tsk: member task of the threadgroup to lock
- *
- * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
- * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
- * change ->group_leader/pid.  This is useful for cases where the threadgroup
- * needs to stay stable across blockable operations.
+ * threadgroup_change_begin - mark the beginning of changes to a threadgroup
+ * @tsk: task causing the changes
  *
- * fork and exit paths explicitly call threadgroup_change_{begin|end}() for
- * synchronization.  While held, no new task will be added to threadgroup
- * and no existing live task will have its PF_EXITING set.
- *
- * de_thread() does threadgroup_change_{begin|end}() when a non-leader
- * sub-thread becomes a new leader.
+ * All operations which modify a threadgroup - a new thread joining the
+ * group, death of a member thread (the assertion of PF_EXITING) and
+ * exec(2) dethreading the process and replacing the leader - are wrapped
+ * by threadgroup_change_{begin|end}().  This is to provide a place which
+ * subsystems needing threadgroup stability can hook into for
+ * synchronization.
  */
-static inline void threadgroup_lock(struct task_struct *tsk)
+static inline void threadgroup_change_begin(struct task_struct *tsk)
 {
-       down_write(&tsk->signal->group_rwsem);
+       might_sleep();
+       cgroup_threadgroup_change_begin(tsk);
 }

 /**
- * threadgroup_unlock - unlock threadgroup
- * @tsk: member task of the threadgroup to unlock
+ * threadgroup_change_end - mark the end of changes to a threadgroup
+ * @tsk: task causing the changes
  *
- * Reverse threadgroup_lock().
+ * See threadgroup_change_begin().
  */
-static inline void threadgroup_unlock(struct task_struct *tsk)
+static inline void threadgroup_change_end(struct task_struct *tsk)
 {
-       up_write(&tsk->signal->group_rwsem);
+       cgroup_threadgroup_change_end(tsk);
 }
-#else
-static inline void threadgroup_change_begin(struct task_struct *tsk) {}
-static inline void threadgroup_change_end(struct task_struct *tsk) {}
-static inline void threadgroup_lock(struct task_struct *tsk) {}
-static inline void threadgroup_unlock(struct task_struct *tsk) {}
-#endif

 #ifndef __HAVE_THREAD_FUNCTIONS

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index cfa27f9..9309452 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -848,6 +848,48 @@ static struct css_set *find_css_set(struct css_set *old_cset,
        return cset;
 }

+void cgroup_threadgroup_change_begin(struct task_struct *tsk)
+{
+       down_read(&tsk->signal->group_rwsem);
+}
+
+void cgroup_threadgroup_change_end(struct task_struct *tsk)
+{
+       up_read(&tsk->signal->group_rwsem);
+}
+
+/**
+ * threadgroup_lock - lock threadgroup
+ * @tsk: member task of the threadgroup to lock
+ *
+ * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
+ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
+ * change ->group_leader/pid.  This is useful for cases where the threadgroup
+ * needs to stay stable across blockable operations.
+ *
+ * fork and exit explicitly call threadgroup_change_{begin|end}() for
+ * synchronization.  While held, no new task will be added to threadgroup
+ * and no existing live task will have its PF_EXITING set.
+ *
+ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
+ * sub-thread becomes a new leader.
+ */
+static void threadgroup_lock(struct task_struct *tsk)
+{
+       down_write(&tsk->signal->group_rwsem);
+}
+
+/**
+ * threadgroup_unlock - unlock threadgroup
+ * @tsk: member task of the threadgroup to unlock
+ *
+ * Reverse threadgroup_lock().
+ */
+static inline void threadgroup_unlock(struct task_struct *tsk)
+{
+       up_write(&tsk->signal->group_rwsem);
+}
+
 static struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root)
 {
        struct cgroup *root_cgrp = kf_root->kn->priv;
--
2.1.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
[parent not found: <1431549318-16756-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 1/3] sched, cgroup: reorganize threadgroup locking
  [not found]     ` <1431549318-16756-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2015-05-14  1:09       ` Sergey Senozhatsky
  2015-05-14 15:17         ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2015-05-14 1:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello,

On (05/13/15 16:35), Tejun Heo wrote:
[..]
> -static inline void threadgroup_lock(struct task_struct *tsk)
> +static inline void threadgroup_change_begin(struct task_struct *tsk)
> {
> -       down_write(&tsk->signal->group_rwsem);
> +       might_sleep();

I think cgroup_threadgroup_change_begin()->down_read() already does
the might_sleep() check.

	-ss

> +       cgroup_threadgroup_change_begin(tsk);
> }
>
> /**
> - * threadgroup_unlock - unlock threadgroup
> - * @tsk: member task of the threadgroup to unlock
> + * threadgroup_change_end - mark the end of changes to a threadgroup
> + * @tsk: task causing the changes
>  *
> - * Reverse threadgroup_lock().
> + * See threadgroup_change_begin().
>  */
> -static inline void threadgroup_unlock(struct task_struct *tsk)
> +static inline void threadgroup_change_end(struct task_struct *tsk)
> {
> -       up_write(&tsk->signal->group_rwsem);
> +       cgroup_threadgroup_change_end(tsk);
> }
> -#else
> -static inline void threadgroup_change_begin(struct task_struct *tsk) {}
> -static inline void threadgroup_change_end(struct task_struct *tsk) {}
> -static inline void threadgroup_lock(struct task_struct *tsk) {}
> -static inline void threadgroup_unlock(struct task_struct *tsk) {}
> -#endif
>
> #ifndef __HAVE_THREAD_FUNCTIONS
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cfa27f9..9309452 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -848,6 +848,48 @@ static struct css_set *find_css_set(struct css_set *old_cset,
> 	return cset;
> }
>
> +void cgroup_threadgroup_change_begin(struct task_struct *tsk)
> +{
> +       down_read(&tsk->signal->group_rwsem);
> +}
> +
> +void cgroup_threadgroup_change_end(struct task_struct *tsk)
> +{
> +       up_read(&tsk->signal->group_rwsem);
> +}
> +
> +/**
> + * threadgroup_lock - lock threadgroup
> + * @tsk: member task of the threadgroup to lock
> + *
> + * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
> + * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
> + * change ->group_leader/pid.  This is useful for cases where the threadgroup
> + * needs to stay stable across blockable operations.
> + *
> + * fork and exit explicitly call threadgroup_change_{begin|end}() for
> + * synchronization.  While held, no new task will be added to threadgroup
> + * and no existing live task will have its PF_EXITING set.
> + *
> + * de_thread() does threadgroup_change_{begin|end}() when a non-leader
> + * sub-thread becomes a new leader.
> + */
> +static void threadgroup_lock(struct task_struct *tsk)
> +{
> +       down_write(&tsk->signal->group_rwsem);
> +}
> +
> +/**
> + * threadgroup_unlock - unlock threadgroup
> + * @tsk: member task of the threadgroup to unlock
> + *
> + * Reverse threadgroup_lock().
> + */
> +static inline void threadgroup_unlock(struct task_struct *tsk)
> +{
> +       up_write(&tsk->signal->group_rwsem);
> +}
> +
> static struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root)
> {
> 	struct cgroup *root_cgrp = kf_root->kn->priv;
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] sched, cgroup: reorganize threadgroup locking
  2015-05-14  1:09       ` Sergey Senozhatsky
@ 2015-05-14 15:17         ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2015-05-14 15:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello, Sergey.

On Thu, May 14, 2015 at 10:09:13AM +0900, Sergey Senozhatsky wrote:
> > +static inline void threadgroup_change_begin(struct task_struct *tsk)
> > {
> > -       down_write(&tsk->signal->group_rwsem);
> > +       might_sleep();
>
> I think cgroup_threadgroup_change_begin()->down_read() already does
> the might_sleep() check.

Sure, but it's a layering thing.  threadgroup_change_begin() should be
called from a blockable context whether the hook users actually make
use of it or not.  e.g. we want might_sleep() even when
!CONFIG_CGROUPS.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread
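To make the layering point above concrete, this is the shape the patch
creates (assembled from the hunks in the patch; the stub comes from
cgroup-defs.h):

#ifdef CONFIG_CGROUPS
void cgroup_threadgroup_change_begin(struct task_struct *tsk); /* may block */
#else
static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
#endif

static inline void threadgroup_change_begin(struct task_struct *tsk)
{
        might_sleep();  /* check the caller even when the hook is a noop */
        cgroup_threadgroup_change_begin(tsk);
}

With might_sleep() in the generic wrapper, a caller in atomic context
gets flagged on every configuration, not just the ones where
down_read() happens to run.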
* Re: [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem
  [not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2015-05-13 20:35   ` [PATCH 1/3] sched, cgroup: reorganize threadgroup locking Tejun Heo
@ 2015-05-18 16:34   ` Tejun Heo
  [not found]     ` <20150518163440.GA24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  2015-05-27  0:34   ` Tejun Heo
  2 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2015-05-18 16:34 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, May 13, 2015 at 04:35:15PM -0400, Tejun Heo wrote:
> Threadgroup locking was added because cgroup needs threadgroups to
> stay stable across attach operations.  It was implemented as a generic
> per-signal_struct locking mechanism so that other users which need
> threadgroups to stay stable across blocking operations could use it
> too; however, it hasn't grown any other use cases and is still
> conditional on CONFIG_CGROUPS.

Ingo, Peter, what do you guys think?  If you're okay with the changes,
how do you want to route the patches?  Given that they're mostly
cgroup-specific and more cgroup changes depend on them, it'd be
easiest to route them through the cgroup tree, but putting them
elsewhere and pulling them into cgroup is fine too.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread
[parent not found: <20150518163440.GA24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem
  [not found]     ` <20150518163440.GA24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2015-05-18 20:06       ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-05-18 20:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, May 18, 2015 at 12:34:40PM -0400, Tejun Heo wrote:
> On Wed, May 13, 2015 at 04:35:15PM -0400, Tejun Heo wrote:
> > Threadgroup locking was added because cgroup needs threadgroups to
> > stay stable across attach operations.  It was implemented as a generic
> > per-signal_struct locking mechanism so that other users which need
> > threadgroups to stay stable across blocking operations could use it
> > too; however, it hasn't grown any other use cases and is still
> > conditional on CONFIG_CGROUPS.
>
> Ingo, Peter, what do you guys think?

I had a brief look and didn't spot anything really weird.  I'll try
and give it a little more time tomorrow.

On routing, I think you can take it through the cgroup tree if Ingo is
OK with that.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem
  [not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2015-05-13 20:35   ` [PATCH 1/3] sched, cgroup: reorganize threadgroup locking Tejun Heo
  2015-05-18 16:34   ` [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
@ 2015-05-27  0:34   ` Tejun Heo
  2 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2015-05-27 0:34 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, May 13, 2015 at 04:35:15PM -0400, Tejun Heo wrote:
> Threadgroup locking was added because cgroup needs threadgroups to
> stay stable across attach operations.  It was implemented as a generic
> per-signal_struct locking mechanism so that other users which need
> threadgroups to stay stable across blocking operations could use it
> too; however, it hasn't grown any other use cases and is still
> conditional on CONFIG_CGROUPS.

Applying to cgroups/for-4.2.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  2015-05-13 20:35 [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
  [not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2015-05-13 20:35 ` Tejun Heo
  2015-05-19 15:16   ` Peter Zijlstra
  2015-05-13 20:35 ` [PATCH 3/3] cgroup: simplify threadgroup locking Tejun Heo
  2 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2015-05-13 20:35 UTC (permalink / raw)
  To: lizefan; +Cc: cgroups, mingo, peterz, linux-kernel, Tejun Heo

The cgroup side of threadgroup locking uses signal_struct->group_rwsem
to synchronize against threadgroup changes.  This per-process rwsem
adds small overhead to thread creation, exit and exec paths, forces
cgroup code paths to do a lock-verify-unlock-retry dance in a couple
of places, and makes it impossible to atomically perform operations
across multiple processes.

This patch replaces signal_struct->group_rwsem with a global
percpu_rwsem, cgroup_threadgroup_rwsem, which is cheaper on the reader
side and contained in cgroup proper.  The conversion is one-to-one.

This does make the writer side heavier and lowers the granularity;
however, cgroup process migration is a fairly cold path, we do want to
optimize thread operations over it, and cgroup migration operations
don't take enough time for the lower granularity to matter.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/cgroup-defs.h | 27 ++++++++++++++--
 include/linux/init_task.h   |  8 -----
 include/linux/sched.h       | 12 -------
 init/Kconfig                |  1 +
 kernel/cgroup.c             | 77 ++++++++++++---------------------------------
 kernel/fork.c               |  4 ---
 6 files changed, 46 insertions(+), 83 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1b8c938..7d83d7f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -461,8 +461,31 @@ struct cgroup_subsys {
        unsigned int depends_on;
 };

-void cgroup_threadgroup_change_begin(struct task_struct *tsk);
-void cgroup_threadgroup_change_end(struct task_struct *tsk);
+extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+
+/**
+ * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
+ * @tsk: target task
+ *
+ * Called from threadgroup_change_begin() and allows cgroup operations to
+ * synchronize against threadgroup changes using a percpu_rw_semaphore.
+ */
+static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
+{
+       percpu_down_read(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * cgroup_threadgroup_change_end - threadgroup exclusion for cgroups
+ * @tsk: target task
+ *
+ * Called from threadgroup_change_end().  Counterpart of
+ * cgroup_threadcgroup_change_begin().
+ */
+static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
+{
+       percpu_up_read(&cgroup_threadgroup_rwsem);
+}

 #else  /* CONFIG_CGROUPS */

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 696d223..0cc0bbf 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -25,13 +25,6 @@ extern struct files_struct init_files;
 extern struct fs_struct init_fs;

-#ifdef CONFIG_CGROUPS
-#define INIT_GROUP_RWSEM(sig)                                          \
-       .group_rwsem = __RWSEM_INITIALIZER(sig.group_rwsem),
-#else
-#define INIT_GROUP_RWSEM(sig)
-#endif
-
 #ifdef CONFIG_CPUSETS
 #define INIT_CPUSET_SEQ(tsk)                                           \
        .mems_allowed_seq = SEQCNT_ZERO(tsk.mems_allowed_seq),
@@ -56,7 +49,6 @@ extern struct fs_struct init_fs;
        },                                                              \
        .cred_guard_mutex =                                             \
                 __MUTEX_INITIALIZER(sig.cred_guard_mutex),             \
-       INIT_GROUP_RWSEM(sig)                                           \
 }

 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5ee2900..add524a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -743,18 +743,6 @@ struct signal_struct {
        unsigned audit_tty_log_passwd;
        struct tty_audit_buf *tty_audit_buf;
 #endif
-#ifdef CONFIG_CGROUPS
-       /*
-        * group_rwsem prevents new tasks from entering the threadgroup and
-        * member tasks from exiting,a more specifically, setting of
-        * PF_EXITING.  fork and exit paths are protected with this rwsem
-        * using threadgroup_change_begin/end().  Users which require
-        * threadgroup to remain stable should use threadgroup_[un]lock()
-        * which also takes care of exec path.  Currently, cgroup is the
-        * only user.
-        */
-       struct rw_semaphore group_rwsem;
-#endif

        oom_flags_t oom_flags;
        short oom_score_adj;            /* OOM kill score adjustment */
diff --git a/init/Kconfig b/init/Kconfig
index dc24dec..b9b824b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -938,6 +938,7 @@ config NUMA_BALANCING_DEFAULT_ENABLED
 menuconfig CGROUPS
        bool "Control Group support"
        select KERNFS
+       select PERCPU_RWSEM
        help
          This option adds support for grouping sets of processes together, for
          use with process control subsystems such as Cpusets, CFS, memory
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9309452..5435b9d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -46,6 +46,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/rwsem.h>
+#include <linux/percpu-rwsem.h>
 #include <linux/string.h>
 #include <linux/sort.h>
 #include <linux/kmod.h>
@@ -103,6 +104,8 @@ static DEFINE_SPINLOCK(cgroup_idr_lock);
  */
 static DEFINE_SPINLOCK(release_agent_path_lock);

+struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+
 #define cgroup_assert_mutex_or_rcu_locked()                            \
        rcu_lockdep_assert(rcu_read_lock_held() ||                      \
                           lockdep_is_held(&cgroup_mutex),              \
@@ -848,48 +851,6 @@ static struct css_set *find_css_set(struct css_set *old_cset,
        return cset;
 }

-void cgroup_threadgroup_change_begin(struct task_struct *tsk)
-{
-       down_read(&tsk->signal->group_rwsem);
-}
-
-void cgroup_threadgroup_change_end(struct task_struct *tsk)
-{
-       up_read(&tsk->signal->group_rwsem);
-}
-
-/**
- * threadgroup_lock - lock threadgroup
- * @tsk: member task of the threadgroup to lock
- *
- * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
- * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
- * change ->group_leader/pid.  This is useful for cases where the threadgroup
- * needs to stay stable across blockable operations.
- *
- * fork and exit explicitly call threadgroup_change_{begin|end}() for
- * synchronization.  While held, no new task will be added to threadgroup
- * and no existing live task will have its PF_EXITING set.
- *
- * de_thread() does threadgroup_change_{begin|end}() when a non-leader
- * sub-thread becomes a new leader.
- */
-static void threadgroup_lock(struct task_struct *tsk)
-{
-       down_write(&tsk->signal->group_rwsem);
-}
-
-/**
- * threadgroup_unlock - unlock threadgroup
- * @tsk: member task of the threadgroup to unlock
- *
- * Reverse threadgroup_lock().
- */
-static inline void threadgroup_unlock(struct task_struct *tsk)
-{
-       up_write(&tsk->signal->group_rwsem);
-}
-
 static struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root)
 {
        struct cgroup *root_cgrp = kf_root->kn->priv;
@@ -2094,9 +2055,9 @@ static void cgroup_task_migrate(struct cgroup *old_cgrp,
        lockdep_assert_held(&css_set_rwsem);

        /*
-        * We are synchronized through threadgroup_lock() against PF_EXITING
-        * setting such that we can't race against cgroup_exit() changing the
-        * css_set to init_css_set and dropping the old one.
+        * We are synchronized through cgroup_threadgroup_rwsem against
+        * PF_EXITING setting such that we can't race against cgroup_exit()
+        * changing the css_set to init_css_set and dropping the old one.
         */
        WARN_ON_ONCE(tsk->flags & PF_EXITING);
        old_cset = task_css_set(tsk);
@@ -2153,10 +2114,11 @@ static void cgroup_migrate_finish(struct list_head *preloaded_csets)
  * @src_cset and add it to @preloaded_csets, which should later be cleaned
  * up by cgroup_migrate_finish().
  *
- * This function may be called without holding threadgroup_lock even if the
- * target is a process.  Threads may be created and destroyed but as long
- * as cgroup_mutex is not dropped, no new css_set can be put into play and
- * the preloaded css_sets are guaranteed to cover all migrations.
+ * This function may be called without holding cgroup_threadgroup_rwsem
+ * even if the target is a process.  Threads may be created and destroyed
+ * but as long as cgroup_mutex is not dropped, no new css_set can be put
+ * into play and the preloaded css_sets are guaranteed to cover all
+ * migrations.
  */
 static void cgroup_migrate_add_src(struct css_set *src_cset,
                                   struct cgroup *dst_cgrp,
@@ -2259,7 +2221,7 @@ err:
  * @threadgroup: whether @leader points to the whole process or a single task
  *
  * Migrate a process or task denoted by @leader to @cgrp.  If migrating a
- * process, the caller must be holding threadgroup_lock of @leader.  The
+ * process, the caller must be holding cgroup_threadgroup_rwsem.  The
  * caller is also responsible for invoking cgroup_migrate_add_src() and
  * cgroup_migrate_prepare_dst() on the targets before invoking this
  * function and following up with cgroup_migrate_finish().
@@ -2387,7 +2349,7 @@ out_release_tset:
  * @leader: the task or the leader of the threadgroup to be attached
  * @threadgroup: attach the whole threadgroup?
  *
- * Call holding cgroup_mutex and threadgroup_lock of @leader.
+ * Call holding cgroup_mutex and cgroup_threadgroup_rwsem.
  */
 static int cgroup_attach_task(struct cgroup *dst_cgrp,
                              struct task_struct *leader, bool threadgroup)
@@ -2480,7 +2442,7 @@ retry_find_task:
        get_task_struct(tsk);
        rcu_read_unlock();

-       threadgroup_lock(tsk);
+       percpu_down_write(&cgroup_threadgroup_rwsem);
        if (threadgroup) {
                if (!thread_group_leader(tsk)) {
                        /*
@@ -2490,7 +2452,7 @@ retry_find_task:
                         * try again; this is
                         * "double-double-toil-and-trouble-check locking".
                         */
-                       threadgroup_unlock(tsk);
+                       percpu_up_write(&cgroup_threadgroup_rwsem);
                        put_task_struct(tsk);
                        goto retry_find_task;
                }
@@ -2498,7 +2460,7 @@ retry_find_task:

        ret = cgroup_attach_task(cgrp, tsk, threadgroup);

-       threadgroup_unlock(tsk);
+       percpu_up_write(&cgroup_threadgroup_rwsem);

        put_task_struct(tsk);
 out_unlock_cgroup:
@@ -2703,17 +2665,17 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
                        goto out_finish;
                last_task = task;

-               threadgroup_lock(task);
+               percpu_down_write(&cgroup_threadgroup_rwsem);
                /* raced against de_thread() from another thread? */
                if (!thread_group_leader(task)) {
-                       threadgroup_unlock(task);
+                       percpu_up_write(&cgroup_threadgroup_rwsem);
                        put_task_struct(task);
                        continue;
                }

                ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);

-               threadgroup_unlock(task);
+               percpu_up_write(&cgroup_threadgroup_rwsem);
                put_task_struct(task);

                if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
@@ -5031,6 +4993,7 @@ int __init cgroup_init(void)
        unsigned long key;
        int ssid, err;

+       BUG_ON(percpu_init_rwsem(&cgroup_threadgroup_rwsem));
        BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
        BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));

diff --git a/kernel/fork.c b/kernel/fork.c
index 03c1eaa..9531275 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1144,10 +1144,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
        tty_audit_fork(sig);
        sched_autogroup_fork(sig);

-#ifdef CONFIG_CGROUPS
-       init_rwsem(&sig->group_rwsem);
-#endif
-
        sig->oom_score_adj = current->signal->oom_score_adj;
        sig->oom_score_adj_min = current->signal->oom_score_adj_min;
--
2.1.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  2015-05-13 20:35 ` [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem Tejun Heo
@ 2015-05-19 15:16   ` Peter Zijlstra
  [not found]     ` <20150519151659.GF3644-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-05-19 15:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, cgroups, mingo, linux-kernel

On Wed, May 13, 2015 at 04:35:17PM -0400, Tejun Heo wrote:

.gitconfig:

[diff "default"]
	xfuncname = "^[[:alpha:]$_].*[^:]$"

Will avoid keying on labels like that and show us this is
__cgroup_procs_write().

> @@ -2480,7 +2442,7 @@ retry_find_task:
> 	get_task_struct(tsk);
> 	rcu_read_unlock();
>
> -	threadgroup_lock(tsk);
> +	percpu_down_write(&cgroup_threadgroup_rwsem);
> 	if (threadgroup) {
> 		if (!thread_group_leader(tsk)) {
> 			/*
> @@ -2490,7 +2452,7 @@ retry_find_task:
> 			 * try again; this is
> 			 * "double-double-toil-and-trouble-check locking".
> 			 */
> -			threadgroup_unlock(tsk);
> +			percpu_up_write(&cgroup_threadgroup_rwsem);
> 			put_task_struct(tsk);
> 			goto retry_find_task;
> 		}
> @@ -2703,17 +2665,17 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
> 		goto out_finish;
> 	last_task = task;
>
> -	threadgroup_lock(task);
> +	percpu_down_write(&cgroup_threadgroup_rwsem);
> 	/* raced against de_thread() from another thread? */
> 	if (!thread_group_leader(task)) {
> -		threadgroup_unlock(task);
> +		percpu_up_write(&cgroup_threadgroup_rwsem);
> 		put_task_struct(task);
> 		continue;
> 	}
>
> 	ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
>
> -	threadgroup_unlock(task);
> +	percpu_up_write(&cgroup_threadgroup_rwsem);
> 	put_task_struct(task);
>
> 	if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))

So my only worry with this patch set is that these operations will be
hugely expensive.

Now, it looks like the cgroup_update_dfl_csses() thing is very rare;
it happens when you change which controllers are active in a given
subtree under the uber-l337-super-comount design.

The other one, __cgroup_procs_write(), is every /procs, /tasks write
to a cgroup, and that does worry me; this could be a somewhat common
thing.

The Changelog states task migration is a cold path, but is tens of
milliseconds per task really no problem?

^ permalink raw reply	[flat|nested] 14+ messages in thread
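For reference, that xfuncname setting only changes which enclosing
line git prints after the @@ markers of a hunk header.  An approximate
illustration (display shape assumed, not copied from a real run): with
git's default C rules a label at column zero can win,

  @@ -2480,7 +2442,7 @@ retry_find_task:

while with the pattern above, which rejects lines ending in ':', git
keys on the enclosing function instead,

  @@ -2480,7 +2442,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,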
[parent not found: <20150519151659.GF3644-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>]
* Re: [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  [not found]     ` <20150519151659.GF3644-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2015-05-19 15:51       ` Tejun Heo
  2015-05-20 10:05         ` Zefan Li
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2015-05-19 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello, Peter.

On Tue, May 19, 2015 at 05:16:59PM +0200, Peter Zijlstra wrote:
> .gitconfig:
>
> [diff "default"]
> 	xfuncname = "^[[:alpha:]$_].*[^:]$"
>
> Will avoid keying on labels like that and show us this is
> __cgroup_procs_write().

Ah, nice trick.

> So my only worry with this patch set is that these operations will be
> hugely expensive.
>
> Now, it looks like the cgroup_update_dfl_csses() thing is very rare;
> it happens when you change which controllers are active in a given
> subtree under the uber-l337-super-comount design.
>
> The other one, __cgroup_procs_write(), is every /procs, /tasks write
> to a cgroup, and that does worry me; this could be a somewhat common
> thing.
>
> The Changelog states task migration is a cold path, but is tens of
> milliseconds per task really no problem?

The latency is bound by synchronize_sched_expedited().  Given the way
cgroups are used in the majority of setups (process migration
happening only during service / session setup), I think this should be
okay.

I agree that something closer to lglock in characteristics would fit
the workload better, tho.  If this actually becomes a problem, we can
come up with a different percpu locking scheme which puts a bit more
overhead on the reader side to reduce the latency / overhead on the
writer side.  That shouldn't be too difficult, but let's see whether
we need to get there at all.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread
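To make the cost structure above concrete, a simplified model of the
writer side follows.  The helper names are hypothetical and this is
not the kernel's actual percpu-rwsem implementation, only the shape of
where the latency comes from:

/* Simplified model of percpu_down_write(); helper names are made up. */
void percpu_down_write_model(struct percpu_rw_semaphore *sem)
{
        /* stop new readers from taking the per-cpu fast path */
        block_new_fast_path_readers(sem);       /* hypothetical helper */

        /*
         * Wait until every CPU is guaranteed to observe the switch
         * away from the fast path.  This grace-period wait is the
         * synchronize_sched_expedited() latency discussed above.
         */
        synchronize_sched_expedited();

        /* let pre-existing fast-path readers drain out */
        wait_for_remaining_readers(sem);        /* hypothetical helper */

        /* finally exclude slow-path readers via the underlying rwsem */
        down_write(&sem->rw_sem);
}

The reader side, by contrast, is a per-cpu counter update in the
common case, which is why fork/exit/exec get cheaper with this
conversion.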
* Re: [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  2015-05-19 15:51       ` Tejun Heo
@ 2015-05-20 10:05         ` Zefan Li
  [not found]           ` <555C5C71.80200-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Zefan Li @ 2015-05-20 10:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, cgroups, mingo, linux-kernel

On 2015/5/19 23:51, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, May 19, 2015 at 05:16:59PM +0200, Peter Zijlstra wrote:
>> .gitconfig:
>>
>> [diff "default"]
>> 	xfuncname = "^[[:alpha:]$_].*[^:]$"
>>
>> Will avoid keying on labels like that and show us this is
>> __cgroup_procs_write().
>
> Ah, nice trick.
>
>> So my only worry with this patch set is that these operations will be
>> hugely expensive.
>>
>> Now, it looks like the cgroup_update_dfl_csses() thing is very rare;
>> it happens when you change which controllers are active in a given
>> subtree under the uber-l337-super-comount design.
>>
>> The other one, __cgroup_procs_write(), is every /procs, /tasks write
>> to a cgroup, and that does worry me; this could be a somewhat common
>> thing.
>>
>> The Changelog states task migration is a cold path, but is tens of
>> milliseconds per task really no problem?
>
> The latency is bound by synchronize_sched_expedited().  Given the way
> cgroups are used in the majority of setups (process migration
> happening only during service / session setup), I think this should be
> okay.

Actually process migration can happen quite frequently, for example in
Android phones, and that's why Google had an out-of-tree patch to
remove the synchronize_rcu() in that path, which turned out to be
buggy.

> I agree that something closer to lglock in characteristics would fit
> the workload better, tho.  If this actually becomes a problem, we can
> come up with a different percpu locking scheme which puts a bit more
> overhead on the reader side to reduce the latency / overhead on the
> writer side.  That shouldn't be too difficult, but let's see whether
> we need to get there at all.
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread
[parent not found: <555C5C71.80200-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  [not found]           ` <555C5C71.80200-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2015-05-21 20:39             ` Tejun Heo
  [not found]               ` <20150521203943.GS24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2015-05-21 20:39 UTC (permalink / raw)
  To: Zefan Li
  Cc: Peter Zijlstra, cgroups-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello, Li.

On Wed, May 20, 2015 at 06:05:37PM +0800, Zefan Li wrote:
> > The latency is bound by synchronize_sched_expedited().  Given the way
> > cgroups are used in the majority of setups (process migration
> > happening only during service / session setup), I think this should be
> > okay.
>
> Actually process migration can happen quite frequently, for example in
> Android phones, and that's why Google had an out-of-tree patch to
> remove the synchronize_rcu() in that path, which turned out to be
> buggy.

It's still not a very frequent operation, tho.  We're talking about
users switching fore/background jobs here, and the expedited
synchronization w/ preemption enabled doesn't take much time.  In
addition, as it currently stands, android does memory charge
immigration on each fore/background switch.  I'm pretty doubtful this
would make any difference.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread
[parent not found: <20150521203943.GS24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem [not found] ` <20150521203943.GS24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2015-05-24 2:35 ` Zefan Li 0 siblings, 0 replies; 14+ messages in thread From: Zefan Li @ 2015-05-24 2:35 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, cgroups-u79uwXL29TY76Z2rM5mHXA, mingo-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On 2015/5/22 4:39, Tejun Heo wrote: > Hello, Li. > > On Wed, May 20, 2015 at 06:05:37PM +0800, Zefan Li wrote: >>> The latency is bound by synchronize_sched_expedited(). Given the way >>> cgroups are used in majority of setups (process migration happening >>> only during service / session setups), I think this should be okay. >> >> Actually process migration can happen quite frequently, for example in >> Android phones, and that's why Google had an out-of-tree patch to remove >> the synchronize_rcu() in that path, which turned out to be buggy. > > It's still not a very frequent operation tho. We're talking about > users switching fore/background jobs here and the expedited > synchronization w/ preemption enabled doesn't take much time. In > addition, as it currently stands, android is doing memory charge > immigration on each fore/background switches. I'm pretty doubtful > this would make any difference. > I did some testing with my laptop. Moving a task between 2 cgroups for 10W times with one or two threads: 1T 2T orig 3.36s 3.65s orig+tj 3.55s 6.31s orig+sync_rcu 16.69s 28.47s (only 1000 times) The overhead looks acceptable. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] cgroup: simplify threadgroup locking
  2015-05-13 20:35 [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
  [not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2015-05-13 20:35 ` [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem Tejun Heo
@ 2015-05-13 20:35 ` Tejun Heo
  2 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2015-05-13 20:35 UTC (permalink / raw)
  To: lizefan; +Cc: cgroups, mingo, peterz, linux-kernel, Tejun Heo

Now that threadgroup locking is global, the code paths around it can
be simplified:

* The lock-verify-unlock-retry dance is removed from
  __cgroup_procs_write().

* Race protection against de_thread() is removed from
  cgroup_update_dfl_csses().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 48 +++++++++++++-----------------------------------
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5435b9d..afe9a7e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2401,14 +2401,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
        if (!cgrp)
                return -ENODEV;

-retry_find_task:
+       percpu_down_write(&cgroup_threadgroup_rwsem);
        rcu_read_lock();
        if (pid) {
                tsk = find_task_by_vpid(pid);
                if (!tsk) {
-                       rcu_read_unlock();
                        ret = -ESRCH;
-                       goto out_unlock_cgroup;
+                       goto out_unlock_rcu;
                }
                /*
                 * even if we're attaching all tasks in the thread group, we
@@ -2418,9 +2417,8 @@ retry_find_task:
                if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
                    !uid_eq(cred->euid, tcred->uid) &&
                    !uid_eq(cred->euid, tcred->suid)) {
-                       rcu_read_unlock();
                        ret = -EACCES;
-                       goto out_unlock_cgroup;
+                       goto out_unlock_rcu;
                }
        } else
                tsk = current;
@@ -2435,35 +2433,21 @@ retry_find_task:
         */
        if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) {
                ret = -EINVAL;
-               rcu_read_unlock();
-               goto out_unlock_cgroup;
+               goto out_unlock_rcu;
        }

        get_task_struct(tsk);
        rcu_read_unlock();

-       percpu_down_write(&cgroup_threadgroup_rwsem);
-       if (threadgroup) {
-               if (!thread_group_leader(tsk)) {
-                       /*
-                        * a race with de_thread from another thread's exec()
-                        * may strip us of our leadership, if this happens,
-                        * there is no choice but to throw this task away and
-                        * try again; this is
-                        * "double-double-toil-and-trouble-check locking".
-                        */
-                       percpu_up_write(&cgroup_threadgroup_rwsem);
-                       put_task_struct(tsk);
-                       goto retry_find_task;
-               }
-       }
-
        ret = cgroup_attach_task(cgrp, tsk, threadgroup);

-       percpu_up_write(&cgroup_threadgroup_rwsem);
-
        put_task_struct(tsk);
-out_unlock_cgroup:
+       goto out_unlock_threadgroup;
+
+out_unlock_rcu:
+       rcu_read_unlock();
+out_unlock_threadgroup:
+       percpu_up_write(&cgroup_threadgroup_rwsem);
        cgroup_kn_unlock(of->kn);
        return ret ?: nbytes;
 }
@@ -2610,6 +2594,8 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)

        lockdep_assert_held(&cgroup_mutex);

+       percpu_down_write(&cgroup_threadgroup_rwsem);
+
        /* look up all csses currently attached to @cgrp's subtree */
        down_read(&css_set_rwsem);
        css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
@@ -2665,17 +2651,8 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
                        goto out_finish;
                last_task = task;

-               percpu_down_write(&cgroup_threadgroup_rwsem);
-               /* raced against de_thread() from another thread? */
-               if (!thread_group_leader(task)) {
-                       percpu_up_write(&cgroup_threadgroup_rwsem);
-                       put_task_struct(task);
-                       continue;
-               }
-
                ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);

-               percpu_up_write(&cgroup_threadgroup_rwsem);
                put_task_struct(task);

                if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
@@ -2685,6 +2662,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)

 out_finish:
        cgroup_migrate_finish(&preloaded_csets);
+       percpu_up_write(&cgroup_threadgroup_rwsem);
        return ret;
 }
--
2.1.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
end of thread, other threads: [~2015-05-27  0:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13 20:35 [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
[not found] ` <1431549318-16756-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-05-13 20:35 ` [PATCH 1/3] sched, cgroup: reorganize threadgroup locking Tejun Heo
[not found] ` <1431549318-16756-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-05-14 1:09 ` Sergey Senozhatsky
2015-05-14 15:17 ` Tejun Heo
2015-05-18 16:34 ` [PATCHSET] cgroup, sched: restructure threadgroup locking and replace it with a percpu_rwsem Tejun Heo
[not found] ` <20150518163440.GA24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2015-05-18 20:06 ` Peter Zijlstra
2015-05-27 0:34 ` Tejun Heo
2015-05-13 20:35 ` [PATCH 2/3] sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem Tejun Heo
2015-05-19 15:16 ` Peter Zijlstra
[not found] ` <20150519151659.GF3644-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2015-05-19 15:51 ` Tejun Heo
2015-05-20 10:05 ` Zefan Li
[not found] ` <555C5C71.80200-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2015-05-21 20:39 ` Tejun Heo
[not found] ` <20150521203943.GS24861-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2015-05-24 2:35 ` Zefan Li
2015-05-13 20:35 ` [PATCH 3/3] cgroup: simplify threadgroup locking Tejun Heo
This is a public inbox; see mirroring instructions for how to clone and mirror all data and code used for this inbox, as well as URLs for NNTP newsgroup(s).