cgroups.vger.kernel.org archive mirror
* [PATCHSET] cgroup: allow dropping RCU read lock while iterating
@ 2013-05-21  1:50 Tejun Heo
From: Tejun Heo @ 2013-05-21  1:50 UTC
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Currently all cgroup iterators require the whole traversal to be
contained in a single RCU read critical section, which can be too
restrictive as there are times when blocking operations are necessary
during traversal.  This forces controllers to implement specific
workarounds in those cases: building a separate iteration list,
punting the actual operations to work items, and so on.

This patchset updates the cgroup iterators so that the RCU read lock
can be dropped while an iteration is in progress, so controllers which
need to sleep during iteration no longer have to implement their own
mechanisms.

Dropping the RCU read lock during iteration is unsafe because
cgroup->sibling.next can't be trusted once the RCU read lock is
dropped.  The sibling list is an RCU list, and when a cgroup is
removed its next pointer is retained to keep RCU traversal working.
If the next sibling is also removed while the RCU read lock is
dropped, the removed current cgroup's next pointer won't be updated,
and the next sibling may complete its grace period and get freed,
leaving the next pointer dangling.

Working around the problem is relatively simple.  Whether
->sibling.next can be trusted can be decided by looking at
CGRP_REMOVED: as cgroup removals are fully serialized, the flag is
guaranteed to be visible before the next sibling finishes its grace
period.  For the cases where it can't be trusted, each cgroup is
assigned a monotonically increasing serial number.  Because new
cgroups are always appended to the children list, every children list
is guaranteed to be sorted in ascending order of serial numbers.  When
the next pointer can't be trusted, the next sibling can be located by
walking the parent's children list from the beginning, looking for the
first cgroup with a higher serial number.
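The lookup described above can be modeled in plain userspace C.  This
is a minimal sketch under stated assumptions, not the kernel's code:
struct node, node_init(), node_remove() and next_sibling() are
hypothetical stand-ins for struct cgroup, cgroup creation, cgroup
removal and cgroup_next_sibling(); the `removed` flag stands in for
CGRP_REMOVED; and RCU, locking and freeing are not modeled at all.
Removal unlinks a node but, like an RCU list deletion, leaves the
node's own next pointer untouched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a cgroup; not the kernel's struct cgroup. */
struct node {
	uint64_t serial_nr;		/* assigned at creation, monotonically increasing */
	bool removed;			/* stands in for CGRP_REMOVED */
	struct node *parent;
	struct node *first_child;	/* head of the children list */
	struct node *next;		/* sibling link, left intact on removal */
};

static uint64_t next_serial_nr = 1;

/* Create @n under @parent (NULL for the root).  New nodes are appended
 * at the tail, so every children list stays sorted by serial_nr. */
static void node_init(struct node *n, struct node *parent)
{
	struct node **link;

	*n = (struct node){ .serial_nr = next_serial_nr++, .parent = parent };
	if (!parent)
		return;
	link = &parent->first_child;
	while (*link)
		link = &(*link)->next;
	*link = n;
}

/* Unlink @n from its parent's children list and mark it removed.  As with
 * an RCU list deletion, n->next keeps pointing at the old successor. */
static void node_remove(struct node *n)
{
	struct node **link = &n->parent->first_child;

	while (*link != n)
		link = &(*link)->next;
	*link = n->next;
	n->removed = true;
}

/* Find the sibling following @pos, whether or not @pos has been removed. */
static struct node *next_sibling(struct node *pos)
{
	struct node *c;

	/* @pos is still linked, so its next pointer is up to date */
	if (!pos->removed)
		return pos->next;

	/* pos->next may be stale: rescan the parent's sorted children list
	 * for the first live sibling created after @pos */
	for (c = pos->parent->first_child; c; c = c->next)
		if (c->serial_nr > pos->serial_nr)
			return c;
	return NULL;
}
```

For example, with siblings a, b and c created in that order, removing
a and then b leaves a.next pointing at the removed b, while
next_sibling(&a) still returns c.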

The above is implemented in cgroup_next_sibling(), and all iterators
are updated to use it to find the next sibling, thus allowing the RCU
read lock to be dropped while iteration is in progress.  This patchset
also replaces the separate iteration list in device_cgroup with a
direct descendant walk, and further patches making use of this update
will follow.
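To illustrate how an iterator can sit on top of such a lookup, here is
a pre-order descendant walk in the same hypothetical userspace model
(repeated so the block stands alone).  It is a sketch, not the kernel's
cgroup_next_descendant_pre(): all names are made up, RCU is not
modeled, and it assumes, as with cgroup rmdir, that only childless
nodes are removed.  Because each step needs only the current position,
a caller could drop and retake a lock between steps; if the current
node is removed in the meantime, next_sibling() still finds where to
continue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a cgroup tree; not the kernel's code. */
struct node {
	uint64_t serial_nr;
	bool removed;
	struct node *parent, *first_child, *next;
};

static uint64_t next_serial_nr = 1;

static void node_init(struct node *n, struct node *parent)
{
	struct node **link;

	*n = (struct node){ .serial_nr = next_serial_nr++, .parent = parent };
	if (!parent)
		return;
	for (link = &parent->first_child; *link; link = &(*link)->next)
		continue;
	*link = n;	/* append: children stay sorted by serial_nr */
}

/* Only childless nodes are removed; n->next is deliberately left stale. */
static void node_remove(struct node *n)
{
	struct node **link = &n->parent->first_child;

	while (*link != n)
		link = &(*link)->next;
	*link = n->next;
	n->removed = true;
}

static struct node *next_sibling(struct node *pos)
{
	struct node *c;

	if (!pos->removed)
		return pos->next;
	for (c = pos->parent->first_child; c; c = c->next)
		if (c->serial_nr > pos->serial_nr)
			return c;
	return NULL;
}

/* Return the node visited after @pos in a pre-order walk of @root's
 * descendants (@root itself is not visited).  Start with @pos == NULL;
 * NULL is returned once the walk is complete. */
static struct node *next_descendant_pre(struct node *pos, struct node *root)
{
	struct node *next;

	if (!pos)
		pos = root;	/* first call: pretend @root was just visited */

	/* visit the first child, if there is one */
	if (pos->first_child)
		return pos->first_child;

	/* no child: move to the next sibling of @pos or of its closest
	 * ancestor below @root that has one */
	while (pos != root) {
		next = next_sibling(pos);
		if (next)
			return next;
		pos = pos->parent;
	}
	return NULL;
}
```

With children a (which has child a1), b, c and d under the root, a walk
parked on b survives the removal of b and then c and continues at d,
whereas naively following b.next would land on the removed c, which in
the kernel may already have been freed.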

This patchset contains the following five patches.

 0001-cgroup-fix-a-subtle-bug-in-descendant-pre-order-walk.patch
 0002-cgroup-make-cgroup_is_removed-static.patch
 0003-cgroup-add-cgroup-serial_nr-and-implement-cgroup_nex.patch
 0004-cgroup-update-iterators-to-use-cgroup_next_sibling.patch
 0005-device_cgroup-simplify-cgroup-tree-walk-in-propagate.patch

0001 fixes a subtle iteration bug.  It will be applied to for-3.10-fixes.

0002 is a trivial prep patch.

0003 implements cgroup_next_sibling(), which can find the next sibling
regardless of the state of the current cgroup.

0004 updates all iterators to use cgroup_next_sibling().

0005 replaces the iteration-list workaround in device_cgroup with
direct iteration.

This patchset is on top of cgroup/for-3.11 23958e729e ("cgroup.h:
remove some functions that are now gone") and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-interruptible-iter

diffstat follows.

 include/linux/cgroup.h   |   31 +++++++++++---
 kernel/cgroup.c          |   98 ++++++++++++++++++++++++++++++++++++++++-------
 security/device_cgroup.c |   56 ++++++++------------------
 3 files changed, 128 insertions(+), 57 deletions(-)

Thanks.

--
tejun

Thread overview: 21+ messages
2013-05-21  1:50 [PATCHSET] cgroup: allow dropping RCU read lock while iterating Tejun Heo
2013-05-21  1:50   ` [PATCH 1/5] cgroup: fix a subtle bug in descendant pre-order walk Tejun Heo
2013-05-22 18:22       ` Michal Hocko
2013-05-24  1:51       ` Tejun Heo
2013-05-21  1:50   ` [PATCH 2/5] cgroup: make cgroup_is_removed() static Tejun Heo
2013-05-24  1:56       ` Tejun Heo
2013-05-24  3:32           ` Li Zefan
2013-05-21  1:50   ` [PATCH 3/5] cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() Tejun Heo
2013-05-21 14:33       ` Serge Hallyn
2013-05-22 14:36       ` Aristeu Rozanski
2013-05-22 14:38           ` Tejun Heo
2013-05-22 18:41       ` Michal Hocko
2013-05-21  1:50   ` [PATCH 4/5] cgroup: update iterators to use cgroup_next_sibling() Tejun Heo
2013-05-21 22:31       ` Serge Hallyn
2013-05-22  9:09       ` Li Zefan
2013-05-22  9:17           ` Tejun Heo
2013-05-22 18:46       ` Michal Hocko
2013-05-21  1:50   ` [PATCH 5/5] device_cgroup: simplify cgroup tree walk in propagate_exception() Tejun Heo
2013-05-21 22:35       ` Serge Hallyn
2013-05-21  3:20   ` [PATCHSET] cgroup: allow dropping RCU read lock while iterating Tejun Heo
2013-05-22 14:53   ` Aristeu Rozanski
