From: Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
mhocko-AlSwsSmVLrQ@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCHSET] cgroup: allow dropping RCU read lock while iterating
Date: Wed, 22 May 2013 10:53:17 -0400
Message-ID: <20130522145316.GD16739@redhat.com>
In-Reply-To: <1369101025-28335-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Hi Tejun,
On Tue, May 21, 2013 at 10:50:20AM +0900, Tejun Heo wrote:
> Currently all cgroup iterators require the whole traversal to be
> contained in a single RCU read critical section, which can be too
> restrictive as there are times when blocking operations are necessary
> during traversal. This forces controllers to implement specific
> workarounds in those cases - building separate iteration list, punting
> actual operations to work items and so on.
>
> This patchset updates cgroup iterators so that they allow dropping RCU
> read lock while iteration is in progress so that controllers which
> require sleeping during iteration don't need to implement their own
> mechanisms.
>
> Dropping RCU read lock during iteration is unsafe because
> cgroup->sibling.next can't be trusted once RCU read lock is dropped.
> The sibling list is an RCU list and when a cgroup is removed the next
> pointer is retained to keep RCU traversal working. If the next
> sibling is removed while RCU read lock is dropped, the removed current
> cgroup's next won't be updated and the next sibling may complete its
> grace period and get freed leaving the next pointer dangling.
>
> Working around the problem is relatively simple. Whether
> ->sibling.next can be trusted can be decided by looking
> at CGRP_REMOVED - as cgroup removals are fully serialized, the flag is
> guaranteed to be visible before the next sibling finishes its grace
> period. For those cases, each cgroup is assigned a monotonically
> increasing serial number. Because new cgroups are always appended to
> the children list, it's guaranteed that all children lists are sorted
> in the ascending order of the serial numbers. When the next pointer
> can't be trusted, the next sibling can be located by walking the
> parent's children list from the beginning looking for the first cgroup
> with higher serial number.
>
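
For reference, the fallback lookup described above can be sketched in
user-space C roughly as follows. This is only an illustration under
stated assumptions: struct node, node_add(), node_remove() and
node_next_sibling() are hypothetical names and not the code from patch
0003, the "removed" field merely stands in for CGRP_REMOVED, and there
is no RCU or locking here at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct cgroup. */
struct node {
	uint64_t serial_nr;        /* monotonically increasing, set at creation */
	bool removed;              /* stands in for CGRP_REMOVED */
	struct node *parent;
	struct node *next;         /* next sibling; left untouched on removal */
	struct node *first_child;
	struct node *last_child;
};

static uint64_t next_serial;

/* New children are always appended, so each list stays sorted by serial_nr. */
static void node_add(struct node *parent, struct node *child)
{
	*child = (struct node){ .serial_nr = ++next_serial, .parent = parent };
	assert(!parent->last_child ||
	       parent->last_child->serial_nr < child->serial_nr);
	if (parent->last_child)
		parent->last_child->next = child;
	else
		parent->first_child = child;
	parent->last_child = child;
}

/* Unlink child but keep child->next as it was, as an RCU list removal would. */
static void node_remove(struct node *child)
{
	struct node *parent = child->parent;
	struct node *prev = NULL, *cur;

	for (cur = parent->first_child; cur && cur != child; cur = cur->next)
		prev = cur;
	if (!cur)
		return;            /* not on the list, e.g. already removed */
	if (prev)
		prev->next = child->next;
	else
		parent->first_child = child->next;
	if (parent->last_child == child)
		parent->last_child = prev;
	child->removed = true;
}

/* Find the next sibling without trusting a removed node's stale ->next. */
static struct node *node_next_sibling(struct node *pos)
{
	struct node *cur;

	if (!pos->removed)
		return pos->next;  /* still linked, ->next can be trusted */

	/* Rescan from the head for the first sibling with a higher serial. */
	for (cur = pos->parent->first_child; cur; cur = cur->next)
		if (cur->serial_nr > pos->serial_nr)
			return cur;
	return NULL;
}
```

With siblings a, b and c added in that order, removing a and then b
leaves a's next pointer still referring to b, yet node_next_sibling()
on a returns c because the rescan skips every serial number that is not
higher than a's.
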
> The above is implemented in cgroup_next_sibling() and all iterators
> are updated to use it to find out the next sibling thus allowing
> dropping RCU read lock while iteration is in progress. This patchset
> replaces separate iteration list in device_cgroup with direct
> descendant walk and there will be further patches making use of this
> update.
>
> This patchset contains the following five patches.
>
> 0001-cgroup-fix-a-subtle-bug-in-descendant-pre-order-walk.patch
> 0002-cgroup-make-cgroup_is_removed-static.patch
> 0003-cgroup-add-cgroup-serial_nr-and-implement-cgroup_nex.patch
> 0004-cgroup-update-iterators-to-use-cgroup_next_sibling.patch
> 0005-device_cgroup-simplify-cgroup-tree-walk-in-propagate.patch
>
> 0001 fixes a subtle iteration bug. Will be applied to for-3.10-fixes.
>
> 0002 is a trivial prep patch.
>
> 0003 implements cgroup_next_sibling() which can find out the next
> sibling regardless of the state of the current cgroup.
>
> 0004 updates all iterators to use cgroup_next_sibling().
>
> 0005 replaces iteration list workaround in device_cgroup with direct
> iteration.
>
> This patchset is on top of cgroup/for-3.11 23958e729e ("cgroup.h:
> remove some functions that are now gone") and available in the
> following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-interruptible-iter
The patchset looks good to me. I ran some tests on a kernel with it applied and hit no problems.
Acked-by: Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
Aristeu