From: Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
mhocko-AlSwsSmVLrQ@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCHSET] cgroup: allow dropping RCU read lock while iterating
Date: Wed, 22 May 2013 10:53:17 -0400
Message-ID: <20130522145316.GD16739@redhat.com>
In-Reply-To: <1369101025-28335-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Hi Tejun,
On Tue, May 21, 2013 at 10:50:20AM +0900, Tejun Heo wrote:
> Currently all cgroup iterators require the whole traversal to be
> contained in a single RCU read critical section, which can be too
> restrictive as there are times when blocking operations are necessary
> during traversal. This forces controllers to implement specific
> workarounds in those cases - building a separate iteration list, punting
> actual operations to work items and so on.
>
> This patchset updates cgroup iterators so that they allow dropping RCU
> read lock while iteration is in progress so that controllers which
> require sleeping during iteration don't need to implement their own
> mechanisms.
>
> Dropping RCU read lock during iteration is unsafe because
> cgroup->sibling.next can't be trusted once RCU read lock is dropped.
> The sibling list is an RCU list and when a cgroup is removed the next
> pointer is retained to keep RCU traversal working. If the next
> sibling is removed while RCU read lock is dropped, the removed current
> cgroup's next won't be updated and the next sibling may complete its
> grace period and get freed leaving the next pointer dangling.
>
> Working around the problem is relatively simple. Whether
> ->sibling.next can be trusted can be decided by looking
> at CGRP_REMOVED - as cgroup removals are fully serialized, the flag is
> guaranteed to be visible before the next sibling finishes its grace
> period. For the cases where it can't be trusted, each cgroup is
> assigned a monotonically increasing serial number. Because new
> cgroups are always appended to the children list, it's guaranteed
> that every children list is sorted
> in the ascending order of the serial numbers. When the next pointer
> can't be trusted, the next sibling can be located by walking the
> parent's children list from the beginning looking for the first cgroup
> with higher serial number.
>
> The above is implemented in cgroup_next_sibling() and all iterators
> are updated to use it to find out the next sibling thus allowing
> dropping RCU read lock while iteration is in progress. This patchset
> replaces separate iteration list in device_cgroup with direct
> descendant walk and there will be further patches making use of this
> update.
>
> This patchset contains the following five patches.
>
> 0001-cgroup-fix-a-subtle-bug-in-descendant-pre-order-walk.patch
> 0002-cgroup-make-cgroup_is_removed-static.patch
> 0003-cgroup-add-cgroup-serial_nr-and-implement-cgroup_nex.patch
> 0004-cgroup-update-iterators-to-use-cgroup_next_sibling.patch
> 0005-device_cgroup-simplify-cgroup-tree-walk-in-propagate.patch
>
> 0001 fixes a subtle iteration bug. Will be applied to for-3.10-fixes.
>
> 0002 is a trivial prep patch.
>
> 0003 implements cgroup_next_sibling() which can find out the next
> sibling regardless of the state of the current cgroup.
>
> 0004 updates all iterators to use cgroup_next_sibling().
>
> 0005 replaces the iteration list workaround in device_cgroup with direct
> iteration.
>
> This patchset is on top of cgroup/for-3.11 23958e729e ("cgroup.h:
> remove some functions that are now gone") and available in the
> following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-interruptible-iter
The patchset looks good to me. I ran some tests on a kernel with it applied and hit no problems.
Acked-by: Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
Aristeu
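
The serial-number fallback described in the quoted cover letter can be
illustrated with a small userspace sketch. This is not the kernel's
cgroup_next_sibling(); struct node, struct parent, add_child(),
remove_child() and next_sibling() are hypothetical names made up for the
illustration, the "removed" field only stands in for CGRP_REMOVED, and
RCU, locking and memory reclaim are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cgroup; not the kernel's struct cgroup. */
struct node {
    uint64_t serial_nr;  /* monotonically increasing, set at creation */
    bool removed;        /* stands in for the CGRP_REMOVED flag */
    struct node *next;   /* next sibling; may be stale after removal */
};

/* Hypothetical parent holding the children list. */
struct parent {
    struct node *first_child;  /* sorted by ascending serial_nr */
    struct node *last_child;
    uint64_t next_serial;
};

/* New children are always appended, which keeps the list sorted. */
static void add_child(struct parent *p, struct node *n)
{
    n->serial_nr = p->next_serial++;
    n->removed = false;
    n->next = NULL;
    if (p->last_child)
        p->last_child->next = n;
    else
        p->first_child = n;
    p->last_child = n;
}

/* Unlink n but leave n->next untouched, as an RCU list removal would. */
static void remove_child(struct parent *p, struct node *n)
{
    struct node **link = &p->first_child;
    struct node *prev = NULL;

    while (*link && *link != n) {
        prev = *link;
        link = &(*link)->next;
    }
    assert(*link == n);  /* n must currently be on the list */
    *link = n->next;
    if (p->last_child == n)
        p->last_child = prev;
    n->removed = true;
}

/*
 * Find the sibling after pos. A live node's next pointer is trusted.
 * For a removed node, rescan the parent's list from the beginning and
 * return the first node with a higher serial number.
 */
static struct node *next_sibling(const struct parent *p,
                                 const struct node *pos)
{
    struct node *n;

    if (!pos->removed)
        return pos->next;

    for (n = p->first_child; n; n = n->next)
        if (n->serial_nr > pos->serial_nr)
            return n;
    return NULL;
}
```

With children a, b, c, d added in that order and b and c then removed,
b's next pointer still refers to c, but next_sibling() on b skips the
stale pointer and returns d, because d is the first live child whose
serial number is higher than b's.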