From: Tejun Heo <tj@kernel.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: Tejun Heo <theo@redhat.com>,
linux-kernel@vger.kernel.org,
Rusty Russell <rusty@rustcorp.com.au>,
Oleg Nesterov <oleg@redhat.com>,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()
Date: Wed, 12 Jun 2013 16:31:51 -0700 [thread overview]
Message-ID: <20130612233151.GA32700@mtj.dyndns.org> (raw)
In-Reply-To: <20130612214630.GH6151@google.com>
Hey,
On Wed, Jun 12, 2013 at 02:46:30PM -0700, Kent Overstreet wrote:
> I'm reading through the cgroup patch/code now - this refcounting is
> _hairy_ so I could certainly believe the way you've done it is the
> sanest way that'll work.
There are several different entities involved in the refcnting. It's
hairy.
> But it does feel like some of the ordering/ownership is backwards here,
> and that's where the need for confirm_kill is coming from - also, the
> fact that css_tryget() is used more than css_get() is... suspicious.
>
> Basically, you're wanting the lifetime of the subsystems to be
> controlled by the lifetime of the cgroup. If that's the case, then
> nothing should be taking refs on them (i'm not sure if that's actually
> the case) and they shouldn't be taking refs on the cgroup - the cgroup
> should kill them directly when its ref goes to 0.
The cgroup initiates destruction but controllers are free to hold on
to their part of cgroup (each css) which in turn should pin the cgroup
itself. Each css proxies cgroup for each controller. The reference
counting becomes nested but it doesn't change the overall mechanism.
> Of course, if they can't just be killed and they really do need
> independent lifetimes, then that's a problem - though embedding two
> refcounts in the cgroup might solve that (one for the subsystems, one
> for everything else).
The original intention of the nested refcnting probably is allowing
draining css's separately so that subsystems can be added and removed
from an existing hierarchy. That never worked out, so it could be
that we can collapse all the css refcnts into cgroup refcnt, which BTW
is currently using the dentry reference count. I'm fairly certain
that we can collapse it but that's a separate issue we can deal with
later.
Being collapsed or not doesn't really change the necessity for kill
confirming tho. More on the subject below.
> Uhm, cgroup_offline_fn() seems weird. First it's telling the subsystems
> to go away - but all we know is that tryget() is failing, there could
> still be outstanding refs. What am I missing? offline_css() doesn't
> appear to wait for any refs to hit 0.
offline_css() tells the controllers that the cgroup is being removed
and they should release whatever resources which could be pinning the
css and stop creating such new references, so that whatever in flight
finishes the reference could hit zero.
Some controllers depend on tryget reliably failing for not creating
lasting references, so that's where the requirement for guaranteeing
tryget failure comes in. Note that it has nothing to do with whether
the refcnts are collapsed into one or not. It's still the same deal.
You need to confirm that the one refcnt is visible as killed on all
CPUs before starting offlining.
> Then it says it's putting base refs... but base refs should be put with
> percpu_ref_kill(), so what refs are those really?
Ah, that's me forgetting that the base refs are already put and slab
poisoning missing it probably because the css object tends to still be
in RCU purgatory when the last extra put happens. Will fix.
> Then there's a list_del_rcu() - but that should probably happen _before_
> the call_rcu that percpu_ref_kill() does (in the absense of the bizzare
> cgroup stuff you need the list_del_rcu() or whatever to happen first, so
> that you know no new gets() can happen, then then after the call_rcu()
> it's safe to drop the base ref... if you drop it before the rcu barrier
> you can have a get happen after the ref already hit 0.)
No, there are two separate stages of RCU'ing going on here. One to
disable tryget for ->css_offline. The other to actually free the css
as css is expected to be RCU-read accessible during and after
->css_offline(). The percpu ref is adding an extra RCU stage.
> So maybe those lists just aren't used by anything that would take a ref?
> Or is there a barrier or something somewhere else that makes it safe?
The two RCU grace periods are just protecting different things and
they can't be combined.
Thanks.
--
tejun
next prev parent reply other threads:[~2013-06-12 23:31 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-12 20:45 [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates Tejun Heo
2013-06-12 20:46 ` [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() Tejun Heo
2013-06-12 21:08 ` Kent Overstreet
2013-06-12 21:17 ` Tejun Heo
2013-06-12 21:46 ` Kent Overstreet
2013-06-12 23:31 ` Tejun Heo [this message]
2013-06-12 23:34 ` Tejun Heo
2013-06-13 3:50 ` [PATCH v2 " Tejun Heo
2013-06-13 23:13 ` [PATCH " Kent Overstreet
2013-06-13 23:44 ` Kent Overstreet
2013-06-14 2:41 ` [PATCH v3 " Tejun Heo
2013-06-12 20:57 ` [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates Kent Overstreet
2013-06-12 20:59 ` Tejun Heo
2013-06-13 3:48 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130612233151.GA32700@mtj.dyndns.org \
--to=tj@kernel.org \
--cc=cl@linux-foundation.org \
--cc=koverstreet@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=rusty@rustcorp.com.au \
--cc=theo@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox