cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	Mikulas Patocka
	<mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Alasdair G. Kergon"
	<agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH 11/11] cgroup: use percpu refcnt for cgroup_subsys_states
Date: Thu, 13 Jun 2013 16:16:34 -0700	[thread overview]
Message-ID: <20130613231634.GD28664@moria.home.lan> (raw)
In-Reply-To: <1371096298-24402-12-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Wed, Jun 12, 2013 at 09:04:58PM -0700, Tejun Heo wrote:
> A css (cgroup_subsys_state) is how each cgroup is represented to a
> controller.  As such, it can be used in hot paths across the various
> subsystems different controllers are associated with.
> 
> One of the common operations is reference counting, which up until now
> has been implemented using a global atomic counter and can have
> significant adverse impact on scalability.  For example, css refcnt
> can be gotten and put multiple times by blkcg for each IO request.
> For highops configurations which try to do as much per-cpu as
> possible, the global frequent refcnting can be very expensive.
> 
> In general, given the various and hugely diverse paths css's end up
> being used from, we need to make it cheap and highly scalable.  In its
> usage, css refcnting isn't very different from module refcnting.
> 
> This patch converts css refcnting to use the recently added
> percpu_ref.  css_get/tryget/put() directly maps to the matching
> percpu_ref operations and the deactivation logic is no longer
> necessary as percpu_ref already has refcnt killing.
> 
> The only complication is that as the refcnt is per-cpu,
> percpu_ref_kill() in itself doesn't ensure that further tryget
> operations will fail, which we need to guarantee before invoking
> ->css_offline()'s.  This is resolved collecting kill confirmation
> using percpu_ref_kill_and_confirm() and initiating the offline phase
> of destruction after all css refcnt's are confirmed to be seen as
> killed on all CPUs.  The previous patches already splitted destruction
> into two phases, so percpu_ref_kill_and_confirm() can be hooked up
> easily.
> 
> This patch removes css_refcnt() which is used for rcu dereference
> sanity check in css_id().  While we can add a percpu refcnt API to ask
> the same question, css_id() itself is scheduled to be removed fairly
> soon, so let's not bother with it.  Just drop the sanity check and use
> rcu_dereference_raw() instead.
> 
> v2: - init_cgroup_css() was calling percpu_ref_init() without checking
>       the return value.  This causes two problems - the obvious lack
>       of error handling and percpu_ref_init() being called from
>       cgroup_init_subsys() before the allocators are up, which
>       triggers warnings but doesn't cause actual problems as the
>       refcnt isn't used for roots anyway.  Fix both by moving
>       percpu_ref_init() to cgroup_create().
> 
>     - The base references were put too early by
>       percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
>       refs one extra time.  This wasn't noticeable because css's go
>       through another RCU grace period before being freed.  Update
>       cgroup_destroy_locked() to grab an extra reference before
>       killing the refcnts.  This problem was noticed by Kent.

Reviewed-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

  parent reply	other threads:[~2013-06-13 23:16 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-13  4:04 [PATCHSET v2 cgroup/for-3.11] cgroup: convert cgroup_subsys_state refcnt to percpu_ref Tejun Heo
2013-06-13  4:04 ` [PATCH 05/11] cgroup: clean up css_[try]get() and css_put() Tejun Heo
     [not found] ` <1371096298-24402-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-06-13  4:04   ` [PATCH 01/11] cgroup: remove now unused css_depth() Tejun Heo
2013-06-13  4:04   ` [PATCH 02/11] cgroup: consistently use @cset for struct css_set variables Tejun Heo
2013-06-13  4:04   ` [PATCH 03/11] cgroup: bring some sanity to naming around cg_cgroup_link Tejun Heo
2013-06-13  4:04   ` [PATCH 04/11] cgroup: use kzalloc() instead of kmalloc() Tejun Heo
2013-06-13  4:04   ` [PATCH 06/11] cgroup: rename CGRP_REMOVED to CGRP_DEAD Tejun Heo
2013-06-13  4:04   ` [PATCH 07/11] cgroup: drop unnecessary RCU dancing from __put_css_set() Tejun Heo
2013-06-13  4:04   ` [PATCH 09/11] cgroup: reorder the operations in cgroup_destroy_locked() Tejun Heo
2013-06-13  6:04   ` [PATCHSET v2 cgroup/for-3.11] cgroup: convert cgroup_subsys_state refcnt to percpu_ref Li Zefan
     [not found]     ` <51B960FF.7070604-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-06-13 17:56       ` Tejun Heo
     [not found]         ` <20130613175627.GC10799-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-14  2:41           ` Tejun Heo
2013-06-13  4:04 ` [PATCH 08/11] cgroup: remove cgroup->count and use Tejun Heo
2013-06-13  4:04 ` [PATCH 10/11] cgroup: split cgroup destruction into two steps Tejun Heo
2013-06-13  4:04 ` [PATCH 11/11] cgroup: use percpu refcnt for cgroup_subsys_states Tejun Heo
     [not found]   ` <1371096298-24402-12-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-06-13 23:16     ` Kent Overstreet [this message]
2013-06-14 12:55     ` Michal Hocko
     [not found]       ` <20130614125539.GC10084-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-14 14:15         ` Glauber Costa
     [not found]           ` <20130614141510.GA19021-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2013-06-14 14:22             ` Michal Hocko
2013-06-14 13:20     ` Michal Hocko
     [not found]       ` <20130614132026.GD10084-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-14 22:31         ` Tejun Heo
     [not found]           ` <20130614223125.GD6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-15  5:35             ` Tejun Heo
     [not found]               ` <20130615053522.GA7017-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-15  5:39                 ` Tejun Heo
2013-06-15  6:31                 ` Tejun Heo
2013-06-17 13:27                 ` Michal Hocko
     [not found]                   ` <20130617132744.GB5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-17 17:16                     ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2013-06-12 21:03 [PATCHSET cgroup/for-3.11] cgroup: convert cgroup_subsys_state refcnt to percpu_ref Tejun Heo
2013-06-12 21:03 ` [PATCH 11/11] cgroup: use percpu refcnt for cgroup_subsys_states Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130613231634.GD28664@moria.home.lan \
    --to=koverstreet-hpiqsd4aklfqt0dzr+alfa@public.gmane.org \
    --cc=agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mhocko-AlSwsSmVLrQ@public.gmane.org \
    --cc=mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).