From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kent Overstreet
Subject: Re: [PATCH 11/11] cgroup: use percpu refcnt for cgroup_subsys_states
Date: Thu, 13 Jun 2013 16:16:34 -0700
Message-ID: <20130613231634.GD28664@moria.home.lan>
References: <1371096298-24402-1-git-send-email-tj@kernel.org> <1371096298-24402-12-git-send-email-tj@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1371096298-24402-12-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Tejun Heo
Cc: Jens Axboe, cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, Mike Snitzer, Glauber Costa, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Michal Hocko, Mikulas Patocka, "Alasdair G. Kergon", cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Vivek Goyal

On Wed, Jun 12, 2013 at 09:04:58PM -0700, Tejun Heo wrote:
> A css (cgroup_subsys_state) is how each cgroup is represented to a
> controller.
> As such, it can be used in hot paths across the various
> subsystems different controllers are associated with.
>
> One of the common operations is reference counting, which up until now
> has been implemented using a global atomic counter and can have
> significant adverse impact on scalability.  For example, css refcnt
> can be gotten and put multiple times by blkcg for each IO request.
> For highops configurations which try to do as much per-cpu as
> possible, the global frequent refcnting can be very expensive.
>
> In general, given the various and hugely diverse paths css's end up
> being used from, we need to make it cheap and highly scalable.  In its
> usage, css refcnting isn't very different from module refcnting.
>
> This patch converts css refcnting to use the recently added
> percpu_ref.  css_get/tryget/put() map directly to the matching
> percpu_ref operations and the deactivation logic is no longer
> necessary as percpu_ref already has refcnt killing.
>
> The only complication is that as the refcnt is per-cpu,
> percpu_ref_kill() in itself doesn't ensure that further tryget
> operations will fail, which we need to guarantee before invoking
> ->css_offline()'s.  This is resolved by collecting kill confirmation
> using percpu_ref_kill_and_confirm() and initiating the offline phase
> of destruction after all css refcnt's are confirmed to be seen as
> killed on all CPUs.  The previous patches already split destruction
> into two phases, so percpu_ref_kill_and_confirm() can be hooked up
> easily.
>
> This patch removes css_refcnt() which is used for rcu dereference
> sanity check in css_id().  While we can add a percpu refcnt API to ask
> the same question, css_id() itself is scheduled to be removed fairly
> soon, so let's not bother with it.  Just drop the sanity check and use
> rcu_dereference_raw() instead.
>
> v2: - init_cgroup_css() was calling percpu_ref_init() without checking
>       the return value.
>       This causes two problems - the obvious lack
>       of error handling and percpu_ref_init() being called from
>       cgroup_init_subsys() before the allocators are up, which
>       triggers warnings but doesn't cause actual problems as the
>       refcnt isn't used for roots anyway.  Fix both by moving
>       percpu_ref_init() to cgroup_create().
>
>     - The base references were put too early by
>       percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
>       refs one extra time.  This wasn't noticeable because css's go
>       through another RCU grace period before being freed.  Update
>       cgroup_destroy_locked() to grab an extra reference before
>       killing the refcnts.  This problem was noticed by Kent.

Reviewed-by: Kent Overstreet
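
For anyone following along, here is a rough userspace model of the kill-and-confirm ordering described in the quoted changelog. It is only a sketch, not the kernel's percpu_ref: it is single-threaded, uses no atomics or RCU, and every name in it (toy_ref, toy_ref_tryget(), toy_ref_kill_and_confirm(), TOY_NR_CPUS and so on) is made up for illustration. It assumes the kill path folds the per-cpu deltas into one shared count, then runs the confirm callback, then drops the base reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy stand-in for a per-cpu refcount; not the kernel's struct percpu_ref. */
struct toy_ref {
	long percpu[TOY_NR_CPUS];	/* per-cpu deltas, may individually go negative */
	long count;			/* shared count, meaningful only once dead */
	bool dead;			/* set by kill; makes tryget fail */
};

static void toy_ref_init(struct toy_ref *ref)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		ref->percpu[i] = 0;
	ref->count = 1;			/* the base reference */
	ref->dead = false;
}

/* Fast path touches only this cpu's slot; fails once the ref is killed. */
static bool toy_ref_tryget(struct toy_ref *ref, int cpu)
{
	if (ref->dead)
		return false;
	ref->percpu[cpu]++;
	return true;
}

/* Returns true only when this put dropped the last reference. */
static bool toy_ref_put(struct toy_ref *ref, int cpu)
{
	if (!ref->dead) {
		ref->percpu[cpu]--;
		return false;
	}
	return --ref->count == 0;
}

/*
 * Mark the ref dead, fold the per-cpu deltas into the shared count, call
 * confirm (where the offline phase could safely begin), and only then drop
 * the base reference.  Returns true if that put was the last one.
 */
static bool toy_ref_kill_and_confirm(struct toy_ref *ref,
				     void (*confirm)(struct toy_ref *))
{
	ref->dead = true;
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		ref->count += ref->percpu[i];
		ref->percpu[i] = 0;
	}
	if (confirm)
		confirm(ref);
	return toy_ref_put(ref, 0);
}

static bool toy_confirmed;

/* Confirm callback: records that tryget already fails at confirm time. */
static void toy_note_confirm(struct toy_ref *ref)
{
	toy_confirmed = !toy_ref_tryget(ref, 0);
}

/* Walks through one get/put/kill sequence; returns 0 if every step held. */
int toy_ref_demo(void)
{
	struct toy_ref ref;

	toy_ref_init(&ref);
	if (!toy_ref_tryget(&ref, 1))	/* get on cpu 1 ... */
		return 1;
	if (toy_ref_put(&ref, 2))	/* ... put on cpu 2, slots now +1/-1 */
		return 2;
	if (!toy_ref_tryget(&ref, 3))	/* one reference still held */
		return 3;
	if (toy_ref_kill_and_confirm(&ref, toy_note_confirm))
		return 4;		/* held ref must keep it alive */
	if (!toy_confirmed || toy_ref_tryget(&ref, 0))
		return 5;		/* tryget fails after kill */
	return toy_ref_put(&ref, 3) ? 0 : 6;	/* last put releases */
}
```

The point of the model is the ordering: gets and puts before the kill never touch shared state, and the confirm callback runs only after tryget can no longer succeed, which is the guarantee the changelog wants before ->css_offline() is invoked.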