From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753127Ab1BHKXR (ORCPT ); Tue, 8 Feb 2011 05:23:17 -0500 Received: from bombadil.infradead.org ([18.85.46.34]:40380 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752563Ab1BHKXQ (ORCPT ); Tue, 8 Feb 2011 05:23:16 -0500 Subject: Re: [RFC][PATCH] cgroup: Fix cgroup_subsys::exit callback From: Peter Zijlstra To: Paul Menage Cc: balbir@linux.vnet.ibm.com, eranian@google.com, linux-kernel@vger.kernel.org, mingo@elte.hu, paulus@samba.org, davem@davemloft.net, fweisbec@gmail.com, perfmon2-devel@lists.sf.net, eranian@gmail.com, robert.richter@amd.com, acme@redhat.com, lizf@cn.fujitsu.com In-Reply-To: References: <4d384700.2308e30a.70bc.ffffd532@mx.google.com> <1295534345.28776.175.camel@laptop> <1296646160.26581.315.camel@laptop> <20110202115012.GA16409@balbir.in.ibm.com> <1296650792.26581.319.camel@laptop> <20110202190251.GB16409@balbir.in.ibm.com> <1297095033.13327.46.camel@laptop> <1297108959.13327.54.camel@laptop> Content-Type: text/plain; charset="UTF-8" Date: Tue, 08 Feb 2011 11:24:15 +0100 Message-ID: <1297160655.13327.92.camel@laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2011-02-07 at 13:21 -0800, Paul Menage wrote: > On Mon, Feb 7, 2011 at 12:02 PM, Peter Zijlstra wrote: > > On Mon, 2011-02-07 at 11:28 -0800, Paul Menage wrote: > >> On Mon, Feb 7, 2011 at 8:10 AM, Peter Zijlstra wrote: > >> > > >> > Make the ::exit method act like ::attach, it is after all very nearly > >> > the same thing. > >> > >> The major difference between attach and exit is that the former is > >> only triggered in response to user cgroup-management action, whereas > >> the latter is triggered whenever a task exits, even if cgroups aren't > >> set up. > > > > And the major likeness is that they both migrate a task from one cgroup > > to another. You cannot simply ignore that. > > True, and the exit path for cgroups has always been a bit fuzzy - that > was kind of inherited from cpusets, where this worked out OK, but the > CPU subsystem has more interesting requirements. Yeah, from that pov we're actually still a running task that can and will scheduler still. > An important semantic difference between attach and exit is that in > the exit case the destination is always the root, and the task in > question is going to be exiting before doing anything else > interesting. So it should be possible to optimize that path a lot > compared to the regular attach (many things related to resource usage > can be ignored, since the task won't have time to actually use any > non-trivial amount of resources). One could also argue that the accumulated time (/other resources) of all tasks exiting might end up being a significant amount, but yeah whatever :-) I'm mostly concerned with not wrecking state (and crashing/leaking etc). > > > > If maybe you're like respond more often than about once every two months > > I might actually care what you think. > > Yes, sadly since switching groups at Google, cgroups has become pretty > much just a spare-time activity for me And here I thought Google was starting to understand what community participation meant.. is there anybody you know who can play a more active role in the whole cgroup thing? > - but that in itself doesn't > automatically invalidate my opinion when I do have time to respond. > It's still the case that cgroup_mutex is an incredibly heavyweight > mutex that has no business being in the task exit path. Or do you just > believe that ad hominem is a valid style of argument? No, it was mostly frustration talking. > How about if the exit callback was moved before the preceding > task_unlock()? Since I think the scheduler is still the only user of > the exit callback, redefining the locking semantics should be fine. Like the below? Both the perf and sched exit callback are fine with being called under task_lock afaict, but I haven't actually ran with lockdep enabled to see if I missed something. I also pondered doing the cgroup exit from __put_task_struct() but that had another problem iirc. --- Index: linux-2.6/include/linux/cgroup.h =================================================================== --- linux-2.6.orig/include/linux/cgroup.h +++ linux-2.6/include/linux/cgroup.h @@ -474,7 +474,8 @@ struct cgroup_subsys { struct cgroup *old_cgrp, struct task_struct *tsk, bool threadgroup); void (*fork)(struct cgroup_subsys *ss, struct task_struct *task); - void (*exit)(struct cgroup_subsys *ss, struct task_struct *task); + void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup *old_cgrp, struct task_struct *task); int (*populate)(struct cgroup_subsys *ss, struct cgroup *cgrp); void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp); Index: linux-2.6/kernel/cgroup.c =================================================================== --- linux-2.6.orig/kernel/cgroup.c +++ linux-2.6/kernel/cgroup.c @@ -4230,20 +4230,8 @@ void cgroup_post_fork(struct task_struct */ void cgroup_exit(struct task_struct *tsk, int run_callbacks) { - int i; struct css_set *cg; - - if (run_callbacks && need_forkexit_callback) { - /* - * modular subsystems can't use callbacks, so no need to lock - * the subsys array - */ - for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) { - struct cgroup_subsys *ss = subsys[i]; - if (ss->exit) - ss->exit(ss, tsk); - } - } + int i; /* * Unlink from the css_set task list if necessary. @@ -4261,7 +4249,24 @@ void cgroup_exit(struct task_struct *tsk task_lock(tsk); cg = tsk->cgroups; tsk->cgroups = &init_css_set; + + if (run_callbacks && need_forkexit_callback) { + /* + * modular subsystems can't use callbacks, so no need to lock + * the subsys array + */ + for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) { + struct cgroup_subsys *ss = subsys[i]; + if (ss->exit) { + struct cgroup *old_cgrp = + rcu_dereference_raw(cg->subsys[i])->cgroup; + struct cgroup *cgrp = task_cgroup(tsk, i); + ss->exit(ss, cgrp, old_cgrp, tsk); + } + } + } task_unlock(tsk); + if (cg) put_css_set_taskexit(cg); } Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -606,9 +606,6 @@ static inline struct task_group *task_gr struct task_group *tg; struct cgroup_subsys_state *css; - if (p->flags & PF_EXITING) - return &root_task_group; - css = task_subsys_state_check(p, cpu_cgroup_subsys_id, lockdep_is_held(&task_rq(p)->lock)); tg = container_of(css, struct task_group, css); @@ -9081,7 +9078,8 @@ cpu_cgroup_attach(struct cgroup_subsys * } static void -cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task) +cpu_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup *old_cgrp, struct task_struct *task) { /* * cgroup_exit() is called in the copy_process() failure path.