Re: [RFC][PATCH] cgroup: Fix cgroup_subsys::exit callback

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Paul Menage <menage@google.com>
Cc: balbir@linux.vnet.ibm.com, eranian@google.com,
	linux-kernel@vger.kernel.org, mingo@elte.hu, paulus@samba.org,
	davem@davemloft.net, fweisbec@gmail.com,
	perfmon2-devel@lists.sf.net, eranian@gmail.com,
	robert.richter@amd.com, acme@redhat.com, lizf@cn.fujitsu.com
Subject: Re: [RFC][PATCH] cgroup: Fix cgroup_subsys::exit callback
Date: Tue, 08 Feb 2011 11:24:15 +0100	[thread overview]
Message-ID: <1297160655.13327.92.camel@laptop> (raw)
In-Reply-To: <AANLkTikWS86TPjp+J=iiwCQJ56pqYTQo8Z9_V20GEcEz@mail.gmail.com>

On Mon, 2011-02-07 at 13:21 -0800, Paul Menage wrote:
> On Mon, Feb 7, 2011 at 12:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2011-02-07 at 11:28 -0800, Paul Menage wrote:
> >> On Mon, Feb 7, 2011 at 8:10 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> >
> >> > Make the ::exit method act like ::attach, it is after all very nearly
> >> > the same thing.
> >>
> >> The major difference between attach and exit is that the former is
> >> only triggered in response to user cgroup-management action, whereas
> >> the latter is triggered whenever a task exits, even if cgroups aren't
> >> set up.
> >
> > And the major likeness is that they both migrate a task from one cgroup
> > to another. You cannot simply ignore that.
> 
> True, and the exit path for cgroups has always been a bit fuzzy - that
> was kind of inherited from cpusets, where this worked out OK, but the
> CPU subsystem has more interesting requirements.

Yeah, from that pov we're actually still a running task that can and
will scheduler still.

> An important semantic difference between attach and exit is that in
> the exit case the destination is always the root, and the task in
> question is going to be exiting before doing anything else
> interesting. So it should be possible to optimize that path a lot
> compared to the regular attach (many things related to resource usage
> can be ignored, since the task won't have time to actually use any
> non-trivial amount of resources).

One could also argue that the accumulated time (/other resources) of all
tasks exiting might end up being a significant amount, but yeah
whatever :-)

I'm mostly concerned with not wrecking state (and crashing/leaking etc).

> >
> > If maybe you're like respond more often than about once every two months
> > I might actually care what you think.
> 
> Yes, sadly since switching groups at Google, cgroups has become pretty
> much just a spare-time activity for me 

And here I thought Google was starting to understand what community
participation meant.. is there anybody you know who can play a more
active role in the whole cgroup thing?

> - but that in itself doesn't
> automatically invalidate my opinion when I do have time to respond.
> It's still the case that cgroup_mutex is an incredibly heavyweight
> mutex that has no business being in the task exit path. Or do you just
> believe that ad hominem is a valid style of argument?

No, it was mostly frustration talking.

> How about if the exit callback was moved before the preceding
> task_unlock()? Since I think the scheduler is still the only user of
> the exit callback, redefining the locking semantics should be fine.

Like the below? Both the perf and sched exit callback are fine with
being called under task_lock afaict, but I haven't actually ran with
lockdep enabled to see if I missed something.

I also pondered doing the cgroup exit from __put_task_struct() but that
had another problem iirc.

---
Index: linux-2.6/include/linux/cgroup.h
===================================================================
--- linux-2.6.orig/include/linux/cgroup.h
+++ linux-2.6/include/linux/cgroup.h
@@ -474,7 +474,8 @@ struct cgroup_subsys {
 			struct cgroup *old_cgrp, struct task_struct *tsk,
 			bool threadgroup);
 	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
-	void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
+	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			struct cgroup *old_cgrp, struct task_struct *task);
 	int (*populate)(struct cgroup_subsys *ss,
 			struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c
+++ linux-2.6/kernel/cgroup.c
@@ -4230,20 +4230,8 @@ void cgroup_post_fork(struct task_struct
  */
 void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 {
-	int i;
 	struct css_set *cg;
-
-	if (run_callbacks && need_forkexit_callback) {
-		/*
-		 * modular subsystems can't use callbacks, so no need to lock
-		 * the subsys array
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-			if (ss->exit)
-				ss->exit(ss, tsk);
-		}
-	}
+	int i;
 
 	/*
 	 * Unlink from the css_set task list if necessary.
@@ -4261,7 +4249,24 @@ void cgroup_exit(struct task_struct *tsk
 	task_lock(tsk);
 	cg = tsk->cgroups;
 	tsk->cgroups = &init_css_set;
+
+	if (run_callbacks && need_forkexit_callback) {
+		/*
+		 * modular subsystems can't use callbacks, so no need to lock
+		 * the subsys array
+		 */
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+			if (ss->exit) {
+				struct cgroup *old_cgrp =
+					rcu_dereference_raw(cg->subsys[i])->cgroup;
+				struct cgroup *cgrp = task_cgroup(tsk, i);
+				ss->exit(ss, cgrp, old_cgrp, tsk);
+			}
+		}
+	}
 	task_unlock(tsk);
+
 	if (cg)
 		put_css_set_taskexit(cg);
 }
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -606,9 +606,6 @@ static inline struct task_group *task_gr
 	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
-	if (p->flags & PF_EXITING)
-		return &root_task_group;
-
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
@@ -9081,7 +9078,8 @@ cpu_cgroup_attach(struct cgroup_subsys *
 }
 
 static void
-cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
+cpu_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+		struct cgroup *old_cgrp, struct task_struct *task)
 {
 	/*
 	 * cgroup_exit() is called in the copy_process() failure path.

next prev parent reply	other threads:[~2011-02-08 10:23 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-20 13:30 [PATCH 1/2] perf_events: add cgroup support (v8) Stephane Eranian
2011-01-20 14:39 ` Peter Zijlstra
2011-01-20 14:46   ` Stephane Eranian
2011-02-02 11:29   ` Peter Zijlstra
2011-02-02 11:50     ` Balbir Singh
2011-02-02 12:46       ` Peter Zijlstra
2011-02-02 19:02         ` Balbir Singh
2011-02-07 16:10           ` [RFC][PATCH] cgroup: Fix cgroup_subsys::exit callback Peter Zijlstra
2011-02-07 19:28             ` Paul Menage
2011-02-07 20:02               ` Peter Zijlstra
2011-02-07 21:21                 ` Paul Menage
2011-02-08 10:24                   ` Peter Zijlstra [this message]
2011-02-10  2:04                     ` Li Zefan
2011-02-11 12:13                       ` Peter Zijlstra
2011-02-14  4:32                     ` Paul Menage
2011-02-16 13:46                     ` [tip:perf/core] " tip-bot for Peter Zijlstra
2011-02-13 12:52             ` [RFC][PATCH] " Balbir Singh
2011-02-07 19:29         ` [PATCH 1/2] perf_events: add cgroup support (v8) Paul Menage
2011-02-07 20:09           ` Peter Zijlstra
2011-02-07 21:33             ` Paul Menage
2011-02-07 16:10 ` Peter Zijlstra
2011-02-07 20:30   ` Stephane Eranian
2011-02-08 22:31   ` Stephane Eranian
2011-02-09  9:47     ` Peter Zijlstra
2011-02-10 11:47       ` Stephane Eranian
2011-02-11  0:55         ` Li Zefan
2011-02-11  9:56           ` Stephane Eranian
2011-02-11 13:36             ` Stephane Eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1297160655.13327.92.camel@laptop \
    --to=peterz@infradead.org \
    --cc=acme@redhat.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=davem@davemloft.net \
    --cc=eranian@gmail.com \
    --cc=eranian@google.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    --cc=mingo@elte.hu \
    --cc=paulus@samba.org \
    --cc=perfmon2-devel@lists.sf.net \
    --cc=robert.richter@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox