From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>,
Miklos Vajna <vmiklos@frugalware.org>,
shenghui <crosslonelyover@gmail.com>,
kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@elte.hu, Greg KH <greg@kroah.com>,
Paul Turner <pjt@google.com>, Yong Zhang <yong.zhang0@gmail.com>,
Li Zefan <lizf@cn.fujitsu.com>, Paul Menage <menage@google.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free
Date: Sat, 25 Dec 2010 18:07:25 +0000
Message-ID: <20101225175525.GA3393@balbir.in.ibm.com>
In-Reply-To: <1293206353.29444.205.camel@laptop>

* Peter Zijlstra <peterz@infradead.org> [2010-12-24 16:59:13]:
> On Fri, 2010-12-24 at 13:16 +0100, Mike Galbraith wrote:
> > On Fri, 2010-12-24 at 11:54 +0100, Peter Zijlstra wrote:
>
> > > Right, so the cgroup core is supposed to already emit -EBUSY when there
> > > are associated tasks with the cgroup, that _should_ be sufficient, the
> > > pre_destroy() method is to frob some extra constraints or somesuch.
> > >
> > > Our problem looks to be that a task (afaict usually current) changes
> > > cgroups without us getting notified of it. On destruction the task is
> > > still enqueued in the cfs_rq being destroyed but is not actually part of
> > > that cgroup according to the task->css bits.
> >
> > Could it be an exiting task? We're still preemptible, and iirc, you run
> > a CONFIG_PREEMPT kernel. (grasp at all straws;)
> >
> > cgroup_exit:
> > 	/* Reassign the task to the init_css_set. */
> > 	task_lock(tsk);
> > 	cg = tsk->cgroups;
> > 	tsk->cgroups = &init_css_set;
> > 	task_unlock(tsk);
> > 	if (cg)
> > 		put_css_set_taskexit(cg);
> >
>
> This straw appears true:
>
> $ grep -e cpu_cgroup\\\|f491447c log9
>
> ...
>
> kworker/-1196 0d..2. 1601180us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> kworker/-1196 0d..2. 1601186us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> kworker/-1196 0d..2. 1601188us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1601188us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1601192us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> modprobe-1210 0d..5. 1601802us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..5. 1601807us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601817us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601819us : __enqueue_entity: f491447c to f492a480, 1 tasks
> modprobe-1210 0d..2. 1601826us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601832us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601839us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601848us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601854us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601860us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601865us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601871us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601872us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1601873us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1601876us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..7. 1601895us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..7. 1601900us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601909us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601911us : __enqueue_entity: f491447c to f492a480, 1 tasks
> modprobe-1210 0d..2. 1601918us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601924us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601931us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602071us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602080us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602089us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602097us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602105us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602107us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1602108us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1602114us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..3. 1602128us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 80, load: 1024, cgroup: /
>
>
> So cgroup moves a task without calling cgroup_subsys::attach() which is
> odd, but it does have an ::exit method, sadly it calls that _before_
> re-assigning the task, which means we have to jump through some hoops.
>
> The below seems to fix the problem for me..
>
> ---
> Subject: sched, cgroup: Use exit hook to avoid use-after-free crash
>
> By not notifying the controller of the on-exit move back to
> init_css_set, we fail to move the task out of the previous cgroup's
> cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and
> free the cgroup (there are no active tasks left in it after all) to
> which the not-quite dead task is still enqueued.
>
> Cc: stable@kernel.org
> Reported-by: Miklos Vajna <vmiklos@frugalware.org>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> kernel/sched.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 7e401f8..572625c 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -611,6 +611,9 @@ static inline struct task_group *task_group(struct task_struct *p)
>  	struct task_group *tg;
>  	struct cgroup_subsys_state *css;
>  
> +	if (p->flags & PF_EXITING)
> +		return &root_task_group;
> +
>  	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
>  			lockdep_is_held(&task_rq(p)->lock));
>  	tg = container_of(css, struct task_group, css);
> @@ -8887,6 +8890,12 @@ cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>  	}
>  }
>  
> +static void
> +cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
> +{
> +	sched_move_task(task);
> +}
> +
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype,
>  				u64 shareval)
> @@ -8959,6 +8968,7 @@ struct cgroup_subsys cpu_cgroup_subsys = {
>  	.destroy = cpu_cgroup_destroy,
>  	.can_attach = cpu_cgroup_can_attach,
>  	.attach = cpu_cgroup_attach,
> +	.exit = cpu_cgroup_exit,
>  	.populate = cpu_cgroup_populate,
>  	.subsys_id = cpu_cgroup_subsys_id,
>  	.early_init = 1,
>
>
Very good catch! The patch looks reasonable and correct to me.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
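
For anyone following along, here is a small userspace sketch of the race
as I read it from the trace and the patch above. It is not kernel code:
every toy_* name is made up for illustration, and it assumes PF_EXITING
is already set by the time the exit hook runs, which is what the
task_group() change in the patch relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; all names are hypothetical. */
struct toy_group {
	const char *name;
	int nr_queued;	/* entities enqueued on this group's runqueue */
	bool freed;	/* set by toy_destroy(); models freeing the group */
};

struct toy_task {
	bool exiting;			/* models PF_EXITING */
	struct toy_group *css_group;	/* the cgroup core's view of the task */
	struct toy_group *queued_on;	/* where the scheduler enqueued it */
};

static struct toy_group toy_root = { "/", 0, false };

/* Models the patched task_group(): exiting tasks resolve to the root group. */
static struct toy_group *toy_task_group(const struct toy_task *t)
{
	if (t->exiting)
		return &toy_root;
	return t->css_group;
}

/* Models sched_move_task(): requeue the task on its current group. */
static void toy_sched_move_task(struct toy_task *t)
{
	struct toy_group *to = toy_task_group(t);

	assert(t->queued_on != NULL);
	t->queued_on->nr_queued--;
	to->nr_queued++;
	t->queued_on = to;
}

/* Models cgroup_exit(): the exit hook runs before the reassignment. */
static void toy_cgroup_exit(struct toy_task *t, bool with_exit_hook)
{
	t->exiting = true;	/* assumed to be set earlier in the exit path */
	if (with_exit_hook)
		toy_sched_move_task(t);
	t->css_group = &toy_root;
}

/* Models group destruction once the core sees no tasks in the group. */
static void toy_destroy(struct toy_group *g)
{
	g->freed = true;
}

/* Returns true if the task is left enqueued on a freed group. */
bool toy_dangling_after_exit(bool with_exit_hook)
{
	struct toy_group g = { "/system/example.service", 1, false };
	struct toy_task t = { false, &g, &g };

	toy_root.nr_queued = 0;
	toy_cgroup_exit(&t, with_exit_hook);

	/* The core only checks its own accounting, not the runqueue. */
	if (t.css_group != &g)
		toy_destroy(&g);

	return t.queued_on->freed;
}
```

Without the hook, toy_dangling_after_exit(false) reports that the task
is still queued on the destroyed group; with the hook,
toy_dangling_after_exit(true) finds it on the root group instead.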
--
Three Cheers,
Balbir