From: Luis Henriques <luis.henriques@canonical.com>
To: Stefan Bader <stefan.bader@canonical.com>
Cc: cwillu <cwillu@cwillu.com>,
mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
a.p.zijlstra@chello.nl, peterz@infradead.org, tglx@linutronix.de,
yong.zhang0@gmail.com
Subject: Re: [tip:sched/core] sched: Fix race in task_group()
Date: Thu, 18 Oct 2012 14:33:53 +0100
Message-ID: <20121018133353.GA25885@hercules>
In-Reply-To: <507FD8AA.50500@canonical.com>
On Thu, Oct 18, 2012 at 12:23:38PM +0200, Stefan Bader wrote:
> On 18.10.2012 10:27, cwillu wrote:
> > On Tue, Jul 24, 2012 at 8:21 AM, tip-bot for Peter Zijlstra
> > <peterz@infradead.org> wrote:
> >> Commit-ID: 8323f26ce3425460769605a6aece7a174edaa7d1
> >> Gitweb: http://git.kernel.org/tip/8323f26ce3425460769605a6aece7a174edaa7d1
> >> Author: Peter Zijlstra <peterz@infradead.org>
> >> AuthorDate: Fri, 22 Jun 2012 13:36:05 +0200
> >> Committer: Ingo Molnar <mingo@kernel.org>
> >> CommitDate: Tue, 24 Jul 2012 13:58:20 +0200
> >>
> >> sched: Fix race in task_group()
> >>
> >> Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
> >> Don't call task_group() too many times in set_task_rq()"), he
> >> found the reason to be that the multiple task_group()
> >> invocations in set_task_rq() returned different values.
> >>
> >> Looking at all that I found a lack of serialization and plain
> >> wrong comments.
> >>
> >> The below tries to fix it using an extra pointer which is
> >> updated under the appropriate scheduler locks. Its not pretty,
> >> but I can't really see another way given how all the cgroup
> >> stuff works.
> >>
> >> Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
> >> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
> >> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> >
> > I just finished bisecting a crash on boot to this commit; booting with
> > "noautogroup" brings it back.
> >
> > 3.5.4 is the latest -stable that still boots, and none of the 3.6 rc's
> > boot at all.
> >
> > Photo of the bug (3.6.0next is 3.6 + btrfs's for-linus):
> > https://lh5.googleusercontent.com/-0DY-YYhgvzs/UHdB-BQdzMI/AAAAAAAAAEg/QhY9rgxnv98/s811/2012-10-11
> >
>
> On a very quick glance I wonder whether there might be a case where sched_fork
> goes into set_task_cpu with a different cpu than the current but has not yet
> task_group.sched_task_group set to something valid...
>
>
I was looking at another bug report [1] which may be related to this
issue. Basically, it looks like there is a race window where
disabling sched_autogroup_enabled at runtime will cause a crash on
shutdown/reboot. In the bug report, the user has added

    echo 0 > /proc/sys/kernel/sched_autogroup_enabled

to /etc/rc.local. This causes a NULL pointer dereference during
shutdown (and it is reproducible with mainline kernel 3.7.0-rc1).
When booting with the noautogroup kernel parameter instead, I *wasn't*
able to reproduce this issue.

After a little bit of digging, commit
800d4d30c8f20bd728e5741a3b77c4859a613f7c ("sched, autogroup: Stop
going ahead if autogroup is disabled") caught my attention as it
changes the following code path when sched_autogroup_enabled is
disabled:

  sched_autogroup_create_attach()
    autogroup_move_group()
      sched_move_task()  <<-- conditionally invoked
        task_move_group_fair()
          set_task_rq()
            task_group()
              autogroup_task_group()

And commit 8323f26ce3425460769605a6aece7a174edaa7d1 ("sched: Fix
race in task_group()") actually adds code to this conditional path (in
sched_move_task()).

A quick test shows that reverting
800d4d30c8f20bd728e5741a3b77c4859a613f7c (i.e., always going through
the whole call tree) seems to fix it or, at least, no longer triggers
the NULL pointer dereference. But again, I may just be doing something
foolish and hiding something else. It is also possible that this is a
completely different issue.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1055222

Cheers,
--
Luis