From: Ingo Molnar <mingo@elte.hu>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Grant Wilson <grant.wilson@zen.co.uk>,
Peter Zijlstra <peterz@infradead.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.24-rc1-gb4f5550 oops
Date: Wed, 14 Nov 2007 16:29:30 +0100 [thread overview]
Message-ID: <20071114152930.GA1690@elte.hu> (raw)
In-Reply-To: <20071114151708.GA12355@tv-sign.ru>
* Oleg Nesterov <oleg@tv-sign.ru> wrote:
> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0000000000000120 RIP:
> > [18073.371134] [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops: 0000 [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[<ffffffff8023572e>] [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:ffff810008531a78 EFLAGS: 00010006
> > [18073.371179] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [18073.371183] RDX: ffff810004441bf0 RSI: ffff81000801e860 RDI: ffff81000444ab80
> > [18073.371186] RBP: ffff810008531aa8 R08: 000000d0d47a4a90 R09: 0000000000000000
> > [18073.371188] R10: ffff810004441bf0 R11: 0000000000000001 R12: ffff810006520400
> > [18073.371190] R13: ffff81000801e860 R14: ffff81000a63a000 R15: ffff81000443d8e0
> > [18073.371193] FS: 00002b7d646a86f0(0000) GS:ffff810004c11780(0000) knlGS:0000000000000000
> > [18073.371196] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [18073.371199] CR2: 0000000000000120 CR3: 0000000008495000 CR4: 00000000000006e0
> > [18073.371202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [18073.371211] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [18073.371214] Process kwin (pid: 4639, threadinfo ffff810008530000, task ffff81000840a860)
> > [18073.371216] Stack: ffff81000444ab80 0000000000000001 ffff81000801e860 ffff81000444ab80
> > [18073.371231] 0000000000000002 ffff81000443d8e0 ffff810008531b38 ffffffff8023061e
> > [18073.371238] 0000000000000000 ffff810004441b80 0000000000000002 0000000100000000
> > [18073.371245] Call Trace:
> > [18073.371250] [<ffffffff8023061e>] try_to_wake_up+0x2fe/0x3a0
>
> I suspect I see the bug in that area, but I am not sure it can explain
> this trace completely.
there's a fix pending from Dmitry - please see below. It took days for
Grant to trigger the crash so it needs some time to be confirmed but it
could explain the crash in theory.
Ingo
---------------------->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core
system, which crashes can only be explained via runqueue corruption.
there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to
a new value, task_rq_lock(p, ...) can be successfuly executed on another
CPU. We must ensure that updates of per-task data have been completed by
this moment.
this bug has been hiding in the Linux scheduler for an eternity (we
never had any explicit barrier for task->cpu in set_task_cpu() - so the
bug was introduced in 2.5.1), but only became visible via
set_task_cfs_rq() being accidentally put after the task->cpu update. It
also probably needs a sufficiently out-of-order CPU to trigger.
Reported-by: Grant Wilson <grant.wilson@zen.co.uk>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
}
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
{
- p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
- p->se.parent = task_group(p)->se[task_cpu(p)];
+ p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+ p->se.parent = task_group(p)->se[cpu];
}
#else
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
#endif /* CONFIG_FAIR_GROUP_SCHED */
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
{
+ set_task_cfs_rq(p, cpu);
#ifdef CONFIG_SMP
+ /*
+ * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+ * successfuly executed on another CPU. We must ensure that updates of
+ * per-task data have been completed by this moment.
+ */
+ smp_wmb();
task_thread_info(p)->cpu = cpu;
#endif
- set_task_cfs_rq(p);
}
#ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct
tsk->sched_class->put_prev_task(rq, tsk);
}
- set_task_cfs_rq(tsk);
+ set_task_cfs_rq(tsk, task_cpu(tsk));
if (on_rq) {
if (unlikely(running))
next prev parent reply other threads:[~2007-11-14 15:31 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-14 15:17 2.6.24-rc1-gb4f5550 oops Oleg Nesterov
2007-11-14 15:29 ` Ingo Molnar [this message]
2007-11-14 15:50 ` Oleg Nesterov
2007-11-14 15:59 ` Ingo Molnar
2007-11-14 16:02 ` Ingo Molnar
2007-11-14 16:37 ` Srivatsa Vaddagiri
2007-11-14 17:48 ` Oleg Nesterov
-- strict thread matches above, loose matches on Subject: below --
2007-11-05 6:11 Grant Wilson
2007-11-08 0:06 ` Rafael J. Wysocki
[not found] ` <20071108062250.02479c0c@worthy.swandive.local>
2007-11-08 15:53 ` Rafael J. Wysocki
2007-11-08 17:22 ` Grant Wilson
2007-11-08 21:42 ` Rafael J. Wysocki
2007-11-08 22:19 ` Grant Wilson
2007-11-08 22:49 ` Rafael J. Wysocki
2007-11-12 18:05 ` Peter Zijlstra
2007-11-12 18:28 ` Grant Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071114152930.GA1690@elte.hu \
--to=mingo@elte.hu \
--cc=akpm@linux-foundation.org \
--cc=grant.wilson@zen.co.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@tv-sign.ru \
--cc=peterz@infradead.org \
--cc=rjw@sisk.pl \
--cc=vatsa@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox