public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Grant Wilson <grant.wilson@zen.co.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.24-rc1-gb4f5550 oops
Date: Wed, 14 Nov 2007 16:29:30 +0100	[thread overview]
Message-ID: <20071114152930.GA1690@elte.hu> (raw)
In-Reply-To: <20071114151708.GA12355@tv-sign.ru>


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0000000000000120 RIP:
> > [18073.371134]  [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops: 0000 [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[<ffffffff8023572e>]  [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:ffff810008531a78  EFLAGS: 00010006
> > [18073.371179] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [18073.371183] RDX: ffff810004441bf0 RSI: ffff81000801e860 RDI: ffff81000444ab80
> > [18073.371186] RBP: ffff810008531aa8 R08: 000000d0d47a4a90 R09: 0000000000000000
> > [18073.371188] R10: ffff810004441bf0 R11: 0000000000000001 R12: ffff810006520400
> > [18073.371190] R13: ffff81000801e860 R14: ffff81000a63a000 R15: ffff81000443d8e0
> > [18073.371193] FS:  00002b7d646a86f0(0000) GS:ffff810004c11780(0000) knlGS:0000000000000000
> > [18073.371196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [18073.371199] CR2: 0000000000000120 CR3: 0000000008495000 CR4: 00000000000006e0
> > [18073.371202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [18073.371211] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [18073.371214] Process kwin (pid: 4639, threadinfo ffff810008530000, task ffff81000840a860)
> > [18073.371216] Stack:  ffff81000444ab80 0000000000000001 ffff81000801e860 ffff81000444ab80
> > [18073.371231]  0000000000000002 ffff81000443d8e0 ffff810008531b38 ffffffff8023061e
> > [18073.371238]  0000000000000000 ffff810004441b80 0000000000000002 0000000100000000
> > [18073.371245] Call Trace:
> > [18073.371250]  [<ffffffff8023061e>] try_to_wake_up+0x2fe/0x3a0
> 
> I suspect I see the bug in that area, but I am not sure it can explain 
> this trace completely.

there's a fix pending from Dmitry - please see below. It took days for 
Grant to trigger the crash so it needs some time to be confirmed but it 
could explain the crash in theory.

	Ingo

---------------------->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <dmitry.adamushko@gmail.com>

Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core 
system, which crashes can only be explained via runqueue corruption.

there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to 
a new value, task_rq_lock(p, ...) can be successfuly executed on another 
CPU. We must ensure that updates of per-task data have been completed by 
this moment.

this bug has been hiding in the Linux scheduler for an eternity (we 
never had any explicit barrier for task->cpu in set_task_cpu() - so the 
bug was introduced in 2.5.1), but only became visible via 
set_task_cfs_rq() being accidentally put after the task->cpu update. It 
also probably needs a sufficiently out-of-order CPU to trigger.

Reported-by: Grant Wilson <grant.wilson@zen.co.uk>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
 {
-	p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
-	p->se.parent = task_group(p)->se[task_cpu(p)];
+	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+	p->se.parent = task_group(p)->se[cpu];
 }
 
 #else
 
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
+	set_task_cfs_rq(p, cpu);
 #ifdef CONFIG_SMP
+	/*
+	 * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+	 * successfuly executed on another CPU. We must ensure that updates of
+	 * per-task data have been completed by this moment.
+	 */
+	smp_wmb();
 	task_thread_info(p)->cpu = cpu;
 #endif
-	set_task_cfs_rq(p);
 }
 
 #ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct 
 			tsk->sched_class->put_prev_task(rq, tsk);
 	}
 
-	set_task_cfs_rq(tsk);
+	set_task_cfs_rq(tsk, task_cpu(tsk));
 
 	if (on_rq) {
 		if (unlikely(running))

  reply	other threads:[~2007-11-14 15:31 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-14 15:17 2.6.24-rc1-gb4f5550 oops Oleg Nesterov
2007-11-14 15:29 ` Ingo Molnar [this message]
2007-11-14 15:50   ` Oleg Nesterov
2007-11-14 15:59     ` Ingo Molnar
2007-11-14 16:02       ` Ingo Molnar
2007-11-14 16:37 ` Srivatsa Vaddagiri
2007-11-14 17:48   ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2007-11-05  6:11 Grant Wilson
2007-11-08  0:06 ` Rafael J. Wysocki
     [not found]   ` <20071108062250.02479c0c@worthy.swandive.local>
2007-11-08 15:53     ` Rafael J. Wysocki
2007-11-08 17:22       ` Grant Wilson
2007-11-08 21:42         ` Rafael J. Wysocki
2007-11-08 22:19           ` Grant Wilson
2007-11-08 22:49             ` Rafael J. Wysocki
2007-11-12 18:05               ` Peter Zijlstra
2007-11-12 18:28                 ` Grant Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071114152930.GA1690@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=grant.wilson@zen.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@tv-sign.ru \
    --cc=peterz@infradead.org \
    --cc=rjw@sisk.pl \
    --cc=vatsa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox