From: Ingo Molnar <mingo@elte.hu>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Grant Wilson <grant.wilson@zen.co.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.24-rc1-gb4f5550 oops
Date: Wed, 14 Nov 2007 16:29:30 +0100
Message-ID: <20071114152930.GA1690@elte.hu>
In-Reply-To: <20071114151708.GA12355@tv-sign.ru>


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0000000000000120 RIP:
> > [18073.371134]  [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops: 0000 [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[<ffffffff8023572e>]  [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:ffff810008531a78  EFLAGS: 00010006
> > [18073.371179] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [18073.371183] RDX: ffff810004441bf0 RSI: ffff81000801e860 RDI: ffff81000444ab80
> > [18073.371186] RBP: ffff810008531aa8 R08: 000000d0d47a4a90 R09: 0000000000000000
> > [18073.371188] R10: ffff810004441bf0 R11: 0000000000000001 R12: ffff810006520400
> > [18073.371190] R13: ffff81000801e860 R14: ffff81000a63a000 R15: ffff81000443d8e0
> > [18073.371193] FS:  00002b7d646a86f0(0000) GS:ffff810004c11780(0000) knlGS:0000000000000000
> > [18073.371196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [18073.371199] CR2: 0000000000000120 CR3: 0000000008495000 CR4: 00000000000006e0
> > [18073.371202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [18073.371211] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [18073.371214] Process kwin (pid: 4639, threadinfo ffff810008530000, task ffff81000840a860)
> > [18073.371216] Stack:  ffff81000444ab80 0000000000000001 ffff81000801e860 ffff81000444ab80
> > [18073.371231]  0000000000000002 ffff81000443d8e0 ffff810008531b38 ffffffff8023061e
> > [18073.371238]  0000000000000000 ffff810004441b80 0000000000000002 0000000100000000
> > [18073.371245] Call Trace:
> > [18073.371250]  [<ffffffff8023061e>] try_to_wake_up+0x2fe/0x3a0
> 
> I suspect I see the bug in that area, but I am not sure it can explain 
> this trace completely.

there's a fix pending from Dmitry - please see below. It took Grant days 
to trigger the crash, so the fix needs some time to be confirmed, but in 
theory it could explain the crash.

	Ingo

---------------------->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <dmitry.adamushko@gmail.com>

Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core 
system, crashes that can only be explained by runqueue corruption.

there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to 
a new value, task_rq_lock(p, ...) can be successfully executed on another 
CPU. We must ensure that updates of per-task data have been completed by 
this moment.

this bug has been hiding in the Linux scheduler for an eternity (we 
never had any explicit barrier for task->cpu in set_task_cpu() - so the 
bug was introduced in 2.5.1), but only became visible via 
set_task_cfs_rq() being accidentally put after the task->cpu update. It 
also probably needs a sufficiently out-of-order CPU to trigger.
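
The ordering requirement described above can be sketched in userspace with 
C11 atomics. This is only an illustration of the publish pattern, not the 
kernel code: the demo_* names and NR_CPUS value are hypothetical, the 
release fence merely plays the role of smp_wmb(), and the acquire load 
stands in for the ordering a reader would get from task_rq_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical value, for this sketch only */

struct demo_rq { int id; };	/* stand-in for a per-CPU cfs_rq */

struct demo_task {
	struct demo_rq *_Atomic rq;	/* per-task data, like p->se.cfs_rq */
	atomic_uint cpu;		/* published last, like ->cpu */
};

static struct demo_rq demo_rqs[NR_CPUS];

/* Writer side: update the per-task data first, then publish the new CPU. */
static void demo_set_task_cpu(struct demo_task *p, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	atomic_store_explicit(&p->rq, &demo_rqs[cpu], memory_order_relaxed);
	/* Orders the store above before the store below (smp_wmb() analogue). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->cpu, cpu, memory_order_relaxed);
}

/*
 * Reader side: read the CPU first, then the data that goes with it.
 * If the load observes the CPU value stored by the writer above, the
 * rq load that follows cannot return the pointer from before that write.
 */
static struct demo_rq *demo_observe(struct demo_task *p, unsigned int *cpu)
{
	*cpu = atomic_load_explicit(&p->cpu, memory_order_acquire);
	return atomic_load_explicit(&p->rq, memory_order_relaxed);
}
```

A caller would run demo_set_task_cpu() on the migrating side and 
demo_observe() on any other thread. With the two stores in the writer 
swapped and the fence dropped, a reader could see the new CPU together 
with the old rq pointer, which is the kind of mismatch the patch below 
closes.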

Reported-by: Grant Wilson <grant.wilson@zen.co.uk>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
 {
-	p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
-	p->se.parent = task_group(p)->se[task_cpu(p)];
+	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+	p->se.parent = task_group(p)->se[cpu];
 }
 
 #else
 
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
+	set_task_cfs_rq(p, cpu);
 #ifdef CONFIG_SMP
+	/*
+	 * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+	 * successfully executed on another CPU. We must ensure that updates of
+	 * per-task data have been completed by this moment.
+	 */
+	smp_wmb();
 	task_thread_info(p)->cpu = cpu;
 #endif
-	set_task_cfs_rq(p);
 }
 
 #ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct 
 			tsk->sched_class->put_prev_task(rq, tsk);
 	}
 
-	set_task_cfs_rq(tsk);
+	set_task_cfs_rq(tsk, task_cpu(tsk));
 
 	if (on_rq) {
 		if (unlikely(running))
