From: Xiaotian Feng <xtfeng@gmail.com>
To: "Américo Wang" <xiyou.wangcong@gmail.com>
Cc: Eric Paris <eparis@redhat.com>,
linux-kernel@vger.kernel.org, mingo@elte.hu,
peterz@infradead.org, efault@gmx.de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs
Date: Tue, 22 Dec 2009 16:34:57 +0800
Message-ID: <7b6bb4a50912220034w1e49055dob1afff292becaf02@mail.gmail.com>
In-Reply-To: <2375c9f90912212350s7e48a4bfp2c5b3863f5969097@mail.gmail.com>
On Tue, Dec 22, 2009 at 3:50 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Dec 22, 2009 at 3:41 PM, Xiaotian Feng <xtfeng@gmail.com> wrote:
>> On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
>>> [Fix top-posting]
>>>
>>> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <xtfeng@gmail.com> wrote:
>>>>
>>>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <eparis@redhat.com> wrote:
>>>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>>>> I'm exploding in the scheduler. I'm running (and building) kernel
>>>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5 There are three distinct
>>>>> signatures of problems. Some boots I'll see all 3 of these failures
>>>>> sometimes only 1 or 2 of them. That's the reason they are kinda split
>>>>> up in dmesg.
>>>>>
>>>>> 1) gcc/3141 is trying to acquire lock:
>>>>> (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>>>
>>>>> but task is already holding lock:
>>>>> (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>>>
>>>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>>>
>>>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>>> kernel/sched_fair.c
>>>>>
>>>>> Full backtraces are in the attached dmesg.
>>>>>
>>>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>>>
>>>
>>> I don't think so...
>>>
>>> I think the most suspicious commit here is ab19cb23. It kicked
>>> "local_irq_save()"
>>> out, which means if the task is selected to run on another cpu which doesn't
>>> disable irq, we will have a page fault, then we will try to hold mm->mmap_sem
>>> while we are holding rq->lock already.
>>
>> The page fault is from kernel NULL pointer deref. You should connect
>> the lockdep warning and kernel BUG together.
>>
>
> Interesting.
>
> 1) Doesn't this NULL ptr def expose that we have a potential problem?
>
> 2) For NULL ptr def problem, commit 3a7e73a2e2 seems more suspicious..
I don't think so:
(gdb) l *check_preempt_wakeup+0x170
0xffffffff8103c815 is in is_same_group (kernel/sched_fair.c:154).
(gdb) disassemble check_preempt_wakeup
<snip>
0xffffffff8103c815 <is_same_group+0>: mov 0x168(%rsi),%rax
<snip>
The panic is a NULL pointer dereference at 0000000000000168, so at some
point in the is_same_group() while loop parent_entity(*pse) returned
NULL. is_same_group() then tried to read pse->cfs_rq, which triggered
the NULL pointer dereference.
With commit 3a7e73, the behaviour of find_matching_se() is the same as
before, so this commit should not be the buggy one.
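
To make the failure mode concrete, here is a minimal userspace sketch of
the walk described above. It is not the kernel code: the struct layout,
walk_to_common_group() and null_pse_fault_offset() are hypothetical
stand-ins, and the real find_matching_se() also equalizes entity depth
first. It only illustrates why reading pse->cfs_rq through a NULL pse
faults at the offset of the cfs_rq member, and how a guarded walk would
stop instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the field layout here is hypothetical. */
struct cfs_rq { int id; };

struct sched_entity {
	struct sched_entity *parent;	/* NULL at the top of the hierarchy */
	struct cfs_rq *cfs_rq;		/* runqueue this entity is queued on */
};

static struct sched_entity *parent_entity(struct sched_entity *se)
{
	return se->parent;
}

/* Precondition: both pointers are non-NULL. Without that, the load of
 * pse->cfs_rq is a read at address 0 + offsetof(cfs_rq). */
static bool is_same_group(struct sched_entity *se, struct sched_entity *pse)
{
	assert(se && pse);
	return se->cfs_rq == pse->cfs_rq;
}

/* Address a NULL pse would fault at in this simplified layout. */
static size_t null_pse_fault_offset(void)
{
	return offsetof(struct sched_entity, cfs_rq);
}

/* Guarded walk (assumption: not what the kernel does): move both
 * entities up until they share a cfs_rq, and report failure instead of
 * dereferencing a NULL parent. */
static bool walk_to_common_group(struct sched_entity **se,
				 struct sched_entity **pse)
{
	while (*se && *pse && !is_same_group(*se, *pse)) {
		*se = parent_entity(*se);
		*pse = parent_entity(*pse);
	}
	return *se && *pse;
}
```

In the oops above the faulting address is 0x168, which would correspond
to the offset of cfs_rq within the real struct sched_entity of that
build. The sketch does not reproduce that number.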
>