From: Xiaotian Feng <xtfeng@gmail.com>
To: "Américo Wang" <xiyou.wangcong@gmail.com>
Cc: Eric Paris <eparis@redhat.com>,
linux-kernel@vger.kernel.org, mingo@elte.hu,
peterz@infradead.org, efault@gmx.de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs
Date: Tue, 22 Dec 2009 16:34:57 +0800
Message-ID: <7b6bb4a50912220034w1e49055dob1afff292becaf02@mail.gmail.com>
In-Reply-To: <2375c9f90912212350s7e48a4bfp2c5b3863f5969097@mail.gmail.com>
On Tue, Dec 22, 2009 at 3:50 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Dec 22, 2009 at 3:41 PM, Xiaotian Feng <xtfeng@gmail.com> wrote:
>> On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
>>> [Fix top-posting]
>>>
>>> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <xtfeng@gmail.com> wrote:
>>>>
>>>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <eparis@redhat.com> wrote:
>>>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>>>> I'm exploding in the scheduler. I'm running (and building) kernel
>>>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5 There are three distinct
>>>>> signatures of problems. Some boots I'll see all 3 of these failures
>>>>> sometimes only 1 or 2 of them. That's the reason they are kinda split
>>>>> up in dmesg.
>>>>>
>>>>> 1) gcc/3141 is trying to acquire lock:
>>>>> (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>>>
>>>>> but task is already holding lock:
>>>>> (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>>>
>>>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>>>
>>>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>>> kernel/sched_fair.c
>>>>>
>>>>> Full backtraces are in the attached dmesg.
>>>>>
>>>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>>>
>>>
>>> I don't think so...
>>>
>>> I think the most suspicious commit here is ab19cb23. It kicked
>>> "local_irq_save()" out, which means that if the task is selected to run on
>>> another cpu which doesn't disable irqs, we will take a page fault, and then
>>> we will try to take mm->mmap_sem while we are already holding rq->lock.
>>
>> The page fault is from kernel NULL pointer deref. You should connect
>> the lockdep warning and kernel BUG together.
>>
>
> Interesting.
>
> 1) Doesn't this NULL ptr deref expose that we have a potential problem?
>
> 2) For the NULL ptr deref problem, commit 3a7e73a2e2 seems more suspicious..
I don't think so:
(gdb) l *check_preempt_wakeup+0x170
0xffffffff8103c815 is in is_same_group (kernel/sched_fair.c:154).
(gdb) disassemble check_preempt_wakeup
<snip>
0xffffffff8103c815 <is_same_group+0>: mov 0x168(%rsi),%rax
<snip>
The panic is a NULL pointer dereference at 0000000000000168, so at some
point in the is_same_group() while loop, parent_entity(*pse) returned NULL.
is_same_group() then tried to read pse->cfs_rq, and that triggered the
NULL pointer dereference.
With commit 3a7e73, the behaviour of find_matching_se() is the same as
before, so this commit should not be the buggy one.
>
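To make the failure mode above concrete, here is a minimal userspace sketch of the walk. The struct layout, the walk_to_common_group() helper and the two demo functions are hypothetical stand-ins written for illustration, not the kernel's definitions, and the depth-equalising step of find_matching_se() is left out. In the trace, the load is at offset 0x168 from %rsi, which is consistent with a NULL entity pointer producing the fault address 0000000000000168.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; this layout is NOT the real one. */
struct cfs_rq { int id; };

struct sched_entity {
    struct sched_entity *parent; /* NULL at the top of the hierarchy */
    struct cfs_rq *cfs_rq;       /* run queue this entity is queued on */
};

/* Reads through both pointers, so it faults if either one is NULL. */
static int is_same_group(const struct sched_entity *se,
                         const struct sched_entity *pse)
{
    return se->cfs_rq == pse->cfs_rq;
}

static struct sched_entity *parent_entity(struct sched_entity *se)
{
    return se->parent;
}

/*
 * Hypothetical checked variant of the walk: returns 1 once both entities
 * are in the same group, 0 if either side ran off the top first. An
 * unchecked loop would call is_same_group() with a NULL pointer there.
 */
static int walk_to_common_group(struct sched_entity **se,
                                struct sched_entity **pse)
{
    assert(se && pse);
    while (*se && *pse) {
        if (is_same_group(*se, *pse))
            return 1;
        *se = parent_entity(*se);
        *pse = parent_entity(*pse);
    }
    return 0;
}

/* Two children whose parents share one run queue: the walk converges. */
int demo_shared_parent(void)
{
    struct cfs_rq root = {0}, rq_a = {1}, rq_b = {2};
    struct sched_entity pa = {NULL, &root}, pb = {NULL, &root};
    struct sched_entity a = {&pa, &rq_a}, b = {&pb, &rq_b};
    struct sched_entity *se = &a, *pse = &b;
    return walk_to_common_group(&se, &pse);
}

/* Two top-level entities on different run queues: the walk reaches NULL. */
int demo_mismatched_roots(void)
{
    struct cfs_rq rq_a = {1}, rq_b = {2};
    struct sched_entity a = {NULL, &rq_a}, b = {NULL, &rq_b};
    struct sched_entity *se = &a, *pse = &b;
    return walk_to_common_group(&se, &pse);
}
```

demo_mismatched_roots() is the case of interest: without the NULL check in the loop condition, the next is_same_group() call would dereference a NULL entity, which is the shape of crash described above.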
Thread overview: 12+ messages
2009-12-22 0:17 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs Eric Paris
2009-12-22 5:42 ` Xiaotian Feng
2009-12-22 7:19 ` Américo Wang
2009-12-22 7:41 ` Xiaotian Feng
2009-12-22 7:50 ` Américo Wang
2009-12-22 8:34 ` Xiaotian Feng [this message]
2009-12-22 8:48 ` Peter Zijlstra
2009-12-22 6:29 ` Eric Paris
2009-12-22 14:41 ` [PATCH] sched: Peter Zijlstra
2009-12-22 14:43 ` [PATCH] sched: Revert 738d2be, Simplify set_task_cpu() Peter Zijlstra
2009-12-23 9:06 ` [tip:sched/urgent] sched: Revert 738d2be, simplify set_task_cpu() tip-bot for Peter Zijlstra
2009-12-22 11:31 ` 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs Arjan van de Ven