From: "Justin P. Mattock" <justinmattock@gmail.com>
To: huang ying <huang.ying.caritas@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
Andi Kleen <andi@firstfloor.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: using mce_inject I get: RIP 10:<ffffffffa012c909> {ttm_bo_unref+0xf/0x45 [ttm]}
Date: Tue, 30 Aug 2011 08:38:18 -0700
Message-ID: <4E5D03EA.1010309@gmail.com>
In-Reply-To: <CAC=cRTNJgRfYk9qpeX4=8FtPSr-c0_MGr48VnOBrp01-d3yhWQ@mail.gmail.com>

On 08/29/2011 06:07 PM, huang ying wrote:
> On Sat, Aug 27, 2011 at 11:03 PM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> On 08/23/2011 01:15 PM, Luck, Tony wrote:
>>>>
>>>> it's easily fixable, but I'm not sure it's a good idea, since bisect is
>>>> going through commits (I'm afraid I might go astray with the bisect if I
>>>> add any patches).
>>>
>>> Rather than fixing a bad build, you can try moving to a nearby commit
>>> (use "gitk" to get a view of the structure around the commit that git
>>> bisect suggested). In the early stages of a bisection, it doesn't really
>>> matter much whether you build the midpoint that bisect provided or some
>>> nearby one; just be sure to mark the commit you actually built as good/bad.
>>>
>>> -Tony
>>>
>>>
>>
>> Well... after bisecting (with no results), I found that something in my
>> .config was causing this. After looking through it, I found that having
>> X86_MCE_INJECT=y causes the pauses when the timeouts occur.
>>
>> Let me know if I need to supply any info.
>
> Which test case causes the pause? Some test cases with "timeout" in their
> names may cause a timeout between CPUs. Alternatively, you can try booting
> the system with the kernel parameter "mce=3,0", which will disable the timeout.
>
> Best Regards,
> Huang Ying
>
Cool, thanks for the info.
I booted with mce=3,0 on the kernel command line and then ran the mce-test
suite. Unfortunately, the pause still occurs.
As for which tests: basically, it happens whenever any of the timeout cases run.
Here is what the verbose output looks like:
`/home/kernel/mce-inject/mce-test'
./drivers/simple/driver.sh simple.conf
soft-inj/non-panic/corrected:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_hold:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_over:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/panic/fatal:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_eipv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_irq:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Machine check from unknown source
soft-inj/panic/fatal_over:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_userspace:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
In dmesg I see:
[ 102.491609] Starting machine check poll CPU 1
[ 102.492077] [Hardware Error]: Machine check events logged
[ 102.492086] Machine check poll done on CPU 1
[ 123.537575] Triggering MCE exception on CPU 0
[ 123.537584] Disabling lock debugging due to kernel taint
[ 123.537594] [Hardware Error]: Machine check events logged
[ 123.537597] MCE exception done on CPU 0
[ 129.779850] Triggering MCE exception on CPU 1
[ 129.779879] MCE exception done on CPU 1
[ 137.030085] Triggering MCE exception on CPU 0
[ 137.030108] MCE exception done on CPU 0
[ 143.286096] Triggering MCE exception on CPU 0
[ 143.286110] MCE exception done on CPU 0
[ 149.541391] Triggering MCE exception on CPU 0
[ 149.541409] MCE exception done on CPU 0
[ 156.785580] Triggering MCE exception on CPU 1
[ 156.785602] MCE exception done on CPU 1
[ 164.011576] Triggering MCE exception on CPU 0
[ 164.012558] mce_notify_irq: 4 callbacks suppressed
[ 164.012558] [Hardware Error]: Machine check events logged
[ 166.795340] MCE exception done on CPU 0
[ 173.088624] Triggering MCE exception on CPU 0
[ 173.089600] [Hardware Error]: Machine check events logged
[ 177.119421] MCE exception done on CPU 0
[ 184.373355] Triggering MCE exception on CPU 1
[ 184.373372] MCE exception done on CPU 1
[ 190.741030] Triggering MCE exception on CPU 1
[ 190.741047] MCE exception done on CPU 1
Let me know if you need more info.
Justin P. Mattock