From: "Justin P. Mattock" <justinmattock@gmail.com>
To: huang ying <huang.ying.caritas@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
Andi Kleen <andi@firstfloor.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: using mce_inject I get: RIP 10:<ffffffffa012c909> {ttm_bo_unref+0xf/0x45 [ttm]}
Date: Tue, 30 Aug 2011 08:38:18 -0700
Message-ID: <4E5D03EA.1010309@gmail.com>
In-Reply-To: <CAC=cRTNJgRfYk9qpeX4=8FtPSr-c0_MGr48VnOBrp01-d3yhWQ@mail.gmail.com>
On 08/29/2011 06:07 PM, huang ying wrote:
> On Sat, Aug 27, 2011 at 11:03 PM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> On 08/23/2011 01:15 PM, Luck, Tony wrote:
>>>>
>>>> it's easily fixable, but not sure it's a good idea due to bisect going
>>>> through commits (afraid I might go astray with the bisect if I add any
>>>> patches).
>>>
>>> Rather than fixing a bad build - you can try moving to a nearby commit
>>> (use "gitk" to get a view of the structure around the commit that git
>>> bisect suggested). In the early stages of a bisection, it doesn't really
>>> matter much if you build the mid-point that bisect provided, or some
>>> nearby one - just be sure to mark good/bad the commit you actually built.
>>>
>>> -Tony
>>>
>>>
>>
>> well.. after bisecting (with no results), I found that something in my
>> .config was causing this, so after looking through, I found that having
>> X86_MCE_INJECT=y causes the pauses when the timeouts occur.
>>
>> let me know if I need to supply any info.
>
> Which test case causes the pause? Some test cases with "timeout" in
> the name may cause a timeout between CPUs. Or you can try booting the
> system with kernel parameter "mce=3,0", which will disable the timeout.
>
> Best Regards,
> Huang Ying
>
cool, thanks for the info.
I went and used mce=3,0 on the command line, and then ran the mce-test
suite. unfortunately the pause still occurs.
as for which test cases: basically any of the timeout ones.
here is what the verbose output looks like:
`/home/kernel/mce-inject/mce-test'
./drivers/simple/driver.sh simple.conf
soft-inj/non-panic/corrected:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_hold:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_over:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/panic/fatal:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_eipv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_irq:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Machine check from unknown source
soft-inj/panic/fatal_over:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_userspace:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
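Since "can not get gcov graph" fails for every case, it helps to group the messages per test case and filter that one out. A minimal sketch in Python, assuming the report format shown above; it is not part of mce-test, and the name failures_by_case is made up for illustration:

```python
def failures_by_case(report_text):
    """Group 'Failed:' messages under the test-case header that precedes them."""
    failures = {}
    current = None
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("Failed:"):
            if current is not None:
                failures[current].append(line[len("Failed:"):].strip())
        elif line.startswith("Passed:"):
            continue
        elif line.endswith(":") and "/" in line:
            # A header such as "soft-inj/panic/fatal_timeout:"
            current = line[:-1]
            failures[current] = []
    return failures


# Two cases copied from the report above.
SAMPLE = """\
soft-inj/non-panic/corrected:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/panic/fatal_timeout:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
"""

if __name__ == "__main__":
    for case, msgs in failures_by_case(SAMPLE).items():
        # The gcov failure shows up in every case, so leave it out of the count.
        real = [m for m in msgs if m != "can not get gcov graph"]
        print(f"{case}: {len(real)} failure(s) besides gcov")
```

Feeding it the full report would list, per case, only the failures that differ from one case to the next.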
in dmesg I see:
[ 102.491609] Starting machine check poll CPU 1
[ 102.492077] [Hardware Error]: Machine check events logged
[ 102.492086] Machine check poll done on CPU 1
[ 123.537575] Triggering MCE exception on CPU 0
[ 123.537584] Disabling lock debugging due to kernel taint
[ 123.537594] [Hardware Error]: Machine check events logged
[ 123.537597] MCE exception done on CPU 0
[ 129.779850] Triggering MCE exception on CPU 1
[ 129.779879] MCE exception done on CPU 1
[ 137.030085] Triggering MCE exception on CPU 0
[ 137.030108] MCE exception done on CPU 0
[ 143.286096] Triggering MCE exception on CPU 0
[ 143.286110] MCE exception done on CPU 0
[ 149.541391] Triggering MCE exception on CPU 0
[ 149.541409] MCE exception done on CPU 0
[ 156.785580] Triggering MCE exception on CPU 1
[ 156.785602] MCE exception done on CPU 1
[ 164.011576] Triggering MCE exception on CPU 0
[ 164.012558] mce_notify_irq: 4 callbacks suppressed
[ 164.012558] [Hardware Error]: Machine check events logged
[ 166.795340] MCE exception done on CPU 0
[ 173.088624] Triggering MCE exception on CPU 0
[ 173.089600] [Hardware Error]: Machine check events logged
[ 177.119421] MCE exception done on CPU 0
[ 184.373355] Triggering MCE exception on CPU 1
[ 184.373372] MCE exception done on CPU 1
[ 190.741030] Triggering MCE exception on CPU 1
[ 190.741047] MCE exception done on CPU 1
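To put numbers on the pause, the "Triggering MCE exception" and "MCE exception done" lines can be paired per CPU and their timestamps subtracted. A rough sketch in Python, assuming the dmesg format shown above; the function name mce_exception_durations is made up here and is not part of any kernel tooling:

```python
import re

# Matches e.g. "[ 164.011576] Triggering MCE exception on CPU 0"
# and the corresponding "MCE exception done on CPU 0" line.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<kind>Triggering MCE exception|MCE exception done) on CPU (?P<cpu>\d+)"
)


def mce_exception_durations(dmesg_text):
    """Return (cpu, start, duration) for each trigger/done pair, in log order."""
    pending = {}  # cpu -> timestamp of the last unmatched "Triggering" line
    results = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = float(m.group("ts"))
        cpu = int(m.group("cpu"))
        if m.group("kind").startswith("Triggering"):
            pending[cpu] = ts
        elif cpu in pending:
            start = pending.pop(cpu)
            results.append((cpu, start, ts - start))
    return results


# A slice of the log above.
SAMPLE = """\
[ 156.785580] Triggering MCE exception on CPU 1
[ 156.785602] MCE exception done on CPU 1
[ 164.011576] Triggering MCE exception on CPU 0
[ 164.012558] mce_notify_irq: 4 callbacks suppressed
[ 164.012558] [Hardware Error]: Machine check events logged
[ 166.795340] MCE exception done on CPU 0
[ 173.088624] Triggering MCE exception on CPU 0
[ 173.089600] [Hardware Error]: Machine check events logged
[ 177.119421] MCE exception done on CPU 0
"""

if __name__ == "__main__":
    for cpu, start, dur in mce_exception_durations(SAMPLE):
        print(f"CPU {cpu} at {start:.6f}: {dur:.6f} s")
```

On the log above, the two exceptions starting at 164.011576 and 173.088624 take roughly 2.8 and 4.0 seconds, while the others complete within tens of microseconds.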
let me know if you need more info.
Justin P. Mattock