From: Badalian Vyacheslav <slavon@bigtelecom.ru>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Denys Fedoryshchenko <denys@visp.net.lb>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Machine Check Exception Re: NetDev! Please help!
Date: Mon, 22 Sep 2008 13:40:35 +0400
Message-ID: <48D76813.9000603@bigtelecom.ru>
In-Reply-To: <20080922065339.GA4399@ff.dom.local>
Thanks for the answer, Jarek!
I posted it to the bug tracker: http://bugzilla.kernel.org/show_bug.cgi?id=11618
I don't think it is a hardware error, because we have this problem on 10
servers running the 2.6.26.2 kernel.
On Friday night I compiled 2.6.26.5 and got 2 panics on the PC that has
the highest load and 1 panic on another PC.
I wrote to the netdev list because the first messages look like this:
[ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
[ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 4956.420300] Tx Queue <0>
[ 4956.420300] TDH <81>
[ 4956.420301] TDT <81>
[ 4956.420302] next_to_use <81>
[ 4956.420302] next_to_clean <d6>
[ 4956.420303] buffer_info[next_to_clean]
[ 4956.420303] time_stamp <15498d>
[ 4956.420304] next_to_watch <d6>
[ 4956.420304] jiffies <15511c>
[ 4956.420305] next_to_watch.status <1>
[ 4956.420537] eth1: Detected Tx Unit Hang:
[ 4956.420538] TDH <b0>
[ 4956.420538] TDT <b0>
[ 4956.420539] next_to_use <b0>
[ 4956.420539] next_to_clean <5>
[ 4956.420540] buffer_info[next_to_clean]:
[ 4956.420540] time_stamp <15498e>
[ 4956.420541] next_to_watch <5>
[ 4956.420542] jiffies <15511c>
[ 4956.420542] next_to_watch.status <1>
[ 4956.423064] CPU 1: Bank 0: 3200004000000800
[ 4956.423190] CPU 1: Bank 5: 3200220024080400
[ 4956.423315] Kernel panic - not syncing: CPU context corrupt
[ 4956.423933] Rebooting in 3 seconds..
But on 2.6.26.5 I have not seen errors like this for 2 days... Also, if the system has no network load, I can't trigger a panic with cpuburn or by compiling sources...
Anyway, I think it is good that my message also goes to the general mailing list and bugzilla...
I will try to get more info... If you or anyone else has an idea how to test this bug, I can do it.
Thanks!
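For anyone reading the raw values in the log above, here is a minimal Python sketch that splits them into named fields. It assumes the bit layout of IA32_MCG_STATUS and IA32_MCi_STATUS as documented in the Intel SDM: RIPV, EIPV and MCIP in bits 0-2 of the first; VAL, OVER, UC, EN, MISCV, ADDRV and PCC in bits 63-57 of the second, with the MCA error code in bits 15:0 and the model-specific code in bits 31:16. The function names are made up for illustration and are not part of any kernel tool. VAL (bit 63) shows as clear in the printed bank values; the assumption here is that the handler masks it before printing.

```python
# Hypothetical helper names; bit positions follow the Intel SDM layout
# for IA32_MCG_STATUS and IA32_MCi_STATUS (an assumption of this sketch).
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_FLAGS = {
    63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
    59: "MISCV", 58: "ADDRV", 57: "PCC",
}


def decode_mcg_status(value):
    """Return the names of the flags set in an MCG_STATUS value."""
    return [name for bit, name in sorted(MCG_FLAGS.items())
            if (value >> bit) & 1]


def decode_bank_status(value):
    """Split a 64-bit bank status word into flags and error codes."""
    flags = [name for bit, name in sorted(MCI_FLAGS.items(), reverse=True)
             if (value >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    # Values copied from the log in this message.
    print(decode_mcg_status(0x0000000000000005))
    for bank, status in ((0, 0x3200004000000800), (5, 0x3200220024080400)):
        fields = decode_bank_status(status)
        print(bank, fields["flags"],
              hex(fields["model_specific_code"]),
              hex(fields["mca_error_code"]))
```

With the values from the log, the first word has RIPV and MCIP set, and both bank words have UC, EN and PCC set. PCC (processor context corrupt) would match the "CPU context corrupt" panic text. For a full decode, a dedicated tool such as mcelog is the better choice.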
> On Mon, Sep 22, 2008 at 10:17:01AM +0400, Badalian Vyacheslav wrote:
>
>> Jarek Poplawski:
>>
>> Hello!
>> Here is all the requested information.
>> I tried 2.6.26.5 and again got:
>> [143784.513166] CPU 2: Bank 0: 3200004000000800
>> [143784.513241] CPU 2: Bank 5: 3200121020080400
>> [143784.513241] Kernel panic - not syncing: CPU context corrupt
>> [143784.513282] Rebooting in 3 seconds..
>>
>
> Hi,
>
> Actually, I suggested that you read this Machine Check Exception help,
> because I think you should first try to test your hardware instead of
> sending configs. This type of error isn't usually seen with netdev
> bugs.
>
> Since I'm not a hardware expert I added linux-kernel to Cc, and
> probably you should do the same (I added it to this one). But, until
> you have any better advice I think you should do some long and heavy
> testing of your PCs especially for overheating or memory problems.
> We can start to analyze other bugs after we are sure the hardware is
> OK.
>
> BTW, probably your attachments are too big for the lists and the
> message could be dropped. It would be better to add some link to a
> server or use bugzilla for this.
>
> Thanks,
> Jarek P.
>
>
>> Attached is all the info that I could get from the PC. Maybe the problem is
>> that we use Core Duo Quad processors? They are 64-bit, but the kernel and
>> software are compiled as 32-bit. On a 2 x "old HT (2 core) Xeon 32 bit" PC
>> everything works great...
>>
>> Simple steps to reproduce:
>> Add iptables and tc rules, push more than 500 Mbps of total traffic (we
>> have more than 300/200 Mbps in/out) from any (many?) IPs that are present
>> in the tc rules, and run any CPU-heavy process (like compiling)...
>>
>> Thanks for answers!
>>
>> Denys Fedoryshchenko:
>> Hello!
>> I tried running nmi_watchdog...
>> I hope it helps. This PC has a hardware watchdog (the BIOS has parameters
>> for it), but the kernel has no module for it - /S3210SH/ (ICH9-R chipset).
>> I think the ID simply has not been added to the driver. I will try to write
>> to its author - wim@iguana.be.
>> One question, please. This line:
>> [ 0.143332] APIC timer registered as dummy, due to nmi_watchdog=1!
>> Is that a normal start of nmi_watchdog, or do I need to use nmi_watchdog=2?
>>
>> Thanks for answers!
>>
>>
>>> Denys Fedoryshchenko wrote, On 09/20/2008 08:11 PM:
>>> ...
>>>
>>>
>>>
>>>> P.S. For netdev: I have one more friend who is complaining that shapers are
>>>> crashing on Intel machines (he uses TSC and has two different "Core" based
>>>> servers, and both are crashing). With HPET I don't have any problems on high
>>>> performance shapers (except that it is CPU expensive). It happens on the
>>>> latest 2.6.26.5 too. The machine gets a hard lockup, and nothing other than a
>>>> hardware watchdog is able to recover it. They don't have the experience to
>>>> find the actual reason for this issue and they don't know English well
>>>> enough to report it.
>>>>
>>>>
>>> Is your friend sure it's because of shapers? If he/she can patch
>>> there is no need to know English well to report here:
>>>
>>> Subject: 2.6.26.5 tc not OK
>>>
>>> Config:
>>> .config
>>>
>>> tc script:
>>> script
>>>
>>> dmesg:
>>> dmesg
>>>
>>> not OK when: script run/script not run
>>>
>>> patch #1 not OK
>>> patch #2 not OK
>>> ...
>>> patch #2001 OK!
>>>
>>> Jarek P.
>>>
>>>
>>>