From: Badalian Vyacheslav <slavon@bigtelecom.ru>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Denys Fedoryshchenko <denys@visp.net.lb>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Machine Check Exception Re: NetDev! Please help!
Date: Mon, 22 Sep 2008 13:40:35 +0400
Message-ID: <48D76813.9000603@bigtelecom.ru>
In-Reply-To: <20080922065339.GA4399@ff.dom.local>
Thanks for the answer, Jarek!
I posted it to the bug tracker: http://bugzilla.kernel.org/show_bug.cgi?id=11618
I don't think it is a hardware error, because we have this problem on 10
servers running the 2.6.26.2 kernel +)
On Friday night I compiled 2.6.26.5 and got 2 panics on the one PC that has
the maximum load and 1 panic on another PC.
I wrote to the netdev list because the first messages looked like this:
[ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
[ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 4956.420300] Tx Queue <0>
[ 4956.420300] TDH <81>
[ 4956.420301] TDT <81>
[ 4956.420302] next_to_use <81>
[ 4956.420302] next_to_clean <d6>
[ 4956.420303] buffer_info[next_to_clean]
[ 4956.420303] time_stamp <15498d>
[ 4956.420304] next_to_watch <d6>
[ 4956.420304] jiffies <15511c>
[ 4956.420305] next_to_watch.status <1>
[ 4956.420537] eth1: Detected Tx Unit Hang:
[ 4956.420538] TDH <b0>
[ 4956.420538] TDT <b0>
[ 4956.420539] next_to_use <b0>
[ 4956.420539] next_to_clean <5>
[ 4956.420540] buffer_info[next_to_clean]:
[ 4956.420540] time_stamp <15498e>
[ 4956.420541] next_to_watch <5>
[ 4956.420542] jiffies <15511c>
[ 4956.420542] next_to_watch.status <1>
[ 4956.423064] CPU 1: Bank 0: 3200004000000800
[ 4956.423190] CPU 1: Bank 5: 3200220024080400
[ 4956.423315] Kernel panic - not syncing: CPU context corrupt
[ 4956.423933] Rebooting in 3 seconds..
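For reference, the bank values in the log above can be split into fields with a short script. This is only a sketch: it assumes the values are IA32_MCi_STATUS words laid out as the Intel SDM describes (flag bits 63..57, model-specific code in bits 31..16, MCA error code in bits 15..0), and the names FLAGS and decode_mci_status are made up for illustration.

```python
# Assumed bit layout of a machine-check bank status word (per the Intel SDM).
FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(value):
    """Split a 64-bit bank status word into flag names and error codes."""
    flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (value >> bit) & 1
    ]
    return {
        "flags": flags,
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }


# Bank values copied from the panic log above.
for bank, raw in [(0, "3200004000000800"), (5, "3200220024080400")]:
    info = decode_mci_status(int(raw, 16))
    print(bank, info["flags"], hex(info["mca_error_code"]))
```

Under that assumption both words have UC, EN and PCC set, which is consistent with the "CPU context corrupt" panic, and the low 16 bits are 0x0800 for bank 0 and 0x0400 for bank 5. The meaning of those codes should be looked up in the SDM's error-code tables for the exact CPU model.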
But with 2.6.26.5 I have not seen errors like this for 2 days... Also, if the system has no network load, I can't trigger the panic with cpuburn or by compiling sources...
Anyway, I think it is good that my message also goes to the general mailing list and bugzilla...
I will try to get more info... if you or anyone has an idea how to test this bug, I can do it)
Thanks!
> On Mon, Sep 22, 2008 at 10:17:01AM +0400, Badalian Vyacheslav wrote:
>
>> Jarek Poplawski:
>>
>> Hello!
>> Here is all the requested information.
>> I tried 2.6.26.5 and again got:
>> [143784.513166] CPU 2: Bank 0: 3200004000000800
>> [143784.513241] CPU 2: Bank 5: 3200121020080400
>> [143784.513241] Kernel panic - not syncing: CPU context corrupt
>> [143784.513282] Rebooting in 3 seconds..
>>
>
> Hi,
>
> Actually, I suggested you to read this Machine Check Exception help,
> because I think you should first try to test your hardware instead of
> sending configs. This type of error isn't usually seen with netdev
> bugs.
>
> Since I'm not a hardware expert I added linux-kernel to Cc, and
> probably you should do the same (I added it to this one). But, until
> you have any better advice I think you should do some long and heavy
> testing of your PCs especially for overheating or memory problems.
> We can start to analyze other bugs after we are sure the hardware is
> OK.
>
> BTW, probably your attachments are too big for the lists and the
> message could be dropped. It would be better to add some link to a
> server or use bugzilla for this.
>
> Thanks,
> Jarek P.
>
>
>> Attached is all the info that I was able to get from the PC. Maybe the
>> problem is that we use Core Duo Quad processors? They are 64-bit, but the
>> kernel and software are compiled as 32-bit. On 2 x "OLD HT(2 core) Xeon
>> 32 bit" PCs everything works great...
>>
>> Simple steps to reproduce:
>> Add iptables and tc rules, send above 500 mbs of total traffic (we have
>> above 300/200 mbs in/out) from any (many?) IPs that are present in the TC
>> rules, and run any CPU-heavy process (like compiling)...
>>
>> Thanks for answers!
>>
>> Denys Fedoryshchenko:
>> Hello!
>> I will try running nmi_watchdog...
>> I hope it helps. This PC has a hardware watchdog (the BIOS has parameters
>> for it), but the kernel has no module for it - /S3210SH/ (ICH9-R chipset).
>> I think the ID simply was not added to the driver. I will try writing to
>> its author - wim@iguana.be.
>> One question, please... this line:
>> [ 0.143332] APIC timer registered as dummy, due to nmi_watchdog=1!
>> Is that a normal start of nmi_watchdog, or do I need to use nmi_watchdog=2?
>>
>> Thanks for answers!
>>
>>
>>> Denys Fedoryshchenko wrote, On 09/20/2008 08:11 PM:
>>> ...
>>>
>>>
>>>
>>>> P.S. For netdev, I have one more friend who is complaining that shapers are
>>>> crashing on Intel machines (he uses TSC; he has two different "Core" based
>>>> servers, and both are crashing). With HPET I don't have any problems on high
>>>> performance shapers (except that it is CPU expensive). It happens on the latest
>>>> 2.6.26.5 too. The machine gets a hard lockup, and nothing but the hardware
>>>> watchdog is able to recover it. They don't have the experience to find the
>>>> actual reason for this issue, and they don't know English well enough to
>>>> report it.
>>>>
>>>>
>>> Is your friend sure it's because of shapers? If he/she can patch
>>> there is no need to know English well to report here:
>>>
>>> Subject: 2.6.26.5 tc not OK
>>>
>>> Config:
>>> .config
>>>
>>> tc script:
>>> script
>>>
>>> dmesg:
>>> dmesg
>>>
>>> not OK when: script run/script not run
>>>
>>> patch #1 not OK
>>> patch #2 not OK
>>> ...
>>> patch #2001 OK!
>>>
>>> Jarek P.