public inbox for linux-kernel@vger.kernel.org
* Kernel NMI error
@ 2003-09-16 11:38 msrinath
  2003-09-16 13:25 ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: msrinath @ 2003-09-16 11:38 UTC
  To: linux-kernel

Hello Everybody,

Can anyone help me with this?

Recently one of our servers running RedHat linux 7.2 with 2.4.7-10 SMP
kernel generated the following log and the system rebooted. This system has
2 CPUs.

Sep 16 01:34:24 cbesc ftpd[30753]: FTP LOGIN FROM 16.128.157.7 [16.128.157.7], scuser
Sep 16 01:36:48 cbesc ftpd[30753]: FTP session closed
Sep 16 01:54:30 cbesc kernel: Uhhuh. NMI received for unknown reason 35.
Sep 16 01:54:30 cbesc kernel: Dazed and confused, but trying to continue
Sep 16 01:54:30 cbesc kernel: Do you have a strange power saving mode enabled?
Sep 16 01:54:30 cbesc kernel: eth0: card reports no resources.
Sep 16 01:58:09 cbesc syslogd 1.4.1: restart.
Sep 16 01:58:09 cbesc syslog: syslogd startup succeeded

This is the first time we have faced this problem. The Ethernet card is an
Intel EEPro100. The driver's boot messages are shown below.

Sep 16 07:33:53 cbesc kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Sep 16 07:33:53 cbesc kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
Sep 16 07:33:53 cbesc kernel: eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:A0:C9:A0:B7:71, IRQ 16.
Sep 16 07:33:53 cbesc kernel:   Receiver lock-up bug exists -- enabling work-around.
Sep 16 07:33:53 cbesc kernel:   Board assembly 668081-004, Physical connectors present: RJ45
Sep 16 07:33:53 cbesc kernel:   Primary interface chip i82555 PHY #1.
Sep 16 07:33:54 cbesc kernel:   General self-test: passed.
Sep 16 07:33:54 cbesc kernel:   Serial sub-system self-test: passed.
Sep 16 07:33:54 cbesc kernel:   Internal registers self-test: passed.
Sep 16 07:33:54 cbesc kernel:   ROM checksum self-test: passed (0x3c15c8f1).
Sep 16 07:33:54 cbesc kernel:   Receiver lock-up workaround activated.


Please let me know why this happened and whether it indicates a hardware
problem in the system.

Please CC my email address on replies, since I am not subscribed to the
list.

Thanks & Regards,

- Srinath.


-- 
This message has been scanned for viruses and
dangerous content by Kaspersky on bpl Server, and is
believed to be clean.
bpl www.kaspersky.com
.



* Re: Kernel NMI error
  2003-09-16 11:38 Kernel NMI error msrinath
@ 2003-09-16 13:25 ` Alan Cox
  2003-09-17  9:51   ` msrinath
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2003-09-16 13:25 UTC
  To: msrinath; +Cc: Linux Kernel Mailing List

On Tue, 2003-09-16 at 12:38, msrinath wrote:
> Recently one of our servers running RedHat linux 7.2 with 2.4.7-10 SMP
> kernel generated the following log and the system rebooted. This system has
> 2 CPUs.

Typically an NMI is a system error. That could be a memory error, or it
could be a freak power glitch if it has only ever happened once.

If you are using a 2.4.7 kernel you really should also update to the
current errata kernel and other updates.
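
For reference, the "35" in "NMI received for unknown reason 35" is, as far as
the 2.4-era i386 code goes, the status byte read from I/O port 0x61, printed
in hex. The sketch below decodes such a value, assuming the conventional PC/AT
bit layout (bit 7 = memory parity / SERR, bit 6 = I/O channel check). The
function and constant names are made up for illustration; they are not kernel
identifiers, and the bit meanings may differ on a particular chipset.

```python
# Hypothetical helper: decode the hex "reason" value from the kernel's NMI
# message. Bit meanings are the conventional PC/AT ones and are an assumption
# for any specific board.
NMI_MEMORY_PARITY = 0x80  # bit 7: memory parity error / PCI SERR#
NMI_IO_CHECK = 0x40       # bit 6: I/O channel check (IOCHK)


def decode_nmi_reason(hex_text):
    """Return the causes flagged in the reason byte, or ["unknown"]."""
    reason = int(hex_text, 16)
    causes = []
    if reason & NMI_MEMORY_PARITY:
        causes.append("memory parity error")
    if reason & NMI_IO_CHECK:
        causes.append("I/O check error")
    # With neither bit set the hardware gave no specific cause, which is
    # when the kernel falls back to the "unknown reason" message.
    return causes or ["unknown"]


if __name__ == "__main__":
    print(decode_nmi_reason("35"))
```

Under that layout, 0x35 has neither bit 7 nor bit 6 set, so the board did not
flag a parity or I/O check error. That is consistent with the "unknown reason"
wording, and it does not by itself rule a memory fault in or out.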




* RE: Kernel NMI error
  2003-09-16 13:25 ` Alan Cox
@ 2003-09-17  9:51   ` msrinath
  2003-09-17 13:41     ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: msrinath @ 2003-09-17  9:51 UTC
  To: 'Alan Cox'; +Cc: 'Linux Kernel Mailing List'

Thanks for the reply. This is the only time this has ever happened. How can
I make out if it is a memory error? Is there any way by which I can test it?

Thanks & Regards,

- Srinath


* RE: Kernel NMI error
  2003-09-17  9:51   ` msrinath
@ 2003-09-17 13:41     ` Alan Cox
  2003-09-18  3:52       ` msrinath
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2003-09-17 13:41 UTC
  To: msrinath; +Cc: 'Linux Kernel Mailing List'

On Wed, 2003-09-17 at 10:51, msrinath wrote:
> Thanks for the reply. This is the only time this has ever happened. How can
> I make out if it is a memory error? Is there any way by which I can test it?

If you can schedule downtime for the machine, run memtest86 on it for a
few hours to check. If not, just see whether it happens again, I guess; if
it does, then think about testing the RAM.
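
One way to "see if it happens again" without watching the console is to scan
the syslog output for the NMI message from time to time. This is a minimal
sketch, assuming the log lines have the same shape as the ones posted earlier
in the thread. The function name and the regular expression are illustrative
only.

```python
import re

# Matches lines shaped like the one posted in this thread, e.g.
# "Sep 16 01:54:30 cbesc kernel: Uhhuh. NMI received for unknown reason 35."
NMI_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"Uhhuh\. NMI received for unknown reason (?P<reason>[0-9a-fA-F]+)"
)


def find_nmi_events(lines):
    """Return (timestamp, host, reason) for each unknown-NMI log line."""
    events = []
    for line in lines:
        match = NMI_LINE.match(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("host"), match.group("reason"))
            )
    return events


if __name__ == "__main__":
    # Sample input copied from the log excerpt in the original report.
    sample = [
        "Sep 16 01:36:48 cbesc ftpd[30753]: FTP session closed",
        "Sep 16 01:54:30 cbesc kernel: Uhhuh. NMI received for unknown reason 35.",
        "Sep 16 01:54:30 cbesc kernel: eth0: card reports no resources.",
    ]
    for stamp, host, reason in find_nmi_events(sample):
        print(stamp, host, "reason", reason)
```

In practice the lines would come from the system log file (often
/var/log/messages on Red Hat systems of that era, though the path is an
assumption here). A second hit would be the cue to schedule the memtest86 run.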



* RE: Kernel NMI error
  2003-09-17 13:41     ` Alan Cox
@ 2003-09-18  3:52       ` msrinath
  0 siblings, 0 replies; 6+ messages in thread
From: msrinath @ 2003-09-18  3:52 UTC
  To: 'Alan Cox'; +Cc: 'Linux Kernel Mailing List'

Thanks. I will wait and watch.

- Srinath

