public inbox for linux-kernel@vger.kernel.org
* APIC errors
@ 2001-01-17  8:14 Dominik Kubla
  2001-01-17 11:22 ` Maciej W. Rozycki
  2001-01-17 20:54 ` Dr. Kelsey Hudson
  0 siblings, 2 replies; 5+ messages in thread
From: Dominik Kubla @ 2001-01-17  8:14 UTC
  To: linux-kernel

Just switched to 2.4.0-ac9 (+crypto patches) on our Dual-Pentium MMX
webserver yesterday.  Works fine so far, except I keep seeing those
APIC errors (about 14 in 12 hrs) indicating receive, send and CS errors.

Should I be concerned?

Yours,
  Dominik Kubla
-- 
http://petition.eurolinux.org/index_html - No Software Patents In Europe!
http://petition.lugs.ch/ (in Switzerland)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: APIC errors
  2001-01-17  8:14 APIC errors Dominik Kubla
@ 2001-01-17 11:22 ` Maciej W. Rozycki
  2001-01-18 19:17   ` Jorge Nerin
  2001-01-17 20:54 ` Dr. Kelsey Hudson
  1 sibling, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2001-01-17 11:22 UTC
  To: Dominik Kubla; +Cc: linux-kernel

On Wed, 17 Jan 2001, Dominik Kubla wrote:

> Just switched to 2.4.0-ac9 (+crypto patches) on our Dual-Pentium MMX
> webserver yesterday.  Works fine so far, except I keep seeing those
> APIC errors (about 14 in 12 hrs) indicating receive, send and CS errors.
> 
> Should I be concerned?

 At this volume I would treat this as a warning but not a critical issue.
Inter-APIC messages get retransmitted in case of an error, but the
checksum circuit is not sophisticated -- a double-bit error might pass
unnoticed, leading to system instability under certain conditions.  At
such a low volume of errors, double-bit ones are not likely to happen.
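
 To illustrate how a simple checksum can miss a double-bit error, here is
a toy sketch in Python.  It assumes a 2-bit additive checksum computed
over 2-bit groups; that scheme, and the names checksum2 and flip, are
assumptions made for illustration and are not a description of the actual
APIC bus circuit.

```python
# Toy model only: a 2-bit additive checksum over 2-bit groups.
# This is an assumption for illustration, not the real APIC bus circuit.

def checksum2(bits):
    """Add the message up two bits at a time, keeping only the low 2 bits."""
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    total = 0
    for i in range(0, len(bits), 2):
        total += (bits[i] << 1) | bits[i + 1]
    return total & 0b11


def flip(bits, *positions):
    """Return a copy of bits with the given positions inverted."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out


message = [0, 1, 1, 0, 0, 0, 1, 1]   # groups 01 10 00 11
good = checksum2(message)

single = flip(message, 1)            # one corrupted bit
double = flip(message, 1, 5)         # two corrupted bits that cancel out

print("single-bit error detected:", checksum2(single) != good)
print("double-bit error missed:  ", checksum2(double) == good)
```

 In this toy model the single flip changes the sum by one and is caught,
while the second flip undoes the change and the corrupted message passes.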

 It's the first report of APIC errors on a P5 system I have seen, so it's
probably not a result of a bad motherboard design.  I'd recommend checking
whether the system is overheating.  You may also be unlucky enough to have
a faulty board.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: APIC errors
  2001-01-17  8:14 APIC errors Dominik Kubla
  2001-01-17 11:22 ` Maciej W. Rozycki
@ 2001-01-17 20:54 ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 5+ messages in thread
From: Dr. Kelsey Hudson @ 2001-01-17 20:54 UTC
  To: Dominik Kubla; +Cc: linux-kernel

On Wed, 17 Jan 2001, Dominik Kubla wrote:

> Just switched to 2.4.0-ac9 (+crypto patches) on our Dual-Pentium MMX
> webserver yesterday.  Works fine so far, except I keep seeing those
> APIC errors (about 14 in 12 hrs) indicating receive, send and CS errors.

Make sure your system is free of dust...dust can cause small errors like
this to occur. Also make sure that the temperature of the system is within
tolerable levels. Also, a capacitor on your motherboard could be
failing...there are many different things that could cause this error.

> Should I be concerned?

Probably not. The errors were there before you upgraded the kernel; they
just weren't reported. Those messages are merely letting you know that an
APIC retry happened. The message still got to the controller; it just had
to get sent twice.
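
For reading those messages, here is a rough Python sketch that turns an
APIC Error Status Register value into the "receive, send and CS" wording
mentioned above. The bit layout is the one I understand Intel's
documentation to give for the local APIC; treat it, and the name
decode_esr, as assumptions and check them against the manual for your CPU.

```python
# Assumed ESR bit layout (verify against Intel's documentation for your CPU).
ESR_BITS = {
    0: "send checksum error",
    1: "receive checksum error",
    2: "send accept error",
    3: "receive accept error",
    5: "send illegal vector",
    6: "received illegal vector",
    7: "illegal register address",
}


def decode_esr(value):
    """Return the names of all error bits set in an ESR value."""
    return [name for bit, name in sorted(ESR_BITS.items())
            if value & (1 << bit)]


# Hypothetical value with bits 0 and 1 set, i.e. both checksum errors.
print(decode_esr(0x03))
```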

good luck,
-kelsey


* Re: APIC errors
  2001-01-17 11:22 ` Maciej W. Rozycki
@ 2001-01-18 19:17   ` Jorge Nerin
  2001-01-22 16:26     ` Maciej W. Rozycki
  0 siblings, 1 reply; 5+ messages in thread
From: Jorge Nerin @ 2001-01-18 19:17 UTC
  To: Maciej W. Rozycki; +Cc: Dominik Kubla, linux-kernel

"Maciej W. Rozycki" wrote:
> 
> On Wed, 17 Jan 2001, Dominik Kubla wrote:
> 
> > Just switched to 2.4.0-ac9 (+crypto patches) on our Dual-Pentium MMX
> > webserver yesterday.  Works fine so far, except I keep seeing those
> > APIC errors (about 14 in 12 hrs) indicating receive, send and CS errors.
> >
> > Should I be concerned?
> 
> >  At this volume I would treat this as a warning but not a critical issue.
> > Inter-APIC messages get retransmitted in case of an error, but the
> > checksum circuit is not sophisticated -- a double-bit error might pass
> > unnoticed, leading to system instability under certain conditions.  At
> > such a low volume of errors, double-bit ones are not likely to happen.
> 
> >  It's the first report of APIC errors on a P5 system I have seen, so it's
> > probably not a result of a bad motherboard design.  I'd recommend checking
> > whether the system is overheating.  You may also be unlucky enough to have
> > a faulty board.
> 
>   Maciej
> 

Hey, it's not the first. Some time ago, when these errors first began to
be reported, a lot of people with various systems asked about the same
thing at the same time :)

I have a dual P200 MMX in a Gigabyte 586DX mobo with 96MB + Voodoo 3
2000 PCI, Realtek 8139 NIC, bt848 TV...

And I usually get a lot of these messages:

[coma@quartz coma]$ cat /proc/interrupts 
           CPU0       CPU1       
  0:     801148     819848    IO-APIC-edge  timer
  1:       7576       7691    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  5:          0          4    IO-APIC-edge  soundblaster
  8:          1          0    IO-APIC-edge  rtc
  9:       4358       4347    IO-APIC-edge  eth1
 12:     124492     126503    IO-APIC-edge  PS/2 Mouse
 14:     206324     201592    IO-APIC-edge  ide0
 15:    1593094    1593085    IO-APIC-edge  ide1
 17:     785989     785945   IO-APIC-level  eth0
 18:        402        433   IO-APIC-level  bttv
NMI:    1620906    1620904 
LOC:    1620963    1620962 
ERR:       2697
[coma@quartz coma]$ uptime 
  8:14pm  up  4:30,  0 users,  load average: 0.19, 0.11, 0.09
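
The ERR line in that listing is the counter in question. A minimal Python
sketch for pulling it out of /proc/interrupts text, assuming the layout
shown above (the function name err_count is made up for this example):

```python
def err_count(interrupts_text):
    """Return the integer on the ERR: line, or None if there is none."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ERR:":
            return int(fields[1])
    return None


# Tail of the listing above, used here as sample input.
sample = """\
NMI:    1620906    1620904
LOC:    1620963    1620962
ERR:       2697
"""

print(err_count(sample))
```

On a running system the same function could be fed the contents of
/proc/interrupts instead of the sample string.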

But my system works OK, mostly. I have just replaced a Realtek 8029
(10Mb), which kept hanging, with a Realtek 8139 (100Mb), only to find that
the mobo has some kind of busmastering problem, but that's another story.

P.S. And as you suggested, it runs very hot: about 50ºC at the CPUs when
both are at full use.

-- 
Jorge Nerin
<comandante@zaralinux.com>

* Re: APIC errors
  2001-01-18 19:17   ` Jorge Nerin
@ 2001-01-22 16:26     ` Maciej W. Rozycki
  0 siblings, 0 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2001-01-22 16:26 UTC
  To: Jorge Nerin; +Cc: Dominik Kubla, linux-kernel

On Thu, 18 Jan 2001, Jorge Nerin wrote:

> >  It's the first report of APIC errors on a P5 system I have seen, so it's
> > probably not a result of a bad motherboard design.  I'd recommend checking
> > whether the system is overheating.  You may also be unlucky enough to have
> > a faulty board.
> 
> Hey, it's not the first. Some time ago, when these errors first began to
> be reported, a lot of people with various systems asked about the same
> thing at the same time :)

 I've seen a lot of reports but they were from P6 systems' owners.

> LOC:    1620963    1620962 
> ERR:       2697
> [coma@quartz coma]$ uptime 
>   8:14pm  up  4:30,  0 users,  load average: 0.19, 0.11, 0.09

 This rate of errors is alarming.  You get an error every six seconds on
average.
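
 The arithmetic behind that figure, as a small Python sketch (the helper
name seconds_per_error is made up here; the inputs are the ERR count and
uptime quoted above, plus the "about 14 in 12 hrs" from the original
report for comparison):

```python
def seconds_per_error(errors, hours, minutes=0):
    """Average number of seconds between errors over the given uptime."""
    uptime_s = hours * 3600 + minutes * 60
    return uptime_s / errors


# 2697 errors in 4:30 of uptime
print(round(seconds_per_error(2697, 4, 30), 1))   # 6.0

# about 14 errors in 12 hours
print(round(seconds_per_error(14, 12)))           # 3086
```

 So the second system sees an error roughly every six seconds, against
roughly one every 51 minutes in the original report.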

> But my system works OK, mostly. I have just replaced a Realtek 8029
> (10Mb), which kept hanging, with a Realtek 8139 (100Mb), only to find that
> the mobo has some kind of busmastering problem, but that's another story.

 The hangs might actually be a result of interrupt delivery problems (just
as other people report).

> P.S. And as you suggested, it runs very hot: about 50ºC at the CPUs when
> both are at full use.

 Well, 50 degrees is not that hot -- CPUs are actually specified for up to
70 degrees ambient temperature (that means the maximum temperature of the
case, not the heatsink!), but you need to ensure proper cooling.

 In the year and a half since error reporting was enabled, I have yet to
see a hardware error reported by an APIC in my system.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

