public inbox for linux-kernel@vger.kernel.org
* Oops on linux 2.4.20-ac1
@ 2002-12-10 17:49 Orion Poplawski
  2002-12-10 21:00 ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Orion Poplawski @ 2002-12-10 17:49 UTC (permalink / raw)
  To: linux-kernel

I've been having a number of issues, mostly system lockups, with a
machine of ours - a dual proc athlon.  I've removed some hardware and I
haven't seen a hang recently.  However, we got an Oops message recently.
I lost that message because it wasn't written to any log (is there any
way I can fix that?).  So I upgraded the kernel to 2.4.20-ac1.  Under
that I started getting Oopses quite frequently.  Here is my first attempt
at processing the message.  Note that I switched back to the previous
kernel, but it's running the same module list and I tried to point
ksymoops to the correct pieces.  I also typed the oops message in from
what I wrote down from the screen.  Please let me know if I made a
mistake there.

ksymoops 2.4.1 on i686 2.4.19.  Options used
     -v /usr/src/linux-2.4.20-ac1/vmlinux (specified)
     -k /var/log/ksyms.1 (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-ac1 (specified)
     -m /boot/System.map-2.4.20-ac1 (specified)

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sym53c8xx.o) for sym53c8xx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/net/ipv4/netfilter/netfilter.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/net/fc/fc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/sound/sounddrivers.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/cdrom/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/raid/idedriver-raid.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/ppc/idedriver-ppc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/legacy/idedriver-legacy.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/arm/idedriver-arm.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/misc/misc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/parport/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/radio/radio.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/video/video.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/media.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/hotplug/vmlinux-obj.o
Warning (compare_ksyms_lsmod): module 3c59x is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module autofs is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module binfmt_misc is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module lockd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfs is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfsd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module sunrpc is in lsmod but not in ksyms, probably no symbols exported
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module jbd to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sym53c8xx to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sd_mod to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module scsi_mod to a unique module object.  Trace may not be reliable.
Oops: 0002
CPU: 0
EIP: 0010:[<f89641eb>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010216
eax: 00000040  ebx: 00000028  ecx: 00000060  edx: f590a960
esi: f590a800  edi: f590a960  ebp: c4650ed2  esp: f4a45eb0
ds: 0018  es: 0018  ss: 0018
Stack: 0000003c 0000001f 00000000 0000b800 00000060 f590a960 f5e4d62c 00000082
       00000001 00000082 f5dd3fd8 00000001 00000086 00000001 00000086 f5a98000
       00000000 0000000e f4a45f08 00000000 f5a4d600 f7ff0340 00000040 c01beaff
Call Trace: [<c01beaff>] [<c0120593>] [<f896382a>]
            [<c011f92d>] [<c0109b89>] [<c0109cf8>]
Code: b8 01 30 00 00 83 c2 0e 66 ef 8b 7c 24 14 8b 87 dc 00 00 00

 >>EIP; f89641eb <END_OF_CODE+10598c/????>   <=====
Trace; c01beaff <netif_receive_skb+ff/130>
Trace; c0120593 <send_sig_info+73/90>
Trace; f896382a <END_OF_CODE+104fcb/????>
Trace; c011f92d <do_timer+3d/70>
Trace; c0109b89 <handle_IRQ_event+39/60>
Trace; c0109cf8 <do_IRQ+68/b0>
Code;  f89641eb <END_OF_CODE+10598c/????>
00000000 <_EIP>:
Code;  f89641eb <END_OF_CODE+10598c/????>   <=====
   0:   b8 01 30 00 00            mov    $0x3001,%eax   <=====
Code;  f89641f0 <END_OF_CODE+105991/????>
   5:   83 c2 0e                  add    $0xe,%edx
Code;  f89641f3 <END_OF_CODE+105994/????>
   8:   66 ef                     out    %ax,(%dx)
Code;  f89641f5 <END_OF_CODE+105996/????>
   a:   8b 7c 24 14               mov    0x14(%esp,1),%edi
Code;  f89641f9 <END_OF_CODE+10599a/????>
   e:   8b 87 dc 00 00 00         mov    0xdc(%edi),%eax


26 warnings and 5 errors issued.  Results may not be reliable.
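
For readers unfamiliar with how the "Trace;" lines above are produced, here
is a minimal sketch of the address-to-symbol step: find the nearest symbol
at or below each address and print the offset from it. This is not ksymoops
itself. The five start addresses are an assumption, inferred by subtracting
each offset shown in the trace from its address (for example
c01beaff - ff = c01bea00); a real lookup would read them from System.map.
The `resolve` helper is a hypothetical name used only for illustration.

```python
import bisect

# Symbol start addresses inferred from the trace above (address minus the
# reported offset). Illustrative only; a real table comes from System.map.
SYMBOLS = sorted([
    (0xc0109b50, "handle_IRQ_event"),
    (0xc0109c90, "do_IRQ"),
    (0xc011f8f0, "do_timer"),
    (0xc0120520, "send_sig_info"),
    (0xc01bea00, "netif_receive_skb"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return 'symbol+offset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return "<unknown>"
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:x}"


if __name__ == "__main__":
    for a in (0xc01beaff, 0xc0120593, 0xc011f92d, 0xc0109b89, 0xc0109cf8):
        print(f"{a:08x} <{resolve(a)}>")
```

With a table like this, any address above the last known symbol resolves
against that last symbol with a very large offset. That is probably why the
f89xxxxx addresses above appear as END_OF_CODE+...: they look like module
addresses, and the module objects could not be found.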




* Re: Oops on linux 2.4.20-ac1
  2002-12-10 17:49 Oops on linux 2.4.20-ac1 Orion Poplawski
@ 2002-12-10 21:00 ` Alan Cox
  2002-12-11  2:00   ` scott thomason
  2002-12-11 23:00   ` Orion Poplawski
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Cox @ 2002-12-10 21:00 UTC (permalink / raw)
  To: Orion Poplawski; +Cc: Linux Kernel Mailing List

On Tue, 2002-12-10 at 17:49, Orion Poplawski wrote:
> I've been having a number of issues, mostly system lockups, with a 
> machine of ours - a dual proc athlon.  I've removed some hardware and I 

Random lockups on dual athlons are a notorious problem under all OS's.
Start by checking it passes memtest86, that will verify the RAM is ok -
and the AMD is -very- picky about RAM.

If that's ok then let me know which board you have, what is plugged into
it and what PSU you are using.




* Re: Oops on linux 2.4.20-ac1
  2002-12-10 21:00 ` Alan Cox
@ 2002-12-11  2:00   ` scott thomason
  2002-12-11 16:08     ` Reliable hardware Orion Poplawski
  2002-12-11 16:10     ` Oops on linux 2.4.20-ac1 Orion Poplawski
  2002-12-11 23:00   ` Orion Poplawski
  1 sibling, 2 replies; 11+ messages in thread
From: scott thomason @ 2002-12-11  2:00 UTC (permalink / raw)
  To: Alan Cox, Orion Poplawski, Linux Kernel Mailing List

On Tuesday 10 December 2002 03:00 pm, Alan Cox wrote:
> Random lockups on dual athlons are a notorious problem under all
> OS's. Start by checking it passes memtest86, that will verify the
> RAM is ok - and the AMD is -very- picky about RAM.
>
> If that's ok then let me know which board you have, what is plugged
> into it and what PSU you are using.

I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning 
my memory for new chips the store owner memtest86'd, my combo of cpus 
and mobo was finding the occasional error. I finally ended up 
resolving it by simply underclocking the bus by about 6 MHz :(

Next time, I'm buying ECC memory.
---scott


* Reliable hardware
  2002-12-11  2:00   ` scott thomason
@ 2002-12-11 16:08     ` Orion Poplawski
  2002-12-11 16:33       ` John Bradford
  2002-12-11 17:01       ` Alan Cox
  2002-12-11 16:10     ` Oops on linux 2.4.20-ac1 Orion Poplawski
  1 sibling, 2 replies; 11+ messages in thread
From: Orion Poplawski @ 2002-12-11 16:08 UTC (permalink / raw)
  To: scott; +Cc: Alan Cox, Linux Kernel Mailing List

scott thomason wrote:

>On Tuesday 10 December 2002 03:00 pm, Alan Cox wrote:
>
>>Random lockups on dual athlons are a notorious problem under all
>>OS's. Start by checking it passes memtest86, that will verify the
>>RAM is ok - and the AMD is -very- picky about RAM.
>>
>>If that's ok then let me know which board you have, what is plugged
>>into it and what PSU you are using.
>
>I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
>my memory for new chips the store owner memtest86'd, my combo of cpus
>and mobo was finding the occasional error. I finally ended up
>resolving it by simply underclocking the bus by about 6 MHz :(
>
>Next time, I'm buying ECC memory.
>---scott

Is there a good site for pointers towards assembling reliable Linux
machines?  It seems to me the trickiest part of the whole operation is
choosing good hardware in the first place.  I just started a new job and
inherited a bunch of new but flakey machines, and I'd like to avoid doing
that in the future.




* Re: Oops on linux 2.4.20-ac1
  2002-12-11  2:00   ` scott thomason
  2002-12-11 16:08     ` Reliable hardware Orion Poplawski
@ 2002-12-11 16:10     ` Orion Poplawski
  1 sibling, 0 replies; 11+ messages in thread
From: Orion Poplawski @ 2002-12-11 16:10 UTC (permalink / raw)
  To: scott; +Cc: Linux Kernel Mailing List

scott thomason wrote:

>I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
>my memory for new chips the store owner memtest86'd, my combo of cpus
>and mobo was finding the occasional error. I finally ended up
>resolving it by simply underclocking the bus by about 6 MHz :(
>
>Next time, I'm buying ECC memory.
>---scott

Underclocking has been my "solution" to these lockups as well.  Would 
ECC memory actually help in this case though?



* Re: Reliable hardware
  2002-12-11 16:08     ` Reliable hardware Orion Poplawski
@ 2002-12-11 16:33       ` John Bradford
  2002-12-11 17:01       ` Alan Cox
  1 sibling, 0 replies; 11+ messages in thread
From: John Bradford @ 2002-12-11 16:33 UTC (permalink / raw)
  To: Orion Poplawski; +Cc: scott, alan, linux-kernel

> >>Random lockups on dual athlons are a notorious problem under all
> >>OS's. Start by checking it passes memtest86, that will verify the
> >>RAM is ok - and the AMD is -very- picky about RAM.
> >>
> >>If that's ok then let me know which board you have, what is plugged
> >>into it and what PSU you are using.
> >
> >I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
> >my memory for new chips the store owner memtest86'd, my combo of cpus
> >and mobo was finding the occasional error. I finally ended up
> >resolving it by simply underclocking the bus by about 6 MHz :(
> >
> >Next time, I'm buying ECC memory.

Why?  ECC memory guards against a single-bit error in the RAM and nothing
else (except that it also reports double-bit errors).
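
To make the "corrects one bit, reports two" behaviour concrete, here is a
minimal sketch of a SECDED (single error correct, double error detect) code,
using an extended Hamming code over 4 data bits. It is an illustration only
and not how any particular memory controller is implemented; ECC DIMMs
typically use a much wider code (64 data bits plus 8 check bits). The
function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    # Positions 3, 5, 6, 7 carry data; positions 1, 2, 4 carry parity.
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    # Position 0 holds the overall parity of the other seven bits.
    w[0] = w[1] ^ w[2] ^ w[3] ^ w[4] ^ w[5] ^ w[6] ^ w[7]
    return w


def decode(word):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = 0
    for bit in w:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: report, cannot fix.
        return "double-bit error", None
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, nibble


if __name__ == "__main__":
    cw = encode(0b1011)
    cw[5] ^= 1                       # flip one stored bit
    print(decode(cw))                # single flip is repaired
    cw[6] ^= 1                       # flip a second bit
    print(decode(cw))                # two flips are only reported
```

The code only sees the stored word, so it says nothing about corruption that
happens elsewhere, for example on the way between the CPU and the memory
controller.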

John.


* Re: Reliable hardware
  2002-12-11 16:08     ` Reliable hardware Orion Poplawski
  2002-12-11 16:33       ` John Bradford
@ 2002-12-11 17:01       ` Alan Cox
  2002-12-11 17:21         ` Jason L Tibbitts III
  2002-12-11 23:35         ` Patrick Finnegan
  1 sibling, 2 replies; 11+ messages in thread
From: Alan Cox @ 2002-12-11 17:01 UTC (permalink / raw)
  To: Orion Poplawski; +Cc: scott, Linux Kernel Mailing List

On Wed, 2002-12-11 at 16:08, Orion Poplawski wrote:
> Is there a good site for pointers towards assembling reliable Linux 
> machines?  It seems to me the trickiest part of the whole operation is 
> choosing good hardware in the first place.  I just started a new job and 
> inherited a bunch of new but flakey machines, and I'd like to avoid doing
> that in the future.

The AMD duals have been a disaster in my experience. It's a shame because
when they do go they really are very fast boxes. The biggest factor I've
found is chipsets. 



* Re: Reliable hardware
  2002-12-11 17:01       ` Alan Cox
@ 2002-12-11 17:21         ` Jason L Tibbitts III
  2002-12-11 23:35         ` Patrick Finnegan
  1 sibling, 0 replies; 11+ messages in thread
From: Jason L Tibbitts III @ 2002-12-11 17:21 UTC (permalink / raw)
  To: linux-kernel

>>>>> "AC" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

AC> The AMD duals have been a disaster in my experience.

I do have a bunch of these running reliably (RH 7.3 plus the latest
OpenMosix kernel).  I had to go through a few combinations of
motherboard and RAM (four different manufacturers of RAM) before I got
something that works.  Processors are MP 1900+ or 2000+, boards are
Tyan S2466, memory is in PC2100 ECC registered 512MB sticks from
Corsair.  Case and power supply are PC Power and Cooling, mid tower,
450W PS, every fan bay filled.  These machines have been rock
stable for months except for a failed IBM deathstar drive and an
over-temp shutdown when the room AC failed.

I still have a couple of the 760MP boards (as opposed to the MPX
boards) which I just can't get to run properly with two processors.

 - J<


* Re: Oops on linux 2.4.20-ac1
  2002-12-10 21:00 ` Alan Cox
  2002-12-11  2:00   ` scott thomason
@ 2002-12-11 23:00   ` Orion Poplawski
  1 sibling, 0 replies; 11+ messages in thread
From: Orion Poplawski @ 2002-12-11 23:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Alan Cox wrote:

>Random lockups on dual athlons are a notorious problem under all OS's.
>Start by checking it passes memtest86, that will verify the RAM is ok -
>and the AMD is -very- picky about RAM.
>
>If that's ok then let me know which board you have, what is plugged into
>it and what PSU you are using.

memtest86 completed 3 passes with no errors, so:

MB:
Asus A7M266-D w/ Dual Athlon 2100 MP and 4 x 512MB PC2100 ECC DIMMs
AMD 762 Chipset
RAM clocking is "normal"

Cards:
PCI 3com 3c905-TX ethernet
PCI Tekram DC-390U3W SCSI Controller
PCI ATI 3d Rage II Video

1 IDE Hard disk
1 external SCSI disk

PSU is a Turbo-Cool 475 ATX-PFC  (appears to be 460W)





* Re: Reliable hardware
  2002-12-11 17:01       ` Alan Cox
  2002-12-11 17:21         ` Jason L Tibbitts III
@ 2002-12-11 23:35         ` Patrick Finnegan
  2002-12-12  1:24           ` Alan Cox
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick Finnegan @ 2002-12-11 23:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: Orion Poplawski, scott, Linux Kernel Mailing List

On 11 Dec 2002, Alan Cox wrote:

> On Wed, 2002-12-11 at 16:08, Orion Poplawski wrote:
> > Is there a good site for pointers towards assembling reliable Linux
> > machines?  It seems to me the trickiest part of the whole operation is
> > choosing good hardware in the first place.  I just started a new job and
> > inherited a bunch of new but flakey machines, and I'd like to avoid doing
> > that in the future.
>
> The AMD duals have been a disaster in my experience. It's a shame because
> when they do go they really are very fast boxes. The biggest factor I've
> found is chipsets.

Which chipset - the new or the old one?  I've got an ASUS A7M266D (or
something) that's based on the AMD 760MPX chipset and has 512MB of
Registered ECC memory, and a pair of XP 1800+'s... and it works just
beautifully.  Truly rock solid.

Pat
--
Purdue University ITAP/RCS
Information Technology at Purdue
Research Computing and Storage
http://www-rcd.cc.purdue.edu




* Re: Reliable hardware
  2002-12-11 23:35         ` Patrick Finnegan
@ 2002-12-12  1:24           ` Alan Cox
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2002-12-12  1:24 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: Orion Poplawski, scott, Linux Kernel Mailing List

On Wed, 2002-12-11 at 23:35, Patrick Finnegan wrote:
> Which chipset - the new or the old one?  I've got an ASUS A7M266D (or
> something) that's based on the AMD 760MPX chipset and has 512MB of
> Registered ECC memory, and a pair of XP 1800+'s... and it works just
> beautifully.  Truly rock solid.

Same board you have. 


