From: "Pau Montero Parés" <pau@imente.com>
To: linux-kernel@vger.kernel.org
Subject: SMP Athlon troubles under high load
Date: Tue, 13 Aug 2002 16:37:28 +0200
Message-ID: <3D5919A8.1080908@imente.com>

[1.] One line summary of the problem:
Dual Athlon system hangs under heavy load.

[2.] Full description of the problem/report:
The system hangs only under heavy load or during shutdown. It still
happens when booting with noapic and mem=nopentium, and with the network
card or the ATI graphics card removed. I can't remove the Adaptec 2100S
RAID controller, though, because the system needs it to boot. The system
works fine with a kernel compiled without SMP support (single CPU). The
temperature seems fine, at around 55C.

[3.] Keywords (i.e., modules, networking, kernel):
with or without networking and with or without modules.

[4.] Kernel version (from /proc/version):
Linux version 2.4.19 (pau@lorien) (gcc version 2.95.4 20011002 (Debian prerelease)) #6 SMP
It also happens with the SuSE 7.2, 7.3 and 8 kernels (2.4.16 and 2.4.18).


[5.] Output of Oops.. message (if applicable) with symbolic information
   Aug  6 13:53:29 lorien kernel: Unable to handle kernel paging request at virtual address 00009000
   Aug  6 13:53:29 lorien kernel:  printing eip:
   Aug  6 13:53:29 lorien kernel: 00009000
   Aug  6 13:53:29 lorien kernel: *pde = 04bde001
   Aug  6 13:53:29 lorien kernel: Oops: 0000
   Aug  6 13:53:29 lorien kernel: CPU:    1
   Aug  6 13:53:29 lorien kernel: EIP:    0010:[<00009000>]    Not tainted
   Aug  6 13:53:29 lorien kernel: EFLAGS: 00010246
   Aug  6 13:53:29 lorien kernel: eax: c4854cc0   ebx: c42ef1c0   ecx: c426d7c0   edx: 00000000
   Aug  6 13:53:29 lorien kernel: esi: c4853ec0   edi: 080ed000   ebp: 00009000   esp: c4bbfe94
   Aug  6 13:53:29 lorien kernel: ds: 0018   es: 0018   ss: 0018
   Aug  6 13:53:29 lorien kernel: Process perl (pid: 392, stackpage=c4bbf000)
   Aug  6 13:53:29 lorien kernel: Stack: c0331020 c3b21ac8 48048000 000a5000 fffd3768 3b61d025 00000000 fffd3240
   Aug  6 13:53:29 lorien kernel:        00000062 000a5000 000a5000 00000062 080ed000 00000286 c426d7c0 c4853ec0
   Aug  6 13:53:29 lorien kernel:        08048000 c426d7c0 c012e59f c012e579 c42ef1c0 c4853ec0 c4bbe000 c4bbe000
   Aug  6 13:53:29 lorien kernel: Call Trace: [<c012e59f>] [<c012e579>] [<c011b4db>] [<c01200a8>] [<c010702b>]
   Aug  6 13:53:29 lorien kernel:    [<c01188f0>] [<c0119c20>] [<c013f24e>] [<c01072cc>] [<c0107214>]
   Aug  6 13:53:29 lorien kernel:
   Aug  6 13:53:29 lorien kernel: Code:  Bad EIP value.
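
The Oops above is not decoded into symbols, but the raw addresses can
still be sanity-checked. The following is a minimal Python sketch, not
part of the original report, that pulls every [<...>] address out of the
Oops text and checks whether it lies inside the "Kernel code" range
listed under /proc/iomem in section [7.4.]. It assumes the default i386
3G/1G split (kernel virtual address = physical address + 0xC0000000);
the function names are made up for illustration.

```python
import re

# Assumption: default i386 3G/1G split, so kernel virtual = physical + 0xC0000000.
PAGE_OFFSET = 0xC0000000
# "Kernel code" physical range as listed under /proc/iomem in this report.
KERNEL_CODE_PHYS = (0x00100000, 0x001E32EB)

# EIP line and call trace copied from the Oops in this report.
OOPS = """\
EIP:    0010:[<00009000>]    Not tainted
Call Trace: [<c012e59f>] [<c012e579>] [<c011b4db>] [<c01200a8>] [<c010702b>]
   [<c01188f0>] [<c0119c20>] [<c013f24e>] [<c01072cc>] [<c0107214>]
"""


def bracketed_addresses(text):
    """Return every [<xxxxxxxx>] address in the text, in order, as integers."""
    return [int(h, 16) for h in re.findall(r"\[<([0-9a-fA-F]{8})>\]", text)]


def in_kernel_text(addr):
    """True if addr lies inside the kernel code range mapped at PAGE_OFFSET."""
    lo = KERNEL_CODE_PHYS[0] + PAGE_OFFSET
    hi = KERNEL_CODE_PHYS[1] + PAGE_OFFSET
    return lo <= addr <= hi


addresses = bracketed_addresses(OOPS)
outside = [a for a in addresses if not in_kernel_text(a)]

for a in addresses:
    where = "kernel text" if in_kernel_text(a) else "OUTSIDE kernel text"
    print(f"{a:08x}  {where}")
```

Under that assumption, the ten call-trace entries all fall inside the
kernel code range and only the EIP value 00009000 falls outside it,
which matches the "Bad EIP value" line. Resolving the trace entries to
function names would still need ksymoops and the System.map of the
running kernel.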

[6.] A small shell script or example program which triggers the
problem (if possible)

The only way I have found to hang the machine quickly is something like this:
#!/usr/bin/perl
my $a = 0;
while ($a == 0) {rand();}

It usually ends with a segmentation fault, and the machine hangs a few
seconds later. The machine can also freeze while the Perl script is
still running. The script can run for anywhere from 10 seconds to 30
minutes before that happens.
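
The Perl loop above is a single process, so it keeps only one CPU busy
at a time. The following is a minimal Python sketch, not part of the
original report, of the same kind of load started once per CPU. It rests
on the assumption that loading both CPUs at once is a useful variation
when trying to reproduce the hang; burn() and run_on_all_cpus() are
hypothetical names chosen for illustration.

```python
import multiprocessing
import os
import random
import time


def burn(seconds):
    """Call random.random() in a tight loop for `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        random.random()
        count += 1
    return count


def run_on_all_cpus(seconds):
    """Start one burn() process per CPU reported by the OS and wait for all of them."""
    workers = [
        multiprocessing.Process(target=burn, args=(seconds,))
        for _ in range(os.cpu_count() or 1)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(workers)
```

Calling run_on_all_cpus(60.0) from a script's main guard would keep
every CPU busy for about a minute. Unlike the Perl loop, this version
stops on its own after the given time.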

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

Linux lorien 2.4.19 #6 SMP lun ago 12 19:08:58 CEST 2002 i686 unknown

gcc version 2.95.4 20011002 (Debian prerelease)
GNU Make version 3.79.1
util-linux 2.11n-4
ldd (GNU libc) 2.2.5
procps 2.0.7-8

[7.2.] Processor information (from /proc/cpuinfo):
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 4
cpu MHz         : 1400.071
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2791.83

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 4
cpu MHz         : 1400.071
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2798.38

[7.3.] Module information (from /proc/modules):
none

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
/proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
03c0-03df : vga+
0cf8-0cff : PCI conf1
1000-107f : PCI device 10b7:7646
  1000-107f : 00:0c.0
1090-1093 : PCI device 1022:700c
2000-2fff : PCI Bus #01
  2000-20ff : PCI device 1002:5046
f000-f00f : PCI device 1022:7411

/proc/iomem
00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-7fffffff : System RAM
  00100000-001e32eb : Kernel code
  001e32ec-002139df : Kernel data
f0000000-f000007f : PCI device 10b7:7646
f0001000-f0001fff : PCI device 1022:700c
f0100000-f01fffff : PCI Bus #01
  f0100000-f0103fff : PCI device 1002:5046
f2000000-f3ffffff : PCI device 1044:a501
f4000000-f7ffffff : PCI device 1022:700c
f8000000-fbffffff : PCI Bus #01
  f8000000-fbffffff : PCI device 1002:5046
fec00000-fec0ffff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved

[7.5.] PCI information ('lspci -vvv' as root)
00:00.0 Host bridge: Advanced Micro Devices [AMD]: Unknown device 700c 
(rev 11)
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort+ >SERR- <PERR-
  Latency: 64
  Region 0: Memory at f4000000 (32-bit, prefetchable) [size=64M]
  Region 1: Memory at f0001000 (32-bit, prefetchable) [size=4K]
  Region 2: I/O ports at 1090 [disabled] [size=4]
  Capabilities: [a0] AGP version 2.0
    Status: RQ=15 SBA+ 64bit- FW+ Rate=x1,x2
    Command: RQ=0 SBA+ AGP+ 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Advanced Micro Devices [AMD]: Unknown device 700d 
(prog-if 00 [Normal decode])
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 99
  Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
  I/O behind bridge: 00002000-00002fff
  Memory behind bridge: f0100000-f01fffff
  Prefetchable memory behind bridge: f8000000-fbffffff
  BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-765 [Viper] ISA 
(rev 02)
  Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-765 [Viper] IDE 
(rev 01) (prog-if 8a [Master SecP PriP])
  Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 64
  Region 4: I/O ports at f000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-765 [Viper] ACPI (rev 01)
  Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-

00:0c.0 Ethernet controller: 3Com Corporation 3cSOHO100-TX Hurricane 
(rev 30)
  Subsystem: 3Com Corporation 3cSOHO100-TX Hurricane
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 80 (2500ns min, 2500ns max), cache line size 10
  Interrupt: pin A routed to IRQ 5
  Region 0: I/O ports at 1000 [size=128]
  Region 1: Memory at f0000000 (32-bit, non-prefetchable) [size=128]
  Expansion ROM at <unassigned> [disabled] [size=128K]
  Capabilities: [dc] Power Management version 1
    Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
PME(D0-,D1+,D2+,D3hot+,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0d.0 PCI bridge: Distributed Processing Technology PCI Bridge (rev 
02) (prog-if 00 [Normal decode])
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 64, cache line size 10
  Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
  I/O behind bridge: 0000f000-00000fff
  Memory behind bridge: 00100000-000fffff
  Prefetchable memory behind bridge: 00100000-000fffff
  BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
  Capabilities: [68] Power Management version 2
    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0d.1 I2O: Distributed Processing Technology SmartRAID V Controller 
(rev 02) (prog-if 01)
  Subsystem: Distributed Processing Technology: Unknown device c03c
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Latency: 64 (250ns min, 250ns max), cache line size 10
  Interrupt: pin A routed to IRQ 11
  BIST result: 00
  Region 0: Memory at f2000000 (32-bit, prefetchable) [size=32M]
  Expansion ROM at <unassigned> [disabled] [size=32K]
  Capabilities: [80] Power Management version 2
    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:05.0 VGA compatible controller: ATI Technologies Inc Rage 128 PF 
(prog-if 00 [VGA])
  Subsystem: ATI Technologies Inc: Unknown device 0008
  Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping+ SERR- FastB2B+
  Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
  Interrupt: pin A routed to IRQ 11
  Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
  Region 1: I/O ports at 2000 [size=256]
  Region 2: Memory at f0100000 (32-bit, non-prefetchable) [size=16K]
  Expansion ROM at <unassigned> [disabled] [size=128K]
  Capabilities: [50] AGP version 2.0
    Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
    Command: RQ=0 SBA+ AGP- 64bit- FW- Rate=<none>
  Capabilities: [5c] Power Management version 2
    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

[7.6.] SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: ADAPTEC  Model: RAID-5           Rev: 370F
  Type:   Direct-Access                    ANSI SCSI revision: 02

[7.7.] Other information that might be relevant to the problem
/proc/interrupts
           CPU0       CPU1
  0:     143027          0          XT-PIC  timer
  1:       5401          0          XT-PIC  keyboard
  2:          0          0          XT-PIC  cascade
  5:       1715          0          XT-PIC  eth0
  8:          3          0          XT-PIC  rtc
 11:       5744          0          XT-PIC  dpti0
NMI:          0          0
LOC:     142944     143078
ERR:          8
MIS:          0
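
In the table above every device interrupt is counted on CPU0, which is
what one would expect with the XT-PIC (8259A) routing used when booting
with noapic. The following is a minimal Python sketch, not part of the
original report, that parses /proc/interrupts text and sums the device
IRQ counts per CPU. The function names are made up for illustration, and
the sample text is copied from this report.

```python
SAMPLE = """\
           CPU0       CPU1
  0:     143027          0          XT-PIC  timer
  1:       5401          0          XT-PIC  keyboard
  2:          0          0          XT-PIC  cascade
  5:       1715          0          XT-PIC  eth0
  8:          3          0          XT-PIC  rtc
 11:       5744          0          XT-PIC  dpti0
NMI:          0          0
LOC:     142944     143078
ERR:          8
MIS:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, rows); rows maps a label to per-CPU counts and a description."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # e.g. ['CPU0', 'CPU1']
    rows = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        # ERR and MIS carry a single global counter, not one per CPU; skip them.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        rows[label.strip()] = {
            "counts": [int(c) for c in counts],
            "desc": " ".join(fields[len(cpus):]),
        }
    return cpus, rows


def device_totals(rows, ncpus):
    """Sum the counts of numbered (device) IRQ lines for each CPU."""
    totals = [0] * ncpus
    for label, row in rows.items():
        if label.isdigit():
            for i, n in enumerate(row["counts"]):
                totals[i] += n
    return totals


cpus, rows = parse_interrupts(SAMPLE)
totals = device_totals(rows, len(cpus))
for name, total in zip(cpus, totals):
    print(f"{name}: {total} device interrupts")
```

For the table in this report the sketch gives 155890 device interrupts
on CPU0 and 0 on CPU1. Only the local APIC timer (LOC) is counted on
both CPUs.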

More logs that I can't understand:

   Aug  6 13:48:13 lorien kernel:  IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
   Aug  6 13:48:14 lorien kernel:         : booting with the "noapic" option.
----
   Aug  6 13:48:13 lorien kernel: mtrr: detected mtrr type: Intel
   Aug  6 13:48:14 lorien kernel: mtrr: your CPUs had inconsistent fixed MTRR settings
   Aug  6 13:48:14 lorien kernel: mtrr: probably your BIOS does not setup all CPUs

The system hangs with both the 1.1 and 1.4 MP specification settings,
and with or without the MP interrupts table.
Mainboard: Tyan 2460 with registered ECC memory.

The board can't reboot normally, but I think that is a BIOS issue; I
should update it.

Good luck!

Pau Montero Parés.
http://pau.no-ip.com

