* repeatable SMP lockups - kernel 2.4.9
@ 2001-09-14 12:30 Matthias Haase
2001-09-14 14:54 ` Martin Josefsson
2001-09-14 18:37 ` Andrew Morton
0 siblings, 2 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 12:30 UTC
To: linux-kernel
Our new SMP file and print server always locks up hard when a higher load
hits the NIC. It is completely stable without networking (X11, DRI
1. First, I changed the NIC from a 3Com (vortex driver) to a no-name card
based on the Realtek RTL-8139 (rev 10). The lockup now occurs somewhat
later, but it is still repeatable when I copy a large file over the LAN
or export an X11 environment to another box.
2. Changing the kernel to 2.2.19 gives the same result.
Donald Becker wrote that he thinks this could be a bug in the interrupt
handling of the 2.4.9 kernel, not in his driver itself.
The mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 MB DDR RAM) always boots
fine with APIC, except for the 'unexpected IO-APIC, please mail' warning.
The lockup also occurs when booting with 'noapic'.
As a third step I can try another, 'SMP-cleaner' (I think) NIC: a D-Link
DFE-500 TX, based on a DEC chip and using the tulip driver.
Nothing about this is written to /var/log/messages. The box is SCSI only,
Adaptec 29160N.
/proc/interrupts:
           CPU0       CPU1
  0:     273705     282423    IO-APIC-edge  timer
  1:       4891       5117    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 10:       8578       8328   IO-APIC-level  aic7xxx
 11:     962066     961390   IO-APIC-level  mga@PCI:1:0:0, es1371
 12:     109685     111089    IO-APIC-edge  PS/2 Mouse
 15:       2273       2295   IO-APIC-level  eth0
NMI:          0          0
LOC:     556044     556060
ERR:          0
MIS:          0
Looks clean :-(
Are there any patches, hints or recommendations known about this?
__
Best regards from Germany
Matthias
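One way to read "looks clean" for the /proc/interrupts listing above is that every IRQ is being counted and that the counts are split roughly evenly across both CPUs. The following is a minimal Python sketch of that check, not anything used in the thread: the names `parse_interrupts` and `imbalance` are hypothetical, the imbalance ratio is an ad-hoc measure rather than a kernel metric, and the sample rows are copied from the listing. An even split does not rule out an interrupt-handling bug; it only confirms that nothing is obviously stuck on one CPU.

```python
# Sample rows copied from the /proc/interrupts listing in the post above.
SAMPLE = """\
CPU0 CPU1
0: 273705 282423 IO-APIC-edge timer
10: 8578 8328 IO-APIC-level aic7xxx
11: 962066 961390 IO-APIC-level mga@PCI:1:0:0, es1371
15: 2273 2295 IO-APIC-level eth0
NMI: 0 0
ERR: 0
"""


def parse_interrupts(text):
    """Return {row label: [per-CPU counts]} for /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row, e.g. "CPU0 CPU1"
    table = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue  # not a "label: counts" row
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device columns
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def imbalance(counts):
    """Spread between busiest and idlest CPU as a fraction of the total."""
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0
    return (max(counts) - min(counts)) / total


if __name__ == "__main__":
    for label, counts in parse_interrupts(SAMPLE).items():
        print(f"{label:>4}: {counts} imbalance={imbalance(counts):.3f}")
```

On a live box the same functions could be fed the contents of /proc/interrupts instead of `SAMPLE`.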
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
@ 2001-09-14 14:54 ` Martin Josefsson
2001-09-14 16:23 ` Matthias Haase
2001-09-14 18:37 ` Andrew Morton
1 sibling, 1 reply; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 14:54 UTC
To: Matthias Haase; +Cc: linux-kernel
On Fri, 14 Sep 2001, Matthias Haase wrote:
> Our new SMP file- and printserver locks always hard up, if higher load
> come on the NIC. True stable without networking (X11, DRI
I have similar problems with 4 routers here. They get quite high network
load sometimes... not really good.
> 1. First, I have changed the NIC from 3Com (vortex-driver) to noname,
> driven by Realtek
> RTL-8139 (rev 10) and the lockup occurs some later, but it occurs
> repeatable if I copy large file on LAN, or export an X11 environment to
> another box.
I used to be able to get the routers to hang in under 30 minutes, but with
2.4.8-ac12 one of them survived my testing for over 36 hours. But when I
put it into production, thinking it was more stable than with the other
kernels, it hung after 5-10 minutes of operation.
> 2. Changing the kernel to 2.2.19 results the same thing.
Haven't tried any 2.2 kernels here because I want iptables.
> Donald Becker wrote, that he think, this apparently could be a bug with
> the interrupt handling in the 2.4.9 kernel, not inside
> the (his) driver itself.
>
> The boot on the mainboard (Asus CUV266-D, 2x PIII 1 GHz, 512 mb DDR-RAM)
> is always o.k. with APIC, excepting the 'unexpected IO-APIC, please mail'
> - warning.
> The lockup occurs too with 'noapic' on boot.
Our routers consist of an Asus P3C-D (i820 chipset), 2x PIII 800 MHz,
256 MB RIMM. As a lot of people know, the i820 chipset is very unstable
_if_ you have SDRAM, but not with the RIMM it was built for.
Running with 'noapic' still freezes, but I don't think it occurs as
frequently as when running with the IO-APIC.
> At third stage I can try another and 'smp-cleaner' (I think) NIC, D-Link
> DFE-500 TX, based on DEC-Chip, using the tulip-driver.
I'm using a D-Link DFE-570TX, which is a quad tulip (DECchip 21143 rev 65).
I've been using both the stock driver in the kernels and an optimized one,
and I get a lockup with both.
> Nothing is wrote about this in /var/log messages. The box is SCSI only,
Just a hard lockup. It doesn't say anything at all, just a freeze, and the
keyboard doesn't work (not even numlock).
I also have an Adaptec 29160 card in our routers for logging to a SCSI
disk. Now that I think of it, the one I thought was stable didn't have a
SCSI disk in it, and then I moved the flashdisk to the other router that
was in production and that died (but the logging isn't running).
> /proc/interrupts:
>
> CPU0 CPU1
> 0: 273705 282423 IO-APIC-edge timer
> 1: 4891 5117 IO-APIC-edge keyboard
> 2: 0 0 XT-PIC cascade
> 8: 0 1 IO-APIC-edge rtc
> 10: 8578 8328 IO-APIC-level aic7xxx
> 11: 962066 961390 IO-APIC-level mga@PCI:1:0:0, es1371
> 12: 109685 111089 IO-APIC-edge PS/2 Mouse
> 15: 2273 2295 IO-APIC-level eth0
> NMI: 0 0
> LOC: 556044 556060
> ERR: 0
> MIS: 0
>
> Looks clean :-(
Looks as clean as in my routers, and then suddenly a freeze comes along
and ruins my day (I have watchdog cards, but it still ruins my day knowing
that the router froze).
> Are there any patches, hints or recommendations known about this?
I haven't found anything about this at all :(
I have two of these routers right here next to my desk and I'm going to do
some heavy testing on them. One of them is the one I thought was stable
and the other one is virtually untested. I'm going to try with and without
SCSI cards and compare the BIOS settings on them. (But with my luck I'm
probably going to manage to make the "maybe stable" router freeze too.)
/Martin
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 14:54 ` Martin Josefsson
@ 2001-09-14 16:23 ` Matthias Haase
2001-09-14 16:26 ` Martin Josefsson
0 siblings, 1 reply; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 16:23 UTC
To: Martin Josefsson; +Cc: linux-kernel
Hi, Martin,
I hope this doesn't sound too stupid:
As a hardware test I ran quake3d_demo with DRI enabled. For this, I
compiled the older DRM code into the 2.4.9 kernel, so I could use the
installed XFree86 4.03 instead of the required 4.1:
No error, no lockup, even though this game produced heavy load on RAM and
hard disks.
There is also no lockup with light traffic on the NIC, for instance over
the ADSL connection (max. 90 kb/s) to our router.
But, as I said, there are repeatable lockups with somewhat higher network
traffic inside the LAN.
regards
Matthias
--
Regards
Matthias Haase            | Phone +49-(0)3733-23713
Markt 2                   | Fax   +49-(0)3733-22660
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 16:23 ` Matthias Haase
@ 2001-09-14 16:26 ` Martin Josefsson
2001-09-15 7:20 ` Matthias Haase
0 siblings, 1 reply; 8+ messages in thread
From: Martin Josefsson @ 2001-09-14 16:26 UTC
To: Matthias Haase; +Cc: linux-kernel
On Fri, 14 Sep 2001, Matthias Haase wrote:
> Hi, Martin,
>
> I hope, this sounds not to stupid:
>
> As an hardware test I have run quake3d_demo with enabled DRI.
> For this, I have compiled the 2.4.9 kernel the older DRM-code in, so I
> could use the installed Xfree86 4.03 instead the required 4.1:
>
> No error, no lockup, even though this game produced heavy load on ram and
> harddisks.
> No lockup too with the small traffic on the NIC, for instance with the
> ADSL-connection (max. 90 kb/s) to our router.
> But, as I sayd, repeatable lockups with some higher network-traffic inside
> the LAN.
I don't think it sounds that stupid.. but if it had hung you wouldn't have
known whether it was the possible interrupt-handling bug or some other bug
in DRI/DRM :)
I'm going to start my tests here soon.
/Martin
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 16:26 ` Martin Josefsson
@ 2001-09-15 7:20 ` Matthias Haase
0 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15 7:20 UTC
To: Martin Josefsson; +Cc: linux-kernel
On Fri, 14 Sep 2001 18:26:04 +0200 (CEST)
Martin Josefsson <gandalf@wlug.westbo.se> wrote:
> I don't think it sounds that stupid.. but if it had hung you wouldn't
> have
> known if it was the possible interupthandeling bug or some oghet bug in
> DRI/DRM :)
Yes, but I am now (relatively) sure that the RAM timing (it's DDR RAM at
266 MHz) and the CPU clock are right.
I found last night that the box also locks up if I use the scanner to scan
a large file. For scanning, I use a second, additional SCSI controller
(Dawicontrol, based on AMD 53c974 [PCscsi]).
The preview scan is o.k., but the scan itself stops (and of course locks
up the machine hard) once 4-5 MB have been transferred.
Sounds like an interrupt handling error?
> I'm going to start my tests here soon.
>
> /Martin
Please let me know about your results.
regards
Matthias
--
Regards
Matthias Haase            | Phone +49-(0)3733-23713
Markt 2                   | Fax   +49-(0)3733-22660
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
* Re: repeatable SMP lockups - kernel 2.4.9
2001-09-14 12:30 repeatable SMP lockups - kernel 2.4.9 Matthias Haase
2001-09-14 14:54 ` Martin Josefsson
@ 2001-09-14 18:37 ` Andrew Morton
2001-09-15 8:32 ` Matthias Haase
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2001-09-14 18:37 UTC
To: Matthias Haase; +Cc: linux-kernel
Matthias Haase wrote:
>
> Our new SMP file- and printserver locks always hard up, if higher load
> come on the NIC. True stable without networking (X11, DRI
>
Have you tried enabling the NMI watchdog? Boot with the
nmi_watchdog=1
LILO option.
* repeatable SMP lockups - kernel 2.4.9
2001-09-14 18:37 ` Andrew Morton
@ 2001-09-15 8:32 ` Matthias Haase
0 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-15 8:32 UTC
To: linux-kernel; +Cc: Andrew Morton
Hi, Andrew...
> Have you tried enabling the NMI watchdog? Boot with the
>
> nmi_watchdog=1
>
> LILO option.
I tried this today, but no debugging messages are printed.
The cursor blinks, and when the hang comes, the blinking freezes.
I have nmi_watchdog=1 set in lilo.conf, followed by
# /etc/lilo -v -v
No watchdog or software watchdog is compiled into the kernel, but I think
that isn't related to nmi_watchdog?
Thanks for your help.
regards
Matthias
--
Regards
Matthias Haase            | Phone +49-(0)3733-23713
Markt 2                   | Fax   +49-(0)3733-22660
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
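Before concluding that the NMI watchdog has nothing to report, it may be worth confirming that it is running at all. As far as I know, the kernel's nmi_watchdog documentation suggests checking that the NMI row in /proc/interrupts is non-zero and increasing; the listing posted at the start of this thread shows `NMI: 0 0`, but it was taken before the option was tried. The sketch below is a rough Python illustration of that comparison. The function names `nmi_counts` and `watchdog_ticking` are hypothetical, and it only compares two snapshots of text that the caller has to supply.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            return [int(f) for f in rest.split() if f.isdigit()]
    return []  # no NMI row found


def watchdog_ticking(before, after):
    """True if every CPU's NMI count rose between the two snapshots."""
    a, b = nmi_counts(before), nmi_counts(after)
    if not a or len(a) != len(b):
        return False
    return all(later > earlier for earlier, later in zip(a, b))


def read_interrupts(path="/proc/interrupts"):
    """Read one snapshot; call twice, a few seconds apart, on the test box."""
    with open(path) as fh:
        return fh.read()
```

If `watchdog_ticking` stays false after booting with `nmi_watchdog=1`, the watchdog is probably not active, which would explain the absence of debugging output during a hang.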
[parent not found: <OF21F37EC6.10570427-ON88256AC7.0052A32C@boulder.ibm.com>]
* Re: repeatable SMP lockups - kernel 2.4.9
[not found] <OF21F37EC6.10570427-ON88256AC7.0052A32C@boulder.ibm.com>
@ 2001-09-14 15:46 ` Matthias Haase
0 siblings, 0 replies; 8+ messages in thread
From: Matthias Haase @ 2001-09-14 15:46 UTC
To: James Washer; +Cc: linux-kernel
Hi, Jim...
> have you enable Magic Sysrq, and attempted to get a register dump
> (Alt-Sysrq-p)..
Alt-SysRq-* doesn't work once the box has locked up.
I couldn't do a sync / remount read-only / reboot, or get a dump with 'p'.
regards
Matthias
--
Regards
Matthias Haase            | Phone +49-(0)3733-23713
Markt 2                   | Fax   +49-(0)3733-22660
D-09456 Annaberg-Buchholz | http://www.bennewitz.com
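Since the SysRq keys did nothing during the hang, one thing that can be ruled out beforehand is that Magic SysRq was simply switched off. The following small Python sketch assumes the usual /proc/sys/kernel/sysrq control file, where 0 means disabled and any non-zero value means at least some functions are enabled; the function names are hypothetical, and a missing file is treated as "probably not compiled in", which is an assumption on my part.

```python
def sysrq_enabled(value_text):
    """Interpret the contents of /proc/sys/kernel/sysrq (0 = disabled)."""
    return int(value_text.strip()) != 0


def check_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return True/False for enabled/disabled, or None if the file is absent."""
    try:
        with open(path) as fh:
            return sysrq_enabled(fh.read())
    except OSError:
        return None  # assumption: kernel likely built without Magic SysRq
```

If SysRq is enabled and still unresponsive during the freeze, that points toward a hard lockup in which keyboard interrupts are no longer serviced, consistent with the dead numlock reported earlier in the thread.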