* eth*: transmit timed out since .27 (was: linux-2.4.27 released)
2004-08-07 23:28 linux-2.4.27 released Marcelo Tosatti
@ 2004-08-10 12:23 ` Oliver Feiler
2004-08-13 10:15 ` Marcelo Tosatti
0 siblings, 1 reply; 12+ messages in thread
From: Oliver Feiler @ 2004-08-10 12:23 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
[-- Attachment #1.1: body text --]
[-- Type: text/plain, Size: 2513 bytes --]
Hi,
I've upgraded a server from .26 to .27, but ran into problems with the network
cards.
The kernel throws a lot of errors into the syslog and the net devices don't
work:
Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249
Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241
Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240
[...]
and:
Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
ISR=0x3, t=515.
Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
ISR=0x3, t=5015.
Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
ISR=0x3, t=1014.
[...]
The system has three network cards.
eth0: RTL-8029 (ne2k-pci.c)
eth1: SIS900 (sis900.c)
eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)
eth0 and eth1 share the same interrupt
CPU0
0: 91986 XT-PIC timer
1: 935 XT-PIC keyboard
2: 0 XT-PIC cascade
8: 1 XT-PIC rtc
9: 0 XT-PIC acpi
10: 25109 XT-PIC via82cxxx, usb-uhci, usb-uhci, eth0, eth1
11: 24 XT-PIC usb-uhci, eth2
14: 7523 XT-PIC ide0
15: 7021 XT-PIC ide1
NMI: 0
ERR: 0
That was not a problem in .26 however. Though it seems to be the cause of the
problem (lost interrupt)? The hardware this is all running on is an Asrock
K7VM4 mainboard. The system is booted with "pci=noacpi" (ACPI, no APM).
Otherwise IRQ255 is assigned to IDE and someone told me the noacpi parameter
would fix the board's braindead BIOS.
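A note on reading the /proc/interrupts listing above: the shared lines can be picked out mechanically. The following is a minimal Python sketch, assuming a single-CPU layout (one count column, then the controller name, then a comma-separated handler list). The function name shared_irqs and the SAMPLE excerpt are illustrative only, not part of any kernel tool.

```python
from typing import Dict, List


def shared_irqs(interrupts_text: str) -> Dict[str, List[str]]:
    """Map each IRQ that has more than one handler to its handler names.

    Assumes a single-CPU /proc/interrupts layout: count, controller, handlers.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skip the CPU header and the NMI/ERR summary rows
        fields = rest.split(None, 2)  # count, controller, handler list
        if len(fields) < 3:
            continue
        handlers = [h.strip() for h in fields[2].split(",")]
        if len(handlers) > 1:
            shared[irq] = handlers
    return shared


# Excerpt of the listing posted above.
SAMPLE = """\
           CPU0
  0:      91986          XT-PIC  timer
 10:      25109          XT-PIC  via82cxxx, usb-uhci, usb-uhci, eth0, eth1
 11:         24          XT-PIC  usb-uhci, eth2
 14:       7523          XT-PIC  ide0
NMI:          0
ERR:          0
"""

for irq, handlers in shared_irqs(SAMPLE).items():
    print(irq, len(handlers), handlers)
```

On the excerpt this reports IRQ 10 with five handlers and IRQ 11 with two, which matches the counts discussed later in the thread.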
Either way .27 doesn't want to boot. I've attached dmesg from a running 2.4.26
kernel and the config used for 2.4.27.
Other postings I've found say that the transmit timeouts mean that the
lowlevel ethernet connection between the NICs broke. But this works fine in
earlier kernels and only eth0 and eth1 which share an interrupt are affected.
I'd be glad for any more suggestions on what might be causing this. :)
Thanks,
Oliver
--
Oliver Feiler - http://kiza.kcore.de/
[-- Attachment #1.2: dmesg --]
[-- Type: text/plain, Size: 9313 bytes --]
Linux version 2.4.26 (root@spot) (gcc version 3.3.4) #3 Mon Jul 5 15:32:52 CEST 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d0000 - 00000000000d6000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000f7f0000 (usable)
BIOS-e820: 000000000f7f0000 - 000000000f7f8000 (ACPI data)
BIOS-e820: 000000000f7f8000 - 000000000f800000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
247MB LOWMEM available.
On node 0 totalpages: 63472
zone(0): 4096 pages.
zone(1): 59376 pages.
zone(2): 0 pages.
ACPI: RSDP (v000 AMI ) @ 0x000fa620
ACPI: RSDT (v001 AMIINT VIA_K7 0x00000010 MSFT 0x00000097) @ 0x0f7f0000
ACPI: FADT (v001 AMIINT VIA_K7 0x00000011 MSFT 0x00000097) @ 0x0f7f0030
ACPI: MADT (v001 AMIINT VIA_K7 0x00000009 MSFT 0x00000097) @ 0x0f7f00c0
ACPI: DSDT (v001 VIA K7VT4 0x00001000 MSFT 0x0100000d) @ 0x00000000
Kernel command line: BOOT_IMAGE=Linux.old ro root=900 pci=noacpi
Initializing CPU#0
Detected 599.436 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1196.03 BogoMIPS
Memory: 248184k/253888k available (1668k kernel code, 5316k reserved, 578k data, 92k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 64K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0183fbff c1c7fbff 00000000 00000000
CPU: Common caps: 0183fbff c1c7fbff 00000000 00000000
CPU: AMD Duron(tm) stepping 00
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20040326
PCI: PCI BIOS revision 2.10 entry at 0xfdae1, last bus=1
PCI: Using configuration type 1
ACPI: IRQ9 SCI: Edge set to Level Trigger.
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12 14 15)
PCI: Probing PCI hardware
PCI: Using IRQ router default [1106/3177] at 00:11.0
PCI: Hardcoded IRQ 14 for device 00:11.1
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ACPI: Power Button (FF) [PWRF]
ACPI: Sleep Button (CM) [SLPB]
ACPI: Processor [CPU1] (supports C1)
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10f
FDC 0 is a post-1991 82077
ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
http://www.scyld.com/network/ne2k-pci.html
eth0: RealTek RTL-8029 found at 0xe800, IRQ 10, 00:00:E8:5C:2D:AA.
sis900.c: v1.08.06 9/24/2002
eth1: SiS 900 Internal MII PHY transceiver found at address 1.
eth1: Using transceiver found at address 1 as default
eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 10, 00:c0:ca:16:4c:b6.
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:11.1
PCI: Hardcoded IRQ 14 for device 00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD800BB-00CAA1, ATA DISK drive
blk: queue c0371b40, I/O limit 4095Mb (mask 0xffffffff)
hdc: ST380011A, ATA DISK drive
blk: queue c0371f94, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
hdc: attached ide-disk driver.
hdc: host protected area => 1
hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
Partition check:
hda: hda1 hda2 hda3
hdc: hdc1 hdc2 hdc3
Via 686a/8233/8235 audio driver 1.9.1-ac3
via82cxxx: Six channel audio available
PCI: Setting latency timer of device 00:11.5 to 64
ac97_codec: AC97 codec, id: CMI97 (CMedia)
AC97 codec does not have proper volume support.
via82cxxx: Codec rate locked at 48Khz
via82cxxx: board #1 at 0xD800, IRQ 10
usb.c: registered new driver hub
host/usb-uhci.c: $Revision: 1.275 $ time 15:33:03 Jul 5 2004
host/usb-uhci.c: High bandwidth mode enabled
host/usb-uhci.c: USB UHCI at I/O 0xe400, IRQ 10
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: USB UHCI at I/O 0xe000, IRQ 10
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: USB UHCI at I/O 0xdc00, IRQ 11
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
i2c-core.o: i2c core module version 2.8.3 (20040115)
i2c-dev.o: i2c /dev entries driver module version 2.8.3 (20040115)
i2c-proc.o version 2.8.3 (20040115)
md: raid1 personality registered as nr 3
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 000005f4]
[events: 000005f4]
md: autorun ...
md: considering hdc3 ...
md: adding hdc3 ...
md: adding hda3 ...
md: created md0
md: bind<hda3,1>
md: bind<hdc3,2>
md: running: <hdc3><hda3>
md: hdc3's event counter: 000005f4
md: hda3's event counter: 000005f4
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdc3 operational as mirror 1
raid1: device hda3 operational as mirror 0
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdc3 [events: 000005f5]<6>(write) hdc3's sb offset: 77858944
md: hda3 [events: 000005f5]<6>(write) hda3's sb offset: 77851776
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
ip_conntrack version 2.1 (1983 buckets, 15864 max) - 288 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 92k freed
Adding Swap: 240964k swap-space (priority -1)
Adding Swap: 249976k swap-space (priority -2)
EXT3 FS 2.4-0.9.19, 19 August 2002 on md(9,0), internal journal
i2c-viapro.o version 2.8.3 (20040115)
i2c-dev.o: Registered 'SMBus Via Pro adapter at 0400' as minor 0
i2c-isa.o version 2.8.3 (20040115)
i2c-dev.o: Registered 'ISA main adapter' as minor 1
w83627hf.o version 2.8.3 (20040115)
via-rhine.c:v1.10-LK1.1.19 July-12-2003 Written by Donald Becker
http://www.scyld.com/network/via-rhine.html
eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 11.
eth2: MII PHY found at address 1, status 0x786d advertising 05e1 Link 45e1.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
eth2: Setting full-duplex based on MII #1 link partner capability of 45e1.
eth1: Media Link On 100mbps half-duplex
HTB init, kernel part version 3.16
[-- Attachment #1.3: config-2.4.27.gz --]
[-- Type: application/x-gzip, Size: 5287 bytes --]
[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
2004-08-10 12:23 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Oliver Feiler
@ 2004-08-13 10:15 ` Marcelo Tosatti
2004-08-13 21:56 ` Oliver Feiler
0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2004-08-13 10:15 UTC (permalink / raw)
To: Oliver Feiler; +Cc: Marcelo Tosatti, linux-kernel
Hi Oliver,
On Tue, Aug 10, 2004 at 02:23:34PM +0200, Oliver Feiler wrote:
> Hi,
>
> I've upgraded a server from .26 to .27, but ran into problems with the network
> cards.
>
> The kernel throws a lot of errors into the syslog and the net devices don't
> work:
> Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249
> Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241
> Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240
> [...]
>
> and:
> Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> ISR=0x3, t=515.
> Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> ISR=0x3, t=5015.
> Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> ISR=0x3, t=1014.
> [...]
>
> The system has three network cards.
> eth0: RTL-8029 (ne2k-pci.c)
> eth1: SIS900 (sis900.c)
> eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)
>
> eth0 and eth1 share the same interrupt
>
> CPU0
> 0: 91986 XT-PIC timer
> 1: 935 XT-PIC keyboard
> 2: 0 XT-PIC cascade
> 8: 1 XT-PIC rtc
> 9: 0 XT-PIC acpi
> 10: 25109 XT-PIC via82cxxx, usb-uhci, usb-uhci, eth0, eth1
> 11: 24 XT-PIC usb-uhci, eth2
> 14: 7523 XT-PIC ide0
> 15: 7021 XT-PIC ide1
> NMI: 0
> ERR: 0
Wow, you have four devices on the same interrupt line. /proc/interrupts
from 2.4.26/27 looks the same?
> That was not a problem in .26 however. Though it seems to be the cause of the
> problem (lost interrupt)? The hardware this is all running on is an Asrock
> K7VM4 mainboard. The system is booted with "pci=noacpi" (ACPI, no APM).
> Otherwise IRQ255 is assigned to IDE and someone told me the noacpi parameter
> would fix the board's braindead BIOS.
>
> Either way .27 doesn't want to boot. I've attached dmesg from a running 2.4.26
> kernel and the config used for 2.4.27.
You mean it boots but you get the Tx timeouts?
> Other postings I've found say that the transmit timeouts mean that the
> lowlevel ethernet connection between the NICs broke. But this works fine in
> earlier kernels and only eth0 and eth1 which share an interrupt are affected.
> I'd be glad for any more suggestions on what might be causing this. :)
Well there are some changes to sis900 between .26 and .27 but I doubt
they could be causing it.
Can you try to boot with ACPI disabled? I think the problem might be
related to ACPI configuration.
Also, can you post the boot messages from 2.4.27?
* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
2004-08-13 10:15 ` Marcelo Tosatti
@ 2004-08-13 21:56 ` Oliver Feiler
0 siblings, 0 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-13 21:56 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Marcelo Tosatti, linux-kernel
[-- Attachment #1.1: body text --]
[-- Type: text/plain, Size: 4036 bytes --]
Hi Marcelo,
On Friday 13 August 2004 12:15, Marcelo Tosatti wrote:
> On Tue, Aug 10, 2004 at 02:23:34PM +0200, Oliver Feiler wrote:
> > Hi,
> >
> > I've upgraded a server from .26 to .27, but ran into problems with the
> > network cards.
> >
> > The kernel throws a lot of errors into the syslog and the net devices
> > don't work:
> > Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249
> > Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241
> > Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240
> > [...]
> >
> > and:
> > Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> > ISR=0x3, t=515.
> > Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> > ISR=0x3, t=5015.
> > Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3,
> > ISR=0x3, t=1014.
> > [...]
> >
> > The system has three network cards.
> > eth0: RTL-8029 (ne2k-pci.c)
> > eth1: SIS900 (sis900.c)
> > eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)
> >
> > eth0 and eth1 share the same interrupt
> >
> > CPU0
> > 0: 91986 XT-PIC timer
> > 1: 935 XT-PIC keyboard
> > 2: 0 XT-PIC cascade
> > 8: 1 XT-PIC rtc
> > 9: 0 XT-PIC acpi
> > 10: 25109 XT-PIC via82cxxx, usb-uhci, usb-uhci, eth0, eth1
> > 11: 24 XT-PIC usb-uhci, eth2
> > 14: 7523 XT-PIC ide0
> > 15: 7021 XT-PIC ide1
> > NMI: 0
> > ERR: 0
>
> Wow, you have four devices on the same interrupt line. /proc/interrupts
> from 2.4.26/27 looks the same?
There are five on int10. ;) It's worse on my desktop box with six devices on
int11. But hey, Linux works just fine so I never cared.
Yes, /proc/interrupts from .26 and .27 is the same.
> > Either way .27 doesn't want to boot. I've attached dmesg from a running
> > 2.4.26 kernel and the config used for 2.4.27.
>
> You mean it boots but you get the Tx timeouts?
Yes.
> Well there are some changes to sis900 between .26 and .27 but I doubt
> they could be causing it.
>
> Can you try to boot with ACPI disabled? I think the problem might be
> related to ACPI configuration.
>
> Also, can you post the boot messages from 2.4.27?
I've attached three boots with .27: one without any parameters, one with
acpi=off, and one with pci=noacpi (the way I booted previous kernels).
It seems I've found the problem. The network errors were caused by the
pci=noacpi boot parameter. Once I boot without any parameter or acpi=off it
works just fine.
Btw, how can I boot with ACPI disabled? I thought it was acpi=off, but it
doesn't seem to make any difference, the kernel still uses ACPI (see
dmesg-2.4.27-acpi=off.gz attachment).
Also there must have been some (positive) changes to ACPI in 2.4.27? With
earlier kernels I had this problem:
Feb 6 18:31:27 spot kernel: PCI: Using ACPI for IRQ routing
Feb 6 18:31:27 spot kernel: PCI: if you experience problems, try using option
'pci=noacpi' or even 'acpi=off'
[...]
Feb 6 18:31:27 spot kernel: PCI: No IRQ known for interrupt pin A of device
00:11.1 - using IRQ 255
This seems to have been corrected as of 2.4.27. I still get the
PCI: No IRQ known for interrupt pin A of device 00:11.1
warning, but it doesn't assign IRQ 255 anymore which I take as a good sign. :)
So, it seems to work fine now. If you still want me to test something
regarding ACPI on this mainboard feel free to ask.
Thanks for your help,
Oliver
--
Oliver Feiler - http://kiza.kcore.de/
[-- Attachment #1.2: dmesg-2.4.27-acpi.gz --]
[-- Type: application/x-gzip, Size: 3854 bytes --]
[-- Attachment #1.3: dmesg-2.4.27-acpi=off.gz --]
[-- Type: application/x-gzip, Size: 3867 bytes --]
[-- Attachment #1.4: dmesg-2.4.27-pci=noacpi.gz --]
[-- Type: application/x-gzip, Size: 3945 bytes --]
[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
[not found] <566B962EB122634D86E6EE29E83DD808182C3236@hdsmsx403.hd.intel.com>
@ 2004-08-16 17:52 ` Len Brown
2004-08-16 18:44 ` eth*: transmit timed out since .27 Oliver Feiler
0 siblings, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 17:52 UTC (permalink / raw)
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
Oliver,
I'm glad that turning off "pci=noacpi" fixed your system.
I don't know why the legacy irqrouter didn't work, but
as ACPI works, I'm not going to worry about it;-)
I expect the "acpi=off" experiment would behave the same as
"pci=noacpi", but it looks like in your experiment you
mis-spelled that parameter as apci=off, so instead it was the
same as the default ACPI-enabled case.
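As the paragraph above notes, the mis-spelled parameter had no visible effect and the system behaved like the default case. A quick check of the command line for near-miss spellings can catch this kind of slip. The sketch below is only an illustration: KNOWN_KEYS is a hypothetical, deliberately tiny list (the kernel's real parameter set is far larger), and suspicious_params is an invented helper name.

```python
import difflib

# Hypothetical short list for illustration; the real set of kernel
# parameters is much larger and depends on the kernel configuration.
KNOWN_KEYS = ["acpi", "pci", "root", "ro", "BOOT_IMAGE"]


def suspicious_params(cmdline):
    """Return {unknown key: similar known keys} for likely typos."""
    findings = {}
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        if key in KNOWN_KEYS:
            continue
        # Flag only unknown keys that closely resemble a known one.
        close = difflib.get_close_matches(key, KNOWN_KEYS, n=3, cutoff=0.6)
        if close:
            findings[key] = close
    return findings


# On a live system the string would come from /proc/cmdline.
print(suspicious_params("BOOT_IMAGE=Linux ro root=900 apci=off"))
print(suspicious_params("BOOT_IMAGE=Linux.old ro root=900 pci=noacpi"))
```

The first call flags apci and lists acpi among the similar names; the second call returns an empty dict because every key is recognized.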
Re: lots of interrupts on the same IRQ.
There are boot params to balance out the IRQs in PIC mode,
but what you want to do on this system is enable the IOAPIC
in your kernel config. The existence of the MADT in your
ACPI tables suggests you may have one. An IOAPIC will bring
additional interrupt pins to bear, usually allowing
the PCI interrupts to use IRQs > 16 where they may
not have to share so much.
cheers,
-Len
* Re: eth*: transmit timed out since .27
2004-08-16 17:52 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Len Brown
@ 2004-08-16 18:44 ` Oliver Feiler
2004-08-16 19:08 ` Oliver Feiler
2004-08-16 19:38 ` Len Brown
0 siblings, 2 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 18:44 UTC (permalink / raw)
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
Hello Len,
Len Brown wrote:
> Oliver,
> I'm glad that turning off "pci=noacpi" fixed your system.
> I don't know why the legacy irqrouter didn't work, but
> as ACPI works, I'm not going to worry about it;-)
Well, it did work with 2.4.26, but I agree that it's better to get the
new stuff to work correctly. ;) I just noticed that /proc/interrupts and
/proc/pci, lspci still disagree on the IRQ of the IDE device.
CPU0
0: 112337 IO-APIC-edge timer
1: 2 IO-APIC-edge keyboard
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
14: 9296 IO-APIC-edge ide0
15: 9078 IO-APIC-edge ide1
17: 24 IO-APIC-level eth1
18: 125085 IO-APIC-level eth0
21: 0 IO-APIC-level usb-uhci, usb-uhci, usb-uhci
22: 0 IO-APIC-level via82cxxx
23: 2976 IO-APIC-level eth2
NMI: 0
LOC: 112313
ERR: 0
MIS: 42
vs.
00:11.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
(prog-if 8a [Master SecP PriP])
Subsystem: Unknown device 1849:0571
Flags: bus master, medium devsel, latency 32, IRQ 255
I/O ports at fc00 [size=16]
Capabilities: <available only to root>
This probably has to do with this boot message:
PCI: No IRQ known for interrupt pin A of device 00:11.1
I have found absolutely nothing that explains if this is an error or
just some sort of debug message one can ignore.
>
> I expect the "acpi=off" experiment would behave the same as
> "pci=noacpi", but it looks like in your experiment you
> mis-spelled that parameter as apci=off, so instead it was the
> same as the default ACPI-enabled case.
Oh, thanks for noticing. Stupid me.
>
> Re: lots of interrupts on the same IRQ.
> There are boot params to balance out the IRQs in PIC mode,
> but what you want to do on this system is enable the IOAPIC
> in your kernel config. The existence of the MADT in your
> ACPI tables suggests you may have one. An IOAPIC will bring
> additional interrupt pins to bear, usually allowing
> the PCI interrupts to use IRQs > 16 where they may
> not have to share so much.
Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
hint. :)
cu
Oliver
* Re: eth*: transmit timed out since .27
2004-08-16 18:44 ` eth*: transmit timed out since .27 Oliver Feiler
@ 2004-08-16 19:08 ` Oliver Feiler
2004-08-16 19:50 ` Len Brown
2004-08-16 19:38 ` Len Brown
1 sibling, 1 reply; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 19:08 UTC (permalink / raw)
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]
Oliver Feiler wrote:
>
>
> Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
> Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
> hint. :)
No, not quite. After about 30 minutes of uptime and a moderate load of
eth0 (100-200KB/s constant data flow) it happened again. :(
Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x97, t=36.
Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=141.
Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=545.
[repeating endlessly]
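For reading these log lines, a small decoder can help. The sketch below pulls the device, ISR value, and t value out of each "Tx timed out" message and names the ISR bits that are set. The bit names are an assumption: they follow the ENISR_* definitions as I recall them from the 8390 driver header, so check them against the kernel source before relying on them. parse_timeouts and ISR_BITS are illustrative names.

```python
import re

# Interrupt status register bits, as recalled from the 8390 driver header
# (ENISR_* constants). Treat these names as an assumption.
ISR_BITS = {
    0x01: "RX",
    0x02: "TX",
    0x04: "RX_ERR",
    0x08: "TX_ERR",
    0x10: "OVER",
    0x20: "COUNTERS",
    0x40: "RDC",
    0x80: "RESET",
}

# \s+ also matches the line break that the mail client inserted.
TIMEOUT_RE = re.compile(
    r"(?P<dev>eth\d+): Tx timed out, lost interrupt\?\s+"
    r"TSR=(?P<tsr>0x[0-9a-fA-F]+),\s+ISR=(?P<isr>0x[0-9a-fA-F]+),\s+t=(?P<t>\d+)"
)


def parse_timeouts(log_text):
    """Return (device, isr, set-bit names, t) for each timeout message."""
    events = []
    for m in TIMEOUT_RE.finditer(log_text):
        isr = int(m.group("isr"), 16)
        flags = [name for bit, name in ISR_BITS.items() if isr & bit]
        events.append((m.group("dev"), isr, flags, int(m.group("t"))))
    return events


SAMPLE = """\
Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x97, t=36.
Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=141.
Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=545.
"""

for event in parse_timeouts(SAMPLE):
    print(event)
```

If the bit names are right, ISR=0x3 means the receive and transmit bits were both still pending when the watchdog fired, which would fit the driver's "lost interrupt?" guess.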
I've booted a kernel without APIC and IOAPIC compiled and it works again.
I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really
know where to look for the problem here. The interrupt counter for the
IRQ eth0 is using (a Realtek 8029 chipset) is growing significantly
after a while. And after a while it seems to get stuck (Tx timed out).
"ifconfig eth0 down" and "up" again did nothing. Sometimes it seems to
fix such network problems.
cu
Oliver
[-- Attachment #2: dmesg-2.4.27-ioapic.gz --]
[-- Type: application/x-gzip, Size: 4878 bytes --]
* Re: eth*: transmit timed out since .27
2004-08-16 18:44 ` eth*: transmit timed out since .27 Oliver Feiler
2004-08-16 19:08 ` Oliver Feiler
@ 2004-08-16 19:38 ` Len Brown
2004-08-16 20:11 ` Maciej W. Rozycki
1 sibling, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 19:38 UTC (permalink / raw)
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
On Mon, 2004-08-16 at 14:44, Oliver Feiler wrote:
> 14: 9296 IO-APIC-edge ide0
> 15: 9078 IO-APIC-edge ide1
> 17: 24 IO-APIC-level eth1
> 18: 125085 IO-APIC-level eth0
> 21: 0 IO-APIC-level usb-uhci, usb-uhci, usb-uhci
> 22: 0 IO-APIC-level via82cxxx
> 23: 2976 IO-APIC-level eth2
> NMI: 0
> LOC: 112313
> ERR: 0
> MIS: 42
This is unusual.
MIS is a hardware workaround and should normally be 0.
>
>
> vs.
>
> 00:11.1 IDE interface: VIA Technologies, Inc.
> VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev
> 06)
> (prog-if 8a [Master SecP PriP])
> Subsystem: Unknown device 1849:0571
> Flags: bus master, medium devsel, latency 32, IRQ 255
> I/O ports at fc00 [size=16]
> Capabilities: <available only to root>
>
> This probably has to do with this boot message:
> PCI: No IRQ known for interrupt pin A of device 00:11.1
> I have found absolutely nothing that explains if this is an error or
> just some sort of debug message one can ignore.
Yes, ignore it.
This is where that message about 255 came from.
When ACPI failed to find a PCI-routing-table entry
for this device, it looked in PCI config space
and found the 255 you see above. The only recent
change is that it doesn't try to use an obviously
bogus value. But in either case, with this device
it is moot as the hardware and the driver are hard-coded.
-Len
* Re: eth*: transmit timed out since .27
2004-08-16 19:08 ` Oliver Feiler
@ 2004-08-16 19:50 ` Len Brown
2004-08-16 23:04 ` Oliver Feiler
0 siblings, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 19:50 UTC (permalink / raw)
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
On Mon, 2004-08-16 at 15:08, Oliver Feiler wrote:
> Oliver Feiler wrote:
> >
> >
> > Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
> > Except for that IRQ 255 thing I've noticed no oddities. Thanks for
> the
> > hint. :)
>
> No, not quite. After about 30 minutes of uptime and a moderate load of
> eth0 (100-200KB/s constant data flow) it happened again. :(
>
> Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x97, t=36.
> Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x3, t=141.
> Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x3, t=545.
> [repeating endlessly]
>
> I've booted a kernel without APIC and IOAPIC compiled and it works
> again.
>
> I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really
> know where to look for the problem here. The interrupt counter for the
> IRQ eth0 is using (a Realtek 8029 chipset) is growing significantly
> after a while. And after a while it seems to get stuck (Tx timed out).
> "ifconfig eth0 down" and "up" again did nothing. Sometimes it seems to
> fix such network problems.
You've got 3 ethernet controllers.
eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.
And eth0 is failing.
See if you can give its network cable and its IRQ to one of the other
devices and see if the error follows the load and the wires,
or stays with the device.
The quirks for this hardware look totally broken in IOAPIC mode:
PCI: Via IRQ fixup for 00:10.2, from 10 to 5
PCI: Via IRQ fixup for 00:10.1, from 10 to 5
PCI: Via IRQ fixup for 00:10.0, from 11 to 5
I have no idea if they're a nop or not, but you might experiment with
disabling them. Sure isn't obvious that something called
quirk_via_irqpic() should be running in IOAPIC mode.
I'd try disabling quirk_via_acpi() too.
cheers,
-Len
ps. to exchange IRQs, you'll need to physically exchange the slots
of the cards, easy enough unless eth0 is soldered onto the
motherboard;-)
* Re: eth*: transmit timed out since .27
2004-08-16 19:38 ` Len Brown
@ 2004-08-16 20:11 ` Maciej W. Rozycki
0 siblings, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2004-08-16 20:11 UTC (permalink / raw)
To: Len Brown; +Cc: Oliver Feiler, Marcelo Tosatti, Marcelo Tosatti, linux-kernel
On Mon, 16 Aug 2004, Len Brown wrote:
> > MIS: 42
>
> This is unusual.
> MIS is a hardware workaround and should normally be 0.
Unfortunately these events seem to be triggerable for all systems using
serial APIC interrupt delivery. All that is needed is a sufficiently high
load on interrupts, even a transient one. Admittedly the definition of
"sufficient" here is very high, something like at least ten thousands of
interrupts per second. E.g. I've been able to observe a few of them on my
system when a UDP NFS client was untarring an archive over a 100Mbps
network -- both the archive and the destination were located in an NFS
mounted filesystem and the size of the untarred data was around 300MB.
The APIC hardware is rock-solid there -- after many years of operation I
have yet to see a single APIC error.
One "reliable" way of triggering these events is configuring the PIT
timer interrupt input as level-triggered in the I/O APIC. ;-) This is
actually how I did run-time testing of this code.
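To see whether the MIS count grows while the box is under load, the summary rows of /proc/interrupts can be sampled twice and compared. The following is a minimal sketch with invented helper names (summary_counters, mis_delta). It only reads the rows whose label is alphabetic (NMI, LOC, ERR, MIS) and makes no claim about what a given increase means.

```python
def summary_counters(interrupts_text):
    """Return the alphabetic summary rows (NMI, LOC, ERR, MIS) as integers."""
    counters = {}
    for line in interrupts_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        fields = rest.split()
        if sep and name.isalpha() and fields and fields[0].isdigit():
            # Sum the per-CPU columns; on a single CPU there is just one.
            counters[name] = sum(int(f) for f in fields if f.isdigit())
    return counters


def mis_delta(before_text, after_text):
    """Difference in the MIS counter between two snapshots of the same boot."""
    before = summary_counters(before_text).get("MIS", 0)
    after = summary_counters(after_text).get("MIS", 0)
    return after - before


# Excerpt of the IO-APIC listing posted earlier in the thread.
SAMPLE = """\
           CPU0
 18:     125085   IO-APIC-level  eth0
 23:       2976   IO-APIC-level  eth2
NMI:          0
LOC:     112313
ERR:          0
MIS:         42
"""

print(summary_counters(SAMPLE))
```

Both snapshots passed to mis_delta need to come from the same boot, since the counters restart at zero on reboot.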
Maciej
* Re: eth*: transmit timed out since .27
2004-08-16 19:50 ` Len Brown
@ 2004-08-16 23:04 ` Oliver Feiler
2004-08-16 23:42 ` Maciej W. Rozycki
2004-08-17 0:29 ` Alan Cox
0 siblings, 2 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 23:04 UTC (permalink / raw)
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel
Hi Len,
Len Brown wrote:
>
>
> You've got 3 ethernet controllers.
>
> eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
> eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
> eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.
Correct.
>
> And eth0 is failing.
> See if you can give its network cable and its IRQ to one of the other
> devices and see if the error follows the load and the wires,
> or stays with the device.
Doing that is a bit problematic. eth0 is a 10mbit NIC, eth1 and eth2
must be 100mbit unfortunately. I can move around (two of) the NICs in
the PCI slots however. The box is headless and a bit uncomfortable to
work with, so I'd like to try software solutions first.
>
> The quirks for this hardware look totally broken in IOAPIC mode:
> PCI: Via IRQ fixup for 00:10.2, from 10 to 5
> PCI: Via IRQ fixup for 00:10.1, from 10 to 5
> PCI: Via IRQ fixup for 00:10.0, from 11 to 5
> I have no idea if they're a nop or not, but you might experiment with
> disabling them. Sure isn't obvious that something called
> quirk_via_irqpic() should be running in IOAPIC mode.
> I'd try disabling quirk_via_acpi() too.
Ok, I've removed the quirks from quirks.c, compiled and rebooted. I hope
I have done it right, I commented out these lines in quirks.c:
// { PCI_FIXUP_HEADER, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_3, quirk_via_acpi },
// { PCI_FIXUP_HEADER, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_4, quirk_via_acpi },
// { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irqpic },
// { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irqpic },
// { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_6, quirk_via_irqpic },
The "Via IRQ fixup for dev:..." are gone from the boot messages. After
transferring about 250 MB over eth0 the "Tx timed out" error reoccurred.
/proc/interrupts looked like this:
CPU0
0: 191473 IO-APIC-edge timer
1: 1244 IO-APIC-edge keyboard
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
14: 33547 IO-APIC-edge ide0
15: 23121 IO-APIC-edge ide1
17: 5699 IO-APIC-level eth1
18: 234589 IO-APIC-level eth0
21: 0 IO-APIC-level usb-uhci, usb-uhci, usb-uhci
22: 0 IO-APIC-level via82cxxx
23: 240873 IO-APIC-level eth2
NMI: 0
LOC: 191481
ERR: 0
MIS: 8
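A listing like this can also be checked programmatically, for example to watch whether the MIS counter grows while the timeouts occur. The following is only a minimal sketch in C: the function name interrupts_counter is made up for illustration, it works on text already read from /proc/interrupts, and it returns only the first (CPU0) column.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any kernel or library API.
 * Scans text in /proc/interrupts format for a line whose label
 * (e.g. "MIS", "ERR" or an IRQ number such as "18") is followed by ':'
 * and returns the first counter on that line, or -1 if not found.
 */
long interrupts_counter(const char *text, const char *label)
{
    size_t len = strlen(label);
    const char *line = text;

    while (line && *line) {
        const char *p = line;

        /* Labels are right-aligned, so skip leading blanks. */
        while (*p == ' ' || *p == '\t')
            p++;

        /* Require the ':' so that label "1" does not match "18:". */
        if (strncmp(p, label, len) == 0 && p[len] == ':')
            return strtol(p + len + 1, NULL, 10);

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Calling it twice a few seconds apart with the label "MIS" and comparing the results would show whether new mismatches were counted in between.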
What exactly is MIS? Something like "interrupt occurred, but I have no
idea what device caused it"? I don't know much about it, but it's always
>0 when the problem happens.
>
> cheers,
> -Len
>
> ps. to exchange IRQs, you'll need to physically exchange the slots
> of the cards, easy enough unless eth0 is soldered onto the
> motherboard;-)
Fortunately only eth2 (the VIA Rhine-II) is soldered onto the board. :)
I'll try reordering the NICs in the PCI slots. The system is used most
of the time though, so I can't take it apart and test things all the
time. I wonder if it makes sense to experiment with the IOAPIC further.
Maybe the hardware is just plain broken? Or might there be a slight
chance to get this to work the way it's intended to?
Btw, I don't know if I've ever mentioned it, it's an Asrock K7VM4 board.
lspci output is here if it might be of interest:
kiza@spot:~> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8378 [KM400] Chipset Host
Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI
Fast Ethernet (rev 02)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:10.0 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0
controller] (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0
controller] (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0
controller] (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc.
VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II]
(rev 74)
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8378 [S3
UniChrome] Integrated Video (rev 01)
Thanks for your help with this. :)
Oliver
* Re: eth*: transmit timed out since .27
2004-08-16 23:04 ` Oliver Feiler
@ 2004-08-16 23:42 ` Maciej W. Rozycki
2004-08-17 0:29 ` Alan Cox
1 sibling, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2004-08-16 23:42 UTC (permalink / raw)
To: Oliver Feiler; +Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti, linux-kernel
On Tue, 17 Aug 2004, Oliver Feiler wrote:
> MIS: 8
>
> What exactly is MIS? Something like "interrupt occurred, but I have no
> idea what device caused it"? I don't know much about it, but it's always
> >0 when the problem happens.
It's a trigger mode MISmatch. It only happens for level-triggered
interrupts and the problem is they get recorded as edge-triggered ones in
the receiving local APIC. The two interrupt trigger modes require the
hardware to perform different actions when the software interrupt handler
concludes and such a mismatch would lead to a lock-up of the affected
line. Specifically, the local APIC involved sends an End Of Interrupt
(EOI) message to the originating I/O APIC for level-triggered interrupts
and for edge-triggered interrupts nothing is sent. Fortunately just
before sending the final ACK to the hardware at the conclusion of the
handler we can detect that the trigger mode recorded by the local APIC
disagrees with the setup of the corresponding I/O APIC line and if that
happens we execute an (expensive) unlock action at the I/O APIC so that it
resets its logic for the input as if it received an EOI message from a
local APIC for a level-triggered interrupt.
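
The check described above can be illustrated with a small model in C. This is a sketch under stated assumptions and not the kernel's code: the structures, the function names and the awaiting_eoi flag are invented for illustration, and the "unlock" is reduced to clearing that flag. It assumes only that the local APIC keeps one trigger-mode bit per vector (256 bits held as eight 32-bit words), with a set bit meaning the vector was recorded as level-triggered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the local APIC's per-vector trigger mode bits. */
struct local_apic_model {
    uint32_t tmr[8];        /* bit set: vector recorded as level-triggered */
};

/* Simplified model of one level-triggered I/O APIC input. */
struct ioapic_line_model {
    bool awaiting_eoi;      /* line stays blocked until an EOI arrives */
};

static unsigned long mis_count;   /* plays the role of the MIS: counter */

/* Return true if the local APIC recorded the vector as level-triggered. */
static bool tmr_says_level(const struct local_apic_model *lapic,
                           unsigned vector)
{
    return (lapic->tmr[(vector >> 5) & 7] >> (vector & 31)) & 1u;
}

/*
 * End-of-handler step for a line the I/O APIC has set up as
 * level-triggered. Returns true when a trigger mode mismatch was seen.
 */
static bool end_level_irq(const struct local_apic_model *lapic,
                          struct ioapic_line_model *line, unsigned vector)
{
    if (tmr_says_level(lapic, vector)) {
        /* Normal case: the local APIC's EOI message releases the line. */
        line->awaiting_eoi = false;
        return false;
    }

    /*
     * Mismatch: the interrupt was recorded as edge-triggered, so no EOI
     * message is sent. Count it and release the line in software; this
     * assignment stands in for the expensive unlock at the I/O APIC.
     */
    mis_count++;
    line->awaiting_eoi = false;
    return true;
}
```

In this model, a vector whose trigger-mode bit is clear when end_level_irq() runs bumps mis_count by one, which corresponds to the MIS value rising in /proc/interrupts.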
Maciej
* Re: eth*: transmit timed out since .27
2004-08-16 23:04 ` Oliver Feiler
2004-08-16 23:42 ` Maciej W. Rozycki
@ 2004-08-17 0:29 ` Alan Cox
1 sibling, 0 replies; 12+ messages in thread
From: Alan Cox @ 2004-08-17 0:29 UTC (permalink / raw)
To: Oliver Feiler
Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti,
Linux Kernel Mailing List
Looking over the docs the whole ACPI and IOAPIC mode for these boards
seems very different and quite "magic" compared to the PCI mode which is
merely "odd" in a few places. APIC routing bits are stuffed into strange
chipset specific places which implies the quirks probably shouldn't be
applied in acpi mode.