public inbox for linux-kernel@vger.kernel.org
* eth*: transmit timed out since .27 (was: linux-2.4.27 released)
  2004-08-07 23:28 linux-2.4.27 released Marcelo Tosatti
@ 2004-08-10 12:23 ` Oliver Feiler
  2004-08-13 10:15   ` Marcelo Tosatti
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Feiler @ 2004-08-10 12:23 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel


[-- Attachment #1.1: body text --]
[-- Type: text/plain, Size: 2513 bytes --]

Hi,

I've upgraded a server from .26 to .27, but ran into problems with the network 
cards.

The kernel throws a lot of errors into the syslog and the net devices don't 
work:
Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249 
Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241 
Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240 
[...]

and:
Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=515.
Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=5015.
Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=1014.
[...]

The system has three network cards.
eth0: SIS900 (sis900.c)
eth1: RTL-8029 (ne2k-pci.c)
eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)

eth0 and eth1 share the same interrupt:

           CPU0       
  0:      91986          XT-PIC  timer
  1:        935          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:      25109          XT-PIC  via82cxxx, usb-uhci, usb-uhci, eth0, eth1
 11:         24          XT-PIC  usb-uhci, eth2
 14:       7523          XT-PIC  ide0
 15:       7021          XT-PIC  ide1
NMI:          0 
ERR:          0

That was not a problem in .26, however. Could the shared interrupt still be
the cause of the problem (lost interrupt)? The hardware this is all running
on is an Asrock K7VM4 mainboard. The system is booted with "pci=noacpi"
(ACPI, no APM). Otherwise IRQ 255 is assigned to IDE, and someone told me
the noacpi parameter would fix the board's braindead BIOS.

Either way .27 doesn't want to boot. I've attached dmesg from a running 2.4.26 
kernel and the config used for 2.4.27.

Other postings I've found say that transmit timeouts mean the low-level
Ethernet connection between the NICs broke. But this works fine in earlier
kernels, and only eth0 and eth1, which share an interrupt, are affected.
I'd be glad for any more suggestions on what might be causing this. :)

Thanks,
	Oliver


-- 
Oliver Feiler  -  http://kiza.kcore.de/

[-- Attachment #1.2: dmesg --]
[-- Type: text/plain, Size: 9313 bytes --]

Linux version 2.4.26 (root@spot) (gcc version 3.3.4) #3 Mon Jul 5 15:32:52 CEST 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d0000 - 00000000000d6000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000f7f0000 (usable)
 BIOS-e820: 000000000f7f0000 - 000000000f7f8000 (ACPI data)
 BIOS-e820: 000000000f7f8000 - 000000000f800000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
247MB LOWMEM available.
On node 0 totalpages: 63472
zone(0): 4096 pages.
zone(1): 59376 pages.
zone(2): 0 pages.
ACPI: RSDP (v000 AMI                                       ) @ 0x000fa620
ACPI: RSDT (v001 AMIINT VIA_K7   0x00000010 MSFT 0x00000097) @ 0x0f7f0000
ACPI: FADT (v001 AMIINT VIA_K7   0x00000011 MSFT 0x00000097) @ 0x0f7f0030
ACPI: MADT (v001 AMIINT VIA_K7   0x00000009 MSFT 0x00000097) @ 0x0f7f00c0
ACPI: DSDT (v001    VIA    K7VT4 0x00001000 MSFT 0x0100000d) @ 0x00000000
Kernel command line: BOOT_IMAGE=Linux.old ro root=900 pci=noacpi
Initializing CPU#0
Detected 599.436 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1196.03 BogoMIPS
Memory: 248184k/253888k available (1668k kernel code, 5316k reserved, 578k data, 92k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 64K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0183fbff c1c7fbff 00000000 00000000
CPU:             Common caps: 0183fbff c1c7fbff 00000000 00000000
CPU: AMD Duron(tm)  stepping 00
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20040326
PCI: PCI BIOS revision 2.10 entry at 0xfdae1, last bus=1
PCI: Using configuration type 1
ACPI: IRQ9 SCI: Edge set to Level Trigger.
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12 14 15)
PCI: Probing PCI hardware
PCI: Using IRQ router default [1106/3177] at 00:11.0
PCI: Hardcoded IRQ 14 for device 00:11.1
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ACPI: Power Button (FF) [PWRF]
ACPI: Sleep Button (CM) [SLPB]
ACPI: Processor [CPU1] (supports C1)
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10f
FDC 0 is a post-1991 82077
ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
  http://www.scyld.com/network/ne2k-pci.html
eth0: RealTek RTL-8029 found at 0xe800, IRQ 10, 00:00:E8:5C:2D:AA.
sis900.c: v1.08.06 9/24/2002
eth1: SiS 900 Internal MII PHY transceiver found at address 1.
eth1: Using transceiver found at address 1 as default
eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 10, 00:c0:ca:16:4c:b6.
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:11.1
PCI: Hardcoded IRQ 14 for device 00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD800BB-00CAA1, ATA DISK drive
blk: queue c0371b40, I/O limit 4095Mb (mask 0xffffffff)
hdc: ST380011A, ATA DISK drive
blk: queue c0371f94, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
hdc: attached ide-disk driver.
hdc: host protected area => 1
hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
Partition check:
 hda: hda1 hda2 hda3
 hdc: hdc1 hdc2 hdc3
Via 686a/8233/8235 audio driver 1.9.1-ac3
via82cxxx: Six channel audio available
PCI: Setting latency timer of device 00:11.5 to 64
ac97_codec: AC97  codec, id: CMI97 (CMedia)
AC97 codec does not have proper volume support.
via82cxxx: Codec rate locked at 48Khz
via82cxxx: board #1 at 0xD800, IRQ 10
usb.c: registered new driver hub
host/usb-uhci.c: $Revision: 1.275 $ time 15:33:03 Jul  5 2004
host/usb-uhci.c: High bandwidth mode enabled
host/usb-uhci.c: USB UHCI at I/O 0xe400, IRQ 10
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: USB UHCI at I/O 0xe000, IRQ 10
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: USB UHCI at I/O 0xdc00, IRQ 11
host/usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
host/usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
i2c-core.o: i2c core module version 2.8.3 (20040115)
i2c-dev.o: i2c /dev entries driver module version 2.8.3 (20040115)
i2c-proc.o version 2.8.3 (20040115)
md: raid1 personality registered as nr 3
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 000005f4]
 [events: 000005f4]
md: autorun ...
md: considering hdc3 ...
md:  adding hdc3 ...
md:  adding hda3 ...
md: created md0
md: bind<hda3,1>
md: bind<hdc3,2>
md: running: <hdc3><hda3>
md: hdc3's event counter: 000005f4
md: hda3's event counter: 000005f4
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdc3 operational as mirror 1
raid1: device hda3 operational as mirror 0
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdc3 [events: 000005f5]<6>(write) hdc3's sb offset: 77858944
md: hda3 [events: 000005f5]<6>(write) hda3's sb offset: 77851776
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
ip_conntrack version 2.1 (1983 buckets, 15864 max) - 288 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 92k freed
Adding Swap: 240964k swap-space (priority -1)
Adding Swap: 249976k swap-space (priority -2)
EXT3 FS 2.4-0.9.19, 19 August 2002 on md(9,0), internal journal
i2c-viapro.o version 2.8.3 (20040115)
i2c-dev.o: Registered 'SMBus Via Pro adapter at 0400' as minor 0
i2c-isa.o version 2.8.3 (20040115)
i2c-dev.o: Registered 'ISA main adapter' as minor 1
w83627hf.o version 2.8.3 (20040115)
via-rhine.c:v1.10-LK1.1.19  July-12-2003  Written by Donald Becker
  http://www.scyld.com/network/via-rhine.html
eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 11.
eth2: MII PHY found at address 1, status 0x786d advertising 05e1 Link 45e1.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
eth2: Setting full-duplex based on MII #1 link partner capability of 45e1.
eth1: Media Link On 100mbps half-duplex 
HTB init, kernel part version 3.16

[-- Attachment #1.3: config-2.4.27.gz --]
[-- Type: application/x-gzip, Size: 5287 bytes --]

[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
  2004-08-10 12:23 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Oliver Feiler
@ 2004-08-13 10:15   ` Marcelo Tosatti
  2004-08-13 21:56     ` Oliver Feiler
  0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2004-08-13 10:15 UTC (permalink / raw)
  To: Oliver Feiler; +Cc: Marcelo Tosatti, linux-kernel


Hi Oliver,

On Tue, Aug 10, 2004 at 02:23:34PM +0200, Oliver Feiler wrote:
> Hi,
> 
> I've upgraded a server from .26 to .27, but ran into problems with the network 
> cards.
> 
> The kernel throws a lot of errors into the syslog and the net devices don't 
> work:
> Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249 
> Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241 
> Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240 
> [...]
> 
> and:
> Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=515.
> Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=5015.
> Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=1014.
> [...]
> 
> The system has three network cards.
> eth0: SIS900 (sis900.c)
> eth1: RTL-8029 (ne2k-pci.c)
> eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)
> 
> eth0 and eth1 share the same interrupt
> 
>            CPU0       
>   0:      91986          XT-PIC  timer
>   1:        935          XT-PIC  keyboard
>   2:          0          XT-PIC  cascade
>   8:          1          XT-PIC  rtc
>   9:          0          XT-PIC  acpi
>  10:      25109          XT-PIC  via82cxxx, usb-uhci, usb-uhci, eth0, eth1
>  11:         24          XT-PIC  usb-uhci, eth2
>  14:       7523          XT-PIC  ide0
>  15:       7021          XT-PIC  ide1
> NMI:          0 
> ERR:          0

Wow, you have four devices on the same interrupt line. Does /proc/interrupts
look the same under 2.4.26 and 2.4.27?

> That was not a problem in .26 however. Though it seems to be the cause of the 
> problem (lost interrupt)? The hardware this is all running on is an Asrock 
> K7VM4 mainboard. The system is booted with "pci=noacpi" (ACPI, no APM). 
> Otherwise IRQ255 is assigned to IDE and someone told me the noacpi parameter 
> would fix the board's braindead BIOS.
> 
> Either way .27 doesn't want to boot. I've attached dmesg from a running 2.4.26 
> kernel and the config used for 2.4.27.

You mean it boots but you get the Tx timeouts?

> Other postings I've found say that the transmit timeouts mean that the 
> lowlevel ethernet connection between the NICs broke. But this works fine in 
> earlier kernels and only eth0 and eth1 which share an interrupt are affected. 
> I'd be glad for any more suggestions on what might be causing this. :)

Well there are some changes to sis900 between .26 and .27 but I doubt
they could be causing it.

Can you try to boot with ACPI disabled? I think the problem might be 
related to ACPI configuration. 

Also, can you post the boot messages from 2.4.27?




* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
  2004-08-13 10:15   ` Marcelo Tosatti
@ 2004-08-13 21:56     ` Oliver Feiler
  0 siblings, 0 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-13 21:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Marcelo Tosatti, linux-kernel


[-- Attachment #1.1: body text --]
[-- Type: text/plain, Size: 4036 bytes --]

Hi Marcelo,

On Friday 13 August 2004 12:15, Marcelo Tosatti wrote:

> On Tue, Aug 10, 2004 at 02:23:34PM +0200, Oliver Feiler wrote:
> > Hi,
> >
> > I've upgraded a server from .26 to .27, but ran into problems with the
> > network cards.
> >
> > The kernel throws a lot of errors into the syslog and the net devices
> > don't work:
> > Aug 10 13:39:25 spot kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Aug 10 13:39:26 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:26 spot kernel: eth1: Transmit timeout, status 00000004 00000249
> > Aug 10 13:39:34 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:34 spot kernel: eth1: Transmit timeout, status 00000004 00000241
> > Aug 10 13:39:42 spot kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > Aug 10 13:39:42 spot kernel: eth1: Transmit timeout, status 00000004 00000240
> > [...]
> >
> > and:
> > Aug 10 13:39:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=515.
> > Aug 10 13:40:25 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=5015.
> > Aug 10 13:40:40 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=1014.
> > [...]
> >
> > The system has three network cards.
> > eth0: SIS900 (sis900.c)
> > eth1: RTL-8029 (ne2k-pci.c)
> > eth2: onboard VIA VT6102 Rhine-II (via-rhine.c)
> >
> > eth0 and eth1 share the same interrupt
> >
> >            CPU0
> >   0:      91986          XT-PIC  timer
> >   1:        935          XT-PIC  keyboard
> >   2:          0          XT-PIC  cascade
> >   8:          1          XT-PIC  rtc
> >   9:          0          XT-PIC  acpi
> >  10:      25109          XT-PIC  via82cxxx, usb-uhci, usb-uhci, eth0, eth1
> >  11:         24          XT-PIC  usb-uhci, eth2
> >  14:       7523          XT-PIC  ide0
> >  15:       7021          XT-PIC  ide1
> > NMI:          0
> > ERR:          0
>
> Wow, you have four devices on the same interrupt line. /proc/interrupts
> from 2.4.26/27 looks the same?

There are five on int10. ;) It's worse on my desktop box with six devices on 
int11. But hey, Linux works just fine so I never cared.

Yes, /proc/interrupts from .26 and .27 is the same.

> > Either way .27 doesn't want to boot. I've attached dmesg from a running
> > 2.4.26 kernel and the config used for 2.4.27.
>
> You mean it boots but you get the Tx timeouts?

Yes.


> Well there are some changes to sis900 between .26 and .27 but I doubt
> they could be causing it.
>
> Can you try to boot with ACPI disabled? I think the problem might be
> related to ACPI configuration.
>
> Also, can you post the boot messages from 2.4.27?

I've attached dmesg output from three boots with .27: one without any
parameters, one with acpi=off, and one with pci=noacpi (the way I booted
previous kernels).

It seems I've found the problem. The network errors were caused by the
pci=noacpi boot parameter. When I boot without any parameters, or with
acpi=off, it works just fine.

Btw, how can I boot with ACPI disabled? I thought it was acpi=off, but it
doesn't seem to make any difference; the kernel still uses ACPI (see the
dmesg-2.4.27-acpi=off.gz attachment).

Also there must have been some (positive) changes to ACPI in 2.4.27? With 
earlier kernels I had this problem:

Feb 6 18:31:27 spot kernel: PCI: Using ACPI for IRQ routing
Feb 6 18:31:27 spot kernel: PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
[...]
Feb 6 18:31:27 spot kernel: PCI: No IRQ known for interrupt pin A of device 00:11.1 - using IRQ 255

This seems to have been corrected as of 2.4.27. I still get the
PCI: No IRQ known for interrupt pin A of device 00:11.1
warning, but it doesn't assign IRQ 255 anymore which I take as a good sign. :)

So, it seems to work fine now. If you still want me to test something 
regarding ACPI on this mainboard feel free to ask.

Thanks for your help,

Oliver

-- 
Oliver Feiler  -  http://kiza.kcore.de/

[-- Attachment #1.2: dmesg-2.4.27-acpi.gz --]
[-- Type: application/x-gzip, Size: 3854 bytes --]

[-- Attachment #1.3: dmesg-2.4.27-acpi=off.gz --]
[-- Type: application/x-gzip, Size: 3867 bytes --]

[-- Attachment #1.4: dmesg-2.4.27-pci=noacpi.gz --]
[-- Type: application/x-gzip, Size: 3945 bytes --]

[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
       [not found] <566B962EB122634D86E6EE29E83DD808182C3236@hdsmsx403.hd.intel.com>
@ 2004-08-16 17:52 ` Len Brown
  2004-08-16 18:44   ` eth*: transmit timed out since .27 Oliver Feiler
  0 siblings, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 17:52 UTC (permalink / raw)
  To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Oliver,
I'm glad that turning off "pci=noacpi" fixed your system.
I don't know why the legacy irqrouter didn't work, but
as ACPI works, I'm not going to worry about it;-)

I expect the "acpi=off" experiment would behave the same as
"pci=noacpi", but it looks like in your experiment you
misspelled that parameter as apci=off, so instead it was the
same as the default ACPI-enabled case.

Re: lots of interrupts on the same IRQ.
There are boot params to balance out the IRQs in PIC mode,
but what you want to do on this system is enable the IOAPIC
in your kernel config.  The existence of the MADT in your
ACPI tables suggests you may have one.  An IOAPIC will bring
additional interrupt pins to bear, usually allowing
the PCI interrupts to use IRQs 16 and above, where they may
not have to share so much.

cheers,
-Len




* Re: eth*: transmit timed out since .27
  2004-08-16 17:52 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Len Brown
@ 2004-08-16 18:44   ` Oliver Feiler
  2004-08-16 19:08     ` Oliver Feiler
  2004-08-16 19:38     ` Len Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 18:44 UTC (permalink / raw)
  To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Hello Len,

Len Brown wrote:

> Oliver,
> I'm glad that turning off "pci=noacpi" fixed your system.
> I don't know why the legacy irqrouter didn't work, but
> as ACPI works, I'm not going to worry about it;-)

Well, it did work with 2.4.26, but I agree that it's better to get the
new stuff to work correctly. ;) I just noticed that /proc/interrupts still
disagrees with /proc/pci and lspci about the IRQ of the IDE device.

            CPU0
   0:     112337    IO-APIC-edge  timer
   1:          2    IO-APIC-edge  keyboard
   8:          1    IO-APIC-edge  rtc
   9:          0   IO-APIC-level  acpi
  14:       9296    IO-APIC-edge  ide0
  15:       9078    IO-APIC-edge  ide1
  17:         24   IO-APIC-level  eth1
  18:     125085   IO-APIC-level  eth0
  21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
  22:          0   IO-APIC-level  via82cxxx
  23:       2976   IO-APIC-level  eth2
NMI:          0
LOC:     112313
ERR:          0
MIS:         42


vs.

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
         Subsystem: Unknown device 1849:0571
         Flags: bus master, medium devsel, latency 32, IRQ 255
         I/O ports at fc00 [size=16]
         Capabilities: <available only to root>

This probably has to do with this boot message:
PCI: No IRQ known for interrupt pin A of device 00:11.1

I have found absolutely nothing that explains whether this is an error or
just some sort of debug message one can ignore.

> 
> I expect the "acpi=off" experiment would behave the same as
> "pci=noacpi", but it looks like in your experiment you
> mis-spelled that parameter as apci=off, so instead it was the
> same as the default ACPI-enabled case.

Oh, thanks for noticing. Stupid me.

> 
> Re: lots of interrupts on the same IRQ.
> There are boot params to balance out the IRQs in PIC mode,
> but what you want to do on this system is enable the IOAPIC
> in your kernel config.  The existence of the MADT in your
> ACPI tables suggests you may have one.  An IOAPIC will bring
> additional interrupt pins to bear, usually allowing
> the PCI interrupts to use IRQs > 16 where they may
> not have to share so much.

Ok, I've turned on the IOAPIC and it seems to work perfectly fine. 
Except for that IRQ 255 thing I've noticed no oddities. Thanks for the 
hint. :)

cu
	Oliver



* Re: eth*: transmit timed out since .27
  2004-08-16 18:44   ` eth*: transmit timed out since .27 Oliver Feiler
@ 2004-08-16 19:08     ` Oliver Feiler
  2004-08-16 19:50       ` Len Brown
  2004-08-16 19:38     ` Len Brown
  1 sibling, 1 reply; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 19:08 UTC (permalink / raw)
  To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

Oliver Feiler wrote:
> 
> 
> Ok, I've turned on the IOAPIC and it seems to work perfectly fine. 
> Except for that IRQ 255 thing I've noticed no oddities. Thanks for the 
> hint. :)

No, not quite. After about 30 minutes of uptime and a moderate load of 
eth0 (100-200KB/s constant data flow) it happened again. :(

Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x97, t=36.
Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=141.
Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=545.
[repeating endlessly]

I've booted a kernel without APIC and IOAPIC support compiled in, and it
works again.

I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really
know where to look for the problem here. The interrupt counter for the
IRQ eth0 is using (a Realtek 8029 chipset) grows significantly, and after
a while the card seems to get stuck (Tx timed out). Running "ifconfig eth0
down" and then "up" again did nothing, although that sometimes fixes such
network problems.

cu
	Oliver


[-- Attachment #2: dmesg-2.4.27-ioapic.gz --]
[-- Type: application/x-gzip, Size: 4878 bytes --]


* Re: eth*: transmit timed out since .27
  2004-08-16 18:44   ` eth*: transmit timed out since .27 Oliver Feiler
  2004-08-16 19:08     ` Oliver Feiler
@ 2004-08-16 19:38     ` Len Brown
  2004-08-16 20:11       ` Maciej W. Rozycki
  1 sibling, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 19:38 UTC (permalink / raw)
  To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 2004-08-16 at 14:44, Oliver Feiler wrote:

>   14:       9296    IO-APIC-edge  ide0
>   15:       9078    IO-APIC-edge  ide1
>   17:         24   IO-APIC-level  eth1
>   18:     125085   IO-APIC-level  eth0
>   21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
>   22:          0   IO-APIC-level  via82cxxx
>   23:       2976   IO-APIC-level  eth2
> NMI:          0
> LOC:     112313
> ERR:          0
> MIS:         42

This is unusual.
MIS is a hardware workaround and should normally be 0.

> 
> 
> vs.
> 
> 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
>          Subsystem: Unknown device 1849:0571
>          Flags: bus master, medium devsel, latency 32, IRQ 255
>          I/O ports at fc00 [size=16]
>          Capabilities: <available only to root>
> 
> This probably has to do with this boot message:
> PCI: No IRQ known for interrupt pin A of device 00:11.1

> I have found absolutely nothing that explains if this is an error or 
> just some sort of debug message one can ignore.

Yes, ignore it.

This is where that message about 255 came from.
When ACPI failed to find a PCI-routing-table entry
for this device, it looked in PCI config space
and found the 255 you see above.  The only recent
change is that it doesn't try to use an obviously
bogus value.  But in either case, with this device
it is moot, as the hardware and the driver are hard-coded.

-Len




* Re: eth*: transmit timed out since .27
  2004-08-16 19:08     ` Oliver Feiler
@ 2004-08-16 19:50       ` Len Brown
  2004-08-16 23:04         ` Oliver Feiler
  0 siblings, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-08-16 19:50 UTC (permalink / raw)
  To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 2004-08-16 at 15:08, Oliver Feiler wrote:
> Oliver Feiler wrote:
> > 
> > 
> > Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
> > Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
> > hint. :)
> 
> No, not quite. After about 30 minutes of uptime and a moderate load of
> eth0 (100-200KB/s constant data flow) it happened again. :(
> 
> Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x97, t=36.
> Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=141.
> Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=545.
> [repeating endlessly]
> 
> I've booted a kernel without APIC and IOAPIC compiled and it works
> again.
> 
> I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really 
> know where to look for the problem here. The interrupt counter for the
> IRQ eth0 is using (a Realtek 8029 chipset) is growing significantly 
> after a while. And after a while is seems to get stuck (Tx timed out).
> "ifconfig eth0 down" and "up" again did nothing. Sometimes it seems to
> fix such network problems.

You've got 3 ethernet controllers.

eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.

And eth0 is failing.
See if you can give its network cable and its IRQ to one of the other
devices and see whether the error follows the load and the wires,
or stays with the device.

The quirks for this hardware look totally broken in IOAPIC mode:
PCI: Via IRQ fixup for 00:10.2, from 10 to 5
PCI: Via IRQ fixup for 00:10.1, from 10 to 5
PCI: Via IRQ fixup for 00:10.0, from 11 to 5
I have no idea if they're a nop or not, but you might experiment with
disabling them.  It sure isn't obvious that something called
quirk_via_irqpic() should be running in IOAPIC mode.
I'd try disabling quirk_via_acpi() too.

cheers,
-Len

ps. to exchange IRQs, you'll need to physically exchange the slots
of the cards, easy enough unless eth0 is soldered onto the
motherboard;-)




* Re: eth*: transmit timed out since .27
  2004-08-16 19:38     ` Len Brown
@ 2004-08-16 20:11       ` Maciej W. Rozycki
  0 siblings, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2004-08-16 20:11 UTC (permalink / raw)
  To: Len Brown; +Cc: Oliver Feiler, Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 16 Aug 2004, Len Brown wrote:

> > MIS:         42
> 
> This is unusual.
> MIS is a hardware workaround and should normally be 0.

 Unfortunately these events seem to be triggerable for all systems using
serial APIC interrupt delivery.  All that is needed is a sufficiently high
load on interrupts, even a transient one.  Admittedly the definition of
"sufficient" here is very high, something like at least ten thousand
interrupts per second.  E.g. I've been able to observe a few of them on my
system when a UDP NFS client was untarring an archive over a 100Mbps
network -- both the archive and the destination were located in an NFS
mounted filesystem and the size of the untarred data was around 300MB.  
The APIC hardware is rock-solid there -- after many years of operation I
have yet to see a single APIC error.

 One "reliable" way of triggering these events is configuring the PIT
timer interrupt input as level-triggered in the I/O APIC. ;-)  This is
actually how I did run-time testing of this code.

  Maciej

* Re: eth*: transmit timed out since .27
  2004-08-16 19:50       ` Len Brown
@ 2004-08-16 23:04         ` Oliver Feiler
  2004-08-16 23:42           ` Maciej W. Rozycki
  2004-08-17  0:29           ` Alan Cox
  0 siblings, 2 replies; 12+ messages in thread
From: Oliver Feiler @ 2004-08-16 23:04 UTC (permalink / raw)
  To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Hi Len,

Len Brown wrote:
> 
> 
> You've got 3 ethernet controllers.
> 
> eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
> eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
> eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.

Correct.

> 
> And eth0 is failing.
> See if you can give its network cable and its IRQ to one of the other
> devices and see if the error follows the load and the wires,
> or stays with the device.

Doing that is a bit problematic: eth0 is a 10mbit NIC, while eth1 and
eth2 unfortunately must be 100mbit. I can move (two of) the NICs around
in the PCI slots, however. The box is headless and a bit uncomfortable
to work with, so I'd like to try software solutions first.

> 
> The quirks for this hardware look totally broken in IOAPIC mode:
> PCI: Via IRQ fixup for 00:10.2, from 10 to 5
> PCI: Via IRQ fixup for 00:10.1, from 10 to 5
> PCI: Via IRQ fixup for 00:10.0, from 11 to 5
> I have no idea if they're a nop or not, but you might experiment with
> disabling them.  Sure isn't obvious that something called
> quirk_via_irqpic() should be running in IOAPIC mode.
> I'd try disabling quirk_via_acpi() too.

OK, I've removed the quirks from quirks.c, recompiled and rebooted. I
hope I did it right; I commented out these lines in quirks.c:

//      { PCI_FIXUP_HEADER,     PCI_VENDOR_ID_VIA,      PCI_DEVICE_ID_VIA_82C586_3,     quirk_via_acpi },
//      { PCI_FIXUP_HEADER,     PCI_VENDOR_ID_VIA,      PCI_DEVICE_ID_VIA_82C686_4,     quirk_via_acpi },
//      { PCI_FIXUP_FINAL,      PCI_VENDOR_ID_VIA,      PCI_DEVICE_ID_VIA_82C586_2,     quirk_via_irqpic },
//      { PCI_FIXUP_FINAL,      PCI_VENDOR_ID_VIA,      PCI_DEVICE_ID_VIA_82C686_5,     quirk_via_irqpic },
//      { PCI_FIXUP_FINAL,      PCI_VENDOR_ID_VIA,      PCI_DEVICE_ID_VIA_82C686_6,     quirk_via_irqpic },

The "Via IRQ fixup for dev:..." lines are gone from the boot messages.
After transferring about 250 MB over eth0, the "Tx timed out" error
recurred.

/proc/interrupts looked like this:

            CPU0
   0:     191473    IO-APIC-edge  timer
   1:       1244    IO-APIC-edge  keyboard
   8:          1    IO-APIC-edge  rtc
   9:          0   IO-APIC-level  acpi
  14:      33547    IO-APIC-edge  ide0
  15:      23121    IO-APIC-edge  ide1
  17:       5699   IO-APIC-level  eth1
  18:     234589   IO-APIC-level  eth0
  21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
  22:          0   IO-APIC-level  via82cxxx
  23:     240873   IO-APIC-level  eth2
NMI:          0
LOC:     191481
ERR:          0
MIS:          8

What exactly is MIS? Something like "interrupt occurred, but I have no
idea what device caused it"? I don't know much about it, but it's
always >0 when the problem happens.

> 
> cheers,
> -Len
> 
> ps. to exchange IRQs, you'll need to physically exchange the slots
> of the cards, easy enough unless eth0 is soldered onto the
> motherboard;-)

Fortunately only eth2 (the VIA Rhine-II) is soldered onto the board. :)

I'll try reordering the NICs in the PCI slots. The system is used most 
of the time though, so I can't take it apart and test things all the 
time. I wonder if it makes sense to experiment with the IOAPIC further.
Maybe the hardware is just plain broken? Or is there a chance of getting
this to work the way it's intended to?

Btw, I don't know if I've ever mentioned it, it's an Asrock K7VM4 board. 
lspci output is here if it might be of interest:

kiza@spot:~> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8378 [KM400] Chipset Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 02)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:10.0 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8378 [S3 UniChrome] Integrated Video (rev 01)

Thanks for your help with this. :)

Oliver

* Re: eth*: transmit timed out since .27
  2004-08-16 23:04         ` Oliver Feiler
@ 2004-08-16 23:42           ` Maciej W. Rozycki
  2004-08-17  0:29           ` Alan Cox
  1 sibling, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2004-08-16 23:42 UTC (permalink / raw)
  To: Oliver Feiler; +Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Tue, 17 Aug 2004, Oliver Feiler wrote:

> MIS:          8
> 
> What exactly is MIS? Something like "interrupt occurred, but I have no
> idea what device caused it"? I don't know much about it, but it's
> always >0 when the problem happens.

 It's a trigger mode MISmatch.  It only happens for level-triggered
interrupts: the problem is that they get recorded as edge-triggered ones
in the receiving local APIC.  The two interrupt trigger modes require the
hardware to perform different actions when the software interrupt handler
concludes, and such a mismatch would lead to a lock-up of the affected
line.  Specifically, the local APIC involved sends an End Of Interrupt
(EOI) message to the originating I/O APIC for level-triggered interrupts,
while for edge-triggered interrupts nothing is sent.  Fortunately, just
before sending the final ACK to the hardware at the conclusion of the
handler, we can detect that the trigger mode recorded by the local APIC
disagrees with the setup of the corresponding I/O APIC line.  If that
happens, we execute an (expensive) unlock action at the I/O APIC so that
it resets its logic for the input as if it had received an EOI message
from a local APIC for a level-triggered interrupt.

  Maciej

* Re: eth*: transmit timed out since .27
  2004-08-16 23:04         ` Oliver Feiler
  2004-08-16 23:42           ` Maciej W. Rozycki
@ 2004-08-17  0:29           ` Alan Cox
  1 sibling, 0 replies; 12+ messages in thread
From: Alan Cox @ 2004-08-17  0:29 UTC (permalink / raw)
  To: Oliver Feiler
  Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti,
	Linux Kernel Mailing List

Looking over the docs, the whole ACPI and IOAPIC mode for these boards
seems very different and quite "magic" compared to the PCI mode, which is
merely "odd" in a few places. APIC routing bits are stuffed into strange
chipset-specific places, which implies the quirks probably shouldn't be
applied in ACPI mode.

end of thread, other threads:[~2004-08-17  1:32 UTC | newest]

Thread overview: 12+ messages
     [not found] <566B962EB122634D86E6EE29E83DD808182C3236@hdsmsx403.hd.intel.com>
2004-08-16 17:52 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Len Brown
2004-08-16 18:44   ` eth*: transmit timed out since .27 Oliver Feiler
2004-08-16 19:08     ` Oliver Feiler
2004-08-16 19:50       ` Len Brown
2004-08-16 23:04         ` Oliver Feiler
2004-08-16 23:42           ` Maciej W. Rozycki
2004-08-17  0:29           ` Alan Cox
2004-08-16 19:38     ` Len Brown
2004-08-16 20:11       ` Maciej W. Rozycki
2004-08-07 23:28 linux-2.4.27 released Marcelo Tosatti
2004-08-10 12:23 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Oliver Feiler
2004-08-13 10:15   ` Marcelo Tosatti
2004-08-13 21:56     ` Oliver Feiler