Date: Sat, 1 Nov 2003 23:04:24 +0100
From: dav@r00tworld.com
To: linux-kernel@vger.kernel.org
Subject: PROBLEM: IRQ balancing with SMP failed with system load
Message-ID: <20031101220424.GD443@moon.r00tworld.com>

Hi,

I am having trouble with Linux 2.4 kernels on an SMP machine with Xeon
processors. The problem seems to be linked to the IRQs generated by network
traffic, and it occurs after a variable period (a few days to a few weeks;
the best so far is 21 days with the Red Hat 9 kernel).

IRQs are not balanced correctly across the processors on this SMP
Red Hat 9/x86 system: under high network traffic (and therefore a lot of
IRQs from the NICs) the load becomes too heavy. This occurs with several
kernels we have tested (2.4.20, 2.4.21, 2.4.22, 2.4.22-ac4). At the moment
we are running the Red Hat kernel:

# cat /proc/version
Linux version 2.4.20-8smp (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu Mar 13 17:45:54 EST 2003

# w | head -1
 22:34:33 up 21 days, 8:04, 3 users, load average: 1.17, 1.02, 0.99

# ps axuw
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START    TIME COMMAND
root         1  0.0  0.0  1372  472 ?        S    Oct07    0:20 init
root         2  0.0  0.0     0    0 ?        SW   Oct07    0:00 [migration/0]
root         3  0.0  0.0     0    0 ?        SW   Oct07    0:00 [migration/1]
root         4  0.0  0.0     0    0 ?        SW   Oct07    0:00 [migration/2]
root         5  0.0  0.0     0    0 ?        SW   Oct07    0:00 [migration/3]
root         6  0.0  0.0     0    0 ?        SW   Oct07    0:00 [keventd]
root         7  6.0  0.0     0    0 ?        RWN  Oct07 1862:18 [ksoftirqd_CPU0]
root         8  0.0  0.0     0    0 ?        SWN  Oct07    0:00 [ksoftirqd_CPU1]
root         9  0.0  0.0     0    0 ?        SWN  Oct07    0:00 [ksoftirqd_CPU2]
root        10  0.0  0.0     0    0 ?        SWN  Oct07    0:00 [ksoftirqd_CPU3]
root        15  0.0  0.0     0    0 ?        SW   Oct07    0:00 [bdflush]
root        11  0.0  0.0     0    0 ?        SW   Oct07    0:11 [kswapd]
root        12  0.0  0.0     0    0 ?        SW   Oct07    0:00 [kscand/DMA]
root        13  0.0  0.0     0    0 ?        SW   Oct07    4:43 [kscand/Normal]
root        14  0.0  0.0     0    0 ?        SW   Oct07    5:47 [kscand/HighMem]
root        16  0.0  0.0     0    0 ?        SW   Oct07    0:05 [kupdated]
root        17  0.0  0.0     0    0 ?        SW   Oct07    0:00 [mdrecoveryd]
root        23  0.0  0.0     0    0 ?        SW   Oct07    0:00 [aacraid]
root        24  0.0  0.0     0    0 ?        SW   Oct07    0:00 [scsi_eh_0]
root        27  0.0  0.0     0    0 ?        SW   Oct07    0:02 [kjournald]
root        84  0.0  0.0     0    0 ?        SW   Oct07    0:00 [khubd]
root       642  0.0  0.0     0    0 ?        SW   Oct07    0:00 [kjournald]
root       643  0.0  0.0     0    0 ?        SW   Oct07    0:13 [kjournald]
root      1210  0.0  0.0  1448  588 ?        S    Oct07    0:09 syslogd -m 0
root      1214  0.0  0.0  1368  428 ?        S    Oct07    0:00 klogd -x
root      1266  0.0  0.1  3580 1396 ?        S    Oct07    0:02 /usr/sbin/sshd
.../...

This system is used only to route packets (asymmetric routing, from inside
our network to the outside only), without any iptables rules (because of
this problem, although we had initially planned to set netfilter rules).

When the system is loaded (by the ksoftirqd_CPU0 process), some data is
lost: every 10 minutes the monitoring system reports CRITICAL alerts
(=> 50% of the warning reports are for lost packets). The traffic report
when this issue occurs looks like this:

.../...
eth5|Recv: 2.169G|Sent: 1.591G|Recv Speed: 1.25Kb/s|Sent speed: 1.03Mb/s|
eth5|Recv: 2.169G|Sent: 1.591G|Recv Speed: nanb/s|Sent speed: nanb/s|
eth5|Recv: 2.169G|Sent: 1.594G|Recv Speed: 0b/s|Sent speed: 776.96Kb/s|
eth5|Recv: 2.169G|Sent: 1.596G|Recv Speed: 1.75Kb/s|Sent speed: 958.87Kb/s|
eth5|Recv: 2.169G|Sent: 1.599G|Recv Speed: 939b/s|Sent speed: 978.17Kb/s|
eth5|Recv: 2.169G|Sent: 1.603G|Recv Speed: 640b/s|Sent speed:1009.87Kb/s|
eth5|Recv: 2.169G|Sent: 1.607G|Recv Speed: 85b/s|Sent speed: 1.29Mb/s|
eth5|Recv: 2.169G|Sent: 1.611G|Recv Speed: 853b/s|Sent speed: 1.28Mb/s|
eth5|Recv: 2.169G|Sent: 1.615G|Recv Speed: 0b/s|Sent speed: 1.68Mb/s|
.../...

And we can see:

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:  184391788          0          0          0    IO-APIC-edge  timer
  1:        953          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:          2          0          0          0    IO-APIC-edge  ide0
 16:    6640100          0          0          0   IO-APIC-level  eth0
 20:    2347915          0          0          0   IO-APIC-level  eth2
 21:    3242586          0          0          0   IO-APIC-level  eth3
 28: 1362515262          0          0          0   IO-APIC-level  eth4
 29: 1127825818          0          0          0   IO-APIC-level  eth5
 30:     831193          0          0          0   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:  184394043  184394090  184394090  184394089
ERR:          0
MIS:          0

Data is transmitted as follows:

--------------------- OUT NETWORK ---------------------->|<--- OUTSIDE (INTERNET) -----
...applications servers +--------+(eth4) Linux box (eth5)+----------+ router +--- ...

eth4      Link encap:Ethernet  HWaddr 00:06:5B:F3:61:5F
          inet addr:192.168.20.245  Bcast:192.168.20.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2184189475 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2195331 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:717500525 (684.2 Mb)  TX bytes:153729558 (146.6 Mb)
          Interrupt:28

eth5      Link encap:Ethernet  HWaddr 00:06:5B:F3:61:60
          inet addr:194.xxx.xxx.248  Bcast:194.xxx.xxx.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26145681 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2205082845 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:2328890724 (2221.0 Mb)  TX bytes:1860452584 (1774.2 Mb)
          Interrupt:29

Additional information about the system can be found at the end of this
message.

Thanks in advance if anyone knows how to fix or work around this issue ;-(

For now, the only remedy we have found is to switch over to the other
gateway through our H.A. process and reboot the box whose kernel process
is "broken" ;-(

dav.
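
As a rough way to quantify the imbalance visible in the /proc/interrupts
output above, the following Python sketch sums the numbered IRQ lines per
CPU and prints each CPU's share. The function names and the trimmed SAMPLE
text are illustrative assumptions and are not part of the original report;
this is a minimal sketch, not a diagnostic tool.

```python
# Sketch: per-CPU share of device interrupts from /proc/interrupts-style text.
# SAMPLE is a trimmed copy of the output quoted in this message (assumption:
# the first line is the CPU header and per-CPU counts follow the IRQ label).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 28: 1362515262          0          0          0   IO-APIC-level  eth4
 29: 1127825818          0          0          0   IO-APIC-level  eth5
LOC:  184394043  184394090  184394090  184394089
ERR:          0
"""


def per_cpu_totals(text):
    """Sum the counters of numbered IRQ lines for each CPU column."""
    lines = text.splitlines()
    cpus = lines[0].split()
    totals = [0] * len(cpus)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip NMI, LOC, ERR, MIS and similar summary lines
        fields = rest.split()
        if len(fields) < len(cpus):
            continue  # malformed or truncated line
        for i in range(len(cpus)):
            totals[i] += int(fields[i])
    return dict(zip(cpus, totals))


def shares(totals):
    """Convert per-CPU totals into fractions of the overall count."""
    grand = sum(totals.values())
    return {cpu: (n / grand if grand else 0.0) for cpu, n in totals.items()}


if __name__ == "__main__":
    for cpu, share in shares(per_cpu_totals(SAMPLE)).items():
        print(f"{cpu}: {share:.1%}")
```

To run it against a live system, the text of /proc/interrupts can be read
from the file and passed to per_cpu_totals() in place of SAMPLE.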
-- 
# cat /proc/modules
tg3                    52904   2
e100                   62340   3
ipt_LOG                 4280   0 (autoclean)
ipt_limit               1688   0 (autoclean)
ipt_REJECT              3928   0 (autoclean)
iptable_mangle          2776   0 (autoclean) (unused)
ipt_state               1080   0 (autoclean)
iptable_nat            22904   0 (autoclean)
ip_conntrack           29696   2 (autoclean) [ipt_state iptable_nat]
iptable_filter          2412   0 (autoclean)
ip_tables              15864   9 [ipt_LOG ipt_limit ipt_REJECT iptable_mangle ipt_state iptable_nat iptable_filter]
keybdev                 2976   0 (unused)
mousedev                5656   0 (unused)
hid                    22308   0 (unused)
input                   6208   0 [keybdev mousedev hid]
usb-ohci               22216   0 (unused)
usbcore                82592   1 [hid usb-ohci]
ext3                   73376   3
jbd                    56336   3 [ext3]
aacraid                32580   4
sd_mod                 13452   8
scsi_mod              110488   2 [aacraid sd_mod]

# cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial(auto)
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
08b0-08bf : ServerWorks CSB5 IDE Controller
  08b0-08b7 : ide0
  08b8-08bf : ide1
0cf8-0cff : PCI conf1
9000-9fff : PCI Bus #07
  9800-98ff : Adaptec RAID subsystem HBA (#2)
  9c00-9cff : Adaptec RAID subsystem HBA
c000-cfff : PCI Bus #03
  ccc0-ccdf : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#4)
    ccc0-ccdf : e100
  cce0-ccff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#3)
    cce0-ccff : e100
d000-dfff : PCI Bus #02
  dcc0-dcdf : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#2)
    dcc0-dcdf : e100
  dce0-dcff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
    dce0-dcff : e100
e800-e8ff : ATI Technologies Inc Rage XL
ec80-ecbf : Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
ece8-ecef : Dell Computer Corporation Embedded Systems Management Device 4
ecf4-ecf7 : PCI device 1028:000d (Dell Computer Corporation)
ecf8-ecff : Dell Computer Corporation Embedded Systems Management Device 4

# cat /proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cc000-000cc5ff : Extension ROM
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
  00100000-002720eb : Kernel code
  002720ec-00383ba3 : Kernel data
3fff0000-3fffebff : ACPI Tables
3fffec00-3fffefff : reserved
f0000000-f7ffffff : Dell Computer Corporation PowerEdge Expandable RAID Controller 3
fc300000-fc4fffff : PCI Bus #07
  fc3fe000-fc3fefff : Adaptec RAID subsystem HBA (#2)
  fc3ff000-fc3fffff : Adaptec RAID subsystem HBA
fc600000-fc60ffff : Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (#2)
  fc600000-fc60ffff : tg3
fc610000-fc61ffff : Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet
  fc610000-fc61ffff : tg3
fc700000-fc7fffff : PCI Bus #03
  fc7fe000-fc7fefff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#4)
    fc7fe000-fc7fefff : e100
  fc7ff000-fc7fffff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#3)
    fc7ff000-fc7fffff : e100
fc800000-fc8fffff : PCI Bus #02
  fc8fe000-fc8fefff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#2)
    fc8fe000-fc8fefff : e100
  fc8ff000-fc8fffff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
    fc8ff000-fc8fffff : e100
fca00000-fccfffff : PCI Bus #03
  fca00000-fcafffff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#4)
    fca00000-fcafffff : e100
  fcb00000-fcbfffff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#3)
    fcb00000-fcbfffff : e100
fcd00000-fcffffff : PCI Bus #02
  fcd00000-fcdfffff : Intel Corp. 82557/8/9 [Ethernet Pro 100] (#2)
    fcd00000-fcdfffff : e100
  fce00000-fcefffff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
    fce00000-fcefffff : e100
fd000000-fdffffff : ATI Technologies Inc Rage XL
fe100000-fe100fff : ServerWorks OSB4/CSB5 OHCI USB Controller
  fe100000-fe100fff : usb-ohci
fe101000-fe101fff : ATI Technologies Inc Rage XL
fe102000-fe102fff : Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
feb00000-feb7ffff : Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
feb80000-feb80fff : Dell Computer Corporation Embedded Systems Management Device 4
fec00000-fec0ffff : reserved
fee00000-fee0ffff : reserved
fff80000-ffffffff : reserved

00:00.0 Host bridge: ServerWorks CMIC-LE (rev 13)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-
        Region 1: I/O ports at
        Region 2: I/O ports at
        Region 3: I/O ports at
        Region 4: I/O ports at 08b0 [size=16]

00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+

01:08.0 PCI bridge: Intel Corp. 21152 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- Reset- FastB2B-
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+

02:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 05)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Dual Port Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-
        Capabilities: [68] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

06:08.1 RAID bus controller: Dell Computer Corporation PowerEdge Expandable RAID Controller 3 (rev 01)
        Subsystem: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- SERR- TAbort- SERR- TAbort- SERR-