From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S263399AbUDTRyc (ORCPT ); Tue, 20 Apr 2004 13:54:32 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S263688AbUDTRyb (ORCPT ); Tue, 20 Apr 2004 13:54:31 -0400
Received: from fay-gateway.aitcom.net ([208.234.1.225]:60826 "EHLO group-cio.aitcorp.ait")
    by vger.kernel.org with ESMTP id S263656AbUDTRx0 (ORCPT );
    Tue, 20 Apr 2004 13:53:26 -0400
Message-ID: <40856395.1020703@nova.org>
Date: Tue, 20 Apr 2004 13:53:25 -0400
From: Ken
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: [2.6][NETFILTER][e1000 w/NAPI] repeatable deadlock in uP & SMP
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Greetings:

[This is a re-submit -- I didn't see it in the archives and there was no response. I apologize in advance to anyone who receives this as a duplicate.]

Summary:

I have an Intel SR2300 based on an Intel SE7501-WV2 MB with two Xeon 2.4G (HT) CPUs and E7500 (Plumas) chip set. There is an Adaptec 2000S ZCR card and two Intel 82544EI Gigabit Ethernet PCI cards on the bus. The system is used as a network monitoring system using iptables/netfilter to feed Snort and NTOP with libpcap.

The hardware is rock solid using a 2.4.[18-26] kernel. However, the same hardware using kernel 2.6.[1-5] exhibits a repeatable deadlock or freeze after varying periods of operation.

I have "googled", lurked, and searched list archives for linux-kernel and netfilter. I have searched netfilter's bug reports. I have attempted UP and SMP kernel configurations in an attempt to eliminate or narrow the problem. I have attempted to follow Documentation/oops-tracing.txt.
Problem:

In kernel 2.6.[1-5] (I could not run 2.6.0) using a UP or SMP build and running iptables/netfilter modules, the system deadlocks/freezes after operating for 3 minutes to 5 hours. The shortest time to lock was with 2.6.4 UP -- the longest with 2.6.3 SMP. Prior to each occurrence the system was seeing 12K-20K interrupts per second and 8K-20K context switches per second according to 'vmstat 1' output. Practically all of this load is related to a single Intel fiber NIC sniffing a SPAN port on a Cisco switch.

If I ONLY run NTOP and sniff the NIC via libpcap, the system performs normally. In all cases the NIC is driven by the e1000 driver compiled with NAPI support. I use 'ethtool' to increase the RX ring parameters to 4096 (max) -- this is to prevent drops, which occur otherwise.

The only diagnostic I've been able to get was using the NMI oopser to produce a dump on console. When the deadlock occurs, there is no response via network or keyboard. However, with a UP kernel, on some occasions I can press as in and receive another trace. After that, it's hardware reset. The SMP build rarely generates an oops to console -- it's just locked.

The dumps to console vary at the top of the screen and usually contain garbage down to Stack. On one occasion using a UP build, the dump did indicate that EIP was at scheduler_tick.
Regardless of the garbage at the top of the screen or the sub-version of kernel, the call trace for UP is consistent and is hand copied here (from 2.6.4 UP configuration):

Call trace:
 [] __kfree_skb + 0xa3/0x128
 [] update_process_times + 0x46/0x52
 [] do_timer + 0x34/0xe4
 [] timer_interrupt + 0x41/0xf3
 [] handle_IRQ_event + 0x3a/0x64
 [] do_IRQ + 0x72/0xe5
 [] common_interrupt + 0x18/0x20
 [] skb_copy_bits + 0x49/0x261
 [] printk + 0x113/0x146
 [] dump_packet + 0x39f/0x8c4 [ipt_LOG]
 [] recalc_task_prio + 0x92/0x19d

Code: 0f ab 50 08 83 c4 18 5b 5e 5f 5d c3 8b 4e 18 83 f9 63 89 4d
<0> Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

Using the information above, the code (at EIP when panic occurred, I believe) is in the 'scheduler_tick' function in kernel/sched.c. Using objdump on kernel/sched.o, I did find that code sequence associated with scheduler_tick.

Using SMP builds the troubleshooting is much harder. The NMI oopser rarely succeeds in getting anything to console and doesn't work. On the occasion when a trace was present, I have hand copied it here (from 2.6.5 SMP configuration):

Process swapper (pid: 0, threadinfo=f7f9e000 task=c247d190
Stack: [ elided ]
Call trace:
 [] drain_array+0x7b/0xb6
 [] cache_reap+0x8b/0x16e
 [] reap_timer_fnc+0x22/0x46
 [] reap_timer_fnc+0x0/0x46
 [] run_timer_softirq+0xc7/0x180
 [] do_softirq+0xc3/0xc5
 [] smp_apic_timer_interrupt+0xd2/0x13a
 [] default_idle+0x0/0x2d
 [] apic_timer_interrupt+0x1a/0x20
 [] default_idle+0x0/0x2d
 [] default_idle+0x2a/0x2d
 [] cpu_idle+0x37/0x40
 [] printk+0x173/0x1a9

Code: 89 50 04 89 02 c7 43 04 00 02 20 00 2b 4b 0c c7 03 00 01 10
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

I found the byte code at EIP in mm/slab.o using objdump.
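
For anyone repeating this lookup, the matching of the oops "Code:" bytes against 'objdump -d' output can be automated. The following is only a minimal Python sketch and was not used for this report: the function names parse_oops_code and find_code are hypothetical, and it assumes the usual tab-separated objdump layout (address, hex bytes, mnemonic). The sample input is the 2.6.5 SMP "Code:" line and the matching instructions from mm/slab.o.

```python
import re

# One byte of an oops "Code:" line; the byte at EIP may be wrapped in <>.
HEX_BYTE = re.compile(r"<?([0-9a-fA-F]{2})>?")


def parse_oops_code(line):
    """Return the bytes listed on an oops 'Code:' line."""
    _, _, body = line.partition("Code:")
    values = []
    for token in body.split():
        match = HEX_BYTE.fullmatch(token)
        if match:  # skips non-byte tokens such as the "<0>" log-level prefix
            values.append(int(match.group(1), 16))
    return bytes(values)


def find_code(objdump_text, needle):
    """Return every address in 'objdump -d' text where needle occurs.

    Assumes tab-separated lines: '<addr>:\\t<hex bytes>\\t<mnemonic>'.
    """
    if not needle:
        return []
    blob = bytearray()  # all instruction bytes, in listing order
    addrs = []          # addrs[i] is the address of blob[i]
    for line in objdump_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2 or not parts[0].strip().endswith(":"):
            continue  # symbol headers, blank lines, section titles
        try:
            addr = int(parts[0].strip()[:-1], 16)
            data = bytes.fromhex(parts[1])
        except ValueError:
            continue
        blob += data
        addrs.extend(range(addr, addr + len(data)))
    hits = []
    start = blob.find(needle)
    while start != -1:
        end = start + len(needle) - 1
        # Reject matches that straddle a gap in an excerpted listing.
        if addrs[end] - addrs[start] == len(needle) - 1:
            hits.append(addrs[start])
        start = blob.find(needle, start + 1)
    return hits


SAMPLE_CODE = "Code: 89 50 04 89 02 c7 43 04 00 02 20 00 2b 4b 0c c7 03 00 01 10"
SAMPLE_OBJDUMP = "\n".join([
    "    1123:\t89 50 04             \tmov    %edx,0x4(%eax)",
    "    1126:\t89 02                \tmov    %eax,(%edx)",
    "    1128:\tc7 43 04 00 02 20 00 \tmovl   $0x200200,0x4(%ebx)",
    "    112f:\t2b 4b 0c             \tsub    0xc(%ebx),%ecx",
    "    1132:\tc7 03 00 01 10 00    \tmovl   $0x100100,(%ebx)",
])

hits = find_code(SAMPLE_OBJDUMP, parse_oops_code(SAMPLE_CODE))
print([hex(a) for a in hits])  # prints ['0x1123']
```

In practice the second argument would be the full text produced by 'objdump -d mm/slab.o'; the reported offset is relative to the object file's .text section, not the running kernel's address.
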
This is what I see:

slab.o: file format elf32-i386

Disassembly of section .text:

00000000 :
       0: 55                       push %ebp
       1: 57                       push %edi
       2: 31 ff                    xor %edi,%edi

000010ce :
    10ce: 55                       push %ebp
    10cf: 31 c0                    xor %eax,%eax
    10d1: 57                       push %edi
    10d2: 56                       push %esi
    10d3: 53                       push %ebx
    1123: 89 50 04                 mov %edx,0x4(%eax) [<-code start]
    1126: 89 02                    mov %eax,(%edx)
    1128: c7 43 04 00 02 20 00     movl $0x200200,0x4(%ebx)
    112f: 2b 4b 0c                 sub 0xc(%ebx),%ecx
    1132: c7 03 00 01 10 00        movl $0x100100,(%ebx)
    1138: 31 d2                    xor %edx,%edx
    113a: 89 c8                    mov %ecx,%eax

Unfortunately, that's about the limit of my ability to analyze it. I surmise that the netfilter code is the trigger, but not necessarily the cause -- could be NAPI code, e1000 driver corner case, memory management, or something else.

The problem is repeatable with both UP and SMP kernels, but only so long as the iptables/netfilter modules are loaded and in use -- I can't make the panic/deadlock occur any other way. I have let it run for over 72 hours without a problem using only NTOP to sniff traffic via the e1000/NAPI driver.

The rest of the normally requested information -- ver_linux, lspci, config, /proc, and boot dmesg are below:

Linux scythe 2.6.5 #3 SMP Thu Apr 15 13:04:50 EDT 2004 i686 unknown

Gnu C                  3.2.2
Gnu make               3.80
binutils               2.13.90.0.18
util-linux             2.11z
mount                  2.11z
module-init-tools      3.0
e2fsprogs              1.32
jfsutils               1.1.1
reiserfsprogs          3.6.4
xfsprogs               2.3.5
pcmcia-cs              3.2.4
quota-tools            3.08.
PPP                    2.4.1
nfs-utils              1.0.4
Linux C Library        2.3.1
Dynamic linker (ldd)   2.3.1
Linux C++ Library      5.0.2
Procps                 3.1.8
Net-tools              1.60
Kbd                    1.08
Sh-utils               2.0
Modules Loaded         ipt_length ipt_mac ipt_limit ip_queue ipt_LOG adm1021 i2c_sensor i2c_i801 i2c_core ehci_hcd uhci_hcd

00:00.0 Host bridge: Intel Corp.
e7500 [Plumas] DRAM Controller (rev 03)
    Subsystem: Intel Corp.: Unknown device 3415
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-

00:03.1 Class ff00: Intel Corp. e7500 [Plumas] HI_C Virtual PCI Bridge (F1) (rev 03)
    Subsystem: Intel Corp.: Unknown device 3415
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corp. 82801CA ISA Bridge (LPC) (rev 02)
    Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-
    Region 1: I/O ports at
    Region 2: I/O ports at
    Region 3: I/O ports at
    Region 4: I/O ports at 03a0 [size=16]
    Region 5: Memory at 80000000 (32-bit, non-prefetchable) [size=1K]

00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus (rev 02)
    Subsystem: Intel Corp.: Unknown device 3415
    Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-
    Capabilities: [50] PCI-X bridge device.
        Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=0
        Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, SCO-, SRD-
        : Upstream: Capacity=0, Commitment Limit=0
        : Downstream: Capacity=0, Commitment Limit=0

02:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 03) (prog-if 20 [IO(X)-APIC])
    Subsystem: Intel Corp.: Unknown device 3415
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- Reset- FastB2B-
    Capabilities: [50] PCI-X bridge device.
        Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=1
        Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, SCO-, SRD-
        : Upstream: Capacity=0, Commitment Limit=0
        : Downstream: Capacity=0, Commitment Limit=0

03:07.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
    Subsystem: Intel Corp.: Unknown device 3415
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-