From: Petr Stehlik
Date: Mon, 17 Sep 2007 18:28:22 +0200
To: linux-kernel@vger.kernel.org
Subject: forcedeth kernel panic
Message-ID: <46EEAB26.6050400@sophics.cz>

Hi,

an ASUS M2N32 WS Pro (nVidia MCP55 chipset) based machine with on-board
Gbit ethernet panics under high network load. The machine is meant to be
a Samba server and has a minimal 64-bit Debian Etch installed. It first
crashed with the stock Debian 2.6.18-amd64 kernel, so I upgraded to
2.6.21 and finally to 2.6.22-2-amd64 (source from Debian). The crashes
varied per kernel but were always fatal (only a hard reset helped), so I
decided to post here as well (in addition to Debian's BTS #442877).

The crash occurs under high network load generated by tserv from the
dbench package, within about 20 minutes of a tserv test run from another
machine against this machine (which is running tserv_srv).

Before it crashes it fills the kernel log with the following messages,
which may or may not be related to the crash:

Sep 17 14:51:27 harapes kernel: eth0: too many iterations (6) in nv_nic_irq.
Sep 17 14:51:58 harapes last message repeated 1026 times
Sep 17 14:52:59 harapes last message repeated 2063 times
Sep 17 14:54:00 harapes last message repeated 2055 times
Sep 17 14:55:01 harapes last message repeated 2044 times

I say it may not be related because I have an older nForce-based machine
here that runs tserv against the crashing server, and it fills its log
with the same messages, but fortunately it does not crash...

After killing the machine several times in a row I googled a bit and
found some suggestions, so now I am testing a different setup: the
forcedeth driver loaded with the "optimization_mode=1" parameter. So far
(95 minutes of tserv run) it hasn't crashed...

More details about the hardware: AMD64 3600+ (=2GHz), 2GB of DDR2,
6 SATA drives in RAID1 and RAID5 configuration on the on-board SATA
controller, a PCI S3 graphics card and that's it.

dmesg output related to networking:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:81fb bound to 0000:00:10.0
eth0: no IPv6 routers present

lspci -vv:

00:10.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
        Subsystem: ASUSTeK Computer Inc. Unknown device 81fb
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-

:forcedeth: nv_nic_irq_optimized+0x89/0x22c
handle_IRQ_event+0x25/0x53
__do_softirq+0x55/0xc3
handle_edge_irq+0xe4/0x127
do_IRQ+0x6c/0xd5
default_idle+0x0/0x3d
ret_from_intr+0x0/0xa
default_idle+0x29/0x3d
cpu_idle+0x8b/0xae

Code: 8a 83 84 00 00 00 83 e0 f3 83 c8 04 88 83 84 00 00 00 83 7b
RIP :forcedeth:nv_rx_process_optimized+0xe6/0x380
Kernel panic - not syncing: Aiee, killing interrupt handler!
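
As a rough gauge of how fast the "too many iterations" warnings pile up,
the repeat counts in the syslog excerpt above can be turned into a
per-second rate. This is only a sketch: the helper name warning_rates is
made up for illustration, the year is assumed to be 2007 (classic syslog
timestamps carry none), and each repeat count is assumed to cover exactly
the interval since the previous log line.

```python
import re
from datetime import datetime

# Syslog lines copied from the report above.
LOG = """\
Sep 17 14:51:27 harapes kernel: eth0: too many iterations (6) in nv_nic_irq.
Sep 17 14:51:58 harapes last message repeated 1026 times
Sep 17 14:52:59 harapes last message repeated 2063 times
Sep 17 14:54:00 harapes last message repeated 2055 times
Sep 17 14:55:01 harapes last message repeated 2044 times
"""

REPEAT = re.compile(r"last message repeated (\d+) times")


def warning_rates(log, year=2007):
    """Return (timestamp, repeats, repeats_per_second) per 'repeated' line.

    Hypothetical helper: assumes English month names and that each count
    spans the time since the previous line in the excerpt.
    """
    rates = []
    previous = None
    for line in log.splitlines():
        # The first 15 characters hold the syslog timestamp (no year).
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        match = REPEAT.search(line)
        if match and previous is not None:
            seconds = (stamp - previous).total_seconds()
            count = int(match.group(1))
            rates.append((stamp, count, count / seconds))
        previous = stamp
    return rates


for stamp, count, rate in warning_rates(LOG):
    print(f"{stamp:%H:%M:%S}  {count:5d} repeats  {rate:5.1f}/s")
```

For the excerpt above, every interval works out to between 33 and 34
warnings per second.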

I may have to replace the on-board ethernet with a PCI card because I
need a reliable server very soon, and once it is deployed I won't get a
chance to experiment with it anymore. So if there is anything I could
try now to make forcedeth stable, please let me know soon.

Is "optimization_mode=1" the right solution? What kind of negative
impact does it have?

Thanks!

Petr