From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Denys Fedoryshchenko"
Subject: kernel 2.6.25-rc7 highly unstable on high load
Date: Thu, 27 Mar 2008 08:35:06 +0200
Message-ID: <20080327062502.M51594@visp.net.lb>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
To: netdev@vger.kernel.org
Return-path:
Received: from usermail.globalproof.net ([194.146.153.18]:43714 "EHLO usermail.globalproof.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753519AbYC0GfR (ORCPT ); Thu, 27 Mar 2008 02:35:17 -0400
Received: from visp.net.lb (localhost [127.0.0.1]) by usermail.globalproof.net (Postfix) with ESMTP id 20515BD396 for ; Thu, 27 Mar 2008 08:35:06 +0200 (EET)
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi again,

It seems I am having very bad luck with 2.6.25. As Linus said, it is due to be released soon, but it is crashing like hell under high network load. Even after I changed the FIB to HASH, it still gets stuck after 3-4 hours of running.

And NOTHING helps: the software watchdog (the hardware iTCO_wdt is probably not working at all on ICH9 Intel boards), nmi_watchdog (even with panic on oops set, and reboot on panic set too), hangcheck-timer, a shell script with a ping watchdog. Nothing helps. So people with heavily loaded machines and no IPMI or management cards will probably be very "happy".

Right now my watchdog just does a malloc/free loop; I will try to modify it so that it sends \000 only when a ping has succeeded. During one crash I had netconsole enabled; during the next crash it was disabled.
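
The ping-gated watchdog idea mentioned above could look roughly like the sketch below. It is only an illustration under assumptions, not the watchdog actually running on this router: the function names ping_ok and feed_watchdog are hypothetical, the device path /dev/watchdog and the target address are placeholders, and the ping flags are those of the Linux iputils ping.

```python
import subprocess
import time


def ping_ok(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds (system ping, Linux flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def feed_watchdog(dev, check, rounds, interval_s=1.0):
    """Run `rounds` checks; write one NUL byte to dev after each successful one.

    dev is any writable binary file object (for example an opened watchdog
    device); check is a callable returning True when the box is healthy.
    Returns how many times the watchdog was fed.
    """
    fed = 0
    for _ in range(rounds):
        if check():
            dev.write(b"\0")  # the \000 keepalive, sent only on success
            dev.flush()
            fed += 1
        time.sleep(interval_s)
    return fed


# Hypothetical usage (device path and address are assumptions):
#   with open("/dev/watchdog", "wb", buffering=0) as wd:
#       while True:
#           feed_watchdog(wd, lambda: ping_ok("192.0.2.1"), rounds=60, interval_s=5)
```

The point of this design is that the keepalive depends on the network path really working, so a box that still schedules userspace but no longer forwards packets stops feeding the watchdog and gets rebooted.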

Here is the message I got over syslog on the last crash (it was 2.6.25-rc6-git6), also available at http://www.nuclearcat.com/files/crash_2.6.25.txt

Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c02ad109, registers:
Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 task=f7459080 task.ti=f70b7000)
Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: c092eb14 c011991e f750d600 f750d600 c0378058 00000001 c092eb34 c0119b3b
Mar 26 02:27:14 ROUTER [ 4698.694693]        00000000 00000001 00000082 f708af88 c0378058 00000001 c092eb3c c0119bfe
Mar 26 02:27:14 ROUTER [ 4698.694693]        c092eb50 c012f19c 00000000 f708af88 c0378058 c092eb74 c011652a 00000000
Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? task_rq_lock+0x31/0x58
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? try_to_wake_up+0x19/0xd1
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? default_wake_function+0xb/0xd
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? autoremove_wake_function+0xf/0x33
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? __wake_up_common+0x2f/0x5a
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? __wake_up+0x28/0x3b
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? wake_up_klogd+0x2e/0x31
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? release_console_sem+0x197/0x19f
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? vprintk+0x295/0x2e5
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? death_by_timeout+0x8b/0xa3 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? tcp_packet+0x931/0x9e5 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? printk+0x15/0x17
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? warn_on_slowpath+0x2a/0x51
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? __update_rq_clock+0x1c/0x126
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? update_curr+0x48/0x64
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? nf_nat_setup_info+0x3cc/0x55a [nf_nat]
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? dequeue_rt_entity+0x88/0x171
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? dequeue_rt_stack+0x22/0x27
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? enqueue_task_rt+0x19/0x2c
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? enqueue_task+0xd/0x18
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? activate_task+0x1e/0x2b
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? try_to_wake_up+0x8f/0xd1
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? wake_up_process+0xf/0x11
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? softlockup_tick+0x9d/0x10b
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? run_local_timers+0x17/0x19
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? update_process_times+0x24/0x49
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? tick_periodic+0x62/0x6e
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? tick_handle_periodic+0x19/0x68
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? smp_apic_timer_interrupt+0x6c/0x81
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? apic_timer_interrupt+0x28/0x30
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? _spin_lock_bh+0x20/0x22
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? rt_garbage_collect+0x132/0x27a
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? dst_alloc+0x19/0x63
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? ip_route_input+0x6b9/0xbd9
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? ip_rcv_finish+0x2c/0x29a
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? ip_rcv+0x202/0x22c
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? netif_receive_skb+0x33e/0x3a9
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? process_backlog+0x62/0xb5
Mar 26 02:27:14 ROUTER [ 4698.694693] [] ? net_rx_action+0x8f/0x191
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? __do_softirq+0x64/0xcd
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? do_softirq+0x55/0x89
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? local_bh_enable+0x61/0x6d
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? lock_sock_nested+0x83/0x8b
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? udp_destroy_sock+0xd/0x20
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? sk_common_release+0x15/0x60
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? udp_lib_close+0x8/0xa
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? inet_release+0x42/0x48
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? sock_release+0x14/0x60
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? sock_close+0x29/0x30
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? __fput+0x93/0x135
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? fput+0x17/0x19
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? filp_close+0x47/0x51
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? sys_close+0x68/0x9d
Mar 26 02:27:14 ROUTER [ 4698.694694] [] ? sysenter_past_esp+0x5f/0x85
Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
Mar 26 02:27:14 ROUTER [ 4698.694694] Code: 94 c0 84 c0 b9 01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 5d 89 c8 c3 55 ba 00 01 00 00 89 e5 f0 66 0f c1 10 38 f2 74 06 f3 90 <8a> 10 eb f6 5d c3 55 89 e5 f0 81 28 00 00 00 01 74 05 e8 64 fd

-- 
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.