* kernel 2.6.25-rc7 highly unstable on high load @ 2008-03-27 6:35 Denys Fedoryshchenko 2008-03-27 6:40 ` David Miller 0 siblings, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 6:35 UTC (permalink / raw) To: netdev Hi again It seems i am having very bad luck with 2.6.27. As Linus told, it have to be released soon, but it is crashing like hell on high network load. Even i change FIB to HASH, still after 3-4 hours running it stuck. And NOTHING helps. Software watchdog (hardware iTCO_wdt probably not working at all on ICH9 Intel boards),nmi_watchdog (even panic on oops set, and reboot on panic set too), hangcheck-timer, shell script with ping watchdog. Nothing helps. So probably people with heavy loaded machines without IPMI and management cards will be very "happy". I have now watchdog just doing malloc/free loop, i will try to modify it, so it will send \000 only when ping succeeded. One crash i had netconsole enabled, next crash it was disabled. Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6), available also at http://www.nuclearcat.com/files/crash_2.6.25.txt Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 task=f7459080 task.ti=f70b7000) Mar 26 02:27:14 ROUTER Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: Mar 26 02:27:14 ROUTER c092eb14 Mar 26 02:27:14 ROUTER c011991e Mar 26 02:27:14 ROUTER f750d600 Mar 26 02:27:14 ROUTER f750d600 Mar 26 02:27:14 ROUTER c0378058 Mar 26 02:27:14 ROUTER 00000001 Mar 26 02:27:14 ROUTER c092eb34 Mar 26 02:27:14 ROUTER c0119b3b Mar 26 02:27:14 ROUTER Mar 26 02:27:14 ROUTER [ 4698.694693] Mar 26 02:27:14 ROUTER 00000000 Mar 26 02:27:14 ROUTER 00000001 Mar 26 02:27:14 ROUTER 00000082 Mar 26 02:27:14 ROUTER f708af88 Mar 26 02:27:14 ROUTER c0378058 Mar 26 02:27:14 ROUTER 00000001 Mar 26 02:27:14 ROUTER c092eb3c Mar 26 02:27:14 ROUTER c0119bfe Mar 26 02:27:14 ROUTER Mar 26 02:27:14 ROUTER [ 4698.694693] Mar 26 02:27:14 ROUTER c092eb50 Mar 26 02:27:14 ROUTER c012f19c Mar 26 02:27:14 ROUTER 00000000 Mar 26 02:27:14 ROUTER f708af88 Mar 26 02:27:14 ROUTER c0378058 Mar 26 02:27:14 ROUTER c092eb74 Mar 26 02:27:14 ROUTER c011652a Mar 26 02:27:14 ROUTER 00000000 Mar 26 02:27:14 ROUTER Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER printk+0x15/0x17 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER __fput+0x93/0x135 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER fput+0x17/0x19 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] Mar 26 02:27:14 ROUTER ? Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= Mar 26 02:27:14 ROUTER [ 4698.694694] Code: Mar 26 02:27:14 ROUTER 94 Mar 26 02:27:14 ROUTER c0 Mar 26 02:27:14 ROUTER 84 Mar 26 02:27:14 ROUTER c0 Mar 26 02:27:14 ROUTER b9 Mar 26 02:27:14 ROUTER 01 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 75 Mar 26 02:27:14 ROUTER 09 Mar 26 02:27:14 ROUTER f0 Mar 26 02:27:14 ROUTER 81 Mar 26 02:27:14 ROUTER 02 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 01 Mar 26 02:27:14 ROUTER 30 Mar 26 02:27:14 ROUTER c9 Mar 26 02:27:14 ROUTER 5d Mar 26 02:27:14 ROUTER 89 Mar 26 02:27:14 ROUTER c8 Mar 26 02:27:14 ROUTER c3 Mar 26 02:27:14 ROUTER 55 Mar 26 02:27:14 ROUTER ba Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 01 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 89 Mar 26 02:27:14 ROUTER e5 Mar 26 02:27:14 ROUTER f0 Mar 26 02:27:14 ROUTER 66 Mar 26 02:27:14 ROUTER 0f Mar 26 02:27:14 ROUTER c1 Mar 26 02:27:14 ROUTER 10 Mar 26 02:27:14 ROUTER 38 Mar 26 02:27:14 ROUTER f2 Mar 26 02:27:14 ROUTER 74 Mar 26 02:27:14 ROUTER 06 Mar 26 02:27:14 ROUTER f3 Mar 26 02:27:14 ROUTER 90 Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " Mar 26 02:27:14 ROUTER 10 Mar 26 02:27:14 ROUTER eb Mar 26 02:27:14 ROUTER f6 Mar 26 02:27:14 ROUTER 5d Mar 26 02:27:14 ROUTER c3 Mar 26 02:27:14 ROUTER 55 Mar 26 02:27:14 ROUTER 89 Mar 26 02:27:14 ROUTER e5 Mar 26 02:27:14 ROUTER f0 Mar 26 02:27:14 ROUTER 81 Mar 26 02:27:14 ROUTER 28 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 00 Mar 26 02:27:14 ROUTER 01 Mar 26 02:27:14 ROUTER 74 Mar 26 02:27:14 ROUTER 05 Mar 26 02:27:14 ROUTER e8 Mar 26 02:27:14 ROUTER 64 Mar 26 02:27:14 ROUTER fd Mar 26 02:27:14 ROUTER -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 6:35 kernel 2.6.25-rc7 highly unstable on high load Denys Fedoryshchenko @ 2008-03-27 6:40 ` David Miller 2008-03-27 7:03 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: David Miller @ 2008-03-27 6:40 UTC (permalink / raw) To: denys; +Cc: netdev, kaber, netfilter-devel From: "Denys Fedoryshchenko" <denys@visp.net.lb> Date: Thu, 27 Mar 2008 08:35:06 +0200 > It seems i am having very bad luck with 2.6.27. As Linus told, it have to be > released soon, but it is crashing like hell on high network load. That's amazing, you've taken a trip into the future and are running 2.6.27 already, please let me borrow your time machine :-) More seriously, there is obviously something very unique to your setup or else everyone would be reporting this crash, and we have to find out what that might be. There seems to be bunch of netfilter stuff in your traces, but the top of the trace is somewhere totally unrelated. This is a common reoccurance in your crash traces, making them less useful than they could be. I know you asked before what can be done to improve the traces, but I'm not an x86 expert so I have no idea how to help you in that area. Patrick, could you see if you can make any sense of his log? I see conttrack a lot in the backtraces. Thanks. > Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6), > available also at http://www.nuclearcat.com/files/crash_2.6.25.txt > > Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP > Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: > Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 > task=f7459080 task.ti=f70b7000) > Mar 26 02:27:14 ROUTER > Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: > Mar 26 02:27:14 ROUTER c092eb14 > Mar 26 02:27:14 ROUTER c011991e > Mar 26 02:27:14 ROUTER f750d600 > Mar 26 02:27:14 ROUTER f750d600 > Mar 26 02:27:14 ROUTER c0378058 > Mar 26 02:27:14 ROUTER 00000001 > Mar 26 02:27:14 ROUTER c092eb34 > Mar 26 02:27:14 ROUTER c0119b3b > Mar 26 02:27:14 ROUTER > Mar 26 02:27:14 ROUTER [ 4698.694693] > Mar 26 02:27:14 ROUTER 00000000 > Mar 26 02:27:14 ROUTER 00000001 > Mar 26 02:27:14 ROUTER 00000082 > Mar 26 02:27:14 ROUTER f708af88 > Mar 26 02:27:14 ROUTER c0378058 > Mar 26 02:27:14 ROUTER 00000001 > Mar 26 02:27:14 ROUTER c092eb3c > Mar 26 02:27:14 ROUTER c0119bfe > Mar 26 02:27:14 ROUTER > Mar 26 02:27:14 ROUTER [ 4698.694693] > Mar 26 02:27:14 ROUTER c092eb50 > Mar 26 02:27:14 ROUTER c012f19c > Mar 26 02:27:14 ROUTER 00000000 > Mar 26 02:27:14 ROUTER f708af88 > Mar 26 02:27:14 ROUTER c0378058 > Mar 26 02:27:14 ROUTER c092eb74 > Mar 26 02:27:14 ROUTER c011652a > Mar 26 02:27:14 ROUTER 00000000 > Mar 26 02:27:14 ROUTER > Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER printk+0x15/0x17 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER __fput+0x93/0x135 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER fput+0x17/0x19 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] > Mar 26 02:27:14 ROUTER ? > Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 > Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= > Mar 26 02:27:14 ROUTER [ 4698.694694] Code: > Mar 26 02:27:14 ROUTER 94 > Mar 26 02:27:14 ROUTER c0 > Mar 26 02:27:14 ROUTER 84 > Mar 26 02:27:14 ROUTER c0 > Mar 26 02:27:14 ROUTER b9 > Mar 26 02:27:14 ROUTER 01 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 75 > Mar 26 02:27:14 ROUTER 09 > Mar 26 02:27:14 ROUTER f0 > Mar 26 02:27:14 ROUTER 81 > Mar 26 02:27:14 ROUTER 02 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 01 > Mar 26 02:27:14 ROUTER 30 > Mar 26 02:27:14 ROUTER c9 > Mar 26 02:27:14 ROUTER 5d > Mar 26 02:27:14 ROUTER 89 > Mar 26 02:27:14 ROUTER c8 > Mar 26 02:27:14 ROUTER c3 > Mar 26 02:27:14 ROUTER 55 > Mar 26 02:27:14 ROUTER ba > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 01 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 89 > Mar 26 02:27:14 ROUTER e5 > Mar 26 02:27:14 ROUTER f0 > Mar 26 02:27:14 ROUTER 66 > Mar 26 02:27:14 ROUTER 0f > Mar 26 02:27:14 ROUTER c1 > Mar 26 02:27:14 ROUTER 10 > Mar 26 02:27:14 ROUTER 38 > Mar 26 02:27:14 ROUTER f2 > Mar 26 02:27:14 ROUTER 74 > Mar 26 02:27:14 ROUTER 06 > Mar 26 02:27:14 ROUTER f3 > Mar 26 02:27:14 ROUTER 90 > Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " > Mar 26 02:27:14 ROUTER 10 > Mar 26 02:27:14 ROUTER eb > Mar 26 02:27:14 ROUTER f6 > Mar 26 02:27:14 ROUTER 5d > Mar 26 02:27:14 ROUTER c3 > Mar 26 02:27:14 ROUTER 55 > Mar 26 02:27:14 ROUTER 89 > Mar 26 02:27:14 ROUTER e5 > Mar 26 02:27:14 ROUTER f0 > Mar 26 02:27:14 ROUTER 81 > Mar 26 02:27:14 ROUTER 28 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 00 > Mar 26 02:27:14 ROUTER 01 > Mar 26 02:27:14 ROUTER 74 > Mar 26 02:27:14 ROUTER 05 > Mar 26 02:27:14 ROUTER e8 > Mar 26 02:27:14 ROUTER 64 > Mar 26 02:27:14 ROUTER fd > Mar 26 02:27:14 ROUTER ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 6:40 ` David Miller @ 2008-03-27 7:03 ` Eric Dumazet 2008-03-27 8:36 ` Denys Fedoryshchenko ` (2 more replies) 2008-03-27 7:07 ` Denys Fedoryshchenko 2008-03-27 15:01 ` Patrick McHardy 2 siblings, 3 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-27 7:03 UTC (permalink / raw) To: David Miller; +Cc: denys, netdev, kaber, netfilter-devel David Miller a écrit : > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > Date: Thu, 27 Mar 2008 08:35:06 +0200 > >> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be >> released soon, but it is crashing like hell on high network load. > > That's amazing, you've taken a trip into the future and are running > 2.6.27 already, please let me borrow your time machine :-) > > More seriously, there is obviously something very unique to your > setup or else everyone would be reporting this crash, and we have > to find out what that might be. > > There seems to be bunch of netfilter stuff in your traces, but > the top of the trace is somewhere totally unrelated. This is > a common reoccurance in your crash traces, making them less > useful than they could be. > > I know you asked before what can be done to improve the traces, > but I'm not an x86 expert so I have no idea how to help you > in that area. > > Patrick, could you see if you can make any sense of his log? > I see conttrack a lot in the backtraces. I can see rt_garbage_collect() involved here. This one might explain very long delays in softirq processing, and eventually crashes... Denys, could you post : grep . /proc/sys/net/ipv4/route/* rtstat -c1 -i10 So that we can check if you should first change route cache tunables :) > > Thanks. > >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6), >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt >> >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 >> task=f7459080 task.ti=f70b7000) >> Mar 26 02:27:14 ROUTER >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: >> Mar 26 02:27:14 ROUTER c092eb14 >> Mar 26 02:27:14 ROUTER c011991e >> Mar 26 02:27:14 ROUTER f750d600 >> Mar 26 02:27:14 ROUTER f750d600 >> Mar 26 02:27:14 ROUTER c0378058 >> Mar 26 02:27:14 ROUTER 00000001 >> Mar 26 02:27:14 ROUTER c092eb34 >> Mar 26 02:27:14 ROUTER c0119b3b >> Mar 26 02:27:14 ROUTER >> Mar 26 02:27:14 ROUTER [ 4698.694693] >> Mar 26 02:27:14 ROUTER 00000000 >> Mar 26 02:27:14 ROUTER 00000001 >> Mar 26 02:27:14 ROUTER 00000082 >> Mar 26 02:27:14 ROUTER f708af88 >> Mar 26 02:27:14 ROUTER c0378058 >> Mar 26 02:27:14 ROUTER 00000001 >> Mar 26 02:27:14 ROUTER c092eb3c >> Mar 26 02:27:14 ROUTER c0119bfe >> Mar 26 02:27:14 ROUTER >> Mar 26 02:27:14 ROUTER [ 4698.694693] >> Mar 26 02:27:14 ROUTER c092eb50 >> Mar 26 02:27:14 ROUTER c012f19c >> Mar 26 02:27:14 ROUTER 00000000 >> Mar 26 02:27:14 ROUTER f708af88 >> Mar 26 02:27:14 ROUTER c0378058 >> Mar 26 02:27:14 ROUTER c092eb74 >> Mar 26 02:27:14 ROUTER c011652a >> Mar 26 02:27:14 ROUTER 00000000 >> Mar 26 02:27:14 ROUTER >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER printk+0x15/0x17 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER fput+0x17/0x19 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] >> Mar 26 02:27:14 ROUTER ? >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 >> Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code: >> Mar 26 02:27:14 ROUTER 94 >> Mar 26 02:27:14 ROUTER c0 >> Mar 26 02:27:14 ROUTER 84 >> Mar 26 02:27:14 ROUTER c0 >> Mar 26 02:27:14 ROUTER b9 >> Mar 26 02:27:14 ROUTER 01 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 75 >> Mar 26 02:27:14 ROUTER 09 >> Mar 26 02:27:14 ROUTER f0 >> Mar 26 02:27:14 ROUTER 81 >> Mar 26 02:27:14 ROUTER 02 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 01 >> Mar 26 02:27:14 ROUTER 30 >> Mar 26 02:27:14 ROUTER c9 >> Mar 26 02:27:14 ROUTER 5d >> Mar 26 02:27:14 ROUTER 89 >> Mar 26 02:27:14 ROUTER c8 >> Mar 26 02:27:14 ROUTER c3 >> Mar 26 02:27:14 ROUTER 55 >> Mar 26 02:27:14 ROUTER ba >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 01 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 89 >> Mar 26 02:27:14 ROUTER e5 >> Mar 26 02:27:14 ROUTER f0 >> Mar 26 02:27:14 ROUTER 66 >> Mar 26 02:27:14 ROUTER 0f >> Mar 26 02:27:14 ROUTER c1 >> Mar 26 02:27:14 ROUTER 10 >> Mar 26 02:27:14 ROUTER 38 >> Mar 26 02:27:14 ROUTER f2 >> Mar 26 02:27:14 ROUTER 74 >> Mar 26 02:27:14 ROUTER 06 >> Mar 26 02:27:14 ROUTER f3 >> Mar 26 02:27:14 ROUTER 90 >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " >> Mar 26 02:27:14 ROUTER 10 >> Mar 26 02:27:14 ROUTER eb >> Mar 26 02:27:14 ROUTER f6 >> Mar 26 02:27:14 ROUTER 5d >> Mar 26 02:27:14 ROUTER c3 >> Mar 26 02:27:14 ROUTER 55 >> Mar 26 02:27:14 ROUTER 89 >> Mar 26 02:27:14 ROUTER e5 >> Mar 26 02:27:14 ROUTER f0 >> Mar 26 02:27:14 ROUTER 81 >> Mar 26 02:27:14 ROUTER 28 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 00 >> Mar 26 02:27:14 ROUTER 01 >> Mar 26 02:27:14 ROUTER 74 >> Mar 26 02:27:14 ROUTER 05 >> Mar 26 02:27:14 ROUTER e8 >> Mar 26 02:27:14 ROUTER 64 >> Mar 26 02:27:14 ROUTER fd >> Mar 26 02:27:14 ROUTER > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 7:03 ` Eric Dumazet @ 2008-03-27 8:36 ` Denys Fedoryshchenko 2008-03-27 8:48 ` Denys Fedoryshchenko 2008-03-27 14:41 ` Denys Fedoryshchenko 2 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 8:36 UTC (permalink / raw) To: netdev Kup ~ # grep . /proc/sys/net/ipv4/route/* /proc/sys/net/ipv4/route/error_burst:5000 /proc/sys/net/ipv4/route/error_cost:1000 grep: /proc/sys/net/ipv4/route/flush: Permission denied /proc/sys/net/ipv4/route/gc_elasticity:8 /proc/sys/net/ipv4/route/gc_interval:60 /proc/sys/net/ipv4/route/gc_min_interval:0 /proc/sys/net/ipv4/route/gc_min_interval_ms:500 /proc/sys/net/ipv4/route/gc_thresh:32768 /proc/sys/net/ipv4/route/gc_timeout:300 /proc/sys/net/ipv4/route/max_size:524288 /proc/sys/net/ipv4/route/min_adv_mss:256 /proc/sys/net/ipv4/route/min_pmtu:552 /proc/sys/net/ipv4/route/mtu_expires:600 /proc/sys/net/ipv4/route/redirect_load:20 /proc/sys/net/ipv4/route/redirect_number:9 /proc/sys/net/ipv4/route/redirect_silence:20480 /proc/sys/net/ipv4/route/secret_interval:600 Kup ~ # rtstat -c1 -i10 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 247734| 6225923| 482238| 0| 1472| 32| 0| 0| 75983| 30203| 0| 508575| 507658| 4| 0|18714805| 250265| Kup ~ # rtstat -c1 -i10 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 246851| 6356930| 496921| 0| 1535| 33| 0| 0| 77469| 30908| 0| 524018| 523085| 5| 0|18873741| 252563| On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote > David Miller a : > > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > > Date: Thu, 27 Mar 2008 08:35:06 +0200 > > > >> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be > >> released soon, but it is crashing like hell on high network load. > > > > That's amazing, you've taken a trip into the future and are running > > 2.6.27 already, please let me borrow your time machine :-) > > > > More seriously, there is obviously something very unique to your > > setup or else everyone would be reporting this crash, and we have > > to find out what that might be. > > > > There seems to be bunch of netfilter stuff in your traces, but > > the top of the trace is somewhere totally unrelated. This is > > a common reoccurance in your crash traces, making them less > > useful than they could be. > > > > I know you asked before what can be done to improve the traces, > > but I'm not an x86 expert so I have no idea how to help you > > in that area. > > > > Patrick, could you see if you can make any sense of his log? > > I see conttrack a lot in the backtraces. > > I can see rt_garbage_collect() involved here. This one might explain > very long delays in softirq processing, and eventually crashes... > > Denys, could you post : > > grep . /proc/sys/net/ipv4/route/* > > rtstat -c1 -i10 > > So that we can check if you should first change route cache tunables > :) > > > > > Thanks. > > > >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6- git6), > >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt > >> > >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP > >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 > >> task=f7459080 task.ti=f70b7000) > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: > >> Mar 26 02:27:14 ROUTER c092eb14 > >> Mar 26 02:27:14 ROUTER c011991e > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb34 > >> Mar 26 02:27:14 ROUTER c0119b3b > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER 00000082 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb3c > >> Mar 26 02:27:14 ROUTER c0119bfe > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER c092eb50 > >> Mar 26 02:27:14 ROUTER c012f19c > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER c092eb74 > >> Mar 26 02:27:14 ROUTER c011652a > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER printk+0x15/0x17 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER fput+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= > >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code: > >> Mar 26 02:27:14 ROUTER 94 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER 84 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER b9 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 75 > >> Mar 26 02:27:14 ROUTER 09 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 02 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 30 > >> Mar 26 02:27:14 ROUTER c9 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER c8 > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER ba > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 66 > >> Mar 26 02:27:14 ROUTER 0f > >> Mar 26 02:27:14 ROUTER c1 > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER 38 > >> Mar 26 02:27:14 ROUTER f2 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 06 > >> Mar 26 02:27:14 ROUTER f3 > >> Mar 26 02:27:14 ROUTER 90 > >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER eb > >> Mar 26 02:27:14 ROUTER f6 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 28 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 05 > >> Mar 26 02:27:14 ROUTER e8 > >> Mar 26 02:27:14 ROUTER 64 > >> Mar 26 02:27:14 ROUTER fd > >> Mar 26 02:27:14 ROUTER > > -- > > To unsubscribe from this list: send the line "unsubscribe netdev" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 7:03 ` Eric Dumazet 2008-03-27 8:36 ` Denys Fedoryshchenko @ 2008-03-27 8:48 ` Denys Fedoryshchenko 2008-03-27 14:41 ` Denys Fedoryshchenko 2 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 8:48 UTC (permalink / raw) To: Eric Dumazet, David Miller; +Cc: netdev, kaber, netfilter-devel After some time Kup /config # grep . /proc/sys/net/ipv4/route/* /proc/sys/net/ipv4/route/error_burst:5000 /proc/sys/net/ipv4/route/error_cost:1000 grep: /proc/sys/net/ipv4/route/flush: Permission denied /proc/sys/net/ipv4/route/gc_elasticity:8 /proc/sys/net/ipv4/route/gc_interval:60 /proc/sys/net/ipv4/route/gc_min_interval:0 /proc/sys/net/ipv4/route/gc_min_interval_ms:500 /proc/sys/net/ipv4/route/gc_thresh:32768 /proc/sys/net/ipv4/route/gc_timeout:300 /proc/sys/net/ipv4/route/max_size:524288 /proc/sys/net/ipv4/route/min_adv_mss:256 /proc/sys/net/ipv4/route/min_pmtu:552 /proc/sys/net/ipv4/route/mtu_expires:600 /proc/sys/net/ipv4/route/redirect_load:20 /proc/sys/net/ipv4/route/redirect_number:9 /proc/sys/net/ipv4/route/redirect_silence:20480 /proc/sys/net/ipv4/route/secret_interval:600 Kup /config # Kup /config # rtstat -c1 -i10 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 337510| 8928889| 1855387| 0| 14286| 69| 0| 0| 110840| 58108| 0| 1922232| 1920809| 377| 0|20908744| 294715| Kup /config # On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote > David Miller: > > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > > Date: Thu, 27 Mar 2008 08:35:06 +0200 > > > >> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be > >> released soon, but it is crashing like hell on high network load. > > > > That's amazing, you've taken a trip into the future and are running > > 2.6.27 already, please let me borrow your time machine :-) > > > > More seriously, there is obviously something very unique to your > > setup or else everyone would be reporting this crash, and we have > > to find out what that might be. > > > > There seems to be bunch of netfilter stuff in your traces, but > > the top of the trace is somewhere totally unrelated. This is > > a common reoccurance in your crash traces, making them less > > useful than they could be. > > > > I know you asked before what can be done to improve the traces, > > but I'm not an x86 expert so I have no idea how to help you > > in that area. > > > > Patrick, could you see if you can make any sense of his log? > > I see conttrack a lot in the backtraces. > > I can see rt_garbage_collect() involved here. This one might explain > very long delays in softirq processing, and eventually crashes... > > Denys, could you post : > > grep . /proc/sys/net/ipv4/route/* > > rtstat -c1 -i10 > > So that we can check if you should first change route cache tunables > :) > > > > > Thanks. > > > >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6- git6), > >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt > >> > >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP > >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 > >> task=f7459080 task.ti=f70b7000) > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: > >> Mar 26 02:27:14 ROUTER c092eb14 > >> Mar 26 02:27:14 ROUTER c011991e > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb34 > >> Mar 26 02:27:14 ROUTER c0119b3b > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER 00000082 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb3c > >> Mar 26 02:27:14 ROUTER c0119bfe > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER c092eb50 > >> Mar 26 02:27:14 ROUTER c012f19c > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER c092eb74 > >> Mar 26 02:27:14 ROUTER c011652a > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER printk+0x15/0x17 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER fput+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= > >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code: > >> Mar 26 02:27:14 ROUTER 94 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER 84 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER b9 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 75 > >> Mar 26 02:27:14 ROUTER 09 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 02 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 30 > >> Mar 26 02:27:14 ROUTER c9 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER c8 > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER ba > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 66 > >> Mar 26 02:27:14 ROUTER 0f > >> Mar 26 02:27:14 ROUTER c1 > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER 38 > >> Mar 26 02:27:14 ROUTER f2 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 06 > >> Mar 26 02:27:14 ROUTER f3 > >> Mar 26 02:27:14 ROUTER 90 > >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER eb > >> Mar 26 02:27:14 ROUTER f6 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 28 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 05 > >> Mar 26 02:27:14 ROUTER e8 > >> Mar 26 02:27:14 ROUTER 64 > >> Mar 26 02:27:14 ROUTER fd > >> Mar 26 02:27:14 ROUTER > > -- > > To unsubscribe from this list: send the line "unsubscribe netdev" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 7:03 ` Eric Dumazet 2008-03-27 8:36 ` Denys Fedoryshchenko 2008-03-27 8:48 ` Denys Fedoryshchenko @ 2008-03-27 14:41 ` Denys Fedoryshchenko 2008-03-27 14:52 ` Eric Dumazet 2 siblings, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 14:41 UTC (permalink / raw) To: Eric Dumazet, David Miller; +Cc: netdev, kaber, netfilter-devel Yes, it seems related to routing. Before such thing was not happening (maybe because TRIE was operating better?). Here is info at "peak time", i disable nmi_watchdog now, so garbage collector will not be triggered by nmi watchdog. rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 247011| 2255410| 162866| 0| 610| 5| 0| 0| 33761| 8574| 0| 165475| 165307| 0| 0|10407428| 174948| 251810| 43087| 2782| 0| 9| 0| 0| 0| 618| 122| 0| 2912| 2909| 0| 0| 241946| 3796| 256277| 43035| 2739| 0| 9| 0| 0| 0| 595| 121| 0| 2867| 2864| 0| 0| 243724| 3748| 260177| 43596| 2647| 0| 8| 0| 0| 0| 672| 123| 0| 2778| 2776| 0| 0| 246880| 4048| 232741| 42270| 2759| 0| 15| 0| 0| 0| 665| 135| 0| 2910| 2907| 0| 0| 233990| 3938| 226623| 42615| 2792| 0| 11| 0| 0| 0| 723| 132| 0| 2935| 2932| 0| 0| 218378| 3862| 233190| 42397| 2778| 0| 8| 0| 0| 0| 675| 128| 0| 2913| 2909| 0| 0| 214258| 3703| 239093| 42342| 2713| 0| 9| 0| 0| 0| 764| 126| 0| 2847| 2845| 0| 0| 216453| 4080| 150539| 36992| 7564| 0| 58| 0| 0| 0| 539| 283| 0| 7902| 7900| 0| 0| 93459| 1996| 154441| 40258| 4422| 0| 12| 0| 0| 0| 655| 212| 0| 4644| 4640| 0| 0| 77271| 1458| On one "crash" i got Mar 27 11:27:15 205.177.186.74 [ 7141.156100] dst cache overflow Mar 27 11:27:15 205.177.186.74 [ 7141.002291] dst cache overflow Mar 27 11:27:15 205.177.186.74 [ 7141.162756] dst cache overflow So i increase now routes amount more than 512K to 5120K, and will see more. On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote > David Miller: > > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > > Date: Thu, 27 Mar 2008 08:35:06 +0200 > > > >> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be > >> released soon, but it is crashing like hell on high network load. > > > > That's amazing, you've taken a trip into the future and are running > > 2.6.27 already, please let me borrow your time machine :-) > > > > More seriously, there is obviously something very unique to your > > setup or else everyone would be reporting this crash, and we have > > to find out what that might be. > > > > There seems to be bunch of netfilter stuff in your traces, but > > the top of the trace is somewhere totally unrelated. This is > > a common reoccurance in your crash traces, making them less > > useful than they could be. > > > > I know you asked before what can be done to improve the traces, > > but I'm not an x86 expert so I have no idea how to help you > > in that area. > > > > Patrick, could you see if you can make any sense of his log? > > I see conttrack a lot in the backtraces. > > I can see rt_garbage_collect() involved here. This one might explain > very long delays in softirq processing, and eventually crashes... > > Denys, could you post : > > grep . /proc/sys/net/ipv4/route/* > > rtstat -c1 -i10 > > So that we can check if you should first change route cache tunables > :) > > > > > Thanks. > > > >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6- git6), > >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt > >> > >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP > >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 > >> task=f7459080 task.ti=f70b7000) > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: > >> Mar 26 02:27:14 ROUTER c092eb14 > >> Mar 26 02:27:14 ROUTER c011991e > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER f750d600 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb34 > >> Mar 26 02:27:14 ROUTER c0119b3b > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER 00000082 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER 00000001 > >> Mar 26 02:27:14 ROUTER c092eb3c > >> Mar 26 02:27:14 ROUTER c0119bfe > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] > >> Mar 26 02:27:14 ROUTER c092eb50 > >> Mar 26 02:27:14 ROUTER c012f19c > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER f708af88 > >> Mar 26 02:27:14 ROUTER c0378058 > >> Mar 26 02:27:14 ROUTER c092eb74 > >> Mar 26 02:27:14 ROUTER c011652a > >> Mar 26 02:27:14 ROUTER 00000000 > >> Mar 26 02:27:14 ROUTER > >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER printk+0x15/0x17 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 > >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER fput+0x17/0x19 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d > >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] > >> Mar 26 02:27:14 ROUTER ? > >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 > >> Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= > >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code: > >> Mar 26 02:27:14 ROUTER 94 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER 84 > >> Mar 26 02:27:14 ROUTER c0 > >> Mar 26 02:27:14 ROUTER b9 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 75 > >> Mar 26 02:27:14 ROUTER 09 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 02 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 30 > >> Mar 26 02:27:14 ROUTER c9 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER c8 > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER ba > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 66 > >> Mar 26 02:27:14 ROUTER 0f > >> Mar 26 02:27:14 ROUTER c1 > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER 38 > >> Mar 26 02:27:14 ROUTER f2 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 06 > >> Mar 26 02:27:14 ROUTER f3 > >> Mar 26 02:27:14 ROUTER 90 > >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " > >> Mar 26 02:27:14 ROUTER 10 > >> Mar 26 02:27:14 ROUTER eb > >> Mar 26 02:27:14 ROUTER f6 > >> Mar 26 02:27:14 ROUTER 5d > >> Mar 26 02:27:14 ROUTER c3 > >> Mar 26 02:27:14 ROUTER 55 > >> Mar 26 02:27:14 ROUTER 89 > >> Mar 26 02:27:14 ROUTER e5 > >> Mar 26 02:27:14 ROUTER f0 > >> Mar 26 02:27:14 ROUTER 81 > >> Mar 26 02:27:14 ROUTER 28 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 00 > >> Mar 26 02:27:14 ROUTER 01 > >> Mar 26 02:27:14 ROUTER 74 > >> Mar 26 02:27:14 ROUTER 05 > >> Mar 26 02:27:14 ROUTER e8 > >> Mar 26 02:27:14 ROUTER 64 > >> Mar 26 02:27:14 ROUTER fd > >> Mar 26 02:27:14 ROUTER > > -- > > To unsubscribe from this list: send the line "unsubscribe netdev" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 14:41 ` Denys Fedoryshchenko @ 2008-03-27 14:52 ` Eric Dumazet [not found] ` <20080327151520.M89250@visp.net.lb> 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2008-03-27 14:52 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: David Miller, netdev, kaber, netfilter-devel Denys Fedoryshchenko a écrit : > Yes, it seems related to routing. Before such thing was not happening (maybe > because TRIE was operating better?). > > Here is info at "peak time", i disable nmi_watchdog now, so garbage collector > will not be triggered by nmi watchdog. > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 247011| 2255410| 162866| 0| 610| 5| 0| 0| > 33761| 8574| 0| 165475| 165307| 0| 0|10407428| 174948| > 251810| 43087| 2782| 0| 9| 0| 0| 0| > 618| 122| 0| 2912| 2909| 0| 0| 241946| 3796| > 256277| 43035| 2739| 0| 9| 0| 0| 0| > 595| 121| 0| 2867| 2864| 0| 0| 243724| 3748| > 260177| 43596| 2647| 0| 8| 0| 0| 0| > 672| 123| 0| 2778| 2776| 0| 0| 246880| 4048| > 232741| 42270| 2759| 0| 15| 0| 0| 0| > 665| 135| 0| 2910| 2907| 0| 0| 233990| 3938| > 226623| 42615| 2792| 0| 11| 0| 0| 0| > 723| 132| 0| 2935| 2932| 0| 0| 218378| 3862| > 233190| 42397| 2778| 0| 8| 0| 0| 0| > 675| 128| 0| 2913| 2909| 0| 0| 214258| 3703| > 239093| 42342| 2713| 0| 9| 0| 0| 0| > 764| 126| 0| 2847| 2845| 0| 0| 216453| 4080| > 150539| 36992| 7564| 0| 58| 0| 0| 0| > 539| 283| 0| 7902| 7900| 0| 0| 93459| 1996| > 154441| 40258| 4422| 0| 12| 0| 0| 0| > 655| 212| 0| 4644| 4640| 0| 0| 77271| 1458| > > On one "crash" i got > Mar 27 11:27:15 205.177.186.74 [ 7141.156100] dst cache overflow > Mar 27 11:27:15 205.177.186.74 [ 7141.002291] dst cache overflow > Mar 27 11:27:15 205.177.186.74 [ 7141.162756] dst cache overflow > > So i increase now routes amount more than 512K to 5120K, and will see more. > You want to tune route cache for your special needs, and not permit it to store 5 millions entries ! # default is a gc every 60 seconds, not good for large caches echo 1 >/proc/sys/net/ipv4/route/gc_interval # default is 8 entries per slot.. echo 4 >/proc/sys/net/ipv4/route/gc_elasticity # avoid a flush every 10 minutes echo 3600 >/proc/sys/net/ipv4/route/secret_interval -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <20080327151520.M89250@visp.net.lb>]
* Re: kernel 2.6.25-rc7 highly unstable on high load [not found] ` <20080327151520.M89250@visp.net.lb> @ 2008-03-27 16:07 ` Eric Dumazet 2008-03-27 16:29 ` Eric Dumazet 2008-03-27 18:37 ` Denys Fedoryshchenko 0 siblings, 2 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-27 16:07 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: David Miller, netdev, kaber, netfilter-devel Denys Fedoryshchenko a écrit : > I did this tuning. Not sure what does mean rt_cache_entries, but seems they > are still increasing. > > Kup /config # rtstat -c1000 -i60 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 274234| 1590009| 271953| 0| 1809| 6| 0| 0| > 24551| 7365| 0| 279991| 279867| 24| 0| 5438630| 89656| > 280535| 25463| 19956| 0| 187| 0| 0| 0| > 485| 263| 0| 20407| 20404| 3| 0| 12796| 242| > 288072| 25367| 19830| 0| 163| 0| 0| 0| > 493| 256| 0| 20248| 20245| 3| 0| 17146| 346| > 294943| 25689| 19807| 0| 154| 0| 0| 0| > 438| 250| 0| 20211| 20208| 3| 0| 21554| 432| > 301118| 25287| 19645| 0| 170| 0| 0| 0| > 466| 236| 0| 20052| 20048| 3| 0| 25713| 528| > 307651| 25663| 20264| 0| 180| 0| 0| 0| > 472| 234| 0| 20678| 20675| 3| 0| 30588| 607| > 313767| 25810| 20014| 0| 175| 0| 0| 0| > 459| 237| 0| 20425| 20422| 3| 0| 33701| 699| > > > Then try to lower gc_elasticity, to 3, or even 2 echo 3 >/proc/sys/net/ipv4/route/gc_elasticity ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 16:07 ` Eric Dumazet @ 2008-03-27 16:29 ` Eric Dumazet 2008-03-27 18:37 ` Denys Fedoryshchenko 1 sibling, 0 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-27 16:29 UTC (permalink / raw) To: Eric Dumazet Cc: Denys Fedoryshchenko, David Miller, netdev, kaber, netfilter-devel Eric Dumazet a écrit : > Denys Fedoryshchenko a écrit : >> I did this tuning. Not sure what does mean rt_cache_entries, but >> seems they are still increasing. >> >> Kup /config # rtstat -c1000 -i60 >> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| >> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| >> rt_cache| >> entries| in_hit|in_slow_|in_slow_|in_no_ro| >> in_brd|in_marti|in_marti| >> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| >> out_hlis| >> | | tot| mc| ute| | an_dst| >> an_src| | _tot| _mc| | ed| miss| >> verflow| _search|t_search| >> 274234| 1590009| 271953| 0| 1809| 6| 0| >> 0| 24551| 7365| 0| 279991| 279867| 24| 0| >> 5438630| 89656| >> 280535| 25463| 19956| 0| 187| 0| 0| >> 0| 485| 263| 0| 20407| 20404| 3| 0| >> 12796| 242| >> 288072| 25367| 19830| 0| 163| 0| 0| >> 0| 493| 256| 0| 20248| 20245| 3| 0| >> 17146| 346| >> 294943| 25689| 19807| 0| 154| 0| 0| >> 0| 438| 250| 0| 20211| 20208| 3| 0| >> 21554| 432| >> 301118| 25287| 19645| 0| 170| 0| 0| >> 0| 466| 236| 0| 20052| 20048| 3| 0| >> 25713| 528| >> 307651| 25663| 20264| 0| 180| 0| 0| >> 0| 472| 234| 0| 20678| 20675| 3| 0| >> 30588| 607| >> 313767| 25810| 20014| 0| 175| 0| 0| >> 0| 459| 237| 0| 20425| 20422| 3| 0| >> 33701| 699| >> >> >> > Then try to lower gc_elasticity, to 3, or even 2 > > echo 3 >/proc/sys/net/ipv4/route/gc_elasticity > > One more idea before leaving :) If rt_cache_entries still increasing, we might have a dst leak somewhere : gc tries to evict entries that have a non null refcount -> they are put in dst_garbage.list Following patch will show us how dst_garbage behaves. ( printk(KERN_DEBUG "dst_total: %d delayed: %d work_perf: %d" ...) diff --git a/net/core/dst.c b/net/core/dst.c index 7deef48..e634e5f 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -31,6 +31,8 @@ * 3) This list is guarded by a mutex, * so that the gc_task and dst_dev_event() can be synchronized. */ +#undef RT_CACHE_DEBUG +#define RT_CACHE_DEBUG 2 #if RT_CACHE_DEBUG >= 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 16:07 ` Eric Dumazet 2008-03-27 16:29 ` Eric Dumazet @ 2008-03-27 18:37 ` Denys Fedoryshchenko 2008-03-27 18:56 ` Eric Dumazet 1 sibling, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 18:37 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, kaber, netdev, netfilter-devel Here is output from dmesg with patch you supplied. Kup /config # rtstat -i60 -c60 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 103266| 69986| 11574| 0| 54| 0| 0| 0| 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254| 124787| 45125| 6019| 0| 28| 0| 0| 0| 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518| 120270| 45588| 6288| 0| 30| 0| 0| 0| 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743| 122253| 46522| 6582| 0| 27| 0| 0| 0| 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761| [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999 elapsed: 1 us [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998 elapsed: 2 us [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000 elapsed: 1 us [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998 elapsed: 1 us [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600 elapsed: 2 us [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696 elapsed: 6 us After while Kup /config # rtstat -i300 -c60 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 129138| 64166| 9864| 0| 46| 0| 0| 0| 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985| 130125| 43971| 10829| 0| 70| 0| 0| 0| 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848| 144857| 32041| 22233| 0| 234| 0| 0| 0| 738| 261| 0| 22727| 22724| 2| 0| 42769| 811| [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999 elapsed: 1 us [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600 elapsed: 11274 us [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600 elapsed: 11150 us [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100 elapsed: 10896 us [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655 elapsed: 11561 us [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988 elapsed: 12254 us [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988 elapsed: 12209 us [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987 elapsed: 12039 us [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987 elapsed: 12448 us [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988 elapsed: 11435 us [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987 elapsed: 12833 us [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987 elapsed: 12068 us [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988 elapsed: 11970 us [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989 elapsed: 11747 us [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988 elapsed: 12653 us [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987 elapsed: 12217 us [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987 elapsed: 12026 us [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988 elapsed: 11151 us and after 30 minutes more Kup /config # rtstat -i300 -c60 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 194841| 176962| 77964| 0| 684| 1| 0| 0| 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455| 223224| 28262| 26499| 0| 236| 0| 0| 0| 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946| 248425| 28257| 26784| 0| 226| 0| 0| 0| 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412| 272823| 28943| 26728| 0| 246| 0| 0| 0| 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942| 295087| 28835| 26701| 0| 252| 0| 0| 0| 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312| [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 12519 us [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13930 us [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986 elapsed: 14330 us [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972 elapsed: 15230 us [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985 elapsed: 15018 us [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13654 us [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987 elapsed: 13137 us On Thu, 27 Mar 2008 17:07:29 +0100, Eric Dumazet wrote > Denys Fedoryshchenko a : > > I did this tuning. Not sure what does mean rt_cache_entries, but seems they > > are still increasing. > > > > Kup /config # rtstat -c1000 -i60 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 274234| 1590009| 271953| 0| 1809| 6| 0| 0| > > 24551| 7365| 0| 279991| 279867| 24| 0| 5438630| 89656| > > 280535| 25463| 19956| 0| 187| 0| 0| 0| > > 485| 263| 0| 20407| 20404| 3| 0| 12796| 242| > > 288072| 25367| 19830| 0| 163| 0| 0| 0| > > 493| 256| 0| 20248| 20245| 3| 0| 17146| 346| > > 294943| 25689| 19807| 0| 154| 0| 0| 0| > > 438| 250| 0| 20211| 20208| 3| 0| 21554| 432| > > 301118| 25287| 19645| 0| 170| 0| 0| 0| > > 466| 236| 0| 20052| 20048| 3| 0| 25713| 528| > > 307651| 25663| 20264| 0| 180| 0| 0| 0| > > 472| 234| 0| 20678| 20675| 3| 0| 30588| 607| > > 313767| 25810| 20014| 0| 175| 0| 0| 0| > > 459| 237| 0| 20425| 20422| 3| 0| 33701| 699| > > > > > > > Then try to lower gc_elasticity, to 3, or even 2 > > echo 3 >/proc/sys/net/ipv4/route/gc_elasticity -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 18:37 ` Denys Fedoryshchenko @ 2008-03-27 18:56 ` Eric Dumazet 2008-03-27 19:05 ` Denis V. Lunev 2008-03-27 22:03 ` David Miller 0 siblings, 2 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-27 18:56 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: David Miller, kaber, netdev, netfilter-devel Denys Fedoryshchenko a écrit : > Here is output from dmesg with patch you supplied. > > Kup /config # rtstat -i60 -c60 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 103266| 69986| 11574| 0| 54| 0| 0| 0| > 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254| > 124787| 45125| 6019| 0| 28| 0| 0| 0| > 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518| > 120270| 45588| 6288| 0| 30| 0| 0| 0| > 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743| > 122253| 46522| 6582| 0| 27| 0| 0| 0| > 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761| > > [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999 > elapsed: 1 us > [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998 > elapsed: 2 us > [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000 > elapsed: 1 us > [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998 > elapsed: 1 us > [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600 > elapsed: 2 us > [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696 > elapsed: 6 us > > After while > > Kup /config # rtstat -i300 -c60 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 129138| 64166| 9864| 0| 46| 0| 0| 0| > 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985| > 130125| 43971| 10829| 0| 70| 0| 0| 0| > 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848| > 144857| 32041| 22233| 0| 234| 0| 0| 0| > 738| 261| 0| 22727| 22724| 2| 0| 42769| 811| > > [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999 > elapsed: 1 us > [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600 > elapsed: 11274 us > [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600 > elapsed: 11150 us > [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100 > elapsed: 10896 us > [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655 > elapsed: 11561 us > [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988 > elapsed: 12254 us > [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988 > elapsed: 12209 us > [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987 > elapsed: 12039 us > [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987 > elapsed: 12448 us > [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988 > elapsed: 11435 us > [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987 > elapsed: 12833 us > [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987 > elapsed: 12068 us > [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988 > elapsed: 11970 us > [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989 > elapsed: 11747 us > [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988 > elapsed: 12653 us > [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987 > elapsed: 12217 us > [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987 > elapsed: 12026 us > [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988 > elapsed: 11151 us > > and after 30 minutes more > > Kup /config # rtstat -i300 -c60 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 194841| 176962| 77964| 0| 684| 1| 0| 0| > 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455| > 223224| 28262| 26499| 0| 236| 0| 0| 0| > 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946| > 248425| 28257| 26784| 0| 226| 0| 0| 0| > 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412| > 272823| 28943| 26728| 0| 246| 0| 0| 0| > 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942| > 295087| 28835| 26701| 0| 252| 0| 0| 0| > 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312| > > > [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987 > elapsed: 12519 us > [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987 > elapsed: 13930 us > [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986 > elapsed: 14330 us > [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972 > elapsed: 15230 us > [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985 > elapsed: 15018 us > [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987 > elapsed: 13654 us > [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987 > elapsed: 13137 us > > Very interesting... you can see 101588 dst are *delayed* in dst_garbage, but apparently never freed. Something is wrong, but the count seems stable. Must be some kind of event, admin driven or something... might be the "ip route flush cache" that is schedlued around 600 seconds of machine alive, then secret_interval seconds later... Typically, when a "ip route flush cache" is done (manually or triggered by secret_interval timer), refcounted>0 entries are put into dst_garbage. Then when some trafic occurs on the flows involved, IP stack should decrement refcount so that next dst_garbage round can free the deleted entries. Normal TCP connections are doing this correctly. On your machine, nothing. Possibly idle tcp sessions ? Maybe some netfilter problem ? Please tell us more about your machine ;) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 18:56 ` Eric Dumazet @ 2008-03-27 19:05 ` Denis V. Lunev 2008-03-27 22:03 ` David Miller 1 sibling, 0 replies; 29+ messages in thread From: Denis V. Lunev @ 2008-03-27 19:05 UTC (permalink / raw) To: Eric Dumazet Cc: Denys Fedoryshchenko, David Miller, kaber, netdev, netfilter-devel On Thu, 2008-03-27 at 19:56 +0100, Eric Dumazet wrote: > Denys Fedoryshchenko a écrit : > > Here is output from dmesg with patch you supplied. > > > > Kup /config # rtstat -i60 -c60 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 103266| 69986| 11574| 0| 54| 0| 0| 0| > > 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254| > > 124787| 45125| 6019| 0| 28| 0| 0| 0| > > 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518| > > 120270| 45588| 6288| 0| 30| 0| 0| 0| > > 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743| > > 122253| 46522| 6582| 0| 27| 0| 0| 0| > > 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761| > > > > [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999 > > elapsed: 1 us > > [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998 > > elapsed: 2 us > > [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000 > > elapsed: 1 us > > [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998 > > elapsed: 1 us > > [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600 > > elapsed: 2 us > > [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696 > > elapsed: 6 us > > > > After while > > > > Kup /config # rtstat -i300 -c60 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 129138| 64166| 9864| 0| 46| 0| 0| 0| > > 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985| > > 130125| 43971| 10829| 0| 70| 0| 0| 0| > > 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848| > > 144857| 32041| 22233| 0| 234| 0| 0| 0| > > 738| 261| 0| 22727| 22724| 2| 0| 42769| 811| > > > > [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999 > > elapsed: 1 us > > [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600 > > elapsed: 11274 us > > [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600 > > elapsed: 11150 us > > [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100 > > elapsed: 10896 us > > [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655 > > elapsed: 11561 us > > [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988 > > elapsed: 12254 us > > [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988 > > elapsed: 12209 us > > [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987 > > elapsed: 12039 us > > [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987 > > elapsed: 12448 us > > [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988 > > elapsed: 11435 us > > [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987 > > elapsed: 12833 us > > [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987 > > elapsed: 12068 us > > [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988 > > elapsed: 11970 us > > [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989 > > elapsed: 11747 us > > [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988 > > elapsed: 12653 us > > [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987 > > elapsed: 12217 us > > [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987 > > elapsed: 12026 us > > [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988 > > elapsed: 11151 us > > > > and after 30 minutes more > > > > Kup /config # rtstat -i300 -c60 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 194841| 176962| 77964| 0| 684| 1| 0| 0| > > 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455| > > 223224| 28262| 26499| 0| 236| 0| 0| 0| > > 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946| > > 248425| 28257| 26784| 0| 226| 0| 0| 0| > > 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412| > > 272823| 28943| 26728| 0| 246| 0| 0| 0| > > 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942| > > 295087| 28835| 26701| 0| 252| 0| 0| 0| > > 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312| > > > > > > [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987 > > elapsed: 12519 us > > [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987 > > elapsed: 13930 us > > [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986 > > elapsed: 14330 us > > [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972 > > elapsed: 15230 us > > [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985 > > elapsed: 15018 us > > [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987 > > elapsed: 13654 us > > [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987 > > elapsed: 13137 us > > > > > > Very interesting... you can see 101588 dst are *delayed* in dst_garbage, > but apparently never freed. > > Something is wrong, but the count seems stable. Must be some kind of > event, admin driven or something... > might be the "ip route flush cache" that is schedlued around 600 seconds > of machine alive, then secret_interval seconds later... > > Typically, when a "ip route flush cache" is done (manually or triggered > by secret_interval timer), refcounted>0 entries are put into dst_garbage. > > Then when some trafic occurs on the flows involved, IP stack should > decrement refcount so that next dst_garbage round can free the deleted > entries. Normal TCP connections are doing this correctly. On your > machine, nothing. > > Possibly idle tcp sessions ? > > Maybe some netfilter problem ? > > Please tell us more about your machine ;) could you check the patch sent yesterday by Pavel under the name [PATCH][ICMP]: Dst entry leak in icmp_send host re-lookup code (v2). It can fit the case, massive DST leakage is possible here. Regards, Den -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 18:56 ` Eric Dumazet 2008-03-27 19:05 ` Denis V. Lunev @ 2008-03-27 22:03 ` David Miller 2008-03-28 0:47 ` Denys Fedoryshchenko ` (2 more replies) 1 sibling, 3 replies; 29+ messages in thread From: David Miller @ 2008-03-27 22:03 UTC (permalink / raw) To: dada1; +Cc: denys, kaber, netdev, netfilter-devel From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 27 Mar 2008 19:56:09 +0100 > Maybe some netfilter problem ? > > Please tell us more about your machine ;) I'm beginning to suspect it's IFB and shaping somehow. If those variables could be eliminated by eliminating them from the configuration, it would help diagnose this a lot. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 22:03 ` David Miller @ 2008-03-28 0:47 ` Denys Fedoryshchenko 2008-03-28 4:50 ` Denys Fedoryshchenko 2008-03-28 5:25 ` Denys Fedoryshchenko 2 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 0:47 UTC (permalink / raw) To: dada1, David Miller; +Cc: kaber, netdev, netfilter-devel After removing shaping and ifb problem probably remains. I will try to apply mentioned by Denis Lunev, patch. Netfilter it is kind of difficult to remove completely, pattern of traffic will change a lot, and it will be false that it was cause of problem. I am using NOTRACK, SNAT, DNAT, sure filtering. Nothing "special". rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 122438|25068014| 2464002| 0| 8579| 105| 0| 3| 436160| 149272| 0| 2560231| 2557567| 54| 0|52890998| 1164822| [ 355.594112] dst_total: 115387 delayed: 1 work_perf: 0 expires: 68196 elapsed: 0 us [ 423.780775] dst_total: 126186 delayed: 1 work_perf: 0 expires: 76999 elapsed: 0 us [ 500.768537] dst_total: 122085 delayed: 1 work_perf: 0 expires: 86000 elapsed: 0 us [ 586.755462] dst_total: 121715 delayed: 1 work_perf: 0 expires: 95000 elapsed: 1 us [ 681.741665] dst_total: 129798 delayed: 1 work_perf: 0 expires: 104999 elapsed: 0 us [ 786.732216] dst_total: 116243 delayed: 74333 work_perf: 1 expires: 600 elapsed: 7076 us [ 787.338620] dst_total: 116912 delayed: 74333 work_perf: 0 expires: 1600 elapsed: 6962 us [ 788.945653] dst_total: 118565 delayed: 74333 work_perf: 0 expires: 3100 elapsed: 7246 us [ 792.052890] dst_total: 121611 delayed: 74333 work_perf: 0 expires: 5672 elapsed: 6954 us [ 797.730354] dst_total: 126794 delayed: 74333 work_perf: 0 expires: 7993 elapsed: 7276 us [ 805.729136] dst_total: 115686 delayed: 74333 work_perf: 0 expires: 10993 elapsed: 7270 us [ 816.727732] dst_total: 126402 delayed: 74333 work_perf: 0 expires: 13993 elapsed: 7115 us [ 830.726662] dst_total: 121002 delayed: 74333 work_perf: 0 expires: 17992 elapsed: 7179 us [ 848.724144] dst_total: 119849 delayed: 74333 work_perf: 0 expires: 22992 elapsed: 7370 us [ 871.719176] dst_total: 124661 delayed: 74333 work_perf: 0 expires: 27993 elapsed: 7313 us [ 899.715826] dst_total: 118860 delayed: 74333 work_perf: 0 expires: 32992 elapsed: 7192 us [ 932.710299] dst_total: 120856 delayed: 74333 work_perf: 0 expires: 38993 elapsed: 7291 us [ 971.704648] dst_total: 119519 delayed: 74333 work_perf: 0 expires: 45993 elapsed: 6946 us [ 1017.698036] dst_total: 126722 delayed: 74333 work_perf: 0 expires: 52992 elapsed: 7300 us [ 1070.693197] dst_total: 124835 delayed: 101547 work_perf: 3 expires: 600 elapsed: 10089 us [ 1071.303147] dst_total: 125939 delayed: 101547 work_perf: 0 expires: 1600 elapsed: 9544 us [ 1072.913253] dst_total: 128499 delayed: 101547 work_perf: 0 expires: 3100 elapsed: 9895 us [ 1076.021980] dst_total: 117827 delayed: 101547 work_perf: 0 expires: 5659 elapsed: 10096 us [ 1081.690082] dst_total: 127534 delayed: 101547 work_perf: 0 expires: 7990 elapsed: 10052 us [ 1089.690531] dst_total: 127330 delayed: 101547 work_perf: 0 expires: 10989 elapsed: 11717 us Without sysctl adjustments (each 1 minute) Kup ~ # rtstat -i60 -c60 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 40160| 3323| 1094| 0| 11| 0| 0| 0| 282| 127| 0| 274| 274| 0| 0| 1867| 226| 102698| 26174| 2885| 0| 6| 0| 0| 0| 620| 228| 0| 3118| 3115| 0| 0| 37298| 1331| 141996| 25752| 2172| 0| 3| 0| 0| 0| 521| 167| 0| 2343| 2340| 0| 0| 66292| 1698| 169128| 24716| 1593| 0| 2| 0| 0| 0| 468| 140| 0| 1734| 1731| 0| 0| 83079| 1714| 185495| 24867| 1493| 0| 1| 0| 0| 0| 386| 125| 0| 1619| 1616| 0| 0| 97451| 1603| 195665| 25227| 1443| 0| 1| 0| 0| 0| 427| 126| 0| 1570| 1567| 0| 0| 111618| 1955| 201357| 25239| 1370| 0| 2| 0| 0| 0| 435| 118| 0| 1490| 1487| 0| 0| 119599| 2010| 206841| 25805| 1487| 0| 2| 0| 0| 0| 515| 120| 0| 1607| 1604| 0| 0| 123823| 2429| 213800| 25230| 1678| 0| 2| 0| 0| 0| 414| 116| 0| 1795| 1792| 0| 0| 127516| 2210| 216461| 25882| 1503| 0| 2| 0| 0| 0| 424| 114| 0| 1618| 1615| 0| 0| 136162| 2308| 220386| 26141| 1632| 0| 2| 0| 0| 0| 387| 115| 0| 1749| 1745| 0| 0| 137147| 2165| 221533| 25649| 1476| 0| 2| 0| 0| 0| 394| 116| 0| 1593| 1590| 0| 0| 131760| 2274| 221139| 25682| 1368| 0| 2| 0| 0| 0| 432| 111| 0| 1481| 1477| 0| 0| 130734| 2321| 152901| 24897| 2242| 0| 6| 0| 0| 0| 367| 157| 0| 2405| 2402| 0| 0| 68008| 1245| 190000| 25295| 1747| 0| 1| 0| 0| 0| 403| 143| 0| 1891| 1888| 0| 0| 69542| 916| [ 586.911061] dst_total: 225846 delayed: 4 work_perf: 0 expires: 95000 elapsed: 1 us [ 681.897473] dst_total: 221226 delayed: 4 work_perf: 0 expires: 105000 elapsed: 1 us [ 786.886616] dst_total: 145402 delayed: 59222 work_perf: 0 expires: 600 elapsed: 5893 us [ 787.491390] dst_total: 145565 delayed: 59587 work_perf: 0 expires: 600 elapsed: 5783 us [ 788.096098] dst_total: 145724 delayed: 59888 work_perf: 0 expires: 600 elapsed: 5582 us [ 788.701111] dst_total: 145983 delayed: 60148 work_perf: 0 expires: 600 elapsed: 5685 us [ 789.305991] dst_total: 146120 delayed: 60412 work_perf: 0 expires: 600 elapsed: 5661 us [ 789.910802] dst_total: 146432 delayed: 60643 work_perf: 0 expires: 600 elapsed: 5566 us [ 790.515868] dst_total: 146558 delayed: 61009 work_perf: 0 expires: 600 elapsed: 5720 us [ 791.120788] dst_total: 146781 delayed: 61250 work_perf: 0 expires: 600 elapsed: 5714 us [ 791.725773] dst_total: 147075 delayed: 61488 work_perf: 0 expires: 600 elapsed: 5806 us [ 792.330558] dst_total: 147404 delayed: 61766 work_perf: 0 expires: 600 elapsed: 5685 us [ 792.935548] dst_total: 147761 delayed: 61978 work_perf: 0 expires: 600 elapsed: 5765 us [ 793.540664] dst_total: 147932 delayed: 62339 work_perf: 0 expires: 600 elapsed: 5966 us [ 794.145640] dst_total: 148181 delayed: 62632 work_perf: 0 expires: 600 elapsed: 6046 us [ 794.751394] dst_total: 148507 delayed: 62865 work_perf: 0 expires: 600 elapsed: 5886 us [ 795.356327] dst_total: 148940 delayed: 63123 work_perf: 0 expires: 600 elapsed: 5913 us [ 795.961312] dst_total: 149304 delayed: 63342 work_perf: 0 expires: 600 elapsed: 5988 us [ 796.566230] dst_total: 149471 delayed: 63619 work_perf: 0 expires: 600 elapsed: 6002 us [ 797.172182] dst_total: 149819 delayed: 63853 work_perf: 0 expires: 600 elapsed: 6046 us [ 797.778163] dst_total: 150183 delayed: 64108 work_perf: 0 expires: 600 elapsed: 6117 us [ 798.384008] dst_total: 150492 delayed: 64260 work_perf: 0 expires: 600 elapsed: 6052 us [ 798.990028] dst_total: 150702 delayed: 64535 work_perf: 0 expires: 600 elapsed: 6167 us [ 799.595797] dst_total: 151109 delayed: 64691 work_perf: 0 expires: 600 elapsed: 6026 us [ 800.201763] dst_total: 151498 delayed: 64864 work_perf: 0 expires: 600 elapsed: 6084 us [ 800.807742] dst_total: 151861 delayed: 65075 work_perf: 0 expires: 600 elapsed: 6155 us [ 801.413577] dst_total: 152344 delayed: 65245 work_perf: 0 expires: 600 elapsed: 6079 us [ 802.019727] dst_total: 152608 delayed: 65498 work_perf: 0 expires: 600 elapsed: 6323 us [ 802.625410] dst_total: 153049 delayed: 65676 work_perf: 1 expires: 600 elapsed: 6099 us [ 803.231251] dst_total: 153390 delayed: 65835 work_perf: 0 expires: 600 elapsed: 6024 us [ 803.837186] dst_total: 153855 delayed: 65990 work_perf: 0 expires: 600 elapsed: 6056 us [ 804.443457] dst_total: 154294 delayed: 66133 work_perf: 0 expires: 600 elapsed: 6422 us [ 805.049167] dst_total: 154548 delayed: 66378 work_perf: 0 expires: 600 elapsed: 6221 us [ 805.655036] dst_total: 155001 delayed: 66499 work_perf: 0 expires: 600 elapsed: 6183 us [ 806.260986] dst_total: 155418 delayed: 66642 work_perf: 0 expires: 600 elapsed: 6229 us [ 806.866893] dst_total: 155892 delayed: 66768 work_perf: 0 expires: 600 elapsed: 6224 us [ 807.473041] dst_total: 156268 delayed: 66884 work_perf: 0 expires: 600 elapsed: 6463 us [ 808.078979] dst_total: 156514 delayed: 67093 work_perf: 0 expires: 600 elapsed: 6494 us [ 808.684670] dst_total: 156927 delayed: 67226 work_perf: 0 expires: 600 elapsed: 6275 us [ 809.290646] dst_total: 157364 delayed: 67327 work_perf: 0 expires: 600 elapsed: 6345 us [ 809.896516] dst_total: 157724 delayed: 67428 work_perf: 0 expires: 600 elapsed: 6307 us [ 810.502756] dst_total: 157915 delayed: 67642 work_perf: 0 expires: 600 elapsed: 6641 us [ 811.108374] dst_total: 158349 delayed: 67746 work_perf: 0 expires: 600 elapsed: 6349 us [ 811.714508] dst_total: 158856 delayed: 67843 work_perf: 0 expires: 600 elapsed: 6576 us [ 812.320241] dst_total: 159315 delayed: 67948 work_perf: 0 expires: 600 elapsed: 6403 us [ 812.926305] dst_total: 159726 delayed: 68041 work_perf: 0 expires: 600 elapsed: 6553 us [ 813.532064] dst_total: 159997 delayed: 68217 work_perf: 0 expires: 600 elapsed: 6406 us [ 814.137984] dst_total: 160527 delayed: 68309 work_perf: 0 expires: 600 elapsed: 6416 us [ 814.743949] dst_total: 160994 delayed: 68404 work_perf: 0 expires: 600 elapsed: 6475 us [ 815.350120] dst_total: 161460 delayed: 68497 work_perf: 0 expires: 600 elapsed: 6739 us [ 815.955683] dst_total: 161873 delayed: 68568 work_perf: 0 expires: 600 elapsed: 6393 us [ 816.561587] dst_total: 162083 delayed: 68684 work_perf: 0 expires: 600 elapsed: 6389 us [ 817.167555] dst_total: 162613 delayed: 68748 work_perf: 0 expires: 600 elapsed: 6450 us [ 817.773653] dst_total: 163137 delayed: 68821 work_perf: 0 expires: 600 elapsed: 6637 us [ 818.379400] dst_total: 163592 delayed: 68889 work_perf: 0 expires: 600 elapsed: 6477 us [ 818.985324] dst_total: 164115 delayed: 68946 work_perf: 0 expires: 600 elapsed: 6494 us [ 819.591212] dst_total: 164223 delayed: 69019 work_perf: 0 expires: 600 elapsed: 6474 us [ 820.197032] dst_total: 164666 delayed: 69019 work_perf: 0 expires: 1600 elapsed: 6387 us [ 821.802811] dst_total: 165685 delayed: 69019 work_perf: 0 expires: 3100 elapsed: 6402 us [ 824.908547] dst_total: 167570 delayed: 69019 work_perf: 0 expires: 4967 elapsed: 6588 us [ 829.881811] dst_total: 170472 delayed: 69019 work_perf: 0 expires: 7993 elapsed: 6786 us [ 837.880254] dst_total: 175476 delayed: 69019 work_perf: 0 expires: 10993 elapsed: 6455 us [ 848.878097] dst_total: 181394 delayed: 69019 work_perf: 0 expires: 13994 elapsed: 6752 us [ 862.875836] dst_total: 190093 delayed: 69019 work_perf: 0 expires: 17994 elapsed: 6633 us [ 880.873325] dst_total: 200983 delayed: 69019 work_perf: 0 expires: 22994 elapsed: 6839 us [ 903.870309] dst_total: 207322 delayed: 69019 work_perf: 0 expires: 27993 elapsed: 6522 us On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 27 Mar 2008 19:56:09 +0100 > > > Maybe some netfilter problem ? > > > > Please tell us more about your machine ;) > > I'm beginning to suspect it's IFB and shaping somehow. > > If those variables could be eliminated by eliminating them > from the configuration, it would help diagnose this a lot. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 22:03 ` David Miller 2008-03-28 0:47 ` Denys Fedoryshchenko @ 2008-03-28 4:50 ` Denys Fedoryshchenko 2008-03-28 5:25 ` Denys Fedoryshchenko 2 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 4:50 UTC (permalink / raw) To: David Miller, dada1; +Cc: kaber, netdev, netfilter-devel After patching , without ifb and "shapers". [11807.126790] dst_total: 807311 delayed: 785973 work_perf: 0 expires: 45928 elapsed: 71977 us [11853.118939] dst_total: 810625 delayed: 785973 work_perf: 0 expires: 52928 elapsed: 72092 us [11906.110627] dst_total: 814329 delayed: 785973 work_perf: 0 expires: 59929 elapsed: 71415 us [11966.101684] dst_total: 818621 delayed: 785973 work_perf: 0 expires: 67929 elapsed: 71566 us [12034.092650] dst_total: 823641 delayed: 785973 work_perf: 0 expires: 76927 elapsed: 72856 us [12111.080909] dst_total: 829008 delayed: 785973 work_perf: 0 expires: 85927 elapsed: 73175 us [12197.066708] dst_total: 835397 delayed: 785973 work_perf: 0 expires: 94928 elapsed: 72001 us [12292.054049] dst_total: 842092 delayed: 785972 work_perf: 1 expires: 104927 elapsed: 73341 us [12397.038451] dst_total: 849712 delayed: 785972 work_perf: 0 expires: 115926 elapsed: 74072 us [12513.032054] dst_total: 858081 delayed: 785972 work_perf: 0 expires: 119915 elapsed: 77027 us [12633.016029] dst_total: 866064 delayed: 785972 work_perf: 0 expires: 119913 elapsed: 87431 us [12752.987981] dst_total: 874067 delayed: 785972 work_perf: 0 expires: 119923 elapsed: 76570 us [12872.981870] dst_total: 882006 delayed: 785972 work_perf: 0 expires: 119911 elapsed: 88655 us [12992.950062] dst_total: 889918 delayed: 785972 work_perf: 0 expires: 119924 elapsed: 76035 us rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 896238|135781150|112584149| 0| 475839| 1708| 0| 18| 3234260| 2340122| 0|115324498|115284650| 35962| 0|484427466| 15419774| On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 27 Mar 2008 19:56:09 +0100 > > > Maybe some netfilter problem ? > > > > Please tell us more about your machine ;) > > I'm beginning to suspect it's IFB and shaping somehow. > > If those variables could be eliminated by eliminating them > from the configuration, it would help diagnose this a lot. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 22:03 ` David Miller 2008-03-28 0:47 ` Denys Fedoryshchenko 2008-03-28 4:50 ` Denys Fedoryshchenko @ 2008-03-28 5:25 ` Denys Fedoryshchenko 2008-03-28 5:49 ` Eric Dumazet 2 siblings, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 5:25 UTC (permalink / raw) To: David Miller, dada1; +Cc: kaber, netdev, netfilter-devel Just to make sure 2.6.24.3 is stable and it is regression i am supplying output from it. Do you want me to submit summary to bugzilla and regression list as well? And in short, IMHO 2.6.25 have major issues on routing that have to be fixed before release. TRIE is crashing, and even with HASH there is leak. I am trying my best to bisect it, but it is major router and i cannot take much risk on it, so i wish i can simulate in my home mini-lab. Still i am not able to get even proper switch (Lebanon difficult country for IT). Kup ~ # uname -a Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown up ~ # rtstat -i60 -c6000 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 54750| 4430| 1128| 0| 12| 0| 0| 0| 263| 190| 0| 709| 708| 0| 0| 3545| 313| 92913| 8829| 1211| 0| 1| 0| 0| 0| 343| 163| 0| 1375| 1373| 0| 0| 12545| 724| 115323| 8232| 906| 0| 0| 0| 0| 0| 299| 128| 0| 1035| 1033| 0| 0| 18069| 813| 128985| 8650| 839| 0| 0| 0| 0| 0| 289| 115| 0| 954| 952| 0| 0| 22515| 845| 116682| 8911| 861| 0| 0| 0| 0| 0| 288| 117| 0| 978| 976| 0| 0| 23433| 775| 99969| 9164| 889| 0| 0| 0| 0| 0| 280| 113| 0| 1002| 1000| 0| 0| 26741| 839| 124602| 9395| 1012| 0| 0| 0| 0| 0| 271| 122| 0| 1134| 1132| 0| 0| 27381| 787| 110051| 10036| 824| 0| 0| 0| 0| 0| 279| 120| 0| 944| 942| 0| 0| 28558| 783| 126835| 10631| 772| 0| 0| 0| 0| 0| 274| 117| 0| 888| 886| 0| 0| 29451| 780| 111881| 10357| 762| 0| 0| 0| 0| 0| 275| 117| 0| 879| 877| 0| 0| 28235| 751| 127018| 10178| 796| 0| 0| 0| 0| 0| 283| 117| 0| 913| 911| 0| 0| 29480| 807| 112242| 9839| 814| 0| 0| 0| 0| 0| 293| 115| 0| 929| 927| 0| 0| 28095| 796| 41267| 9493| 1217| 0| 1| 0| 0| 0| 269| 138| 0| 811| 810| 0| 0| 18545| 548| 76380| 9722| 1060| 0| 1| 0| 0| 0| 250| 135| 0| 1195| 1193| 0| 0| 14786| 414| 99922| 9811| 779| 0| 0| 0| 0| 0| 281| 124| 0| 902| 900| 0| 0| 21853| 589| Kup ~ # rtstat -i60 -c6000 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 122053| 150955| 14888| 0| 25| 1| 0| 0| 4611| 2090| 0| 15820| 15789| 0| 0| 369513| 11562| 105226| 10215| 872| 0| 0| 0| 0| 0| 279| 116| 0| 988| 986| 0| 0| 30343| 799| 126236| 10462| 924| 0| 0| 0| 0| 0| 260| 120| 0| 1044| 1042| 0| 0| 31699| 782| 114492| 9782| 884| 0| 0| 0| 0| 0| 253| 120| 0| 1005| 1003| 0| 0| 29695| 722| After ip route flush cache Kup ~ # rtstat -i60 -c6000 rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| rt_cache| entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 9088| 202136| 19262| 0| 29| 1| 0| 0| 5976| 2696| 0| 20647| 20606| 0| 0| 521714| 15415| !!!!! I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will make sure about that now. On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 27 Mar 2008 19:56:09 +0100 > > > Maybe some netfilter problem ? > > > > Please tell us more about your machine ;) > > I'm beginning to suspect it's IFB and shaping somehow. > > If those variables could be eliminated by eliminating them > from the configuration, it would help diagnose this a lot. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 5:25 ` Denys Fedoryshchenko @ 2008-03-28 5:49 ` Eric Dumazet 2008-03-28 7:38 ` Denys Fedoryshchenko 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2008-03-28 5:49 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: David Miller, kaber, netdev, netfilter-devel Denys Fedoryshchenko a écrit : > Just to make sure 2.6.24.3 is stable and it is regression i am supplying > output from it. > Do you want me to submit summary to bugzilla and regression list as well? > > And in short, IMHO 2.6.25 have major issues on routing that have to be fixed > before release. TRIE is crashing, and even with HASH there is leak. I am > trying my best to bisect it, but it is major router and i cannot take much > risk on it, so i wish i can simulate in my home mini-lab. Still i am not able > to get even proper switch (Lebanon difficult country for IT). > > Kup ~ # uname -a > Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown > > up ~ # rtstat -i60 -c6000 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 54750| 4430| 1128| 0| 12| 0| 0| 0| > 263| 190| 0| 709| 708| 0| 0| 3545| 313| > 92913| 8829| 1211| 0| 1| 0| 0| 0| > 343| 163| 0| 1375| 1373| 0| 0| 12545| 724| > 115323| 8232| 906| 0| 0| 0| 0| 0| > 299| 128| 0| 1035| 1033| 0| 0| 18069| 813| > 128985| 8650| 839| 0| 0| 0| 0| 0| > 289| 115| 0| 954| 952| 0| 0| 22515| 845| > 116682| 8911| 861| 0| 0| 0| 0| 0| > 288| 117| 0| 978| 976| 0| 0| 23433| 775| > 99969| 9164| 889| 0| 0| 0| 0| 0| > 280| 113| 0| 1002| 1000| 0| 0| 26741| 839| > 124602| 9395| 1012| 0| 0| 0| 0| 0| > 271| 122| 0| 1134| 1132| 0| 0| 27381| 787| > 110051| 10036| 824| 0| 0| 0| 0| 0| > 279| 120| 0| 944| 942| 0| 0| 28558| 783| > 126835| 10631| 772| 0| 0| 0| 0| 0| > 274| 117| 0| 888| 886| 0| 0| 29451| 780| > 111881| 10357| 762| 0| 0| 0| 0| 0| > 275| 117| 0| 879| 877| 0| 0| 28235| 751| > 127018| 10178| 796| 0| 0| 0| 0| 0| > 283| 117| 0| 913| 911| 0| 0| 29480| 807| > 112242| 9839| 814| 0| 0| 0| 0| 0| > 293| 115| 0| 929| 927| 0| 0| 28095| 796| > 41267| 9493| 1217| 0| 1| 0| 0| 0| > 269| 138| 0| 811| 810| 0| 0| 18545| 548| > 76380| 9722| 1060| 0| 1| 0| 0| 0| > 250| 135| 0| 1195| 1193| 0| 0| 14786| 414| > 99922| 9811| 779| 0| 0| 0| 0| 0| > 281| 124| 0| 902| 900| 0| 0| 21853| 589| > > Kup ~ # rtstat -i60 -c6000 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > > 122053| 150955| 14888| 0| 25| 1| 0| 0| > 4611| 2090| 0| 15820| 15789| 0| 0| 369513| 11562| > 105226| 10215| 872| 0| 0| 0| 0| 0| > 279| 116| 0| 988| 986| 0| 0| 30343| 799| > 126236| 10462| 924| 0| 0| 0| 0| 0| > 260| 120| 0| 1044| 1042| 0| 0| 31699| 782| > 114492| 9782| 884| 0| 0| 0| 0| 0| > 253| 120| 0| 1005| 1003| 0| 0| 29695| 722| > > After ip route flush cache > Kup ~ # rtstat -i60 -c6000 > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > rt_cache| > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > out_hlis| > | | tot| mc| ute| | an_dst| > an_src| | _tot| _mc| | ed| miss| verflow| > _search|t_search| > 9088| 202136| 19262| 0| 29| 1| 0| 0| > 5976| 2696| 0| 20647| 20606| 0| 0| 521714| 15415| > > > !!!!! > I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will make > sure about that now. > Maybe you are a litle bit too fast for "ip route flush cache" :) It used to work like that : schedule a timer to start a flush in about 2 seconds. A flush meaning : scan the whole table and delete all entries. On machines with 4 millions dst entries, this was using too much time and eventually crashing. On recent kernels, each rtable entry has a special field named rt_genid, so that "ip route flush cache" doesnt have to scan the whole table, but only change the global genid. rtables entries will be deleted later, when their rt_genid is found to be different than the global genid. Please try the patch that was suggested yesterday, as it is probably the cure your router needs. http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160 Thank you -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 5:49 ` Eric Dumazet @ 2008-03-28 7:38 ` Denys Fedoryshchenko 2008-03-28 7:46 ` Eric Dumazet 0 siblings, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 7:38 UTC (permalink / raw) To: netdev, netfilter-devel Already patched and tested, it doesn't change anything. On Fri, 28 Mar 2008 06:49:53 +0100, Eric Dumazet wrote > Denys Fedoryshchenko a : > > Just to make sure 2.6.24.3 is stable and it is regression i am supplying > > output from it. > > Do you want me to submit summary to bugzilla and regression list as well? > > > > And in short, IMHO 2.6.25 have major issues on routing that have to be fixed > > before release. TRIE is crashing, and even with HASH there is leak. I am > > trying my best to bisect it, but it is major router and i cannot take much > > risk on it, so i wish i can simulate in my home mini-lab. Still i am not able > > to get even proper switch (Lebanon difficult country for IT). > > > > Kup ~ # uname -a > > Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown > > > > up ~ # rtstat -i60 -c6000 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 54750| 4430| 1128| 0| 12| 0| 0| 0| > > 263| 190| 0| 709| 708| 0| 0| 3545| 313| > > 92913| 8829| 1211| 0| 1| 0| 0| 0| > > 343| 163| 0| 1375| 1373| 0| 0| 12545| 724| > > 115323| 8232| 906| 0| 0| 0| 0| 0| > > 299| 128| 0| 1035| 1033| 0| 0| 18069| 813| > > 128985| 8650| 839| 0| 0| 0| 0| 0| > > 289| 115| 0| 954| 952| 0| 0| 22515| 845| > > 116682| 8911| 861| 0| 0| 0| 0| 0| > > 288| 117| 0| 978| 976| 0| 0| 23433| 775| > > 99969| 9164| 889| 0| 0| 0| 0| 0| > > 280| 113| 0| 1002| 1000| 0| 0| 26741| 839| > > 124602| 9395| 1012| 0| 0| 0| 0| 0| > > 271| 122| 0| 1134| 1132| 0| 0| 27381| 787| > > 110051| 10036| 824| 0| 0| 0| 0| 0| > > 279| 120| 0| 944| 942| 0| 0| 28558| 783| > > 126835| 10631| 772| 0| 0| 0| 0| 0| > > 274| 117| 0| 888| 886| 0| 0| 29451| 780| > > 111881| 10357| 762| 0| 0| 0| 0| 0| > > 275| 117| 0| 879| 877| 0| 0| 28235| 751| > > 127018| 10178| 796| 0| 0| 0| 0| 0| > > 283| 117| 0| 913| 911| 0| 0| 29480| 807| > > 112242| 9839| 814| 0| 0| 0| 0| 0| > > 293| 115| 0| 929| 927| 0| 0| 28095| 796| > > 41267| 9493| 1217| 0| 1| 0| 0| 0| > > 269| 138| 0| 811| 810| 0| 0| 18545| 548| > > 76380| 9722| 1060| 0| 1| 0| 0| 0| > > 250| 135| 0| 1195| 1193| 0| 0| 14786| 414| > > 99922| 9811| 779| 0| 0| 0| 0| 0| > > 281| 124| 0| 902| 900| 0| 0| 21853| 589| > > > > Kup ~ # rtstat -i60 -c6000 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > > > 122053| 150955| 14888| 0| 25| 1| 0| 0| > > 4611| 2090| 0| 15820| 15789| 0| 0| 369513| 11562| > > 105226| 10215| 872| 0| 0| 0| 0| 0| > > 279| 116| 0| 988| 986| 0| 0| 30343| 799| > > 126236| 10462| 924| 0| 0| 0| 0| 0| > > 260| 120| 0| 1044| 1042| 0| 0| 31699| 782| > > 114492| 9782| 884| 0| 0| 0| 0| 0| > > 253| 120| 0| 1005| 1003| 0| 0| 29695| 722| > > > > After ip route flush cache > > Kup ~ # rtstat -i60 -c6000 > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache| > > rt_cache| > > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| > > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist| > > out_hlis| > > | | tot| mc| ute| | an_dst| > > an_src| | _tot| _mc| | ed| miss| verflow| > > _search|t_search| > > 9088| 202136| 19262| 0| 29| 1| 0| 0| > > 5976| 2696| 0| 20647| 20606| 0| 0| 521714| 15415| > > > > > > !!!!! > > I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will make > > sure about that now. > > > Maybe you are a litle bit too fast for "ip route flush cache" :) > > It used to work like that : schedule a timer to start a flush in > about 2 seconds. A flush meaning : scan the whole table and delete > all entries. > > On machines with 4 millions dst entries, this was using too much > time and eventually crashing. > > On recent kernels, each rtable entry has a special field named > rt_genid, so that "ip route flush cache" doesnt have to scan the > whole table, but only change the global genid. rtables entries will > be deleted later, when their rt_genid is found to be different than > the global genid. > > Please try the patch that was suggested yesterday, as it is probably > the cure your router needs. > > http://git2.kernel.org/?p=linux/kernel/git/davem/net- 2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160 > > Thank you > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 7:38 ` Denys Fedoryshchenko @ 2008-03-28 7:46 ` Eric Dumazet 2008-03-28 9:50 ` Denys Fedoryshchenko 2008-03-28 15:57 ` Eric Dumazet 0 siblings, 2 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-28 7:46 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev, netfilter-devel Denys Fedoryshchenko a écrit : > Already patched and tested, it doesn't change anything. > > We still leak dsts somewhere. You could try git bisect, or try to patch net/core/dst.c so that dst_gc_task() (line 83) displays route informations for say 10 first entries found in the dst_busy_list (refcnt, interface, source IP, dest IP, things like that) that could ring a bell given your netfilter rules or network conf. > >> Maybe you are a litle bit too fast for "ip route flush cache" :) >> >> It used to work like that : schedule a timer to start a flush in >> about 2 seconds. A flush meaning : scan the whole table and delete >> all entries. >> >> On machines with 4 millions dst entries, this was using too much >> time and eventually crashing. >> >> On recent kernels, each rtable entry has a special field named >> rt_genid, so that "ip route flush cache" doesnt have to scan the >> whole table, but only change the global genid. rtables entries will >> be deleted later, when their rt_genid is found to be different than >> the global genid. >> >> Please try the patch that was suggested yesterday, as it is probably >> the cure your router needs. >> >> http://git2.kernel.org/?p=linux/kernel/git/davem/net- >> > 2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160 > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 7:46 ` Eric Dumazet @ 2008-03-28 9:50 ` Denys Fedoryshchenko 2008-03-28 15:57 ` Eric Dumazet 1 sibling, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 9:50 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, netfilter-devel I will try do so, And i have a bit more information, not sure if it is useful. I took risk and running now one more loaded router, without nat this time. It has significant less load, but maybe i will catch something here also. Here is dmesg i got [ 23.280155] dst_total: 4 delayed: 1 work_perf: 0 expires: 600 elapsed: 1 us [ 23.888719] dst_total: 5 delayed: 1 work_perf: 0 expires: 1600 elapsed: 1 us [ 25.489486] dst_total: 11 delayed: 0 work_perf: 1 expires: 4294967295 elapsed: 2 us [ 67.187254] dst_total: 23980 delayed: 1 work_perf: 0 expires: 600 elapsed: 2 us [ 67.807126] dst_total: 24452 delayed: 1 work_perf: 0 expires: 1600 elapsed: 3 us [ 69.453570] dst_total: 25103 delayed: 0 work_perf: 2 expires: 4294967295 elapsed: 3 us >>> here i run "ip route flush cache" [ 278.911357] dst_total: 16855 delayed: 2 work_perf: 0 expires: 600 elapsed: 2 us [ 279.530432] dst_total: 16866 delayed: 2 work_perf: 0 expires: 1600 elapsed: 1 us [ 281.197568] dst_total: 16901 delayed: 2 work_perf: 0 expires: 3100 elapsed: 1 us [ 284.425797] dst_total: 16981 delayed: 2 work_perf: 0 expires: 4981 elapsed: 1 us [ 289.665137] dst_total: 17067 delayed: 2 work_perf: 0 expires: 8000 elapsed: 1 us [ 297.960978] dst_total: 17219 delayed: 2 work_perf: 0 expires: 11000 elapsed: 1 us [ 309.379867] dst_total: 17426 delayed: 2 work_perf: 0 expires: 14100 elapsed: 2 us [ 323.972039] dst_total: 17629 delayed: 2 work_perf: 0 expires: 18196 elapsed: 2 us [ 342.831626] dst_total: 13563 delayed: 2 work_perf: 0 expires: 23000 elapsed: 2 us [ 366.592260] dst_total: 13830 delayed: 2 work_perf: 0 expires: 28000 elapsed: 2 us [ 395.753299] dst_total: 14142 delayed: 2 work_perf: 0 expires: 33000 elapsed: 2 us [ 429.952513] dst_total: 13156 delayed: 3 work_perf: 0 expires: 600 elapsed: 2 us [ 430.565783] dst_total: 13164 delayed: 3 work_perf: 0 expires: 1600 elapsed: 1 us [ 432.267868] dst_total: 13184 delayed: 3 work_perf: 0 expires: 3100 elapsed: 1 us [ 435.457375] dst_total: 13220 delayed: 3 work_perf: 0 expires: 5700 elapsed: 2 us [ 441.355880] dst_total: 13279 delayed: 3 work_perf: 0 expires: 8000 elapsed: 2 us [ 449.567527] dst_total: 13363 delayed: 3 work_perf: 0 expires: 11000 elapsed: 1 us [ 461.027956] dst_total: 13507 delayed: 3 work_perf: 0 expires: 14000 elapsed: 2 us [ 475.568361] dst_total: 12084 delayed: 3 work_perf: 0 expires: 18000 elapsed: 2 us Is expires values appearing 4294967295 normal? Btw i am doing ip route flush cache only manually, normally it never happens on this machines. On Fri, 28 Mar 2008 08:46:20 +0100, Eric Dumazet wrote > Denys Fedoryshchenko : > > Already patched and tested, it doesn't change anything. > > > > > > We still leak dsts somewhere. > > You could try git bisect, or try to patch net/core/dst.c so that > dst_gc_task() (line 83) displays > route informations for say 10 first entries found in the dst_busy_list > (refcnt, interface, source IP, dest IP, things like that) that could > ring a bell given your netfilter rules or network conf. -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 7:46 ` Eric Dumazet 2008-03-28 9:50 ` Denys Fedoryshchenko @ 2008-03-28 15:57 ` Eric Dumazet 2008-03-28 20:23 ` Stephen Hemminger 1 sibling, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2008-03-28 15:57 UTC (permalink / raw) To: Eric Dumazet; +Cc: Denys Fedoryshchenko, netdev, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 878 bytes --] Eric Dumazet a écrit : > Denys Fedoryshchenko a écrit : >> Already patched and tested, it doesn't change anything. >> >> > > We still leak dsts somewhere. > > You could try git bisect, or try to patch net/core/dst.c so that > dst_gc_task() (line 83) displays > route informations for say 10 first entries found in the dst_busy_list > (refcnt, interface, source IP, dest IP, things like that) that could > ring a bell given your netfilter rules or network conf. I cooked a patch (untested) to implement this idea : It should display lines similar to /proc/net/rt_cache (reusing the same helper function) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> include/net/dst.h | 2 + net/core/dst.c | 7 ++++++ net/ipv4/route.c | 47 ++++++++++++++++++++++++++------------------ 3 files changed, 37 insertions(+), 19 deletions(-) [-- Attachment #2: show_info.patch --] [-- Type: text/plain, Size: 4070 bytes --] diff --git a/include/net/dst.h b/include/net/dst.h index 002500e..2fe9a6c 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -103,6 +103,8 @@ struct dst_ops void (*link_failure)(struct sk_buff *); void (*update_pmtu)(struct dst_entry *dst, u32 mtu); int (*local_out)(struct sk_buff *skb); + size_t (*info_show)(struct dst_entry *, char *buff, + size_t len); int entry_size; atomic_t entries; diff --git a/net/core/dst.c b/net/core/dst.c index fe03266..08395e8 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -70,6 +70,7 @@ static void dst_gc_task(struct work_struct *work) #if RT_CACHE_DEBUG >= 2 ktime_t time_start = ktime_get(); struct timespec elapsed; + char info[128]; #endif mutex_lock(&dst_gc_mutex); @@ -82,6 +83,12 @@ loop: if (likely(atomic_read(&dst->__refcnt))) { last->next = dst; last = dst; +#if RT_CACHE_DEBUG >= 2 + if (delayed < 10 && dst->ops->info_show) { + dst->ops->info_show(dst, info, sizeof(info)); + printk(KERN_DEBUG "dstgc [%d] %s\n", delayed, info); + } +#endif delayed++; continue; } diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 230716c..ef8c3b7 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -148,6 +148,8 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst); static void ipv4_link_failure(struct sk_buff *skb); static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu); static int rt_garbage_collect(struct dst_ops *ops); +static size_t rt_info_show(struct dst_entry *dst, + char *buff, size_t maxlen); static struct dst_ops ipv4_dst_ops = { @@ -161,6 +163,7 @@ static struct dst_ops ipv4_dst_ops = { .link_failure = ipv4_link_failure, .update_pmtu = ip_rt_update_pmtu, .local_out = ip_local_out, + .info_show = rt_info_show, .entry_size = sizeof(struct rtable), .entries = ATOMIC_INIT(0), }; @@ -269,6 +272,28 @@ static unsigned int rt_hash_code(u32 daddr, u32 saddr) rt_hash_code((__force u32)(__be32)(daddr),\ (__force u32)(__be32)(saddr) ^ ((idx) << 5)) +static size_t rt_info_show(struct dst_entry *dst, char *buff, size_t maxlen) +{ + struct rtable *r = (struct rtable *)dst; + + return snprintf(buff, maxlen, "%s\t%08lX\t%08lX\t%8X\t%d\t%u\t%d\t" + "%08lX\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X", + r->u.dst.dev ? r->u.dst.dev->name : "*", + (unsigned long)r->rt_dst, (unsigned long)r->rt_gateway, + r->rt_flags, atomic_read(&r->u.dst.__refcnt), + r->u.dst.__use, 0, (unsigned long)r->rt_src, + (dst_metric(&r->u.dst, RTAX_ADVMSS) ? + (int)dst_metric(&r->u.dst, RTAX_ADVMSS) + 40 : 0), + dst_metric(&r->u.dst, RTAX_WINDOW), + (int)((dst_metric(&r->u.dst, RTAX_RTT) >> 3) + + dst_metric(&r->u.dst, RTAX_RTTVAR)), + r->fl.fl4_tos, + r->u.dst.hh ? atomic_read(&r->u.dst.hh->hh_refcnt) : -1, + r->u.dst.hh ? (r->u.dst.hh->hh_output == + dev_queue_xmit) : 0, + r->rt_spec_dst); +} + #ifdef CONFIG_PROC_FS struct rt_cache_iter_state { struct seq_net_private p; @@ -368,25 +393,9 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v) "Metric\tSource\t\tMTU\tWindow\tIRTT\tTOS\tHHRef\t" "HHUptod\tSpecDst"); else { - struct rtable *r = v; - char temp[256]; - - sprintf(temp, "%s\t%08lX\t%08lX\t%8X\t%d\t%u\t%d\t" - "%08lX\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X", - r->u.dst.dev ? r->u.dst.dev->name : "*", - (unsigned long)r->rt_dst, (unsigned long)r->rt_gateway, - r->rt_flags, atomic_read(&r->u.dst.__refcnt), - r->u.dst.__use, 0, (unsigned long)r->rt_src, - (dst_metric(&r->u.dst, RTAX_ADVMSS) ? - (int)dst_metric(&r->u.dst, RTAX_ADVMSS) + 40 : 0), - dst_metric(&r->u.dst, RTAX_WINDOW), - (int)((dst_metric(&r->u.dst, RTAX_RTT) >> 3) + - dst_metric(&r->u.dst, RTAX_RTTVAR)), - r->fl.fl4_tos, - r->u.dst.hh ? atomic_read(&r->u.dst.hh->hh_refcnt) : -1, - r->u.dst.hh ? (r->u.dst.hh->hh_output == - dev_queue_xmit) : 0, - r->rt_spec_dst); + char temp[128]; + + rt_info_show(v, temp, 128); seq_printf(seq, "%-127s\n", temp); } return 0; ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 15:57 ` Eric Dumazet @ 2008-03-28 20:23 ` Stephen Hemminger 2008-03-28 20:45 ` Denys Fedoryshchenko 0 siblings, 1 reply; 29+ messages in thread From: Stephen Hemminger @ 2008-03-28 20:23 UTC (permalink / raw) To: Eric Dumazet; +Cc: Eric Dumazet, Denys Fedoryshchenko, netdev, netfilter-devel On Fri, 28 Mar 2008 16:57:55 +0100 Eric Dumazet <dada1@cosmosbay.com> wrote: > Eric Dumazet a écrit : > > Denys Fedoryshchenko a écrit : > >> Already patched and tested, it doesn't change anything. > >> > >> > > > > We still leak dsts somewhere. > > > > You could try git bisect, or try to patch net/core/dst.c so that > > dst_gc_task() (line 83) displays > > route informations for say 10 first entries found in the dst_busy_list > > (refcnt, interface, source IP, dest IP, things like that) that could > > ring a bell given your netfilter rules or network conf. > > I cooked a patch (untested) to implement this idea : > > It should display lines similar to /proc/net/rt_cache (reusing the same > helper function) > > I wonder how much the route cache really helps when it grows so large? Robert Olsson had suggested that turning it off when routing would help. Perhaps the route cache is only really useful for local destinations? If the cost of maintaining the route cache exceeds the cost of just using the existing route table, there is no value to having a route cache. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 20:23 ` Stephen Hemminger @ 2008-03-28 20:45 ` Denys Fedoryshchenko 2008-03-28 21:17 ` Eric Dumazet 0 siblings, 1 reply; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-28 20:45 UTC (permalink / raw) To: Eric Dumazet, Stephen Hemminger; +Cc: netdev, netfilter-devel It seems or patch change something (but it is just showing debug, strange), or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC- trie working fine, HASH also i cannot see any leaks. I will have to wait 5-6 hours to make sure. After this time pass, if i will not see bug again, i will try to run kernel just with default debug like before. If it is required, i can test performance and cpu load with/without routing cache on real workload. Sure it is better to have syntetic tests before that. On Fri, 28 Mar 2008 13:23:59 -0700, Stephen Hemminger wrote > On Fri, 28 Mar 2008 16:57:55 +0100 > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > Eric Dumazet a [UTF-8?] : > > > Denys Fedoryshchenko a [UTF-8?] : > > >> Already patched and tested, it doesn't change anything. > > >> > > >> > > > > > > We still leak dsts somewhere. > > > > > > You could try git bisect, or try to patch net/core/dst.c so that > > > dst_gc_task() (line 83) displays > > > route informations for say 10 first entries found in the dst_busy_list > > > (refcnt, interface, source IP, dest IP, things like that) that could > > > ring a bell given your netfilter rules or network conf. > > > > I cooked a patch (untested) to implement this idea : > > > > It should display lines similar to /proc/net/rt_cache (reusing the same > > helper function) > > > > > > I wonder how much the route cache really helps when it grows so > large? Robert Olsson had suggested that turning it off when routing > would help. Perhaps the route cache is only really useful for local > destinations? If the cost of maintaining the route cache exceeds the > cost of just using the existing route table, there is no value to > having a route cache. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 20:45 ` Denys Fedoryshchenko @ 2008-03-28 21:17 ` Eric Dumazet 2008-03-28 23:21 ` David Miller [not found] ` <20080328220902.M5377@visp.net.lb> 0 siblings, 2 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-28 21:17 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Stephen Hemminger, netdev, netfilter-devel Denys Fedoryshchenko a écrit : > It seems or patch change something (but it is just showing debug, strange), > or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC- > trie working fine, HASH also i cannot see any leaks. Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :( commit 7c0ecc4c4f8fd90988aab8a95297b9c0038b6160 [ICMP]: Dst entry leak in icmp_send host re-lookup code (v2). Commit 8b7817f3a959ed99d7443afc12f78a7e1fcc2063 ([IPSEC]: Add ICMP host relookup support) introduced some dst leaks on error paths: the rt pointer can be forgotten to be put. Fix it bu going to a proper label. Found after net namespace's lo refused to unregister :) Many thanks to Den for valuable help during debugging. Herbert pointed out, that xfrm_lookup() will put the rtable in case of error itself, so the first goto fix is redundant. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> > > I will have to wait 5-6 hours to make sure. After this time pass, if i will > not see bug again, i will try to run kernel just with default debug like > before. > > If it is required, i can test performance and cpu load with/without routing > cache on real workload. Sure it is better to have syntetic tests before that. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 21:17 ` Eric Dumazet @ 2008-03-28 23:21 ` David Miller 2008-03-29 9:35 ` Denys Fedoryshchenko [not found] ` <20080328220902.M5377@visp.net.lb> 1 sibling, 1 reply; 29+ messages in thread From: David Miller @ 2008-03-28 23:21 UTC (permalink / raw) To: dada1; +Cc: denys, stephen.hemminger, netdev, netfilter-devel From: Eric Dumazet <dada1@cosmosbay.com> Date: Fri, 28 Mar 2008 22:17:26 +0100 > Denys Fedoryshchenko a écrit : > > It seems or patch change something (but it is just showing debug, strange), > > or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC- > > trie working fine, HASH also i cannot see any leaks. > > Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :( Denys please be more careful in the future :-( So much time and effort got wasted because of this. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-28 23:21 ` David Miller @ 2008-03-29 9:35 ` Denys Fedoryshchenko 0 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-29 9:35 UTC (permalink / raw) To: netdev, netfilter-devel I checked whole night, shame on me. Seems this patch was curing the problem. Probably it was rejected, and i didn't notice. I will be more careful, sorry again. On Fri, 28 Mar 2008 16:21:37 -0700 (PDT), David Miller wrote > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Fri, 28 Mar 2008 22:17:26 +0100 > > > Denys Fedoryshchenko a : > > > It seems or patch change something (but it is just showing debug, strange), > > > or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7- git3. LC- > > > trie working fine, HASH also i cannot see any leaks. > > > > Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :( > > Denys please be more careful in the future :-( > > So much time and effort got wasted because of this. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <20080328220902.M5377@visp.net.lb>]
* Re: kernel 2.6.25-rc7 highly unstable on high load [not found] ` <20080328220902.M5377@visp.net.lb> @ 2008-03-29 8:31 ` Eric Dumazet 0 siblings, 0 replies; 29+ messages in thread From: Eric Dumazet @ 2008-03-29 8:31 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Stephen Hemminger, netdev, netfilter-devel Denys Fedoryshchenko a écrit : > It looks like probably i did mistake while applying it. Very strange, cause i > rechecked it, and i did make clean before building system image. > I will make sure on vanilla code that everything is fine. > > Btw, i see permanently 4 "delayed"(?) records. > > [ 6261.050827] dstgc [0] lo 09E9FED5 09E9FED5 > 80000000 5 29 0 8F323353 0 0 > 0 00 -1 009E9FED5 > [ 6261.050834] dstgc [1] lo 09E9FED5 09E9FED5 > 80000000 37 358 0 FC9B62D4 0 0 > 0 00 -1 009E9FED5 > [ 6261.050838] dstgc [2] lo 09E9FED5 09E9FED5 > 80000000 4 8 0 7048DF48 0 0 > 0 00 -1 009E9FED5 > [ 6261.050842] dstgc [3] lo 09E9FED5 09E9FED5 > 80000000 6 5 0 6924E846 0 0 > 0 00 -1 009E9FED5 > [ 6261.050851] dst_total: 129123 delayed: 4 work_perf: 0 expires: 68000 > elapsed: 27 us > > Especially interesting one with refcnt 37 and dst.__use 358. I check for > around 5 minutes - there is no traffic at all. This record staying for long > time and there is no activity on it over "tcpdump -ni any host > 212.98.155.252" , and it is not flushing it over ip route flush cache. > This route with refcnt 37 has Source IP = FC9B62D4, which is 212.98.155.252 Check that you dont have some tcp sessions with this remote address : ss dst 212.98.155.252 As long as you have no activity on it, refcnt wont change. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 6:40 ` David Miller 2008-03-27 7:03 ` Eric Dumazet @ 2008-03-27 7:07 ` Denys Fedoryshchenko 2008-03-27 15:01 ` Patrick McHardy 2 siblings, 0 replies; 29+ messages in thread From: Denys Fedoryshchenko @ 2008-03-27 7:07 UTC (permalink / raw) To: David Miller; +Cc: netdev, kaber, netfilter-devel Sorry, mixed 2.6.25-rc7 (rc7 seems migrated to 2X :-)) sleepless night. I can provide image of system (running on 128MB USB flash), it is Core 2 Duo CPU, 3xe100, 1xe1000e, nat, shaping over ifb with htb, sure some iptables filtering rules. But the problem to simulate the load. Biggest problem, why it doesn't reboot with all this watchdog stuff enabled, it makes serious headache by that. Now we are installing power switch, so it will help somehow. On Wed, 26 Mar 2008 23:40:49 -0700 (PDT), David Miller wrote > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > Date: Thu, 27 Mar 2008 08:35:06 +0200 > > > It seems i am having very bad luck with 2.6.27. As Linus told, it have to be > > released soon, but it is crashing like hell on high network load. > > That's amazing, you've taken a trip into the future and are running > 2.6.27 already, please let me borrow your time machine :-) > > More seriously, there is obviously something very unique to your > setup or else everyone would be reporting this crash, and we have > to find out what that might be. > > There seems to be bunch of netfilter stuff in your traces, but > the top of the trace is somewhere totally unrelated. This is > a common reoccurance in your crash traces, making them less > useful than they could be. > > I know you asked before what can be done to improve the traces, > but I'm not an x86 expert so I have no idea how to help you > in that area. > > Patrick, could you see if you can make any sense of his log? > I see conttrack a lot in the backtraces. > > Thanks. > > > Here is a message i got over syslog on last crash (it was 2.6.25-rc6- git6), > > available also at http://www.nuclearcat.com/files/crash_2.6.25.txt > > > > Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP > > Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers: > > Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000 > > task=f7459080 task.ti=f70b7000) > > Mar 26 02:27:14 ROUTER > > Mar 26 02:27:14 ROUTER [ 4698.694693] Stack: > > Mar 26 02:27:14 ROUTER c092eb14 > > Mar 26 02:27:14 ROUTER c011991e > > Mar 26 02:27:14 ROUTER f750d600 > > Mar 26 02:27:14 ROUTER f750d600 > > Mar 26 02:27:14 ROUTER c0378058 > > Mar 26 02:27:14 ROUTER 00000001 > > Mar 26 02:27:14 ROUTER c092eb34 > > Mar 26 02:27:14 ROUTER c0119b3b > > Mar 26 02:27:14 ROUTER > > Mar 26 02:27:14 ROUTER [ 4698.694693] > > Mar 26 02:27:14 ROUTER 00000000 > > Mar 26 02:27:14 ROUTER 00000001 > > Mar 26 02:27:14 ROUTER 00000082 > > Mar 26 02:27:14 ROUTER f708af88 > > Mar 26 02:27:14 ROUTER c0378058 > > Mar 26 02:27:14 ROUTER 00000001 > > Mar 26 02:27:14 ROUTER c092eb3c > > Mar 26 02:27:14 ROUTER c0119bfe > > Mar 26 02:27:14 ROUTER > > Mar 26 02:27:14 ROUTER [ 4698.694693] > > Mar 26 02:27:14 ROUTER c092eb50 > > Mar 26 02:27:14 ROUTER c012f19c > > Mar 26 02:27:14 ROUTER 00000000 > > Mar 26 02:27:14 ROUTER f708af88 > > Mar 26 02:27:14 ROUTER c0378058 > > Mar 26 02:27:14 ROUTER c092eb74 > > Mar 26 02:27:14 ROUTER c011652a > > Mar 26 02:27:14 ROUTER 00000000 > > Mar 26 02:27:14 ROUTER > > Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace: > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER printk+0x15/0x17 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER update_curr+0x48/0x64 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat] > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5 > > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER inet_release+0x42/0x48 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER sock_release+0x14/0x60 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER sock_close+0x29/0x30 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER __fput+0x93/0x135 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER fput+0x17/0x19 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER filp_close+0x47/0x51 > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d > > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>] > > Mar 26 02:27:14 ROUTER ? > > Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85 > > Mar 26 02:27:14 ROUTER [ 4698.694694] ======================= > > Mar 26 02:27:14 ROUTER [ 4698.694694] Code: > > Mar 26 02:27:14 ROUTER 94 > > Mar 26 02:27:14 ROUTER c0 > > Mar 26 02:27:14 ROUTER 84 > > Mar 26 02:27:14 ROUTER c0 > > Mar 26 02:27:14 ROUTER b9 > > Mar 26 02:27:14 ROUTER 01 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 75 > > Mar 26 02:27:14 ROUTER 09 > > Mar 26 02:27:14 ROUTER f0 > > Mar 26 02:27:14 ROUTER 81 > > Mar 26 02:27:14 ROUTER 02 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 01 > > Mar 26 02:27:14 ROUTER 30 > > Mar 26 02:27:14 ROUTER c9 > > Mar 26 02:27:14 ROUTER 5d > > Mar 26 02:27:14 ROUTER 89 > > Mar 26 02:27:14 ROUTER c8 > > Mar 26 02:27:14 ROUTER c3 > > Mar 26 02:27:14 ROUTER 55 > > Mar 26 02:27:14 ROUTER ba > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 01 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 89 > > Mar 26 02:27:14 ROUTER e5 > > Mar 26 02:27:14 ROUTER f0 > > Mar 26 02:27:14 ROUTER 66 > > Mar 26 02:27:14 ROUTER 0f > > Mar 26 02:27:14 ROUTER c1 > > Mar 26 02:27:14 ROUTER 10 > > Mar 26 02:27:14 ROUTER 38 > > Mar 26 02:27:14 ROUTER f2 > > Mar 26 02:27:14 ROUTER 74 > > Mar 26 02:27:14 ROUTER 06 > > Mar 26 02:27:14 ROUTER f3 > > Mar 26 02:27:14 ROUTER 90 > > Mar 26 02:27:14 ROUTER unparseable log message: "<8a> " > > Mar 26 02:27:14 ROUTER 10 > > Mar 26 02:27:14 ROUTER eb > > Mar 26 02:27:14 ROUTER f6 > > Mar 26 02:27:14 ROUTER 5d > > Mar 26 02:27:14 ROUTER c3 > > Mar 26 02:27:14 ROUTER 55 > > Mar 26 02:27:14 ROUTER 89 > > Mar 26 02:27:14 ROUTER e5 > > Mar 26 02:27:14 ROUTER f0 > > Mar 26 02:27:14 ROUTER 81 > > Mar 26 02:27:14 ROUTER 28 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 00 > > Mar 26 02:27:14 ROUTER 01 > > Mar 26 02:27:14 ROUTER 74 > > Mar 26 02:27:14 ROUTER 05 > > Mar 26 02:27:14 ROUTER e8 > > Mar 26 02:27:14 ROUTER 64 > > Mar 26 02:27:14 ROUTER fd > > Mar 26 02:27:14 ROUTER > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load 2008-03-27 6:40 ` David Miller 2008-03-27 7:03 ` Eric Dumazet 2008-03-27 7:07 ` Denys Fedoryshchenko @ 2008-03-27 15:01 ` Patrick McHardy 2 siblings, 0 replies; 29+ messages in thread From: Patrick McHardy @ 2008-03-27 15:01 UTC (permalink / raw) To: David Miller; +Cc: denys, netdev, netfilter-devel David Miller wrote: > From: "Denys Fedoryshchenko" <denys@visp.net.lb> > Date: Thu, 27 Mar 2008 08:35:06 +0200 > >> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be >> released soon, but it is crashing like hell on high network load. > > That's amazing, you've taken a trip into the future and are running > 2.6.27 already, please let me borrow your time machine :-) > > More seriously, there is obviously something very unique to your > setup or else everyone would be reporting this crash, and we have > to find out what that might be. > > There seems to be bunch of netfilter stuff in your traces, but > the top of the trace is somewhere totally unrelated. This is > a common reoccurance in your crash traces, making them less > useful than they could be. > > I know you asked before what can be done to improve the traces, > but I'm not an x86 expert so I have no idea how to help you > in that area. > > Patrick, could you see if you can make any sense of his log? > I see conttrack a lot in the backtraces. The conntrack stuff looks harmless, I went through the code just to make sure, but I can't see anything wrong there. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2008-03-29 9:35 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-27 6:35 kernel 2.6.25-rc7 highly unstable on high load Denys Fedoryshchenko
2008-03-27 6:40 ` David Miller
2008-03-27 7:03 ` Eric Dumazet
2008-03-27 8:36 ` Denys Fedoryshchenko
2008-03-27 8:48 ` Denys Fedoryshchenko
2008-03-27 14:41 ` Denys Fedoryshchenko
2008-03-27 14:52 ` Eric Dumazet
[not found] ` <20080327151520.M89250@visp.net.lb>
2008-03-27 16:07 ` Eric Dumazet
2008-03-27 16:29 ` Eric Dumazet
2008-03-27 18:37 ` Denys Fedoryshchenko
2008-03-27 18:56 ` Eric Dumazet
2008-03-27 19:05 ` Denis V. Lunev
2008-03-27 22:03 ` David Miller
2008-03-28 0:47 ` Denys Fedoryshchenko
2008-03-28 4:50 ` Denys Fedoryshchenko
2008-03-28 5:25 ` Denys Fedoryshchenko
2008-03-28 5:49 ` Eric Dumazet
2008-03-28 7:38 ` Denys Fedoryshchenko
2008-03-28 7:46 ` Eric Dumazet
2008-03-28 9:50 ` Denys Fedoryshchenko
2008-03-28 15:57 ` Eric Dumazet
2008-03-28 20:23 ` Stephen Hemminger
2008-03-28 20:45 ` Denys Fedoryshchenko
2008-03-28 21:17 ` Eric Dumazet
2008-03-28 23:21 ` David Miller
2008-03-29 9:35 ` Denys Fedoryshchenko
[not found] ` <20080328220902.M5377@visp.net.lb>
2008-03-29 8:31 ` Eric Dumazet
2008-03-27 7:07 ` Denys Fedoryshchenko
2008-03-27 15:01 ` Patrick McHardy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).