* kernel 2.6.25-rc7 highly unstable on high load
@ 2008-03-27 6:35 Denys Fedoryshchenko
2008-03-27 6:40 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 6:35 UTC (permalink / raw)
To: netdev
Hi again
It seems i am having very bad luck with 2.6.27. As Linus told, it have to be
released soon, but it is crashing like hell on high network load.
Even i change FIB to HASH, still after 3-4 hours running it stuck. And
NOTHING helps. Software watchdog (hardware iTCO_wdt probably not working at
all on ICH9 Intel boards),nmi_watchdog (even panic on oops set, and reboot on
panic set too), hangcheck-timer, shell script with ping watchdog. Nothing
helps. So probably people with heavy loaded machines without IPMI and
management cards will be very "happy".
I have now watchdog just doing malloc/free loop, i will try to modify it, so
it will send \000 only when ping succeeded.
One crash i had netconsole enabled, next crash it was disabled.
Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6),
available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000
task=f7459080 task.ti=f70b7000)
Mar 26 02:27:14 ROUTER
Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
Mar 26 02:27:14 ROUTER c092eb14
Mar 26 02:27:14 ROUTER c011991e
Mar 26 02:27:14 ROUTER f750d600
Mar 26 02:27:14 ROUTER f750d600
Mar 26 02:27:14 ROUTER c0378058
Mar 26 02:27:14 ROUTER 00000001
Mar 26 02:27:14 ROUTER c092eb34
Mar 26 02:27:14 ROUTER c0119b3b
Mar 26 02:27:14 ROUTER
Mar 26 02:27:14 ROUTER [ 4698.694693]
Mar 26 02:27:14 ROUTER 00000000
Mar 26 02:27:14 ROUTER 00000001
Mar 26 02:27:14 ROUTER 00000082
Mar 26 02:27:14 ROUTER f708af88
Mar 26 02:27:14 ROUTER c0378058
Mar 26 02:27:14 ROUTER 00000001
Mar 26 02:27:14 ROUTER c092eb3c
Mar 26 02:27:14 ROUTER c0119bfe
Mar 26 02:27:14 ROUTER
Mar 26 02:27:14 ROUTER [ 4698.694693]
Mar 26 02:27:14 ROUTER c092eb50
Mar 26 02:27:14 ROUTER c012f19c
Mar 26 02:27:14 ROUTER 00000000
Mar 26 02:27:14 ROUTER f708af88
Mar 26 02:27:14 ROUTER c0378058
Mar 26 02:27:14 ROUTER c092eb74
Mar 26 02:27:14 ROUTER c011652a
Mar 26 02:27:14 ROUTER 00000000
Mar 26 02:27:14 ROUTER
Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER printk+0x15/0x17
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER __fput+0x93/0x135
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER fput+0x17/0x19
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
Mar 26 02:27:14 ROUTER ?
Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
Mar 26 02:27:14 ROUTER 94
Mar 26 02:27:14 ROUTER c0
Mar 26 02:27:14 ROUTER 84
Mar 26 02:27:14 ROUTER c0
Mar 26 02:27:14 ROUTER b9
Mar 26 02:27:14 ROUTER 01
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 75
Mar 26 02:27:14 ROUTER 09
Mar 26 02:27:14 ROUTER f0
Mar 26 02:27:14 ROUTER 81
Mar 26 02:27:14 ROUTER 02
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 01
Mar 26 02:27:14 ROUTER 30
Mar 26 02:27:14 ROUTER c9
Mar 26 02:27:14 ROUTER 5d
Mar 26 02:27:14 ROUTER 89
Mar 26 02:27:14 ROUTER c8
Mar 26 02:27:14 ROUTER c3
Mar 26 02:27:14 ROUTER 55
Mar 26 02:27:14 ROUTER ba
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 01
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 89
Mar 26 02:27:14 ROUTER e5
Mar 26 02:27:14 ROUTER f0
Mar 26 02:27:14 ROUTER 66
Mar 26 02:27:14 ROUTER 0f
Mar 26 02:27:14 ROUTER c1
Mar 26 02:27:14 ROUTER 10
Mar 26 02:27:14 ROUTER 38
Mar 26 02:27:14 ROUTER f2
Mar 26 02:27:14 ROUTER 74
Mar 26 02:27:14 ROUTER 06
Mar 26 02:27:14 ROUTER f3
Mar 26 02:27:14 ROUTER 90
Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
Mar 26 02:27:14 ROUTER 10
Mar 26 02:27:14 ROUTER eb
Mar 26 02:27:14 ROUTER f6
Mar 26 02:27:14 ROUTER 5d
Mar 26 02:27:14 ROUTER c3
Mar 26 02:27:14 ROUTER 55
Mar 26 02:27:14 ROUTER 89
Mar 26 02:27:14 ROUTER e5
Mar 26 02:27:14 ROUTER f0
Mar 26 02:27:14 ROUTER 81
Mar 26 02:27:14 ROUTER 28
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 00
Mar 26 02:27:14 ROUTER 01
Mar 26 02:27:14 ROUTER 74
Mar 26 02:27:14 ROUTER 05
Mar 26 02:27:14 ROUTER e8
Mar 26 02:27:14 ROUTER 64
Mar 26 02:27:14 ROUTER fd
Mar 26 02:27:14 ROUTER
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 6:35 kernel 2.6.25-rc7 highly unstable on high load Denys Fedoryshchenko
@ 2008-03-27 6:40 ` David Miller
2008-03-27 7:03 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: David Miller @ 2008-03-27 6:40 UTC (permalink / raw)
To: denys; +Cc: netdev, kaber, netfilter-devel
From: "Denys Fedoryshchenko" <denys@visp.net.lb>
Date: Thu, 27 Mar 2008 08:35:06 +0200
> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be
> released soon, but it is crashing like hell on high network load.
That's amazing, you've taken a trip into the future and are running
2.6.27 already, please let me borrow your time machine :-)
More seriously, there is obviously something very unique to your
setup or else everyone would be reporting this crash, and we have
to find out what that might be.
There seems to be bunch of netfilter stuff in your traces, but
the top of the trace is somewhere totally unrelated. This is
a common reoccurance in your crash traces, making them less
useful than they could be.
I know you asked before what can be done to improve the traces,
but I'm not an x86 expert so I have no idea how to help you
in that area.
Patrick, could you see if you can make any sense of his log?
I see conttrack a lot in the backtraces.
Thanks.
> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6),
> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
>
> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000
> task=f7459080 task.ti=f70b7000)
> Mar 26 02:27:14 ROUTER
> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
> Mar 26 02:27:14 ROUTER c092eb14
> Mar 26 02:27:14 ROUTER c011991e
> Mar 26 02:27:14 ROUTER f750d600
> Mar 26 02:27:14 ROUTER f750d600
> Mar 26 02:27:14 ROUTER c0378058
> Mar 26 02:27:14 ROUTER 00000001
> Mar 26 02:27:14 ROUTER c092eb34
> Mar 26 02:27:14 ROUTER c0119b3b
> Mar 26 02:27:14 ROUTER
> Mar 26 02:27:14 ROUTER [ 4698.694693]
> Mar 26 02:27:14 ROUTER 00000000
> Mar 26 02:27:14 ROUTER 00000001
> Mar 26 02:27:14 ROUTER 00000082
> Mar 26 02:27:14 ROUTER f708af88
> Mar 26 02:27:14 ROUTER c0378058
> Mar 26 02:27:14 ROUTER 00000001
> Mar 26 02:27:14 ROUTER c092eb3c
> Mar 26 02:27:14 ROUTER c0119bfe
> Mar 26 02:27:14 ROUTER
> Mar 26 02:27:14 ROUTER [ 4698.694693]
> Mar 26 02:27:14 ROUTER c092eb50
> Mar 26 02:27:14 ROUTER c012f19c
> Mar 26 02:27:14 ROUTER 00000000
> Mar 26 02:27:14 ROUTER f708af88
> Mar 26 02:27:14 ROUTER c0378058
> Mar 26 02:27:14 ROUTER c092eb74
> Mar 26 02:27:14 ROUTER c011652a
> Mar 26 02:27:14 ROUTER 00000000
> Mar 26 02:27:14 ROUTER
> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER printk+0x15/0x17
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER __fput+0x93/0x135
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER fput+0x17/0x19
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
> Mar 26 02:27:14 ROUTER ?
> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
> Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
> Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
> Mar 26 02:27:14 ROUTER 94
> Mar 26 02:27:14 ROUTER c0
> Mar 26 02:27:14 ROUTER 84
> Mar 26 02:27:14 ROUTER c0
> Mar 26 02:27:14 ROUTER b9
> Mar 26 02:27:14 ROUTER 01
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 75
> Mar 26 02:27:14 ROUTER 09
> Mar 26 02:27:14 ROUTER f0
> Mar 26 02:27:14 ROUTER 81
> Mar 26 02:27:14 ROUTER 02
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 01
> Mar 26 02:27:14 ROUTER 30
> Mar 26 02:27:14 ROUTER c9
> Mar 26 02:27:14 ROUTER 5d
> Mar 26 02:27:14 ROUTER 89
> Mar 26 02:27:14 ROUTER c8
> Mar 26 02:27:14 ROUTER c3
> Mar 26 02:27:14 ROUTER 55
> Mar 26 02:27:14 ROUTER ba
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 01
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 89
> Mar 26 02:27:14 ROUTER e5
> Mar 26 02:27:14 ROUTER f0
> Mar 26 02:27:14 ROUTER 66
> Mar 26 02:27:14 ROUTER 0f
> Mar 26 02:27:14 ROUTER c1
> Mar 26 02:27:14 ROUTER 10
> Mar 26 02:27:14 ROUTER 38
> Mar 26 02:27:14 ROUTER f2
> Mar 26 02:27:14 ROUTER 74
> Mar 26 02:27:14 ROUTER 06
> Mar 26 02:27:14 ROUTER f3
> Mar 26 02:27:14 ROUTER 90
> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
> Mar 26 02:27:14 ROUTER 10
> Mar 26 02:27:14 ROUTER eb
> Mar 26 02:27:14 ROUTER f6
> Mar 26 02:27:14 ROUTER 5d
> Mar 26 02:27:14 ROUTER c3
> Mar 26 02:27:14 ROUTER 55
> Mar 26 02:27:14 ROUTER 89
> Mar 26 02:27:14 ROUTER e5
> Mar 26 02:27:14 ROUTER f0
> Mar 26 02:27:14 ROUTER 81
> Mar 26 02:27:14 ROUTER 28
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 00
> Mar 26 02:27:14 ROUTER 01
> Mar 26 02:27:14 ROUTER 74
> Mar 26 02:27:14 ROUTER 05
> Mar 26 02:27:14 ROUTER e8
> Mar 26 02:27:14 ROUTER 64
> Mar 26 02:27:14 ROUTER fd
> Mar 26 02:27:14 ROUTER
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 6:40 ` David Miller
@ 2008-03-27 7:03 ` Eric Dumazet
2008-03-27 8:36 ` Denys Fedoryshchenko
` (2 more replies)
2008-03-27 7:07 ` Denys Fedoryshchenko
2008-03-27 15:01 ` Patrick McHardy
2 siblings, 3 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-27 7:03 UTC (permalink / raw)
To: David Miller; +Cc: denys, netdev, kaber, netfilter-devel
David Miller a écrit :
> From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> Date: Thu, 27 Mar 2008 08:35:06 +0200
>
>> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be
>> released soon, but it is crashing like hell on high network load.
>
> That's amazing, you've taken a trip into the future and are running
> 2.6.27 already, please let me borrow your time machine :-)
>
> More seriously, there is obviously something very unique to your
> setup or else everyone would be reporting this crash, and we have
> to find out what that might be.
>
> There seems to be bunch of netfilter stuff in your traces, but
> the top of the trace is somewhere totally unrelated. This is
> a common reoccurance in your crash traces, making them less
> useful than they could be.
>
> I know you asked before what can be done to improve the traces,
> but I'm not an x86 expert so I have no idea how to help you
> in that area.
>
> Patrick, could you see if you can make any sense of his log?
> I see conttrack a lot in the backtraces.
I can see rt_garbage_collect() involved here. This one might explain very long
delays in softirq processing, and eventually crashes...
Denys, could you post :
grep . /proc/sys/net/ipv4/route/*
rtstat -c1 -i10
So that we can check if you should first change route cache tunables :)
>
> Thanks.
>
>> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-git6),
>> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
>>
>> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
>> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
>> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327, ti=c092e000
>> task=f7459080 task.ti=f70b7000)
>> Mar 26 02:27:14 ROUTER
>> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
>> Mar 26 02:27:14 ROUTER c092eb14
>> Mar 26 02:27:14 ROUTER c011991e
>> Mar 26 02:27:14 ROUTER f750d600
>> Mar 26 02:27:14 ROUTER f750d600
>> Mar 26 02:27:14 ROUTER c0378058
>> Mar 26 02:27:14 ROUTER 00000001
>> Mar 26 02:27:14 ROUTER c092eb34
>> Mar 26 02:27:14 ROUTER c0119b3b
>> Mar 26 02:27:14 ROUTER
>> Mar 26 02:27:14 ROUTER [ 4698.694693]
>> Mar 26 02:27:14 ROUTER 00000000
>> Mar 26 02:27:14 ROUTER 00000001
>> Mar 26 02:27:14 ROUTER 00000082
>> Mar 26 02:27:14 ROUTER f708af88
>> Mar 26 02:27:14 ROUTER c0378058
>> Mar 26 02:27:14 ROUTER 00000001
>> Mar 26 02:27:14 ROUTER c092eb3c
>> Mar 26 02:27:14 ROUTER c0119bfe
>> Mar 26 02:27:14 ROUTER
>> Mar 26 02:27:14 ROUTER [ 4698.694693]
>> Mar 26 02:27:14 ROUTER c092eb50
>> Mar 26 02:27:14 ROUTER c012f19c
>> Mar 26 02:27:14 ROUTER 00000000
>> Mar 26 02:27:14 ROUTER f708af88
>> Mar 26 02:27:14 ROUTER c0378058
>> Mar 26 02:27:14 ROUTER c092eb74
>> Mar 26 02:27:14 ROUTER c011652a
>> Mar 26 02:27:14 ROUTER 00000000
>> Mar 26 02:27:14 ROUTER
>> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER printk+0x15/0x17
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
>> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER __fput+0x93/0x135
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER fput+0x17/0x19
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
>> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
>> Mar 26 02:27:14 ROUTER ?
>> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
>> Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
>> Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
>> Mar 26 02:27:14 ROUTER 94
>> Mar 26 02:27:14 ROUTER c0
>> Mar 26 02:27:14 ROUTER 84
>> Mar 26 02:27:14 ROUTER c0
>> Mar 26 02:27:14 ROUTER b9
>> Mar 26 02:27:14 ROUTER 01
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 75
>> Mar 26 02:27:14 ROUTER 09
>> Mar 26 02:27:14 ROUTER f0
>> Mar 26 02:27:14 ROUTER 81
>> Mar 26 02:27:14 ROUTER 02
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 01
>> Mar 26 02:27:14 ROUTER 30
>> Mar 26 02:27:14 ROUTER c9
>> Mar 26 02:27:14 ROUTER 5d
>> Mar 26 02:27:14 ROUTER 89
>> Mar 26 02:27:14 ROUTER c8
>> Mar 26 02:27:14 ROUTER c3
>> Mar 26 02:27:14 ROUTER 55
>> Mar 26 02:27:14 ROUTER ba
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 01
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 89
>> Mar 26 02:27:14 ROUTER e5
>> Mar 26 02:27:14 ROUTER f0
>> Mar 26 02:27:14 ROUTER 66
>> Mar 26 02:27:14 ROUTER 0f
>> Mar 26 02:27:14 ROUTER c1
>> Mar 26 02:27:14 ROUTER 10
>> Mar 26 02:27:14 ROUTER 38
>> Mar 26 02:27:14 ROUTER f2
>> Mar 26 02:27:14 ROUTER 74
>> Mar 26 02:27:14 ROUTER 06
>> Mar 26 02:27:14 ROUTER f3
>> Mar 26 02:27:14 ROUTER 90
>> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
>> Mar 26 02:27:14 ROUTER 10
>> Mar 26 02:27:14 ROUTER eb
>> Mar 26 02:27:14 ROUTER f6
>> Mar 26 02:27:14 ROUTER 5d
>> Mar 26 02:27:14 ROUTER c3
>> Mar 26 02:27:14 ROUTER 55
>> Mar 26 02:27:14 ROUTER 89
>> Mar 26 02:27:14 ROUTER e5
>> Mar 26 02:27:14 ROUTER f0
>> Mar 26 02:27:14 ROUTER 81
>> Mar 26 02:27:14 ROUTER 28
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 00
>> Mar 26 02:27:14 ROUTER 01
>> Mar 26 02:27:14 ROUTER 74
>> Mar 26 02:27:14 ROUTER 05
>> Mar 26 02:27:14 ROUTER e8
>> Mar 26 02:27:14 ROUTER 64
>> Mar 26 02:27:14 ROUTER fd
>> Mar 26 02:27:14 ROUTER
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 6:40 ` David Miller
2008-03-27 7:03 ` Eric Dumazet
@ 2008-03-27 7:07 ` Denys Fedoryshchenko
2008-03-27 15:01 ` Patrick McHardy
2 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 7:07 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kaber, netfilter-devel
Sorry, mixed 2.6.25-rc7 (rc7 seems migrated to 2X :-))
sleepless night.
I can provide image of system (running on 128MB USB flash), it is Core 2 Duo
CPU, 3xe100, 1xe1000e, nat, shaping over ifb with htb, sure some iptables
filtering rules. But the problem to simulate the load. Biggest problem, why
it doesn't reboot with all this watchdog stuff enabled, it makes serious
headache by that. Now we are installing power switch, so it will help somehow.
On Wed, 26 Mar 2008 23:40:49 -0700 (PDT), David Miller wrote
> From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> Date: Thu, 27 Mar 2008 08:35:06 +0200
>
> > It seems i am having very bad luck with 2.6.27. As Linus told, it have to
be
> > released soon, but it is crashing like hell on high network load.
>
> That's amazing, you've taken a trip into the future and are running
> 2.6.27 already, please let me borrow your time machine :-)
>
> More seriously, there is obviously something very unique to your
> setup or else everyone would be reporting this crash, and we have
> to find out what that might be.
>
> There seems to be bunch of netfilter stuff in your traces, but
> the top of the trace is somewhere totally unrelated. This is
> a common reoccurance in your crash traces, making them less
> useful than they could be.
>
> I know you asked before what can be done to improve the traces,
> but I'm not an x86 expert so I have no idea how to help you
> in that area.
>
> Patrick, could you see if you can make any sense of his log?
> I see conttrack a lot in the backtraces.
>
> Thanks.
>
> > Here is a message i got over syslog on last crash (it was 2.6.25-rc6-
git6),
> > available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
> >
> > Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
> > Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
> > Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327,
ti=c092e000
> > task=f7459080 task.ti=f70b7000)
> > Mar 26 02:27:14 ROUTER
> > Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
> > Mar 26 02:27:14 ROUTER c092eb14
> > Mar 26 02:27:14 ROUTER c011991e
> > Mar 26 02:27:14 ROUTER f750d600
> > Mar 26 02:27:14 ROUTER f750d600
> > Mar 26 02:27:14 ROUTER c0378058
> > Mar 26 02:27:14 ROUTER 00000001
> > Mar 26 02:27:14 ROUTER c092eb34
> > Mar 26 02:27:14 ROUTER c0119b3b
> > Mar 26 02:27:14 ROUTER
> > Mar 26 02:27:14 ROUTER [ 4698.694693]
> > Mar 26 02:27:14 ROUTER 00000000
> > Mar 26 02:27:14 ROUTER 00000001
> > Mar 26 02:27:14 ROUTER 00000082
> > Mar 26 02:27:14 ROUTER f708af88
> > Mar 26 02:27:14 ROUTER c0378058
> > Mar 26 02:27:14 ROUTER 00000001
> > Mar 26 02:27:14 ROUTER c092eb3c
> > Mar 26 02:27:14 ROUTER c0119bfe
> > Mar 26 02:27:14 ROUTER
> > Mar 26 02:27:14 ROUTER [ 4698.694693]
> > Mar 26 02:27:14 ROUTER c092eb50
> > Mar 26 02:27:14 ROUTER c012f19c
> > Mar 26 02:27:14 ROUTER 00000000
> > Mar 26 02:27:14 ROUTER f708af88
> > Mar 26 02:27:14 ROUTER c0378058
> > Mar 26 02:27:14 ROUTER c092eb74
> > Mar 26 02:27:14 ROUTER c011652a
> > Mar 26 02:27:14 ROUTER 00000000
> > Mar 26 02:27:14 ROUTER
> > Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER printk+0x15/0x17
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
> > Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER __fput+0x93/0x135
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER fput+0x17/0x19
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
> > Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
> > Mar 26 02:27:14 ROUTER ?
> > Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
> > Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
> > Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
> > Mar 26 02:27:14 ROUTER 94
> > Mar 26 02:27:14 ROUTER c0
> > Mar 26 02:27:14 ROUTER 84
> > Mar 26 02:27:14 ROUTER c0
> > Mar 26 02:27:14 ROUTER b9
> > Mar 26 02:27:14 ROUTER 01
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 75
> > Mar 26 02:27:14 ROUTER 09
> > Mar 26 02:27:14 ROUTER f0
> > Mar 26 02:27:14 ROUTER 81
> > Mar 26 02:27:14 ROUTER 02
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 01
> > Mar 26 02:27:14 ROUTER 30
> > Mar 26 02:27:14 ROUTER c9
> > Mar 26 02:27:14 ROUTER 5d
> > Mar 26 02:27:14 ROUTER 89
> > Mar 26 02:27:14 ROUTER c8
> > Mar 26 02:27:14 ROUTER c3
> > Mar 26 02:27:14 ROUTER 55
> > Mar 26 02:27:14 ROUTER ba
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 01
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 89
> > Mar 26 02:27:14 ROUTER e5
> > Mar 26 02:27:14 ROUTER f0
> > Mar 26 02:27:14 ROUTER 66
> > Mar 26 02:27:14 ROUTER 0f
> > Mar 26 02:27:14 ROUTER c1
> > Mar 26 02:27:14 ROUTER 10
> > Mar 26 02:27:14 ROUTER 38
> > Mar 26 02:27:14 ROUTER f2
> > Mar 26 02:27:14 ROUTER 74
> > Mar 26 02:27:14 ROUTER 06
> > Mar 26 02:27:14 ROUTER f3
> > Mar 26 02:27:14 ROUTER 90
> > Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
> > Mar 26 02:27:14 ROUTER 10
> > Mar 26 02:27:14 ROUTER eb
> > Mar 26 02:27:14 ROUTER f6
> > Mar 26 02:27:14 ROUTER 5d
> > Mar 26 02:27:14 ROUTER c3
> > Mar 26 02:27:14 ROUTER 55
> > Mar 26 02:27:14 ROUTER 89
> > Mar 26 02:27:14 ROUTER e5
> > Mar 26 02:27:14 ROUTER f0
> > Mar 26 02:27:14 ROUTER 81
> > Mar 26 02:27:14 ROUTER 28
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 00
> > Mar 26 02:27:14 ROUTER 01
> > Mar 26 02:27:14 ROUTER 74
> > Mar 26 02:27:14 ROUTER 05
> > Mar 26 02:27:14 ROUTER e8
> > Mar 26 02:27:14 ROUTER 64
> > Mar 26 02:27:14 ROUTER fd
> > Mar 26 02:27:14 ROUTER
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 7:03 ` Eric Dumazet
@ 2008-03-27 8:36 ` Denys Fedoryshchenko
2008-03-27 8:48 ` Denys Fedoryshchenko
2008-03-27 14:41 ` Denys Fedoryshchenko
2 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 8:36 UTC (permalink / raw)
To: netdev
Kup ~ # grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:5000
/proc/sys/net/ipv4/route/error_cost:1000
grep: /proc/sys/net/ipv4/route/flush: Permission denied
/proc/sys/net/ipv4/route/gc_elasticity:8
/proc/sys/net/ipv4/route/gc_interval:60
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:32768
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:20
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:20480
/proc/sys/net/ipv4/route/secret_interval:600
Kup ~ # rtstat -c1 -i10
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
247734| 6225923| 482238| 0| 1472| 32| 0| 0|
75983| 30203| 0| 508575| 507658| 4| 0|18714805| 250265|
Kup ~ # rtstat -c1 -i10
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
246851| 6356930| 496921| 0| 1535| 33| 0| 0|
77469| 30908| 0| 524018| 523085| 5| 0|18873741| 252563|
On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote
> David Miller a :
> > From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> > Date: Thu, 27 Mar 2008 08:35:06 +0200
> >
> >> It seems i am having very bad luck with 2.6.27. As Linus told, it have
to be
> >> released soon, but it is crashing like hell on high network load.
> >
> > That's amazing, you've taken a trip into the future and are running
> > 2.6.27 already, please let me borrow your time machine :-)
> >
> > More seriously, there is obviously something very unique to your
> > setup or else everyone would be reporting this crash, and we have
> > to find out what that might be.
> >
> > There seems to be bunch of netfilter stuff in your traces, but
> > the top of the trace is somewhere totally unrelated. This is
> > a common reoccurance in your crash traces, making them less
> > useful than they could be.
> >
> > I know you asked before what can be done to improve the traces,
> > but I'm not an x86 expert so I have no idea how to help you
> > in that area.
> >
> > Patrick, could you see if you can make any sense of his log?
> > I see conttrack a lot in the backtraces.
>
> I can see rt_garbage_collect() involved here. This one might explain
> very long delays in softirq processing, and eventually crashes...
>
> Denys, could you post :
>
> grep . /proc/sys/net/ipv4/route/*
>
> rtstat -c1 -i10
>
> So that we can check if you should first change route cache tunables
> :)
>
> >
> > Thanks.
> >
> >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-
git6),
> >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
> >>
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
> >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327,
ti=c092e000
> >> task=f7459080 task.ti=f70b7000)
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
> >> Mar 26 02:27:14 ROUTER c092eb14
> >> Mar 26 02:27:14 ROUTER c011991e
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb34
> >> Mar 26 02:27:14 ROUTER c0119b3b
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER 00000082
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb3c
> >> Mar 26 02:27:14 ROUTER c0119bfe
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER c092eb50
> >> Mar 26 02:27:14 ROUTER c012f19c
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER c092eb74
> >> Mar 26 02:27:14 ROUTER c011652a
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER printk+0x15/0x17
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER fput+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
> >> Mar 26 02:27:14 ROUTER 94
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER 84
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER b9
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 75
> >> Mar 26 02:27:14 ROUTER 09
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 02
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 30
> >> Mar 26 02:27:14 ROUTER c9
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER c8
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER ba
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 66
> >> Mar 26 02:27:14 ROUTER 0f
> >> Mar 26 02:27:14 ROUTER c1
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER 38
> >> Mar 26 02:27:14 ROUTER f2
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 06
> >> Mar 26 02:27:14 ROUTER f3
> >> Mar 26 02:27:14 ROUTER 90
> >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER eb
> >> Mar 26 02:27:14 ROUTER f6
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 28
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 05
> >> Mar 26 02:27:14 ROUTER e8
> >> Mar 26 02:27:14 ROUTER 64
> >> Mar 26 02:27:14 ROUTER fd
> >> Mar 26 02:27:14 ROUTER
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 7:03 ` Eric Dumazet
2008-03-27 8:36 ` Denys Fedoryshchenko
@ 2008-03-27 8:48 ` Denys Fedoryshchenko
2008-03-27 14:41 ` Denys Fedoryshchenko
2 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 8:48 UTC (permalink / raw)
To: Eric Dumazet, David Miller; +Cc: netdev, kaber, netfilter-devel
After some time
Kup /config # grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:5000
/proc/sys/net/ipv4/route/error_cost:1000
grep: /proc/sys/net/ipv4/route/flush: Permission denied
/proc/sys/net/ipv4/route/gc_elasticity:8
/proc/sys/net/ipv4/route/gc_interval:60
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:32768
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:20
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:20480
/proc/sys/net/ipv4/route/secret_interval:600
Kup /config #
Kup /config # rtstat -c1 -i10
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
337510| 8928889| 1855387| 0| 14286| 69| 0| 0|
110840| 58108| 0| 1922232| 1920809| 377| 0|20908744|
294715|
Kup /config #
On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote
> David Miller:
> > From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> > Date: Thu, 27 Mar 2008 08:35:06 +0200
> >
> >> It seems i am having very bad luck with 2.6.27. As Linus told, it have
to be
> >> released soon, but it is crashing like hell on high network load.
> >
> > That's amazing, you've taken a trip into the future and are running
> > 2.6.27 already, please let me borrow your time machine :-)
> >
> > More seriously, there is obviously something very unique to your
> > setup or else everyone would be reporting this crash, and we have
> > to find out what that might be.
> >
> > There seems to be bunch of netfilter stuff in your traces, but
> > the top of the trace is somewhere totally unrelated. This is
> > a common reoccurance in your crash traces, making them less
> > useful than they could be.
> >
> > I know you asked before what can be done to improve the traces,
> > but I'm not an x86 expert so I have no idea how to help you
> > in that area.
> >
> > Patrick, could you see if you can make any sense of his log?
> > I see conttrack a lot in the backtraces.
>
> I can see rt_garbage_collect() involved here. This one might explain
> very long delays in softirq processing, and eventually crashes...
>
> Denys, could you post :
>
> grep . /proc/sys/net/ipv4/route/*
>
> rtstat -c1 -i10
>
> So that we can check if you should first change route cache tunables
> :)
>
> >
> > Thanks.
> >
> >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-
git6),
> >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
> >>
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
> >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327,
ti=c092e000
> >> task=f7459080 task.ti=f70b7000)
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
> >> Mar 26 02:27:14 ROUTER c092eb14
> >> Mar 26 02:27:14 ROUTER c011991e
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb34
> >> Mar 26 02:27:14 ROUTER c0119b3b
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER 00000082
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb3c
> >> Mar 26 02:27:14 ROUTER c0119bfe
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER c092eb50
> >> Mar 26 02:27:14 ROUTER c012f19c
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER c092eb74
> >> Mar 26 02:27:14 ROUTER c011652a
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER printk+0x15/0x17
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER fput+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
> >> Mar 26 02:27:14 ROUTER 94
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER 84
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER b9
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 75
> >> Mar 26 02:27:14 ROUTER 09
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 02
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 30
> >> Mar 26 02:27:14 ROUTER c9
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER c8
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER ba
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 66
> >> Mar 26 02:27:14 ROUTER 0f
> >> Mar 26 02:27:14 ROUTER c1
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER 38
> >> Mar 26 02:27:14 ROUTER f2
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 06
> >> Mar 26 02:27:14 ROUTER f3
> >> Mar 26 02:27:14 ROUTER 90
> >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER eb
> >> Mar 26 02:27:14 ROUTER f6
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 28
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 05
> >> Mar 26 02:27:14 ROUTER e8
> >> Mar 26 02:27:14 ROUTER 64
> >> Mar 26 02:27:14 ROUTER fd
> >> Mar 26 02:27:14 ROUTER
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 7:03 ` Eric Dumazet
2008-03-27 8:36 ` Denys Fedoryshchenko
2008-03-27 8:48 ` Denys Fedoryshchenko
@ 2008-03-27 14:41 ` Denys Fedoryshchenko
2008-03-27 14:52 ` Eric Dumazet
2 siblings, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 14:41 UTC (permalink / raw)
To: Eric Dumazet, David Miller; +Cc: netdev, kaber, netfilter-devel
Yes, it seems related to routing. Before such thing was not happening (maybe
because TRIE was operating better?).
Here is info at "peak time", i disable nmi_watchdog now, so garbage collector
will not be triggered by nmi watchdog.
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
247011| 2255410| 162866| 0| 610| 5| 0| 0|
33761| 8574| 0| 165475| 165307| 0| 0|10407428| 174948|
251810| 43087| 2782| 0| 9| 0| 0| 0|
618| 122| 0| 2912| 2909| 0| 0| 241946| 3796|
256277| 43035| 2739| 0| 9| 0| 0| 0|
595| 121| 0| 2867| 2864| 0| 0| 243724| 3748|
260177| 43596| 2647| 0| 8| 0| 0| 0|
672| 123| 0| 2778| 2776| 0| 0| 246880| 4048|
232741| 42270| 2759| 0| 15| 0| 0| 0|
665| 135| 0| 2910| 2907| 0| 0| 233990| 3938|
226623| 42615| 2792| 0| 11| 0| 0| 0|
723| 132| 0| 2935| 2932| 0| 0| 218378| 3862|
233190| 42397| 2778| 0| 8| 0| 0| 0|
675| 128| 0| 2913| 2909| 0| 0| 214258| 3703|
239093| 42342| 2713| 0| 9| 0| 0| 0|
764| 126| 0| 2847| 2845| 0| 0| 216453| 4080|
150539| 36992| 7564| 0| 58| 0| 0| 0|
539| 283| 0| 7902| 7900| 0| 0| 93459| 1996|
154441| 40258| 4422| 0| 12| 0| 0| 0|
655| 212| 0| 4644| 4640| 0| 0| 77271| 1458|
On one "crash" i got
Mar 27 11:27:15 205.177.186.74 [ 7141.156100] dst cache overflow
Mar 27 11:27:15 205.177.186.74 [ 7141.002291] dst cache overflow
Mar 27 11:27:15 205.177.186.74 [ 7141.162756] dst cache overflow
So i increase now routes amount more than 512K to 5120K, and will see more.
On Thu, 27 Mar 2008 08:03:25 +0100, Eric Dumazet wrote
> David Miller:
> > From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> > Date: Thu, 27 Mar 2008 08:35:06 +0200
> >
> >> It seems i am having very bad luck with 2.6.27. As Linus told, it have
to be
> >> released soon, but it is crashing like hell on high network load.
> >
> > That's amazing, you've taken a trip into the future and are running
> > 2.6.27 already, please let me borrow your time machine :-)
> >
> > More seriously, there is obviously something very unique to your
> > setup or else everyone would be reporting this crash, and we have
> > to find out what that might be.
> >
> > There seems to be bunch of netfilter stuff in your traces, but
> > the top of the trace is somewhere totally unrelated. This is
> > a common reoccurance in your crash traces, making them less
> > useful than they could be.
> >
> > I know you asked before what can be done to improve the traces,
> > but I'm not an x86 expert so I have no idea how to help you
> > in that area.
> >
> > Patrick, could you see if you can make any sense of his log?
> > I see conttrack a lot in the backtraces.
>
> I can see rt_garbage_collect() involved here. This one might explain
> very long delays in softirq processing, and eventually crashes...
>
> Denys, could you post :
>
> grep . /proc/sys/net/ipv4/route/*
>
> rtstat -c1 -i10
>
> So that we can check if you should first change route cache tunables
> :)
>
> >
> > Thanks.
> >
> >> Here is a message i got over syslog on last crash (it was 2.6.25-rc6-
git6),
> >> available also at http://www.nuclearcat.com/files/crash_2.6.25.txt
> >>
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] BUG: NMI Watchdog detected LOCKUP
> >> Mar 26 02:27:14 ROUTER on CPU1, ip c02ad109, registers:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Process snmpd (pid: 2327,
ti=c092e000
> >> task=f7459080 task.ti=f70b7000)
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Stack:
> >> Mar 26 02:27:14 ROUTER c092eb14
> >> Mar 26 02:27:14 ROUTER c011991e
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER f750d600
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb34
> >> Mar 26 02:27:14 ROUTER c0119b3b
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER 00000082
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER 00000001
> >> Mar 26 02:27:14 ROUTER c092eb3c
> >> Mar 26 02:27:14 ROUTER c0119bfe
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693]
> >> Mar 26 02:27:14 ROUTER c092eb50
> >> Mar 26 02:27:14 ROUTER c012f19c
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER f708af88
> >> Mar 26 02:27:14 ROUTER c0378058
> >> Mar 26 02:27:14 ROUTER c092eb74
> >> Mar 26 02:27:14 ROUTER c011652a
> >> Mar 26 02:27:14 ROUTER 00000000
> >> Mar 26 02:27:14 ROUTER
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] Call Trace:
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011991e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER task_rq_lock+0x31/0x58
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119b3b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x19/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bfe>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER default_wake_function+0xb/0xd
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012f19c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER autoremove_wake_function+0xf/0x33
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011652a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up_common+0x2f/0x5a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01189b8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __wake_up+0x28/0x3b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01201a3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_klogd+0x2e/0x31
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c012033d>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER release_console_sem+0x197/0x19f
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0120747>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER vprintk+0x295/0x2e5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899634c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER death_by_timeout+0x8b/0xa3 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8999d08>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tcp_packet+0x931/0x9e5 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01207ac>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER printk+0x15/0x17
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011fb65>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER warn_on_slowpath+0x2a/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011764a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __update_rq_clock+0x1c/0x126
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0116ab3>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_curr+0x48/0x64
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f89961ed>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_ct_invert_tuple+0x63/0x6f [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996cca>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_tuple_taken+0xf8/0x100 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f899850c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __nf_ct_helper_find+0x2c/0x90 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8996b95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_conntrack_alter_reply+0x4a/0x87 [nf_conntrack]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<f8974976>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER nf_nat_setup_info+0x3cc/0x55a [nf_nat]
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011701c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_entity+0x88/0x171
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117127>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dequeue_rt_stack+0x22/0x27
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0117425>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task_rt+0x19/0x2c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c011617f>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER enqueue_task+0xd/0x18
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01161c0>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER activate_task+0x1e/0x2b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119bb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER try_to_wake_up+0x8f/0xd1
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0119c1b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER wake_up_process+0xf/0x11
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c013dfa1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER softlockup_tick+0x9d/0x10b
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0126f5c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER run_local_timers+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c01272fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER update_process_times+0x24/0x49
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f4c>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_periodic+0x62/0x6e
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0135f71>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER tick_handle_periodic+0x19/0x68
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c010e87b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER smp_apic_timer_interrupt+0x6c/0x81
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0104344>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER apic_timer_interrupt+0x28/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02ad202>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER _spin_lock_bh+0x20/0x22
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02751fa>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER rt_garbage_collect+0x132/0x27a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0262d95>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER dst_alloc+0x19/0x63
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0276eb1>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_route_input+0x6b9/0xbd9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278898>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv_finish+0x2c/0x29a
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0278ef8>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER ip_rcv+0x202/0x22c
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c025ee4e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER netif_receive_skb+0x33e/0x3a9
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c02612c2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER process_backlog+0x62/0xb5
> >> Mar 26 02:27:14 ROUTER [ 4698.694693] [<c0260d27>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER net_rx_action+0x8f/0x191
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01240a7>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __do_softirq+0x64/0xcd
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0105f0a>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER do_softirq+0x55/0x89
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0123f88>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER local_bh_enable+0x61/0x6d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257689>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER lock_sock_nested+0x83/0x8b
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0292e58>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_destroy_sock+0xd/0x20
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0257b9e>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sk_common_release+0x15/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02924a4>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER udp_lib_close+0x8/0xa
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0299006>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER inet_release+0x42/0x48
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c025625b>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_release+0x14/0x60
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c02565d9>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sock_close+0x29/0x30
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a6a2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER __fput+0x93/0x135
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c015a8e2>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER fput+0x17/0x19
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c01583dc>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER filp_close+0x47/0x51
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0159414>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sys_close+0x68/0x9d
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] [<c0103876>]
> >> Mar 26 02:27:14 ROUTER ?
> >> Mar 26 02:27:14 ROUTER sysenter_past_esp+0x5f/0x85
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] =======================
> >> Mar 26 02:27:14 ROUTER [ 4698.694694] Code:
> >> Mar 26 02:27:14 ROUTER 94
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER 84
> >> Mar 26 02:27:14 ROUTER c0
> >> Mar 26 02:27:14 ROUTER b9
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 75
> >> Mar 26 02:27:14 ROUTER 09
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 02
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 30
> >> Mar 26 02:27:14 ROUTER c9
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER c8
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER ba
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 66
> >> Mar 26 02:27:14 ROUTER 0f
> >> Mar 26 02:27:14 ROUTER c1
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER 38
> >> Mar 26 02:27:14 ROUTER f2
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 06
> >> Mar 26 02:27:14 ROUTER f3
> >> Mar 26 02:27:14 ROUTER 90
> >> Mar 26 02:27:14 ROUTER unparseable log message: "<8a> "
> >> Mar 26 02:27:14 ROUTER 10
> >> Mar 26 02:27:14 ROUTER eb
> >> Mar 26 02:27:14 ROUTER f6
> >> Mar 26 02:27:14 ROUTER 5d
> >> Mar 26 02:27:14 ROUTER c3
> >> Mar 26 02:27:14 ROUTER 55
> >> Mar 26 02:27:14 ROUTER 89
> >> Mar 26 02:27:14 ROUTER e5
> >> Mar 26 02:27:14 ROUTER f0
> >> Mar 26 02:27:14 ROUTER 81
> >> Mar 26 02:27:14 ROUTER 28
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 00
> >> Mar 26 02:27:14 ROUTER 01
> >> Mar 26 02:27:14 ROUTER 74
> >> Mar 26 02:27:14 ROUTER 05
> >> Mar 26 02:27:14 ROUTER e8
> >> Mar 26 02:27:14 ROUTER 64
> >> Mar 26 02:27:14 ROUTER fd
> >> Mar 26 02:27:14 ROUTER
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 14:41 ` Denys Fedoryshchenko
@ 2008-03-27 14:52 ` Eric Dumazet
[not found] ` <20080327151520.M89250@visp.net.lb>
0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2008-03-27 14:52 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: David Miller, netdev, kaber, netfilter-devel
Denys Fedoryshchenko a écrit :
> Yes, it seems related to routing. Before such thing was not happening (maybe
> because TRIE was operating better?).
>
> Here is info at "peak time", i disable nmi_watchdog now, so garbage collector
> will not be triggered by nmi watchdog.
>
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 247011| 2255410| 162866| 0| 610| 5| 0| 0|
> 33761| 8574| 0| 165475| 165307| 0| 0|10407428| 174948|
> 251810| 43087| 2782| 0| 9| 0| 0| 0|
> 618| 122| 0| 2912| 2909| 0| 0| 241946| 3796|
> 256277| 43035| 2739| 0| 9| 0| 0| 0|
> 595| 121| 0| 2867| 2864| 0| 0| 243724| 3748|
> 260177| 43596| 2647| 0| 8| 0| 0| 0|
> 672| 123| 0| 2778| 2776| 0| 0| 246880| 4048|
> 232741| 42270| 2759| 0| 15| 0| 0| 0|
> 665| 135| 0| 2910| 2907| 0| 0| 233990| 3938|
> 226623| 42615| 2792| 0| 11| 0| 0| 0|
> 723| 132| 0| 2935| 2932| 0| 0| 218378| 3862|
> 233190| 42397| 2778| 0| 8| 0| 0| 0|
> 675| 128| 0| 2913| 2909| 0| 0| 214258| 3703|
> 239093| 42342| 2713| 0| 9| 0| 0| 0|
> 764| 126| 0| 2847| 2845| 0| 0| 216453| 4080|
> 150539| 36992| 7564| 0| 58| 0| 0| 0|
> 539| 283| 0| 7902| 7900| 0| 0| 93459| 1996|
> 154441| 40258| 4422| 0| 12| 0| 0| 0|
> 655| 212| 0| 4644| 4640| 0| 0| 77271| 1458|
>
> On one "crash" i got
> Mar 27 11:27:15 205.177.186.74 [ 7141.156100] dst cache overflow
> Mar 27 11:27:15 205.177.186.74 [ 7141.002291] dst cache overflow
> Mar 27 11:27:15 205.177.186.74 [ 7141.162756] dst cache overflow
>
> So i increase now routes amount more than 512K to 5120K, and will see more.
>
You want to tune route cache for your special needs, and not permit it
to store 5 millions entries !
# default is a gc every 60 seconds, not good for large caches
echo 1 >/proc/sys/net/ipv4/route/gc_interval
# default is 8 entries per slot..
echo 4 >/proc/sys/net/ipv4/route/gc_elasticity
# avoid a flush every 10 minutes
echo 3600 >/proc/sys/net/ipv4/route/secret_interval
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 6:40 ` David Miller
2008-03-27 7:03 ` Eric Dumazet
2008-03-27 7:07 ` Denys Fedoryshchenko
@ 2008-03-27 15:01 ` Patrick McHardy
2 siblings, 0 replies; 29+ messages in thread
From: Patrick McHardy @ 2008-03-27 15:01 UTC (permalink / raw)
To: David Miller; +Cc: denys, netdev, netfilter-devel
David Miller wrote:
> From: "Denys Fedoryshchenko" <denys@visp.net.lb>
> Date: Thu, 27 Mar 2008 08:35:06 +0200
>
>> It seems i am having very bad luck with 2.6.27. As Linus told, it have to be
>> released soon, but it is crashing like hell on high network load.
>
> That's amazing, you've taken a trip into the future and are running
> 2.6.27 already, please let me borrow your time machine :-)
>
> More seriously, there is obviously something very unique to your
> setup or else everyone would be reporting this crash, and we have
> to find out what that might be.
>
> There seems to be bunch of netfilter stuff in your traces, but
> the top of the trace is somewhere totally unrelated. This is
> a common reoccurance in your crash traces, making them less
> useful than they could be.
>
> I know you asked before what can be done to improve the traces,
> but I'm not an x86 expert so I have no idea how to help you
> in that area.
>
> Patrick, could you see if you can make any sense of his log?
> I see conttrack a lot in the backtraces.
The conntrack stuff looks harmless, I went through the code just
to make sure, but I can't see anything wrong there.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
[not found] ` <20080327151520.M89250@visp.net.lb>
@ 2008-03-27 16:07 ` Eric Dumazet
2008-03-27 16:29 ` Eric Dumazet
2008-03-27 18:37 ` Denys Fedoryshchenko
0 siblings, 2 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-27 16:07 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: David Miller, netdev, kaber, netfilter-devel
Denys Fedoryshchenko a écrit :
> I did this tuning. Not sure what does mean rt_cache_entries, but seems they
> are still increasing.
>
> Kup /config # rtstat -c1000 -i60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 274234| 1590009| 271953| 0| 1809| 6| 0| 0|
> 24551| 7365| 0| 279991| 279867| 24| 0| 5438630| 89656|
> 280535| 25463| 19956| 0| 187| 0| 0| 0|
> 485| 263| 0| 20407| 20404| 3| 0| 12796| 242|
> 288072| 25367| 19830| 0| 163| 0| 0| 0|
> 493| 256| 0| 20248| 20245| 3| 0| 17146| 346|
> 294943| 25689| 19807| 0| 154| 0| 0| 0|
> 438| 250| 0| 20211| 20208| 3| 0| 21554| 432|
> 301118| 25287| 19645| 0| 170| 0| 0| 0|
> 466| 236| 0| 20052| 20048| 3| 0| 25713| 528|
> 307651| 25663| 20264| 0| 180| 0| 0| 0|
> 472| 234| 0| 20678| 20675| 3| 0| 30588| 607|
> 313767| 25810| 20014| 0| 175| 0| 0| 0|
> 459| 237| 0| 20425| 20422| 3| 0| 33701| 699|
>
>
>
Then try to lower gc_elasticity, to 3, or even 2
echo 3 >/proc/sys/net/ipv4/route/gc_elasticity
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 16:07 ` Eric Dumazet
@ 2008-03-27 16:29 ` Eric Dumazet
2008-03-27 18:37 ` Denys Fedoryshchenko
1 sibling, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-27 16:29 UTC (permalink / raw)
To: Eric Dumazet
Cc: Denys Fedoryshchenko, David Miller, netdev, kaber,
netfilter-devel
Eric Dumazet a écrit :
> Denys Fedoryshchenko a écrit :
>> I did this tuning. Not sure what does mean rt_cache_entries, but
>> seems they are still increasing.
>>
>> Kup /config # rtstat -c1000 -i60
>> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
>> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
>> rt_cache|
>> entries| in_hit|in_slow_|in_slow_|in_no_ro|
>> in_brd|in_marti|in_marti|
>> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
>> out_hlis|
>> | | tot| mc| ute| | an_dst|
>> an_src| | _tot| _mc| | ed| miss|
>> verflow| _search|t_search|
>> 274234| 1590009| 271953| 0| 1809| 6| 0|
>> 0| 24551| 7365| 0| 279991| 279867| 24| 0|
>> 5438630| 89656|
>> 280535| 25463| 19956| 0| 187| 0| 0|
>> 0| 485| 263| 0| 20407| 20404| 3| 0|
>> 12796| 242|
>> 288072| 25367| 19830| 0| 163| 0| 0|
>> 0| 493| 256| 0| 20248| 20245| 3| 0|
>> 17146| 346|
>> 294943| 25689| 19807| 0| 154| 0| 0|
>> 0| 438| 250| 0| 20211| 20208| 3| 0|
>> 21554| 432|
>> 301118| 25287| 19645| 0| 170| 0| 0|
>> 0| 466| 236| 0| 20052| 20048| 3| 0|
>> 25713| 528|
>> 307651| 25663| 20264| 0| 180| 0| 0|
>> 0| 472| 234| 0| 20678| 20675| 3| 0|
>> 30588| 607|
>> 313767| 25810| 20014| 0| 175| 0| 0|
>> 0| 459| 237| 0| 20425| 20422| 3| 0|
>> 33701| 699|
>>
>>
>>
> Then try to lower gc_elasticity, to 3, or even 2
>
> echo 3 >/proc/sys/net/ipv4/route/gc_elasticity
>
>
One more idea before leaving :)
If rt_cache_entries still increasing, we might have a dst leak somewhere :
gc tries to evict entries that have a non null refcount -> they are put
in dst_garbage.list
Following patch will show us how dst_garbage behaves.
( printk(KERN_DEBUG "dst_total: %d delayed: %d work_perf: %d" ...)
diff --git a/net/core/dst.c b/net/core/dst.c
index 7deef48..e634e5f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -31,6 +31,8 @@
* 3) This list is guarded by a mutex,
* so that the gc_task and dst_dev_event() can be synchronized.
*/
+#undef RT_CACHE_DEBUG
+#define RT_CACHE_DEBUG 2
#if RT_CACHE_DEBUG >= 2
static atomic_t dst_total = ATOMIC_INIT(0);
#endif
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 16:07 ` Eric Dumazet
2008-03-27 16:29 ` Eric Dumazet
@ 2008-03-27 18:37 ` Denys Fedoryshchenko
2008-03-27 18:56 ` Eric Dumazet
1 sibling, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-27 18:37 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, kaber, netdev, netfilter-devel
Here is output from dmesg with patch you supplied.
Kup /config # rtstat -i60 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
103266| 69986| 11574| 0| 54| 0| 0| 0|
1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254|
124787| 45125| 6019| 0| 28| 0| 0| 0|
807| 230| 0| 6277| 6274| 0| 0| 128922| 2518|
120270| 45588| 6288| 0| 30| 0| 0| 0|
883| 214| 0| 6532| 6529| 0| 0| 125651| 2743|
122253| 46522| 6582| 0| 27| 0| 0| 0|
897| 213| 0| 6822| 6819| 0| 0| 124927| 2761|
[ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999
elapsed: 1 us
[ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998
elapsed: 2 us
[ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000
elapsed: 1 us
[ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998
elapsed: 1 us
[ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600
elapsed: 2 us
[ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696
elapsed: 6 us
After while
Kup /config # rtstat -i300 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
129138| 64166| 9864| 0| 46| 0| 0| 0|
1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985|
130125| 43971| 10829| 0| 70| 0| 0| 0|
907| 195| 0| 11093| 11090| 0| 0| 100601| 2848|
144857| 32041| 22233| 0| 234| 0| 0| 0|
738| 261| 0| 22727| 22724| 2| 0| 42769| 811|
[ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999
elapsed: 1 us
[ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600
elapsed: 11274 us
[ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600
elapsed: 11150 us
[ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100
elapsed: 10896 us
[ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655
elapsed: 11561 us
[ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988
elapsed: 12254 us
[ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988
elapsed: 12209 us
[ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987
elapsed: 12039 us
[ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987
elapsed: 12448 us
[ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988
elapsed: 11435 us
[ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987
elapsed: 12833 us
[ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987
elapsed: 12068 us
[ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988
elapsed: 11970 us
[ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989
elapsed: 11747 us
[ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988
elapsed: 12653 us
[ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987
elapsed: 12217 us
[ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987
elapsed: 12026 us
[ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988
elapsed: 11151 us
and after 30 minutes more
Kup /config # rtstat -i300 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
194841| 176962| 77964| 0| 684| 1| 0| 0|
3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455|
223224| 28262| 26499| 0| 236| 0| 0| 0|
661| 216| 0| 26951| 26947| 3| 0| 93059| 1946|
248425| 28257| 26784| 0| 226| 0| 0| 0|
678| 202| 0| 27211| 27208| 3| 0| 115593| 2412|
272823| 28943| 26728| 0| 246| 0| 0| 0|
695| 200| 0| 27174| 27171| 3| 0| 135256| 2942|
295087| 28835| 26701| 0| 252| 0| 0| 0|
687| 194| 0| 27146| 27143| 3| 0| 154269| 3312|
[ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987
elapsed: 12519 us
[ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987
elapsed: 13930 us
[ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986
elapsed: 14330 us
[ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972
elapsed: 15230 us
[ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985
elapsed: 15018 us
[ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987
elapsed: 13654 us
[ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987
elapsed: 13137 us
On Thu, 27 Mar 2008 17:07:29 +0100, Eric Dumazet wrote
> Denys Fedoryshchenko a :
> > I did this tuning. Not sure what does mean rt_cache_entries, but seems
they
> > are still increasing.
> >
> > Kup /config # rtstat -c1000 -i60
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 274234| 1590009| 271953| 0| 1809| 6| 0|
0|
> > 24551| 7365| 0| 279991| 279867| 24| 0| 5438630|
89656|
> > 280535| 25463| 19956| 0| 187| 0| 0|
0|
> > 485| 263| 0| 20407| 20404| 3| 0| 12796|
242|
> > 288072| 25367| 19830| 0| 163| 0| 0|
0|
> > 493| 256| 0| 20248| 20245| 3| 0| 17146|
346|
> > 294943| 25689| 19807| 0| 154| 0| 0|
0|
> > 438| 250| 0| 20211| 20208| 3| 0| 21554|
432|
> > 301118| 25287| 19645| 0| 170| 0| 0|
0|
> > 466| 236| 0| 20052| 20048| 3| 0| 25713|
528|
> > 307651| 25663| 20264| 0| 180| 0| 0|
0|
> > 472| 234| 0| 20678| 20675| 3| 0| 30588|
607|
> > 313767| 25810| 20014| 0| 175| 0| 0|
0|
> > 459| 237| 0| 20425| 20422| 3| 0| 33701|
699|
> >
> >
> >
> Then try to lower gc_elasticity, to 3, or even 2
>
> echo 3 >/proc/sys/net/ipv4/route/gc_elasticity
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 18:37 ` Denys Fedoryshchenko
@ 2008-03-27 18:56 ` Eric Dumazet
2008-03-27 19:05 ` Denis V. Lunev
2008-03-27 22:03 ` David Miller
0 siblings, 2 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-27 18:56 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: David Miller, kaber, netdev, netfilter-devel
Denys Fedoryshchenko a écrit :
> Here is output from dmesg with patch you supplied.
>
> Kup /config # rtstat -i60 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 103266| 69986| 11574| 0| 54| 0| 0| 0|
> 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254|
> 124787| 45125| 6019| 0| 28| 0| 0| 0|
> 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518|
> 120270| 45588| 6288| 0| 30| 0| 0| 0|
> 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743|
> 122253| 46522| 6582| 0| 27| 0| 0| 0|
> 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761|
>
> [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999
> elapsed: 1 us
> [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998
> elapsed: 2 us
> [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000
> elapsed: 1 us
> [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998
> elapsed: 1 us
> [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600
> elapsed: 2 us
> [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696
> elapsed: 6 us
>
> After while
>
> Kup /config # rtstat -i300 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 129138| 64166| 9864| 0| 46| 0| 0| 0|
> 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985|
> 130125| 43971| 10829| 0| 70| 0| 0| 0|
> 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848|
> 144857| 32041| 22233| 0| 234| 0| 0| 0|
> 738| 261| 0| 22727| 22724| 2| 0| 42769| 811|
>
> [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999
> elapsed: 1 us
> [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600
> elapsed: 11274 us
> [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600
> elapsed: 11150 us
> [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100
> elapsed: 10896 us
> [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655
> elapsed: 11561 us
> [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988
> elapsed: 12254 us
> [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988
> elapsed: 12209 us
> [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987
> elapsed: 12039 us
> [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987
> elapsed: 12448 us
> [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988
> elapsed: 11435 us
> [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987
> elapsed: 12833 us
> [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987
> elapsed: 12068 us
> [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988
> elapsed: 11970 us
> [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989
> elapsed: 11747 us
> [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988
> elapsed: 12653 us
> [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987
> elapsed: 12217 us
> [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987
> elapsed: 12026 us
> [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988
> elapsed: 11151 us
>
> and after 30 minutes more
>
> Kup /config # rtstat -i300 -c60
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 194841| 176962| 77964| 0| 684| 1| 0| 0|
> 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455|
> 223224| 28262| 26499| 0| 236| 0| 0| 0|
> 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946|
> 248425| 28257| 26784| 0| 226| 0| 0| 0|
> 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412|
> 272823| 28943| 26728| 0| 246| 0| 0| 0|
> 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942|
> 295087| 28835| 26701| 0| 252| 0| 0| 0|
> 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312|
>
>
> [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 12519 us
> [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13930 us
> [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986
> elapsed: 14330 us
> [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972
> elapsed: 15230 us
> [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985
> elapsed: 15018 us
> [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13654 us
> [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987
> elapsed: 13137 us
>
>
Very interesting... you can see 101588 dst are *delayed* in dst_garbage,
but apparently never freed.
Something is wrong, but the count seems stable. Must be some kind of
event, admin driven or something...
might be the "ip route flush cache" that is schedlued around 600 seconds
of machine alive, then secret_interval seconds later...
Typically, when a "ip route flush cache" is done (manually or triggered
by secret_interval timer), refcounted>0 entries are put into dst_garbage.
Then when some trafic occurs on the flows involved, IP stack should
decrement refcount so that next dst_garbage round can free the deleted
entries. Normal TCP connections are doing this correctly. On your
machine, nothing.
Possibly idle tcp sessions ?
Maybe some netfilter problem ?
Please tell us more about your machine ;)
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 18:56 ` Eric Dumazet
@ 2008-03-27 19:05 ` Denis V. Lunev
2008-03-27 22:03 ` David Miller
1 sibling, 0 replies; 29+ messages in thread
From: Denis V. Lunev @ 2008-03-27 19:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: Denys Fedoryshchenko, David Miller, kaber, netdev,
netfilter-devel
On Thu, 2008-03-27 at 19:56 +0100, Eric Dumazet wrote:
> Denys Fedoryshchenko a écrit :
> > Here is output from dmesg with patch you supplied.
> >
> > Kup /config # rtstat -i60 -c60
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 103266| 69986| 11574| 0| 54| 0| 0| 0|
> > 1982| 634| 0| 10984| 10980| 0| 0| 191808| 5254|
> > 124787| 45125| 6019| 0| 28| 0| 0| 0|
> > 807| 230| 0| 6277| 6274| 0| 0| 128922| 2518|
> > 120270| 45588| 6288| 0| 30| 0| 0| 0|
> > 883| 214| 0| 6532| 6529| 0| 0| 125651| 2743|
> > 122253| 46522| 6582| 0| 27| 0| 0| 0|
> > 897| 213| 0| 6822| 6819| 0| 0| 124927| 2761|
> >
> > [ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999
> > elapsed: 1 us
> > [ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998
> > elapsed: 2 us
> > [ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000
> > elapsed: 1 us
> > [ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998
> > elapsed: 1 us
> > [ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600
> > elapsed: 2 us
> > [ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696
> > elapsed: 6 us
> >
> > After while
> >
> > Kup /config # rtstat -i300 -c60
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 129138| 64166| 9864| 0| 46| 0| 0| 0|
> > 1334| 358| 0| 10012| 10008| 0| 0| 173006| 3985|
> > 130125| 43971| 10829| 0| 70| 0| 0| 0|
> > 907| 195| 0| 11093| 11090| 0| 0| 100601| 2848|
> > 144857| 32041| 22233| 0| 234| 0| 0| 0|
> > 738| 261| 0| 22727| 22724| 2| 0| 42769| 811|
> >
> > [ 687.740779] dst_total: 129365 delayed: 12 work_perf: 0 expires: 104999
> > elapsed: 1 us
> > [ 792.736043] dst_total: 127247 delayed: 101589 work_perf: 0 expires: 600
> > elapsed: 11274 us
> > [ 793.347127] dst_total: 122440 delayed: 101589 work_perf: 0 expires: 1600
> > elapsed: 11150 us
> > [ 794.957419] dst_total: 123780 delayed: 101589 work_perf: 0 expires: 3100
> > elapsed: 10896 us
> > [ 798.068614] dst_total: 121992 delayed: 101589 work_perf: 0 expires: 5655
> > elapsed: 11561 us
> > [ 803.734450] dst_total: 120552 delayed: 101589 work_perf: 0 expires: 7988
> > elapsed: 12254 us
> > [ 811.733227] dst_total: 121105 delayed: 101589 work_perf: 0 expires: 10988
> > elapsed: 12209 us
> > [ 822.733196] dst_total: 120142 delayed: 101589 work_perf: 0 expires: 13987
> > elapsed: 12039 us
> > [ 836.730637] dst_total: 120823 delayed: 101589 work_perf: 0 expires: 17987
> > elapsed: 12448 us
> > [ 854.726913] dst_total: 127478 delayed: 101589 work_perf: 0 expires: 22988
> > elapsed: 11435 us
> > [ 877.724827] dst_total: 127497 delayed: 101589 work_perf: 0 expires: 27987
> > elapsed: 12833 us
> > [ 905.719818] dst_total: 126693 delayed: 101589 work_perf: 0 expires: 32987
> > elapsed: 12068 us
> > [ 938.714595] dst_total: 130740 delayed: 101589 work_perf: 0 expires: 38988
> > elapsed: 11970 us
> > [ 977.707568] dst_total: 135449 delayed: 101589 work_perf: 0 expires: 45989
> > elapsed: 11747 us
> > [ 1023.701554] dst_total: 141181 delayed: 101589 work_perf: 0 expires: 52988
> > elapsed: 12653 us
> > [ 1076.694012] dst_total: 147895 delayed: 101588 work_perf: 1 expires: 59987
> > elapsed: 12217 us
> > [ 1136.684805] dst_total: 154798 delayed: 101588 work_perf: 0 expires: 67987
> > elapsed: 12026 us
> > [ 1204.673424] dst_total: 163376 delayed: 101588 work_perf: 0 expires: 76988
> > elapsed: 11151 us
> >
> > and after 30 minutes more
> >
> > Kup /config # rtstat -i300 -c60
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 194841| 176962| 77964| 0| 684| 1| 0| 0|
> > 3939| 1143| 0| 79531| 79516| 7| 0| 401258| 9455|
> > 223224| 28262| 26499| 0| 236| 0| 0| 0|
> > 661| 216| 0| 26951| 26947| 3| 0| 93059| 1946|
> > 248425| 28257| 26784| 0| 226| 0| 0| 0|
> > 678| 202| 0| 27211| 27208| 3| 0| 115593| 2412|
> > 272823| 28943| 26728| 0| 246| 0| 0| 0|
> > 695| 200| 0| 27174| 27171| 3| 0| 135256| 2942|
> > 295087| 28835| 26701| 0| 252| 0| 0| 0|
> > 687| 194| 0| 27146| 27143| 3| 0| 154269| 3312|
> >
> >
> > [ 2043.547879] dst_total: 242922 delayed: 101588 work_perf: 0 expires: 119987
> > elapsed: 12519 us
> > [ 2163.529997] dst_total: 252951 delayed: 101588 work_perf: 0 expires: 119987
> > elapsed: 13930 us
> > [ 2283.512215] dst_total: 262678 delayed: 101588 work_perf: 0 expires: 119986
> > elapsed: 14330 us
> > [ 2403.508447] dst_total: 271930 delayed: 101588 work_perf: 0 expires: 119972
> > elapsed: 15230 us
> > [ 2523.476844] dst_total: 288046 delayed: 101588 work_perf: 0 expires: 119985
> > elapsed: 15018 us
> > [ 2643.456996] dst_total: 296475 delayed: 101588 work_perf: 0 expires: 119987
> > elapsed: 13654 us
> > [ 2763.438267] dst_total: 305033 delayed: 101588 work_perf: 0 expires: 119987
> > elapsed: 13137 us
> >
> >
>
> Very interesting... you can see 101588 dst are *delayed* in dst_garbage,
> but apparently never freed.
>
> Something is wrong, but the count seems stable. Must be some kind of
> event, admin driven or something...
> might be the "ip route flush cache" that is schedlued around 600 seconds
> of machine alive, then secret_interval seconds later...
>
> Typically, when a "ip route flush cache" is done (manually or triggered
> by secret_interval timer), refcounted>0 entries are put into dst_garbage.
>
> Then when some trafic occurs on the flows involved, IP stack should
> decrement refcount so that next dst_garbage round can free the deleted
> entries. Normal TCP connections are doing this correctly. On your
> machine, nothing.
>
> Possibly idle tcp sessions ?
>
> Maybe some netfilter problem ?
>
> Please tell us more about your machine ;)
could you check the patch sent yesterday by Pavel under the name
[PATCH][ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).
It can fit the case, massive DST leakage is possible here.
Regards,
Den
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 18:56 ` Eric Dumazet
2008-03-27 19:05 ` Denis V. Lunev
@ 2008-03-27 22:03 ` David Miller
2008-03-28 0:47 ` Denys Fedoryshchenko
` (2 more replies)
1 sibling, 3 replies; 29+ messages in thread
From: David Miller @ 2008-03-27 22:03 UTC (permalink / raw)
To: dada1; +Cc: denys, kaber, netdev, netfilter-devel
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 27 Mar 2008 19:56:09 +0100
> Maybe some netfilter problem ?
>
> Please tell us more about your machine ;)
I'm beginning to suspect it's IFB and shaping somehow.
If those variables could be eliminated by eliminating them
from the configuration, it would help diagnose this a lot.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 22:03 ` David Miller
@ 2008-03-28 0:47 ` Denys Fedoryshchenko
2008-03-28 4:50 ` Denys Fedoryshchenko
2008-03-28 5:25 ` Denys Fedoryshchenko
2 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 0:47 UTC (permalink / raw)
To: dada1, David Miller; +Cc: kaber, netdev, netfilter-devel
After removing shaping and ifb problem probably remains. I will try to apply
mentioned by Denis Lunev, patch.
Netfilter it is kind of difficult to remove completely, pattern of traffic
will change a lot, and it will be false that it was cause of problem. I am
using NOTRACK, SNAT, DNAT, sure filtering. Nothing "special".
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
122438|25068014| 2464002| 0| 8579| 105| 0| 3|
436160| 149272| 0| 2560231| 2557567| 54| 0|52890998|
1164822|
[ 355.594112] dst_total: 115387 delayed: 1 work_perf: 0 expires: 68196
elapsed: 0 us
[ 423.780775] dst_total: 126186 delayed: 1 work_perf: 0 expires: 76999
elapsed: 0 us
[ 500.768537] dst_total: 122085 delayed: 1 work_perf: 0 expires: 86000
elapsed: 0 us
[ 586.755462] dst_total: 121715 delayed: 1 work_perf: 0 expires: 95000
elapsed: 1 us
[ 681.741665] dst_total: 129798 delayed: 1 work_perf: 0 expires: 104999
elapsed: 0 us
[ 786.732216] dst_total: 116243 delayed: 74333 work_perf: 1 expires: 600
elapsed: 7076 us
[ 787.338620] dst_total: 116912 delayed: 74333 work_perf: 0 expires: 1600
elapsed: 6962 us
[ 788.945653] dst_total: 118565 delayed: 74333 work_perf: 0 expires: 3100
elapsed: 7246 us
[ 792.052890] dst_total: 121611 delayed: 74333 work_perf: 0 expires: 5672
elapsed: 6954 us
[ 797.730354] dst_total: 126794 delayed: 74333 work_perf: 0 expires: 7993
elapsed: 7276 us
[ 805.729136] dst_total: 115686 delayed: 74333 work_perf: 0 expires: 10993
elapsed: 7270 us
[ 816.727732] dst_total: 126402 delayed: 74333 work_perf: 0 expires: 13993
elapsed: 7115 us
[ 830.726662] dst_total: 121002 delayed: 74333 work_perf: 0 expires: 17992
elapsed: 7179 us
[ 848.724144] dst_total: 119849 delayed: 74333 work_perf: 0 expires: 22992
elapsed: 7370 us
[ 871.719176] dst_total: 124661 delayed: 74333 work_perf: 0 expires: 27993
elapsed: 7313 us
[ 899.715826] dst_total: 118860 delayed: 74333 work_perf: 0 expires: 32992
elapsed: 7192 us
[ 932.710299] dst_total: 120856 delayed: 74333 work_perf: 0 expires: 38993
elapsed: 7291 us
[ 971.704648] dst_total: 119519 delayed: 74333 work_perf: 0 expires: 45993
elapsed: 6946 us
[ 1017.698036] dst_total: 126722 delayed: 74333 work_perf: 0 expires: 52992
elapsed: 7300 us
[ 1070.693197] dst_total: 124835 delayed: 101547 work_perf: 3 expires: 600
elapsed: 10089 us
[ 1071.303147] dst_total: 125939 delayed: 101547 work_perf: 0 expires: 1600
elapsed: 9544 us
[ 1072.913253] dst_total: 128499 delayed: 101547 work_perf: 0 expires: 3100
elapsed: 9895 us
[ 1076.021980] dst_total: 117827 delayed: 101547 work_perf: 0 expires: 5659
elapsed: 10096 us
[ 1081.690082] dst_total: 127534 delayed: 101547 work_perf: 0 expires: 7990
elapsed: 10052 us
[ 1089.690531] dst_total: 127330 delayed: 101547 work_perf: 0 expires: 10989
elapsed: 11717 us
Without sysctl adjustments (each 1 minute)
Kup ~ # rtstat -i60 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
40160| 3323| 1094| 0| 11| 0| 0| 0|
282| 127| 0| 274| 274| 0| 0| 1867| 226|
102698| 26174| 2885| 0| 6| 0| 0| 0|
620| 228| 0| 3118| 3115| 0| 0| 37298| 1331|
141996| 25752| 2172| 0| 3| 0| 0| 0|
521| 167| 0| 2343| 2340| 0| 0| 66292| 1698|
169128| 24716| 1593| 0| 2| 0| 0| 0|
468| 140| 0| 1734| 1731| 0| 0| 83079| 1714|
185495| 24867| 1493| 0| 1| 0| 0| 0|
386| 125| 0| 1619| 1616| 0| 0| 97451| 1603|
195665| 25227| 1443| 0| 1| 0| 0| 0|
427| 126| 0| 1570| 1567| 0| 0| 111618| 1955|
201357| 25239| 1370| 0| 2| 0| 0| 0|
435| 118| 0| 1490| 1487| 0| 0| 119599| 2010|
206841| 25805| 1487| 0| 2| 0| 0| 0|
515| 120| 0| 1607| 1604| 0| 0| 123823| 2429|
213800| 25230| 1678| 0| 2| 0| 0| 0|
414| 116| 0| 1795| 1792| 0| 0| 127516| 2210|
216461| 25882| 1503| 0| 2| 0| 0| 0|
424| 114| 0| 1618| 1615| 0| 0| 136162| 2308|
220386| 26141| 1632| 0| 2| 0| 0| 0|
387| 115| 0| 1749| 1745| 0| 0| 137147| 2165|
221533| 25649| 1476| 0| 2| 0| 0| 0|
394| 116| 0| 1593| 1590| 0| 0| 131760| 2274|
221139| 25682| 1368| 0| 2| 0| 0| 0|
432| 111| 0| 1481| 1477| 0| 0| 130734| 2321|
152901| 24897| 2242| 0| 6| 0| 0| 0|
367| 157| 0| 2405| 2402| 0| 0| 68008| 1245|
190000| 25295| 1747| 0| 1| 0| 0| 0|
403| 143| 0| 1891| 1888| 0| 0| 69542| 916|
[ 586.911061] dst_total: 225846 delayed: 4 work_perf: 0 expires: 95000
elapsed: 1 us
[ 681.897473] dst_total: 221226 delayed: 4 work_perf: 0 expires: 105000
elapsed: 1 us
[ 786.886616] dst_total: 145402 delayed: 59222 work_perf: 0 expires: 600
elapsed: 5893 us
[ 787.491390] dst_total: 145565 delayed: 59587 work_perf: 0 expires: 600
elapsed: 5783 us
[ 788.096098] dst_total: 145724 delayed: 59888 work_perf: 0 expires: 600
elapsed: 5582 us
[ 788.701111] dst_total: 145983 delayed: 60148 work_perf: 0 expires: 600
elapsed: 5685 us
[ 789.305991] dst_total: 146120 delayed: 60412 work_perf: 0 expires: 600
elapsed: 5661 us
[ 789.910802] dst_total: 146432 delayed: 60643 work_perf: 0 expires: 600
elapsed: 5566 us
[ 790.515868] dst_total: 146558 delayed: 61009 work_perf: 0 expires: 600
elapsed: 5720 us
[ 791.120788] dst_total: 146781 delayed: 61250 work_perf: 0 expires: 600
elapsed: 5714 us
[ 791.725773] dst_total: 147075 delayed: 61488 work_perf: 0 expires: 600
elapsed: 5806 us
[ 792.330558] dst_total: 147404 delayed: 61766 work_perf: 0 expires: 600
elapsed: 5685 us
[ 792.935548] dst_total: 147761 delayed: 61978 work_perf: 0 expires: 600
elapsed: 5765 us
[ 793.540664] dst_total: 147932 delayed: 62339 work_perf: 0 expires: 600
elapsed: 5966 us
[ 794.145640] dst_total: 148181 delayed: 62632 work_perf: 0 expires: 600
elapsed: 6046 us
[ 794.751394] dst_total: 148507 delayed: 62865 work_perf: 0 expires: 600
elapsed: 5886 us
[ 795.356327] dst_total: 148940 delayed: 63123 work_perf: 0 expires: 600
elapsed: 5913 us
[ 795.961312] dst_total: 149304 delayed: 63342 work_perf: 0 expires: 600
elapsed: 5988 us
[ 796.566230] dst_total: 149471 delayed: 63619 work_perf: 0 expires: 600
elapsed: 6002 us
[ 797.172182] dst_total: 149819 delayed: 63853 work_perf: 0 expires: 600
elapsed: 6046 us
[ 797.778163] dst_total: 150183 delayed: 64108 work_perf: 0 expires: 600
elapsed: 6117 us
[ 798.384008] dst_total: 150492 delayed: 64260 work_perf: 0 expires: 600
elapsed: 6052 us
[ 798.990028] dst_total: 150702 delayed: 64535 work_perf: 0 expires: 600
elapsed: 6167 us
[ 799.595797] dst_total: 151109 delayed: 64691 work_perf: 0 expires: 600
elapsed: 6026 us
[ 800.201763] dst_total: 151498 delayed: 64864 work_perf: 0 expires: 600
elapsed: 6084 us
[ 800.807742] dst_total: 151861 delayed: 65075 work_perf: 0 expires: 600
elapsed: 6155 us
[ 801.413577] dst_total: 152344 delayed: 65245 work_perf: 0 expires: 600
elapsed: 6079 us
[ 802.019727] dst_total: 152608 delayed: 65498 work_perf: 0 expires: 600
elapsed: 6323 us
[ 802.625410] dst_total: 153049 delayed: 65676 work_perf: 1 expires: 600
elapsed: 6099 us
[ 803.231251] dst_total: 153390 delayed: 65835 work_perf: 0 expires: 600
elapsed: 6024 us
[ 803.837186] dst_total: 153855 delayed: 65990 work_perf: 0 expires: 600
elapsed: 6056 us
[ 804.443457] dst_total: 154294 delayed: 66133 work_perf: 0 expires: 600
elapsed: 6422 us
[ 805.049167] dst_total: 154548 delayed: 66378 work_perf: 0 expires: 600
elapsed: 6221 us
[ 805.655036] dst_total: 155001 delayed: 66499 work_perf: 0 expires: 600
elapsed: 6183 us
[ 806.260986] dst_total: 155418 delayed: 66642 work_perf: 0 expires: 600
elapsed: 6229 us
[ 806.866893] dst_total: 155892 delayed: 66768 work_perf: 0 expires: 600
elapsed: 6224 us
[ 807.473041] dst_total: 156268 delayed: 66884 work_perf: 0 expires: 600
elapsed: 6463 us
[ 808.078979] dst_total: 156514 delayed: 67093 work_perf: 0 expires: 600
elapsed: 6494 us
[ 808.684670] dst_total: 156927 delayed: 67226 work_perf: 0 expires: 600
elapsed: 6275 us
[ 809.290646] dst_total: 157364 delayed: 67327 work_perf: 0 expires: 600
elapsed: 6345 us
[ 809.896516] dst_total: 157724 delayed: 67428 work_perf: 0 expires: 600
elapsed: 6307 us
[ 810.502756] dst_total: 157915 delayed: 67642 work_perf: 0 expires: 600
elapsed: 6641 us
[ 811.108374] dst_total: 158349 delayed: 67746 work_perf: 0 expires: 600
elapsed: 6349 us
[ 811.714508] dst_total: 158856 delayed: 67843 work_perf: 0 expires: 600
elapsed: 6576 us
[ 812.320241] dst_total: 159315 delayed: 67948 work_perf: 0 expires: 600
elapsed: 6403 us
[ 812.926305] dst_total: 159726 delayed: 68041 work_perf: 0 expires: 600
elapsed: 6553 us
[ 813.532064] dst_total: 159997 delayed: 68217 work_perf: 0 expires: 600
elapsed: 6406 us
[ 814.137984] dst_total: 160527 delayed: 68309 work_perf: 0 expires: 600
elapsed: 6416 us
[ 814.743949] dst_total: 160994 delayed: 68404 work_perf: 0 expires: 600
elapsed: 6475 us
[ 815.350120] dst_total: 161460 delayed: 68497 work_perf: 0 expires: 600
elapsed: 6739 us
[ 815.955683] dst_total: 161873 delayed: 68568 work_perf: 0 expires: 600
elapsed: 6393 us
[ 816.561587] dst_total: 162083 delayed: 68684 work_perf: 0 expires: 600
elapsed: 6389 us
[ 817.167555] dst_total: 162613 delayed: 68748 work_perf: 0 expires: 600
elapsed: 6450 us
[ 817.773653] dst_total: 163137 delayed: 68821 work_perf: 0 expires: 600
elapsed: 6637 us
[ 818.379400] dst_total: 163592 delayed: 68889 work_perf: 0 expires: 600
elapsed: 6477 us
[ 818.985324] dst_total: 164115 delayed: 68946 work_perf: 0 expires: 600
elapsed: 6494 us
[ 819.591212] dst_total: 164223 delayed: 69019 work_perf: 0 expires: 600
elapsed: 6474 us
[ 820.197032] dst_total: 164666 delayed: 69019 work_perf: 0 expires: 1600
elapsed: 6387 us
[ 821.802811] dst_total: 165685 delayed: 69019 work_perf: 0 expires: 3100
elapsed: 6402 us
[ 824.908547] dst_total: 167570 delayed: 69019 work_perf: 0 expires: 4967
elapsed: 6588 us
[ 829.881811] dst_total: 170472 delayed: 69019 work_perf: 0 expires: 7993
elapsed: 6786 us
[ 837.880254] dst_total: 175476 delayed: 69019 work_perf: 0 expires: 10993
elapsed: 6455 us
[ 848.878097] dst_total: 181394 delayed: 69019 work_perf: 0 expires: 13994
elapsed: 6752 us
[ 862.875836] dst_total: 190093 delayed: 69019 work_perf: 0 expires: 17994
elapsed: 6633 us
[ 880.873325] dst_total: 200983 delayed: 69019 work_perf: 0 expires: 22994
elapsed: 6839 us
[ 903.870309] dst_total: 207322 delayed: 69019 work_perf: 0 expires: 27993
elapsed: 6522 us
On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Thu, 27 Mar 2008 19:56:09 +0100
>
> > Maybe some netfilter problem ?
> >
> > Please tell us more about your machine ;)
>
> I'm beginning to suspect it's IFB and shaping somehow.
>
> If those variables could be eliminated by eliminating them
> from the configuration, it would help diagnose this a lot.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 22:03 ` David Miller
2008-03-28 0:47 ` Denys Fedoryshchenko
@ 2008-03-28 4:50 ` Denys Fedoryshchenko
2008-03-28 5:25 ` Denys Fedoryshchenko
2 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 4:50 UTC (permalink / raw)
To: David Miller, dada1; +Cc: kaber, netdev, netfilter-devel
After patching , without ifb and "shapers".
[11807.126790] dst_total: 807311 delayed: 785973 work_perf: 0 expires: 45928
elapsed: 71977 us
[11853.118939] dst_total: 810625 delayed: 785973 work_perf: 0 expires: 52928
elapsed: 72092 us
[11906.110627] dst_total: 814329 delayed: 785973 work_perf: 0 expires: 59929
elapsed: 71415 us
[11966.101684] dst_total: 818621 delayed: 785973 work_perf: 0 expires: 67929
elapsed: 71566 us
[12034.092650] dst_total: 823641 delayed: 785973 work_perf: 0 expires: 76927
elapsed: 72856 us
[12111.080909] dst_total: 829008 delayed: 785973 work_perf: 0 expires: 85927
elapsed: 73175 us
[12197.066708] dst_total: 835397 delayed: 785973 work_perf: 0 expires: 94928
elapsed: 72001 us
[12292.054049] dst_total: 842092 delayed: 785972 work_perf: 1 expires: 104927
elapsed: 73341 us
[12397.038451] dst_total: 849712 delayed: 785972 work_perf: 0 expires: 115926
elapsed: 74072 us
[12513.032054] dst_total: 858081 delayed: 785972 work_perf: 0 expires: 119915
elapsed: 77027 us
[12633.016029] dst_total: 866064 delayed: 785972 work_perf: 0 expires: 119913
elapsed: 87431 us
[12752.987981] dst_total: 874067 delayed: 785972 work_perf: 0 expires: 119923
elapsed: 76570 us
[12872.981870] dst_total: 882006 delayed: 785972 work_perf: 0 expires: 119911
elapsed: 88655 us
[12992.950062] dst_total: 889918 delayed: 785972 work_perf: 0 expires: 119924
elapsed: 76035 us
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
896238|135781150|112584149| 0| 475839| 1708| 0| 18|
3234260| 2340122| 0|115324498|115284650| 35962| 0|484427466|
15419774|
On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Thu, 27 Mar 2008 19:56:09 +0100
>
> > Maybe some netfilter problem ?
> >
> > Please tell us more about your machine ;)
>
> I'm beginning to suspect it's IFB and shaping somehow.
>
> If those variables could be eliminated by eliminating them
> from the configuration, it would help diagnose this a lot.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-27 22:03 ` David Miller
2008-03-28 0:47 ` Denys Fedoryshchenko
2008-03-28 4:50 ` Denys Fedoryshchenko
@ 2008-03-28 5:25 ` Denys Fedoryshchenko
2008-03-28 5:49 ` Eric Dumazet
2 siblings, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 5:25 UTC (permalink / raw)
To: David Miller, dada1; +Cc: kaber, netdev, netfilter-devel
Just to make sure 2.6.24.3 is stable and it is regression i am supplying
output from it.
Do you want me to submit summary to bugzilla and regression list as well?
And in short, IMHO 2.6.25 have major issues on routing that have to be fixed
before release. TRIE is crashing, and even with HASH there is leak. I am
trying my best to bisect it, but it is major router and i cannot take much
risk on it, so i wish i can simulate in my home mini-lab. Still i am not able
to get even proper switch (Lebanon difficult country for IT).
Kup ~ # uname -a
Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown
up ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
54750| 4430| 1128| 0| 12| 0| 0| 0|
263| 190| 0| 709| 708| 0| 0| 3545| 313|
92913| 8829| 1211| 0| 1| 0| 0| 0|
343| 163| 0| 1375| 1373| 0| 0| 12545| 724|
115323| 8232| 906| 0| 0| 0| 0| 0|
299| 128| 0| 1035| 1033| 0| 0| 18069| 813|
128985| 8650| 839| 0| 0| 0| 0| 0|
289| 115| 0| 954| 952| 0| 0| 22515| 845|
116682| 8911| 861| 0| 0| 0| 0| 0|
288| 117| 0| 978| 976| 0| 0| 23433| 775|
99969| 9164| 889| 0| 0| 0| 0| 0|
280| 113| 0| 1002| 1000| 0| 0| 26741| 839|
124602| 9395| 1012| 0| 0| 0| 0| 0|
271| 122| 0| 1134| 1132| 0| 0| 27381| 787|
110051| 10036| 824| 0| 0| 0| 0| 0|
279| 120| 0| 944| 942| 0| 0| 28558| 783|
126835| 10631| 772| 0| 0| 0| 0| 0|
274| 117| 0| 888| 886| 0| 0| 29451| 780|
111881| 10357| 762| 0| 0| 0| 0| 0|
275| 117| 0| 879| 877| 0| 0| 28235| 751|
127018| 10178| 796| 0| 0| 0| 0| 0|
283| 117| 0| 913| 911| 0| 0| 29480| 807|
112242| 9839| 814| 0| 0| 0| 0| 0|
293| 115| 0| 929| 927| 0| 0| 28095| 796|
41267| 9493| 1217| 0| 1| 0| 0| 0|
269| 138| 0| 811| 810| 0| 0| 18545| 548|
76380| 9722| 1060| 0| 1| 0| 0| 0|
250| 135| 0| 1195| 1193| 0| 0| 14786| 414|
99922| 9811| 779| 0| 0| 0| 0| 0|
281| 124| 0| 902| 900| 0| 0| 21853| 589|
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
122053| 150955| 14888| 0| 25| 1| 0| 0|
4611| 2090| 0| 15820| 15789| 0| 0| 369513| 11562|
105226| 10215| 872| 0| 0| 0| 0| 0|
279| 116| 0| 988| 986| 0| 0| 30343| 799|
126236| 10462| 924| 0| 0| 0| 0| 0|
260| 120| 0| 1044| 1042| 0| 0| 31699| 782|
114492| 9782| 884| 0| 0| 0| 0| 0|
253| 120| 0| 1005| 1003| 0| 0| 29695| 722|
After ip route flush cache
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
out_hlis|
| | tot| mc| ute| | an_dst|
an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
9088| 202136| 19262| 0| 29| 1| 0| 0|
5976| 2696| 0| 20647| 20606| 0| 0| 521714| 15415|
!!!!!
I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will make
sure about that now.
On Thu, 27 Mar 2008 15:03:08 -0700 (PDT), David Miller wrote
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Thu, 27 Mar 2008 19:56:09 +0100
>
> > Maybe some netfilter problem ?
> >
> > Please tell us more about your machine ;)
>
> I'm beginning to suspect it's IFB and shaping somehow.
>
> If those variables could be eliminated by eliminating them
> from the configuration, it would help diagnose this a lot.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 5:25 ` Denys Fedoryshchenko
@ 2008-03-28 5:49 ` Eric Dumazet
2008-03-28 7:38 ` Denys Fedoryshchenko
0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2008-03-28 5:49 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: David Miller, kaber, netdev, netfilter-devel
Denys Fedoryshchenko a écrit :
> Just to make sure 2.6.24.3 is stable and it is regression i am supplying
> output from it.
> Do you want me to submit summary to bugzilla and regression list as well?
>
> And in short, IMHO 2.6.25 have major issues on routing that have to be fixed
> before release. TRIE is crashing, and even with HASH there is leak. I am
> trying my best to bisect it, but it is major router and i cannot take much
> risk on it, so i wish i can simulate in my home mini-lab. Still i am not able
> to get even proper switch (Lebanon difficult country for IT).
>
> Kup ~ # uname -a
> Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown
>
> up ~ # rtstat -i60 -c6000
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 54750| 4430| 1128| 0| 12| 0| 0| 0|
> 263| 190| 0| 709| 708| 0| 0| 3545| 313|
> 92913| 8829| 1211| 0| 1| 0| 0| 0|
> 343| 163| 0| 1375| 1373| 0| 0| 12545| 724|
> 115323| 8232| 906| 0| 0| 0| 0| 0|
> 299| 128| 0| 1035| 1033| 0| 0| 18069| 813|
> 128985| 8650| 839| 0| 0| 0| 0| 0|
> 289| 115| 0| 954| 952| 0| 0| 22515| 845|
> 116682| 8911| 861| 0| 0| 0| 0| 0|
> 288| 117| 0| 978| 976| 0| 0| 23433| 775|
> 99969| 9164| 889| 0| 0| 0| 0| 0|
> 280| 113| 0| 1002| 1000| 0| 0| 26741| 839|
> 124602| 9395| 1012| 0| 0| 0| 0| 0|
> 271| 122| 0| 1134| 1132| 0| 0| 27381| 787|
> 110051| 10036| 824| 0| 0| 0| 0| 0|
> 279| 120| 0| 944| 942| 0| 0| 28558| 783|
> 126835| 10631| 772| 0| 0| 0| 0| 0|
> 274| 117| 0| 888| 886| 0| 0| 29451| 780|
> 111881| 10357| 762| 0| 0| 0| 0| 0|
> 275| 117| 0| 879| 877| 0| 0| 28235| 751|
> 127018| 10178| 796| 0| 0| 0| 0| 0|
> 283| 117| 0| 913| 911| 0| 0| 29480| 807|
> 112242| 9839| 814| 0| 0| 0| 0| 0|
> 293| 115| 0| 929| 927| 0| 0| 28095| 796|
> 41267| 9493| 1217| 0| 1| 0| 0| 0|
> 269| 138| 0| 811| 810| 0| 0| 18545| 548|
> 76380| 9722| 1060| 0| 1| 0| 0| 0|
> 250| 135| 0| 1195| 1193| 0| 0| 14786| 414|
> 99922| 9811| 779| 0| 0| 0| 0| 0|
> 281| 124| 0| 902| 900| 0| 0| 21853| 589|
>
> Kup ~ # rtstat -i60 -c6000
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
>
> 122053| 150955| 14888| 0| 25| 1| 0| 0|
> 4611| 2090| 0| 15820| 15789| 0| 0| 369513| 11562|
> 105226| 10215| 872| 0| 0| 0| 0| 0|
> 279| 116| 0| 988| 986| 0| 0| 30343| 799|
> 126236| 10462| 924| 0| 0| 0| 0| 0|
> 260| 120| 0| 1044| 1042| 0| 0| 31699| 782|
> 114492| 9782| 884| 0| 0| 0| 0| 0|
> 253| 120| 0| 1005| 1003| 0| 0| 29695| 722|
>
> After ip route flush cache
> Kup ~ # rtstat -i60 -c6000
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> rt_cache|
> entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> out_hlis|
> | | tot| mc| ute| | an_dst|
> an_src| | _tot| _mc| | ed| miss| verflow|
> _search|t_search|
> 9088| 202136| 19262| 0| 29| 1| 0| 0|
> 5976| 2696| 0| 20647| 20606| 0| 0| 521714| 15415|
>
>
> !!!!!
> I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will make
> sure about that now.
>
Maybe you are a litle bit too fast for "ip route flush cache" :)
It used to work like that : schedule a timer to start a flush in about 2
seconds. A flush meaning : scan the whole table and delete all entries.
On machines with 4 millions dst entries, this was using too much time
and eventually crashing.
On recent kernels, each rtable entry has a special field named rt_genid,
so that "ip route flush cache" doesnt have to scan the whole table, but
only change the global genid. rtables entries will be deleted later,
when their rt_genid is found to be different than the global genid.
Please try the patch that was suggested yesterday, as it is probably the
cure your router needs.
http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
Thank you
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 5:49 ` Eric Dumazet
@ 2008-03-28 7:38 ` Denys Fedoryshchenko
2008-03-28 7:46 ` Eric Dumazet
0 siblings, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 7:38 UTC (permalink / raw)
To: netdev, netfilter-devel
Already patched and tested, it doesn't change anything.
On Fri, 28 Mar 2008 06:49:53 +0100, Eric Dumazet wrote
> Denys Fedoryshchenko a :
> > Just to make sure 2.6.24.3 is stable and it is regression i am supplying
> > output from it.
> > Do you want me to submit summary to bugzilla and regression list as well?
> >
> > And in short, IMHO 2.6.25 have major issues on routing that have to be
fixed
> > before release. TRIE is crashing, and even with HASH there is leak. I am
> > trying my best to bisect it, but it is major router and i cannot take
much
> > risk on it, so i wish i can simulate in my home mini-lab. Still i am not
able
> > to get even proper switch (Lebanon difficult country for IT).
> >
> > Kup ~ # uname -a
> > Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686
unknown
> >
> > up ~ # rtstat -i60 -c6000
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 54750| 4430| 1128| 0| 12| 0| 0|
0|
> > 263| 190| 0| 709| 708| 0| 0| 3545|
313|
> > 92913| 8829| 1211| 0| 1| 0| 0|
0|
> > 343| 163| 0| 1375| 1373| 0| 0| 12545|
724|
> > 115323| 8232| 906| 0| 0| 0| 0|
0|
> > 299| 128| 0| 1035| 1033| 0| 0| 18069|
813|
> > 128985| 8650| 839| 0| 0| 0| 0|
0|
> > 289| 115| 0| 954| 952| 0| 0| 22515|
845|
> > 116682| 8911| 861| 0| 0| 0| 0|
0|
> > 288| 117| 0| 978| 976| 0| 0| 23433|
775|
> > 99969| 9164| 889| 0| 0| 0| 0|
0|
> > 280| 113| 0| 1002| 1000| 0| 0| 26741|
839|
> > 124602| 9395| 1012| 0| 0| 0| 0|
0|
> > 271| 122| 0| 1134| 1132| 0| 0| 27381|
787|
> > 110051| 10036| 824| 0| 0| 0| 0|
0|
> > 279| 120| 0| 944| 942| 0| 0| 28558|
783|
> > 126835| 10631| 772| 0| 0| 0| 0|
0|
> > 274| 117| 0| 888| 886| 0| 0| 29451|
780|
> > 111881| 10357| 762| 0| 0| 0| 0|
0|
> > 275| 117| 0| 879| 877| 0| 0| 28235|
751|
> > 127018| 10178| 796| 0| 0| 0| 0|
0|
> > 283| 117| 0| 913| 911| 0| 0| 29480|
807|
> > 112242| 9839| 814| 0| 0| 0| 0|
0|
> > 293| 115| 0| 929| 927| 0| 0| 28095|
796|
> > 41267| 9493| 1217| 0| 1| 0| 0|
0|
> > 269| 138| 0| 811| 810| 0| 0| 18545|
548|
> > 76380| 9722| 1060| 0| 1| 0| 0|
0|
> > 250| 135| 0| 1195| 1193| 0| 0| 14786|
414|
> > 99922| 9811| 779| 0| 0| 0| 0|
0|
> > 281| 124| 0| 902| 900| 0| 0| 21853|
589|
> >
> > Kup ~ # rtstat -i60 -c6000
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> >
> > 122053| 150955| 14888| 0| 25| 1| 0|
0|
> > 4611| 2090| 0| 15820| 15789| 0| 0| 369513|
11562|
> > 105226| 10215| 872| 0| 0| 0| 0|
0|
> > 279| 116| 0| 988| 986| 0| 0| 30343|
799|
> > 126236| 10462| 924| 0| 0| 0| 0|
0|
> > 260| 120| 0| 1044| 1042| 0| 0| 31699|
782|
> > 114492| 9782| 884| 0| 0| 0| 0|
0|
> > 253| 120| 0| 1005| 1003| 0| 0| 29695|
722|
> >
> > After ip route flush cache
> > Kup ~ # rtstat -i60 -c6000
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
> > rt_cache|
> > entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
> > out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|
> > out_hlis|
> > | | tot| mc| ute| | an_dst|
> > an_src| | _tot| _mc| | ed| miss| verflow|
> > _search|t_search|
> > 9088| 202136| 19262| 0| 29| 1| 0|
0|
> > 5976| 2696| 0| 20647| 20606| 0| 0| 521714|
15415|
> >
> >
> > !!!!!
> > I am not wrong, ip route flush cache doesn't work at 2.6.25-rc7. I will
make
> > sure about that now.
> >
> Maybe you are a litle bit too fast for "ip route flush cache" :)
>
> It used to work like that : schedule a timer to start a flush in
> about 2 seconds. A flush meaning : scan the whole table and delete
> all entries.
>
> On machines with 4 millions dst entries, this was using too much
> time and eventually crashing.
>
> On recent kernels, each rtable entry has a special field named
> rt_genid, so that "ip route flush cache" doesnt have to scan the
> whole table, but only change the global genid. rtables entries will
> be deleted later, when their rt_genid is found to be different than
> the global genid.
>
> Please try the patch that was suggested yesterday, as it is probably
> the cure your router needs.
>
> http://git2.kernel.org/?p=linux/kernel/git/davem/net-
2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
>
> Thank you
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 7:38 ` Denys Fedoryshchenko
@ 2008-03-28 7:46 ` Eric Dumazet
2008-03-28 9:50 ` Denys Fedoryshchenko
2008-03-28 15:57 ` Eric Dumazet
0 siblings, 2 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-28 7:46 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, netfilter-devel
Denys Fedoryshchenko a écrit :
> Already patched and tested, it doesn't change anything.
>
>
We still leak dsts somewhere.
You could try git bisect, or try to patch net/core/dst.c so that
dst_gc_task() (line 83) displays
route informations for say 10 first entries found in the dst_busy_list
(refcnt, interface, source IP, dest IP, things like that) that could
ring a bell given your netfilter rules or network conf.
>
>> Maybe you are a litle bit too fast for "ip route flush cache" :)
>>
>> It used to work like that : schedule a timer to start a flush in
>> about 2 seconds. A flush meaning : scan the whole table and delete
>> all entries.
>>
>> On machines with 4 millions dst entries, this was using too much
>> time and eventually crashing.
>>
>> On recent kernels, each rtable entry has a special field named
>> rt_genid, so that "ip route flush cache" doesnt have to scan the
>> whole table, but only change the global genid. rtables entries will
>> be deleted later, when their rt_genid is found to be different than
>> the global genid.
>>
>> Please try the patch that was suggested yesterday, as it is probably
>> the cure your router needs.
>>
>> http://git2.kernel.org/?p=linux/kernel/git/davem/net-
>>
> 2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 7:46 ` Eric Dumazet
@ 2008-03-28 9:50 ` Denys Fedoryshchenko
2008-03-28 15:57 ` Eric Dumazet
1 sibling, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 9:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, netfilter-devel
I will try do so,
And i have a bit more information, not sure if it is useful. I took risk and
running now one more loaded router, without nat this time. It has significant
less load, but maybe i will catch something here also.
Here is dmesg i got
[ 23.280155] dst_total: 4 delayed: 1 work_perf: 0 expires: 600 elapsed: 1 us
[ 23.888719] dst_total: 5 delayed: 1 work_perf: 0 expires: 1600 elapsed: 1
us
[ 25.489486] dst_total: 11 delayed: 0 work_perf: 1 expires: 4294967295
elapsed: 2 us
[ 67.187254] dst_total: 23980 delayed: 1 work_perf: 0 expires: 600 elapsed:
2 us
[ 67.807126] dst_total: 24452 delayed: 1 work_perf: 0 expires: 1600
elapsed: 3 us
[ 69.453570] dst_total: 25103 delayed: 0 work_perf: 2 expires: 4294967295
elapsed: 3 us
>>> here i run "ip route flush cache"
[ 278.911357] dst_total: 16855 delayed: 2 work_perf: 0 expires: 600 elapsed:
2 us
[ 279.530432] dst_total: 16866 delayed: 2 work_perf: 0 expires: 1600
elapsed: 1 us
[ 281.197568] dst_total: 16901 delayed: 2 work_perf: 0 expires: 3100
elapsed: 1 us
[ 284.425797] dst_total: 16981 delayed: 2 work_perf: 0 expires: 4981
elapsed: 1 us
[ 289.665137] dst_total: 17067 delayed: 2 work_perf: 0 expires: 8000
elapsed: 1 us
[ 297.960978] dst_total: 17219 delayed: 2 work_perf: 0 expires: 11000
elapsed: 1 us
[ 309.379867] dst_total: 17426 delayed: 2 work_perf: 0 expires: 14100
elapsed: 2 us
[ 323.972039] dst_total: 17629 delayed: 2 work_perf: 0 expires: 18196
elapsed: 2 us
[ 342.831626] dst_total: 13563 delayed: 2 work_perf: 0 expires: 23000
elapsed: 2 us
[ 366.592260] dst_total: 13830 delayed: 2 work_perf: 0 expires: 28000
elapsed: 2 us
[ 395.753299] dst_total: 14142 delayed: 2 work_perf: 0 expires: 33000
elapsed: 2 us
[ 429.952513] dst_total: 13156 delayed: 3 work_perf: 0 expires: 600 elapsed:
2 us
[ 430.565783] dst_total: 13164 delayed: 3 work_perf: 0 expires: 1600
elapsed: 1 us
[ 432.267868] dst_total: 13184 delayed: 3 work_perf: 0 expires: 3100
elapsed: 1 us
[ 435.457375] dst_total: 13220 delayed: 3 work_perf: 0 expires: 5700
elapsed: 2 us
[ 441.355880] dst_total: 13279 delayed: 3 work_perf: 0 expires: 8000
elapsed: 2 us
[ 449.567527] dst_total: 13363 delayed: 3 work_perf: 0 expires: 11000
elapsed: 1 us
[ 461.027956] dst_total: 13507 delayed: 3 work_perf: 0 expires: 14000
elapsed: 2 us
[ 475.568361] dst_total: 12084 delayed: 3 work_perf: 0 expires: 18000
elapsed: 2 us
Is expires values appearing 4294967295 normal?
Btw i am doing ip route flush cache only manually, normally it never happens
on this machines.
On Fri, 28 Mar 2008 08:46:20 +0100, Eric Dumazet wrote
> Denys Fedoryshchenko :
> > Already patched and tested, it doesn't change anything.
> >
> >
>
> We still leak dsts somewhere.
>
> You could try git bisect, or try to patch net/core/dst.c so that
> dst_gc_task() (line 83) displays
> route informations for say 10 first entries found in the dst_busy_list
> (refcnt, interface, source IP, dest IP, things like that) that could
> ring a bell given your netfilter rules or network conf.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 7:46 ` Eric Dumazet
2008-03-28 9:50 ` Denys Fedoryshchenko
@ 2008-03-28 15:57 ` Eric Dumazet
2008-03-28 20:23 ` Stephen Hemminger
1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2008-03-28 15:57 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Denys Fedoryshchenko, netdev, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 878 bytes --]
Eric Dumazet a écrit :
> Denys Fedoryshchenko a écrit :
>> Already patched and tested, it doesn't change anything.
>>
>>
>
> We still leak dsts somewhere.
>
> You could try git bisect, or try to patch net/core/dst.c so that
> dst_gc_task() (line 83) displays
> route informations for say 10 first entries found in the dst_busy_list
> (refcnt, interface, source IP, dest IP, things like that) that could
> ring a bell given your netfilter rules or network conf.
I cooked a patch (untested) to implement this idea :
It should display lines similar to /proc/net/rt_cache (reusing the same
helper function)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
include/net/dst.h | 2 +
net/core/dst.c | 7 ++++++
net/ipv4/route.c | 47 ++++++++++++++++++++++++++------------------
3 files changed, 37 insertions(+), 19 deletions(-)
[-- Attachment #2: show_info.patch --]
[-- Type: text/plain, Size: 4070 bytes --]
diff --git a/include/net/dst.h b/include/net/dst.h
index 002500e..2fe9a6c 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -103,6 +103,8 @@ struct dst_ops
void (*link_failure)(struct sk_buff *);
void (*update_pmtu)(struct dst_entry *dst, u32 mtu);
int (*local_out)(struct sk_buff *skb);
+ size_t (*info_show)(struct dst_entry *, char *buff,
+ size_t len);
int entry_size;
atomic_t entries;
diff --git a/net/core/dst.c b/net/core/dst.c
index fe03266..08395e8 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -70,6 +70,7 @@ static void dst_gc_task(struct work_struct *work)
#if RT_CACHE_DEBUG >= 2
ktime_t time_start = ktime_get();
struct timespec elapsed;
+ char info[128];
#endif
mutex_lock(&dst_gc_mutex);
@@ -82,6 +83,12 @@ loop:
if (likely(atomic_read(&dst->__refcnt))) {
last->next = dst;
last = dst;
+#if RT_CACHE_DEBUG >= 2
+ if (delayed < 10 && dst->ops->info_show) {
+ dst->ops->info_show(dst, info, sizeof(info));
+ printk(KERN_DEBUG "dstgc [%d] %s\n", delayed, info);
+ }
+#endif
delayed++;
continue;
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 230716c..ef8c3b7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -148,6 +148,8 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
static void ipv4_link_failure(struct sk_buff *skb);
static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
static int rt_garbage_collect(struct dst_ops *ops);
+static size_t rt_info_show(struct dst_entry *dst,
+ char *buff, size_t maxlen);
static struct dst_ops ipv4_dst_ops = {
@@ -161,6 +163,7 @@ static struct dst_ops ipv4_dst_ops = {
.link_failure = ipv4_link_failure,
.update_pmtu = ip_rt_update_pmtu,
.local_out = ip_local_out,
+ .info_show = rt_info_show,
.entry_size = sizeof(struct rtable),
.entries = ATOMIC_INIT(0),
};
@@ -269,6 +272,28 @@ static unsigned int rt_hash_code(u32 daddr, u32 saddr)
rt_hash_code((__force u32)(__be32)(daddr),\
(__force u32)(__be32)(saddr) ^ ((idx) << 5))
+static size_t rt_info_show(struct dst_entry *dst, char *buff, size_t maxlen)
+{
+ struct rtable *r = (struct rtable *)dst;
+
+ return snprintf(buff, maxlen, "%s\t%08lX\t%08lX\t%8X\t%d\t%u\t%d\t"
+ "%08lX\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X",
+ r->u.dst.dev ? r->u.dst.dev->name : "*",
+ (unsigned long)r->rt_dst, (unsigned long)r->rt_gateway,
+ r->rt_flags, atomic_read(&r->u.dst.__refcnt),
+ r->u.dst.__use, 0, (unsigned long)r->rt_src,
+ (dst_metric(&r->u.dst, RTAX_ADVMSS) ?
+ (int)dst_metric(&r->u.dst, RTAX_ADVMSS) + 40 : 0),
+ dst_metric(&r->u.dst, RTAX_WINDOW),
+ (int)((dst_metric(&r->u.dst, RTAX_RTT) >> 3) +
+ dst_metric(&r->u.dst, RTAX_RTTVAR)),
+ r->fl.fl4_tos,
+ r->u.dst.hh ? atomic_read(&r->u.dst.hh->hh_refcnt) : -1,
+ r->u.dst.hh ? (r->u.dst.hh->hh_output ==
+ dev_queue_xmit) : 0,
+ r->rt_spec_dst);
+}
+
#ifdef CONFIG_PROC_FS
struct rt_cache_iter_state {
struct seq_net_private p;
@@ -368,25 +393,9 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v)
"Metric\tSource\t\tMTU\tWindow\tIRTT\tTOS\tHHRef\t"
"HHUptod\tSpecDst");
else {
- struct rtable *r = v;
- char temp[256];
-
- sprintf(temp, "%s\t%08lX\t%08lX\t%8X\t%d\t%u\t%d\t"
- "%08lX\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X",
- r->u.dst.dev ? r->u.dst.dev->name : "*",
- (unsigned long)r->rt_dst, (unsigned long)r->rt_gateway,
- r->rt_flags, atomic_read(&r->u.dst.__refcnt),
- r->u.dst.__use, 0, (unsigned long)r->rt_src,
- (dst_metric(&r->u.dst, RTAX_ADVMSS) ?
- (int)dst_metric(&r->u.dst, RTAX_ADVMSS) + 40 : 0),
- dst_metric(&r->u.dst, RTAX_WINDOW),
- (int)((dst_metric(&r->u.dst, RTAX_RTT) >> 3) +
- dst_metric(&r->u.dst, RTAX_RTTVAR)),
- r->fl.fl4_tos,
- r->u.dst.hh ? atomic_read(&r->u.dst.hh->hh_refcnt) : -1,
- r->u.dst.hh ? (r->u.dst.hh->hh_output ==
- dev_queue_xmit) : 0,
- r->rt_spec_dst);
+ char temp[128];
+
+ rt_info_show(v, temp, 128);
seq_printf(seq, "%-127s\n", temp);
}
return 0;
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 15:57 ` Eric Dumazet
@ 2008-03-28 20:23 ` Stephen Hemminger
2008-03-28 20:45 ` Denys Fedoryshchenko
0 siblings, 1 reply; 29+ messages in thread
From: Stephen Hemminger @ 2008-03-28 20:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, Denys Fedoryshchenko, netdev, netfilter-devel
On Fri, 28 Mar 2008 16:57:55 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:
> Eric Dumazet a écrit :
> > Denys Fedoryshchenko a écrit :
> >> Already patched and tested, it doesn't change anything.
> >>
> >>
> >
> > We still leak dsts somewhere.
> >
> > You could try git bisect, or try to patch net/core/dst.c so that
> > dst_gc_task() (line 83) displays
> > route informations for say 10 first entries found in the dst_busy_list
> > (refcnt, interface, source IP, dest IP, things like that) that could
> > ring a bell given your netfilter rules or network conf.
>
> I cooked a patch (untested) to implement this idea :
>
> It should display lines similar to /proc/net/rt_cache (reusing the same
> helper function)
>
>
I wonder how much the route cache really helps when it grows so large?
Robert Olsson had suggested that turning it off when routing would help.
Perhaps the route cache is only really useful for local destinations? If the
cost of maintaining the route cache exceeds the cost of just using the existing
route table, there is no value to having a route cache.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 20:23 ` Stephen Hemminger
@ 2008-03-28 20:45 ` Denys Fedoryshchenko
2008-03-28 21:17 ` Eric Dumazet
0 siblings, 1 reply; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-28 20:45 UTC (permalink / raw)
To: Eric Dumazet, Stephen Hemminger; +Cc: netdev, netfilter-devel
It seems or patch change something (but it is just showing debug, strange),
or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC-
trie working fine, HASH also i cannot see any leaks.
I will have to wait 5-6 hours to make sure. After this time pass, if i will
not see bug again, i will try to run kernel just with default debug like
before.
If it is required, i can test performance and cpu load with/without routing
cache on real workload. Sure it is better to have syntetic tests before that.
On Fri, 28 Mar 2008 13:23:59 -0700, Stephen Hemminger wrote
> On Fri, 28 Mar 2008 16:57:55 +0100
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> > Eric Dumazet a [UTF-8?] :
> > > Denys Fedoryshchenko a [UTF-8?] :
> > >> Already patched and tested, it doesn't change anything.
> > >>
> > >>
> > >
> > > We still leak dsts somewhere.
> > >
> > > You could try git bisect, or try to patch net/core/dst.c so that
> > > dst_gc_task() (line 83) displays
> > > route informations for say 10 first entries found in the dst_busy_list
> > > (refcnt, interface, source IP, dest IP, things like that) that could
> > > ring a bell given your netfilter rules or network conf.
> >
> > I cooked a patch (untested) to implement this idea :
> >
> > It should display lines similar to /proc/net/rt_cache (reusing the same
> > helper function)
> >
> >
>
> I wonder how much the route cache really helps when it grows so
> large? Robert Olsson had suggested that turning it off when routing
> would help. Perhaps the route cache is only really useful for local
> destinations? If the cost of maintaining the route cache exceeds the
> cost of just using the existing route table, there is no value to
> having a route cache.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 20:45 ` Denys Fedoryshchenko
@ 2008-03-28 21:17 ` Eric Dumazet
2008-03-28 23:21 ` David Miller
[not found] ` <20080328220902.M5377@visp.net.lb>
0 siblings, 2 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-28 21:17 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Stephen Hemminger, netdev, netfilter-devel
Denys Fedoryshchenko a écrit :
> It seems or patch change something (but it is just showing debug, strange),
> or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC-
> trie working fine, HASH also i cannot see any leaks.
Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :(
commit 7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).
Commit 8b7817f3a959ed99d7443afc12f78a7e1fcc2063 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it bu going to a proper label.
Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.
Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
>
> I will have to wait 5-6 hours to make sure. After this time pass, if i will
> not see bug again, i will try to run kernel just with default debug like
> before.
>
> If it is required, i can test performance and cpu load with/without routing
> cache on real workload. Sure it is better to have syntetic tests before that.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 21:17 ` Eric Dumazet
@ 2008-03-28 23:21 ` David Miller
2008-03-29 9:35 ` Denys Fedoryshchenko
[not found] ` <20080328220902.M5377@visp.net.lb>
1 sibling, 1 reply; 29+ messages in thread
From: David Miller @ 2008-03-28 23:21 UTC (permalink / raw)
To: dada1; +Cc: denys, stephen.hemminger, netdev, netfilter-devel
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 28 Mar 2008 22:17:26 +0100
> Denys Fedoryshchenko a écrit :
> > It seems or patch change something (but it is just showing debug, strange),
> > or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-git3. LC-
> > trie working fine, HASH also i cannot see any leaks.
>
> Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :(
Denys please be more careful in the future :-(
So much time and effort got wasted because of this.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
[not found] ` <20080328220902.M5377@visp.net.lb>
@ 2008-03-29 8:31 ` Eric Dumazet
0 siblings, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2008-03-29 8:31 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Stephen Hemminger, netdev, netfilter-devel
Denys Fedoryshchenko a écrit :
> It looks like probably i did mistake while applying it. Very strange, cause i
> rechecked it, and i did make clean before building system image.
> I will make sure on vanilla code that everything is fine.
>
> Btw, i see permanently 4 "delayed"(?) records.
>
> [ 6261.050827] dstgc [0] lo 09E9FED5 09E9FED5
> 80000000 5 29 0 8F323353 0 0
> 0 00 -1 009E9FED5
> [ 6261.050834] dstgc [1] lo 09E9FED5 09E9FED5
> 80000000 37 358 0 FC9B62D4 0 0
> 0 00 -1 009E9FED5
> [ 6261.050838] dstgc [2] lo 09E9FED5 09E9FED5
> 80000000 4 8 0 7048DF48 0 0
> 0 00 -1 009E9FED5
> [ 6261.050842] dstgc [3] lo 09E9FED5 09E9FED5
> 80000000 6 5 0 6924E846 0 0
> 0 00 -1 009E9FED5
> [ 6261.050851] dst_total: 129123 delayed: 4 work_perf: 0 expires: 68000
> elapsed: 27 us
>
> Especially interesting one with refcnt 37 and dst.__use 358. I check for
> around 5 minutes - there is no traffic at all. This record staying for long
> time and there is no activity on it over "tcpdump -ni any host
> 212.98.155.252" , and it is not flushing it over ip route flush cache.
>
This route with refcnt 37 has Source IP = FC9B62D4, which is 212.98.155.252
Check that you dont have some tcp sessions with this remote address :
ss dst 212.98.155.252
As long as you have no activity on it, refcnt wont change.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: kernel 2.6.25-rc7 highly unstable on high load
2008-03-28 23:21 ` David Miller
@ 2008-03-29 9:35 ` Denys Fedoryshchenko
0 siblings, 0 replies; 29+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-29 9:35 UTC (permalink / raw)
To: netdev, netfilter-devel
I checked whole night, shame on me. Seems this patch was curing the problem.
Probably it was rejected, and i didn't notice. I will be more careful, sorry
again.
On Fri, 28 Mar 2008 16:21:37 -0700 (PDT), David Miller wrote
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Fri, 28 Mar 2008 22:17:26 +0100
>
> > Denys Fedoryshchenko a :
> > > It seems or patch change something (but it is just showing debug,
strange),
> > > or there is something fixed between 2.6.25-rc7-git1 and 2.6.25-rc7-
git3. LC-
> > > trie working fine, HASH also i cannot see any leaks.
> >
> > Yes, the fix is the patch we mentioned yesterday, and you told us you
tried it :(
>
> Denys please be more careful in the future :-(
>
> So much time and effort got wasted because of this.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2008-03-29 9:35 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-27 6:35 kernel 2.6.25-rc7 highly unstable on high load Denys Fedoryshchenko
2008-03-27 6:40 ` David Miller
2008-03-27 7:03 ` Eric Dumazet
2008-03-27 8:36 ` Denys Fedoryshchenko
2008-03-27 8:48 ` Denys Fedoryshchenko
2008-03-27 14:41 ` Denys Fedoryshchenko
2008-03-27 14:52 ` Eric Dumazet
[not found] ` <20080327151520.M89250@visp.net.lb>
2008-03-27 16:07 ` Eric Dumazet
2008-03-27 16:29 ` Eric Dumazet
2008-03-27 18:37 ` Denys Fedoryshchenko
2008-03-27 18:56 ` Eric Dumazet
2008-03-27 19:05 ` Denis V. Lunev
2008-03-27 22:03 ` David Miller
2008-03-28 0:47 ` Denys Fedoryshchenko
2008-03-28 4:50 ` Denys Fedoryshchenko
2008-03-28 5:25 ` Denys Fedoryshchenko
2008-03-28 5:49 ` Eric Dumazet
2008-03-28 7:38 ` Denys Fedoryshchenko
2008-03-28 7:46 ` Eric Dumazet
2008-03-28 9:50 ` Denys Fedoryshchenko
2008-03-28 15:57 ` Eric Dumazet
2008-03-28 20:23 ` Stephen Hemminger
2008-03-28 20:45 ` Denys Fedoryshchenko
2008-03-28 21:17 ` Eric Dumazet
2008-03-28 23:21 ` David Miller
2008-03-29 9:35 ` Denys Fedoryshchenko
[not found] ` <20080328220902.M5377@visp.net.lb>
2008-03-29 8:31 ` Eric Dumazet
2008-03-27 7:07 ` Denys Fedoryshchenko
2008-03-27 15:01 ` Patrick McHardy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).