* Problem with route cache
@ 2010-02-08 13:16 Paweł Staszewski
  2010-02-08 13:28 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread

From: Paweł Staszewski @ 2010-02-08 13:16 UTC (permalink / raw)
To: Linux Network Development list

Hello

For some time I have had a problem with the route cache in Linux. This is the info I get in dmesg:

Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
vlan0811: 9 rebuilds is over limit, route caching disabled
Route hash chain too long!
Adjust your secret_interval!

The problem is that changing net.ipv4.route.secret_interval changes nothing -- no matter whether I set secret_interval from the default 3600 down to 2 or up to 10000, I always get the same messages and the route cache gets disabled. I also changed net.ipv4.rt_cache_rebuild_count from the default 4 to 9, and then to 12, but that changed nothing either.

The machine showing this is:
2x Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
12GB of RAM

Network controllers:
04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

The odd thing is that I need to set the affinity of my NICs to "ff", because when I bind a network card to a single CPU, that CPU is always at 100%.

Router traffic is about 700Mbit/s RX + 700Mbit/s TX on eth0 (with 2 tagged VLANs) and the same traffic on eth1, which is untagged. This traffic is forwarded by this router.
Simple topology:

Clients <- ibgp -> [ vlan0811@eth0 + vlan0508@eth0 - BGP process - eth1 ] <- ebgp -> Internet Providers

Some info about fib_trie:

cat /proc/net/fib_triestat
Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes.
Main:
        Aver depth:     2.58
        Max depth:      6
        Leaves:         297506
        Prefixes:       312472
        Internal nodes: 69968
          1: 35673  2: 14840  3: 10980  4: 4729  5: 2315  6: 956  7: 364  8: 109  9: 1  16: 1
        Pointers: 570018
        Null ptrs: 202545
        Total size: 36990  kB

Counters:
---------
gets = 2797832581
backtracks = 149015808
semantic match passed = 2789993308
semantic match miss = 766703
null node hit = 860359377
skipped node resize = 0

Local:
        Aver depth:     3.66
        Max depth:      4
        Leaves:         15
        Prefixes:       16
        Internal nodes: 11
          1: 8  2: 3
        Pointers: 28
        Null ptrs: 3
        Total size: 3  kB

Counters:
---------
gets = 2792656726
backtracks = 2185449412
semantic match passed = 5895311
semantic match miss = 0
null node hit = 818902
skipped node resize = 0

And interrupts:

cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:          80         21         32         27         38         39         38         34   IO-APIC-edge      timer
  1:           0          1          0          0          0          0          1          0   IO-APIC-edge      i8042
  9:           0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 14:           0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
 15:           0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
 20:           0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 21:           0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 22:           0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 23:           0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb4
 29:       73903        136        135        135        134        138        136        136   PCI-MSI-edge      ahci
 30:   101136854  101509497  101627297  100701508  101684077  101870217  101154425  100604465   PCI-MSI-edge      eth0
 31:    91772747   92037311   92317341   92231065   91248484   91062342   91778136   92328099   PCI-MSI-edge      eth1
 32:           0          0          0          0          0          1          1          0   PCI-MSI-edge
 33:           0          1          0          0          0          0          0          1   PCI-MSI-edge
 34:           0          0          1          0          0          1          0          0   PCI-MSI-edge
 35:           1          0          0          1          0          0          0          0   PCI-MSI-edge
 36:           0          0          0          0          1          0          0          1   PCI-MSI-edge
 37:           1          0          0          0          0          0          1          0   PCI-MSI-edge
 38:           0          0          0          1          1          0          0          0   PCI-MSI-edge
 39:           0          1          1          0          0          0          0          0   PCI-MSI-edge
NMI:           0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:   407733660  395964025  431438426  402307801  434522968  420170984  400450633  390318324   Local timer interrupts
SPU:           0          0          0          0          0          0          0          0   Spurious interrupts
PMI:           0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:           0          0          0          0          0          0          0          0   Performance pending work
RES:       14378      15781      19276       5691      14761      13579      16846      15629   Rescheduling interrupts
CAL:         378        383         92         86        364        354         92         89   Function call interrupts
TLB:         551        577        433        272        602        543        329        683   TLB shootdowns
ERR:           0
MIS:           0

Regards
Pawel

^ permalink raw reply	[flat|nested] 22+ messages in thread
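[Editor's note: Paweł's remark about needing an affinity mask of "ff" is visible in the /proc/interrupts output above: the eth0 (IRQ 30) counts are nearly identical on all eight CPUs. A quick sketch, with the counts copied from the output above; the interpretation threshold is illustrative only:]

```shell
#!/bin/sh
# Per-CPU interrupt counts for eth0 (IRQ 30), copied from /proc/interrupts above.
# Compute the ratio of the busiest CPU to the least busy one; 1.000 = even spread.
imbalance=$(printf '%s\n' 101136854 101509497 101627297 100701508 \
                          101684077 101870217 101154425 100604465 |
    awk 'NR==1{max=$1;min=$1}
         {if($1>max)max=$1; if($1<min)min=$1}
         END{printf "%.3f", max/min}')
echo "eth0 IRQ imbalance: $imbalance"   # ~1.013, i.e. within ~1.3% of even
```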
* Re: Problem with route cache
  2010-02-08 13:16 Problem with route cache Paweł Staszewski
@ 2010-02-08 13:28 ` Eric Dumazet
  2010-02-08 13:33   ` Paweł Staszewski
  0 siblings, 1 reply; 22+ messages in thread

From: Eric Dumazet @ 2010-02-08 13:28 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list

On Monday, 8 February 2010 at 14:16 +0100, Paweł Staszewski wrote:
> Hello
>
> For some time I have had a problem with the route cache in Linux.
> This is the info I get in dmesg:
>
> Route hash chain too long!
> Adjust your secret_interval!
> [...]
> vlan0811: 9 rebuilds is over limit, route caching disabled
>
> The problem is that changing net.ipv4.route.secret_interval changes
> nothing -- no matter whether I set secret_interval from the default
> 3600 down to 2 or up to 10000, I always get the same messages and the
> route cache gets disabled. I also changed
> net.ipv4.rt_cache_rebuild_count from the default 4 to 9, and then to
> 12, but that changed nothing either.
>
> The machine showing this is:
> 2x Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
> 12GB of RAM

Are you running a 64bit kernel ?
What is your kernel version ?

Please send :

# grep . /proc/sys/net/ipv4/route/*
# rtstat -c10 -i1

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Problem with route cache
  2010-02-08 13:28 ` Eric Dumazet
@ 2010-02-08 13:33   ` Paweł Staszewski
  2010-02-08 13:51     ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread

From: Paweł Staszewski @ 2010-02-08 13:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list

[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]

On 2010-02-08 14:28, Eric Dumazet wrote:
> On Monday, 8 February 2010 at 14:16 +0100, Paweł Staszewski wrote:
>> [...]
>
> Are you running a 64bit kernel ?
> What is your kernel version ?
>
> Please send :
>
> # grep . /proc/sys/net/ipv4/route/*
> # rtstat -c10 -i1

Yes, this is an x86_64 kernel.
I ran kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and the same thing happens on all of them.

grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
grep: /proc/sys/net/ipv4/route/flush: Permission denied
/proc/sys/net/ipv4/route/gc_elasticity:2
/proc/sys/net/ipv4/route/gc_interval:2
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:65535
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:2

This does not happen all the time. I see these messages only during "internet rush hours" - then there is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.

[-- Attachment #2: rtstat.txt --]
[-- Type: text/plain, Size: 2039 bytes --]

rtstat -c10 -i1
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
   12082|3440217296|6456413873|       0|  623094|     294|       0|       3|  735116| 5701062|       0|261260739|261040365|       0|       0|654617961|  179044|
   10037|       0|  152142|       0|       7|       0|       0|       0|       0|     123|       0|       0|       0|       0|       0|       0|       0|
   12032|       0|  155770|       0|       4|       0|       0|       0|       0|     122|       0|       0|       0|       0|       0|       0|       0|
   10991|       0|  161040|       0|       7|       0|       0|       0|       0|     129|       0|       0|       0|       0|       0|       0|       0|
    9898|       0|  155503|       0|       9|       0|       0|       0|       0|     125|       0|       0|       0|       0|       0|       0|       0|
   12553|       0|  157455|       0|       6|       0|       0|       0|       0|     129|       0|       0|       0|       0|       0|       0|       0|
   10983|       0|  157742|       0|       9|       0|       0|       0|       0|     128|       0|       0|       0|       0|       0|       0|       0|
    9375|       0|  158226|       0|       8|       0|       0|       0|       0|     115|       0|       0|       0|       0|       0|       0|       0|
   11929|       0|  159342|       0|      10|       0|       0|       0|       0|     130|       0|       0|       0|       0|       0|       0|       0|
   11046|       0|  158015|       0|       8|       0|       0|       0|       0|     126|       0|       0|       0|       0|       0|       0|       0|

^ permalink raw reply	[flat|nested] 22+ messages in thread
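[Editor's note: the first rtstat row above holds cumulative counters, and they already hint at the problem: comparing in_hit (cache hits) against in_slow_tot (slow-path lookups) gives a poor hit ratio, and the later per-second rows show in_hit stuck at 0 once caching is disabled. A quick check using the numbers from the output above:]

```shell
#!/bin/sh
# Cumulative counters from the first rtstat row above
in_hit=3440217296        # input packets resolved from the route cache
in_slow_tot=6456413873   # input packets that took the slow path (cache miss)

# Hit ratio as an integer percentage (awk used because sh arithmetic
# overflows/has no floats on some shells)
ratio=$(awk -v h="$in_hit" -v s="$in_slow_tot" \
    'BEGIN{printf "%.0f", 100*h/(h+s)}')
echo "route cache hit ratio: ${ratio}%"   # about 35% - quite poor
```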
* Re: Problem with route cache
  2010-02-08 13:33 ` Paweł Staszewski
@ 2010-02-08 13:51   ` Eric Dumazet
  2010-02-08 13:59     ` Paweł Staszewski
  0 siblings, 1 reply; 22+ messages in thread

From: Eric Dumazet @ 2010-02-08 13:51 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list

On Monday, 8 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
> Yes, this is an x86_64 kernel.
> [...]
> /proc/sys/net/ipv4/route/gc_elasticity:2
> /proc/sys/net/ipv4/route/gc_interval:2
> /proc/sys/net/ipv4/route/gc_thresh:65535
> /proc/sys/net/ipv4/route/max_size:524288
> /proc/sys/net/ipv4/route/secret_interval:2
> [...]
> This does not happen all the time. I see these messages only during
> "internet rush hours" - then there is about 700Mbit/s TX + 700Mbit/s
> RX of forwarded traffic.

I don't understand your settings, they are very very small for your
setup. You want to flush the cache every 2 seconds...

With 12GB of ram, you could have

/proc/sys/net/ipv4/route/gc_thresh:524288
/proc/sys/net/ipv4/route/max_size:8388608
/proc/sys/net/ipv4/route/secret_interval:3600
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1

That would allow about 2 million entries in your route cache, using 768
Mbytes of ram, and a good cache hit ratio.

^ permalink raw reply	[flat|nested] 22+ messages in thread
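[Editor's note: Eric's figures are self-consistent if one assumes roughly 384 bytes per cached route entry on x86_64; that per-entry size is an assumption here (it varied between kernel versions). With gc_elasticity bounding the average hash-chain length, the cache holds about hash_size × gc_elasticity entries:]

```shell
#!/bin/sh
# Back-of-the-envelope check of Eric's "2 million entries, 768 MB" figures.
hash_size=524288     # route cache hash table entries (gc_thresh above)
gc_elasticity=4      # average chain length the GC aims for
entry_bytes=384      # ASSUMED size of one cache entry (struct rtable) on x86_64

entries=$((hash_size * gc_elasticity))
mem_mb=$((entries * entry_bytes / 1048576))
echo "max cached entries: $entries"   # 2097152, i.e. ~2 million
echo "approx memory use:  ${mem_mb} MB"  # 768 MB, matching Eric's estimate
```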
* Re: Problem with route cache
  2010-02-08 13:51 ` Eric Dumazet
@ 2010-02-08 13:59   ` Paweł Staszewski
  2010-02-08 14:06     ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread

From: Paweł Staszewski @ 2010-02-08 13:59 UTC (permalink / raw)
To: Eric Dumazet, Linux Network Development list

On 2010-02-08 14:51, Eric Dumazet wrote:
> On Monday, 8 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>> [...]
>> /proc/sys/net/ipv4/route/secret_interval:2
>>
>> This does not happen all the time. I see these messages only during
>> "internet rush hours" [...]
>
> I don't understand your settings, they are very very small for your
> setup. You want to flush the cache every 2 seconds...
>
> With 12GB of ram, you could have
>
> /proc/sys/net/ipv4/route/gc_thresh:524288
> /proc/sys/net/ipv4/route/max_size:8388608
> /proc/sys/net/ipv4/route/secret_interval:3600
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
>
> That would allow about 2 million entries in your route cache, using 768
> Mbytes of ram, and a good cache hit ratio.

Yes, as I wrote, I only changed secret_interval (from 3600 to 2) after I saw the first "Adjust your secret_interval!" message, to check whether that would resolve the problem.

My normal settings are:

/proc/sys/net/ipv4/route/gc_thresh:256000
/proc/sys/net/ipv4/route/max_size:1048576
/proc/sys/net/ipv4/route/secret_interval:3600
/proc/sys/net/ipv4/route/gc_interval:2
/proc/sys/net/ipv4/route/gc_elasticity:2

And with those settings I was still getting:
Route hash chain too long!
Adjust your secret_interval!

I have now applied the settings you suggest... we will see, but I don't know whether it will help, because I have already tried many different settings.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Problem with route cache
  2010-02-08 13:59 ` Paweł Staszewski
@ 2010-02-08 14:06   ` Eric Dumazet
  2010-02-08 14:16     ` Paweł Staszewski
  0 siblings, 1 reply; 22+ messages in thread

From: Eric Dumazet @ 2010-02-08 14:06 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list

On Monday, 8 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
> [...]
> My normal settings are:
>
> /proc/sys/net/ipv4/route/gc_thresh:256000
> /proc/sys/net/ipv4/route/max_size:1048576
> /proc/sys/net/ipv4/route/secret_interval:3600
> /proc/sys/net/ipv4/route/gc_interval:2
> /proc/sys/net/ipv4/route/gc_elasticity:2
>
> And with those settings I was still getting:
> Route hash chain too long!
> Adjust your secret_interval!
>
> I have now applied the settings you suggest... we will see, but I
> don't know whether it will help, because I have already tried many
> different settings.

One important point is the size of the hash table, you want something big
for your router.

# dmesg | grep 'IP route'
... IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)

Then, if it is correctly sized, don't change gc_thresh or max_size, as the
defaults are good.

I would only change gc_interval to 1, to perform a smooth gc.

And eventually gc_elasticity to 4, 5 or 6 if I had less ram than your
machine.

^ permalink raw reply	[flat|nested] 22+ messages in thread
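[Editor's note: Eric's check can be scripted; the boot line below is copied from the thread, and on a live router one would pipe real `dmesg` output instead. A sketch, not a tuning recommendation:]

```shell
#!/bin/sh
# Extract the IP route cache hash table size from a dmesg boot line.
# Sample line taken from the thread; on a live 2.6.x router use:
#   dmesg | grep 'IP route'
line="IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)"
entries=$(printf '%s\n' "$line" | sed -n 's/.*entries: \([0-9]*\).*/\1/p')
echo "hash table entries: $entries"

# If the table is correctly sized, Eric suggests leaving gc_thresh/max_size
# at their defaults and only smoothing the GC, e.g. (requires root):
#   sysctl -w net.ipv4.route.gc_interval=1
```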
* Re: Problem with route cache
  2010-02-08 14:06 ` Eric Dumazet
@ 2010-02-08 14:16   ` Paweł Staszewski
  2010-02-08 14:32     ` Eric Dumazet
  2010-02-08 14:32     ` Problem with route cache Paweł Staszewski
  0 siblings, 2 replies; 22+ messages in thread

From: Paweł Staszewski @ 2010-02-08 14:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list

On 2010-02-08 15:06, Eric Dumazet wrote:
> [...]
> One important point is the size of the hash table, you want something big
> for your router.
>
> # dmesg | grep 'IP route'
> ... IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)

On my machine it is also the same:

dmesg | grep 'IP route'
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)

> Then, if it is correctly sized, don't change gc_thresh or max_size, as the
> defaults are good.
>
> I would only change gc_interval to 1, to perform a smooth gc.
>
> And eventually gc_elasticity to 4, 5 or 6 if I had less ram than your
> machine.
Some days ago, after the route cache messages, I also got this:

Feb 4 13:12:40 TM_01_C1 ------------[ cut here ]------------
Feb 4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x130/0x1d6()
Feb 4 13:12:40 TM_01_C1 Hardware name: X7DCT
Feb 4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Feb 4 13:12:40 TM_01_C1 Modules linked in: oprofile
Feb 4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1
Feb 4 13:12:40 TM_01_C1 Call Trace:
Feb 4 13:12:40 TM_01_C1 <IRQ>  [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff81038811>] ? warn_slowpath_common+0x77/0xa3
Feb 4 13:12:40 TM_01_C1 [<ffffffff81038899>] ? warn_slowpath_fmt+0x51/0x59
Feb 4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e
Feb 4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ? try_to_wake_up+0x1eb/0x1f8
Feb 4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ? netdev_drivername+0x3b/0x40
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ? run_timer_softirq+0x1ff/0x29d
Feb 4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7
Feb 4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66
Feb 4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ? smp_apic_timer_interrupt+0x87/0x95
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ? apic_timer_interrupt+0x13/0x20
Feb 4 13:12:40 TM_01_C1 <EOI>  [<ffffffff810111f5>] ? mwait_idle+0x9b/0xa0
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? cpu_idle+0x49/0x7c
Feb 4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]---

And after changing the kernel to 2.6.33-rc6, another, different one:

BUG: soft lockup - CPU#1 stuck for 61s! [events/1:28]
Modules linked in:
CPU 1
Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT
RIP: 0010:[<ffffffff810a3d89>]  [<ffffffff810a3d89>] kmem_cache_free+0x11b/0x11c
RSP: 0018:ffff880028243e50  EFLAGS: 00000292
RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0
RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680
RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c
R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0
R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7
FS:  0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task ffff88031f9a4f80)
Stack:
 ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023
<0> 01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040
<0> 0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740
Call Trace:
 <IRQ>  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
 [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
 [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
 [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
 [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
 [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
 [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
 [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
 <EOI>  [<ffffffff81004599>] ? do_softirq+0x31/0x63
 [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
 [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
 [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
 [<ffffffff8136b08c>] ? schedule+0x82c/0x906
 [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
 [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
 [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
 [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
 [<ffffffff810479bd>] ? kthread+0x79/0x81
 [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff81047944>] ? kthread+0x0/0x81
 [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f <c3> 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
Call Trace:
 <IRQ>  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
 [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
 [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
 [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
 [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
 [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
 [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
 [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
 <EOI>  [<ffffffff81004599>] ? do_softirq+0x31/0x63
 [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
 [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
 [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
 [<ffffffff8136b08c>] ? schedule+0x82c/0x906
 [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
 [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
 [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
 [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
 [<ffffffff810479bd>] ? kthread+0x79/0x81
 [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff81047944>] ? kthread+0x0/0x81
 [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Problem with route cache
  2010-02-08 14:16 ` Paweł Staszewski
@ 2010-02-08 14:32   ` Eric Dumazet
  2010-02-08 19:32     ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
  2010-02-08 14:32   ` Problem with route cache Paweł Staszewski
  1 sibling, 1 reply; 22+ messages in thread

From: Eric Dumazet @ 2010-02-08 14:32 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list

On Monday, 8 February 2010 at 15:16 +0100, Paweł Staszewski wrote:
> Some days ago, after the route cache messages, I also got this:
> [...]
> Call Trace:
>  <IRQ>  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
>  [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
>  [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
>  [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
>  [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
>  <EOI>  [<ffffffff81004599>] ? do_softirq+0x31/0x63
>  [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
>  [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> [...]

This trace is indeed very interesting, since dst_gc_task() runs from a
work queue, and there is no scheduling point in it.

We might need to add a scheduling point in dst_gc_task(), in case a huge
number of entries was flushed.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 14:32 ` Eric Dumazet @ 2010-02-08 19:32 ` Eric Dumazet 2010-02-08 23:01 ` David Miller 2010-02-08 23:26 ` Andrew Morton 0 siblings, 2 replies; 22+ messages in thread From: Eric Dumazet @ 2010-02-08 19:32 UTC (permalink / raw) To: Paweł Staszewski, David Miller; +Cc: Linux Network Development list Le lundi 08 février 2010 à 15:32 +0100, Eric Dumazet a écrit : > Le lundi 08 février 2010 à 15:16 +0100, Paweł Staszewski a écrit : > > > > > > Some day ago after info about route cache i was have also this info: > > > Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48 > > 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f<c3> 55 48 89 f5 53 48 > > 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10 > > Call Trace: > > <IRQ> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74 > > [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235 > > [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c > > [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d > > [<ffffffff81035311>] ? __do_softirq+0xd7/0x196 > > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > > [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7 > > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > > <EOI> [<ffffffff81004599>] ? do_softirq+0x31/0x63 > > [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86 > > [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7 > > [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7 > > [<ffffffff8136b08c>] ? schedule+0x82c/0x906 > > [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b > > [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d > > [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc > > [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e > > [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc > > [<ffffffff810479bd>] ? kthread+0x79/0x81 > > [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10 > > [<ffffffff81047944>] ? kthread+0x0/0x81 > > > > > > [<ffffffff81002cb0>] ? 
kernel_thread_helper+0x0/0x10 > > > > > > This trace is indeed very interesting, since dst_gc_task() is run from a > work queue, and there is no scheduling point in it. > > We might need to add a scheduling point in dst_gc_task() in case a huge > number of entries were flushed. > David, here is the patch I sent to Pawel to solve this problem. This probably is a stable candidate. Thanks

[PATCH] dst: call cond_resched() in dst_gc_task()

On some workloads, it is quite possible to get a huge dst list to process in dst_gc_task(), and trigger soft lockup detection. Fix is to call cond_resched(), as we run in process context.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dst.c b/net/core/dst.c
index 57bc4d5..cb1b348 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <net/net_namespace.h>
+#include <linux/sched.h>

 #include <net/dst.h>

@@ -79,6 +80,7 @@ loop:
	while ((dst = next) != NULL) {
		next = dst->next;
		prefetch(&next->next);
+		cond_resched();
		if (likely(atomic_read(&dst->__refcnt))) {
			last->next = dst;
			last = dst;
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet @ 2010-02-08 23:01 ` David Miller 2010-02-09 6:07 ` Eric Dumazet 2010-02-08 23:26 ` Andrew Morton 1 sibling, 1 reply; 22+ messages in thread From: David Miller @ 2010-02-08 23:01 UTC (permalink / raw) To: eric.dumazet; +Cc: pstaszewski, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 08 Feb 2010 20:32:40 +0100 > [PATCH] dst: call cond_resched() in dst_gc_task() > > On some workloads, it is quite possible to get a huge dst list to > process in dst_gc_task(), and trigger soft lockup detection. > > Fix is to call cond_resched(), as we run in process context. > > Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> > Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied and queued up to -stable. When fixing bugs with kernel bugzilla entries, please mention them in the commit message. I fixed this up for you but please take care of it next time. Thanks! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:01 ` David Miller @ 2010-02-09 6:07 ` Eric Dumazet 0 siblings, 0 replies; 22+ messages in thread From: Eric Dumazet @ 2010-02-09 6:07 UTC (permalink / raw) To: David Miller; +Cc: pstaszewski, netdev Le lundi 08 février 2010 à 15:01 -0800, David Miller a écrit : > > When fixing bugs with kernel bugzilla entries, please > mention them in the commit message. I fixed this up for > you but please take care of it next time. > > Thanks! Sorry Dave, I was not aware of the bugzilla entry. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet 2010-02-08 23:01 ` David Miller @ 2010-02-08 23:26 ` Andrew Morton 2010-02-08 23:34 ` David Miller 1 sibling, 1 reply; 22+ messages in thread From: Andrew Morton @ 2010-02-08 23:26 UTC (permalink / raw) To: Eric Dumazet Cc: Paweł Staszewski, David Miller, Linux Network Development list On Mon, 08 Feb 2010 20:32:40 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > [PATCH] dst: call cond_resched() in dst_gc_task() > > On some workloads, it is quite possible to get a huge dst list to > process in dst_gc_task(), and trigger soft lockup detection. > > Fix is to call cond_resched(), as we run in process context. > > Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> > Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > > diff --git a/net/core/dst.c b/net/core/dst.c > index 57bc4d5..cb1b348 100644 > --- a/net/core/dst.c > +++ b/net/core/dst.c > @@ -17,6 +17,7 @@ > #include <linux/string.h> > #include <linux/types.h> > #include <net/net_namespace.h> > +#include <linux/sched.h> > > #include <net/dst.h> > > @@ -79,6 +80,7 @@ loop: > while ((dst = next) != NULL) { > next = dst->next; > prefetch(&next->next); > + cond_resched(); > if (likely(atomic_read(&dst->__refcnt))) { > last->next = dst; > last = dst; Gad. Am I understanding this right? The softlockup threshold is sixty seconds! I assume that this function spends most of its time walking over busy entries? Is a more powerful data structure needed? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:26 ` Andrew Morton @ 2010-02-08 23:34 ` David Miller 2010-02-08 23:37 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: David Miller @ 2010-02-08 23:34 UTC (permalink / raw) To: akpm; +Cc: eric.dumazet, pstaszewski, netdev From: Andrew Morton <akpm@linux-foundation.org> Date: Mon, 8 Feb 2010 15:26:06 -0800 > I assume that this function spends most of its time walking over busy > entries? Is a more powerful data structure needed? When you're getting pounded with millions of packets per second, all mostly to different destinations (and thus resolving to different routing cache entries), this is what happens. For a busy router, really, this is normal behavior. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:34 ` David Miller @ 2010-02-08 23:37 ` Andrew Morton 2010-02-08 23:50 ` David Miller 2010-02-08 23:50 ` Stephen Hemminger 0 siblings, 2 replies; 22+ messages in thread From: Andrew Morton @ 2010-02-08 23:37 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, pstaszewski, netdev On Mon, 08 Feb 2010 15:34:06 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Mon, 8 Feb 2010 15:26:06 -0800 > > > I assume that this function spends most of its time walking over busy > > entries? Is a more powerful data structure needed? > > When you're getting pounded with millions of packets per second, > all mostly to different destinations (and thus resolving to > different routing cache entries), this is what happens. > > For a busy router, really, this is normal behavior. Is the cache a net win in that scenario? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:37 ` Andrew Morton @ 2010-02-08 23:50 ` David Miller 2010-02-08 23:50 ` Stephen Hemminger 1 sibling, 0 replies; 22+ messages in thread From: David Miller @ 2010-02-08 23:50 UTC (permalink / raw) To: akpm; +Cc: eric.dumazet, pstaszewski, netdev From: Andrew Morton <akpm@linux-foundation.org> Date: Mon, 8 Feb 2010 15:37:44 -0800 > On Mon, 08 Feb 2010 15:34:06 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > >> For a busy router, really, this is normal behavior. > > Is the cache a net win in that scenario? Absolutely. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:37 ` Andrew Morton 2010-02-08 23:50 ` David Miller @ 2010-02-08 23:50 ` Stephen Hemminger 2010-02-09 6:06 ` Eric Dumazet 1 sibling, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2010-02-08 23:50 UTC (permalink / raw) To: Andrew Morton; +Cc: David Miller, eric.dumazet, pstaszewski, netdev On Mon, 8 Feb 2010 15:37:44 -0800 Andrew Morton <akpm@linux-foundation.org> wrote: > On Mon, 08 Feb 2010 15:34:06 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > > > From: Andrew Morton <akpm@linux-foundation.org> > > Date: Mon, 8 Feb 2010 15:26:06 -0800 > > > > > I assume that this function spends most of its time walking over busy > > > entries? Is a more powerful data structure needed? > > > > When you're getting pounded with millions of packets per second, > > all mostly to different destinations (and thus resolving to > > different routing cache entries), this is what happens. > > > > For a busy router, really, this is normal behavior. > > Is the cache a net win in that scenario? No, the cache doesn't help. Robert, who is the expert in this area, runs with FIB TRIE and no routing cache. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-08 23:50 ` Stephen Hemminger @ 2010-02-09 6:06 ` Eric Dumazet 2010-02-09 6:35 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2010-02-09 6:06 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Andrew Morton, David Miller, pstaszewski, netdev Le lundi 08 février 2010 à 15:50 -0800, Stephen Hemminger a écrit : > No, cache doesn't help. > > Robert who is the expert in this area, runs with FIB TRIE and > no routing cache. Who knows, it probably depends on many factors. I always run with the cache enabled, because it saves cycles on moderate load. FIB_TRIE is unrelated here; if the routing table is very small, it fits either HASH or TRIE. Pawel hit the bug with tunables that basically enabled the cache but in a non-helpful way (filling the list of busy dst). User error combined with a lazy kernel function :) Please note that the conversion from softirq to workqueue, without a scheduling point, might/probably use the same cpu for handling network irqs and running dst_gc_task(): On big routers, admins usually use irq affinities, so we can have very little cpu time available to run other tasks on those cpus. After this patch, I believe that the scheduler is allowed to migrate dst_gc_task() to an idle cpu. Another point (for 2.6.34) to address is the dst_gc_mutex that can delay NETDEV_UNREGISTER/NETDEV_DOWN events for a long period. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-09 6:06 ` Eric Dumazet @ 2010-02-09 6:35 ` Andrew Morton 2010-02-09 7:20 ` Eric Dumazet 0 siblings, 1 reply; 22+ messages in thread From: Andrew Morton @ 2010-02-09 6:35 UTC (permalink / raw) To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > After this patch, I believe that scheduler is allowed to migrate > dst_gc_task() to an idle cpu. No, keventd threads are each pinned to a single CPU (kthread_bind() in start_workqueue_thread()), so dst_gc_task() gets run on the CPU which ran schedule_delayed_work() and no other. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-09 6:35 ` Andrew Morton @ 2010-02-09 7:20 ` Eric Dumazet 2010-02-09 7:31 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2010-02-09 7:20 UTC (permalink / raw) To: Andrew Morton; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev Le lundi 08 février 2010 à 22:35 -0800, Andrew Morton a écrit : > On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > After this patch, I believe that scheduler is allowed to migrate > > dst_gc_task() to an idle cpu. > > No, keventd threads are each pinned to a single CPU (kthread_bind() in > start_workqueue_thread()), so dst_gc_task() gets run on the CPU which > ran schedule_delayed_work() and no other. > Ah OK, thanks Andrew for this clarification. I suppose offlining a cpu migrates its works to another (online) cpu ? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task() 2010-02-09 7:20 ` Eric Dumazet @ 2010-02-09 7:31 ` Andrew Morton 0 siblings, 0 replies; 22+ messages in thread From: Andrew Morton @ 2010-02-09 7:31 UTC (permalink / raw) To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev On Tue, 09 Feb 2010 08:20:36 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > I suppose offlining a cpu migrates its works to another (online) cpu ? Sort of, effectively. The workqueue code runs all the pending works on the to-be-offlined CPU and then it's done. schedule_delayed_work() starts out with a timer, and the timer code _does_ perform migration off the going-away CPU. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem wit route cache 2010-02-08 14:16 ` Paweł Staszewski 2010-02-08 14:32 ` Eric Dumazet @ 2010-02-08 14:32 ` Paweł Staszewski 2010-02-08 14:45 ` Paweł Staszewski 1 sibling, 1 reply; 22+ messages in thread From: Paweł Staszewski @ 2010-02-08 14:32 UTC (permalink / raw) To: Eric Dumazet; +Cc: Linux Network Development list W dniu 2010-02-08 15:16, Paweł Staszewski pisze: > W dniu 2010-02-08 15:06, Eric Dumazet pisze: >> Le lundi 08 février 2010 à 14:59 +0100, Paweł Staszewski a écrit : >>> W dniu 2010-02-08 14:51, Eric Dumazet pisze: >>>> Le lundi 08 février 2010 à 14:33 +0100, Paweł Staszewski a écrit : >>>> >>>> >>>>>> >>>>> Yes this is x86_64 kernel >>>>> i kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5 and on all >>>>> kernels the same thing happens. >>>>> grep . /proc/sys/net/ipv4/route/* >>>>> /proc/sys/net/ipv4/route/error_burst:1250 >>>>> /proc/sys/net/ipv4/route/error_cost:250 >>>>> grep: /proc/sys/net/ipv4/route/flush: Permission denied >>>>> /proc/sys/net/ipv4/route/gc_elasticity:2 >>>>> /proc/sys/net/ipv4/route/gc_interval:2 >>>>> /proc/sys/net/ipv4/route/gc_min_interval:0 >>>>> /proc/sys/net/ipv4/route/gc_min_interval_ms:500 >>>>> /proc/sys/net/ipv4/route/gc_thresh:65535 >>>>> /proc/sys/net/ipv4/route/gc_timeout:300 >>>>> /proc/sys/net/ipv4/route/max_size:524288 >>>>> /proc/sys/net/ipv4/route/min_adv_mss:256 >>>>> /proc/sys/net/ipv4/route/min_pmtu:552 >>>>> /proc/sys/net/ipv4/route/mtu_expires:600 >>>>> /proc/sys/net/ipv4/route/redirect_load:5 >>>>> /proc/sys/net/ipv4/route/redirect_number:9 >>>>> /proc/sys/net/ipv4/route/redirect_silence:5120 >>>>> /proc/sys/net/ipv4/route/secret_interval:2 >>>>> >>>>> This happens not all the time. >>>>> I have this info only when there are "internet rush hours" - thn >>>>> there >>>>> is about 700Mbit/s TX + 700Mbit/s RX forwarded traffic >>>>> >>>>> >>>> I dont understand your settings, they are very very small for your >>>> setup. You want to flush cache every 2 seconds... 
>>>> >>>> With 12GB of ram, you could have >>>> >>>> /proc/sys/net/ipv4/route/gc_thresh:524288 >>>> /proc/sys/net/ipv4/route/max_size:8388608 >>>> /proc/sys/net/ipv4/route/secret_interval:3600 >>>> /proc/sys/net/ipv4/route/gc_elasticity:4 >>>> /proc/sys/net/ipv4/route/gc_interval:1 >>>> >>>> That would allow about 2 million entries in your route cache, using >>>> 768 >>>> Mbytes of ram, and a good cache hit ratio. >>>> >>>> >>>> >>> Yes as i write i change this settings after i see first info >>> "secret_interval" - from 3600 to 2 >>> To check if this resolve the problem. >>> Also my normal settings are: >>> >>> /proc/sys/net/ipv4/route/gc_thresh:256000 >>> /proc/sys/net/ipv4/route/max_size:1048576 >>> /proc/sys/net/ipv4/route/secret_interval:3600 >>> /proc/sys/net/ipv4/route/gc_interval:2 >>> /proc/sys/net/ipv4/route/gc_elasticity:2 >>> >>> And with this setting i was have this info: >>> Route hash chain too long! >>> Adjust your secret_interval! >>> >>> >>> >>> Now i put Your settings as You suggest ... and we will see but i >>> dont know it will help. >>> Because i try many of different settings. >>> >> One important point is the size of hash table, you want something big >> for your router. >> >> # dmesg | grep 'IP route' >> ... IP route cache hash table entries: 524288 (order: 10, 4194304 >> bytes) >> > > On my machine it is also the same: > dmesg | grep 'IP route' > IP route cache hash table entries: 524288 (order: 10, 4194304 bytes) > > >> Then if it is correctly sized, dont change gc_thresh or max_size, as >> defaults are good. >> >> I would only change gc_interval to 1, to perform a smooth gc >> >> And eventually gc_elasticity to 4, 5 or 6 if I had less ram than your >> machine. 
>> > Some day ago after info about route cache i was have also this info: > Feb 4 13:12:40 TM_01_C1 ------------[ cut here ]------------ > Feb 4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261 > dev_watchdog+0x130/0x1d6() > Feb 4 13:12:40 TM_01_C1 Hardware name: X7DCT > Feb 4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit > queue 0 timed out > Feb 4 13:12:40 TM_01_C1 Modules linked in: oprofile > Feb 4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1 > Feb 4 13:12:40 TM_01_C1 Call Trace: > Feb 4 13:12:40 TM_01_C1 <IRQ> [<ffffffff812fcaf7>] ? > dev_watchdog+0x130/0x1d6 > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6 > Feb 4 13:12:40 TM_01_C1 [<ffffffff81038811>] ? > warn_slowpath_common+0x77/0xa3 > Feb 4 13:12:40 TM_01_C1 [<ffffffff81038899>] ? > warn_slowpath_fmt+0x51/0x59 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e > Feb 4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ? > try_to_wake_up+0x1eb/0x1f8 > Feb 4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ? > netdev_drivername+0x3b/0x40 > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44 > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6 > Feb 4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ? > run_timer_softirq+0x1ff/0x29d > Feb 4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ? > smp_apic_timer_interrupt+0x87/0x95 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ? > apic_timer_interrupt+0x13/0x20 > Feb 4 13:12:40 TM_01_C1 <EOI> [<ffffffff810111f5>] ? > mwait_idle+0x9b/0xa0 > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? 
cpu_idle+0x49/0x7c > Feb 4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]--- > > And after change kernel to 2.6.33-rc6 another different inf: > > BUG: soft lockup - CPU#1 stuck for 61s! > [events/1:28] > Modules linked in: > CPU 1 > Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT > RIP: 0010:[<ffffffff810a3d89>] [<ffffffff810a3d89>] > kmem_cache_free+0x11b/0x11c > RSP: 0018:ffff880028243e50 EFLAGS: 00000292 > RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0 > RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680 > RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c > R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0 > R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7 > FS: 0000000000000000(0000) GS:ffff880028240000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task > ffff88031f9a4f80) > Stack: > ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023 > <0> 01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040 > <0> 0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740 > Call Trace: > <IRQ> > [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74 > [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235 > [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c > [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d > [<ffffffff81035311>] ? __do_softirq+0xd7/0x196 > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7 > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > <EOI> > [<ffffffff81004599>] ? do_softirq+0x31/0x63 > [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86 > [<ffffffff812f768f>] ? 
dst_gc_task+0x0/0x1a7 > [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7 > [<ffffffff8136b08c>] ? schedule+0x82c/0x906 > [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b > [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d > [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc > [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e > [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc > [<ffffffff810479bd>] ? kthread+0x79/0x81 > [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10 > [<ffffffff81047944>] ? kthread+0x0/0x81 > [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10 > Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 > c3 08 48 > 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f<c3> 55 48 89 > f5 53 48 > 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10 > Call Trace: > <IRQ> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74 > [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235 > [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c > [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d > [<ffffffff81035311>] ? __do_softirq+0xd7/0x196 > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7 > [<ffffffff81002dac>] ? call_softirq+0x1c/0x28 > <EOI> [<ffffffff81004599>] ? do_softirq+0x31/0x63 > [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86 > [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7 > [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7 > [<ffffffff8136b08c>] ? schedule+0x82c/0x906 > [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b > [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d > [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc > [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e > [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc > [<ffffffff810479bd>] ? kthread+0x79/0x81 > [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10 > [<ffffffff81047944>] ? kthread+0x0/0x81 > > > [<ffffffff81002cb0>] ? 
kernel_thread_helper+0x0/0x10 > > > And other weird thing is that when i make affinity for nics and i bind eth0 to cpu0 and eth1 to cpu2 i think i have too much cpu load: mpstat -P ALL 1 10 Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle Average: all 0.00 0.00 0.00 0.10 1.63 16.71 0.00 0.00 81.56 Average: 0 0.00 0.00 0.00 0.00 5.10 72.80 0.00 0.00 22.10 Average: 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Average: 2 0.00 0.00 0.00 0.00 8.00 61.00 0.00 0.00 31.00 Average: 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Average: 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Average: 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Average: 6 0.00 0.00 0.00 0.70 0.00 0.00 0.00 0.00 99.30 Average: 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 As You see there is only 22% and 31% cpu idle With forwarded traffic like here: bwm-ng v0.6 (probing every 3.000s), press 'h' for help input: /proc/net/dev type: rate - iface Rx Tx Total ============================================================================== lo: 0.00 b/s 0.00 b/s 0.00 b/s eth0: 346.64 Mb/s 487.24 Mb/s 833.88 Mb/s eth1: 487.48 Mb/s 344.14 Mb/s 831.61 Mb/s vlan0811: 273.29 Mb/s 381.71 Mb/s 655.01 Mb/s vlan0508: 64.62 Mb/s 105.54 Mb/s 170.15 Mb/s ------------------------------------------------------------------------------ total: 1.14 Gb/s 1.29 Gb/s 2.43 Gb/s > >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem wit route cache 2010-02-08 14:32 ` Problem wit route cache Paweł Staszewski @ 2010-02-08 14:45 ` Paweł Staszewski 0 siblings, 0 replies; 22+ messages in thread From: Paweł Staszewski @ 2010-02-08 14:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: Linux Network Development list W dniu 2010-02-08 15:32, Paweł Staszewski pisze:
> [ quoted text trimmed -- identical to the 14:32 message above ]
Ok, I forgot to add that on this router there is traffic management:

tc -s -d filter show dev eth1 | grep flowid | wc -l
9096
tc -s -d filter show dev vlan0811 | grep flowid | wc -l
9096

Those are iproute hashing filters. Without the filters on the interfaces I have 50% idle - so the iproute traffic management takes about 30% more CPU.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
end of thread, other threads:[~2010-02-09 7:31 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2010-02-08 13:16 Problem wit route cache Paweł Staszewski
2010-02-08 13:28 ` Eric Dumazet
2010-02-08 13:33 ` Paweł Staszewski
2010-02-08 13:51 ` Eric Dumazet
2010-02-08 13:59 ` Paweł Staszewski
2010-02-08 14:06 ` Eric Dumazet
2010-02-08 14:16 ` Paweł Staszewski
2010-02-08 14:32 ` Eric Dumazet
2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
2010-02-08 23:01 ` David Miller
2010-02-09 6:07 ` Eric Dumazet
2010-02-08 23:26 ` Andrew Morton
2010-02-08 23:34 ` David Miller
2010-02-08 23:37 ` Andrew Morton
2010-02-08 23:50 ` David Miller
2010-02-08 23:50 ` Stephen Hemminger
2010-02-09 6:06 ` Eric Dumazet
2010-02-09 6:35 ` Andrew Morton
2010-02-09 7:20 ` Eric Dumazet
2010-02-09 7:31 ` Andrew Morton
2010-02-08 14:32 ` Problem wit route cache Paweł Staszewski
2010-02-08 14:45 ` Paweł Staszewski