* Problem with route cache
@ 2010-02-08 13:16 Paweł Staszewski
2010-02-08 13:28 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 13:16 UTC (permalink / raw)
To: Linux Network Development list
Hello
For some time I have had a problem with the route cache in Linux.
This is the info I get in dmesg:
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
vlan0811: 9 rebuilds is over limit, route caching disabled
Route hash chain too long!
Adjust your secret_interval!
The problem is that changing net.ipv4.route.secret_interval changes
nothing -- no matter whether I set secret_interval from the default 3600 to 2 or
10000, I always get the same info and the route cache gets disabled.
I also changed the net.ipv4.rt_cache_rebuild_count parameter from the
default 4 to 9 with the same result - I also tried changing it to 12, but
that changed nothing either.
The machine showing this is:
2x Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
12GB of RAM
Network controllers are:
04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet
Controller (Copper) (rev 03)
05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
Controller
And the weird thing is that I need to set the affinity for my NICs to "ff",
because when I bind a network card to one CPU, that CPU is always at 100%.
Router traffic is about 700Mbit/s RX + 700Mbit/s TX on eth0 (with 2
tagged VLANs) and the same traffic on eth1, which is untagged.
This traffic is forwarded by this router.
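The "ff" affinity value is an 8-bit CPU mask, one bit per CPU, so ff selects CPUs 0-7. A quick illustrative sketch of where that value comes from (the IRQ number 30 for eth0 is taken from the /proc/interrupts listing below; writing the mask requires root):

```shell
# Build the smp_affinity mask for the first 8 CPUs: (1 << 8) - 1 = 0xff
ncpus=8
mask=$(printf "%x" $(( (1 << ncpus) - 1 )))
echo "$mask"
# Applying it would look like (root required, IRQ number is machine-specific):
#   echo "$mask" > /proc/irq/30/smp_affinity
```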
Simple topology:
Clients <- ibgp -> [ vlan0811@eth0 + vlan0508@eth0 - BGP process - eth1 ] <- ebgp -> Internet Providers
Some info about fib_trie:
cat /proc/net/fib_triestat
Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes.
Main:
Aver depth: 2.58
Max depth: 6
Leaves: 297506
Prefixes: 312472
Internal nodes: 69968
1: 35673 2: 14840 3: 10980 4: 4729 5: 2315 6: 956 7: 364 8: 109 9: 1 16: 1
Pointers: 570018
Null ptrs: 202545
Total size: 36990 kB
Counters:
---------
gets = 2797832581
backtracks = 149015808
semantic match passed = 2789993308
semantic match miss = 766703
null node hit= 860359377
skipped node resize = 0
Local:
Aver depth: 3.66
Max depth: 4
Leaves: 15
Prefixes: 16
Internal nodes: 11
1: 8 2: 3
Pointers: 28
Null ptrs: 3
Total size: 3 kB
Counters:
---------
gets = 2792656726
backtracks = 2185449412
semantic match passed = 5895311
semantic match miss = 0
null node hit= 818902
skipped node resize = 0
And the interrupts:
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 80 21 32 27 38 39 38 34 IO-APIC-edge timer
1: 0 1 0 0 0 0 1 0 IO-APIC-edge i8042
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
14: 0 0 0 0 0 0 0 0 IO-APIC-edge ide0
15: 0 0 0 0 0 0 0 0 IO-APIC-edge ide1
20: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
21: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb6
22: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5
23: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb4
29: 73903 136 135 135 134 138 136 136 PCI-MSI-edge ahci
30: 101136854 101509497 101627297 100701508 101684077 101870217 101154425 100604465 PCI-MSI-edge eth0
31: 91772747 92037311 92317341 92231065 91248484 91062342 91778136 92328099 PCI-MSI-edge eth1
32: 0 0 0 0 0 1 1 0 PCI-MSI-edge
33: 0 1 0 0 0 0 0 1 PCI-MSI-edge
34: 0 0 1 0 0 1 0 0 PCI-MSI-edge
35: 1 0 0 1 0 0 0 0 PCI-MSI-edge
36: 0 0 0 0 1 0 0 1 PCI-MSI-edge
37: 1 0 0 0 0 0 1 0 PCI-MSI-edge
38: 0 0 0 1 1 0 0 0 PCI-MSI-edge
39: 0 1 1 0 0 0 0 0 PCI-MSI-edge
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 407733660 395964025 431438426 402307801 434522968 420170984 400450633 390318324 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
PND: 0 0 0 0 0 0 0 0 Performance pending work
RES: 14378 15781 19276 5691 14761 13579 16846 15629 Rescheduling interrupts
CAL: 378 383 92 86 364 354 92 89 Function call interrupts
TLB: 551 577 433 272 602 543 329 683 TLB shootdowns
ERR: 0
MIS: 0
Regards
Pawel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 13:16 Problem with route cache Paweł Staszewski
@ 2010-02-08 13:28 ` Eric Dumazet
2010-02-08 13:33 ` Paweł Staszewski
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-08 13:28 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list
On Monday 08 February 2010 at 14:16 +0100, Paweł Staszewski wrote:
> Hello
>
> For some time I have had a problem with the route cache in Linux.
> This is the info I get in dmesg:
>
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> Route hash chain too long!
> Adjust your secret_interval!
> vlan0811: 9 rebuilds is over limit, route caching disabled
> Route hash chain too long!
> Adjust your secret_interval!
>
> The problem is that changing net.ipv4.route.secret_interval changes
> nothing -- no matter whether I set secret_interval from the default 3600 to 2 or
> 10000, I always get the same info and the route cache gets disabled.
> I also changed the net.ipv4.rt_cache_rebuild_count parameter from the
> default 4 to 9 with the same result - I also tried changing it to 12, but
> that changed nothing either.
>
> The machine showing this is:
> 2x Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
> 12GB of RAM
>
Are you running a 64-bit kernel?
What is your kernel version?
Please send:
# grep . /proc/sys/net/ipv4/route/*
# rtstat -c10 -i1
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 13:28 ` Eric Dumazet
@ 2010-02-08 13:33 ` Paweł Staszewski
2010-02-08 13:51 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 13:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list
[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]
On 2010-02-08 14:28, Eric Dumazet wrote:
> On Monday 08 February 2010 at 14:16 +0100, Paweł Staszewski wrote:
>
>> Hello
>>
>> For some time I have had a problem with the route cache in Linux.
>> This is the info I get in dmesg:
>>
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> Route hash chain too long!
>> Adjust your secret_interval!
>> vlan0811: 9 rebuilds is over limit, route caching disabled
>> Route hash chain too long!
>> Adjust your secret_interval!
>>
>> The problem is that changing net.ipv4.route.secret_interval changes
>> nothing -- no matter whether I set secret_interval from the default 3600 to 2 or
>> 10000, I always get the same info and the route cache gets disabled.
>> I also changed the net.ipv4.rt_cache_rebuild_count parameter from the
>> default 4 to 9 with the same result - I also tried changing it to 12, but
>> that changed nothing either.
>>
>> The machine showing this is:
>> 2x Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
>> 12GB of RAM
>>
>>
> Are you running a 64-bit kernel?
> What is your kernel version?
>
> Please send:
>
> # grep . /proc/sys/net/ipv4/route/*
> # rtstat -c10 -i1
>
>
Yes, this is an x86_64 kernel.
I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
kernels the same thing happens.
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
grep: /proc/sys/net/ipv4/route/flush: Permission denied
/proc/sys/net/ipv4/route/gc_elasticity:2
/proc/sys/net/ipv4/route/gc_interval:2
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:65535
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:2
This doesn't happen all the time.
I get this info only during "internet rush hours" - then there
is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
[-- Attachment #2: rtstat.txt --]
[-- Type: text/plain, Size: 2039 bytes --]
rtstat -c10 -i1
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
| | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search|
12082|3440217296|6456413873| 0| 623094| 294| 0| 3| 735116| 5701062| 0|261260739|261040365| 0| 0|654617961| 179044|
10037| 0| 152142| 0| 7| 0| 0| 0| 0| 123| 0| 0| 0| 0| 0| 0| 0|
12032| 0| 155770| 0| 4| 0| 0| 0| 0| 122| 0| 0| 0| 0| 0| 0| 0|
10991| 0| 161040| 0| 7| 0| 0| 0| 0| 129| 0| 0| 0| 0| 0| 0| 0|
9898| 0| 155503| 0| 9| 0| 0| 0| 0| 125| 0| 0| 0| 0| 0| 0| 0|
12553| 0| 157455| 0| 6| 0| 0| 0| 0| 129| 0| 0| 0| 0| 0| 0| 0|
10983| 0| 157742| 0| 9| 0| 0| 0| 0| 128| 0| 0| 0| 0| 0| 0| 0|
9375| 0| 158226| 0| 8| 0| 0| 0| 0| 115| 0| 0| 0| 0| 0| 0| 0|
11929| 0| 159342| 0| 10| 0| 0| 0| 0| 130| 0| 0| 0| 0| 0| 0| 0|
11046| 0| 158015| 0| 8| 0| 0| 0| 0| 126| 0| 0| 0| 0| 0| 0| 0|
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 13:33 ` Paweł Staszewski
@ 2010-02-08 13:51 ` Eric Dumazet
2010-02-08 13:59 ` Paweł Staszewski
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-08 13:51 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list
On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
> >
> Yes, this is an x86_64 kernel.
> I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
> kernels the same thing happens.
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:1250
> /proc/sys/net/ipv4/route/error_cost:250
> grep: /proc/sys/net/ipv4/route/flush: Permission denied
> /proc/sys/net/ipv4/route/gc_elasticity:2
> /proc/sys/net/ipv4/route/gc_interval:2
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
> /proc/sys/net/ipv4/route/gc_thresh:65535
> /proc/sys/net/ipv4/route/gc_timeout:300
> /proc/sys/net/ipv4/route/max_size:524288
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:5
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:5120
> /proc/sys/net/ipv4/route/secret_interval:2
>
> This doesn't happen all the time.
> I get this info only during "internet rush hours" - then there
> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>
I don't understand your settings; they are very, very small for your
setup. You want to flush the cache every 2 seconds...
With 12GB of ram, you could have
/proc/sys/net/ipv4/route/gc_thresh:524288
/proc/sys/net/ipv4/route/max_size:8388608
/proc/sys/net/ipv4/route/secret_interval:3600
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
That would allow about 2 million entries in your route cache, using 768
Mbytes of RAM, and a good cache hit ratio.
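Eric's figures can be sanity-checked: gc_elasticity of 4 on a 524288-bucket hash allows about 2 million entries, and at roughly 384 bytes per cached route (an assumed per-entry size, chosen to be consistent with his 768 MB figure) that is exactly 768 MB:

```shell
hash_buckets=524288      # from 'IP route cache hash table entries' in dmesg
gc_elasticity=4          # target average chain length per bucket
entry_bytes=384          # assumed approximate size of one cache entry

max_entries=$((hash_buckets * gc_elasticity))
total_mb=$((max_entries * entry_bytes / 1024 / 1024))
echo "entries=$max_entries mem=${total_mb}MB"
```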
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 13:51 ` Eric Dumazet
@ 2010-02-08 13:59 ` Paweł Staszewski
2010-02-08 14:06 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 13:59 UTC (permalink / raw)
To: Eric Dumazet, Linux Network Development list
On 2010-02-08 14:51, Eric Dumazet wrote:
> On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>
>
>>>
>>>
>> Yes, this is an x86_64 kernel.
>> I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
>> kernels the same thing happens.
>> grep . /proc/sys/net/ipv4/route/*
>> /proc/sys/net/ipv4/route/error_burst:1250
>> /proc/sys/net/ipv4/route/error_cost:250
>> grep: /proc/sys/net/ipv4/route/flush: Permission denied
>> /proc/sys/net/ipv4/route/gc_elasticity:2
>> /proc/sys/net/ipv4/route/gc_interval:2
>> /proc/sys/net/ipv4/route/gc_min_interval:0
>> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
>> /proc/sys/net/ipv4/route/gc_thresh:65535
>> /proc/sys/net/ipv4/route/gc_timeout:300
>> /proc/sys/net/ipv4/route/max_size:524288
>> /proc/sys/net/ipv4/route/min_adv_mss:256
>> /proc/sys/net/ipv4/route/min_pmtu:552
>> /proc/sys/net/ipv4/route/mtu_expires:600
>> /proc/sys/net/ipv4/route/redirect_load:5
>> /proc/sys/net/ipv4/route/redirect_number:9
>> /proc/sys/net/ipv4/route/redirect_silence:5120
>> /proc/sys/net/ipv4/route/secret_interval:2
>>
>> This doesn't happen all the time.
>> I get this info only during "internet rush hours" - then there
>> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>>
>>
> I don't understand your settings; they are very, very small for your
> setup. You want to flush the cache every 2 seconds...
>
> With 12GB of ram, you could have
>
> /proc/sys/net/ipv4/route/gc_thresh:524288
> /proc/sys/net/ipv4/route/max_size:8388608
> /proc/sys/net/ipv4/route/secret_interval:3600
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
>
> That would allow about 2 million entries in your route cache, using 768
> Mbytes of ram, and a good cache hit ratio.
>
>
>
Yes, as I wrote, I changed these settings after I saw the first
"secret_interval" info - from 3600 to 2 -
to check if this would resolve the problem.
My normal settings are:
/proc/sys/net/ipv4/route/gc_thresh:256000
/proc/sys/net/ipv4/route/max_size:1048576
/proc/sys/net/ipv4/route/secret_interval:3600
/proc/sys/net/ipv4/route/gc_interval:2
/proc/sys/net/ipv4/route/gc_elasticity:2
And with these settings I was getting this info:
Route hash chain too long!
Adjust your secret_interval!
Now I have put in the settings you suggest ... and we will see, but I don't know if it will help,
because I have tried many different settings.
>
>
>
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 13:59 ` Paweł Staszewski
@ 2010-02-08 14:06 ` Eric Dumazet
2010-02-08 14:16 ` Paweł Staszewski
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-08 14:06 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list
On Monday 08 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
> On 2010-02-08 14:51, Eric Dumazet wrote:
> > On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
> >
> >
> >>>
> >>>
> >> Yes, this is an x86_64 kernel.
> >> I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
> >> kernels the same thing happens.
> >> grep . /proc/sys/net/ipv4/route/*
> >> /proc/sys/net/ipv4/route/error_burst:1250
> >> /proc/sys/net/ipv4/route/error_cost:250
> >> grep: /proc/sys/net/ipv4/route/flush: Permission denied
> >> /proc/sys/net/ipv4/route/gc_elasticity:2
> >> /proc/sys/net/ipv4/route/gc_interval:2
> >> /proc/sys/net/ipv4/route/gc_min_interval:0
> >> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
> >> /proc/sys/net/ipv4/route/gc_thresh:65535
> >> /proc/sys/net/ipv4/route/gc_timeout:300
> >> /proc/sys/net/ipv4/route/max_size:524288
> >> /proc/sys/net/ipv4/route/min_adv_mss:256
> >> /proc/sys/net/ipv4/route/min_pmtu:552
> >> /proc/sys/net/ipv4/route/mtu_expires:600
> >> /proc/sys/net/ipv4/route/redirect_load:5
> >> /proc/sys/net/ipv4/route/redirect_number:9
> >> /proc/sys/net/ipv4/route/redirect_silence:5120
> >> /proc/sys/net/ipv4/route/secret_interval:2
> >>
> >> This doesn't happen all the time.
> >> I get this info only during "internet rush hours" - then there
> >> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
> >>
> >>
> > I don't understand your settings; they are very, very small for your
> > setup. You want to flush the cache every 2 seconds...
> >
> > With 12GB of ram, you could have
> >
> > /proc/sys/net/ipv4/route/gc_thresh:524288
> > /proc/sys/net/ipv4/route/max_size:8388608
> > /proc/sys/net/ipv4/route/secret_interval:3600
> > /proc/sys/net/ipv4/route/gc_elasticity:4
> > /proc/sys/net/ipv4/route/gc_interval:1
> >
> > That would allow about 2 million entries in your route cache, using 768
> > Mbytes of ram, and a good cache hit ratio.
> >
> >
> >
> Yes, as I wrote, I changed these settings after I saw the first
> "secret_interval" info - from 3600 to 2 -
> to check if this would resolve the problem.
> My normal settings are:
>
> /proc/sys/net/ipv4/route/gc_thresh:256000
> /proc/sys/net/ipv4/route/max_size:1048576
> /proc/sys/net/ipv4/route/secret_interval:3600
> /proc/sys/net/ipv4/route/gc_interval:2
> /proc/sys/net/ipv4/route/gc_elasticity:2
>
> And with these settings I was getting this info:
> Route hash chain too long!
> Adjust your secret_interval!
>
>
>
> Now I have put in the settings you suggest ... and we will see, but I don't know if it will help,
> because I have tried many different settings.
>
One important point is the size of the hash table; you want something big
for your router.
# dmesg | grep 'IP route'
... IP route cache hash table entries: 524288 (order: 10, 4194304
bytes)
Then, if it is correctly sized, don't change gc_thresh or max_size, as the
defaults are good.
I would only change gc_interval to 1, to perform a smooth gc,
and possibly gc_elasticity to 4, 5 or 6 if I had less RAM than your
machine.
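Applied persistently, the suggestions above would amount to a sysctl fragment along these lines (a sketch assembled from the values discussed in this thread, for this specific 12GB machine; gc_thresh and max_size are deliberately left at their defaults):

```
# /etc/sysctl.conf fragment - values discussed in this thread
net.ipv4.route.secret_interval = 3600
net.ipv4.route.gc_interval = 1
net.ipv4.route.gc_elasticity = 4
# leave net.ipv4.route.gc_thresh and net.ipv4.route.max_size at defaults,
# assuming the hash table is correctly sized (check dmesg | grep 'IP route')
```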
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 14:06 ` Eric Dumazet
@ 2010-02-08 14:16 ` Paweł Staszewski
2010-02-08 14:32 ` Eric Dumazet
2010-02-08 14:32 ` Problem with route cache Paweł Staszewski
0 siblings, 2 replies; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 14:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list
On 2010-02-08 15:06, Eric Dumazet wrote:
> On Monday 08 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
>
>> On 2010-02-08 14:51, Eric Dumazet wrote:
>>
>>> On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>>>
>>>
>>>
>>>>>
>>>>>
>>>> Yes, this is an x86_64 kernel.
>>>> I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
>>>> kernels the same thing happens.
>>>> grep . /proc/sys/net/ipv4/route/*
>>>> /proc/sys/net/ipv4/route/error_burst:1250
>>>> /proc/sys/net/ipv4/route/error_cost:250
>>>> grep: /proc/sys/net/ipv4/route/flush: Permission denied
>>>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>>> /proc/sys/net/ipv4/route/gc_interval:2
>>>> /proc/sys/net/ipv4/route/gc_min_interval:0
>>>> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
>>>> /proc/sys/net/ipv4/route/gc_thresh:65535
>>>> /proc/sys/net/ipv4/route/gc_timeout:300
>>>> /proc/sys/net/ipv4/route/max_size:524288
>>>> /proc/sys/net/ipv4/route/min_adv_mss:256
>>>> /proc/sys/net/ipv4/route/min_pmtu:552
>>>> /proc/sys/net/ipv4/route/mtu_expires:600
>>>> /proc/sys/net/ipv4/route/redirect_load:5
>>>> /proc/sys/net/ipv4/route/redirect_number:9
>>>> /proc/sys/net/ipv4/route/redirect_silence:5120
>>>> /proc/sys/net/ipv4/route/secret_interval:2
>>>>
>>>> This doesn't happen all the time.
>>>> I get this info only during "internet rush hours" - then there
>>>> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>>>>
>>>>
>>>>
>>> I don't understand your settings; they are very, very small for your
>>> setup. You want to flush the cache every 2 seconds...
>>>
>>> With 12GB of ram, you could have
>>>
>>> /proc/sys/net/ipv4/route/gc_thresh:524288
>>> /proc/sys/net/ipv4/route/max_size:8388608
>>> /proc/sys/net/ipv4/route/secret_interval:3600
>>> /proc/sys/net/ipv4/route/gc_elasticity:4
>>> /proc/sys/net/ipv4/route/gc_interval:1
>>>
>>> That would allow about 2 million entries in your route cache, using 768
>>> Mbytes of ram, and a good cache hit ratio.
>>>
>>>
>>>
>>>
>> Yes, as I wrote, I changed these settings after I saw the first
>> "secret_interval" info - from 3600 to 2 -
>> to check if this would resolve the problem.
>> My normal settings are:
>>
>> /proc/sys/net/ipv4/route/gc_thresh:256000
>> /proc/sys/net/ipv4/route/max_size:1048576
>> /proc/sys/net/ipv4/route/secret_interval:3600
>> /proc/sys/net/ipv4/route/gc_interval:2
>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>
>> And with these settings I was getting this info:
>> Route hash chain too long!
>> Adjust your secret_interval!
>>
>>
>>
>> Now I have put in the settings you suggest ... and we will see, but I don't know if it will help,
>> because I have tried many different settings.
>>
>>
> One important point is the size of the hash table; you want something big
> for your router.
>
> # dmesg | grep 'IP route'
> ... IP route cache hash table entries: 524288 (order: 10, 4194304
> bytes)
>
>
On my machine it is the same:
dmesg | grep 'IP route'
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
> Then, if it is correctly sized, don't change gc_thresh or max_size, as the
> defaults are good.
>
> I would only change gc_interval to 1, to perform a smooth gc,
>
> and possibly gc_elasticity to 4, 5 or 6 if I had less RAM than your
> machine.
>
>
A few days ago, after the route cache info, I also got this:
Feb 4 13:12:40 TM_01_C1 ------------[ cut here ]------------
Feb 4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261
dev_watchdog+0x130/0x1d6()
Feb 4 13:12:40 TM_01_C1 Hardware name: X7DCT
Feb 4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit queue
0 timed out
Feb 4 13:12:40 TM_01_C1 Modules linked in: oprofile
Feb 4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1
Feb 4 13:12:40 TM_01_C1 Call Trace:
Feb 4 13:12:40 TM_01_C1 <IRQ> [<ffffffff812fcaf7>] ?
dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff81038811>] ?
warn_slowpath_common+0x77/0xa3
Feb 4 13:12:40 TM_01_C1 [<ffffffff81038899>] ? warn_slowpath_fmt+0x51/0x59
Feb 4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e
Feb 4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ? try_to_wake_up+0x1eb/0x1f8
Feb 4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ? netdev_drivername+0x3b/0x40
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44
Feb 4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6
Feb 4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ?
run_timer_softirq+0x1ff/0x29d
Feb 4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7
Feb 4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66
Feb 4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ?
smp_apic_timer_interrupt+0x87/0x95
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ?
apic_timer_interrupt+0x13/0x20
Feb 4 13:12:40 TM_01_C1 <EOI> [<ffffffff810111f5>] ? mwait_idle+0x9b/0xa0
Feb 4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? cpu_idle+0x49/0x7c
Feb 4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]---
And after changing the kernel to 2.6.33-rc6, different info:
BUG: soft lockup - CPU#1 stuck for 61s!
[events/1:28]
Modules linked in:
CPU 1
Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT
RIP: 0010:[<ffffffff810a3d89>] [<ffffffff810a3d89>]
kmem_cache_free+0x11b/0x11c
RSP: 0018:ffff880028243e50 EFLAGS: 00000292
RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0
RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680
RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c
R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0
R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7
FS: 0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task ffff88031f9a4f80)
Stack:
ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023
<0> 01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040
<0> 0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740
Call Trace:
<IRQ>
[<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
[<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
[<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
[<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
[<ffffffff81035311>] ? __do_softirq+0xd7/0x196
[<ffffffff81002dac>] ? call_softirq+0x1c/0x28
[<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
[<ffffffff81002dac>] ? call_softirq+0x1c/0x28
<EOI>
[<ffffffff81004599>] ? do_softirq+0x31/0x63
[<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
[<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
[<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
[<ffffffff8136b08c>] ? schedule+0x82c/0x906
[<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
[<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
[<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
[<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
[<ffffffff810479bd>] ? kthread+0x79/0x81
[<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81047944>] ? kthread+0x0/0x81
[<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48
83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f<c3> 55 48 89 f5 53 48
89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
Call Trace:
<IRQ> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
[<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
[<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
[<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
[<ffffffff81035311>] ? __do_softirq+0xd7/0x196
[<ffffffff81002dac>] ? call_softirq+0x1c/0x28
[<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
[<ffffffff81002dac>] ? call_softirq+0x1c/0x28
<EOI> [<ffffffff81004599>] ? do_softirq+0x31/0x63
[<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
[<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
[<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
[<ffffffff8136b08c>] ? schedule+0x82c/0x906
[<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
[<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
[<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
[<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
[<ffffffff810479bd>] ? kthread+0x79/0x81
[<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81047944>] ? kthread+0x0/0x81
[<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 14:16 ` Paweł Staszewski
@ 2010-02-08 14:32 ` Eric Dumazet
2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
2010-02-08 14:32 ` Problem with route cache Paweł Staszewski
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-08 14:32 UTC (permalink / raw)
To: Paweł Staszewski; +Cc: Linux Network Development list
On Monday 08 February 2010 at 15:16 +0100, Paweł Staszewski wrote:
> >
> A few days ago, after the route cache info, I also got this:
> Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48
> 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f<c3> 55 48 89 f5 53 48
> 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
> Call Trace:
> <IRQ> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
> [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
> [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
> [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
> [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> <EOI> [<ffffffff81004599>] ? do_softirq+0x31/0x63
> [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
> [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
> [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
> [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
> [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
> [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
> [<ffffffff810479bd>] ? kthread+0x79/0x81
> [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81047944>] ? kthread+0x0/0x81
>
>
> [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
>
>
This trace is indeed very interesting, since dst_gc_task() is run from a
work queue, and there is no scheduling point in it.
We might need to add a scheduling point in dst_gc_task() in case a huge
number of entries was flushed.
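The thread index shows Eric followed up with "[PATCH] dst: call cond_resched() in dst_gc_task()"; the shape of such a change would be roughly the following (an illustrative sketch, not the actual patch - the real function in net/core/dst.c is more involved, and the loop body here is elided):

```c
/* Sketch: dst_gc_task() runs from a workqueue and walks the list of
 * dst entries queued for release.  With a huge backlog (e.g. after a
 * full route cache flush) the walk can hog the CPU for many seconds,
 * producing soft-lockup reports like the one above.  A cond_resched()
 * per iteration gives the scheduler a chance to run other work.
 */
static void dst_gc_task(struct work_struct *work)
{
	struct dst_entry *dst, *next;

	for (dst = dst_busy_list; dst; dst = next) {
		next = dst->next;
		/* ... check refcount, free or requeue the entry ... */
		cond_resched();		/* added scheduling point */
	}
}
```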
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem with route cache
2010-02-08 14:16 ` Paweł Staszewski
2010-02-08 14:32 ` Eric Dumazet
@ 2010-02-08 14:32 ` Paweł Staszewski
2010-02-08 14:45 ` Paweł Staszewski
1 sibling, 1 reply; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 14:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list
On 2010-02-08 15:16, Paweł Staszewski wrote:
> On 2010-02-08 15:06, Eric Dumazet wrote:
>> On Monday 08 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
>>> On 2010-02-08 14:51, Eric Dumazet wrote:
>>>> On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>>>>
>>>>
>>>>>>
>>>>> Yes, this is an x86_64 kernel.
>>>>> I have tried kernels 2.6.32.2 / 2.6.32.7 and now 2.6.33-rc6-git5, and on all
>>>>> kernels the same thing happens.
>>>>> grep . /proc/sys/net/ipv4/route/*
>>>>> /proc/sys/net/ipv4/route/error_burst:1250
>>>>> /proc/sys/net/ipv4/route/error_cost:250
>>>>> grep: /proc/sys/net/ipv4/route/flush: Permission denied
>>>>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>>>> /proc/sys/net/ipv4/route/gc_interval:2
>>>>> /proc/sys/net/ipv4/route/gc_min_interval:0
>>>>> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
>>>>> /proc/sys/net/ipv4/route/gc_thresh:65535
>>>>> /proc/sys/net/ipv4/route/gc_timeout:300
>>>>> /proc/sys/net/ipv4/route/max_size:524288
>>>>> /proc/sys/net/ipv4/route/min_adv_mss:256
>>>>> /proc/sys/net/ipv4/route/min_pmtu:552
>>>>> /proc/sys/net/ipv4/route/mtu_expires:600
>>>>> /proc/sys/net/ipv4/route/redirect_load:5
>>>>> /proc/sys/net/ipv4/route/redirect_number:9
>>>>> /proc/sys/net/ipv4/route/redirect_silence:5120
>>>>> /proc/sys/net/ipv4/route/secret_interval:2
>>>>>
>>>>> This doesn't happen all the time.
>>>>> I get this info only during "internet rush hours" - then there
>>>>> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>>>>>
>>>>>
>>>> I don't understand your settings; they are very, very small for your
>>>> setup. You want to flush the cache every 2 seconds...
>>>>
>>>> With 12GB of ram, you could have
>>>>
>>>> /proc/sys/net/ipv4/route/gc_thresh:524288
>>>> /proc/sys/net/ipv4/route/max_size:8388608
>>>> /proc/sys/net/ipv4/route/secret_interval:3600
>>>> /proc/sys/net/ipv4/route/gc_elasticity:4
>>>> /proc/sys/net/ipv4/route/gc_interval:1
>>>>
>>>> That would allow about 2 million entries in your route cache, using
>>>> 768 Mbytes of RAM, and give a good cache hit ratio.
>>>>
>>>>
>>>>
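A quick back-of-the-envelope check of the numbers above (the 384-byte entry size is an assumed approximation of sizeof(struct rtable) on x86_64, not a figure from this thread):

```python
# Sketch of the arithmetic behind the suggested tunables: gc starts
# pressing once chains average gc_elasticity entries per hash bucket.
hash_entries = 524288        # IP route cache hash table size (from dmesg)
gc_elasticity = 4            # suggested average chain length per bucket
entries = hash_entries * gc_elasticity     # cached routes before gc presses
rtable_bytes = 384           # assumed approx. size of one cache entry
mbytes = entries * rtable_bytes // (1 << 20)
print(entries, "entries,", mbytes, "MB")   # → 2097152 entries, 768 MB
```

which matches the "about 2 million entries ... 768 Mbytes" estimate.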
>>> Yes, as I wrote, I changed these settings after I saw the first
>>> "secret_interval" message - from 3600 down to 2 - to check whether
>>> that would resolve the problem.
>>> Also my normal settings are:
>>>
>>> /proc/sys/net/ipv4/route/gc_thresh:256000
>>> /proc/sys/net/ipv4/route/max_size:1048576
>>> /proc/sys/net/ipv4/route/secret_interval:3600
>>> /proc/sys/net/ipv4/route/gc_interval:2
>>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>>
>>> And with these settings I was still getting:
>>> Route hash chain too long!
>>> Adjust your secret_interval!
>>>
>>>
>>>
>>> Now I have applied the settings you suggest... we will see, but I
>>> don't know if it will help, because I have already tried many
>>> different settings.
>>>
>> One important point is the size of the hash table; you want something
>> big for your router.
>>
>> # dmesg | grep 'IP route'
>> ... IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
>>
>
> On my machine it is also the same:
> dmesg | grep 'IP route'
> IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
>
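The numbers in that dmesg line are mutually consistent; a sketch of the decomposition, assuming 8-byte bucket pointers and 4 KiB pages (x86_64):

```python
# Decomposing "IP route cache hash table entries: 524288 (order: 10,
# 4194304 bytes)": one hlist head pointer per bucket, allocated as a
# power-of-two number of pages.
entries = 524288
table_bytes = entries * 8          # 8-byte pointer per bucket
pages = table_bytes // 4096        # 1024 pages of 4 KiB
order = pages.bit_length() - 1     # 2**10 pages -> allocation order 10
print(table_bytes, order)          # → 4194304 10
```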
>
>> Then if it is correctly sized, don't change gc_thresh or max_size, as
>> the defaults are good.
>>
>> I would only change gc_interval to 1, to perform a smoother gc.
>>
>> And possibly gc_elasticity to 4, 5 or 6, if I had less RAM than your
>> machine.
>>
> A few days ago, after the route cache messages, I also got this:
> Feb 4 13:12:40 TM_01_C1 ------------[ cut here ]------------
> Feb 4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261
> dev_watchdog+0x130/0x1d6()
> Feb 4 13:12:40 TM_01_C1 Hardware name: X7DCT
> Feb 4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit
> queue 0 timed out
> Feb 4 13:12:40 TM_01_C1 Modules linked in: oprofile
> Feb 4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1
> Feb 4 13:12:40 TM_01_C1 Call Trace:
> Feb 4 13:12:40 TM_01_C1 <IRQ> [<ffffffff812fcaf7>] ?
> dev_watchdog+0x130/0x1d6
> Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> Feb 4 13:12:40 TM_01_C1 [<ffffffff81038811>] ?
> warn_slowpath_common+0x77/0xa3
> Feb 4 13:12:40 TM_01_C1 [<ffffffff81038899>] ?
> warn_slowpath_fmt+0x51/0x59
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e
> Feb 4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ?
> try_to_wake_up+0x1eb/0x1f8
> Feb 4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ?
> netdev_drivername+0x3b/0x40
> Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44
> Feb 4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6
> Feb 4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ?
> run_timer_softirq+0x1ff/0x29d
> Feb 4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ?
> smp_apic_timer_interrupt+0x87/0x95
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ?
> apic_timer_interrupt+0x13/0x20
> Feb 4 13:12:40 TM_01_C1 <EOI> [<ffffffff810111f5>] ?
> mwait_idle+0x9b/0xa0
> Feb 4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? cpu_idle+0x49/0x7c
> Feb 4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]---
>
> And after changing the kernel to 2.6.33-rc6, a different message:
>
> BUG: soft lockup - CPU#1 stuck for 61s!
> [events/1:28]
> Modules linked in:
> CPU 1
> Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT
> RIP: 0010:[<ffffffff810a3d89>] [<ffffffff810a3d89>]
> kmem_cache_free+0x11b/0x11c
> RSP: 0018:ffff880028243e50 EFLAGS: 00000292
> RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0
> RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680
> RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c
> R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0
> R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7
> FS: 0000000000000000(0000) GS:ffff880028240000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task
> ffff88031f9a4f80)
> Stack:
> ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023
> <0> 01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040
> <0> 0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740
> Call Trace:
> <IRQ>
> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
> [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
> [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
> [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
> [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> <EOI>
> [<ffffffff81004599>] ? do_softirq+0x31/0x63
> [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
> [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
> [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
> [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
> [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
> [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
> [<ffffffff810479bd>] ? kthread+0x79/0x81
> [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81047944>] ? kthread+0x0/0x81
> [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
> Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48
> 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f <c3> 55 48 89 f5 53 48
> 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
> Call Trace:
> <IRQ> [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
> [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
> [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
> [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
> [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> <EOI> [<ffffffff81004599>] ? do_softirq+0x31/0x63
> [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
> [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
> [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
> [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
> [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
> [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
> [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
> [<ffffffff810479bd>] ? kthread+0x79/0x81
> [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81047944>] ? kthread+0x0/0x81
>
>
> [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
>
>
>
Another weird thing is that when I set affinity for the NICs and bind
eth0 to cpu0 and eth1 to cpu2, I think the cpu load is too high:
mpstat -P ALL 1 10
Average:  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest   %idle
Average:  all   0.00   0.00   0.00    0.10   1.63  16.71    0.00    0.00   81.56
Average:    0   0.00   0.00   0.00    0.00   5.10  72.80    0.00    0.00   22.10
Average:    1   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00  100.00
Average:    2   0.00   0.00   0.00    0.00   8.00  61.00    0.00    0.00   31.00
Average:    3   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00  100.00
Average:    4   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00  100.00
Average:    5   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00  100.00
Average:    6   0.00   0.00   0.00    0.70   0.00   0.00    0.00    0.00   99.30
Average:    7   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00  100.00
As you can see, those two CPUs are only 22% and 31% idle,
with forwarded traffic like this:
bwm-ng v0.6 (probing every 3.000s), press 'h' for help
input: /proc/net/dev type: rate
-        iface            Rx             Tx          Total
==============================================================================
            lo:      0.00  b/s      0.00  b/s      0.00  b/s
          eth0:    346.64 Mb/s    487.24 Mb/s    833.88 Mb/s
          eth1:    487.48 Mb/s    344.14 Mb/s    831.61 Mb/s
      vlan0811:    273.29 Mb/s    381.71 Mb/s    655.01 Mb/s
      vlan0508:     64.62 Mb/s    105.54 Mb/s    170.15 Mb/s
------------------------------------------------------------------------------
         total:      1.14 Gb/s      1.29 Gb/s      2.43 Gb/s
>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Problem wit route cache
2010-02-08 14:32 ` Problem wit route cache Paweł Staszewski
@ 2010-02-08 14:45 ` Paweł Staszewski
0 siblings, 0 replies; 22+ messages in thread
From: Paweł Staszewski @ 2010-02-08 14:45 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Network Development list
On 2010-02-08 15:32, Paweł Staszewski wrote:
> On 2010-02-08 15:16, Paweł Staszewski wrote:
>> On 2010-02-08 15:06, Eric Dumazet wrote:
>>> On Monday, 8 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
>>>> On 2010-02-08 14:51, Eric Dumazet wrote:
>>>>> On Monday, 8 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>>>>>> [snip: settings discussion, kernel traces, and traffic statistics,
>>>>>> quoted in full in the previous message]
>
OK, I forgot to add that there is traffic management on this router:
tc -s -d filter show dev eth1 | grep flowid | wc -l
9096
tc -s -d filter show dev vlan0811 | grep flowid | wc -l
9096
Those are iproute2 (tc) hashing filters.
Without the filters on the interfaces I have 50% idle, so the traffic
management takes about 30% more CPU.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 14:32 ` Eric Dumazet
@ 2010-02-08 19:32 ` Eric Dumazet
2010-02-08 23:01 ` David Miller
2010-02-08 23:26 ` Andrew Morton
0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2010-02-08 19:32 UTC (permalink / raw)
To: Paweł Staszewski, David Miller; +Cc: Linux Network Development list
On Monday, 8 February 2010 at 15:32 +0100, Eric Dumazet wrote:
> On Monday, 8 February 2010 at 15:16 +0100, Paweł Staszewski wrote:
>
> > >
> > A few days ago, after the route cache messages, I also got this:
>
> > [soft-lockup code bytes and call trace snipped; quoted in full
> > earlier in the thread]
> >
>
> This trace is indeed very interesting, since dst_gc_task() is run from a
> work queue, and there is no scheduling point in it.
>
> We might need to add a scheduling point in dst_gc_task() in case a
> huge number of entries has been flushed.
>
David, here is the patch I sent to Pawel to solve this problem.
This is probably a stable candidate.
Thanks
[PATCH] dst: call cond_resched() in dst_gc_task()
On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.
Fix is to call cond_resched(), as we run in process context.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dst.c b/net/core/dst.c
index 57bc4d5..cb1b348 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <net/net_namespace.h>
+#include <linux/sched.h>
 
 #include <net/dst.h>
 
@@ -79,6 +80,7 @@ loop:
 	while ((dst = next) != NULL) {
 		next = dst->next;
 		prefetch(&next->next);
+		cond_resched();
 		if (likely(atomic_read(&dst->__refcnt))) {
 			last->next = dst;
 			last = dst;
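As a rough userspace model (not kernel code) of why the added scheduling point matters: dst_gc_task() walks the whole pending list in process context, keeping busy entries and freeing the rest, and without a yield inside the loop a very long list can monopolize the CPU long enough to trip the soft-lockup watchdog. The list size and yield interval below are illustrative assumptions:

```python
# Userspace sketch of the dst_gc_task() loop pattern: time.sleep(0)
# plays the role of cond_resched(), giving the scheduler a chance to run
# other work while a huge list is processed.
import time

def gc_walk(refcnts, yield_every=4096):
    """Keep still-referenced entries, 'free' the rest, yielding periodically."""
    kept = freed = 0
    for i, refcnt in enumerate(refcnts):
        if i % yield_every == 0:
            time.sleep(0)        # scheduling point (cond_resched() analogue)
        if refcnt:
            kept += 1            # busy entry: stays on the gc list
        else:
            freed += 1           # refcount reached zero: release it
    return kept, freed

# e.g. 100000 pending entries, every third one still referenced
kept, freed = gc_walk([i % 3 == 0 for i in range(100000)])
print(kept, freed)               # → 33334 66666
```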
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
@ 2010-02-08 23:01 ` David Miller
2010-02-09 6:07 ` Eric Dumazet
2010-02-08 23:26 ` Andrew Morton
1 sibling, 1 reply; 22+ messages in thread
From: David Miller @ 2010-02-08 23:01 UTC (permalink / raw)
To: eric.dumazet; +Cc: pstaszewski, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 08 Feb 2010 20:32:40 +0100
> [PATCH] dst: call cond_resched() in dst_gc_task()
>
> On some workloads, it is quite possible to get a huge dst list to
> process in dst_gc_task(), and trigger soft lockup detection.
>
> Fix is to call cond_resched(), as we run in process context.
>
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied and queued up to -stable.
When fixing bugs with kernel bugzilla entries, please
mention them in the commit message. I fixed this up for
you but please take care of it next time.
Thanks!
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
2010-02-08 23:01 ` David Miller
@ 2010-02-08 23:26 ` Andrew Morton
2010-02-08 23:34 ` David Miller
1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2010-02-08 23:26 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paweł Staszewski, David Miller,
Linux Network Development list
On Mon, 08 Feb 2010 20:32:40 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [PATCH] dst: call cond_resched() in dst_gc_task()
>
> On some workloads, it is quite possible to get a huge dst list to
> process in dst_gc_task(), and trigger soft lockup detection.
>
> Fix is to call cond_resched(), as we run in process context.
>
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 57bc4d5..cb1b348 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -17,6 +17,7 @@
> #include <linux/string.h>
> #include <linux/types.h>
> #include <net/net_namespace.h>
> +#include <linux/sched.h>
>
> #include <net/dst.h>
>
> @@ -79,6 +80,7 @@ loop:
> while ((dst = next) != NULL) {
> next = dst->next;
> prefetch(&next->next);
> + cond_resched();
> if (likely(atomic_read(&dst->__refcnt))) {
> last->next = dst;
> last = dst;
Gad. Am I understanding this right? The softlockup threshold is sixty
seconds!
I assume that this function spends most of its time walking over busy
entries? Is a more powerful data structure needed?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:26 ` Andrew Morton
@ 2010-02-08 23:34 ` David Miller
2010-02-08 23:37 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2010-02-08 23:34 UTC (permalink / raw)
To: akpm; +Cc: eric.dumazet, pstaszewski, netdev
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 8 Feb 2010 15:26:06 -0800
> I assume that this function spends most of its time walking over busy
> entries? Is a more powerful data structure needed?
When you're getting pounded with millions of packets per second,
all mostly to different destinations (and thus resolving to
different routing cache entries), this is what happens.
For a busy router, really, this is normal behavior.
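A sketch of why traffic like this inflates the cache: the 2.6.x route cache keys entries roughly per (source, destination) flow, so a router forwarding packets between many distinct address pairs creates on the order of one cache entry per pair. The client and destination counts below are made-up illustration values:

```python
# Rough model (assumption): one route cache entry per distinct
# (src, dst) pair seen, so mostly-unique traffic means the entry count
# grows almost as fast as the packet count.
import random
random.seed(1)

pairs = set()
for _ in range(100000):            # 100k forwarded packets
    src = random.randrange(5000)   # 5000 active clients (assumed)
    dst = random.randrange(50000)  # many distinct destinations (assumed)
    pairs.add((src, dst))

entries = len(pairs)               # distinct cache entries created
print(entries)                     # close to 100000: nearly every packet is a new pair
```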
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:34 ` David Miller
@ 2010-02-08 23:37 ` Andrew Morton
2010-02-08 23:50 ` David Miller
2010-02-08 23:50 ` Stephen Hemminger
0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2010-02-08 23:37 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, pstaszewski, netdev
On Mon, 08 Feb 2010 15:34:06 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 8 Feb 2010 15:26:06 -0800
>
> > I assume that this function spends most of its time walking over busy
> > entries? Is a more powerful data structure needed?
>
> When you're getting pounded with millions of packets per second,
> all mostly to different destinations (and thus resolving to
> different routing cache entries), this is what happens.
>
> For a busy router, really, this is normal behavior.
Is the cache a net win in that scenario?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:37 ` Andrew Morton
@ 2010-02-08 23:50 ` David Miller
2010-02-08 23:50 ` Stephen Hemminger
1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2010-02-08 23:50 UTC (permalink / raw)
To: akpm; +Cc: eric.dumazet, pstaszewski, netdev
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 8 Feb 2010 15:37:44 -0800
> On Mon, 08 Feb 2010 15:34:06 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> For a busy router, really, this is normal behavior.
>
> Is the cache a net win in that scenario?
Absolutely.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:37 ` Andrew Morton
2010-02-08 23:50 ` David Miller
@ 2010-02-08 23:50 ` Stephen Hemminger
2010-02-09 6:06 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2010-02-08 23:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Miller, eric.dumazet, pstaszewski, netdev
On Mon, 8 Feb 2010 15:37:44 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 08 Feb 2010 15:34:06 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Mon, 8 Feb 2010 15:26:06 -0800
> >
> > > I assume that this function spends most of its time walking over busy
> > > entries? Is a more powerful data structure needed?
> >
> > When you're getting pounded with millions of packets per second,
> > all mostly to different destinations (and thus resolving to
> > different routing cache entries), this is what happens.
> >
> > For a busy router, really, this is normal behavior.
>
> Is the cache a net win in that scenario?
No, the cache doesn't help.
Robert, who is the expert in this area, runs with FIB_TRIE and
no routing cache.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:50 ` Stephen Hemminger
@ 2010-02-09 6:06 ` Eric Dumazet
2010-02-09 6:35 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-09 6:06 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Andrew Morton, David Miller, pstaszewski, netdev
On Monday, 8 February 2010 at 15:50 -0800, Stephen Hemminger wrote:
> No, cache doesn't help.
>
> Robert who is the expert in this area, runs with FIB TRIE and
> no routing cache.
Who knows; it probably depends on many factors. I always run with the
cache enabled, because it saves cycles under moderate load.
FIB_TRIE is unrelated here; if the routing table is very small, it fits
either HASH or TRIE.
Pawel hit the bug with tunables that basically enabled the cache but in
a non helpful way (filling the list of busy dst). User error combined
with a lazy kernel function :)
Please note that conversion from softirq to workqueue, without
scheduling point, might/probably use same cpu for handling network irqs
and running dst_gc_task() :
On big routers, admins usually use irq affinities, so we can have very
litle cpu time available to run other tasks on those cpus.
After this patch, I believe that scheduler is allowed to migrate
dst_gc_task() to an idle cpu.
Another point (for 2.6.34) to address is the dst_gc_mutex that can delay
NETDEV_UNREGISTER/NETDEV_DOWN events for a long period.
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-08 23:01 ` David Miller
@ 2010-02-09 6:07 ` Eric Dumazet
0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2010-02-09 6:07 UTC (permalink / raw)
To: David Miller; +Cc: pstaszewski, netdev
On Monday, 8 February 2010 at 15:01 -0800, David Miller wrote:
>
> When fixing bugs with kernel bugzilla entries, please
> mention them in the commit message. I fixed this up for
> you but please take care of it next time.
>
> Thanks!
Sorry Dave, I was not aware of the bugzilla entry.
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-09 6:06 ` Eric Dumazet
@ 2010-02-09 6:35 ` Andrew Morton
2010-02-09 7:20 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2010-02-09 6:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev
On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> After this patch, I believe that scheduler is allowed to migrate
> dst_gc_task() to an idle cpu.
No, keventd threads are each pinned to a single CPU (kthread_bind() in
start_workqueue_thread()), so dst_gc_task() gets run on the CPU which
ran schedule_delayed_work() and no other.
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-09 6:35 ` Andrew Morton
@ 2010-02-09 7:20 ` Eric Dumazet
2010-02-09 7:31 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-02-09 7:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev
On Monday, 8 February 2010 at 22:35 -0800, Andrew Morton wrote:
> On Tue, 09 Feb 2010 07:06:38 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > After this patch, I believe that scheduler is allowed to migrate
> > dst_gc_task() to an idle cpu.
>
> No, keventd threads are each pinned to a single CPU (kthread_bind() in
> start_workqueue_thread()), so dst_gc_task() gets run on the CPU which
> ran schedule_delayed_work() and no other.
>
Ah OK, thanks Andrew for this clarification.
I suppose offlining a cpu migrates its works to another (online) cpu ?
* Re: [PATCH] dst: call cond_resched() in dst_gc_task()
2010-02-09 7:20 ` Eric Dumazet
@ 2010-02-09 7:31 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2010-02-09 7:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, pstaszewski, netdev
On Tue, 09 Feb 2010 08:20:36 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I suppose offlining a cpu migrates its works to another (online) cpu ?
Sort of, effectively. The workqueue code runs all the pending work items on
the to-be-offlined CPU and then it's done.
schedule_delayed_work() starts out with a timer, and the timer code
_does_ perform migration off the going-away CPU.
Thread overview: 22+ messages
2010-02-08 13:16 Problem wit route cache Paweł Staszewski
2010-02-08 13:28 ` Eric Dumazet
2010-02-08 13:33 ` Paweł Staszewski
2010-02-08 13:51 ` Eric Dumazet
2010-02-08 13:59 ` Paweł Staszewski
2010-02-08 14:06 ` Eric Dumazet
2010-02-08 14:16 ` Paweł Staszewski
2010-02-08 14:32 ` Eric Dumazet
2010-02-08 19:32 ` [PATCH] dst: call cond_resched() in dst_gc_task() Eric Dumazet
2010-02-08 23:01 ` David Miller
2010-02-09 6:07 ` Eric Dumazet
2010-02-08 23:26 ` Andrew Morton
2010-02-08 23:34 ` David Miller
2010-02-08 23:37 ` Andrew Morton
2010-02-08 23:50 ` David Miller
2010-02-08 23:50 ` Stephen Hemminger
2010-02-09 6:06 ` Eric Dumazet
2010-02-09 6:35 ` Andrew Morton
2010-02-09 7:20 ` Eric Dumazet
2010-02-09 7:31 ` Andrew Morton
2010-02-08 14:32 ` Problem wit route cache Paweł Staszewski
2010-02-08 14:45 ` Paweł Staszewski