From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Linux Network Development list <netdev@vger.kernel.org>
Subject: Re: weird problem
Date: Fri, 26 Jun 2009 19:57:39 +0200
Message-ID: <4A450C13.9050907@itcare.pl>
In-Reply-To: <4A450955.1010806@itcare.pl>
Paweł Staszewski wrote:
> Eric Dumazet wrote:
>> Jarek Poplawski wrote:
>>
>>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
>>>
>>>> On 25-06-2009 22:18, Eric Dumazet wrote:
>>>>
>>>>> Paweł Staszewski wrote:
>>>>>
>>>>>> Ok
>>>>>>
>>>>>> After a day of observation I am nearly 100% sure that this CPU
>>>>>> load is caused by route cache flushes.
>>>>>> When the route cache grows to its "net.ipv4.route.gc_thresh" size,
>>>>>> or close to it, the system starts dropping routes from the cache
>>>>>> and CPU load rises from 2% to nearly 80%.
>>>>>> Once the cache has been cleaned/flushed and is filling again, CPU
>>>>>> load returns to the normal 2%.
>>>>>>
>>>>>> Does anyone know how to resolve this?
>>>>>> I don't see this on kernels < 2.6.29; it all started after the
>>>>>> upgrade from 2.6.28 to 2.6.29. I then tried 2.6.29.1, 2.6.29.3 and
>>>>>> 2.6.30, and on all of these kernels >= 2.6.29 the CPU load problem
>>>>>> is the same.
>>>>>>
>>>>>> I can reduce these CPU fluctuations by changing the route cache
>>>>>> /proc parameters, but the best result for my router was:
>>>>>>
>>>>>> 15 sec of 2% cpu
>>>>>> and after
>>>>>> 15 sec of 80% cpu
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Pawel Staszewski
>>>>>>
>>>>> I believe this is a known 2.6.29 regression.
>>>>>
>>>>> The following two commits should correct the problem you have
>>>>>
>>>>> Your best bet would be to try 2.6.31-rc1 and tell us whether this
>>>>> recent kernel is OK on your machine.
>>>>>
>>>> Btw., the first of these commits is in 2.6.30, which according to
>>>>
>>> And the second as well.
>>>
>>>
>>
>> Thanks Jarek.
>>
>> Pawel reported some errors in the fib thread, so I am not sure he really
>> tried 2.6.30 and got the same oprofile results.
>>
>> rt_worker_func() taking 13% of cpu0 is an alarm for me :)
>> And 21% of cpu0 and 34% of cpu6 taken by oprofiled seems odd too...
>>
oprofile from:
Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64
Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
CPU: Core 2, speed 3000.21 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
Samples on CPU 0
Samples on CPU 1
Samples on CPU 2
Samples on CPU 3
Samples on CPU 4
Samples on CPU 5
Samples on CPU 6
Samples on CPU 7
           CPU 0             CPU 1             CPU 2             CPU 3             CPU 4             CPU 5             CPU 6             CPU 7
samples        %  samples        %  samples        %  samples        %  samples        %  samples        %  samples        %  samples        %  image name   app name     symbol name
   9999  19.0926        0        0        0        0        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      rt_worker_func
   9017  17.2175      219   0.0277       13   0.0017       29   0.0814       26   0.3410       12   0.1420       13   0.1718      115   1.7649  vmlinux      vmlinux      free_block
   8867  16.9311    13588   1.7177    11888   1.5423     9000  25.2731      801  10.5063      461   5.4550      788  10.4136      476   7.3051  vmlinux      vmlinux      mwait_idle
   5714  10.9106      197   0.0249      143   0.0186        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      dst_destroy
   2917   5.5699       95   0.0120       76   0.0099       33   0.0927        3   0.0393        0        0        2   0.0264        0        0  vmlinux      vmlinux      __call_rcu
   2293   4.3784       74   0.0094       49   0.0064       48   0.1348       24   0.3148       12   0.1420       22   0.2907       12   0.1842  vmlinux      vmlinux      __rcu_process_callbacks
   1816   3.4676    24921   3.1504    22351   2.8997     1090   3.0609      443   5.8106      297   3.5144      414   5.4711      382   5.8625  vmlinux      vmlinux      _raw_spin_lock
   1055   2.0145       24   0.0030       21   0.0027        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      dst_rcu_free
    669   1.2774      752   0.0951      717   0.0930     3223   9.0506      679   8.9061     1527  18.0689      232   3.0659      517   7.9343  libc-2.8.so  libc-2.8.so  (no symbols)
    590   1.1266     1745   0.2206     1208   0.1567       83   0.2331        4   0.0525        5   0.0592        6   0.0793        4   0.0614  vmlinux      vmlinux      kmem_cache_free
    568   1.0846       36   0.0046       29   0.0038        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      ipv4_dst_destroy
    534   1.0196      583   0.0737      641   0.0832      691   1.9404     1402  18.3893      542   6.4134      934  12.3431      736  11.2953  vmlinux      vmlinux      tg_shares_up
    457   0.8726       20   0.0025       19   0.0025        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      call_rcu_bh
    422   0.8058        0        0      429   0.0557        0        0        0        0      845   9.9988        0        0        0        0  bgpd         bgpd         bgp_best_selection
    397   0.7581        0        0      585   0.0759        0        0      153   2.0068      917  10.8508        0        0        0        0  bgpd         bgpd         bgp_route_next
    339   0.6473     3085   0.3900     3301   0.4283      169   0.4746       20   0.2623       19   0.2248       36   0.4757       31   0.4758  vmlinux      vmlinux      _raw_spin_unlock
    319   0.6091     3645   0.4608     3122   0.4050        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      local_bh_enable_ip
    290   0.5537        0        0       40   0.0052        0        0        0        0        0        0        0        0        0        0  vmlinux      vmlinux      e1000e_update_stats
    271   0.5175      152   0.0192      146   0.0189      371   1.0418      626   8.2109      192   2.2719      334   4.4139      165   2.5322  vmlinux      vmlinux      find_next_bit
    228   0.4354       19   0.0024        7  9.1e-04        5   0.0140       15   0.1967       12   0.1420       10   0.1322       20   0.3069  vmlinux      vmlinux      rcu_process_callbacks
    217   0.4144        0        0      352   0.0457        0        0       84   1.1018      534   6.3188        0        0        0        0  bgpd         bgpd         bgp_scan_timer
    203   0.3876      694   0.0877      521   0.0676        6   0.0168        2   0.0262        0        0        2   0.0264        0        0  vmlinux      vmlinux      __phys_addr
    191   0.3647      182   0.0230       66   0.0086      208   0.5841      608   7.9748      259   3.0647      268   3.5417      134   2.0565  vmlinux      vmlinux      find_busiest_group
    186   0.3552     5221   0.6600     4432   0.5750      116   0.3257      131   1.7183       90   1.0650      122   1.6123      106   1.6268  vmlinux
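
To put a number on how much of CPU 0 the route/dst teardown path takes in the report above, the per-symbol figures can be summed. This is only a rough sketch: the CPU 0 values are copied from the oprofile output, but the grouping in `DST_SYMBOLS` (symbols picked by name) and the helper `group_totals` are assumptions made for illustration, not something oprofile reports.

```python
# CPU 0 columns from the oprofile report above: symbol -> (samples, percent).
CPU0 = {
    "rt_worker_func": (9999, 19.0926),
    "free_block": (9017, 17.2175),
    "mwait_idle": (8867, 16.9311),
    "dst_destroy": (5714, 10.9106),
    "__call_rcu": (2917, 5.5699),
    "__rcu_process_callbacks": (2293, 4.3784),
    "dst_rcu_free": (1055, 2.0145),
    "ipv4_dst_destroy": (568, 1.0846),
}

# Assumption: these symbols are treated as "route/dst teardown" purely by name.
DST_SYMBOLS = ("rt_worker_func", "dst_destroy", "dst_rcu_free", "ipv4_dst_destroy")


def group_totals(rows, symbols):
    """Sum samples and percentages for the given symbols."""
    samples = sum(rows[name][0] for name in symbols)
    percent = sum(rows[name][1] for name in symbols)
    return samples, percent


dst_samples, dst_percent = group_totals(CPU0, DST_SYMBOLS)
idle_percent = CPU0["mwait_idle"][1]

print(f"route/dst teardown on CPU 0: {dst_samples} samples, {dst_percent:.2f}%")
print(f"idle (mwait_idle) on CPU 0:  {idle_percent:.2f}%")
```

free_block and the RCU callbacks are left out of the group on purpose, because the report alone does not say which callers they serve.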
>> Pawel, could you give us :
>>
>> grep . /proc/sys/net/ipv4/route/*
>> cat /proc/interrupts
>>
>> on your various kernels (previous to 2.6.29, 2.6.29, 2.6.30, ...)
>>
>> I suspect a change in hash table size, and/or change in interrupt
>> affinities...
>>
>>
>>
> first machine:
> Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64
> Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
>
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:1250
> /proc/sys/net/ipv4/route/error_cost:250
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:0
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/gc_timeout:15
> /proc/sys/net/ipv4/route/max_size:524288
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:5
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:5120
> /proc/sys/net/ipv4/route/secret_interval:3600
>
> dmesg | grep route
> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
>
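
The dmesg line and the sysctl values above can be cross-checked with a little arithmetic. The sketch below parses the hash table line and relates gc_thresh and max_size to the number of buckets. The 4 KiB page size is an assumption (the usual x86 value), and the helper name `parse_rt_hash_line` is made up for this example.

```python
import re

# Values copied from the first machine's output above.
DMESG_LINE = "IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)"
GC_THRESH = 190536
MAX_SIZE = 524288
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_rt_hash_line(line):
    """Extract (entries, order, bytes) from the route cache hash table line."""
    match = re.search(r"entries: (\d+) \(order: (\d+), (\d+) bytes\)", line)
    if match is None:
        raise ValueError("not a route cache hash table line")
    entries, order, nbytes = (int(group) for group in match.groups())
    return entries, order, nbytes


entries, order, nbytes = parse_rt_hash_line(DMESG_LINE)
bytes_per_bucket = nbytes // entries
pages = nbytes // PAGE_SIZE

print(f"buckets: {entries}, bytes per bucket: {bytes_per_bucket}")
print(f"pages: {pages} (2**order = {2 ** order})")
print(f"gc_thresh / buckets = {GC_THRESH / entries:.3f}")
print(f"max_size  / buckets = {MAX_SIZE / entries:.3f}")
```

With these numbers gc_thresh is below the bucket count and max_size is twice the bucket count.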
>
> cat /proc/interrupts
>           CPU0      CPU1      CPU2      CPU3      CPU4      CPU5      CPU6      CPU7
>   0:        43         0         0         1         1         2         0         0  IO-APIC-edge     timer
>   1:         0         0         0         1         0         0         0         1  IO-APIC-edge     i8042
>   9:         0         0         0         0         0         0         0         0  IO-APIC-fasteoi  acpi
>  14:         0         0         0         0         0         0         0         0  IO-APIC-edge     ide0
>  15:         0         0         0         0         0         0         0         0  IO-APIC-edge     ide1
>  29:   1139988  18351004     89662         3         0         1         0         3  PCI-MSI-edge     eth0
>  30:         0         2  20221692         1         0         3         0         0  PCI-MSI-edge     eth1
>  31:         0         1         1         0         0         0         0         0  PCI-MSI-edge
>  32:         0         0         0         0         0         0         2         0  PCI-MSI-edge
>  33:         1         1         0         0         0         0         0         0  PCI-MSI-edge
>  34:         0         0         0         1         0         1         0         0  PCI-MSI-edge
>  35:         0         0         0         1         0         0         0         1  PCI-MSI-edge
>  36:         0         0         0         0         1         0         0         1  PCI-MSI-edge
>  37:         1         0         0         0         0         1         0         0  PCI-MSI-edge
>  38:         0         0         1         0         1         0         0         0  PCI-MSI-edge
>  39:         0         0         2         0         0         0         0         0  PCI-MSI-edge
>  40:         0         0         0         0         0         0         2         0  PCI-MSI-edge
>  41:         0         2         0         0         0         0         0         0  PCI-MSI-edge
>  42:         0         0         0         0         0         2         0         0  PCI-MSI-edge
>  43:         0         0         0         2         0         0         0         0  PCI-MSI-edge
>  44:         0         0         0         0         0         0         0         2  PCI-MSI-edge
>  45:         2         0         0         0         0         0         0         0  PCI-MSI-edge
>  46:         0         0         0         0         2         0         0         0  PCI-MSI-edge
>  48:       233       200       185       257       256       260       269       257  PCI-MSI-edge     ahci
>  49:         0         1         1         0         0         2         1         0  PCI-MSI-edge     ioat-msi
> NMI:         0         0         0         0         0         0         0         0  Non-maskable interrupts
> LOC:   1191321  26059516  25803111     64841     32718     26651     54058     24166  Local timer interrupts
> RES:       921        59        58        20        14         8        10        13  Rescheduling interrupts
> CAL:        20        85        88        87        90        90        91        86  Function call interrupts
> TLB:       103       116       937       954        95       115      1006      1020  TLB shootdowns
> SPU:         0         0         0         0         0         0         0         0  Spurious interrupts
> ERR:         0
> MIS:         0
>
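
Since interrupt affinity is one of the suspects, it may help to see how the NIC interrupts above are spread over the CPUs. A minimal sketch, with the eth0/eth1 counts copied from the first machine's /proc/interrupts; the helper name `busiest_cpu` is illustrative only.

```python
# Per-CPU interrupt counts (CPU0..CPU7) copied from the listing above.
IRQ_COUNTS = {
    "eth0": [1139988, 18351004, 89662, 3, 0, 1, 0, 3],
    "eth1": [0, 2, 20221692, 1, 0, 3, 0, 0],
}


def busiest_cpu(counts):
    """Return (cpu_index, share_of_total) for the CPU with the most interrupts."""
    total = sum(counts)
    cpu = max(range(len(counts)), key=lambda index: counts[index])
    return cpu, counts[cpu] / total


for device, counts in IRQ_COUNTS.items():
    cpu, share = busiest_cpu(counts)
    print(f"{device}: CPU{cpu} handled {share:.1%} of {sum(counts)} interrupts")
```

On these counts eth0 is serviced mostly by CPU1 and eth1 almost entirely by CPU2, while the rt_worker_func samples in the oprofile report are all on CPU 0.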
>
> second machine:
> Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686
> Intel(R) Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux
>
> cat /proc/interrupts
> CPU0 CPU1
> 0: 182 129 IO-APIC-edge timer
> 1: 1886 1672 IO-APIC-edge i8042
> 6: 1 1 IO-APIC-edge floppy
> 9: 0 0 IO-APIC-fasteoi acpi
> 12: 2 2 IO-APIC-edge i8042
> 14: 0 0 IO-APIC-edge ide0
> 15: 0 0 IO-APIC-edge ide1
> 27: 41793 26401 PCI-MSI-edge ahci
> 28: 13482 11260 PCI-MSI-edge eth2
> 29: 3 1326457765 PCI-MSI-edge eth1
> 30: 1240943198 137973134 PCI-MSI-edge eth0
> NMI: 0 0 Non-maskable interrupts
> LOC: 1607938599 1514565603 Local timer interrupts
> SPU: 0 0 Spurious interrupts
> RES: 1098 1190 Rescheduling interrupts
> CAL: 28 105 Function call interrupts
> TLB: 2886 3055 TLB shootdowns
> ERR: 0
> MIS: 0
>
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:1250
> /proc/sys/net/ipv4/route/error_cost:250
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:0
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/gc_timeout:15
> /proc/sys/net/ipv4/route/max_size:1524288
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:5
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:5120
> /proc/sys/net/ipv4/route/secret_interval:3600
>
>
> dmesg | grep route
> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
>
> rtstat -k entries -i 1 -c 10
> rt_cache|
> entries|
> 112754|
> 112446|
> 112277|
> 111451|
> 111042|
> 110314|
> 109153|
> 108370|
> 107730|
> 107478|
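
The rtstat samples above can be turned into a per-second rate of change, to see how fast the cache shrinks and how full it is relative to gc_thresh. A small sketch under the assumption that the ten values are exactly one second apart (as requested with `-i 1`); the variable names are illustrative.

```python
# rt_cache entries from "rtstat -k entries -i 1 -c 10" above.
SAMPLES = [112754, 112446, 112277, 111451, 111042,
           110314, 109153, 108370, 107730, 107478]
INTERVAL_S = 1       # assumption: samples are exactly 1 s apart
GC_THRESH = 190536   # net.ipv4.route.gc_thresh on the same machine

# Change in entry count between consecutive samples.
deltas = [later - earlier for earlier, later in zip(SAMPLES, SAMPLES[1:])]
net_change = SAMPLES[-1] - SAMPLES[0]
avg_rate = net_change / (INTERVAL_S * len(deltas))
peak_fill = max(SAMPLES) / GC_THRESH

print("per-interval change:", deltas)
print(f"net change: {net_change} entries, average {avg_rate:.1f} entries/s")
print(f"peak fill relative to gc_thresh: {peak_fill:.1%}")
```

During this window the cache only shrinks, and it stays well below gc_thresh.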
>
>
>
>
>> Change in hash table size comes from commit
>> c9503e0fe052020e0294cd07d0ecd982eb7c9177
>>
>> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe
>> his hash table is smaller than 512k entries!
>>
>> Author: Anton Blanchard <anton@samba.org>
>> Date: Mon Apr 27 05:42:24 2009 -0700
>>
>> ipv4: Limit size of route cache hash table
>>
>> Right now we have no upper limit on the size of the route cache
>> hash table.
>> On a 128GB POWER6 box it ends up as 32MB:
>>
>> IP route cache hash table entries: 4194304 (order: 9,
>> 33554432 bytes)
>>
>> It would be nice to cap this for memory consumption reasons, but
>> a massive
>> hashtable also causes a significant spike when measuring OS jitter.
>>
>> With a 32MB hashtable and 4 million entries, rt_worker_func is
>> taking
>> 5 ms to complete. On another system with more memory it's taking
>> 14 ms.
>> Even though rt_worker_func does call cond_sched() to limit its
>> impact,
>> in an HPC environment we want to keep all sources of OS jitter to
>> a minimum.
>>
>> With the patch applied we limit the number of entries to 512k which
>> can still be overriden by using the rt_entries boot option:
>>
>> IP route cache hash table entries: 524288 (order: 6, 4194304
>> bytes)
>>
>> With this patch rt_worker_func now takes 0.460 ms on the same
>> system.
>>
>> Signed-off-by: Anton Blanchard <anton@samba.org>
>> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>