From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Linux Network Development list <netdev@vger.kernel.org>
Subject: Re: weird problem
Date: Fri, 26 Jun 2009 19:57:39 +0200
Message-ID: <4A450C13.9050907@itcare.pl>
In-Reply-To: <4A450955.1010806@itcare.pl>

Paweł Staszewski wrote:
> Eric Dumazet wrote:
>> Jarek Poplawski wrote:
>>  
>>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
>>>    
>>>> On 25-06-2009 22:18, Eric Dumazet wrote:
>>>>      
>>>>> Paweł Staszewski wrote:
>>>>>        
>>>>>> Ok
>>>>>>
>>>>>> After a day of observation I'm nearly 100% sure that this CPU load is
>>>>>> caused by route cache flushes.
>>>>>> When the route cache grows to its "net.ipv4.route.gc_thresh" size, or
>>>>>> gets near that size, the system starts to drop some routes from the
>>>>>> cache, and CPU load increases from 2% to nearly 80%.
>>>>>> After the cache has been cleaned / flushed, while it is filling again,
>>>>>> CPU load is back to the normal 2%.
>>>>>>
>>>>>> Does anyone know how to resolve this?
>>>>>> On kernels < 2.6.29 I don't see this; it all started after the upgrade
>>>>>> from 2.6.28 to 2.6.29. I then tried 2.6.29.1, 2.6.29.3 and 2.6.30, and
>>>>>> on all these kernels >= 2.6.29 the CPU load problem is the same.
>>>>>>
>>>>>> I can minimize these CPU fluctuations by changing the route cache /proc
>>>>>> parameters, but the best result for my router was
>>>>>>
>>>>>> 15 sec of 2% cpu
>>>>>> and after
>>>>>> 15 sec of 80% cpu
>>>>>>
>>>>>> Regards
>>>>>> Pawel Staszewski
>>>>>>           
>>>>> I believe this is a known 2.6.29 regression.
>>>>>
>>>>> The following two commits should correct the problem you have
>>>>>
>>>>> Your best bet would be to try 2.6.31-rc1 and tell us whether this
>>>>> recent kernel is OK on your machine.
>>>>>         
>>>> Btw., the first of these commits is in 2.6.30, which according to
>>>>       
>>> And the second as well.
>>>
>>>     
>>
>> Thanks Jarek.
>>
>> Pawel made some reporting errors in the fib thread, so I am not sure he
>> really tried 2.6.30 and got the same oprofile results.
>>
>> rt_worker_func() taking 13% of cpu0 is an alarm for me :)
>> And 21% of cpu0 and 34% of cpu6 taken by oprofiled seems odd too...
>>
oprofile from:
Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux

CPU: Core 2, speed 3000.21 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
Samples on CPU 0
Samples on CPU 1
Samples on CPU 2
Samples on CPU 3
Samples on CPU 4
Samples on CPU 5
Samples on CPU 6
Samples on CPU 7
samples  %        samples  %        samples  %        samples  %        samples  %        samples  %        samples  %        samples  %        image name               app name                 symbol name
9999     19.0926  0              0  0              0  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  rt_worker_func
9017     17.2175  219       0.0277  13        0.0017  29        0.0814  26        0.3410  12        0.1420  13        0.1718  115       1.7649  vmlinux                  vmlinux                  free_block
8867     16.9311  13588     1.7177  11888     1.5423  9000     25.2731  801      10.5063  461       5.4550  788      10.4136  476       7.3051  vmlinux                  vmlinux                  mwait_idle
5714     10.9106  197       0.0249  143       0.0186  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  dst_destroy
2917      5.5699  95        0.0120  76        0.0099  33        0.0927  3         0.0393  0              0  2         0.0264  0              0  vmlinux                  vmlinux                  __call_rcu
2293      4.3784  74        0.0094  49        0.0064  48        0.1348  24        0.3148  12        0.1420  22        0.2907  12        0.1842  vmlinux                  vmlinux                  __rcu_process_callbacks
1816      3.4676  24921     3.1504  22351     2.8997  1090      3.0609  443       5.8106  297       3.5144  414       5.4711  382       5.8625  vmlinux                  vmlinux                  _raw_spin_lock
1055      2.0145  24        0.0030  21        0.0027  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  dst_rcu_free
669       1.2774  752       0.0951  717       0.0930  3223      9.0506  679       8.9061  1527     18.0689  232       3.0659  517       7.9343  libc-2.8.so              libc-2.8.so              (no symbols)
590       1.1266  1745      0.2206  1208      0.1567  83        0.2331  4         0.0525  5         0.0592  6         0.0793  4         0.0614  vmlinux                  vmlinux                  kmem_cache_free
568       1.0846  36        0.0046  29        0.0038  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  ipv4_dst_destroy
534       1.0196  583       0.0737  641       0.0832  691       1.9404  1402     18.3893  542       6.4134  934      12.3431  736      11.2953  vmlinux                  vmlinux                  tg_shares_up
457       0.8726  20        0.0025  19        0.0025  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  call_rcu_bh
422       0.8058  0              0  429       0.0557  0              0  0              0  845       9.9988  0              0  0              0  bgpd                     bgpd                     bgp_best_selection
397       0.7581  0              0  585       0.0759  0              0  153       2.0068  917      10.8508  0              0  0              0  bgpd                     bgpd                     bgp_route_next
339       0.6473  3085      0.3900  3301      0.4283  169       0.4746  20        0.2623  19        0.2248  36        0.4757  31        0.4758  vmlinux                  vmlinux                  _raw_spin_unlock
319       0.6091  3645      0.4608  3122      0.4050  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  local_bh_enable_ip
290       0.5537  0              0  40        0.0052  0              0  0              0  0              0  0              0  0              0  vmlinux                  vmlinux                  e1000e_update_stats
271       0.5175  152       0.0192  146       0.0189  371       1.0418  626       8.2109  192       2.2719  334       4.4139  165       2.5322  vmlinux                  vmlinux                  find_next_bit
228       0.4354  19        0.0024  7        9.1e-04  5         0.0140  15        0.1967  12        0.1420  10        0.1322  20        0.3069  vmlinux                  vmlinux                  rcu_process_callbacks
217       0.4144  0              0  352       0.0457  0              0  84        1.1018  534       6.3188  0              0  0              0  bgpd                     bgpd                     bgp_scan_timer
203       0.3876  694       0.0877  521       0.0676  6         0.0168  2         0.0262  0              0  2         0.0264  0              0  vmlinux                  vmlinux                  __phys_addr
191       0.3647  182       0.0230  66        0.0086  208       0.5841  608       7.9748  259       3.0647  268       3.5417  134       2.0565  vmlinux                  vmlinux                  find_busiest_group
186       0.3552  5221      0.6600  4432      0.5750  116       0.3257  131       1.7183  90        1.0650  122       1.6123  106       1.6268  vmlinux

>> Pawel, could you give us :
>>
>> grep . /proc/sys/net/ipv4/route/*
>> cat /proc/interrupts
>>
>> on your various kernels (previous to 2.6.29, 2.6.29, 2.6.30, ...)
>>
>> I suspect a change in hash table size, and/or change in interrupt 
>> affinities...
>>
>>
>>   
> first machine:
> Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
>
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:1250
> /proc/sys/net/ipv4/route/error_cost:250
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:0
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/gc_timeout:15
> /proc/sys/net/ipv4/route/max_size:524288
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:5
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:5120
> /proc/sys/net/ipv4/route/secret_interval:3600
>
> dmesg | grep route
> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
>
>
> cat /proc/interrupts
>           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>  0:         43          0          0          1          1          2          0          0   IO-APIC-edge      timer
>  1:          0          0          0          1          0          0          0          1   IO-APIC-edge      i8042
>  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
> 14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
> 15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
> 29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge      eth0
> 30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge      eth1
> 31:          0          1          1          0          0          0          0          0   PCI-MSI-edge
> 32:          0          0          0          0          0          0          2          0   PCI-MSI-edge
> 33:          1          1          0          0          0          0          0          0   PCI-MSI-edge
> 34:          0          0          0          1          0          1          0          0   PCI-MSI-edge
> 35:          0          0          0          1          0          0          0          1   PCI-MSI-edge
> 36:          0          0          0          0          1          0          0          1   PCI-MSI-edge
> 37:          1          0          0          0          0          1          0          0   PCI-MSI-edge
> 38:          0          0          1          0          1          0          0          0   PCI-MSI-edge
> 39:          0          0          2          0          0          0          0          0   PCI-MSI-edge
> 40:          0          0          0          0          0          0          2          0   PCI-MSI-edge
> 41:          0          2          0          0          0          0          0          0   PCI-MSI-edge
> 42:          0          0          0          0          0          2          0          0   PCI-MSI-edge
> 43:          0          0          0          2          0          0          0          0   PCI-MSI-edge
> 44:          0          0          0          0          0          0          0          2   PCI-MSI-edge
> 45:          2          0          0          0          0          0          0          0   PCI-MSI-edge
> 46:          0          0          0          0          2          0          0          0   PCI-MSI-edge
> 48:        233        200        185        257        256        260        269        257   PCI-MSI-edge      ahci
> 49:          0          1          1          0          0          2          1          0   PCI-MSI-edge      ioat-msi
> NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
> LOC:    1191321   26059516   25803111      64841      32718      26651      54058      24166   Local timer interrupts
> RES:        921         59         58         20         14          8         10         13   Rescheduling interrupts
> CAL:         20         85         88         87         90         90         91         86   Function call interrupts
> TLB:        103        116        937        954         95        115       1006       1020   TLB shootdowns
> SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
> ERR:          0
> MIS:          0
>
>
> second machine:
> Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686 Intel(R) Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux
>
> cat /proc/interrupts
>           CPU0       CPU1
>  0:        182        129   IO-APIC-edge      timer
>  1:       1886       1672   IO-APIC-edge      i8042
>  6:          1          1   IO-APIC-edge      floppy
>  9:          0          0   IO-APIC-fasteoi   acpi
> 12:          2          2   IO-APIC-edge      i8042
> 14:          0          0   IO-APIC-edge      ide0
> 15:          0          0   IO-APIC-edge      ide1
> 27:      41793      26401   PCI-MSI-edge      ahci
> 28:      13482      11260   PCI-MSI-edge      eth2
> 29:          3 1326457765   PCI-MSI-edge      eth1
> 30: 1240943198  137973134   PCI-MSI-edge      eth0
> NMI:          0          0   Non-maskable interrupts
> LOC: 1607938599 1514565603   Local timer interrupts
> SPU:          0          0   Spurious interrupts
> RES:       1098       1190   Rescheduling interrupts
> CAL:         28        105   Function call interrupts
> TLB:       2886       3055   TLB shootdowns
> ERR:          0
> MIS:          0
>
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:1250
> /proc/sys/net/ipv4/route/error_cost:250
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_interval:1
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:0
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/gc_timeout:15
> /proc/sys/net/ipv4/route/max_size:1524288
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:5
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:5120
> /proc/sys/net/ipv4/route/secret_interval:3600
>
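Since the request was to compare these settings across kernels and machines, the two dumps can be diffed mechanically instead of by eye. What follows is only a sketch in Python: the helper names parse_sysctl_dump and diff_settings are made up for illustration, and the sample strings are abbreviated copies of the two listings above (three keys each).

```python
def parse_sysctl_dump(text):
    """Parse `grep . /proc/sys/net/ipv4/route/*` output into {name: value}."""
    settings = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate mail quote markers
        if not line.startswith("/proc/sys/"):
            continue
        path, _, value = line.partition(":")
        settings[path.rsplit("/", 1)[-1]] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {name: (value_a, value_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated samples taken from the two listings (hypothetical variable names).
first = parse_sysctl_dump("""
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/max_size:524288
""")
second = parse_sysctl_dump("""
> /proc/sys/net/ipv4/route/gc_elasticity:4
> /proc/sys/net/ipv4/route/gc_thresh:190536
> /proc/sys/net/ipv4/route/max_size:1524288
""")

print(diff_settings(first, second))
# prints {'max_size': ('524288', '1524288')}
```

In the full listings, max_size is likewise the only value that differs between the two machines (524288 vs 1524288).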
>
> dmesg | grep route
> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
>
> rtstat -k entries -i 1 -c 10
> rt_cache|
> entries|
>  112754|
>  112446|
>  112277|
>  111451|
>  111042|
>  110314|
>  109153|
>  108370|
>  107730|
>  107478|
>
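A rough way to read the rtstat samples against the settings above is to compute how full the cache is relative to the hash table and to gc_thresh. This is only a sketch: cache_pressure is a hypothetical helper, and the interpretation (average entries per hash bucket, and fraction of gc_thresh in use) is an assumption about which ratios are worth watching, not something taken from the kernel source.

```python
# rtstat "entries" samples from the second machine, one per second.
ENTRIES = [112754, 112446, 112277, 111451, 111042,
           110314, 109153, 108370, 107730, 107478]

HASH_BUCKETS = 262144  # "IP route cache hash table entries" from dmesg
GC_THRESH = 190536     # /proc/sys/net/ipv4/route/gc_thresh


def cache_pressure(entries, buckets, thresh):
    """Return (average entries per bucket, fraction of gc_thresh in use)."""
    return entries / buckets, entries / thresh


peak = max(ENTRIES)
avg_chain, thresh_fill = cache_pressure(peak, HASH_BUCKETS, GC_THRESH)
print(f"peak={peak} avg_chain={avg_chain:.2f} thresh_fill={thresh_fill:.2f}")
# prints peak=112754 avg_chain=0.43 thresh_fill=0.59
```

During this sample the cache was shrinking and stayed well below gc_thresh, so it does not capture the moment when the load spike starts.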
>
>
>
>> Change in hash table size comes from commit 
>> c9503e0fe052020e0294cd07d0ecd982eb7c9177
>>
>> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe
>> his hash table is smaller than 512k entries!
>>
>> Author: Anton Blanchard <anton@samba.org>
>> Date:   Mon Apr 27 05:42:24 2009 -0700
>>
>>     ipv4: Limit size of route cache hash table
>>
>>     Right now we have no upper limit on the size of the route cache hash table.
>>     On a 128GB POWER6 box it ends up as 32MB:
>>
>>         IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
>>
>>     It would be nice to cap this for memory consumption reasons, but a massive
>>     hashtable also causes a significant spike when measuring OS jitter.
>>
>>     With a 32MB hashtable and 4 million entries, rt_worker_func is taking
>>     5 ms to complete. On another system with more memory it's taking 14 ms.
>>     Even though rt_worker_func does call cond_sched() to limit its impact,
>>     in an HPC environment we want to keep all sources of OS jitter to a minimum.
>>
>>     With the patch applied we limit the number of entries to 512k which
>>     can still be overriden by using the rt_entries boot option:
>>
>>         IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
>>
>>     With this patch rt_worker_func now takes 0.460 ms on the same system.
>>
>>     Signed-off-by: Anton Blanchard <anton@samba.org>
>>     Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
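
The boot messages quoted in this thread can be cross-checked with a little arithmetic. The sketch below assumes that "order" is log2 of the number of pages backing the table, that the x86_64 router uses 4 KiB pages, and that the POWER6 box in the commit message uses 64 KiB pages. Both page sizes are assumptions (the thread does not state them), and the helper names are hypothetical.

```python
def bytes_per_bucket(entries, table_bytes):
    """Size of one hash bucket, derived from the boot message."""
    return table_bytes // entries


def hash_table_order(table_bytes, page_size):
    """Allocation order: log2 of the page count, rounded up."""
    pages = -(-table_bytes // page_size)  # ceiling division
    return (pages - 1).bit_length()


# (entries, bytes, assumed page size, order reported in the boot message)
MESSAGES = [
    (262144, 2097152, 4096, 9),     # TM_01_C1 / TM_02_C1 dmesg
    (4194304, 33554432, 65536, 9),  # POWER6 box before the patch
    (524288, 4194304, 65536, 6),    # POWER6 box after the patch
]

for entries, size, page, reported in MESSAGES:
    print(entries, bytes_per_bucket(entries, size),
          hash_table_order(size, page), reported)
```

Under those assumptions every message works out to 8 bytes per bucket and the computed order matches the reported one. The routers in this thread have 262144 buckets, which is already below the 512k cap introduced by the commit.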
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>   

