From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Linux Network Development list <netdev@vger.kernel.org>
Subject: Re: weird problem
Date: Fri, 26 Jun 2009 19:45:57 +0200
Message-ID: <4A450955.1010806@itcare.pl>
In-Reply-To: <4A44A098.8080006@cosmosbay.com>

Eric Dumazet wrote:
> Jarek Poplawski wrote:
>   
>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
>>     
>>> On 25-06-2009 22:18, Eric Dumazet wrote:
>>>       
>>>> Paweł Staszewski wrote:
>>>>         
>>>>> OK.
>>>>>
>>>>> After a day of observation I am nearly 100% sure that this CPU load is
>>>>> caused by route cache flushes.
>>>>> When the route cache grows to its "net.ipv4.route.gc_thresh" size, or
>>>>> gets near that size, the system starts dropping routes from the cache
>>>>> and the CPU load rises from 2% to nearly 80%.
>>>>> After the cache is cleaned / flushed and starts filling again, the CPU
>>>>> load goes back to the normal 2%.
>>>>>
>>>>> Does anyone know how to resolve this?
>>>>> On kernels < 2.6.29 I don't see this; it all started after upgrading from
>>>>> 2.6.28 to 2.6.29. I then tried 2.6.29.1, 2.6.29.3 and 2.6.30, and on all
>>>>> of these kernels >= 2.6.29 the CPU load problem is the same.
>>>>>
>>>>> I can reduce these CPU fluctuations by changing the route cache /proc
>>>>> parameters, but the best result for my router was
>>>>>
>>>>> 15 sec of 2% CPU
>>>>> followed by
>>>>> 15 sec of 80% CPU
>>>>>
>>>>>
>>>>> Regards
>>>>> Pawel Staszewski
>>>>>           
>>>> I believe these are known 2.6.29 regressions.
>>>>
>>>> The following two commits should correct the problem you have.
>>>>
>>>> Your best bet would be to try 2.6.31-rc1 and tell us whether this recent
>>>> kernel is OK on your machine.
>>>>         
>>> Btw., the first of these commits is in 2.6.30, which according to
>>>       
>> And the second as well.
>>
>>     
>
> Thanks Jarek.
>
> Pawel made some reporting errors in the fib thread, so I am not sure he
> really tried 2.6.30 and got the same oprofile results.
>
> rt_worker_func() taking 13% of cpu0 is an alarm for me :)
> And 21% of cpu0 and 34% of cpu6 taken by oprofiled seems odd too...
>
> Pawel, could you give us :
>
> grep . /proc/sys/net/ipv4/route/*
> cat /proc/interrupts
>
> on your various kernels (previous to 2.6.29, 2.6.29, 2.6.30, ...)
>
> I suspect a change in hash table size, and/or change in interrupt affinities...
>
>
>   
First machine:
Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux

grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600

dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
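
A quick way to sanity-check this boot message is to derive the bytes per bucket and the page count from it. The following is a minimal Python sketch; the 4 KiB page size and the helper name parse_rt_hash_line are assumptions made for illustration, not anything taken from the kernel.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_rt_hash_line(line):
    """Parse an 'IP route cache hash table entries' boot message."""
    m = re.search(r"entries: (\d+) \(order: (\d+), (\d+) bytes\)", line)
    if m is None:
        raise ValueError("unrecognized line: %r" % line)
    entries, order, size = (int(g) for g in m.groups())
    return {
        "entries": entries,
        "order": order,
        "bytes": size,
        "bytes_per_bucket": size // entries,
        "pages": size // PAGE_SIZE,
    }


info = parse_rt_hash_line(
    "IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)"
)
print(info)
```

For the line above this gives 8 bytes per bucket and 512 pages, which is consistent with order 9 under the 4 KiB page assumption. It also shows 262144 buckets, i.e. half of the 512k cap discussed in the quoted commit.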


cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         43          0          0          1          1          2          0          0   IO-APIC-edge      timer
  1:          0          0          0          1          0          0          0          1   IO-APIC-edge      i8042
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
 15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
 29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge      eth0
 30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge      eth1
 31:          0          1          1          0          0          0          0          0   PCI-MSI-edge
 32:          0          0          0          0          0          0          2          0   PCI-MSI-edge
 33:          1          1          0          0          0          0          0          0   PCI-MSI-edge
 34:          0          0          0          1          0          1          0          0   PCI-MSI-edge
 35:          0          0          0          1          0          0          0          1   PCI-MSI-edge
 36:          0          0          0          0          1          0          0          1   PCI-MSI-edge
 37:          1          0          0          0          0          1          0          0   PCI-MSI-edge
 38:          0          0          1          0          1          0          0          0   PCI-MSI-edge
 39:          0          0          2          0          0          0          0          0   PCI-MSI-edge
 40:          0          0          0          0          0          0          2          0   PCI-MSI-edge
 41:          0          2          0          0          0          0          0          0   PCI-MSI-edge
 42:          0          0          0          0          0          2          0          0   PCI-MSI-edge
 43:          0          0          0          2          0          0          0          0   PCI-MSI-edge
 44:          0          0          0          0          0          0          0          2   PCI-MSI-edge
 45:          2          0          0          0          0          0          0          0   PCI-MSI-edge
 46:          0          0          0          0          2          0          0          0   PCI-MSI-edge
 48:        233        200        185        257        256        260        269        257   PCI-MSI-edge      ahci
 49:          0          1          1          0          0          2          1          0   PCI-MSI-edge      ioat-msi
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:    1191321   26059516   25803111      64841      32718      26651      54058      24166   Local timer interrupts
RES:        921         59         58         20         14          8         10         13   Rescheduling interrupts
CAL:         20         85         88         87         90         90         91         86   Function call interrupts
TLB:        103        116        937        954         95        115       1006       1020   TLB shootdowns
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0
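
To compare interrupt affinities between kernels, it can help to reduce /proc/interrupts to "which CPU handles most of each IRQ". The following is a minimal Python sketch; the function names are hypothetical, and it assumes each IRQ sits on a single line as the kernel prints it (mail-wrapped copies have to be rejoined first). The sample holds only the eth0 and eth1 rows from the first machine.

```python
def parse_interrupts(text):
    """Return {irq_label: (per_cpu_counts, description)} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row lists CPU0..CPUn
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        # Skip rows such as ERR:/MIS: that carry a single counter only.
        if len(fields) < ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue
        counts = [int(f) for f in fields[:ncpu]]
        table[label.strip()] = (counts, " ".join(fields[ncpu:]))
    return table


def busiest_cpu(counts):
    """Return (cpu_index, share_of_total) for the CPU with the most interrupts."""
    total = sum(counts)
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return cpu, (counts[cpu] / total if total else 0.0)


SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge      eth0
 30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge      eth1
ERR:          0
"""

for label, (counts, desc) in parse_interrupts(SAMPLE).items():
    cpu, share = busiest_cpu(counts)
    print(f"IRQ {label} ({desc}): CPU{cpu} handled {share:.1%}")
```

On these two rows it reports roughly 94% of eth0 interrupts on CPU1 and essentially all eth1 interrupts on CPU2.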


Second machine:
Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686 Intel(R) Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux

cat /proc/interrupts
           CPU0       CPU1
  0:        182        129   IO-APIC-edge      timer
  1:       1886       1672   IO-APIC-edge      i8042
  6:          1          1   IO-APIC-edge      floppy
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          2          2   IO-APIC-edge      i8042
 14:          0          0   IO-APIC-edge      ide0
 15:          0          0   IO-APIC-edge      ide1
 27:      41793      26401   PCI-MSI-edge      ahci
 28:      13482      11260   PCI-MSI-edge      eth2
 29:          3 1326457765   PCI-MSI-edge      eth1
 30: 1240943198  137973134   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC: 1607938599 1514565603   Local timer interrupts
SPU:          0          0   Spurious interrupts
RES:       1098       1190   Rescheduling interrupts
CAL:         28        105   Function call interrupts
TLB:       2886       3055   TLB shootdowns
ERR:          0
MIS:          0

grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:1524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600


dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)

rtstat -k entries -i 1 -c 10
rt_cache|
 entries|
  112754|
  112446|
  112277|
  111451|
  111042|
  110314|
  109153|
  108370|
  107730|
  107478|
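
To put the rtstat samples in context, the short Python sketch below relates them to the gc_thresh value and the hash table size reported for this machine. It is only arithmetic on the numbers pasted in this mail; the variable names are made up for illustration, and it assumes the ten samples are one second apart, as requested with "-i 1".

```python
# rt_cache entries from "rtstat -k entries -i 1 -c 10" (second machine)
samples = [112754, 112446, 112277, 111451, 111042,
           110314, 109153, 108370, 107730, 107478]

GC_THRESH = 190536     # /proc/sys/net/ipv4/route/gc_thresh
HASH_BUCKETS = 262144  # from "dmesg | grep route"
INTERVAL_S = 1         # assumption: samples are 1 second apart (-i 1)

latest = samples[-1]
fill_vs_thresh = latest / GC_THRESH   # fraction of gc_thresh in use
mean_chain = latest / HASH_BUCKETS    # average entries per hash bucket
drain_rate = (samples[0] - latest) / (INTERVAL_S * (len(samples) - 1))

print(f"cache at {fill_vs_thresh:.0%} of gc_thresh")
print(f"mean chain length {mean_chain:.2f} entries/bucket")
print(f"shrinking by about {drain_rate:.0f} entries/s")
```

At the time of this sample the cache sits at about 56% of gc_thresh, with about 0.41 entries per bucket on average, and it shrinks by roughly 586 entries per second over the ten samples.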




> Change in hash table size comes from commit c9503e0fe052020e0294cd07d0ecd982eb7c9177
>
> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe
> his hash table is smaller than 512k entries!
>
> Author: Anton Blanchard <anton@samba.org>
> Date:   Mon Apr 27 05:42:24 2009 -0700
>
>     ipv4: Limit size of route cache hash table
>
>     Right now we have no upper limit on the size of the route cache hash table.
>     On a 128GB POWER6 box it ends up as 32MB:
>
>         IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
>
>     It would be nice to cap this for memory consumption reasons, but a massive
>     hashtable also causes a significant spike when measuring OS jitter.
>
>     With a 32MB hashtable and 4 million entries, rt_worker_func is taking
>     5 ms to complete. On another system with more memory it's taking 14 ms.
>     Even though rt_worker_func does call cond_resched() to limit its impact,
>     in an HPC environment we want to keep all sources of OS jitter to a minimum.
>
>     With the patch applied we limit the number of entries to 512k which
>     can still be overridden by using the rt_entries boot option:
>
>         IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
>
>     With this patch rt_worker_func now takes 0.460 ms on the same system.
>
>     Signed-off-by: Anton Blanchard <anton@samba.org>
>     Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>

