From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linux Network Development list <netdev@vger.kernel.org>
Subject: Re: Problem wit route cache
Date: Mon, 08 Feb 2010 15:32:54 +0100
Message-ID: <4B702096.7030101@itcare.pl>
In-Reply-To: <4B701CA8.7050205@itcare.pl>

On 2010-02-08 15:16, Paweł Staszewski wrote:
> On 2010-02-08 15:06, Eric Dumazet wrote:
>> On Monday 08 February 2010 at 14:59 +0100, Paweł Staszewski wrote:
>>> On 2010-02-08 14:51, Eric Dumazet wrote:
>>>> On Monday 08 February 2010 at 14:33 +0100, Paweł Staszewski wrote:
>>>>
>>>>
>>>>>>
>>>>> Yes, this is an x86_64 kernel.
>>>>> I have run kernels 2.6.32.2, 2.6.32.7 and now 2.6.33-rc6-git5, and
>>>>> the same thing happens on all of them.
>>>>> grep . /proc/sys/net/ipv4/route/*
>>>>> /proc/sys/net/ipv4/route/error_burst:1250
>>>>> /proc/sys/net/ipv4/route/error_cost:250
>>>>> grep: /proc/sys/net/ipv4/route/flush: Permission denied
>>>>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>>>> /proc/sys/net/ipv4/route/gc_interval:2
>>>>> /proc/sys/net/ipv4/route/gc_min_interval:0
>>>>> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
>>>>> /proc/sys/net/ipv4/route/gc_thresh:65535
>>>>> /proc/sys/net/ipv4/route/gc_timeout:300
>>>>> /proc/sys/net/ipv4/route/max_size:524288
>>>>> /proc/sys/net/ipv4/route/min_adv_mss:256
>>>>> /proc/sys/net/ipv4/route/min_pmtu:552
>>>>> /proc/sys/net/ipv4/route/mtu_expires:600
>>>>> /proc/sys/net/ipv4/route/redirect_load:5
>>>>> /proc/sys/net/ipv4/route/redirect_number:9
>>>>> /proc/sys/net/ipv4/route/redirect_silence:5120
>>>>> /proc/sys/net/ipv4/route/secret_interval:2
>>>>>
>>>>> This does not happen all the time.
>>>>> I see this message only during "internet rush hours", when there
>>>>> is about 700Mbit/s TX + 700Mbit/s RX of forwarded traffic.
>>>>>
>>>>>
>>>> I don't understand your settings; they are very, very small for your
>>>> setup. You want to flush the cache every 2 seconds...
>>>>
>>>> With 12GB of ram, you could have
>>>>
>>>> /proc/sys/net/ipv4/route/gc_thresh:524288
>>>> /proc/sys/net/ipv4/route/max_size:8388608
>>>> /proc/sys/net/ipv4/route/secret_interval:3600
>>>> /proc/sys/net/ipv4/route/gc_elasticity:4
>>>> /proc/sys/net/ipv4/route/gc_interval:1
>>>>
>>>> That would allow about 2 million entries in your route cache, using
>>>> 768 Mbytes of ram, and a good cache hit ratio.
>>>>
>>>>
>>>>
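As a rough check of the figures quoted above, the sketch below reproduces the arithmetic. It assumes that the steady-state cache size is roughly the number of hash buckets multiplied by gc_elasticity (taken here as the target average chain length), and it derives the per-entry cost from the quoted 768 Mbytes rather than from any kernel structure size. The function names are illustrative only.

```python
# Rough sizing of the IPv4 route cache from the values quoted above.
# Assumption: steady-state entries ~= hash buckets * gc_elasticity.


def route_cache_entries(hash_buckets: int, gc_elasticity: int) -> int:
    """Approximate number of cached routes at the target average chain length."""
    return hash_buckets * gc_elasticity


def bytes_per_entry(total_bytes: int, entries: int) -> float:
    """Memory cost per entry implied by a total memory figure."""
    return total_bytes / entries


if __name__ == "__main__":
    entries = route_cache_entries(hash_buckets=524288, gc_elasticity=4)
    print(entries)  # 2097152, i.e. about 2 million
    print(bytes_per_entry(768 * 1024 * 1024, entries))  # 384.0
```

Under these assumptions the quoted numbers are consistent with each other: about 2 million entries at roughly 384 bytes each.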
>>> Yes, as I wrote, I changed these settings after I first saw the message,
>>> setting "secret_interval" from 3600 to 2
>>> to check whether this resolves the problem.
>>> My normal settings are:
>>>
>>> /proc/sys/net/ipv4/route/gc_thresh:256000
>>> /proc/sys/net/ipv4/route/max_size:1048576
>>> /proc/sys/net/ipv4/route/secret_interval:3600
>>> /proc/sys/net/ipv4/route/gc_interval:2
>>> /proc/sys/net/ipv4/route/gc_elasticity:2
>>>
>>> And with these settings I was getting this message:
>>> Route hash chain too long!
>>> Adjust your secret_interval!
>>>
>>>
>>>
>>> Now I have applied the settings you suggested, and we will see, but I
>>> don't know whether it will help,
>>> because I have already tried many different settings.
>>>
>> One important point is the size of hash table, you want something big
>> for your router.
>>
>> # dmesg | grep 'IP route'
>>   ... IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
>>
>
> On my machine it is the same:
> dmesg | grep 'IP route'
> IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
>
>
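The dmesg line above can be cross-checked with a little arithmetic. The sketch below assumes that "order: 10" means an allocation of 2^10 contiguous pages and that the page size is 4 KiB (the usual x86_64 base page size); the helper names are made up for illustration.

```python
# Cross-check of "entries: 524288 (order: 10, 4194304 bytes)".
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def table_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of an allocation of 2**order pages."""
    return (1 << order) * page_size


def bytes_per_bucket(order: int, entries: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes available per hash bucket for a table of the given order."""
    return table_bytes(order, page_size) // entries


if __name__ == "__main__":
    print(table_bytes(10))               # 4194304
    print(bytes_per_bucket(10, 524288))  # 8
```

Eight bytes per bucket is the size of one pointer on x86_64, so the reported size matches the reported entry count under these assumptions.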
>> Then if it is correctly sized, don't change gc_thresh or max_size, as
>> defaults are good.
>>
>> I would only change gc_interval to 1, to perform a smooth gc
>>
>> And possibly gc_elasticity to 4, 5 or 6 if I had less ram than your
>> machine.
>>
> A few days ago, after the route cache message, I also got this:
> Feb  4 13:12:40 TM_01_C1 ------------[ cut here ]------------
> Feb  4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x130/0x1d6()
> Feb  4 13:12:40 TM_01_C1 Hardware name: X7DCT
> Feb  4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> Feb  4 13:12:40 TM_01_C1 Modules linked in: oprofile
> Feb  4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1
> Feb  4 13:12:40 TM_01_C1 Call Trace:
> Feb  4 13:12:40 TM_01_C1 <IRQ>  [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> Feb  4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> Feb  4 13:12:40 TM_01_C1 [<ffffffff81038811>] ? warn_slowpath_common+0x77/0xa3
> Feb  4 13:12:40 TM_01_C1 [<ffffffff81038899>] ? warn_slowpath_fmt+0x51/0x59
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e
> Feb  4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ? try_to_wake_up+0x1eb/0x1f8
> Feb  4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ? netdev_drivername+0x3b/0x40
> Feb  4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44
> Feb  4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6
> Feb  4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ? run_timer_softirq+0x1ff/0x29d
> Feb  4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ? smp_apic_timer_interrupt+0x87/0x95
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ? apic_timer_interrupt+0x13/0x20
> Feb  4 13:12:40 TM_01_C1 <EOI>  [<ffffffff810111f5>] ? mwait_idle+0x9b/0xa0
> Feb  4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? cpu_idle+0x49/0x7c
> Feb  4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]---
>
> And after changing the kernel to 2.6.33-rc6 I got a different message:
>
> BUG: soft lockup - CPU#1 stuck for 61s! [events/1:28]
> Modules linked in:
> CPU 1
> Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT
> RIP: 0010:[<ffffffff810a3d89>]  [<ffffffff810a3d89>] kmem_cache_free+0x11b/0x11c
> RSP: 0018:ffff880028243e50  EFLAGS: 00000292
> RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0
> RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680
> RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c
> R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0
> R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7
> FS:  0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task ffff88031f9a4f80)
> Stack:
>  ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023
> <0>  01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040
> <0>  0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740
> Call Trace:
> <IRQ>
>  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
>  [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
>  [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
>  [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
>  [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> <EOI>
>  [<ffffffff81004599>] ? do_softirq+0x31/0x63
>  [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
>  [<ffffffff8136b08c>] ? schedule+0x82c/0x906
>  [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
>  [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
>  [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
>  [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
>  [<ffffffff810479bd>] ? kthread+0x79/0x81
>  [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff81047944>] ? kthread+0x0/0x81
>  [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
> Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f <c3> 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
> Call Trace:
> <IRQ>   [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
>  [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
>  [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
>  [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
>  [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> <EOI>   [<ffffffff81004599>] ? do_softirq+0x31/0x63
>  [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
>  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
>  [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
>  [<ffffffff8136b08c>] ? schedule+0x82c/0x906
>  [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
>  [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
>  [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
>  [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
>  [<ffffffff810479bd>] ? kthread+0x79/0x81
>  [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff81047944>] ? kthread+0x0/0x81
>  [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
>
>
>
Another weird thing is that when I set IRQ affinity for the NICs, binding
eth0 to cpu0 and eth1 to cpu2, I think the CPU load is too high:
mpstat -P ALL 1 10
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.00    0.00    0.00    0.10    1.63   16.71    0.00    0.00   81.56
Average:       0    0.00    0.00    0.00    0.00    5.10   72.80    0.00    0.00   22.10
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    8.00   61.00    0.00    0.00   31.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00    0.70    0.00    0.00    0.00    0.00   99.30
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

As you can see, cpu0 and cpu2 are only 22% and 31% idle
with forwarded traffic like this:
  bwm-ng v0.6 (probing every 3.000s), press 'h' for help
   input: /proc/net/dev type: rate
   -         iface                   Rx                   Tx                Total
   ==============================================================================
                lo:           0.00  b/s            0.00  b/s            0.00  b/s
              eth0:         346.64 Mb/s          487.24 Mb/s          833.88 Mb/s
              eth1:         487.48 Mb/s          344.14 Mb/s          831.61 Mb/s
          vlan0811:         273.29 Mb/s          381.71 Mb/s          655.01 Mb/s
          vlan0508:          64.62 Mb/s          105.54 Mb/s          170.15 Mb/s
   ------------------------------------------------------------------------------
             total:           1.14 Gb/s            1.29 Gb/s            2.43 Gb/s
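
For reference, the affinity binding mentioned above (eth0 on cpu0, eth1 on cpu2) is expressed to the kernel as a hexadecimal CPU bitmask written to /proc/irq/<irq>/smp_affinity. The sketch below only computes such masks; the helper name is illustrative, and the IRQ numbers of the NICs are not given in this thread, so nothing is written to /proc here.

```python
# Compute the hex bitmask used by /proc/irq/<irq>/smp_affinity.
# Bit N set means CPU N may service the interrupt.
from typing import Iterable


def smp_affinity_mask(cpus: Iterable[int]) -> str:
    """Return the hex bitmask string for a set of CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    print(smp_affinity_mask([0]))     # 1  (cpu0 only, as for eth0 above)
    print(smp_affinity_mask([2]))     # 4  (cpu2 only, as for eth1 above)
    print(smp_affinity_mask([0, 2]))  # 5  (either cpu0 or cpu2)
```

The resulting string would be written to the smp_affinity file of the IRQ belonging to each NIC; the IRQ numbers can be looked up in /proc/interrupts.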




>
>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>


