From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Linux Network Development list <netdev@vger.kernel.org>
Subject: Re: weird problem
Date: Fri, 26 Jun 2009 19:45:57 +0200
Message-ID: <4A450955.1010806@itcare.pl>
In-Reply-To: <4A44A098.8080006@cosmosbay.com>
Eric Dumazet wrote:
> Jarek Poplawski wrote:
>
>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
>>
>>> On 25-06-2009 22:18, Eric Dumazet wrote:
>>>
>>>> Paweł Staszewski wrote:
>>>>
>>>>> Ok
>>>>>
>>>>> After a day of observation I'm nearly 100% sure that this CPU load is
>>>>> caused by route cache flushes.
>>>>> When the route cache grows to its "net.ipv4.route.gc_thresh" size, or
>>>>> gets near that size, the system starts to drop some routes from the
>>>>> cache, and CPU load then increases from 2% to nearly 80%.
>>>>> After the cache is cleaned/flushed, CPU load is back to a normal 2%
>>>>> while the cache fills up again.
>>>>>
>>>>> Does anyone know how to resolve this?
>>>>> On kernels < 2.6.29 I don't see this; it all started after upgrading
>>>>> from 2.6.28 to 2.6.29. Then I tried 2.6.29.1, 2.6.29.3 and 2.6.30, and
>>>>> on all these kernels >= 2.6.29 the CPU load problem is the same.
>>>>>
>>>>> I can reduce these CPU fluctuations by changing the route cache /proc
>>>>> parameters, but the best result for my router was
>>>>>
>>>>> 15 s of 2% CPU
>>>>> followed by
>>>>> 15 s of 80% CPU
>>>>>
>>>>>
>>>>> Regards
>>>>> Pawel Staszewski
>>>>>
>>>> I believe these are known 2.6.29 regressions.
>>>>
>>>> The following two commits should correct the problem you have.
>>>>
>>>> Your best bet would be to try 2.6.31-rc1 and tell us whether this recent
>>>> kernel is OK on your machine.
>>>>
>>> Btw., the first of these commits is in 2.6.30, which according to
>>>
>> And the second as well.
>>
>>
>
> Thanks Jarek.
>
> Pawel made some errors in his reports in the fib thread, so I am not sure
> he really tried 2.6.30 and got the same oprofile results.
>
> rt_worker_func() taking 13% of cpu0 is alarming to me :)
> And oprofiled taking 21% of cpu0 and 34% of cpu6 seems odd too...
>
> Pawel, could you give us the output of:
>
> grep . /proc/sys/net/ipv4/route/*
> cat /proc/interrupts
>
> on your various kernels (before 2.6.29, 2.6.29, 2.6.30, ...)
>
> I suspect a change in hash table size and/or a change in interrupt affinities...
>
>
>
first machine:
Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600
dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:         43          0          0          1          1          2          0          0   IO-APIC-edge    timer
   1:          0          0          0          1          0          0          0          1   IO-APIC-edge    i8042
   9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi acpi
  14:          0          0          0          0          0          0          0          0   IO-APIC-edge    ide0
  15:          0          0          0          0          0          0          0          0   IO-APIC-edge    ide1
  29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge    eth0
  30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge    eth1
  31:          0          1          1          0          0          0          0          0   PCI-MSI-edge
  32:          0          0          0          0          0          0          2          0   PCI-MSI-edge
  33:          1          1          0          0          0          0          0          0   PCI-MSI-edge
  34:          0          0          0          1          0          1          0          0   PCI-MSI-edge
  35:          0          0          0          1          0          0          0          1   PCI-MSI-edge
  36:          0          0          0          0          1          0          0          1   PCI-MSI-edge
  37:          1          0          0          0          0          1          0          0   PCI-MSI-edge
  38:          0          0          1          0          1          0          0          0   PCI-MSI-edge
  39:          0          0          2          0          0          0          0          0   PCI-MSI-edge
  40:          0          0          0          0          0          0          2          0   PCI-MSI-edge
  41:          0          2          0          0          0          0          0          0   PCI-MSI-edge
  42:          0          0          0          0          0          2          0          0   PCI-MSI-edge
  43:          0          0          0          2          0          0          0          0   PCI-MSI-edge
  44:          0          0          0          0          0          0          0          2   PCI-MSI-edge
  45:          2          0          0          0          0          0          0          0   PCI-MSI-edge
  46:          0          0          0          0          2          0          0          0   PCI-MSI-edge
  48:        233        200        185        257        256        260        269        257   PCI-MSI-edge    ahci
  49:          0          1          1          0          0          2          1          0   PCI-MSI-edge    ioat-msi
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC:    1191321   26059516   25803111      64841      32718      26651      54058      24166   Local timer interrupts
 RES:        921         59         58         20         14          8         10         13   Rescheduling interrupts
 CAL:         20         85         88         87         90         90         91         86   Function call interrupts
 TLB:        103        116        937        954         95        115       1006       1020   TLB shootdowns
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 ERR:          0
 MIS:          0
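Eric suspects a change in interrupt affinities, so it helps to see at a glance which CPU services each NIC. A minimal Python sketch, assuming the usual /proc/interrupts layout (a CPU header row, then one row per IRQ with per-CPU counts, the controller type and an optional device name). The sample rows are copied from the first machine's table above; parse_interrupts() is a hypothetical helper, not a kernel or library API.

```python
# Sketch: summarise which CPU services each IRQ in /proc/interrupts output.
# SAMPLE holds the header and the two NIC rows from the first machine above.

SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge    eth0
  30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge    eth1
"""


def parse_interrupts(text):
    """Return {irq_number: (device_name, per_cpu_counts)} for numeric IRQ rows."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # one header token per CPU column
    table = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].rstrip(":").isdigit():
            continue  # skip NMI, LOC, RES and other summary rows
        irq = int(fields[0].rstrip(":"))
        counts = [int(x) for x in fields[1:1 + ncpus]]
        rest = fields[1 + ncpus:]  # controller type, then device name if any
        name = rest[-1] if len(rest) >= 2 else f"irq{irq}"
        table[irq] = (name, counts)
    return table


for irq, (name, counts) in parse_interrupts(SAMPLE).items():
    total = sum(counts)
    if total == 0:
        continue  # nothing to attribute
    busiest = max(range(len(counts)), key=counts.__getitem__)
    print(f"IRQ {irq} ({name}): CPU{busiest} took "
          f"{counts[busiest] / total:.1%} of {total} interrupts")
```

On these rows it attributes eth0 mostly to CPU1 and eth1 almost entirely to CPU2.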
second machine:
Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686 Intel(R) Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux
cat /proc/interrupts
CPU0 CPU1
0: 182 129 IO-APIC-edge timer
1: 1886 1672 IO-APIC-edge i8042
6: 1 1 IO-APIC-edge floppy
9: 0 0 IO-APIC-fasteoi acpi
12: 2 2 IO-APIC-edge i8042
14: 0 0 IO-APIC-edge ide0
15: 0 0 IO-APIC-edge ide1
27: 41793 26401 PCI-MSI-edge ahci
28: 13482 11260 PCI-MSI-edge eth2
29: 3 1326457765 PCI-MSI-edge eth1
30: 1240943198 137973134 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 1607938599 1514565603 Local timer interrupts
SPU: 0 0 Spurious interrupts
RES: 1098 1190 Rescheduling interrupts
CAL: 28 105 Function call interrupts
TLB: 2886 3055 TLB shootdowns
ERR: 0
MIS: 0
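Here eth1 (IRQ 29) lands on CPU1 and eth0 (IRQ 30) mostly on CPU0. If an affinity did need to be changed, the kernel accepts a hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity. A small sketch that only builds that string (writing the file needs root and is left out); affinity_mask() is a hypothetical helper, and the plain-hex form assumes at most 32 CPUs, since larger masks are written as comma-separated 32-bit groups.

```python
# Sketch: build the hex CPU bitmask accepted by /proc/irq/<N>/smp_affinity.
# affinity_mask() is a hypothetical helper; it does not touch /proc itself.


def affinity_mask(cpus):
    """Return the hex bitmask (no 0x prefix) selecting the given CPU numbers."""
    cpus = list(cpus)
    if not cpus:
        raise ValueError("at least one CPU is required")
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        mask |= 1 << cpu  # set bit N for CPU N
    return format(mask, "x")


# Masks matching the second machine's current placement:
print("eth1 (IRQ 29):", affinity_mask([1]))  # CPU1 only
print("eth0 (IRQ 30):", affinity_mask([0]))  # CPU0 only
print("both CPUs:    ", affinity_mask([0, 1]))
```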
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:1524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600
dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
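The two machines' dumps can be compared mechanically. A minimal Python sketch over a subset of the keys shown above: it reports which settings differ and how gc_thresh relates to the 262144-bucket hash table that both machines print at boot. The helper names are hypothetical, and reading the ratio as an average chain length assumes entries are spread evenly over the buckets.

```python
# Sketch: compare two `grep . /proc/sys/net/ipv4/route/*` dumps and relate
# gc_thresh to the boot-time hash table size. Values are a subset copied
# from the two machines' output above.
import re

FIRST = """\
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:524288
"""

SECOND = """\
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:1524288
"""

HASH_LINE = "IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)"


def parse_dump(text):
    """Map each sysctl file name (e.g. 'gc_thresh') to its integer value."""
    values = {}
    for line in text.splitlines():
        path, _, value = line.rpartition(":")
        values[path.rsplit("/", 1)[-1]] = int(value)
    return values


def diff_dumps(a, b):
    """Return {name: (value_in_a, value_in_b)} for every key that differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


def hash_buckets(line):
    """Extract the bucket count from the boot-time hash table message."""
    m = re.search(r"entries: (\d+)", line)
    if m is None:
        raise ValueError("unrecognised line: " + line)
    return int(m.group(1))


first, second = parse_dump(FIRST), parse_dump(SECOND)
print("differences:", diff_dumps(first, second))

buckets = hash_buckets(HASH_LINE)
# Average entries per bucket once the cache holds gc_thresh entries.
print(f"gc_thresh / buckets = {first['gc_thresh'] / buckets:.3f}")
```

Of the keys included, only max_size differs, and gc_thresh works out to roughly 0.73 entries per bucket.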
rtstat -k entries -i 1 -c 10
rt_cache|
entries|
112754|
112446|
112277|
111451|
111042|
110314|
109153|
108370|
107730|
107478|
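The rtstat samples can be turned into a rate. A short sketch, assuming one sample per second as requested with -i 1, and using gc_thresh from the second machine's dump above; per_interval_deltas() is a hypothetical helper.

```python
# Sketch: summarise `rtstat -k entries -i 1 -c 10` samples.
# Samples and gc_thresh are copied from the second machine's output above.

SAMPLES = [112754, 112446, 112277, 111451, 111042,
           110314, 109153, 108370, 107730, 107478]
GC_THRESH = 190536
INTERVAL_S = 1  # matches the -i 1 option used above


def per_interval_deltas(samples):
    """Change in cache entries between consecutive samples."""
    return [b - a for a, b in zip(samples, samples[1:])]


deltas = per_interval_deltas(SAMPLES)
avg_rate = (SAMPLES[-1] - SAMPLES[0]) / (INTERVAL_S * (len(SAMPLES) - 1))
print("deltas:", deltas)
print(f"average change: {avg_rate:.1f} entries/s")
print(f"peak occupancy: {max(SAMPLES) / GC_THRESH:.1%} of gc_thresh")
```

Over these ten samples the cache shrinks by 5276 entries, about 586 per second, and peaks at roughly 59% of gc_thresh.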
> The change in hash table size comes from commit c9503e0fe052020e0294cd07d0ecd982eb7c9177.
>
> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe
> his hash table is smaller than 512k entries!
>
> Author: Anton Blanchard <anton@samba.org>
> Date: Mon Apr 27 05:42:24 2009 -0700
>
> ipv4: Limit size of route cache hash table
>
> Right now we have no upper limit on the size of the route cache hash table.
> On a 128GB POWER6 box it ends up as 32MB:
>
> IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
>
> It would be nice to cap this for memory consumption reasons, but a massive
> hashtable also causes a significant spike when measuring OS jitter.
>
> With a 32MB hashtable and 4 million entries, rt_worker_func is taking
> 5 ms to complete. On another system with more memory it's taking 14 ms.
> Even though rt_worker_func does call cond_resched() to limit its impact,
> in an HPC environment we want to keep all sources of OS jitter to a minimum.
>
> With the patch applied we limit the number of entries to 512k, which
> can still be overridden by using the rhash_entries boot option:
>
> IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
>
> With this patch rt_worker_func now takes 0.460 ms on the same system.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Acked-by: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>