0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache

netdev.vger.kernel.org archive mirror

* 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-26 16:58 UTC (permalink / raw)
  To: netdev

Hi,

I am experiencing CPU usage spikes of up to 10% on each of the four CPUs
after the first kernel route cache flush.
For the first 20 minutes after the machine starts, CPU usage is 0%si at
300 Mbit/s full duplex and around 100k pps of forwarded traffic, without
any firewall, just plain routing.

route -n |wc -l
34

ip route show cache | wc -l
2140842

cat /proc/sys/net/ipv4/route/secret_interval
600
cat /proc/sys/net/ipv4/route/max_size
33554432

After the kernel flush I get 10% on all CPUs for 5-6 minutes. It is not
from rebuilding the route cache, because after a fresh boot or a network
restart there is no CPU usage until the kernel flushes the route cache.
I tried rhash_entries= values from 300000 to 2000000, with the same result.

Network Card: Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection
CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
RAM: 4G
Kernel: vanilla 2.6.32.4

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-27 15:26 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

If you have one million dst entries to flush, it takes some time.

You could try not increasing rhash_entries
(or keeping it low, say 131072)
and tuning the /proc/sys/net/ipv4/route settings instead.

Try to reduce gc_elasticity from 8 to 2
Try to reduce gc_interval from 60 to 1

An important thing to consider is IRQ affinities (so that one CPU
handles network interrupts, to minimize cache ping-pongs).
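
For illustration, the suggestions above could be written down roughly as follows. This is a minimal sketch, not a tested recipe: it only prints the commands (dry run), the helper names are made up for this example, and IRQ 45 is a placeholder (the real IRQ numbers of the NIC queues are listed in /proc/interrupts). It applies to kernels that still have the IPv4 route cache, such as the 2.6.32 discussed here; the cache was removed in Linux 3.6.

```shell
#!/bin/sh
# Dry run: print the tuning commands instead of executing them,
# since changing these settings requires root on the router.

# Hex CPU mask for /proc/irq/<N>/smp_affinity with only bit <cpu> set.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Route cache GC settings suggested in this message.
route_gc_cmds() {
    echo "sysctl -w net.ipv4.route.gc_elasticity=2"
    echo "sysctl -w net.ipv4.route.gc_interval=1"
}

# Command that pins one IRQ to one CPU.
# usage: irq_pin_cmd IRQ CPU
irq_pin_cmd() {
    echo "echo $(cpu_mask "$2") > /proc/irq/$1/smp_affinity"
}

route_gc_cmds
irq_pin_cmd 45 0   # hypothetical IRQ 45 pinned to CPU0
```

Once the printed commands look right, they can be run by hand as root. Note that a running irqbalance daemon may rewrite the affinity masks.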




* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-27 19:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

By the way, at the moment I have 1k IPs on that router but plan to put
10k on it, so the route cache will grow 10 times faster.
I tried keeping rhash_entries low, but then the garbage collector runs
non-stop and eats CPU.
At the moment I get around 7k new entries per second for the first minute.

I think it would be better to flush the cache on the secret interval
instead of giving the work to the GC.
I tested with a 512MB cache: CPU is at 0% and flushing the entire hash
does not take long.
I'm not sure whether flushing has any side effects.

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-27 21:18 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

Flush is immediate: it only marks entries as invalid, and they are
cleaned up later.

A 512 MB cache is quite small for your needs: each entry uses 384 bytes
(assuming you use a 64-bit kernel).

In my experiments, I found that using the GC (and no flushing) was the
most reliable way to reach an equilibrium.

When gc_interval is set to 1, the garbage collector fires every second
and handles 1/300 of the entries from a work queue (so it does not stop
IRQs from handling packets), in a smooth way.

You can post "perf top" results to check where cpu is consumed.
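
To put rough numbers on the figures above, here is a small back-of-the-envelope sketch. It assumes that "512 MB" means 512 MiB of memory available for cache entries (the thread does not say exactly what the 512MB refers to) and it takes the 384 bytes per entry and the 1/300 per GC run at face value. The function names are made up for this example.

```shell
#!/bin/sh
ENTRY_BYTES=384   # per-entry size quoted above (64-bit kernel)

# Number of entries that fit in a given number of bytes.
entries_in() {
    echo $(( $1 / $ENTRY_BYTES ))
}

# Memory needed for a given number of entries, in MiB (rounded down).
mib_for() {
    echo $(( $1 * $ENTRY_BYTES / 1048576 ))
}

# Entries handled per GC run if each run covers 1/300 of the cache.
per_gc_run() {
    echo $(( $1 / 300 ))
}

entries_in $((512 * 1024 * 1024))   # prints 1398101
mib_for 2140842                     # prints 783
per_gc_run 2140842                  # prints 7136
```

Under these assumptions, the 2140842 cached entries reported at the start of the thread need about 783 MiB, more than 512 MiB holds, and a one-second GC run would walk roughly 7136 of them.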



* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-28  9:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

RX Kpps : 57 TX Kpps : 53  RX Kbits : 331184 TX Kbits : 306213
RX Kpps : 59 TX Kpps : 54  RX Kbits : 345517 TX Kbits : 304323
RX Kpps : 56 TX Kpps : 52  RX Kbits : 331418 TX Kbits : 296032
RX Kpps : 60 TX Kpps : 54  RX Kbits : 362007 TX Kbits : 297371
RX Kpps : 59 TX Kpps : 52  RX Kbits : 360455 TX Kbits : 280603


On one CPU, with gc_interval set to 1 and gc_elasticity set to 2:

Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 72.0%id,  0.0%wa,  8.3%hi, 19.7%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

------------------------------------------------------------------------------
   PerfTop:   17064 irqs/sec  kernel:98.0% [100000 cycles],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            40388.00 - 27.8% : acpi_idle_do_entry
            24651.00 - 17.0% : read_hpet
             4271.00 -  2.9% : _spin_lock
             3388.00 -  2.3% : pskb_expand_head
             3288.00 -  2.3% : igb_poll [igb]
             3246.00 -  2.2% : irq_entries_start
             2868.00 -  2.0% : dev_gro_receive
             2665.00 -  1.8% : igb_xmit_frame_adv       [igb]
             2513.00 -  1.7% : ip_route_input
             2144.00 -  1.5% : igb_clean_tx_irq [igb]
             1842.00 -  1.3% : __slab_free
             1544.00 -  1.1% : dev_queue_xmit
             1423.00 -  1.0% : igb_msix_rx      [igb]
             1353.00 -  0.9% : __alloc_skb
             1285.00 -  0.9% : eth_type_trans

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-28 16:09 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

All this looks like a pretty normal profile (as far as the networking
functions are concerned); your machine should scale without problems.

Of course, the first two functions (acpi_idle_do_entry() and read_hpet())
look suspicious, but I have no idea why.
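
One way to quantify this is to add up the pcnt column of the posted "perf top" output. The sketch below is only an illustration: the helper name `pcnt_sum` is made up, and the profile text is copied from the previous message, with the line shape assumed to be `<samples> - <pcnt>% : <symbol>`.

```shell
#!/bin/sh
# Sum the pcnt column for symbols matching a regex, reading
# "perf top" text from stdin.
pcnt_sum() {
    awk -v pat="$1" '$2 == "-" && $5 ~ pat { s += $3 + 0 }
                     END { printf "%.1f\n", s }'
}

profile='40388.00 - 27.8% : acpi_idle_do_entry
24651.00 - 17.0% : read_hpet
 4271.00 -  2.9% : _spin_lock
 3388.00 -  2.3% : pskb_expand_head
 3288.00 -  2.3% : igb_poll [igb]
 3246.00 -  2.2% : irq_entries_start
 2868.00 -  2.0% : dev_gro_receive
 2665.00 -  1.8% : igb_xmit_frame_adv       [igb]
 2513.00 -  1.7% : ip_route_input
 2144.00 -  1.5% : igb_clean_tx_irq [igb]
 1842.00 -  1.3% : __slab_free
 1544.00 -  1.1% : dev_queue_xmit
 1423.00 -  1.0% : igb_msix_rx      [igb]
 1353.00 -  0.9% : __alloc_skb
 1285.00 -  0.9% : eth_type_trans'

# Idle loop plus HPET reads.
printf '%s\n' "$profile" | pcnt_sum '^(acpi_idle_do_entry|read_hpet)$'   # prints 44.8
# All listed rows.
printf '%s\n' "$profile" | pcnt_sum '.'                                  # prints 66.7
```

So the two suspicious symbols account for 44.8 of the 66.7 percentage points shown, and the remaining 13 rows for about 21.9 points.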




* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-28 16:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

I ran the test with flushing and without GC, on 2 CPUs: twice the
traffic and about 5 times less CPU usage.

top - 11:22:04 up  6:45,  5 users,  load average: 0.00, 0.10, 0.25
Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.3%sy,  0.0%ni, 94.0%id,  0.0%wa,  1.7%hi,  4.0%si,  0.0%st
Cpu1  :  0.3%us,  0.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.7%hi,  1.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st


RX Kpps : 90 TX Kpps : 75  RX Kbits : 582791 TX Kbits : 413873
RX Kpps : 87 TX Kpps : 74  RX Kbits : 546327 TX Kbits : 415852
RX Kpps : 87 TX Kpps : 74  RX Kbits : 544820 TX Kbits : 418339
RX Kpps : 88 TX Kpps : 73  RX Kbits : 569143 TX Kbits : 406438

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-28 17:06 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

On Thursday 28 January 2010 at 18:26 +0200, cold cold wrote:
> I ran the test with flushing and without GC, on 2 CPUs: twice the
> traffic and about 5 times less CPU usage.
> 
> top - 11:22:04 up  6:45,  5 users,  load average: 0.00, 0.10, 0.25
> Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.0%us,  0.3%sy,  0.0%ni, 94.0%id,  0.0%wa,  1.7%hi,  4.0%si,  0.0%st
> Cpu1  :  0.3%us,  0.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.7%hi,  1.3%si,  0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> 

Pure luck I guess, since you use two CPUs here instead of one, but
nothing guarantees this.

> 
> RX Kpps : 90 TX Kpps : 75  RX Kbits : 582791 TX Kbits : 413873
> RX Kpps : 87 TX Kpps : 74  RX Kbits : 546327 TX Kbits : 415852
> RX Kpps : 87 TX Kpps : 74  RX Kbits : 544820 TX Kbits : 418339
> RX Kpps : 88 TX Kpps : 73  RX Kbits : 569143 TX Kbits : 406438
> --

Not sure I understand your goals. Were the previous numbers taken with
less traffic?

What do you want? Your CPUs being idle, or your router being able to
forward packets without drops?

Good, but if you drop packets when the real garbage collection is done,
you lose. Make your choice :)




* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-28 18:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

What do you mean by dropped packets?

I tested two different things and am sharing the results with you.
The first test, with high CPU usage, was with garbage collection enabled.
The second set of results shows CPU usage with garbage collection
totally disabled.

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-28 21:16 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

Well, you didn't describe your benchmark method.

1) Your results were on different rx/tx workloads, and describing your
workload is very important to be able to compare results. It should then
be exactly the same workload.

For example, when rx/tx load is high enough, less CPU overhead is spent
on IRQ processing, since each IRQ delivers more packets per round.

2) You didn't send "perf top" results for the second/last one.

  But the first "perf top" results showed that less than 1% of CPU time
was used by cache cleanup. I guess you don't want to focus on this, since
it's already very good.

Usually, when we want to bench a router, we study how it deals with a
DDoS workload: feed a lot of packets to the device and study what
percentage of them are actually transmitted. The goal is 100% of
legitimate packets, of course.

Route cache settings matter in DDoS situations, and the flush operation
can have a big impact on dropped frames because of CPU/RAM congestion.

Because of the 600-second oscillations, it's pretty hard to study the
exact CPU use of a router, unless you take samples over long periods.
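
As a rough illustration of sampling over long periods, the sketch below computes the softirq share between two snapshots of a per-CPU line from /proc/stat. The helper name `softirq_pct` is made up for this example, and the two input lines are synthetic, not measurements from this thread.

```shell
#!/bin/sh
# usage: softirq_pct BEFORE AFTER
# BEFORE and AFTER are "cpuN ..." lines from /proc/stat taken some time
# apart. Prints the share of elapsed CPU time spent in softirq, in percent.
softirq_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2..9: user nice system idle iowait irq softirq steal
            total[NR] = 0
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            softirq[NR] = $8
        }
        END {
            elapsed = total[2] - total[1]
            if (elapsed > 0)
                printf "%.1f\n", 100 * (softirq[2] - softirq[1]) / elapsed
            else
                print "0.0"
        }'
}

# Synthetic example: 1000 ticks elapsed, 197 of them in softirq.
softirq_pct "cpu0 0 0 0 1000 0 0 0 0" "cpu0 0 0 0 1720 0 83 197 0"   # prints 19.7
```

On a real router the two arguments could come from `grep '^cpu0 ' /proc/stat`, taken at fixed intervals over several secret_interval periods so that the 600-second oscillation averages out.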

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: cold cold @ 2010-01-29  7:38 UTC (permalink / raw)
  To: netdev

I totally agree with you: there must be some scheduler to release the
route cache over time.
So far I see that the garbage collector does this, but it costs a lot of
CPU. It is probably a bug or a design problem, I don't know.
That is why I ran these 2 tests, to compare CPU usage with and without GC.


/ secret_interval 10 min, 1300000 route entries in the cache after 10 min,
 7k new routes on an empty cache, 2k at 1300000 entries /
All route cache parameters are at their defaults. I also tried changing
gc_elasticity from 8 to 2 and gc_interval from 60 to 1, but it did not
make much difference.

What I'm trying to say is that flushing the cache is almost instant (less
than a second on 1 CPU), so releasing the cache is not such a heavy job
(you are right that it can have a big impact on dropped frames because of
CPU/RAM congestion). But my point is: why does the GC need 5-6 minutes at
10% on 4 CPUs to do the same job?

* Re: 0% CPU usage after fresh boot or net restart but 10% CPU if kernel flushes route cache
From: Eric Dumazet @ 2010-01-29  9:06 UTC (permalink / raw)
  To: cold cold; +Cc: netdev

Once again, 'flushing the cache' is immediate: it only increments a
global variable (aka a generation number).

Then, later, when IP routing hits an entry with an old generation
number, this entry is discarded. This slows down processing, and your
router might drop packets for 5 to 60 seconds while stale entries
are eliminated.

This defers the real cost of 'flush cache' and spreads it out smoothly,
depending on the traffic you have.
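
The idea can be sketched in a few lines. This is a toy model in shell, not the kernel code: the names (`genid`, `cache_put`, `cache_flush`, `cache_get`) are made up for this example, and the cache is a plain list rather than a hash table. It only shows why the flush itself is cheap while the cleanup cost is paid later, on lookup.

```shell
#!/bin/sh
genid=0     # global generation number
cache=""    # one "dst generation nexthop" record per line

# Insert an entry stamped with the current generation.
cache_put() {
    cache="${cache}$1 $genid $2
"
}

# "Flush": O(1), no entry is touched, every entry just becomes stale.
cache_flush() {
    genid=$((genid + 1))
}

# Lookup: returns 0 and sets $nexthop on a hit. An entry stamped with
# an old generation is dropped here and reported as a miss (returns 1).
cache_get() {
    nexthop=""; kept=""; found=1
    while read -r dst gen hop; do
        [ -n "$dst" ] || continue
        if [ "$dst" = "$1" ]; then
            if [ "$gen" = "$genid" ]; then
                nexthop=$hop; found=0
            else
                continue    # stale: discard lazily
            fi
        fi
        kept="${kept}${dst} ${gen} ${hop}
"
    done <<EOF
$cache
EOF
    cache=$kept
    return $found
}

cache_put 192.0.2.1 eth1
cache_get 192.0.2.1 && echo "hit: $nexthop"
cache_flush
cache_get 192.0.2.1 || echo "miss: stale entry dropped on lookup"
```

In this model the work of freeing entries is spread over the lookups that follow the flush, which matches the "delayed cost" described above.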

Releasing 1,300,000 dst entries is expensive, no matter how you trigger
the release, because it has to go through RCU queueing, spinlocks,
kernel memory allocator logic, and touch a lot of memory.

In your previous "perf top" results, we saw that most kernel CPU cycles
were consumed outside of the network stack; you might investigate why.

Using HPET timekeeping is probably not very good for your machine...
