* atomic operations bottleneck in the IPv6 stack
@ 2014-12-10 16:56 cristian.bercaru
From: cristian.bercaru @ 2014-12-10 16:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: R89243@freescale.com, Madalin-Cristian Bucur,
	Razvan.Ungureanu@freescale.com


Hello!

I am running IPv6 forwarding test cases and I get worse performance with 24 cores than with 16 cores.

Test scenario:
10G --->[T4240]---> 10G
- platform: Freescale T4240, powerpc, 24 x e6500 64-bit cores (I can disable 8 of them from uboot)
- input type: raw IPv6 78-byte packets
- input rate: 10Gbps
- forwarding/output rate: 16 cores - 3.3 Gbps; 24 cores - 2.4 Gbps

Doing a perf record with "-C 1 -c 10000000 -a sleep 120" I observe
- on 16 cores
# Overhead      Command      Shared Object       Symbol
    19.59%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    18.07%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     5.09%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    34.98%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    31.86%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     3.76%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
     2.72%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           

I de-inlined 'atomic_dec_return' and 'atomic_inc', which are used by 'ip6_pol_route' and 'dst_release' (a sketch of the wrappers follows the numbers below), and I get
- on 16 cores
    17.26%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    13.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     5.53%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     5.02%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    32.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    30.56%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     4.71%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     3.57%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
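
For reference, the de-inlining was just a pair of profiling-only out-of-line wrappers along these lines (the names and placement are mine, not an existing kernel API; the corresponding call sites on the forwarding path were switched to the wrappers so the atomic ops show up as separate symbols in perf):

/* profiling-only wrappers, e.g. dropped into net/core/dst.c;
 * noinline keeps them as distinct symbols in the perf output */
#include <linux/atomic.h>
#include <linux/compiler.h>

noinline int atomic_dec_return_noinline(atomic_t *v)
{
	return atomic_dec_return(v);
}

noinline void atomic_inc_noinline(atomic_t *v)
{
	atomic_inc(v);
}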

It seems to me that the atomic operations on the IPv6 forwarding path are a bottleneck and that they do not scale with the number of cores. Am I right? What improvements can be brought to the IPv6 kernel code to make it less dependent on atomic operations/variables?
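
To illustrate what I mean, the per-packet pattern boils down to roughly the following (a simplified paraphrase of include/net/dst.h and net/core/dst.c as I read them, not a verbatim copy): every core takes and then drops a reference on the same cached route, so all CPUs keep bouncing the same __refcnt cacheline.

/* simplified sketch of the per-packet refcount traffic on the
 * forwarding path: one atomic_inc at route lookup, one
 * atomic_dec_return when the skb is freed after transmit */
static inline void dst_hold(struct dst_entry *dst)
{
	atomic_inc(&dst->__refcnt);	/* reached from ip6_pol_route() */
}

void dst_release(struct dst_entry *dst)
{
	if (dst) {
		int newrefcnt = atomic_dec_return(&dst->__refcnt);
		WARN_ON(newrefcnt < 0);
	}
}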

Thank you,
Cristian Bercaru
