* atomic operations bottleneck in the IPv6 stack
@ 2014-12-10 16:56 cristian.bercaru
From: cristian.bercaru @ 2014-12-10 16:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: R89243@freescale.com, Madalin-Cristian Bucur,
	Razvan.Ungureanu@freescale.com


Hello!

I am running IPv6 forwarding test cases and I get worse performance with 24 cores than with 16 cores.

Test scenario:
10G --->[T4240]---> 10G
- platform: Freescale T4240, powerpc, 24 x e6500 64-bit cores (I can disable 8 of them from uboot)
- input type: raw IPv6 78-byte packets
- input rate: 10Gbps
- forwarding/output rate: 16 cores - 3.3 Gbps; 24 cores - 2.4 Gbps

Doing a perf record with "-C 1 -c 10000000 -a sleep 120" I observe
- on 16 cores
# Overhead      Command      Shared Object       Symbol
    19.59%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    18.07%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     5.09%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    34.98%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    31.86%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     3.76%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
     2.72%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           

I de-inlined 'atomic_dec_return' and 'atomic_inc', which are used by 'ip6_pol_route' and 'dst_release' (a sketch of the wrappers follows the numbers below), and I get
- on 16 cores
    17.26%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    13.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     5.53%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     5.02%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    32.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    30.56%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     4.71%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     3.57%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
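
For reference, the de-inlining was just a pair of profiling-only out-of-line wrappers along these lines (the names and placement are mine, not an existing kernel API; the corresponding call sites on the forwarding path were switched to the wrappers so the atomic ops show up as separate symbols in perf):

/* profiling-only wrappers, e.g. dropped into net/core/dst.c;
 * noinline keeps them as distinct symbols in the perf output */
#include <linux/atomic.h>
#include <linux/compiler.h>

noinline int atomic_dec_return_noinline(atomic_t *v)
{
	return atomic_dec_return(v);
}

noinline void atomic_inc_noinline(atomic_t *v)
{
	atomic_inc(v);
}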

It seems to me that the atomic operations on the IPv6 forwarding path are a bottleneck and that they do not scale with the number of cores. Am I right? What improvements can be brought to the IPv6 kernel code to make it less dependent on atomic operations/variables?
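
To illustrate what I mean, the per-packet pattern boils down to roughly the following (a simplified paraphrase of include/net/dst.h and net/core/dst.c as I read them, not a verbatim copy): every core takes and then drops a reference on the same cached route, so all CPUs keep bouncing the same __refcnt cacheline.

/* simplified sketch of the per-packet refcount traffic on the
 * forwarding path: one atomic_inc at route lookup, one
 * atomic_dec_return when the skb is freed after transmit */
static inline void dst_hold(struct dst_entry *dst)
{
	atomic_inc(&dst->__refcnt);	/* reached from ip6_pol_route() */
}

void dst_release(struct dst_entry *dst)
{
	if (dst) {
		int newrefcnt = atomic_dec_return(&dst->__refcnt);
		WARN_ON(newrefcnt < 0);
	}
}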

Thank you,
Cristian Bercaru
