* vethpair creation performance, 3.14 versus 4.2.0
From: Rick Jones @ 2015-08-31 19:48 UTC
  To: Raghavendra K T, netdev

On 08/29/2015 10:59 PM, Raghavendra K T wrote:
 > Please note that similar overhead was also reported while creating
 > veth pairs  https://lkml.org/lkml/2013/3/19/556


That got me curious, so I took the veth pair creation script from there, 
and started running it out to 10K pairs, comparing a 3.14.44 kernel with 
a 4.2.0-rc4+ from net-next and then net-next after pulling to get the 
snmp stat aggregation perf change (4.2.0-rc8+).

Indeed, the 4.2.0-rc8+ kernel with the change was faster than the 
4.2.0-rc4+ kernel without it, but both were slower than the 3.14.44 kernel.

I've put a spreadsheet with the results at:

ftp://ftp.netperf.org/vethpair/vethpair_compare.ods
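
For reference, the creation loop being timed amounts to something like
the following -- a minimal C paraphrase rather than the exact script
from the lkml.org link, so treat the details as assumptions.  It lets
the kernel pick the veth%d names (which is what makes __dev_alloc_name()
show up in the profile below), and it needs root plus iproute2:

/* toy_vethpairs.c: create N veth pairs with kernel-chosen names */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	int i, npairs = (argc > 1) ? atoi(argv[1]) : 10000;

	for (i = 0; i < npairs; i++) {
		/* no explicit names, so the kernel assigns veth%d to both ends */
		if (system("ip link add type veth") != 0) {
			fprintf(stderr, "pair %d failed\n", i);
			return 1;
		}
	}
	return 0;
}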

A perf top for the 4.2.0-rc8+ kernel from the net-next tree looks like
this at around 10K pairs:

    PerfTop:   11155 irqs/sec  kernel:94.2%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
-------------------------------------------------------------------------------

     23.44%  [kernel]       [k] vsscanf
      7.32%  [kernel]       [k] mutex_spin_on_owner.isra.4
      5.63%  [kernel]       [k] __memcpy
      5.27%  [kernel]       [k] __dev_alloc_name
      3.46%  [kernel]       [k] format_decode
      3.44%  [kernel]       [k] vsnprintf
      3.16%  [kernel]       [k] acpi_os_write_port
      2.71%  [kernel]       [k] number.isra.13
      1.50%  [kernel]       [k] strncmp
      1.21%  [kernel]       [k] _parse_integer
      0.93%  [kernel]       [k] filemap_map_pages
      0.82%  [kernel]       [k] put_dec_trunc8
      0.82%  [kernel]       [k] unmap_single_vma
      0.78%  [kernel]       [k] native_queued_spin_lock_slowpath
      0.71%  [kernel]       [k] menu_select
      0.65%  [kernel]       [k] clear_page
      0.64%  [kernel]       [k] _raw_spin_lock
      0.62%  [kernel]       [k] page_fault
      0.60%  [kernel]       [k] find_busiest_group
      0.53%  [kernel]       [k] snprintf
      0.52%  [kernel]       [k] int_sqrt
      0.46%  [kernel]       [k] simple_strtoull
      0.44%  [kernel]       [k] page_remove_rmap

My attempts to get a call-graph have been met with very limited success. 
  Even though I've installed the dbg package from "make deb-pkg" the 
symbol resolution doesn't seem to be working.

happy benchmarking,

rick jones


* Re: vethpair creation performance, 3.14 versus 4.2.0
From: David Ahern @ 2015-08-31 21:29 UTC
  To: Rick Jones, Raghavendra K T, netdev

On 8/31/15 1:48 PM, Rick Jones wrote:
> My attempts to get a call-graph have been met with very limited success.
>   Even though I've installed the dbg package from "make deb-pkg" the
> symbol resolution doesn't seem to be working.

Looks like Debian does not enable frame pointers by default:

$ grep FRAME /boot/config-3.2.0-4-amd64
...
# CONFIG_FRAME_POINTER is not set

Similar result for jessie.


* Re: vethpair creation performance, 3.14 versus 4.2.0
From: Rick Jones @ 2015-08-31 21:31 UTC
  To: David Ahern, Raghavendra K T, netdev

On 08/31/2015 02:29 PM, David Ahern wrote:
> On 8/31/15 1:48 PM, Rick Jones wrote:
>> My attempts to get a call-graph have been met with very limited success.
>>   Even though I've installed the dbg package from "make deb-pkg" the
>> symbol resolution doesn't seem to be working.
>
> Looks like Debian does not enable frame pointers by default:
>
> $ grep FRAME /boot/config-3.2.0-4-amd64
> ...
> # CONFIG_FRAME_POINTER is not set
>
> Similar result for jessie.

And indeed, my config file has a Debian lineage.

rick


* Re: vethpair creation performance, 3.14 versus 4.2.0
From: Eric Dumazet @ 2015-08-31 23:04 UTC
  To: Rick Jones; +Cc: Raghavendra K T, netdev

On Mon, 2015-08-31 at 12:48 -0700, Rick Jones wrote:
> On 08/29/2015 10:59 PM, Raghavendra K T wrote:
>  > Please note that similar overhead was also reported while creating
>  > veth pairs  https://lkml.org/lkml/2013/3/19/556
> 
> 
> That got me curious, so I took the veth pair creation script from there, 
> and started running it out to 10K pairs, comparing a 3.14.44 kernel with 
> a 4.2.0-rc4+ from net-next and then net-next after pulling to get the 
> snmp stat aggregation perf change (4.2.0-rc8+).
> 
> Indeed, the 4.2.0-rc8+ kernel with the change was faster than the 
> 4.2.0-rc4+ kernel without it, but both were slower than the 3.14.44 kernel.
> 
> I've put a spreadsheet with the results at:
> 
> ftp://ftp.netperf.org/vethpair/vethpair_compare.ods
> 
> A perf top for the 4.2.0-rc8+ kernel from the net-next tree looks like
> this at around 10K pairs:
> 
>     PerfTop:   11155 irqs/sec  kernel:94.2%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
> -------------------------------------------------------------------------------
> 
>      23.44%  [kernel]       [k] vsscanf
>       7.32%  [kernel]       [k] mutex_spin_on_owner.isra.4
>       5.63%  [kernel]       [k] __memcpy
>       5.27%  [kernel]       [k] __dev_alloc_name
>       3.46%  [kernel]       [k] format_decode
>       3.44%  [kernel]       [k] vsnprintf
>       3.16%  [kernel]       [k] acpi_os_write_port
>       2.71%  [kernel]       [k] number.isra.13
>       1.50%  [kernel]       [k] strncmp
>       1.21%  [kernel]       [k] _parse_integer
>       0.93%  [kernel]       [k] filemap_map_pages
>       0.82%  [kernel]       [k] put_dec_trunc8
>       0.82%  [kernel]       [k] unmap_single_vma
>       0.78%  [kernel]       [k] native_queued_spin_lock_slowpath
>       0.71%  [kernel]       [k] menu_select
>       0.65%  [kernel]       [k] clear_page
>       0.64%  [kernel]       [k] _raw_spin_lock
>       0.62%  [kernel]       [k] page_fault
>       0.60%  [kernel]       [k] find_busiest_group
>       0.53%  [kernel]       [k] snprintf
>       0.52%  [kernel]       [k] int_sqrt
>       0.46%  [kernel]       [k] simple_strtoull
>       0.44%  [kernel]       [k] page_remove_rmap
> 
> My attempts to get a call-graph have been met with very limited success. 
>   Even though I've installed the dbg package from "make deb-pkg" the 
> symbol resolution doesn't seem to be working.


Well, you do not need a call graph to spot the well-known issue with
__dev_alloc_name(), which has O(N) behavior.
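
Roughly what happens with a "veth%d" template: every allocation walks
all existing interfaces, sscanf()s the index out of each name into a
bitmap, then takes the first clear bit -- hence vsscanf at the top of
the profile.  A userspace paraphrase of that pattern (an illustration
only, not the kernel source):

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define MAX_IFS 32768

/* pick the lowest free index for a "veth%d"-style template */
static int alloc_name(char names[][16], int count, const char *fmt,
		      char *out, size_t outlen)
{
	static bool used[MAX_IFS];
	int i, idx;

	memset(used, 0, sizeof(used));
	for (i = 0; i < count; i++) {		/* the O(N) walk over all names */
		if (sscanf(names[i], fmt, &idx) == 1 &&
		    idx >= 0 && idx < MAX_IFS)
			used[idx] = true;
	}
	for (i = 0; i < MAX_IFS; i++) {		/* first clear slot wins */
		if (!used[i]) {
			snprintf(out, outlen, fmt, i);
			return i;
		}
	}
	return -1;
}

int main(void)
{
	static char names[MAX_IFS][16];
	char buf[16];
	int n, i;

	for (n = 0, i = 0; i < 5 && n < MAX_IFS; i++) {
		if (alloc_name(names, n, "veth%d", buf, sizeof(buf)) < 0)
			break;
		strcpy(names[n++], buf);
		printf("created %s\n", buf);
	}
	return 0;
}

Do that once per new pair and the whole 10K-pair run goes quadratic.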

If we really need to be fast here, and keep eth%d or veth%d names with
a guarantee of the lowest numbers, we would need an IDR.
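
Sketched out, that could look something like the following -- a sketch
only, not an actual patch: the per-prefix IDR, the helper names and the
hook points are all hypothetical, shown as a toy module so it compiles
against kernel headers.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/idr.h>
#include <linux/gfp.h>

static DEFINE_IDR(veth_idr);		/* hypothetical per-"veth%d" allocator */
static int veth_idr_token;		/* dummy payload, only the id matters */

/* grab the lowest free index for a new veth name */
static int veth_name_alloc(char *buf, size_t len)
{
	int id = idr_alloc(&veth_idr, &veth_idr_token, 0, 0, GFP_KERNEL);

	if (id < 0)
		return id;
	snprintf(buf, len, "veth%d", id);
	return id;
}

/* hand the index back when the device is torn down */
static void veth_name_free(int id)
{
	idr_remove(&veth_idr, id);
}

static int __init veth_idr_demo_init(void)
{
	char name[32];
	int id = veth_name_alloc(name, sizeof(name));

	pr_info("allocated %s (id %d)\n", id < 0 ? "<none>" : name, id);
	if (id >= 0)
		veth_name_free(id);
	return 0;
}

static void __exit veth_idr_demo_exit(void)
{
	idr_destroy(&veth_idr);
}

module_init(veth_idr_demo_init);
module_exit(veth_idr_demo_exit);
MODULE_LICENSE("GPL");

idr_alloc() hands back the lowest free id at or above the start value,
so the lowest-number guarantee is kept without a per-creation walk over
every netdev.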
