* v7_dma_inv_range performance/high expense
@ 2016-05-27 14:40 Andrew Lunn
  2016-05-27 14:58 ` Russell King - ARM Linux
  2016-05-27 16:14 ` Mark Rutland
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Lunn @ 2016-05-27 14:40 UTC
  To: linux-arm-kernel

Hi folks

I have an i.MX6Q, which is a quad-core ARMv7 processor. Attached to it
via PCIe I have an Intel i210 Ethernet controller.

When the Ethernet is transmitting, I get gigabit line rate while using
about 35% of one core. When receiving, I get only around 700Mbps and
ksoftirqd/0 loads one core to 98%.

Using perf to profile the ksoftirqd/0 pid, I see:

  46.38%  [kernel]  [k] v7_dma_inv_range
  21.25%  [kernel]  [k] l2c210_inv_range
  10.90%  [kernel]  [k] igb_poll
   1.69%  [kernel]  [k] dma_cache_maint_page
   1.27%  [kernel]  [k] eth_type_trans
   1.20%  [kernel]  [k] skb_add_rx_frag

Digging deeper into v7_dma_inv_range, I see:

           801182c0 <v7_dma_inv_range>:
           v7_dma_inv_range():
  0.26       mrc    15, 0, r3, cr0, cr0, {1}
  0.07       lsr    r3, r3, #16
             and    r3, r3, #15
  0.04       mov    r2, #4
             lsl    r2, r2, r3
  0.04       sub    r3, r2, #1
             tst    r0, r3
  0.02       bic    r0, r0, r3
  0.03       dsb    sy
  3.01       mcrne  15, 0, r0, cr7, cr14, {1}
  0.54       tst    r1, r3
             bic    r1, r1, r3
  0.08       mcrne  15, 0, r1, cr7, cr14, {1}
  3.82 34:   mcr    15, 0, r0, cr7, cr6, {1}
 88.32       add    r0, r0, r2
             cmp    r0, r1
  1.97       bcc    34
  0.43       dsb    st
  1.37       bx     lr

I'm assuming perf is off by one here, and the add is not taking 88.32%
of the load, rather it is the mcr instruction before it.
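
To make the structure of the listing easier to follow, here is a minimal
C model of what the disassembly appears to do: decode the D-cache line
size from the cache type register, then issue one invalidate per line.
It is only a sketch, not the kernel's code. The function names and the
CTR value used in the usage note are made up for illustration, and the
conditional clean+invalidate of partial lines at either end (the two
mcrne instructions) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the smallest D-cache line size, in bytes, from a CTR value.
 * Mirrors: lsr r3, r3, #16; and r3, r3, #15; mov r2, #4; lsl r2, r2, r3
 * (hypothetical helper name, not a kernel function)
 */
static uint32_t dcache_line_bytes(uint32_t ctr)
{
	uint32_t dminline = (ctr >> 16) & 0xf;	/* log2(words per line) */

	return 4u << dminline;			/* 4 bytes per word */
}

/*
 * Count how often the loop at label 34 issues the invalidate for the
 * range [start, end). Hypothetical helper, for illustration only.
 */
static uint32_t inv_loop_iterations(uint32_t start, uint32_t end,
				    uint32_t line)
{
	uint32_t mask, n = 0;

	assert(line != 0 && (line & (line - 1)) == 0);	/* power of two */
	mask = line - 1;

	start &= ~mask;			/* bic r0, r0, r3 */
	end &= ~mask;			/* bic r1, r1, r3 */
	do {
		n++;			/* mcr 15, 0, r0, cr7, cr6, {1} */
		start += line;		/* add r0, r0, r2 */
	} while (start < end);		/* cmp r0, r1; bcc 34 */

	return n;
}
```

With an illustrative CTR value of 0x00030000 the line size decodes to 32
bytes, and a line-aligned 2048-byte range then takes 64 passes through
the loop. So the mcr is issued once per cache line, and its total cost
scales with the size of the buffer being invalidated.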

The original code in arch/arm/mm/cache-v7.S says:

        mcr     p15, 0, r0, c7, c6, 1           @ invalidate D / U line

I don't get why a cache invalidate instruction should be so expensive.
It is just throwing away the contents of the cache line, not flushing
it out to DRAM. Should I trust perf? Is a cache invalidate really so
expensive? Or am I totally missing something here?

Thanks
	Andrew

