From: Michal Schmidt <mschmidt@redhat.com>
To: netdev@vger.kernel.org
Cc: Jerry Chu <hkchu@google.com>, Eric Dumazet <edumazet@google.com>,
"David S. Miller" <davem@davemloft.net>
Subject: significant napi_gro_frags() slowdown
Date: Thu, 27 Mar 2014 17:39:11 +0100
Message-ID: <5334542F.1050702@redhat.com>
Hello,
I received a report about a performance regression on a simple workload
of receiving a single TCP stream using a be2net NIC. Previously the
reporter was able to reach more than 9 Gb/s, but now he sees less than 7
Gb/s.
On my system the performance loss was not so obvious, but profiling with
perf showed significant time spent in copying memory in
__pskb_pull_tail(). Here is an example perf histogram from 3.14-rc8:
-  14.03%  swapper    [kernel.kallsyms]  [k] memcpy
   - memcpy
      - 99.99% __pskb_pull_tail
         - 46.16% tcp_gro_receive
              tcp4_gro_receive
              inet_gro_receive
              dev_gro_receive
              napi_gro_frags
              be_process_rx
              be_poll
              net_rx_action
              __do_softirq
            + irq_exit
         - 31.18% napi_gro_frags
              be_process_rx
              be_poll
              net_rx_action
              __do_softirq
              irq_exit
              do_IRQ
            + ret_from_intr
         - 22.66% inet_gro_receive
              dev_gro_receive
              napi_gro_frags
              be_process_rx
              be_poll
              net_rx_action
              __do_softirq
              irq_exit
              do_IRQ
            + ret_from_intr
-  13.44%  netserver  [kernel.kallsyms]  [k] copy_user_generic_string
   - copy_user_generic_string
      - skb_copy_datagram_iovec
         + 56.70% skb_copy_datagram_iovec
         + 32.27% tcp_recvmsg
         + 11.03% tcp_rcv_established
-   5.64%  swapper    [kernel.kallsyms]  [k] __pskb_pull_tail
   - __pskb_pull_tail
      + 48.58% tcp_gro_receive
      + 24.48% inet_gro_receive
      + 24.11% napi_gro_frags
      +  1.28% tcp4_gro_receive
      +  0.91% be_process_rx
      +  0.64% dev_gro_receive
+   5.13%  swapper    [kernel.kallsyms]  [k] skb_copy_bits
...
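For reference, a profile like the one above can be captured roughly as
follows. This is only a sketch: the exact flags depend on the perf
version, and recording needs root, so the commands are guarded.

```shell
# Sketch of capturing a system-wide profile with call graphs.
# Guarded so it is harmless to run on a machine without perf/root.
if command -v perf >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    perf record -a -g -- sleep 10      # sample all CPUs for 10s, with call chains
    perf report --stdio | head -n 40   # text histogram like the one above
    status="profiled"
else
    status="skipped (needs root and the perf tool)"
fi
echo "perf capture: $status"
```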
Bisection identified this first bad commit:
commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36
Author: Jerry Chu <hkchu@google.com>
Date:   Wed Dec 11 20:53:45 2013 -0800

    net-gro: Prepare GRO stack for the upcoming tunneling support
Before this commit, the GRO code could access the received packets'
headers quickly via NAPI_GRO_CB(skb)->frag0. After the commit this
optimization no longer applies: napi_frags_skb() now always pulls the
Ethernet header out of the first frag, so the .gro_receive functions
take their slow paths and pull the remaining headers with
__pskb_pull_tail().
To demonstrate the call chains, here are function graphs recorded with
ftrace.
Before the commit:
 13)               |  napi_gro_frags() {
 13)   0.038 us    |    skb_gro_reset_offset();
 13)               |    dev_gro_receive() {
 13)               |      inet_gro_receive() {
 13)               |        tcp4_gro_receive() {
 13)               |          tcp_gro_receive() {
 13)   0.059 us    |            skb_gro_receive();
 13)   0.385 us    |          }
 13)   0.675 us    |        }
 13)   1.012 us    |      }
 13)   1.336 us    |    }
 13)   0.040 us    |    napi_reuse_skb.isra.57();
 13)   2.235 us    |  }
After the commit:
  7)               |  napi_gro_frags() {
  7)               |    __pskb_pull_tail() {
  7)   0.204 us    |      skb_copy_bits();
  7)   0.551 us    |    }
  7)   0.046 us    |    eth_type_trans();
  7)               |    dev_gro_receive() {
  7)               |      inet_gro_receive() {
  7)               |        __pskb_pull_tail() {
  7)   0.095 us    |          skb_copy_bits();
  7)   0.412 us    |        }
  7)               |        tcp4_gro_receive() {
  7)               |          tcp_gro_receive() {
  7)               |            __pskb_pull_tail() {
  7)   0.095 us    |              skb_copy_bits();
  7)   0.410 us    |            }
  7)               |            __pskb_pull_tail() {
  7)   0.095 us    |              skb_copy_bits();
  7)   0.412 us    |            }
  7)   0.055 us    |            skb_gro_receive();
  7)   1.771 us    |          }
  7)   2.077 us    |        }
  7)   3.078 us    |      }
  7)   3.403 us    |    }
  7)   0.043 us    |    napi_reuse_skb.isra.72();
  7)   5.152 us    |  }
Here are typical times spent in napi_gro_frags(), measured using ftrace
and making napi_gro_frags() a leaf function (thus avoiding the tracing
overhead in child functions).
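The leaf-function measurement can be reproduced roughly like this. A
sketch only: it assumes tracefs is mounted at
/sys/kernel/debug/tracing and needs root, so it is guarded.

```shell
# Sketch: measure napi_gro_frags() alone, so child-function tracing
# overhead is excluded from its reported time.
T=/sys/kernel/debug/tracing
if [ -w "$T/current_tracer" ]; then
    echo napi_gro_frags > "$T/set_ftrace_filter"   # trace only this function...
    echo function_graph > "$T/current_tracer"      # ...so it shows up as a leaf
    sleep 1                                        # let some traffic arrive
    head -n 20 "$T/trace"
    echo nop > "$T/current_tracer"                 # stop tracing
    result="traced"
else
    result="skipped (needs root and tracefs)"
fi
echo "ftrace: $result"
```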
Before the commit:
...
12) 0.130 us | napi_gro_frags();
12) 0.129 us | napi_gro_frags();
12) 0.132 us | napi_gro_frags();
12) 0.128 us | napi_gro_frags();
12) 0.126 us | napi_gro_frags();
12) 0.129 us | napi_gro_frags();
12) 1.423 us | napi_gro_frags();
...
After the commit:
...
20) 0.812 us | napi_gro_frags();
20) 0.820 us | napi_gro_frags();
20) 0.811 us | napi_gro_frags();
20) 0.812 us | napi_gro_frags();
20) 0.814 us | napi_gro_frags();
20) 0.819 us | napi_gro_frags();
20) 1.526 us | napi_gro_frags();
...
This should affect not just be2net, but all drivers that use
napi_gro_frags().
Any suggestions on how to restore the frag0 optimization?
Thanks,
Michal
Thread overview: 10+ messages
2014-03-27 16:39 Michal Schmidt [this message]
2014-03-27 16:47 ` significant napi_gro_frags() slowdown Eric Dumazet
2014-03-27 17:05 ` Michal Schmidt
2014-03-27 17:21 ` Eric Dumazet
2014-03-30 4:28 ` [PATCH net-next] net-gro: restore frag0 optimization Eric Dumazet
2014-03-31 20:27 ` David Miller
2014-03-31 21:01 ` Eric Dumazet
2014-03-31 21:19 ` Eric Dumazet
2014-04-01 16:40 ` Michal Schmidt
2014-04-01 17:16 ` Eric Dumazet