All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com>
To: Tom Herbert <tom@herbertland.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Tom Herbert <therbert@google.com>, Jiri Benc <jbenc@redhat.com>,
	James Morris <jmorris@namei.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	pradeeps@linux.vnet.ibm.com, jkidambi@us.ibm.com
Subject: Re: [PATCH] - vxlan: gro not effective for intel 82599
Date: Thu, 25 Jun 2015 18:06:08 -0700	[thread overview]
Message-ID: <0b2eff60824ac7b7d3a672da9be9bf99@imap.linux.ibm.com> (raw)
In-Reply-To: <CALx6S36JQaPg9Otg4pv5ah16JGaEXzo1xDL4jXKBdziUTVvXmw@mail.gmail.com>

On 2015-06-25 17:20, Tom Herbert wrote:
> On Thu, Jun 25, 2015 at 5:03 PM, Ramu Ramamurthy
> <sramamur@linux.vnet.ibm.com> wrote:
>> Problem:
>> -------
>> 
>> GRO is enabled on the interfaces in the following test,
>> but GRO does not take effect for vxlan-encapsulated tcp streams. The 
>> root
>> cause of why GRO does not take effect is described below.
>> 
>> VM nic (mtu 1450)---bridge---vxlan----10Gb nic (intel 82599ES)-----|
>> VM nic (mtu 1450)---bridge---vxlan----10Gb nic (intel 82599ES)-----|
>> 
>> Because gro is not effective, the throughput for vxlan-encapsulated
>> tcp-stream is around 3 Gbps.
>> 
>> With the proposed patch, gro takes effect for vxlan-encapsulated tcp
>> streams,
>> and performance in the same test is around 8.6 Gbps.
>> 
>> 
>> Root Cause:
>> ----------
>> 
>> 
>> At entry to udp4_gro_receive(), the gro parameters are set as follows:
>> 
>>     skb->ip_summed  == 0 (CHECKSUM_NONE)
>>     NAPI_GRO_CB(skb)->csum_cnt == 0
>>     NAPI_GRO_CB(skb)->csum_valid == 0
>> 
>>     UDH header checksum is 0.
>> 
>> static struct sk_buff **udp4_gro_receive(struct sk_buff **head,
>>                                          struct sk_buff *skb)
>> {
>> 
>>          <snip>
>> 
>>         if (skb_gro_checksum_validate_zero_check(skb, IPPROTO_UDP,
>> uh->check,
>>                                                  
>> inet_gro_compute_pseudo))
>> 
>>>>>             This calls __skb_incr_checksum_unnecessary which sets
>>>>>                     skb->ip_summed to  CHECKSUM_UNNECESSARY
>>>>> 
>> 
>>                 goto flush;
>>         else if (uh->check)
>>                 skb_gro_checksum_try_convert(skb, IPPROTO_UDP, 
>> uh->check,
>>                                              inet_gro_compute_pseudo);
>> skip:
>>         NAPI_GRO_CB(skb)->is_ipv6 = 0;
>>         return udp_gro_receive(head, skb, uh);
>> 
>> }
>> 
>> struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff 
>> *skb,
>>                                  struct udphdr *uh)
>> {
>>         struct udp_offload_priv *uo_priv;
>>         struct sk_buff *p, **pp = NULL;
>>         struct udphdr *uh2;
>>         unsigned int off = skb_gro_offset(skb);
>>         int flush = 1;
>> 
>>         if (NAPI_GRO_CB(skb)->udp_mark ||
>>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>              !NAPI_GRO_CB(skb)->csum_valid))
>>                 goto out;
>>>>> 
>>>>> 
>>>>>      vxlan GRO gets skipped due to the above condition because 
>>>>> here,:
>>>>>          skb->ip_summed == CHECKSUM_UNNECESSARY
>>>>>          NAPI_GRO_CB(skb)->csum_cnt == 0
>>>>>          NAPI_GRO_CB(skb)->csum_valid == 0
>> 
>> 
>> There is no reason for skipping vxlan gro in the above combination of
>> conditions,
>> because, tcp4_gro_receive() validates the inner tcp checksum anyway !
>> 
>> 
>> Patch:
>> ------
>> 
>> Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
>> ---
>>  net/ipv4/udp_offload.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>> 
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index f938616..17fc12b 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -301,6 +301,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff 
>> **head,
>> struct sk_buff *skb,
>> 
>>         if (NAPI_GRO_CB(skb)->udp_mark ||
>>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>> +            skb->ip_summed != CHECKSUM_UNNECESSARY &&
>>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>              !NAPI_GRO_CB(skb)->csum_valid))
>>                 goto out;
>> --
> 
> This isn't right. The CHECKSUM_UNNECESSARY only refers to the outer
> checksum which is zero in this case so it is trivially unnecessary.
> The inner checksum still needs to be computed on the host. By
> convention, we do not do GRO if it is required to compute the inner
> checksum (csum_cnt == 0 checks that). If we want to allow checksum
> calculation to occur in the GRO path, meaning we understand the
> ramifications and can show this is better for performance, then all
> the checks about checksum here should be removed.
> 

Isnt the inner checksum computed on the gro-path from tcp4_gro_receive() 
as follows ?
This trace is from my testbed.

In my tests, I consistently get 8.5-9 Gbps with vxlan gro (inspite of
the added sw inner checksumming), whereas without vxlan GRO  the 
performance
drops down to 3Gbps or so. So, a significant performance benefit can be 
gained
on intel 10G nics which are widely deployed. Hence the interest in 
pursuing this or a modified patch.

      vxlan_gro_receive <-udp4_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420280: __pskb_pull_tail 
<-vxlan_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420280: skb_copy_bits 
<-__pskb_pull_tail
      ksoftirqd/1-94    [001] ..s. 11421.420280: __pskb_pull_tail 
<-vxlan_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: skb_copy_bits 
<-__pskb_pull_tail
      ksoftirqd/1-94    [001] ..s. 11421.420281: gro_find_receive_by_type 
<-vxlan_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: inet_gro_receive 
<-vxlan_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: __pskb_pull_tail 
<-inet_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: skb_copy_bits 
<-__pskb_pull_tail
      ksoftirqd/1-94    [001] ..s. 11421.420281: tcp4_gro_receive 
<-inet_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: 
__skb_gro_checksum_complete <-tcp4_gro_receive
      ksoftirqd/1-94    [001] ..s. 11421.420281: skb_checksum 
<-__skb_gro_checksum_complete
      ksoftirqd/1-94    [001] ..s. 11421.420281: __skb_checksum 
<-skb_checksum
      ksoftirqd/1-94    [001] ..s1 11421.420281: csum_partial 
<-csum_partial_ext
      ksoftirqd/1-94    [001] ..s1 11421.420281: do_csum <-csum_partial



>> 1.7.1
>> 
>> 
>> 
>> 
>> 
>> Notes:
>> -------
>> 
>> The above gro fix applies to all udp-encapsulation protocols (vxlan, 
>> geneve)
>> 
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2015-06-26  1:06 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-26  0:03 [PATCH] - vxlan: gro not effective for intel 82599 Ramu Ramamurthy
2015-06-26  0:20 ` Tom Herbert
2015-06-26  1:06   ` Ramu Ramamurthy [this message]
2015-06-26  2:57     ` Tom Herbert
2015-06-26  5:15       ` Eric Dumazet
2015-06-26 17:24         ` Tom Herbert
2015-06-26 17:36       ` Ramu Ramamurthy
2015-06-26 18:04         ` Tom Herbert
2015-06-26 19:31           ` Ramu Ramamurthy
2015-06-26 19:59             ` Tom Herbert
2015-06-26 21:44               ` Ramu Ramamurthy
2015-06-28 20:19               ` Or Gerlitz
2015-06-28 21:17                 ` Tom Herbert
2015-06-29 19:56                   ` Ramu Ramamurthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0b2eff60824ac7b7d3a672da9be9bf99@imap.linux.ibm.com \
    --to=sramamur@linux.vnet.ibm.com \
    --cc=davem@davemloft.net \
    --cc=jbenc@redhat.com \
    --cc=jkidambi@us.ibm.com \
    --cc=jmorris@namei.org \
    --cc=netdev@vger.kernel.org \
    --cc=pradeeps@linux.vnet.ibm.com \
    --cc=therbert@google.com \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.