References: <1548214428-114642-1-git-send-email-maowenan@huawei.com>
 <038cf844-ca2f-8db2-7ed6-fee39cbcfc7d@huawei.com>
 <17c94d54-0a0e-21e7-1a8d-5e84b2f7a07d@huawei.com>
 <57903a5f-5033-4da9-8807-0b6f894827a9@huawei.com>
In-Reply-To: <57903a5f-5033-4da9-8807-0b6f894827a9@huawei.com>
From: Tom Herbert
Date: Wed, 30 Jan 2019 20:33:22 -0800
Subject: Re: [PATCH net-next] net: udp Allow CHECKSUM_UNNECESSARY packets to do GRO.
To: maowenan
Cc: Linux Kernel Network Developers, "David S.
 Miller", Eric Dumazet
X-Mailing-List: netdev@vger.kernel.org

On Wed, Jan 30, 2019 at 6:59 PM maowenan wrote:
>
>
>
> On 2019/1/31 10:43, Tom Herbert wrote:
> > On Wed, Jan 30, 2019 at 5:58 PM maowenan wrote:
> >>
> >>
> >>
> >> On 2019/1/30 4:24, Tom Herbert wrote:
> >>> On Tue, Jan 29, 2019 at 12:08 AM maowenan wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2019/1/29 14:24, Tom Herbert wrote:
> >>>>> On Mon, Jan 28, 2019 at 10:04 PM maowenan wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2019/1/29 12:01, Tom Herbert wrote:
> >>>>>>> On Mon, Jan 28, 2019 at 7:00 PM maowenan wrote:
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>> Do you have any comments about this change?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2019/1/23 11:33, Mao Wenan wrote:
> >>>>>>>>> When udp4_gro_receive() gets a packet with uh->check=0,
> >>>>>>>>> skb_gro_checksum_validate_zero_check() will set
> >>>>>>>>> skb->ip_summed = CHECKSUM_UNNECESSARY;
> >>>>>>>>> skb->csum_level = 0;
> >>>>>>>>> Then udp_gro_receive() will flush the packet because it is not CHECKSUM_PARTIAL.
> >>>>>>>>> This is not what we expect: check=0 in the UDP header indicates that this
> >>>>>>>>> packet does not need a checksum calculation, so we should go on to do GRO.
> >>>>>>>>>
> >>>>>>>>> This patch changes the value of csum_cnt according to skb->csum_level.
> >>>>>>>>> ---
> >>>>>>>>>  include/linux/netdevice.h | 1 +
> >>>>>>>>>  1 file changed, 1 insertion(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>>>>> index 1377d08..9c819f1 100644
> >>>>>>>>> --- a/include/linux/netdevice.h
> >>>>>>>>> +++ b/include/linux/netdevice.h
> >>>>>>>>> @@ -2764,6 +2764,7 @@ static inline void skb_gro_incr_csum_unnecessary(struct sk_buff *skb)
> >>>>>>>>>                 * during GRO. This saves work if we fallback to normal path.
> >>>>>>>>>                 */
> >>>>>>>>>                __skb_incr_checksum_unnecessary(skb);
> >>>>>>>>> +              NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
> >>>>>>>
> >>>>>>> That doesn't look right. This would be reinitializing the GRO
> >>>>>>> checksums from the beginning.
> >>>>>>>
> >>>>>>>>>        }
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>> I assume the code is bailing on this conditional:
> >>>>>>>
> >>>>>>> if (NAPI_GRO_CB(skb)->encap_mark ||
> >>>>>>>     (skb->ip_summed != CHECKSUM_PARTIAL &&
> >>>>>>>      NAPI_GRO_CB(skb)->csum_cnt == 0 &&
> >>>>>>>      !NAPI_GRO_CB(skb)->csum_valid) ||
> >>>>>>>     !udp_sk(sk)->gro_receive)
> >>>>>>>         goto out_unlock;
> >>>>>>>
> >>>>>>> I am trying to remember why this needs to check csum_cnt. If there was
> >>>>>>> a csum_cnt for the UDP csum being zero from checksum-unnecessary, it
> >>>>>>> was consumed by skb_gro_checksum_validate_zero_check in UDP4 GRO
> >>>>>>> receive.
> >>>>>>
> >>>>>> We hit this case with VXLAN packets between two VMs on different hosts. udp4_gro_receive receives
> >>>>>> a packet with ip_summed=CHECKSUM_NONE, csum_cnt=0, csum_valid=0, and udp->check=0. Then skb_gro_checksum_validate_zero_check()->
> >>>>>> skb_gro_incr_csum_unnecessary() validates it and sets ip_summed=CHECKSUM_UNNECESSARY, csum_level=0, but csum_cnt and csum_valid
> >>>>>> stay zero. The packet is then flushed in udp_gro_receive() by the code you showed.
> >>>>>>
> >>>>>> So I think it forgets to update csum_cnt when csum_level is changed in skb_gro_incr_csum_unnecessary()->__skb_incr_checksum_unnecessary().
> >>>>>>
> >>>>> Yes, but the csum_level is changing since we've gone beyond the
> >>>>> checksums initially reported in checksum-unnecessary. GRO csum_cnt is
> >>>>> initialized to skb->csum_level + 1 at the start of GRO processing.
> >>>>>
> >>>>> If I recall, the rule is that UDP GRO requires at least one non-zero
> >>>>> checksum to be verified.
> >>>>> The idea is that if we end up computing
> >>>>> packet checksums on the host for inner checksums like TCP during GRO,
> >>>>> then that's negating the performance benefits of GRO. Had UDP check
> >>>>> not been zero then we would do checksum unnecessary conversion and so
> >>>>> csum_valid would be set for the remainder of GRO processing. The
> >>>>> existing code is following the rule I believe, so this may be working
> >>>>> as intended.
> >>>>
> >>>> Do you have any suggestion for how to do GRO when udp->check is zero?
> >>>> My previous modification, which works fine, is below:
> >>>> if (NAPI_GRO_CB(skb)->encap_mark ||
> >>>>     (skb->ip_summed != CHECKSUM_PARTIAL &&
> >>>> +    skb->ip_summed != CHECKSUM_UNNECESSARY &&
> >>>
> >>> That's effectively disabling the rule that we need a real checksum
> >>> calculation to proceed with GRO. Besides that, the device returning
> >>> one checksum-unnecessary level because UDP csum is zero is pretty
> >>> pointless; we can just as easily get to the same state just by
> >>> looking at the field with CHECKSUM_NONE. What we really want to see
> >>> for GRO is a real checksum computation being done on the packet.
> >>>
> >>> A few questions:
> >>>
> >>> What type of packets are being GROed? Are these TCP? What performance
> >>> difference do you see with your patch? Can you try enabling UDP
> >>> checksums, and even RCO with VXLAN? With UDP encapsulation we
> >>> generally see better performance with checksum enabled since UDP
> >>> checksum offload is ubiquitous and we can easily convert
> >>> checksum-unnecessary (with non-zero csum) to checksum-complete.
> >>
> >> We use the physical network card to calculate the checksum of the inner packet with checksum offload.
> >> The UDP checksum of the VXLAN header is set to 0.
> >>
> > I see.
> > It sounds like the device is really verifying two checksums in
> > the packet, the outer UDP checksum (which is zero for UDP) and an
> > inner checksum, but only reporting one checksum was verified. The
> > driver needs to set csum_level to 1 in this case (meaning two
> > checksums have been verified for checksum-unnecessary). What NIC are
> > you using?
>
> Currently it is 82599, whose driver can't recognize the VXLAN packet.
> I guess many NICs can't do this check in their drivers, so I think this is a
> common case. Will we fix it in the stack?
>

Mao,

The problem isn't in the stack, it seems to be in the driver. If the
device reports a verified checksum for an encapsulated packet then the
driver needs to set csum_level to 1. Otherwise, the stack can't just
assume that the inner checksum was verified. *How* a driver deduces
that the device is reporting about an encapsulated checksum is
specific to the device and its driver. I'm not sure which driver
you're running, but if you search the code there should be something
like "skb->csum_level = 1" that would be a clue about support. A good
example is ixgbe: if the device reports checksum verified and that
packet was VXLAN, it deduces that the inner checksum was verified and
so the driver sets CHECKSUM_UNNECESSARY and skb->csum_level. Of course
all this complexity goes away when devices just provide
checksum-complete.

Tom

> >
> > Tom
> >
> >
> >> With this patch, the bandwidth of TCP between two VMs increases from 2Gbit/s to 6Gbit/s.
> >>
> >>>
> >>> Tom
> >>>
> >>>
> >>>
> >>>
> >>>>      NAPI_GRO_CB(skb)->csum_cnt == 0 &&
> >>>>      !NAPI_GRO_CB(skb)->csum_valid) ||
> >>>>     !udp_sk(sk)->gro_receive)
> >>>>         goto out_unlock;
> >>>>
> >>>>
> >>>>>
> >>>>> Tom
> >>>>>
> >>>>>>>
> >>>>>>> .
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>
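
For anyone following the thread, the interaction between csum_level, csum_cnt and the conditional in udp_gro_receive() can be modelled in userspace. This is only a rough sketch under stated assumptions, not kernel code: every model_* name is hypothetical, MODEL_MAX_CSUM_LEVEL is assumed to mirror SKB_MAX_CSUM_LEVEL, and the init step assumes csum_cnt starts at csum_level + 1 only for checksum-unnecessary packets and at 0 otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model. The model_* names are illustrative and
 * only mirror the kernel fields named in the thread. Not kernel code. */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,
	MODEL_CHECKSUM_UNNECESSARY,
	MODEL_CHECKSUM_COMPLETE,
	MODEL_CHECKSUM_PARTIAL
};

#define MODEL_MAX_CSUM_LEVEL 3	/* assumption: mirrors SKB_MAX_CSUM_LEVEL */

struct model_skb {
	enum model_ip_summed ip_summed;
	unsigned int csum_level;	/* checksums verified beyond the first */
	unsigned int csum_cnt;		/* GRO: device-verified checksums left */
	bool csum_valid;
	bool encap_mark;
};

/* Start of GRO: csum_cnt = csum_level + 1 for checksum-unnecessary. */
static void model_gro_init(struct model_skb *skb)
{
	skb->csum_cnt = (skb->ip_summed == MODEL_CHECKSUM_UNNECESSARY)
			? skb->csum_level + 1 : 0;
	skb->csum_valid = false;
	skb->encap_mark = false;
}

/* A zero UDP checksum either consumes one device-reported checksum or,
 * if none is left, bumps the skb state without touching csum_cnt. */
static void model_udp_zero_csum(struct model_skb *skb)
{
	if (skb->csum_cnt > 0) {
		skb->csum_cnt--;
	} else if (skb->ip_summed == MODEL_CHECKSUM_UNNECESSARY) {
		if (skb->csum_level < MODEL_MAX_CSUM_LEVEL)
			skb->csum_level++;
	} else if (skb->ip_summed == MODEL_CHECKSUM_NONE) {
		skb->ip_summed = MODEL_CHECKSUM_UNNECESSARY;
		skb->csum_level = 0;
	}
}

/* The conditional quoted in the thread; true means "goto out_unlock". */
static bool model_udp_gro_bails(const struct model_skb *skb,
				bool has_gro_receive)
{
	return skb->encap_mark ||
	       (skb->ip_summed != MODEL_CHECKSUM_PARTIAL &&
		skb->csum_cnt == 0 &&
		!skb->csum_valid) ||
	       !has_gro_receive;
}

/* Run one zero-checksum UDP packet through the model. */
static bool model_zero_csum_udp_bails(enum model_ip_summed ip_summed,
				      unsigned int csum_level)
{
	struct model_skb skb = { .ip_summed = ip_summed,
				 .csum_level = csum_level };

	assert(csum_level <= MODEL_MAX_CSUM_LEVEL);
	model_gro_init(&skb);
	model_udp_zero_csum(&skb);
	return model_udp_gro_bails(&skb, true);
}
```

In this model a zero-checksum UDP packet bails out of GRO when the driver reports CHECKSUM_NONE, and also when it reports CHECKSUM_UNNECESSARY with csum_level 0, because the single reported checksum is consumed by the zero UDP checksum. With csum_level set to 1 by the driver, one verified checksum remains after the UDP header and the packet proceeds.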