From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f46.google.com ([209.85.160.46]:35141 "EHLO mail-pl0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751454AbeCPXDV (ORCPT ); Fri, 16 Mar 2018 19:03:21 -0400 Received: by mail-pl0-f46.google.com with SMTP id p9-v6so4199169pls.2 for ; Fri, 16 Mar 2018 16:03:21 -0700 (PDT) Subject: Re: BUG_ON triggered in skb_segment To: Yonghong Song , Alexei Starovoitov , Steffen Klassert , Daniel Borkmann Cc: netdev , Martin Lau , saeedm@mellanox.com, diptanu@fb.com References: <9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com> <875f59f2-d1ec-c47c-cdd7-2ce4985c5143@gmail.com> <20180313084444.5hyu7kteswh6v5li@gauss3.secunet.de> <06bef27e-60f6-0fb4-e926-cb7ff3666a3c@fb.com> <4c0d599e-95a3-be7d-bfeb-e51a16648cc8@gmail.com> <39a56b2c-6ea8-02f7-9e9c-968b16b9aa7a@gmail.com> <50906fd5-9834-d41a-c132-86f64be38606@gmail.com> <84dc788d-eedd-acc2-64dc-d51e909bb663@gmail.com> <85e72428-5ac4-6727-ab92-828286e5c789@fb.com> From: Eric Dumazet Message-ID: Date: Fri, 16 Mar 2018 16:03:18 -0700 MIME-Version: 1.0 In-Reply-To: <85e72428-5ac4-6727-ab92-828286e5c789@fb.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: netdev-owner@vger.kernel.org List-ID: On 03/16/2018 03:37 PM, Yonghong Song wrote: > > Eric and Daniel, > > I have tried to fix this issue but not really successful. > I tried two hacks: >   . if skb_headlen(list_skb) is not 0, we just pull >     skb_headlen(list_skb) from the skb to make skb_headlen(list_skb) = 0, or >   . if skb_headlen(list_skb) is not 0, we go to the beginning of >     the outer loop which will allocate another nskb for this list_skb. > > Both approaches removed the BUG and the packet is able to reach the > remote host. Upon receiving the packet, however, the remote host sends a > reset packet back so connection eventually closed. I did not debug > further on this. > > Considering it is tricky to change skb_segment, I hacked test_bpf > kernel module to reproduce the issue. The change reflects the gso packet > structure I got from mlx5. Maybe you could take a look and suggest a fix or a direction of how to move forward. > > Thanks! > > ============= PATCH  =============== > > -bash-4.2$ git show > commit 41681ab51f85b4a0ea3416a0a62d6bde74f3af4b > Author: Yonghong Song > Date:   Fri Mar 16 15:10:02 2018 -0700 > >     [hack] hack test_bpf module to trigger BUG_ON in skb_segment. > >     "modprobe test_bpf" will have the following errors: >     ... >     [   98.149165] ------------[ cut here ]------------ >     [   98.159362] kernel BUG at net/core/skbuff.c:3667! >     [   98.169756] invalid opcode: 0000 [#1] SMP PTI >     [   98.179370] Modules linked in: >     [   98.179371]  test_bpf(+) >     ... > >     The BUG happens in function skb_segment: >     ... >     3665                 while (pos < offset + len) { >     3666                         if (i >= nfrags) { >     3667                                 BUG_ON(skb_headlen(list_skb)); Note that you do not need to pull data. Chances are high that skb->head is a page fragment, so can be considered as such ;) Look at users of skb_head_is_locked() >     3668 >     3669                                 i = 0; >     3670                                 nfrags = skb_shinfo(list_skb)->nr_frags; >     3671                                 frag = skb_shinfo(list_skb)->frags; >     3672                                 frag_skb = list_skb; >     3673 >     3674                                 BUG_ON(!nfrags); >     ... > >     The skbs are constructed to mimic what mlx5 may generate. >     The packet size/header may not mimic real cases in production. But >     the processing flow is similar. > >     Signed-off-by: Yonghong Song > > diff --git a/lib/test_bpf.c b/lib/test_bpf.c > index 2efb213..d36a991 100644 > --- a/lib/test_bpf.c > +++ b/lib/test_bpf.c > @@ -6574,6 +6574,67 @@ static bool exclude_test(int test_id) >         return test_id < test_range[0] || test_id > test_range[1]; >  } > > +static struct sk_buff *build_test_skb(void) { > +       u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > +       u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; > +       struct sk_buff *skb[2]; > +       void *data[2], *page; > +       int i, data_size = 8; > + > +       page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); > +       if (!page) > +               return NULL; > + > +       for (i = 0; i < 2; i++) { > +               data[i] = kzalloc(headroom + tailroom + data_size, GFP_KERNEL); > +               if (!data[i]) > +                       return NULL; > +               skb[i] = build_skb(data[i], 0); > +               if (!skb[i]) { > +                       kfree(data[i]); > +                       return NULL; > +               } > +               skb_reserve(skb[i], headroom); > +               skb_put(skb[i], data_size); > +               skb[i]->protocol = htons(ETH_P_IP); > +               skb_reset_network_header(skb[i]); > +               skb_set_mac_header(skb[i], -ETH_HLEN); > + > +               skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags, > +                        page, 0, 64, 64); > +       } > + > +       /* setup shinfo */ > +       skb_shinfo(skb[0])->gso_size = 1448; > +       skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; > +       skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; > +       skb_shinfo(skb[0])->gso_segs = 0; > +       skb_shinfo(skb[0])->frag_list = skb[1]; > + > +       /* adjust skb[0]'s len */ > +       skb[0]->len += skb[1]->len; > +       skb[0]->data_len += skb[1]->data_len; > +       skb[0]->truesize += skb[1]->truesize; > + > +       return skb[0]; > +} > + > +static void test_skb_segment(void) { > +       netdev_features_t features; > +       struct sk_buff *skb; > + > +       features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; > +       features |= NETIF_F_RXCSUM; > +       skb = build_test_skb(); > +       if (!skb) > +               pr_info("Failed in test_skb_segment:build_test_skb!"); > + > +       if (skb_segment(skb, features)) > +               pr_info("Success in test_skb_segment!"); > +       else > +               pr_info("Failed in test_skb_segment!"); > +} > + >  static __init int test_bpf(void) >  { >         int i, err_cnt = 0, pass_cnt = 0; > @@ -6631,7 +6692,8 @@ static int __init test_bpf_init(void) >         if (ret < 0) >                 return ret; > > -       ret = test_bpf(); > +       // ret = test_bpf(); > +       test_skb_segment(); > >         destroy_bpf_tests(); >         return ret; > > > =============  END ================ > > > On 3/13/18 6:15 PM, Eric Dumazet wrote: >> >> >> On 03/13/2018 05:35 PM, Eric Dumazet wrote: >>> >>> >>> On 03/13/2018 05:26 PM, Eric Dumazet wrote: >>>> >>>> >>>> On 03/13/2018 05:04 PM, Alexei Starovoitov wrote: >>>>> On 3/13/18 4:27 PM, Eric Dumazet wrote: >>>>>> >>>>>> >>>>>> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote: >>>>>> >>>>>>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time. >>>>>>> It's not clear why it's not crashing there, but we cannot just >>>>>>> reject changing proto in bpf programs now. >>>>>>> We have to fix whatever needs to be fixed in skb_segment >>>>>>> (if bug is there) or fix whatever necessary on mlx5 side. >>>>>>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming >>>>>>> through virtio would do, so if skb_segment() needs to do something >>>>>>> special with skb the SKB_GSO_DODGY flag is already there. >>>>>> >>>>>> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was >>>>>> not happy with the fix and provided something else. >>>>> >>>>> any link to your old patches and discussion? >>>>> >>>>> I think since mlx4 can do tso on them and the packets came out >>>>> correct on the wire, there is nothing fundamentally wrong with >>>>> changing gso_size. Just tricky to teach skb_segment. >>>>> >>>> >>>> The world is not mlx4 only. Some NIC will ask skb_segment() fallback segmentation for various reasons (like skb->len above a given limit like 16KB) >>>> >>>> Maybe https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255549.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=0Nxx2G8PtDAEMCFAvQ7kxYTXVr9aHdOolP1KB_lnmes&e= >>> >>> >>> Herbert patch : >>> >>> commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3 >>> Author: Herbert Xu >>> Date:   Thu Nov 21 11:10:04 2013 -0800 >>> >>>      gso: handle new frag_list of frags GRO packets >>> >> >> I found my initial patch. >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255452.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=VuWRpUdJwBwTxpnMNZYgKvQANLL5UA7hZnTFZsQlK6c&e= >> >>