* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Dmitry Vyukov @ 2019-08-18 14:13 UTC (permalink / raw)
To: Xin Long
Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CADvbK_e+em+LiQOttfA9nckA4EPFuW_Q-cBmXx3S5pw5X+tQfw@mail.gmail.com>
On Sun, Aug 18, 2019 at 7:07 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Sat, Aug 17, 2019 at 2:38 AM syzbot
> <syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 459c5fb4 Merge branch 'mscc-PTP-support'
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/skbuff.h:2225!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> > RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> > RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> > Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> > 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> > 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> > RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> > RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> > RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> > RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> > R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> > R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> > FS: 0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> > Call Trace:
> > sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
> > sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
> > sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
> > sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
> > sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
> > ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> > ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> > NF_HOOK include/linux/netfilter.h:305 [inline]
> > NF_HOOK include/linux/netfilter.h:299 [inline]
> > ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> > dst_input include/net/dst.h:442 [inline]
> > ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
> Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
> as does in ip_sublist_rcv_finish().
This was recently introduced, right? Only in net-next and linux-next.
Otherwise, is it a remote DoS? If so and if it's present in any
releases, may need a CVE.
^ permalink raw reply
* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Xin Long @ 2019-08-18 14:06 UTC (permalink / raw)
To: syzbot
Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <0000000000008182a50590404a02@google.com>
On Sat, Aug 17, 2019 at 2:38 AM syzbot
<syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 459c5fb4 Merge branch 'mscc-PTP-support'
> git tree: net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> kernel config: https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> kernel BUG at include/linux/skbuff.h:2225!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS: 0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> Call Trace:
> sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
> sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
> sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
> sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
> sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
> ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> NF_HOOK include/linux/netfilter.h:305 [inline]
> NF_HOOK include/linux/netfilter.h:299 [inline]
> ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> dst_input include/net/dst.h:442 [inline]
> ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
as does in ip_sublist_rcv_finish().
> ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
> ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
> ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
> __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
> __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
> __netif_receive_skb_list net/core/dev.c:5149 [inline]
> netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
> gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
> gro_normal_list net/core/dev.c:5755 [inline]
> gro_normal_one net/core/dev.c:5769 [inline]
> napi_frags_finish net/core/dev.c:5782 [inline]
> napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
> tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
> tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
> call_write_iter include/linux/fs.h:1870 [inline]
> do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
> do_iter_write fs/read_write.c:970 [inline]
> do_iter_write+0x184/0x610 fs/read_write.c:951
> vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
> do_writev+0x15b/0x330 fs/read_write.c:1058
> __do_sys_writev fs/read_write.c:1131 [inline]
> __se_sys_writev fs/read_write.c:1128 [inline]
> __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
> do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x441b10
> Code: 05 48 3d 01 f0 ff ff 0f 83 5d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> 00 66 90 83 3d 01 95 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 34 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
> RSP: 002b:00007ffe63706b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 00007ffe63706ba0 RCX: 0000000000441b10
> RDX: 0000000000000001 RSI: 00007ffe63706bd0 RDI: 00000000000000f0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000122cb
> R13: 0000000000402960 R14: 0000000000000000 R15: 0000000000000000
> Modules linked in:
> ---[ end trace c37566c1c02066db ]---
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS: 0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* Re: [PATCH 1/3] nvmem: mxs-ocotp: update MODULE_AUTHOR() email address
From: Srinivas Kandagatla @ 2019-08-18 9:08 UTC (permalink / raw)
To: Stefan Wahren, Jean Delvare, Guenter Roeck, David S. Miller,
Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
NXP Linux Team
Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <1565720249-6549-1-git-send-email-wahrenst@gmx.net>
On 13/08/2019 19:17, Stefan Wahren wrote:
> The email address listed in MODULE_AUTHOR() will be disabled in the
> near future. Replace it with my private one.
>
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> =2D--
> drivers/nvmem/mxs-ocotp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Applied thanks.
--srini
^ permalink raw reply
* Question on xen-netfront code to fix a potential ring buffer corruption
From: Dongli Zhang @ 2019-08-18 8:31 UTC (permalink / raw)
To: xen-devel; +Cc: netdev, jgross, Joe Jin
Hi,
Would you please help confirm why the condition at line 908 is ">="?
In my opinion, we would only hit "skb_shinfo(skb)->nr_frag == MAX_SKB_FRAGS" at
line 908.
890 static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
891 struct sk_buff *skb,
892 struct sk_buff_head *list)
893 {
894 RING_IDX cons = queue->rx.rsp_cons;
895 struct sk_buff *nskb;
896
897 while ((nskb = __skb_dequeue(list))) {
898 struct xen_netif_rx_response *rx =
899 RING_GET_RESPONSE(&queue->rx, ++cons);
900 skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
901
902 if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
903 unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
904
905 BUG_ON(pull_to < skb_headlen(skb));
906 __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
907 }
908 if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
909 queue->rx.rsp_cons = ++cons;
910 kfree_skb(nskb);
911 return ~0U;
912 }
913
914 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
915 skb_frag_page(nfrag),
916 rx->offset, rx->status, PAGE_SIZE);
917
918 skb_shinfo(nskb)->nr_frags = 0;
919 kfree_skb(nskb);
920 }
921
922 return cons;
923 }
The reason that I ask about this is because I am considering below patch to
avoid a potential xen-netfront ring buffer corruption.
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8d33970..48a2162 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -906,7 +906,7 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
}
if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
- queue->rx.rsp_cons = ++cons;
+ queue->rx.rsp_cons = cons + skb_queue_len(list) + 1;
kfree_skb(nskb);
return ~0U;
}
If there is skb left in list when we return ~0U, queue->rx.rsp_cons may be set
incorrectly.
While I am trying to make up a case that would hit the corruption, I could not
explain why (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)), but not
just "==". Perhaps __pskb_pull_tail() may fail although pull_to is less than
skb_headlen(skb).
Thank you very much!
Dongli Zhang
^ permalink raw reply related
* Re: [PATCH v2] virtio-net: lower min ring num_free for efficiency
From: Michael S. Tsirkin @ 2019-08-18 7:10 UTC (permalink / raw)
To: ? jiang
Cc: jasowang@redhat.com, davem@davemloft.net, ast@kernel.org,
daniel@iogearbox.net, jakub.kicinski@netronome.com,
hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
songliubraving@fb.com, yhs@fb.com,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB32058F4B2AD162F5421BB9B4A6AC0@BYAPR14MB3205.namprd14.prod.outlook.com>
On Thu, Aug 15, 2019 at 09:42:40AM +0000, ? jiang wrote:
> This change lowers ring buffer reclaim threshold from 1/2*queue to budget
> for better performance. According to our test with qemu + dpdk, packet
> dropping happens when the guest is not able to provide free buffer in
> avail ring timely with default 1/2*queue. The value in the patch has been
> tested and does show better performance.
>
> Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP)
> avg packets drop before: 2842
> avg packets drop after: 360(-87.3%)
>
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>
To add to that:
Further, current code suffers from a starvation problem: the amount of
work done by try_fill_recv is not bounded by the budget parameter, thus
(with large queues) once in a while userspace gets blocked for a long
time while queue is being refilled. Trigger refills earlier to make sure
the amount of work to do is limited.
With this addition to the log:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc08be7925eb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> }
> }
>
> - if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> + if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
> if (!try_fill_recv(vi, rq, GFP_ATOMIC))
> schedule_delayed_work(&vi->refill, 0);
> }
> --
> 2.11.0
^ permalink raw reply
* Re: [PATCH bpf-next v4 0/4] bpf: support cloning sk storage on accept()
From: Daniel Borkmann @ 2019-08-17 21:32 UTC (permalink / raw)
To: Stanislav Fomichev, netdev, bpf
Cc: davem, ast, Martin KaFai Lau, Yonghong Song
In-Reply-To: <20190814173751.31806-1-sdf@google.com>
On 8/14/19 7:37 PM, Stanislav Fomichev wrote:
> Currently there is no way to propagate sk storage from the listener
> socket to a newly accepted one. Consider the following use case:
>
> fd = socket();
> setsockopt(fd, SOL_IP, IP_TOS,...);
> /* ^^^ setsockopt BPF program triggers here and saves something
> * into sk storage of the listener.
> */
> listen(fd, ...);
> while (client = accept(fd)) {
> /* At this point all association between listener
> * socket and newly accepted one is gone. New
> * socket will not have any sk storage attached.
> */
> }
>
> Let's add new BPF_F_CLONE flag that can be specified when creating
> a socket storage map. This new flag indicates that map contents
> should be cloned when the socket is cloned.
>
> v4:
> * drop 'goto err' in bpf_sk_storage_clone (Yonghong Song)
> * add comment about race with bpf_sk_storage_map_free to the
> bpf_sk_storage_clone side as well (Daniel Borkmann)
>
> v3:
> * make sure BPF_F_NO_PREALLOC is always present when creating
> a map (Martin KaFai Lau)
> * don't call bpf_sk_storage_free explicitly, rely on
> sk_free_unlock_clone to do the cleanup (Martin KaFai Lau)
>
> v2:
> * remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
> * BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
> * hold a map while cloning (Martin KaFai Lau)
> * use BTF maps in selftests (Yonghong Song)
> * do proper cleanup selftests; don't call close(-1) (Yonghong Song)
> * export bpf_map_inc_not_zero
>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
>
> Stanislav Fomichev (4):
> bpf: export bpf_map_inc_not_zero
> bpf: support cloning sk storage on accept()
> bpf: sync bpf.h to tools/
> selftests/bpf: add sockopt clone/inheritance test
>
> include/linux/bpf.h | 2 +
> include/net/bpf_sk_storage.h | 10 +
> include/uapi/linux/bpf.h | 3 +
> kernel/bpf/syscall.c | 16 +-
> net/core/bpf_sk_storage.c | 104 ++++++-
> net/core/sock.c | 9 +-
> tools/include/uapi/linux/bpf.h | 3 +
> tools/testing/selftests/bpf/.gitignore | 1 +
> tools/testing/selftests/bpf/Makefile | 3 +-
> .../selftests/bpf/progs/sockopt_inherit.c | 97 +++++++
> .../selftests/bpf/test_sockopt_inherit.c | 253 ++++++++++++++++++
> 11 files changed, 491 insertions(+), 10 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
> create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c
>
Applied, thanks!
^ permalink raw reply
* Re: [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Daniel Borkmann @ 2019-08-17 21:31 UTC (permalink / raw)
To: Petar Penkov, netdev, bpf; +Cc: davem, ast, sdf, Petar Penkov
In-Reply-To: <20190816170825.22500-1-ppenkov.kernel@gmail.com>
On 8/16/19 7:08 PM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
>
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
>
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
>
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
>
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
>
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Daniel Borkmann @ 2019-08-17 21:30 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, netdev
Cc: andrii.nakryiko, kernel-team, Michael Holzheu, Naveen N . Rao,
David S . Miller, Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>
On 8/16/19 7:45 AM, Andrii Nakryiko wrote:
> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Hechao Li <hechaol@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Acked-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Lawrence Brakmo <brakmo@fb.com>
> Acked-by: Adam Barth <arb@fb.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> Acked-by: Josef Bacik <jbacik@fb.com>
> Acked-by: Joe Stringer <joe@wand.net.nz>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: David Ahern <dsahern@gmail.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Lorenz Bauer <lmb@cloudflare.com>
> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Petar Penkov <ppenkov@google.com>
> Acked-by: Teng Qin <palmtenor@gmail.com>
> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Rostecki <mrostecki@opensuse.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH bpf-next v2] net: Don't call XDP_SETUP_PROG when nothing is changed
From: Daniel Borkmann @ 2019-08-17 21:29 UTC (permalink / raw)
To: Maxim Mikityanskiy, Alexei Starovoitov, Jakub Kicinski
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, David S. Miller,
Björn Töpel, Saeed Mahameed, Jesper Dangaard Brouer,
John Fastabend, Martin KaFai Lau, Song Liu, Yonghong Song
In-Reply-To: <20190814143352.3759-1-maximmi@mellanox.com>
On 8/14/19 4:34 PM, Maxim Mikityanskiy wrote:
> Don't uninstall an XDP program when none is installed, and don't install
> an XDP program that has the same ID as the one already installed.
>
> dev_change_xdp_fd doesn't perform any checks in case it uninstalls an
> XDP program. It means that the driver's ndo_bpf can be called with
> XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This
> case happens if the user runs `ip link set eth0 xdp off` when there is
> no XDP program attached.
>
> The symmetrical case is possible when the user tries to set the program
> that is already set.
>
> The drivers typically perform some heavy operations on XDP_SETUP_PROG,
> so they all have to handle these cases internally to return early if
> they happen. This patch puts this check into the kernel code, so that
> all drivers will benefit from it.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH bpf-next v4 0/8] add need_wakeup flag to the AF_XDP rings
From: Daniel Borkmann @ 2019-08-17 21:29 UTC (permalink / raw)
To: Magnus Karlsson, bjorn.topel, ast, netdev, brouer, maximmi
Cc: bpf, bruce.richardson, ciara.loftus, jakub.kicinski, xiaolong.ye,
qi.z.zhang, sridhar.samudrala, kevin.laatz, ilias.apalodimas,
jonathan.lemon, kiran.patil, axboe, maciej.fijalkowski,
maciejromanfijalkowski, intel-wired-lan
In-Reply-To: <1565767643-4908-1-git-send-email-magnus.karlsson@intel.com>
On 8/14/19 9:27 AM, Magnus Karlsson wrote:
> This patch set adds support for a new flag called need_wakeup in the
> AF_XDP Tx and fill rings. When this flag is set by the driver, it
> means that the application has to explicitly wake up the kernel Rx
> (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring)
> processing by issuing a syscall. Poll() can wake up both and sendto()
> will wake up Tx processing only.
>
> The main reason for introducing this new flag is to be able to
> efficiently support the case when application and driver is executing
> on the same core. Previously, the driver was just busy-spinning on the
> fill ring if it ran out of buffers in the HW and there were none to
> get from the fill ring. This approach works when the application and
> driver is running on different cores as the application can replenish
> the fill ring while the driver is busy-spinning. Though, this is a
> lousy approach if both of them are running on the same core as the
> probability of the fill ring getting more entries when the driver is
> busy-spinning is zero. With this new feature the driver now sets the
> need_wakeup flag and returns to the application. The application can
> then replenish the fill queue and then explicitly wake up the Rx
> processing in the kernel using the syscall poll(). For Tx, the flag is
> only set to one if the driver has no outstanding Tx completion
> interrupts. If it has some, the flag is zero as it will be woken up by
> a completion interrupt anyway. This flag can also be used in other
> situations where the driver needs to be woken up explicitly.
>
> As a nice side effect, this new flag also improves the Tx performance
> of the case where application and driver are running on two different
> cores as it reduces the number of syscalls to the kernel. The kernel
> tells user space if it needs to be woken up by a syscall, and this
> eliminates many of the syscalls. The Rx performance of the 2-core case
> is on the other hand slightly worse, since there is a need to use a
> syscall now to wake up the driver, instead of the driver
> busy-spinning. It does waste less CPU cycles though, which might lead
> to better overall system performance.
>
> This new flag needs some simple driver support. If the driver does not
> support it, the Rx flag is always zero and the Tx flag is always
> one. This makes any application relying on this feature default to the
> old behavior of not requiring any syscalls in the Rx path and always
> having to call sendto() in the Tx path.
>
> For backwards compatibility reasons, this feature has to be explicitly
> turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
> that you always turn it on as it has a large positive performance
> impact for the one core case and does not degrade 2 core performance
> and actually improves it for Tx heavy workloads.
>
> Here are some performance numbers measured on my local,
> non-performance optimized development system. That is why you are
> seeing numbers lower than the ones from Björn and Jesper. 64 byte
> packets at 40Gbit/s line rate. All results in Mpps. Cores == 1 means
> that both application and driver is executing on the same core. Cores
> == 2 that they are on different cores.
>
> Applications
> need_wakeup cores txpush rxdrop l2fwd
> ---------------------------------------------------------------
> n 1 0.07 0.06 0.03
> y 1 21.6 8.2 6.5
> n 2 32.3 11.7 8.7
> y 2 33.1 11.7 8.7
>
> Overall, the need_wakeup flag provides the same or better performance
> in all the micro-benchmarks. The reduction of sendto() calls in txpush
> is large. Only a few per second is needed. For l2fwd, the drop is 50%
> for the 1 core case and more than 99.9% for the 2 core case. Do not
> know why I am not seeing the same drop for the 1 core case yet.
>
> The name and inspiration of the flag has been taken from io_uring by
> Jens Axboe. Details about this feature in io_uring can be found in
> http://kernel.dk/io_uring.pdf, section 8.3. It also addresses most of
> the denial of service and sendto() concerns raised by Maxim
> Mikityanskiy in https://www.spinics.net/lists/netdev/msg554657.html.
>
> The typical Tx part of an application will have to change from:
>
> ret = sendto(fd,....)
>
> to:
>
> if (xsk_ring_prod__needs_wakeup(&xsk->tx))
> ret = sendto(fd,....)
>
> and th Rx part from:
>
> rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> if (!rcvd)
> return;
>
> to:
>
> rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> if (!rcvd) {
> if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
> ret = poll(fd,.....);
> return;
> }
>
> v3 -> v4:
> * Maxim found a possible race in the Tx part of the driver. The
> setting of the flag needs to happen before the sending, otherwise it
> might trigger this race. Fixed in ixgbe and i40e driver.
> * Mellanox support contributed by Maxim
> * Removed the XSK_DRV_CAN_SLEEP flag as it was not used
> anymore. Thanks to Sridhar for discovering this.
> * For consistency the feature is now always called need_wakeup. There
> were some places where it was referred to as might_sleep, but they
> have been removed. Thanks to Sridhar for spotting.
> * Fixed some typos in the commit messages
>
> v2 -> v3:
> * Converted the Mellanox driver to the new ndo in patch 1 as pointed out
> by Maxim
> * Fixed the compatibility code of XDP_MMAP_OFFSETS so it now works.
>
> v1 -> v2:
> * Fixed bisectability problem pointed out by Jakub
> * Added missing initiliztion of the Tx need_wakeup flag to 1
>
> This patch has been applied against commit b753c5a7f99f ("Merge branch 'r8152-RX-improve'")
>
> Structure of the patch set:
>
> Patch 1: Replaces the ndo_xsk_async_xmit with ndo_xsk_wakeup to
> support waking up both Rx and Tx processing
> Patch 2: Implements the need_wakeup functionality in common code
> Patch 3-4: Add need_wakeup support to the i40e and ixgbe drivers
> Patch 5: Add need_wakeup support to libbpf
> Patch 6: Add need_wakeup support to the xdpsock sample application
> Patch 7-8: Add need_wakeup support to the Mellanox mlx5 driver
>
> Thanks: Magnus
>
> Magnus Karlsson (6):
> xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup
> xsk: add support for need_wakeup flag in AF_XDP rings
> i40e: add support for AF_XDP need_wakeup feature
> ixgbe: add support for AF_XDP need_wakeup feature
> libbpf: add support for need_wakeup flag in AF_XDP part
> samples/bpf: add use of need_wakeup flag in xdpsock
>
> Maxim Mikityanskiy (2):
> net/mlx5e: Move the SW XSK code from NAPI poll to a separate function
> net/mlx5e: Add AF_XDP need_wakeup support
>
> drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +-
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 25 ++-
> drivers/net/ethernet/intel/i40e/i40e_xsk.h | 2 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +-
> .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 2 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 22 ++-
> .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 14 ++
> .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h | 14 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 27 ++-
> include/linux/netdevice.h | 14 +-
> include/net/xdp_sock.h | 33 +++-
> include/uapi/linux/if_xdp.h | 13 ++
> net/xdp/xdp_umem.c | 12 +-
> net/xdp/xsk.c | 149 +++++++++++++---
> net/xdp/xsk.h | 13 ++
> net/xdp/xsk_queue.h | 1 +
> samples/bpf/xdpsock_user.c | 192 +++++++++++++--------
> tools/include/uapi/linux/if_xdp.h | 13 ++
> tools/lib/bpf/xsk.c | 4 +
> tools/lib/bpf/xsk.h | 6 +
> 23 files changed, 462 insertions(+), 115 deletions(-)
>
> --
> 2.7.4
>
Series applied, thanks everyone!
^ permalink raw reply
* Re: [PATCH bpf-next v5 0/2] net: xdp: XSKMAP improvements
From: Daniel Borkmann @ 2019-08-17 21:28 UTC (permalink / raw)
To: Björn Töpel, ast, netdev
Cc: magnus.karlsson, jonathan.lemon, bjorn.topel, bruce.richardson,
songliubraving, bpf
In-Reply-To: <20190815093014.31174-1-bjorn.topel@gmail.com>
On 8/15/19 11:30 AM, Björn Töpel wrote:
> This series (v5 and counting) add two improvements for the XSKMAP,
> used by AF_XDP sockets.
>
> 1. Automatic cleanup when an AF_XDP socket goes out of scope/is
> released. Instead of require that the user manually clears the
> "released" state socket from the map, this is done
> automatically. Each socket tracks which maps it resides in, and
> remove itself from those maps at relase. A notable implementation
> change, is that the sockets references the map, instead of the map
> referencing the sockets. Which implies that when the XSKMAP is
> freed, it is by definition cleared of sockets.
>
> 2. The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flag on insert,
> which this patch addresses.
>
>
> Thanks,
> Björn
>
> v1->v2: Fixed deadlock and broken cleanup. (Daniel)
> v2->v3: Rebased onto bpf-next
> v3->v4: {READ, WRITE}_ONCE consistency. (Daniel)
> Socket release/map update race. (Daniel)
> v4->v5: Avoid use-after-free on XSKMAP self-assignment [1]. (Daniel)
> Removed redundant assignment in xsk_map_update_elem().
> Variable name consistency; Use map_entry everywhere.
>
> [1] https://lore.kernel.org/bpf/20190802081154.30962-1-bjorn.topel@gmail.com/T/#mc68439e97bc07fa301dad9fc4850ed5aa392f385
>
> Björn Töpel (2):
> xsk: remove AF_XDP socket from map when the socket is released
> xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP
>
> include/net/xdp_sock.h | 18 ++++++
> kernel/bpf/xskmap.c | 133 ++++++++++++++++++++++++++++++++++-------
> net/xdp/xsk.c | 50 ++++++++++++++++
> 3 files changed, 179 insertions(+), 22 deletions(-)
>
Looks better, applied thanks!
^ permalink raw reply
* [PATCH net v2 6/6] bnxt_en: Fix to include flow direction in L2 key
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev, Somnath Kotur
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
From: Somnath Kotur <somnath.kotur@broadcom.com>
FW expects the driver to provide unique flow reference handles
for Tx or Rx flows. When a Tx flow and an Rx flow end up sharing
a reference handle, flow offload does not seem to work.
This could happen in the case of 2 flows having their L2 fields
wildcarded but in different direction.
Fix to incorporate the flow direction as part of the L2 key
v2: Move the dir field to the end of the bnxt_tc_l2_key struct to
fix the warning reported by kbuild test robot <lkp@intel.com>.
There is existing code that initializes the structure using
nested initializer and will warn with the new u8 field added to
the beginning. The structure also packs nicer when this new u8 is
added to the end of the structure [MChan].
Fixes: abd43a13525d ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 4 ++--
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 6224c30..dd621f6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1236,7 +1236,7 @@ static int __bnxt_tc_del_flow(struct bnxt *bp,
static void bnxt_tc_set_flow_dir(struct bnxt *bp, struct bnxt_tc_flow *flow,
u16 src_fid)
{
- flow->dir = (bp->pf.fw_fid == src_fid) ? BNXT_DIR_RX : BNXT_DIR_TX;
+ flow->l2_key.dir = (bp->pf.fw_fid == src_fid) ? BNXT_DIR_RX : BNXT_DIR_TX;
}
static void bnxt_tc_set_src_fid(struct bnxt *bp, struct bnxt_tc_flow *flow,
@@ -1405,7 +1405,7 @@ static void bnxt_fill_cfa_stats_req(struct bnxt *bp,
* 2. 15th bit of flow_handle must specify the flow
* direction (TX/RX).
*/
- if (flow_node->flow.dir == BNXT_DIR_RX)
+ if (flow_node->flow.l2_key.dir == BNXT_DIR_RX)
handle = CFA_FLOW_INFO_REQ_FLOW_HANDLE_DIR_RX |
CFA_FLOW_INFO_REQ_FLOW_HANDLE_MAX_MASK;
else
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
index ffec57d..4f05305 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
@@ -23,6 +23,9 @@ struct bnxt_tc_l2_key {
__be16 inner_vlan_tci;
__be16 ether_type;
u8 num_vlans;
+ u8 dir;
+#define BNXT_DIR_RX 1
+#define BNXT_DIR_TX 0
};
struct bnxt_tc_l3_key {
@@ -98,9 +101,6 @@ struct bnxt_tc_flow {
/* flow applicable to pkts ingressing on this fid */
u16 src_fid;
- u8 dir;
-#define BNXT_DIR_RX 1
-#define BNXT_DIR_TX 0
struct bnxt_tc_l2_key l2_key;
struct bnxt_tc_l2_key l2_mask;
struct bnxt_tc_l3_key l3_key;
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 5/6] bnxt_en: Use correct src_fid to determine direction of the flow
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev, Venkat Duvvuru
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
From: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Direction of the flow is determined using src_fid. For an RX flow,
src_fid is PF's fid and for TX flow, src_fid is VF's fid. Direction
of the flow must be specified, when getting statistics for that flow.
Currently, for DECAP flow, direction is determined incorrectly, i.e.,
direction is initialized as TX for DECAP flow, instead of RX. Because
of which, stats are not reported for this DECAP flow, though it is
offloaded and there is traffic for that flow, resulting in flow age out.
This patch fixes the problem by determining the DECAP flow's direction
using correct fid. Set the flow direction in all cases for consistency
even if 64-bit flow handle is not used.
Fixes: abd43a13525d ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 6fe4a71..6224c30 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1285,9 +1285,7 @@ static int bnxt_tc_add_flow(struct bnxt *bp, u16 src_fid,
goto free_node;
bnxt_tc_set_src_fid(bp, flow, src_fid);
-
- if (bp->fw_cap & BNXT_FW_CAP_OVS_64BIT_HANDLE)
- bnxt_tc_set_flow_dir(bp, flow, src_fid);
+ bnxt_tc_set_flow_dir(bp, flow, flow->src_fid);
if (!bnxt_tc_can_offload(bp, flow)) {
rc = -EOPNOTSUPP;
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 3/6] bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
If FW returns FRAG_ERR in response error code, driver is resending the
command only when HWRM command returns success. Fix the code to resend
NVM_INSTALL_UPDATE command with DEFRAG install flags, if FW returns
FRAG_ERR in its response error code.
Fixes: cb4d1d626145 ("bnxt_en: Retry failed NVM_INSTALL_UPDATE with defragmentation flag enabled.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index c7ee63d..8445a0c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2016,21 +2016,19 @@ static int bnxt_flash_package_from_file(struct net_device *dev,
mutex_lock(&bp->hwrm_cmd_lock);
hwrm_err = _hwrm_send_message(bp, &install, sizeof(install),
INSTALL_PACKAGE_TIMEOUT);
- if (hwrm_err)
- goto flash_pkg_exit;
-
- if (resp->error_code) {
+ if (hwrm_err) {
u8 error_code = ((struct hwrm_err_output *)resp)->cmd_err;
- if (error_code == NVM_INSTALL_UPDATE_CMD_ERR_CODE_FRAG_ERR) {
+ if (resp->error_code && error_code ==
+ NVM_INSTALL_UPDATE_CMD_ERR_CODE_FRAG_ERR) {
install.flags |= cpu_to_le16(
NVM_INSTALL_UPDATE_REQ_FLAGS_ALLOWED_TO_DEFRAG);
hwrm_err = _hwrm_send_message(bp, &install,
sizeof(install),
INSTALL_PACKAGE_TIMEOUT);
- if (hwrm_err)
- goto flash_pkg_exit;
}
+ if (hwrm_err)
+ goto flash_pkg_exit;
}
if (resp->result) {
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 4/6] bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
For newly added NVM parameters, older firmware may not have the support.
Suppress the error message to avoid the unncessary error message which is
triggered when devlink calls the driver during initialization.
Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial params table and register it.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 549c90d3..c05d663 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -98,10 +98,13 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
if (idx)
req->dimensions = cpu_to_le16(1);
- if (req->req_type == cpu_to_le16(HWRM_NVM_SET_VARIABLE))
+ if (req->req_type == cpu_to_le16(HWRM_NVM_SET_VARIABLE)) {
memcpy(data_addr, buf, bytesize);
-
- rc = hwrm_send_message(bp, msg, msg_len, HWRM_CMD_TIMEOUT);
+ rc = hwrm_send_message(bp, msg, msg_len, HWRM_CMD_TIMEOUT);
+ } else {
+ rc = hwrm_send_message_silent(bp, msg, msg_len,
+ HWRM_CMD_TIMEOUT);
+ }
if (!rc && req->req_type == cpu_to_le16(HWRM_NVM_GET_VARIABLE))
memcpy(buf, data_addr, bytesize);
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 2/6] bnxt_en: Improve RX doorbell sequence.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
When both RX buffers and RX aggregation buffers have to be
replenished at the end of NAPI, post the RX aggregation buffers first
before RX buffers. Otherwise, we may run into a situation where
there are only RX buffers without RX aggregation buffers for a split
second. This will cause the hardware to abort the RX packet and
report buffer errors, which will cause unnecessary cleanup by the
driver.
Ringing the Aggregation ring doorbell first before the RX ring doorbell
will prevent some of these buffer errors. Use the same sequence during
ring initialization as well.
Fixes: 697197e5a173 ("bnxt_en: Re-structure doorbells.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1ef224f..8dce406 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2021,9 +2021,9 @@ static void __bnxt_poll_work_done(struct bnxt *bp, struct bnxt_napi *bnapi)
if (bnapi->events & BNXT_RX_EVENT) {
struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
- bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
if (bnapi->events & BNXT_AGG_EVENT)
bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
+ bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
}
bnapi->events = 0;
}
@@ -5064,6 +5064,7 @@ static void bnxt_set_db(struct bnxt *bp, struct bnxt_db_info *db, u32 ring_type,
static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
{
+ bool agg_rings = !!(bp->flags & BNXT_FLAG_AGG_RINGS);
int i, rc = 0;
u32 type;
@@ -5139,7 +5140,9 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
if (rc)
goto err_out;
bnxt_set_db(bp, &rxr->rx_db, type, map_idx, ring->fw_ring_id);
- bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
+ /* If we have agg rings, post agg buffers first. */
+ if (!agg_rings)
+ bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
bp->grp_info[map_idx].rx_fw_ring_id = ring->fw_ring_id;
if (bp->flags & BNXT_FLAG_CHIP_P5) {
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
@@ -5158,7 +5161,7 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
}
}
- if (bp->flags & BNXT_FLAG_AGG_RINGS) {
+ if (agg_rings) {
type = HWRM_RING_ALLOC_AGG;
for (i = 0; i < bp->rx_nr_rings; i++) {
struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
@@ -5174,6 +5177,7 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
bnxt_set_db(bp, &rxr->rx_agg_db, type, map_idx,
ring->fw_ring_id);
bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
+ bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
bp->grp_info[grp_idx].agg_fw_ring_id = ring->fw_ring_id;
}
}
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 1/6] bnxt_en: Fix VNIC clearing logic for 57500 chips.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>
During device shutdown, the VNIC clearing sequence needs to be modified
to free the VNIC first before freeing the RSS contexts. The current
code is doing the reverse and we can get mis-directed RX completions
to CP ring ID 0 when the RSS contexts are freed and zeroed. The clearing
of RSS contexts is not required with the new sequence.
Refactor the VNIC clearing logic into a new function bnxt_clear_vnic()
and do the chip specific VNIC clearing sequence.
Fixes: 7b3af4f75b81 ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7070349..1ef224f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7016,19 +7016,29 @@ static void bnxt_hwrm_clear_vnic_rss(struct bnxt *bp)
bnxt_hwrm_vnic_set_rss(bp, i, false);
}
-static void bnxt_hwrm_resource_free(struct bnxt *bp, bool close_path,
- bool irq_re_init)
+static void bnxt_clear_vnic(struct bnxt *bp)
{
- if (bp->vnic_info) {
- bnxt_hwrm_clear_vnic_filter(bp);
+ if (!bp->vnic_info)
+ return;
+
+ bnxt_hwrm_clear_vnic_filter(bp);
+ if (!(bp->flags & BNXT_FLAG_CHIP_P5)) {
/* clear all RSS setting before free vnic ctx */
bnxt_hwrm_clear_vnic_rss(bp);
bnxt_hwrm_vnic_ctx_free(bp);
- /* before free the vnic, undo the vnic tpa settings */
- if (bp->flags & BNXT_FLAG_TPA)
- bnxt_set_tpa(bp, false);
- bnxt_hwrm_vnic_free(bp);
}
+ /* before free the vnic, undo the vnic tpa settings */
+ if (bp->flags & BNXT_FLAG_TPA)
+ bnxt_set_tpa(bp, false);
+ bnxt_hwrm_vnic_free(bp);
+ if (bp->flags & BNXT_FLAG_CHIP_P5)
+ bnxt_hwrm_vnic_ctx_free(bp);
+}
+
+static void bnxt_hwrm_resource_free(struct bnxt *bp, bool close_path,
+ bool irq_re_init)
+{
+ bnxt_clear_vnic(bp);
bnxt_hwrm_ring_free(bp, close_path);
bnxt_hwrm_ring_grp_free(bp);
if (irq_re_init) {
--
2.5.1
^ permalink raw reply related
* [PATCH net v2 0/6] bnxt_en: Bug fixes.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
To: davem; +Cc: netdev
2 Bug fixes related to 57500 shutdown sequence and doorbell sequence,
2 TC Flower bug fixes related to the setting of the flow direction,
1 NVRAM update bug fix, and a minor fix to suppress an unnecessary
error message. Please queue for -stable as well. Thanks.
v2: Fix compile warning reported by kbuild test robot on patch #6.
Michael Chan (2):
bnxt_en: Fix VNIC clearing logic for 57500 chips.
bnxt_en: Improve RX doorbell sequence.
Somnath Kotur (1):
bnxt_en: Fix to include flow direction in L2 key
Vasundhara Volam (2):
bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
Venkat Duvvuru (1):
bnxt_en: Use correct src_fid to determine direction of the flow
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 ++++++++++++++++-------
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 9 ++++--
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++++----
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 8 ++---
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h | 6 ++--
5 files changed, 42 insertions(+), 29 deletions(-)
--
2.5.1
^ permalink raw reply
* Re: [PATCH RFC v2 net-next 0/4] mv88e6xxx: setting 2500base-x mode for CPU/DSA port in dts
From: Vivien Didelot @ 2019-08-17 19:50 UTC (permalink / raw)
To: Marek Behún
Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
Marek Behún
In-Reply-To: <20190817191452.16716-1-marek.behun@nic.cz>
Hi Marek,
On Sat, 17 Aug 2019 21:14:48 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> Hi,
>
> here is another proposal for supporting setting 2500base-x mode for
> CPU/DSA ports in device tree correctly.
>
> The changes from v1 are that instead of adding .port_setup() and
> .port_teardown() methods to the DSA operations struct we instead, for
> CPU/DSA ports, call dsa_port_enable() from dsa_port_setup(), but only
> after the port is registered (and required phylink/devlink structures
> exist).
>
> The .port_enable/.port_disable methods are now only meant to be used
> for user ports, when the slave interface is brought up/down. This
> proposal changes that in such a way that these methods are also called
> for CPU/DSA ports, but only just after the switch is set up (and just
> before the switch is tore down).
>
> If we went this way, we would have to patch the other DSA drivers to
> check if user port is being given in their respective .port_enable
> and .port_disable implmentations.
>
> What do you think about this?
This looks much better. Let me pass through all patches of this RFC so that
I can include bits I would like to see in your next series.
Thanks,
Vivien
^ permalink raw reply
* Re: [net-next 11/16] net/mlx5e: Report and recover from rx timeout
From: David Miller @ 2019-08-17 19:48 UTC (permalink / raw)
To: saeedm; +Cc: netdev, ayal, tariqt, jiri
In-Reply-To: <20190815190911.12050-12-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 15 Aug 2019 19:10:07 +0000
> +static int mlx5e_rx_reporter_timeout_recover(void *ctx)
> +{
> + struct mlx5e_rq *rq = ctx;
> + struct mlx5e_icosq *icosq = &rq->channel->icosq;
> + struct mlx5_eq_comp *eq = rq->cq.mcq.eq;
> + int err;
In this and several further patches, this non-reverse-christmas tree
sequence appears. Please fix it.
Put the variable assignments into the body of the function if you have
to in order to make this styled correctly.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next v3 00/12] net: stmmac: Improvements for -next
From: David Miller @ 2019-08-17 19:44 UTC (permalink / raw)
To: Jose.Abreu
Cc: netdev, Joao.Pinto, jakub.kicinski, peppe.cavallaro,
alexandre.torgue, mcoquelin.stm32, linux-stm32, linux-arm-kernel,
linux-kernel
In-Reply-To: <cover.1566067802.git.joabreu@synopsys.com>
From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Sat, 17 Aug 2019 20:54:39 +0200
> Couple of improvements for -next tree. More info in commit logs.
Series applied.
^ permalink raw reply
* Re: [PATCH net-next v3 00/16] Add drop monitor for offloaded data paths
From: David Miller @ 2019-08-17 19:41 UTC (permalink / raw)
To: idosch
Cc: netdev, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
idosch
In-Reply-To: <20190817132825.29790-1-idosch@idosch.org>
From: Ido Schimmel <idosch@idosch.org>
Date: Sat, 17 Aug 2019 16:28:09 +0300
> Users have several ways to debug the kernel and understand why a packet
> was dropped. For example, using drop monitor and perf. Both utilities
> trace kfree_skb(), which is the function called when a packet is freed
> as part of a failure. The information provided by these tools is
> invaluable when trying to understand the cause of a packet loss.
>
> In recent years, large portions of the kernel data path were offloaded
> to capable devices. Today, it is possible to perform L2 and L3
> forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> Different TC classifiers and actions are also offloaded to capable
> devices, at both ingress and egress.
>
> However, when the data path is offloaded it is not possible to achieve
> the same level of introspection since packets are dropped by the
> underlying device and never reach the kernel.
>
> This patchset aims to solve this by allowing users to monitor packets
> that the underlying device decided to drop along with relevant metadata
> such as the drop reason and ingress port.
...
Looks great, series applied, thanks.
^ permalink raw reply
* Re: pull request: bluetooth 2019-08-17
From: David Miller @ 2019-08-17 19:38 UTC (permalink / raw)
To: johan.hedberg; +Cc: netdev, linux-bluetooth
In-Reply-To: <20190817114451.GA10661@abukhnin-mobl1.ccr.corp.intel.com>
From: Johan Hedberg <johan.hedberg@gmail.com>
Date: Sat, 17 Aug 2019 14:44:51 +0300
> Here's a set of Bluetooth fixes for the 5.3-rc series:
>
> - Multiple fixes for Qualcomm (btqca & hci_qca) drivers
> - Minimum encryption key size debugfs setting (this is required for
> Bluetooth Qualification)
> - Fix hidp_send_message() to have a meaningful return value
Pulled, thanks.
^ permalink raw reply
* Re: [PATCH net-next v3 0/4] net: bridge: mdb: allow dump/add/del of host-joined entries
From: David Miller @ 2019-08-17 19:37 UTC (permalink / raw)
To: nikolay; +Cc: netdev, roopa, bridge
In-Reply-To: <20190817112213.27097-1-nikolay@cumulusnetworks.com>
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Sat, 17 Aug 2019 14:22:09 +0300
> This set makes the bridge dump host-joined mdb entries, they should be
> treated as normal entries since they take a slot and are aging out.
> We already have notifications for them but we couldn't dump them until
> now so they remained hidden. We dump them similar to how they're
> notified, in order to keep user-space compatibility with the dumped
> objects (e.g. iproute2 dumps mdbs in a format which can be fed into
> add/del commands) we allow host-joined groups also to be added/deleted via
> mdb commands. That can later be used for L2 mcast MAC manipulation as
> was recently discussed. Note that iproute2 changes are not necessary,
> this set will work with the current user-space mdb code.
>
> Patch 01 - a trivial comment move
> Patch 02 - factors out the mdb filling code so it can be
> re-used for the host-joined entries
> Patch 03 - dumps host-joined entries
> Patch 04 - allows manipulation of host-joined entries via standard mdb
> calls
>
> v3: fix compiler warning in patch 04 (DaveM)
> v2: change patch 04 to avoid double notification and improve host group
> manual removal if no ports are present in the group
Series applied.
^ permalink raw reply
* Re: [PATCH net-next v3 0/3] net: phy: remove genphy_config_init
From: David Miller @ 2019-08-17 19:35 UTC (permalink / raw)
To: hkallweit1
Cc: andrew, f.fainelli, khilman, vivien.didelot, netdev,
linux-amlogic
In-Reply-To: <8790db9d-af10-c3b1-bc65-ee21bb99e6d9@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 17 Aug 2019 12:28:18 +0200
> Supported PHY features are either auto-detected or explicitly set.
> In both cases calling genphy_config_init isn't needed. All that
> genphy_config_init does is removing features that are set as
> supported but can't be auto-detected. Basically it duplicates the
> code in genphy_read_abilities. Therefore remove genphy_config_init.
>
> v2:
> - remove call also from new adin driver
> v3:
> - pass NULL as config_init function pointer for dp83848
Series applied.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox