Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: KMSAN: uninit-value in netif_skb_features
From: Dmitry Vyukov @ 2018-04-12  8:03 UTC (permalink / raw)
  To: syzbot
  Cc: bpoirier, David Miller, Eric Dumazet, Reshetova, Elena, Kees Cook,
	LKML, Mike Maloney, netdev, rami.rosen, syzkaller-bugs,
	Willem de Bruijn, makita.toshiaki
In-Reply-To: <089e082d0cb81b67d10569a2283f@google.com>

On Thu, Apr 12, 2018 at 10:01 AM, syzbot
<syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on https://github.com/google/kmsan.git/master
> commit
> e2ab7e8abba47a2f2698216258e5d8727ae58717 (Fri Apr 6 16:24:31 2018 +0000)
> kmsan: temporarily disable visitAsmInstruction() to help syzbot
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=0bbe42c764feafa82c5a
>
> So far this crash happened 30 times on
> https://github.com/google/kmsan.git/master.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4850744041668608
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=6289386287136768
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4577411249209344
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=6627248707860932248
> compiler: clang version 7.0.0 (trunk 329391)

+Toshiaki as this seems to be related to the recent vlan tagging changes.
This also seems to be related to
https://groups.google.com/d/msg/syzkaller-bugs/FNEavkB4QaM/efXl2AeRBgAJ



> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> ==================================================================
> BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283
> [inline]
> BUG: KMSAN: uninit-value in skb_vlan_tagged_multi
> include/linux/if_vlan.h:656 [inline]
> BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672
> [inline]
> BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
> BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0
> net/core/dev.c:3009
> CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
>  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
>  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
>  eth_type_vlan include/linux/if_vlan.h:283 [inline]
>  skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
>  vlan_features_check include/linux/if_vlan.h:672 [inline]
>  dflt_features_check net/core/dev.c:2949 [inline]
>  netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
>  validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
>  __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
>  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
>  packet_snd net/packet/af_packet.c:2944 [inline]
>  packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg net/socket.c:640 [inline]
>  sock_write_iter+0x3b9/0x470 net/socket.c:909
>  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
>  do_iter_write+0x30d/0xd40 fs/read_write.c:932
>  vfs_writev fs/read_write.c:977 [inline]
>  do_writev+0x3c9/0x830 fs/read_write.c:1012
>  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
>  SyS_writev+0x56/0x80 fs/read_write.c:1082
>  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x43ffa9
> RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
> RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
> RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
> R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000
>
> Uninit was created at:
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
>  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
>  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
>  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
>  slab_post_alloc_hook mm/slab.h:445 [inline]
>  slab_alloc_node mm/slub.c:2737 [inline]
>  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
>  __kmalloc_reserve net/core/skbuff.c:138 [inline]
>  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
>  alloc_skb include/linux/skbuff.h:984 [inline]
>  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
>  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
>  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
>  packet_snd net/packet/af_packet.c:2894 [inline]
>  packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg net/socket.c:640 [inline]
>  sock_write_iter+0x3b9/0x470 net/socket.c:909
>  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
>  do_iter_write+0x30d/0xd40 fs/read_write.c:932
>  vfs_writev fs/read_write.c:977 [inline]
>  do_writev+0x3c9/0x830 fs/read_write.c:1012
>  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
>  SyS_writev+0x56/0x80 fs/read_write.c:1082
>  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> ==================================================================
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/089e082d0cb81b67d10569a2283f%40google.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* KMSAN: uninit-value in __netif_receive_skb_core
From: syzbot @ 2018-04-12  8:01 UTC (permalink / raw)
  To: bpoirier, davem, edumazet, elena.reshetova, ishkamiel, keescook,
	linux-kernel, maloney, netdev, rami.rosen, syzkaller-bugs,
	willemb

Hello,

syzbot hit the following crash on  
https://github.com/google/kmsan.git/master commit
e2ab7e8abba47a2f2698216258e5d8727ae58717 (Fri Apr 6 16:24:31 2018 +0000)
kmsan: temporarily disable visitAsmInstruction() to help syzbot
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=b202b7208664142954fa

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5356516437655552
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=6627248707860932248
compiler: clang version 7.0.0 (trunk 329391)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b202b7208664142954fa@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

==================================================================
BUG: KMSAN: uninit-value in __read_once_size include/linux/compiler.h:197  
[inline]
BUG: KMSAN: uninit-value in deliver_ptype_list_skb net/core/dev.c:1908  
[inline]
BUG: KMSAN: uninit-value in __netif_receive_skb_core+0x4630/0x4a80  
net/core/dev.c:4545
CPU: 0 PID: 5999 Comm: syz-executor3 Not tainted 4.16.0+ #82
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
  __read_once_size include/linux/compiler.h:197 [inline]
  deliver_ptype_list_skb net/core/dev.c:1908 [inline]
  __netif_receive_skb_core+0x4630/0x4a80 net/core/dev.c:4545
  __netif_receive_skb net/core/dev.c:4627 [inline]
  process_backlog+0x62d/0xe20 net/core/dev.c:5307
  napi_poll net/core/dev.c:5705 [inline]
  net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
  __do_softirq+0x56d/0x93d kernel/softirq.c:285
  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1040
  </IRQ>
  do_softirq kernel/softirq.c:329 [inline]
  __local_bh_enable_ip+0x114/0x140 kernel/softirq.c:182
  local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
  rcu_read_unlock_bh include/linux/rcupdate.h:726 [inline]
  __dev_queue_xmit+0x2a31/0x2b60 net/core/dev.c:3584
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  packet_snd net/packet/af_packet.c:2944 [inline]
  packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
  do_iter_write+0x30d/0xd40 fs/read_write.c:932
  vfs_writev fs/read_write.c:977 [inline]
  do_writev+0x3c9/0x830 fs/read_write.c:1012
  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
  SyS_writev+0x56/0x80 fs/read_write.c:1082
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x455259
RSP: 002b:00007fb53ede8c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00007fb53ede96d4 RCX: 0000000000455259
RDX: 0000000000000001 RSI: 00000000200010c0 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000006cd R14: 00000000006fd3d8 R15: 0000000000000000

Uninit was stored to memory at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
  kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
  __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
  skb_vlan_untag+0x950/0xee0 include/linux/if_vlan.h:597
  __netif_receive_skb_core+0x70a/0x4a80 net/core/dev.c:4460
  __netif_receive_skb net/core/dev.c:4627 [inline]
  process_backlog+0x62d/0xe20 net/core/dev.c:5307
  napi_poll net/core/dev.c:5705 [inline]
  net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
  __do_softirq+0x56d/0x93d kernel/softirq.c:285
Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:984 [inline]
  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
  packet_snd net/packet/af_packet.c:2894 [inline]
  packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
  do_iter_write+0x30d/0xd40 fs/read_write.c:932
  vfs_writev fs/read_write.c:977 [inline]
  do_writev+0x3c9/0x830 fs/read_write.c:1012
  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
  SyS_writev+0x56/0x80 fs/read_write.c:1082
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* KMSAN: uninit-value in netif_skb_features
From: syzbot @ 2018-04-12  8:01 UTC (permalink / raw)
  To: bpoirier, davem, edumazet, elena.reshetova, keescook,
	linux-kernel, maloney, netdev, rami.rosen, syzkaller-bugs,
	willemb

Hello,

syzbot hit the following crash on  
https://github.com/google/kmsan.git/master commit
e2ab7e8abba47a2f2698216258e5d8727ae58717 (Fri Apr 6 16:24:31 2018 +0000)
kmsan: temporarily disable visitAsmInstruction() to help syzbot
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=0bbe42c764feafa82c5a

So far this crash happened 30 times on  
https://github.com/google/kmsan.git/master.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4850744041668608
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=6289386287136768
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4577411249209344
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=6627248707860932248
compiler: clang version 7.0.0 (trunk 329391)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

==================================================================
BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283  
[inline]
BUG: KMSAN: uninit-value in skb_vlan_tagged_multi  
include/linux/if_vlan.h:656 [inline]
BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672  
[inline]
BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0  
net/core/dev.c:3009
CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
  eth_type_vlan include/linux/if_vlan.h:283 [inline]
  skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
  vlan_features_check include/linux/if_vlan.h:672 [inline]
  dflt_features_check net/core/dev.c:2949 [inline]
  netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
  validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
  __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  packet_snd net/packet/af_packet.c:2944 [inline]
  packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
  do_iter_write+0x30d/0xd40 fs/read_write.c:932
  vfs_writev fs/read_write.c:977 [inline]
  do_writev+0x3c9/0x830 fs/read_write.c:1012
  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
  SyS_writev+0x56/0x80 fs/read_write.c:1082
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x43ffa9
RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:984 [inline]
  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
  packet_snd net/packet/af_packet.c:2894 [inline]
  packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  sock_write_iter+0x3b9/0x470 net/socket.c:909
  do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
  do_iter_write+0x30d/0xd40 fs/read_write.c:932
  vfs_writev fs/read_write.c:977 [inline]
  do_writev+0x3c9/0x830 fs/read_write.c:1012
  SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
  SyS_writev+0x56/0x80 fs/read_write.c:1082
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* RE: [RFC PATCH v2 03/14] xsk: add umem fill queue support and mmap
From: Karlsson, Magnus @ 2018-04-12  7:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Björn Töpel
  Cc: Duyck, Alexander H, alexander.duyck@gmail.com,
	john.fastabend@gmail.com, ast@fb.com, brouer@redhat.com,
	willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, michael.lundkvist@ericsson.com,
	Brandeburg, Jesse, Singhai, Anjali, Zhang, Qi Z,
	ravineet.singh@ericsson.com
In-Reply-To: <20180412050542-mutt-send-email-mst@kernel.org>



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 12, 2018 4:16 AM
> To: Björn Töpel <bjorn.topel@gmail.com>
> Cc: Karlsson, Magnus <magnus.karlsson@intel.com>; Duyck, Alexander H
> <alexander.h.duyck@intel.com>; alexander.duyck@gmail.com;
> john.fastabend@gmail.com; ast@fb.com; brouer@redhat.com;
> willemdebruijn.kernel@gmail.com; daniel@iogearbox.net;
> netdev@vger.kernel.org; michael.lundkvist@ericsson.com; Brandeburg,
> Jesse <jesse.brandeburg@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ravineet.singh@ericsson.com
> Subject: Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and
> mmap
> 
> On Tue, Mar 27, 2018 at 06:59:08PM +0200, Björn Töpel wrote:
> > @@ -30,4 +31,18 @@ struct xdp_umem_reg {
> >  	__u32 frame_headroom; /* Frame head room */  };
> >
> > +/* Pgoff for mmaping the rings */
> > +#define XDP_UMEM_PGOFF_FILL_QUEUE	0x100000000
> > +
> > +struct xdp_queue {
> > +	__u32 head_idx __attribute__((aligned(64)));
> > +	__u32 tail_idx __attribute__((aligned(64))); };
> > +
> > +/* Used for the fill and completion queues for buffers */ struct
> > +xdp_umem_queue {
> > +	struct xdp_queue ptrs;
> > +	__u32 desc[0] __attribute__((aligned(64))); };
> > +
> >  #endif /* _LINUX_IF_XDP_H */
> 
> So IIUC it's a head/tail ring of 32 bit descriptors.
> 
> In my experience (from implementing ptr_ring) this implies that head/tail
> cache lines bounce a lot between CPUs. Caching will help some. You are also
> forced to use barriers to check validity which is slow on some architectures.
> 
> If instead you can use a special descriptor value (e.g. 0) as a valid signal,
> things work much better:
> 
> - you read descriptor atomically, if it's not 0 it's fine
> - same with write - write 0 to pass it to the other side
> - there is a data dependency so no need for barriers (except on dec alpha)
> - no need for power of 2 limitations, you can make it any size you like
> - easy to resize too
> 
> architecture (if not implementation) would be shared with ptr_ring so some
> of the optimization ideas like batched updates could be lifted from there.
> 
> When I was building ptr_ring, any head/tail design underperformed storing
> valid flag with data itself. YMMV.
> 
> --
> MST

I think you are definitely right in that there are ways in which
we can improve performance here. That said, the current queue
performs slightly better than the previous one we had that was
more or less a copy of one of your first virtio 1.1 proposals
from little over a year ago. It had bidirectional queues and a
valid flag in the descriptor itself. The reason we abandoned this
was not poor performance (it was good), but a need to go to
unidirectional queues. Maybe I should have only changed that
aspect and kept the valid flag.

Anyway, I will take a look at ptr_ring and run some experiments
along the lines of what you propose to get some
numbers. Considering your experience with these kind of
structures, you are likely right. I just need to convince
myself :-).

/Magnus

^ permalink raw reply

* [PATCHv2 net] sctp: do not check port in sctp_inet6_cmp_addr
From: Xin Long @ 2018-04-12  6:24 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

pf->cmp_addr() is called before binding a v6 address to the sock. It
should not check ports, like in sctp_inet_cmp_addr.

But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
sctp_v6_cmp_addr where it also compares the ports.

This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind
multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp:
lack the check for ports in sctp_v6_cmp_addr").

This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr,
but do the proper check for both v6 addrs and v4mapped addrs.

v1->v2:
  - define __sctp_v6_cmp_addr to do the common address comparison
    used for both pf and af v6 cmp_addr.

Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/ipv6.c | 60 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f1fc48e..09aba03 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -521,46 +521,49 @@ static void sctp_v6_to_addr(union sctp_addr *addr, struct in6_addr *saddr,
 	addr->v6.sin6_scope_id = 0;
 }
 
-/* Compare addresses exactly.
- * v4-mapped-v6 is also in consideration.
- */
-static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
-			    const union sctp_addr *addr2)
+static int __sctp_v6_cmp_addr(const union sctp_addr *addr1,
+			      const union sctp_addr *addr2)
 {
 	if (addr1->sa.sa_family != addr2->sa.sa_family) {
 		if (addr1->sa.sa_family == AF_INET &&
 		    addr2->sa.sa_family == AF_INET6 &&
-		    ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) {
-			if (addr2->v6.sin6_port == addr1->v4.sin_port &&
-			    addr2->v6.sin6_addr.s6_addr32[3] ==
-			    addr1->v4.sin_addr.s_addr)
-				return 1;
-		}
+		    ipv6_addr_v4mapped(&addr2->v6.sin6_addr) &&
+		    addr2->v6.sin6_addr.s6_addr32[3] ==
+		    addr1->v4.sin_addr.s_addr)
+			return 1;
+
 		if (addr2->sa.sa_family == AF_INET &&
 		    addr1->sa.sa_family == AF_INET6 &&
-		    ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) {
-			if (addr1->v6.sin6_port == addr2->v4.sin_port &&
-			    addr1->v6.sin6_addr.s6_addr32[3] ==
-			    addr2->v4.sin_addr.s_addr)
-				return 1;
-		}
+		    ipv6_addr_v4mapped(&addr1->v6.sin6_addr) &&
+		    addr1->v6.sin6_addr.s6_addr32[3] ==
+		    addr2->v4.sin_addr.s_addr)
+			return 1;
+
 		return 0;
 	}
-	if (addr1->v6.sin6_port != addr2->v6.sin6_port)
-		return 0;
+
 	if (!ipv6_addr_equal(&addr1->v6.sin6_addr, &addr2->v6.sin6_addr))
 		return 0;
+
 	/* If this is a linklocal address, compare the scope_id. */
-	if (ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
-		if (addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id &&
-		    (addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)) {
-			return 0;
-		}
-	}
+	if ((ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) &&
+	    addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id &&
+	    addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)
+		return 0;
 
 	return 1;
 }
 
+/* Compare addresses exactly.
+ * v4-mapped-v6 is also in consideration.
+ */
+static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
+			    const union sctp_addr *addr2)
+{
+	return __sctp_v6_cmp_addr(addr1, addr2) &&
+	       addr1->v6.sin6_port == addr2->v6.sin6_port;
+}
+
 /* Initialize addr struct to INADDR_ANY. */
 static void sctp_v6_inaddr_any(union sctp_addr *addr, __be16 port)
 {
@@ -846,8 +849,8 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
 			       const union sctp_addr *addr2,
 			       struct sctp_sock *opt)
 {
-	struct sctp_af *af1, *af2;
 	struct sock *sk = sctp_opt2sk(opt);
+	struct sctp_af *af1, *af2;
 
 	af1 = sctp_get_af_specific(addr1->sa.sa_family);
 	af2 = sctp_get_af_specific(addr2->sa.sa_family);
@@ -863,10 +866,7 @@ static int sctp_inet6_cmp_addr(const union sctp_addr *addr1,
 	if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2))
 		return 1;
 
-	if (addr1->sa.sa_family != addr2->sa.sa_family)
-		return 0;
-
-	return af1->cmp_addr(addr1, addr2);
+	return __sctp_v6_cmp_addr(addr1, addr2);
 }
 
 /* Verify that the provided sockaddr looks bindable.   Common verification,
-- 
2.1.0

^ permalink raw reply related

* Re: [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock critical section
From: Eric Dumazet @ 2018-04-12  2:59 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet@gmail.com, davem@davemloft.net
  Cc: netdev@vger.kernel.org
In-Reply-To: <1523490467.3587.20.camel@mellanox.com>



On 04/11/2018 04:47 PM, Saeed Mahameed wrote:
> 
> Well if we allow devices to access HW counters via FW command
> interfaces in ndo_get_stats and by testing mlx5 where we query up to 5
> hw registers, it could take 100us, still this is way smaller than 10sec
>  :) and it is really a nice rate to fetch HW stats on demand.

If hardware stats are slower than software ones, maybe it is time to use software stats,
instead of changing the whole stack ?

There are very few devices drivers having issues like that.

^ permalink raw reply

* Re: [PATCH v2] net: dsa: b53: Using sleep-able operations in b53_switch_reset_gpio
From: Phil Reid @ 2018-04-12  2:31 UTC (permalink / raw)
  To: Jia-Ju Bai, f.fainelli, andrew, vivien.didelot; +Cc: netdev, linux-kernel
In-Reply-To: <1523497702-7200-1-git-send-email-baijiaju1990@gmail.com>

On 12/04/2018 09:48, Jia-Ju Bai wrote:
> b53_switch_reset_gpio() is never called in atomic context.
> 
> The call chain ending up at b53_switch_reset_gpio() is:
> [1] b53_switch_reset_gpio() <- b53_switch_reset() <-
>      b53_reset_switch() <- b53_setup()
> 
> b53_switch_reset_gpio() is set as ".setup" in struct dsa_switch_ops.
> This function is not called in atomic context.
> 
> Despite never getting called from atomic context, b53_switch_reset_gpio()
> calls non-sleep operations mdelay() and gpio_set_value().
> They are not necessary and can be replaced with msleep()
> and gpio_set_value_cansleep().
> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
> v2:
> * Use gpio_set_value_cansleep() to replace gpio_set_value() additionally.
>    Thanks for Florian and Phil for good advice.
> ---
>   drivers/net/dsa/b53/b53_common.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
> index 274f367..36cc60d 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -596,11 +596,11 @@ static void b53_switch_reset_gpio(struct b53_device *dev)
>   
>   	/* Reset sequence: RESET low(50ms)->high(20ms)
>   	 */
> -	gpio_set_value(gpio, 0);
> -	mdelay(50);
> +	gpio_set_value_cansleep(gpio, 0);
> +	msleep(50);
>   
> -	gpio_set_value(gpio, 1);
> -	mdelay(20);
> +	gpio_set_value_cansleep(gpio, 1);
> +	msleep(20);
>   
>   	dev->current_page = 0xff;
>   }
> 

FWIW:
Reviewed-by: Phil Reid <preid@electromag.com.au>

^ permalink raw reply

* Re: [PATCH v2] dec: tulip: de4x5: Replace mdelay with usleep_range in de4x5_hw_init
From: Jia-Ju Bai @ 2018-04-12  2:26 UTC (permalink / raw)
  To: arvindY, James Bottomley, davem, stephen, johannes.berg, dhowells
  Cc: netdev, linux-parisc, linux-kernel
In-Reply-To: <5ACEC2B6.9080904@gmail.com>



On 2018/4/12 10:21, arvindY wrote:
>
>
> On Thursday 12 April 2018 07:00 AM, Jia-Ju Bai wrote:
>>
>>
>> On 2018/4/12 0:16, James Bottomley wrote:
>>> On Wed, 2018-04-11 at 23:39 +0800, Jia-Ju Bai wrote:
>>>> de4x5_hw_init() is never called in atomic context.
>>>>
>>>> de4x5_hw_init() is only called by de4x5_pci_probe(), which is only
>>>> set as ".probe" in struct pci_driver.
>>>>
>>>> Despite never getting called from atomic context, de4x5_hw_init()
>>>> calls mdelay() to busily wait. This is not necessary and can be
>>>> replaced with usleep_range() to  avoid busy waiting.
>>>>
>>>> This is found by a static analysis tool named DCNS written by myself.
>>>> And I also manually check it.
>>> Did you actually test this?  The usual reason for wanting m/udelay is
>>> that the timing must be exact.  The driver is filled with mdelay()s for
>>> this reason.  The one you've picked on is in the init path so it won't
>>> affect the runtime in any way.  I also don't think we have the hrtimer
>>> machinery for usleep_range() to work properly on parisc, so I don't
>>> think the replacement works.
>>>
>>> James
>>>
>>
>> Hello, James.
>> Thanks for your reply :)
>>
>> I agree that usleep_range() here will not much affect the real 
>> execution of this driver.
>>
>> But I think usleep_range() can more opportunity for other threads to 
>> use the CPU core to schedule during waiting.
>> That is why I detect mdelay() that can be replaced with msleep() or 
>> usleep_range().
>>
>
> James is right, You have added all usleep_range() during system 
> boot-up time.
> During boot-up system will run as single threaded. Where this change will
> not make much sense. System first priority is match the exact timing on
> each and every boot-up.
>

Hello, Arvind.
Thanks for your reply :)

I admit I am not familiar with this driver.
I did not know this driver is only loaded during system boot-up time,
I thought this driver can be loaded as a kernel module (like many 
drivers) after system booting.
After knowing this, I admit my patch is not proper, sorry...


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* iproute2-4.16.0 no longer accepts routes via fe80::1
From: Thomas Deutschmann @ 2018-04-12  2:23 UTC (permalink / raw)
  To: netdev

Hi,

I upgraded to iproute2-4.16.0 and rebooted my system.

On start I noticed that my IPv6 default route via "fe80::1" was rejected:

> # ip route add default via ff80::1
> Error: inet address is expected rather than "ff80::1".

This works fine with <iproute2-4.16.0.

-- 
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5

^ permalink raw reply

* Re: [PATCH v2] dec: tulip: de4x5: Replace mdelay with usleep_range in de4x5_hw_init
From: arvindY @ 2018-04-12  2:21 UTC (permalink / raw)
  To: Jia-Ju Bai, James Bottomley, davem, stephen, johannes.berg,
	dhowells
  Cc: netdev, linux-parisc, linux-kernel
In-Reply-To: <8bac3385-97c6-fff1-17c6-11f5e98a039a@gmail.com>



On Thursday 12 April 2018 07:00 AM, Jia-Ju Bai wrote:
>
>
> On 2018/4/12 0:16, James Bottomley wrote:
>> On Wed, 2018-04-11 at 23:39 +0800, Jia-Ju Bai wrote:
>>> de4x5_hw_init() is never called in atomic context.
>>>
>>> de4x5_hw_init() is only called by de4x5_pci_probe(), which is only
>>> set as ".probe" in struct pci_driver.
>>>
>>> Despite never getting called from atomic context, de4x5_hw_init()
>>> calls mdelay() to busily wait. This is not necessary and can be
>>> replaced with usleep_range() to  avoid busy waiting.
>>>
>>> This is found by a static analysis tool named DCNS written by myself.
>>> And I also manually check it.
>> Did you actually test this?  The usual reason for wanting m/udelay is
>> that the timing must be exact.  The driver is filled with mdelay()s for
>> this reason.  The one you've picked on is in the init path so it won't
>> affect the runtime in any way.  I also don't think we have the hrtimer
>> machinery for usleep_range() to work properly on parisc, so I don't
>> think the replacement works.
>>
>> James
>>
>
> Hello, James.
> Thanks for your reply :)
>
> I agree that usleep_range() here will not much affect the real 
> execution of this driver.
>
> But I think usleep_range() can more opportunity for other threads to 
> use the CPU core to schedule during waiting.
> That is why I detect mdelay() that can be replaced with msleep() or 
> usleep_range().
>

James is right, You have added all usleep_range() during system boot-up 
time.
During boot-up system will run as single threaded. Where this change will
not make much sense. System first priority is match the exact timing on
each and every boot-up.

~arvind

>
> Best wishes,
> Jia-Ju Bai

^ permalink raw reply

* Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and mmap
From: Michael S. Tsirkin @ 2018-04-12  2:15 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, alexander.h.duyck, alexander.duyck,
	john.fastabend, ast, brouer, willemdebruijn.kernel, daniel,
	netdev, michael.lundkvist, jesse.brandeburg, anjali.singhai,
	qi.z.zhang, ravineet.singh
In-Reply-To: <20180327165919.17933-4-bjorn.topel@gmail.com>

On Tue, Mar 27, 2018 at 06:59:08PM +0200, Björn Töpel wrote:
> @@ -30,4 +31,18 @@ struct xdp_umem_reg {
>  	__u32 frame_headroom; /* Frame head room */
>  };
>  
> +/* Pgoff for mmaping the rings */
> +#define XDP_UMEM_PGOFF_FILL_QUEUE	0x100000000
> +
> +struct xdp_queue {
> +	__u32 head_idx __attribute__((aligned(64)));
> +	__u32 tail_idx __attribute__((aligned(64)));
> +};
> +
> +/* Used for the fill and completion queues for buffers */
> +struct xdp_umem_queue {
> +	struct xdp_queue ptrs;
> +	__u32 desc[0] __attribute__((aligned(64)));
> +};
> +
>  #endif /* _LINUX_IF_XDP_H */

So IIUC it's a head/tail ring of 32 bit descriptors.

In my experience (from implementing ptr_ring) this
implies that head/tail cache lines bounce a lot between
CPUs. Caching will help some. You are also forced to
use barriers to check validity which is slow on
some architectures.

If instead you can use a special descriptor value (e.g. 0) as
a valid signal, things work much better:

- you read descriptor atomically, if it's not 0 it's fine
- same with write - write 0 to pass it to the other side
- there is a data dependency so no need for barriers (except on dec alpha)
- no need for power of 2 limitations, you can make it any size you like
- easy to resize too

architecture (if not implementation) would be shared with ptr_ring
so some of the optimization ideas like batched updates could
be lifted from there.

When I was building ptr_ring, any head/tail design underperformed
storing valid flag with data itself. YMMV.

-- 
MST

^ permalink raw reply

* iproute2-4.16.0 no longer accepts routes via fe80::1
From: Thomas Deutschmann @ 2018-04-12  1:50 UTC (permalink / raw)
  To: netdev

Hi,

I upgraded to iproute2-4.16.0 and rebooted my system.

On start I noticed that my IPv6 default route via "fe80::1" was rejected:

> # ip route add default via ff80::1
> Error: inet address is expected rather than "ff80::1".

This works with <iproute2-4.16.0.

-- 
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5

^ permalink raw reply

* Re: [PATCH] net: dsa: b53: Replace mdelay with msleep in b53_switch_reset_gpio
From: Jia-Ju Bai @ 2018-04-12  1:49 UTC (permalink / raw)
  To: Florian Fainelli, Phil Reid, andrew, vivien.didelot; +Cc: netdev, linux-kernel
In-Reply-To: <7d376874-8078-0252-ed7e-29392a519fc8@gmail.com>



On 2018/4/12 0:19, Florian Fainelli wrote:
> On 04/11/2018 12:14 AM, Jia-Ju Bai wrote:
>>
>> On 2018/4/11 13:30, Phil Reid wrote:
>>> On 11/04/2018 09:51, Jia-Ju Bai wrote:
>>>> b53_switch_reset_gpio() is never called in atomic context.
>>>>
>>>> The call chain ending up at b53_switch_reset_gpio() is:
>>>> [1] b53_switch_reset_gpio() <- b53_switch_reset() <-
>>>>      b53_reset_switch() <- b53_setup()
>>>>
>>>> b53_switch_reset_gpio() is set as ".setup" in struct dsa_switch_ops.
>>>> This function is not called in atomic context.
>>>>
>>>> Despite never getting called from atomic context,
>>>> b53_switch_reset_gpio()
>>>> calls mdelay() to busily wait.
>>>> This is not necessary and can be replaced with msleep() to
>>>> avoid busy waiting.
>>>>
>>>> This is found by a static analysis tool named DCNS written by myself.
>>>> And I also manually check it.
>>>>
>>>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>>>> ---
>>>>    drivers/net/dsa/b53/b53_common.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/dsa/b53/b53_common.c
>>>> b/drivers/net/dsa/b53/b53_common.c
>>>> index 274f367..e070ff6 100644
>>>> --- a/drivers/net/dsa/b53/b53_common.c
>>>> +++ b/drivers/net/dsa/b53/b53_common.c
>>>> @@ -597,10 +597,10 @@ static void b53_switch_reset_gpio(struct
>>>> b53_device *dev)
>>>>        /* Reset sequence: RESET low(50ms)->high(20ms)
>>>>         */
>>>>        gpio_set_value(gpio, 0);
>>>> -    mdelay(50);
>>>> +    msleep(50);
>>>>          gpio_set_value(gpio, 1);
>>>> -    mdelay(20);
>>>> +    msleep(20);
>>>>          dev->current_page = 0xff;
>>>>    }
>>>>
>>> Would that also imply gpio_set_value could be gpio_set_value_cansleep?
>>>
>> Yes, I think gpio_set_value_cansleep() is okay here?
>> Do I need to send a V2 patch to replace gpio_set_value()?
> Yes, I would lump these two changes in the same patch since this is
> effectively about solving sleeping vs. non sleeping operations.

Okay, I have sent a V2 patch, and you can have a look :)


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* [PATCH v2] net: dsa: b53: Using sleep-able operations in b53_switch_reset_gpio
From: Jia-Ju Bai @ 2018-04-12  1:48 UTC (permalink / raw)
  To: f.fainelli, andrew, vivien.didelot, preid
  Cc: netdev, linux-kernel, Jia-Ju Bai

b53_switch_reset_gpio() is never called in atomic context.

The call chain ending up at b53_switch_reset_gpio() is:
[1] b53_switch_reset_gpio() <- b53_switch_reset() <-
    b53_reset_switch() <- b53_setup()

b53_switch_reset_gpio() is set as ".setup" in struct dsa_switch_ops.
This function is not called in atomic context.

Despite never getting called from atomic context, b53_switch_reset_gpio()
calls non-sleep operations mdelay() and gpio_set_value().
They are not necessary and can be replaced with msleep() 
and gpio_set_value_cansleep().

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
v2:
* Use gpio_set_value_cansleep() to replace gpio_set_value() additionally.
  Thanks for Florian and Phil for good advice.
---
 drivers/net/dsa/b53/b53_common.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 274f367..36cc60d 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -596,11 +596,11 @@ static void b53_switch_reset_gpio(struct b53_device *dev)
 
 	/* Reset sequence: RESET low(50ms)->high(20ms)
 	 */
-	gpio_set_value(gpio, 0);
-	mdelay(50);
+	gpio_set_value_cansleep(gpio, 0);
+	msleep(50);
 
-	gpio_set_value(gpio, 1);
-	mdelay(20);
+	gpio_set_value_cansleep(gpio, 1);
+	msleep(20);
 
 	dev->current_page = 0xff;
 }
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v2] dec: tulip: de4x5: Replace mdelay with usleep_range in de4x5_hw_init
From: Jia-Ju Bai @ 2018-04-12  1:30 UTC (permalink / raw)
  To: James Bottomley, davem, stephen, johannes.berg, arvind.yadav.cs,
	dhowells
  Cc: netdev, linux-parisc, linux-kernel
In-Reply-To: <1523463379.3221.18.camel@HansenPartnership.com>



On 2018/4/12 0:16, James Bottomley wrote:
> On Wed, 2018-04-11 at 23:39 +0800, Jia-Ju Bai wrote:
>> de4x5_hw_init() is never called in atomic context.
>>
>> de4x5_hw_init() is only called by de4x5_pci_probe(), which is only
>> set as ".probe" in struct pci_driver.
>>
>> Despite never getting called from atomic context, de4x5_hw_init()
>> calls mdelay() to busily wait. This is not necessary and can be
>> replaced with usleep_range() to  avoid busy waiting.
>>
>> This is found by a static analysis tool named DCNS written by myself.
>> And I also manually check it.
> Did you actually test this?  The usual reason for wanting m/udelay is
> that the timing must be exact.  The driver is filled with mdelay()s for
> this reason.  The one you've picked on is in the init path so it won't
> affect the runtime in any way.  I also don't think we have the hrtimer
> machinery for usleep_range() to work properly on parisc, so I don't
> think the replacement works.
>
> James
>

Hello, James.
Thanks for your reply :)

I agree that usleep_range() here will not much affect the real execution 
of this driver.

But I think usleep_range() can more opportunity for other threads to use 
the CPU core to schedule during waiting.
That is why I detect mdelay() that can be replaced with msleep() or 
usleep_range().


Best wishes,
Jia-Ju Bai

^ permalink raw reply

* RE: WARNING in kobject_add_internal
From: Yuan, Linyu (NSB - CN/Shanghai) @ 2018-04-12  0:29 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: bridge@lists.linux-foundation.org, David Miller,
	Greg Kroah-Hartman, LKML, netdev, stephen hemminger,
	syzkaller-bugs
In-Reply-To: <CACT4Y+bO2HN9Tuw2iMND9tD8578DNSOwUs7_W6Rvg-8oX5Qipw@mail.gmail.com>

Hi,

I have a question,
"can syzbot auto test each tree with newest changeset" ?

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Dmitry Vyukov
> Sent: Wednesday, April 11, 2018 10:58 PM
> To: syzbot
> Cc: bridge@lists.linux-foundation.org; David Miller; Greg Kroah-Hartman;
> LKML; netdev; stephen hemminger; syzkaller-bugs
> Subject: Re: WARNING in kobject_add_internal
> 
> On Fri, Jan 5, 2018 at 10:41 PM, syzbot
> <syzbot+e204ced820ef739d71ef5438f5e1976a874abc8d@syzkaller.appspotma
> il.com>
> wrote:
> > syzkaller has found reproducer for the following crash on
> > 89876f275e8d562912d9c238cd888b52065cf25c
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> >
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by:
> >
> syzbot+e204ced820ef739d71ef5438f5e1976a874abc8d@syzkaller.appspotmail
> .com
> > It will help syzbot understand when the bug is fixed.
> 
> #syz dup: WARNING: kobject bug in device_add
> 
> > ------------[ cut here ]------------
> > kobject_add_internal failed for   (error: -12 parent: net)
> > WARNING: CPU: 1 PID: 3494 at lib/kobject.c:244
> > kobject_add_internal+0x3f6/0xbc0 lib/kobject.c:242
> > Kernel panic - not syncing: panic_on_warn set ...
> >
> > CPU: 1 PID: 3494 Comm: syzkaller425998 Not tainted 4.15.0-rc6+ #249
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  panic+0x1e4/0x41c kernel/panic.c:183
> >  __warn+0x1dc/0x200 kernel/panic.c:547
> >  report_bug+0x211/0x2d0 lib/bug.c:184
> >  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> >  fixup_bug arch/x86/kernel/traps.c:247 [inline]
> >  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> >  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
> > RIP: 0010:kobject_add_internal+0x3f6/0xbc0 lib/kobject.c:242
> > RSP: 0018:ffff8801c53c76f0 EFLAGS: 00010286
> > RAX: dffffc0000000008 RBX: ffff8801bf5a88d8 RCX: ffffffff8159da9e
> > RDX: 0000000000000000 RSI: 1ffff10038a78e99 RDI: ffff8801c53c73f8
> > RBP: ffff8801c53c77e8 R08: 1ffff10038a78e5b R09: 0000000000000000
> > R10: ffff8801c53c74b0 R11: 0000000000000000 R12: 1ffff10038a78ee4
> > R13: 00000000fffffff4 R14: ffff8801d8359a80 R15: ffffffff86201980
> >  kobject_add_varg lib/kobject.c:366 [inline]
> >  kobject_add+0x132/0x1f0 lib/kobject.c:411
> >  device_add+0x35d/0x1650 drivers/base/core.c:1787
> >  netdev_register_kobject+0x183/0x360 net/core/net-sysfs.c:1604
> >  register_netdevice+0xb2b/0x1010 net/core/dev.c:7698
> >  tun_set_iff drivers/net/tun.c:2319 [inline]
> >  __tun_chr_ioctl+0x1d89/0x3dd0 drivers/net/tun.c:2524
> >  tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2773
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
> >  SYSC_ioctl fs/ioctl.c:701 [inline]
> >  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
> >  entry_SYSCALL_64_fastpath+0x23/0x9a
> > RIP: 0033:0x444fc9
> > RSP: 002b:00007fff53389dc8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000444fc9
> > RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000004
> > RBP: 0000000000000005 R08: 0000000000000002 R09: 0000006f00003131
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402500
> > R13: 0000000000402590 R14: 0000000000000000 R15: 0000000000000000
> >
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >

^ permalink raw reply

* [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev
In-Reply-To: <20180411235230.19135.63857.stgit@john-Precision-Tower-5810>

Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this BPF programs may not be in-use which is not
what the user would expect.

This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.

Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h  |    3 +++
 kernel/bpf/sockmap.c |    3 +--
 kernel/bpf/syscall.c |   10 +++++++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 95a7abd..8a71058 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -646,6 +646,7 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
 #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_INET)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
 int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
+void sock_map_release(struct bpf_map *map);
 #else
 static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -658,6 +659,8 @@ static inline int sock_map_prog(struct bpf_map *map,
 {
 	return -EOPNOTSUPP;
 }
+
+void sock_map_release(struct bpf_map *map) {}
 #endif
 
 /* verifier prototypes for helper functions called from eBPF programs */
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 8dd9210..cef187f 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1834,7 +1834,7 @@ static int sock_map_update_elem(struct bpf_map *map,
 	return err;
 }
 
-static void sock_map_release(struct bpf_map *map, struct file *map_file)
+void sock_map_release(struct bpf_map *map)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
 	struct bpf_prog *orig;
@@ -1858,7 +1858,6 @@ static void sock_map_release(struct bpf_map *map, struct file *map_file)
 	.map_get_next_key = sock_map_get_next_key,
 	.map_update_elem = sock_map_update_elem,
 	.map_delete_elem = sock_map_delete_elem,
-	.map_release = sock_map_release,
 };
 
 BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0244973..449a618 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -257,8 +257,16 @@ static void bpf_map_free_deferred(struct work_struct *work)
 static void bpf_map_put_uref(struct bpf_map *map)
 {
 	if (atomic_dec_and_test(&map->usercnt)) {
-		if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY)
+		switch (map->map_type) {
+		case BPF_MAP_TYPE_PROG_ARRAY:
 			bpf_fd_array_map_clear(map);
+			break;
+		case BPF_MAP_TYPE_SOCKMAP:
+			sock_map_release(map);
+			break;
+		default:
+			break;
+		}
 	}
 }
 

^ permalink raw reply related

* [bpf PATCH 2/3] bpf: sockmap, sk_wait_event needed to handle blocking cases
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev
In-Reply-To: <20180411235230.19135.63857.stgit@john-Precision-Tower-5810>

In the recvmsg handler we need to add a wait event to support the
blocking use cases. Without this we return zero and may confuse
user applications. In the wait event any data received on the
sk either via sk_receive_queue or the psock ingress list will
wake up the sock.

Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index cef187f..ffe266b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -43,6 +43,7 @@
 #include <net/tcp.h>
 #include <linux/ptr_ring.h>
 #include <net/inet_common.h>
+#include <linux/sched/signal.h>
 
 #define SOCK_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -732,6 +733,26 @@ static int bpf_exec_tx_verdict(struct smap_psock *psock,
 	return err;
 }
 
+static int bpf_wait_data(struct sock *sk,
+			 struct smap_psock *psk, int flags,
+			 long timeo, int *err)
+{
+	int rc;
+
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	rc = sk_wait_event(sk, &timeo,
+			   !list_empty(&psk->ingress) ||
+			   !skb_queue_empty(&sk->sk_receive_queue),
+			   &wait);
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+
+	return rc;
+}
+
 static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			   int nonblock, int flags, int *addr_len)
 {
@@ -755,6 +776,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 
 	lock_sock(sk);
+bytes_ready:
 	while (copied != len) {
 		struct scatterlist *sg;
 		struct sk_msg_buff *md;
@@ -809,6 +831,29 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 	}
 
+	if (!copied) {
+		long timeo;
+		int data;
+		int err = 0;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = bpf_wait_data(sk, psock, flags, timeo, &err);
+
+		if (data) {
+			if (!skb_queue_empty(&sk->sk_receive_queue)) {
+				release_sock(sk);
+				smap_release_sock(psock, sk);
+				copied = tcp_recvmsg(sk, msg, len,
+						     nonblock, flags, addr_len);
+				return copied;
+			}
+			goto bytes_ready;
+		}
+
+		if (err)
+			copied = err;
+	}
+
 	release_sock(sk);
 	smap_release_sock(psock, sk);
 	return copied;

^ permalink raw reply related

* [bpf PATCH 0/3] BPF, a couple sockmap fixes
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev

While testing sockmap with more programs (besides our test programs)
I found a couple issues.

The attached series fixes an issue where pinned maps were not
working correctly, blocking sockets returned zero, and an error
path that when the sock hit an out of memory case resulted in a
double page_put() while doing ingress redirects.

See individual patches for more details.

Thanks,
John

---

John Fastabend (3):
      bpf: sockmap, map_release does not hold refcnt for pinned maps
      bpf: sockmap, sk_wait_event needed to handle blocking cases
      bpf: sockmap, fix double put_page on ENOMEM error in redirect path

 include/linux/bpf.h  |    3 +++
 kernel/bpf/sockmap.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++----
 kernel/bpf/syscall.c |   10 +++++++++-
 3 files changed, 59 insertions(+), 5 deletions(-)

^ permalink raw reply

* [bpf PATCH 3/3] bpf: sockmap, fix double put_page on ENOMEM error in redirect path
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev
In-Reply-To: <20180411235230.19135.63857.stgit@john-Precision-Tower-5810>

In the case where the socket memory boundary is hit the redirect
path returns an ENOMEM error. However, before checking for this
condition the redirect scatterlist buffer is setup with a valid
page and length. This is never unwound so when the buffers are
released latter in the error path we do a put_page() and clear
the scatterlist fields. But, because the initial error happens
before completing the scatterlist buffer we end up with both the
original buffer and the redirect buffer pointing to the same page
resulting in duplicate put_page() calls.

To fix this simply move the initial configuration of the redirect
scatterlist buffer below the sock memory check.

Found this while running TCP_STREAM test with netperf using Cilium.

Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index ffe266b..0fefdb0 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -524,8 +524,6 @@ static int bpf_tcp_ingress(struct sock *sk, int apply_bytes,
 	i = md->sg_start;

 	do {
-		r->sg_data[i] = md->sg_data[i];
-
 		size = (apply && apply_bytes < md->sg_data[i].length) ?
 			apply_bytes : md->sg_data[i].length;

@@ -536,6 +534,7 @@ static int bpf_tcp_ingress(struct sock *sk, int apply_bytes,
 		}

 		sk_mem_charge(sk, size);
+		r->sg_data[i] = md->sg_data[i];
 		r->sg_data[i].length = size;
 		md->sg_data[i].length -= size;
 		md->sg_data[i].offset += size;

^ permalink raw reply related

* [PATCH net 4/4] nfp: flower: split and limit cmsg skb lists
From: Jakub Kicinski @ 2018-04-11 23:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren
In-Reply-To: <20180411234738.6766-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Introduce a second skb list for handling control messages and limit the
number of allowed messages. Some control messages are considered more
crucial than others, resulting in the need for a second skb list. By
splitting the list into a separate high and low priority list we can
ensure that messages on the high list get added to the head of the list
that gets processed, this however has no functional impact. Previously
there was no limit on the number of messages allowed on the queue, this
could result in the queue growing boundlessly and eventually the host
running out of memory.

Fixes: b985f870a5f0 ("nfp: process control messages in workqueue in flower app")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 38 +++++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h |  2 ++
 drivers/net/ethernet/netronome/nfp/flower/main.c |  6 ++--
 drivers/net/ethernet/netronome/nfp/flower/main.h |  8 +++--
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 8ad857eb89c6..577659f332e4 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -272,18 +272,49 @@ nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
 
 void nfp_flower_cmsg_process_rx(struct work_struct *work)
 {
+	struct sk_buff_head cmsg_joined;
 	struct nfp_flower_priv *priv;
 	struct sk_buff *skb;
 
 	priv = container_of(work, struct nfp_flower_priv, cmsg_work);
+	skb_queue_head_init(&cmsg_joined);
 
-	while ((skb = skb_dequeue(&priv->cmsg_skbs)))
+	spin_lock_bh(&priv->cmsg_skbs_high.lock);
+	skb_queue_splice_tail_init(&priv->cmsg_skbs_high, &cmsg_joined);
+	spin_unlock_bh(&priv->cmsg_skbs_high.lock);
+
+	spin_lock_bh(&priv->cmsg_skbs_low.lock);
+	skb_queue_splice_tail_init(&priv->cmsg_skbs_low, &cmsg_joined);
+	spin_unlock_bh(&priv->cmsg_skbs_low.lock);
+
+	while ((skb = __skb_dequeue(&cmsg_joined)))
 		nfp_flower_cmsg_process_one_rx(priv->app, skb);
 }
 
-void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb)
+static void
+nfp_flower_queue_ctl_msg(struct nfp_app *app, struct sk_buff *skb, int type)
 {
 	struct nfp_flower_priv *priv = app->priv;
+	struct sk_buff_head *skb_head;
+
+	if (type == NFP_FLOWER_CMSG_TYPE_PORT_REIFY ||
+	    type == NFP_FLOWER_CMSG_TYPE_PORT_MOD)
+		skb_head = &priv->cmsg_skbs_high;
+	else
+		skb_head = &priv->cmsg_skbs_low;
+
+	if (skb_queue_len(skb_head) >= NFP_FLOWER_WORKQ_MAX_SKBS) {
+		nfp_flower_cmsg_warn(app, "Dropping queued control messages\n");
+		dev_kfree_skb_any(skb);
+		return;
+	}
+
+	skb_queue_tail(skb_head, skb);
+	schedule_work(&priv->cmsg_work);
+}
+
+void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb)
+{
 	struct nfp_flower_cmsg_hdr *cmsg_hdr;
 
 	cmsg_hdr = nfp_flower_cmsg_get_hdr(skb);
@@ -307,7 +338,6 @@ void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb)
 		/* Acks from the NFP that the route is added - ignore. */
 		dev_consume_skb_any(skb);
 	} else {
-		skb_queue_tail(&priv->cmsg_skbs, skb);
-		schedule_work(&priv->cmsg_work);
+		nfp_flower_queue_ctl_msg(app, skb, cmsg_hdr->type);
 	}
 }
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 96bc0e33980c..b6c0fd053a50 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -108,6 +108,8 @@
 #define NFP_FL_IPV4_TUNNEL_TYPE		GENMASK(7, 4)
 #define NFP_FL_IPV4_PRE_TUN_INDEX	GENMASK(2, 0)
 
+#define NFP_FLOWER_WORKQ_MAX_SKBS	30000
+
 #define nfp_flower_cmsg_warn(app, fmt, args...)                         \
 	do {                                                            \
 		if (net_ratelimit())                                    \
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 6357e0720f43..ad02592a82b7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -519,7 +519,8 @@ static int nfp_flower_init(struct nfp_app *app)
 
 	app->priv = app_priv;
 	app_priv->app = app;
-	skb_queue_head_init(&app_priv->cmsg_skbs);
+	skb_queue_head_init(&app_priv->cmsg_skbs_high);
+	skb_queue_head_init(&app_priv->cmsg_skbs_low);
 	INIT_WORK(&app_priv->cmsg_work, nfp_flower_cmsg_process_rx);
 	init_waitqueue_head(&app_priv->reify_wait_queue);
 
@@ -549,7 +550,8 @@ static void nfp_flower_clean(struct nfp_app *app)
 {
 	struct nfp_flower_priv *app_priv = app->priv;
 
-	skb_queue_purge(&app_priv->cmsg_skbs);
+	skb_queue_purge(&app_priv->cmsg_skbs_high);
+	skb_queue_purge(&app_priv->cmsg_skbs_low);
 	flush_work(&app_priv->cmsg_work);
 
 	nfp_flower_metadata_cleanup(app);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index e030b3ce4510..c67e1b54c614 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -107,7 +107,10 @@ struct nfp_mtu_conf {
  * @mask_table:		Hash table used to store masks
  * @flow_table:		Hash table used to store flower rules
  * @cmsg_work:		Workqueue for control messages processing
- * @cmsg_skbs:		List of skbs for control message processing
+ * @cmsg_skbs_high:	List of higher priority skbs for control message
+ *			processing
+ * @cmsg_skbs_low:	List of lower priority skbs for control message
+ *			processing
  * @nfp_mac_off_list:	List of MAC addresses to offload
  * @nfp_mac_index_list:	List of unique 8-bit indexes for non NFP netdevs
  * @nfp_ipv4_off_list:	List of IPv4 addresses to offload
@@ -136,7 +139,8 @@ struct nfp_flower_priv {
 	DECLARE_HASHTABLE(mask_table, NFP_FLOWER_MASK_HASH_BITS);
 	DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
 	struct work_struct cmsg_work;
-	struct sk_buff_head cmsg_skbs;
+	struct sk_buff_head cmsg_skbs_high;
+	struct sk_buff_head cmsg_skbs_low;
 	struct list_head nfp_mac_off_list;
 	struct list_head nfp_mac_index_list;
 	struct list_head nfp_ipv4_off_list;
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 3/4] nfp: flower: move route ack control messages out of the workqueue
From: Jakub Kicinski @ 2018-04-11 23:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren
In-Reply-To: <20180411234738.6766-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously we processed the route ack control messages in the workqueue,
this unnecessarily loads the workqueue. We can deal with these messages
sooner as we know we are going to drop them.

Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 3735c09d2112..8ad857eb89c6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -258,9 +258,6 @@ nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
 	case NFP_FLOWER_CMSG_TYPE_ACTIVE_TUNS:
 		nfp_tunnel_keep_alive(app, skb);
 		break;
-	case NFP_FLOWER_CMSG_TYPE_TUN_NEIGH:
-		/* Acks from the NFP that the route is added - ignore. */
-		break;
 	default:
 		nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control type %u\n",
 				     type);
@@ -306,6 +303,9 @@ void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb)
 		   nfp_flower_process_mtu_ack(app, skb)) {
 		/* Handle MTU acks outside wq to prevent RTNL conflict. */
 		dev_consume_skb_any(skb);
+	} else if (cmsg_hdr->type == NFP_FLOWER_CMSG_TYPE_TUN_NEIGH) {
+		/* Acks from the NFP that the route is added - ignore. */
+		dev_consume_skb_any(skb);
 	} else {
 		skb_queue_tail(&priv->cmsg_skbs, skb);
 		schedule_work(&priv->cmsg_work);
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 2/4] nfp: print a message when mutex wait is interrupted
From: Jakub Kicinski @ 2018-04-11 23:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180411234738.6766-1-jakub.kicinski@netronome.com>

When waiting for an NFP mutex is interrupted print a message
to make root causing later error messages easier.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
index f7b958181126..cb28ac03e4ca 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
@@ -211,8 +211,11 @@ int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
 			break;
 
 		err = msleep_interruptible(timeout_ms);
-		if (err != 0)
+		if (err != 0) {
+			nfp_info(mutex->cpp,
+				 "interrupted waiting for NFP mutex\n");
 			return -ERESTARTSYS;
+		}
 
 		if (time_is_before_eq_jiffies(warn_at)) {
 			warn_at = jiffies + NFP_MUTEX_WAIT_NEXT_WARN * HZ;
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 1/4] nfp: ignore signals when communicating with management FW
From: Jakub Kicinski @ 2018-04-11 23:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180411234738.6766-1-jakub.kicinski@netronome.com>

We currently allow signals to interrupt the wait for management FW
commands.  Exiting the wait should not cause trouble, the FW will
just finish executing the command in the background and new commands
will wait for the old one to finish.

However, this may not be what users expect (Ctrl-C not actually stopping
the command).  Moreover some systems routinely request link information
with signals pending (Ubuntu 14.04 runs a landscape-sysinfo python tool
from MOTD) worrying users with errors like these:

nfp 0000:04:00.0: nfp_nsp: Error -512 waiting for code 0x0007 to start
nfp 0000:04:00.0: nfp: reading port table failed -512

Make the wait for management FW responses non-interruptible.

Fixes: 1a64821c6af7 ("nfp: add support for service processor access")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 99bb679a9801..2abee0fe3a7c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -281,8 +281,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg, u32 nsp_cpp, u64 addr,
 		if ((*reg & mask) == val)
 			return 0;

-		if (msleep_interruptible(25))
-			return -ERESTARTSYS;
+		msleep(25);

 		if (time_after(start_time, wait_until))
 			return -ETIMEDOUT;
-- 
2.16.2

^ permalink raw reply related

* [PATCH net 0/4] nfp: improve signal handing on FW waits and flower control message processing
From: Jakub Kicinski @ 2018-04-11 23:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Hi!

The first part of this set aims to improve handling of interrupted
waits.  Patch 1 makes waiting for management FW responses
uninterruptible while patch 2 adds a message when signal arrives
while waiting for an NFP mutex.  We can't interrupt execution of
FW commands so uninterruptible sleep seems reasonable there.
Exiting a wait for a mutex should be clean and have no side affects
so we are allowing to abort it.  Note that both waits have rather
large timeouts (tens of seconds).

Patches 3 and 4 improve flower offload operation under heavy load.
Currently there is no cap on the number of queued FW notifications.
Some of the notifications have to be processed from a workqueue
which may lead to very large number of messages getting queued
if workqueue never gets a chance to run.  Pieter puts a limit
on number of queued messages, tries to drop some messages we ignore
without queuing and process more important messages first.

Jakub Kicinski (2):
  nfp: ignore signals when communicating with management FW
  nfp: print a message when mutex wait is interrupted

Pieter Jansen van Vuuren (2):
  nfp: flower: move route ack control messages out of the workqueue
  nfp: flower: split and limit cmsg skb lists

 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   | 44 ++++++++++++++++++----
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  2 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  6 ++-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  8 +++-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c |  5 ++-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  3 +-
 6 files changed, 54 insertions(+), 14 deletions(-)

-- 
2.16.2

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox