* Re: [PATCH] net: ethernet: slicoss: use module_pci_driver()
From: David Miller @ 2016-12-08 18:01 UTC
To: tklauser; +Cc: LinoSanfilippo, netdev
In-Reply-To: <20161207134330.8829-1-tklauser@distanz.ch>
From: Tobias Klauser <tklauser@distanz.ch>
Date: Wed, 7 Dec 2016 14:43:30 +0100
> Use module_pci_driver() to get rid of some boilerplate code.
>
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Applied.
* Re: net: deadlock on genl_mutex
From: Dmitry Vyukov @ 2016-12-08 18:02 UTC
To: syzkaller
Cc: Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen,
Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert,
netdev, LKML, Richard Guy Briggs, netdev-owner
In-Reply-To: <CACT4Y+Zy82UAJ55VbPbVadUM92ZSx1VJCFPdhhcmj53uxZ5PXQ@mail.gmail.com>
On Thu, Dec 8, 2016 at 6:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Nov 29, 2016 at 6:59 AM, <subashab@codeaurora.org> wrote:
>>>>
>>>> Issue was reported yesterday and is under investigation.
>>>>
>>>>
>>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>>
>>>>
>>>> Thanks !
>>>
>>>
>>> Hi Dmitry
>>>
>>> Can you try the patch below with your reproducer? I haven't seen similar
>>> crashes reported after this (or even with Eric's patch).
>>
>> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
>> _not_ see this report happening anymore.
>> Thanks.
>
>
> But now I am seeing "possible deadlock" warnings involving genl_lock:
>
> [ INFO: possible circular locking dependency detected ]
> 4.9.0-rc8+ #77 Not tainted
> -------------------------------------------------------
> syz-executor7/18794 is trying to acquire lock:
> (rtnl_mutex){+.+.+.}, at: [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
> but task is already holding lock:
> (genl_mutex){+.+.+.}, at: [< inline >] genl_lock
> net/netlink/genetlink.c:31
> (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> [ 315.403815] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
> [ 315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
> [ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
> [ 315.403815] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
> [ 315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
> [ 315.403815] [< inline >] genl_lock net/netlink/genetlink.c:31
> [ 315.403815] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
> net/netlink/genetlink.c:518
> [ 315.403815] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
> net/netlink/af_netlink.c:2127
> [ 315.403815] [<ffffffff86cb7b6a>]
> __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
> [ 315.403815] [<ffffffff86cc2319>]
> genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
> [ 315.403815] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
> [ 315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
> [ 315.403815] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
> [ 315.403815] [< inline >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
> [ 315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
> [ 315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
> [ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
> [ 315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
> [ 315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
> [ 315.403815] [< inline >] new_sync_write fs/read_write.c:499
> [ 315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
> [ 315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
> [ 315.403815] [< inline >] SYSC_write fs/read_write.c:607
> [ 315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
> [ 315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> [ 315.403815] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
> [ 315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
> [ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
> [ 315.403815] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
> [ 315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
> [ 315.403815] [<ffffffff86cb7779>]
> __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
> [ 315.403815] [< inline >] netlink_dump_start
> include/linux/netlink.h:165
> [ 315.403815] [<ffffffff86d14d48>]
> ctnetlink_stat_ct_cpu+0x198/0x1e0
> net/netfilter/nf_conntrack_netlink.c:2045
> [ 315.403815] [<ffffffff86cd313e>]
> nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
> [ 315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
> [ 315.403815] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
> net/netfilter/nfnetlink.c:474
> [ 315.403815] [< inline >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
> [ 315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
> [ 315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
> [ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
> [ 315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
> [ 315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
> [ 315.403815] [< inline >] new_sync_write fs/read_write.c:499
> [ 315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
> [ 315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
> [ 315.403815] [< inline >] SYSC_write fs/read_write.c:607
> [ 315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
> [ 315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> [ 315.403815] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
> [ 315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
> [ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
> [ 315.403815] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
> [ 315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
> [ 315.403815] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
> net/netfilter/nfnetlink.c:61
> [ 315.403815] [<ffffffff86d7c5b1>]
> nf_tables_netdev_event+0x1f1/0x720
> net/netfilter/nf_tables_netdev.c:122
> [ 315.403815] [<ffffffff8149095a>]
> notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
> [ 315.403815] [< inline >] __raw_notifier_call_chain
> kernel/notifier.c:394
> [ 315.403815] [<ffffffff81490b82>]
> raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
> [ 315.403815] [<ffffffff86ae4af6>]
> call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
> [ 315.403815] [< inline >] call_netdevice_notifiers
> net/core/dev.c:1661
> [ 315.403815] [<ffffffff86af898d>]
> rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
> [ 315.403815] [<ffffffff86af8e9e>]
> rollback_registered+0xae/0x100 net/core/dev.c:6800
> [ 315.403815] [<ffffffff86af8f76>]
> unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
> [ 315.403815] [< inline >] unregister_netdevice
> include/linux/netdevice.h:2455
> [ 315.403815] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
> drivers/net/tun.c:567
> [ 315.808015] [< inline >] tun_detach drivers/net/tun.c:578
> [ 315.808015] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
> drivers/net/tun.c:2350
> [ 315.808015] [<ffffffff81a77f7e>] __fput+0x34e/0x910
> fs/file_table.c:208
> [ 315.808015] [<ffffffff81a785ca>] ____fput+0x1a/0x20
> fs/file_table.c:244
> [ 315.808015] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
> kernel/task_work.c:116
> [ 315.808015] [< inline >] exit_task_work
> include/linux/task_work.h:21
> [ 315.808015] [<ffffffff814129e2>] do_exit+0x1842/0x2650
> kernel/exit.c:828
> [ 315.808015] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
> kernel/exit.c:932
> [ 315.808015] [<ffffffff81442b43>] get_signal+0x663/0x1880
> kernel/signal.c:2307
> [ 315.808015] [<ffffffff81239b45>] do_signal+0xc5/0x2190
> arch/x86/kernel/signal.c:807
> [ 315.808015] [<ffffffff8100666a>]
> exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
> [ 315.808015] [< inline >] prepare_exit_to_usermode
> arch/x86/entry/common.c:190
> [ 315.808015] [<ffffffff81009693>]
> syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
> [ 315.808015] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6
>
> [ 315.808015] [< inline >] check_prev_add
> kernel/locking/lockdep.c:1828
> [ 315.808015] [<ffffffff8156309b>]
> check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
> [ 315.808015] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
> [ 315.808015] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
> [ 315.808015] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
> [ 315.808015] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
> [ 315.808015] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
> [ 315.808015] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
> [ 315.808015] [<ffffffff87b5cdf9>]
> nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
> [ 315.808015] [<ffffffff86cc1cd0>]
> genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
> [ 315.808015] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
> [ 315.808015] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
> [ 315.808015] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
> [ 315.808015] [< inline >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
> [ 315.808015] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
> [ 315.808015] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
> [ 315.808015] [< inline >] sock_sendmsg_nosec net/socket.c:621
> [ 315.808015] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
> [ 315.808015] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
> [ 315.808015] [<ffffffff81a6f9a3>]
> do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
> [ 315.808015] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
> fs/read_write.c:872
> [ 315.808015] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
> fs/read_write.c:911
> [ 315.808015] [<ffffffff81a73075>] do_writev+0x115/0x2d0
> fs/read_write.c:944
> [ 315.808015] [< inline >] SYSC_writev fs/read_write.c:1017
> [ 315.808015] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
> fs/read_write.c:1014
> [ 315.808015] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> other info that might help us debug this:
>
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
> *** DEADLOCK ***
>
> 2 locks held by syz-executor7/18794:
> #0: (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
> net/netlink/genetlink.c:670
> #1: (genl_mutex){+.+.+.}, at: [< inline >] genl_lock
> net/netlink/genetlink.c:31
> #1: (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
>
> stack backtrace:
> CPU: 0 PID: 18794 Comm: syz-executor7 Not tainted 4.9.0-rc8+ #77
> Hardware name: Google Google/Google, BIOS Google 01/01/2011
> ffff88004add6468 ffffffff834c44f9 ffffffff00000000 1ffff100095bac20
> ffffed00095bac18 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
> [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
> kernel/locking/lockdep.c:1202
> [< inline >] check_prev_add kernel/locking/lockdep.c:1828
> [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
> [< inline >] validate_chain kernel/locking/lockdep.c:2265
> [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
> [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
> [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
> [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
> [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
> [<ffffffff87b5cdf9>] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
> [<ffffffff86cc1cd0>] genl_family_rcv_msg+0x780/0x1070
> net/netlink/genetlink.c:631
> [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
> [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
> [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
> [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
> [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
> [< inline >] sock_sendmsg_nosec net/socket.c:621
> [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
> [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
> [<ffffffff81a6f9a3>] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
> [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
> [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0 fs/read_write.c:911
> [<ffffffff81a73075>] do_writev+0x115/0x2d0 fs/read_write.c:944
> [< inline >] SYSC_writev fs/read_write.c:1017
> [<ffffffff81a7682c>] SyS_writev+0x2c/0x40 fs/read_write.c:1014
> [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
Probably a related one:
[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor5/5777 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [< inline >] genl_lock
net/netlink/genetlink.c:31
(genl_mutex){+.+.+.}, at: [<ffffffff86cc0c26>]
genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
but task is already holding lock:
(nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
[ 158.966653] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 158.966653] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 158.966653] [<ffffffff86cb7779>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
[ 158.966653] [< inline >] netlink_dump_start
include/linux/netlink.h:165
[ 158.966653] [<ffffffff86d1395f>]
ctnetlink_get_ct_unconfirmed+0x17f/0x220
net/netfilter/nf_conntrack_netlink.c:1369
[ 158.966653] [<ffffffff86cd313e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
[ 158.966653] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 158.966653] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
[ 158.966653] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 158.966653] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 158.966653] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 158.966653] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 158.966653] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 158.966653] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 158.966653] [< inline >] new_sync_write fs/read_write.c:499
[ 158.966653] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 158.966653] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 158.966653] [< inline >] SYSC_write fs/read_write.c:607
[ 158.966653] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 158.966653] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 158.966653] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 158.966653] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 158.966653] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
[ 158.966653] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
[ 158.966653] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
[ 158.966653] [< inline >] __raw_notifier_call_chain
kernel/notifier.c:394
[ 158.966653] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
[ 158.966653] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
[ 158.966653] [< inline >] call_netdevice_notifiers
net/core/dev.c:1661
[ 158.966653] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
[ 158.966653] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
[ 158.966653] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
[ 158.966653] [< inline >] unregister_netdevice
include/linux/netdevice.h:2455
[ 158.966653] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
[ 158.966653] [< inline >] tun_detach drivers/net/tun.c:578
[ 158.966653] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
[ 158.966653] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
[ 158.966653] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
[ 158.966653] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
[ 158.966653] [< inline >] exit_task_work
include/linux/task_work.h:21
[ 158.966653] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
[ 158.966653] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
[ 159.308048] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
[ 159.308048] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
[ 159.308048] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
[ 159.308048] [< inline >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
[ 159.308048] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 159.308048] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6
[ 159.308048] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 159.308048] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 159.308048] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
[ 159.308048] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
[ 159.308048] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
[ 159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 159.308048] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 159.308048] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 159.308048] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
[ 159.308048] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
[ 159.308048] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
[ 159.308048] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
[ 159.308048] [< inline >] SYSC_writev fs/read_write.c:1017
[ 159.308048] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
[ 159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 159.308048] [< inline >] check_prev_add
kernel/locking/lockdep.c:1828
[ 159.308048] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[ 159.308048] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 159.308048] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 159.308048] [< inline >] genl_lock net/netlink/genetlink.c:31
[ 159.308048] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
[ 159.308048] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
[ 159.308048] [<ffffffff86cb7b6a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
[ 159.308048] [<ffffffff86cc2319>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
[ 159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 159.308048] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 159.308048] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 159.308048] [< inline >] new_sync_write fs/read_write.c:499
[ 159.308048] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 159.308048] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 159.308048] [< inline >] SYSC_write fs/read_write.c:607
[ 159.308048] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
other info that might help us debug this:
Chain exists of:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(nlk->cb_mutex);
                               lock(&table[i].mutex);
                               lock(nlk->cb_mutex);
  lock(genl_mutex);
*** DEADLOCK ***
2 locks held by syz-executor5/5777:
#0: (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
#1: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084
stack backtrace:
CPU: 1 PID: 5777 Comm: syz-executor5 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
ffff88005fe363e8 ffffffff834c44f9 ffffffff00000001 1ffff1000bfc6c10
ffffed000bfc6c08 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
0000000000000000 0000000000000000 0000000000000000 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
[< inline >] check_prev_add kernel/locking/lockdep.c:1828
[<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[< inline >] validate_chain kernel/locking/lockdep.c:2265
[<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[< inline >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
[<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127
[<ffffffff86cb7b6a>] __netlink_dump_start+0x4ea/0x760
net/netlink/af_netlink.c:2217
[<ffffffff86cc2319>] genl_family_rcv_msg+0xdc9/0x1070
net/netlink/genetlink.c:586
[<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
[<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
[<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
[< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
[<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
[<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
[< inline >] sock_sendmsg_nosec net/socket.c:621
[<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
[<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
[< inline >] new_sync_write fs/read_write.c:499
[<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
[<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
[< inline >] SYSC_write fs/read_write.c:607
[<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
[<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
* Re: [PATCH 0/1] NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
From: David Miller @ 2016-12-08 18:02 UTC
To: dnlplm; +Cc: bjorn, oliver, netdev, linux-usb
In-Reply-To: <1481116068-32691-1-git-send-email-dnlplm@gmail.com>
From: Daniele Palmas <dnlplm@gmail.com>
Date: Wed, 7 Dec 2016 14:07:47 +0100
> Telit LE922A MBIM based composition does not work properly
> with altsetting toggle done in cdc_ncm_bind_common.
>
> This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
> to avoid this procedure that, instead, is mandatory for
> other modems.
>
> References:
> https://www.spinics.net/lists/linux-usb/msg149249.html
> https://www.spinics.net/lists/linux-usb/msg149819.html
>
> Thanks to Bjørn for the productive discussion and feedback!
Patch applied, thanks.
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Eric Dumazet @ 2016-12-08 18:02 UTC
To: Tom Herbert; +Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CALx6S34e_5KW3cdxS_yNXwhYuK2FQe=6+9=yTCVDsg6f2vx87g@mail.gmail.com>
On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:
> Of course that would only help on systems where no one enables encaps,
> i.e. it looks good in the simple benchmarks, but in real life if just
> one socket enables encap everyone else takes the hit. Alternatively,
> maybe we could do early demux when we do the lookup in GRO to
> eliminate the extra lookup?
Well, if you do the lookup in GRO, won't it be done for every incoming
MSS instead of once per GRO packet?
Anyway, the flooded UDP sockets out there are not normally connected
ones.
* Re: [PATCH v3 0/6] net: stmmac: make DMA programmable burst length more configurable
From: David Miller @ 2016-12-08 18:07 UTC
To: niklas.cassel; +Cc: netdev, niklass, devicetree, linux-kernel, linux-doc
In-Reply-To: <1481120409-18103-1-git-send-email-niklass@axis.com>
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Wed, 7 Dec 2016 15:20:02 +0100
> Make DMA programmable burst length more configurable in the stmmac driver.
>
> This is done by adding support for independent pbl for tx/rx through DT.
> Finer-grained tuning of pbl is possible thanks to a DT property saying
> that we should NOT multiply pbl values by x8/x4 in hardware.
>
> All new DT properties are optional, and designed so that they will not
> affect any existing DT configurations.
Series applied to net-next, thanks.
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Eric Dumazet @ 2016-12-08 18:07 UTC
To: Tom Herbert; +Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CALx6S34e_5KW3cdxS_yNXwhYuK2FQe=6+9=yTCVDsg6f2vx87g@mail.gmail.com>
On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:
> Of course that would only help on systems where no one enables encaps,
> i.e. it looks good in the simple benchmarks, but in real life if just
> one socket enables encap everyone else takes the hit.
Well, in real life most Linux hosts do not use any UDP encapsulation.
Or if they do, they may still have to handle a lot of UDP traffic
that does not hit a tunnel in the kernel.
Anyway, the difference I saw between GRO on and off was caused by
copybreak in the mlx4 driver.
GRO off --> mlx4 uses copybreak for small messages (all protocols)
GRO on --> no copybreak for native protocols (IP+TCP, IP+UDP)
Doing the lookup twice is not that expensive if the first two
cache lines of the socket stay shared (mostly read).
* Re: [net-next] macsec: remove first zero and add attribute name in comments
From: David Miller @ 2016-12-08 18:08 UTC
To: zhangshengju; +Cc: netdev
In-Reply-To: <1481122929-19147-1-git-send-email-zhangshengju@cmss.chinamobile.com>
From: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Date: Wed, 7 Dec 2016 23:02:09 +0800
> Remove first zero for add, and use full attribute name in comments.
>
> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Applied.
* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-08 18:18 UTC
To: Michael S. Tsirkin
Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
netdev, brouer
In-Reply-To: <20161208080647-mutt-send-email-mst@kernel.org>
On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
>> This adds support for the XDP_TX action to virtio_net. When an XDP
>> program is run and returns the XDP_TX action the virtio_net XDP
>> implementation will transmit the packet on a TX queue that aligns
>> with the current CPU that the XDP packet was processed on.
>>
>> Before sending the packet the header is zeroed. Also XDP is expected
>> to handle checksum correctly so no checksum offload support is
>> provided.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>> drivers/net/virtio_net.c | 99 +++++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 92 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 28b1196..8e5b13c 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>  	return skb;
>>  }
>>
>> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
>> +			     struct receive_queue *rq,
>> +			     struct send_queue *sq,
>> +			     struct xdp_buff *xdp)
>> +{
>> +	struct page *page = virt_to_head_page(xdp->data);
>> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
>> +	unsigned int num_sg, len;
>> +	void *xdp_sent;
>> +	int err;
>> +
>> +	/* Free up any pending old buffers before queueing new ones. */
>> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		struct page *sent_page = virt_to_head_page(xdp_sent);
>> +
>> +		if (vi->mergeable_rx_bufs)
>> +			put_page(sent_page);
>> +		else
>> +			give_pages(rq, sent_page);
>> +	}
>
> Looks like this is the only place where you do virtqueue_get_buf.
> No interrupt handler?
> This means that if you fill up the queue, nothing will clean it
> and things will get stuck.
Hmm, OK, so the callbacks should be implemented to do this, with a pair
of virtqueue_enable_cb_prepare()/virtqueue_disable_cb() used to enable
and disable callbacks when packets are enqueued.
Also, can the same condition happen in the normal xmit path via
start_xmit()? It looks like free_old_xmit_skbs(), for example, is only
called when a packet is sent, so could we end up holding on to skbs in
this case? I don't see free_old_xmit_skbs() being called from any callbacks.
> Can this be the issue you saw?
Nope, see below: I was mishandling the big_packets page cleanup path in
the error case.
>
>
>> +
>> + /* Zero header and leave csum up to XDP layers */
>> + hdr = xdp->data;
>> + memset(hdr, 0, vi->hdr_len);
>> +
>> + num_sg = 1;
>> + sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
>> + err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>> + xdp->data, GFP_ATOMIC);
>> + if (unlikely(err)) {
>> + if (vi->mergeable_rx_bufs)
>> + put_page(page);
>> + else
>> + give_pages(rq, page);
>> + } else if (!vi->mergeable_rx_bufs) {
>> + /* If not mergeable bufs must be big packets so cleanup pages */
>> + give_pages(rq, (struct page *)page->private);
>> + page->private = 0;
>> + }
>> +
>> + virtqueue_kick(sq->vq);
>
> Is this unconditional kick a work-around for hang
> we could not figure out yet?
I tracked the original issue down to how I handled the big_packet page
cleanups.
> I guess this helps because it just slows down the guest.
> I don't much like it ...
I left it like this, copying the pattern in the balloon and input drivers. I
can change it back to the previous pattern, where it is only called if
there are no errors. It has been running fine with the old pattern for
an hour or so now.
.John
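Michael's point is that completed TX buffers are only reaped at the top of the next virtnet_xdp_xmit() call, so once the ring is full and no further XDP_TX happens, nothing frees them. A toy userspace model of that behavior, assuming a hypothetical `struct tx_ring` with `ring_complete()`, `ring_reclaim()` and `ring_xmit()` (illustrative names, not virtio APIs). It also kicks only after a successful enqueue, the pattern John mentions going back to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4 /* hypothetical: tiny so the ring fills quickly */

/* Hypothetical TX virtqueue model: a slot is busy while its buffer is
 * posted to the device or completed but not yet reclaimed by the driver. */
struct tx_ring {
	size_t posted;    /* owned by the device, not yet completed */
	size_t completed; /* finished by the device, not yet freed */
	size_t freed;     /* reclaimed by the driver so far */
	size_t kicks;     /* device notifications sent */
};

/* Device side: complete up to n posted buffers. */
static void ring_complete(struct tx_ring *r, size_t n)
{
	if (n > r->posted)
		n = r->posted;
	r->posted -= n;
	r->completed += n;
}

/* Driver side: reap all completions (the virtqueue_get_buf() loop's role). */
static void ring_reclaim(struct tx_ring *r)
{
	r->freed += r->completed;
	r->completed = 0;
}

/* Reclaim first, then enqueue; kick only if the enqueue succeeded.
 * Returns false when no slot is free (caller frees/recycles the page). */
static bool ring_xmit(struct tx_ring *r)
{
	ring_reclaim(r);
	if (r->posted >= RING_SIZE)
		return false;
	r->posted++;
	r->kicks++;
	return true;
}
```

In this model, a completion callback that calls ring_reclaim() would remove the dependency on a later transmit to free buffers.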
^ permalink raw reply
* Re: [PATCH net-next] net: rfs: add a jump label
From: David Miller @ 2016-12-08 18:19 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481128150.4930.25.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 08:29:10 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> RFS is not commonly used, so add a jump label to avoid some conditionals
> in fast path.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, but I wonder how effective this will really be in the long run.
^ permalink raw reply
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Paolo Abeni @ 2016-12-08 18:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481218739-27089-5-git-send-email-edumazet@google.com>
On Thu, 2016-12-08 at 09:38 -0800, Eric Dumazet wrote:
> If udp_recvmsg() constantly releases sk_rmem_alloc
> for every read packet, it gives opportunity for
> producers to immediately grab spinlocks and desperately
> try adding another packet, causing false sharing.
>
> We can add a simple heuristic to give the signal
> by batches of ~25 % of the queue capacity.
>
> This patch considerably increases performance under
> flood by about 50 %, since the thread draining the queue
> is no longer slowed by false sharing.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/udp.h | 3 +++
> net/ipv4/udp.c | 11 +++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index d1fd8cd39478..c0f530809d1f 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -79,6 +79,9 @@ struct udp_sock {
> int (*gro_complete)(struct sock *sk,
> struct sk_buff *skb,
> int nhoff);
> +
> + /* This field is dirtied by udp_recvmsg() */
> + int forward_deficit;
> };
>
> static inline struct udp_sock *udp_sk(const struct sock *sk)
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 880cd3d84abf..f0096d088104 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1177,8 +1177,19 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
> /* fully reclaim rmem/fwd memory allocated for skb */
> static void udp_rmem_release(struct sock *sk, int size, int partial)
> {
> + struct udp_sock *up = udp_sk(sk);
> int amt;
>
> + if (likely(partial)) {
> + up->forward_deficit += size;
> + size = up->forward_deficit;
> + if (size < (sk->sk_rcvbuf >> 2))
> + return;
> + } else {
> + size += up->forward_deficit;
> + }
> + up->forward_deficit = 0;
> +
> atomic_sub(size, &sk->sk_rmem_alloc);
> sk->sk_forward_alloc += size;
> amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
Nice one! This sounds like a relevant improvement!
I'm wondering if it may cause regressions with small value of
sk_rcvbuf ?!? e.g. with:
netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
I'm sorry, I fear I will not be able to do any tests before next week.
Cheers,
Paolo
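The batching heuristic in the quoted patch can be modeled outside the kernel to count how often the shared counter is written. A simplified userspace sketch: `struct model_sock` and `rmem_release_model()` are hypothetical, only `rcvbuf`, `rmem_alloc` and `forward_deficit` mirror fields from the patch, `partial` is reduced to a bool, and the sk_forward_alloc accounting is left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified socket with only the fields the heuristic uses. */
struct model_sock {
	int rcvbuf;          /* stands in for sk->sk_rcvbuf */
	int rmem_alloc;      /* stands in for the atomic sk->sk_rmem_alloc */
	int forward_deficit; /* bytes consumed but not yet given back */
	int releases;        /* how often rmem_alloc was actually written */
};

/* Partial releases accumulate until the deficit reaches 25% of rcvbuf;
 * a full (non-partial) release always flushes the accumulated deficit. */
static void rmem_release_model(struct model_sock *sk, int size, bool partial)
{
	if (partial) {
		sk->forward_deficit += size;
		size = sk->forward_deficit;
		if (size < (sk->rcvbuf >> 2))
			return; /* deferred: producers do not see a change yet */
	} else {
		size += sk->forward_deficit;
	}
	sk->forward_deficit = 0;
	sk->rmem_alloc -= size; /* atomic_sub() in the kernel */
	sk->releases++;
}
```

With a 16384-byte buffer the threshold is 4096 bytes, so 100 partial releases of 1024 bytes write rmem_alloc 25 times instead of 100.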
^ permalink raw reply
* Re: [PATCH net-next V3 0/7] liquidio VF data path
From: David Miller @ 2016-12-08 18:25 UTC (permalink / raw)
To: rvatsavayi; +Cc: netdev
In-Reply-To: <1481129677-10586-1-git-send-email-rvatsavayi@caviumnetworks.com>
From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Wed, 7 Dec 2016 08:54:30 -0800
> Following is the V3 patch series that adds support for VF
> data path related features. It also has the following changes
> related to previous comments:
> 1) Remove unnecessary "void *" casting.
> 2) Remove inline for functions and let gcc decide.
>
> Please apply the patches in order, as some of them
> depend on earlier patches.
Series applied.
^ permalink raw reply
* Re: [PATCH net-next] udp: under rx pressure, try to condense skbs
From: David Miller @ 2016-12-08 18:26 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481131173.4930.36.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 09:19:33 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> Under UDP flood, many softirq producers try to add packets to
> UDP receive queue, and one user thread is burning one cpu trying
> to dequeue packets as fast as possible.
>
> Two parts of the per-packet cost are:
> - copying payload from kernel space to user space,
> - freeing memory pieces associated with skb.
>
> If the socket is under pressure, softirq handler(s) can try to pull
> the packet payload into skb->head if it fits.
>
> This means the softirq handler(s) can free/reuse the page fragment
> immediately, instead of letting udp_recvmsg() do this hundreds of usec
> later, possibly from another node.
>
>
> Additional gains :
> - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
> - We avoid cache line misses at copyout() time and consume_skb() time,
> and avoid one put_page() with potential alien freeing on NUMA hosts.
>
> This comes at the cost of a copy, bounded to available tail room, which
> is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
> than necessary)
>
> This patch gave me about 5 % increase in throughput in my tests.
>
> The skb_condense() helper could probably be used in other contexts.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
This is isolated to UDP, and would be easy to revert if it causes
problems. So applied, thanks Eric.
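The idea behind skb_condense() (pull a small paged payload into the linear area when the tailroom allows, so the page fragment is freed right away and truesize shrinks) can be illustrated with a toy buffer. This is a hypothetical userspace model: `struct toy_skb`, `toy_condense()` and the fixed sizes are assumptions, not the kernel's sk_buff layout or the real helper:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEAD_SIZE 256  /* hypothetical linear area size */
#define FRAG_SIZE 2048 /* hypothetical page-fragment size */

/* Toy buffer: a linear head plus at most one paged fragment. */
struct toy_skb {
	unsigned char head[HEAD_SIZE];
	size_t head_len;     /* bytes used in head */
	unsigned char *frag; /* paged payload, or NULL */
	size_t frag_len;     /* bytes used in frag */
	size_t truesize;     /* memory charged to the socket */
};

/* If the fragment fits in the tailroom, copy it into head, free it at
 * once and shrink truesize. Returns 1 if condensed, 0 otherwise. */
static int toy_condense(struct toy_skb *skb)
{
	size_t tailroom = HEAD_SIZE - skb->head_len;

	if (!skb->frag || skb->frag_len > tailroom)
		return 0; /* nothing to pull, or the copy would not fit */
	memcpy(skb->head + skb->head_len, skb->frag, skb->frag_len);
	skb->head_len += skb->frag_len;
	free(skb->frag); /* freed by the producer, not later by the reader */
	skb->frag = NULL;
	skb->frag_len = 0;
	skb->truesize -= FRAG_SIZE;
	return 1;
}
```

The copy is bounded by the tailroom, which matches the trade-off described in the commit message: a small copy in exchange for an earlier free and a smaller truesize.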
^ permalink raw reply
* Re: [PATCH net v2 1/1] driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
From: Mahesh Bandewar (महेश बंडेवार) @ 2016-12-08 18:29 UTC (permalink / raw)
To: fgao; +Cc: David Miller, Eric Dumazet, linux-netdev, gfree.wind
In-Reply-To: <1481167018-559-1-git-send-email-fgao@ikuai8.com>
On Wed, Dec 7, 2016 at 7:16 PM, <fgao@ikuai8.com> wrote:
> From: Gao Feng <fgao@ikuai8.com>
>
> When ipvlan_set_port_mode() fails in ipvlan_link_new(), the ipvlan dev
> also needs to be unlinked from its upper dev.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
> v2: Rename the label to unlink_netdev, per Mahesh Bandewar
> v1: Initial patch
>
> drivers/net/ipvlan/ipvlan_main.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 0fef178..dfbc4ef 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -546,13 +546,15 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> }
> err = ipvlan_set_port_mode(port, mode);
> if (err) {
> - goto unregister_netdev;
> + goto unlink_netdev;
> }
>
> list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
> netif_stacked_transfer_operstate(phy_dev, dev);
> return 0;
>
> +unlink_netdev:
> + netdev_upper_dev_unlink(phy_dev, dev);
> unregister_netdev:
> unregister_netdevice(dev);
> destroy_ipvlan_port:
> --
> 1.9.1
>
>
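The fix follows the usual kernel error-unwinding idiom: cleanup labels are laid out in reverse order of setup, so a failure jumps to the label that undoes only what has already succeeded. A minimal userspace sketch, with hypothetical `register_dev()`, `link_upper()` and a simulated mode-setting failure standing in for register_netdevice(), netdev_upper_dev_link() and ipvlan_set_port_mode():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Which setup steps are currently in effect (inspected by the caller). */
struct state {
	bool registered;
	bool linked;
};

static int register_dev(struct state *s) { s->registered = true; return 0; }
static void unregister_dev(struct state *s) { s->registered = false; }
static int link_upper(struct state *s) { s->linked = true; return 0; }
static void unlink_upper(struct state *s) { s->linked = false; }

/* fail_mode simulates the port-mode step returning an error. */
static int link_new(struct state *s, bool fail_mode)
{
	int err;

	err = register_dev(s);
	if (err)
		return err;
	err = link_upper(s);
	if (err)
		goto unregister;
	if (fail_mode) {
		err = -EINVAL;
		goto unlink; /* the label the patch introduces */
	}
	return 0;

unlink:
	unlink_upper(s); /* undo the most recent successful step first */
unregister:
	unregister_dev(s);
	return err;
}
```

Jumping to `unregister` directly after the mode failure, as before the patch, would leave `linked` set in this model.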
^ permalink raw reply
* Re: [PATCH] net: pch_gbe: Fix TX RX descriptor accesses for big endian systems
From: David Miller @ 2016-12-08 18:29 UTC (permalink / raw)
To: hassan.naveed; +Cc: netdev, paul.burton, matt.redfearn, fw, romieu
In-Reply-To: <1481133534-26224-1-git-send-email-hassan.naveed@imgtec.com>
From: Hassan Naveed <hassan.naveed@imgtec.com>
Date: Wed, 7 Dec 2016 09:58:54 -0800
> Fix pch_gbe driver for ethernet operations for a big endian CPU.
> Values written to and read from transmit and receive descriptors
> in the pch_gbe driver are byte swapped from the perspective of a
> big endian CPU, since the ethernet controller always operates in
> little endian mode. Rectify this by appropriately byte swapping
> these descriptor field values in the driver software.
>
> Signed-off-by: Hassan Naveed <hassan.naveed@imgtec.com>
> Reviewed-by: Paul Burton <paul.burton@imgtec.com>
> Reviewed-by: Matt Redfearn <matt.redfearn@imgtec.com>
As explained by Francois, you need to use the proper endian types in
the descriptor data structure.
Then please run sparse with endianness checking enabled on the build
of the driver.
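In kernel terms, "proper endian types" means declaring the descriptor fields as `__le16`/`__le32` and converting with cpu_to_le32()/le32_to_cpu(), so that sparse can flag any access that skips the conversion. The same idea as a portable userspace sketch, assuming a hypothetical two-field `struct rx_desc_le` (not the real pch_gbe layout) and hand-written helpers standing in for the kernel conversions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor as the device sees it: multi-byte fields are
 * always little endian in memory, whatever the CPU's byte order. */
struct rx_desc_le {
	uint8_t buffer_addr[4];
	uint8_t length[2];
};

/* Byte-wise stand-ins for cpu_to_le32()/le32_to_cpu() and the 16-bit
 * variants; they give the same result on big and little endian hosts. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}
```

In the driver itself, typed `__le32` fields are preferable to byte arrays: sparse's endianness checks then catch a plain `desc->field = value` assignment that lacks a conversion.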
^ permalink raw reply
* Re: [PATCH net-next] net: do not read sk_drops if application does not care
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481133936.4930.51.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 10:05:36 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> sk_drops can be a frequently written field, so do not read it unless
> the application has shown interest.
>
> Note that sk_drops can be read via inet_diag, so applications
> can avoid getting this info from every received packet.
>
> In the future, 'reading' sk_drops might require folding per node or per
> cpu fields, and thus become even more expensive than today.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
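On Linux, an application shows interest in the drop counter by enabling the SO_RXQ_OVFL socket option; the kernel can then attach the count as SOL_SOCKET/SO_RXQ_OVFL ancillary data on received datagrams. A minimal Linux-only sketch under the assumption that no cmsg is attached while the count is zero (so the helper reports 0 in that case); `recv_with_drops()` and `udp_self_test()` are hypothetical helper names:

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40 /* value from Linux asm-generic/socket.h */
#endif

/* Receive one datagram; *drops gets the SO_RXQ_OVFL count, or 0 if absent. */
static ssize_t recv_with_drops(int fd, void *buf, size_t len, uint32_t *drops)
{
	union {
		char buf[CMSG_SPACE(sizeof(uint32_t))];
		struct cmsghdr align; /* keeps the control buffer aligned */
	} u;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	ssize_t n = recvmsg(fd, &msg, 0);

	*drops = 0;
	if (n < 0)
		return n; /* control buffer was not filled in */
	for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SO_RXQ_OVFL)
			memcpy(drops, CMSG_DATA(c), sizeof(*drops));
	return n;
}

/* Bind to 127.0.0.1, opt in to drop counts, send one datagram to self and
 * read it back. Returns the reported drop count, or -1 on failure. */
static long udp_self_test(void)
{
	struct sockaddr_in a = { .sin_family = AF_INET };
	socklen_t alen = sizeof(a);
	int one = 1;
	char out[] = "ping", in[16];
	uint32_t drops;
	long ret = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)&a, sizeof(a)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&a, &alen) == 0 &&
	    setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one)) == 0 &&
	    sendto(fd, out, sizeof(out), 0, (struct sockaddr *)&a, alen) ==
		    (ssize_t)sizeof(out) &&
	    recv_with_drops(fd, in, sizeof(in), &drops) == (ssize_t)sizeof(out))
		ret = (long)drops;
	close(fd);
	return ret;
}
```

Applications that never set the option do not ask for the counter, which is the case the patch optimizes; as the commit message notes, inet_diag remains available to read drops out of band.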
^ permalink raw reply
* Re: [PATCH net-next] net: rfs: add a jump label
From: Eric Dumazet @ 2016-12-08 18:31 UTC (permalink / raw)
To: David Miller; +Cc: netdev, pabeni
In-Reply-To: <20161208.131900.434329215014851517.davem@davemloft.net>
On Thu, 2016-12-08 at 13:19 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 07 Dec 2016 08:29:10 -0800
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > RFS is not commonly used, so add a jump label to avoid some conditionals
> > in fast path.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Applied, but I wonder how effective this will really be in the long run.
I guess this applies to almost all jump labels.
As soon as the attribute is per namespace, we can no longer use them.
The cost of a conditional really depends on the complexity of the
expression (including cache line misses).
The TCP stack might benefit from jump labels, for example for
sysctl_tcp_low_latency, which is often set to 1 on hosts mostly using
epoll()/poll()/select() instead of blocking read()/recvmsg().
^ permalink raw reply
* Re: [PATCH net-next] bpf: fix state equivalence
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
To: ast; +Cc: daniel, jbacik, tgraf, netdev
In-Reply-To: <1481137079-2205635-1-git-send-email-ast@fb.com>
From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 7 Dec 2016 10:57:59 -0800
> Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> and 484611357c19 ("bpf: allow access into map value arrays") by themselves
> are correct, but in combination they make state equivalence ignore 'id' field
> of the register state which can lead to accepting invalid program.
>
> Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> Fixes: 484611357c19 ("bpf: allow access into map value arrays")
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Applied.
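The 'id' matters because registers holding copies of the same PTR_TO_MAP_VALUE_OR_NULL result share an id, so one NULL check covers all of them. If pruning ignores ids, a state where two registers come from different lookups can be treated as equivalent to an already-verified state where they were copies. A deliberately simplified sketch with hypothetical `struct reg_state`/`struct vstate` types and a toy `states_equal()`; this is not the verifier's actual data structures or comparison logic:

```c
#include <assert.h>
#include <stdbool.h>

enum reg_type { NOT_INIT, SCALAR, PTR_TO_MAP_VALUE_OR_NULL };

/* Toy register: type plus the id linking copies of one lookup result. */
struct reg_state {
	enum reg_type type;
	int id;
};

/* Toy verifier state with two registers. */
struct vstate {
	struct reg_state regs[2];
};

/* Returns true if 'cur' may be pruned because 'old' was already proven
 * safe. check_id == false mimics the bug: ids are not compared. */
static bool states_equal(const struct vstate *old, const struct vstate *cur,
			 bool check_id)
{
	for (int i = 0; i < 2; i++) {
		if (old->regs[i].type != cur->regs[i].type)
			return false;
		if (check_id && old->regs[i].id != cur->regs[i].id)
			return false;
	}
	return true;
}
```

Here `old` has both registers sharing id 1 (one NULL check validates both), while `cur` gives them different ids; only the id-aware comparison refuses to prune.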
^ permalink raw reply
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:36 UTC (permalink / raw)
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481221491.6120.11.camel@redhat.com>
On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Nice one! This sounds like a relevant improvement!
>
> I'm wondering if it may cause regressions with small value of
> sk_rcvbuf ?!? e.g. with:
>
> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>
Possibly; then we can simply refine the test to:
size = up->forward_deficit;
if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
	return;
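The refinement keeps deferring only while more packets are still queued, so a reader that has just drained the queue always gives the memory back. The decision in isolation, as a sketch assuming the tested queue is the socket's receive queue; `should_defer()` and its parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* deficit:     bytes accumulated in forward_deficit so far
 * rcvbuf:      socket receive buffer size (sk_rcvbuf)
 * queue_empty: true once the receive queue has been drained */
static bool should_defer(int deficit, int rcvbuf, bool queue_empty)
{
	/* Defer only below 25% of rcvbuf AND while packets remain queued. */
	return deficit < (rcvbuf >> 2) && !queue_empty;
}
```

With a tiny SO_RCVBUF the queue is often empty after each read, so under this rule the release should happen on nearly every packet, which is the case Paolo asked about.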
^ permalink raw reply
* Re: [PATCH v4 net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Saeed Mahameed @ 2016-12-08 18:36 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
Daniel Borkmann, David Miller, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed,
Tariq Toukan, Kernel Team
In-Reply-To: <1481154794-2311034-3-git-send-email-kafai@fb.com>
On Thu, Dec 8, 2016 at 1:53 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> When XDP is active in mlx4, mlx4 is using one page/pkt.
> At the same time (i.e. when XDP is active), it is currently
> limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN)
> which is 1514 in x86. AFAICT, we can at least raise the MTU
> limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this
> patch is doing. It will be useful in the next patch which
> allows XDP program to extend the packet by adding new header(s).
>
> Note: In the earlier XDP patches, there is already existing guard
> to ensure the page/pkt scheme only applies when XDP is active
> in mlx4.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
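The limits in the commit message can be checked with a little arithmetic, assuming ETH_HLEN = 14, VLAN_HLEN = 4 and 4 KiB pages (typical x86). FRAG_SZ0 = 1536 is inferred here from the stated 1514 result rather than quoted from the driver:

```c
#include <assert.h>

#define ETH_HLEN  14   /* Ethernet header */
#define VLAN_HLEN 4    /* one 802.1Q tag */
#define PAGE_SZ   4096 /* assumption: stands in for PAGE_SIZE on x86 */
#define FRAG_SZ0  1536 /* assumption: inferred from the 1514 figure */

/* Old XDP cap: first RX fragment minus L2 header and two VLAN tags. */
static int old_xdp_max_mtu(void)
{
	return FRAG_SZ0 - ETH_HLEN - 2 * VLAN_HLEN;
}

/* New cap: one page per packet minus the same headers. */
static int new_xdp_max_mtu(void)
{
	return PAGE_SZ - ETH_HLEN - 2 * VLAN_HLEN;
}
```

That is 1536 - 22 = 1514 before the patch and 4096 - 22 = 4074 after it.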
^ permalink raw reply
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:38 UTC (permalink / raw)
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <CANn89iL8r9UD=sGn3WxVFZ+Z_QJYYM6aXxCFvafwvJ-bEtNhKQ@mail.gmail.com>
On Thu, Dec 8, 2016 at 10:36 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>
>> Nice one! This sounds like a relevant improvement!
>>
>> I'm wondering if it may cause regressions with small value of
>> sk_rcvbuf ?!? e.g. with:
>>
>> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>>
>
> Possibly; then we can simply refine the test to:
>
> size = up->forward_deficit;
> if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
> return;
BTW, I tried :
lpaa6:~# ./netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  4608    1024   10.00     4499400      0    3685.88
  2560           10.00     4498670           3685.28
So it looks like it is working.
However I have no doubt there might be a corner case for tiny
SO_RCVBUF values or for some message sizes.
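The throughput column follows from the other columns: messages × message size × 8 bits ÷ elapsed time, in units of 10^6 bits/sec. A small check; the last-digit differences are probably because the printed 10.00 s elapsed time is rounded:

```c
#include <assert.h>

/* Throughput in 10^6 bits/sec from message count, size and elapsed time. */
static double mbit_per_sec(double messages, double msg_bytes, double secs)
{
	return messages * msg_bytes * 8.0 / secs / 1e6;
}
```

For the send side, 4499400 × 1024 × 8 / 10 s gives about 3685.91, against the reported 3685.88.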
^ permalink raw reply
* Re: [PATCH] drivers: net: xgene: initialize slots
From: Iyappan Subramanian @ 2016-12-08 18:44 UTC (permalink / raw)
To: Colin King; +Cc: Keyur Chudgar, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20161208111754.9711-1-colin.king@canonical.com>
On Thu, Dec 8, 2016 at 3:17 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Static analysis using cppcheck detected that slots was uninitialized.
> Fix this by initializing it to buf_pool->slots - 1.
>
> Found using static analysis with CoverityScan, CID #1387620
>
> Fixes: a9380b0f7be818 ("drivers: net: xgene: Add support for Jumbo frame")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> index 6c7eea8..899163c 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> @@ -636,6 +636,7 @@ static void xgene_enet_free_pagepool(struct xgene_enet_desc_ring *buf_pool,
>
> dev = ndev_to_dev(buf_pool->ndev);
> head = buf_pool->head;
> + slots = buf_pool->slots - 1;
>
> for (i = 0; i < 4; i++) {
> frag_size = xgene_enet_get_data_len(le64_to_cpu(desc[i ^ 1]));
Thanks, Colin.
Dan Carpenter <dan.carpenter@oracle.com> already posted this fix and it
has been accepted.
http://marc.info/?l=linux-netdev&m=148110980224343&w=2
> --
> 2.10.2
>
^ permalink raw reply
* Re: [PATCH v4 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
From: Saeed Mahameed @ 2016-12-08 18:47 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
Daniel Borkmann, David Miller, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed,
Tariq Toukan, Kernel Team
In-Reply-To: <1481154794-2311034-4-git-send-email-kafai@fb.com>
On Thu, Dec 8, 2016 at 1:53 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head()
> support. This patch only affects the code path when XDP is active.
>
> After testing, the tx_dropped counter is incremented if the xdp_prog sends
> more than wire MTU.
>
I guess this is the HW tx_dropped counter. As a future improvement, I
suggest dropping such packets in SW to save CPU and HW cycles.
One more point in favor: if they are dropped in SW, those packets' pages
will be recycled immediately.
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
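Saeed's suggestion amounts to checking the frame length in software before posting an XDP_TX frame, so oversized frames never reach the hardware and their pages can be recycled at once. A hypothetical sketch: `struct xdp_tx_stats`, `xdp_tx_fits()` and the length rule (MTU plus Ethernet header plus one VLAN tag) are assumptions for illustration, not mlx4 code:

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN  14
#define VLAN_HLEN 4

struct xdp_tx_stats {
	unsigned long sent;
	unsigned long sw_dropped; /* counted in SW instead of HW tx_dropped */
	unsigned long recycled;   /* pages returned to the RX page cache */
};

/* Assumed rule: largest frame the wire accepts is MTU + L2 header + one tag. */
static bool xdp_tx_fits(unsigned int frame_len, unsigned int mtu)
{
	return frame_len <= mtu + ETH_HLEN + VLAN_HLEN;
}

/* Post the frame, or drop it in SW and recycle its page immediately. */
static void xdp_tx_or_drop(struct xdp_tx_stats *st, unsigned int frame_len,
			   unsigned int mtu)
{
	if (!xdp_tx_fits(frame_len, mtu)) {
		st->sw_dropped++;
		st->recycled++;
		return;
	}
	st->sent++;
}
```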
^ permalink raw reply
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Paolo Abeni @ 2016-12-08 18:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Willem de Bruijn, Tom Herbert
In-Reply-To: <1481120956.4930.7.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, 2016-12-07 at 06:29 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-07 at 08:57 +0100, Paolo Abeni wrote:
>
> > We have some experimental patches to implement GRO for plain UDP
> > connected sockets, using frag_list to preserve the individual skb len,
> > and deliver the packet to user space individually. With that I got
> > ~3mpps with a single queue/user space sink - before the recent udp
> > improvements. I would like to present these patches on netdev soon (no
> > sooner than next week, anyway).
> >
>
> Make sure you handle properly all netfilter helpers :(
Thank you for the heads-up!
UDP GRO will be enabled by a specific netdev feature bit, disabled by
default, so it should not affect any setup by default.
> Keeping frag_list means you keep one sk_buff per segment, so this really
> looks like a legacy UDP server (like a DNS server) wont benefit from
> this anyway.
I'm sorry, I do not follow.
UDP GRO will require a connected socket, so a DNS server is very likely
not a candidate. The use case is an application using long-lived UDP
sockets that carry a lot of traffic, like FIX protocol feeds over UDP.
Thank you,
Paolo
^ permalink raw reply
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:52 UTC (permalink / raw)
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <CANn89i+10fwMQ+oqs2AgVfE9CHnpZqecN_NxVqobyzD1riyMfg@mail.gmail.com>
On Thu, Dec 8, 2016 at 10:38 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Dec 8, 2016 at 10:36 AM, Eric Dumazet <edumazet@google.com> wrote:
>> On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>>
>>> Nice one! This sounds like a relevant improvement!
>>>
>>> I'm wondering if it may cause regressions with small value of
>>> sk_rcvbuf ?!? e.g. with:
>>>
>>> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>>>
>>
>> Possibly; then we can simply refine the test to:
>>
>> size = up->forward_deficit;
>> if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
>> return;
>
I will also add this patch:
This makes sure our changes to sk_forward_alloc won't be slowed down
because producers see the change to sk_rmem_alloc too soon.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8400d6954558..6bdcbe103390 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1191,13 +1191,14 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
}
up->forward_deficit = 0;
- atomic_sub(size, &sk->sk_rmem_alloc);
sk->sk_forward_alloc += size;
amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
sk->sk_forward_alloc -= amt;
if (amt)
__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
+
+ atomic_sub(size, &sk->sk_rmem_alloc);
}
/* Note: called with sk_receive_queue.lock held.
^ permalink raw reply related
* Re: [PATCH 3/6] net: ethernet: ti: cpts: add support of cpts HW_TS_PUSH
From: Grygorii Strashko @ 2016-12-08 19:04 UTC (permalink / raw)
To: Richard Cochran
Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-omap-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok
In-Reply-To: <20161203232130.GA17944@netboy>
On 12/03/2016 05:21 PM, Richard Cochran wrote:
> On Mon, Nov 28, 2016 at 05:04:25PM -0600, Grygorii Strashko wrote:
>> This also changes the overflow polling period when the HW_TS_PUSH feature
>> is enabled: the overflow check work will be scheduled more often (every
>> 200ms) for proper HW_TS_PUSH event reporting.
>
> For proper reporting, you should make use of the interrupt. The small
> fifo (16 iirc) could very well overflow in 200 ms. The interrupt
> handler should read out the entire fifo at each interrupt.
>
Huh. This does not seem like a good idea, because the MISC IRQ will be
triggered for *any* CPTS event and there is no way to enable it just for
HW_TS_PUSH. So this doesn't work well with the current code for RX/TX
timestamping (which uses polling mode), and it adds runtime overhead in
net RX/TX by triggering more interrupts.
Maybe the overflow check/polling timeout can be made configurable (module parameter).
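The trade-off can be put in numbers: a FIFO of depth D drained every P ms overflows once events arrive faster than D / P. A small sketch assuming the 16-entry depth Richard recalls and the 200 ms period from the patch; `fifo_overflows()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>

/* True if more events can arrive in one polling period than the FIFO holds.
 * Compares rate * period against depth in integer millisecond units. */
static bool fifo_overflows(unsigned int depth, unsigned int period_ms,
			   unsigned int events_per_sec)
{
	return (unsigned long)events_per_sec * period_ms >
	       (unsigned long)depth * 1000UL;
}
```

With depth 16 and a 200 ms period the limit is 80 events/s; a configurable period, as proposed, moves that threshold rather than removing it.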
--
regards,
-grygorii
^ permalink raw reply