Netdev List
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Paolo Abeni @ 2016-12-08 17:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Willem de Bruijn, Tom Herbert
In-Reply-To: <1481120791.4930.4.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2016-12-07 at 06:26 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-07 at 08:57 +0100, Paolo Abeni wrote:
> > On Tue, 2016-12-06 at 22:47 -0800, Eric Dumazet wrote:
> > > On Tue, 2016-12-06 at 19:32 -0800, Eric Dumazet wrote:
> > > > A follow up patch will provide a static_key (Jump Label) since most
> > > > hosts do not even use RFS.
> > > 
> > > Speaking of static_key, it appears we now have GRO on UDP, and this
> > > consumes a considerable amount of cpu cycles.
> > > 
> > > Turning off GRO gives me 20% more packets on my single UDP
> > > socket (1.2 Mpps instead of 1.0 Mpps).
> > 
> > I also see an improvement in single-flow tests when disabling GRO, but
> > on a smaller scale (~5% if I recall correctly).
> 
> Was it on a NUMA host ?

I'm using a single-socket host with 12 cores/24 threads and 16 RX
queues, but my data is old. I'll re-run the test on top of current net-next.

Paolo


* Re: [Intel-wired-lan] [PATCH 0/3] i40e: Support for XDP
From: John Fastabend @ 2016-12-08 17:52 UTC (permalink / raw)
  To: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan
  Cc: netdev, Björn Töpel, magnus.karlsson,
	Alexei Starovoitov
In-Reply-To: <20161208170022.11555-1-bjorn.topel@gmail.com>

On 16-12-08 09:00 AM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> This series adds XDP support for i40e-based NICs.
> 
> The first patch adds XDP_RX support, the second XDP_TX support and the
> last patch makes it possible to change an XDP program without
> rebuilding the rings.
> 
> 
> Björn
> 
> 
> Björn Töpel (3):
>   i40e: Initial support for XDP
>   i40e: Add XDP_TX support
>   i40e: Don't reset/rebuild rings on XDP program swap
> 
>  drivers/net/ethernet/intel/i40e/i40e.h         |  18 +
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   3 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c    | 358 +++++++++++++++++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 445 +++++++++++++++++++++----
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   7 +
>  5 files changed, 715 insertions(+), 116 deletions(-)
> 

Hi Jeff,

These are for the Intel driver net-next tree, per our off-list email.

Thanks!
John


* Re: [PATCH net] phy: Don't increment MDIO bus refcount unless it's a different owner
From: Florian Fainelli @ 2016-12-08 17:54 UTC (permalink / raw)
  To: Johan Hovold; +Cc: netdev, rmk+kernel, andrew
In-Reply-To: <20161208170127.GJ31573@localhost>

On 12/08/2016 09:01 AM, Johan Hovold wrote:
> On Thu, Dec 08, 2016 at 08:47:54AM -0800, Florian Fainelli wrote:
>> On 12/08/2016 08:27 AM, Johan Hovold wrote:
>>> On Tue, Dec 06, 2016 at 08:54:43PM -0800, Florian Fainelli wrote:
>>>> Commit 3e3aaf649416 ("phy: fix mdiobus module safety") fixed the way we
>>>> dealt with MDIO bus module reference count, but sort of introduced a
>>>> regression in that, if an Ethernet driver registers its own MDIO bus
>>>> driver, as is common, we will end up with the Ethernet driver's
>>>> module->refcnt set to 1, thus preventing this driver from ever being removed.
>>>>
>>>> Fix this by comparing the network device's device driver owner against
>>>> the MDIO bus driver owner, and only if they are different, increment the
>>>> MDIO bus module refcount.
>>>>
>>>> Fixes: 3e3aaf649416 ("phy: fix mdiobus module safety")
>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>>> ---
>>>> Russell,
>>>>
>>>> I verified this against the ethoc driver primarily (on a TS7300 board)
>>>> and bcmgenet.
>>>>
>>>> Thanks!
>>>>
>>>>  drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 1a4bf8acad78..c4ceb082e970 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -857,11 +857,17 @@ EXPORT_SYMBOL(phy_attached_print);
>>>>  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>>  		      u32 flags, phy_interface_t interface)
>>>>  {
>>>> +	struct module *ndev_owner = dev->dev.parent->driver->owner;
>>>
>>> Is this really safe? A driver does not need to set a parent device, and
>>> in that case you get a NULL-deref here (I tried using cpsw).
>>
>> Humm, cpsw does call SET_NETDEV_DEV() which should take care of that, is
>> the call made too late? Do you have an example oops?
> 
> Sorry if I was being unclear, cpsw does set a parent device, but there
> are network drivers that do not. Perhaps such drivers will never hit this
> code path, but I can't say for sure, and everything appears to work for
> cpsw if you comment out that SET_NETDEV_DEV (well, at least before this
> patch).

You were clear; I just had not understood that you had exercised this
with cpsw to see whether it was safe in all conditions.

> 
>> I don't mind safeguarding this with a check against dev->dev.parent, but
>> I would like to fix the drivers where relevant too, since
>> SET_NETDEV_DEV() should really be called, otherwise a number of things
>> just don't work
> 
> I grepped for register_netdev and think I saw a number of drivers
> which do not call SET_NETDEV_DEV.
> 
> Again, perhaps they will never hit this path, but thought I should ask.

You are absolutely right, this is a potential problem. So far I have found
two legitimate drivers that do not call SET_NETDEV_DEV (lantiq_etop.c
and cpmac.c, both now fixed), plus Freescale's FMAN driver, where I have a
hard time understanding what it does with mac_dev->net_dev...

Thanks!
-- 
Florian
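
For illustration, here is a minimal user-space sketch of the guarded owner
comparison discussed in this thread. It is not the patch itself: the struct
definitions are mock stand-ins that carry only the fields used here, and the
helper names ndev_owner_or_null() and mdio_needs_module_ref() are
hypothetical. It assumes that a netdev without a parent device should simply
fall back to taking the MDIO bus module reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields used below. */
struct module { const char *name; };
struct device_driver { struct module *owner; };
struct device { struct device *parent; struct device_driver *driver; };
struct net_device { struct device dev; };

/*
 * Return the module owning the netdev's parent driver, or NULL when the
 * driver never set a parent (no SET_NETDEV_DEV()) or the parent has no
 * bound driver. This avoids the NULL dereference raised in review.
 */
static struct module *ndev_owner_or_null(const struct net_device *ndev)
{
	const struct device *parent = ndev->dev.parent;

	if (!parent || !parent->driver)
		return NULL;
	return parent->driver->owner;
}

/*
 * True when the MDIO bus module needs its own reference, i.e. when it is
 * owned by a different module than the Ethernet driver (or the Ethernet
 * driver's owner cannot be determined).
 */
static bool mdio_needs_module_ref(const struct net_device *ndev,
				  const struct module *bus_owner)
{
	return ndev_owner_or_null(ndev) != bus_owner;
}
```

With this shape, a driver that never calls SET_NETDEV_DEV() yields a NULL
owner instead of a NULL dereference, and an Ethernet driver that registers
its own MDIO bus is not pinned by its own reference.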


* Re: [PATCH] net: ethernet: slicoss: use module_pci_driver()
From: David Miller @ 2016-12-08 18:01 UTC (permalink / raw)
  To: tklauser; +Cc: LinoSanfilippo, netdev
In-Reply-To: <20161207134330.8829-1-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Wed,  7 Dec 2016 14:43:30 +0100

> Use module_pci_driver() to get rid of some boilerplate code.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.


* Re: net: deadlock on genl_mutex
From: Dmitry Vyukov @ 2016-12-08 18:02 UTC (permalink / raw)
  To: syzkaller
  Cc: Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen,
	Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert,
	netdev, LKML, Richard Guy Briggs, netdev-owner
In-Reply-To: <CACT4Y+Zy82UAJ55VbPbVadUM92ZSx1VJCFPdhhcmj53uxZ5PXQ@mail.gmail.com>

On Thu, Dec 8, 2016 at 6:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Nov 29, 2016 at 6:59 AM,  <subashab@codeaurora.org> wrote:
>>>>
>>>> Issue was reported yesterday and is under investigation.
>>>>
>>>>
>>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>>
>>>>
>>>> Thanks !
>>>
>>>
>>> Hi Dmitry
>>>
>>> Can you try the patch below with your reproducer? I haven't seen similar
>>> crashes reported after this (or even with Eric's patch).
>>
>> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
>> _not_ see this report happening anymore.
>> Thanks.
>
>
> But now I am seeing "possible deadlock" warnings involving genl_lock:
>
> [ INFO: possible circular locking dependency detected ]
> 4.9.0-rc8+ #77 Not tainted
> -------------------------------------------------------
> syz-executor7/18794 is trying to acquire lock:
>  (rtnl_mutex){+.+.+.}, at: [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
> but task is already holding lock:
>  (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
> net/netlink/genetlink.c:31
>  (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<     inline     >] genl_lock net/netlink/genetlink.c:31
>        [  315.403815] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
> net/netlink/genetlink.c:518
>        [  315.403815] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
> net/netlink/af_netlink.c:2127
>        [  315.403815] [<ffffffff86cb7b6a>]
> __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
>        [  315.403815] [<ffffffff86cc2319>]
> genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
>        [  315.403815] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
>        [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.403815] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
>        [  315.403815] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
>        [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
>        [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
>        [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
>        [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
>        [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<ffffffff86cb7779>]
> __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
>        [  315.403815] [<     inline     >] netlink_dump_start
> include/linux/netlink.h:165
>        [  315.403815] [<ffffffff86d14d48>]
> ctnetlink_stat_ct_cpu+0x198/0x1e0
> net/netfilter/nf_conntrack_netlink.c:2045
>        [  315.403815] [<ffffffff86cd313e>]
> nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
>        [  315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.403815] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
> net/netfilter/nfnetlink.c:474
>        [  315.403815] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.403815] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.403815] [<     inline     >] new_sync_write fs/read_write.c:499
>        [  315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
>        [  315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
> fs/read_write.c:560
>        [  315.403815] [<     inline     >] SYSC_write fs/read_write.c:607
>        [  315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
> fs/read_write.c:599
>        [  315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
>        [  315.403815] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.403815] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.403815] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.403815] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.403815] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
> net/netfilter/nfnetlink.c:61
>        [  315.403815] [<ffffffff86d7c5b1>]
> nf_tables_netdev_event+0x1f1/0x720
> net/netfilter/nf_tables_netdev.c:122
>        [  315.403815] [<ffffffff8149095a>]
> notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
>        [  315.403815] [<     inline     >] __raw_notifier_call_chain
> kernel/notifier.c:394
>        [  315.403815] [<ffffffff81490b82>]
> raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
>        [  315.403815] [<ffffffff86ae4af6>]
> call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
>        [  315.403815] [<     inline     >] call_netdevice_notifiers
> net/core/dev.c:1661
>        [  315.403815] [<ffffffff86af898d>]
> rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
>        [  315.403815] [<ffffffff86af8e9e>]
> rollback_registered+0xae/0x100 net/core/dev.c:6800
>        [  315.403815] [<ffffffff86af8f76>]
> unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
>        [  315.403815] [<     inline     >] unregister_netdevice
> include/linux/netdevice.h:2455
>        [  315.403815] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
> drivers/net/tun.c:567
>        [  315.808015] [<     inline     >] tun_detach drivers/net/tun.c:578
>        [  315.808015] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
> drivers/net/tun.c:2350
>        [  315.808015] [<ffffffff81a77f7e>] __fput+0x34e/0x910
> fs/file_table.c:208
>        [  315.808015] [<ffffffff81a785ca>] ____fput+0x1a/0x20
> fs/file_table.c:244
>        [  315.808015] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
> kernel/task_work.c:116
>        [  315.808015] [<     inline     >] exit_task_work
> include/linux/task_work.h:21
>        [  315.808015] [<ffffffff814129e2>] do_exit+0x1842/0x2650
> kernel/exit.c:828
>        [  315.808015] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
> kernel/exit.c:932
>        [  315.808015] [<ffffffff81442b43>] get_signal+0x663/0x1880
> kernel/signal.c:2307
>        [  315.808015] [<ffffffff81239b45>] do_signal+0xc5/0x2190
> arch/x86/kernel/signal.c:807
>        [  315.808015] [<ffffffff8100666a>]
> exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
>        [  315.808015] [<     inline     >] prepare_exit_to_usermode
> arch/x86/entry/common.c:190
>        [  315.808015] [<ffffffff81009693>]
> syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
>        [  315.808015] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6
>
>        [  315.808015] [<     inline     >] check_prev_add
> kernel/locking/lockdep.c:1828
>        [  315.808015] [<ffffffff8156309b>]
> check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
>        [  315.808015] [<     inline     >] validate_chain
> kernel/locking/lockdep.c:2265
>        [  315.808015] [<ffffffff81569576>]
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>        [  315.808015] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>        [  315.808015] [<     inline     >] __mutex_lock_common
> kernel/locking/mutex.c:521
>        [  315.808015] [<ffffffff88195bcf>]
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>        [  315.808015] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
>        [  315.808015] [<ffffffff87b5cdf9>]
> nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
>        [  315.808015] [<ffffffff86cc1cd0>]
> genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
>        [  315.808015] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
>        [  315.808015] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>        [  315.808015] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
>        [  315.808015] [<     inline     >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>        [  315.808015] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>        [  315.808015] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>        [  315.808015] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>        [  315.808015] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>        [  315.808015] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
> net/socket.c:829
>        [  315.808015] [<ffffffff81a6f9a3>]
> do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>        [  315.808015] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
> fs/read_write.c:872
>        [  315.808015] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
> fs/read_write.c:911
>        [  315.808015] [<ffffffff81a73075>] do_writev+0x115/0x2d0
> fs/read_write.c:944
>        [  315.808015] [<     inline     >] SYSC_writev fs/read_write.c:1017
>        [  315.808015] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
> fs/read_write.c:1014
>        [  315.808015] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> other info that might help us debug this:
>
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***
>
> 2 locks held by syz-executor7/18794:
>  #0:  (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
> net/netlink/genetlink.c:670
>  #1:  (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
> net/netlink/genetlink.c:31
>  #1:  (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
>
> stack backtrace:
> CPU: 0 PID: 18794 Comm: syz-executor7 Not tainted 4.9.0-rc8+ #77
> Hardware name: Google Google/Google, BIOS Google 01/01/2011
>  ffff88004add6468 ffffffff834c44f9 ffffffff00000000 1ffff100095bac20
>  ffffed00095bac18 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
>  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
>  [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
> kernel/locking/lockdep.c:1202
>  [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
>  [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
>  [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
>  [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>  [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
>  [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
>  [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>  [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
>  [<ffffffff87b5cdf9>] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
>  [<ffffffff86cc1cd0>] genl_family_rcv_msg+0x780/0x1070
> net/netlink/genetlink.c:631
>  [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
>  [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
>  [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
>  [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
>  [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
>  [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>  [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [<ffffffff81a6f9a3>] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>  [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
>  [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0 fs/read_write.c:911
>  [<ffffffff81a73075>] do_writev+0x115/0x2d0 fs/read_write.c:944
>  [<     inline     >] SYSC_writev fs/read_write.c:1017
>  [<ffffffff81a7682c>] SyS_writev+0x2c/0x40 fs/read_write.c:1014
>  [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
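
To make the report above easier to follow, here is a small user-space sketch
of the kind of lock-order check that lockdep performs. It is a toy model
only: the enum names, the adjacency matrix and the helper functions are all
hypothetical, and real lockdep tracks lock classes and far more state. The
three recorded edges are read from the dependency chain in the trace
(NFNL_MUTEX stands for the nfnetlink subsystem mutex taken by nfnl_lock()).

```c
#include <assert.h>
#include <stdbool.h>

enum lock_id { GENL_MUTEX, RTNL_MUTEX, NFNL_MUTEX, CB_MUTEX, NR_LOCKS };

/* held_then_taken[a][b] is true when some path takes b while holding a. */
static bool held_then_taken[NR_LOCKS][NR_LOCKS];

static void record_dependency(enum lock_id held, enum lock_id taken)
{
	assert(held < NR_LOCKS && taken < NR_LOCKS);
	held_then_taken[held][taken] = true;
}

/* Depth-first search: is 'target' reachable from 'from' via recorded edges? */
static bool reaches(enum lock_id from, enum lock_id target, bool *visited)
{
	if (from == target)
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (held_then_taken[from][next] && !visited[next] &&
		    reaches((enum lock_id)next, target, visited))
			return true;
	}
	return false;
}

/*
 * Taking 'taken' while holding 'held' closes a cycle if 'held' is already
 * reachable from 'taken' through previously recorded dependencies.
 */
static bool would_deadlock(enum lock_id held, enum lock_id taken)
{
	bool visited[NR_LOCKS] = { false };

	return reaches(taken, held, visited);
}

/* The three existing dependencies listed in the report. */
static void record_reported_chain(void)
{
	record_dependency(CB_MUTEX, GENL_MUTEX);   /* netlink_dump -> genl_lock_dumpit */
	record_dependency(NFNL_MUTEX, CB_MUTEX);   /* nfnetlink_rcv_msg -> __netlink_dump_start */
	record_dependency(RTNL_MUTEX, NFNL_MUTEX); /* netdevice notifier -> nfnl_lock */
}
```

After record_reported_chain(), would_deadlock(GENL_MUTEX, RTNL_MUTEX) returns
true, which corresponds to nl80211_pre_doit() calling rtnl_lock() with
genl_mutex held.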



Probably a related one:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor5/5777 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [<     inline     >] genl_lock
net/netlink/genetlink.c:31
 (genl_mutex){+.+.+.}, at: [<ffffffff86cc0c26>]
genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
but task is already holding lock:
 (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

       [  158.966653] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  158.966653] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  158.966653] [<ffffffff86cb7779>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
       [  158.966653] [<     inline     >] netlink_dump_start
include/linux/netlink.h:165
       [  158.966653] [<ffffffff86d1395f>]
ctnetlink_get_ct_unconfirmed+0x17f/0x220
net/netfilter/nf_conntrack_netlink.c:1369
       [  158.966653] [<ffffffff86cd313e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
       [  158.966653] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  158.966653] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
       [  158.966653] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  158.966653] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  158.966653] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  158.966653] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  158.966653] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  158.966653] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  158.966653] [<     inline     >] new_sync_write fs/read_write.c:499
       [  158.966653] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  158.966653] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  158.966653] [<     inline     >] SYSC_write fs/read_write.c:607
       [  158.966653] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  158.966653] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  158.966653] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  158.966653] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  158.966653] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
       [  158.966653] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
       [  158.966653] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
       [  158.966653] [<     inline     >] __raw_notifier_call_chain
kernel/notifier.c:394
       [  158.966653] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
       [  158.966653] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
       [  158.966653] [<     inline     >] call_netdevice_notifiers
net/core/dev.c:1661
       [  158.966653] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
       [  158.966653] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
       [  158.966653] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
       [  158.966653] [<     inline     >] unregister_netdevice
include/linux/netdevice.h:2455
       [  158.966653] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
       [  158.966653] [<     inline     >] tun_detach drivers/net/tun.c:578
       [  158.966653] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
       [  158.966653] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
       [  158.966653] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
       [  158.966653] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
       [  158.966653] [<     inline     >] exit_task_work
include/linux/task_work.h:21
       [  158.966653] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
       [  158.966653] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
       [  159.308048] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
       [  159.308048] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
       [  159.308048] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
       [  159.308048] [<     inline     >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
       [  159.308048] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
       [  159.308048] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6

       [  159.308048] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  159.308048] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  159.308048] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
       [  159.308048] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
       [  159.308048] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
       [  159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  159.308048] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  159.308048] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  159.308048] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
       [  159.308048] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
       [  159.308048] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
       [  159.308048] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
       [  159.308048] [<     inline     >] SYSC_writev fs/read_write.c:1017
       [  159.308048] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
       [  159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

       [  159.308048] [<     inline     >] check_prev_add
kernel/locking/lockdep.c:1828
       [  159.308048] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
       [  159.308048] [<     inline     >] validate_chain
kernel/locking/lockdep.c:2265
       [  159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
       [  159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
       [  159.308048] [<     inline     >] __mutex_lock_common
kernel/locking/mutex.c:521
       [  159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
       [  159.308048] [<     inline     >] genl_lock net/netlink/genetlink.c:31
       [  159.308048] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
       [  159.308048] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
       [  159.308048] [<ffffffff86cb7b6a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
       [  159.308048] [<ffffffff86cc2319>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
       [  159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
       [  159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
       [  159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
       [  159.308048] [<     inline     >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
       [  159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
       [  159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
       [  159.308048] [<     inline     >] sock_sendmsg_nosec net/socket.c:621
       [  159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
       [  159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
       [  159.308048] [<     inline     >] new_sync_write fs/read_write.c:499
       [  159.308048] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
       [  159.308048] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
       [  159.308048] [<     inline     >] SYSC_write fs/read_write.c:607
       [  159.308048] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
       [  159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

Chain exists of:
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nlk->cb_mutex);
                               lock(&table[i].mutex);
                               lock(nlk->cb_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor5/5777:
 #0:  (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
 #1:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084

stack backtrace:
CPU: 1 PID: 5777 Comm: syz-executor5 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
 ffff88005fe363e8 ffffffff834c44f9 ffffffff00000001 1ffff1000bfc6c10
 ffffed000bfc6c08 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
 0000000000000000 0000000000000000 0000000000000000 dffffc0000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
 [<     inline     >] check_prev_add kernel/locking/lockdep.c:1828
 [<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
 [<     inline     >] validate_chain kernel/locking/lockdep.c:2265
 [<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
 [<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
 [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
 [<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [<     inline     >] genl_lock net/netlink/genetlink.c:31
 [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
 [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127
 [<ffffffff86cb7b6a>] __netlink_dump_start+0x4ea/0x760
net/netlink/af_netlink.c:2217
 [<ffffffff86cc2319>] genl_family_rcv_msg+0xdc9/0x1070
net/netlink/genetlink.c:586
 [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
 [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
 [<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
 [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
 [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
 [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
 [<     inline     >] new_sync_write fs/read_write.c:499
 [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
 [<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
 [<     inline     >] SYSC_write fs/read_write.c:607
 [<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
 [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

^ permalink raw reply

* Re: [PATCH 0/1] NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
From: David Miller @ 2016-12-08 18:02 UTC (permalink / raw)
  To: dnlplm; +Cc: bjorn, oliver, netdev, linux-usb
In-Reply-To: <1481116068-32691-1-git-send-email-dnlplm@gmail.com>

From: Daniele Palmas <dnlplm@gmail.com>
Date: Wed,  7 Dec 2016 14:07:47 +0100

> Telit LE922A MBIM based composition does not work properly
> with altsetting toggle done in cdc_ncm_bind_common.
> 
> This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
> to avoid this procedure that, instead, is mandatory for
> other modems.
> 
> References:
> https://www.spinics.net/lists/linux-usb/msg149249.html
> https://www.spinics.net/lists/linux-usb/msg149819.html
> 
> Thanks to Bjørn for the productive discussion and feedback!

Patch applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Eric Dumazet @ 2016-12-08 18:02 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CALx6S34e_5KW3cdxS_yNXwhYuK2FQe=6+9=yTCVDsg6f2vx87g@mail.gmail.com>

On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:

> Of course that would only help on systems where no one enables encaps,
> i.e. it looks good in the simple benchmarks but in real life if just
> one socket enables encap everyone else takes the hit. Alternatively,
> maybe we could do early demux when we do the lookup in GRO to
> eliminate the extra lookup?

Well, if you do the lookup in GRO, won't it be done for every incoming
MSS, instead of once per GRO packet ?

Anyway, the flooded UDP sockets out there are not normally connected
ones.

^ permalink raw reply

* Re: [PATCH v3 0/6] net: stmmac: make DMA programmable burst length more configurable
From: David Miller @ 2016-12-08 18:07 UTC (permalink / raw)
  To: niklas.cassel; +Cc: netdev, niklass, devicetree, linux-kernel, linux-doc
In-Reply-To: <1481120409-18103-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>
Date: Wed, 7 Dec 2016 15:20:02 +0100

> Make DMA programmable burst length more configurable in the stmmac driver.
> 
> This is done by adding support for independent pbl for tx/rx through DT.
> More fine grained tuning of pbl is possible thanks to a DT property saying
> that we should NOT multiply pbl values by x8/x4 in hardware.
> 
> All new DT properties are optional, and created in a way that it will not
> affect any existing DT configurations.

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Eric Dumazet @ 2016-12-08 18:07 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CALx6S34e_5KW3cdxS_yNXwhYuK2FQe=6+9=yTCVDsg6f2vx87g@mail.gmail.com>

On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:

> Of course that would only help on systems where no one enables encaps,
> i.e. it looks good in the simple benchmarks but in real life if just
> one socket enables encap everyone else takes the hit.

Well, in real life most linux hosts do not use any UDP encapsulation.

Or if they do, maybe they still have to handle a lot of UDP traffic
which does not hit a tunnel in the kernel.

Anyway, the difference I saw between GRO on and off was caused by copybreak
in the mlx4 driver.

GRO off --> mlx4 uses copybreak for small messages (all protocols)
GRO on  --> no copybreak for native protocols (IP+TCP IP+UDP)

The lookup being done twice is not that expensive, if the first two
cache lines of the socket stay shared (mostly read)
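
For readers unfamiliar with copybreak, here is a minimal user-space
sketch of the idea, not the mlx4 code. DEMO_COPYBREAK, struct demo_pkt
and demo_rx() are hypothetical names, and the threshold value is an
assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_COPYBREAK 256 /* hypothetical threshold, not the mlx4 value */

struct demo_pkt {
	unsigned char *data;
	size_t len;
	bool copied; /* true if data is a private small copy */
};

/* Small frames are copied into a right-sized buffer so the large RX
 * buffer can be recycled at once; large frames keep pointing at it. */
static struct demo_pkt demo_rx(unsigned char *rx_buf, size_t len)
{
	struct demo_pkt p = { rx_buf, len, false };

	if (len <= DEMO_COPYBREAK) {
		unsigned char *small = malloc(len ? len : 1);

		if (small) {
			memcpy(small, rx_buf, len);
			p.data = small;
			p.copied = true;
		}
		/* On allocation failure, fall back to the RX buffer. */
	}
	return p;
}
```

In this sketch a 64-byte frame ends up in its own small allocation,
while a 1500-byte frame still references the receive buffer.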

^ permalink raw reply

* Re: [net-next] macsec: remove first zero and add attribute name in comments
From: David Miller @ 2016-12-08 18:08 UTC (permalink / raw)
  To: zhangshengju; +Cc: netdev
In-Reply-To: <1481122929-19147-1-git-send-email-zhangshengju@cmss.chinamobile.com>

From: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Date: Wed,  7 Dec 2016 23:02:09 +0800

> Remove first zero for add, and use full attribute name in comments.
> 
> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-08 18:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161208080647-mutt-send-email-mst@kernel.org>

On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
>> This adds support for the XDP_TX action to virtio_net. When an XDP
>> program is run and returns the XDP_TX action the virtio_net XDP
>> implementation will transmit the packet on a TX queue that aligns
>> with the current CPU that the XDP packet was processed on.
>>
>> Before sending the packet the header is zeroed.  Also XDP is expected
>> to handle checksum correctly so no checksum offload  support is
>> provided.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 92 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 28b1196..8e5b13c 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>  	return skb;
>>  }
>>  
>> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
>> +			     struct receive_queue *rq,
>> +			     struct send_queue *sq,
>> +			     struct xdp_buff *xdp)
>> +{
>> +	struct page *page = virt_to_head_page(xdp->data);
>> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
>> +	unsigned int num_sg, len;
>> +	void *xdp_sent;
>> +	int err;
>> +
>> +	/* Free up any pending old buffers before queueing new ones. */
>> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		struct page *sent_page = virt_to_head_page(xdp_sent);
>> +
>> +		if (vi->mergeable_rx_bufs)
>> +			put_page(sent_page);
>> +		else
>> +			give_pages(rq, sent_page);
>> +	}
> 
> Looks like this is the only place where you do virtqueue_get_buf.
> No interrupt handler?
> This means that if you fill up the queue, nothing will clean it
> and things will get stuck.

Hmm, OK, so the callbacks should be implemented to do this, with a pair
of virtqueue_enable_cb_prepare()/virtqueue_disable_cb() calls used to
enable and disable callbacks if packets are enqueued.

Also, will the same condition happen in the normal xmit path via
start_xmit()? It looks like free_old_xmit_skbs, for example, is only
called when a packet is sent. Could we end up holding on to skbs in this
case? I don't see free_old_xmit_skbs being called from any callbacks.

> Can this be the issue you saw?

Nope, see below: I was mishandling the big_packets page cleanup path in
the error case.

> 
> 
>> +
>> +	/* Zero header and leave csum up to XDP layers */
>> +	hdr = xdp->data;
>> +	memset(hdr, 0, vi->hdr_len);
>> +
>> +	num_sg = 1;
>> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
>> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>> +				   xdp->data, GFP_ATOMIC);
>> +	if (unlikely(err)) {
>> +		if (vi->mergeable_rx_bufs)
>> +			put_page(page);
>> +		else
>> +			give_pages(rq, page);
>> +	} else if (!vi->mergeable_rx_bufs) {
>> +		/* If not mergeable bufs must be big packets so cleanup pages */
>> +		give_pages(rq, (struct page *)page->private);
>> +		page->private = 0;
>> +	}
>> +
>> +	virtqueue_kick(sq->vq);
> 
> Is this unconditional kick a work-around for hang
> we could not figure out yet?

I tracked the original issue down to how I handled the big_packet page
cleanups.

> I guess this helps because it just slows down the guest.
> I don't much like it ...

I left it like this copying the pattern in balloon and input drivers. I
can change it back to the previous pattern where it is only called if
there are no errors. It has been running fine with the old pattern now
for an hour or so.

.John

^ permalink raw reply

* Re: [PATCH net-next] net: rfs: add a jump label
From: David Miller @ 2016-12-08 18:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481128150.4930.25.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 08:29:10 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> RFS is not commonly used, so add a jump label to avoid some conditionals
> in fast path.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, but I wonder how effective this will really be in the long run.

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Paolo Abeni @ 2016-12-08 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481218739-27089-5-git-send-email-edumazet@google.com>

On Thu, 2016-12-08 at 09:38 -0800, Eric Dumazet wrote:
> If udp_recvmsg() constantly releases sk_rmem_alloc
> for every read packet, it gives opportunity for
> producers to immediately grab spinlocks and desperately
> try adding another packet, causing false sharing.
> 
> We can add a simple heuristic to give the signal
> by batches of ~25 % of the queue capacity.
> 
> This patch considerably increases performance under
> flood by about 50 %, since the thread draining the queue
> is no longer slowed by false sharing.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/udp.h |  3 +++
>  net/ipv4/udp.c      | 11 +++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index d1fd8cd39478..c0f530809d1f 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -79,6 +79,9 @@ struct udp_sock {
>  	int			(*gro_complete)(struct sock *sk,
>  						struct sk_buff *skb,
>  						int nhoff);
> +
> +	/* This field is dirtied by udp_recvmsg() */
> +	int		forward_deficit;
>  };
>  
>  static inline struct udp_sock *udp_sk(const struct sock *sk)
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 880cd3d84abf..f0096d088104 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1177,8 +1177,19 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>  /* fully reclaim rmem/fwd memory allocated for skb */
>  static void udp_rmem_release(struct sock *sk, int size, int partial)
>  {
> +	struct udp_sock *up = udp_sk(sk);
>  	int amt;
>  
> +	if (likely(partial)) {
> +		up->forward_deficit += size;
> +		size = up->forward_deficit;
> +		if (size < (sk->sk_rcvbuf >> 2))
> +			return;
> +	} else {
> +		size += up->forward_deficit;
> +	}
> +	up->forward_deficit = 0;
> +
>  	atomic_sub(size, &sk->sk_rmem_alloc);
>  	sk->sk_forward_alloc += size;
>  	amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);

Nice one! This sounds like a relevant improvement! 

I'm wondering if it may cause regressions with small value of
sk_rcvbuf ?!? e.g. with:

netperf -t UDP_STREAM  -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024

I'm sorry, I fear I will not be able to do any testing before next week.

Cheers,

Paolo

^ permalink raw reply

* Re: [PATCH net-next V3 0/7] liquidio VF data path
From: David Miller @ 2016-12-08 18:25 UTC (permalink / raw)
  To: rvatsavayi; +Cc: netdev
In-Reply-To: <1481129677-10586-1-git-send-email-rvatsavayi@caviumnetworks.com>

From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Wed, 7 Dec 2016 08:54:30 -0800

> Following is V3 patch series that adds support for VF
> data path related features. It also has following changes
> related to previous comments:
> 1) Remove unnecessary "void *" casting.
> 2) Remove inline for functions and let gcc decide.
> 
> Please apply patches in following order as some of them
> depend on earlier patches.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] udp: under rx pressure, try to condense skbs
From: David Miller @ 2016-12-08 18:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481131173.4930.36.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 09:19:33 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Under UDP flood, many softirq producers try to add packets to
> UDP receive queue, and one user thread is burning one cpu trying
> to dequeue packets as fast as possible.
> 
> Two parts of the per packet cost are :
> - copying payload from kernel space to user space,
> - freeing memory pieces associated with skb.
> 
> If socket is under pressure, softirq handler(s) can try to pull in
> skb->head the payload of the packet if it fits.
> 
> Meaning the softirq handler(s) can free/reuse the page fragment
> immediately, instead of letting udp_recvmsg() do this hundreds of usec
> later, possibly from another node.
> 
> 
> Additional gains :
> - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
> - We avoid cache line misses at copyout() time and consume_skb() time,
> and avoid one put_page() with potential alien freeing on NUMA hosts.
> 
> This comes at the cost of a copy, bounded to available tail room, which
> is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
> than necessary)
> 
> This patch gave me about 5 % increase in throughput in my tests.
> 
> skb_condense() helper could probably be used in other contexts.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This is isolated to UDP, and would be easy to revert if it causes
problems.  So applied, thanks Eric.
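
The condensing step described in the quoted changelog can be pictured
with a small user-space sketch. This is not skb_condense() itself:
struct demo_skb, DEMO_HEAD_SIZE and demo_condense() are hypothetical,
and the "socket under pressure" check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEAD_SIZE 256 /* hypothetical linear buffer size */

struct demo_skb {
	unsigned char head[DEMO_HEAD_SIZE]; /* linear area */
	size_t head_len;                    /* bytes used in head */
	unsigned char *frag;                /* separate payload, or NULL */
	size_t frag_len;
};

/* If the fragment fits in the tail room of the linear area, copy it
 * there and free the fragment right away. Returns true on success. */
static bool demo_condense(struct demo_skb *skb)
{
	size_t tailroom = sizeof(skb->head) - skb->head_len;

	if (!skb->frag || skb->frag_len > tailroom)
		return false;

	memcpy(skb->head + skb->head_len, skb->frag, skb->frag_len);
	skb->head_len += skb->frag_len;
	free(skb->frag);
	skb->frag = NULL;
	skb->frag_len = 0;
	return true;
}
```

The copy is bounded by the tail room, which matches the cost argument
in the changelog: a payload that does not fit is left untouched.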

^ permalink raw reply

* Re: [PATCH net v2 1/1] driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
From: Mahesh Bandewar (महेश बंडेवार) @ 2016-12-08 18:29 UTC (permalink / raw)
  To: fgao; +Cc: David Miller, Eric Dumazet, linux-netdev, gfree.wind
In-Reply-To: <1481167018-559-1-git-send-email-fgao@ikuai8.com>

On Wed, Dec 7, 2016 at 7:16 PM,  <fgao@ikuai8.com> wrote:
> From: Gao Feng <fgao@ikuai8.com>
>
> When ipvlan_set_port_mode fails in ipvlan_link_new, we need to
> unlink the ipvlan dev from the upper dev.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  v2: Rename the label to unlink_netdev, per Mahesh Bandewar
>  v1: Initial patch
>
>  drivers/net/ipvlan/ipvlan_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 0fef178..dfbc4ef 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -546,13 +546,15 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>         }
>         err = ipvlan_set_port_mode(port, mode);
>         if (err) {
> -               goto unregister_netdev;
> +               goto unlink_netdev;
>         }
>
>         list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
>         netif_stacked_transfer_operstate(phy_dev, dev);
>         return 0;
>
> +unlink_netdev:
> +       netdev_upper_dev_unlink(phy_dev, dev);
>  unregister_netdev:
>         unregister_netdevice(dev);
>  destroy_ipvlan_port:
> --
> 1.9.1
>
>
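
The unwind order that the new label restores can be illustrated with a
small user-space sketch. Everything here is hypothetical (demo_link_new,
the fail_at parameter, the one-letter log); only the ordering of the
labels mirrors the diff above.

```c
#include <assert.h>
#include <string.h>

static char demo_log[16];

static void demo_note(const char *s)
{
	strcat(demo_log, s);
}

/* fail_at: 0 = success, 1 = register fails, 2 = link fails,
 * 3 = set-mode fails. Each completed step is undone in reverse order. */
static int demo_link_new(int fail_at)
{
	demo_log[0] = '\0';

	if (fail_at == 1)
		goto err;
	demo_note("R");			/* registered */

	if (fail_at == 2)
		goto unregister;
	demo_note("L");			/* linked to upper dev */

	if (fail_at == 3)
		goto unlink;		/* the case the patch fixes */
	return 0;

unlink:
	demo_note("l");			/* undo link */
unregister:
	demo_note("r");			/* undo register */
err:
	return -1;
}
```

A failure in the last step leaves the log as "RLlr": both earlier steps
are undone, in reverse order. Jumping straight to unregister from that
step would skip the unlink.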

^ permalink raw reply

* Re: [PATCH] net: pch_gbe: Fix TX RX descriptor accesses for big endian systems
From: David Miller @ 2016-12-08 18:29 UTC (permalink / raw)
  To: hassan.naveed; +Cc: netdev, paul.burton, matt.redfearn, fw, romieu
In-Reply-To: <1481133534-26224-1-git-send-email-hassan.naveed@imgtec.com>

From: Hassan Naveed <hassan.naveed@imgtec.com>
Date: Wed, 7 Dec 2016 09:58:54 -0800

> Fix pch_gbe driver for ethernet operations for a big endian CPU.
> Values written to and read from transmit and receive descriptors
> in the pch_gbe driver are byte swapped from the perspective of a
> big endian CPU, since the ethernet controller always operates in
> little endian mode. Rectify this by appropriately byte swapping
> these descriptor field values in the driver software.
> 
> Signed-off-by: Hassan Naveed <hassan.naveed@imgtec.com>
> Reviewed-by: Paul Burton <paul.burton@imgtec.com>
> Reviewed-by: Matt Redfearn <matt.redfearn@imgtec.com>

As explained by Francois, you need to use the proper endian types in
the descriptor datastructure.

Then please run sparse with endianness checking enabled on the build
of the driver.
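
As a rough user-space illustration of what "proper endian types" buys,
the sketch below wraps little-endian fields in distinct types so a raw
integer cannot be stored by accident. In the kernel the equivalent would
be __le32/__le16 fields with cpu_to_le32()/le32_to_cpu(); the demo_*
names and the descriptor layout here are hypothetical, since the real
pch_gbe descriptor is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types: assigning a plain integer is a compile error. */
typedef struct { unsigned char b[4]; } demo_le32;
typedef struct { unsigned char b[2]; } demo_le16;

static demo_le32 demo_cpu_to_le32(uint32_t v)
{
	demo_le32 r = { { (unsigned char)(v & 0xff),
			  (unsigned char)((v >> 8) & 0xff),
			  (unsigned char)((v >> 16) & 0xff),
			  (unsigned char)((v >> 24) & 0xff) } };
	return r;
}

static uint32_t demo_le32_to_cpu(demo_le32 x)
{
	return (uint32_t)x.b[0] | ((uint32_t)x.b[1] << 8) |
	       ((uint32_t)x.b[2] << 16) | ((uint32_t)x.b[3] << 24);
}

static demo_le16 demo_cpu_to_le16(uint16_t v)
{
	demo_le16 r = { { (unsigned char)(v & 0xff),
			  (unsigned char)((v >> 8) & 0xff) } };
	return r;
}

static uint16_t demo_le16_to_cpu(demo_le16 x)
{
	return (uint16_t)(x.b[0] | (x.b[1] << 8));
}

/* Hypothetical descriptor as the device would see it in memory. */
struct demo_rx_desc {
	demo_le32 buffer_addr;
	demo_le16 length;
};
```

The byte layout produced by these helpers is the same on little and big
endian hosts, which is the property the descriptor accesses need.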

^ permalink raw reply

* Re: [PATCH net-next] net: do not read sk_drops if application does not care
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481133936.4930.51.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 10:05:36 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> sk_drops can be an often written field, do not read it unless
> application showed interest.
> 
> Note that sk_drops can be read via inet_diag, so applications
> can avoid getting this info from every received packet.
> 
> In the future, 'reading' sk_drops might require folding per node or per
> cpu fields, and thus become even more expensive than today.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: rfs: add a jump label
From: Eric Dumazet @ 2016-12-08 18:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, pabeni
In-Reply-To: <20161208.131900.434329215014851517.davem@davemloft.net>

On Thu, 2016-12-08 at 13:19 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 07 Dec 2016 08:29:10 -0800
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > RFS is not commonly used, so add a jump label to avoid some conditionals
> > in fast path.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied, but I wonder how effective this will really be in the long run.

I guess this applies to about all jump labels.

As soon as the attribute is per namespace, we no longer can use them.

A conditional cost really depends on the expression complexity
(including cache line misses)

TCP stack might benefit from jump labels, like sysctl_tcp_low_latency
which is often set to 1 on hosts mostly using epoll()/poll()/select()
instead of blocking read()/recvmsg()

^ permalink raw reply

* Re: [PATCH net-next] bpf: fix state equivalence
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
  To: ast; +Cc: daniel, jbacik, tgraf, netdev
In-Reply-To: <1481137079-2205635-1-git-send-email-ast@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 7 Dec 2016 10:57:59 -0800

> Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> and 484611357c19 ("bpf: allow access into map value arrays") by themselves
> are correct, but in combination they make state equivalence ignore 'id' field
> of the register state which can lead to accepting invalid program.
> 
> Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> Fixes: 484611357c19 ("bpf: allow access into map value arrays")
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:36 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481221491.6120.11.camel@redhat.com>

On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:

> Nice one! This sounds like a relevant improvement!
>
> I'm wondering if it may cause regressions with small value of
> sk_rcvbuf ?!? e.g. with:
>
> netperf -t UDP_STREAM  -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>

Possibly, then simply we can refine the test to :

size = up->forward_deficit;
if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
     return;
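
To make the effect of the refined test concrete, here is a user-space
model of the batching heuristic. It only covers the "partial" branch of
the patch, and struct fake_sock, its fields and rmem_release() are
hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket fields the heuristic touches. */
struct fake_sock {
	int rcvbuf;          /* plays the role of sk->sk_rcvbuf */
	int rmem_alloc;      /* plays the role of sk->sk_rmem_alloc */
	int forward_deficit; /* bytes read but not yet given back */
	int queued;          /* packets still in the receive queue */
};

/* Returns true when the accumulated deficit was actually released. */
static bool rmem_release(struct fake_sock *sk, int size)
{
	assert(size >= 0);

	sk->forward_deficit += size;
	size = sk->forward_deficit;

	/* Keep batching only while more packets are waiting. */
	if (size < (sk->rcvbuf >> 2) && sk->queued > 0)
		return false;

	sk->forward_deficit = 0;
	sk->rmem_alloc -= size;
	return true;
}
```

With rcvbuf = 4000 the release is deferred until 1000 bytes have
accumulated, but an empty queue forces it immediately, so a reader on a
tiny buffer never sits on memory the producers are waiting for.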

^ permalink raw reply

* Re: [PATCH v4 net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Saeed Mahameed @ 2016-12-08 18:36 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
	Daniel Borkmann, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed,
	Tariq Toukan, Kernel Team
In-Reply-To: <1481154794-2311034-3-git-send-email-kafai@fb.com>

On Thu, Dec 8, 2016 at 1:53 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> When XDP is active in mlx4, mlx4 is using one page/pkt.
> At the same time (i.e. when XDP is active), it is currently
> limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN)
> which is 1514 in x86.  AFAICT, we can at least raise the MTU
> limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this
> patch is doing.  It will be useful in the next patch which
> allows XDP program to extend the packet by adding new header(s).
>
> Note: In the earlier XDP patches, there is already existing guard
> to ensure the page/pkt scheme only applies when XDP is active
> in mlx4.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>
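
For reference, the arithmetic behind the old and new limits can be
sketched as below. The constants are assumptions: 14 and 4 are the usual
Ethernet and VLAN header sizes, 1536 for FRAG_SZ0 is inferred from the
1514 figure quoted above, and 4096 is the common x86 page size.
xdp_mtu_limit() is a hypothetical helper, not a function in mlx4.

```c
#include <assert.h>

/* Assumed values; see the note above. */
enum {
	DEMO_ETH_HLEN  = 14,   /* Ethernet header */
	DEMO_VLAN_HLEN = 4,    /* one VLAN tag */
	DEMO_FRAG_SZ0  = 1536, /* inferred from the 1514 limit */
	DEMO_PAGE_SIZE = 4096, /* typical x86 page */
};

/* Largest MTU that fits in buf_size bytes once the Ethernet header
 * and two VLAN tags are accounted for. */
static int xdp_mtu_limit(int buf_size)
{
	assert(buf_size > DEMO_ETH_HLEN + 2 * DEMO_VLAN_HLEN);
	return buf_size - DEMO_ETH_HLEN - 2 * DEMO_VLAN_HLEN;
}
```

Under these assumptions the limit moves from 1514 to 4074 bytes.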

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:38 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <CANn89iL8r9UD=sGn3WxVFZ+Z_QJYYM6aXxCFvafwvJ-bEtNhKQ@mail.gmail.com>

On Thu, Dec 8, 2016 at 10:36 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>
>> Nice one! This sounds like a relevant improvement!
>>
>> I'm wondering if it may cause regressions with small value of
>> sk_rcvbuf ?!? e.g. with:
>>
>> netperf -t UDP_STREAM  -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>>
>
> Possibly, then simply we can refine the test to :
>
> size = up->forward_deficit;
> if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
>      return;

BTW, I tried :

lpaa6:~# ./netperf -t UDP_STREAM  -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  4608    1024   10.00     4499400      0    3685.88
  2560           10.00     4498670           3685.28

So it looks like it is working.

However I have no doubt there might be a corner case for tiny
SO_RCVBUF values or for some message sizes.

^ permalink raw reply

* Re: [PATCH] drivers: net: xgene: initialize slots
From: Iyappan Subramanian @ 2016-12-08 18:44 UTC (permalink / raw)
  To: Colin King; +Cc: Keyur Chudgar, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20161208111754.9711-1-colin.king@canonical.com>

On Thu, Dec 8, 2016 at 3:17 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> static analysis using cppcheck detected that slots was uninitialized.
> Fix this by initializing it to buf_pool->slots - 1
>
> Found using static analysis with CoverityScan, CID #1387620
>
> Fixes: a9380b0f7be818 ("drivers: net: xgene: Add support for Jumbo frame")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> index 6c7eea8..899163c 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> @@ -636,6 +636,7 @@ static void xgene_enet_free_pagepool(struct xgene_enet_desc_ring *buf_pool,
>
>         dev = ndev_to_dev(buf_pool->ndev);
>         head = buf_pool->head;
> +       slots = buf_pool->slots - 1;
>
>         for (i = 0; i < 4; i++) {
>                 frag_size = xgene_enet_get_data_len(le64_to_cpu(desc[i ^ 1]));

Thanks, Colin.

Dan Carpenter <dan.carpenter@oracle.com> already posted the fix and it
was accepted.
http://marc.info/?l=linux-netdev&m=148110980224343&w=2

> --
> 2.10.2
>

^ permalink raw reply

