* Re: [PATCH net] mlxsw: spectrum_switchdev: Do not remove mrouter port from MDB's ports list
From: David Miller @ 2018-04-27 17:45 UTC
To: idosch; +Cc: netdev, jiri, nogahf, colin.king, mlxsw
In-Reply-To: <20180426084629.20978-1-idosch@mellanox.com>
From: Ido Schimmel <idosch@mellanox.com>
Date: Thu, 26 Apr 2018 11:46:29 +0300
> When IGMP snooping is enabled on a bridge, traffic forwarded by an MDB
> entry should be sent both to ports that are members of the MDB's ports
> list and to mrouter ports.
>
> In case a port needs to be removed from an MDB's ports list, but this
> port is also configured as an mrouter port, then do not update the
> device so that it will continue to forward traffic through that port.
>
> Fix a copy-paste error that checked that IGMP snooping is enabled twice
> instead of checking the port's mrouter state.
>
> Fixes: ded711c87a04 ("mlxsw: spectrum_switchdev: Consider mrouter status for mdb changes")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Colin King <colin.king@canonical.com>
> Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Applied and queued up for -stable, thanks.
* Re: [PATCH net-next v9 0/4] Enable virtio_net to act as a standby for a passthru device
From: Jiri Pirko @ 2018-04-27 17:45 UTC
To: Sridhar Samudrala
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <1524848820-42258-1-git-send-email-sridhar.samudrala@intel.com>
Fri, Apr 27, 2018 at 07:06:56PM CEST, sridhar.samudrala@intel.com wrote:
>v9:
>Select NET_FAILOVER automatically when VIRTIO_NET/HYPERV_NET
>are enabled. (stephen)
>
>Tested live migration with virtio-net/AVF(i40evf) configured in
>failover mode while running iperf in background.
>Build tested netvsc module.
>
>The main motivation for this patch is to enable cloud service providers
>to provide an accelerated datapath to virtio-net enabled VMs in a
>transparent manner with no/minimal guest userspace changes. This also
>enables hypervisor controlled live migration to be supported with VMs that
>have direct attached SR-IOV VF devices.
>
>Patch 1 introduces a new feature bit VIRTIO_NET_F_STANDBY that can be
>used by hypervisor to indicate that virtio_net interface should act as
>a standby for another device with the same MAC address.
>
>Patch 2 introduces a failover module that provides a generic interface for
>paravirtual drivers to listen for netdev register/unregister/link change
>events from pci ethernet devices with the same MAC and takeover their
>datapath. The notifier and event handling code is based on the existing
>netvsc implementation. It provides 2 sets of interfaces to paravirtual
>drivers to support 2-netdev(netvsc) and 3-netdev(virtio_net) models.
>
>Patch 3 extends virtio_net to use alternate datapath when available and
>registered. When STANDBY feature is enabled, virtio_net driver creates
>an additional 'failover' netdev that acts as a master device and controls
>2 slave devices. The original virtio_net netdev is registered as
>'standby' netdev and a passthru/vf device with the same MAC gets
>registered as 'primary' netdev. Both 'standby' and 'primary' netdevs are
>associated with the same 'pci' device. The user accesses the network
>interface via 'failover' netdev. The 'failover' netdev chooses 'primary'
>netdev as default for transmits when it is available with link up and
>running.
>
>Patch 4 refactors netvsc to use the registration/notification framework
>supported by failover module.
>
>As this patch series is initially focusing on usecases where hypervisor
>fully controls the VM networking and the guest is not expected to directly
>configure any hardware settings, it doesn't expose all the ndo/ethtool ops
>that are supported by virtio_net at this time. To support additional usecases,
>it should be possible to enable additional ops later by caching the state
>in virtio netdev and replaying when the 'primary' netdev gets registered.
>
>The hypervisor needs to enable only one datapath at any time so that packets
>don't get looped back to the VM over the other datapath. When a VF is
>plugged, the virtio datapath link state can be marked as down.
>At the time of live migration, the hypervisor needs to unplug the VF device
>from the guest on the source host and reset the MAC filter of the VF to
>initiate failover of datapath to virtio before starting the migration. After
>the migration is completed, the destination hypervisor sets the MAC filter
>on the VF and plugs it back to the guest to switch over to VF datapath.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
No changes in v9?
>
>v8:
>- Made the failover management routines more robust by updating the feature
> bits/other fields in the failover netdev when slave netdevs are
> registered/unregistered. (mst)
>- added support for handling vlans.
>- Limited the changes in netvsc to only use the notifier/event/lookups
> from the failover module. The slave register/unregister/link-change
> handlers are only updated to use the getbymac routine to get the
> upper netdev. There is no change in their functionality. (stephen)
>- renamed structs/function/file names to use net_failover prefix. (mst)
>
>v7
>- Rename 'bypass/active/backup' terminology with 'failover/primary/standby'
> (jiri, mst)
>- re-arranged dev_open() and dev_set_mtu() calls in the register routines
> so that they don't get called for 2-netdev model. (stephen)
>- fixed select_queue() routine to do queue selection based on VF if it is
> registered as primary. (stephen)
>- minor bugfixes
>
>v6 RFC:
> Simplified virtio_net changes by moving all the ndo_ops of the
> bypass_netdev and create/destroy of bypass_netdev to 'bypass' module.
> avoided 2 phase registration(driver + instances).
> introduced IFF_BYPASS/IFF_BYPASS_SLAVE dev->priv_flags
> replaced mutex with a spinlock
>
>v5 RFC:
> Based on Jiri's comments, moved the common functionality to a 'bypass'
> module so that the same notifier and event handlers to handle child
> register/unregister/link change events can be shared between virtio_net
> and netvsc.
> Improved error handling based on Siwei's comments.
>v4:
>- Based on the review comments on the v3 version of the RFC patch and
> Jakub's suggestion for the naming issue with 3 netdev solution,
> proposed 3 netdev in-driver bonding solution for virtio-net.
>v3 RFC:
>- Introduced 3 netdev model and pointed out a couple of issues with
> that model and proposed 2 netdev model to avoid these issues.
>- Removed broadcast/multicast optimization and only use virtio as
> backup path when VF is unplugged.
>v2 RFC:
>- Changed VIRTIO_NET_F_MASTER to VIRTIO_NET_F_BACKUP (mst)
>- made a small change to the virtio-net xmit path to only use VF datapath
> for unicasts. Broadcasts/multicasts use virtio datapath. This avoids
> east-west broadcasts to go over the PCI link.
>- added support for the feature bit in qemu
>
>Sridhar Samudrala (4):
> virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
> net: Introduce generic failover module
> virtio_net: Extend virtio to use VF datapath when available
> netvsc: refactor notifier/event handling code to use the failover
> framework
>
> drivers/net/Kconfig | 1 +
> drivers/net/hyperv/Kconfig | 1 +
> drivers/net/hyperv/hyperv_net.h | 2 +
> drivers/net/hyperv/netvsc_drv.c | 134 ++----
> drivers/net/virtio_net.c | 37 +-
> include/linux/netdevice.h | 16 +
> include/net/net_failover.h | 62 +++
> include/uapi/linux/virtio_net.h | 3 +
> net/Kconfig | 10 +
> net/core/Makefile | 1 +
> net/core/net_failover.c | 892 ++++++++++++++++++++++++++++++++++++++++
> 11 files changed, 1046 insertions(+), 113 deletions(-)
> create mode 100644 include/net/net_failover.h
> create mode 100644 net/core/net_failover.c
>
>--
>2.14.3
* Re: pull-request: wireless-drivers 2018-04-26
From: David Miller @ 2018-04-27 17:50 UTC
To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87h8ny6ztl.fsf@kamboji.qca.qualcomm.com>
From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 26 Apr 2018 13:12:54 +0300
> here's a pull request to net tree, more info below. Please let me know
> if you have any problems.
Pulled, thanks Kalle.
* Re: [PATCH net-next] geneve: fix build with modular IPV6
From: David Miller @ 2018-04-27 17:52 UTC
To: tobias.regnery; +Cc: netdev, linux-kernel, alexey.kodanev
In-Reply-To: <20180426103636.16113-1-tobias.regnery@gmail.com>
From: Tobias Regnery <tobias.regnery@gmail.com>
Date: Thu, 26 Apr 2018 12:36:36 +0200
> Commit c40e89fd358e ("geneve: configure MTU based on a lower device") added
> an IS_ENABLED(CONFIG_IPV6) to geneve, leading to the following link error
> with CONFIG_GENEVE=y and CONFIG_IPV6=m:
>
> drivers/net/geneve.o: In function `geneve_link_config':
> geneve.c:(.text+0x14c): undefined reference to `rt6_lookup'
>
> Fix this by adding a Kconfig dependency and forcing GENEVE to be a module
> when IPV6 is a module.
>
> Fixes: c40e89fd358e ("geneve: configure MTU based on a lower device")
> Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Applied, thank you.
* Re: [PATCH net-next v9 0/4] Enable virtio_net to act as a standby for a passthru device
From: Samudrala, Sridhar @ 2018-04-27 17:53 UTC
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180427174523.GE5632@nanopsycho.orion>
On 4/27/2018 10:45 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:56PM CEST, sridhar.samudrala@intel.com wrote:
>> v9:
>> Select NET_FAILOVER automatically when VIRTIO_NET/HYPERV_NET
>> are enabled. (stephen)
>>
>> Tested live migration with virtio-net/AVF(i40evf) configured in
>> failover mode while running iperf in background.
>> Build tested netvsc module.
>>
>> The main motivation for this patch is to enable cloud service providers
>> to provide an accelerated datapath to virtio-net enabled VMs in a
>> transparent manner with no/minimal guest userspace changes. This also
>> enables hypervisor controlled live migration to be supported with VMs that
>> have direct attached SR-IOV VF devices.
>>
>> Patch 1 introduces a new feature bit VIRTIO_NET_F_STANDBY that can be
>> used by hypervisor to indicate that virtio_net interface should act as
>> a standby for another device with the same MAC address.
>>
>> Patch 2 introduces a failover module that provides a generic interface for
>> paravirtual drivers to listen for netdev register/unregister/link change
>> events from pci ethernet devices with the same MAC and takeover their
>> datapath. The notifier and event handling code is based on the existing
>> netvsc implementation. It provides 2 sets of interfaces to paravirtual
>> drivers to support 2-netdev(netvsc) and 3-netdev(virtio_net) models.
>>
>> Patch 3 extends virtio_net to use alternate datapath when available and
>> registered. When STANDBY feature is enabled, virtio_net driver creates
>> an additional 'failover' netdev that acts as a master device and controls
>> 2 slave devices. The original virtio_net netdev is registered as
>> 'standby' netdev and a passthru/vf device with the same MAC gets
>> registered as 'primary' netdev. Both 'standby' and 'primary' netdevs are
>> associated with the same 'pci' device. The user accesses the network
>> interface via 'failover' netdev. The 'failover' netdev chooses 'primary'
>> netdev as default for transmits when it is available with link up and
>> running.
>>
>> Patch 4 refactors netvsc to use the registration/notification framework
>> supported by failover module.
>>
>> As this patch series is initially focusing on usecases where hypervisor
>> fully controls the VM networking and the guest is not expected to directly
>> configure any hardware settings, it doesn't expose all the ndo/ethtool ops
>> that are supported by virtio_net at this time. To support additional usecases,
>> it should be possible to enable additional ops later by caching the state
>> in virtio netdev and replaying when the 'primary' netdev gets registered.
>>
>> The hypervisor needs to enable only one datapath at any time so that packets
>> don't get looped back to the VM over the other datapath. When a VF is
>> plugged, the virtio datapath link state can be marked as down.
>> At the time of live migration, the hypervisor needs to unplug the VF device
>> from the guest on the source host and reset the MAC filter of the VF to
>> initiate failover of datapath to virtio before starting the migration. After
>> the migration is completed, the destination hypervisor sets the MAC filter
>> on the VF and plugs it back to the guest to switch over to VF datapath.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>
> No changes in v9?
I listed v9 updates at the start of the message.
v9:
Select NET_FAILOVER automatically when VIRTIO_NET/HYPERV_NET
are enabled. (stephen)
Tested live migration with virtio-net/AVF(i40evf) configured in
failover mode while running iperf in background.
Build tested netvsc module.
>
>> v8:
>> - Made the failover management routines more robust by updating the feature
>> bits/other fields in the failover netdev when slave netdevs are
>> registered/unregistered. (mst)
>> - added support for handling vlans.
>> - Limited the changes in netvsc to only use the notifier/event/lookups
>> from the failover module. The slave register/unregister/link-change
>> handlers are only updated to use the getbymac routine to get the
>> upper netdev. There is no change in their functionality. (stephen)
>> - renamed structs/function/file names to use net_failover prefix. (mst)
>>
>> v7
>> - Rename 'bypass/active/backup' terminology with 'failover/primary/standby'
>> (jiri, mst)
>> - re-arranged dev_open() and dev_set_mtu() calls in the register routines
>> so that they don't get called for 2-netdev model. (stephen)
>> - fixed select_queue() routine to do queue selection based on VF if it is
>> registered as primary. (stephen)
>> - minor bugfixes
>>
>> v6 RFC:
>> Simplified virtio_net changes by moving all the ndo_ops of the
>> bypass_netdev and create/destroy of bypass_netdev to 'bypass' module.
>> avoided 2 phase registration(driver + instances).
>> introduced IFF_BYPASS/IFF_BYPASS_SLAVE dev->priv_flags
>> replaced mutex with a spinlock
>>
>> v5 RFC:
>> Based on Jiri's comments, moved the common functionality to a 'bypass'
>> module so that the same notifier and event handlers to handle child
>> register/unregister/link change events can be shared between virtio_net
>> and netvsc.
>> Improved error handling based on Siwei's comments.
>> v4:
>> - Based on the review comments on the v3 version of the RFC patch and
>> Jakub's suggestion for the naming issue with 3 netdev solution,
>> proposed 3 netdev in-driver bonding solution for virtio-net.
>> v3 RFC:
>> - Introduced 3 netdev model and pointed out a couple of issues with
>> that model and proposed 2 netdev model to avoid these issues.
>> - Removed broadcast/multicast optimization and only use virtio as
>> backup path when VF is unplugged.
>> v2 RFC:
>> - Changed VIRTIO_NET_F_MASTER to VIRTIO_NET_F_BACKUP (mst)
>> - made a small change to the virtio-net xmit path to only use VF datapath
>> for unicasts. Broadcasts/multicasts use virtio datapath. This avoids
>> east-west broadcasts to go over the PCI link.
>> - added support for the feature bit in qemu
>>
>> Sridhar Samudrala (4):
>> virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
>> net: Introduce generic failover module
>> virtio_net: Extend virtio to use VF datapath when available
>> netvsc: refactor notifier/event handling code to use the failover
>> framework
>>
>> drivers/net/Kconfig | 1 +
>> drivers/net/hyperv/Kconfig | 1 +
>> drivers/net/hyperv/hyperv_net.h | 2 +
>> drivers/net/hyperv/netvsc_drv.c | 134 ++----
>> drivers/net/virtio_net.c | 37 +-
>> include/linux/netdevice.h | 16 +
>> include/net/net_failover.h | 62 +++
>> include/uapi/linux/virtio_net.h | 3 +
>> net/Kconfig | 10 +
>> net/core/Makefile | 1 +
>> net/core/net_failover.c | 892 ++++++++++++++++++++++++++++++++++++++++
>> 11 files changed, 1046 insertions(+), 113 deletions(-)
>> create mode 100644 include/net/net_failover.h
>> create mode 100644 net/core/net_failover.c
>>
>> --
>> 2.14.3
* Re: [PATCH net-next] net: Fix coccinelle warning
From: David Miller @ 2018-04-27 17:53 UTC
To: ktkhai; +Cc: netdev, lkp
In-Reply-To: <152474505955.21078.9976470400033894421.stgit@localhost.localdomain>
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu, 26 Apr 2018 15:18:38 +0300
> kbuild test robot says:
>
> >coccinelle warnings: (new ones prefixed by >>)
> >>> net/core/dev.c:1588:2-3: Unneeded semicolon
>
> So, let's remove it.
>
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Applied, thank you.
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-04-27 17:53 UTC
To: Sridhar Samudrala
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <1524848820-42258-3-git-send-email-sridhar.samudrala@intel.com>
Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>This provides a generic interface for paravirtual drivers to listen
>for netdev register/unregister/link change events from pci ethernet
>devices with the same MAC and takeover their datapath. The notifier and
>event handling code is based on the existing netvsc implementation.
>
>It exposes 2 sets of interfaces to the paravirtual drivers.
>1. For paravirtual drivers like virtio_net that use 3 netdev model, the
> failover module provides interfaces to create/destroy additional
> master netdev and all the slave events are managed internally.
> net_failover_create()
> net_failover_destroy()
> A failover netdev is created that acts as a master device and controls 2
> slave devices. The original virtio_net netdev is registered as 'standby'
> netdev and a passthru/vf device with the same MAC gets registered as
> 'primary' netdev. Both 'standby' and 'primary' netdevs are associated
> with the same 'pci' device. The user accesses the network interface via
> 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
> default for transmits when it is available with link up and running.
>2. For existing netvsc driver that uses 2 netdev model, no master netdev
> is created. The paravirtual driver registers each instance of netvsc
> as a 'failover' netdev along with a set of ops to manage the slave
> events. There is no 'standby' netdev in this model. A passthru/vf device
> with the same MAC gets registered as 'primary' netdev.
> net_failover_register()
> net_failover_unregister()
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>---
> include/linux/netdevice.h | 16 +
> include/net/net_failover.h | 62 ++++
> net/Kconfig | 10 +
> net/core/Makefile | 1 +
> net/core/net_failover.c | 892 +++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 981 insertions(+)
> create mode 100644 include/net/net_failover.h
> create mode 100644 net/core/net_failover.c
checkpatch says:
_exportax/0002-net-Introduce-generic-failover-module.patch
----------------------------------------------------------
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#92:
new file mode 100644
Please add an entry to the MAINTAINERS file.
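For reference, the transmit-path choice described in the quoted commit
message ('failover' prefers 'primary' when it is available with link up and
running) can be sketched in user space. This is a minimal illustration only:
struct slave_dev, slave_usable() and failover_pick_xmit() are hypothetical
names, not part of the patch, and the fallback to the standby device is an
assumption drawn from the description rather than from the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a slave netdev; not a kernel type. */
struct slave_dev {
	bool registered;  /* slave has been registered with the failover dev */
	bool link_up;     /* carrier is present */
	bool running;     /* interface is up and running */
};

/* A slave can carry traffic only if all three conditions hold. */
static bool slave_usable(const struct slave_dev *s)
{
	return s && s->registered && s->link_up && s->running;
}

/*
 * Prefer the primary (VF) slave; fall back to the standby (virtio_net)
 * slave. Returns NULL when neither slave can transmit.
 */
static const struct slave_dev *
failover_pick_xmit(const struct slave_dev *primary,
		   const struct slave_dev *standby)
{
	if (slave_usable(primary))
		return primary;
	if (slave_usable(standby))
		return standby;
	return NULL;
}
```

With both slaves usable the sketch returns the primary; once the primary
loses link or is unplugged (NULL), it returns the standby.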
* Re: [PATCH 2/2] bpf: btf: remove a couple conditions
From: Martin KaFai Lau @ 2018-04-27 17:55 UTC
To: Dan Carpenter
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel,
kernel-janitors
In-Reply-To: <20180427172023.6japncdd3nbqauzn@kafai-mbp>
On Fri, Apr 27, 2018 at 10:20:25AM -0700, Martin KaFai Lau wrote:
> On Fri, Apr 27, 2018 at 05:04:59PM +0300, Dan Carpenter wrote:
> > We know "err" is zero so we can remove these and pull the code in one
> > indent level.
> >
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> Thanks for the simplification!
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
btw, it should be for bpf-next. Please tag the subject with bpf-next when
you respin. Thanks!
>
> > ---
> > This applies to the BPF tree (linux-next)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index e631b6fd60d3..7cb0905f37c2 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -1973,16 +1973,14 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
> > if (err)
> > goto errout;
> >
> > - if (!err && log->level && bpf_verifier_log_full(log)) {
> > + if (log->level && bpf_verifier_log_full(log)) {
> > err = -ENOSPC;
> > goto errout;
> > }
> >
> > - if (!err) {
> > - btf_verifier_env_free(env);
> > - btf_get(btf);
> > - return btf;
> > - }
> > + btf_verifier_env_free(env);
> > + btf_get(btf);
> > + return btf;
> >
> > errout:
> > btf_verifier_env_free(env);
* Re: [net-next] ipv6: sr: Extract the right key values for "seg6_make_flowlabel"
From: David Miller @ 2018-04-27 17:59 UTC
To: amsalam20; +Cc: dav.lebrun, netdev, linux-kernel
In-Reply-To: <1524751871-1353-1-git-send-email-amsalam20@gmail.com>
From: Ahmed Abdelsalam <amsalam20@gmail.com>
Date: Thu, 26 Apr 2018 16:11:11 +0200
> @@ -119,6 +119,9 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
> int hdrlen, tot_len, err;
> __be32 flowlabel;
>
> + inner_hdr = ipv6_hdr(skb);
You have to make this assignment after, not before, the skb_cow_header()
call. Otherwise this pointer can end up pointing to freed memory.
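The ordering problem can be reproduced in user space. The following is a
minimal sketch, assuming a hypothetical struct buf and buf_cow_head() that
merely stand in for the skb and skb_cow_header(); neither is a kernel API.
It only shows that a pointer into the header has to be taken after the call
that may reallocate the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an skb: the data lives in a heap block that
 * may be replaced when more headroom is needed.
 */
struct buf {
	unsigned char *head;  /* start of the allocation */
	size_t headroom;      /* free bytes in front of the payload */
	size_t len;           /* payload length */
};

/*
 * Hypothetical analogue of skb_cow_header(): guarantee at least `need`
 * bytes of headroom. May copy the data to a new block and free the old
 * one. Returns 0 on success, -1 on allocation failure.
 */
static int buf_cow_head(struct buf *b, size_t need)
{
	unsigned char *n;

	if (b->headroom >= need)
		return 0;
	n = malloc(need + b->len);
	if (!n)
		return -1;
	memcpy(n + need, b->head + b->headroom, b->len);
	free(b->head);  /* pointers taken before this line are now stale */
	b->head = n;
	b->headroom = need;
	return 0;
}

/* Pointer to the first payload byte, valid until the next reallocation. */
static unsigned char *buf_hdr(const struct buf *b)
{
	assert(b && b->head);
	return b->head + b->headroom;
}

/* Correct ordering: reallocate first, then take the header pointer. */
static int first_hdr_byte_after_cow(struct buf *b, size_t need)
{
	unsigned char *inner_hdr;

	if (buf_cow_head(b, need))
		return -1;
	inner_hdr = buf_hdr(b);  /* taken after the possible reallocation */
	return inner_hdr[0];
}
```

Taking inner_hdr before buf_cow_head() would leave it pointing into the
block that buf_cow_head() has just freed.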
* Re: [net-next] net: intel: Cleanup the copyright/license headers
From: David Miller @ 2018-04-27 18:00 UTC
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180426150809.11482-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 26 Apr 2018 08:08:09 -0700
> After many years of having a ~30 line copyright and license header to our
> source files, we are finally able to reduce that to one line with the
> advent of the SPDX identifier.
>
> Also caught a few files missing the SPDX license identifier, so fixed
> them up.
>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
> Acked-by: Richard Cochran <richardcochran@gmail.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Applied, thanks.
* Re: [PATCH net-next v3 0/4] fixes from 2018-04-17 - v3
From: David Miller @ 2018-04-27 18:03 UTC
To: ubraun; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180426151823.78967-1-ubraun@linux.ibm.com>
From: Ursula Braun <ubraun@linux.ibm.com>
Date: Thu, 26 Apr 2018 17:18:19 +0200
> Version 3 changes
> * no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
> * allow fallback for some sockopts eliminating SMC usage
> * when setting TCP_NODELAY always enforce data transmission
> (not only together with corked data)
This looks a lot better than what you were doing before.
Series applied, thanks.
* Re: [PATCH net-next] tcp: remove mss check in tcp_select_initial_window()
From: David Miller @ 2018-04-27 18:05 UTC
To: weiwan; +Cc: netdev, ycheng, edumazet, soheil
In-Reply-To: <20180426165810.164524-1-tracywwnj@gmail.com>
From: Wei Wang <weiwan@google.com>
Date: Thu, 26 Apr 2018 09:58:10 -0700
> From: Wei Wang <weiwan@google.com>
>
> In tcp_select_initial_window(), we only set rcv_wnd to
> tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
> rcv_wnd is kept at the full receive space of the socket which is a
> value way larger than tcp_default_init_rwnd().
> With larger initial rcv_wnd value, receive buffer autotuning logic
> takes longer to kick in and increase the receive buffer.
>
> In a TCP throughput test where receiver has rmem[2] set to 125MB
> (wscale is 11), we see the connection gets recvbuf limited at the
> beginning of the connection and gets less throughput overall.
>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
Very nice commit message.
Applied.
* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Thomas Deutschmann @ 2018-04-27 18:07 UTC
To: Eddie Chapman, Greg KH; +Cc: stable, davem, nicolas.dichtel, netdev
In-Reply-To: <ae1401af-a400-f6de-658e-bae0b29c52e4@ehuk.net>
Hi Greg,
first, we need to cherry-pick another patch:
> From 52a589d51f1008f62569bf89e95b26221ee76690 Mon Sep 17 00:00:00 2001
> From: Xin Long <lucien.xin@gmail.com>
> Date: Mon, 25 Dec 2017 14:43:58 +0800
> Subject: [PATCH] geneve: update skb dst pmtu on tx path
>
> Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed
> a performance issue caused by the change of lower dev's mtu for vxlan.
>
> The same thing needs to be done for geneve as well.
>
> Note that geneve cannot adjust its mtu according to lower dev's mtu
> when creating it. The performance is very low later when netperfing
> over it without fixing the mtu manually. This patch could also avoid
> this issue.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Then you can apply the following backport. A backport is required
because v4.15 has commit 77552cfa39c48e695c39d0553afc8c6018e411ce
which rewrote
> skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
into
> skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2,
> rel_info);
in net/ipv6/ip6_tunnel.c which is missing:
From b2fb9a8178660f92c6ab29d3171bc44e2cb1b618 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 25 Jan 2018 19:03:03 +0100
Subject: net: don't call update_pmtu unconditionally
commit f15ca723c1ebe6c1a06bc95fda6b62cd87b44559 upstream.
Some dst_ops (e.g. md_dst_ops) don't set this handler. It may result in:
"BUG: unable to handle kernel NULL pointer dereference at (null)"
Let's add a helper to check if update_pmtu is available before calling it.
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 3 +--
drivers/net/geneve.c | 4 ++--
drivers/net/vxlan.c | 6 ++----
include/net/dst.h | 8 ++++++++
net/ipv4/ip_tunnel.c | 3 +--
net/ipv4/ip_vti.c | 2 +-
net/ipv6/ip6_tunnel.c | 5 ++---
net/ipv6/ip6_vti.c | 2 +-
net/ipv6/sit.c | 4 ++--
9 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 7774654c2ccb..7a5ed5a5391e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1447,8 +1447,7 @@ void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
struct ipoib_dev_priv *priv = ipoib_priv(dev);
int e = skb_queue_empty(&priv->cm.skb_queue);
- if (skb_dst(skb))
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
skb_queue_tail(&priv->cm.skb_queue, skb);
if (e)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 1b0fcf0b2afa..fbc825ac97ab 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -829,7 +829,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
int mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) -
GENEVE_BASE_HLEN - info->options_len - 14;
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
}
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
@@ -875,7 +875,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
GENEVE_BASE_HLEN - info->options_len - 14;
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
}
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bb44f0c6891f..3d9c5b35a4a7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2158,8 +2158,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
if (skb_dst(skb)) {
int mtu = dst_mtu(ndst) - VXLAN_HEADROOM;
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
- skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
}
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
@@ -2200,8 +2199,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
if (skb_dst(skb)) {
int mtu = dst_mtu(ndst) - VXLAN6_HEADROOM;
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
- skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
}
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
diff --git a/include/net/dst.h b/include/net/dst.h
index 694c2e6ae618..ebfb4328fdb1 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -520,4 +520,12 @@ static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
}
#endif
+static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
+{
+ struct dst_entry *dst = skb_dst(skb);
+
+ if (dst && dst->ops->update_pmtu)
+ dst->ops->update_pmtu(dst, NULL, skb, mtu);
+}
+
#endif /* _NET_DST_H */
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 13f7bbc0168d..a2fcc20774a6 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -521,8 +521,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
else
mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
- if (skb_dst(skb))
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
if (skb->protocol == htons(ETH_P_IP)) {
if (!skb_is_gso(skb) &&
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 89453cf62158..c9cd891f69c2 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -209,7 +209,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
mtu = dst_mtu(dst);
if (skb->len > mtu) {
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
if (skb->protocol == htons(ETH_P_IP)) {
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
htonl(mtu));
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7e11f6a811f5..d61a82fd4b60 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -652,7 +652,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
if (rel_info > dst_mtu(skb_dst(skb2)))
goto out;
- skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
+ skb_dst_update_pmtu(skb2, rel_info);
}
if (rel_type == ICMP_REDIRECT)
skb_dst(skb2)->ops->redirect(skb_dst(skb2), NULL, skb2);
@@ -1141,8 +1141,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
mtu = 576;
}
- if (skb_dst(skb) && !t->parms.collect_md)
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
*pmtu = mtu;
err = -EMSGSIZE;
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 7c0f647b5195..2493a40bc4b1 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -486,7 +486,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
mtu = dst_mtu(dst);
if (!skb->ignore_df && skb->len > mtu) {
- skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+ skb_dst_update_pmtu(skb, mtu);
if (skb->protocol == htons(ETH_P_IPV6)) {
if (mtu < IPV6_MIN_MTU)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index f03c1a562135..b35d8905794c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -925,8 +925,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
df = 0;
}
- if (tunnel->parms.iph.daddr && skb_dst(skb))
- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ if (tunnel->parms.iph.daddr)
+ skb_dst_update_pmtu(skb, mtu);
if (skb->len > mtu && !skb_is_gso(skb)) {
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
--
2.17.0
^ permalink raw reply related
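The hunks above replace open-coded calls through skb_dst(skb)->ops->update_pmtu with a single helper, and in several places drop the caller's own NULL check on the dst. The following is a minimal user-space sketch of that pattern, assuming the helper itself guards against a missing dst or a missing callback. The mock_* types and functions are hypothetical stand-ins, not the kernel's struct sk_buff or struct dst_entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_dst;

/* Hypothetical stand-in for the per-dst operations table. */
struct mock_dst_ops {
	void (*update_pmtu)(struct mock_dst *dst, uint32_t mtu);
};

struct mock_dst {
	const struct mock_dst_ops *ops;
	uint32_t pmtu;
};

/* Hypothetical stand-in for a packet buffer that may or may not have a dst. */
struct mock_skb {
	struct mock_dst *dst;
};

/* Example callback: simply record the new path MTU. */
void mock_update_pmtu(struct mock_dst *dst, uint32_t mtu)
{
	dst->pmtu = mtu;
}

/*
 * Centralised helper: callers no longer test for a NULL dst themselves,
 * the helper quietly does nothing when there is no dst or no callback.
 */
void mock_skb_dst_update_pmtu(struct mock_skb *skb, uint32_t mtu)
{
	struct mock_dst *dst;

	assert(skb != NULL);
	dst = skb->dst;
	if (dst && dst->ops && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu);
}
```

With the guard in one place, a call site such as the one in vti_xmit() can shrink to a single line without each caller having to remember the check.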
* DSA
From: Dave Richards @ 2018-04-27 18:10 UTC (permalink / raw)
To: netdev@vger.kernel.org
Hello,
I am building a prototype for a new product based on a Lanner, Inc. embedded PC. It is an Intel Celeron-based system with two host I210 GbE chips connected to 2 MV88E6172 chips (one NIC to one switch). Everything appears to show up hardware-wise. My question is, what is the next step? How does DSA know which NICs are intended to be masters? Is this supposed to be auto-detected or is this knowledge supposed to be communicated explicitly? Reading through the DSA driver code I see that there is a check of the OF property list for the device for a "label"/"cpu" property/value pair that needs to be present. Who sets this and when?
I'm sorry for this basic question, but Google has not enlightened me.
Thanks!
Dave
Dave Richards
VP Software Engineering
Impinj, Inc
400 Fairview Ave N. #1200
Seattle, WA
O: (206) 812-9863
^ permalink raw reply
* Re: [PATCH bpf-next v2 00/15] Introducing AF_XDP support
From: Björn Töpel @ 2018-04-27 18:12 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
Daniel Borkmann, Michael S. Tsirkin, Network Development,
Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAF=yD-JUezRSGP_6f=WDHqcznFEW4K95hd6qVK8BaAr6Q5VR5Q@mail.gmail.com>
2018-04-27 19:16 GMT+02:00 Willem de Bruijn <willemdebruijn.kernel@gmail.com>:
> On Fri, Apr 27, 2018 at 8:17 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> This patch set introduces a new address family called AF_XDP that is
>> optimized for high performance packet processing and, in upcoming
>> patch sets, zero-copy semantics. In this v2 version, we have removed
>> all zero-copy related code in order to make it smaller, simpler and
>> hopefully more review friendly. This patch set only supports copy-mode
>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>> work has already been accepted. We will publish our zero-copy support
>> for RX and TX on top of his patch sets at a later point in time.
>
>> Changes from V1:
>>
>> * Fixes to bugs spotted by Will in his review
> >> * Implemented the performance optimization to BPF_MAP_TYPE_XSKMAP
>> suggested by Will
>
> An xsk may only exist in one map at a time. Is this somehow assured?
>
Actually this is *not* the case. An xsk may reside in many maps, and
multiple times in the same map. So it's not assured at all. :-)
The restriction for an xsk is per netdev/queue/umem, and the NAPI
context guarantees the SPSC constraint.
For the record, your XSKMAP suggestion gave ~100kpps in the ingress
path! Very nice!
>> * Refactored packet_direct_xmit to become a common function
>> in core/dev.c as suggested by Will
>> * Added documentation as suggested by Jesper
>> * Proper page unpinning as suggested by MST
>> * Some minor code cleanups
>
> Everything else looks great to me. If the above is correct (or corrected)
>
> Acked-by: Willem de Bruijn <willemb@google.com>
>
Thanks for the in-depth review, Will! Very much appreciated! (bow)
Björn
> I did not read everything again, but applied both patchsets on top of
> bpf-next to do a diff of diffs. In case others find it useful:
>
> https://github.com/wdebruij/linux/tree/bpf-next-afxdp-v1
> https://github.com/wdebruij/linux/tree/bpf-next-afxdp-v2
^ permalink raw reply
* [PATCH 0/3] Clean up users of skb_tx_hash and __skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
To: netdev, davem
Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
I am in the process of doing some work to try and enable macvlan Tx queue
selection without using ndo_select_queue. As a part of that I will likely
need to make changes to skb_tx_hash. As such this is a clean up or refactor
of the two spots where the function has been used. In both cases it didn't
really seem like the function was being used correctly so I have updated
both code paths to not make use of the function.
My current development environment doesn't have an mlx4 or OPA vnic
available so the changes to those have been build tested only.
---
Alexander Duyck (3):
opa_vnic: Just use skb_get_hash instead of skb_tx_hash
mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c | 21 ++++++++++----------
.../infiniband/ulp/opa_vnic/opa_vnic_internal.h | 2 +-
drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +-
include/linux/netdevice.h | 13 ------------
net/core/dev.c | 10 ++++------
6 files changed, 17 insertions(+), 33 deletions(-)
^ permalink raw reply
* [PATCH 1/3] opa_vnic: Just use skb_get_hash instead of skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
To: netdev, davem
Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>
This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.
The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 15 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get an 8-bit value for
entropy.
Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the
32-bit value into an 8-bit value using a pair of shifts and XOR operations.
Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.
In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c | 21 ++++++++++----------
.../infiniband/ulp/opa_vnic/opa_vnic_internal.h | 2 +-
drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 2 +-
3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
index 4be3aef..267da82 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
@@ -443,17 +443,16 @@ static u8 opa_vnic_get_rc(struct __opa_veswport_info *info,
}
/* opa_vnic_calc_entropy - calculate the packet entropy */
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
+u8 opa_vnic_calc_entropy(struct sk_buff *skb)
{
- u16 hash16;
-
- /*
- * Get flow based 16-bit hash and then XOR the upper and lower bytes
- * to get the entropy.
- * __skb_tx_hash limits qcount to 16 bits. Hence, get 15-bit hash.
- */
- hash16 = __skb_tx_hash(adapter->netdev, skb, BIT(15));
- return (u8)((hash16 >> 8) ^ (hash16 & 0xff));
+ u32 hash = skb_get_hash(skb);
+
+ /* store XOR of all bytes in lower 8 bits */
+ hash ^= hash >> 8;
+ hash ^= hash >> 16;
+
+ /* return lower 8 bits as entropy */
+ return (u8)(hash & 0xFF);
}
/* opa_vnic_get_def_port - get default port based on entropy */
@@ -490,7 +489,7 @@ void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
hdr = skb_push(skb, OPA_VNIC_HDR_LEN);
- entropy = opa_vnic_calc_entropy(adapter, skb);
+ entropy = opa_vnic_calc_entropy(skb);
def_port = opa_vnic_get_def_port(adapter, entropy);
len = opa_vnic_wire_length(skb);
dlid = opa_vnic_get_dlid(adapter, skb, def_port);
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
index afd95f4..43ac61f 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
@@ -299,7 +299,7 @@ struct opa_vnic_adapter *opa_vnic_add_netdev(struct ib_device *ibdev,
void opa_vnic_rem_netdev(struct opa_vnic_adapter *adapter);
void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
u8 opa_vnic_get_vl(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
+u8 opa_vnic_calc_entropy(struct sk_buff *skb);
void opa_vnic_process_vema_config(struct opa_vnic_adapter *adapter);
void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter);
void opa_vnic_query_mac_tbl(struct opa_vnic_adapter *adapter,
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
index ce57e0f..0c8aec6 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
@@ -104,7 +104,7 @@ static u16 opa_vnic_select_queue(struct net_device *netdev, struct sk_buff *skb,
/* pass entropy and vl as metadata in skb */
mdata = skb_push(skb, sizeof(*mdata));
- mdata->entropy = opa_vnic_calc_entropy(adapter, skb);
+ mdata->entropy = opa_vnic_calc_entropy(skb);
mdata->vl = opa_vnic_get_vl(adapter, skb);
rc = adapter->rn_ops->ndo_select_queue(netdev, skb,
accel_priv, fallback);
^ permalink raw reply related
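The folding described in patch 1/3 can be tried outside the kernel. The sketch below is a user-space illustration only: new_entropy() repeats the shift/XOR sequence from the patch, while old_entropy() is a simplified model of the removed path that assumes the hash was reciprocally scaled into the range 0 to 2^15 - 1 and ignores the recorded-rx-queue and traffic-class branches of __skb_tx_hash. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* New scheme: XOR all four bytes of the 32-bit hash into the low byte. */
uint8_t new_entropy(uint32_t hash)
{
	hash ^= hash >> 8;	/* low byte is now b0 ^ b1, byte 2 is b2 ^ b3 */
	hash ^= hash >> 16;	/* low byte is now b0 ^ b1 ^ b2 ^ b3 */

	return (uint8_t)(hash & 0xFF);
}

/*
 * Simplified model of the old scheme: scale the 32-bit hash into
 * [0, 2^15) by a multiply and shift, then XOR the two bytes of the result.
 */
uint8_t old_entropy(uint32_t hash)
{
	uint16_t hash16 = (uint16_t)(((uint64_t)hash << 15) >> 32);

	/* The top bit of the 16-bit value can never be set. */
	assert(hash16 < (1u << 15));

	return (uint8_t)((hash16 >> 8) ^ (hash16 & 0xff));
}
```

In this model the old path only ever looks at the upper 15 bits of the flow hash, whereas the new fold lets every input bit influence the result.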
* [PATCH 2/3] mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
To: netdev, davem
Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>
The code in the fallback path has supported XDP in conjunction with the Tx
traffic classification for TCs for over a year now. So instead of just
calling skb_tx_hash for every packet we are better off using the fallback
since that will record the Tx queue to the socket and then that can be used
instead of having to recompute the hash every time.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6b68537..0227786 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -694,7 +694,7 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb,
u16 rings_p_up = priv->num_tx_rings_p_up;
if (netdev_get_num_tc(dev))
- return skb_tx_hash(dev, skb);
+ return fallback(dev, skb);
return fallback(dev, skb) % rings_p_up;
}
^ permalink raw reply related
* Re: [PATCH net-next 03/13] sctp: remove an if() that is always true
From: Marcelo Ricardo Leitner @ 2018-04-27 18:13 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, linux-sctp, Vlad Yasevich, Xin Long
In-Reply-To: <20180427105050.GA22078@hmswarspite.think-freely.org>
On Fri, Apr 27, 2018 at 06:50:50AM -0400, Neil Horman wrote:
> On Thu, Apr 26, 2018 at 04:58:52PM -0300, Marcelo Ricardo Leitner wrote:
> > As noticed by Xin Long, the if() here is always true as PMTU can never
> > be 0.
> >
> > Reported-by: Xin Long <lucien.xin@gmail.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> > net/sctp/associola.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> > index b3aa95222bd52113295cb246c503c903bdd5c353..c5ed09cfa8423b17546e3d45f6d06db03af66384 100644
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -1397,10 +1397,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
> > pmtu = t->pathmtu;
> > }
> >
> > - if (pmtu) {
> > - asoc->pathmtu = pmtu;
> > - asoc->frag_point = sctp_frag_point(asoc, pmtu);
> > - }
> > + asoc->pathmtu = pmtu;
> > + asoc->frag_point = sctp_frag_point(asoc, pmtu);
> >
> Can you double check this? Looking at it, it seems far fetched, but if someone
Sure.
> sends a crafted icmp dest unreach message to the host, pmtu_sending might be
> able to get set for an association (which may have no transports established
> yet), and if so, on the first packet send sctp_assoc_sync_pmtu can be called,
> leading to a fall through in the loop over all transports, and pmtu being zero.
> It seems like a far fetched set of circumstances, I know, but if it can happen,
> I think you might see a crash in sctp_frag_point due to an underflow of the frag
> value
If I got you right, this situation would not happen because when
handling the icmp it will check if there is a transport and ignore it
otherwise.
Marcelo
>
> Neil
>
> > pr_debug("%s: asoc:%p, pmtu:%d, frag_point:%d\n", __func__, asoc,
> > asoc->pathmtu, asoc->frag_point);
> > --
> > 2.14.3
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* [PATCH 3/3] net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
To: netdev, davem
Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
include/linux/netdevice.h | 13 -------------
net/core/dev.c | 10 ++++------
2 files changed, 4 insertions(+), 19 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf44503..6da5371 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3213,19 +3213,6 @@ static inline int netif_set_xps_queue(struct net_device *dev,
}
#endif
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues);
-
-/*
- * Returns a Tx hash for the given packet when dev->real_num_tx_queues is used
- * as a distribution range limit for the returned value.
- */
-static inline u16 skb_tx_hash(const struct net_device *dev,
- struct sk_buff *skb)
-{
- return __skb_tx_hash(dev, skb, dev->real_num_tx_queues);
-}
-
/**
* netif_is_multiqueue - test if device has multiple transmit queues
* @dev: network device
diff --git a/net/core/dev.c b/net/core/dev.c
index 9b04a9f..7584d9c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2614,17 +2614,16 @@ void netif_device_attach(struct net_device *dev)
* Returns a Tx hash based on the given packet descriptor a Tx queues' number
* to be used as a distribution range.
*/
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues)
+static u16 skb_tx_hash(const struct net_device *dev, struct sk_buff *skb)
{
u32 hash;
u16 qoffset = 0;
- u16 qcount = num_tx_queues;
+ u16 qcount = dev->real_num_tx_queues;
if (skb_rx_queue_recorded(skb)) {
hash = skb_get_rx_queue(skb);
- while (unlikely(hash >= num_tx_queues))
- hash -= num_tx_queues;
+ while (unlikely(hash >= qcount))
+ hash -= qcount;
return hash;
}
@@ -2637,7 +2636,6 @@ u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
}
-EXPORT_SYMBOL(__skb_tx_hash);
static void skb_warn_bad_offload(const struct sk_buff *skb)
{
^ permalink raw reply related
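The queue selection that patch 3/3 makes static has two parts visible in the diff: a wrap-around for a recorded rx queue, and a reciprocal scaling of the flow hash into the queue range. A minimal user-space sketch of both follows. scale_hash() mirrors the multiply-and-shift that the kernel's reciprocal_scale() performs as I understand it. wrap_rx_queue() and pick_tx_queue() are hypothetical names, and the traffic-class lookup that sets qoffset and qcount in the real function is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit value into [0, range) without a division. */
uint32_t scale_hash(uint32_t val, uint32_t range)
{
	return (uint32_t)(((uint64_t)val * range) >> 32);
}

/* Recorded-rx-queue branch: reduce the rx queue index into [0, qcount). */
uint16_t wrap_rx_queue(uint32_t rx_queue, uint16_t qcount)
{
	assert(qcount != 0);	/* a zero queue count would loop forever */

	while (rx_queue >= qcount)
		rx_queue -= qcount;

	return (uint16_t)rx_queue;
}

/* Hash branch: scale the flow hash into the queue range, then add the offset. */
uint16_t pick_tx_queue(uint32_t flow_hash, uint16_t qoffset, uint16_t qcount)
{
	return (uint16_t)(scale_hash(flow_hash, qcount) + qoffset);
}
```

The subtraction loop is cheap when the rx queue index is usually already below qcount, which is presumably why the original marks the condition unlikely().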
* [PATCH net] bridge: netfilter stp fix reference to uninitialized data
From: Stephen Hemminger @ 2018-04-27 18:16 UTC (permalink / raw)
To: pablo, kadlec, fw, davem
Cc: netfilter-devel, bridge, netdev, Stephen Hemminger,
Stephen Hemminger
The destination mac (destmac) is only valid if EBT_DESTMAC flag
is set. Fix by changing the order of the comparison to look for
the flag first.
Reported-by: syzbot+5c06e318fc558cc27823@syzkaller.appspotmail.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
Note: no fixes since this bug goes back to pre-git days.
Should go to stable as well.
net/bridge/netfilter/ebt_stp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bridge/netfilter/ebt_stp.c b/net/bridge/netfilter/ebt_stp.c
index 47ba98db145d..46c1fe7637ea 100644
--- a/net/bridge/netfilter/ebt_stp.c
+++ b/net/bridge/netfilter/ebt_stp.c
@@ -161,8 +161,8 @@ static int ebt_stp_mt_check(const struct xt_mtchk_param *par)
/* Make sure the match only receives stp frames */
if (!par->nft_compat &&
(!ether_addr_equal(e->destmac, eth_stp_addr) ||
- !is_broadcast_ether_addr(e->destmsk) ||
- !(e->bitmask & EBT_DESTMAC)))
+ !(e->bitmask & EBT_DESTMAC) ||
+ !is_broadcast_ether_addr(e->destmsk)))
return -EINVAL;
return 0;
--
2.17.0
^ permalink raw reply related
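The commit message above argues for testing the validity flag before touching the fields it guards. The sketch below shows that ordering in plain user-space C. It is only an illustration of the idea: struct rule, RULE_HAS_DESTMAC and rule_matches_only_stp() are hypothetical and are not the ebtables structures or the exact condition used in ebt_stp_mt_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RULE_HAS_DESTMAC 0x1u	/* hypothetical "destmac/destmsk are valid" flag */

/* Hypothetical rule entry: the MAC fields are meaningful only if the flag is set. */
struct rule {
	unsigned int bitmask;
	uint8_t destmac[6];
	uint8_t destmsk[6];
};

/* The STP group address 01:80:c2:00:00:00 and an all-ones mask. */
static const uint8_t stp_addr[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
static const uint8_t all_ones[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

/* Return true only when the rule is restricted to STP frames. */
bool rule_matches_only_stp(const struct rule *r)
{
	assert(r != NULL);

	/* Check the flag first so possibly uninitialized fields are never read. */
	if (!(r->bitmask & RULE_HAS_DESTMAC))
		return false;

	return memcmp(r->destmac, stp_addr, sizeof(stp_addr)) == 0 &&
	       memcmp(r->destmsk, all_ones, sizeof(all_ones)) == 0;
}
```

Because the early return happens before either memcmp(), a rule without the flag is rejected regardless of what the address bytes happen to contain.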
* Re: [PATCH net-next] net: sch: prio: Set bands to default on delete instead of noop
From: Cong Wang @ 2018-04-27 18:20 UTC (permalink / raw)
To: Nogah Frankel
Cc: Linux Kernel Network Developers, David Miller, Jiri Pirko,
Jamal Hadi Salim, mlxsw
In-Reply-To: <1524749556-36199-1-git-send-email-nogahf@mellanox.com>
On Thu, Apr 26, 2018 at 6:32 AM, Nogah Frankel <nogahf@mellanox.com> wrote:
> When a band is created, it is set to the default qdisc, which is
> "invisible" pfifo.
Isn't TCA_DUMP_INVISIBLE for dumping this invisible qdisc?
> However, if a band is set to a qdisc that is later being deleted, it will
> be set to noop qdisc. This can cause a packet loss, while there is no clear
> user indication for it. ("invisible" qdisc are not being shown by default).
> This patch sets a band to the default qdisc, rather than the noop qdisc, on
> delete operation.
It is set to noop historically, which may not be reasonable, but changing
it could break things.
What's wrong with using TCA_DUMP_INVISIBLE to dump it?
^ permalink raw reply
* Re: DSA
From: Florian Fainelli @ 2018-04-27 18:32 UTC (permalink / raw)
To: Dave Richards, netdev@vger.kernel.org, andrew, vivien.didelot
In-Reply-To: <MWHPR06MB3503CE521D6993C7786A3E93DC8D0@MWHPR06MB3503.namprd06.prod.outlook.com>
Hello,
On 04/27/2018 11:10 AM, Dave Richards wrote:
> Hello,
>
> I am building a prototype for a new product based on a Lanner, Inc. embedded PC. It is an Intel Celeron-based system with two host I210 GbE chips connected to 2 MV88E6172 chips (one NIC to one switch). Everything appears to show up hardware-wise. My question is, what is the next step? How does DSA know which NICs are intended to be masters? Is this supposed to be auto-detected or is this knowledge supposed to be communicated explicitly? Reading through the DSA driver code I see that there is a check of the OF property list for the device for a "label"/"cpu" property/value pair that needs to be present. Who sets this and when?
On systems where Device Tree can be used, we expect you to declare all
relevant peripherals in Device Tree and those would include the Ethernet
controller (i210) and the Ethernet switches. An example of this can be
found here:
On x86, Device Tree is not universally used, so we can use
something along these lines to register an Ethernet switch through DSA:
https://github.com/lunn/linux/commit/34055b931848545b6ba11ee50b88e89aeb02c9a5
There might be a way for you to use a conjunction of DMI match entries to
match your specific board design and, based on that, run the DSA switch
registration code, indicating the port mapping and the Ethernet
controller to be used as a "master" device.
--
Florian
^ permalink raw reply
* Re: [pull request][net 0/7] Mellanox, mlx5 fixes 2018-04-26
From: David Miller @ 2018-04-27 18:32 UTC (permalink / raw)
To: saeedm; +Cc: netdev
In-Reply-To: <20180426195842.29665-1-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 26 Apr 2018 12:58:35 -0700
> This pull request includes fixes for mlx5 core and netdev driver.
>
> Please pull and let me know if there are any problems.
Pulled.
> For -stable v4.12
> net/mlx5e: TX, Use correct counter in dma_map error flow
> For -stable v4.13
> net/mlx5: Avoid cleaning flow steering table twice during error flow
> For -stable v4.14
> net/mlx5e: Allow offloading ipv4 header re-write for icmp
> For -stable v4.15
> net/mlx5e: DCBNL fix min inline header size for dscp
> For -stable v4.16
> net/mlx5: Fix mlx5_get_vector_affinity function
Queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: David Miller @ 2018-04-27 18:33 UTC (permalink / raw)
To: sbrivio; +Cc: steffen.klassert, lucien.xin, alexey.kodanev, jarod, sd, netdev
In-Reply-To: <c2369c8f004006b33007bad40b63c35f50ff3c23.1524764073.git.sbrivio@redhat.com>
From: Stefano Brivio <sbrivio@redhat.com>
Date: Thu, 26 Apr 2018 19:41:02 +0200
> A vti6 interface can carry IPv4 packets too.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Applied, thank you.
^ permalink raw reply