Netdev List
* [Patch net-next] ipvs: remove an annoying printk in netns init
From: Cong Wang @ 2016-12-10  5:09 UTC
  To: netdev; +Cc: Cong Wang, Simon Horman

At most it is used for debugging purposes, but I don't think
it is even useful for debugging, so just remove it.

Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/netfilter/ipvs/ip_vs_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 2c1b498..febc7f3 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2231,8 +2231,6 @@ static int __net_init __ip_vs_init(struct net *net)
 	if (ip_vs_sync_net_init(ipvs) < 0)
 		goto sync_fail;
 
-	printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
-			 sizeof(struct netns_ipvs), ipvs->gen);
 	return 0;
 /*
  * Error handling
-- 
2.5.5

* [GIT] Networking
From: David Miller @ 2016-12-10  4:42 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Limit the number of can filters to avoid > MAX_ORDER allocations.
   Fix from Marc Kleine-Budde.

2) Limit GSO max size in netvsc driver to avoid problems with
   NVGRE configurations.  From Stephen Hemminger.

3) Return proper error when memory allocation fails in
   ser_gigaset_init(), from Dan Carpenter.

4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
   Feng.

5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers,
   from Florian Fainelli.

6) Handle probe deferral properly in smsc911x driver.
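
Item 1 caps how many CAN filters a single raw_setsockopt() call may install. A minimal user-space sketch of that kind of length check follows. The cap of 512 and the helper `check_filter_optlen()` are assumptions for illustration (the diffstat shows a one-line addition to include/uapi/linux/can.h, presumably the limit itself), and `struct can_filter` is mirrored locally with its two 32-bit fields.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct can_filter: two 32-bit canid_t fields. */
struct can_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* Assumed cap on the number of filters per socket (illustrative). */
#define FILTER_MAX 512

/*
 * Hypothetical helper: validate a setsockopt() option length before
 * allocating room for the filters, so a huge request cannot turn into
 * an oversized allocation. Returns the filter count or -EINVAL.
 */
static int check_filter_optlen(size_t optlen)
{
	if (optlen % sizeof(struct can_filter))
		return -EINVAL;	/* not a whole number of filters */
	if (optlen > FILTER_MAX * sizeof(struct can_filter))
		return -EINVAL;	/* too many filters requested */
	return (int)(optlen / sizeof(struct can_filter));
}
```

Rejecting an oversized option length before allocating is what bounds the allocation size; the real patch's exact checks may differ.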

Please pull, thanks a lot!

The following changes since commit bc3913a5378cd0ddefd1dfec6917cc12eb23a946:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2016-12-06 09:24:11 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to d33695fbfab73a4a6550fa5c2d0bacc68d7c5901:

  net: mlx5: Fix Kconfig help text (2016-12-09 23:08:32 -0500)

----------------------------------------------------------------
Alex (1):
      drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links

Arjun V (1):
      cxgb4/cxgb4vf: Assign netdev->dev_port with port ID

Christopher Covington (1):
      net: mlx5: Fix Kconfig help text

Dan Carpenter (1):
      ser_gigaset: return -ENOMEM on error instead of success

Daniele Palmas (1):
      NET: usb: cdc_mbim: add quirk for supporting Telit LE922A

David S. Miller (3):
      Merge tag 'linux-can-fixes-for-4.9-20161207' of git://git.kernel.org/.../mkl/linux-can
      Merge tag 'linux-can-fixes-for-4.9-20161208' of git://git.kernel.org/.../mkl/linux-can
      Merge branch 'ethernet-missing-netdev-parent'

Florian Fainelli (3):
      phy: Don't increment MDIO bus refcount unless it's a different owner
      net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
      net: ethernet: cpmac: Call SET_NETDEV_DEV()

Gao Feng (1):
      driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed

Linus Walleij (1):
      net: smsc911x: back out silently on probe deferrals

Marc Kleine-Budde (1):
      can: raw: raw_setsockopt: limit number of can_filter that can be set

Peng Tao (1):
      vhost-vsock: fix orphan connection reset

Thomas Falcon (1):
      ibmveth: set correct gso_size and gso_type

stephen hemminger (1):
      netvsc: reduce maximum GSO size

추지호 (1):
      can: peak: fix bad memory access and free sequence

 drivers/isdn/gigaset/ser-gigaset.c                  |  4 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_core.c        |  6 ++++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     |  1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c          |  1 -
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c |  1 +
 drivers/net/ethernet/ibm/ibmveth.c                  | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/ibm/ibmveth.h                  |  1 +
 drivers/net/ethernet/lantiq_etop.c                  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig     |  2 --
 drivers/net/ethernet/smsc/smsc911x.c                |  9 ++++++++-
 drivers/net/ethernet/ti/cpmac.c                     |  1 +
 drivers/net/ethernet/ti/cpsw-phy-sel.c              |  1 +
 drivers/net/hyperv/netvsc_drv.c                     |  5 +++++
 drivers/net/ipvlan/ipvlan_main.c                    |  4 +++-
 drivers/net/phy/phy_device.c                        | 16 +++++++++++++---
 drivers/net/usb/cdc_mbim.c                          | 21 +++++++++++++++++++++
 drivers/net/usb/cdc_ncm.c                           | 14 +++++++++-----
 drivers/vhost/vsock.c                               |  2 +-
 include/linux/usb/cdc_ncm.h                         |  3 ++-
 include/uapi/linux/can.h                            |  1 +
 net/can/raw.c                                       |  3 +++
 21 files changed, 142 insertions(+), 20 deletions(-)

* Re: Soft lockup in inet_put_port on 4.6
From: Eric Dumazet @ 2016-12-10  4:14 UTC
  To: Josef Bacik
  Cc: Hannes Frederic Sowa, Tom Herbert,
	Linux Kernel Network Developers
In-Reply-To: <1481341624.4930.204.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:

> 
> Hmm... Does your ephemeral port range include the port your load
> balancing app is using ?

I suspect that you might have processes doing bind(port = 0) that are
trapped in the bind_conflict() scan?

With 100,000+ timewaits there, this possibly hurts.

Can you try the following loop breaker?

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d5d3ead0a6c31e42e8843d30f8c643324a91b8e9..74f0f5ee6a02c624edb0263b9ddd27813f68d0a5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -51,7 +51,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
-
+	unsigned int max_count;
 	/*
 	 * Unlike other sk lookup places we do not check
 	 * for sk_net here, since _all_ the socks listed
@@ -59,6 +59,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	 * one this bucket belongs to.
 	 */
 
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    !inet_v6_ipv6only(sk2) &&
@@ -84,6 +85,8 @@ int inet_csk_bind_conflict(const struct sock *sk,
 					break;
 			}
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 	return sk2 != NULL;
 }
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 1c86c478f578b49373e61a4c397f23f3dc7f3fc6..4f63d06e0d601da94eb3f2b35a988abd060e156c 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -35,12 +35,14 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
+	unsigned int max_count;
 
 	/* We must walk the whole port owner list in this case. -DaveM */
 	/*
 	 * See comment in inet_csk_bind_conflict about sock lookup
 	 * vs net namespaces issues.
 	 */
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    (!sk->sk_bound_dev_if ||
@@ -61,6 +63,8 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 			    ipv6_rcv_saddr_equal(sk, sk2, true))
 				break;
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 
 	return sk2 != NULL;
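
The idea behind the loop breaker, as a minimal user-space sketch: `struct owner` and `owner_conflicts()` are hypothetical stand-ins for the bind bucket's owner list and the reuse/address checks, and 100 is the cap from the patch above. When `relax` is false the scan gives up early and reports a conflict; Eric ties the problem to bind(port = 0), where a reported conflict presumably just makes the caller move on to another port.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket on a bind bucket's owner list. */
struct owner {
	bool conflicts;		/* would the real reuse/address checks fail? */
	struct owner *next;
};

/* Hypothetical stand-in for the per-socket conflict checks. */
static bool owner_conflicts(const struct owner *o)
{
	return o->conflicts;
}

/*
 * Bounded scan modeled on the proposed patch: unless relax is set,
 * stop after 100 owners and report a conflict instead of walking
 * a list that may hold 100k+ TIME_WAIT sockets.
 */
static int bind_conflict(const struct owner *head, bool relax)
{
	unsigned int max_count = relax ? UINT_MAX : 100;
	const struct owner *o;

	for (o = head; o != NULL; o = o->next) {
		if (owner_conflicts(o))
			break;		/* real conflict found */
		if (--max_count == 0)
			return 1;	/* scan budget exhausted: assume conflict */
	}
	return o != NULL;
}
```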

* Re: netlink: GPF in sock_sndtimeo
From: Cong Wang @ 2016-12-10  4:13 UTC
  To: Richard Guy Briggs
  Cc: linux-audit, Paul Moore, Dmitry Vyukov, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML, syzkaller
In-Reply-To: <20161209110155.GW22655@madcap2.tricolour.ca>

On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2016-12-08 22:57, Cong Wang wrote:
>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
>> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
>> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> >> > Eliminating the lock since the sock is dead anyways eliminates the error.
>> >
>> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
>> > get the test case to compile.
>>
>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
>> are updated as a whole and race between audit_receive_msg() and
>> NETLINK_URELEASE.
>
> This is what I expected and why I originally added the mutex lock in the
> callback...  The dumps I got were bare with no wrapper identifying the
> process context or specific error, so I'm at a bit of a loss how to
> solve this (without thinking more about it) other than instinctively
> removing the mutex.

The netlink notifier can safely be converted to a blocking one; I will
send a patch.

But I seriously doubt you really need NETLINK_URELEASE here:
it adds nothing but overhead, because the netlink notifier is called for
every netlink socket in the system, while the net exit path is
relatively a slow path.

Also, kauditd_send_skb() needs audit_cmd_mutex too.

I will send a formal patch.

Thanks.

* Re: [PATCH] net: mlx5: Fix Kconfig help text
From: David Miller @ 2016-12-10  4:09 UTC
  To: cov-sgV2jX0FEOL9JmXXK+q4OQ
  Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20161209215306.721-1-cov-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

From: Christopher Covington <cov-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Date: Fri,  9 Dec 2016 16:53:05 -0500

> Since the following commit, Infiniband and Ethernet have not been
> mutually exclusive.
> 
> Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet
> 
> Signed-off-by: Christopher Covington <cov-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

Applied.

* Re: [PATCH] i40e: don't truncate match_method assignment
From: David Miller @ 2016-12-10  4:07 UTC
  To: jacob.e.keller
  Cc: intel-wired-lan, jeffrey.t.kirsher, netdev, sfr, bimmy.pujari
In-Reply-To: <20161209213921.26451-1-jacob.e.keller@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>
Date: Fri,  9 Dec 2016 13:39:21 -0800

> The .match_method field is a u8, so we shouldn't be casting to a u16,
> and because it is only one byte, we do not need to byte swap anything.
> Just assign the value directly. This avoids issues on Big Endian
> architectures which would have byte swapped and then incorrectly
> truncated the value.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Bimmy Pujari <bimmy.pujari@intel.com>
> ---
> Not sure if this was already in Jeff's queue, but since it's an obvious
> fix for the issue found by Stephen, I thought I'd send it out now just
> to make sure. Thanks for catching this, and sorry we didn't find the fix
> earlier.

Jeff, what do you want me to do with this?
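
The big-endian failure Jacob describes can be reproduced in user space. A minimal sketch, assuming `cpu_to_le16()` on a big-endian CPU is modeled as an unconditional byte swap; `swap16()` and the two `assign_*()` helpers are hypothetical, not i40e code.

```c
#include <stdint.h>

/* Models cpu_to_le16() on a big-endian CPU: swap the two bytes. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Buggy pattern: treat the value as 16 bits, swap, then store in a u8. */
static uint8_t assign_swapped_be(uint8_t method)
{
	return (uint8_t)swap16(method);	/* low byte is now 0 */
}

/* Fixed pattern: a one-byte field needs no byte swapping at all. */
static uint8_t assign_direct(uint8_t method)
{
	return method;
}
```

On little-endian hosts `cpu_to_le16()` leaves the value unchanged, so the truncation happens to keep the right byte there; only big-endian builds lose it.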

* Re: [PATCH net-next] net: skb_condense() can also deal with empty skbs
From: David Miller @ 2016-12-10  4:07 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1481299325.4930.183.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 09 Dec 2016 08:02:05 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> It seems attackers can also send UDP packets with no payload at all.
> 
> skb_condense() can still be a win in this case.
> 
> It will be possible to replace the custom code in tcp_add_backlog()
> to get full benefit from skb_condense()
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

* Re: [PATCH] net: smsc911x: back out silently on probe deferrals
From: David Miller @ 2016-12-10  4:05 UTC
  To: linus.walleij
  Cc: netdev, steve.glendinning, linux, jeremy.linton, kamlakant.patel,
	p.fedin
In-Reply-To: <1481289480-22096-1-git-send-email-linus.walleij@linaro.org>

From: Linus Walleij <linus.walleij@linaro.org>
Date: Fri,  9 Dec 2016 14:18:00 +0100

> When trying to get a regulator we may get deferred and we see
> this noise:
> 
> smsc911x 1b800000.ethernet-ebi2 (unnamed net_device) (uninitialized):
>    couldn't get regulators -517
> 
> Then the driver continues anyway. Which means that the regulator
> may not be properly retrieved and reference counted, and may be
> switched off in case no one else is using it.
> 
> Fix this by returning silently on deferred probe and let the
> system work it out.
> 
> Cc: Jeremy Linton <jeremy.linton@arm.com>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Looks good, applied, thanks.
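
The pattern Linus describes, as a minimal user-space model rather than the smsc911x code: `probe_regulators()` is hypothetical, and the value 517 for `EPROBE_DEFER` matches the -517 in the log above. Non-deferral errors are only logged here, mirroring the "continues anyway" behaviour; the thread does not say whether the patch changes that.

```c
#include <stdio.h>

/* Kernel-internal errno; the log above prints it as -517. */
#define EPROBE_DEFER 517

/*
 * Hypothetical probe step: 'ret' is the result of the regulator lookup.
 * A deferral is handed straight back to the driver core without any
 * message, so the core can retry the probe later.
 */
static int probe_regulators(int ret)
{
	if (ret == -EPROBE_DEFER)
		return ret;		/* silent: not a real error */
	if (ret)
		fprintf(stderr, "couldn't get regulators %d\n", ret);
	return 0;			/* assumption: other errors are not fatal */
}
```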

* Re: pull-request: mac80211-next 2016-12-09
From: David Miller @ 2016-12-10  3:59 UTC
  To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20161209120014.20292-1-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri,  9 Dec 2016 13:00:13 +0100

> Closing net-next caught me by surprise, so I had to rebase a bit,
> but these three patches really should go in soon. I'm not sending
> them for 4.9 this late though.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

* Re: [PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.
From: David Miller @ 2016-12-10  3:56 UTC
  To: bfolta
  Cc: nicolas.ferre, niklas.cassel, alexandre.torgue, satananda.burla,
	rvatsavayi, simon.horman, linux-kernel, netdev, rafalo
In-Reply-To: <SN1PR0701MB1951518D661B27AB9C63FA59CC870@SN1PR0701MB1951.namprd07.prod.outlook.com>

From: Bartosz Folta <bfolta@cadence.com>
Date: Fri, 9 Dec 2016 10:05:46 +0000

> There are hardware PCI implementations of Cadence GEM network controller. This patch will allow to use such hardware with reuse of existing Platform Driver.

Please properly format your commit message text to 80 columns.

> 
> Signed-off-by: Bartosz Folta <bfolta@cadence.com>
> ---
>  drivers/net/ethernet/cadence/Kconfig    |   9 ++
>  drivers/net/ethernet/cadence/Makefile   |   1 +
>  drivers/net/ethernet/cadence/macb.c     |  31 +++++--
>  drivers/net/ethernet/cadence/macb_pci.c | 152 ++++++++++++++++++++++++++++++++
>  include/linux/platform_data/macb.h      |   6 ++
>  5 files changed, 194 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

This patch doesn't apply to net-next, please respin.

* Re: [PATCH] ibmveth: set correct gso_size and gso_type
From: David Miller @ 2016-12-10  3:48 UTC
  To: tlfalcon; +Cc: netdev
In-Reply-To: <1481236803-4807-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Thu,  8 Dec 2016 16:40:03 -0600

> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 
> "We recently encountered a bug where a few customers using ibmveth on the
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the
> one side was advertising a zero window repeatedly.
> 
> We narrowed this down to the fact the ibmveth driver did not set gso_size
> which is translated by TCP into the MSS later up the stack. The MSS is
> used to calculate the TCP window size and as that was abnormally large,
> it was calculating a zero window, even although the sockets receive buffer
> was completely empty."
> 
> We rely on the Virtual I/O Server partition in a pseries
> environment to provide the MSS through the TCP header checksum
> field. The stipulation is that users should not disable checksum
> offloading if rx packet aggregation is enabled through VIOS.
> 
> Some firmware offerings provide the MSS in the RX buffer.
> This is signalled by a bit in the RX queue descriptor.
> 
> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
> Reviewed-by: David Dai <zdai@us.ibm.com>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

Applied, although mis-using the TCP checksum field for this is kind of
bogus.  I'm surprised there wasn't some other place you could stick
this value, which wouldn't modify the packet contents.

* Re: Soft lockup in inet_put_port on 4.6
From: Eric Dumazet @ 2016-12-10  3:47 UTC
  To: Josef Bacik
  Cc: Hannes Frederic Sowa, Tom Herbert,
	Linux Kernel Network Developers
In-Reply-To: <1481335192.3663.0@smtp.office365.com>

On Fri, 2016-12-09 at 20:59 -0500, Josef Bacik wrote:
> On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik <jbacik@fb.com> wrote:
> > 
> >>  On Dec 8, 2016, at 7:32 PM, Eric Dumazet <eric.dumazet@gmail.com> 
> >> wrote:
> >> 
> >>>  On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
> >>> 
> >>>  We can reproduce the problem at will, still trying to run down the
> >>>  problem.  I'll try and find one of the boxes that dumped a core 
> >>> and get
> >>>  a bt of everybody.  Thanks,
> >> 
> >>  OK, sounds good.
> >> 
> >>  I had a look and :
> >>  - could not spot a fix that came after 4.6.
> >>  - could not spot an obvious bug.
> >> 
> >>  Anything special in the program triggering the issue ?
> >>  SO_REUSEPORT and/or special socket options ?
> >> 
> > 
> > So they recently started using SO_REUSEPORT, that's what triggered 
> > it, if they don't use it then everything is fine.
> > 
> > I added some instrumentation for get_port to see if it was looping in 
> > there and none of my printk's triggered.  The softlockup messages are 
> > always on the inet_bind_bucket lock, sometimes in the process context 
> > in get_port or in the softirq context either through inet_put_port or 
> > inet_kill_twsk.  On the box that I have a coredump for there's only 
> > one processor in the inet code so I'm not sure what to make of that.  
> > That was a box from last week so I'll look at a more recent core and 
> > see if it's different.  Thanks,
> 
> Ok more investigation today, a few bullet points
> 
> - With all the debugging turned on the boxes seem to recover after 
> about a minute.  I'd get the spam of the soft lockup messages all on 
> the inet_bind_bucket, and then the box would be fine.
> - I looked at a core I had from before I started investigating things 
> and there's only one process trying to get the inet_bind_bucket of all 
> the 48 cpus.
> - I noticed that there was over 100k twsk's in that original core.
> - I put a global counter of the twsk's (since most of the softlockup 
> messages have the twsk timers in the stack) and noticed with the 
> debugging kernel it started around 16k twsk's and once it recovered it 
> was down to less than a thousand.  There's a jump where it goes from 8k 
> to 2k and then there's only one more softlockup message and the box is 
> fine.
> - This happens when we restart the service with the config option to 
> start using SO_REUSEPORT.
> 
> The application is our load balancing app, so obviously has lots of 
> connections opened at any given time.  What I'm wondering and will test 
> on Monday is if the SO_REUSEPORT change even matters, or if simply 
> restarting the service is what triggers the problem.  One thing I 
> forgot to mention is that it's also using TCP_FASTOPEN in both the 
> non-reuseport and reuseport variants.
> 
> What I suspect is happening is the service stops, all of the sockets it 
> had open go into TIMEWAIT with relatively the same timer period, and 
> then suddenly all wake up at the same time, which, coupled with the
> massive amount of traffic that we see per box anyway, results in so much
> contention and ksoftirqd usage that the box livelocks for a while.  
> With the lock debugging and stuff turned on we aren't able to service 
> as much traffic so it recovers relatively quickly, whereas a normal 
> production kernel never recovers.
> 
> Please keep in mind that I'm a file system developer, so my conclusions
> may be completely insane, any guidance would be welcome.  I'll continue 
> hammering on this on Monday.  Thanks,

Hmm... Does your ephemeral port range include the port your load
balancing app is using ?
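
A quick way to check this on a box: the ephemeral range is exposed in `/proc/sys/net/ipv4/ip_local_port_range` as two integers (low and high). A minimal sketch with hypothetical helper names; it only compares one port against that range.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse "low high" as found in ip_local_port_range; true on success. */
static bool parse_port_range(const char *s, int *lo, int *hi)
{
	return sscanf(s, "%d %d", lo, hi) == 2 && *lo <= *hi;
}

/* Is 'port' inside the inclusive ephemeral range [lo, hi]? */
static bool port_in_range(int port, int lo, int hi)
{
	return port >= lo && port <= hi;
}

/* Read the live range on a Linux host; true on success. */
static bool read_port_range(int *lo, int *hi)
{
	char buf[64];
	FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
	bool ok = false;

	if (f) {
		if (fgets(buf, sizeof(buf), f))
			ok = parse_port_range(buf, lo, hi);
		fclose(f);
	}
	return ok;
}
```

If the service's listening port falls inside the range, ephemeral bind(port = 0) callers can be steered onto that same port's bucket, which is presumably the overlap being asked about.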

* Re: [PATCH net v2] ibmveth: set correct gso_size and gso_type
From: Eric Dumazet @ 2016-12-10  3:28 UTC
  To: Thomas Falcon; +Cc: netdev, brking, pradeeps, marcelo.leitner, jmaxwell37, zdai
In-Reply-To: <1481333480-10827-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

On Fri, 2016-12-09 at 19:31 -0600, Thomas Falcon wrote:
> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 

> +					DIV_ROUND_UP(skb->len - hdr_len, mss);
> +	} else if (offset) {
> +		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
> +		skb_shinfo(skb)->gso_segs =
> +				DIV_ROUND_UP(skb->len - hdr_len,
> +					     skb_shinfo(skb)->gso_size);
> +		tcph->check = 0;
> +	}

Are you sure that tcph->check could never be 0 in some cases?

That would crash on a divide by 0
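
Eric's concern made concrete: if the checksum field ever carries 0, the `DIV_ROUND_UP()` by `gso_size` divides by zero. A minimal user-space sketch of a guarded computation, assuming that falling back to "no GSO information" is acceptable when the field is 0; the helper name is hypothetical and this is not the ibmveth fix.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Same rounding-up division as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: derive the segment count from the MSS that the
 * VIOS stored, in network byte order, in the TCP checksum field.
 * Returns 0 instead of dividing by zero when the field is 0.
 */
static unsigned int gso_segs_from_check(uint16_t check_be,
					unsigned int skb_len,
					unsigned int hdr_len)
{
	unsigned int mss = ntohs(check_be);

	if (mss == 0 || skb_len <= hdr_len)
		return 0;
	return DIV_ROUND_UP(skb_len - hdr_len, mss);
}
```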

* Re: [PATCH v3 net-next 0/4] udp: receive path optimizations
From: David Miller @ 2016-12-10  3:13 UTC
  To: edumazet; +Cc: netdev, pabeni, eric.dumazet
In-Reply-To: <1481226117-31288-1-git-send-email-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu,  8 Dec 2016 11:41:53 -0800

> This patch series provides about a 100% performance increase under flood.
> 
> v2: added Paolo feedback on udp_rmem_release() for tiny sk_rcvbuf
>     added the last patch touching sk_rmem_alloc later

Series applied, thanks.

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-10  2:25 UTC
  To: Pavel Machek, Francois Romieu
  Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue,
	davem, linux-kernel, netdev
In-Reply-To: <20161209112142.GA22710@amd>

Hi,

On 09.12.2016 12:21, Pavel Machek wrote:
> On Fri 2016-12-09 00:19:43, Francois Romieu wrote:
>> Lino Sanfilippo <LinoSanfilippo@gmx.de> :
>> [...]
>> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
>> > this is caused by that locking scheme (in a way I have not figured out yet)
>> > or if it is a different issue.
>> 
>> stmmac_tx_err races with stmmac_xmit.
> 
> Umm, yes, that looks real.
> 
> And that means that removing tx_lock will not be completely trivial
> :-(. Lino, any ideas there?
> 

Ok, the race is there but it looks like a problem that is not related to 
the use or removal of the private lock.
From a glance at other drivers (e.g. sky2 or e1000), a possible way to handle a
tx error is to start a separate task and restart the tx path in that task instead
of in the irq handler (or timer in case of the watchdog).

In that task we could do:
1. deactivate napi
2. deactivate irqs
3. wait for running napi/irqs do complete (_sync)
4. call stmmac_tx_err()
5. reenable napi
6. reenable irqs

We have to ensure that no xmit() is executing while stmmac_tx_err() does the cleanup,
so stmmac_tx_err() should IMO rather call netif_tx_disable() instead of netif_stop_queue()
(the former grabs the xmit lock before it sets __QUEUE_STATE_DRV_XOFF to disable
the queue).
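
The proposed sequence, sketched as a deferred work function. This is a user-space model only: the enum names and `do_step()` are hypothetical, each step merely records itself so the ordering can be checked, and only stmmac_tx_err() and netif_tx_disable() are names taken from this thread.

```c
#include <stddef.h>

/* The six steps above; step 4 is split to show netif_tx_disable(). */
enum recovery_step {
	STOP_NAPI,	/* 1. deactivate napi */
	STOP_IRQS,	/* 2. deactivate irqs */
	SYNC_HANDLERS,	/* 3. wait for running napi/irqs to complete */
	TX_DISABLE,	/* 4a. stmmac_tx_err(): netif_tx_disable() */
	TX_CLEANUP,	/* 4b. stmmac_tx_err(): clean up the tx path */
	START_NAPI,	/* 5. reenable napi */
	START_IRQS,	/* 6. reenable irqs */
};

static enum recovery_step trace[8];
static size_t ntrace;

/* Stand-in for the real call behind each step: just log it. */
static void do_step(enum recovery_step s)
{
	if (ntrace < sizeof(trace) / sizeof(trace[0]))
		trace[ntrace++] = s;
}

/* Hypothetical work function, scheduled from the irq handler or watchdog. */
static void tx_error_work(void)
{
	do_step(STOP_NAPI);
	do_step(STOP_IRQS);
	do_step(SYNC_HANDLERS);
	do_step(TX_DISABLE);	/* grabs the xmit lock, so no xmit() is mid-flight */
	do_step(TX_CLEANUP);
	do_step(START_NAPI);
	do_step(START_IRQS);
}
```

Running recovery from a task rather than from the irq handler or timer is what allows the waiting in step 3.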

Regards,
Lino

* Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII
From: Jie Deng @ 2016-12-10  2:16 UTC
  To: Andrew Lunn, Jie Deng
  Cc: Florian Fainelli, davem, netdev, linux-kernel, CARLOS.PALMINHA,
	lars.persson, thomas.lendacky
In-Reply-To: <20161209163905.GG9923@lunn.ch>



On 2016/12/10 0:39, Andrew Lunn wrote:
> On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
>>
>> On 2016/12/9 6:15, Florian Fainelli wrote:
>>> On 12/06/2016 07:57 PM, Jie Deng wrote:
>>>> This patch adds phy-mode support for Synopsys XLGMAC
>>> The functional changes look good, but I would like to see some
>>> description of what the XL part stands for here.
>>>
>>> While you are modifying this, do you also mind submitting a Device Tree
>>> specification change:
>>>
>>> https://www.devicetree.org/specifications/
>>>
>>> Thanks!
>> Thank you for the information.
>>
>> Currently, the XLGMAC is a new IP from Synopsys.
> I think Florian wants to know about the IEEE standard or whatever
> which defines what the phy-mode XLGMAC is, in the same way there are
> standards for RGMII, SGMII, etc.
>
> 	  Andrew
Understood! Thank you !

* Re: Synopsys Ethernet QoS
From: Jie Deng @ 2016-12-10  2:13 UTC
  To: Andy Shevchenko, Florian Fainelli
  Cc: David Miller, Joao Pinto, Giuseppe CAVALLARO, lars.persson,
	rabin.vincent, netdev, CARLOS.PALMINHA
In-Reply-To: <CAHp75VfT9B3O5jU0eHoKtgYc48K2ZjCQ-g9ZQ9nX1Hew6tz-zw@mail.gmail.com>



On 2016/12/10 8:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
>> actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
> Wait, are you telling me that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them early than to have n+ copies.
>
> P.S. Though, I don't see how sxgbe got on the list. A first glance at
> the code doesn't show similarities.
At a glance, the registers in sxgbe_reg.h seem to come from the Synopsys
XGMAC IP... Probably amd-xgbe and sxgbe target the same IP.

* Re: [PATCH 0/2 v3] net: qcom/emac: simplify support for different SOCs
From: David Miller @ 2016-12-10  2:06 UTC
  To: timur; +Cc: netdev, alokc
In-Reply-To: <1481225061-30962-1-git-send-email-timur@codeaurora.org>

From: Timur Tabi <timur@codeaurora.org>
Date: Thu,  8 Dec 2016 13:24:19 -0600

> On SOCs that have the Qualcomm EMAC network controller, the internal
> PHY block is always different.  Sometimes the differences are small, 
> sometimes it might be a completely different IP.  Either way, using version
> numbers to differentiate them and putting all of the init code in one
> file does not scale.
> 
> This patchset does two things:  The first breaks up the current code into
> different files, and the second patch adds support for a third SOC, the
> Qualcomm Technologies QDF2400 ARM Server SOC.

Series applied.

* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-10  1:59 UTC
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, Tom Herbert,
	Linux Kernel Network Developers
In-Reply-To: <6C6EE0ED-7E78-4866-8AAF-D75FD4719EF3@fb.com>

On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik <jbacik@fb.com> wrote:
> 
>>  On Dec 8, 2016, at 7:32 PM, Eric Dumazet <eric.dumazet@gmail.com> 
>> wrote:
>> 
>>>  On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>> 
>>>  We can reproduce the problem at will, still trying to run down the
>>>  problem.  I'll try and find one of the boxes that dumped a core 
>>> and get
>>>  a bt of everybody.  Thanks,
>> 
>>  OK, sounds good.
>> 
>>  I had a look and :
>>  - could not spot a fix that came after 4.6.
>>  - could not spot an obvious bug.
>> 
>>  Anything special in the program triggering the issue ?
>>  SO_REUSEPORT and/or special socket options ?
>> 
> 
> So they recently started using SO_REUSEPORT, that's what triggered 
> it, if they don't use it then everything is fine.
> 
> I added some instrumentation for get_port to see if it was looping in 
> there and none of my printk's triggered.  The softlockup messages are 
> always on the inet_bind_bucket lock, sometimes in the process context 
> in get_port or in the softirq context either through inet_put_port or 
> inet_kill_twsk.  On the box that I have a coredump for there's only 
> one processor in the inet code so I'm not sure what to make of that.  
> That was a box from last week so I'll look at a more recent core and 
> see if it's different.  Thanks,

Ok more investigation today, a few bullet points

- With all the debugging turned on the boxes seem to recover after 
about a minute.  I'd get the spam of the soft lockup messages all on 
the inet_bind_bucket, and then the box would be fine.
- I looked at a core I had from before I started investigating things 
and there's only one process trying to get the inet_bind_bucket of all 
the 48 cpus.
- I noticed that there was over 100k twsk's in that original core.
- I put a global counter of the twsk's (since most of the softlockup 
messages have the twsk timers in the stack) and noticed with the 
debugging kernel it started around 16k twsk's and once it recovered it 
was down to less than a thousand.  There's a jump where it goes from 8k 
to 2k and then there's only one more softlockup message and the box is 
fine.
- This happens when we restart the service with the config option to 
start using SO_REUSEPORT.

The application is our load balancing app, so obviously has lots of 
connections opened at any given time.  What I'm wondering and will test 
on Monday is if the SO_REUSEPORT change even matters, or if simply 
restarting the service is what triggers the problem.  One thing I 
forgot to mention is that it's also using TCP_FASTOPEN in both the 
non-reuseport and reuseport variants.

What I suspect is happening is the service stops, all of the sockets it 
had open go into TIMEWAIT with relatively the same timer period, and 
then suddenly all wake up at the same time, which, coupled with the
massive amount of traffic that we see per box anyway, results in so much
contention and ksoftirqd usage that the box livelocks for a while.  
With the lock debugging and stuff turned on we aren't able to service 
as much traffic so it recovers relatively quickly, whereas a normal 
production kernel never recovers.

Please keep in mind that I'm a file system developer, so my conclusions
may be completely insane, any guidance would be welcome.  I'll continue 
hammering on this on Monday.  Thanks,

Josef

* Re: Synopsys Ethernet QoS
From: Florian Fainelli @ 2016-12-10  1:44 UTC
  To: Andy Shevchenko
  Cc: David Miller, Joao Pinto, Giuseppe CAVALLARO, lars.persson,
	rabin.vincent, netdev, CARLOS.PALMINHA, Jie.Deng1
In-Reply-To: <CAHp75VfT9B3O5jU0eHoKtgYc48K2ZjCQ-g9ZQ9nX1Hew6tz-zw@mail.gmail.com>

On 12/09/16 at 16:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
>> actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
> 
> Wait, are you telling me that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them early than to have n+ copies.

Unfortunately that is the case, see this email:

https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html

dwc_eth_qos and stmmac have some overlap. There seems to be work
underway to unify these two to begin with.

> 
> P.S. Though, I don't see how sxgbe got on the list. A first glance at
> the code doesn't show similarities.

Well, samsung/sxgbe looks potentially similar to amd/xgbe, but that's
just from a cursory look at the code; it may very well be something
entirely different. The descriptor formats just look suspiciously similar.
-- 
Florian

* [PATCH net v2] ibmveth: set correct gso_size and gso_type
From: Thomas Falcon @ 2016-12-10  1:31 UTC (permalink / raw)
  To: netdev; +Cc: brking, pradeeps, marcelo.leitner, jmaxwell37, zdai, eric.dumazet
In-Reply-To: <1481236803-4807-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact that the ibmveth driver did not set gso_size,
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size and as that was abnormally large,
it was calculating a zero window, even though the socket's receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.
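
A minimal user-space sketch of the checksum-field path, assuming a linear buffer that starts at an IPv4 header (what skb->data points to when the driver's helper runs). The names `gso_info`, `rd_be16()` and `mss_from_csum_field()` are hypothetical; the arithmetic mirrors ibmveth_rx_mss_helper() in the diff below: the header length is `ihl * 4 + doff * 4`, the MSS is the TCP checksum field read in network byte order, and `gso_segs` is the payload length divided by the MSS, rounded up. The `mss == 0` guard is an addition of this sketch, not something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two values the driver stores in skb_shinfo(skb). */
struct gso_info {
	uint16_t gso_size;
	uint16_t gso_segs;
};

/* Read a 16-bit field stored in network (big-endian) byte order. */
static uint16_t rd_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * pkt points at the IPv4 header and len is the number of bytes from there
 * to the end of the aggregated frame. Returns 0 and fills *gi on success,
 * -1 if the frame is not IPv4/TCP or is too short.
 */
static int mss_from_csum_field(uint8_t *pkt, size_t len, struct gso_info *gi)
{
	size_t ip_hlen, hdr_len;
	uint8_t *tcph;
	uint16_t mss;

	/* The driver tests skb->protocol; this sketch checks the version nibble. */
	if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 6 /* IPPROTO_TCP */)
		return -1;
	ip_hlen = (size_t)(pkt[0] & 0x0f) * 4;          /* iph->ihl * 4 */
	if (ip_hlen < 20 || len < ip_hlen + 20)
		return -1;
	tcph = pkt + ip_hlen;
	hdr_len = ip_hlen + (size_t)(tcph[12] >> 4) * 4; /* + tcph->doff * 4 */
	mss = rd_be16(tcph + 16);                        /* ntohs(tcph->check) */
	if (mss == 0 || len < hdr_len)                   /* guard added in this sketch */
		return -1;

	gi->gso_size = mss;
	gi->gso_segs = (uint16_t)((len - hdr_len + mss - 1) / mss); /* DIV_ROUND_UP */
	tcph[16] = 0;                                    /* tcph->check = 0 */
	tcph[17] = 0;
	return 0;
}
```

With 40 bytes of headers, 4000 bytes of payload and 1448 in the checksum field, this gives a gso_size of 1448 and a gso_segs of 3 (4000 / 1448, rounded up).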

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.
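
The following is a user-space sketch of that descriptor check and buffer read, modelled on the ibmveth_poll() hunk in the diff below. `rd_be64()` and `rx_buffer_mss()` are hypothetical helpers. The flag value, the 8-byte offset and the truncation to 16 bits come from the patch. The sketch assumes the flags word has already been converted to CPU byte order.

```c
#include <stdint.h>

/* Bit value taken from the ibmveth.h hunk of this patch. */
#define IBMVETH_RXQ_LRG_PKT 0x04000000u

/* Equivalent of be64_to_cpu() on a byte buffer, independent of alignment. */
static uint64_t rd_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * rxq_flags: descriptor flags, assumed already in CPU byte order.
 * rxbuf:     start of the rx buffer (skb->data at that point in the poll loop).
 * Returns the firmware-supplied MSS, or 0 when the large-packet bit is clear.
 */
static uint16_t rx_buffer_mss(uint32_t rxq_flags, const uint8_t *rxbuf)
{
	if (!(rxq_flags & IBMVETH_RXQ_LRG_PKT))
		return 0;
	/* Like the patch's (u16)be64_to_cpu(*rxmss): keep only the low 16 bits. */
	return (uint16_t)rd_be64(rxbuf + 8);
}
```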

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Dai <zdai@us.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
    and make sure everyone is included on CC
---
 drivers/net/ethernet/ibm/ibmveth.c | 72 ++++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/ibm/ibmveth.h |  1 +
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index ebe6071..f0c3ae7 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -58,7 +58,7 @@
 
 static const char ibmveth_driver_name[] = "ibmveth";
 static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver";
-#define ibmveth_driver_version "1.05"
+#define ibmveth_driver_version "1.06"
 
 MODULE_AUTHOR("Santiago Leon <santil@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver");
@@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter)
 	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK;
 }
 
+static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter)
+{
+	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT;
+}
+
 static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter)
 {
 	return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length);
@@ -1174,6 +1179,52 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	goto retry_bounce;
 }
 
+static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
+{
+	struct tcphdr *tcph;
+	int offset = 0;
+	int hdr_len;
+
+	/* only TCP packets will be aggregated */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)skb->data;
+
+		if (iph->protocol == IPPROTO_TCP) {
+			offset = iph->ihl * 4;
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		} else {
+			return;
+		}
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data;
+
+		if (iph6->nexthdr == IPPROTO_TCP) {
+			offset = sizeof(struct ipv6hdr);
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		} else {
+			return;
+		}
+	} else {
+		return;
+	}
+	/* if mss is not set through Large Packet bit/mss in rx buffer,
+	 * expect that the mss will be written to the tcp header checksum.
+	 */
+	tcph = (struct tcphdr *)(skb->data + offset);
+	hdr_len = offset + tcph->doff * 4;
+	if (lrg_pkt) {
+		skb_shinfo(skb)->gso_size = mss;
+		skb_shinfo(skb)->gso_segs =
+					DIV_ROUND_UP(skb->len - hdr_len, mss);
+	} else if (offset) {
+		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len,
+					     skb_shinfo(skb)->gso_size);
+		tcph->check = 0;
+	}
+}
+
 static int ibmveth_poll(struct napi_struct *napi, int budget)
 {
 	struct ibmveth_adapter *adapter =
@@ -1182,6 +1233,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	int frames_processed = 0;
 	unsigned long lpar_rc;
 	struct iphdr *iph;
+	u16 mss = 0;
 
 restart_poll:
 	while (frames_processed < budget) {
@@ -1199,9 +1251,21 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 			int length = ibmveth_rxq_frame_length(adapter);
 			int offset = ibmveth_rxq_frame_offset(adapter);
 			int csum_good = ibmveth_rxq_csum_good(adapter);
+			int lrg_pkt = ibmveth_rxq_large_packet(adapter);
 
 			skb = ibmveth_rxq_get_buffer(adapter);
 
+			/* if the large packet bit is set in the rx queue
+			 * descriptor, the mss will be written by PHYP eight
+			 * bytes from the start of the rx buffer, which is
+			 * skb->data at this stage
+			 */
+			if (lrg_pkt) {
+				__be64 *rxmss = (__be64 *)(skb->data + 8);
+
+				mss = (u16)be64_to_cpu(*rxmss);
+			}
+
 			new_skb = NULL;
 			if (length < rx_copybreak)
 				new_skb = netdev_alloc_skb(netdev, length);
@@ -1235,11 +1299,15 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 					if (iph->check == 0xffff) {
 						iph->check = 0;
 						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
-						adapter->rx_large_packets++;
 					}
 				}
 			}
 
+			if (length > netdev->mtu + ETH_HLEN) {
+				ibmveth_rx_mss_helper(skb, mss, lrg_pkt);
+				adapter->rx_large_packets++;
+			}
+
 			napi_gro_receive(napi, skb);	/* send it up */
 
 			netdev->stats.rx_packets++;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4eade67..7acda04 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -209,6 +209,7 @@ struct ibmveth_rx_q_entry {
 #define IBMVETH_RXQ_TOGGLE		0x80000000
 #define IBMVETH_RXQ_TOGGLE_SHIFT	31
 #define IBMVETH_RXQ_VALID		0x40000000
+#define IBMVETH_RXQ_LRG_PKT		0x04000000
 #define IBMVETH_RXQ_NO_CSUM		0x02000000
 #define IBMVETH_RXQ_CSUM_GOOD		0x01000000
 #define IBMVETH_RXQ_OFF_MASK		0x0000FFFF
-- 
1.8.3.1

* Re: Synopsys Ethernet QoS
From: Andy Shevchenko @ 2016-12-10  0:16 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Miller, Joao Pinto, Giuseppe CAVALLARO, lars.persson,
	rabin.vincent, netdev, CARLOS.PALMINHA
In-Reply-To: <3aee5a67-5e19-34e6-1719-ff13c7b914ea@gmail.com>

On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:

> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
> actually pioneer the upstreaming effort, but it is good to see people
> from Synopsys willing to fix that in the future.

Wait, are you telling me that we have more than two drivers for the
same (okay, same-vendor) IP?!
It's better to unify them early than to end up with n+ copies.

P.S. I don't see how sxgbe got on the list, though. A first glance at
the code doesn't show similarities.

-- 
With Best Regards,
Andy Shevchenko

* fib_frontend: Add network specific broadcasts, when it takes a sense
From: Brandon Philips @ 2016-12-10  0:07 UTC (permalink / raw)
  To: netdev, Tom Denham, Aaron Levy, Brad Ison

Hello-

A number of us are working on an OSS overlay network system called flannel.
It is used in a variety of Linux container systems and one of the backends
is VXLAN.

The issue we have: when we create the VXLAN interface and assign it an
address, we see a broadcast route being added by the kernel. For example,
if we have 10.4.0.0/16, a broadcast route to 10.4.0.0 is created. This route
is unwanted because we assign 10.4.0.0 to one of our VXLAN interfaces.

However, the comment in the kernel's interface bring-up code reads: "Add
network specific broadcasts, when it takes a sense". The code is here:
https://github.com/torvalds/linux/blob/master/net/ipv4/fib_frontend.c#L859-L872
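
Here is a minimal user-space sketch of what the linked block appears to compute. For a primary address whose prefix length is below 31, the block adds two /32 RTN_BROADCAST routes: one for the network address and one for the all-ones address. With 10.4.0.0/16, the first of these is 10.4.0.0 itself, and that is the route that collides with assigning that address to an interface. `plen_to_mask()` and `net_bcast_routes()` are hypothetical names, and addresses are plain host-order integers. The sketch omits the other conditions that guard this block in fib_add_ifaddr(), such as the checks for secondary addresses and zero-network prefixes.

```c
#include <stdint.h>

/* Netmask for a prefix length, host byte order (hypothetical helper). */
static uint32_t plen_to_mask(unsigned int plen)
{
	return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/*
 * Writes the network-specific broadcast addresses for addr/plen into out[]
 * and returns how many there are (0 or 2).
 */
static int net_bcast_routes(uint32_t addr, unsigned int plen, uint32_t out[2])
{
	uint32_t mask = plen_to_mask(plen);
	uint32_t prefix = addr & mask;

	if (plen >= 31)          /* /31 and /32: no network broadcasts */
		return 0;
	out[0] = prefix;         /* e.g. 10.4.0.0 for 10.4.0.0/16 */
	out[1] = prefix | ~mask; /* e.g. 10.4.255.255 for 10.4.0.0/16 */
	return 2;
}
```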

Can someone explain why creation of the broadcast route is non-optional?
Would a patch to make it optional be acceptable? Is it safe for us to
simply delete the route? We have a patch that simply deletes the broadcast
route after interface creation, but we don't know why the kernel considers
this route to "make sense".

You can read more information about the issue here:
https://github.com/coreos/flannel/pull/569

Thank You,

Brandon

* Re: [PATCH V2 03/22] bnxt_re: register with the NIC driver
From: Jonathan Toppins @ 2016-12-10  0:03 UTC (permalink / raw)
  To: Selvin Xavier, dledford, linux-rdma
  Cc: netdev, Eddie Wai, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna
In-Reply-To: <1481266096-23331-4-git-send-email-selvin.xavier@broadcom.com>

On 12/09/2016 01:47 AM, Selvin Xavier wrote:
> This patch handles the registration with the bnxt_en driver. The driver registers
> with the netdev notifier chain. Upon receiving the NETDEV_REGISTER event, the driver
> in turn registers with the bnxt_en driver.
> 	1. bnxt_en's ulp_probe function returns a structure that contains information
> 	   about the device and additional entry points.
> 	2. bnxt_en driver returns 'struct bnxt_eth_dev' that contains a set of operation
> 	   vectors that the RoCE driver invokes later.
> 	3. bnxt_request_msix() allows the RoCE driver to specify the number of MSI-X
> 	   vectors that are needed.
> 	4. bnxt_send_fw_msg() can be used to send messages to the FW.
> 	5. bnxt_register_async_events() can be used to register for async event
> 	   callbacks.
> 
> v2: Remove some sparse warnings. Also, remove some unused code from the unreg path.
> 
> Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> ---
>  drivers/infiniband/hw/bnxtre/bnxt_re.h      |  48 +++
>  drivers/infiniband/hw/bnxtre/bnxt_re_main.c | 436 ++++++++++++++++++++++++++++
>  2 files changed, 484 insertions(+)
> 

[...]

>  #endif
> diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> index ebe1c69..029824a 100644
> --- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +
> +static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
> +{
> +	int i, j, rc;
> +
> +	/* Registered a new RoCE device instance to netdev */
> +	rc = bnxt_re_register_netdev(rdev);
> +	if (rc) {
> +		pr_err("Failed to register with netedev: %#x\n", rc);
> +		return -EINVAL;
> +	}
> +	set_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags);
> +
> +	rc = bnxt_re_request_msix(rdev);
> +	if (rc) {
> +		pr_err("Failed to get MSI-X vectors: %#x\n", rc);
> +		rc = -EINVAL;
> +		goto fail;
> +	}
> +	set_bit(BNXT_RE_FLAG_GOT_MSIX, &rdev->flags);

Although this exit path may be correct once all patches are applied
(I still need to verify that), it looks incorrect when only this patch
is considered: on success, execution falls through into the fail label
and calls bnxt_re_ib_unreg(). I think you need the following:

+ return 0;

> +
> +fail:
> +	bnxt_re_ib_unreg(rdev, true);
> +	return rc;
> +}
> +
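
The sketch below is a stripped-down version of the suggested fix. Hypothetical stub functions stand in for bnxt_re_register_netdev(), bnxt_re_request_msix() and bnxt_re_ib_unreg(), and the set_bit() flag bookkeeping is left out. It shows why the explicit `return 0;` matters. Without it, the success path falls through into the `fail:` label and unregisters the device it just set up, while still returning 0.

```c
#include <errno.h>

/* Hypothetical stand-ins for the bnxt_re helpers: 0 on success. */
static int stub_register_netdev(int fail) { return fail ? -1 : 0; }
static int stub_request_msix(int fail) { return fail ? -1 : 0; }

static int unreg_calls; /* how many times the unwind path ran */
static void stub_ib_unreg(void) { unreg_calls++; }

/* Same control flow as bnxt_re_ib_reg(), with the suggested return added. */
static int ib_reg_sketch(int fail_netdev, int fail_msix)
{
	int rc;

	rc = stub_register_netdev(fail_netdev);
	if (rc)
		return -EINVAL;

	rc = stub_request_msix(fail_msix);
	if (rc) {
		rc = -EINVAL;
		goto fail;
	}
	return 0;	/* success: must not fall through into fail: */

fail:
	stub_ib_unreg();
	return rc;
}
```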

* fib_frontend: Add network specific broadcasts, when it takes a sense
From: Brandon Philips @ 2016-12-09 23:41 UTC (permalink / raw)
  To: netdev, Tom Denham, Aaron Levy, Brad Ison

Hello-

A number of us are working on an OSS overlay network system called
flannel. It is used in a variety of Linux container systems and one of
the backends is VXLAN.

The issue we have: when we create the VXLAN interface and assign it
an address, we see a broadcast route being added by the kernel. For
example, if we have 10.4.0.0/16, a broadcast route to 10.4.0.0 is
created. This route is unwanted because we assign 10.4.0.0 to one of
our VXLAN interfaces.

However, the comment in the kernel's interface bring-up code reads:
"Add network specific broadcasts, when it takes a sense". The code is here:
https://github.com/torvalds/linux/blob/master/net/ipv4/fib_frontend.c#L859-L872

Can someone explain why creation of the broadcast route is
non-optional? Would a patch to make it optional be acceptable? Is it
safe for us to simply delete the route? We have a patch that simply
deletes the broadcast route after interface creation, but we don't
know why the kernel considers this route to "make sense".

You can read more information about the issue here:
https://github.com/coreos/flannel/pull/569

Thank You,

Brandon
