Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: Fix coccinelle warning
From: David Miller @ 2018-04-27 17:53 UTC (permalink / raw)
  To: ktkhai; +Cc: netdev, lkp
In-Reply-To: <152474505955.21078.9976470400033894421.stgit@localhost.localdomain>

From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu, 26 Apr 2018 15:18:38 +0300

> kbuild test robot says:
> 
>   >coccinelle warnings: (new ones prefixed by >>)
>   >>> net/core/dev.c:1588:2-3: Unneeded semicolon
> 
> So, let's remove it.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-04-27 17:53 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown
In-Reply-To: <1524848820-42258-3-git-send-email-sridhar.samudrala@intel.com>

Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>This provides a generic interface for paravirtual drivers to listen
>for netdev register/unregister/link change events from pci ethernet
>devices with the same MAC and takeover their datapath. The notifier and
>event handling code is based on the existing netvsc implementation.
>
>It exposes 2 sets of interfaces to the paravirtual drivers.
>1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>   the failover module provides interfaces to create/destroy additional
>   master netdev and all the slave events are managed internally.
>        net_failover_create()
>        net_failover_destroy()
>   A failover netdev is created that acts a master device and controls 2
>   slave devices. The original virtio_net netdev is registered as 'standby'
>   netdev and a passthru/vf device with the same MAC gets registered as
>   'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>   with the same 'pci' device.  The user accesses the network interface via
>   'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>   default for transmits when it is available with link up and running.
>2. For existing netvsc driver that uses 2 netdev model, no master netdev
>   is created. The paravirtual driver registers each instance of netvsc
>   as a 'failover' netdev  along with a set of ops to manage the slave
>   events. There is no 'standby' netdev in this model. A passthru/vf device
>   with the same MAC gets registered as 'primary' netdev.
>        net_failover_register()
>        net_failover_unregister()
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>---
> include/linux/netdevice.h  |  16 +
> include/net/net_failover.h |  62 ++++
> net/Kconfig                |  10 +
> net/core/Makefile          |   1 +
> net/core/net_failover.c    | 892 +++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 981 insertions(+)
> create mode 100644 include/net/net_failover.h
> create mode 100644 net/core/net_failover.c

checkpatch says:

_exportax/0002-net-Introduce-generic-failover-module.patch
----------------------------------------------------------
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#92: 
new file mode 100644

Please add an entry to the MAINTAINERS file.

^ permalink raw reply

* Re: [PATCH 2/2] bpf: btf: remove a couple conditions
From: Martin KaFai Lau @ 2018-04-27 17:55 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20180427172023.6japncdd3nbqauzn@kafai-mbp>

On Fri, Apr 27, 2018 at 10:20:25AM -0700, Martin KaFai Lau wrote:
> On Fri, Apr 27, 2018 at 05:04:59PM +0300, Dan Carpenter wrote:
> > We know "err" is zero so we can remove these and pull the code in one
> > indent level.
> > 
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> Thanks for the simplification!
> 
> Acked-by: Martin KaFai Lau <kafai@fb.com>
btw, it should be for bpf-next.  Please tag the subject with bpf-next when
you respin. Thanks!

> 
> > ---
> > This applies to the BPF tree (linux-next)
> > 
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index e631b6fd60d3..7cb0905f37c2 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -1973,16 +1973,14 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
> >  	if (err)
> >  		goto errout;
> >  
> > -	if (!err && log->level && bpf_verifier_log_full(log)) {
> > +	if (log->level && bpf_verifier_log_full(log)) {
> >  		err = -ENOSPC;
> >  		goto errout;
> >  	}
> >  
> > -	if (!err) {
> > -		btf_verifier_env_free(env);
> > -		btf_get(btf);
> > -		return btf;
> > -	}
> > +	btf_verifier_env_free(env);
> > +	btf_get(btf);
> > +	return btf;
> >  
> >  errout:
> >  	btf_verifier_env_free(env);

^ permalink raw reply

* Re: [net-next] ipv6: sr: Extract the right key values for "seg6_make_flowlabel"
From: David Miller @ 2018-04-27 17:59 UTC (permalink / raw)
  To: amsalam20; +Cc: dav.lebrun, netdev, linux-kernel
In-Reply-To: <1524751871-1353-1-git-send-email-amsalam20@gmail.com>

From: Ahmed Abdelsalam <amsalam20@gmail.com>
Date: Thu, 26 Apr 2018 16:11:11 +0200

> @@ -119,6 +119,9 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
>  	int hdrlen, tot_len, err;
>  	__be32 flowlabel;
>  
> +	inner_hdr = ipv6_hdr(skb);

You have to make this assignment after, not before, the skb_cow_header()
call.  Otherwise this point can be pointing to freed up memory.

^ permalink raw reply

* Re: [net-next] net: intel: Cleanup the copyright/license headers
From: David Miller @ 2018-04-27 18:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180426150809.11482-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 26 Apr 2018 08:08:09 -0700

> After many years of having a ~30 line copyright and license header to our
> source files, we are finally able to reduce that to one line with the
> advent of the SPDX identifier.
> 
> Also caught a few files missing the SPDX license identifier, so fixed
> them up.
> 
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
> Acked-by: Richard Cochran <richardcochran@gmail.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] fixes from 2018-04-17 - v3
From: David Miller @ 2018-04-27 18:03 UTC (permalink / raw)
  To: ubraun; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180426151823.78967-1-ubraun@linux.ibm.com>

From: Ursula Braun <ubraun@linux.ibm.com>
Date: Thu, 26 Apr 2018 17:18:19 +0200

> Version 3 changes
>    * no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
>    * allow fallback for some sockopts eliminating SMC usage
>    * when setting TCP_NODELAY always enforce data transmission
>      (not only together with corked data)

This looks a lot better than what you were doing before.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] tcp: remove mss check in tcp_select_initial_window()
From: David Miller @ 2018-04-27 18:05 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, ycheng, edumazet, soheil
In-Reply-To: <20180426165810.164524-1-tracywwnj@gmail.com>

From: Wei Wang <weiwan@google.com>
Date: Thu, 26 Apr 2018 09:58:10 -0700

> From: Wei Wang <weiwan@google.com>
> 
> In tcp_select_initial_window(), we only set rcv_wnd to
> tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
> rcv_wnd is kept at the full receive space of the socket which is a
> value way larger than tcp_default_init_rwnd().
> With larger initial rcv_wnd value, receive buffer autotuning logic
> takes longer to kick in and increase the receive buffer.
> 
> In a TCP throughput test where receiver has rmem[2] set to 125MB
> (wscale is 11), we see the connection gets recvbuf limited at the
> beginning of the connection and gets less throughput overall.
> 
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>

Very nice commit message.

Applied.

^ permalink raw reply

* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Thomas Deutschmann @ 2018-04-27 18:07 UTC (permalink / raw)
  To: Eddie Chapman, Greg KH; +Cc: stable, davem, nicolas.dichtel, netdev
In-Reply-To: <ae1401af-a400-f6de-658e-bae0b29c52e4@ehuk.net>

Hi Greg,

first, we need to cherry-pick another patch first:
 
> From 52a589d51f1008f62569bf89e95b26221ee76690 Mon Sep 17 00:00:00 2001
> From: Xin Long <lucien.xin@gmail.com>
> Date: Mon, 25 Dec 2017 14:43:58 +0800
> Subject: [PATCH] geneve: update skb dst pmtu on tx path
> 
> Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed
> a performance issue caused by the change of lower dev's mtu for vxlan.
> 
> The same thing needs to be done for geneve as well.
> 
> Note that geneve cannot adjust it's mtu according to lower dev's mtu
> when creating it. The performance is very low later when netperfing
> over it without fixing the mtu manually. This patch could also avoid
> this issue.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Then you can apply the following backport. A backport is required
because v4.15 has commit 77552cfa39c48e695c39d0553afc8c6018e411ce
which rewrote

> skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);

into

> 		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2,
> 						rel_info);

in net/ipv6/ip6_tunnel.c which is missing:

>From b2fb9a8178660f92c6ab29d3171bc44e2cb1b618 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 25 Jan 2018 19:03:03 +0100
Subject: net: don't call update_pmtu unconditionally

commit f15ca723c1ebe6c1a06bc95fda6b62cd87b44559 upstream.

Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 3 +--
 drivers/net/geneve.c                    | 4 ++--
 drivers/net/vxlan.c                     | 6 ++----
 include/net/dst.h                       | 8 ++++++++
 net/ipv4/ip_tunnel.c                    | 3 +--
 net/ipv4/ip_vti.c                       | 2 +-
 net/ipv6/ip6_tunnel.c                   | 5 ++---
 net/ipv6/ip6_vti.c                      | 2 +-
 net/ipv6/sit.c                          | 4 ++--
 9 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 7774654c2ccb..7a5ed5a5391e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1447,8 +1447,7 @@ void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int e = skb_queue_empty(&priv->cm.skb_queue);
 
-	if (skb_dst(skb))
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+	skb_dst_update_pmtu(skb, mtu);
 
 	skb_queue_tail(&priv->cm.skb_queue, skb);
 	if (e)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 1b0fcf0b2afa..fbc825ac97ab 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -829,7 +829,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		int mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) -
 			  GENEVE_BASE_HLEN - info->options_len - 14;
 
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+		skb_dst_update_pmtu(skb, mtu);
 	}
 
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
@@ -875,7 +875,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
 			  GENEVE_BASE_HLEN - info->options_len - 14;
 
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+		skb_dst_update_pmtu(skb, mtu);
 	}
 
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bb44f0c6891f..3d9c5b35a4a7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2158,8 +2158,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		if (skb_dst(skb)) {
 			int mtu = dst_mtu(ndst) - VXLAN_HEADROOM;
 
-			skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
-						       skb, mtu);
+			skb_dst_update_pmtu(skb, mtu);
 		}
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
@@ -2200,8 +2199,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		if (skb_dst(skb)) {
 			int mtu = dst_mtu(ndst) - VXLAN6_HEADROOM;
 
-			skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
-						       skb, mtu);
+			skb_dst_update_pmtu(skb, mtu);
 		}
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
diff --git a/include/net/dst.h b/include/net/dst.h
index 694c2e6ae618..ebfb4328fdb1 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -520,4 +520,12 @@ static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
 }
 #endif
 
+static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (dst && dst->ops->update_pmtu)
+		dst->ops->update_pmtu(dst, NULL, skb, mtu);
+}
+
 #endif /* _NET_DST_H */
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 13f7bbc0168d..a2fcc20774a6 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -521,8 +521,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 	else
 		mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
-	if (skb_dst(skb))
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+	skb_dst_update_pmtu(skb, mtu);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		if (!skb_is_gso(skb) &&
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 89453cf62158..c9cd891f69c2 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -209,7 +209,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+		skb_dst_update_pmtu(skb, mtu);
 		if (skb->protocol == htons(ETH_P_IP)) {
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 				  htonl(mtu));
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7e11f6a811f5..d61a82fd4b60 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -652,7 +652,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (rel_info > dst_mtu(skb_dst(skb2)))
 			goto out;
 
-		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
+		skb_dst_update_pmtu(skb2, rel_info);
 	}
 	if (rel_type == ICMP_REDIRECT)
 		skb_dst(skb2)->ops->redirect(skb_dst(skb2), NULL, skb2);
@@ -1141,8 +1141,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 		mtu = 576;
 	}
 
-	if (skb_dst(skb) && !t->parms.collect_md)
-		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+	skb_dst_update_pmtu(skb, mtu);
 	if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
 		*pmtu = mtu;
 		err = -EMSGSIZE;
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 7c0f647b5195..2493a40bc4b1 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -486,7 +486,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 
 	mtu = dst_mtu(dst);
 	if (!skb->ignore_df && skb->len > mtu) {
-		skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+		skb_dst_update_pmtu(skb, mtu);
 
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			if (mtu < IPV6_MIN_MTU)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index f03c1a562135..b35d8905794c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -925,8 +925,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			df = 0;
 		}
 
-		if (tunnel->parms.iph.daddr && skb_dst(skb))
-			skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+		if (tunnel->parms.iph.daddr)
+			skb_dst_update_pmtu(skb, mtu);
 
 		if (skb->len > mtu && !skb_is_gso(skb)) {
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
2.17.0

^ permalink raw reply related

* DSA
From: Dave Richards @ 2018-04-27 18:10 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

I am building a prototype for a new product based on a Lanner, Inc. embedded PC.  It is an Intel Celeron-based system with two host I210 GbE chips connected to 2 MV88E6172 chips (one NIC to one switch).  Everything appears to show up hardware-wise.  My question is, what is the next step?  How does DSA know which NICs are intended to be masters?  Is this supposed to be auto-detected or is this knowledge supposed to be communicated explicitly.  Reading through the DSA driver code I see that there is a check of the OF property list for the device for a "label"/"cpu" property/value pair that needs to be present.  Who sets this and when?

I'm sorry for this basic question, but Google has not enlightened me.

	Thanks!

	Dave


Dave Richards
VP Software Engineering
Impinj, Inc
400 Fairview Ave N. #1200
Seattle, WA 
O: (206) 812-9863

^ permalink raw reply

* Re: [PATCH bpf-next v2 00/15] Introducing AF_XDP support
From: Björn Töpel @ 2018-04-27 18:12 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Daniel Borkmann, Michael S. Tsirkin, Network Development,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAF=yD-JUezRSGP_6f=WDHqcznFEW4K95hd6qVK8BaAr6Q5VR5Q@mail.gmail.com>

2018-04-27 19:16 GMT+02:00 Willem de Bruijn <willemdebruijn.kernel@gmail.com>:
> On Fri, Apr 27, 2018 at 8:17 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> This patch set introduces a new address family called AF_XDP that is
>> optimized for high performance packet processing and, in upcoming
>> patch sets, zero-copy semantics. In this v2 version, we have removed
>> all zero-copy related code in order to make it smaller, simpler and
>> hopefully more review friendly. This patch set only supports copy-mode
>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>> work has already been accepted. We will publish our zero-copy support
>> for RX and TX on top of his patch sets at a later point in time.
>
>> Changes from V1:
>>
>> * Fixes to bugs spotted by Will in his review
>> * Implemented the performance otimization to BPF_MAP_TYPE_XSKMAP
>>   suggested by Will
>
> An xsk may only exist in one map at a time. Is this somehow assured?
>

Actually this is *not* the case. An xsk may reside in many maps, and
multiple times in the same map. So it's not assured at all. :-)

The restriction for an xsk is per netdev/queue/umem (and) the napi
context guarantee the SPSC constraint.

For the record, your XSKMAP suggestion gave ~100kpps in the ingress
path! Very nice!

>> * Refactored packet_direct_xmit to become a common function
>>   in core/dev.c as suggested by Will
>> * Added documentation as suggested by Jesper
>> * Proper page unpinning as suggested by MST
>> * Some minor code cleanups
>
> Everything else looks great to me. If the above is correct (or corrected)
>
> Acked-by: Willem de Bruijn <willemb@google.com>
>

Thanks for the in-depth review, Will! Very much appreciated! (bow)


Björn

> I did not read everything again, but applied both patchsets on top of
> bpf-next to do a diff of diffs. In case others find it useful:
>
>   https://github.com/wdebruij/linux/tree/bpf-next-afxdp-v1
>   https://github.com/wdebruij/linux/tree/bpf-next-afxdp-v2

^ permalink raw reply

* [PATCH 0/3] Clean up users of skb_tx_hash and __skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
  To: netdev, davem
  Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt

I am in the process of doing some work to try and enable macvlan Tx queue
selection without using ndo_select_queue. As a part of that I will likely
need to make changes to skb_tx_hash. As such this is a clean up or refactor
of the two spots where he function has been used. In both cases it didn't
really seem like the function was being used correctly so I have updated
both code paths to not make use of the function.

My current development environment doesn't have an mlx4 or OPA vnic
available so the changes to those have been build tested only.

---

Alexander Duyck (3):
      opa_vnic: Just use skb_get_hash instead of skb_tx_hash
      mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
      net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash


 drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c   |   21 ++++++++++----------
 .../infiniband/ulp/opa_vnic/opa_vnic_internal.h    |    2 +-
 drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c  |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |    2 +-
 include/linux/netdevice.h                          |   13 ------------
 net/core/dev.c                                     |   10 ++++------
 6 files changed, 17 insertions(+), 33 deletions(-)

^ permalink raw reply

* [PATCH 1/3] opa_vnic: Just use skb_get_hash instead of skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
  To: netdev, davem
  Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>

This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.

The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 14 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get a 8 bit value for
entropy.

Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the 32
bit value into a 8 bit value using a pair of shifts and XOR operations.

Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.

In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c   |   21 ++++++++++----------
 .../infiniband/ulp/opa_vnic/opa_vnic_internal.h    |    2 +-
 drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c  |    2 +-
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
index 4be3aef..267da82 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
@@ -443,17 +443,16 @@ static u8 opa_vnic_get_rc(struct __opa_veswport_info *info,
 }
 
 /* opa_vnic_calc_entropy - calculate the packet entropy */
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
+u8 opa_vnic_calc_entropy(struct sk_buff *skb)
 {
-	u16 hash16;
-
-	/*
-	 * Get flow based 16-bit hash and then XOR the upper and lower bytes
-	 * to get the entropy.
-	 * __skb_tx_hash limits qcount to 16 bits. Hence, get 15-bit hash.
-	 */
-	hash16 = __skb_tx_hash(adapter->netdev, skb, BIT(15));
-	return (u8)((hash16 >> 8) ^ (hash16 & 0xff));
+	u32 hash = skb_get_hash(skb);
+
+	/* store XOR of all bytes in lower 8 bits */
+	hash ^= hash >> 8;
+	hash ^= hash >> 16;
+
+	/* return lower 8 bits as entropy */
+	return (u8)(hash & 0xFF);
 }
 
 /* opa_vnic_get_def_port - get default port based on entropy */
@@ -490,7 +489,7 @@ void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
 
 	hdr = skb_push(skb, OPA_VNIC_HDR_LEN);
 
-	entropy = opa_vnic_calc_entropy(adapter, skb);
+	entropy = opa_vnic_calc_entropy(skb);
 	def_port = opa_vnic_get_def_port(adapter, entropy);
 	len = opa_vnic_wire_length(skb);
 	dlid = opa_vnic_get_dlid(adapter, skb, def_port);
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
index afd95f4..43ac61f 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
@@ -299,7 +299,7 @@ struct opa_vnic_adapter *opa_vnic_add_netdev(struct ib_device *ibdev,
 void opa_vnic_rem_netdev(struct opa_vnic_adapter *adapter);
 void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
 u8 opa_vnic_get_vl(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
+u8 opa_vnic_calc_entropy(struct sk_buff *skb);
 void opa_vnic_process_vema_config(struct opa_vnic_adapter *adapter);
 void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter);
 void opa_vnic_query_mac_tbl(struct opa_vnic_adapter *adapter,
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
index ce57e0f..0c8aec6 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
@@ -104,7 +104,7 @@ static u16 opa_vnic_select_queue(struct net_device *netdev, struct sk_buff *skb,
 
 	/* pass entropy and vl as metadata in skb */
 	mdata = skb_push(skb, sizeof(*mdata));
-	mdata->entropy =  opa_vnic_calc_entropy(adapter, skb);
+	mdata->entropy = opa_vnic_calc_entropy(skb);
 	mdata->vl = opa_vnic_get_vl(adapter, skb);
 	rc = adapter->rn_ops->ndo_select_queue(netdev, skb,
 					       accel_priv, fallback);

^ permalink raw reply related

* [PATCH 2/3] mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
  To: netdev, davem
  Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>

The code in the fallback path has supported XDP in conjunction with the Tx
traffic classification for TCs for over a year now. So instead of just
calling skb_tx_hash for every packet we are better off using the fallback
since that will record the Tx queue to the socket and then that can be used
instead of having to recompute the hash every time.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6b68537..0227786 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -694,7 +694,7 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb,
 	u16 rings_p_up = priv->num_tx_rings_p_up;
 
 	if (netdev_get_num_tc(dev))
-		return skb_tx_hash(dev, skb);
+		return fallback(dev, skb);
 
 	return fallback(dev, skb) % rings_p_up;
 }

^ permalink raw reply related

* Re: [PATCH net-next 03/13] sctp: remove an if() that is always true
From: Marcelo Ricardo Leitner @ 2018-04-27 18:13 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, linux-sctp, Vlad Yasevich, Xin Long
In-Reply-To: <20180427105050.GA22078@hmswarspite.think-freely.org>

On Fri, Apr 27, 2018 at 06:50:50AM -0400, Neil Horman wrote:
> On Thu, Apr 26, 2018 at 04:58:52PM -0300, Marcelo Ricardo Leitner wrote:
> > As noticed by Xin Long, the if() here is always true as PMTU can never
> > be 0.
> >
> > Reported-by: Xin Long <lucien.xin@gmail.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> >  net/sctp/associola.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> > index b3aa95222bd52113295cb246c503c903bdd5c353..c5ed09cfa8423b17546e3d45f6d06db03af66384 100644
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -1397,10 +1397,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
> >  			pmtu = t->pathmtu;
> >  	}
> >
> > -	if (pmtu) {
> > -		asoc->pathmtu = pmtu;
> > -		asoc->frag_point = sctp_frag_point(asoc, pmtu);
> > -	}
> > +	asoc->pathmtu = pmtu;
> > +	asoc->frag_point = sctp_frag_point(asoc, pmtu);
> >
> Can you double check this?  Looking at it, it seems far fetched, but if someone

Sure.

> sends a crafted icmp dest unreach message to the host, pmtu_sending might be
> able to get set for an association (which may have no transports established
> yet), and if so, on the first packet send sctp_assoc_sync_pmtu can be called,
> leading to a fall through in the loop over all transports, and pmtu being zero.
> It seems like a far fetched set of circumstances, I know, but if it can happen,
> I think you might see a crash in sctp_frag_point due to an underflow of the frag
> value

If I got you right, this situation would not happen because when
handling the icmp it will check if there is a transport and ignore it
otherwise.

  Marcelo

>
> Neil
>
> >  	pr_debug("%s: asoc:%p, pmtu:%d, frag_point:%d\n", __func__, asoc,
> >  		 asoc->pathmtu, asoc->frag_point);
> > --
> > 2.14.3
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH 3/3] net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
From: Alexander Duyck @ 2018-04-27 18:06 UTC (permalink / raw)
  To: netdev, davem
  Cc: linux-rdma, dennis.dalessandro, niranjana.vishwanathapura, tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>

I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/linux/netdevice.h |   13 -------------
 net/core/dev.c            |   10 ++++------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf44503..6da5371 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3213,19 +3213,6 @@ static inline int netif_set_xps_queue(struct net_device *dev,
 }
 #endif
 
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
-		  unsigned int num_tx_queues);
-
-/*
- * Returns a Tx hash for the given packet when dev->real_num_tx_queues is used
- * as a distribution range limit for the returned value.
- */
-static inline u16 skb_tx_hash(const struct net_device *dev,
-			      struct sk_buff *skb)
-{
-	return __skb_tx_hash(dev, skb, dev->real_num_tx_queues);
-}
-
 /**
  *	netif_is_multiqueue - test if device has multiple transmit queues
  *	@dev: network device
diff --git a/net/core/dev.c b/net/core/dev.c
index 9b04a9f..7584d9c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2614,17 +2614,16 @@ void netif_device_attach(struct net_device *dev)
  * Returns a Tx hash based on the given packet descriptor a Tx queues' number
  * to be used as a distribution range.
  */
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
-		  unsigned int num_tx_queues)
+static u16 skb_tx_hash(const struct net_device *dev, struct sk_buff *skb)
 {
 	u32 hash;
 	u16 qoffset = 0;
-	u16 qcount = num_tx_queues;
+	u16 qcount = dev->real_num_tx_queues;
 
 	if (skb_rx_queue_recorded(skb)) {
 		hash = skb_get_rx_queue(skb);
-		while (unlikely(hash >= num_tx_queues))
-			hash -= num_tx_queues;
+		while (unlikely(hash >= qcount))
+			hash -= qcount;
 		return hash;
 	}
 
@@ -2637,7 +2636,6 @@ u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
 
 	return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
 }
-EXPORT_SYMBOL(__skb_tx_hash);
 
 static void skb_warn_bad_offload(const struct sk_buff *skb)
 {

^ permalink raw reply related

* [PATCH net] bridge: netfilter stp fix reference to uninitialized data
From: Stephen Hemminger @ 2018-04-27 18:16 UTC (permalink / raw)
  To: pablo, kadlec, fw, davem
  Cc: netfilter-devel, bridge, netdev, Stephen Hemminger,
	Stephen Hemminger

The destination mac (destmac) is only valid if EBT_DESTMAC flag
is set. Fix by changing the order of the comparison to look for
the flag first.

Reported-by: syzbot+5c06e318fc558cc27823@syzkaller.appspotmail.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
Note: no fixes since this bug goes back to pre-git days.
Should go to stable as well.

 net/bridge/netfilter/ebt_stp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebt_stp.c b/net/bridge/netfilter/ebt_stp.c
index 47ba98db145d..46c1fe7637ea 100644
--- a/net/bridge/netfilter/ebt_stp.c
+++ b/net/bridge/netfilter/ebt_stp.c
@@ -161,8 +161,8 @@ static int ebt_stp_mt_check(const struct xt_mtchk_param *par)
 	/* Make sure the match only receives stp frames */
 	if (!par->nft_compat &&
 	    (!ether_addr_equal(e->destmac, eth_stp_addr) ||
-	     !is_broadcast_ether_addr(e->destmsk) ||
-	     !(e->bitmask & EBT_DESTMAC)))
+	     !(e->bitmask & EBT_DESTMAC) ||
+	     !is_broadcast_ether_addr(e->destmsk)))
 		return -EINVAL;
 
 	return 0;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next] net: sch: prio: Set bands to default on delete instead of noop
From: Cong Wang @ 2018-04-27 18:20 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: Linux Kernel Network Developers, David Miller, Jiri Pirko,
	Jamal Hadi Salim, mlxsw
In-Reply-To: <1524749556-36199-1-git-send-email-nogahf@mellanox.com>

On Thu, Apr 26, 2018 at 6:32 AM, Nogah Frankel <nogahf@mellanox.com> wrote:
> When a band is created, it is set to the default qdisc, which is
> "invisible" pfifo.


Isn't TCA_DUMP_INVISIBLE for dumping this invisible qdisc?


> However, if a band is set to a qdisc that is later being deleted, it will
> be set to noop qdisc. This can cause a packet loss, while there is no clear
> user indication for it. ("invisible" qdisc are not being shown by default).
> This patch sets a band to the default qdisc, rather then the noop qdisc, on
> delete operation.

It is set to noop historically, may be not reasonable but changing
it could break things.

What's wrong with using TCA_DUMP_INVISIBLE to dump it?

^ permalink raw reply

* Re: DSA
From: Florian Fainelli @ 2018-04-27 18:32 UTC (permalink / raw)
  To: Dave Richards, netdev@vger.kernel.org, andrew, vivien.didelot
In-Reply-To: <MWHPR06MB3503CE521D6993C7786A3E93DC8D0@MWHPR06MB3503.namprd06.prod.outlook.com>

Hello,

On 04/27/2018 11:10 AM, Dave Richards wrote:
> Hello,
> 
> I am building a prototype for a new product based on a Lanner, Inc. embedded PC.  It is an Intel Celeron-based system with two host I210 GbE chips connected to 2 MV88E6172 chips (one NIC to one switch).  Everything appears to show up hardware-wise.  My question is, what is the next step?  How does DSA know which NICs are intended to be masters?  Is this supposed to be auto-detected or is this knowledge supposed to be communicated explicitly.  Reading through the DSA driver code I see that there is a check of the OF property list for the device for a "label"/"cpu" property/value pair that needs to be present.  Who sets this and when?

On system where Device Tree can be used, we expect you to declare all
relevant peripherals in Device Tree and those would include the Ethernet
controller (i210) and the Ethernet switches. An example of this can be
found here:

On x86, there is not an universal use of Device Tree, so we can use
something along these lines to register an Ethernet switch through DSA:

https://github.com/lunn/linux/commit/34055b931848545b6ba11ee50b88e89aeb02c9a5

There might be a way for you to use a conjuction of DMI Match entries to
match your specific board design and based on that run the DSA switch
registration code, indicating the port mapping and the Ethernet
controller to be used as a "master device.
-- 
Florian

^ permalink raw reply

* Re: [pull request][net 0/7] Mellanox, mlx5 fixes 2018-04-26
From: David Miller @ 2018-04-27 18:32 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20180426195842.29665-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 26 Apr 2018 12:58:35 -0700

> This pull request includes fixes for mlx5 core and netdev driver.
> 
> Please pull and let me know if there's any problems.

Pulled.

> For -stable v4.12
>     net/mlx5e: TX, Use correct counter in dma_map error flow
> For -stable v4.13
>     net/mlx5: Avoid cleaning flow steering table twice during error flow
> For -stable v4.14
>     net/mlx5e: Allow offloading ipv4 header re-write for icmp
> For -stable v4.15
>     net/mlx5e: DCBNL fix min inline header size for dscp
> For -stable v4.16
>     net/mlx5: Fix mlx5_get_vector_affinity function

Queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: David Miller @ 2018-04-27 18:33 UTC (permalink / raw)
  To: sbrivio; +Cc: steffen.klassert, lucien.xin, alexey.kodanev, jarod, sd, netdev
In-Reply-To: <c2369c8f004006b33007bad40b63c35f50ff3c23.1524764073.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Thu, 26 Apr 2018 19:41:02 +0200

> A vti6 interface can carry IPv4 packets too.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Applied, thank you.

^ permalink raw reply

* Re: Suggestions on iterating eBPF maps
From: Chenbo Feng @ 2018-04-27 18:33 UTC (permalink / raw)
  To: netdev, Alexei Starovoitov, Daniel Borkmann
  Cc: Lorenzo Colitti, Joel Fernandes
In-Reply-To: <CAMOXUJm3pYr4RwzJSYop3jFT+NjrcjKgm=QGZbbP_1U2299JhQ@mail.gmail.com>

resend with  plain text

On Fri, Apr 27, 2018 at 11:22 AM Chenbo Feng <fengc@google.com> wrote:

> Hi net-next,

> When doing the eBPF tools user-space development I noticed that the map
iterating process in user-space have some little flaws. If we want to dump
the whole map. The only way now I know is to use a null key to start the
iteration and keep calling bpf_get_next_key and bpf_look_up_elem for each
new key value pair until we reach the end of the map. I noticed the
bpftools recently added used the similar approach.

> The overhead of repeating syscalls is acceptable, but the race problem
come with this iteration process is a little annoying. If the current key
we are using get deleted before we do the syscall to get the next key . The
next key returned will start from the beginning of the map again and some
entry will be dumped again depending on the position of the key deleted. If
the racing problem is within the same userspace process, it can be easily
fixed by adding some read/write locks. However, if multiple processes is
reading the map through pinned fd while there is one process is editing the
map entry or the kernel program is deleting entries, it become harder to
get a consistent and correct map dump.

> We are wondering if there is already implementation we didn't notice in
mainline kernel that help improved this iteration process and addressed the
racing problem mentioned above? If not, what can be down to address the
issue above. One thing we came up with is to use a single entry bpf map as
a across process lock to prevent multiple userspace process to read/write
other maps at the same time. But I don't know how safe this solution is
since there will still be a race to read the lock map value and setup the
lock.

> Thanks
> Chenbo Feng

^ permalink raw reply

* Re: [PATCH net-next 00/13] sctp: refactor MTU handling
From: David Miller @ 2018-04-27 18:42 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman, lucien.xin
In-Reply-To: <cover.1524772453.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu, 26 Apr 2018 16:58:49 -0300

> Currently MTU handling is spread over SCTP stack. There are multiple
> places doing same/similar calculations and updating them is error prone
> as one spot can easily be left out.
> 
> This patchset converges it into a more concise and consistent code. In
> general, it moves MTU handling from functions with bigger objectives,
> such as sctp_assoc_add_peer(), to specific functions.
> 
> It's also a preparation for the next patchset, which removes the
> duplication between sctp_make_op_error_space and
> sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.
> 
> More details on each patch.

Series applied, thanks!

^ permalink raw reply

* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Eddie Chapman @ 2018-04-27 18:43 UTC (permalink / raw)
  To: Thomas Deutschmann, Greg KH; +Cc: stable, davem, nicolas.dichtel, netdev
In-Reply-To: <f663a39c-25f4-0558-21e0-a25c21287384@gentoo.org>

On 27/04/18 19:07, Thomas Deutschmann wrote:
> Hi Greg,
> 
> first, we need to cherry-pick another patch first:
>   
>>  From 52a589d51f1008f62569bf89e95b26221ee76690 Mon Sep 17 00:00:00 2001
>> From: Xin Long <lucien.xin@gmail.com>
>> Date: Mon, 25 Dec 2017 14:43:58 +0800
>> Subject: [PATCH] geneve: update skb dst pmtu on tx path
>>
>> Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed
>> a performance issue caused by the change of lower dev's mtu for vxlan.
>>
>> The same thing needs to be done for geneve as well.
>>
>> Note that geneve cannot adjust it's mtu according to lower dev's mtu
>> when creating it. The performance is very low later when netperfing
>> over it without fixing the mtu manually. This patch could also avoid
>> this issue.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>

Oops, I completely missed that the coreos patch doesn't have the geneve 
hunk that is in the original 4.15 patch. I don't load the geneve module 
on my box hence why no problems surfaced on my machine.

Thanks Thomas for the correct instructions. Ignore my message Greg, I'll 
drop back into the shadows where I belong, sorry for the noise!

^ permalink raw reply

* [PATCH v5 net-next 0/3] lan78xx updates along with Fixed phy Support
From: Raghuram Chary J @ 2018-04-27 18:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli

These series of patches handle few modifications in driver
and adds support for fixed phy.

Raghuram Chary J (3):
  lan78xx: Lan7801 Support for Fixed PHY
  lan78xx: Remove DRIVER_VERSION for lan78xx driver
  lan78xx: Modify error messages

 drivers/net/usb/Kconfig   |   1 +
 drivers/net/usb/lan78xx.c | 110 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 79 insertions(+), 32 deletions(-)

-- 
2.16.2

^ permalink raw reply

* [PATCH v5 net-next 1/3] lan78xx: Lan7801 Support for Fixed PHY
From: Raghuram Chary J @ 2018-04-27 18:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
In-Reply-To: <20180427184744.28798-1-raghuramchary.jallipalli@microchip.com>

Adding Fixed PHY support to the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
v0->v1:
   * Remove driver version #define
   * Modify netdev_info to netdev_dbg
   * Move lan7801 specific to new routine and add switch case
   * Minor cleanup

v1->v2:
   * Removed fixedphy variable and used phy_is_pseudo_fixed_link() check.
v2->v3:
   * Revert driver version, debug statment changes for separate patch.
   * Modify lan7801 specific routine with return type struct phy_device.
v3->v4:
   * Modify lan7801 specific routine by removing phydev arg and get phydev.
v4->v5:
   * Patched in latest net-next.
   * Also handled error condition for fixedphy.
---
 drivers/net/usb/Kconfig   |   1 +
 drivers/net/usb/lan78xx.c | 104 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index f28bd74ac275..418b0904cecb 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -111,6 +111,7 @@ config USB_LAN78XX
 	select MII
 	select PHYLIB
 	select MICROCHIP_PHY
+	select FIXED_PHY
 	help
 	  This option adds support for Microchip LAN78XX based USB 2
 	  & USB 3 10/100/1000 Ethernet adapters.
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index c59f8afd0d73..81dfd10c3b92 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,7 +36,7 @@
 #include <linux/irq.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/microchipphy.h>
-#include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include <linux/of_mdio.h>
 #include <linux/of_net.h>
 #include "lan78xx.h"
@@ -2063,52 +2063,91 @@ static int ksz9031rnx_fixup(struct phy_device *phydev)
 	return 1;
 }
 
-static int lan78xx_phy_init(struct lan78xx_net *dev)
+static struct phy_device *lan7801_phy_init(struct lan78xx_net *dev)
 {
+	u32 buf;
 	int ret;
-	u32 mii_adv;
+	struct fixed_phy_status fphy_status = {
+		.link = 1,
+		.speed = SPEED_1000,
+		.duplex = DUPLEX_FULL,
+	};
 	struct phy_device *phydev;
 
 	phydev = phy_find_first(dev->mdiobus);
 	if (!phydev) {
-		netdev_err(dev->net, "no PHY found\n");
-		return -EIO;
-	}
-
-	if ((dev->chipid == ID_REV_CHIP_ID_7800_) ||
-	    (dev->chipid == ID_REV_CHIP_ID_7850_)) {
-		phydev->is_internal = true;
-		dev->interface = PHY_INTERFACE_MODE_GMII;
-
-	} else if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+		netdev_dbg(dev->net, "PHY Not Found!! Registering Fixed PHY\n");
+		phydev = fixed_phy_register(PHY_POLL, &fphy_status, -1,
+					    NULL);
+		if (IS_ERR(phydev)) {
+			netdev_err(dev->net, "No PHY/fixed_PHY found\n");
+			return NULL;
+		}
+		netdev_dbg(dev->net, "Registered FIXED PHY\n");
+		dev->interface = PHY_INTERFACE_MODE_RGMII;
+		ret = lan78xx_write_reg(dev, MAC_RGMII_ID,
+					MAC_RGMII_ID_TXC_DELAY_EN_);
+		ret = lan78xx_write_reg(dev, RGMII_TX_BYP_DLL, 0x3D00);
+		ret = lan78xx_read_reg(dev, HW_CFG, &buf);
+		buf |= HW_CFG_CLK125_EN_;
+		buf |= HW_CFG_REFCLK25_EN_;
+		ret = lan78xx_write_reg(dev, HW_CFG, buf);
+	} else {
 		if (!phydev->drv) {
 			netdev_err(dev->net, "no PHY driver found\n");
-			return -EIO;
+			return NULL;
 		}
-
 		dev->interface = PHY_INTERFACE_MODE_RGMII;
-
 		/* external PHY fixup for KSZ9031RNX */
 		ret = phy_register_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0,
 						 ksz9031rnx_fixup);
 		if (ret < 0) {
 			netdev_err(dev->net, "fail to register fixup\n");
-			return ret;
+			return NULL;
 		}
 		/* external PHY fixup for LAN8835 */
 		ret = phy_register_fixup_for_uid(PHY_LAN8835, 0xfffffff0,
 						 lan8835_fixup);
 		if (ret < 0) {
 			netdev_err(dev->net, "fail to register fixup\n");
-			return ret;
+			return NULL;
 		}
 		/* add more external PHY fixup here if needed */
 
 		phydev->is_internal = false;
-	} else {
-		netdev_err(dev->net, "unknown ID found\n");
-		ret = -EIO;
-		goto error;
+	}
+	return phydev;
+}
+
+static int lan78xx_phy_init(struct lan78xx_net *dev)
+{
+	int ret;
+	u32 mii_adv;
+	struct phy_device *phydev;
+
+	switch (dev->chipid) {
+	case ID_REV_CHIP_ID_7801_:
+		phydev = lan7801_phy_init(dev);
+		if (!phydev) {
+			netdev_err(dev->net, "lan7801: PHY Init Failed");
+			return -EIO;
+		}
+		break;
+
+	case ID_REV_CHIP_ID_7800_:
+	case ID_REV_CHIP_ID_7850_:
+		phydev = phy_find_first(dev->mdiobus);
+		if (!phydev) {
+			netdev_err(dev->net, "no PHY found\n");
+			return -EIO;
+		}
+		phydev->is_internal = true;
+		dev->interface = PHY_INTERFACE_MODE_GMII;
+		break;
+
+	default:
+		netdev_err(dev->net, "Unknown CHIP ID found\n");
+		return -EIO;
 	}
 
 	/* if phyirq is not set, use polling mode in phylib */
@@ -2127,6 +2166,16 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 	if (ret) {
 		netdev_err(dev->net, "can't attach PHY to %s\n",
 			   dev->mdiobus->id);
+		if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+			if (phy_is_pseudo_fixed_link(phydev)) {
+				fixed_phy_unregister(phydev);
+			} else {
+				phy_unregister_fixup_for_uid(PHY_KSZ9031RNX,
+							     0xfffffff0);
+				phy_unregister_fixup_for_uid(PHY_LAN8835,
+							     0xfffffff0);
+			}
+		}
 		return -EIO;
 	}
 
@@ -2166,12 +2215,6 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 	dev->fc_autoneg = phydev->autoneg;
 
 	return 0;
-
-error:
-	phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
-	phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
-
-	return ret;
 }
 
 static int lan78xx_set_rx_max_frame_length(struct lan78xx_net *dev, int size)
@@ -3569,6 +3612,7 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 	struct lan78xx_net		*dev;
 	struct usb_device		*udev;
 	struct net_device		*net;
+	struct phy_device		*phydev;
 
 	dev = usb_get_intfdata(intf);
 	usb_set_intfdata(intf, NULL);
@@ -3577,12 +3621,16 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 
 	udev = interface_to_usbdev(intf);
 	net = dev->net;
+	phydev = net->phydev;
 
 	phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
 	phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
 
 	phy_disconnect(net->phydev);
 
+	if (phy_is_pseudo_fixed_link(phydev))
+		fixed_phy_unregister(phydev);
+
 	unregister_netdev(net);
 
 	cancel_delayed_work_sync(&dev->wq);
-- 
2.16.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox