Netdev List
 help / color / mirror / Atom feed
* [PATCH] tipc: Fix tipc_sk_reinit handling of -EAGAIN
From: Bob Peterson @ 2017-08-23 14:43 UTC (permalink / raw)
  To: herbert, Phil Sutter, davem, netdev; +Cc: LKML
In-Reply-To: <1907606232.1152250.1503498773672.JavaMail.zimbra@redhat.com>

Hi,

In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.

However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:

 * ...Note that we take the RCU lock in all
 * cases including when we return an error.  So you must always call
 * rhashtable_walk_stop to clean up.

This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 101e3597338f..d50edd6e0019 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2255,8 +2255,8 @@ void tipc_sk_reinit(struct net *net)
 
 	do {
 		tsk = ERR_PTR(rhashtable_walk_start(&iter));
-		if (tsk)
-			continue;
+		if (IS_ERR(tsk))
+			goto walk_stop;
 
 		while ((tsk = rhashtable_walk_next(&iter)) && !IS_ERR(tsk)) {
 			spin_lock_bh(&tsk->sk.sk_lock.slock);
@@ -2265,7 +2265,7 @@ void tipc_sk_reinit(struct net *net)
 			msg_set_orignode(msg, tn->own_addr);
 			spin_unlock_bh(&tsk->sk.sk_lock.slock);
 		}
-
+walk_stop:
 		rhashtable_walk_stop(&iter);
 	} while (tsk == ERR_PTR(-EAGAIN));
 }

^ permalink raw reply related

* Re: [PATCH v2 net-next 1/5] selftests/bpf: add a test for a bug in liveness-based pruning
From: Alexei Starovoitov via iovisor-dev @ 2017-08-23 14:43 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <83dbbb0c-3cbc-e033-0ceb-f31db6eb57c2-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 8/23/17 7:09 AM, Edward Cree wrote:
> Writes in straight-line code should not prevent reads from propagating
>  along jumps.  With current verifier code, the jump from 3 to 5 does not
>  add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
>  the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.
>
> Verifier output:
> 0: (61) r2 = *(u32 *)(r1 +0)
> 1: (35) if r2 >= 0x0 goto pc+1
>  R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 2: (b7) r0 = 0
> 3: (35) if r2 >= 0x0 goto pc+1
>  R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 4: (b7) r0 = 0
> 5: (95) exit
>
> from 3 to 5: safe
>
> from 1 to 3: safe
> processed 8 insns, stack depth 0
>
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 3/5] selftests/bpf: add a test for a pruning bug in the verifier
From: Daniel Borkmann @ 2017-08-23 14:43 UTC (permalink / raw)
  To: Edward Cree, davem, Alexei Starovoitov, Alexei Starovoitov
  Cc: netdev, iovisor-dev
In-Reply-To: <bbfb3545-abbd-9c9d-1632-64804d1b7cc2@solarflare.com>

On 08/23/2017 04:10 PM, Edward Cree wrote:
> From: Alexei Starovoitov <ast@fb.com>
>
> The test makes a read through a map value pointer, then considers pruning
>   a branch where the register holds an adjusted map value pointer.  It
>   should not prune, but currently it does.
>
> Signed-off-by: Alexei Starovoitov <ast@fb.com>
> [ecree@solarflare.com: added test-name and patch description]
> Signed-off-by: Edward Cree <ecree@solarflare.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH v2 net-next 4/5] bpf/verifier: remove varlen_map_value_access flag
From: Daniel Borkmann @ 2017-08-23 14:44 UTC (permalink / raw)
  To: Edward Cree, davem, Alexei Starovoitov, Alexei Starovoitov
  Cc: netdev, iovisor-dev
In-Reply-To: <34d97e19-2de8-d336-ba13-77d3b02c5f20@solarflare.com>

On 08/23/2017 04:10 PM, Edward Cree wrote:
> The optimisation it does is broken when the 'new' register value has a
>   variable offset and the 'old' was constant.  I broke it with my pointer
>   types unification (see Fixes tag below), before which the 'new' value
>   would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
>   other changes in that patch mean that its original behaviour (ignore
>   min/max values) cannot be restored.
> Tests on a sample set of cilium programs show no change in count of
>   processed instructions.
>
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Signed-off-by: Edward Cree <ecree@solarflare.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-23 14:47 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Willem de Bruijn, virtualization, Network Development
In-Reply-To: <1503498504.8694.26.camel@klaipeden.com>

On Wed, 2017-08-23 at 23:28 +0900, Koichiro Den wrote:
> On Tue, 2017-08-22 at 20:55 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 22, 2017 at 10:50:41AM +0800, Jason Wang wrote:
> > > > Perhaps the descriptor pool should also be
> > > > revised to allow out of order completions. Then there is no need to
> > > > copy zerocopy packets whenever they may experience delay.
> > > 
> > > Yes, but as replied in the referenced thread, windows driver may treat out
> > > of order completion as a bug.
> > 
> > That would be a windows driver bug then, but I don't think it makes this
> > assumption. What the referenced thread
> > (https://patchwork.kernel.org/patch/3787671/) is saying is that host
> > must use any buffers made available on a tx vq within a reasonable
> > timeframe otherwise windows guests panic.
> > 
> > Ideally we would detect that a packet is actually experiencing delay and
> > trigger the copy at that point e.g. by calling skb_linearize. But it
> > isn't easy to track these packets though and even harder to do a data
> > copy without races.
> > 
> > Which reminds me that skb_linearize in net core seems to be
> > fundamentally racy - I suspect that if skb is cloned, and someone is
> > trying to use the shared frags while another thread calls skb_linearize,
> > we get some use after free bugs which likely mostly go undetected
> > because the corrupted packets mostly go on wire and get dropped
> > by checksum code.
> > 
> 
> Please let me make sure if I understand it correctly:
> * always do copy with skb_orphan_frags_rx as Willem mentioned in the earlier
> post, before the xmit_skb as opposed to my original patch, is safe but too
> costly so cannot be adopted.
> * as a generic solution, if we were to somehow overcome the safety issue,
> track
> the delay and do copy if some threshold is reached could be an answer, but
> it's
> hard for now.
> * so things like the current vhost-net implementation of deciding whether or
> not
> to do zerocopy beforehand referring the zerocopy tx error ratio is a point of
> practical compromise.
<- I forgot to mention the max pend checking part.
> 
> Thanks.

^ permalink raw reply

* Re: [PATCH v2] at76c50x-usb: check memory allocation failure
From: Kalle Valo @ 2017-08-23 14:49 UTC (permalink / raw)
  To: Himanshu Jha
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1503498556-13835-1-git-send-email-himanshujha199640-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Himanshu Jha <himanshujha199640-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Check memory allocation failure and return -ENOMEM rather than just
> retuning void.
>
> Signed-off-by: Himanshu Jha <himanshujha199640-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[...]

> -static void at76_dump_mib_mac_addr(struct at76_priv *priv)
> +static int at76_dump_mib_mac_addr(struct at76_priv *priv)
>  {
>  	int i;
>  	int ret;
> @@ -940,7 +940,7 @@ static void at76_dump_mib_mac_addr(struct at76_priv *priv)
>  					 GFP_KERNEL);
>  
>  	if (!m)
> -		return;
> +		return -ENOMEM;

This feels rather pointless. You are changing these functions to return
an error code but the callers it anyway. And besides, did you check
where these are actually used?

	if (at76_debug & DBG_MIB) {
		at76_dump_mib_mac(priv);
		at76_dump_mib_mac_addr(priv);
		at76_dump_mib_mac_mgmt(priv);
		at76_dump_mib_mac_wep(priv);
		at76_dump_mib_mdomain(priv);
		at76_dump_mib_phy(priv);
		at76_dump_mib_local(priv);
	}

So it's just for debug messages and the current error handling looks the
correct approach. There is not much anything else we can do than just
skip the message printout.

-- 
Kalle Valo

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Alexander Duyck @ 2017-08-23 14:51 UTC (permalink / raw)
  To: Michael Chan
  Cc: Duyck, Alexander H, john.fastabend@gmail.com, brouer@redhat.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <CACKFLinJ0N7b8Xhq4ZoHdB80uXp_MU_vVyzZa7Dq11XrXsvDbw@mail.gmail.com>

On Tue, Aug 22, 2017 at 11:59 PM, Michael Chan
<michael.chan@broadcom.com> wrote:
> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>
>>> Right, but it's conceivable to add an API to "return" the buffer to
>>> the input device, right?
>>
>> You could, it is just added complexity. "just free the buffer" in
>> ixgbe usually just amounts to one atomic operation to decrement the
>> total page count since page recycling is already implemented in the
>> driver. You still would have to unmap the buffer regardless of if you
>> were recycling it or not so all you would save is 1.000015259 atomic
>> operations per packet. The fraction is because once every 64K uses we
>> have to bulk update the count on the page.
>>
>
> If the buffer is returned to the input device, the input device can
> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> the input device when the buffer is returned.

That is what ixgbe is already doing. The Rx path doesn't free the
page, it just treats it as a bounce buffer and uses the page count to
make certain we don't use a section of the buffer that is already in
use.

- Alex

^ permalink raw reply

* [PATCH net-next 1/3] gre: refactor the gre_fb_xmit
From: William Tu @ 2017-08-23 14:55 UTC (permalink / raw)
  To: netdev

The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 net/ipv4/ip_gre.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5a20ba9b9b50..cc4c6285179a 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -433,39 +433,33 @@ static struct rtable *gre_get_rt(struct sk_buff *skb,
 	return ip_route_output_key(net, fl);
 }
 
-static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
-			__be16 proto)
+static struct rtable *prepare_fb_xmit(struct sk_buff *skb,
+				      struct net_device *dev,
+				      struct flowi4 *fl,
+				      int tunnel_hlen)
 {
 	struct ip_tunnel_info *tun_info;
 	const struct ip_tunnel_key *key;
 	struct rtable *rt = NULL;
-	struct flowi4 fl;
 	int min_headroom;
-	int tunnel_hlen;
-	__be16 df, flags;
 	bool use_cache;
 	int err;
 
 	tun_info = skb_tunnel_info(skb);
-	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
-		     ip_tunnel_info_af(tun_info) != AF_INET))
-		goto err_free_skb;
-
 	key = &tun_info->key;
 	use_cache = ip_tunnel_dst_cache_usable(skb, tun_info);
+
 	if (use_cache)
-		rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl.saddr);
+		rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl->saddr);
 	if (!rt) {
-		rt = gre_get_rt(skb, dev, &fl, key);
+		rt = gre_get_rt(skb, dev, fl, key);
 		if (IS_ERR(rt))
-				goto err_free_skb;
+			goto err_free_skb;
 		if (use_cache)
 			dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst,
-					  fl.saddr);
+					  fl->saddr);
 	}
 
-	tunnel_hlen = gre_calc_hlen(key->tun_flags);
-
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ tunnel_hlen + sizeof(struct iphdr);
 	if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
@@ -477,6 +471,37 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (unlikely(err))
 			goto err_free_rt;
 	}
+	return rt;
+
+err_free_rt:
+	ip_rt_put(rt);
+err_free_skb:
+	kfree_skb(skb);
+	dev->stats.tx_dropped++;
+	return NULL;
+}
+
+static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+			__be16 proto)
+{
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct rtable *rt = NULL;
+	struct flowi4 fl;
+	int tunnel_hlen;
+	__be16 df, flags;
+
+	tun_info = skb_tunnel_info(skb);
+	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     ip_tunnel_info_af(tun_info) != AF_INET))
+		goto err_free_skb;
+
+	key = &tun_info->key;
+	tunnel_hlen = gre_calc_hlen(key->tun_flags);
+
+	rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+	if (!rt)
+		return;
 
 	/* Push Tunnel header. */
 	if (gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM)))
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/3] gre: add collect_md mode to ERSPAN tunnel
From: William Tu @ 2017-08-23 14:55 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1503500157-3717-1-git-send-email-u9012063@gmail.com>

Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away.  OVS can use it as well in the future.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 include/net/ip_tunnels.h |   4 +-
 net/ipv4/ip_gre.c        | 102 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 625c29329372..992652856fe8 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -154,8 +154,10 @@ struct ip_tunnel {
 #define TUNNEL_GENEVE_OPT	__cpu_to_be16(0x0800)
 #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
 #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
+#define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
 
-#define TUNNEL_OPTIONS_PRESENT	(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT)
+#define TUNNEL_OPTIONS_PRESENT \
+		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
 
 struct tnl_ptk_info {
 	__be16 flags;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index cc4c6285179a..6840771912b5 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -113,6 +113,8 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
+static void erspan_build_header(struct sk_buff *skb,
+				__be32 id, u32 index, bool truncate);
 
 static unsigned int ipgre_net_id __read_mostly;
 static unsigned int gre_tap_net_id __read_mostly;
@@ -288,7 +290,33 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 					   false, false) < 0)
 			goto drop;
 
-		tunnel->index = ntohl(index);
+		if (tunnel->collect_md) {
+			struct ip_tunnel_info *info;
+			struct erspan_metadata *md;
+			__be64 tun_id;
+			__be16 flags;
+
+			tpi->flags |= TUNNEL_KEY;
+			flags = tpi->flags;
+			tun_id = key32_to_tunnel_id(tpi->key);
+
+			tun_dst = ip_tun_rx_dst(skb, flags,
+						tun_id, sizeof(*md));
+			if (!tun_dst)
+				return PACKET_REJECT;
+
+			md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
+			if (!md)
+				return PACKET_REJECT;
+
+			md->index = index;
+			info = &tun_dst->u.tun_info;
+			info->key.tun_flags |= TUNNEL_ERSPAN_OPT;
+			info->options_len = sizeof(*md);
+		} else {
+			tunnel->index = ntohl(index);
+		}
+
 		skb_reset_mac_header(skb);
 		ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
 		return PACKET_RCVD;
@@ -524,6 +552,64 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 	dev->stats.tx_dropped++;
 }
 
+static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+			   __be16 proto)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct erspan_metadata *md;
+	struct rtable *rt = NULL;
+	bool truncate = false;
+	struct flowi4 fl;
+	int tunnel_hlen;
+	__be16 df;
+
+	tun_info = skb_tunnel_info(skb);
+	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     ip_tunnel_info_af(tun_info) != AF_INET))
+		goto err_free_skb;
+
+	key = &tun_info->key;
+
+	/* ERSPAN has fixed 8 byte GRE header */
+	tunnel_hlen = 8 + sizeof(struct erspanhdr);
+
+	rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+	if (!rt)
+		return;
+
+	if (gre_handle_offloads(skb, false))
+		goto err_free_rt;
+
+	if (skb->len > dev->mtu) {
+		pskb_trim(skb, dev->mtu);
+		truncate = true;
+	}
+
+	md = ip_tunnel_info_opts(tun_info);
+	if (!md)
+		goto err_free_rt;
+
+	erspan_build_header(skb, tunnel_id_to_key32(key->tun_id),
+			    ntohl(md->index), truncate);
+
+	gre_build_header(skb, 8, TUNNEL_SEQ,
+			 htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++));
+
+	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ?  htons(IP_DF) : 0;
+
+	iptunnel_xmit(skb->sk, rt, skb, fl.saddr, key->u.ipv4.dst, IPPROTO_GRE,
+		      key->tos, key->ttl, df, false);
+	return;
+
+err_free_rt:
+	ip_rt_put(rt);
+err_free_skb:
+	kfree_skb(skb);
+	dev->stats.tx_dropped++;
+}
+
 static int gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
@@ -637,6 +723,11 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	bool truncate = false;
 
+	if (tunnel->collect_md) {
+		erspan_fb_xmit(skb, dev, skb->protocol);
+		return NETDEV_TX_OK;
+	}
+
 	if (gre_handle_offloads(skb, false))
 		goto free_skb;
 
@@ -999,9 +1090,12 @@ static int erspan_validate(struct nlattr *tb[], struct nlattr *data[],
 		return ret;
 
 	/* ERSPAN should only have GRE sequence and key flag */
-	flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
-	flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
-	if (flags != (GRE_SEQ | GRE_KEY))
+	if (data[IFLA_GRE_OFLAGS])
+		flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
+	if (data[IFLA_GRE_IFLAGS])
+		flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
+	if (!data[IFLA_GRE_COLLECT_METADATA] &&
+	    flags != (GRE_SEQ | GRE_KEY))
 		return -EINVAL;
 
 	/* ERSPAN Session ID only has 10-bit. Since we reuse
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/3] samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
From: William Tu @ 2017-08-23 14:55 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1503500157-3717-1-git-send-email-u9012063@gmail.com>

Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 samples/bpf/tcbpf2_kern.c      | 63 +++++++++++++++++++++++++++++++++++++++++-
 samples/bpf/test_tunnel_bpf.sh | 29 +++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
index 270edcc149a1..370b749f5ee6 100644
--- a/samples/bpf/tcbpf2_kern.c
+++ b/samples/bpf/tcbpf2_kern.c
@@ -17,6 +17,7 @@
 #include <uapi/linux/pkt_cls.h>
 #include <net/ipv6.h>
 #include "bpf_helpers.h"
+#include "bpf_endian.h"
 
 #define _htonl __builtin_bswap32
 #define ERROR(ret) do {\
@@ -38,6 +39,10 @@ struct vxlan_metadata {
 	u32     gbp;
 };
 
+struct erspan_metadata {
+	__be32 index;
+};
+
 SEC("gre_set_tunnel")
 int _gre_set_tunnel(struct __sk_buff *skb)
 {
@@ -76,6 +81,63 @@ int _gre_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	md.index = htonl(123);
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x erspan index 0x%x\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	index = bpf_ntohl(md.index);
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, index);
+
+	return TC_ACT_OK;
+}
+
 SEC("vxlan_set_tunnel")
 int _vxlan_set_tunnel(struct __sk_buff *skb)
 {
@@ -378,5 +440,4 @@ int _ip6ip6_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
-
 char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
index a70d2ea90313..410052d9fc37 100755
--- a/samples/bpf/test_tunnel_bpf.sh
+++ b/samples/bpf/test_tunnel_bpf.sh
@@ -32,6 +32,19 @@ function add_gre_tunnel {
 	ip addr add dev $DEV 10.1.1.200/24
 }
 
+function add_erspan_tunnel {
+	# in namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 local 172.16.1.100 remote 172.16.1.200 erspan 123
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# out of namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
 function add_vxlan_tunnel {
 	# Set static ARP entry here because iptables set-mark works
 	# on L3 packet, as a result not applying to ARP packets,
@@ -99,6 +112,18 @@ function test_gre {
 	cleanup
 }
 
+function test_erspan {
+	TYPE=erspan
+	DEV_NS=erspan00
+	DEV=erspan11
+	config_device
+	add_erspan_tunnel
+	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+	ping -c 1 10.1.1.100
+	ip netns exec at_ns0 ping -c 1 10.1.1.200
+	cleanup
+}
+
 function test_vxlan {
 	TYPE=vxlan
 	DEV_NS=vxlan00
@@ -151,14 +176,18 @@ function cleanup {
 	ip link del gretap11
 	ip link del vxlan11
 	ip link del geneve11
+	ip link del erspan11
 	pkill tcpdump
 	pkill cat
 	set -ex
 }
 
+trap cleanup 0 2 3 6 9
 cleanup
 echo "Testing GRE tunnel..."
 test_gre
+echo "Testing ERSPAN tunnel..."
+test_erspan
 echo "Testing VXLAN tunnel..."
 test_vxlan
 echo "Testing GENEVE tunnel..."
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2 net-next 5/5] bpf/verifier: document liveness analysis
From: Daniel Borkmann via iovisor-dev @ 2017-08-23 14:57 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <3d4870a2-d7ee-2ac0-aaed-a9faeae89b9e-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 08/23/2017 04:11 PM, Edward Cree wrote:
> The liveness tracking algorithm is quite subtle; add comments to explain it.
>
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 2/5] bpf/verifier: when pruning a branch, ignore its write marks
From: Daniel Borkmann @ 2017-08-23 14:56 UTC (permalink / raw)
  To: Edward Cree, davem, Alexei Starovoitov, Alexei Starovoitov
  Cc: netdev, iovisor-dev
In-Reply-To: <ec2ee579-8391-c023-348a-770faa6bc5c7@solarflare.com>

On 08/23/2017 04:10 PM, Edward Cree wrote:
> The fact that writes occurred in reaching the continuation state does
>   not screen off its reads from us, because we're not really its parent.
> So detect 'not really the parent' in do_propagate_liveness, and ignore
>   write marks in that case.
>
> Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
> Signed-off-by: Edward Cree <ecree@solarflare.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-23 15:20 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Michael S. Tsirkin, Jason Wang, virtualization,
	Network Development
In-Reply-To: <1503498504.8694.26.camel@klaipeden.com>

> Please let me make sure if I understand it correctly:
> * always do copy with skb_orphan_frags_rx as Willem mentioned in the earlier
> post, before the xmit_skb as opposed to my original patch, is safe but too
> costly so cannot be adopted.

One more point about msg_zerocopy in the guest. This does add new allocation
limits on optmem and locked pages rlimit.

Hitting these should be extremely rare. The tcp small queues limit normally
throttles well before this.

Virtio-net is an exception because it breaks the tsq signal by calling
skb_orphan before transmission.

As a result hitting these limits is more likely here. But, in this edge case the
sendmsg call will not block, either, but fail with -ENOBUFS. The caller can
send without zerocopy to make forward progress and
trigger free_old_xmit_skbs from start_xmit.

> * as a generic solution, if we were to somehow overcome the safety issue, track
> the delay and do copy if some threshold is reached could be an answer, but it's
> hard for now.> * so things like the current vhost-net implementation of deciding whether or not
> to do zerocopy beforehand referring the zerocopy tx error ratio is a point of
> practical compromise.

The fragility of this mechanism is another argument for switching to tx napi
as default.

Is there any more data about the windows guest issues when completions
are not queued within a reasonable timeframe? What is this timescale and
do we really need to work around this. That is the only thing keeping us from
removing the HoL blocking in vhost-net zerocopy.

^ permalink raw reply

* RE: [EXT] Re: [PATCH net-next 10/18] net: mvpp2: use the GoP interrupt for link status changes
From: Stefan Chulski @ 2017-08-23 15:24 UTC (permalink / raw)
  To: Antoine Tenart, Andrew Lunn
  Cc: thomas.petazzoni@free-electrons.com, jason@lakedaemon.net,
	netdev@vger.kernel.org, linux@armlinux.org.uk, Nadav Haklai,
	gregory.clement@free-electrons.com, mw@semihalf.com,
	davem@davemloft.net, linux-arm-kernel@lists.infradead.org,
	sebastian.hesselbarth@gmail.com
In-Reply-To: <20170823082510.GD5202@kwain>

> When the cable is connected (there is signal) and the serdes is in sync and AN
> succeeded.
> 
> > With SFF/SFP ports, you generally need a gpio line the fibre module
> > can use to indicate if it has link. Fixed-phy has such support, and
> > your link_change function will get called when the link changes.
> 
> So that would work when using SFP modules but I wonder if the GoP irq isn't
> needed when using passive cable, in which case this patch would still be needed
> (and of course we should support the new Russell phylib capabilities).
> 
> What's your thoughts on this?
> 
> Thanks!
> Antoine
> 

Even if new phylib driver supports passive direct cables connection, 
GoP IRQ required for SOHO/Peridot external switch support.
SOHO/Peridot external switch could be connected directly to Serdes line.

Regards,
Stefan

^ permalink raw reply

* RE: [PATCH net-next v4] openvswitch: enable NSH support
From: David Laight @ 2017-08-23 15:27 UTC (permalink / raw)
  To: 'Ben Pfaff', Jan Scheurich
  Cc: Jiri Benc, Yang, Yi, netdev@vger.kernel.org, dev@openvswitch.org,
	e@erig.me
In-Reply-To: <20170822173513.GM6175@ovn.org>

From: Ben Pfaff
> Sent: 22 August 2017 18:35
...
> We solved the alignment problem in OVS userspace a different way, by
> defining our versions of the network protocol headers so that they only
> need 16-bit alignment.  In turn, we did that by defining a
> ovs_16aligned_be32 type as a pair of be16s and ovs_16aligned_be64 as
> four be16s, and using helper functions for reads and writes.
...

If you add __attribute__((aligned(2))) to the 32bit and 64bit structure
members then gcc will do the loads and shifts for you.

	David

^ permalink raw reply

* [PATCH net-next 3/5] ipv6: sr: enforce IPv6 packets for seg6local lwt
From: David Lebrun @ 2017-08-23 15:32 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170823153203.11951-1-david.lebrun@uclouvain.be>

This patch ensures that the seg6local lightweight tunnel is used solely
with IPv6 routes and processes only IPv6 packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/seg6_local.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 609b94e..c626325 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -357,6 +357,11 @@ static int seg6_local_input(struct sk_buff *skb)
 	struct seg6_action_desc *desc;
 	struct seg6_local_lwt *slwt;
 
+	if (skb->protocol != htons(ETH_P_IPV6)) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
 	slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
 	desc = slwt->desc;
 
@@ -623,6 +628,9 @@ static int seg6_local_build_state(struct nlattr *nla, unsigned int family,
 	struct seg6_local_lwt *slwt;
 	int err;
 
+	if (family != AF_INET6)
+		return -EINVAL;
+
 	err = nla_parse_nested(tb, SEG6_LOCAL_MAX, nla, seg6_local_policy,
 			       extack);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 1/5] ipv6: sr: add support for ip4ip6 encapsulation
From: David Lebrun @ 2017-08-23 15:31 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170823153203.11951-1-david.lebrun@uclouvain.be>

This patch enables the SRv6 encapsulation mode to carry an IPv4 payload.
All the infrastructure was already present, I just had to add a parameter
to seg6_do_srh_encap() to specify the inner packet protocol, and perform
some additional checks.

Usage example:
ip route add 1.2.3.4 encap seg6 mode encap segs fc00::1,fc00::2 dev eth0

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 include/net/seg6.h       |  3 ++-
 net/ipv6/seg6_iptunnel.c | 47 +++++++++++++++++++++++++++++++++++++----------
 net/ipv6/seg6_local.c    |  2 +-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 5379f55..099bad5 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -60,7 +60,8 @@ extern int seg6_local_init(void);
 extern void seg6_local_exit(void);
 
 extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
-extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
+extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
+			     int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
 
 #endif
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5012330..5bec781 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -91,7 +91,7 @@ static void set_tun_src(struct net *net, struct net_device *dev,
 }
 
 /* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 {
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	struct ipv6hdr *hdr, *inner_hdr;
@@ -116,15 +116,22 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
 	 * hlim will be decremented in ip6_forward() afterwards and
 	 * decapsulation will overwrite inner hlim with outer hlim
 	 */
-	ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
-		     ip6_flowlabel(inner_hdr));
-	hdr->hop_limit = inner_hdr->hop_limit;
+
+	if (skb->protocol == htons(ETH_P_IPV6)) {
+		ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
+			     ip6_flowlabel(inner_hdr));
+		hdr->hop_limit = inner_hdr->hop_limit;
+	} else {
+		ip6_flow_hdr(hdr, 0, 0);
+		hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
+	}
+
 	hdr->nexthdr = NEXTHDR_ROUTING;
 
 	isrh = (void *)hdr + sizeof(*hdr);
 	memcpy(isrh, osrh, hdrlen);
 
-	isrh->nexthdr = NEXTHDR_IPV6;
+	isrh->nexthdr = proto;
 
 	hdr->daddr = isrh->segments[isrh->first_segment];
 	set_tun_src(net, skb->dev, &hdr->daddr, &hdr->saddr);
@@ -199,7 +206,7 @@ static int seg6_do_srh(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct seg6_iptunnel_encap *tinfo;
-	int err = 0;
+	int proto, err = 0;
 
 	tinfo = seg6_encap_lwtunnel(dst->lwtstate);
 
@@ -210,17 +217,31 @@ static int seg6_do_srh(struct sk_buff *skb)
 
 	switch (tinfo->mode) {
 	case SEG6_IPTUN_MODE_INLINE:
+		if (skb->protocol != htons(ETH_P_IPV6))
+			return -EINVAL;
+
 		err = seg6_do_srh_inline(skb, tinfo->srh);
+		if (err)
+			return err;
+
 		skb_reset_inner_headers(skb);
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
-		err = seg6_do_srh_encap(skb, tinfo->srh);
+		if (skb->protocol == htons(ETH_P_IPV6))
+			proto = IPPROTO_IPV6;
+		else if (skb->protocol == htons(ETH_P_IP))
+			proto = IPPROTO_IPIP;
+		else
+			return -EINVAL;
+
+		err = seg6_do_srh_encap(skb, tinfo->srh, proto);
+		if (err)
+			return err;
+
+		skb->protocol = htons(ETH_P_IPV6);
 		break;
 	}
 
-	if (err)
-		return err;
-
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
@@ -334,6 +355,9 @@ static int seg6_build_state(struct nlattr *nla,
 	struct seg6_lwt *slwt;
 	int err;
 
+	if (family != AF_INET && family != AF_INET6)
+		return -EINVAL;
+
 	err = nla_parse_nested(tb, SEG6_IPTUNNEL_MAX, nla,
 			       seg6_iptunnel_policy, extack);
 
@@ -356,6 +380,9 @@ static int seg6_build_state(struct nlattr *nla,
 
 	switch (tuninfo->mode) {
 	case SEG6_IPTUN_MODE_INLINE:
+		if (family != AF_INET6)
+			return -EINVAL;
+
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
 		break;
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 147680e..609b94e 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -290,7 +290,7 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 	skb_reset_inner_headers(skb);
 	skb->encapsulation = 1;
 
-	err = seg6_do_srh_encap(skb, slwt->srh);
+	err = seg6_do_srh_encap(skb, slwt->srh, IPPROTO_IPV6);
 	if (err)
 		goto drop;
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 4/5] ipv6: sr: add helper functions for seg6local
From: David Lebrun @ 2017-08-23 15:32 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170823153203.11951-1-david.lebrun@uclouvain.be>

This patch adds three helper functions to be used with the seg6local packet
processing actions.

The decap_and_validate() function will be used by the End.D* actions, that
decapsulate an SR-enabled packet.

The advance_nextseg() function applies the fundamental operations to update
an SRH for the next segment.

The lookup_nexthop() function helps select the next-hop for the processed
SR packets. It supports an optional next-hop address to route the packet
specifically through it, and an optional routing table to use.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/Kconfig      |   1 +
 net/ipv6/seg6_local.c | 189 ++++++++++++++++++++++++++------------------------
 2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 0d72239..ea71e4b 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -308,6 +308,7 @@ config IPV6_SEG6_LWTUNNEL
 	depends on IPV6
 	select LWTUNNEL
 	select DST_CACHE
+	select IPV6_MULTIPLE_TABLES
 	---help---
 	  Support for encapsulation of packets within an outer IPv6
 	  header and a Segment Routing Header using the lightweight
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index c626325..26db4d3e 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -99,23 +99,105 @@ static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
 	return srh;
 }
 
+static bool decap_and_validate(struct sk_buff *skb, int proto)
+{
+	struct ipv6_sr_hdr *srh;
+	unsigned int off = 0;
+
+	srh = get_srh(skb);
+	if (srh && srh->segments_left > 0)
+		return false;
+
+#ifdef CONFIG_IPV6_SEG6_HMAC
+	if (srh && !seg6_hmac_validate_skb(skb))
+		return false;
+#endif
+
+	if (ipv6_find_hdr(skb, &off, proto, NULL, NULL) < 0)
+		return false;
+
+	if (!pskb_pull(skb, off))
+		return false;
+
+	skb_postpull_rcsum(skb, skb_network_header(skb), off);
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb->encapsulation = 0;
+
+	return true;
+}
+
+static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
+{
+	struct in6_addr *addr;
+
+	srh->segments_left--;
+	addr = srh->segments + srh->segments_left;
+	*daddr = *addr;
+}
+
+static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+			   u32 tbl_id)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ipv6hdr *hdr = ipv6_hdr(skb);
+	int flags = RT6_LOOKUP_F_HAS_SADDR;
+	struct dst_entry *dst = NULL;
+	struct rt6_info *rt;
+	struct flowi6 fl6;
+
+	fl6.flowi6_iif = skb->dev->ifindex;
+	fl6.daddr = nhaddr ? *nhaddr : hdr->daddr;
+	fl6.saddr = hdr->saddr;
+	fl6.flowlabel = ip6_flowinfo(hdr);
+	fl6.flowi6_mark = skb->mark;
+	fl6.flowi6_proto = hdr->nexthdr;
+
+	if (nhaddr)
+		fl6.flowi6_flags = FLOWI_FLAG_KNOWN_NH;
+
+	if (!tbl_id) {
+		dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
+	} else {
+		struct fib6_table *table;
+
+		table = fib6_get_table(net, tbl_id);
+		if (!table)
+			goto out;
+
+		rt = ip6_pol_route(net, table, 0, &fl6, flags);
+		dst = &rt->dst;
+	}
+
+	if (dst && dst->dev->flags & IFF_LOOPBACK && !dst->error) {
+		dst_release(dst);
+		dst = NULL;
+	}
+
+out:
+	if (!dst) {
+		rt = net->ipv6.ip6_blk_hole_entry;
+		dst = &rt->dst;
+		dst_hold(dst);
+	}
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+}
+
 /* regular endpoint function */
 static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
 	struct ipv6_sr_hdr *srh;
-	struct in6_addr *addr;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-
-	ipv6_hdr(skb)->daddr = *addr;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
@@ -127,41 +209,15 @@ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 /* regular endpoint, and forward to specified nexthop */
 static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
-	struct net *net = dev_net(skb->dev);
 	struct ipv6_sr_hdr *srh;
-	struct dst_entry *dst;
-	struct in6_addr *addr;
-	struct ipv6hdr *hdr;
-	struct flowi6 fl6;
-	int flags;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-
-	hdr = ipv6_hdr(skb);
-	hdr->daddr = *addr;
-
-	skb_dst_drop(skb);
-
-	fl6.flowi6_iif = skb->dev->ifindex;
-	fl6.daddr = slwt->nh6;
-	fl6.saddr = hdr->saddr;
-	fl6.flowlabel = ip6_flowinfo(hdr);
-	fl6.flowi6_mark = skb->mark;
-	fl6.flowi6_proto = hdr->nexthdr;
-
-	flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE |
-		RT6_LOOKUP_F_REACHABLE;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-	dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
-	if (dst->dev->flags & IFF_LOOPBACK)
-		goto drop;
-
-	skb_dst_set(skb, dst);
+	lookup_nexthop(skb, &slwt->nh6, 0);
 
 	return dst_input(skb);
 
@@ -174,42 +230,18 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 static int input_action_end_dx6(struct sk_buff *skb,
 				struct seg6_local_lwt *slwt)
 {
-	struct net *net = dev_net(skb->dev);
-	struct ipv6hdr *inner_hdr;
-	struct ipv6_sr_hdr *srh;
-	struct dst_entry *dst;
-	unsigned int off = 0;
-	struct flowi6 fl6;
-	bool use_nh;
-	int flags;
+	struct in6_addr *nhaddr = NULL;
 
 	/* this function accepts IPv6 encapsulated packets, with either
 	 * an SRH with SL=0, or no SRH.
 	 */
 
-	srh = get_srh(skb);
-	if (srh && srh->segments_left > 0)
-		goto drop;
-
-#ifdef CONFIG_IPV6_SEG6_HMAC
-	if (srh && !seg6_hmac_validate_skb(skb))
+	if (!decap_and_validate(skb, IPPROTO_IPV6))
 		goto drop;
-#endif
 
-	if (ipv6_find_hdr(skb, &off, IPPROTO_IPV6, NULL, NULL) < 0)
-		goto drop;
-
-	if (!pskb_pull(skb, off))
+	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
 		goto drop;
 
-	skb_postpull_rcsum(skb, skb_network_header(skb), off);
-
-	skb_reset_network_header(skb);
-	skb_reset_transport_header(skb);
-	skb->encapsulation = 0;
-
-	inner_hdr = ipv6_hdr(skb);
-
 	/* The inner packet is not associated to any local interface,
 	 * so we do not call netif_rx().
 	 *
@@ -217,26 +249,10 @@ static int input_action_end_dx6(struct sk_buff *skb,
 	 * inner packet's DA. Otherwise, use the specified nexthop.
 	 */
 
-	use_nh = !ipv6_addr_any(&slwt->nh6);
+	if (!ipv6_addr_any(&slwt->nh6))
+		nhaddr = &slwt->nh6;
 
-	skb_dst_drop(skb);
-
-	fl6.flowi6_iif = skb->dev->ifindex;
-	fl6.daddr = use_nh ? slwt->nh6 : inner_hdr->daddr;
-	fl6.saddr = inner_hdr->saddr;
-	fl6.flowlabel = ip6_flowinfo(inner_hdr);
-	fl6.flowi6_mark = skb->mark;
-	fl6.flowi6_proto = inner_hdr->nexthdr;
-
-	flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_REACHABLE;
-	if (use_nh)
-		flags |= RT6_LOOKUP_F_IFACE;
-
-	dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
-	if (dst->dev->flags & IFF_LOOPBACK)
-		goto drop;
-
-	skb_dst_set(skb, dst);
+	lookup_nexthop(skb, nhaddr, 0);
 
 	return dst_input(skb);
 drop:
@@ -261,8 +277,7 @@ static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
@@ -276,16 +291,13 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 				     struct seg6_local_lwt *slwt)
 {
 	struct ipv6_sr_hdr *srh;
-	struct in6_addr *addr;
 	int err = -EINVAL;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-	ipv6_hdr(skb)->daddr = *addr;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
 	skb_reset_inner_headers(skb);
 	skb->encapsulation = 1;
@@ -297,8 +309,7 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 0/5] net: updates for IPv6 Segment Routing
From: David Lebrun @ 2017-08-23 15:31 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun

This patch series provides several updates for the SRv6 implementation. The
first patch leverages the existing infrastructure to support encapsulation
of IPv4 packets. The second patch implements the T.Encaps.L2 SR function,
enabling to encapsulate an L2 Ethernet frame within an IPv6+SRH packet.
The last three patches update the seg6local lightweight tunnel, and mainly
implement four new actions: End.T, End.DX2, End.DX4 and End.DT6.

David Lebrun (5):
  ipv6: sr: add support for ip4ip6 encapsulation
  ipv6: sr: add support for encapsulation of L2 frames
  ipv6: sr: enforce IPv6 packets for seg6local lwt
  ipv6: sr: add helper functions for seg6local
  ipv6: sr: implement additional seg6local actions

 include/net/seg6.h                 |   3 +-
 include/uapi/linux/seg6_iptunnel.h |  19 ++-
 net/ipv6/Kconfig                   |   1 +
 net/ipv6/seg6_iptunnel.c           |  65 ++++++--
 net/ipv6/seg6_local.c              | 314 ++++++++++++++++++++++++++++---------
 5 files changed, 313 insertions(+), 89 deletions(-)

-- 
2.10.2

^ permalink raw reply

* [PATCH net-next 2/5] ipv6: sr: add support for encapsulation of L2 frames
From: David Lebrun @ 2017-08-23 15:32 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170823153203.11951-1-david.lebrun@uclouvain.be>

This patch implements the L2 frame encapsulation mechanism, referred to
as T.Encaps.L2 in the SRv6 specifications [1].

A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only
accepts packets with an existing MAC header (i.e., it will not work for
locally generated packets). The resulting packet looks like IPv6 -> SRH ->
Ethernet -> original L3 payload. The next header field of the SRH is set to
NEXTHDR_NONE.

[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 include/uapi/linux/seg6_iptunnel.h | 19 +++++++++++++++----
 net/ipv6/seg6_iptunnel.c           | 18 ++++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
index b6e5a0a..63f0fe4 100644
--- a/include/uapi/linux/seg6_iptunnel.h
+++ b/include/uapi/linux/seg6_iptunnel.h
@@ -33,16 +33,27 @@ struct seg6_iptunnel_encap {
 enum {
 	SEG6_IPTUN_MODE_INLINE,
 	SEG6_IPTUN_MODE_ENCAP,
+	SEG6_IPTUN_MODE_L2ENCAP,
 };
 
 #ifdef __KERNEL__
 
 static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
 {
-	int encap = (tuninfo->mode == SEG6_IPTUN_MODE_ENCAP);
-
-	return ((tuninfo->srh->hdrlen + 1) << 3) +
-	       (encap * sizeof(struct ipv6hdr));
+	int head = 0;
+
+	switch (tuninfo->mode) {
+	case SEG6_IPTUN_MODE_INLINE:
+		break;
+	case SEG6_IPTUN_MODE_ENCAP:
+		head = sizeof(struct ipv6hdr);
+		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		head = sizeof(struct ipv6hdr) + 14;
+		break;
+	}
+
+	return ((tuninfo->srh->hdrlen + 1) << 3) + head;
 }
 
 #endif
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5bec781..93aa322 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -240,6 +240,22 @@ static int seg6_do_srh(struct sk_buff *skb)
 
 		skb->protocol = htons(ETH_P_IPV6);
 		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		if (!skb_mac_header_was_set(skb))
+			return -EINVAL;
+
+		if (pskb_expand_head(skb, skb->mac_len, 0, GFP_ATOMIC) < 0)
+			return -ENOMEM;
+
+		skb_mac_header_rebuild(skb);
+		skb_push(skb, skb->mac_len);
+
+		err = seg6_do_srh_encap(skb, tinfo->srh, NEXTHDR_NONE);
+		if (err)
+			return err;
+
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
 	}
 
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
@@ -386,6 +402,8 @@ static int seg6_build_state(struct nlattr *nla,
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
 		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 5/5] ipv6: sr: implement additional seg6local actions
From: David Lebrun @ 2017-08-23 15:33 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170823153203.11951-1-david.lebrun@uclouvain.be>

This patch implements the following seg6local actions.

- SEG6_LOCAL_ACTION_END_T: regular SRH processing and forward to the
  next-hop looked up in the specified routing table.

- SEG6_LOCAL_ACTION_END_DX2: decapsulate an L2 frame and forward it to
  the specified network interface.

- SEG6_LOCAL_ACTION_END_DX4: decapsulate an IPv4 packet and forward it,
  possibly to the specified next-hop.

- SEG6_LOCAL_ACTION_END_DT6: decapsulate an IPv6 packet and forward it
  to the next-hop looked up in the specified routing table.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/seg6_local.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 26db4d3e..9c1a885 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -30,6 +30,7 @@
 #ifdef CONFIG_IPV6_SEG6_HMAC
 #include <net/seg6_hmac.h>
 #endif
+#include <linux/etherdevice.h>
 
 struct seg6_local_lwt;
 
@@ -226,6 +227,82 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 	return -EINVAL;
 }
 
+static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+	struct ipv6_sr_hdr *srh;
+
+	srh = get_and_validate_srh(skb);
+	if (!srh)
+		goto drop;
+
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+
+	lookup_nexthop(skb, NULL, slwt->table);
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+/* decapsulate and forward inner L2 frame on specified interface */
+static int input_action_end_dx2(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	struct net *net = dev_net(skb->dev);
+	struct net_device *odev;
+	struct ethhdr *eth;
+
+	if (!decap_and_validate(skb, NEXTHDR_NONE))
+		goto drop;
+
+	if (!pskb_may_pull(skb, ETH_HLEN))
+		goto drop;
+
+	skb_reset_mac_header(skb);
+	eth = (struct ethhdr *)skb->data;
+
+	/* To determine the frame's protocol, we assume it is 802.3. This avoids
+	 * a call to eth_type_trans(), which is not really relevant for our
+	 * use case.
+	 */
+	if (!eth_proto_is_802_3(eth->h_proto))
+		goto drop;
+
+	odev = dev_get_by_index_rcu(net, slwt->oif);
+	if (!odev)
+		goto drop;
+
+	/* As we accept Ethernet frames, make sure the egress device is of
+	 * the correct type.
+	 */
+	if (odev->type != ARPHRD_ETHER)
+		goto drop;
+
+	if (!(odev->flags & IFF_UP) || !netif_carrier_ok(odev))
+		goto drop;
+
+	skb_orphan(skb);
+
+	if (skb_warn_if_lro(skb))
+		goto drop;
+
+	skb_forward_csum(skb);
+
+	if (skb->len - ETH_HLEN > odev->mtu)
+		goto drop;
+
+	skb->dev = odev;
+	skb->protocol = eth->h_proto;
+
+	return dev_queue_xmit(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
 /* decapsulate and forward to specified nexthop */
 static int input_action_end_dx6(struct sk_buff *skb,
 				struct seg6_local_lwt *slwt)
@@ -260,6 +337,56 @@ static int input_action_end_dx6(struct sk_buff *skb,
 	return -EINVAL;
 }
 
+static int input_action_end_dx4(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	struct iphdr *iph;
+	__be32 nhaddr;
+	int err;
+
+	if (!decap_and_validate(skb, IPPROTO_IPIP))
+		goto drop;
+
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		goto drop;
+
+	skb->protocol = htons(ETH_P_IP);
+
+	iph = ip_hdr(skb);
+
+	nhaddr = slwt->nh4.s_addr ?: iph->daddr;
+
+	skb_dst_drop(skb);
+
+	err = ip_route_input(skb, nhaddr, iph->saddr, 0, skb->dev);
+	if (err)
+		goto drop;
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+static int input_action_end_dt6(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	if (!decap_and_validate(skb, IPPROTO_IPV6))
+		goto drop;
+
+	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+		goto drop;
+
+	lookup_nexthop(skb, NULL, slwt->table);
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
 /* push an SRH on top of the current one */
 static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
@@ -330,11 +457,31 @@ static struct seg6_action_desc seg6_action_table[] = {
 		.input		= input_action_end_x,
 	},
 	{
+		.action		= SEG6_LOCAL_ACTION_END_T,
+		.attrs		= (1 << SEG6_LOCAL_TABLE),
+		.input		= input_action_end_t,
+	},
+	{
+		.action		= SEG6_LOCAL_ACTION_END_DX2,
+		.attrs		= (1 << SEG6_LOCAL_OIF),
+		.input		= input_action_end_dx2,
+	},
+	{
 		.action		= SEG6_LOCAL_ACTION_END_DX6,
 		.attrs		= (1 << SEG6_LOCAL_NH6),
 		.input		= input_action_end_dx6,
 	},
 	{
+		.action		= SEG6_LOCAL_ACTION_END_DX4,
+		.attrs		= (1 << SEG6_LOCAL_NH4),
+		.input		= input_action_end_dx4,
+	},
+	{
+		.action		= SEG6_LOCAL_ACTION_END_DT6,
+		.attrs		= (1 << SEG6_LOCAL_TABLE),
+		.input		= input_action_end_dt6,
+	},
+	{
 		.action		= SEG6_LOCAL_ACTION_END_B6,
 		.attrs		= (1 << SEG6_LOCAL_SRH),
 		.input		= input_action_end_b6,
-- 
2.10.2

^ permalink raw reply related

* [PATCH][next] net: hinic: fix comparison of a uint16_t type with -1
From: Colin King @ 2017-08-23 15:39 UTC (permalink / raw)
  To: Aviad Krawczyk, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The comparison of hw_ioctxt.rx_buf_sz_idx == -1 is always false because
rx_buf_sz_idx is a uint16_t. Fix this by explicitly casting -1 to uint16_t.

Detected by CoverityScan, CID#1454559 ("Operands don't affect result")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 09dec6de8dd5..71e26070fb7f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -352,7 +352,7 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, unsigned int rq_depth,
 		}
 	}
 
-	if (hw_ioctxt.rx_buf_sz_idx == -1)
+	if (hw_ioctxt.rx_buf_sz_idx == (uint16_t)-1)
 		return -EINVAL;
 
 	hw_ioctxt.sq_depth  = ilog2(sq_depth);
-- 
2.14.1


^ permalink raw reply related

* [PATCH net-next] bpf, doc: Add arm32 as arch supporting eBPF JIT
From: Shubham Bansal @ 2017-08-23 15:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, ast, Shubham Bansal

As eBPF JIT support for arm32 was added recently with
commit 39c13c204bb1150d401e27d41a9d8b332be47c49, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.

Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
---
 Documentation/networking/filter.txt | 4 ++--
 Documentation/sysctl/net.txt        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 6a0df8d..e5e33ba 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -596,8 +596,8 @@ skb pointer). All constraints and restrictions from bpf_check_classic() apply
 before a conversion to the new layout is being done behind the scenes!
 
 Currently, the classic BPF format is being used for JITing on most 32-bit
-architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform JIT
-compilation from eBPF instruction set.
+architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform
+JIT compilation from eBPF instruction set.
 
 Some core changes of the new internal format:
 
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 28596e0..b67044a 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -46,13 +46,13 @@ translate these BPF proglets into native CPU instructions. There are
 two flavors of JITs, the newer eBPF JIT currently supported on:
   - x86_64
   - arm64
+  - arm32
   - ppc64
   - sparc64
   - mips64
   - s390x
 
 And the older cBPF JIT supported on the following archs:
-  - arm
   - mips
   - ppc
   - sparc
-- 
2.7.4

^ permalink raw reply related

* [PATCH] e1000e: changed some expensive calls of udelay to usleep_range
From: Matthew Tan @ 2017-08-23 15:59 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: michael.kardonik, carolyn.wyborny, donald.c.skidmore,
	bruce.w.allan, john.ronciak, mitch.a.williams, intel-wired-lan,
	netdev, linux-kernel, Matthew Tan

    Calls to udelay are not preemtable by userspace so userspace
    applications experience a large (~200us) latency when running on core
    0. Instead usleep_range can be used to be more friendly to userspace
    since it is preemtable. This is due to udelay using busy-wait loops
    while usleep_rang uses hrtimers instead. It is recommended to use
    udelay when the delay is <10us since at that precision overhead of
    usleep_range hrtimer setup causes issues. However, the replaced calls
    are for 50us and 100us so this should not be not an issue.

Signed-off-by: Matthew Tan <matthew.tan_1@nxp.com>
---
 drivers/net/ethernet/intel/e1000e/phy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index de13aea..e318fdc 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -158,7 +158,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
 	 * the lower time out
 	 */
 	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-		udelay(50);
+		usleep_range(40, 60);
 		mdic = er32(MDIC);
 		if (mdic & E1000_MDIC_READY)
 			break;
@@ -183,7 +183,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
 	 * reading duplicate data in the next MDIC transaction.
 	 */
 	if (hw->mac.type == e1000_pch2lan)
-		udelay(100);
+		usleep_range(90, 100);
 
 	return 0;
 }
@@ -222,7 +222,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 	 * the lower time out
 	 */
 	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-		udelay(50);
+		usleep_range(40, 60);
 		mdic = er32(MDIC);
 		if (mdic & E1000_MDIC_READY)
 			break;
@@ -246,7 +246,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 	 * reading duplicate data in the next MDIC transaction.
 	 */
 	if (hw->mac.type == e1000_pch2lan)
-		udelay(100);
+		usleep_range(90, 110);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [EXT] Re: [PATCH net-next 10/18] net: mvpp2: use the GoP interrupt for link status changes
From: Antoine Tenart @ 2017-08-23 16:04 UTC (permalink / raw)
  To: Stefan Chulski
  Cc: Antoine Tenart, Andrew Lunn, davem@davemloft.net,
	jason@lakedaemon.net, gregory.clement@free-electrons.com,
	sebastian.hesselbarth@gmail.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux@armlinux.org.uk, mw@semihalf.com, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <5ebd192c54984b25adda0a294035b1b9@IL-EXCH01.marvell.com>

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

On Wed, Aug 23, 2017 at 03:24:55PM +0000, Stefan Chulski wrote:
> > When the cable is connected (there is signal) and the serdes is in sync and AN
> > succeeded.
> > 
> > > With SFF/SFP ports, you generally need a gpio line the fibre module
> > > can use to indicate if it has link. Fixed-phy has such support, and
> > > your link_change function will get called when the link changes.
> > 
> > So that would work when using SFP modules but I wonder if the GoP irq isn't
> > needed when using passive cable, in which case this patch would still be needed
> > (and of course we should support the new Russell phylib capabilities).
> 
> Even if new phylib driver supports passive direct cables connection, 
> GoP IRQ required for SOHO/Peridot external switch support.
> SOHO/Peridot external switch could be connected directly to Serdes line.

So I guess the GoP link irq patches are needed. Should I resend them
then?

We'll have to discuss how to handle fixed-phy vs GoP IRQ, but I guess we
can do this when adding the fixed-phy support later.

Thanks,
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox