Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 18:03 UTC (permalink / raw)
  To: Chenbo Feng, SELinux, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CABXk95ATb_AFk+4GX9Xw+HEU6No8irb0mOoLE9O4EBuLAgA-1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1207 bytes --]

Disregard this email. Re-sending in plain-text mode to prevent rejection by
netdev list.

On Fri, Aug 25, 2017 at 10:56 AM Jeffrey Vander Stoep <jeffv-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:

> I’d like to get your thoughts on adding LSM permission checks on BPF
> objects.
>
> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
> bpf() functions. This seems like poor granularity. [2]
>
> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
>
> Let me know what you think. Are there simpler alternative approaches that
> we haven’t considered?
>
> Thanks!
> Jeff
>
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively
> allowing all processes permission to use bpf() goes against the principle
> of least privilege exposing a lot of kernel attack surface to processes
> that do not actually need it.
>

[-- Attachment #2: Type: text/html, Size: 1660 bytes --]

^ permalink raw reply

* Permissions for eBPF objects
From: Jeffrey Vander Stoep @ 2017-08-25 18:01 UTC (permalink / raw)
  To: Chenbo Feng, netdev, SELinux

I’d like to get your thoughts on adding LSM permission checks on BPF objects.

By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
to bpf() functions. This seems like poor granularity. [2]

Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.

Let me know what you think. Are there simpler alternative approaches
that we haven’t considered?

Thanks!
Jeff

[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges.
Alternatively allowing all processes permission to use bpf() goes
against the principle of least privilege exposing a lot of kernel
attack surface to processes that do not actually need it.

^ permalink raw reply

* Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 17:56 UTC (permalink / raw)
  To: Chenbo Feng, SELinux, netdev-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 957 bytes --]

I’d like to get your thoughts on adding LSM permission checks on BPF
objects.

By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
bpf() functions. This seems like poor granularity. [2]

Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.

Let me know what you think. Are there simpler alternative approaches that
we haven’t considered?

Thanks!
Jeff

[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively
allowing all processes permission to use bpf() goes against the principle
of least privilege exposing a lot of kernel attack surface to processes
that do not actually need it.

[-- Attachment #2: Type: text/html, Size: 1175 bytes --]

^ permalink raw reply

* [PATCH net 3/3] nfp: remove incorrect mask check for vlan matching
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.

Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index b365110..d25b503 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -42,6 +42,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 			    struct tc_cls_flower_offload *flow, u8 key_type,
 			    bool mask_version)
 {
+	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
 	struct flow_dissector_key_vlan *flow_vlan;
 	u16 tmp_tci;
 
@@ -50,15 +51,10 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 	frame->nfp_flow_key_layer = key_type;
 	frame->mask_id = ~0;
 
-	if (mask_version) {
-		frame->tci = cpu_to_be16(~0);
-		return;
-	}
-
 	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
 		flow_vlan = skb_flow_dissector_target(flow->dissector,
 						      FLOW_DISSECTOR_KEY_VLAN,
-						      flow->key);
+						      target);
 		/* Populate the tci field. */
 		if (flow_vlan->vlan_id) {
 			tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 2/3] nfp: fix supported key layers calculation
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.

Fixes: af9d842c1354 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 6c8ecc2..74a96d6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -107,6 +107,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 {
 	struct flow_dissector_key_basic *mask_basic = NULL;
 	struct flow_dissector_key_basic *key_basic = NULL;
+	struct flow_dissector_key_ip *mask_ip = NULL;
 	u32 key_layer_two;
 	u8 key_layer;
 	int key_size;
@@ -132,6 +133,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 						      flow->key);
 	}
 
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_IP))
+		mask_ip = skb_flow_dissector_target(flow->dissector,
+						    FLOW_DISSECTOR_KEY_IP,
+						    flow->mask);
+
 	key_layer_two = 0;
 	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
 	key_size = sizeof(struct nfp_flower_meta_one) +
@@ -142,11 +148,19 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		/* Ethernet type is present in the key. */
 		switch (key_basic->n_proto) {
 		case cpu_to_be16(ETH_P_IP):
+			if (mask_ip && mask_ip->tos)
+				return -EOPNOTSUPP;
+			if (mask_ip && mask_ip->ttl)
+				return -EOPNOTSUPP;
 			key_layer |= NFP_FLOWER_LAYER_IPV4;
 			key_size += sizeof(struct nfp_flower_ipv4);
 			break;
 
 		case cpu_to_be16(ETH_P_IPV6):
+			if (mask_ip && mask_ip->tos)
+				return -EOPNOTSUPP;
+			if (mask_ip && mask_ip->ttl)
+				return -EOPNOTSUPP;
 			key_layer |= NFP_FLOWER_LAYER_IPV6;
 			key_size += sizeof(struct nfp_flower_ipv6);
 			break;
@@ -157,6 +171,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		case cpu_to_be16(ETH_P_ARP):
 			return -EOPNOTSUPP;
 
+		/* Currently we do not offload MPLS. */
+		case cpu_to_be16(ETH_P_MPLS_UC):
+		case cpu_to_be16(ETH_P_MPLS_MC):
+			return -EOPNOTSUPP;
+
 		/* Will be included in layer 2. */
 		case cpu_to_be16(ETH_P_8021Q):
 			break;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/3] nfp: fix unchecked flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.

Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 133 +++++++++++----------
 .../net/ethernet/netronome/nfp/flower/offload.c    |  41 ++++---
 2 files changed, 93 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 0e08404..b365110 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -45,6 +45,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 	struct flow_dissector_key_vlan *flow_vlan;
 	u16 tmp_tci;
 
+	memset(frame, 0, sizeof(struct nfp_flower_meta_two));
 	/* Populate the metadata frame. */
 	frame->nfp_flow_key_layer = key_type;
 	frame->mask_id = ~0;
@@ -54,21 +55,20 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 		return;
 	}
 
-	flow_vlan = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_VLAN,
-					      flow->key);
-
-	/* Populate the tci field. */
-	if (!flow_vlan->vlan_id) {
-		tmp_tci = 0;
-	} else {
-		tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
-				     flow_vlan->vlan_priority) |
-			  FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
-				     flow_vlan->vlan_id) |
-			  NFP_FLOWER_MASK_VLAN_CFI;
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
+		flow_vlan = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_VLAN,
+						      flow->key);
+		/* Populate the tci field. */
+		if (flow_vlan->vlan_id) {
+			tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
+					     flow_vlan->vlan_priority) |
+				  FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
+					     flow_vlan->vlan_id) |
+				  NFP_FLOWER_MASK_VLAN_CFI;
+			frame->tci = cpu_to_be16(tmp_tci);
+		}
 	}
-	frame->tci = cpu_to_be16(tmp_tci);
 }
 
 static void
@@ -99,17 +99,18 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
 		       bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_eth_addrs *flow_mac;
-
-	flow_mac = skb_flow_dissector_target(flow->dissector,
-					     FLOW_DISSECTOR_KEY_ETH_ADDRS,
-					     target);
+	struct flow_dissector_key_eth_addrs *addr;
 
 	memset(frame, 0, sizeof(struct nfp_flower_mac_mpls));
 
-	/* Populate mac frame. */
-	ether_addr_copy(frame->mac_dst, &flow_mac->dst[0]);
-	ether_addr_copy(frame->mac_src, &flow_mac->src[0]);
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_ETH_ADDRS,
+						 target);
+		/* Populate mac frame. */
+		ether_addr_copy(frame->mac_dst, &addr->dst[0]);
+		ether_addr_copy(frame->mac_src, &addr->src[0]);
+	}
 
 	if (mask_version)
 		frame->mpls_lse = cpu_to_be32(~0);
@@ -121,14 +122,17 @@ nfp_flower_compile_tport(struct nfp_flower_tp_ports *frame,
 			 bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ports *flow_tp;
+	struct flow_dissector_key_ports *tp;
 
-	flow_tp = skb_flow_dissector_target(flow->dissector,
-					    FLOW_DISSECTOR_KEY_PORTS,
-					    target);
+	memset(frame, 0, sizeof(struct nfp_flower_tp_ports));
 
-	frame->port_src = flow_tp->src;
-	frame->port_dst = flow_tp->dst;
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
+		tp = skb_flow_dissector_target(flow->dissector,
+					       FLOW_DISSECTOR_KEY_PORTS,
+					       target);
+		frame->port_src = tp->src;
+		frame->port_dst = tp->dst;
+	}
 }
 
 static void
@@ -137,25 +141,27 @@ nfp_flower_compile_ipv4(struct nfp_flower_ipv4 *frame,
 			bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ipv4_addrs *flow_ipv4;
-	struct flow_dissector_key_basic *flow_basic;
-
-	flow_ipv4 = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_IPV4_ADDRS,
-					      target);
+	struct flow_dissector_key_ipv4_addrs *addr;
+	struct flow_dissector_key_basic *basic;
 
-	flow_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       target);
-
-	/* Populate IPv4 frame. */
-	frame->reserved = 0;
-	frame->ipv4_src = flow_ipv4->src;
-	frame->ipv4_dst = flow_ipv4->dst;
-	frame->proto = flow_basic->ip_proto;
 	/* Wildcard TOS/TTL for now. */
-	frame->tos = 0;
-	frame->ttl = 0;
+	memset(frame, 0, sizeof(struct nfp_flower_ipv4));
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_IPV4_ADDRS,
+						 target);
+		frame->ipv4_src = addr->src;
+		frame->ipv4_dst = addr->dst;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		basic = skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_BASIC,
+						  target);
+		frame->proto = basic->ip_proto;
+	}
 }
 
 static void
@@ -164,26 +170,27 @@ nfp_flower_compile_ipv6(struct nfp_flower_ipv6 *frame,
 			bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ipv6_addrs *flow_ipv6;
-	struct flow_dissector_key_basic *flow_basic;
-
-	flow_ipv6 = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_IPV6_ADDRS,
-					      target);
-
-	flow_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       target);
+	struct flow_dissector_key_ipv6_addrs *addr;
+	struct flow_dissector_key_basic *basic;
 
-	/* Populate IPv6 frame. */
-	frame->reserved = 0;
-	frame->ipv6_src = flow_ipv6->src;
-	frame->ipv6_dst = flow_ipv6->dst;
-	frame->proto = flow_basic->ip_proto;
 	/* Wildcard LABEL/TOS/TTL for now. */
-	frame->ipv6_flow_label_exthdr = 0;
-	frame->tos = 0;
-	frame->ttl = 0;
+	memset(frame, 0, sizeof(struct nfp_flower_ipv6));
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_IPV6_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_IPV6_ADDRS,
+						 target);
+		frame->ipv6_src = addr->src;
+		frame->ipv6_dst = addr->dst;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		basic = skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_BASIC,
+						  target);
+		frame->proto = basic->ip_proto;
+	}
 }
 
 int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 4ad10bd..6c8ecc2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -105,35 +105,40 @@ static int
 nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 				struct tc_cls_flower_offload *flow)
 {
-	struct flow_dissector_key_control *mask_enc_ctl;
-	struct flow_dissector_key_basic *mask_basic;
-	struct flow_dissector_key_basic *key_basic;
+	struct flow_dissector_key_basic *mask_basic = NULL;
+	struct flow_dissector_key_basic *key_basic = NULL;
 	u32 key_layer_two;
 	u8 key_layer;
 	int key_size;
 
-	mask_enc_ctl = skb_flow_dissector_target(flow->dissector,
-						 FLOW_DISSECTOR_KEY_ENC_CONTROL,
-						 flow->mask);
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+		struct flow_dissector_key_control *mask_enc_ctl =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_CONTROL,
+						  flow->mask);
+		/* We are expecting a tunnel. For now we ignore offloading. */
+		if (mask_enc_ctl->addr_type)
+			return -EOPNOTSUPP;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		mask_basic = skb_flow_dissector_target(flow->dissector,
+						       FLOW_DISSECTOR_KEY_BASIC,
+						       flow->mask);
 
-	mask_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       flow->mask);
+		key_basic = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_BASIC,
+						      flow->key);
+	}
 
-	key_basic = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_BASIC,
-					      flow->key);
 	key_layer_two = 0;
 	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
 	key_size = sizeof(struct nfp_flower_meta_one) +
 		   sizeof(struct nfp_flower_in_port) +
 		   sizeof(struct nfp_flower_mac_mpls);
 
-	/* We are expecting a tunnel. For now we ignore offloading. */
-	if (mask_enc_ctl->addr_type)
-		return -EOPNOTSUPP;
-
-	if (mask_basic->n_proto) {
+	if (mask_basic && mask_basic->n_proto) {
 		/* Ethernet type is present in the key. */
 		switch (key_basic->n_proto) {
 		case cpu_to_be16(ETH_P_IP):
@@ -166,7 +171,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		}
 	}
 
-	if (mask_basic->ip_proto) {
+	if (mask_basic && mask_basic->ip_proto) {
 		/* Ethernet type is present in the key. */
 		switch (key_basic->ip_proto) {
 		case IPPROTO_TCP:
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/3] fix layer calculation and flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren

Hi,

Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.

Pieter Jansen van Vuuren (3):
  nfp: fix unchecked flow dissector use
  nfp: fix supported key layers calculation
  nfp: remove incorrect mask check for vlan matching

 drivers/net/ethernet/netronome/nfp/flower/match.c  | 139 +++++++++++----------
 .../net/ethernet/netronome/nfp/flower/offload.c    |  60 ++++++---
 2 files changed, 113 insertions(+), 86 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH] net: stmmac: dwmac-sun8i: Use reset exclusive
From: Maxime Ripard @ 2017-08-25 17:12 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: peppe.cavallaro, alexandre.torgue, wens, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20170825151733.GB9475@Red>

[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]

On Fri, Aug 25, 2017 at 05:17:33PM +0200, Corentin Labbe wrote:
> On Fri, Aug 25, 2017 at 04:48:32PM +0200, Maxime Ripard wrote:
> > On Fri, Aug 25, 2017 at 04:38:05PM +0200, Corentin Labbe wrote:
> > > The current dwmac_sun8i module cannot be rmmod/modprobe due to that
> > > the reset controller was not released when removed.
> > > 
> > > This patch remove ambiguity, by using of_reset_control_get_exclusive and
> > > add the missing reset_control_put().
> > > 
> > > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > ---
> > >  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > index fffd6d5fc907..675a09629d85 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > @@ -782,6 +782,7 @@ static int sun8i_dwmac_unpower_internal_phy(struct sunxi_priv_data *gmac)
> > >  
> > >  	clk_disable_unprepare(gmac->ephy_clk);
> > >  	reset_control_assert(gmac->rst_ephy);
> > > +	reset_control_put(gmac->rst_ephy);
> > >  	return 0;
> > >  }
> > >  
> > > @@ -942,7 +943,7 @@ static int sun8i_dwmac_probe(struct platform_device *pdev)
> > >  			return -EINVAL;
> > >  		}
> > >  
> > > -		gmac->rst_ephy = of_reset_control_get(plat_dat->phy_node, NULL);
> > > +		gmac->rst_ephy = of_reset_control_get_exclusive(plat_dat->phy_node, NULL);
> > 
> > Why not just use devm_reset_control_get?
> > 
> 
> Because there no devm_ functions with of_

devm_reset_control_get uses of_reset_control_get internally.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net-next] xen-netback: update ubuf_info initialization to anonymous union
From: Willem de Bruijn @ 2017-08-25 17:10 UTC (permalink / raw)
  To: netdev; +Cc: davem, wei.liu2, paul.durrant, kbuild-all, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.

This breaks compilation of xen-netback with older compilers.
>From kbuild bot with gcc-4.4.7:

   drivers/net//xen-netback/interface.c: In function
   'xenvif_init_queue':
   >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
   >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
      drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
   >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
   >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.

Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/xen-netback/interface.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e322a862ddfe..ee8ed9da00ad 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -551,8 +551,8 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 	for (i = 0; i < MAX_PENDING_REQS; i++) {
 		queue->pending_tx_info[i].callback_struct = (struct ubuf_info)
 			{ .callback = xenvif_zerocopy_callback,
-			  .ctx = NULL,
-			  .desc = i };
+			  { { .ctx = NULL,
+			      .desc = i } } };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
 	}
 
-- 
2.14.1.342.g6490525c54-goog

^ permalink raw reply related

* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Antoine Tenart @ 2017-08-25 17:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, kishon, andrew, jason,
	sebastian.hesselbarth, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <1faa09d8-305a-f6d2-facf-d28f336c849a@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

Hi Florian,

On Fri, Aug 25, 2017 at 09:49:15AM -0700, Florian Fainelli wrote:
> On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> > This patch adds an extra check when the link_event function is called,
> > so that it won't do anything when the netif isn't running.
> 
> Why is this needed? Are you possibly starting the PHY state machine
> earlier than your ndo_open() call? Looking quickly through the driver
> does not suggest this is going on since you properly connect to the PHY
> in mvpp2_open() and start the PHY there.

I added some checks while working on this, and kept this one. But I
looked at the driver again and I assume you're right and the patch could
be dropped.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Florian Fainelli @ 2017-08-25 16:49 UTC (permalink / raw)
  To: Antoine Tenart, davem, kishon, andrew, jason,
	sebastian.hesselbarth, gregory.clement
  Cc: thomas.petazzoni, nadavh, linux, linux-kernel, mw, stefanc,
	miquel.raynal, netdev
In-Reply-To: <20170825144821.31129-9-antoine.tenart@free-electrons.com>

On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> This patch adds an extra check when the link_event function is called,
> so that it won't do anything when the netif isn't running.

Why is this needed? Are you possibly starting the PHY state machine
earlier than your ndo_open() call? Looking quickly through the driver
does not suggest this is going on since you properly connect to the PHY
in mvpp2_open() and start the PHY there.

> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
> index 1b26f5ed994f..49a6789a4142 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5741,6 +5741,9 @@ static void mvpp2_link_event(struct net_device *dev)
>  	struct mvpp2_port *port = netdev_priv(dev);
>  	struct phy_device *phydev = dev->phydev;
>  
> +	if (!netif_running(dev))
> +		return;
> +
>  	if (phydev->link) {
>  		if ((port->speed != phydev->speed) ||
>  		    (port->duplex != phydev->duplex)) {
> 


-- 
Florian

^ permalink raw reply

* Re: Possible race in nsc-ircc.ko
From: Jean Tourrilhes @ 2017-08-25 16:48 UTC (permalink / raw)
  To: Anton Volkov
  Cc: dagb, samuel, netdev, linux-kernel, ldv-project,
	Alexey Khoroshilov
In-Reply-To: <4a33a281-f8bd-305b-e580-0e594feea799@ispras.ru>

On Fri, Aug 25, 2017 at 05:05:25PM +0300, Anton Volkov wrote:
> Hello.
> 
> While searching for races in the Linux kernel I've come across
> "drivers/net/irda/nsc-ircc.ko" module. Here is a question that I came up
> with while analyzing results. Lines are given using the info from Linux
> v4.12.
> 
> Consider the following case:
> 
> Thread 1:                   Thread 2:
> nsc_ircc_init
> ->nsc_ircc_open
>     self = netdev_priv(dev)
>     register_netdev(dev)
>                             nsc_ircc_net_ioctl
>                             ->nsc_ircc_change_speed
>     self->dongle_id = ...       <READ self->io.dongle_id>
>     (nsc-ircc.c: line 485)      (nsc-ircc.c: line 1318)
>     platform_device_register_simple
> 
> Before the initialization of self->dongle_id in msc_ircc_open() its value is
> 0. Thus if read access to its value in nsc_ircc_change_speed occurs before
> the initialization there will be an attempt to change speed of dongle with
> undesired id (if the dongle with id 0 exists). Is this case feasible from
> your point of view?
> 
> Thank you for your time.
> 
> -- Anton Volkov

	A first glance, that seems like a valid race. I'm not sure if
there is a netdev lock/status to protect the driver, because it looks
like doing any operation on a device before "open" has completed would
be dangerous for most drivers.
	I don't have time to check the code paths, as I have not
looked at that code in ages.
	Good luck !

	Jean

^ permalink raw reply

* Re: [PATCH net-next v2] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25 16:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list
In-Reply-To: <20170825035824.26935-1-f.fainelli@gmail.com>

On 08/24/2017 08:58 PM, Florian Fainelli wrote:
> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb_any() to be drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 327dfe5bedc0..a90e459c5b8a 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
>  		buffer_info->dma = 0;
>  	}
>  	if (buffer_info->skb) {
> -		dev_kfree_skb_any(buffer_info->skb);
> +		dev_consume_skb_any(buffer_info->skb);
>  		buffer_info->skb = NULL;
>  	}
>  	buffer_info->time_stamp = 0;
> @@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct *work)
>  		wmb(); /* force write prior to skb_tstamp_tx */
>  
>  		skb_tstamp_tx(skb, &shhwtstamps);
> -		dev_kfree_skb_any(skb);
> +		dev_consume_skb_any(skb);
>  	} else if (time_after(jiffies, adapter->tx_hwtstamp_start
>  			      + adapter->tx_timeout_factor * HZ)) {
>  		dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
> @@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
>  		}
>  
>  		if (buffer_info->skb) {
> -			dev_kfree_skb(buffer_info->skb);
> +			dev_consume_skb_any(buffer_info->skb);

This one is not actually needed since dev_kfree_skb() is an alias for
consume_skb().

>  			buffer_info->skb = NULL;
>  		}
>  
> @@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
>  
>  	/* there also may be some cached data from a chained receive */
>  	if (rx_ring->rx_skb_top) {
> -		dev_kfree_skb(rx_ring->rx_skb_top);
> +		dev_consume_skb_any(rx_ring->rx_skb_top);

Same here.

I will make a repost of this patch later, Jeff please drop it if you
have it queued already, thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: stmmac: Handle possible fixed-link with need_mdio_ids
From: Florian Fainelli @ 2017-08-25 16:45 UTC (permalink / raw)
  To: Andrew Lunn, Corentin Labbe
  Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <20170825162827.GA4207@lunn.ch>

On 08/25/2017 09:28 AM, Andrew Lunn wrote:
> On Fri, Aug 25, 2017 at 04:42:08PM +0200, Corentin Labbe wrote:
>> In case of fixed link, there are no mdio node.
>> This patch add a test for fixed-link for bypassing MDIO node register.
> 
> The two are not mutually exclusive. E.g.
> vf610-zii-dev.dtsi/vf610-zii-dev-rev-b.dts.  It has a fixed-link on
> the FEC ethernet controller, and an Ethernet switch on the MDIO bus.
> 
> If anybody ever wants to use a switch with the stmmac, this will be
> required.

This is already done in the Lamobo R1 DTS file so it would be nice not
to break this use case:

&gmac {
        pinctrl-names = "default";
        pinctrl-0 = <&gmac_pins_rgmii_a>;
        phy-mode = "rgmii";
        phy-supply = <&reg_gmac_3v3>;
        status = "okay";

        fixed-link {
                speed = <1000>;
                full-duplex;
        };

        mdio {
                compatible = "snps,dwmac-mdio";
                #address-cells = <1>;
                #size-cells = <0>;

                switch: ethernet-switch@1e {
                        compatible = "brcm,bcm53125";
                        reg = <30>;
                        #address-cells = <1>;
                        #size-cells = <0>;


> 
> 	Andrew
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25 16:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, andrew, Sebastian Hesselbarth, open list
In-Reply-To: <1503635699.2499.100.camel@edumazet-glaptop3.roam.corp.google.com>

On 08/24/2017 09:34 PM, Eric Dumazet wrote:
> On Thu, 2017-08-24 at 20:55 -0700, Florian Fainelli wrote:
>> txq_reclaim() does the normal transmit queue reclamation and
>> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
>> so use dev_consume_skb() for both locations.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> index fb2d533ae4ef..81c1fac00d33 100644
>> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
>> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> @@ -1121,7 +1121,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
>>  			struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
>>  
>>  			if (!WARN_ON(!skb))
>> -				dev_kfree_skb(skb);
>> +				dev_consume_skb_any(skb);
>>  		}
>>  
>>  		if (cmd_sts & ERROR_SUMMARY) {
>> @@ -2024,7 +2024,7 @@ static void rxq_deinit(struct rx_queue *rxq)
>>  
>>  	for (i = 0; i < rxq->rx_ring_size; i++) {
>>  		if (rxq->rx_skb[i]) {
>> -			dev_kfree_skb(rxq->rx_skb[i]);
>> +			dev_consume_skb_any(rxq->rx_skb[i]);
>>  			rxq->rx_desc_count--;
>>  		}
>>  	}
> 
> 
> I do not believe this patch is needed.
> 
> dev_kfree_skb() is an alias of consume_skb(), which is already drop
> monitor ready ;)

You are right, this patch is/was not needed, I have been too trigger
happy with the dev_kfree_skb*() replacement, but only
dev_kfree_skb_{any,irq} are valid candidates for replacement by
dev_consume_skb_{any,irq}. I will re-audit my other two submissions to
e1000e and r8169. Thanks Eric.
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: stmmac: Handle possible fixed-link with need_mdio_ids
From: Andrew Lunn @ 2017-08-25 16:28 UTC (permalink / raw)
  To: Corentin Labbe; +Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <20170825144208.24503-1-clabbe.montjoie@gmail.com>

On Fri, Aug 25, 2017 at 04:42:08PM +0200, Corentin Labbe wrote:
> In case of fixed link, there are no mdio node.
> This patch add a test for fixed-link for bypassing MDIO node register.

The two are not mutually exclusive. E.g.
vf610-zii-dev.dtsi/vf610-zii-dev-rev-b.dts.  It has a fixed-link on
the FEC ethernet controller, and an Ethernet switch on the MDIO bus.

If anybody ever wants to use a switch with the stmmac, this will be
required.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
From: Jiri Benc @ 2017-08-25 16:25 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, e, blp, jan.scheurich
In-Reply-To: <1503670805-31051-3-git-send-email-yi.y.yang@intel.com>

On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -766,7 +766,7 @@ struct sk_buff {
>  	__u8			ndisc_nodetype:2;
>  #endif
>  	__u8			ipvs_property:1;
> -	__u8			inner_protocol_type:1;
> +	__u8			inner_protocol_type:2;

Adding anything to sk_buff is pretty much forbidden. You can't add more
bytes to it and there are no more free bits to use, either.

Luckily, we still have one byte hole next to inner_ipproto that we can
use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3
and storing the L3 type in the unused byte. It's not beautiful (would
be better to use ethertype than a custom enum) but it will work.

While looking at this, I realized that GSO for VXLAN-GPE is broken,
too. Let me fix it by implementing what I described above which will
make your patch much easier.

 Jiri

^ permalink raw reply

* RE: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more
From: Keller, Jacob E @ 2017-08-25 16:24 UTC (permalink / raw)
  To: Stephen Hemminger, Waskiewicz Jr, Peter; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170825085816.3425a70c@xeon-e3>

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, August 25, 2017 8:58 AM
> To: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> xmit_more is only a hint to the device. The device driver should ignore it unless
> there are hardware advantages. The device driver is the place with HW specific
> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this
> device).
> 
> Anything that pushes that optimization out to the user is only useful for
> benchmarks
> and embedded devices.

Right so most drivers I've seen simply take it as a "avoid bumping tail of a ring" whenever they see xmit_more. But unfortunately in some circumstances, this results in potentially several hundred packets being set with xmit_more in a row, and then the driver doesn't bump the tail for a long time, resulting in high latency spikes..

I was trying to find a way to fix this potentially in multiple drivers, rather than just a single driver, since I figured the same sort of code might need to be needed.

So you're suggesting we should just perform some check in the device driver, even if it might be duplication?

We could also instead make it a setting in the netdev struct or something which would be set by the driver and then tell stack code to limit how many it sends at once (so that we don't need to duplicate that checking code in every driver?)

Thanks,
Jake

^ permalink raw reply

* [PATCH net-next 3/3] samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>

Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/tcbpf2_kern.c      | 63 +++++++++++++++++++++++++++++++++++++++++-
 samples/bpf/test_tunnel_bpf.sh | 29 +++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
index 270edcc149a1..370b749f5ee6 100644
--- a/samples/bpf/tcbpf2_kern.c
+++ b/samples/bpf/tcbpf2_kern.c
@@ -17,6 +17,7 @@
 #include <uapi/linux/pkt_cls.h>
 #include <net/ipv6.h>
 #include "bpf_helpers.h"
+#include "bpf_endian.h"
 
 #define _htonl __builtin_bswap32
 #define ERROR(ret) do {\
@@ -38,6 +39,10 @@ struct vxlan_metadata {
 	u32     gbp;
 };
 
+struct erspan_metadata {
+	__be32 index;
+};
+
 SEC("gre_set_tunnel")
 int _gre_set_tunnel(struct __sk_buff *skb)
 {
@@ -76,6 +81,63 @@ int _gre_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	md.index = htonl(123);
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x erspan index 0x%x\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	index = bpf_ntohl(md.index);
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, index);
+
+	return TC_ACT_OK;
+}
+
 SEC("vxlan_set_tunnel")
 int _vxlan_set_tunnel(struct __sk_buff *skb)
 {
@@ -378,5 +440,4 @@ int _ip6ip6_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
-
 char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
index a70d2ea90313..410052d9fc37 100755
--- a/samples/bpf/test_tunnel_bpf.sh
+++ b/samples/bpf/test_tunnel_bpf.sh
@@ -32,6 +32,19 @@ function add_gre_tunnel {
 	ip addr add dev $DEV 10.1.1.200/24
 }
 
+function add_erspan_tunnel {
+	# in namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 local 172.16.1.100 remote 172.16.1.200 erspan 123
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# out of namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
 function add_vxlan_tunnel {
 	# Set static ARP entry here because iptables set-mark works
 	# on L3 packet, as a result not applying to ARP packets,
@@ -99,6 +112,18 @@ function test_gre {
 	cleanup
 }
 
+function test_erspan {
+	TYPE=erspan
+	DEV_NS=erspan00
+	DEV=erspan11
+	config_device
+	add_erspan_tunnel
+	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+	ping -c 1 10.1.1.100
+	ip netns exec at_ns0 ping -c 1 10.1.1.200
+	cleanup
+}
+
 function test_vxlan {
 	TYPE=vxlan
 	DEV_NS=vxlan00
@@ -151,14 +176,18 @@ function cleanup {
 	ip link del gretap11
 	ip link del vxlan11
 	ip link del geneve11
+	ip link del erspan11
 	pkill tcpdump
 	pkill cat
 	set -ex
 }
 
+trap cleanup 0 2 3 6 9
 cleanup
 echo "Testing GRE tunnel..."
 test_gre
+echo "Testing ERSPAN tunnel..."
+test_erspan
 echo "Testing VXLAN tunnel..."
 test_vxlan
 echo "Testing GENEVE tunnel..."
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/3] gre: add collect_md mode to ERSPAN tunnel
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>

Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away.  OVS can use it as well in the future.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 include/net/ip_tunnels.h |   4 +-
 net/ipv4/ip_gre.c        | 102 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 625c29329372..992652856fe8 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -154,8 +154,10 @@ struct ip_tunnel {
 #define TUNNEL_GENEVE_OPT	__cpu_to_be16(0x0800)
 #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
 #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
+#define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
 
-#define TUNNEL_OPTIONS_PRESENT	(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT)
+#define TUNNEL_OPTIONS_PRESENT \
+		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
 
 struct tnl_ptk_info {
 	__be16 flags;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 453b7925b940..0162fb955b33 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -113,6 +113,8 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
+static void erspan_build_header(struct sk_buff *skb,
+				__be32 id, u32 index, bool truncate);
 
 static unsigned int ipgre_net_id __read_mostly;
 static unsigned int gre_tap_net_id __read_mostly;
@@ -287,7 +289,33 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 					   false, false) < 0)
 			goto drop;
 
-		tunnel->index = ntohl(index);
+		if (tunnel->collect_md) {
+			struct ip_tunnel_info *info;
+			struct erspan_metadata *md;
+			__be64 tun_id;
+			__be16 flags;
+
+			tpi->flags |= TUNNEL_KEY;
+			flags = tpi->flags;
+			tun_id = key32_to_tunnel_id(tpi->key);
+
+			tun_dst = ip_tun_rx_dst(skb, flags,
+						tun_id, sizeof(*md));
+			if (!tun_dst)
+				return PACKET_REJECT;
+
+			md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
+			if (!md)
+				return PACKET_REJECT;
+
+			md->index = index;
+			info = &tun_dst->u.tun_info;
+			info->key.tun_flags |= TUNNEL_ERSPAN_OPT;
+			info->options_len = sizeof(*md);
+		} else {
+			tunnel->index = ntohl(index);
+		}
+
 		skb_reset_mac_header(skb);
 		ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
 		return PACKET_RCVD;
@@ -523,6 +551,64 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 	dev->stats.tx_dropped++;
 }
 
+static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+			   __be16 proto)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct erspan_metadata *md;
+	struct rtable *rt = NULL;
+	bool truncate = false;
+	struct flowi4 fl;
+	int tunnel_hlen;
+	__be16 df;
+
+	tun_info = skb_tunnel_info(skb);
+	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     ip_tunnel_info_af(tun_info) != AF_INET))
+		goto err_free_skb;
+
+	key = &tun_info->key;
+
+	/* ERSPAN has fixed 8 byte GRE header */
+	tunnel_hlen = 8 + sizeof(struct erspanhdr);
+
+	rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+	if (!rt)
+		return;
+
+	if (gre_handle_offloads(skb, false))
+		goto err_free_rt;
+
+	if (skb->len > dev->mtu) {
+		pskb_trim(skb, dev->mtu);
+		truncate = true;
+	}
+
+	md = ip_tunnel_info_opts(tun_info);
+	if (!md)
+		goto err_free_rt;
+
+	erspan_build_header(skb, tunnel_id_to_key32(key->tun_id),
+			    ntohl(md->index), truncate);
+
+	gre_build_header(skb, 8, TUNNEL_SEQ,
+			 htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++));
+
+	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ?  htons(IP_DF) : 0;
+
+	iptunnel_xmit(skb->sk, rt, skb, fl.saddr, key->u.ipv4.dst, IPPROTO_GRE,
+		      key->tos, key->ttl, df, false);
+	return;
+
+err_free_rt:
+	ip_rt_put(rt);
+err_free_skb:
+	kfree_skb(skb);
+	dev->stats.tx_dropped++;
+}
+
 static int gre_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
@@ -636,6 +722,11 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	bool truncate = false;
 
+	if (tunnel->collect_md) {
+		erspan_fb_xmit(skb, dev, skb->protocol);
+		return NETDEV_TX_OK;
+	}
+
 	if (gre_handle_offloads(skb, false))
 		goto free_skb;
 
@@ -998,9 +1089,12 @@ static int erspan_validate(struct nlattr *tb[], struct nlattr *data[],
 		return ret;
 
 	/* ERSPAN should only have GRE sequence and key flag */
-	flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
-	flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
-	if (flags != (GRE_SEQ | GRE_KEY))
+	if (data[IFLA_GRE_OFLAGS])
+		flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]);
+	if (data[IFLA_GRE_IFLAGS])
+		flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]);
+	if (!data[IFLA_GRE_COLLECT_METADATA] &&
+	    flags != (GRE_SEQ | GRE_KEY))
 		return -EINVAL;
 
 	/* ERSPAN Session ID only has 10-bit. Since we reuse
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 1/3] gre: refactor the gre_fb_xmit
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1503678089-27131-1-git-send-email-u9012063@gmail.com>

The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 net/ipv4/ip_gre.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f70674799fdd..453b7925b940 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -432,39 +432,33 @@ static struct rtable *gre_get_rt(struct sk_buff *skb,
 	return ip_route_output_key(net, fl);
 }
 
-static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
-			__be16 proto)
+static struct rtable *prepare_fb_xmit(struct sk_buff *skb,
+				      struct net_device *dev,
+				      struct flowi4 *fl,
+				      int tunnel_hlen)
 {
 	struct ip_tunnel_info *tun_info;
 	const struct ip_tunnel_key *key;
 	struct rtable *rt = NULL;
-	struct flowi4 fl;
 	int min_headroom;
-	int tunnel_hlen;
-	__be16 df, flags;
 	bool use_cache;
 	int err;
 
 	tun_info = skb_tunnel_info(skb);
-	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
-		     ip_tunnel_info_af(tun_info) != AF_INET))
-		goto err_free_skb;
-
 	key = &tun_info->key;
 	use_cache = ip_tunnel_dst_cache_usable(skb, tun_info);
+
 	if (use_cache)
-		rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl.saddr);
+		rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl->saddr);
 	if (!rt) {
-		rt = gre_get_rt(skb, dev, &fl, key);
+		rt = gre_get_rt(skb, dev, fl, key);
 		if (IS_ERR(rt))
-				goto err_free_skb;
+			goto err_free_skb;
 		if (use_cache)
 			dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst,
-					  fl.saddr);
+					  fl->saddr);
 	}
 
-	tunnel_hlen = gre_calc_hlen(key->tun_flags);
-
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ tunnel_hlen + sizeof(struct iphdr);
 	if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
@@ -476,6 +470,37 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (unlikely(err))
 			goto err_free_rt;
 	}
+	return rt;
+
+err_free_rt:
+	ip_rt_put(rt);
+err_free_skb:
+	kfree_skb(skb);
+	dev->stats.tx_dropped++;
+	return NULL;
+}
+
+static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
+			__be16 proto)
+{
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct rtable *rt = NULL;
+	struct flowi4 fl;
+	int tunnel_hlen;
+	__be16 df, flags;
+
+	tun_info = skb_tunnel_info(skb);
+	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
+		     ip_tunnel_info_af(tun_info) != AF_INET))
+		goto err_free_skb;
+
+	key = &tun_info->key;
+	tunnel_hlen = gre_calc_hlen(key->tun_flags);
+
+	rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen);
+	if (!rt)
+		return;
 
 	/* Push Tunnel header. */
 	if (gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM)))
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 0/3] gre: add collect_md mode for ERSPAN tunnel
From: William Tu @ 2017-08-25 16:21 UTC (permalink / raw)
  To: netdev

This patch series provide collect_md mode for ERSPAN tunnel.  The fist patch
refactors the existing gre_fb_xmit function by exacting the route cache
portion into a new function called prepare_fb_xmit.  The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic.  The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.

Thank you

William Tu (3):
  gre: refactor the gre_fb_xmit
  gre: add collect_md mode to ERSPAN tunnel
  samples/bpf: extend test_tunnel_bpf.sh with ERSPAN

 include/net/ip_tunnels.h       |   4 +-
 net/ipv4/ip_gre.c              | 157 ++++++++++++++++++++++++++++++++++++-----
 samples/bpf/tcbpf2_kern.c      |  63 ++++++++++++++++-
 samples/bpf/test_tunnel_bpf.sh |  29 ++++++++
 4 files changed, 232 insertions(+), 21 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH 3/7] ieee802154: 6lowpan: make header_ops const
From: Marcel Holtmann @ 2017-08-25 16:17 UTC (permalink / raw)
  To: Bhumika Goyal
  Cc: julia.lawall, Gustavo F. Padovan, Johan Hedberg, David S. Miller,
	Pablo Neira Ayuso, kadlec, Florian Westphal, Stephen Hemminger,
	Alexander Aring, Stefan Schmidt, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, santosh.shilimkar, trond.myklebust,
	anna.schumaker, bfields, jlayton, open list:BLUETOOTH DRIVERS,
	Network Development, LKML, netfilter-devel
In-Reply-To: <1503670907-23221-4-git-send-email-bhumirks@gmail.com>

Hi Bhumika,

> Make this const as it is only stored as a reference in a const field of
> a net_device structure.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
> ---
> net/ieee802154/6lowpan/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel

^ permalink raw reply

* Re: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more
From: Stephen Hemminger @ 2017-08-25 15:58 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter; +Cc: Keller, Jacob E, netdev@vger.kernel.org
In-Reply-To: <E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>

On Fri, 25 Aug 2017 15:36:22 +0000
"Waskiewicz Jr, Peter" <peter.waskiewicz.jr@intel.com> wrote:

> On 8/25/17 11:25 AM, Jacob Keller wrote:
> > Under some circumstances, such as with many stacked devices, it is
> > possible that dev_hard_start_xmit will bundle many packets together, and
> > mark them all with xmit_more.
> > 
> > Most drivers respond to xmit_more by skipping tail bumps on packet
> > rings, or similar behavior as long as xmit_more is set. This is
> > a performance win since it means drivers can avoid notifying hardware of
> > new packets repeat daily, and thus avoid wasting unnecessary PCIe or other
> > bandwidth.
> > 
> > This use of xmit_more comes with a trade off because bundling too many
> > packets can increase latency of the Tx packets. To avoid this, we should
> > limit the maximum number of packets with xmit_more.
> > 
> > Driver authors could modify their drivers to check for some determined
> > limit, but this requires all drivers to be modified in order to gain
> > advantage.
> > 
> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
> > maximum number of xmit_more skbs to send in a sequence. This ensures
> > that all drivers benefit, and allows system administrators the option to
> > tune the value to their environment.
> > 
> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> > ---
> > 
> > Stray thoughts and further questions....
> > 
> > Is this the right approach? Did I miss any other places where we should
> > limit? Does the limit make sense? Should it instead be a per-device
> > tuning nob instead of a global? Is 32 a good default?  
> 
> I actually like the idea of a per-device knob.  A xmit_more_max that's 
> global in a system with 1GbE devices along with a 25/50GbE or more just 
> doesn't make much sense to me.  Or having heterogeneous vendor devices 
> in the same system that have different HW behaviors could mask issues 
> with latency.
> 
> This seems like another incarnation of possible buffer-bloat if the max 
> is too high...
> 
> > 
> >   Documentation/sysctl/net.txt |  6 ++++++
> >   include/linux/netdevice.h    |  2 ++
> >   net/core/dev.c               | 10 +++++++++-
> >   net/core/sysctl_net_core.c   |  7 +++++++
> >   4 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> > index b67044a2575f..3d995e8f4448 100644
> > --- a/Documentation/sysctl/net.txt
> > +++ b/Documentation/sysctl/net.txt
> > @@ -230,6 +230,12 @@ netdev_max_backlog
> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
> >   receives packets faster than kernel can process them.
> >   
> > +xmit_more_max
> > +-------------
> > +
> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
> > +indicates no limit.  
> 
> What defines "packet?"  MTU-sized packets, or payloads coming down from 
> the stack (e.g. TSO's)?

xmit_more is only a hint to the device. The device driver should ignore it unless
there are hardware advantages. The device driver is the place with HW specific
knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).

Anything that pushes that optimization out to the user is only useful for benchmarks
and embedded devices.

^ permalink raw reply

* Re: [PATCH net-next v2 00/14] net: mvpp2: comphy configuration
From: Antoine Tenart @ 2017-08-25 15:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, kishon, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux, linux-kernel,
	mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170825155111.GD30922@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

Hi Andrew,

On Fri, Aug 25, 2017 at 05:51:11PM +0200, Andrew Lunn wrote:
> >   - Checked if the carrier_on/off functions were needed. They are.
> 
> Could you explain the situations they are needed in?
> 
> Quite a few drivers do this, so i'm not saying it is wrong.  But it
> would be nice to understand if we are missing something in phylib.

I know it's not working without these calls, but I can't tell why (yet).
I can try to dig into this.

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox