Netdev List
* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Florian Fainelli @ 2017-08-25 16:49 UTC (permalink / raw)
  To: Antoine Tenart, davem, kishon, andrew, jason,
	sebastian.hesselbarth, gregory.clement
  Cc: thomas.petazzoni, nadavh, linux, linux-kernel, mw, stefanc,
	miquel.raynal, netdev
In-Reply-To: <20170825144821.31129-9-antoine.tenart@free-electrons.com>

On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> This patch adds an extra check when the link_event function is called,
> so that it won't do anything when the netif isn't running.

Why is this needed? Are you possibly starting the PHY state machine
earlier than your ndo_open() call? Looking quickly through the driver
does not suggest this is going on since you properly connect to the PHY
in mvpp2_open() and start the PHY there.

> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
> index 1b26f5ed994f..49a6789a4142 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -5741,6 +5741,9 @@ static void mvpp2_link_event(struct net_device *dev)
>  	struct mvpp2_port *port = netdev_priv(dev);
>  	struct phy_device *phydev = dev->phydev;
>  
> +	if (!netif_running(dev))
> +		return;
> +
>  	if (phydev->link) {
>  		if ((port->speed != phydev->speed) ||
>  		    (port->duplex != phydev->duplex)) {
> 


-- 
Florian


* Re: [PATCH net-next v2 08/14] net: mvpp2: check the netif is running in the link_event function
From: Antoine Tenart @ 2017-08-25 17:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, kishon, andrew, jason,
	sebastian.hesselbarth, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <1faa09d8-305a-f6d2-facf-d28f336c849a@gmail.com>


Hi Florian,

On Fri, Aug 25, 2017 at 09:49:15AM -0700, Florian Fainelli wrote:
> On 08/25/2017 07:48 AM, Antoine Tenart wrote:
> > This patch adds an extra check when the link_event function is called,
> > so that it won't do anything when the netif isn't running.
> 
> Why is this needed? Are you possibly starting the PHY state machine
> earlier than your ndo_open() call? Looking quickly through the driver
> does not suggest this is going on since you properly connect to the PHY
> in mvpp2_open() and start the PHY there.

I added some checks while working on this, and kept this one. But I
looked at the driver again and I assume you're right and the patch could
be dropped.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



* [PATCH net-next] xen-netback: update ubuf_info initialization to anonymous union
From: Willem de Bruijn @ 2017-08-25 17:10 UTC (permalink / raw)
  To: netdev; +Cc: davem, wei.liu2, paul.durrant, kbuild-all, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.

This breaks compilation of xen-netback with older compilers.
From kbuild bot with gcc-4.4.7:

   drivers/net//xen-netback/interface.c: In function
   'xenvif_init_queue':
   >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
   >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
      drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
   >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
   >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.

Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/xen-netback/interface.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e322a862ddfe..ee8ed9da00ad 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -551,8 +551,8 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 	for (i = 0; i < MAX_PENDING_REQS; i++) {
 		queue->pending_tx_info[i].callback_struct = (struct ubuf_info)
 			{ .callback = xenvif_zerocopy_callback,
-			  .ctx = NULL,
-			  .desc = i };
+			  { { .ctx = NULL,
+			      .desc = i } } };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
 	}
 
-- 
2.14.1.342.g6490525c54-goog
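
The brace nesting described in this patch can be reproduced outside the
kernel. What follows is only a minimal sketch: struct demo_ubuf,
demo_callback, demo_make and DEMO_MAX_PENDING are hypothetical names that
mimic the layout described above (a callback pointer followed by an
anonymous union holding an anonymous struct). It is not the real
struct ubuf_info.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 256	/* hypothetical bound, for illustration only */

/* Hypothetical stand-in: a callback followed by an anonymous union whose
 * first member is an anonymous struct, as the commit message describes. */
struct demo_ubuf {
	void (*callback)(struct demo_ubuf *, int);
	union {
		struct {
			unsigned long desc;
			void *ctx;
		};
		struct {
			unsigned int id;
			unsigned short len;
		};
	};
};

static void demo_callback(struct demo_ubuf *u, int ok)
{
	(void)u;
	(void)ok;
}

/* Build entry i with the doubly braced form used in the patch: the outer
 * brace opens the anonymous union, the inner one the anonymous struct. */
static struct demo_ubuf demo_make(unsigned long i)
{
	assert(i < DEMO_MAX_PENDING);
	return (struct demo_ubuf)
		{ .callback = demo_callback,
		  { { .ctx = NULL,
		      .desc = i } } };
}
```

The positional braces after .callback initialize the next member in
declaration order, which is why no field name from the anonymous members
has to be visible at the top level of the initializer.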


* Re: [PATCH] net: stmmac: dwmac-sun8i: Use reset exclusive
From: Maxime Ripard @ 2017-08-25 17:12 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: peppe.cavallaro, alexandre.torgue, wens, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20170825151733.GB9475@Red>


On Fri, Aug 25, 2017 at 05:17:33PM +0200, Corentin Labbe wrote:
> On Fri, Aug 25, 2017 at 04:48:32PM +0200, Maxime Ripard wrote:
> > On Fri, Aug 25, 2017 at 04:38:05PM +0200, Corentin Labbe wrote:
> > > The current dwmac_sun8i module cannot be unloaded and reloaded
> > > (rmmod/modprobe) because the reset controller is not released on removal.
> > > 
> > > This patch removes the ambiguity by using of_reset_control_get_exclusive
> > > and adds the missing reset_control_put().
> > > 
> > > Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > ---
> > >  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > index fffd6d5fc907..675a09629d85 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> > > @@ -782,6 +782,7 @@ static int sun8i_dwmac_unpower_internal_phy(struct sunxi_priv_data *gmac)
> > >  
> > >  	clk_disable_unprepare(gmac->ephy_clk);
> > >  	reset_control_assert(gmac->rst_ephy);
> > > +	reset_control_put(gmac->rst_ephy);
> > >  	return 0;
> > >  }
> > >  
> > > @@ -942,7 +943,7 @@ static int sun8i_dwmac_probe(struct platform_device *pdev)
> > >  			return -EINVAL;
> > >  		}
> > >  
> > > -		gmac->rst_ephy = of_reset_control_get(plat_dat->phy_node, NULL);
> > > +		gmac->rst_ephy = of_reset_control_get_exclusive(plat_dat->phy_node, NULL);
> > 
> > Why not just use devm_reset_control_get?
> > 
> 
> Because there are no devm_ functions with of_

devm_reset_control_get uses of_reset_control_get internally.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



* [PATCH net 0/3] fix layer calculation and flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren

Hi,

Previously, when calculating the supported key layers, MPLS, IPv4/6
TTL and TOS were not considered. Flow dissectors were also referenced
without first checking that they are in use and correctly populated by TC.
Additionally, this patch set fixes the incorrect use of the mask field for
VLAN matching.

Pieter Jansen van Vuuren (3):
  nfp: fix unchecked flow dissector use
  nfp: fix supported key layers calculation
  nfp: remove incorrect mask check for vlan matching

 drivers/net/ethernet/netronome/nfp/flower/match.c  | 139 +++++++++++----------
 .../net/ethernet/netronome/nfp/flower/offload.c    |  60 ++++++---
 2 files changed, 113 insertions(+), 86 deletions(-)

-- 
2.7.4


* [PATCH net 1/3] nfp: fix unchecked flow dissector use
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing it.

Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 133 +++++++++++----------
 .../net/ethernet/netronome/nfp/flower/offload.c    |  41 ++++---
 2 files changed, 93 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 0e08404..b365110 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -45,6 +45,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 	struct flow_dissector_key_vlan *flow_vlan;
 	u16 tmp_tci;
 
+	memset(frame, 0, sizeof(struct nfp_flower_meta_two));
 	/* Populate the metadata frame. */
 	frame->nfp_flow_key_layer = key_type;
 	frame->mask_id = ~0;
@@ -54,21 +55,20 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 		return;
 	}
 
-	flow_vlan = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_VLAN,
-					      flow->key);
-
-	/* Populate the tci field. */
-	if (!flow_vlan->vlan_id) {
-		tmp_tci = 0;
-	} else {
-		tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
-				     flow_vlan->vlan_priority) |
-			  FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
-				     flow_vlan->vlan_id) |
-			  NFP_FLOWER_MASK_VLAN_CFI;
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
+		flow_vlan = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_VLAN,
+						      flow->key);
+		/* Populate the tci field. */
+		if (flow_vlan->vlan_id) {
+			tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
+					     flow_vlan->vlan_priority) |
+				  FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID,
+					     flow_vlan->vlan_id) |
+				  NFP_FLOWER_MASK_VLAN_CFI;
+			frame->tci = cpu_to_be16(tmp_tci);
+		}
 	}
-	frame->tci = cpu_to_be16(tmp_tci);
 }
 
 static void
@@ -99,17 +99,18 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
 		       bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_eth_addrs *flow_mac;
-
-	flow_mac = skb_flow_dissector_target(flow->dissector,
-					     FLOW_DISSECTOR_KEY_ETH_ADDRS,
-					     target);
+	struct flow_dissector_key_eth_addrs *addr;
 
 	memset(frame, 0, sizeof(struct nfp_flower_mac_mpls));
 
-	/* Populate mac frame. */
-	ether_addr_copy(frame->mac_dst, &flow_mac->dst[0]);
-	ether_addr_copy(frame->mac_src, &flow_mac->src[0]);
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_ETH_ADDRS,
+						 target);
+		/* Populate mac frame. */
+		ether_addr_copy(frame->mac_dst, &addr->dst[0]);
+		ether_addr_copy(frame->mac_src, &addr->src[0]);
+	}
 
 	if (mask_version)
 		frame->mpls_lse = cpu_to_be32(~0);
@@ -121,14 +122,17 @@ nfp_flower_compile_tport(struct nfp_flower_tp_ports *frame,
 			 bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ports *flow_tp;
+	struct flow_dissector_key_ports *tp;
 
-	flow_tp = skb_flow_dissector_target(flow->dissector,
-					    FLOW_DISSECTOR_KEY_PORTS,
-					    target);
+	memset(frame, 0, sizeof(struct nfp_flower_tp_ports));
 
-	frame->port_src = flow_tp->src;
-	frame->port_dst = flow_tp->dst;
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
+		tp = skb_flow_dissector_target(flow->dissector,
+					       FLOW_DISSECTOR_KEY_PORTS,
+					       target);
+		frame->port_src = tp->src;
+		frame->port_dst = tp->dst;
+	}
 }
 
 static void
@@ -137,25 +141,27 @@ nfp_flower_compile_ipv4(struct nfp_flower_ipv4 *frame,
 			bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ipv4_addrs *flow_ipv4;
-	struct flow_dissector_key_basic *flow_basic;
-
-	flow_ipv4 = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_IPV4_ADDRS,
-					      target);
+	struct flow_dissector_key_ipv4_addrs *addr;
+	struct flow_dissector_key_basic *basic;
 
-	flow_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       target);
-
-	/* Populate IPv4 frame. */
-	frame->reserved = 0;
-	frame->ipv4_src = flow_ipv4->src;
-	frame->ipv4_dst = flow_ipv4->dst;
-	frame->proto = flow_basic->ip_proto;
 	/* Wildcard TOS/TTL for now. */
-	frame->tos = 0;
-	frame->ttl = 0;
+	memset(frame, 0, sizeof(struct nfp_flower_ipv4));
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_IPV4_ADDRS,
+						 target);
+		frame->ipv4_src = addr->src;
+		frame->ipv4_dst = addr->dst;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		basic = skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_BASIC,
+						  target);
+		frame->proto = basic->ip_proto;
+	}
 }
 
 static void
@@ -164,26 +170,27 @@ nfp_flower_compile_ipv6(struct nfp_flower_ipv6 *frame,
 			bool mask_version)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
-	struct flow_dissector_key_ipv6_addrs *flow_ipv6;
-	struct flow_dissector_key_basic *flow_basic;
-
-	flow_ipv6 = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_IPV6_ADDRS,
-					      target);
-
-	flow_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       target);
+	struct flow_dissector_key_ipv6_addrs *addr;
+	struct flow_dissector_key_basic *basic;
 
-	/* Populate IPv6 frame. */
-	frame->reserved = 0;
-	frame->ipv6_src = flow_ipv6->src;
-	frame->ipv6_dst = flow_ipv6->dst;
-	frame->proto = flow_basic->ip_proto;
 	/* Wildcard LABEL/TOS/TTL for now. */
-	frame->ipv6_flow_label_exthdr = 0;
-	frame->tos = 0;
-	frame->ttl = 0;
+	memset(frame, 0, sizeof(struct nfp_flower_ipv6));
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_IPV6_ADDRS)) {
+		addr = skb_flow_dissector_target(flow->dissector,
+						 FLOW_DISSECTOR_KEY_IPV6_ADDRS,
+						 target);
+		frame->ipv6_src = addr->src;
+		frame->ipv6_dst = addr->dst;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		basic = skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_BASIC,
+						  target);
+		frame->proto = basic->ip_proto;
+	}
 }
 
 int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 4ad10bd..6c8ecc2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -105,35 +105,40 @@ static int
 nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 				struct tc_cls_flower_offload *flow)
 {
-	struct flow_dissector_key_control *mask_enc_ctl;
-	struct flow_dissector_key_basic *mask_basic;
-	struct flow_dissector_key_basic *key_basic;
+	struct flow_dissector_key_basic *mask_basic = NULL;
+	struct flow_dissector_key_basic *key_basic = NULL;
 	u32 key_layer_two;
 	u8 key_layer;
 	int key_size;
 
-	mask_enc_ctl = skb_flow_dissector_target(flow->dissector,
-						 FLOW_DISSECTOR_KEY_ENC_CONTROL,
-						 flow->mask);
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+		struct flow_dissector_key_control *mask_enc_ctl =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_CONTROL,
+						  flow->mask);
+		/* We are expecting a tunnel. For now we ignore offloading. */
+		if (mask_enc_ctl->addr_type)
+			return -EOPNOTSUPP;
+	}
+
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		mask_basic = skb_flow_dissector_target(flow->dissector,
+						       FLOW_DISSECTOR_KEY_BASIC,
+						       flow->mask);
 
-	mask_basic = skb_flow_dissector_target(flow->dissector,
-					       FLOW_DISSECTOR_KEY_BASIC,
-					       flow->mask);
+		key_basic = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_BASIC,
+						      flow->key);
+	}
 
-	key_basic = skb_flow_dissector_target(flow->dissector,
-					      FLOW_DISSECTOR_KEY_BASIC,
-					      flow->key);
 	key_layer_two = 0;
 	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
 	key_size = sizeof(struct nfp_flower_meta_one) +
 		   sizeof(struct nfp_flower_in_port) +
 		   sizeof(struct nfp_flower_mac_mpls);
 
-	/* We are expecting a tunnel. For now we ignore offloading. */
-	if (mask_enc_ctl->addr_type)
-		return -EOPNOTSUPP;
-
-	if (mask_basic->n_proto) {
+	if (mask_basic && mask_basic->n_proto) {
 		/* Ethernet type is present in the key. */
 		switch (key_basic->n_proto) {
 		case cpu_to_be16(ETH_P_IP):
@@ -166,7 +171,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		}
 	}
 
-	if (mask_basic->ip_proto) {
+	if (mask_basic && mask_basic->ip_proto) {
 		/* Ethernet type is present in the key. */
 		switch (key_basic->ip_proto) {
 		case IPPROTO_TCP:
-- 
2.7.4
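
The check-before-use pattern this patch applies can be sketched in
userspace. This is a simplified model under stated assumptions: every
demo_* name below is hypothetical, and the bitmask-plus-offset layout only
approximates what dissector_uses_key() and skb_flow_dissector_target() do
in the kernel. The frame is zeroed first, so an unused key leaves the
field wildcarded instead of reading unpopulated memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model, not the kernel's definitions. */
enum demo_key_id { DEMO_KEY_BASIC, DEMO_KEY_PORTS, DEMO_KEY_MAX };

struct demo_dissector {
	unsigned int used_keys;		/* bit n set: key n is populated */
	size_t offset[DEMO_KEY_MAX];	/* byte offset of key n in the container */
};

struct demo_key_basic { uint16_t n_proto; uint8_t ip_proto; };
struct demo_key_ports { uint16_t src, dst; };

struct demo_flow_key {
	struct demo_key_basic basic;
	struct demo_key_ports ports;
};

struct demo_tp_frame { uint16_t port_src, port_dst; };

/* Model of dissector_uses_key(): is this key populated at all? */
static int demo_uses_key(const struct demo_dissector *d, enum demo_key_id id)
{
	return (d->used_keys >> id) & 1u;
}

/* Model of skb_flow_dissector_target(): locate the key in the container. */
static void *demo_target(const struct demo_dissector *d, enum demo_key_id id,
			 void *container)
{
	assert(id < DEMO_KEY_MAX);
	return (char *)container + d->offset[id];
}

/* Zero the frame, then copy the ports only when the key is in use. */
static void demo_compile_tport(struct demo_tp_frame *frame,
			       const struct demo_dissector *d,
			       struct demo_flow_key *key)
{
	memset(frame, 0, sizeof(*frame));

	if (demo_uses_key(d, DEMO_KEY_PORTS)) {
		struct demo_key_ports *tp =
			demo_target(d, DEMO_KEY_PORTS, key);

		frame->port_src = tp->src;
		frame->port_dst = tp->dst;
	}
}
```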


* [PATCH net 2/3] nfp: fix supported key layers calculation
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously, when calculating the supported key layers, MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally, this patch
checks that MPLS packets are correctly handled, by not offloading them.

Fixes: af9d842c1354 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 6c8ecc2..74a96d6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -107,6 +107,7 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 {
 	struct flow_dissector_key_basic *mask_basic = NULL;
 	struct flow_dissector_key_basic *key_basic = NULL;
+	struct flow_dissector_key_ip *mask_ip = NULL;
 	u32 key_layer_two;
 	u8 key_layer;
 	int key_size;
@@ -132,6 +133,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 						      flow->key);
 	}
 
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_IP))
+		mask_ip = skb_flow_dissector_target(flow->dissector,
+						    FLOW_DISSECTOR_KEY_IP,
+						    flow->mask);
+
 	key_layer_two = 0;
 	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
 	key_size = sizeof(struct nfp_flower_meta_one) +
@@ -142,11 +148,19 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		/* Ethernet type is present in the key. */
 		switch (key_basic->n_proto) {
 		case cpu_to_be16(ETH_P_IP):
+			if (mask_ip && mask_ip->tos)
+				return -EOPNOTSUPP;
+			if (mask_ip && mask_ip->ttl)
+				return -EOPNOTSUPP;
 			key_layer |= NFP_FLOWER_LAYER_IPV4;
 			key_size += sizeof(struct nfp_flower_ipv4);
 			break;
 
 		case cpu_to_be16(ETH_P_IPV6):
+			if (mask_ip && mask_ip->tos)
+				return -EOPNOTSUPP;
+			if (mask_ip && mask_ip->ttl)
+				return -EOPNOTSUPP;
 			key_layer |= NFP_FLOWER_LAYER_IPV6;
 			key_size += sizeof(struct nfp_flower_ipv6);
 			break;
@@ -157,6 +171,11 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 		case cpu_to_be16(ETH_P_ARP):
 			return -EOPNOTSUPP;
 
+		/* Currently we do not offload MPLS. */
+		case cpu_to_be16(ETH_P_MPLS_UC):
+		case cpu_to_be16(ETH_P_MPLS_MC):
+			return -EOPNOTSUPP;
+
 		/* Will be included in layer 2. */
 		case cpu_to_be16(ETH_P_8021Q):
 			break;
-- 
2.7.4
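
The decision logic added by this patch can be summarised in a small
userspace sketch. The names (demo_key_layers, the DEMO_* constants) are
hypothetical, EtherType values are compared in host byte order for
simplicity, and the default case is a simplification that accepts every
other EtherType, which the driver does not necessarily do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer flags, for illustration only. */
#define DEMO_LAYER_PORT	0x1u
#define DEMO_LAYER_MAC	0x2u
#define DEMO_LAYER_IPV4	0x4u
#define DEMO_LAYER_IPV6	0x8u

/* Standard EtherType values, host byte order in this sketch. */
#define DEMO_ETH_P_IP		0x0800
#define DEMO_ETH_P_IPV6		0x86DD
#define DEMO_ETH_P_MPLS_UC	0x8847
#define DEMO_ETH_P_MPLS_MC	0x8848

struct demo_key_ip { uint8_t tos, ttl; };

/* Returns 0 and fills *layers when the flow can be offloaded in this model,
 * or -EOPNOTSUPP when it cannot. mask_ip is NULL when the IP key is unused. */
static int demo_key_layers(uint16_t n_proto, const struct demo_key_ip *mask_ip,
			   unsigned int *layers)
{
	assert(layers != NULL);
	*layers = DEMO_LAYER_PORT | DEMO_LAYER_MAC;

	switch (n_proto) {
	case DEMO_ETH_P_IP:
	case DEMO_ETH_P_IPV6:
		/* A non-zero mask means the rule matches on TOS or TTL. */
		if (mask_ip && (mask_ip->tos || mask_ip->ttl))
			return -EOPNOTSUPP;
		*layers |= (n_proto == DEMO_ETH_P_IP) ? DEMO_LAYER_IPV4
						      : DEMO_LAYER_IPV6;
		return 0;

	/* MPLS is not offloaded. */
	case DEMO_ETH_P_MPLS_UC:
	case DEMO_ETH_P_MPLS_MC:
		return -EOPNOTSUPP;

	default:
		return 0;
	}
}
```

A NULL mask_ip and an all-zero mask are treated the same way: both mean
TOS and TTL are wildcarded, so the flow stays eligible for offload.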


* [PATCH net 3/3] nfp: remove incorrect mask check for vlan matching
From: Pieter Jansen van Vuuren @ 2017-08-25 17:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, simon.horman, jakub.kicinski,
	Pieter Jansen van Vuuren
In-Reply-To: <1503682263-17858-1-git-send-email-pieter.jansenvanvuuren@netronome.com>

Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.

Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index b365110..d25b503 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -42,6 +42,7 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 			    struct tc_cls_flower_offload *flow, u8 key_type,
 			    bool mask_version)
 {
+	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
 	struct flow_dissector_key_vlan *flow_vlan;
 	u16 tmp_tci;
 
@@ -50,15 +51,10 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
 	frame->nfp_flow_key_layer = key_type;
 	frame->mask_id = ~0;
 
-	if (mask_version) {
-		frame->tci = cpu_to_be16(~0);
-		return;
-	}
-
 	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
 		flow_vlan = skb_flow_dissector_target(flow->dissector,
 						      FLOW_DISSECTOR_KEY_VLAN,
-						      flow->key);
+						      target);
 		/* Populate the tci field. */
 		if (flow_vlan->vlan_id) {
 			tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO,
-- 
2.7.4
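
To illustrate why building the mask from the dissector differs from the
old all-ones mask, here is a small sketch of the TCI composition. It
assumes the standard 802.1Q layout (priority in bits 15-13, CFI in bit 12,
VLAN ID in bits 11-0); that the NFP_FLOWER_MASK_VLAN_* macros follow this
layout is an assumption, and demo_build_tci is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 802.1Q TCI layout, assumed to match the driver's masks. */
#define DEMO_VLAN_PRIO_SHIFT	13
#define DEMO_VLAN_CFI		(1u << 12)
#define DEMO_VLAN_VID_MASK	0x0fffu

/* Compose a TCI from a priority and a VLAN ID. The same helper serves the
 * key and the mask: pass key fields for one, mask fields for the other. */
static uint16_t demo_build_tci(uint8_t prio, uint16_t vid)
{
	assert(prio <= 7);

	if (!vid)
		return 0;	/* no VLAN ID: leave the field wildcarded */

	return (uint16_t)(((unsigned int)prio << DEMO_VLAN_PRIO_SHIFT) |
			  DEMO_VLAN_CFI |
			  (vid & DEMO_VLAN_VID_MASK));
}
```

With a rule that matches only on the VLAN ID, the mask fields would be
priority 0 and ID 0xfff, which gives a TCI mask of 0x1fff in this model.
The priority bits stay wildcarded, whereas an all-ones mask would force an
exact match on them as well.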


* Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 17:56 UTC (permalink / raw)
  To: Chenbo Feng, SELinux, netdev


I’d like to get your thoughts on adding LSM permission checks on BPF
objects.

By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
bpf() functions. This seems like poor granularity. [2]

Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.

Let me know what you think. Are there simpler alternative approaches that
we haven’t considered?

Thanks!
Jeff

[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively,
allowing all processes permission to use bpf() goes against the principle
of least privilege, exposing a lot of kernel attack surface to processes
that do not actually need it.



* Permissions for eBPF objects
From: Jeffrey Vander Stoep @ 2017-08-25 18:01 UTC (permalink / raw)
  To: Chenbo Feng, netdev, SELinux

I’d like to get your thoughts on adding LSM permission checks on BPF objects.

By default, the ability to create and use eBPF maps/programs requires
CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
to bpf() functions. This seems like poor granularity. [2]

Like files and sockets, eBPF maps and programs can be passed between
processes by FD and have a number of functions that map cleanly to
permissions.

Let me know what you think. Are there simpler alternative approaches
that we haven’t considered?

Thanks!
Jeff

[1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
[2] We are considering eBPF for network filtering by netd. Giving netd
CAP_SYS_ADMIN would considerably increase netd’s privileges.
Alternatively, allowing all processes permission to use bpf() goes
against the principle of least privilege, exposing a lot of kernel
attack surface to processes that do not actually need it.


* Re: Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 18:03 UTC (permalink / raw)
  To: Chenbo Feng, SELinux, netdev
In-Reply-To: <CABXk95ATb_AFk+4GX9Xw+HEU6No8irb0mOoLE9O4EBuLAgA-1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


Disregard this email. Re-sending in plain-text mode to prevent rejection by
netdev list.

On Fri, Aug 25, 2017 at 10:56 AM Jeffrey Vander Stoep <jeffv-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:

> I’d like to get your thoughts on adding LSM permission checks on BPF
> objects.
>
> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
> bpf() functions. This seems like poor granularity. [2]
>
> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
>
> Let me know what you think. Are there simpler alternative approaches that
> we haven’t considered?
>
> Thanks!
> Jeff
>
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively,
> allowing all processes permission to use bpf() goes against the principle
> of least privilege, exposing a lot of kernel attack surface to processes
> that do not actually need it.
>



* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Michael S. Tsirkin @ 2017-08-25 18:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <1502905007.4936.133.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> As found by syzkaller, malicious users can set whatever tx_queue_len
> on a tun device and eventually crash the kernel.
> 
> Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> ring buffer is not fast anyway.

I'm not sure it's worth changing for small rings.

Does kmalloc_array guarantee cache line alignment for big buffers
then? If the ring is misaligned it will likely cause false sharing
as it's designed to be accessed from two CPUs.

> Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> ---
>  include/linux/ptr_ring.h  |    9 +++++----
>  include/linux/skb_array.h |    3 ++-
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index d8c97ec8a8e6..37b4bb2545b3 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -436,9 +436,9 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>  	__PTR_RING_PEEK_CALL_v; \
>  })
>  
> -static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
> +static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>  {
> -	return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
> +	return kcalloc(size, sizeof(void *), gfp);
>  }
>  
>  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> @@ -582,7 +582,8 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
>   * In particular if you consume ring in interrupt or BH context, you must
>   * disable interrupts/BH when doing so.
>   */
> -static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
> +static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
> +					   unsigned int nrings,
>  					   int size,
>  					   gfp_t gfp, void (*destroy)(void *))
>  {
> @@ -590,7 +591,7 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
>  	void ***queues;
>  	int i;
>  
> -	queues = kmalloc(nrings * sizeof *queues, gfp);
> +	queues = kmalloc_array(nrings, sizeof(*queues), gfp);
>  	if (!queues)
>  		goto noqueues;
>  
> diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
> index 35226cd4efb0..8621ffdeecbf 100644
> --- a/include/linux/skb_array.h
> +++ b/include/linux/skb_array.h
> @@ -193,7 +193,8 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
>  }
>  
>  static inline int skb_array_resize_multiple(struct skb_array **rings,
> -					    int nrings, int size, gfp_t gfp)
> +					    int nrings, unsigned int size,
> +					    gfp_t gfp)
>  {
>  	BUILD_BUG_ON(offsetof(struct skb_array, ring));
>  	return ptr_ring_resize_multiple((struct ptr_ring **)rings,
> 


* [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2017-08-25 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kvm, mst, netdev, linux-kernel, stable, virtualization, stefanha,
	yasu.isimatu, hch

The following changes since commit 14ccee78fc82f5512908f4424f541549a5705b89:

  Linux 4.13-rc6 (2017-08-20 14:13:52 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to ba74b6f7fcc07355d087af6939712eed4a454821:

  virtio_pci: fix cpu affinity support (2017-08-25 21:38:26 +0300)

----------------------------------------------------------------
virtio: bugfix

Fixes two obvious bugs in virtio pci.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Christoph Hellwig (1):
      virtio_pci: fix cpu affinity support

Stefan Hajnoczi (1):
      virtio_blk: fix incorrect message when disk is resized

 drivers/block/virtio_blk.c         | 16 ++++++++++------
 drivers/virtio/virtio_pci_common.c | 10 +++++++---
 2 files changed, 17 insertions(+), 9 deletions(-)


* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Eric Dumazet @ 2017-08-25 18:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <20170825205653-mutt-send-email-mst@kernel.org>

On Fri, 2017-08-25 at 21:03 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > As found by syzkaller, malicious users can set whatever tx_queue_len
> > on a tun device and eventually crash the kernel.
> > 
> > Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> > ring buffer is not fast anyway.
> 
> I'm not sure it's worth changing for small rings.
> 
> Does kmalloc_array guarantee cache line alignment for big buffers
> then? If the ring is misaligned it will likely cause false sharing
> as it's designed to be accessed from two CPUs.

I specifically said that in the changelog:

"since a small ring buffer is not fast anyway."

If a user sets up a pathologically small ring buffer, the kernel should
not try to fix it.

In this case, you would have to set up a ring of 2 or 4 slots to
eventually hit false sharing.
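
For scale, a rough sketch of the sizes behind the "2 or 4 slots" remark.
It assumes 64-byte cache lines and 8-byte pointers; both are assumptions
(SMP_CACHE_BYTES is architecture-specific), and the helper names are
illustrative only, not kernel API. It says nothing about what alignment
kmalloc_array() actually guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the kernel's SMP_CACHE_BYTES varies by arch. */
#define ASSUMED_CACHE_LINE 64

/* Bytes needed for a ring of nslots entries of slot_size bytes each. */
static size_t ring_bytes(size_t nslots, size_t slot_size)
{
	assert(slot_size > 0);
	return nslots * slot_size;
}

/*
 * Nonzero when the whole ring is smaller than one (assumed) cache line,
 * i.e. when an unaligned allocation could share its line with other data.
 */
static int ring_smaller_than_line(size_t nslots, size_t slot_size)
{
	return ring_bytes(nslots, slot_size) < ASSUMED_CACHE_LINE;
}
```

Under these assumptions a 2-slot ring takes 16 bytes and a 4-slot ring
32 bytes, both below one line, while 8 slots already fill a full line.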


* [PATCH v2 net-next 0/8] bpf: Add option to set mark and priority in cgroup sock programs
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern

Add an option to set mark and priority, in addition to the bound device,
for newly created sockets. Also, allow the bpf programs to use the
get_current_uid_gid helper, meaning socket marks, priority and device can
be set based on the uid/gid of the running process.

For flexibility in deploying these programs, an option is added to allow
cgroups to be walked from current to root, running any program attached.
This allows one cgroup level to control the device a socket is bound to
(e.g., a VRF) while other cgroup levels can be used to set socket marks
and priority.

Sample programs are updated to demonstrate the new options.

v2
- added flag to control recursive behavior as requested by Alexei
- added comment to sock_filter_func_proto regarding use of
  get_current_uid_gid helper
- updated test programs for recursive option

David Ahern (8):
  bpf: Add support for recursively running cgroup sock filters
  bpf: Add mark and priority to sock options that can be set
  bpf: Allow cgroup sock filters to use get_current_uid_gid helper
  samples/bpf: Update sock test to allow setting mark and priority
  samples/bpf: Add detach option to test_cgrp2_sock
  samples/bpf: Add option to dump socket settings
  samples/bpf: Add test case for nested socket options
  samples/bpf: Update cgroup socket examples to use uid gid helper

 include/linux/bpf-cgroup.h      |  10 +-
 include/uapi/linux/bpf.h        |  11 ++
 kernel/bpf/cgroup.c             |  29 +++--
 kernel/bpf/syscall.c            |   6 +-
 kernel/cgroup/cgroup.c          |  25 +++-
 net/core/filter.c               |  42 ++++++-
 samples/bpf/sock_flags_kern.c   |   5 +
 samples/bpf/test_cgrp2_sock.c   | 258 ++++++++++++++++++++++++++++++++++++----
 samples/bpf/test_cgrp2_sock.sh  |   2 +-
 samples/bpf/test_cgrp2_sock3.sh | 162 +++++++++++++++++++++++++
 10 files changed, 506 insertions(+), 44 deletions(-)
 create mode 100755 samples/bpf/test_cgrp2_sock3.sh

-- 
2.1.4
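
The attach-time rules described in this cover letter can be sketched in
userspace C as below. The flag values are copied from the uapi hunk in
patch 1/8; check_attach_flags() and its parent_recursive argument are
hypothetical names used only for illustration, and the function merely
approximates the checks made in bpf_prog_attach() and
__cgroup_bpf_update().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the include/uapi/linux/bpf.h hunk of patch 1/8. */
#define BPF_F_ALLOW_OVERRIDE   (1U << 0)
#define BPF_F_RECURSIVE        (1U << 1)
#define BPF_F_ALL_ATTACH_FLAGS (BPF_F_ALLOW_OVERRIDE | BPF_F_RECURSIVE)

/* Hypothetical helper: returns 0 if the attach is allowed, else -EINVAL. */
static int check_attach_flags(uint32_t flags, bool parent_recursive)
{
	/* Reject any bit outside the known attach flags. */
	if (flags & ~BPF_F_ALL_ATTACH_FLAGS)
		return -EINVAL;

	/* Once an ancestor is recursive, descendants must be recursive too. */
	if (parent_recursive && !(flags & BPF_F_RECURSIVE))
		return -EINVAL;

	return 0;
}
```

For example, attaching with only BPF_F_ALLOW_OVERRIDE below a recursive
ancestor is rejected here, which is the case invalid_setup() in patch 7/8
exercises.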


* [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add support for recursively applying sock filters attached to a cgroup.
For now, start with the inner cgroup attached to the socket and work back
to the root or the first cgroup without the recursive flag set. Once the
recursive flag is set for a cgroup, all descendant cgroups must have the
flag as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/bpf-cgroup.h | 10 ++++++----
 include/uapi/linux/bpf.h   |  9 +++++++++
 kernel/bpf/cgroup.c        | 29 ++++++++++++++++++++++-------
 kernel/bpf/syscall.c       |  6 +++---
 kernel/cgroup/cgroup.c     | 25 +++++++++++++++++++++++--
 5 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d41d40ac3efd..2d02187f242f 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -23,6 +23,7 @@ struct cgroup_bpf {
 	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
 	struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
 	bool disallow_override[MAX_BPF_ATTACH_TYPE];
+	bool is_recursive[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
@@ -30,18 +31,19 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
 int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			struct bpf_prog *prog, enum bpf_attach_type type,
-			bool overridable);
+			u32 flags);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
 int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, bool overridable);
+		      enum bpf_attach_type type, u32 flags);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
 				enum bpf_attach_type type);
 
-int __cgroup_bpf_run_filter_sk(struct sock *sk,
+int __cgroup_bpf_run_filter_sk(struct cgroup *cgrp, struct sock *sk,
 			       enum bpf_attach_type type);
+int cgroup_bpf_run_filter_sk(struct sock *sk, enum bpf_attach_type type);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -74,7 +76,7 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled && sk) {					       \
-		__ret = __cgroup_bpf_run_filter_sk(sk,			       \
+		__ret = cgroup_bpf_run_filter_sk(sk,			       \
 						 BPF_CGROUP_INET_SOCK_CREATE); \
 	}								       \
 	__ret;								       \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f71f5e07d82d..595e31b30f23 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -151,6 +151,15 @@ enum bpf_attach_type {
  */
 #define BPF_F_ALLOW_OVERRIDE	(1U << 0)
 
+/* If BPF_F_RECURSIVE flag is used in BPF_PROG_ATTACH command
+ * cgroups are walked recursively back to the root cgroup or the
+ * first cgroup without the flag set running any program attached.
+ * Once the flag is set, it MUST be set for all descendant cgroups.
+ */
+#define BPF_F_RECURSIVE		(1U << 1)
+
+#define BPF_F_ALL_ATTACH_FLAGS  (BPF_F_ALLOW_OVERRIDE | BPF_F_RECURSIVE)
+
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
  * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 546113430049..eb1f436c18fb 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -47,10 +47,16 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 	unsigned int type;
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
-		struct bpf_prog *e;
+		struct bpf_prog *e = NULL;
+
+		/* do not need to set effective program if cgroups are
+		 * walked recursively
+		 */
+		cgrp->bpf.is_recursive[type] = parent->bpf.is_recursive[type];
+		if (!cgrp->bpf.is_recursive[type])
+			e = rcu_dereference_protected(parent->bpf.effective[type],
+						      lockdep_is_held(&cgroup_mutex));
 
-		e = rcu_dereference_protected(parent->bpf.effective[type],
-					      lockdep_is_held(&cgroup_mutex));
 		rcu_assign_pointer(cgrp->bpf.effective[type], e);
 		cgrp->bpf.disallow_override[type] = parent->bpf.disallow_override[type];
 	}
@@ -85,8 +91,12 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  */
 int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			struct bpf_prog *prog, enum bpf_attach_type type,
-			bool new_overridable)
+			u32 flags)
 {
+	bool new_overridable = flags & BPF_F_ALLOW_OVERRIDE;
+	/* initial state inherited from parent */
+	bool curr_recursive = cgrp->bpf.is_recursive[type];
+	bool new_recursive = flags & BPF_F_RECURSIVE;
 	struct bpf_prog *old_prog, *effective = NULL;
 	struct cgroup_subsys_state *pos;
 	bool overridable = true;
@@ -109,6 +119,12 @@ int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 		 */
 		return -EPERM;
 
+	if (prog && curr_recursive && !new_recursive)
+		/* if a parent has recursive prog attached, only
+		 * allow recursive programs in descendent cgroup
+		 */
+		return -EINVAL;
+
 	old_prog = cgrp->bpf.prog[type];
 
 	if (prog) {
@@ -139,6 +155,7 @@ int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			rcu_assign_pointer(desc->bpf.effective[type],
 					   effective);
 			desc->bpf.disallow_override[type] = !overridable;
+			desc->bpf.is_recursive[type] = new_recursive;
 		}
 	}
 
@@ -217,14 +234,12 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
  * This function will return %-EPERM if any if an attached program was found
  * and if it returned != 1 during execution. In all other cases, 0 is returned.
  */
-int __cgroup_bpf_run_filter_sk(struct sock *sk,
+int __cgroup_bpf_run_filter_sk(struct cgroup *cgrp, struct sock *sk,
 			       enum bpf_attach_type type)
 {
-	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	struct bpf_prog *prog;
 	int ret = 0;
 
-
 	rcu_read_lock();
 
 	prog = rcu_dereference(cgrp->bpf.effective[type]);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d5774a6851f1..a1ab5dbaae89 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1187,7 +1187,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
-	if (attr->attach_flags & ~BPF_F_ALLOW_OVERRIDE)
+	if (attr->attach_flags & ~BPF_F_ALL_ATTACH_FLAGS)
 		return -EINVAL;
 
 	switch (attr->attach_type) {
@@ -1222,7 +1222,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	}
 
 	ret = cgroup_bpf_update(cgrp, prog, attr->attach_type,
-				attr->attach_flags & BPF_F_ALLOW_OVERRIDE);
+				attr->attach_flags);
 	if (ret)
 		bpf_prog_put(prog);
 	cgroup_put(cgrp);
@@ -1252,7 +1252,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
 
-		ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, false);
+		ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, 0);
 		cgroup_put(cgrp);
 		break;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index df2e0f14a95d..27a4f14435a3 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5176,14 +5176,35 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd)
 
 #ifdef CONFIG_CGROUP_BPF
 int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, bool overridable)
+		      enum bpf_attach_type type, u32 flags)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_update(cgrp, parent, prog, type, overridable);
+	ret = __cgroup_bpf_update(cgrp, parent, prog, type, flags);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
+int cgroup_bpf_run_filter_sk(struct sock *sk,
+			     enum bpf_attach_type type)
+{
+	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	int ret = 0;
+
+	while (cgrp) {
+		ret = __cgroup_bpf_run_filter_sk(cgrp, sk, type);
+		if (ret)
+			break;
+
+		if (!cgrp->bpf.is_recursive[type])
+			break;
+
+		cgrp = cgroup_parent(cgrp);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(cgroup_bpf_run_filter_sk);
 #endif /* CONFIG_CGROUP_BPF */
-- 
2.1.4
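
The walk added in cgroup_bpf_run_filter_sk() can be modelled in plain
userspace C as follows. This is a toy sketch: struct toy_cgroup and both
functions are stand-ins invented for illustration, not kernel types, and
the "program" is reduced to a fixed verdict per level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup; NOT the kernel structure. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	bool is_recursive;	/* mirrors bpf.is_recursive[type] for one type */
	int verdict;		/* 0 = allow, nonzero = error from the program */
	int runs;		/* how often this level's program was run */
};

/* Stand-in for __cgroup_bpf_run_filter_sk(): run one level's program. */
static int toy_run_filter(struct toy_cgroup *cgrp)
{
	cgrp->runs++;
	return cgrp->verdict;
}

/* Walk from the socket's cgroup toward the root, as the patch describes. */
static int toy_run_filter_walk(struct toy_cgroup *cgrp)
{
	int ret = 0;

	while (cgrp) {
		ret = toy_run_filter(cgrp);
		if (ret)
			break;	/* a program rejected the socket */

		if (!cgrp->is_recursive)
			break;	/* first level without the flag ends the walk */

		cgrp = cgrp->parent;
	}

	return ret;
}
```

With a prio/mark/dev hierarchy where only mark and dev are recursive, a
walk starting at dev runs dev, mark and prio and then stops, so the root
is never visited.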


* [PATCH v2 net-next 2/8] bpf: Add mark and priority to sock options that can be set
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add socket mark and priority to the fields that can be set by
an eBPF program when a socket is created.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c        | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 595e31b30f23..f72b957580cd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -773,6 +773,8 @@ struct bpf_sock {
 	__u32 family;
 	__u32 type;
 	__u32 protocol;
+	__u32 mark;
+	__u32 priority;
 };
 
 #define XDP_PACKET_HEADROOM 256
diff --git a/net/core/filter.c b/net/core/filter.c
index 4bcd6baa80c9..d582d1b1e533 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3444,6 +3444,10 @@ static bool sock_filter_is_valid_access(int off, int size,
 		switch (off) {
 		case offsetof(struct bpf_sock, bound_dev_if):
 			break;
+		case offsetof(struct bpf_sock, mark):
+			break;
+		case offsetof(struct bpf_sock, priority):
+			break;
 		default:
 			return false;
 		}
@@ -3947,6 +3951,28 @@ static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct sock, sk_bound_dev_if));
 		break;
 
+	case offsetof(struct bpf_sock, mark):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_mark) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					offsetof(struct sock, sk_mark));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct sock, sk_mark));
+		break;
+
+	case offsetof(struct bpf_sock, priority):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					offsetof(struct sock, sk_priority));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct sock, sk_priority));
+		break;
+
 	case offsetof(struct bpf_sock, family):
 		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_family) != 2);
 
-- 
2.1.4


* [PATCH v2 net-next 3/8] bpf: Allow cgroup sock filters to use get_current_uid_gid helper
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Allow BPF programs run on sock create to use the get_current_uid_gid
helper. IPv4 and IPv6 sockets are created in a process context, so
there is always a valid uid/gid.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/core/filter.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index d582d1b1e533..eb505842a77e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3139,6 +3139,20 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+sock_filter_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	/* inet and inet6 sockets are created in a process
+	 * context so there is always a valid uid/gid
+	 */
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static const struct bpf_func_proto *
 sk_filter_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -4222,7 +4236,7 @@ const struct bpf_verifier_ops lwt_xmit_prog_ops = {
 };
 
 const struct bpf_verifier_ops cg_sock_prog_ops = {
-	.get_func_proto		= bpf_base_func_proto,
+	.get_func_proto		= sock_filter_func_proto,
 	.is_valid_access	= sock_filter_is_valid_access,
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
-- 
2.1.4
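
A small userspace sketch of how the 64-bit value returned by the helper
splits into uid and gid, following the layout used by the sample in patch
8/8 (uid in the lower 32 bits, gid in the upper 32). The three function
names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Pack in the layout the sample assumes: gid high word, uid low word. */
static uint64_t pack_uid_gid(uint32_t uid, uint32_t gid)
{
	return ((uint64_t)gid << 32) | uid;
}

/* Lower 32 bits carry the uid. */
static uint32_t unpack_uid(uint64_t v)
{
	return (uint32_t)(v & 0xffffffff);
}

/* Upper 32 bits carry the gid. */
static uint32_t unpack_gid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}
```

A program could then compare the unpacked uid or gid against a constant
before deciding which mark, priority or device to write.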


* [PATCH v2 net-next 4/8] samples/bpf: Update sock test to allow setting mark and priority
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Update sock test to set mark and priority on socket create.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c  | 139 ++++++++++++++++++++++++++++++++++++-----
 samples/bpf/test_cgrp2_sock.sh |   2 +-
 2 files changed, 123 insertions(+), 18 deletions(-)

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index c3cfb23e23b5..b018bf948933 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -19,63 +19,168 @@
 #include <errno.h>
 #include <fcntl.h>
 #include <net/if.h>
+#include <inttypes.h>
 #include <linux/bpf.h>
 
 #include "libbpf.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
-static int prog_load(int idx)
+static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 {
-	struct bpf_insn prog[] = {
+	/* save pointer to context */
+	struct bpf_insn prog_start[] = {
 		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	};
+	struct bpf_insn prog_end[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+
+	/* set sk_bound_dev_if on socket */
+	struct bpf_insn prog_dev[] = {
 		BPF_MOV64_IMM(BPF_REG_3, idx),
 		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
 		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
-		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
-		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
-	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
+	/* set mark on socket */
+	struct bpf_insn prog_mark[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		BPF_MOV64_IMM(BPF_REG_3, mark),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, mark)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, mark)),
+	};
+
+	/* set priority on socket */
+	struct bpf_insn prog_prio[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		BPF_MOV64_IMM(BPF_REG_3, prio),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)),
+	};
+
+	struct bpf_insn *prog;
+	size_t insns_cnt;
+	void *p;
+	int ret;
+
+	insns_cnt = sizeof(prog_start) + sizeof(prog_end);
+	if (idx)
+		insns_cnt += sizeof(prog_dev);
+
+	if (mark)
+		insns_cnt += sizeof(prog_mark);
+
+	if (prio)
+		insns_cnt += sizeof(prog_prio);
+
+	p = prog = malloc(insns_cnt);
+	if (!prog) {
+		fprintf(stderr, "Failed to allocate memory for instructions\n");
+		return -1;
+	}
+
+	memcpy(p, prog_start, sizeof(prog_start));
+	p += sizeof(prog_start);
+
+	if (idx) {
+		memcpy(p, prog_dev, sizeof(prog_dev));
+		p += sizeof(prog_dev);
+	}
+
+	if (mark) {
+		memcpy(p, prog_mark, sizeof(prog_mark));
+		p += sizeof(prog_mark);
+	}
+
+	if (prio) {
+		memcpy(p, prog_prio, sizeof(prog_prio));
+		p += sizeof(prog_prio);
+	}
+
+	memcpy(p, prog_end, sizeof(prog_end));
+	p += sizeof(prog_end);
+
+	insns_cnt /= sizeof(struct bpf_insn);
+
+	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
 				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
+
+	free(prog);
+
+	return ret;
 }
 
 static int usage(const char *argv0)
 {
-	printf("Usage: %s cg-path device-index\n", argv0);
+	printf("Usage: %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
 	return EXIT_FAILURE;
 }
 
 int main(int argc, char **argv)
 {
+	__u32 attach_flags = BPF_F_ALLOW_OVERRIDE;
+	__u32 idx = 0, mark = 0, prio = 0;
+	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
-	unsigned int idx;
+	int rc;
+
+	while ((rc = getopt(argc, argv, "b:m:p:r")) != -1) {
+		switch (rc) {
+		case 'b':
+			idx = if_nametoindex(optarg);
+			if (!idx) {
+				idx = strtoumax(optarg, NULL, 0);
+				if (!idx) {
+					printf("Invalid device name\n");
+					return EXIT_FAILURE;
+				}
+			}
+			break;
+		case 'm':
+			mark = strtoumax(optarg, NULL, 0);
+			break;
+		case 'p':
+			prio = strtoumax(optarg, NULL, 0);
+			break;
+		case 'r':
+			attach_flags |= BPF_F_RECURSIVE;
+			break;
+		default:
+			return usage(argv[0]);
+		}
+	}
 
-	if (argc < 2)
+	if (optind == argc)
 		return usage(argv[0]);
 
-	idx = if_nametoindex(argv[2]);
-	if (!idx) {
-		printf("Invalid device name\n");
+	cgrp_path = argv[optind];
+	if (!cgrp_path) {
+		fprintf(stderr, "cgroup path not given\n");
 		return EXIT_FAILURE;
 	}
 
-	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (!idx && !mark && !prio) {
+		fprintf(stderr, "One of device, mark or priority must be given\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(cgrp_path, O_DIRECTORY | O_RDONLY);
 	if (cg_fd < 0) {
 		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
 		return EXIT_FAILURE;
 	}
 
-	prog_fd = prog_load(idx);
-	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
-
+	prog_fd = prog_load(idx, mark, prio);
 	if (prog_fd < 0) {
 		printf("Failed to load prog: '%s'\n", strerror(errno));
+		printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
 		return EXIT_FAILURE;
 	}
 
-	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE, 0);
+	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
+			      attach_flags);
 	if (ret < 0) {
 		printf("Failed to attach prog to cgroup: '%s'\n",
 		       strerror(errno));
diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
index 925fd467c7cc..1153c33e8964 100755
--- a/samples/bpf/test_cgrp2_sock.sh
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -20,7 +20,7 @@ function attach_bpf {
 	mkdir -p /tmp/cgroupv2
 	mount -t cgroup2 none /tmp/cgroupv2
 	mkdir -p /tmp/cgroupv2/foo
-	test_cgrp2_sock /tmp/cgroupv2/foo foo
+	test_cgrp2_sock -b foo /tmp/cgroupv2/foo
 	echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
 }
 
-- 
2.1.4
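
prog_load() above assembles the program by copying optional instruction
fragments into one buffer and advancing a void pointer, which relies on
the GNU extension for void pointer arithmetic. A possible variant with a
typed pointer is sketched below; struct toy_insn, append() and
build_prog() are made-up names standing in for struct bpf_insn and the
real fragments, so this is an illustration rather than a drop-in
replacement.

```c
#include <stdlib.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Toy instruction type; stands in for struct bpf_insn. */
struct toy_insn {
	int op;
};

/* Copy n instructions to dst and return the next free slot. */
static struct toy_insn *append(struct toy_insn *dst,
			       const struct toy_insn *src, size_t n)
{
	memcpy(dst, src, n * sizeof(*src));
	return dst + n;
}

/*
 * Build start + optional middle + end. Returns a malloc'ed array (caller
 * frees) or NULL on allocation failure; *cnt is the instruction count.
 */
static struct toy_insn *build_prog(int want_mid, size_t *cnt)
{
	static const struct toy_insn start[] = { { 1 } };
	static const struct toy_insn mid[]   = { { 2 }, { 3 } };
	static const struct toy_insn end[]   = { { 8 }, { 9 } };
	size_t n = ARRAY_LEN(start) + ARRAY_LEN(end);
	struct toy_insn *prog, *p;

	if (want_mid)
		n += ARRAY_LEN(mid);

	prog = malloc(n * sizeof(*prog));
	if (!prog)
		return NULL;

	p = append(prog, start, ARRAY_LEN(start));
	if (want_mid)
		p = append(p, mid, ARRAY_LEN(mid));
	append(p, end, ARRAY_LEN(end));

	*cnt = n;
	return prog;
}
```

Counting in instructions rather than bytes also removes the later
division by sizeof(struct bpf_insn).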


* [PATCH v2 net-next 5/8] samples/bpf: Add detach option to test_cgrp2_sock
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add option to detach programs from a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c | 50 ++++++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index b018bf948933..a1ef7b8bd3f9 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -114,7 +114,12 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 
 static int usage(const char *argv0)
 {
-	printf("Usage: %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
+	printf("Usage:\n");
+	printf("  Attach a program\n");
+	printf("  %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
+	printf("\n");
+	printf("  Detach a program\n");
+	printf("  %s -d cg-path\n", argv0);
 	return EXIT_FAILURE;
 }
 
@@ -124,10 +129,14 @@ int main(int argc, char **argv)
 	__u32 idx = 0, mark = 0, prio = 0;
 	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
+	int do_attach = 1;
 	int rc;
 
-	while ((rc = getopt(argc, argv, "b:m:p:r")) != -1) {
+	while ((rc = getopt(argc, argv, "db:m:p:r")) != -1) {
 		switch (rc) {
+		case 'd':
+			do_attach = 0;
+			break;
 		case 'b':
 			idx = if_nametoindex(optarg);
 			if (!idx) {
@@ -161,7 +170,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	if (!idx && !mark && !prio) {
+	if (do_attach && !idx && !mark && !prio) {
 		fprintf(stderr, "One of device, mark or priority must be given\n");
 		return EXIT_FAILURE;
 	}
@@ -172,20 +181,31 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	prog_fd = prog_load(idx, mark, prio);
-	if (prog_fd < 0) {
-		printf("Failed to load prog: '%s'\n", strerror(errno));
-		printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
-		return EXIT_FAILURE;
-	}
+	if (do_attach) {
+		prog_fd = prog_load(idx, mark, prio);
+		if (prog_fd < 0) {
+			printf("Failed to load prog: '%s'\n", strerror(errno));
+			printf("Output from kernel verifier:\n%s\n-------\n",
+			       bpf_log_buf);
+			return EXIT_FAILURE;
+		}
 
-	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
-			      attach_flags);
-	if (ret < 0) {
-		printf("Failed to attach prog to cgroup: '%s'\n",
-		       strerror(errno));
-		return EXIT_FAILURE;
+		ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
+				      attach_flags);
+		if (ret < 0) {
+			printf("Failed to attach prog to cgroup: '%s'\n",
+			       strerror(errno));
+			return EXIT_FAILURE;
+		}
+	} else {
+		ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+		if (ret < 0) {
+			printf("Failed to detach prog from cgroup: '%s'\n",
+			       strerror(errno));
+			return EXIT_FAILURE;
+		}
 	}
 
+	close(cg_fd);
 	return EXIT_SUCCESS;
 }
-- 
2.1.4


* [PATCH v2 net-next 6/8] samples/bpf: Add option to dump socket settings
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c | 75 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index a1ef7b8bd3f9..eabf530a5223 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -112,6 +112,70 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 	return ret;
 }
 
+static int get_bind_to_device(int sd, char *name, size_t len)
+{
+	socklen_t optlen = len;
+	int rc;
+
+	name[0] = '\0';
+	rc = getsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, name, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_BINDTODEVICE)");
+
+	return rc;
+}
+
+static unsigned int get_somark(int sd)
+{
+	unsigned int mark = 0;
+	socklen_t optlen = sizeof(mark);
+	int rc;
+
+	rc = getsockopt(sd, SOL_SOCKET, SO_MARK, &mark, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_MARK)");
+
+	return mark;
+}
+
+static unsigned int get_priority(int sd)
+{
+	unsigned int prio = 0;
+	socklen_t optlen = sizeof(prio);
+	int rc;
+
+	rc = getsockopt(sd, SOL_SOCKET, SO_PRIORITY, &prio, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_PRIORITY)");
+
+	return prio;
+}
+
+static int show_sockopts(int family)
+{
+	unsigned int mark, prio;
+	char name[16];
+	int sd;
+
+	sd = socket(family, SOCK_DGRAM, 17);
+	if (sd < 0) {
+		perror("socket");
+		return 1;
+	}
+
+	if (get_bind_to_device(sd, name, sizeof(name)) < 0)
+		return 1;
+
+	mark = get_somark(sd);
+	prio = get_priority(sd);
+
+	close(sd);
+
+	printf("sd %d: dev %s, mark %u, priority %u\n", sd, name, mark, prio);
+
+	return 0;
+}
+
 static int usage(const char *argv0)
 {
 	printf("Usage:\n");
@@ -120,6 +184,9 @@ static int usage(const char *argv0)
 	printf("\n");
 	printf("  Detach a program\n");
 	printf("  %s -d cg-path\n", argv0);
+	printf("\n");
+	printf("  Show inherited socket settings (mark, priority, and device)\n");
+	printf("  %s [-6]\n", argv0);
 	return EXIT_FAILURE;
 }
 
@@ -129,10 +196,11 @@ int main(int argc, char **argv)
 	__u32 idx = 0, mark = 0, prio = 0;
 	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
+	int family = PF_INET;
 	int do_attach = 1;
 	int rc;
 
-	while ((rc = getopt(argc, argv, "db:m:p:r")) != -1) {
+	while ((rc = getopt(argc, argv, "db:m:p:r6")) != -1) {
 		switch (rc) {
 		case 'd':
 			do_attach = 0;
@@ -156,13 +224,16 @@ int main(int argc, char **argv)
 		case 'r':
 			attach_flags |= BPF_F_RECURSIVE;
 			break;
+		case '6':
+			family = PF_INET6;
+			break;
 		default:
 			return usage(argv[0]);
 		}
 	}
 
 	if (optind == argc)
-		return usage(argv[0]);
+		return show_sockopts(family);
 
 	cgrp_path = argv[optind];
 	if (!cgrp_path) {
-- 
2.1.4


* [PATCH v2 net-next 7/8] samples/bpf: Add test case for nested socket options
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock3.sh | 162 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100755 samples/bpf/test_cgrp2_sock3.sh

diff --git a/samples/bpf/test_cgrp2_sock3.sh b/samples/bpf/test_cgrp2_sock3.sh
new file mode 100755
index 000000000000..9bfed035963f
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock3.sh
@@ -0,0 +1,162 @@
+#!/bin/sh
+
+# Verify socket options inherited by bpf programs attached
+# to a cgroup.
+
+CGRP_MNT="/tmp/cgroupv2-test_cgrp2_sock"
+
+################################################################################
+#
+print_result()
+{
+	printf "%-50s    [%4s]\n" "$1" "$2"
+}
+
+check_sock()
+{
+	out=$(test_cgrp2_sock)
+	echo $out | grep -q "$1"
+	if [ $? -ne 0 ]; then
+		print_result "IPv4: $2" "FAIL"
+		echo "    expected: $1"
+		echo "        have: $out"
+		rc=1
+	else
+		print_result "IPv4: $2" " OK "
+	fi
+}
+
+check_sock6()
+{
+	out=$(test_cgrp2_sock -6)
+	echo $out | grep -q "$1"
+	if [ $? -ne 0 ]; then
+		print_result "IPv6: $2" "FAIL"
+		echo "    expected: $1"
+		echo "        have: $out"
+		rc=1
+	else
+		print_result "IPv6: $2" " OK "
+	fi
+}
+
+################################################################################
+#
+setup()
+{
+	cleanup 2>/dev/null
+
+	mkdir -p ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to create cgroup hierarchy"
+
+	test_cgrp2_sock -p 123 ${CGRP_MNT}/cgrp_sock_test/prio
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set priority"
+
+	test_cgrp2_sock -m 666 -r ${CGRP_MNT}/cgrp_sock_test/prio/mark
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set mark"
+
+	test_cgrp2_sock -b cgrp2_sock -r ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set device"
+}
+
+cleanup()
+{
+	echo $$ >> ${CGRP_MNT}/cgroup.procs
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio/mark
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio
+	rmdir ${CGRP_MNT}/cgrp_sock_test
+}
+
+cleanup_and_exit()
+{
+	local rc=$1
+	local msg="$2"
+
+	[ -n "$msg" ] && echo "ERROR: $msg"
+
+	ip li del cgrp2_sock
+	umount ${CGRP_MNT}
+
+	exit $rc
+}
+
+################################################################################
+#
+
+run_tests()
+{
+	# set pid into first cgroup. socket should show it
+	# has a priority but not a mark or device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/cgroup.procs
+	check_sock "dev , mark 0, priority 123" "Priority only"
+
+	# set pid into second group. socket should show it
+	# has a priority and mark but not a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/cgroup.procs
+	check_sock "dev , mark 666, priority 123" "Priority + mark"
+
+	# set pid into inner group. socket should show it
+	# has a priority, mark and a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev/cgroup.procs
+	check_sock "dev cgrp2_sock, mark 666, priority 123" "Priority + mark + dev"
+
+	echo
+
+	# set pid into first cgroup. socket should show it
+	# has a priority but not a mark or device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/cgroup.procs
+	check_sock6 "dev , mark 0, priority 123" "Priority only"
+
+	# set pid into second group. socket should show it
+	# has a priority and mark but not a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/cgroup.procs
+	check_sock6 "dev , mark 666, priority 123" "Priority + mark"
+
+	# set pid into inner group. socket should show it
+	# has a priority, mark and a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev/cgroup.procs
+	check_sock6 "dev cgrp2_sock, mark 666, priority 123" "Priority + mark + dev"
+}
+
+################################################################################
+# verify expected invalid setups are invalid
+
+invalid_setup()
+{
+	echo
+
+	mkdir -p ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to create cgroup hierarchy"
+
+	test_cgrp2_sock -p 123 -r ${CGRP_MNT}/cgrp_sock_test/prio
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set priority"
+
+	# recursive - followed by non-recursive is not allowed
+	test_cgrp2_sock -m 666 ${CGRP_MNT}/cgrp_sock_test/prio/mark >/dev/null 2>&1
+	if [ $? -eq 0 ]; then
+		print_result "recursive setting followed by non-recursive" "FAIL"
+	else
+		print_result "recursive setting followed by non-recursive" " OK "
+	fi
+}
+
+################################################################################
+# main
+
+rc=0
+
+ip li add cgrp2_sock type dummy 2>/dev/null
+
+set -e
+mkdir -p ${CGRP_MNT}
+mount -t cgroup2 none ${CGRP_MNT}
+set +e
+
+setup
+run_tests
+cleanup
+
+invalid_setup
+
+cleanup_and_exit $rc
-- 
2.1.4

* [PATCH v2 net-next 8/8] samples/bpf: Update cgroup socket examples to use uid gid helper
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/sock_flags_kern.c |  5 +++++
 samples/bpf/test_cgrp2_sock.c | 12 +++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)
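
For readers unfamiliar with the helper: bpf_get_current_uid_gid() returns
both ids packed in one 64-bit value, with the uid in the low 32 bits and
the gid in the high 32 bits. The plain C sketch below only illustrates that
split and the mark selection as described by the comments in prog_mark[];
it is not BPF code, and unpack_uid(), unpack_gid(), select_mark() and
example_usage() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the packed value hold the uid. */
static uint32_t unpack_uid(uint64_t gid_uid)
{
	return (uint32_t)(gid_uid & 0xffffffff);
}

/* High 32 bits of the packed value hold the gid. */
static uint32_t unpack_gid(uint64_t gid_uid)
{
	return (uint32_t)(gid_uid >> 32);
}

/* Hypothetical helper: the intended choice of mark per the patch comments.
 * uid 0 keeps the mark given on the command line, any other uid is used
 * as the mark itself.
 */
static uint32_t select_mark(uint64_t gid_uid, uint32_t given_mark)
{
	uint32_t uid = unpack_uid(gid_uid);

	return uid ? uid : given_mark;
}

/* Illustration only: gid 100 with uid 1000, then gid 100 with uid 0. */
void example_usage(void)
{
	uint64_t user = ((uint64_t)100 << 32) | 1000;
	uint64_t root = (uint64_t)100 << 32;

	assert(unpack_uid(user) == 1000);
	assert(unpack_gid(user) == 100);
	assert(select_mark(user, 666) == 1000);
	assert(select_mark(root, 666) == 666);
}
```

With this behaviour, the "mark 666" values expected by the test script would
only be seen when the tests run as uid 0.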

diff --git a/samples/bpf/sock_flags_kern.c b/samples/bpf/sock_flags_kern.c
index 533dd11a6baa..05dcdf8a4baa 100644
--- a/samples/bpf/sock_flags_kern.c
+++ b/samples/bpf/sock_flags_kern.c
@@ -9,8 +9,13 @@ SEC("cgroup/sock1")
 int bpf_prog1(struct bpf_sock *sk)
 {
 	char fmt[] = "socket: family %d type %d protocol %d\n";
+	char fmt2[] = "socket: uid %u gid %u\n";
+	__u64 gid_uid = bpf_get_current_uid_gid();
+	__u32 uid = gid_uid & 0xffffffff;
+	__u32 gid = gid_uid >> 32;
 
 	bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
+	bpf_trace_printk(fmt2, sizeof(fmt2), uid, gid);
 
 	/* block PF_INET6, SOCK_RAW, IPPROTO_ICMPV6 sockets
 	 * ie., make ping6 fail
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index eabf530a5223..e9eeaaf52219 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -46,8 +46,18 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 
 	/* set mark on socket */
 	struct bpf_insn prog_mark[] = {
-		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		/* get uid of process */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_get_current_uid_gid),
+		BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xffffffff),
+
+		/* if uid is 0, use given mark, else use the uid as the mark */
+		BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
 		BPF_MOV64_IMM(BPF_REG_3, mark),
+
+		/* set the mark on the new socket */
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
 		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, mark)),
 		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, mark)),
 	};
-- 
2.1.4

* [PATCH 0/4] net: stmmac: revert the EMAC bindings
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel

Hi,

The bindings of the stmmac glue for the new Allwinner EMAC controller
are still controversial and being discussed, even though they've been
merged in 4.13.

In order not to introduce any binding we do not really want to commit
to in a stable release, especially since that would mean we would have
to support both the final and the old bindings, let's revert them.

We will reintroduce them in due time, once the discussion has settled
down.

The first three patches should go through the arm-soc tree, the last
one through the net tree. All of them must be treated as fixes.

Thanks!
Maxime

Maxime Ripard (4):
  dt-bindings: net: Revert sun8i dwmac binding
  arm64: dts: allwinner: Revert EMAC changes
  arm: dts: sunxi: Revert EMAC changes
  net: stmmac: sun8i: Remove the compatibles

 .../devicetree/bindings/net/dwmac-sun8i.txt        | 84 ----------------------
 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts  |  9 ---
 arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts    | 19 -----
 arch/arm/boot/dts/sun8i-h3-beelink-x2.dts          |  8 ---
 arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts          |  7 --
 arch/arm/boot/dts/sun8i-h3-orangepi-2.dts          |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-one.dts        |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts    |  5 --
 arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts         |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts       | 22 ------
 arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts     | 16 -----
 arch/arm/boot/dts/sunxi-h3-h5.dtsi                 | 26 -------
 .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts | 17 -----
 .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts  | 15 ----
 .../arm64/boot/dts/allwinner/sun50i-a64-pine64.dts | 18 -----
 .../dts/allwinner/sun50i-a64-sopine-baseboard.dts  | 17 -----
 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi      | 20 ------
 .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts   | 17 -----
 .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts  | 17 -----
 .../dts/allwinner/sun50i-h5-orangepi-prime.dts     | 17 -----
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |  8 ---
 21 files changed, 366 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt

-- 
2.13.5

* [PATCH 1/4] dt-bindings: net: Revert sun8i dwmac binding
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel
In-Reply-To: <20170825191217.10278-1-maxime.ripard@free-electrons.com>

This binding still doesn't please everyone, and we're getting far too
close to the release to allow it to reach a stable version.

Let's remove it until the discussion settles down.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 .../devicetree/bindings/net/dwmac-sun8i.txt        | 84 ----------------------
 1 file changed, 84 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt
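
For reference while the discussion continues, the constraints the removed
text puts on the two optional delay properties (TX 0-700 ps, RX 0-3100 ps,
both in multiples of 100) can be sketched as a small check. This is only an
illustration of the wording being removed, not driver code; the macro names,
delay_ps_valid() and example_usage() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the binding text being removed (hypothetical names). */
#define TX_DELAY_MAX_PS	700
#define RX_DELAY_MAX_PS	3100
#define DELAY_STEP_PS	100

/* A delay is acceptable if it is within range and a multiple of 100 ps. */
static bool delay_ps_valid(uint32_t delay_ps, uint32_t max_ps)
{
	return delay_ps <= max_ps && delay_ps % DELAY_STEP_PS == 0;
}

/* Illustration only: a few values checked against both limits. */
void example_usage(void)
{
	assert(delay_ps_valid(0, TX_DELAY_MAX_PS));
	assert(delay_ps_valid(700, TX_DELAY_MAX_PS));
	assert(!delay_ps_valid(800, TX_DELAY_MAX_PS));
	assert(delay_ps_valid(3100, RX_DELAY_MAX_PS));
	assert(!delay_ps_valid(150, RX_DELAY_MAX_PS));
}
```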

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
deleted file mode 100644
index 725f3b187886..000000000000
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ /dev/null
@@ -1,84 +0,0 @@
-* Allwinner sun8i GMAC ethernet controller
-
-This device is a platform glue layer for stmmac.
-Please see stmmac.txt for the other unchanged properties.
-
-Required properties:
-- compatible: should be one of the following string:
-		"allwinner,sun8i-a83t-emac"
-		"allwinner,sun8i-h3-emac"
-		"allwinner,sun8i-v3s-emac"
-		"allwinner,sun50i-a64-emac"
-- reg: address and length of the register for the device.
-- interrupts: interrupt for the device
-- interrupt-names: should be "macirq"
-- clocks: A phandle to the reference clock for this device
-- clock-names: should be "stmmaceth"
-- resets: A phandle to the reset control for this device
-- reset-names: should be "stmmaceth"
-- phy-mode: See ethernet.txt
-- phy-handle: See ethernet.txt
-- #address-cells: shall be 1
-- #size-cells: shall be 0
-- syscon: A phandle to the syscon of the SoC with one of the following
- compatible string:
-  - allwinner,sun8i-h3-system-controller
-  - allwinner,sun8i-v3s-system-controller
-  - allwinner,sun50i-a64-system-controller
-  - allwinner,sun8i-a83t-system-controller
-
-Optional properties:
-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
-
-Optional properties for the following compatibles:
-  - "allwinner,sun8i-h3-emac",
-  - "allwinner,sun8i-v3s-emac":
-- allwinner,leds-active-low: EPHY LEDs are active low
-
-Required child node of emac:
-- mdio bus node: should be named mdio
-
-Required properties of the mdio node:
-- #address-cells: shall be 1
-- #size-cells: shall be 0
-
-The device node referenced by "phy" or "phy-handle" should be a child node
-of the mdio node. See phy.txt for the generic PHY bindings.
-
-Required properties of the phy node with the following compatibles:
-  - "allwinner,sun8i-h3-emac",
-  - "allwinner,sun8i-v3s-emac":
-- clocks: a phandle to the reference clock for the EPHY
-- resets: a phandle to the reset control for the EPHY
-
-Example:
-
-emac: ethernet@1c0b000 {
-	compatible = "allwinner,sun8i-h3-emac";
-	syscon = <&syscon>;
-	reg = <0x01c0b000 0x104>;
-	interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
-	interrupt-names = "macirq";
-	resets = <&ccu RST_BUS_EMAC>;
-	reset-names = "stmmaceth";
-	clocks = <&ccu CLK_BUS_EMAC>;
-	clock-names = "stmmaceth";
-	#address-cells = <1>;
-	#size-cells = <0>;
-
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	mdio: mdio {
-		#address-cells = <1>;
-		#size-cells = <0>;
-		int_mii_phy: ethernet-phy@1 {
-			reg = <1>;
-			clocks = <&ccu CLK_BUS_EPHY>;
-			resets = <&ccu RST_BUS_EPHY>;
-		};
-	};
-};
-- 
2.13.5
