Netdev List

* Re: [PATCH v5 1/4] ipv4: Namespaceify tcp_fastopen knob
From: David Miller @ 2017-09-28 17:47 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <1506483343-11544-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Wed, 27 Sep 2017 11:35:40 +0800

> Applications in different namespaces might need to enable the TCP Fast
> Open feature independently of the host.
> 
> This patch series continues making more of the TCP Fast Open related
> sysctl knobs be per net-namespace.
> 
> Reported-by: Luca BRUNO <lucab@debian.org>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied.

* Re: [PATCH net-next RFC 3/9] net: dsa: mv88e6xxx: add support for GPIO configuration
From: Florian Fainelli @ 2017-09-28 17:45 UTC (permalink / raw)
  To: Brandon Streiff, netdev
  Cc: linux-kernel, David S. Miller, Andrew Lunn, Vivien Didelot,
	Richard Cochran, Erik Hons
In-Reply-To: <1506612341-18061-4-git-send-email-brandon.streiff@ni.com>

On 09/28/2017 08:25 AM, Brandon Streiff wrote:
> The Scratch/Misc register is a windowed interface that provides access
> to the GPIO configuration. Provide a new method for configuration of
> GPIO functions.
> 
> Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
> ---

> +/* Offset 0x1A: Scratch and Misc. Register */
> +static int mv88e6xxx_g2_scratch_reg_read(struct mv88e6xxx_chip *chip,
> +					 int reg, u8 *data)
> +{
> +	int err;
> +	u16 value;
> +
> +	err = mv88e6xxx_g2_write(chip, MV88E6XXX_G2_SCRATCH_MISC_MISC,
> +				 reg << 8);
> +	if (err)
> +		return err;
> +
> +	err = mv88e6xxx_g2_read(chip, MV88E6XXX_G2_SCRATCH_MISC_MISC, &value);
> +	if (err)
> +		return err;
> +
> +	*data = (value & MV88E6XXX_G2_SCRATCH_MISC_DATA_MASK);
> +
> +	return 0;
> +}

With the write and the read each acquiring and then immediately releasing
the lock, is there not room for this sequence to be interrupted in the
middle and end up returning inconsistent reads?
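
To make the concern concrete, here is a minimal user-space sketch, not
driver code. It assumes a windowed register where a write latches an index
and a later read returns the data behind it. All names (win_dev,
win_read_indexed, win_interleaved_read) are hypothetical, and lock_held is
only a stand-in for a real mutex. Whether callers in the driver already hold
a lock across both accesses is not established by the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a windowed register: the upper byte of a write
 * selects an index, and a later read returns that index's data in the
 * low byte. None of these names come from the mv88e6xxx driver.
 */
struct win_dev {
	uint8_t selected;	/* index latched by the last write */
	uint8_t data[4];	/* storage behind the window */
	int lock_held;		/* stand-in for a real mutex */
};

static void win_write(struct win_dev *d, uint16_t val)
{
	d->selected = (uint8_t)(val >> 8) & 0x3;
}

static uint16_t win_read(const struct win_dev *d)
{
	return (uint16_t)((d->selected << 8) | d->data[d->selected]);
}

/* Select-then-read as one critical section, so no other access can
 * change the selected index between the two steps.
 */
static uint8_t win_read_indexed(struct win_dev *d, uint8_t idx)
{
	uint8_t out;

	assert(!d->lock_held);	/* model: the lock must be free */
	d->lock_held = 1;
	win_write(d, (uint16_t)(idx << 8));
	out = (uint8_t)(win_read(d) & 0xff);
	d->lock_held = 0;
	return out;
}

/* Deterministic replay of the problematic interleaving: caller A selects
 * idx_a, caller B selects idx_b, and only then does A perform its read.
 */
static uint8_t win_interleaved_read(struct win_dev *d, uint8_t idx_a,
				    uint8_t idx_b)
{
	win_write(d, (uint16_t)(idx_a << 8));
	win_write(d, (uint16_t)(idx_b << 8));
	return (uint8_t)(win_read(d) & 0xff);
}
```

In the replayed interleaving, caller A gets the data for B's index, which is
the inconsistent read asked about above.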

> +
> +static int mv88e6xxx_g2_scratch_reg_write(struct mv88e6xxx_chip *chip,
> +					  int reg, u8 data)
> +{
> +	u16 value = (reg << 8) | data;
> +
> +	return mv88e6xxx_g2_update(chip, MV88E6XXX_G2_SCRATCH_MISC_MISC, value);
> +}
> +
> +/* Configures the specified pin for the specified function. This function
> + * does not unset other pins configured for the same function. If multiple
> + * pins are configured for the same function, the lower-index pin gets
> + * that function and the higher-index pin goes back to being GPIO.
> + */
> +int mv88e6xxx_g2_set_gpio_config(struct mv88e6xxx_chip *chip, int pin,
> +				 int func, int dir)
> +{
> +	int mode_reg = MV88E6XXX_G2_SCRATCH_GPIO_MODE(pin);
> +	int dir_reg = MV88E6XXX_G2_SCRATCH_GPIO_DIR(pin);
> +	int err;
> +	u8 val;
> +
> +	if (pin < 0 || pin >= mv88e6xxx_num_gpio(chip))
> +		return -ERANGE;
> +
> +	/* Set function first */
> +	err = mv88e6xxx_g2_scratch_reg_read(chip, mode_reg, &val);
> +	if (err)
> +		return err;
> +
> +	/* Zero bits in the field for this GPIO and OR in new config */
> +	val &= ~MV88E6XXX_G2_SCRATCH_GPIO_MODE_MASK(pin);
> +	val |= (func << MV88E6XXX_G2_SCRATCH_GPIO_MODE_OFFSET(pin));
> +
> +	err = mv88e6xxx_g2_scratch_reg_write(chip, mode_reg, val);
> +	if (err)
> +		return err;
> +
> +	/* Set direction */
> +	err = mv88e6xxx_g2_scratch_reg_read(chip, dir_reg, &val);
> +	if (err)
> +		return err;
> +
> +	/* Zero bits in the field for this GPIO and OR in new config */
> +	val &= ~MV88E6XXX_G2_SCRATCH_GPIO_DIR_MASK(pin);
> +	val |= (dir << MV88E6XXX_G2_SCRATCH_GPIO_DIR_OFFSET(pin));
> +
> +	return mv88e6xxx_g2_scratch_reg_write(chip, dir_reg, val);
> +}

Would there be any value in implementing a proper gpiochip structure
here such that other pieces of SW can see this GPIO controller as a
provider and you can reference it from e.g: Device Tree using GPIO
descriptors?
-- 
Florian

* Re: [PATCH net-next RFC 6/9] net: dsa: forward timestamping callbacks to switch drivers
From: Florian Fainelli @ 2017-09-28 17:40 UTC (permalink / raw)
  To: Brandon Streiff, netdev
  Cc: linux-kernel, David S. Miller, Andrew Lunn, Vivien Didelot,
	Richard Cochran, Erik Hons
In-Reply-To: <1506612341-18061-7-git-send-email-brandon.streiff@ni.com>

On 09/28/2017 08:25 AM, Brandon Streiff wrote:
> Forward the rx/tx timestamp machinery from the dsa infrastructure to the
> switch driver.
> 
> On the rx side, defer delivery of skbs until we have an rx timestamp.
> This mimics the behavior of skb_defer_rx_timestamp. The implementation
> does have to thread through the tagging protocol handlers because
> that is where we know which switch and port the skb goes to.
> 
> On the tx side, identify PTP packets, clone them, and pass them to the
> underlying switch driver before we transmit. This mimics the behavior
> of skb_tx_timestamp.
> 
> Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
> ---
>  include/net/dsa.h     | 13 +++++++++++--
>  net/dsa/dsa.c         | 39 ++++++++++++++++++++++++++++++++++++++-
>  net/dsa/slave.c       | 25 +++++++++++++++++++++++++
>  net/dsa/tag_brcm.c    |  6 +++++-
>  net/dsa/tag_dsa.c     |  6 +++++-
>  net/dsa/tag_edsa.c    |  6 +++++-
>  net/dsa/tag_ksz.c     |  6 +++++-
>  net/dsa/tag_lan9303.c |  6 +++++-
>  net/dsa/tag_mtk.c     |  6 +++++-
>  net/dsa/tag_qca.c     |  6 +++++-
>  net/dsa/tag_trailer.c |  6 +++++-
>  11 files changed, 114 insertions(+), 11 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 1163af1..4daf7f7 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -101,11 +101,14 @@ struct dsa_platform_data {
>  };
>  
>  struct packet_type;
> +struct dsa_switch;
>  
>  struct dsa_device_ops {
>  	struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
>  	struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
> -			       struct packet_type *pt);
> +			       struct packet_type *pt,
> +			       struct dsa_switch **src_dev,
> +			       int *src_port);
>  	int (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
>  			    int *offset);
>  };
> @@ -134,7 +137,9 @@ struct dsa_switch_tree {
>  	/* Copy of tag_ops->rcv for faster access in hot path */
>  	struct sk_buff *	(*rcv)(struct sk_buff *skb,
>  				       struct net_device *dev,
> -				       struct packet_type *pt);
> +				       struct packet_type *pt,
> +				       struct dsa_switch **src_dev,
> +				       int *src_port);
>  
>  	/*
>  	 * The switch port to which the CPU is attached.
> @@ -449,6 +454,10 @@ struct dsa_switch_ops {
>  				     struct ifreq *ifr);
>  	int	(*port_hwtstamp_set)(struct dsa_switch *ds, int port,
>  				     struct ifreq *ifr);
> +	void	(*port_txtstamp)(struct dsa_switch *ds, int port,
> +				 struct sk_buff *clone, unsigned int type);
> +	bool	(*port_rxtstamp)(struct dsa_switch *ds, int port,
> +				 struct sk_buff *skb, unsigned int type);
>  };
>  
>  struct dsa_switch_driver {
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 81c852e..42e7286 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -22,6 +22,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/sysfs.h>
>  #include <linux/phy_fixed.h>
> +#include <linux/ptp_classify.h>
>  #include <linux/gpio/consumer.h>
>  #include <linux/etherdevice.h>
>  
> @@ -157,6 +158,37 @@ struct net_device *dsa_dev_to_net_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dsa_dev_to_net_device);
>  
> +/* Determine if we should defer delivery of skb until we have a rx timestamp.
> + *
> + * Called from dsa_switch_rcv. For now, this will only work if tagging is
> + * enabled on the switch. Normally the MAC driver would retrieve the hardware
> + * timestamp when it reads the packet out of the hardware. However in a DSA
> + * switch, the DSA driver owning the interface to which the packet is
> + * delivered is never notified unless we do so here.
> + */
> +static bool dsa_skb_defer_rx_timestamp(struct dsa_switch *ds, int port,
> +				       struct sk_buff *skb)

You should not need the port information here because it's already
implied from skb->dev which points to the DSA slave network device, see
below.

> +{
> +	unsigned int type;
> +
> +	if (skb_headroom(skb) < ETH_HLEN)
> +		return false;

Are you positive this is necessary? Because we called dst->rcv(), we have
called eth_type_trans(), which already made sure of that.

> +
> +	__skb_push(skb, ETH_HLEN);
> +
> +	type = ptp_classify_raw(skb);
> +
> +	__skb_pull(skb, ETH_HLEN);
> +
> +	if (type == PTP_CLASS_NONE)
> +		return false;
> +
> +	if (likely(ds->ops->port_rxtstamp))
> +		return ds->ops->port_rxtstamp(ds, port, skb, type);
> +
> +	return false;
> +}

Can we also have a fast-path bypass in case time stamping is not
supported by the switch, so we don't even have to try to classify this
packet only to realize later that we don't have a port_rxtstamp()
operation? You could either gate this with a compile-time option, or use
e.g. a static key or something like an early test?
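
A rough user-space sketch of the "early test" variant, assuming the only
requirement is to skip classification when the switch provides no rx
timestamp handler. The types and names (pkt, sw_ops, classify_is_ptp,
defer_rx_timestamp) are hypothetical stand-ins, not the DSA API, and
classify_calls exists only so the skipped work is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the skb and the switch ops table. */
struct pkt {
	bool is_ptp;
};

struct sw_ops {
	bool (*port_rxtstamp)(int port, struct pkt *p);
};

static int classify_calls;	/* counts how often we paid for classification */

static bool classify_is_ptp(const struct pkt *p)
{
	classify_calls++;
	return p->is_ptp;
}

static bool defer_rx_timestamp(const struct sw_ops *ops, int port,
			       struct pkt *p)
{
	assert(p != NULL);

	/* Early test: without a handler there is nothing to defer to,
	 * so do not classify the packet at all.
	 */
	if (!ops || !ops->port_rxtstamp)
		return false;

	if (!classify_is_ptp(p))
		return false;

	return ops->port_rxtstamp(port, p);
}

/* Example handler that always takes ownership of the packet. */
static bool always_defer(int port, struct pkt *p)
{
	(void)port;
	(void)p;
	return true;
}
```

A static key would move the same decision out of the per-packet branch; that
variant is not shown here.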

> +
>  static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  			  struct packet_type *pt, struct net_device *unused)
>  {
> @@ -164,6 +196,8 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  	struct sk_buff *nskb = NULL;
>  	struct pcpu_sw_netstats *s;
>  	struct dsa_slave_priv *p;
> +	struct dsa_switch *ds = NULL;
> +	int source_port;
>  
>  	if (unlikely(dst == NULL)) {
>  		kfree_skb(skb);
> @@ -174,7 +208,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (!skb)
>  		return 0;
>  
> -	nskb = dst->rcv(skb, dev, pt);
> +	nskb = dst->rcv(skb, dev, pt, &ds, &source_port);

I don't think this is necessary, what dst->rcv() does is actually
properly assign skb->dev to the correct dsa slave network device, which
has the information about the port number already in its private context.
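
As a sketch of what I mean, assuming the receive handler has already pointed
the packet at the right slave device: the port and switch can then be
recovered from the device's private data instead of being passed back through
extra out-parameters. The structures below are simplified, hypothetical
models (sw, port, slave_priv, netdev, pkt), not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical models of the objects involved. */
struct sw {
	int id;
};

struct port {
	struct sw *ds;		/* switch this port belongs to */
	int index;		/* port number on that switch */
};

struct slave_priv {
	struct port *dp;
};

struct netdev {
	void *priv;		/* private context of the slave device */
};

struct pkt {
	struct netdev *dev;	/* set by the tag receive handler */
};

/* Recover the source port (and through it the switch) from the packet's
 * device, with no additional parameters.
 */
static const struct port *pkt_source_port(const struct pkt *p)
{
	const struct slave_priv *priv;

	assert(p && p->dev && p->dev->priv);
	priv = p->dev->priv;
	return priv->dp;
}
```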

>  	if (!nskb) {
>  		kfree_skb(skb);
>  		return 0;
> @@ -192,6 +226,9 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  	s->rx_bytes += skb->len;
>  	u64_stats_update_end(&s->syncp);
>  
> +	if (dsa_skb_defer_rx_timestamp(ds, source_port, skb))
> +		return 0;

Can we just propagate an integer return value from
dsa_skb_defer_rx_timestamp()?

> +
>  	netif_receive_skb(skb);
>  
>  	return 0;
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 2cf6a83..a278335 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -22,6 +22,7 @@
>  #include <net/tc_act/tc_mirred.h>
>  #include <linux/if_bridge.h>
>  #include <linux/netpoll.h>
> +#include <linux/ptp_classify.h>
>  
>  #include "dsa_priv.h"
>  
> @@ -407,6 +408,25 @@ static inline netdev_tx_t dsa_slave_netpoll_send_skb(struct net_device *dev,
>  	return NETDEV_TX_OK;
>  }
>  
> +static void dsa_skb_tx_timestamp(struct dsa_slave_priv *p,
> +				 struct sk_buff *skb)
> +{
> +	struct dsa_switch *ds = p->dp->ds;
> +	struct sk_buff *clone;
> +	unsigned int type;
> +
> +	type = ptp_classify_raw(skb);
> +	if (type == PTP_CLASS_NONE)
> +		return;

If we don't have a port_txtstamp operation, is there even value in
classifying this packet?
-- 
Florian

* Re: [PATCH net-next RFC 0/9] net: dsa: PTP timestamping for mv88e6xxx
From: Andrew Lunn @ 2017-09-28 17:36 UTC (permalink / raw)
  To: Brandon Streiff
  Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
	Vivien Didelot, Richard Cochran, Erik Hons
In-Reply-To: <1506612341-18061-1-git-send-email-brandon.streiff@ni.com>

> - Patch #3: The GPIO config support is handled in a very simple manner.
>   I suspect a longer term goal would be to use pinctrl here.

I assume ptp already has the core code to use pinctrl and Linux
standard GPIOs? What does the device tree binding look like? How do
you specify the GPIOs to use?

What we want to avoid is defining an ABI now, otherwise it is going to
be hard to swap to pinctrl later.

> - Patch #6: the dsa_switch pointer and port index is plumbed from
>   dsa_device_ops::rcv so that we can call the correct port_rxtstamp
>   method. This involved instrumenting all of the *_tag_rcv functions in
>   a way that's kind of a kludge and that I'm not terribly happy with.

Yes, this is ugly. I will see if I can find a better way to do
this.

      Andrew

* Re: [PATCH v3 net-next 00/10] Add support for DCB feature in hns3 driver
From: David Miller @ 2017-09-28 17:35 UTC (permalink / raw)
  To: linyunsheng
  Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, gabriele.paoloni,
	john.garry, linuxarm, salil.mehta, lipeng321, netdev,
	linux-kernel
In-Reply-To: <1506476732-128130-1-git-send-email-linyunsheng@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 27 Sep 2017 09:45:22 +0800

> The patchset contains some enhancement related to DCB before
> adding support for DCB feature.

Series applied, thanks.

* Re: [PATCH net] net: Set sk_prot_creator when cloning sockets to the right proto
From: David Miller @ 2017-09-28 17:34 UTC (permalink / raw)
  To: cpaasch; +Cc: netdev
In-Reply-To: <20170927003850.23731-1-cpaasch@apple.com>

From: Christoph Paasch <cpaasch@apple.com>
Date: Tue, 26 Sep 2017 17:38:50 -0700

> sk->sk_prot and sk->sk_prot_creator can differ when the app uses
> IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
> Which is why sk_prot_creator is there to make sure that sk_prot_free()
> does the kmem_cache_free() on the right kmem_cache slab.
> 
> Now, if such a socket gets transformed back to a listening socket (using
> connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
> sk_clone_lock() when a new connection comes in. But sk_prot_creator will
> still point to the IPv6 kmem_cache (as everything got copied in
> sk_clone_lock()). When freeing, we will thus put this
> memory back into the IPv6 kmem_cache although it was allocated in the
> IPv4 cache. I have seen memory corruption happening because of this.
> 
> With slub-debugging and MEMCG_KMEM enabled this gives the warning
> 	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
> 
> A C-program to trigger this:
 ...
> As far as I can see, this bug has been there since the beginning of the
> git-days.
> 
> Signed-off-by: Christoph Paasch <cpaasch@apple.com>

Applied and queued up for -stable, thanks.

* [patch net-next 7/7] mlxsw: spectrum: mr: Support trap-and-forward routes
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

Add support for the trap-and-forward route action in the multicast routing
offloading logic. A route will be set to the trap-and-forward action if one
(or more) of its output interfaces is not offload-able, i.e. does not have a
valid Spectrum RIF.

This way, a route with a mixed output VIF list, which contains both
offload-able and un-offload-able devices, can go through partial offloading
in hardware, and the rest will be done in the kernel ipmr module.
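
For illustration only, the decision described above can be modelled in user
space as follows. This is a simplified sketch that covers just the checks
visible in this patch's hunk (no valid eVIF means trap, any existing eVIF
without a RIF means trap-and-forward, otherwise forward). The struct and
function names are hypothetical and are not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum route_action {
	ACTION_FORWARD,
	ACTION_TRAP,
	ACTION_TRAP_AND_FORWARD,
};

/* Hypothetical, flattened view of an egress VIF. */
struct vif {
	bool regular;	/* not a pimreg or tunnel VIF */
	bool has_dev;	/* VIF is bound to a netdevice */
	bool has_rif;	/* netdevice has a hardware router interface */
};

static bool vif_valid(const struct vif *v)
{
	return v->regular && v->has_dev && v->has_rif;
}

static enum route_action route_action(const struct vif *evifs, size_t n)
{
	size_t i, valid = 0;

	assert(evifs != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (vif_valid(&evifs[i]))
			valid++;

	/* Nothing the hardware can forward to: trap to the CPU. */
	if (valid == 0)
		return ACTION_TRAP;

	/* Some egress device exists but has no RIF: let hardware forward
	 * what it can and also hand the packet to software.
	 */
	for (i = 0; i < n; i++)
		if (evifs[i].has_dev && !evifs[i].has_rif)
			return ACTION_TRAP_AND_FORWARD;

	return ACTION_FORWARD;
}
```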

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
index 0912025..4c0848e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
@@ -114,9 +114,9 @@ static bool mlxsw_sp_mr_vif_valid(const struct mlxsw_sp_mr_vif *vif)
 	return mlxsw_sp_mr_vif_regular(vif) && vif->dev && vif->rif;
 }
 
-static bool mlxsw_sp_mr_vif_rif_invalid(const struct mlxsw_sp_mr_vif *vif)
+static bool mlxsw_sp_mr_vif_exists(const struct mlxsw_sp_mr_vif *vif)
 {
-	return mlxsw_sp_mr_vif_regular(vif) && vif->dev && !vif->rif;
+	return vif->dev;
 }
 
 static bool
@@ -182,14 +182,13 @@ mlxsw_sp_mr_route_action(const struct mlxsw_sp_mr_route *mr_route)
 	if (!mlxsw_sp_mr_route_valid_evifs_num(mr_route))
 		return MLXSW_SP_MR_ROUTE_ACTION_TRAP;
 
-	/* If either one of the eVIFs is not regular (VIF of type pimreg or
-	 * tunnel) or one of the VIFs has no matching RIF, trap the packet.
+	/* If one of the eVIFs has no RIF, trap-and-forward the route as there
+	 * is some more routing to do in software too.
 	 */
-	list_for_each_entry(rve, &mr_route->evif_list, route_node) {
-		if (!mlxsw_sp_mr_vif_regular(rve->mr_vif) ||
-		    mlxsw_sp_mr_vif_rif_invalid(rve->mr_vif))
-			return MLXSW_SP_MR_ROUTE_ACTION_TRAP;
-	}
+	list_for_each_entry(rve, &mr_route->evif_list, route_node)
+		if (mlxsw_sp_mr_vif_exists(rve->mr_vif) && !rve->mr_vif->rif)
+			return MLXSW_SP_MR_ROUTE_ACTION_TRAP_AND_FORWARD;
+
 	return MLXSW_SP_MR_ROUTE_ACTION_FORWARD;
 }
 
-- 
2.9.5

* [patch net-next 6/7] mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast route
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

In addition to the current multicast route actions, which include trap
route action and a forward route action, add the trap-and-forward multicast
route action, and implement it in the multicast routing hardware logic.

To implement that, add a trap-and-forward ACL action as the last action in
the route flexible action set. The trap used is the ACL2 trap, which marks
the packets with offload_mr_fwd_mark, to prevent the packet from being
forwarded again by the kernel.

Note: At this stage the offloading logic does not support trap-and-forward
multicast routes. This patch adds the support only in the hardware logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h      | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
index c851b23..5d26a12 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
@@ -42,6 +42,7 @@
 enum mlxsw_sp_mr_route_action {
 	MLXSW_SP_MR_ROUTE_ACTION_FORWARD,
 	MLXSW_SP_MR_ROUTE_ACTION_TRAP,
+	MLXSW_SP_MR_ROUTE_ACTION_TRAP_AND_FORWARD,
 };
 
 enum mlxsw_sp_mr_route_prio {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
index cda9e9a..3ffb28d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
@@ -253,6 +253,7 @@ mlxsw_sp_mr_tcam_afa_block_create(struct mlxsw_sp *mlxsw_sp,
 		if (err)
 			goto err;
 		break;
+	case MLXSW_SP_MR_ROUTE_ACTION_TRAP_AND_FORWARD:
 	case MLXSW_SP_MR_ROUTE_ACTION_FORWARD:
 		/* If we are about to append a multicast router action, commit
 		 * the erif_list.
@@ -266,6 +267,13 @@ mlxsw_sp_mr_tcam_afa_block_create(struct mlxsw_sp *mlxsw_sp,
 						      erif_list->kvdl_index);
 		if (err)
 			goto err;
+
+		if (route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP_AND_FORWARD) {
+			err = mlxsw_afa_block_append_trap_and_forward(afa_block,
+								      MLXSW_TRAP_ID_ACL2);
+			if (err)
+				goto err;
+		}
 		break;
 	default:
 		err = -EINVAL;
-- 
2.9.5

* [patch net-next 5/7] mlxsw: spectrum: Add trap for multicast trap-and-forward routes
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

When a multicast route is configured with trap-and-forward action, the
packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
the packets from being forwarded again by the kernel ipmr module.

Due to this, it is not possible to use the already existing multicast trap
(MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
the offload_mr_fwd_mark skb field in its handler.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 13 +++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/trap.h     |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e9b9443..3adf237 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3312,6 +3312,14 @@ static void mlxsw_sp_rx_listener_mark_func(struct sk_buff *skb, u8 local_port,
 	return mlxsw_sp_rx_listener_no_mark_func(skb, local_port, priv);
 }
 
+static void mlxsw_sp_rx_listener_mr_mark_func(struct sk_buff *skb,
+					      u8 local_port, void *priv)
+{
+	skb->offload_mr_fwd_mark = 1;
+	skb->offload_fwd_mark = 1;
+	return mlxsw_sp_rx_listener_no_mark_func(skb, local_port, priv);
+}
+
 static void mlxsw_sp_rx_listener_sample_func(struct sk_buff *skb, u8 local_port,
 					     void *priv)
 {
@@ -3355,6 +3363,10 @@ static void mlxsw_sp_rx_listener_sample_func(struct sk_buff *skb, u8 local_port,
 	MLXSW_RXL(mlxsw_sp_rx_listener_mark_func, _trap_id, _action,	\
 		_is_ctrl, SP_##_trap_group, DISCARD)
 
+#define MLXSW_SP_RXL_MR_MARK(_trap_id, _action, _trap_group, _is_ctrl)	\
+	MLXSW_RXL(mlxsw_sp_rx_listener_mr_mark_func, _trap_id, _action,	\
+		_is_ctrl, SP_##_trap_group, DISCARD)
+
 #define MLXSW_SP_EVENTL(_func, _trap_id)		\
 	MLXSW_EVENTL(_func, _trap_id, SP_EVENT)
 
@@ -3425,6 +3437,7 @@ static const struct mlxsw_listener mlxsw_sp_listener[] = {
 	MLXSW_SP_RXL_MARK(IPV4_PIM, TRAP_TO_CPU, PIM, false),
 	MLXSW_SP_RXL_MARK(RPF, TRAP_TO_CPU, RPF, false),
 	MLXSW_SP_RXL_MARK(ACL1, TRAP_TO_CPU, MULTICAST, false),
+	MLXSW_SP_RXL_MR_MARK(ACL2, TRAP_TO_CPU, MULTICAST, false),
 };
 
 static int mlxsw_sp_cpu_policers_set(struct mlxsw_core *mlxsw_core)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/trap.h b/drivers/net/ethernet/mellanox/mlxsw/trap.h
index a981035..ec6cef8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/trap.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/trap.h
@@ -93,6 +93,8 @@ enum {
 	MLXSW_TRAP_ID_ACL0 = 0x1C0,
 	/* Multicast trap used for routes with trap action */
 	MLXSW_TRAP_ID_ACL1 = 0x1C1,
+	/* Multicast trap used for routes with trap-and-forward action */
+	MLXSW_TRAP_ID_ACL2 = 0x1C2,
 
 	MLXSW_TRAP_ID_MAX = 0x1FF
 };
-- 
2.9.5

* [patch net-next 4/7] mlxsw: acl: Introduce ACL trap and forward action
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

Use trap/discard flex action to implement trap and forward. The action will
later be used for multicast routing, as the multicast routing mechanism is
done using ACL flexible actions in Spectrum hardware. Using that action, it
will be possible to implement a trap-and-forward route.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c | 17 +++++++++++++++++
 .../net/ethernet/mellanox/mlxsw/core_acl_flex_actions.h |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index bc55d0e..6a979a0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -676,6 +676,7 @@ enum mlxsw_afa_trapdisc_trap_action {
 MLXSW_ITEM32(afa, trapdisc, trap_action, 0x00, 24, 4);
 
 enum mlxsw_afa_trapdisc_forward_action {
+	MLXSW_AFA_TRAPDISC_FORWARD_ACTION_FORWARD = 1,
 	MLXSW_AFA_TRAPDISC_FORWARD_ACTION_DISCARD = 3,
 };
 
@@ -729,6 +730,22 @@ int mlxsw_afa_block_append_trap(struct mlxsw_afa_block *block, u16 trap_id)
 }
 EXPORT_SYMBOL(mlxsw_afa_block_append_trap);
 
+int mlxsw_afa_block_append_trap_and_forward(struct mlxsw_afa_block *block,
+					    u16 trap_id)
+{
+	char *act = mlxsw_afa_block_append_action(block,
+						  MLXSW_AFA_TRAPDISC_CODE,
+						  MLXSW_AFA_TRAPDISC_SIZE);
+
+	if (!act)
+		return -ENOBUFS;
+	mlxsw_afa_trapdisc_pack(act, MLXSW_AFA_TRAPDISC_TRAP_ACTION_TRAP,
+				MLXSW_AFA_TRAPDISC_FORWARD_ACTION_FORWARD,
+				trap_id);
+	return 0;
+}
+EXPORT_SYMBOL(mlxsw_afa_block_append_trap_and_forward);
+
 /* Forwarding Action
  * -----------------
  * Forwarding Action can be used to implement Policy Based Switching (PBS)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.h b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.h
index 06b0be4..a8d3314 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.h
@@ -61,6 +61,8 @@ int mlxsw_afa_block_continue(struct mlxsw_afa_block *block);
 int mlxsw_afa_block_jump(struct mlxsw_afa_block *block, u16 group_id);
 int mlxsw_afa_block_append_drop(struct mlxsw_afa_block *block);
 int mlxsw_afa_block_append_trap(struct mlxsw_afa_block *block, u16 trap_id);
+int mlxsw_afa_block_append_trap_and_forward(struct mlxsw_afa_block *block,
+					    u16 trap_id);
 int mlxsw_afa_block_append_fwd(struct mlxsw_afa_block *block,
 			       u8 local_port, bool in_port);
 int mlxsw_afa_block_append_vlan_modify(struct mlxsw_afa_block *block,
-- 
2.9.5

* [patch net-next 3/7] ipv4: ipmr: Don't forward packets already forwarded by hardware
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

Change the ipmr module to not forward packets if:
 - The packet is marked with the offload_mr_fwd_mark, and
 - Both input interface and output interface share the same parent ID.

This way, a packet can go through partial multicast forwarding in the
hardware, where it will be forwarded only to the devices that share the
same parent ID (AKA, reside inside the same hardware). The kernel will
forward the packet to all other interfaces.

To do this, add the ipmr_forward_offloaded helper, which, given an skb, an
ingress VIF and an egress VIF, returns whether the forwarding was offloaded
to hardware. ipmr_queue_xmit frees the skb and does not forward it if the
result is true.

All the forwarding path code compiles out when CONFIG_NET_SWITCHDEV is
not set.
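
The two conditions above can be exercised in isolation with a small
user-space model. This is a sketch under assumptions: the parent ID is
modelled as a length-prefixed byte array, and the names (parent_id, vif,
forward_offloaded) are hypothetical and only mirror the shape of the helper
in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PARENT_ID_MAX 32

/* Hypothetical model of a device parent ID. */
struct parent_id {
	unsigned char id[PARENT_ID_MAX];
	unsigned char len;
};

struct vif {
	bool parent_id_valid;
	struct parent_id parent;
};

static bool parent_id_same(const struct parent_id *a,
			   const struct parent_id *b)
{
	return a->len == b->len && memcmp(a->id, b->id, a->len) == 0;
}

/* Software skips forwarding only when the packet carries the multicast
 * forward mark and both VIFs sit behind the same hardware.
 */
static bool forward_offloaded(bool mr_fwd_mark, const struct vif *in,
			      const struct vif *out)
{
	assert(in && out);

	if (!mr_fwd_mark)
		return false;
	if (!in->parent_id_valid || !out->parent_id_valid)
		return false;
	return parent_id_same(&in->parent, &out->parent);
}
```

A VIF without a valid parent ID, such as a management NIC in this model,
always falls through to software forwarding.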

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/ipv4/ipmr.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 4566c54..deba569 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1857,10 +1857,33 @@ static inline int ipmr_forward_finish(struct net *net, struct sock *sk,
 	return dst_output(net, sk, skb);
 }
 
+#ifdef CONFIG_NET_SWITCHDEV
+static bool ipmr_forward_offloaded(struct sk_buff *skb, struct mr_table *mrt,
+				   int in_vifi, int out_vifi)
+{
+	struct vif_device *out_vif = &mrt->vif_table[out_vifi];
+	struct vif_device *in_vif = &mrt->vif_table[in_vifi];
+
+	if (!skb->offload_mr_fwd_mark)
+		return false;
+	if (!out_vif->dev_parent_id_valid || !in_vif->dev_parent_id_valid)
+		return false;
+	return netdev_phys_item_id_same(&out_vif->dev_parent_id,
+					&in_vif->dev_parent_id);
+}
+#else
+static bool ipmr_forward_offloaded(struct sk_buff *skb, struct mr_table *mrt,
+				   int in_vifi, int out_vifi)
+{
+	return false;
+}
+#endif
+
 /* Processing handlers for ipmr_forward */
 
 static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
-			    struct sk_buff *skb, struct mfc_cache *c, int vifi)
+			    int in_vifi, struct sk_buff *skb,
+			    struct mfc_cache *c, int vifi)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	struct vif_device *vif = &mrt->vif_table[vifi];
@@ -1881,6 +1904,9 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		goto out_free;
 	}
 
+	if (ipmr_forward_offloaded(skb, mrt, in_vifi, vifi))
+		goto out_free;
+
 	if (vif->flags & VIFF_TUNNEL) {
 		rt = ip_route_output_ports(net, &fl4, NULL,
 					   vif->remote, vif->local,
@@ -2058,8 +2084,8 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 
 				if (skb2)
-					ipmr_queue_xmit(net, mrt, skb2, cache,
-							psend);
+					ipmr_queue_xmit(net, mrt, true_vifi,
+							skb2, cache, psend);
 			}
 			psend = ct;
 		}
@@ -2070,9 +2096,10 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 			struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 
 			if (skb2)
-				ipmr_queue_xmit(net, mrt, skb2, cache, psend);
+				ipmr_queue_xmit(net, mrt, true_vifi, skb2,
+						cache, psend);
 		} else {
-			ipmr_queue_xmit(net, mrt, skb, cache, psend);
+			ipmr_queue_xmit(net, mrt, true_vifi, skb, cache, psend);
 			return;
 		}
 	}
-- 
2.9.5

* [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
used to allow partial offloading of MFC multicast routes.

Switchdev drivers can offload MFC multicast routes to the hardware by
registering to the FIB notification chain. When one of the route output
interfaces is not offload-able, i.e. has a different parent ID, the route
cannot be fully offloaded by the hardware. Examples of non-offload-able
devices are a management NIC, dummy device, pimreg device, etc.

A similar problem exists in the bridge module, as one bridge can hold
interfaces with different parent IDs. At the bridge, the problem is solved
by the offload_fwd_mark skb field.

Currently, when a route cannot go through full offload, the only solution
for a switchdev driver is not to offload it at all and let the packet go
through slow path.

Using the offload_mr_fwd_mark field, a driver can indicate that a packet
was already forwarded by hardware to all the devices with the same parent
ID as the input device. Further patches in this patch-set are going to
enhance ipmr to skip multicast forwarding to devices with the same parent
ID if a packet is marked with that field.

The reason why the already existing "offload_fwd_mark" bit cannot be used
is that a switchdev driver would want to make the distinction between a
packet that has already gone through L2 forwarding but did not go through
multicast forwarding, and a packet that has already gone through both L2
and multicast forwarding.

For example: when a packet is ingressing from a switchport enslaved to a
bridge, which is configured with multicast forwarding, the following
scenarios are possible:
 - The packet can be trapped to the CPU due to exception while multicast
   forwarding (for example, MTU error). In that case, it had already gone
   through L2 forwarding in the hardware, thus a switchdev driver would
   want to set the skb->offload_fwd_mark and not the
   skb->offload_mr_fwd_mark.
 - The packet can also be trapped due to a pimreg/dummy device used as one
   of the output interfaces. In that case, it can go through both L2 and
   (partial) multicast forwarding inside the hardware, thus a switchdev
   driver would want to set both the skb->offload_fwd_mark and
   skb->offload_mr_fwd_mark.
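
The two scenarios above can be sketched as a small decision function. This
is only an illustration under stated assumptions: struct pkt_marks stands in
for the two sk_buff bits, and enum trap_reason and marks_for_trap() are
hypothetical names, not part of this patch or of any driver.

```c
/* Hypothetical stand-in for the two sk_buff bits discussed above. */
struct pkt_marks {
	unsigned int offload_fwd_mark:1;    /* L2 forwarding done in hardware */
	unsigned int offload_mr_fwd_mark:1; /* multicast forwarding done in hardware */
};

/* Hypothetical trap reasons; a real driver would use its own trap IDs. */
enum trap_reason {
	TRAP_MR_EXCEPTION,   /* e.g. MTU error during multicast forwarding */
	TRAP_MR_PARTIAL_FWD, /* route has a non-offloadable output interface */
};

/* Decide which marks a driver would set for a packet trapped to the CPU. */
static struct pkt_marks marks_for_trap(enum trap_reason reason)
{
	/* In both scenarios the packet already went through L2 forwarding. */
	struct pkt_marks m = { .offload_fwd_mark = 1 };

	/* Only the partial-forward case also went through multicast forwarding. */
	if (reason == TRAP_MR_PARTIAL_FWD)
		m.offload_mr_fwd_mark = 1;

	return m;
}
```

Keeping the bits separate lets the software path tell "bridged in hardware
only" apart from "bridged and multicast-routed in hardware".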

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 19e64bf..ada8214 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -772,6 +772,7 @@ struct sk_buff {
 	__u8			remcsum_offload:1;
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
+	__u8			offload_mr_fwd_mark:1;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 2/7] ipv4: ipmr: Add the parent ID field to VIF struct
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-1-jiri@resnulli.us>

From: Yotam Gigi <yotamg@mellanox.com>

In order to allow the ipmr module to do partial multicast forwarding
according to the device parent ID, add the device parent ID field to the
VIF struct. This way, the forwarding path can use the parent ID field
without invoking switchdev calls, which require the RTNL lock.

When a new VIF is added, set the device parent ID field in it by invoking
the switchdev_port_attr_get call.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/mroute.h | 2 ++
 net/ipv4/ipmr.c        | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index b072a84..a46577f 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -57,6 +57,8 @@ static inline bool ipmr_rule_default(const struct fib_rule *rule)
 
 struct vif_device {
 	struct net_device 	*dev;			/* Device we are using */
+	struct netdev_phys_item_id dev_parent_id;	/* Device parent ID    */
+	bool		dev_parent_id_valid;
 	unsigned long	bytes_in,bytes_out;
 	unsigned long	pkt_in,pkt_out;		/* Statistics 			*/
 	unsigned long	rate_limit;		/* Traffic shaping (NI) 	*/
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 292a8e8..4566c54 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -67,6 +67,7 @@
 #include <net/fib_rules.h>
 #include <linux/netconf.h>
 #include <net/nexthop.h>
+#include <net/switchdev.h>
 
 struct ipmr_rule {
 	struct fib_rule		common;
@@ -868,6 +869,9 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 		   struct vifctl *vifc, int mrtsock)
 {
 	int vifi = vifc->vifc_vifi;
+	struct switchdev_attr attr = {
+		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
+	};
 	struct vif_device *v = &mrt->vif_table[vifi];
 	struct net_device *dev;
 	struct in_device *in_dev;
@@ -942,6 +946,11 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 
 	/* Fill in the VIF structures */
 
+	attr.orig_dev = dev;
+	if (!switchdev_port_attr_get(dev, &attr)) {
+		v->dev_parent_id_valid = true;
+		memcpy(v->dev_parent_id.id, attr.u.ppid.id, attr.u.ppid.id_len);
+	}
 	v->rate_limit = vifc->vifc_rate_limit;
 	v->local = vifc->vifc_lcl_addr.s_addr;
 	v->remote = vifc->vifc_rmt_addr.s_addr;
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 0/7] mlxsw: Add support for partial multicast route offload
From: Jiri Pirko @ 2017-09-28 17:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind

From: Jiri Pirko <jiri@mellanox.com>

Yotam says:

Previous patchset introduced support for offloading multicast MFC routes to
the Spectrum hardware. As described in that patchset, no partial offloading
is supported, i.e. if a route has one output interface which is not a valid
offloadable device (e.g. pimreg device, dummy device, management NIC), the
route is trapped to the CPU and the forwarding is done in slow-path.

Add support for partial offloading of multicast routes, by letting the
hardware forward the packet to all the in-hardware devices, while the
kernel ipmr module will continue forwarding to all other interfaces.

Similarly to the bridge, the kernel ipmr module will forward a marked
packet to an interface only if the interface has a different parent ID than
the packet's ingress interface.
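
A minimal sketch of that forwarding rule follows. It is not the code from
patch 3: struct parent_id, PPID_MAX_LEN, same_parent() and
sw_should_forward() are hypothetical names chosen for illustration, and the
sketch assumes an interface without a valid parent ID never matches.

```c
#include <stdbool.h>
#include <string.h>

#define PPID_MAX_LEN 32 /* assumed ID size for this sketch */

/* Hypothetical per-interface parent ID, cached when the VIF is added. */
struct parent_id {
	unsigned char id[PPID_MAX_LEN];
	unsigned char id_len;
	bool valid; /* false for devices with no switch parent */
};

/* Two interfaces share a parent only if both IDs are valid and identical. */
static bool same_parent(const struct parent_id *a, const struct parent_id *b)
{
	return a->valid && b->valid && a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}

/* Returns true when software still has to forward to the egress interface. */
static bool sw_should_forward(bool offload_mr_fwd_mark,
			      const struct parent_id *in,
			      const struct parent_id *out)
{
	/* Skip only marked packets whose egress shares the ingress parent. */
	return !(offload_mr_fwd_mark && same_parent(in, out));
}
```

Unmarked packets are always forwarded in software, so drivers that never set
the mark keep the existing behaviour.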

The first patch introduces the offload_mr_fwd_mark skb field, which can be
used by offloading drivers to indicate that a packet had already gone
through multicast forwarding in hardware, similarly to the offload_fwd_mark
field that indicates that a packet had already gone through L2 forwarding
in hardware.

Patches 2 and 3 change the ipmr module to not forward packets that had
already been forwarded by the hardware, i.e. packets that are marked with
offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the
egress VIF.

Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap
and forward routes, while marking the trapped packets with the
offload_mr_fwd_mark.

Yotam Gigi (7):
  skbuff: Add the offload_mr_fwd_mark field
  ipv4: ipmr: Add the parent ID field to VIF struct
  ipv4: ipmr: Don't forward packets already forwarded by hardware
  mlxsw: acl: Introduce ACL trap and forward action
  mlxsw: spectrum: Add trap for multicast trap-and-forward routes
  mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast route
  mlxsw: spectrum: mr: Support trap-and-forward routes

 .../mellanox/mlxsw/core_acl_flex_actions.c         | 17 ++++++++
 .../mellanox/mlxsw/core_acl_flex_actions.h         |  2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 13 ++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c  | 17 ++++----
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h  |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c |  8 ++++
 drivers/net/ethernet/mellanox/mlxsw/trap.h         |  2 +
 include/linux/mroute.h                             |  2 +
 include/linux/skbuff.h                             |  1 +
 net/ipv4/ipmr.c                                    | 46 +++++++++++++++++++---
 10 files changed, 95 insertions(+), 14 deletions(-)

-- 
2.9.5

^ permalink raw reply

* [iproute PATCH] ip-route: Fix for listing routes with RTAX_LOCK attribute
From: Phil Sutter @ 2017-09-28 17:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Thomas Haller, Hangbin Liu

This fixes a corner-case for routes with a certain metric locked to
zero:

| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0

Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.

Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.
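
The intended behaviour can be sketched in isolation as below. This is a
simplified illustration, not the iproute2 code: format_metric() is a
hypothetical helper, the attribute is modelled as a plain pointer that is
NULL when the kernel did not dump it, and the output format is an
approximation of what ip prints.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one metric into buf. Returns 0 if the metric is skipped, otherwise
 * the snprintf() result. attr is NULL when the attribute was not dumped.
 */
static int format_metric(char *buf, size_t len, const char *name, int idx,
			 const uint32_t *attr, uint32_t mxlock)
{
	int locked = (mxlock & (1U << idx)) != 0;
	uint32_t val = 0;

	/* Skip only if the metric is neither present nor locked. */
	if (attr == NULL && !locked)
		return 0;

	/* A locked but absent metric keeps the implicit value zero. */
	if (attr != NULL)
		val = *attr;

	return snprintf(buf, len, "%s%s %u", name, locked ? " lock" : "",
			(unsigned int)val);
}
```

With this rule, a route added with "window lock 0" is no longer
indistinguishable from one added with "window 0".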

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/iproute.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index a8733f45bf881..e81bc05ec16cb 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -574,10 +574,10 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		for (i = 2; i <= RTAX_MAX; i++) {
 			__u32 val = 0U;
 
-			if (mxrta[i] == NULL)
+			if (mxrta[i] == NULL && !(mxlock & (1 << i)))
 				continue;
 
-			if (i != RTAX_CC_ALGO)
+			if (mxrta[i] != NULL && i != RTAX_CC_ALGO)
 				val = rta_getattr_u32(mxrta[i]);
 
 			if (i == RTAX_HOPLIMIT && (int)val == -1)
-- 
2.13.1

^ permalink raw reply related

* Re: [PATCH net-next] libbpf: use map_flags when creating maps
From: Craig Gallek @ 2017-09-28 17:33 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S . Miller, Chonggang Li, netdev
In-Reply-To: <59CC201C.6090502@iogearbox.net>

On Wed, Sep 27, 2017 at 6:03 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 09/27/2017 06:29 PM, Alexei Starovoitov wrote:
>>
>> On 9/27/17 7:04 AM, Craig Gallek wrote:
>>>
>>> From: Craig Gallek <kraig@google.com>
>>>
>>> This extends struct bpf_map_def to include a flags field.  Note that
>>> this has the potential to break the validation logic in
>>> bpf_object__validate_maps and bpf_object__init_maps as they use
>>> sizeof(struct bpf_map_def) as a minimal allowable size of a map section.
>>> Any bpf program compiled with a smaller struct bpf_map_def will fail this
>>> check.
>>>
>>> I don't believe this will be an issue in practice as both compile-time
>>> definitions of struct bpf_map_def (in samples/bpf/bpf_load.h and
>>> tools/testing/selftests/bpf/bpf_helpers.h) have always been larger
>>> than this newly updated version in libbpf.h.
>>>
>>> Signed-off-by: Craig Gallek <kraig@google.com>
>>> ---
>>>  tools/lib/bpf/libbpf.c | 2 +-
>>>  tools/lib/bpf/libbpf.h | 1 +
>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>> index 35f6dfcdc565..6bea85f260a3 100644
>>> --- a/tools/lib/bpf/libbpf.c
>>> +++ b/tools/lib/bpf/libbpf.c
>>> @@ -874,7 +874,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>>>                        def->key_size,
>>>                        def->value_size,
>>>                        def->max_entries,
>>> -                      0);
>>> +                      def->map_flags);
>>>          if (*pfd < 0) {
>>>              size_t j;
>>>              int err = *pfd;
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index 7959086eb9c9..6e20003109e0 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -207,6 +207,7 @@ struct bpf_map_def {
>>>      unsigned int key_size;
>>>      unsigned int value_size;
>>>      unsigned int max_entries;
>>> +    unsigned int map_flags;
>>>  };
>>
>>
>> yes it will break loading of pre-compiled .o
>> Instead of breaking, let's fix the loader to do it the way
>> samples/bpf/bpf_load.c does.
>> See commit 156450d9d964 ("samples/bpf: make bpf_load.c code compatible
>> with ELF maps section changes")
>
>
> +1, iproute2 loader also does map spec fixup
>
> For libbpf it would be good also such that it reduces the diff
> further between the libbpf and bpf_load so that it allows move
> to libbpf for samples in future.

Fair enough, I'll try to get this to work more dynamically.  I did
notice that the fields of struct bpf_map_def in
selftests/.../bpf_helpers.h and iproute2's struct bpf_elf_map have
diverged. The flags field is the only thing missing from libbpf right
now (and they are at the same offset for both), so it won't be an
issue for this change, but it is going to make unifying all of these
things under libbpf not trivial at some point...
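
For reference, a rough sketch of what "more dynamically" could look like,
loosely modeled on the approach in samples/bpf/bpf_load.c rather than copied
from it. struct map_def and load_map_def() are hypothetical names, and the
rule that unknown trailing bytes must be zero is an assumption of this
sketch.

```c
#include <stddef.h>
#include <string.h>

/* Loader-side definition, including the new flags field. */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/*
 * Copy one map definition of elf_def_size bytes from the object's maps
 * section. Fields missing from a smaller (older) definition stay zero.
 * Trailing bytes of a larger (newer) definition are accepted only if they
 * are all zero. Returns 0 on success, -1 otherwise.
 */
static int load_map_def(struct map_def *dst, const void *src,
			size_t elf_def_size)
{
	const unsigned char *p = src;
	size_t n = elf_def_size < sizeof(*dst) ? elf_def_size : sizeof(*dst);
	size_t i;

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, n);

	/* Reject features this loader does not know about. */
	for (i = n; i < elf_def_size; i++)
		if (p[i] != 0)
			return -1;

	return 0;
}
```

This keeps pre-compiled objects with the four-field definition loadable
while still picking up map_flags when it is present.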

^ permalink raw reply

* Re: [PATCH] arp: make arp_hdr_len() return unsigned int
From: David Miller @ 2017-09-28 17:29 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20170926201228.GA31899@avx2>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 26 Sep 2017 23:12:28 +0300

> Negative ARP header lengths are not a thing.
> 
> Constify arguments while I'm at it.
> 
> Space savings:
> 
> 	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3)
> 	function                        old     new   delta
> 	arpt_do_table                  1163    1160      -3
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net v2] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: David Miller @ 2017-09-28 17:29 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev, linux-kernel, kernel, f.fainelli, andrew
In-Reply-To: <20170926185721.12187-1-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Tue, 26 Sep 2017 14:57:21 -0400

> mv88e6xxx_g2_irq_free locks the registers mutex, but not
> mv88e6xxx_g1_irq_free, which results in a stack trace from
> assert_reg_lock when unloading the mv88e6xxx module. Fix this.
> 
> Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] net-next/hinic: Fix a case of Tx Queue is Stopped forever
From: David Miller @ 2017-09-28 17:27 UTC (permalink / raw)
  To: aviad.krawczyk; +Cc: linux-kernel, netdev
In-Reply-To: <1506449493-83395-1-git-send-email-aviad.krawczyk@huawei.com>

From: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Date: Wed, 27 Sep 2017 02:11:33 +0800

> Fix the following scenario:
> 1. tx_free_poll is running on cpu X
> 2. xmit function is running on cpu Y and fails to get sq wqe
> 3. tx_free_poll frees wqes on cpu X and checks the queue is not stopped
> 4. xmit function stops the queue after failing to get sq wqe
> 5. The queue is stopped forever
> 
> Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net-next/hinic: Set Rxq irq to specific cpu for NUMA
From: David Miller @ 2017-09-28 17:27 UTC (permalink / raw)
  To: aviad.krawczyk; +Cc: linux-kernel, netdev
In-Reply-To: <1506448670-51213-1-git-send-email-aviad.krawczyk@huawei.com>

From: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Date: Wed, 27 Sep 2017 01:57:50 +0800

> Set Rxq irq to specific cpu for allocating and receiving the skb from
> the same node.
> 
> Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next RFC 5/9] net: dsa: forward hardware timestamping ioctls to switch driver
From: Florian Fainelli @ 2017-09-28 17:25 UTC (permalink / raw)
  To: Brandon Streiff, netdev
  Cc: linux-kernel, David S. Miller, Andrew Lunn, Vivien Didelot,
	Richard Cochran, Erik Hons
In-Reply-To: <1506612341-18061-6-git-send-email-brandon.streiff@ni.com>

On 09/28/2017 08:25 AM, Brandon Streiff wrote:
> This patch adds support to the dsa slave network device so that
> switch drivers can implement the SIOC[GS]HWTSTAMP ioctls and the
> ethtool timestamp-info interface.
> 
> Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
> ---

>  struct dsa_switch_driver {
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index bf8800d..2cf6a83 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -264,10 +264,34 @@ dsa_slave_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
>  
>  static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>  {
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->dp->ds;
> +	int port = p->dp->index;
> +
>  	if (!dev->phydev)
>  		return -ENODEV;
>  
> -	return phy_mii_ioctl(dev->phydev, ifr, cmd);
> +	switch (cmd) {
> +	case SIOCGMIIPHY:
> +	case SIOCGMIIREG:
> +	case SIOCSMIIREG:
> +		if (dev->phydev)
> +			return phy_mii_ioctl(dev->phydev, ifr, cmd);
> +		else
> +			return -EOPNOTSUPP;
> +	case SIOCGHWTSTAMP:
> +		if (ds->ops->port_hwtstamp_get)
> +			return ds->ops->port_hwtstamp_get(ds, port, ifr);
> +		else
> +			return -EOPNOTSUPP;
> +	case SIOCSHWTSTAMP:
> +		if (ds->ops->port_hwtstamp_set)
> +			return ds->ops->port_hwtstamp_set(ds, port, ifr);
> +		else
> +			return -EOPNOTSUPP;
> +	default:
> +		return -EOPNOTSUPP;
> +	}

This echoes back to Andrew's comments in patch 2, but we may have to
prefer PHY timestamping over MAC timestamping if both are available?
Richard, is that usually how the preference should be made?
--
Florian

^ permalink raw reply

* Re: [PATCH net] packet: only test po->has_vnet_hdr once in packet_snd
From: David Miller @ 2017-09-28 17:25 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <20170926162017.60750-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 26 Sep 2017 12:20:17 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Packet socket option po->has_vnet_hdr can be updated concurrently with
> other operations if no ring is attached.
> 
> Do not test the option twice in packet_snd, as the value may change in
> between calls. A race on setsockopt disable may cause a packet > mtu
> to be sent without having GSO options set.
> 
> Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] packet: in packet_do_bind, test fanout with bind_lock held
From: David Miller @ 2017-09-28 17:25 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <20170926161937.60597-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 26 Sep 2017 12:19:37 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Once a socket has po->fanout set, it remains a member of the group
> until it is destroyed. The prot_hook must be constant and identical
> across sockets in the group.
> 
> If fanout_add races with packet_do_bind between the test of po->fanout
> and taking the lock, the bind call may make type or dev inconsistent
> with that of the fanout group.
> 
> Hold po->bind_lock when testing po->fanout to avoid this race.
> 
> I had to introduce artificial delay (local_bh_enable) to actually
> observe the race.
> 
> Fixes: dc99f600698d ("packet: Add fanout support.")
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/2] bpf/verifier: disassembly improvements
From: David Miller @ 2017-09-28 17:24 UTC (permalink / raw)
  To: ecree; +Cc: netdev, daniel, alexei.starovoitov, ys114321
In-Reply-To: <52270348-67f1-4e7a-cd2f-9d611ae94064@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 26 Sep 2017 16:32:15 +0100

> Fix the output of print_bpf_insn() for ALU ops that don't look like
>  compound assignment (i.e. BPF_END and BPF_NEG).
> 
> Sample output for a short test program:
> 0: (b4) (u32) r0 = (u32) 0
> 1: (dc) r0 = be32 r0
> 2: (84) r0 = (u32) -r0
> 3: (95) exit
> processed 4 insns, stack depth 0

Series applied.

^ permalink raw reply

* Re: [PATCH 2/4] ravb: Add optional PHY reset during system resume
From: Florian Fainelli @ 2017-09-28 17:22 UTC (permalink / raw)
  To: Geert Uytterhoeven, David S . Miller, Simon Horman, Magnus Damm
  Cc: Sergei Shtylyov, Andrew Lunn, Niklas Söderlund, netdev,
	linux-renesas-soc, devicetree
In-Reply-To: <1506614014-4398-3-git-send-email-geert+renesas@glider.be>

On 09/28/2017 08:53 AM, Geert Uytterhoeven wrote:
> If the optional "reset-gpios" property is specified in DT, the generic
> MDIO bus code takes care of resetting the PHY during device probe.
> However, the PHY may still have to be reset explicitly after system
> resume.
> 
> This allows to restore Ethernet operation after resume from s2ram on
> Salvator-XS, where the enable pin of the regulator providing PHY power
> is connected to PRESETn, and PSCI suspend powers down the SoC.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index fdf30bfa403bf416..96d1d48e302f8c9a 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -19,6 +19,7 @@
>  #include <linux/etherdevice.h>
>  #include <linux/ethtool.h>
>  #include <linux/if_vlan.h>
> +#include <linux/gpio/consumer.h>
>  #include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
> @@ -2268,6 +2269,7 @@ static int __maybe_unused ravb_resume(struct device *dev)
>  {
>  	struct net_device *ndev = dev_get_drvdata(dev);
>  	struct ravb_private *priv = netdev_priv(ndev);
> +	struct mii_bus *bus = priv->mii_bus;
>  	int ret = 0;
>  
>  	if (priv->wol_enabled) {
> @@ -2302,6 +2304,13 @@ static int __maybe_unused ravb_resume(struct device *dev)
>  	 * reopen device if it was running before system suspended.
>  	 */
>  
> +	/* PHY reset */
> +	if (bus->reset_gpiod) {
> +		gpiod_set_value_cansleep(bus->reset_gpiod, 1);
> +		udelay(bus->reset_delay_us);
> +		gpiod_set_value_cansleep(bus->reset_gpiod, 0);
> +	}

This is a clever hack, but unfortunately this is also misusing the MDIO
bus reset line into a PHY reset line. As commented in patch 3, if this
reset line is tied to the PHY, then this should be a PHY property and
you cannot (ab)use the MDIO bus GPIO reset logic anymore...

Shouldn't you also try to manage this reset line during ravb_open() to
achieve better power savings?
-- 
Florian

^ permalink raw reply
