Netdev List
 help / color / mirror / Atom feed
* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, phil
In-Reply-To: <20160313232651.354f6087@xeon-e3>

Sorry dropped the ball on this..

On 16-03-14 02:26 AM, Stephen Hemminger wrote:
> On Wed,  9 Mar 2016 07:04:36 -0500
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>

> This code has diverged way from the general rule that ip utilities display
> format should match the command format. For example the properties shown
> on "ip route show" match those of "ip route add".
>

Valid point (and thanks for catching this since i tend to be the biggest
whiner  on this topic ;-> I will make the changes - doesnt seem to be
far off already.
Note: in ife case it may not always symetric because there are optional
fields which may be absent in a request to the kernel but present in
a response.

> Also over the last several years, the code in iproute2 has switched from casting
> RTA_DATA() everywhere to a cleaner interface rte_getattr_u32() more like what
> is used in mnl library.
>

Will convert where it makes sense..

> The code has also gotten deeply intended creating lots of lines that are too long.
>

Is this you or the script saying the above? How is the conclusion that
we have deep indentation come about?
I checked there are some code lines that are > 80 characters because
it doesnt make sense to break them down.

> WARNING: 'doesnt' may be misspelled - perhaps 'doesn't'?
> #21:

What, checking my spelling now? ;->
I am on the internets dude!

> then provide a default so the user doesnt have to specify it.
>
> WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
> #25:
>              "Distributing Linux Traffic Control Classifier-Action Subsystem"
>

75 character? ;-> What happened to 80?

> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #143:
> new file mode 100644
>
> ERROR: "foo * bar" should be "foo *bar"


Will fix above and rest shortly.

Also, promise to send man page later. Ive coerced someone to do it;->

cheers,
jamal

^ permalink raw reply

* [PATCH net V1 2/8] net/mlx5e: Fix MLX5E_100BASE_T define
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Rana Shahout,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Rana Shahout <ranas@mellanox.com>

Bit 25 of eth_proto_capability in PTYS register is
1000Base-TT and not 100Base-T.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to
support ConnectX-4 Ethernet functionality')
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 879e627..e80ce94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -609,7 +609,7 @@ enum mlx5e_link_mode {
 	MLX5E_100GBASE_KR4	 = 22,
 	MLX5E_100GBASE_LR4	 = 23,
 	MLX5E_100BASE_TX	 = 24,
-	MLX5E_100BASE_T		 = 25,
+	MLX5E_1000BASE_T	 = 25,
 	MLX5E_10GBASE_T		 = 26,
 	MLX5E_25GBASE_CR	 = 27,
 	MLX5E_25GBASE_KR	 = 28,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 68834b7..3476ab8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -138,10 +138,10 @@ static const struct {
 	[MLX5E_100BASE_TX]   = {
 		.speed      = 100,
 	},
-	[MLX5E_100BASE_T]    = {
-		.supported  = SUPPORTED_100baseT_Full,
-		.advertised = ADVERTISED_100baseT_Full,
-		.speed      = 100,
+	[MLX5E_1000BASE_T]    = {
+		.supported  = SUPPORTED_1000baseT_Full,
+		.advertised = ADVERTISED_1000baseT_Full,
+		.speed      = 1000,
 	},
 	[MLX5E_10GBASE_T]    = {
 		.supported  = SUPPORTED_10000baseT_Full,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 0/8] mlx5 driver updates and fixes
From: Saeed Mahameed @ 2016-04-21 21:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed

Hi Dave,

Changes from V0:
	- Dropped: ("net/mlx5e: Reset link modes upon setting speed to zero") 
	- Fixed compilation issue introduced to mlx5_ib driver.
	- Rebased to df637193906a ('Revert "Prevent NUll pointer dereference with two PHYs on cpsw"')

This series has few bug fixes for mlx5 core and ethernet driver.

Eli fixed a wrong static local variable declaration in flow steering API.
Majd added the support of ConnectX-5 PF and VF and added the support
for kernel shutdown pci callback for more robust reboot procedures.
Maor fixed a soft lockup in flow steering.
Rana fixed a wrog speed define in mlx5 EN driver.
I also had the chance to introduce some bug fixes in mlx5 EN mtu 
reporting and handling.

For -stable:
	net/mlx5_core: Fix soft lockup in steering error flow
	net/mlx5e: Device's mtu field is u16 and not int
	net/mlx5e: Fix minimum MTU
	net/mlx5e: Use vport MTU rather than physical port MTU

Thanks,
Saeed


Eli Cohen (1):
  net/mlx5_core: Remove static from local variable

Majd Dibbiny (2):
  net/mlx5_core: Add ConnectX-5 to list of supported devices
  net/mlx5: Add pci shutdown callback

Maor Gottlieb (1):
  net/mlx5_core: Fix soft lockup in steering error flow

Rana Shahout (1):
  net/mlx5e: Fix MLX5E_100BASE_T define

Saeed Mahameed (3):
  net/mlx5e: Device's mtu field is u16 and not int
  net/mlx5e: Fix minimum MTU
  net/mlx5e: Use vport MTU rather than physical port MTU

 drivers/infiniband/hw/mlx5/main.c                  |    4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |    8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   72 +++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   48 +++++--------
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   25 ++++++-
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |   10 ++--
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |   40 +++++++++++
 include/linux/mlx5/driver.h                        |    7 +-
 include/linux/mlx5/port.h                          |    6 +-
 include/linux/mlx5/vport.h                         |    2 +
 11 files changed, 157 insertions(+), 67 deletions(-)

^ permalink raw reply

* [PATCH net V1 4/8] net/mlx5e: Device's mtu field is u16 and not int
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                 |    4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/port.c    |   10 +++++-----
 include/linux/mlx5/port.h                         |    6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5acf346..99eb1c1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -671,8 +671,8 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u8 port,
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *rep;
-	int max_mtu;
-	int oper_mtu;
+	u16 max_mtu;
+	u16 oper_mtu;
 	int err;
 	u8 ib_link_width_oper;
 	u8 vl_hw_cap;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e0adb60..2fbbc62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1408,7 +1408,7 @@ static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
-	int hw_mtu;
+	u16 hw_mtu;
 	int err;
 
 	err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(netdev->mtu), 1);
@@ -2004,7 +2004,7 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	bool was_opened;
-	int max_mtu;
+	u16 max_mtu;
 	int err = 0;
 
 	mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index ae378c5..53cc1e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -247,8 +247,8 @@ int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_admin_status);
 
-static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, int *admin_mtu,
-				int *max_mtu, int *oper_mtu, u8 port)
+static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, u16 *admin_mtu,
+				u16 *max_mtu, u16 *oper_mtu, u8 port)
 {
 	u32 in[MLX5_ST_SZ_DW(pmtu_reg)];
 	u32 out[MLX5_ST_SZ_DW(pmtu_reg)];
@@ -268,7 +268,7 @@ static void mlx5_query_port_mtu(struct mlx5_core_dev *dev, int *admin_mtu,
 		*admin_mtu = MLX5_GET(pmtu_reg, out, admin_mtu);
 }
 
-int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port)
+int mlx5_set_port_mtu(struct mlx5_core_dev *dev, u16 mtu, u8 port)
 {
 	u32 in[MLX5_ST_SZ_DW(pmtu_reg)];
 	u32 out[MLX5_ST_SZ_DW(pmtu_reg)];
@@ -283,14 +283,14 @@ int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port)
 }
 EXPORT_SYMBOL_GPL(mlx5_set_port_mtu);
 
-void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu,
+void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu,
 			     u8 port)
 {
 	mlx5_query_port_mtu(dev, NULL, max_mtu, NULL, port);
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_max_mtu);
 
-void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu,
+void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu,
 			      u8 port)
 {
 	mlx5_query_port_mtu(dev, NULL, NULL, oper_mtu, port);
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index a1d145a..b30250a 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -54,9 +54,9 @@ int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
 int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
 				 enum mlx5_port_status *status);
 
-int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu, u8 port);
-void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu, u8 port);
-void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu,
+int mlx5_set_port_mtu(struct mlx5_core_dev *dev, u16 mtu, u8 port);
+void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu, u8 port);
+void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu,
 			      u8 port);
 
 int mlx5_query_port_vl_hw_cap(struct mlx5_core_dev *dev,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 1/8] net/mlx5_core: Fix soft lockup in steering error flow
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Maor Gottlieb,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Maor Gottlieb <maorg@mellanox.com>

In the error flow of adding flow rule to auto-grouped flow
table, we call to tree_remove_node.

tree_remove_node locks the node's parent, however the node's parent
is already locked by mlx5_add_flow_rule and this causes a deadlock.
After this patch, if we failed to add the flow rule, we unlock the
flow table before calling to tree_remove_node.

fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped
flow table')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reported-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   46 ++++++++-------------
 1 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 5121be4..3c7e3e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1065,33 +1065,6 @@ unlock_fg:
 	return rule;
 }
 
-static struct mlx5_flow_rule *add_rule_to_auto_fg(struct mlx5_flow_table *ft,
-						  u8 match_criteria_enable,
-						  u32 *match_criteria,
-						  u32 *match_value,
-						  u8 action,
-						  u32 flow_tag,
-						  struct mlx5_flow_destination *dest)
-{
-	struct mlx5_flow_rule *rule;
-	struct mlx5_flow_group *g;
-
-	g = create_autogroup(ft, match_criteria_enable, match_criteria);
-	if (IS_ERR(g))
-		return (void *)g;
-
-	rule = add_rule_fg(g, match_value,
-			   action, flow_tag, dest);
-	if (IS_ERR(rule)) {
-		/* Remove assumes refcount > 0 and autogroup creates a group
-		 * with a refcount = 0.
-		 */
-		tree_get_node(&g->node);
-		tree_remove_node(&g->node);
-	}
-	return rule;
-}
-
 static struct mlx5_flow_rule *
 _mlx5_add_flow_rule(struct mlx5_flow_table *ft,
 		    u8 match_criteria_enable,
@@ -1119,8 +1092,23 @@ _mlx5_add_flow_rule(struct mlx5_flow_table *ft,
 				goto unlock;
 		}
 
-	rule = add_rule_to_auto_fg(ft, match_criteria_enable, match_criteria,
-				   match_value, action, flow_tag, dest);
+	g = create_autogroup(ft, match_criteria_enable, match_criteria);
+	if (IS_ERR(g)) {
+		rule = (void *)g;
+		goto unlock;
+	}
+
+	rule = add_rule_fg(g, match_value,
+			   action, flow_tag, dest);
+	if (IS_ERR(rule)) {
+		/* Remove assumes refcount > 0 and autogroup creates a group
+		 * with a refcount = 0.
+		 */
+		unlock_ref_node(&ft->node);
+		tree_get_node(&g->node);
+		tree_remove_node(&g->node);
+		return rule;
+	}
 unlock:
 	unlock_ref_node(&ft->node);
 	return rule;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 5/8] net/mlx5e: Fix minimum MTU
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

Minimum MTU that can be set in Connectx4 device is 68.

This fixes the case where a user wants to set invalid MTU,
the driver will fail to satisfy this request and the interface
will stay down.

It is better to report an error and continue working with old
mtu.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2fbbc62..93e4ef4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1999,22 +1999,27 @@ static int mlx5e_set_features(struct net_device *netdev,
 	return err;
 }
 
+#define MXL5_HW_MIN_MTU 64
+#define MXL5E_MIN_MTU (MXL5_HW_MIN_MTU + ETH_FCS_LEN)
+
 static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	bool was_opened;
 	u16 max_mtu;
+	u16 min_mtu;
 	int err = 0;
 
 	mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
 
 	max_mtu = MLX5E_HW2SW_MTU(max_mtu);
+	min_mtu = MLX5E_HW2SW_MTU(MXL5E_MIN_MTU);
 
-	if (new_mtu > max_mtu) {
+	if (new_mtu > max_mtu || new_mtu < min_mtu) {
 		netdev_err(netdev,
-			   "%s: Bad MTU (%d) > (%d) Max\n",
-			   __func__, new_mtu, max_mtu);
+			   "%s: Bad MTU (%d), valid range is: [%d..%d]\n",
+			   __func__, new_mtu, min_mtu, max_mtu);
 		return -EINVAL;
 	}
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 3/8] net/mlx5_core: Add ConnectX-5 to list of supported devices
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Majd Dibbiny,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Majd Dibbiny <majd@mellanox.com>

Add the upcoming ConnectX-5 devices (PF and VF) to the list of
supported devices by the mlx5 driver.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3f3b2fa..ddd352a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1459,6 +1459,8 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1014), MLX5_PCI_DEV_IS_VF},	/* ConnectX-4 VF */
 	{ PCI_VDEVICE(MELLANOX, 0x1015) },			/* ConnectX-4LX */
 	{ PCI_VDEVICE(MELLANOX, 0x1016), MLX5_PCI_DEV_IS_VF},	/* ConnectX-4LX VF */
+	{ PCI_VDEVICE(MELLANOX, 0x1017) },			/* ConnectX-5 */
+	{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5 VF */
 	{ 0, }
 };
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 7/8] net/mlx5_core: Remove static from local variable
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Eli Cohen,
	Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Eli Cohen <eli@mellanox.com>

The static is not required and breaks re-entrancy if it will be required.

Fixes: 2530236303d9 ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 3c7e3e5..89cce97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1276,7 +1276,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 {
 	struct mlx5_flow_root_namespace *root_ns = dev->priv.root_ns;
 	int prio;
-	static struct fs_prio *fs_prio;
+	struct fs_prio *fs_prio;
 	struct mlx5_flow_namespace *ns;
 
 	if (!root_ns)
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 6/8] net/mlx5e: Use vport MTU rather than physical port MTU
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

Set and report vport MTU rather than physical MTU,
Driver will set both vport and physical port mtu and will
rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   44 ++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   |   40 +++++++++++++++++++
 include/linux/mlx5/vport.h                        |    2 +
 3 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 93e4ef4..85773f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1404,24 +1404,50 @@ static int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5e_priv *priv)
 	return 0;
 }
 
-static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+static int mlx5e_set_mtu(struct mlx5e_priv *priv, u16 mtu)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
-	u16 hw_mtu;
+	u16 hw_mtu = MLX5E_SW2HW_MTU(mtu);
 	int err;
 
-	err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(netdev->mtu), 1);
+	err = mlx5_set_port_mtu(mdev, hw_mtu, 1);
 	if (err)
 		return err;
 
-	mlx5_query_port_oper_mtu(mdev, &hw_mtu, 1);
+	/* Update vport context MTU */
+	mlx5_modify_nic_vport_mtu(mdev, hw_mtu);
+	return 0;
+}
+
+static void mlx5e_query_mtu(struct mlx5e_priv *priv, u16 *mtu)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u16 hw_mtu = 0;
+	int err;
+
+	err = mlx5_query_nic_vport_mtu(mdev, &hw_mtu);
+	if (err || !hw_mtu) /* fallback to port oper mtu */
+		mlx5_query_port_oper_mtu(mdev, &hw_mtu, 1);
+
+	*mtu = MLX5E_HW2SW_MTU(hw_mtu);
+}
+
+static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	u16 mtu;
+	int err;
+
+	err = mlx5e_set_mtu(priv, netdev->mtu);
+	if (err)
+		return err;
 
-	if (MLX5E_HW2SW_MTU(hw_mtu) != netdev->mtu)
-		netdev_warn(netdev, "%s: Port MTU %d is different than netdev mtu %d\n",
-			    __func__, MLX5E_HW2SW_MTU(hw_mtu), netdev->mtu);
+	mlx5e_query_mtu(priv, &mtu);
+	if (mtu != netdev->mtu)
+		netdev_warn(netdev, "%s: VPort MTU %d is different than netdev mtu %d\n",
+			    __func__, mtu, netdev->mtu);
 
-	netdev->mtu = MLX5E_HW2SW_MTU(hw_mtu);
+	netdev->mtu = mtu;
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index bd51840..b69dadc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -196,6 +196,46 @@ int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mac_address);
 
+int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+	u32 *out;
+	int err;
+
+	out = mlx5_vzalloc(outlen);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
+	if (!err)
+		*mtu = MLX5_GET(query_nic_vport_context_out, out,
+				nic_vport_context.mtu);
+
+	kvfree(out);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mtu);
+
+int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.mtu, 1);
+	MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.mtu, mtu);
+
+	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mtu);
+
 int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
 				  u32 vport,
 				  enum mlx5_list_type list_type,
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index bd93e63..301da4a 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -45,6 +45,8 @@ int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u16 vport, u8 *addr);
 int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *dev,
 				      u16 vport, u8 *addr);
+int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu);
+int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu);
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid);
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net V1 8/8] net/mlx5: Add pci shutdown callback
From: Saeed Mahameed @ 2016-04-21 21:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Tal Alon, Eran Ben Elisha, Majd Dibbiny,
	Tariq Toukan, Haggai Abramovsky, Saeed Mahameed
In-Reply-To: <1461274387-10653-1-git-send-email-saeedm@mellanox.com>

From: Majd Dibbiny <majd@mellanox.com>

This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.

In addition, remove unregister netdev from shutdown flow. This will
allow a clean shutdown, even if some netdev clients did not release their
reference from this netdev. Releasing The HW resources only is enough as
the kernel is shutting down

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   15 +++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |   23 +++++++++++++++++---
 include/linux/mlx5/driver.h                       |    7 +++--
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 85773f8..67d548b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2633,7 +2633,16 @@ static void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, void *vpriv)
 	schedule_work(&priv->set_rx_mode_work);
 	mlx5e_disable_async_events(priv);
 	flush_scheduled_work();
-	unregister_netdev(netdev);
+	if (test_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &mdev->intf_state)) {
+		netif_device_detach(netdev);
+		mutex_lock(&priv->state_lock);
+		if (test_bit(MLX5E_STATE_OPENED, &priv->state))
+			mlx5e_close_locked(netdev);
+		mutex_unlock(&priv->state_lock);
+	} else {
+		unregister_netdev(netdev);
+	}
+
 	mlx5e_tc_cleanup(priv);
 	mlx5e_vxlan_cleanup(priv);
 	mlx5e_destroy_flow_tables(priv);
@@ -2646,7 +2655,9 @@ static void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, void *vpriv)
 	mlx5_core_dealloc_transport_domain(priv->mdev, priv->tdn);
 	mlx5_core_dealloc_pd(priv->mdev, priv->pdn);
 	mlx5_unmap_free_uar(priv->mdev, &priv->cq_uar);
-	free_netdev(netdev);
+
+	if (!test_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &mdev->intf_state))
+		free_netdev(netdev);
 }
 
 static void *mlx5e_get_netdev(void *vpriv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ddd352a..6892746 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -966,7 +966,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	int err;
 
 	mutex_lock(&dev->intf_state_mutex);
-	if (dev->interface_state == MLX5_INTERFACE_STATE_UP) {
+	if (test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
 		dev_warn(&dev->pdev->dev, "%s: interface is up, NOP\n",
 			 __func__);
 		goto out;
@@ -1133,7 +1133,8 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	if (err)
 		pr_info("failed request module on %s\n", MLX5_IB_MOD);
 
-	dev->interface_state = MLX5_INTERFACE_STATE_UP;
+	clear_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
+	set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 out:
 	mutex_unlock(&dev->intf_state_mutex);
 
@@ -1207,7 +1208,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	}
 
 	mutex_lock(&dev->intf_state_mutex);
-	if (dev->interface_state == MLX5_INTERFACE_STATE_DOWN) {
+	if (test_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state)) {
 		dev_warn(&dev->pdev->dev, "%s: interface is down, NOP\n",
 			 __func__);
 		goto out;
@@ -1241,7 +1242,8 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	mlx5_cmd_cleanup(dev);
 
 out:
-	dev->interface_state = MLX5_INTERFACE_STATE_DOWN;
+	clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
+	set_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
 	mutex_unlock(&dev->intf_state_mutex);
 	return err;
 }
@@ -1452,6 +1454,18 @@ static const struct pci_error_handlers mlx5_err_handler = {
 	.resume		= mlx5_pci_resume
 };
 
+static void shutdown(struct pci_dev *pdev)
+{
+	struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
+	struct mlx5_priv *priv = &dev->priv;
+
+	dev_info(&pdev->dev, "Shutdown was called\n");
+	/* Notify mlx5 clients that the kernel is being shut down */
+	set_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &dev->intf_state);
+	mlx5_unload_one(dev, priv);
+	mlx5_pci_disable_device(dev);
+}
+
 static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1011) },			/* Connect-IB */
 	{ PCI_VDEVICE(MELLANOX, 0x1012), MLX5_PCI_DEV_IS_VF},	/* Connect-IB VF */
@@ -1471,6 +1485,7 @@ static struct pci_driver mlx5_core_driver = {
 	.id_table       = mlx5_core_pci_table,
 	.probe          = init_one,
 	.remove         = remove_one,
+	.shutdown	= shutdown,
 	.err_handler	= &mlx5_err_handler,
 	.sriov_configure   = mlx5_core_sriov_configure,
 };
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index dcd5ac8..369c837 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -519,8 +519,9 @@ enum mlx5_device_state {
 };
 
 enum mlx5_interface_state {
-	MLX5_INTERFACE_STATE_DOWN,
-	MLX5_INTERFACE_STATE_UP,
+	MLX5_INTERFACE_STATE_DOWN = BIT(0),
+	MLX5_INTERFACE_STATE_UP = BIT(1),
+	MLX5_INTERFACE_STATE_SHUTDOWN = BIT(2),
 };
 
 enum mlx5_pci_status {
@@ -544,7 +545,7 @@ struct mlx5_core_dev {
 	enum mlx5_device_state	state;
 	/* sync interface state */
 	struct mutex		intf_state_mutex;
-	enum mlx5_interface_state interface_state;
+	unsigned long		intf_state;
 	void			(*event) (struct mlx5_core_dev *dev,
 					  enum mlx5_dev_event event,
 					  unsigned long param);
-- 
1.7.1

^ permalink raw reply related

* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Stephen Hemminger @ 2016-04-21 21:36 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <e6070b054b0748f48991187bb4ccfccd@HQ1WP-EXMB11.corp.brocade.com>

On Thu, 21 Apr 2016 21:27:39 +0000
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> > The code has also gotten deeply intended creating lots of lines that are too long.
> >  
> 
> Is this you or the script saying the above? How is the conclusion that
> we have deep indentation come about?
> I checked there are some code lines that are > 80 characters because
> it doesnt make sense to break them down.

I use checkpatch from kernel source to check iproute code now.

^ permalink raw reply

* [iproute2 PATCH v3 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:40 UTC (permalink / raw)
  To: stephen; +Cc: netdev, phil, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Described in netdev01 paper:
"Distributing Linux Traffic Control Classifier-Action Subsystem"
 Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-03.txt

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/linux/tc_act/tc_ife.h |  38 +++++
 include/linux/tc_ife.h        |  38 +++++
 tc/Makefile                   |   1 +
 tc/m_ife.c                    | 341 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 418 insertions(+)
 create mode 100644 include/linux/tc_act/tc_ife.h
 create mode 100644 include/linux/tc_ife.h
 create mode 100644 tc/m_ife.c

diff --git a/include/linux/tc_act/tc_ife.h b/include/linux/tc_act/tc_ife.h
new file mode 100644
index 0000000..d648ff6
--- /dev/null
+++ b/include/linux/tc_act/tc_ife.h
@@ -0,0 +1,38 @@
+#ifndef __UAPI_TC_IFE_H
+#define __UAPI_TC_IFE_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_IFE 25
+/* Flag bits for now just encoding/decoding; mutually exclusive */
+#define IFE_ENCODE 1
+#define IFE_DECODE 0
+
+struct tc_ife {
+	tc_gen;
+	__u16 flags;
+};
+
+/*XXX: We need to encode the total number of bytes consumed */
+enum {
+	TCA_IFE_UNSPEC,
+	TCA_IFE_PARMS,
+	TCA_IFE_TM,
+	TCA_IFE_DMAC,
+	TCA_IFE_SMAC,
+	TCA_IFE_TYPE,
+	TCA_IFE_METALST,
+	__TCA_IFE_MAX
+};
+#define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
+
+#define IFE_META_SKBMARK 1
+#define IFE_META_HASHID 2
+#define	IFE_META_PRIO 3
+#define	IFE_META_QMAP 4
+/*Can be overridden at runtime by module option*/
+#define	__IFE_META_MAX 5
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif
diff --git a/include/linux/tc_ife.h b/include/linux/tc_ife.h
new file mode 100644
index 0000000..d648ff6
--- /dev/null
+++ b/include/linux/tc_ife.h
@@ -0,0 +1,38 @@
+#ifndef __UAPI_TC_IFE_H
+#define __UAPI_TC_IFE_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_IFE 25
+/* Flag bits for now just encoding/decoding; mutually exclusive */
+#define IFE_ENCODE 1
+#define IFE_DECODE 0
+
+struct tc_ife {
+	tc_gen;
+	__u16 flags;
+};
+
+/*XXX: We need to encode the total number of bytes consumed */
+enum {
+	TCA_IFE_UNSPEC,
+	TCA_IFE_PARMS,
+	TCA_IFE_TM,
+	TCA_IFE_DMAC,
+	TCA_IFE_SMAC,
+	TCA_IFE_TYPE,
+	TCA_IFE_METALST,
+	__TCA_IFE_MAX
+};
+#define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
+
+#define IFE_META_SKBMARK 1
+#define IFE_META_HASHID 2
+#define	IFE_META_PRIO 3
+#define	IFE_META_QMAP 4
+/*Can be overridden at runtime by module option*/
+#define	__IFE_META_MAX 5
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index f5bea87..20f5110 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -43,6 +43,7 @@ TCMODULES += m_gact.o
 TCMODULES += m_mirred.o
 TCMODULES += m_nat.o
 TCMODULES += m_pedit.o
+TCMODULES += m_ife.o
 TCMODULES += m_skbedit.o
 TCMODULES += m_csum.o
 TCMODULES += m_simple.o
diff --git a/tc/m_ife.c b/tc/m_ife.c
new file mode 100644
index 0000000..839e370
--- /dev/null
+++ b/tc/m_ife.c
@@ -0,0 +1,341 @@
+/*
+ * m_ife.c	IFE actions module
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:  J Hadi Salim (jhs@mojatatu.com)
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <linux/netdevice.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_ife.h>
+
+static void ife_explain(void)
+{
+	fprintf(stderr,
+		"Usage:... ife {decode|encode} {ALLOW|USE} [dst DMAC] [src SMAC] [type TYPE] [CONTROL] [index INDEX]\n");
+	fprintf(stderr,
+		"\tALLOW := Encode direction. Allows encoding specified metadata\n"
+		"\t\t e.g \"allow mark\"\n"
+		"\tUSE := Encode direction. Enforce Static encoding of specified metadata\n"
+		"\t\t e.g \"use mark 0x12\"\n"
+		"\tDMAC := 6 byte Destination MAC address to encode\n"
+		"\tSMAC := optional 6 byte Source MAC address to encode\n"
+		"\tTYPE := optional 16 bit ethertype to encode\n"
+		"\tCONTROL := reclassify|pipe|drop|continue|ok\n"
+		"\tINDEX := optional IFE table index value used\n");
+	fprintf(stderr, "encode is used for sending IFE packets\n");
+	fprintf(stderr, "decode is used for receiving IFE packets\n");
+}
+
+static void ife_usage(void)
+{
+	ife_explain();
+	exit(-1);
+}
+
+static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
+		     int tca_id, struct nlmsghdr *n)
+{
+	int argc = *argc_p;
+	char **argv = *argv_p;
+	int ok = 0;
+	struct tc_ife p;
+	struct rtattr *tail;
+	struct rtattr *tail2;
+	char dbuf[ETH_ALEN];
+	char sbuf[ETH_ALEN];
+	__u16 ife_type = 0;
+	__u32 ife_prio = 0;
+	__u32 ife_prio_v = 0;
+	__u32 ife_mark = 0;
+	__u32 ife_mark_v = 0;
+	char *daddr = NULL;
+	char *saddr = NULL;
+
+	memset(&p, 0, sizeof(p));
+	p.action = TC_ACT_PIPE;	/* good default */
+
+	if (argc <= 0)
+		return -1;
+
+	while (argc > 0) {
+		if (matches(*argv, "ife") == 0) {
+			NEXT_ARG();
+			continue;
+		} else if (matches(*argv, "decode") == 0) {
+			p.flags = IFE_DECODE; /* readability aid */
+			ok++;
+		} else if (matches(*argv, "encode") == 0) {
+			p.flags = IFE_ENCODE;
+			ok++;
+		} else if (matches(*argv, "allow") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "mark") == 0) {
+				ife_mark = IFE_META_SKBMARK;
+			} else if (matches(*argv, "prio") == 0) {
+				ife_prio = IFE_META_PRIO;
+			} else {
+				fprintf(stderr, "Illegal meta define <%s>\n",
+					*argv);
+				return -1;
+			}
+		} else if (matches(*argv, "use") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "mark") == 0) {
+				NEXT_ARG();
+				if (get_u32(&ife_mark_v, *argv, 0))
+					invarg("ife mark val is invalid",
+					       *argv);
+			} else if (matches(*argv, "prio") == 0) {
+				NEXT_ARG();
+				if (get_u32(&ife_prio_v, *argv, 0))
+					invarg("ife prio val is invalid",
+					       *argv);
+			} else {
+				fprintf(stderr, "Illegal meta use type <%s>\n",
+					*argv);
+				return -1;
+			}
+		} else if (matches(*argv, "type") == 0) {
+			NEXT_ARG();
+			if (get_u16(&ife_type, *argv, 0))
+				invarg("ife type is invalid", *argv);
+			fprintf(stderr, "IFE type 0x%x\n", ife_type);
+		} else if (matches(*argv, "dst") == 0) {
+			NEXT_ARG();
+			daddr = *argv;
+			if (sscanf(daddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+				   dbuf, dbuf + 1, dbuf + 2,
+				   dbuf + 3, dbuf + 4, dbuf + 5) != 6) {
+				fprintf(stderr, "Invalid mac address %s\n",
+					daddr);
+			}
+			fprintf(stderr, "dst MAC address <%s>\n", daddr);
+
+		} else if (matches(*argv, "src") == 0) {
+			NEXT_ARG();
+			saddr = *argv;
+			if (sscanf(saddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+				   sbuf, sbuf + 1, sbuf + 2,
+				   sbuf + 3, sbuf + 4, sbuf + 5) != 6) {
+				fprintf(stderr, "Invalid mac address %s\n",
+					saddr);
+			}
+			fprintf(stderr, "src MAC address <%s>\n", saddr);
+		} else if (matches(*argv, "help") == 0) {
+			ife_usage();
+		} else {
+			break;
+		}
+
+		argc--;
+		argv++;
+	}
+
+	if (argc) {
+		if (matches(*argv, "reclassify") == 0) {
+			p.action = TC_ACT_RECLASSIFY;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "pipe") == 0) {
+			p.action = TC_ACT_PIPE;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "drop") == 0 ||
+			   matches(*argv, "shot") == 0) {
+			p.action = TC_ACT_SHOT;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "continue") == 0) {
+			p.action = TC_ACT_UNSPEC;
+			argc--;
+			argv++;
+		} else if (matches(*argv, "pass") == 0) {
+			p.action = TC_ACT_OK;
+			argc--;
+			argv++;
+		}
+	}
+
+	if (argc) {
+		if (matches(*argv, "index") == 0) {
+			NEXT_ARG();
+			if (get_u32(&p.index, *argv, 10)) {
+				fprintf(stderr, "ife: Illegal \"index\"\n");
+				return -1;
+			}
+			argc--;
+			argv++;
+		}
+	}
+
+	if (!ok) {
+		fprintf(stderr, "IFE requires decode/encode specified\n");
+		ife_usage();
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, tca_id, NULL, 0);
+	addattr_l(n, MAX_MSG, TCA_IFE_PARMS, &p, sizeof(p));
+
+	if (!(p.flags & IFE_ENCODE))
+		goto skip_encode;
+
+	if (daddr)
+		addattr_l(n, MAX_MSG, TCA_IFE_DMAC, dbuf, ETH_ALEN);
+	if (ife_type)
+		addattr_l(n, MAX_MSG, TCA_IFE_TYPE, &ife_type, 2);
+	if (saddr)
+		addattr_l(n, MAX_MSG, TCA_IFE_SMAC, sbuf, ETH_ALEN);
+
+	tail2 = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, TCA_IFE_METALST, NULL, 0);
+	if (ife_mark || ife_mark_v) {
+		if (ife_mark_v)
+			addattr_l(n, MAX_MSG, IFE_META_SKBMARK, &ife_mark_v, 4);
+		else
+			addattr_l(n, MAX_MSG, IFE_META_SKBMARK, NULL, 0);
+	}
+	if (ife_prio || ife_prio_v) {
+		if (ife_prio_v)
+			addattr_l(n, MAX_MSG, IFE_META_PRIO, &ife_prio_v, 4);
+		else
+			addattr_l(n, MAX_MSG, IFE_META_PRIO, NULL, 0);
+	}
+
+	tail2->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail2;
+
+skip_encode:
+	tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
+
+	*argc_p = argc;
+	*argv_p = argv;
+	return 0;
+}
+
+static int print_ife(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+	struct tc_ife *p = NULL;
+	struct rtattr *tb[TCA_IFE_MAX + 1];
+	__u16 ife_type = 0;
+	__u32 mmark = 0;
+	__u32 mhash = 0;
+	__u32 mprio = 0;
+	int has_optional = 0;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (arg == NULL)
+		return -1;
+
+	parse_rtattr_nested(tb, TCA_IFE_MAX, arg);
+
+	if (tb[TCA_IFE_PARMS] == NULL) {
+		fprintf(f, "[NULL ife parameters]");
+		return -1;
+	}
+	p = RTA_DATA(tb[TCA_IFE_PARMS]);
+
+	fprintf(f, "ife %s action %s ",
+		(p->flags & IFE_ENCODE) ? "encode" : "decode",
+		action_n2a(p->action, b1, sizeof(b1)));
+
+	if (tb[TCA_IFE_TYPE]) {
+		ife_type = rta_getattr_u16(tb[TCA_IFE_TYPE]);
+		has_optional = 1;
+		fprintf(f, "type 0x%X ", ife_type);
+	}
+
+	if (has_optional)
+		fprintf(f, "\n\t ");
+
+	if (tb[TCA_IFE_METALST]) {
+		struct rtattr *metalist[IFE_META_MAX + 1];
+		int len = 0;
+
+		parse_rtattr_nested(metalist, IFE_META_MAX,
+				    tb[TCA_IFE_METALST]);
+
+		if (metalist[IFE_META_SKBMARK]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_SKBMARK]);
+			if (len) {
+				mmark = rta_getattr_u32(metalist[IFE_META_SKBMARK]);
+				fprintf(f, "use mark %d ", mmark);
+			} else
+				fprintf(f, "allow mark ");
+		}
+
+		if (metalist[IFE_META_HASHID]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_HASHID]);
+			if (len) {
+				mhash = rta_getattr_u32(metalist[IFE_META_HASHID]);
+				fprintf(f, "use hash %d ", mhash);
+			} else
+				fprintf(f, "allow hash ");
+		}
+
+		if (metalist[IFE_META_PRIO]) {
+			len = RTA_PAYLOAD(metalist[IFE_META_PRIO]);
+			if (len) {
+				mprio = rta_getattr_u32(metalist[IFE_META_PRIO]);
+				fprintf(f, "use prio %d ", mprio);
+			} else
+				fprintf(f, "allow prio ");
+		}
+
+	}
+
+	if (tb[TCA_IFE_DMAC]) {
+		has_optional = 1;
+		fprintf(f, "dst %s ",
+			ll_addr_n2a(RTA_DATA(tb[TCA_IFE_DMAC]),
+				    RTA_PAYLOAD(tb[TCA_IFE_DMAC]), 0, b2,
+				    sizeof(b2)));
+
+	}
+
+	if (tb[TCA_IFE_SMAC]) {
+		has_optional = 1;
+		fprintf(f, "src %s ",
+			ll_addr_n2a(RTA_DATA(tb[TCA_IFE_SMAC]),
+				    RTA_PAYLOAD(tb[TCA_IFE_SMAC]), 0, b2,
+				    sizeof(b2)));
+	}
+
+	fprintf(f, "\n\t index %d ref %d bind %d", p->index, p->refcnt,
+		p->bindcnt);
+	if (show_stats) {
+		if (tb[TCA_IFE_TM]) {
+			struct tcf_t *tm = RTA_DATA(tb[TCA_IFE_TM]);
+
+			print_tm(f, tm);
+		}
+	}
+
+	fprintf(f, "\n");
+
+	return 0;
+}
+
+struct action_util ife_action_util = {
+	.id = "ife",
+	.parse_aopt = parse_ife,
+	.print_aopt = print_ife,
+};
-- 
1.9.1

^ permalink raw reply related

* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 21:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <20160421143657.51ba9fdd@xeon-e3>

On 16-04-21 05:36 PM, Stephen Hemminger wrote:
> On Thu, 21 Apr 2016 21:27:39 +0000
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>
> I use checkpatch from kernel source to check iproute code now.

ah. It was wrong in this specific case.
In any case I just resent the patch with the fixes.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] libnl: enhance API to ease 64bit alignment for attribute
From: Nicolas Dichtel @ 2016-04-21 22:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, roopa, eric.dumazet, tgraf, jhs
In-Reply-To: <20160421.142831.1815562418742721577.davem@davemloft.net>

Le 21/04/2016 20:28, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Thu, 21 Apr 2016 18:58:23 +0200
>
>> Here is a proposal to add more helpers in the libnetlink to manage 64-bit
>> alignment issues.
>> Note that this series was only tested on x86 by tweeking
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.
>>
>> The first patch adds helpers for 64bit alignment and other patches
>> use them.
>>
>> We could also add helpers for nla_put_u64() and its variants if needed.
>>
>> v1 -> v2:
>>   - remove patch #1
>>   - split patch #2 (now #1 and #2)
>>   - add nla_need_padding_for_64bit()
>
> I like it, nice work Nicolas.
Thank you.

>
> Applied to net-next.
>
> I did a quick scan and the following jumped out at me as cases we need
> to fix up as well:
Did you grep something or just catch this by code review?

>
> 1) xfrm_user
> 2) tcp_info
> 3) taskstats
> 4) pkt_{cls,sched}
> 5) openvswitch
> etc.
>
> Most of these are statistic cases just like all of the existing ones
> we have fixed so far.
Yes, I will follow on this topic. There are also a bunch of
nla_put_[u|be|le]64():
$ git grep -w "nla_put_.\{1,2\}64" net/ | wc -l
118
$ git grep -w "nla_put_.\{1,2\}64" | wc -l
172

^ permalink raw reply

* Re: [net-next PATCH iproute2 v2 1/1] tc: introduce IFE action
From: Jamal Hadi Salim @ 2016-04-21 22:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, phil@nwl.cc
In-Reply-To: <5719490F.3020807@mojatatu.com>

On 16-04-21 05:41 PM, Jamal Hadi Salim wrote:
> On 16-04-21 05:36 PM, Stephen Hemminger wrote:
>> On Thu, 21 Apr 2016 21:27:39 +0000
>> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
>>
>> I use checkpatch from kernel source to check iproute code now.
>
> ah. It was wrong in this specific case.
> In any case I just resent the patch with the fixes.

I just realized i sent with my local ife headers.
I didnt see them in the new iproute2 tree;
if you want me to resend i could or just feel free to
edit.

cheers,
jamal

^ permalink raw reply

* Re: qdisc spin lock
From: Michael Ma @ 2016-04-21 22:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <1461242518.7627.8.camel@edumazet-glaptop3.roam.corp.google.com>

2016-04-21 5:41 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> On Wed, 2016-04-20 at 22:51 -0700, Michael Ma wrote:
>> 2016-04-20 15:34 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
>> > On Wed, 2016-04-20 at 14:24 -0700, Michael Ma wrote:
>> >> 2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
>> >> > On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
>> >> >> I didn't really know that multiple qdiscs can be isolated using MQ so
>> >> >> that each txq can be associated with a particular qdisc. Also we don't
>> >> >> really have multiple interfaces...
>> >> >>
>> >> >> With this MQ solution we'll still need to assign transmit queues to
>> >> >> different classes by doing some math on the bandwidth limit if I
>> >> >> understand correctly, which seems to be less convenient compared with
>> >> >> a solution purely within HTB.
>> >> >>
>> >> >> I assume that with this solution I can still share qdisc among
>> >> >> multiple transmit queues - please let me know if this is not the case.
>> >> >
>> >> > Note that this MQ + HTB thing works well, unless you use a bonding
>> >> > device. (Or you need the MQ+HTB on the slaves, with no way of sharing
>> >> > tokens between the slaves)
>> >>
>> >> Actually MQ+HTB works well for small packets - like flow of 512 byte
>> >> packets can be throttled by HTB using one txq without being affected
>> >> by other flows with small packets. However I found using this solution
>> >> large packets (10k for example) will only achieve very limited
>> >> bandwidth. In my test I used MQ to assign one txq to a HTB which sets
>> >> rate at 1Gbit/s, 512 byte packets can achieve the ceiling rate by
>> >> using 30 threads. But sending 10k packets using 10 threads has only 10
>> >> Mbit/s with the same TC configuration. If I increase burst and cburst
>> >> of HTB to some extreme large value (like 50MB) the ceiling rate can be
>> >> hit.
>> >>
>> >> The strange thing is that I don't see this problem when using HTB as
>> >> the root. So txq number seems to be a factor here - however it's
>> >> really hard to understand why would it only affect larger packets. Is
>> >> this a known issue? Any suggestion on how to investigate the issue
>> >> further? Profiling shows that the cpu utilization is pretty low.
>> >
>> > You could try
>> >
>> > perf record -a -g -e skb:kfree_skb sleep 5
>> > perf report
>> >
>> > So that you see where the packets are dropped.
>> >
>> > Chances are that your UDP sockets SO_SNDBUF is too big, and packets are
>> > dropped at qdisc enqueue time, instead of having backpressure.
>> >
>>
>> Thanks for the hint - how should I read the perf report? Also we're
>> using TCP socket in this testing - TCP window size is set to 70kB.
>
> But how are you telling TCP to send 10k packets ?
>
We just write to the socket with 10k buffer and wait for a response
from the server (using read()) before the next write. Using tcpdump I
can see the 10k write is actually sent through 3 packets
(7.3k/1.5k/1.3k).

> AFAIK you can not : TCP happily aggregates packets in write queue
> (see current MSG_EOR discussion)
>
> I suspect a bug in your tc settings.
>
>

Could you help to check my tc setting?

sudo tc qdisc add dev eth0 root mqprio num_tc 6 map 0 1 2 3 4 5 0 0
queues 19@0 1@19 1@20 1@21 1@22 1@23 hw 0
sudo tc qdisc add dev eth0 parent 805a:1a handle 8001:0 htb default 10
sudo tc class add dev eth0 parent 8001: classid 8001:10 htb rate 1000Mbit

I didn't set r2q/burst/cburst/mtu/mpu so the default value should be used.

^ permalink raw reply

* [PATCH v2] ixgbevf: Change the relaxed order settings in VF driver for sparc
From: Babu Moger @ 2016-04-21 22:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, shannon.nelson,
	carolyn.wyborny, donald.c.skidmore, bruce.w.allan, john.ronciak,
	mitch.a.williams
  Cc: intel-wired-lan, netdev, linux-kernel, babu.moger,
	sowmini.varadhan

We noticed performance issues with VF interface on sparc compared
to PF. Setting the RX to IXGBE_DCA_RXCTRL_DATA_WRO_EN brings it
on far with PF. Also this matches to the default sparc setting in
PF driver.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2:
  Alexander had concerns about this negativily affecting other architectures.
  Added CONFIG_SPARC check so this should not affect other architectures.

 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 0ea14c0..3596e0b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1748,9 +1748,15 @@ static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_VFRDLEN(reg_idx),
 			ring->count * sizeof(union ixgbe_adv_rx_desc));
 
+#ifndef CONFIG_SPARC
 	/* enable relaxed ordering */
 	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
 			IXGBE_DCA_RXCTRL_DESC_RRO_EN);
+#else
+	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
+			IXGBE_DCA_RXCTRL_DESC_RRO_EN |
+			IXGBE_DCA_RXCTRL_DATA_WRO_EN);
+#endif
 
 	/* reset head and tail pointers */
 	IXGBE_WRITE_REG(hw, IXGBE_VFRDH(reg_idx), 0);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next] hv_netvsc: Fix the list processing for network change event
From: Haiyang Zhang @ 2016-04-21 23:13 UTC (permalink / raw)
  To: davem, netdev
  Cc: haiyangz, kys, olaf, vkuznets, linux-kernel, driverdev-devel

RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
media disconnect & connect. The second half should be added to the list
head, not to the tail. So all events are processed in normal order.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index bfdb568a..ba3f3f3 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1125,7 +1125,7 @@ static void netvsc_link_change(struct work_struct *w)
 			netif_tx_stop_all_queues(net);
 			event->event = RNDIS_STATUS_MEDIA_CONNECT;
 			spin_lock_irqsave(&ndev_ctx->lock, flags);
-			list_add_tail(&event->list, &ndev_ctx->reconfig_events);
+			list_add(&event->list, &ndev_ctx->reconfig_events);
 			spin_unlock_irqrestore(&ndev_ctx->lock, flags);
 			reschedule = true;
 		}
-- 
1.7.4.1

^ permalink raw reply related

* Re: linux-next: manual merge of the net-next tree with the net tree
From: Vivien Didelot @ 2016-04-21 23:54 UTC (permalink / raw)
  To: David Miller; +Cc: Stephen Rothwell, netdev, linux-next, linux-kernel
In-Reply-To: <20160418113038.00d78799@canb.auug.org.au>

Hi David,

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   drivers/net/dsa/mv88e6xxx.c
>
> between commit:
>
>   207afda1b503 ("net: dsa: mv88e6xxx: share the same default FDB")
>
> from the net tree and commit:
>
>   009a2b9843bf ("net: dsa: mv88e6xxx: add number of ports to info")
>
> from the net-next tree.
>
> I fixed it up (the former removed some of the code updated by the latter)
> and can carry the fix as necessary. This is now fixed as far as linux-next
> is concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the conflicting
> tree to minimise any particularly complex conflicts.

I have another series to send to net-next which will also conflict with this
fix from net. As it is also required in net-next, can the fix be merged
in net-next as well?

This fix is the 3 commits:

  65fa40276ac1 ("net: dsa: mv88e6xxx: unlock DSA and CPU ports")
  996ecb824667 ("net: dsa: mv88e6xxx: enable SA learning on DSA ports")
  207afda1b503 ("net: dsa: mv88e6xxx: share the same default FDB")

For the merge commit:

  cf6b5fb2514d ("Merge branch 'dsa-mv88e6xxx-fix-cross-chip-bridging'")

Regards,
Vivien

^ permalink raw reply

* [PATCH net 1/1] net: fec: update dirty_tx even if no skb
From: Troy Kisky @ 2016-04-22  2:00 UTC (permalink / raw)
  To: netdev, davem, fugang.duan, lznuaa
  Cc: fabio.estevam, l.stach, andrew, tremyfr, gerg, linux-arm-kernel,
	johannes, stillcompiling, sergei.shtylyov, arnd, holgerschurig,
	Troy Kisky

If dirty_tx isn't updated, then dma_unmap_single
will be called twice.

This fixes a
[   58.420980] ------------[ cut here ]------------
[   58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
[   58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes]

encountered by Holger

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Tested-by: <holgerschurig@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 08243c2..b71654c 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1197,10 +1197,8 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
 					 fec16_to_cpu(bdp->cbd_datlen),
 					 DMA_TO_DEVICE);
 		bdp->cbd_bufaddr = cpu_to_fec32(0);
-		if (!skb) {
-			bdp = fec_enet_get_nextdesc(bdp, &txq->bd);
-			continue;
-		}
+		if (!skb)
+			goto skb_done;
 
 		/* Check for errors. */
 		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
@@ -1239,7 +1237,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
 
 		/* Free the sk buffer associated with this last transmit */
 		dev_kfree_skb_any(skb);
-
+skb_done:
 		/* Make sure the update to bdp and tx_skbuff are performed
 		 * before dirty_tx
 		 */
-- 
2.5.0

^ permalink raw reply related

* [PATCH] net: ipv6: Delete host routes on an ifdown
From: David Ahern @ 2016-04-22  3:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: mmanning, David Ahern

It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similar to IPv4. As always the devil is in the
details and the IPv6 stack as too many behavioral differences from IPv4
making the simple idea more complicated than it needs to be.

The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:

1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
   rt6_fill_node while handling a route dump request.

2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
   fib6_del

3. Panic in fib6_purge_rt because rt6i_ref count is not 1.

The root cause of all these is references related to the host route for
an address that is retained.

So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be re-generated an up there is
no longer a need for the l3mdev fix up. On the 'admin up' side move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once versus on UP and CHANGE events.

All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered this patch takes the conservative approach and evicts
all host routes rather than trying to determine when it can be re-used
and when it can not. That can be a later optimizaton if desired.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
Dave: I realize this goes against your preference to keep routes cached
      on a down but this patch really emphasizes my point of all the
      dark corners to be handled. 

 net/ipv6/addrconf.c | 48 +++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 33 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 23cec53b568a..8ec4b3089e20 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3176,35 +3176,9 @@ static void addrconf_gre_config(struct net_device *dev)
 }
 #endif
 
-#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
-/* If the host route is cached on the addr struct make sure it is associated
- * with the proper table. e.g., enslavement can change and if so the cached
- * host route needs to move to the new table.
- */
-static void l3mdev_check_host_rt(struct inet6_dev *idev,
-				  struct inet6_ifaddr *ifp)
-{
-	if (ifp->rt) {
-		u32 tb_id = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL;
-
-		if (tb_id != ifp->rt->rt6i_table->tb6_id) {
-			ip6_del_rt(ifp->rt);
-			ifp->rt = NULL;
-		}
-	}
-}
-#else
-static void l3mdev_check_host_rt(struct inet6_dev *idev,
-				  struct inet6_ifaddr *ifp)
-{
-}
-#endif
-
 static int fixup_permanent_addr(struct inet6_dev *idev,
 				struct inet6_ifaddr *ifp)
 {
-	l3mdev_check_host_rt(idev, ifp);
-
 	if (!ifp->rt) {
 		struct rt6_info *rt;
 
@@ -3304,6 +3278,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			break;
 
 		if (event == NETDEV_UP) {
+			/* restore routes for permanent addresses */
+			addrconf_permanent_addr(dev);
+
 			if (!addrconf_qdisc_ok(dev)) {
 				/* device is not ready yet. */
 				pr_info("ADDRCONF(NETDEV_UP): %s: link is not ready\n",
@@ -3337,9 +3314,6 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			run_pending = 1;
 		}
 
-		/* restore routes for permanent addresses */
-		addrconf_permanent_addr(dev);
-
 		switch (dev->type) {
 #if IS_ENABLED(CONFIG_IPV6_SIT)
 		case ARPHRD_SIT:
@@ -3556,6 +3530,8 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	INIT_LIST_HEAD(&del_list);
 	list_for_each_entry_safe(ifa, tmp, &idev->addr_list, if_list) {
+		struct rt6_info *rt = NULL;
+
 		addrconf_del_dad_work(ifa);
 
 		write_unlock_bh(&idev->lock);
@@ -3568,6 +3544,9 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 			ifa->state = 0;
 			if (!(ifa->flags & IFA_F_NODAD))
 				ifa->flags |= IFA_F_TENTATIVE;
+
+			rt = ifa->rt;
+			ifa->rt = NULL;
 		} else {
 			state = ifa->state;
 			ifa->state = INET6_IFADDR_STATE_DEAD;
@@ -3578,6 +3557,9 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 		spin_unlock_bh(&ifa->lock);
 
+		if (rt)
+			ip6_del_rt(rt);
+
 		if (state != INET6_IFADDR_STATE_DEAD) {
 			__ipv6_ifa_notify(RTM_DELADDR, ifa);
 			inet6addr_notifier_call_chain(NETDEV_DOWN, ifa);
@@ -5343,10 +5325,10 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 			if (rt)
 				ip6_del_rt(rt);
 		}
-		dst_hold(&ifp->rt->dst);
-
-		ip6_del_rt(ifp->rt);
-
+		if (ifp->rt) {
+			dst_hold(&ifp->rt->dst);
+			ip6_del_rt(ifp->rt);
+		}
 		rt_genid_bump_ipv6(net);
 		break;
 	}
-- 
2.1.4

^ permalink raw reply related

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
From: Martin KaFai Lau @ 2016-04-22  4:30 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <CACSApvZO9d9B-5YOxm7LbpOvtORXQctTiocAY+2tigugDqr0eg@mail.gmail.com>

On Thu, Apr 21, 2016 at 05:14:37PM -0400, Soheil Hassas Yeganeh wrote:
> On another note, do you think putting this is a self-documenting
> helper function, say tcp_can_collapse_to(), would help readability?
Sure.  I will move unlikely(TCP_SKB_CB(to)->eor) to a new helper
function tcp_skb_can_collapse_to() in the next spin.

^ permalink raw reply

* Re: [PATCH V3 29/29] ethernet: use parity8 in broadcom/tg3.c
From: Siva Reddy Kallam @ 2016-04-22  4:46 UTC (permalink / raw)
  To: zengzhaoxiu, Prashant Sreedharan, Michael Chan, Zhaoxiu Zeng
  Cc: linux-kernel, netdev

On Thu, Apr 14, 2016 at 8:42 AM, <zengzhaoxiu@163.com> wrote:
>
> From: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
>
> Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Looks good to me.
Acked-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 3010080..802a429 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -12939,11 +12939,7 @@ static int tg3_test_nvram(struct tg3 *tp)
>
>                 err = -EIO;
>                 for (i = 0; i < NVRAM_SELFBOOT_DATA_SIZE; i++) {
> -                       u8 hw8 = hweight8(data[i]);
> -
> -                       if ((hw8 & 0x1) && parity[i])
> -                               goto out;
> -                       else if (!(hw8 & 0x1) && !parity[i])
> +                       if (parity8(data[i]) == !!parity[i])
>                                 goto out;
>                 }
>                 err = 0;
> --
> 2.5.0
>
>

^ permalink raw reply

* [PATCH net-next] tcp: SYN packets are now simply consumed
From: Eric Dumazet @ 2016-04-22  5:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

We now have proper per-listener but also per network namespace counters
for SYN packets that might be dropped.

We replace the kfree_skb() by consume_skb() to be drop monitor [1]
friendly, and remove an obsolete comment.
FastOpen SYN packets can carry payload in them just fine.

[1] perf record -a -g -e skb:kfree_skb sleep 1; perf report

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c |   19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 90e0d9256b74..ab674cd99827 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5813,24 +5813,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 			if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
 				return 1;
 
-			/* Now we have several options: In theory there is
-			 * nothing else in the frame. KA9Q has an option to
-			 * send data with the syn, BSD accepts data with the
-			 * syn up to the [to be] advertised window and
-			 * Solaris 2.1 gives you a protocol error. For now
-			 * we just ignore it, that fits the spec precisely
-			 * and avoids incompatibilities. It would be nice in
-			 * future to drop through and process the data.
-			 *
-			 * Now that TTCP is starting to be used we ought to
-			 * queue this data.
-			 * But, this leaves one open to an easy denial of
-			 * service attack, and SYN cookies can't defend
-			 * against this problem. So, we drop the data
-			 * in the interest of security over speed unless
-			 * it's still in use.
-			 */
-			kfree_skb(skb);
+			consume_skb(skb);
 			return 0;
 		}
 		goto discard;

^ permalink raw reply related

* [PATCH net-next] net: better drop monitoring in ip{6}_recv_error()
From: Eric Dumazet @ 2016-04-22  5:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

We should call consume_skb(skb) when skb is properly consumed,
or kfree_skb(skb) when skb must be dropped in error case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_sockglue.c |   10 +++++-----
 net/ipv6/datagram.c    |   10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 279471c4e58f..e3ca3915dc74 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -510,9 +510,10 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		copied = len;
 	}
 	err = skb_copy_datagram_msg(skb, 0, msg, copied);
-	if (err)
-		goto out_free_skb;
-
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
 	sock_recv_timestamp(msg, sk, skb);
 
 	serr = SKB_EXT_ERR(skb);
@@ -544,8 +545,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	msg->msg_flags |= MSG_ERRQUEUE;
 	err = copied;
 
-out_free_skb:
-	kfree_skb(skb);
+	consume_skb(skb);
 out:
 	return err;
 }
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index a73d70119fcd..4b33e54c5753 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -407,9 +407,10 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		copied = len;
 	}
 	err = skb_copy_datagram_msg(skb, 0, msg, copied);
-	if (err)
-		goto out_free_skb;
-
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
 	sock_recv_timestamp(msg, sk, skb);
 
 	serr = SKB_EXT_ERR(skb);
@@ -466,8 +467,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	msg->msg_flags |= MSG_ERRQUEUE;
 	err = copied;
 
-out_free_skb:
-	kfree_skb(skb);
+	consume_skb(skb);
 out:
 	return err;
 }

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox