Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next V1 00/10] net/mlx4: Add flow-steering support
From: Or Gerlitz @ 2012-07-05 14:03 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, amirv, Or Gerlitz, Hadar Hen Zion

This patch series from Hadar adds code to manage L2/L3/L4 network 
flow steering rules, a feature which is supported by the ConnectX-3 device.

The series is built as follows:

The first two patches deal with SRIOV resource tracker, whose mechanism 
is changed to use red-black tree instead of radix tree. The reason for 
this change is that the coming steering patches use flow IDs which are 64 
bits in size, where radix tree keys can't be 64bit on 32bit architecture, 
while RB tree can do that.

Patch #3 is little re-design of the Ethernet driver multicast attachments 
flow to be more efficient and robust.

The fourth patch does a re-org of the checks that deal with the current 
"older" steering modes such that we can easily add soon the new steering 
mode and the code remains easy to manage.

Patch #5 adds the firmware commands for the new steering mode, which is 
called "device managed flow steeering".

Patch 6 is the main patch of this series. It adds support for device-managed flow 
steering all across the place. We had to have this patch also to touch the mlx4 
IB driver, since the steering mode is global to the HCA -- so when being enabled, 
multicast attachment calls done by the IB driver into the mlx4 core driver, 
are now routed to the flow steering firmware commands whose API is a bit different, 
something that the IB driver had to be aware to. Following that, the 7th patch 
adds resource tracking for device-managed flow steering rules.

The 8th patch adds promiscuous mode support under device-managed flow steering,
next, the 9th patch adds implementation for the ethtool APIs for attaching 
L2/L3/L4 based flow steering rules, and the last patch adds support for drop 
action through ethtool.

changes from V0:
- fixed coding convention issues raised by Dave (across the place)
- patch 6/10 - Add device managed flow steering firmware API
  - changed both code and commit message: hash function is hard coded instead
    of a module param
- patch 9/10 - Manage flow steering rules with ethtool
  - not_all_zeros_or_all_ones => all_zeros_or_all_ones + enhance macro
  - check masks instead of values in many places
  - return EINVAL when bad arguments and not ENOTSUPP
  - cleanup ETHER_FLOW validation
  - VLAN tag mask specified by the user must be 12 bits only
  - all functions must return error code
  - fixed potential memory leak in an error flow
  - allow to get number of rings when not in device managed flow steering
  - limit get rules to rule_cnt
  - if no rules exists, do not return error - only return empty list
  <--- MANY thanks to Ben Hutchings for the detailed review of the 
  <--- ethtool and to Amir Vadai for helping with the fixes
- patch 10/10 - Add support for drop action through ethtool
  - return EINVAL instead of ENOTSUPP  

Hadar Hen Zion (9):
  net/mlx4_core: Change resource tracking mechanism to use red-black tree
  net/mlx4_core: Change resource tracking ID to be 64 bit
  net/mlx4: Set steering mode according to device capabilities
  net/mlx4_core: Add firmware commands to support device managed flow steering
  {NET,IB}/mlx4: Add device managed flow steering firmware API
  net/mlx4_core: Add resource tracking for device managed flow steering rules
  net/mlx4: Implement promiscuous mode with device managed flow-steering
  net/mlx4_en: Manage flow steering rules with ethtool
  net/mlx4_en: Add support for drop action through ethtool

Yevgeny Petrilin (1):
  net/mlx4_en: Re-design multicast attachments flow

 drivers/infiniband/hw/mlx4/main.c                  |   62 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |    1 +
 drivers/infiniband/hw/mlx4/qp.c                    |    1 +
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |   19 +
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |  382 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |  316 +++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   30 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   91 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/main.c          |   54 ++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |  524 ++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   29 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |   28 +-
 drivers/net/ethernet/mellanox/mlx4/port.c          |  111 +++--
 drivers/net/ethernet/mellanox/mlx4/profile.c       |   12 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  278 +++++++++--
 include/linux/mlx4/cmd.h                           |    4 +
 include/linux/mlx4/device.h                        |  135 +++++-
 18 files changed, 1849 insertions(+), 231 deletions(-)
-- 
1.7.8.2

Cc: Hadar Hen Zion <hadarh@mellanox.co.il>

^ permalink raw reply

* [PATCH net-next V1 04/10] net/mlx4: Set steering mode according to device capabilities
From: Or Gerlitz @ 2012-07-05 14:03 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, amirv, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <1341497030-1818-1-git-send-email-ogerlitz@mellanox.com>

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  108 ++++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx4/fw.c        |    2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c      |   16 ++++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c       |   70 +++++++--------
 drivers/net/ethernet/mellanox/mlx4/port.c      |    9 +-
 include/linux/mlx4/device.h                    |   24 +++++
 6 files changed, 148 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index bedcbb3..44ff7cd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -265,7 +265,7 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 	struct mlx4_en_mc_list *mclist, *tmp;
 	u64 mcast_addr = 0;
 	u8 mc_list[16] = {0};
-	int err;
+	int err = 0;
 
 	mutex_lock(&mdev->state_lock);
 	if (!mdev->device_up) {
@@ -300,16 +300,36 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 			priv->flags |= MLX4_EN_FLAG_PROMISC;
 
 			/* Enable promiscouos mode */
-			if (!(mdev->dev->caps.flags &
-						MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-				err = mlx4_SET_PORT_qpn_calc(mdev->dev, priv->port,
-							     priv->base_qpn, 1);
-			else
-				err = mlx4_unicast_promisc_add(mdev->dev, priv->base_qpn,
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_unicast_promisc_add(mdev->dev,
+							       priv->base_qpn,
 							       priv->port);
-			if (err)
-				en_err(priv, "Failed enabling "
-					     "promiscuous mode\n");
+				if (err)
+					en_err(priv, "Failed enabling unicast promiscuous mode\n");
+
+				/* Add the default qp number as multicast
+				 * promisc
+				 */
+				if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
+					err = mlx4_multicast_promisc_add(mdev->dev,
+									 priv->base_qpn,
+									 priv->port);
+					if (err)
+						en_err(priv, "Failed enabling multicast promiscuous mode\n");
+					priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
+				}
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				err = mlx4_SET_PORT_qpn_calc(mdev->dev,
+							     priv->port,
+							     priv->base_qpn,
+							     1);
+				if (err)
+					en_err(priv, "Failed enabling promiscuous mode\n");
+				break;
+			}
 
 			/* Disable port multicast filter (unconditionally) */
 			err = mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, 0,
@@ -318,15 +338,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 				en_err(priv, "Failed disabling "
 					     "multicast filter\n");
 
-			/* Add the default qp number as multicast promisc */
-			if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
-				err = mlx4_multicast_promisc_add(mdev->dev, priv->base_qpn,
-								 priv->port);
-				if (err)
-					en_err(priv, "Failed entering multicast promisc mode\n");
-				priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
-			}
-
 			/* Disable port VLAN filter */
 			err = mlx4_SET_VLAN_FLTR(mdev->dev, priv);
 			if (err)
@@ -345,22 +356,31 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		priv->flags &= ~MLX4_EN_FLAG_PROMISC;
 
 		/* Disable promiscouos mode */
-		if (!(mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-			err = mlx4_SET_PORT_qpn_calc(mdev->dev, priv->port,
-						     priv->base_qpn, 0);
-		else
-			err = mlx4_unicast_promisc_remove(mdev->dev, priv->base_qpn,
+		switch (mdev->dev->caps.steering_mode) {
+		case MLX4_STEERING_MODE_B0:
+			err = mlx4_unicast_promisc_remove(mdev->dev,
+							  priv->base_qpn,
 							  priv->port);
-		if (err)
-			en_err(priv, "Failed disabling promiscuous mode\n");
+			if (err)
+				en_err(priv, "Failed disabling unicast promiscuous mode\n");
+			/* Disable Multicast promisc */
+			if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
+				err = mlx4_multicast_promisc_remove(mdev->dev,
+								    priv->base_qpn,
+								    priv->port);
+				if (err)
+					en_err(priv, "Failed disabling multicast promiscuous mode\n");
+				priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
+			}
+			break;
 
-		/* Disable Multicast promisc */
-		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
-			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
-							    priv->port);
+		case MLX4_STEERING_MODE_A0:
+			err = mlx4_SET_PORT_qpn_calc(mdev->dev,
+						     priv->port,
+						     priv->base_qpn, 0);
 			if (err)
-				en_err(priv, "Failed disabling multicast promiscuous mode\n");
-			priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
+				en_err(priv, "Failed disabling promiscuous mode\n");
+			break;
 		}
 
 		/* Enable port VLAN filter */
@@ -378,8 +398,16 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 
 		/* Add the default qp number as multicast promisc */
 		if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
-			err = mlx4_multicast_promisc_add(mdev->dev, priv->base_qpn,
-							 priv->port);
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_multicast_promisc_add(mdev->dev,
+								 priv->base_qpn,
+								 priv->port);
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				break;
+			}
 			if (err)
 				en_err(priv, "Failed entering multicast promisc mode\n");
 			priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
@@ -387,8 +415,16 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 	} else {
 		/* Disable Multicast promisc */
 		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
-			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
-							    priv->port);
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_multicast_promisc_remove(mdev->dev,
+								    priv->base_qpn,
+								    priv->port);
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				break;
+			}
 			if (err)
 				en_err(priv, "Failed disabling multicast promiscuous mode\n");
 			priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 9c83bb8..40e048b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1124,7 +1124,7 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	MLX4_PUT(inbox, param->mc_base,		INIT_HCA_MC_BASE_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_entry_sz, INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_hash_sz,  INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_B0)
 		MLX4_PUT(inbox, (u8) (1 << 3),	INIT_HCA_UC_STEERING_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_table_sz, INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index a0313de..5783275 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -243,7 +243,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.reserved_srqs	     = dev_cap->reserved_srqs;
 	dev->caps.max_sq_desc_sz     = dev_cap->max_sq_desc_sz;
 	dev->caps.max_rq_desc_sz     = dev_cap->max_rq_desc_sz;
-	dev->caps.num_qp_per_mgm     = mlx4_get_qp_per_mgm(dev);
 	/*
 	 * Subtract 1 from the limit because we need to allocate a
 	 * spare CQE so the HCA HW can tell the difference between an
@@ -274,6 +273,21 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_gso_sz	     = dev_cap->max_gso_sz;
 	dev->caps.max_rss_tbl_sz     = dev_cap->max_rss_tbl_sz;
 
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
+	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
+		dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
+	} else {
+		dev->caps.steering_mode = MLX4_STEERING_MODE_A0;
+
+		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER ||
+		    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
+			mlx4_warn(dev, "Must have UC_STEER and MC_STEER flags "
+				       "set to use B0 steering. Falling back to A0 steering mode.\n");
+	}
+	mlx4_dbg(dev, "Steering mode is: %s\n",
+		 mlx4_steering_mode_str(dev->caps.steering_mode));
+	dev->caps.num_qp_per_mgm = mlx4_get_qp_per_mgm(dev);
+
 	/* Sense port always allowed on supported devices for ConnectX1 and 2 */
 	if (dev->pdev->device != 0x1003)
 		dev->caps.flags |= MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index f4a8f98..319c9d4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -868,36 +868,50 @@ static int mlx4_QP_ATTACH(struct mlx4_dev *dev, struct mlx4_qp *qp,
 int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			  int block_mcast_loopback, enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
 
-	if (prot == MLX4_PROT_ETH)
-		gid[7] |= (MLX4_MC_STEER << 1);
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		if (prot == MLX4_PROT_ETH)
+			return 0;
 
-	if (mlx4_is_mfunc(dev))
-		return mlx4_QP_ATTACH(dev, qp, gid, 1,
-					block_mcast_loopback, prot);
+	case MLX4_STEERING_MODE_B0:
+		if (prot == MLX4_PROT_ETH)
+			gid[7] |= (MLX4_MC_STEER << 1);
 
-	return mlx4_qp_attach_common(dev, qp, gid, block_mcast_loopback,
-					prot, MLX4_MC_STEER);
+		if (mlx4_is_mfunc(dev))
+			return mlx4_QP_ATTACH(dev, qp, gid, 1,
+					      block_mcast_loopback, prot);
+		return mlx4_qp_attach_common(dev, qp, gid,
+					     block_mcast_loopback, prot,
+					     MLX4_MC_STEER);
+
+	default:
+		return -EINVAL;
+	}
 }
 EXPORT_SYMBOL_GPL(mlx4_multicast_attach);
 
 int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			  enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		if (prot == MLX4_PROT_ETH)
+			return 0;
 
-	if (prot == MLX4_PROT_ETH)
-		gid[7] |= (MLX4_MC_STEER << 1);
+	case MLX4_STEERING_MODE_B0:
+		if (prot == MLX4_PROT_ETH)
+			gid[7] |= (MLX4_MC_STEER << 1);
 
-	if (mlx4_is_mfunc(dev))
-		return mlx4_QP_ATTACH(dev, qp, gid, 0, 0, prot);
+		if (mlx4_is_mfunc(dev))
+			return mlx4_QP_ATTACH(dev, qp, gid, 0, 0, prot);
+
+		return mlx4_qp_detach_common(dev, qp, gid, prot,
+					     MLX4_MC_STEER);
 
-	return mlx4_qp_detach_common(dev, qp, gid, prot, MLX4_MC_STEER);
+	default:
+		return -EINVAL;
+	}
 }
 EXPORT_SYMBOL_GPL(mlx4_multicast_detach);
 
@@ -905,10 +919,6 @@ int mlx4_unicast_attach(struct mlx4_dev *dev,
 			struct mlx4_qp *qp, u8 gid[16],
 			int block_mcast_loopback, enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (prot == MLX4_PROT_ETH)
 		gid[7] |= (MLX4_UC_STEER << 1);
 
@@ -924,10 +934,6 @@ EXPORT_SYMBOL_GPL(mlx4_unicast_attach);
 int mlx4_unicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp,
 			       u8 gid[16], enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (prot == MLX4_PROT_ETH)
 		gid[7] |= (MLX4_UC_STEER << 1);
 
@@ -968,9 +974,6 @@ static int mlx4_PROMISC(struct mlx4_dev *dev, u32 qpn,
 
 int mlx4_multicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_MC_STEER, 1, port);
 
@@ -980,9 +983,6 @@ EXPORT_SYMBOL_GPL(mlx4_multicast_promisc_add);
 
 int mlx4_multicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_MC_STEER, 0, port);
 
@@ -992,9 +992,6 @@ EXPORT_SYMBOL_GPL(mlx4_multicast_promisc_remove);
 
 int mlx4_unicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_UC_STEER, 1, port);
 
@@ -1004,9 +1001,6 @@ EXPORT_SYMBOL_GPL(mlx4_unicast_promisc_add);
 
 int mlx4_unicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_UC_STEER, 0, port);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index a8fb529..58de723 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -155,7 +155,7 @@ int mlx4_get_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 		return err;
 	}
 
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER)) {
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_A0) {
 		*qpn = info->base_qpn + index;
 		return 0;
 	}
@@ -206,7 +206,7 @@ void mlx4_put_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int qpn)
 		 (unsigned long long) mac);
 	mlx4_unregister_mac(dev, port, mac);
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER) {
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0) {
 		entry = radix_tree_lookup(&info->mac_tree, qpn);
 		if (entry) {
 			mlx4_dbg(dev, "Releasing qp: port %d, mac 0x%llx,"
@@ -359,7 +359,7 @@ int mlx4_replace_mac(struct mlx4_dev *dev, u8 port, int qpn, u64 new_mac)
 	int index = qpn - info->base_qpn;
 	int err = 0;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER) {
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0) {
 		entry = radix_tree_lookup(&info->mac_tree, qpn);
 		if (!entry)
 			return -EINVAL;
@@ -803,8 +803,7 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 	u32 m_promisc = (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) ?
 		MCAST_DIRECT : MCAST_DEFAULT;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER  &&
-	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER)
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
 		return 0;
 
 	mailbox = mlx4_alloc_cmd_mailbox(dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 6a8f002..7f5c9ee 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -70,6 +70,29 @@ enum {
 	MLX4_MFUNC_EQE_MASK     = (MLX4_MFUNC_MAX_EQES - 1)
 };
 
+/* Driver supports 2 diffrent device methods to manage traffic steering:
+ *	- B0 steering mode - Common low level API for ib and (if supported) eth.
+ *	- A0 steering mode - Limited low level API for eth. In case of IB,
+ *			     B0 mode is in use.
+ */
+enum {
+	MLX4_STEERING_MODE_A0,
+	MLX4_STEERING_MODE_B0
+};
+
+static inline const char *mlx4_steering_mode_str(int steering_mode)
+{
+	switch (steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		return "A0 steering";
+
+	case MLX4_STEERING_MODE_B0:
+		return "B0 steering";
+	default:
+		return "Unrecognize steering mode";
+	}
+}
+
 enum {
 	MLX4_DEV_CAP_FLAG_RC		= 1LL <<  0,
 	MLX4_DEV_CAP_FLAG_UC		= 1LL <<  1,
@@ -295,6 +318,7 @@ struct mlx4_caps {
 	int			num_amgms;
 	int			reserved_mcgs;
 	int			num_qp_per_mgm;
+	int			steering_mode;
 	int			num_pds;
 	int			reserved_pds;
 	int			max_xrcds;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next V1 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree
From: Or Gerlitz @ 2012-07-05 14:03 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, amirv, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <1341497030-1818-1-git-send-email-ogerlitz@mellanox.com>

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    3 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  106 ++++++++++++++------
 2 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index e5d2022..1a2f372 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -39,6 +39,7 @@
 
 #include <linux/mutex.h>
 #include <linux/radix-tree.h>
+#include <linux/rbtree.h>
 #include <linux/timer.h>
 #include <linux/semaphore.h>
 #include <linux/workqueue.h>
@@ -509,7 +510,7 @@ struct slave_list {
 struct mlx4_resource_tracker {
 	spinlock_t lock;
 	/* tree for each resources */
-	struct radix_tree_root res_tree[MLX4_NUM_OF_RESOURCE_TYPE];
+	struct rb_root res_tree[MLX4_NUM_OF_RESOURCE_TYPE];
 	/* num_of_slave's lists, one per slave */
 	struct slave_list *slave_list;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 766b8c5..80c03c8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -57,6 +57,7 @@ struct mac_res {
 
 struct res_common {
 	struct list_head	list;
+	struct rb_node		node;
 	u32		        res_id;
 	int			owner;
 	int			state;
@@ -189,6 +190,49 @@ struct res_xrcdn {
 	int			port;
 };
 
+static void *res_tracker_lookup(struct rb_root *root, u64 res_id)
+{
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct res_common *res = container_of(node, struct res_common,
+						      node);
+
+		if (res_id < res->res_id)
+			node = node->rb_left;
+		else if (res_id > res->res_id)
+			node = node->rb_right;
+		else
+			return res;
+	}
+	return NULL;
+}
+
+static int res_tracker_insert(struct rb_root *root, struct res_common *res)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct res_common *this = container_of(*new, struct res_common,
+						       node);
+
+		parent = *new;
+		if (res->res_id < this->res_id)
+			new = &((*new)->rb_left);
+		else if (res->res_id > this->res_id)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&res->node, parent, new);
+	rb_insert_color(&res->node, root);
+
+	return 0;
+}
+
 /* For Debug uses */
 static const char *ResourceType(enum mlx4_resource rt)
 {
@@ -228,8 +272,7 @@ int mlx4_init_resource_tracker(struct mlx4_dev *dev)
 	mlx4_dbg(dev, "Started init_resource_tracker: %ld slaves\n",
 		 dev->num_slaves);
 	for (i = 0 ; i < MLX4_NUM_OF_RESOURCE_TYPE; i++)
-		INIT_RADIX_TREE(&priv->mfunc.master.res_tracker.res_tree[i],
-				GFP_ATOMIC|__GFP_NOWARN);
+		priv->mfunc.master.res_tracker.res_tree[i] = RB_ROOT;
 
 	spin_lock_init(&priv->mfunc.master.res_tracker.lock);
 	return 0 ;
@@ -277,8 +320,8 @@ static void *find_res(struct mlx4_dev *dev, int res_id,
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
-	return radix_tree_lookup(&priv->mfunc.master.res_tracker.res_tree[type],
-				 res_id);
+	return res_tracker_lookup(&priv->mfunc.master.res_tracker.res_tree[type],
+				  res_id);
 }
 
 static int get_res(struct mlx4_dev *dev, int slave, int res_id,
@@ -523,7 +566,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct res_common **res_arr;
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
-	struct radix_tree_root *root = &tracker->res_tree[type];
+	struct rb_root *root = &tracker->res_tree[type];
 
 	res_arr = kzalloc(count * sizeof *res_arr, GFP_KERNEL);
 	if (!res_arr)
@@ -546,7 +589,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 			err = -EEXIST;
 			goto undo;
 		}
-		err = radix_tree_insert(root, base + i, res_arr[i]);
+		err = res_tracker_insert(root, res_arr[i]);
 		if (err)
 			goto undo;
 		list_add_tail(&res_arr[i]->list,
@@ -559,7 +602,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 
 undo:
 	for (--i; i >= base; --i)
-		radix_tree_delete(&tracker->res_tree[type], i);
+		rb_erase(&res_arr[i]->node, root);
 
 	spin_unlock_irq(mlx4_tlock(dev));
 
@@ -695,7 +738,7 @@ static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 
 	spin_lock_irq(mlx4_tlock(dev));
 	for (i = base; i < base + count; ++i) {
-		r = radix_tree_lookup(&tracker->res_tree[type], i);
+		r = res_tracker_lookup(&tracker->res_tree[type], i);
 		if (!r) {
 			err = -ENOENT;
 			goto out;
@@ -710,8 +753,8 @@ static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 	}
 
 	for (i = base; i < base + count; ++i) {
-		r = radix_tree_lookup(&tracker->res_tree[type], i);
-		radix_tree_delete(&tracker->res_tree[type], i);
+		r = res_tracker_lookup(&tracker->res_tree[type], i);
+		rb_erase(&r->node, &tracker->res_tree[type]);
 		list_del(&r->list);
 		kfree(r);
 	}
@@ -733,7 +776,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_QP], qpn);
+	r = res_tracker_lookup(&tracker->res_tree[RES_QP], qpn);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -797,7 +840,7 @@ static int mr_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_MPT], index);
+	r = res_tracker_lookup(&tracker->res_tree[RES_MPT], index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -850,7 +893,7 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_EQ], index);
+	r = res_tracker_lookup(&tracker->res_tree[RES_EQ], index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -898,7 +941,7 @@ static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
 	int err;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_CQ], cqn);
+	r = res_tracker_lookup(&tracker->res_tree[RES_CQ], cqn);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -952,7 +995,7 @@ static int srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_SRQ], index);
+	r = res_tracker_lookup(&tracker->res_tree[RES_SRQ], index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -1001,7 +1044,7 @@ static void res_abort_move(struct mlx4_dev *dev, int slave,
 	struct res_common *r;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[type], id);
+	r = res_tracker_lookup(&tracker->res_tree[type], id);
 	if (r && (r->owner == slave))
 		r->state = r->from_state;
 	spin_unlock_irq(mlx4_tlock(dev));
@@ -1015,7 +1058,7 @@ static void res_end_move(struct mlx4_dev *dev, int slave,
 	struct res_common *r;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[type], id);
+	r = res_tracker_lookup(&tracker->res_tree[type], id);
 	if (r && (r->owner == slave))
 		r->state = r->to_state;
 	spin_unlock_irq(mlx4_tlock(dev));
@@ -2817,8 +2860,8 @@ static void rem_slave_qps(struct mlx4_dev *dev, int slave)
 				switch (state) {
 				case RES_QP_RESERVED:
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_QP],
-							  qp->com.res_id);
+					rb_erase(&qp->com.node,
+						 &tracker->res_tree[RES_QP]);
 					list_del(&qp->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(qp);
@@ -2888,8 +2931,8 @@ static void rem_slave_srqs(struct mlx4_dev *dev, int slave)
 				case RES_SRQ_ALLOCATED:
 					__mlx4_srq_free_icm(dev, srqn);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_SRQ],
-							  srqn);
+					rb_erase(&srq->com.node,
+						 &tracker->res_tree[RES_SRQ]);
 					list_del(&srq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(srq);
@@ -2954,8 +2997,8 @@ static void rem_slave_cqs(struct mlx4_dev *dev, int slave)
 				case RES_CQ_ALLOCATED:
 					__mlx4_cq_free_icm(dev, cqn);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_CQ],
-							  cqn);
+					rb_erase(&cq->com.node,
+						 &tracker->res_tree[RES_CQ]);
 					list_del(&cq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(cq);
@@ -3017,8 +3060,8 @@ static void rem_slave_mrs(struct mlx4_dev *dev, int slave)
 				case RES_MPT_RESERVED:
 					__mlx4_mr_release(dev, mpt->key);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_MPT],
-							  mptn);
+					rb_erase(&mpt->com.node,
+						 &tracker->res_tree[RES_MPT]);
 					list_del(&mpt->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(mpt);
@@ -3086,8 +3129,8 @@ static void rem_slave_mtts(struct mlx4_dev *dev, int slave)
 					__mlx4_free_mtt_range(dev, base,
 							      mtt->order);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_MTT],
-							  base);
+					rb_erase(&mtt->com.node,
+						 &tracker->res_tree[RES_MTT]);
 					list_del(&mtt->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(mtt);
@@ -3133,8 +3176,8 @@ static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
 				switch (state) {
 				case RES_EQ_RESERVED:
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_EQ],
-							  eqn);
+					rb_erase(&eq->com.node,
+						 &tracker->res_tree[RES_EQ]);
 					list_del(&eq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(eq);
@@ -3191,7 +3234,8 @@ static void rem_slave_counters(struct mlx4_dev *dev, int slave)
 	list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
 		if (counter->com.owner == slave) {
 			index = counter->com.res_id;
-			radix_tree_delete(&tracker->res_tree[RES_COUNTER], index);
+			rb_erase(&counter->com.node,
+				 &tracker->res_tree[RES_COUNTER]);
 			list_del(&counter->com.list);
 			kfree(counter);
 			__mlx4_counter_free(dev, index);
@@ -3220,7 +3264,7 @@ static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)
 	list_for_each_entry_safe(xrcd, tmp, xrcdn_list, com.list) {
 		if (xrcd->com.owner == slave) {
 			xrcdn = xrcd->com.res_id;
-			radix_tree_delete(&tracker->res_tree[RES_XRCD], xrcdn);
+			rb_erase(&xrcd->com.node, &tracker->res_tree[RES_XRCD]);
 			list_del(&xrcd->com.list);
 			kfree(xrcd);
 			__mlx4_xrcd_free(dev, xrcdn);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next V1 03/10] net/mlx4_en: Re-design multicast attachments flow
From: Or Gerlitz @ 2012-07-05 14:03 UTC (permalink / raw)
  To: davem
  Cc: roland, yevgenyp, oren, netdev, amirv, Yevgeny Petrilin,
	Shawn Bohrer, Hadar Hen Zion
In-Reply-To: <1341497030-1818-1-git-send-email-ogerlitz@mellanox.com>

From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  143 ++++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   16 +++-
 2 files changed, 124 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 073b85b..bedcbb3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -170,33 +170,81 @@ static void mlx4_en_do_set_mac(struct work_struct *work)
 static void mlx4_en_clear_list(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_mc_list *tmp, *mc_to_del;
 
-	kfree(priv->mc_addrs);
-	priv->mc_addrs = NULL;
-	priv->mc_addrs_cnt = 0;
+	list_for_each_entry_safe(mc_to_del, tmp, &priv->mc_list, list) {
+		list_del(&mc_to_del->list);
+		kfree(mc_to_del);
+	}
 }
 
 static void mlx4_en_cache_mclist(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct netdev_hw_addr *ha;
-	char *mc_addrs;
-	int mc_addrs_cnt = netdev_mc_count(dev);
-	int i;
+	struct mlx4_en_mc_list *tmp;
 
-	mc_addrs = kmalloc(mc_addrs_cnt * ETH_ALEN, GFP_ATOMIC);
-	if (!mc_addrs) {
-		en_err(priv, "failed to allocate multicast list\n");
-		return;
-	}
-	i = 0;
-	netdev_for_each_mc_addr(ha, dev)
-		memcpy(mc_addrs + i++ * ETH_ALEN, ha->addr, ETH_ALEN);
 	mlx4_en_clear_list(dev);
-	priv->mc_addrs = mc_addrs;
-	priv->mc_addrs_cnt = mc_addrs_cnt;
+	netdev_for_each_mc_addr(ha, dev) {
+		tmp = kzalloc(sizeof(struct mlx4_en_mc_list), GFP_ATOMIC);
+		if (!tmp) {
+			en_err(priv, "failed to allocate multicast list\n");
+			mlx4_en_clear_list(dev);
+			return;
+		}
+		memcpy(tmp->addr, ha->addr, ETH_ALEN);
+		list_add_tail(&tmp->list, &priv->mc_list);
+	}
 }
 
+static void update_mclist_flags(struct mlx4_en_priv *priv,
+				struct list_head *dst,
+				struct list_head *src)
+{
+	struct mlx4_en_mc_list *dst_tmp, *src_tmp, *new_mc;
+	bool found;
+
+	/* Find all the entries that should be removed from dst,
+	 * These are the entries that are not found in src
+	 */
+	list_for_each_entry(dst_tmp, dst, list) {
+		found = false;
+		list_for_each_entry(src_tmp, src, list) {
+			if (!memcmp(dst_tmp->addr, src_tmp->addr, ETH_ALEN)) {
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			dst_tmp->action = MCLIST_REM;
+	}
+
+	/* Add entries that exist in src but not in dst
+	 * mark them as need to add
+	 */
+	list_for_each_entry(src_tmp, src, list) {
+		found = false;
+		list_for_each_entry(dst_tmp, dst, list) {
+			if (!memcmp(dst_tmp->addr, src_tmp->addr, ETH_ALEN)) {
+				dst_tmp->action = MCLIST_NONE;
+				found = true;
+				break;
+			}
+		}
+		if (!found) {
+			new_mc = kmalloc(sizeof(struct mlx4_en_mc_list),
+					 GFP_KERNEL);
+			if (!new_mc) {
+				en_err(priv, "Failed to allocate current multicast list\n");
+				return;
+			}
+			memcpy(new_mc, src_tmp,
+			       sizeof(struct mlx4_en_mc_list));
+			new_mc->action = MCLIST_ADD;
+			list_add_tail(&new_mc->list, dst);
+		}
+	}
+}
 
 static void mlx4_en_set_multicast(struct net_device *dev)
 {
@@ -214,6 +262,7 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 						 mcast_task);
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct net_device *dev = priv->dev;
+	struct mlx4_en_mc_list *mclist, *tmp;
 	u64 mcast_addr = 0;
 	u8 mc_list[16] = {0};
 	int err;
@@ -336,7 +385,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 			priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
 		}
 	} else {
-		int i;
 		/* Disable Multicast promisc */
 		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
 			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
@@ -351,13 +399,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		if (err)
 			en_err(priv, "Failed disabling multicast filter\n");
 
-		/* Detach our qp from all the multicast addresses */
-		for (i = 0; i < priv->mc_addrs_cnt; i++) {
-			memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
-			mc_list[5] = priv->port;
-			mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp,
-					      mc_list, MLX4_PROT_ETH);
-		}
 		/* Flush mcast filter and init it with broadcast address */
 		mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, ETH_BCAST,
 				    1, MLX4_MCAST_CONFIG);
@@ -367,13 +408,8 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		netif_tx_lock_bh(dev);
 		mlx4_en_cache_mclist(dev);
 		netif_tx_unlock_bh(dev);
-		for (i = 0; i < priv->mc_addrs_cnt; i++) {
-			mcast_addr =
-			      mlx4_en_mac_to_u64(priv->mc_addrs + i * ETH_ALEN);
-			memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
-			mc_list[5] = priv->port;
-			mlx4_multicast_attach(mdev->dev, &priv->rss_map.indir_qp,
-					      mc_list, 0, MLX4_PROT_ETH);
+		list_for_each_entry(mclist, &priv->mc_list, list) {
+			mcast_addr = mlx4_en_mac_to_u64(mclist->addr);
 			mlx4_SET_MCAST_FLTR(mdev->dev, priv->port,
 					    mcast_addr, 0, MLX4_MCAST_CONFIG);
 		}
@@ -381,6 +417,38 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 					  0, MLX4_MCAST_ENABLE);
 		if (err)
 			en_err(priv, "Failed enabling multicast filter\n");
+
+		update_mclist_flags(priv, &priv->curr_list, &priv->mc_list);
+		list_for_each_entry_safe(mclist, tmp, &priv->curr_list, list) {
+			if (mclist->action == MCLIST_REM) {
+				/* detach this address and delete from list */
+				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
+				mc_list[5] = priv->port;
+				err = mlx4_multicast_detach(mdev->dev,
+							    &priv->rss_map.indir_qp,
+							    mc_list,
+							    MLX4_PROT_ETH);
+				if (err)
+					en_err(priv, "Fail to detach multicast address\n");
+
+				/* remove from list */
+				list_del(&mclist->list);
+				kfree(mclist);
+			}
+
+			if (mclist->action == MCLIST_ADD) {
+				/* attach the address */
+				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
+				mc_list[5] = priv->port;
+				err = mlx4_multicast_attach(mdev->dev,
+							    &priv->rss_map.indir_qp,
+							    mc_list, 0,
+							    MLX4_PROT_ETH);
+				if (err)
+					en_err(priv, "Fail to attach multicast address\n");
+
+			}
+		}
 	}
 out:
 	mutex_unlock(&mdev->state_lock);
@@ -605,6 +673,9 @@ int mlx4_en_start_port(struct net_device *dev)
 		return 0;
 	}
 
+	INIT_LIST_HEAD(&priv->mc_list);
+	INIT_LIST_HEAD(&priv->curr_list);
+
 	/* Calculate Rx buf size */
 	dev->mtu = min(dev->mtu, priv->max_mtu);
 	mlx4_en_calc_rx_buf(dev);
@@ -760,6 +831,7 @@ void mlx4_en_stop_port(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
+	struct mlx4_en_mc_list *mclist, *tmp;
 	int i;
 	u8 mc_list[16] = {0};
 
@@ -781,13 +853,18 @@ void mlx4_en_stop_port(struct net_device *dev)
 	mc_list[5] = priv->port;
 	mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp, mc_list,
 			      MLX4_PROT_ETH);
-	for (i = 0; i < priv->mc_addrs_cnt; i++) {
-		memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
+	list_for_each_entry(mclist, &priv->curr_list, list) {
+		memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
 		mc_list[5] = priv->port;
 		mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp,
 				      mc_list, MLX4_PROT_ETH);
 	}
 	mlx4_en_clear_list(dev);
+	list_for_each_entry_safe(mclist, tmp, &priv->curr_list, list) {
+		list_del(&mclist->list);
+		kfree(mclist);
+	}
+
 	/* Flush multicast filter */
 	mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, 0, 1, MLX4_MCAST_CONFIG);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 225c20d..1bb00cd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -404,6 +404,18 @@ struct mlx4_en_perf_stats {
 #define NUM_PERF_COUNTERS		6
 };
 
+enum mlx4_en_mclist_act {
+	MCLIST_NONE,
+	MCLIST_REM,
+	MCLIST_ADD,
+};
+
+struct mlx4_en_mc_list {
+	struct list_head	list;
+	enum mlx4_en_mclist_act	action;
+	u8			addr[ETH_ALEN];
+};
+
 struct mlx4_en_frag_info {
 	u16 frag_size;
 	u16 frag_prefix_size;
@@ -489,8 +501,8 @@ struct mlx4_en_priv {
 	struct mlx4_en_pkt_stats pkstats;
 	struct mlx4_en_port_stats port_stats;
 	u64 stats_bitmap;
-	char *mc_addrs;
-	int mc_addrs_cnt;
+	struct list_head mc_list;
+	struct list_head curr_list;
 	struct mlx4_en_stat_out_mbox hw_stats;
 	int vids[128];
 	bool wol;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next V1 02/10] net/mlx4_core: Change resource tracking ID to be 64 bit
From: Or Gerlitz @ 2012-07-05 14:03 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, amirv, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <1341497030-1818-1-git-send-email-ogerlitz@mellanox.com>

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.

As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    2 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   26 ++++++++++----------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 1a2f372..a425a98 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1033,7 +1033,7 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port);
 /* resource tracker functions*/
 int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 				    enum mlx4_resource resource_type,
-				    int resource_id, int *slave);
+				    u64 resource_id, int *slave);
 void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave_id);
 int mlx4_init_resource_tracker(struct mlx4_dev *dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 80c03c8..6bdac29 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -58,7 +58,7 @@ struct mac_res {
 struct res_common {
 	struct list_head	list;
 	struct rb_node		node;
-	u32		        res_id;
+	u64		        res_id;
 	int			owner;
 	int			state;
 	int			from_state;
@@ -324,7 +324,7 @@ static void *find_res(struct mlx4_dev *dev, int res_id,
 				  res_id);
 }
 
-static int get_res(struct mlx4_dev *dev, int slave, int res_id,
+static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
 		   enum mlx4_resource type,
 		   void *res)
 {
@@ -350,7 +350,7 @@ static int get_res(struct mlx4_dev *dev, int slave, int res_id,
 
 	r->from_state = r->state;
 	r->state = RES_ANY_BUSY;
-	mlx4_dbg(dev, "res %s id 0x%x to busy\n",
+	mlx4_dbg(dev, "res %s id 0x%llx to busy\n",
 		 ResourceType(type), r->res_id);
 
 	if (res)
@@ -363,7 +363,7 @@ exit:
 
 int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 				    enum mlx4_resource type,
-				    int res_id, int *slave)
+				    u64 res_id, int *slave)
 {
 
 	struct res_common *r;
@@ -384,7 +384,7 @@ int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 	return err;
 }
 
-static void put_res(struct mlx4_dev *dev, int slave, int res_id,
+static void put_res(struct mlx4_dev *dev, int slave, u64 res_id,
 		    enum mlx4_resource type)
 {
 	struct res_common *r;
@@ -516,7 +516,7 @@ static struct res_common *alloc_xrcdn_tr(int id)
 	return &ret->com;
 }
 
-static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
+static struct res_common *alloc_tr(u64 id, enum mlx4_resource type, int slave,
 				   int extra)
 {
 	struct res_common *ret;
@@ -558,7 +558,7 @@ static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
 	return ret;
 }
 
-static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
+static int add_res_range(struct mlx4_dev *dev, int slave, u64 base, int count,
 			 enum mlx4_resource type, int extra)
 {
 	int i;
@@ -727,10 +727,10 @@ static int remove_ok(struct res_common *res, enum mlx4_resource type, int extra)
 	}
 }
 
-static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
+static int rem_res_range(struct mlx4_dev *dev, int slave, u64 base, int count,
 			 enum mlx4_resource type, int extra)
 {
-	int i;
+	u64 i;
 	int err;
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
@@ -784,7 +784,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 	else {
 		switch (state) {
 		case RES_QP_BUSY:
-			mlx4_dbg(dev, "%s: failed RES_QP, 0x%x\n",
+			mlx4_dbg(dev, "%s: failed RES_QP, 0x%llx\n",
 				 __func__, r->com.res_id);
 			err = -EBUSY;
 			break;
@@ -793,7 +793,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 			if (r->com.state == RES_QP_MAPPED && !alloc)
 				break;
 
-			mlx4_dbg(dev, "failed RES_QP, 0x%x\n", r->com.res_id);
+			mlx4_dbg(dev, "failed RES_QP, 0x%llx\n", r->com.res_id);
 			err = -EINVAL;
 			break;
 
@@ -802,7 +802,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 			    r->com.state == RES_QP_HW)
 				break;
 			else {
-				mlx4_dbg(dev, "failed RES_QP, 0x%x\n",
+				mlx4_dbg(dev, "failed RES_QP, 0x%llx\n",
 					  r->com.res_id);
 				err = -EINVAL;
 			}
@@ -2794,7 +2794,7 @@ static int _move_all_busy(struct mlx4_dev *dev, int slave,
 				if (r->state == RES_ANY_BUSY) {
 					if (print)
 						mlx4_dbg(dev,
-							 "%s id 0x%x is busy\n",
+							 "%s id 0x%llx is busy\n",
 							  ResourceType(type),
 							  r->res_id);
 					++busy;
-- 
1.7.1

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-05 14:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVPksMd8CGZWEGBXBXB5NZCg3QOZreOLavkV-Z2pjE-M_A@mail.gmail.com>

On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:

> Basically, copy path will be bypassed since hard_header_len
> has included the 'overhead' already.

Unfortunately this is not done correctly.

needed_headroom should be set to SMSC95XX_TX_OVERHEAD_CSUM

^ permalink raw reply

* [PATCH net-next] asix: avoid copies in tx path
From: Eric Dumazet @ 2012-07-05 14:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Greg Kroah-Hartman, Allan Chou, Trond Wuellner,
	Grant Grundler, Ming Lei

From: Eric Dumazet <edumazet@google.com>

I noticed excess calls to skb_copy_expand() or memmove() in asix driver.

This driver needs to push 4 bytes in front of frame (packet_len)
and maybe add 4 bytes after the end (if padlen is 4)

So it should set needed_headroom & needed_tailroom to avoid
copies. But its not enough, because many packets are cloned
before entering asix_tx_fixup() and this driver use skb_cloned()
as a lazy way to check if it can push and put additional bytes in frame.

Avoid skb_copy_expand() expensive call, using following rules :

- We are allowed to push 4 bytes in headroom if skb_header_cloned()
  is false (and if we have 4 bytes of headroom)

- We are allowed to put 4 bytes at tail if skb_cloned()
  is false (and if we have 4 bytes of tailroom)

TCP packets for example are cloned, but skb_header_release()
was called in tcp stack, allowing us to use headroom for our needs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Allan Chou <allan@asix.com.tw>
Cc: Trond Wuellner <trond@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Paul Stewart <pstew@chromium.org>
Cc: Ming Lei <tom.leiming@gmail.com>
---
 drivers/net/usb/asix.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 3ae80ec..6564c32 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -358,14 +358,30 @@ static struct sk_buff *asix_tx_fixup(struct usbnet *dev, struct sk_buff *skb,
 
 	padlen = ((skb->len + 4) & (dev->maxpacket - 1)) ? 0 : 4;
 
-	if ((!skb_cloned(skb)) &&
-	    ((headroom + tailroom) >= (4 + padlen))) {
-		if ((headroom < 4) || (tailroom < padlen)) {
+	/* We need to push 4 bytes in front of frame (packet_len)
+	 * and maybe add 4 bytes after the end (if padlen is 4)
+	 *
+	 * Avoid skb_copy_expand() expensive call, using following rules :
+	 * - We are allowed to push 4 bytes in headroom if skb_header_cloned()
+	 *   is false (and if we have 4 bytes of headroom)
+	 * - We are allowed to put 4 bytes at tail if skb_cloned()
+	 *   is false (and if we have 4 bytes of tailroom)
+	 *
+	 * TCP packets for example are cloned, but skb_header_release()
+	 * was called in tcp stack, allowing us to use headroom for our needs.
+	 */
+	if (!skb_header_cloned(skb) &&
+	    !(padlen && skb_cloned(skb)) &&
+	    headroom + tailroom >= 4 + padlen) {
+		/* following should not happen, but better be safe */
+		if (headroom < 4 ||
+		    tailroom < padlen) {
 			skb->data = memmove(skb->head + 4, skb->data, skb->len);
 			skb_set_tail_pointer(skb, skb->len);
 		}
 	} else {
 		struct sk_buff *skb2;
+
 		skb2 = skb_copy_expand(skb, 4, padlen, flags);
 		dev_kfree_skb_any(skb);
 		skb = skb2;
@@ -373,8 +389,8 @@ static struct sk_buff *asix_tx_fixup(struct usbnet *dev, struct sk_buff *skb,
 			return NULL;
 	}
 
+	packet_len = ((skb->len ^ 0x0000ffff) << 16) + skb->len;
 	skb_push(skb, 4);
-	packet_len = (((skb->len - 4) ^ 0x0000ffff) << 16) + (skb->len - 4);
 	cpu_to_le32s(&packet_len);
 	skb_copy_to_linear_data(skb, &packet_len, sizeof(packet_len));
 
@@ -880,6 +896,8 @@ static int ax88172_bind(struct usbnet *dev, struct usb_interface *intf)
 
 	dev->net->netdev_ops = &ax88172_netdev_ops;
 	dev->net->ethtool_ops = &ax88172_ethtool_ops;
+	dev->net->needed_headroom = 4; /* cf asix_tx_fixup() */
+	dev->net->needed_tailroom = 4; /* cf asix_tx_fixup() */
 
 	asix_mdio_write(dev->net, dev->mii.phy_id, MII_BMCR, BMCR_RESET);
 	asix_mdio_write(dev->net, dev->mii.phy_id, MII_ADVERTISE,
@@ -1075,6 +1093,8 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 
 	dev->net->netdev_ops = &ax88772_netdev_ops;
 	dev->net->ethtool_ops = &ax88772_ethtool_ops;
+	dev->net->needed_headroom = 4; /* cf asix_tx_fixup() */
+	dev->net->needed_tailroom = 4; /* cf asix_tx_fixup() */
 
 	embd_phy = ((dev->mii.phy_id & 0x1f) == 0x10 ? 1 : 0);
 

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-05 14:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVPksMd8CGZWEGBXBXB5NZCg3QOZreOLavkV-Z2pjE-M_A@mail.gmail.com>

On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:

> At default SMSC95xx turbo mode is true, rx buffer will be very big
> (17.5K). Or the large rx buffer size puts limit on concurrent URBs/SKBs
> count.  Both two may cause the problem.

I see. So we should try to recycle these large rx buffers in usbnet
instead of allocating/freeing them for each incoming packet.

Following patch does the copybreak of all incoming frames.

It has nice property of not lying anymore on skb truesize ;)

It should be applied on both sender and receiver

 drivers/net/usb/smsc95xx.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index b1112e7..3d9566f 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1080,30 +1080,17 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 				return 0;
 			}
 
-			/* last frame in this batch */
-			if (skb->len == size) {
-				if (dev->net->features & NETIF_F_RXCSUM)
-					smsc95xx_rx_csum_offload(skb);
-				skb_trim(skb, skb->len - 4); /* remove fcs */
-				skb->truesize = size + sizeof(struct sk_buff);
-
-				return 1;
-			}
-
-			ax_skb = skb_clone(skb, GFP_ATOMIC);
+			ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
 			if (unlikely(!ax_skb)) {
 				netdev_warn(dev->net, "Error allocating skb\n");
 				return 0;
 			}
 
-			ax_skb->len = size;
-			ax_skb->data = packet;
-			skb_set_tail_pointer(ax_skb, size);
+			memcpy(skb_put(ax_skb, size), packet, size);
 
 			if (dev->net->features & NETIF_F_RXCSUM)
 				smsc95xx_rx_csum_offload(ax_skb);
-			skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
-			ax_skb->truesize = size + sizeof(struct sk_buff);
+			__skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
 
 			usbnet_skb_return(dev, ax_skb);
 		}

^ permalink raw reply related

* Re: [PATCH net-next] ipv6: Initialize the neighbour pointer of rt6_info on allocation
From: Eric Dumazet @ 2012-07-05 15:16 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20120705131828.GE1869@secunet.com>

On Thu, 2012-07-05 at 15:18 +0200, Steffen Klassert wrote:
> git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
> added a neighbour pointer to rt6_info. Currently we don't initialize
> this pointer at allocation time. We assume this pointer to be valid
> if it is not a null pointer, so initialize it on allocation.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv6/route.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index ceff71d..6cc6c88 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -273,7 +273,7 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
>  					0, 0, flags);
>  
>  	if (rt) {
> -		memset(&rt->rt6i_table, 0,
> +		memset(&rt->n, 0,
>  		       sizeof(*rt) - sizeof(struct dst_entry));
>  		rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
>  	}

Hmm, could we find a way to avoid this for future changes ?

We know dst_entry is the first field, so maybe :

if (rt) {
	struct dst_entry *dst = (struct dst_entry *)rt;

	memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));

^ permalink raw reply

* Re: [PATCH] NFC: Prevent NULL deref when getting socket name
From: Samuel Ortiz @ 2012-07-05 15:42 UTC (permalink / raw)
  To: John W. Linville
  Cc: Sasha Levin, lauro.venancio, aloisio.almeida, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20120702182438.GB2010@tuxdriver.com>

Hi John,

On Mon, Jul 02, 2012 at 02:24:38PM -0400, John W. Linville wrote:
> On Sat, Jun 30, 2012 at 11:56:47AM +0200, Sasha Levin wrote:
> > llcp_sock_getname can be called without a device attached to the nfc_llcp_sock.
> > 
> > This would lead to the following BUG:
> > 
> > [  362.341807] BUG: unable to handle kernel NULL pointer dereference at           (null)
> > [  362.341815] IP: [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
> > [  362.341818] PGD 31b35067 PUD 30631067 PMD 0
> > [  362.341821] Oops: 0000 [#627] PREEMPT SMP DEBUG_PAGEALLOC
> > [  362.341826] CPU 3
> > [  362.341827] Pid: 7816, comm: trinity-child55 Tainted: G      D W    3.5.0-rc4-next-20120628-sasha-00005-g9f23eb7 #479
> > [  362.341831] RIP: 0010:[<ffffffff836258e5>]  [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
> > [  362.341832] RSP: 0018:ffff8800304fde88  EFLAGS: 00010286
> > [  362.341834] RAX: 0000000000000000 RBX: ffff880033cb8000 RCX: 0000000000000001
> > [  362.341835] RDX: ffff8800304fdec4 RSI: ffff8800304fdec8 RDI: ffff8800304fdeda
> > [  362.341836] RBP: ffff8800304fdea8 R08: 7ebcebcb772b7ffb R09: 5fbfcb9c35bdfd53
> > [  362.341838] R10: 4220020c54326244 R11: 0000000000000246 R12: ffff8800304fdec8
> > [  362.341839] R13: ffff8800304fdec4 R14: ffff8800304fdec8 R15: 0000000000000044
> > [  362.341841] FS:  00007effa376e700(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
> > [  362.341843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  362.341844] CR2: 0000000000000000 CR3: 0000000030438000 CR4: 00000000000406e0
> > [  362.341851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  362.341856] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  362.341858] Process trinity-child55 (pid: 7816, threadinfo ffff8800304fc000, task ffff880031270000)
> > [  362.341858] Stack:
> > [  362.341862]  ffff8800304fdea8 ffff880035156780 0000000000000000 0000000000001000
> > [  362.341865]  ffff8800304fdf78 ffffffff83183b40 00000000304fdec8 0000006000000000
> > [  362.341868]  ffff8800304f0027 ffffffff83729649 ffff8800304fdee8 ffff8800304fdf48
> > [  362.341869] Call Trace:
> > [  362.341874]  [<ffffffff83183b40>] sys_getpeername+0xa0/0x110
> > [  362.341877]  [<ffffffff83729649>] ? _raw_spin_unlock_irq+0x59/0x80
> > [  362.341882]  [<ffffffff810f342b>] ? do_setitimer+0x23b/0x290
> > [  362.341886]  [<ffffffff81985ede>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [  362.341889]  [<ffffffff8372a539>] system_call_fastpath+0x16/0x1b
> > [  362.341921] Code: 84 00 00 00 00 00 b8 b3 ff ff ff 48 85 db 74 54 66 41 c7 04 24 27 00 49 8d 7c 24 12 41 c7 45 00 60 00 00 00 48 8b 83 28 05 00 00 <8b> 00 41 89 44 24 04 0f b6 83 41 05 00 00 41 88 44 24 10 0f b6
> > [  362.341924] RIP  [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
> > [  362.341925]  RSP <ffff8800304fde88>
> > [  362.341926] CR2: 0000000000000000
> > [  362.341928] ---[ end trace 6d450e935ee18bf3 ]---
> > 
> > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> 
> Samuel, I'm taking this one directly.
Thanks. It was already on my for-wireless branch.

Cheers,
Samuel.

-- 
Intel Open Source Technology Centre
http://oss.intel.com/

^ permalink raw reply

* Re: [PATCH] NFC: Don't warn when closing uninitialized socket
From: Samuel Ortiz @ 2012-07-05 15:45 UTC (permalink / raw)
  To: Sasha Levin
  Cc: lauro.venancio, aloisio.almeida, linville, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1341228702-9924-1-git-send-email-levinsasha928@gmail.com>

Hi Sasha,

On Mon, Jul 02, 2012 at 01:31:42PM +0200, Sasha Levin wrote:
> It is possible to close a NFC socket without initializing it first,
> there's no need to warn about it.
> 
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
>  net/nfc/llcp/llcp.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
I already fixed this one, with your reported-by.

Cheers,
Samuel.

-- 
Intel Open Source Technology Centre
http://oss.intel.com/

^ permalink raw reply

* Re: AF_BUS socket address family
From: Daniel Walker @ 2012-07-05 16:01 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller,
	Arve Hjønnevåg, John Stultz, Anton Vorontsov,
	Greg Kroah-Hartman
In-Reply-To: <CAKnu2Mr_TWvBXmL+ORfddpEgPdPxPXFpJdCtvPe+imj6o6Y+1w@mail.gmail.com>

On Thu, Jul 05, 2012 at 09:59:53AM +0200, Linus Walleij wrote:
> 2012/6/29 Vincent Sanders <vincent.sanders@collabora.co.uk>:
> 
> > AF_BUS is a message oriented inter process communication system.
> 
> We have a very huge and important in-kernel IPC message passer
> in drivers/staging/android/binder.c
> 
> It's deployed in some 400 million devices according to latest reports.
> John Stultz & Anton Vorontsov are trying to look after these Android
> drivers a bit...
> 
> I and others discussed this in the past with the Android folks. Dianne
> makes an excellent summary of how it works here:
> https://lkml.org/lkml/2009/6/25/3
> 
> If we could all be convinced that this thing also fulfills the needs
> of what binder does, this is a pretty solid case for it too. I can
> sure see that some of the shortcuts that Android is taking with
> binder try to address the same issue of high-speed IPC loopholes
> through the kernel and some kind of security model.
> 
> Whether Android would actually use it (or wrap it) is a totally
> different question, but what I think we need to know is whether it
> *could*. And staging code has to move forward, maybe this
> is the direction it should move?

I'm all for alternatives.. I haven't read this thread at all, but I read an LWN
article comparing Binder and other implementations .. So there are for sure
alternatives. It would be nice if things were included that did whatever binder
needs .. Since I did the logger performance analysis the big questions to me is
if Binder is actually fast (or faster than the alternatives). Whatever this
AF_BUS things is reviewing the performance of the major alternative(s) is probably a
good idea.

In terms of Android using anything we produce or incorporate, don't get
your hopes up .. They will always just use Binder .. (John is good cop, I'm bad
cop)

Daniel

^ permalink raw reply

* RE: [PATCH net 3/7] qlge: Garbage values shown in extra info during selftest.
From: Jitendra Kalsaria @ 2012-07-05 17:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ron Mercer, Dept-NX Linux NIC Driver
In-Reply-To: <20120705.002341.316014337743384600.davem@davemloft.net>



-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net] 
>Sent: Thursday, July 05, 2012 12:24 AM
>To: Jitendra Kalsaria
>Cc: netdev; Ron Mercer; Dept-NX Linux NIC Driver
>Subject: Re: [PATCH net 3/7] qlge: Garbage values shown in extra info during selftest.
>
>
>Why are you posting an arbitrary patch from a patch series,
>yet not the rest of that series?
>
>This needs to be sent alongside the rest of the series.

I haven't sent any arbitrary patch, seems like something wrong with mail server.

Thanks for letting me know about this will get it fixed. 

^ permalink raw reply

* Re: [net-next RFC V5 0/5] Multiqueue virtio-net
From: Rick Jones @ 2012-07-05 17:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>

On 07/05/2012 03:29 AM, Jason Wang wrote:

>
> Test result:
>
> 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning
>
> - Guest to External Host TCP STREAM
> sessions size throughput1 throughput2   norm1 norm2
> 1 64 650.55 655.61 100% 24.88 24.86 99%
> 2 64 1446.81 1309.44 90% 30.49 27.16 89%
> 4 64 1430.52 1305.59 91% 30.78 26.80 87%
> 8 64 1450.89 1270.82 87% 30.83 25.95 84%

Was the -D test-specific option used to set TCP_NODELAY?  I'm guessing 
from your description of how packet sizes were smaller with multiqueue 
and your need to hack tcp_write_xmit() it wasn't but since we don't have 
the specific netperf command lines (hint hint :) I wanted to make certain.

Instead of calling them throughput1 and throughput2, it might be more 
clear in future to identify them as singlequeue and multiqueue.

Also, how are you combining the concurrent netperf results?  Are you 
taking sums of what netperf reports, or are you gathering statistics 
outside of netperf?

> - TCP RR
> sessions size throughput1 throughput2   norm1 norm2
> 50 1 54695.41 84164.98 153% 1957.33 1901.31 97%

A single instance TCP_RR test would help confirm/refute any non-trivial 
change in (effective) path length between the two cases.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
From: Javier Martinez Canillas @ 2012-07-05 17:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Javier Martinez Canillas, Vincent Sanders, netdev, linux-kernel,
	David S. Miller, Alban Crequy
In-Reply-To: <20120704173047.GA8864@1984>

On Wed, Jul 4, 2012 at 7:30 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Jul 02, 2012 at 05:43:43PM +0200, Javier Martinez Canillas wrote:
>> On 06/29/2012 07:11 PM, Pablo Neira Ayuso wrote:
>> > On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
>> >> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
>> >>
>> >> The netfilter D-Bus module needs to parse D-bus messages sent by
>> >> applications to decide whether a peer can receive or not a D-Bus
>> >> message. Add D-bus message parsing logic to be able to analyze.
>> >
>> > Not talking about the entire patchset, only about the part I'm
>> > responsible for.
>> >
>> > I don't see why you think this belong to netfilter at all.
>> >
>> > This doesn't integrate into the existing filtering infrastructure,
>> > neither it extends it in any way.
>> >
>>
>> Hello Pablo,
>>
>> Thanks a lot for your feedback.
>>
>> This is the first of a set of patches that adds a netfilter module to parse
>> D-Bus messages, the complete patch-set is:
>>
>> [PATCH 13/15] netfilter: nfdbus: Add D-bus message parsing
>> [PATCH 14/15] netfilter: nfdbus: Add D-bus match rule implementation
>> [PATCH 15/15] netfilter: add netfilter D-Bus module
>>
>> patches 13 and 14 just include D-Bus helper code to be used by the netfilter
>> module (added on patch 15) and specially the dbus_filter netfilter hook function.
>
> I see, the use of the netfilter hooks seems to be the only reason why
> you consider these chunks belong to netfilter.
>
>> For the next post version we will reorganize the patches so first the D-Bus
>> netfilter module is added with an empty dbus_filter function and then added the
>> D-Bus helper code.
>>
>> Also, we will move the nfdbus netfilter module to net/bus so is not inside the
>> netfilter core code.
>
> Yes, please, remove this stuff from my directory tree, I believe this
> filtering infrastructure has not much to do with Netfilter itself.
>
> It uses the connector to communicate kernel <-> userspace instead of
> nfnetlink and, as said, it does neither integrate into existing
> filtering kernel/userspace infrastructure nor integrates into it.
>
> So, please, if you plan to give another try to this patchset, move
> this to your net/bus directory as you propose and find a different
> (better) name for the filtering part (just to avoid confusion in the
> future).
>
> Thanks.

Hi Pablo,

Thanks a lot for your feedback and comments.

On a next patch-set post I'll move to net/bus and change the name as
you suggest.

Best regards,

-- 
Javier Martínez Canillas
(+34) 682 39 81 69
Barcelona, Spain

^ permalink raw reply

* Re: [PATCH 0/5] rtcache remove respin
From: Eric Dumazet @ 2012-07-05 19:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120705.031539.2275742387594459652.davem@davemloft.net>

On Thu, 2012-07-05 at 03:15 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 02 Jul 2012 12:44:01 +0200
> 
> > If we still want __refcnt being on cache line boundary, we might find a
> > better way to accomplish this.
> 
> Back to this issue again.
> 
> Eric, if you take a look at net-next right now, I left a dummy padding
> in dst_entry where the neighbour pointer used to be.
> 
> Can you come up with some way to make use of that new space?
> 

If route cache is removed, I believe we can remove all paddings.

Each tcp session will have its own dst_entry, instead of being shared.

^ permalink raw reply

* Re: [net-next RFC V5 4/5] virtio_net: multiqueue support
From: Amos Kong @ 2012-07-05 20:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-5-git-send-email-jasowang@redhat.com>

On 07/05/2012 06:29 PM, Jason Wang wrote:
> This patch converts virtio_net to a multi queue device. After negotiated
> VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs,
> and driver could read the number from config space.
> 
> The driver expects the number of rx/tx queue paris is equal to the number of
> vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some
> optimization were introduced:
> 
> - Txq selection is based on the processor id in order to avoid contending a lock
>   whose owner may exits to host.
> - Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns
>   the queue pairs.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---

...

>  
>  static int virtnet_probe(struct virtio_device *vdev)
>  {
> -	int err;
> +	int i, err;
>  	struct net_device *dev;
>  	struct virtnet_info *vi;
> +	u16 num_queues, num_queue_pairs;
> +
> +	/* Find if host supports multiqueue virtio_net device */
> +	err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
> +				offsetof(struct virtio_net_config,
> +				num_queues), &num_queues);
> +
> +	/* We need atleast 2 queue's */


s/atleast/at least/


> +	if (err || num_queues < 2)
> +		num_queues = 2;
> +	if (num_queues > MAX_QUEUES * 2)
> +		num_queues = MAX_QUEUES;

                num_queues = MAX_QUEUES * 2;

MAX_QUEUES is the limitation of RX or TX.

> +
> +	num_queue_pairs = num_queues / 2;

...

-- 
			Amos.

^ permalink raw reply

* Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Amos Kong @ 2012-07-05 20:07 UTC (permalink / raw)
  To: Sasha Levin
  Cc: krkumar2, habanero, kvm, mst, netdev, mashirle, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341492679.18786.18.camel@lappy>

On 07/05/2012 08:51 PM, Sasha Levin wrote:
> On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
>> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>>                 vi->has_cvq = true;
>>  


>> +       /* Use single tx/rx queue pair as default */
>> +       vi->num_queue_pairs = 1;
>> +       vi->total_queue_pairs = num_queue_pairs; 

vi->total_queue_pairs also should be set to 1

           vi->total_queue_pairs = 1;

> 
> The code is using this "default" even if the amount of queue pairs it
> wants was specified during initialization. This basically limits any
> device to use 1 pair when starting up.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
			Amos.

^ permalink raw reply

* Re: AF_BUS socket address family
From: Jan Engelhardt @ 2012-07-05 21:06 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20120629231236.GA28593@mail.collabora.co.uk>


On Saturday 2012-06-30 01:12, Vincent Sanders wrote:
>
>Firstly it is intended is an interprocess mechanism and not to rely on
>a configured IP system, indeed one of its primary usages is to
>provide mechanism for various tools to set up IP networking.

Using IP as a localhost IPC is not uncommon (independent of
software preferring AF_UNIX, if so available). Distro boot
scripts have been running `ip addr add ::1/128 dev lo`
all these years along.

And now we suddently need a DBUS program just to configure
IP-based localhost IPC? I can see the flaw in that.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: Initialize the neighbour pointer of rt6_info on allocation
From: David Miller @ 2012-07-05 21:21 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20120705131828.GE1869@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 5 Jul 2012 15:18:28 +0200

> git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
> added a neighbour pointer to rt6_info. Currently we don't initialize
> this pointer at allocation time. We assume this pointer to be valid
> if it is not a null pointer, so initialize it on allocation.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Applied, but as Eric said we need to find a way to avoid having to
make changes like this every time we simply want to add a struct
member to rt6_info.

^ permalink raw reply

* Re: [net-next:master] general protection fault in __nla_put()
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
  To: wfg; +Cc: netdev
In-Reply-To: <20120705134857.GA14643@localhost>


Steffen Klassert posted a patch which fixes this.

^ permalink raw reply

* Re: ipv6 problem with 6lowpan
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
  To: alex.bluesman.smirnov; +Cc: netdev
In-Reply-To: <CAJmB2rD8U1ihy4Ai6y5QGjj4f7txDabszesrNrQ=pgEbscePqQ@mail.gmail.com>


Should be fixed by Steffen Kassert's patch which I just pushed into net-next

^ permalink raw reply

* Re: [PATCH 0/5] rtcache remove respin
From: David Miller @ 2012-07-05 21:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1341515017.3265.6.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 21:03:37 +0200

> If route cache is removed, I believe we can remove all paddings.
> 
> Each tcp session will have its own dst_entry, instead of being shared.

Not really, the routing cache removal patches have poor performance
and won't go-in as-is. :-) Once PMTU/redirect/TCP-metrics are reworked
I plan to do things like the patch below to make the performance loss
more acceptable.

And then I'll do the same for input routes too, at which point your
'noref' case can be put back.

So really, we have to consider how to rework the layout of this
structure.

Thanks.

====================
ipv4: Cache output routes in fib_info nexthops.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h     |    3 +++
 net/ipv4/fib_semantics.c |    2 ++
 net/ipv4/route.c         |    9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 3dc7c96..ff9f0c4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -45,6 +45,7 @@ struct fib_config {
  };
 
 struct fib_info;
+struct rtable;
 
 struct fib_nh {
 	struct net_device	*nh_dev;
@@ -63,6 +64,8 @@ struct fib_nh {
 	__be32			nh_gw;
 	__be32			nh_saddr;
 	int			nh_saddr_genid;
+
+	struct rtable		*rth;
 };
 
 /*
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c46c20b..f3ada74 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -148,6 +148,8 @@ static void free_fib_info_rcu(struct rcu_head *head)
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
 			dev_put(nexthop_nh->nh_dev);
+		if (nexthop_nh->rth)
+			dst_release(&nexthop_nh->rth->dst);
 	} endfor_nexthops(fi);
 
 	release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f68f74..35bfd98 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -914,6 +914,8 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		dst->tclassid = FIB_RES_NH(*res).nh_tclassid;
 #endif
+		FIB_RES_NH(*res).rth = rt;
+		dst_clone(&rt->dst);
 	}
 
 	if (dst_mtu(dst) > IP_MAX_MTU)
@@ -1399,6 +1401,13 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 			fi = NULL;
 	}
 
+	if (fi) {
+		rth = FIB_RES_NH(*res).rth;
+		if (rth) {
+			dst_use(&rth->dst, jiffies);
+			return rth;
+		}
+	}
 	rth = rt_dst_alloc(dev_out,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(in_dev, NOXFRM));
-- 
1.7.10

^ permalink raw reply related

* [PATCH] gianfar: fix potential sk_wmem_alloc imbalance
From: Eric Dumazet @ 2012-07-05 21:45 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Manfred Rudigier, Claudiu Manoil, Jiajun Wu,
	Paul Gortmaker, Andy Fleming

From: Eric Dumazet <edumazet@google.com>

commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance

If the new skb has a different truesize than old one, we can get a
negative sk_wmem_alloc once new skb is orphaned at TX completion.

Now we no longer early orphan skbs in dev_hard_start_xmit(), this
probably can lead to fatal bugs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andy Fleming <afleming@freescale.com>
---

Note : I don't have the hardware and discovered this problem by code
analysis. So please compile and run this patch before Acking it,
thanks !

BTW, dev->needed_headroom should be set to GMAC_FCB_LEN + GMAC_TXPAL_LEN
to avoid reallocations...

 drivers/net/ethernet/freescale/gianfar.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index f2db8fc..ab1d80f 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2063,10 +2063,9 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			return NETDEV_TX_OK;
 		}
 
-		/* Steal sock reference for processing TX time stamps */
-		swap(skb_new->sk, skb->sk);
-		swap(skb_new->destructor, skb->destructor);
-		kfree_skb(skb);
+		if (skb->sk)
+			skb_set_owner_w(skb_new, skb->sk);
+		consume_skb(skb);
 		skb = skb_new;
 	}
 

^ permalink raw reply related

* Re: [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-07-05 21:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120629.153425.24594752441419170.davem@davemloft.net>

On Fri, 2012-06-29 at 15:34 -0700, David Miller wrote: 
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Fri, 29 Jun 2012 12:32:45 -0700
> 
> > commit 1f85d58cdf15354a7120fc9ccc9bb9c45b53af88
> >     cnic: Remove uio mem[0].
> > 
> > introduced a regression as older versions of userspace app still rely
> > on this mmap.  Restore the mmap functionality and get the base address
> > from pci_resource_start() as the nedev->base_addr has been deprecated for
> > PCI devices.
> > 
> > Update version to 2.5.12.
> > 
> > Signed-off-by: Michael Chan <mchan@broadocm.com>
> 
> I really couldn't believe what you guys were doing in the original
> commit, but I decided to let you do stupid things and find out the
> hard way that removing any user visible interface is basically
> impossible.
> 
> Applied, thanks.
> 

David, this patch plus the earlier commit are also needed for the net
tree because netdev->base_addr was removed there.  Can you apply these
directly to the net tree?  Or you want me to send you the equivalent
patches for net.  Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox