Linux RDMA and InfiniBand development


* [PATCH for-next V2 11/15] net/mlx5: Refactor find_flow_rule
From: Saeed Mahameed @ 2016-10-30 21:22 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mark Bloch, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mark Bloch <markb@mellanox.com>

The comparison between two destinations will be needed in other
places in the future, so factor the comparison logic out of
find_flow_rule() into a separate function, mlx5_flow_dests_cmp().
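
As a rough illustration only (not part of this patch), the factored-out
comparison can be modelled in standalone C. The names struct dest,
dests_equal and find_dest are hypothetical stand-ins for
mlx5_flow_destination, mlx5_flow_dests_cmp and find_flow_rule; the real
driver types differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's destination types. */
enum dest_type { DEST_VPORT, DEST_FLOW_TABLE, DEST_TIR };

struct dest {
	enum dest_type type;
	union {			/* which member is valid depends on type */
		uint16_t vport_num;
		const void *ft;
		uint32_t tir_num;
	};
};

/* Two destinations match when the types agree and the field
 * selected by that type agrees.
 */
static bool dests_equal(const struct dest *d1, const struct dest *d2)
{
	assert(d1 && d2);

	if (d1->type != d2->type)
		return false;

	switch (d1->type) {
	case DEST_VPORT:
		return d1->vport_num == d2->vport_num;
	case DEST_FLOW_TABLE:
		return d1->ft == d2->ft;
	case DEST_TIR:
		return d1->tir_num == d2->tir_num;
	}
	return false;
}

/* Linear search that reuses the predicate instead of open-coding it. */
static const struct dest *find_dest(const struct dest *dests, size_t n,
				    const struct dest *key)
{
	for (size_t i = 0; i < n; i++)
		if (dests_equal(&dests[i], key))
			return &dests[i];
	return NULL;
}
```

Keeping the predicate separate lets any later caller that needs to match
a destination reuse it rather than repeat the three-way condition.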

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 29 ++++++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e2bab9d..fca6937 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -153,6 +153,8 @@ static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
 static void del_fte(struct fs_node *node);
+static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1,
+				struct mlx5_flow_destination *d2);
 
 static void tree_init_node(struct fs_node *node,
 			   unsigned int refcount,
@@ -1064,21 +1066,30 @@ static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft,
 	return fg;
 }
 
+static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1,
+				struct mlx5_flow_destination *d2)
+{
+	if (d1->type == d2->type) {
+		if ((d1->type == MLX5_FLOW_DESTINATION_TYPE_VPORT &&
+		     d1->vport_num == d2->vport_num) ||
+		    (d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
+		     d1->ft == d2->ft) ||
+		    (d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR &&
+		     d1->tir_num == d2->tir_num))
+			return true;
+	}
+
+	return false;
+}
+
 static struct mlx5_flow_rule *find_flow_rule(struct fs_fte *fte,
 					     struct mlx5_flow_destination *dest)
 {
 	struct mlx5_flow_rule *rule;
 
 	list_for_each_entry(rule, &fte->node.children, node.list) {
-		if (rule->dest_attr.type == dest->type) {
-			if ((dest->type == MLX5_FLOW_DESTINATION_TYPE_VPORT &&
-			     dest->vport_num == rule->dest_attr.vport_num) ||
-			    (dest->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
-			     dest->ft == rule->dest_attr.ft) ||
-			    (dest->type == MLX5_FLOW_DESTINATION_TYPE_TIR &&
-			     dest->tir_num == rule->dest_attr.tir_num))
-				return rule;
-		}
+		if (mlx5_flow_dests_cmp(&rule->dest_attr, dest))
+			return rule;
 	}
 	return NULL;
 }
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next V2 10/15] net/mlx5: Use fte status to decide on firmware command
From: Saeed Mahameed @ 2016-10-30 21:22 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mark Bloch, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mark Bloch <markb@mellanox.com>

An fte status becomes FS_FTE_STATUS_EXISTING only after it was
created in HW. We can use this in order to simplify the logic on
what firmware command to use. If the status isn't FS_FTE_STATUS_EXISTING
we need to create the fte, otherwise we need only to update it.
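
A minimal sketch of that decision, assuming a status word with an
"existing" bit. The names below (FTE_STATUS_EXISTING, fte_pick_cmd,
fte_commit) are hypothetical and only model the idea; they are not the
driver's definitions, and the firmware call is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bit and command names, modelled on the text above. */
enum { FTE_STATUS_EXISTING = 1 << 0 };
enum fw_cmd { FW_CMD_CREATE_FTE, FW_CMD_UPDATE_FTE };

struct fte {
	unsigned int status;
};

/* The status alone decides which command is needed. */
static enum fw_cmd fte_pick_cmd(const struct fte *fte)
{
	return (fte->status & FTE_STATUS_EXISTING) ?
		FW_CMD_UPDATE_FTE : FW_CMD_CREATE_FTE;
}

/* Issue the (simulated) command; mark the entry as existing only
 * after the command succeeded.
 */
static int fte_commit(struct fte *fte, bool fw_ok, enum fw_cmd *issued)
{
	assert(fte && issued);

	*issued = fte_pick_cmd(fte);
	if (!fw_ok)
		return -1;

	fte->status |= FTE_STATUS_EXISTING;
	return 0;
}
```

In this model a failed create leaves the bit clear, so the next attempt
is again a create rather than an update of an entry that was never
written.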

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index a07ff30..e2bab9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -946,7 +946,7 @@ static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 			BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_DESTINATION_LIST);
 	}
 
-	if (fte->dests_size == 1 || !dest)
+	if (!(fte->status & FS_FTE_STATUS_EXISTING))
 		err = mlx5_cmd_create_fte(get_dev(&ft->node),
 					  ft, fg->id, fte);
 	else
-- 
2.7.4


* [PATCH for-next V2 09/15] net/mlx5: Don't unlock fte while still using it
From: Saeed Mahameed @ 2016-10-30 21:22 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mark Bloch, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mark Bloch <markb@mellanox.com>

When adding a new rule to an fte, we need to hold the fte lock
until we add that rule to the fte and increase the fte ref count.
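
The locking shape can be sketched in standalone C as follows. This is a
toy model under stated assumptions: toy_lock is a hypothetical
single-threaded stand-in that only asserts correct pairing, and struct
entry, build_rule and entry_add_rule are illustrative names, not driver
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: asserts that acquire and release are correctly paired. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct entry {
	struct toy_lock lock;
	int refcount;
	int nrules;
};

/* Stand-in for building a rule; fails on request. */
static int build_rule(bool fail)
{
	return fail ? -1 : 0;
}

static int entry_add_rule(struct entry *e, bool fail)
{
	int err;

	toy_lock_acquire(&e->lock);
	err = build_rule(fail);
	if (err)
		goto unlock;	/* error path releases the same lock */

	/* Still locked: linking the rule and taking the reference
	 * happen before anyone else can observe the entry.
	 */
	assert(e->lock.held);
	e->nrules++;
	e->refcount++;
unlock:
	toy_lock_release(&e->lock);
	return err;
}
```

A single unlock label shared by the success and error paths keeps the
release in one place, which is the same shape the patch gives
add_rule_fg() with its unlock_fte label.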

Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 5da2cc8..a07ff30 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1107,9 +1107,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 				return rule;
 			}
 			rule = add_rule_fte(fte, fg, dest);
-			unlock_ref_node(&fte->node);
 			if (IS_ERR(rule))
-				goto unlock_fg;
+				goto unlock_fte;
 			else
 				goto add_rule;
 		}
@@ -1127,6 +1126,7 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 		goto unlock_fg;
 	}
 	tree_init_node(&fte->node, 0, del_fte);
+	nested_lock_ref_node(&fte->node, FS_MUTEX_CHILD);
 	rule = add_rule_fte(fte, fg, dest);
 	if (IS_ERR(rule)) {
 		kfree(fte);
@@ -1139,6 +1139,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 	list_add(&fte->node.list, prev);
 add_rule:
 	tree_add_node(&rule->node, &fte->node);
+unlock_fte:
+	unlock_ref_node(&fte->node);
 unlock_fg:
 	unlock_ref_node(&fg->node);
 	return rule;
-- 
2.7.4


* [PATCH for-next V2 08/15] net/mlx5: Add SRIOV VF max rate configuration support
From: Saeed Mahameed @ 2016-10-30 21:22 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mohamad Haj Yahia, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mohamad Haj Yahia <mohamad@mellanox.com>

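Implement the ndo_set_vf_rate callback by modifying the max rate of
the vport's TSAR element. Only a maximum TX rate is supported; a
non-zero minimum TX rate is rejected with -EOPNOTSUPP.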
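
The checks on this path can be sketched in standalone C. Everything
here is a simplified model: NUM_VPORTS, struct vport, hw_set_max_rate
and set_vf_rate are hypothetical names, the firmware command is
simulated by a flag, and the negative-rate check is an addition of this
sketch rather than something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VPORTS 4	/* hypothetical: vport 0 is the PF, 1..3 are VFs */

struct vport {
	bool qos_enabled;	/* attached to the rate limiter */
	uint32_t max_rate;	/* last rate accepted by the "hardware" */
};

/* Stand-in for the firmware command; fails when asked to. */
static int hw_set_max_rate(struct vport *vp, uint32_t rate, bool hw_ok)
{
	(void)vp;
	(void)rate;
	return hw_ok ? 0 : -EIO;
}

static int set_vf_rate(struct vport *vports, int vf, int min_tx_rate,
		       int max_tx_rate, bool hw_ok)
{
	int vport = vf + 1;	/* VF n maps to vport n + 1 */
	int err;

	assert(vports);
	if (min_tx_rate)
		return -EOPNOTSUPP;	/* only a maximum rate is supported */
	if (vport < 1 || vport >= NUM_VPORTS || max_tx_rate < 0)
		return -EINVAL;
	if (!vports[vport].qos_enabled)
		return -EIO;

	err = hw_set_max_rate(&vports[vport], (uint32_t)max_tx_rate, hw_ok);
	if (!err)	/* cache the value only once the hardware took it */
		vports[vport].max_rate = (uint32_t)max_tx_rate;
	return err;
}
```

From user space this callback is normally reached through iproute2, for
example "ip link set dev <pf> vf 0 max_tx_rate 1000", where <pf> is a
placeholder for the PF netdev name and the rate is given in Mbps.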

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 63 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  2 +
 3 files changed, 80 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7eaf380..7f763d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2945,6 +2945,20 @@ static int mlx5e_set_vf_trust(struct net_device *dev, int vf, bool setting)
 
 	return mlx5_eswitch_set_vport_trust(mdev->priv.eswitch, vf + 1, setting);
 }
+
+static int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate,
+			     int max_tx_rate)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+
+	if (min_tx_rate)
+		return -EOPNOTSUPP;
+
+	return mlx5_eswitch_set_vport_rate(mdev->priv.eswitch, vf + 1,
+					   max_tx_rate);
+}
+
 static int mlx5_vport_link2ifla(u8 esw_link)
 {
 	switch (esw_link) {
@@ -3252,6 +3266,7 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 	.ndo_set_vf_vlan         = mlx5e_set_vf_vlan,
 	.ndo_set_vf_spoofchk     = mlx5e_set_vf_spoofchk,
 	.ndo_set_vf_trust        = mlx5e_set_vf_trust,
+	.ndo_set_vf_rate         = mlx5e_set_vf_rate,
 	.ndo_get_vf_config       = mlx5e_get_vf_config,
 	.ndo_set_vf_link_state   = mlx5e_set_vf_link_state,
 	.ndo_get_vf_stats        = mlx5e_get_vf_stats,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 2e11a94..9ef01d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1451,6 +1451,47 @@ static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num)
 	vport->qos.enabled = false;
 }
 
+static int esw_vport_qos_config(struct mlx5_eswitch *esw, int vport_num,
+				u32 max_rate)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0};
+	struct mlx5_vport *vport = &esw->vports[vport_num];
+	struct mlx5_core_dev *dev = esw->dev;
+	void *vport_elem;
+	u32 bitmask = 0;
+	int err = 0;
+
+	if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling))
+		return -EOPNOTSUPP;
+
+	if (!vport->qos.enabled)
+		return -EIO;
+
+	MLX5_SET(scheduling_context, &sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
+	vport_elem = MLX5_ADDR_OF(scheduling_context, &sched_ctx,
+				  element_attributes);
+	MLX5_SET(vport_element, vport_elem, vport_number, vport_num);
+	MLX5_SET(scheduling_context, &sched_ctx, parent_element_id,
+		 esw->qos.root_tsar_id);
+	MLX5_SET(scheduling_context, &sched_ctx, max_average_bw,
+		 max_rate);
+	bitmask |= MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_MAX_AVERAGE_BW;
+
+	err = mlx5_modify_scheduling_element_cmd(dev,
+						 SCHEDULING_HIERARCHY_E_SWITCH,
+						 &sched_ctx,
+						 vport->qos.esw_tsar_ix,
+						 bitmask);
+	if (err) {
+		esw_warn(esw->dev, "E-Switch modify TSAR vport element failed (vport=%d,err=%d)\n",
+			 vport_num, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN])
 {
 	((u8 *)node_guid)[7] = mac[0];
@@ -1888,6 +1929,7 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 	ivi->qos = evport->info.qos;
 	ivi->spoofchk = evport->info.spoofchk;
 	ivi->trusted = evport->info.trusted;
+	ivi->max_tx_rate = evport->info.max_rate;
 	mutex_unlock(&esw->state_lock);
 
 	return 0;
@@ -1981,6 +2023,27 @@ int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw,
 	return 0;
 }
 
+int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw,
+				int vport, u32 max_rate)
+{
+	struct mlx5_vport *evport;
+	int err = 0;
+
+	if (!ESW_ALLOWED(esw))
+		return -EPERM;
+	if (!LEGAL_VPORT(esw, vport))
+		return -EINVAL;
+
+	mutex_lock(&esw->state_lock);
+	evport = &esw->vports[vport];
+	err = esw_vport_qos_config(esw, vport, max_rate);
+	if (!err)
+		evport->info.max_rate = max_rate;
+
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
+
 int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 				 int vport,
 				 struct ifla_vf_stats *vf_stats)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index fb8de34..ddae90c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -246,6 +246,8 @@ int mlx5_eswitch_set_vport_spoofchk(struct mlx5_eswitch *esw,
 				    int vport, bool spoofchk);
 int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw,
 				 int vport_num, bool setting);
+int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw,
+				int vport, u32 max_rate);
 int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 				  int vport, struct ifla_vf_info *ivi);
 int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
-- 
2.7.4


* [PATCH for-next V2 07/15] net/mlx5: Introduce E-switch QoS management
From: Saeed Mahameed @ 2016-10-30 21:22 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mohamad Haj Yahia, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mohamad Haj Yahia <mohamad@mellanox.com>

Add a TSAR to the eswitch to act as the vports' rate limiter.
Create/Destroy the TSAR on Enable/Disable SRIOV.
Attach/Detach a vport to/from the eswitch TSAR on Enable/Disable vport.
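
The lifecycle described above can be modelled in standalone C. This is
a sketch under assumptions: struct esw_qos, the esw_qos_* helpers and
the fake id allocator are hypothetical, and live_elems exists only so
the model can show that every created element is destroyed again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VPORTS 4	/* hypothetical table size */

struct vport_qos {
	bool enabled;
	uint32_t elem_id;
};

struct esw_qos {
	bool enabled;
	uint32_t root_id;
	uint32_t next_id;	/* fake id allocator standing in for firmware */
	int live_elems;		/* scheduling elements currently allocated */
	struct vport_qos vports[MAX_VPORTS];
};

/* Enable SRIOV: create the root element once. */
static int esw_qos_create(struct esw_qos *q)
{
	if (q->enabled)
		return -EEXIST;
	q->root_id = ++q->next_id;
	q->live_elems++;
	q->enabled = true;
	return 0;
}

/* Enable vport: attach it under the root, if there is one. */
static int esw_qos_attach(struct esw_qos *q, int vport)
{
	assert(vport >= 0 && vport < MAX_VPORTS);
	if (!q->enabled)
		return 0;	/* no root: run without rate limiting */
	if (q->vports[vport].enabled)
		return -EEXIST;
	q->vports[vport].elem_id = ++q->next_id;
	q->vports[vport].enabled = true;
	q->live_elems++;
	return 0;
}

/* Disable vport: detach; a no-op when it was never attached. */
static void esw_qos_detach(struct esw_qos *q, int vport)
{
	assert(vport >= 0 && vport < MAX_VPORTS);
	if (!q->vports[vport].enabled)
		return;
	q->vports[vport].enabled = false;
	q->live_elems--;
}

/* Disable SRIOV: destroy the root; a no-op when it was never created. */
static void esw_qos_destroy(struct esw_qos *q)
{
	if (!q->enabled)
		return;
	q->enabled = false;
	q->live_elems--;
}
```

The enabled flags make teardown idempotent, so the disable paths can
call detach and destroy unconditionally, as the patch does in
esw_disable_vport() and mlx5_eswitch_disable_sriov().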

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 113 +++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  12 +++
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index abbf2c3..2e11a94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1351,6 +1351,106 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
 	return err;
 }
 
+/* Vport QoS management */
+static int esw_create_tsar(struct mlx5_eswitch *esw)
+{
+	u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0};
+	struct mlx5_core_dev *dev = esw->dev;
+	int err;
+
+	if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling))
+		return 0;
+
+	if (esw->qos.enabled)
+		return -EEXIST;
+
+	err = mlx5_create_scheduling_element_cmd(dev,
+						 SCHEDULING_HIERARCHY_E_SWITCH,
+						 &tsar_ctx,
+						 &esw->qos.root_tsar_id);
+	if (err) {
+		esw_warn(esw->dev, "E-Switch create TSAR failed (%d)\n", err);
+		return err;
+	}
+
+	esw->qos.enabled = true;
+	return 0;
+}
+
+static void esw_destroy_tsar(struct mlx5_eswitch *esw)
+{
+	int err;
+
+	if (!esw->qos.enabled)
+		return;
+
+	err = mlx5_destroy_scheduling_element_cmd(esw->dev,
+						  SCHEDULING_HIERARCHY_E_SWITCH,
+						  esw->qos.root_tsar_id);
+	if (err)
+		esw_warn(esw->dev, "E-Switch destroy TSAR failed (%d)\n", err);
+
+	esw->qos.enabled = false;
+}
+
+static int esw_vport_enable_qos(struct mlx5_eswitch *esw, int vport_num,
+				u32 initial_max_rate)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0};
+	struct mlx5_vport *vport = &esw->vports[vport_num];
+	struct mlx5_core_dev *dev = esw->dev;
+	void *vport_elem;
+	int err = 0;
+
+	if (!esw->qos.enabled || !MLX5_CAP_GEN(dev, qos) ||
+	    !MLX5_CAP_QOS(dev, esw_scheduling))
+		return 0;
+
+	if (vport->qos.enabled)
+		return -EEXIST;
+
+	MLX5_SET(scheduling_context, &sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
+	vport_elem = MLX5_ADDR_OF(scheduling_context, &sched_ctx,
+				  element_attributes);
+	MLX5_SET(vport_element, vport_elem, vport_number, vport_num);
+	MLX5_SET(scheduling_context, &sched_ctx, parent_element_id,
+		 esw->qos.root_tsar_id);
+	MLX5_SET(scheduling_context, &sched_ctx, max_average_bw,
+		 initial_max_rate);
+
+	err = mlx5_create_scheduling_element_cmd(dev,
+						 SCHEDULING_HIERARCHY_E_SWITCH,
+						 &sched_ctx,
+						 &vport->qos.esw_tsar_ix);
+	if (err) {
+		esw_warn(esw->dev, "E-Switch create TSAR vport element failed (vport=%d,err=%d)\n",
+			 vport_num, err);
+		return err;
+	}
+
+	vport->qos.enabled = true;
+	return 0;
+}
+
+static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num)
+{
+	struct mlx5_vport *vport = &esw->vports[vport_num];
+	int err = 0;
+
+	if (!vport->qos.enabled)
+		return;
+
+	err = mlx5_destroy_scheduling_element_cmd(esw->dev,
+						  SCHEDULING_HIERARCHY_E_SWITCH,
+						  vport->qos.esw_tsar_ix);
+	if (err)
+		esw_warn(esw->dev, "E-Switch destroy TSAR vport element failed (vport=%d,err=%d)\n",
+			 vport_num, err);
+
+	vport->qos.enabled = false;
+}
+
 static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN])
 {
 	((u8 *)node_guid)[7] = mac[0];
@@ -1386,6 +1486,7 @@ static void esw_apply_vport_conf(struct mlx5_eswitch *esw,
 		esw_vport_egress_config(esw, vport);
 	}
 }
+
 static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num,
 			     int enable_events)
 {
@@ -1399,6 +1500,10 @@ static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num,
 	/* Restore old vport configuration */
 	esw_apply_vport_conf(esw, vport);
 
+	/* Attach vport to the eswitch rate limiter */
+	if (esw_vport_enable_qos(esw, vport_num, vport->info.max_rate))
+		esw_warn(esw->dev, "Failed to attach vport %d to eswitch rate limiter\n", vport_num);
+
 	/* Sync with current vport context */
 	vport->enabled_events = enable_events;
 	vport->enabled = true;
@@ -1437,7 +1542,7 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, int vport_num)
 	 */
 	esw_vport_change_handle_locked(vport);
 	vport->enabled_events = 0;
-
+	esw_vport_disable_qos(esw, vport_num);
 	if (vport_num && esw->mode == SRIOV_LEGACY) {
 		mlx5_modify_vport_admin_state(esw->dev,
 					      MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
@@ -1483,6 +1588,10 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode)
 	if (err)
 		goto abort;
 
+	err = esw_create_tsar(esw);
+	if (err)
+		esw_warn(esw->dev, "Failed to create eswitch TSAR\n");
+
 	enabled_events = (mode == SRIOV_LEGACY) ? SRIOV_VPORT_EVENTS : UC_ADDR_CHANGE;
 	for (i = 0; i <= nvfs; i++)
 		esw_enable_vport(esw, i, enabled_events);
@@ -1519,6 +1628,8 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 	if (mc_promisc && mc_promisc->uplink_rule)
 		mlx5_del_flow_rule(mc_promisc->uplink_rule);
 
+	esw_destroy_tsar(esw);
+
 	if (esw->mode == SRIOV_LEGACY)
 		esw_destroy_legacy_fdb_table(esw);
 	else if (esw->mode == SRIOV_OFFLOADS)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 2e2938e..fb8de34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -115,6 +115,7 @@ struct mlx5_vport_info {
 	u8                      qos;
 	u64                     node_guid;
 	int                     link_state;
+	u32                     max_rate;
 	bool                    spoofchk;
 	bool                    trusted;
 };
@@ -133,6 +134,11 @@ struct mlx5_vport {
 
 	struct mlx5_vport_info  info;
 
+	struct {
+		bool            enabled;
+		u32             esw_tsar_ix;
+	} qos;
+
 	bool                    enabled;
 	u16                     enabled_events;
 };
@@ -209,6 +215,12 @@ struct mlx5_eswitch {
 	 */
 	struct mutex            state_lock;
 	struct esw_mc_addr      *mc_promisc;
+
+	struct {
+		bool            enabled;
+		u32             root_tsar_id;
+	} qos;
+
 	struct mlx5_esw_offload offloads;
 	int                     mode;
 };
-- 
2.7.4


* [PATCH for-next V2 06/15] net/mlx5: Introduce TSAR manipulation firmware commands
From: Saeed Mahameed @ 2016-10-30 21:21 UTC
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Mohamad Haj Yahia, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Mohamad Haj Yahia <mohamad@mellanox.com>

TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.
The arbitration defines the QoS policy between the agents connected to
the TSAR.
The TSAR has two main features:
1) BW allocation between agents:
The TSAR implements a deficit weighted round robin between the agents.
Each agent attached to the TSAR is assigned a weight and is
awarded transmission tokens according to this weight.
2) Rate limiter per agent:
Each agent attached to the TSAR is (optionally) assigned a rate
limit.
The TSAR will not schedule an agent beyond its defined rate
limit.

This patch implements the API for manipulating the TSAR.
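
To make the two features concrete, here is a small software model of
deficit weighted round robin with an optional per-agent cap. It is only
a sketch under assumptions and says nothing about how the hardware
works internally: struct agent and dwrr_round are hypothetical names,
all packets of an agent have one fixed size, a "round" stands in for a
unit of time, and the rate limit is modelled as a byte cap per round.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct agent {
	uint32_t quantum;	/* credit per round, proportional to weight */
	uint32_t rate_cap;	/* max bytes per round, 0 means unlimited */
	uint32_t pkt_size;	/* fixed packet size (simplification) */
	uint32_t backlog;	/* packets waiting to be sent */
	uint32_t deficit;	/* unused credit carried to the next round */
	uint64_t sent_bytes;	/* total bytes scheduled so far */
};

/* Serve every agent once: add its quantum, then send packets while
 * the credit lasts and the per-round cap is not exceeded.
 */
static void dwrr_round(struct agent *agents, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct agent *a = &agents[i];
		uint32_t sent = 0;

		assert(a->pkt_size > 0);
		if (a->backlog == 0) {
			a->deficit = 0;	/* idle agents do not hoard credit */
			continue;
		}

		a->deficit += a->quantum;
		while (a->backlog > 0 && a->deficit >= a->pkt_size) {
			if (a->rate_cap && sent + a->pkt_size > a->rate_cap) {
				a->deficit = 0;	/* capped: no later burst */
				break;
			}
			a->deficit -= a->pkt_size;
			a->backlog--;
			sent += a->pkt_size;
		}
		a->sent_bytes += sent;
		if (a->backlog == 0)
			a->deficit = 0;
	}
}
```

With two backlogged agents whose quanta are in a 2:1 ratio, the bytes
sent over many rounds follow the same 2:1 ratio, while an agent with a
cap below its quantum is held to the cap regardless of its weight.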

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  13 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/rl.c       |  65 +++++++
 include/linux/mlx5/mlx5_ifc.h                      | 199 ++++++++++++++++++++-
 4 files changed, 279 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 1e639f8..8561102 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -318,6 +318,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT:
 	case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT:
 		return MLX5_CMD_STAT_OK;
 
 	case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -419,11 +421,14 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_QUERY_FLOW_TABLE:
 	case MLX5_CMD_OP_CREATE_FLOW_GROUP:
 	case MLX5_CMD_OP_QUERY_FLOW_GROUP:
-
 	case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
 	case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
 	case MLX5_CMD_OP_ALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -580,6 +585,12 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE);
 	MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER);
 	MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER);
+	MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT);
+	MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 3d0cfb9..bf43171 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -91,6 +91,13 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
 bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 *element_id);
+int mlx5_modify_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 element_id,
+				       u32 modify_bitmask);
+int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+					u32 element_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
index 104902a..e651e4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -36,6 +36,71 @@
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
 
+/* Scheduling element fw management */
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *ctx, u32 *element_id)
+{
+	u32 in[MLX5_ST_SZ_DW(create_scheduling_element_in)]   = {0};
+	u32 out[MLX5_ST_SZ_DW(create_scheduling_element_out)] = {0};
+	void *schedc;
+	int err;
+
+	schedc = MLX5_ADDR_OF(create_scheduling_element_in, in,
+			      scheduling_context);
+	MLX5_SET(create_scheduling_element_in, in, opcode,
+		 MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT);
+	MLX5_SET(create_scheduling_element_in, in, scheduling_hierarchy,
+		 hierarchy);
+	memcpy(schedc, ctx, MLX5_ST_SZ_BYTES(scheduling_context));
+
+	err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+	if (err)
+		return err;
+
+	*element_id = MLX5_GET(create_scheduling_element_out, out,
+			       scheduling_element_id);
+	return 0;
+}
+
+int mlx5_modify_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *ctx, u32 element_id,
+				       u32 modify_bitmask)
+{
+	u32 in[MLX5_ST_SZ_DW(modify_scheduling_element_in)]   = {0};
+	u32 out[MLX5_ST_SZ_DW(modify_scheduling_element_out)] = {0};
+	void *schedc;
+
+	schedc = MLX5_ADDR_OF(modify_scheduling_element_in, in,
+			      scheduling_context);
+	MLX5_SET(modify_scheduling_element_in, in, opcode,
+		 MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT);
+	MLX5_SET(modify_scheduling_element_in, in, scheduling_element_id,
+		 element_id);
+	MLX5_SET(modify_scheduling_element_in, in, modify_bitmask,
+		 modify_bitmask);
+	MLX5_SET(modify_scheduling_element_in, in, scheduling_hierarchy,
+		 hierarchy);
+	memcpy(schedc, ctx, MLX5_ST_SZ_BYTES(scheduling_context));
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+					u32 element_id)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_scheduling_element_in)]   = {0};
+	u32 out[MLX5_ST_SZ_DW(destroy_scheduling_element_out)] = {0};
+
+	MLX5_SET(destroy_scheduling_element_in, in, opcode,
+		 MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT);
+	MLX5_SET(destroy_scheduling_element_in, in, scheduling_element_id,
+		 element_id);
+	MLX5_SET(destroy_scheduling_element_in, in, scheduling_hierarchy,
+		 hierarchy);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
 /* Finds an entry where we can register the given rate
  * If the rate already exists, return the entry where it is registered,
  * otherwise return the first available entry.
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 12f72e4..2632cb2 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -145,6 +145,12 @@ enum {
 	MLX5_CMD_OP_QUERY_Q_COUNTER               = 0x773,
 	MLX5_CMD_OP_SET_RATE_LIMIT                = 0x780,
 	MLX5_CMD_OP_QUERY_RATE_LIMIT              = 0x781,
+	MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT      = 0x782,
+	MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT     = 0x783,
+	MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT       = 0x784,
+	MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT      = 0x785,
+	MLX5_CMD_OP_CREATE_QOS_PARA_VPORT         = 0x786,
+	MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT        = 0x787,
 	MLX5_CMD_OP_ALLOC_PD                      = 0x800,
 	MLX5_CMD_OP_DEALLOC_PD                    = 0x801,
 	MLX5_CMD_OP_ALLOC_UAR                     = 0x802,
@@ -537,13 +543,27 @@ struct mlx5_ifc_e_switch_cap_bits {
 
 struct mlx5_ifc_qos_cap_bits {
 	u8         packet_pacing[0x1];
-	u8         reserved_0[0x1f];
-	u8         reserved_1[0x20];
+	u8         esw_scheduling[0x1];
+	u8         reserved_at_2[0x1e];
+
+	u8         reserved_at_20[0x20];
+
 	u8         packet_pacing_max_rate[0x20];
+
 	u8         packet_pacing_min_rate[0x20];
-	u8         reserved_2[0x10];
+
+	u8         reserved_at_80[0x10];
 	u8         packet_pacing_rate_table_size[0x10];
-	u8         reserved_3[0x760];
+
+	u8         esw_element_type[0x10];
+	u8         esw_tsar_type[0x10];
+
+	u8         reserved_at_c0[0x10];
+	u8         max_qos_para_vport[0x10];
+
+	u8         max_tsar_bw_share[0x20];
+
+	u8         reserved_at_100[0x700];
 };
 
 struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
@@ -2333,6 +2353,30 @@ struct mlx5_ifc_sqc_bits {
 	struct mlx5_ifc_wq_bits wq;
 };
 
+enum {
+	SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR = 0x0,
+	SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT = 0x1,
+	SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC = 0x2,
+	SCHEDULING_CONTEXT_ELEMENT_TYPE_PARA_VPORT_TC = 0x3,
+};
+
+struct mlx5_ifc_scheduling_context_bits {
+	u8         element_type[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         element_attributes[0x20];
+
+	u8         parent_element_id[0x20];
+
+	u8         reserved_at_60[0x40];
+
+	u8         bw_share[0x20];
+
+	u8         max_average_bw[0x20];
+
+	u8         reserved_at_e0[0x120];
+};
+
 struct mlx5_ifc_rqtc_bits {
 	u8         reserved_at_0[0xa0];
 
@@ -2920,6 +2964,29 @@ struct mlx5_ifc_register_loopback_control_bits {
 	u8         reserved_at_20[0x60];
 };
 
+struct mlx5_ifc_vport_tc_element_bits {
+	u8         traffic_class[0x4];
+	u8         reserved_at_4[0xc];
+	u8         vport_number[0x10];
+};
+
+struct mlx5_ifc_vport_element_bits {
+	u8         reserved_at_0[0x10];
+	u8         vport_number[0x10];
+};
+
+enum {
+	TSAR_ELEMENT_TSAR_TYPE_DWRR = 0x0,
+	TSAR_ELEMENT_TSAR_TYPE_ROUND_ROBIN = 0x1,
+	TSAR_ELEMENT_TSAR_TYPE_ETS = 0x2,
+};
+
+struct mlx5_ifc_tsar_element_bits {
+	u8         reserved_at_0[0x8];
+	u8         tsar_type[0x8];
+	u8         reserved_at_10[0x10];
+};
+
 struct mlx5_ifc_teardown_hca_out_bits {
 	u8         status[0x8];
 	u8         reserved_at_8[0x18];
@@ -3540,6 +3607,39 @@ struct mlx5_ifc_query_special_contexts_in_bits {
 	u8         reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_query_scheduling_element_out_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0xc0];
+
+	struct mlx5_ifc_scheduling_context_bits scheduling_context;
+
+	u8         reserved_at_300[0x100];
+};
+
+enum {
+	SCHEDULING_HIERARCHY_E_SWITCH = 0x2,
+};
+
+struct mlx5_ifc_query_scheduling_element_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         scheduling_hierarchy[0x8];
+	u8         reserved_at_48[0x18];
+
+	u8         scheduling_element_id[0x20];
+
+	u8         reserved_at_80[0x180];
+};
+
 struct mlx5_ifc_query_rqt_out_bits {
 	u8         status[0x8];
 	u8         reserved_at_8[0x18];
@@ -4725,6 +4825,43 @@ struct mlx5_ifc_modify_sq_in_bits {
 	struct mlx5_ifc_sqc_bits ctx;
 };
 
+struct mlx5_ifc_modify_scheduling_element_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x1c0];
+};
+
+enum {
+	MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_BW_SHARE = 0x1,
+	MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_MAX_AVERAGE_BW = 0x2,
+};
+
+struct mlx5_ifc_modify_scheduling_element_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         scheduling_hierarchy[0x8];
+	u8         reserved_at_48[0x18];
+
+	u8         scheduling_element_id[0x20];
+
+	u8         reserved_at_80[0x20];
+
+	u8         modify_bitmask[0x20];
+
+	u8         reserved_at_c0[0x40];
+
+	struct mlx5_ifc_scheduling_context_bits scheduling_context;
+
+	u8         reserved_at_300[0x100];
+};
+
 struct mlx5_ifc_modify_rqt_out_bits {
 	u8         status[0x8];
 	u8         reserved_at_8[0x18];
@@ -5390,6 +5527,30 @@ struct mlx5_ifc_destroy_sq_in_bits {
 	u8         reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_destroy_scheduling_element_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x1c0];
+};
+
+struct mlx5_ifc_destroy_scheduling_element_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         scheduling_hierarchy[0x8];
+	u8         reserved_at_48[0x18];
+
+	u8         scheduling_element_id[0x20];
+
+	u8         reserved_at_80[0x180];
+};
+
 struct mlx5_ifc_destroy_rqt_out_bits {
 	u8         status[0x8];
 	u8         reserved_at_8[0x18];
@@ -6017,6 +6178,36 @@ struct mlx5_ifc_create_sq_in_bits {
 	struct mlx5_ifc_sqc_bits ctx;
 };
 
+struct mlx5_ifc_create_scheduling_element_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x40];
+
+	u8         scheduling_element_id[0x20];
+
+	u8         reserved_at_a0[0x160];
+};
+
+struct mlx5_ifc_create_scheduling_element_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         scheduling_hierarchy[0x8];
+	u8         reserved_at_48[0x18];
+
+	u8         reserved_at_60[0xa0];
+
+	struct mlx5_ifc_scheduling_context_bits scheduling_context;
+
+	u8         reserved_at_300[0x100];
+};
+
 struct mlx5_ifc_create_rqt_out_bits {
 	u8         status[0x8];
 	u8         reserved_at_8[0x18];
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related
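
The structures added to mlx5_ifc.h above follow that header's layout convention: each u8 array element stands for one bit, so "u8 name[0xN]" declares an N-bit field and a reserved_at_X name records the bit offset at which the padding starts. A minimal userspace sketch of that bookkeeping follows; the demo_ names and the 0x200-bit size of the scheduling context are assumptions inferred from the offsets in the patch, not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * Hypothetical stand-in: the real mlx5_ifc_scheduling_context_bits is
 * defined elsewhere in mlx5_ifc.h. The 0x200-bit size is inferred from the
 * field starting at bit 0x100 and being followed by reserved_at_300.
 */
struct demo_scheduling_context_bits {
	u8 opaque[0x200];
};

/* Same field widths as create_scheduling_element_in in the patch. */
struct demo_create_scheduling_element_in_bits {
	u8 opcode[0x10];
	u8 reserved_at_10[0x10];

	u8 reserved_at_20[0x10];
	u8 op_mod[0x10];

	u8 scheduling_hierarchy[0x8];
	u8 reserved_at_48[0x18];

	u8 reserved_at_60[0xa0];

	struct demo_scheduling_context_bits scheduling_context;

	u8 reserved_at_300[0x100];
};

/* One array element per bit, so offsetof() yields a bit offset. */
#define DEMO_BIT_OFF(typ, fld)	offsetof(struct typ, fld)
/* Size of the command in bytes: number of "bits" divided by 8. */
#define DEMO_ST_SZ_BYTES(typ)	(sizeof(struct typ) / 8)

/* Each reserved_at_X name must match the running bit offset. */
_Static_assert(DEMO_BIT_OFF(demo_create_scheduling_element_in_bits,
			    reserved_at_48) == 0x48, "offset mismatch");
_Static_assert(DEMO_BIT_OFF(demo_create_scheduling_element_in_bits,
			    reserved_at_60) == 0x60, "offset mismatch");
_Static_assert(DEMO_BIT_OFF(demo_create_scheduling_element_in_bits,
			    reserved_at_300) == 0x300, "offset mismatch");
```

Under these assumptions the whole command comes to 0x400 bits, i.e. 128 bytes, and a wrong reserved width would show up as a failed static assertion at build time.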

* [PATCH for-next V2 05/15] net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the
device ID "0x101a" to mlx5_core_pci_table.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index d9c3c70..197e04c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1422,6 +1422,7 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1017) },			/* ConnectX-5, PCIe 3.0 */
 	{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5 VF */
 	{ PCI_VDEVICE(MELLANOX, 0x1019) },			/* ConnectX-5, PCIe 4.0 */
+	{ PCI_VDEVICE(MELLANOX, 0x101a), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5, PCIe 4.0 VF */
 	{ 0, }
 };
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH for-next V2 04/15] net/mlx5: Fix length of async_event_mask
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Leon Romanovsky, Tal Alon, Matan Barak,
	Eugenia Emantayev, Saeed Mahameed, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Eugenia Emantayev <eugenia-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

According to the PRM, async_event_mask has to be 64 bits long.

Signed-off-by: Eugenia Emantayev <eugenia-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index aaca090..e74a73b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -469,7 +469,7 @@ void mlx5_eq_cleanup(struct mlx5_core_dev *dev)
 int mlx5_start_eqs(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = &dev->priv.eq_table;
-	u32 async_event_mask = MLX5_ASYNC_EVENT_MASK;
+	u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
 	int err;
 
 	if (MLX5_CAP_GEN(dev, pg))
-- 
2.7.4

^ permalink raw reply related
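
To illustrate why the width of the variable matters in the patch above: storing a 64-bit mask in a 32-bit variable silently drops every bit above bit 31. The sketch below is an assumption-laden userspace example; DEMO_EVENT_MASK and its bit positions are hypothetical and say nothing about which bits MLX5_ASYNC_EVENT_MASK actually sets.

```c
#include <stdint.h>

/* Hypothetical mask: one event bit below and one above bit 31. */
#define DEMO_EVENT_MASK ((1ull << 3) | (1ull << 35))

/* What a u32 variable would hold: the upper event bit is lost. */
static inline uint32_t demo_mask_as_u32(void)
{
	return (uint32_t)DEMO_EVENT_MASK;
}

/* What a u64 variable holds: both event bits survive. */
static inline uint64_t demo_mask_as_u64(void)
{
	return DEMO_EVENT_MASK;
}
```

With the 32-bit variant only bit 3 remains set, so an event numbered 32 or higher could never be enabled through such a variable.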

* [PATCH for-next V2 03/15] net/mlx5: Ensure SRQ physical address structure endianness
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Artemy Kovalyov, Leon Romanovsky, Saeed Mahameed
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Artemy Kovalyov <artemyko@mellanox.com>

SRQ physical address structure field should be in big-endian format.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/srq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/srq.h b/include/linux/mlx5/srq.h
index 33c97dc..1cde0fd 100644
--- a/include/linux/mlx5/srq.h
+++ b/include/linux/mlx5/srq.h
@@ -55,7 +55,7 @@ struct mlx5_srq_attr {
 	u32 lwm;
 	u32 user_index;
 	u64 db_record;
-	u64 *pas;
+	__be64 *pas;
 };
 
 struct mlx5_core_dev;
-- 
2.7.4

^ permalink raw reply related
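
The __be64 annotation in the patch above documents that the array holds values already converted to big-endian byte order. A minimal userspace sketch of filling such an array follows; demo_be64, demo_cpu_to_be64 and demo_fill_pas are hypothetical names, and the conversion is written out by hand rather than using the kernel's cpu_to_be64().

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's __be64 annotation. */
typedef uint64_t demo_be64;

/* Convert a CPU-order value to big-endian, independent of host endianness. */
static demo_be64 demo_cpu_to_be64(uint64_t v)
{
	uint8_t bytes[8];
	demo_be64 out;
	int i;

	for (i = 0; i < 8; i++)
		bytes[i] = (uint8_t)(v >> (56 - 8 * i));	/* MSB first */
	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Fill a physical-address array in the byte order the device expects. */
static void demo_fill_pas(demo_be64 *pas, const uint64_t *addrs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pas[i] = demo_cpu_to_be64(addrs[i]);
}
```

In the kernel, declaring the pointer as __be64 lets sparse flag code that stores a CPU-order u64 into the array without a conversion.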

* [PATCH for-next V2 02/15] net/mlx5: Update struct mlx5_ifc_xrqc_bits
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev, linux-rdma, Or Gerlitz, Leon Romanovsky, Tal Alon,
	Matan Barak, Artemy Kovalyov, Leon Romanovsky, Saeed Mahameed
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm@mellanox.com>

From: Artemy Kovalyov <artemyko@mellanox.com>

Update struct mlx5_ifc_xrqc_bits according to the latest specification.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6045d4d..12f72e4 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2844,7 +2844,7 @@ struct mlx5_ifc_xrqc_bits {
 
 	struct mlx5_ifc_tag_matching_topology_context_bits tag_matching_topology_context;
 
-	u8         reserved_at_180[0x200];
+	u8         reserved_at_180[0x880];
 
 	struct mlx5_ifc_wq_bits wq;
 };
-- 
2.7.4

^ permalink raw reply related

* [PATCH for-next V2 01/15] IB/mlx5: Skip handling unknown events
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Leon Romanovsky, Tal Alon, Matan Barak,
	Saeed Mahameed, Eugenia Emantayev, Leon Romanovsky
In-Reply-To: <1477862528-4328-1-git-send-email-saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Do not dispatch unknown mlx5 core events on mlx5_ib_event.

Signed-off-by: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Eugenia Emantayev <eugenia-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..d02341e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2358,6 +2358,8 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 		ibev.event = IB_EVENT_CLIENT_REREGISTER;
 		port = (u8)param;
 		break;
+	default:
+		return;
 	}
 
 	ibev.device	      = &ibdev->ib_dev;
-- 
2.7.4

^ permalink raw reply related
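
The effect of the added "default: return;" in the patch above can be sketched in isolation: an event that the translation switch does not recognize leaves the handler before anything is dispatched. All names below are hypothetical and the event codes are not the real mlx5 or IB enums.

```c
/* Hypothetical core-level event codes. */
enum demo_core_event {
	DEMO_CORE_PORT_UP,
	DEMO_CORE_PORT_DOWN,
	DEMO_CORE_NEW_UNKNOWN = 100,	/* an event this layer does not know */
};

/* Hypothetical upper-layer event codes. */
enum demo_ib_event {
	DEMO_IB_PORT_ACTIVE = 1,
	DEMO_IB_PORT_ERR = 2,
};

static int demo_dispatch_count;
static enum demo_ib_event demo_last_dispatched;

/* Stand-in for handing the translated event to the upper layer. */
static void demo_dispatch(enum demo_ib_event ev)
{
	demo_last_dispatched = ev;
	demo_dispatch_count++;
}

static void demo_handle_event(enum demo_core_event ev)
{
	enum demo_ib_event ibev;

	switch (ev) {
	case DEMO_CORE_PORT_UP:
		ibev = DEMO_IB_PORT_ACTIVE;
		break;
	case DEMO_CORE_PORT_DOWN:
		ibev = DEMO_IB_PORT_ERR;
		break;
	default:
		/* Unknown event: nothing was translated, so do not dispatch. */
		return;
	}

	demo_dispatch(ibev);
}
```

Without the default branch, ibev would be left uninitialized for an unknown event and the handler would still go on to dispatch it.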

* [PATCH for-next V2 00/15][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed @ 2016-10-30 21:21 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Leon Romanovsky, Tal Alon, Matan Barak,
	Saeed Mahameed

Hi Dave and Doug,

This series contains some updates and fixes of mlx5 core and
IB drivers with the addition of two features that demand
new low level commands and infrastructure updates.
 - SRIOV VF max rate limit support
 - mlx5e tc support for FWD rules with counter.

Needed for both net and rdma subsystems.

Updates and Fixes:
From Saeed Mahameed (2):
  - mlx5 IB: Skip handling unknown mlx5 events
  - Add ConnectX-5 PCIe 4.0 VF device ID

From Artemy Kovalyov (2):
  - Update struct mlx5_ifc_xrqc_bits
  - Ensure SRQ physical address structure endianness

From Eugenia Emantayev (1):
  - Fix length of async_event_mask

New Features:
From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
  - Introduce TSAR manipulation firmware commands
  - Introduce E-switch QoS management
  - Add SRIOV VF max rate configuration support

From Mark Bloch (7): mlx5e tc support for FWD rule with counter
  - Don't unlock fte while still using it
  - Use fte status to decide on firmware command
  - Refactor find_flow_rule
  - Group similar rules under the same fte
  - Add multi dest support
  - Add option to add fwd rule with counter
  - mlx5e tc support for FWD rule with counter
  Mark fixed two trivial issues in the flow steering core and did some
  refactoring in the flow steering API to support adding multi-destination
  rules to the same hardware flow table entry at once.  The last two patches
  add the ability to populate a flow rule with a flow counter in the same flow entry.

V2: Dropped some patches that added new structures without adding any usage of them.
    Added SRIOV VF max rate configuration support patch that introduces
    the usage of the TSAR infrastructure.
    Added flow steering fixes and refactoring in addition to mlx5 tc
    support for forward rule with counter.

The following changes since commit a909d3e636995ba7c349e2ca5dbb528154d4ac30
    Linux 4.9-rc3

are available in the git repository at:
    git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git tags/shared-for-4.10-1

for you to fetch changes up to e37a79e5d4cac3831fac3d4afbf2461f56b4b7bd
    net/mlx5e: Add tc support for FWD rule with counter

Thanks,
Saeed & Leon.

Artemy Kovalyov (2):
  net/mlx5: Update struct mlx5_ifc_xrqc_bits
  net/mlx5: Ensure SRQ physical address structure endianness

Eugenia Emantayev (1):
  net/mlx5: Fix length of async_event_mask

Mark Bloch (7):
  net/mlx5: Don't unlock fte while still using it
  net/mlx5: Use fte status to decide on firmware command
  net/mlx5: Refactor find_flow_rule
  net/mlx5: Group similer rules under the same fte
  net/mlx5: Add multi dest support
  net/mlx5: Add option to add fwd rule with counter
  net/mlx5e: Add tc support for FWD rule with counter

Mohamad Haj Yahia (3):
  net/mlx5: Introduce TSAR manipulation firmware commands
  net/mlx5: Introduce E-switch QoS management
  net/mlx5: Add SRIOV VF max rate configuration support

Saeed Mahameed (2):
  IB/mlx5: Skip handling unknown events
  net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID

 drivers/infiniband/hw/mlx5/main.c                  |  16 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  13 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  38 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  49 +--
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  15 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  35 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 244 ++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  36 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  60 ++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 358 +++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   1 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/rl.c       |  65 ++++
 include/linux/mlx5/fs.h                            |  28 +-
 include/linux/mlx5/mlx5_ifc.h                      | 201 +++++++++++-
 include/linux/mlx5/srq.h                           |   2 +-
 22 files changed, 927 insertions(+), 289 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event
From: Sagi Grimberg @ 2016-10-30 21:17 UTC (permalink / raw)
  To: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477551554-30349-11-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>



On 27/10/16 09:59, Binoy Jayan wrote:
> Convert the completion 'mlx5_ib_umr_context:done' to a wait_event as it
> just waits for the return value to be filled.
>
> Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
>  drivers/infiniband/hw/mlx5/mr.c      | 9 +++++----
>  include/rdma/ib_verbs.h              | 1 +
>  3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index de31b5f..cf496b5 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -524,7 +524,7 @@ struct mlx5_ib_mw {
>  struct mlx5_ib_umr_context {
>  	struct ib_cqe		cqe;
>  	enum ib_wc_status	status;
> -	struct completion	done;
> +	wait_queue_head_t	wq;
>  };
>
>  struct umr_common {
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index dfaf6f6..49ff2af 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -846,14 +846,14 @@ static void mlx5_ib_umr_done(struct ib_cq *cq, struct ib_wc *wc)
>  		container_of(wc->wr_cqe, struct mlx5_ib_umr_context, cqe);
>
>  	context->status = wc->status;
> -	complete(&context->done);
> +	wake_up(&context->wq);
>  }
>
>  static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
>  {
>  	context->cqe.done = mlx5_ib_umr_done;
> -	context->status = -1;
> -	init_completion(&context->done);
> +	context->status = IB_WC_STATUS_NONE;
> +	init_waitqueue_head(&context->wq);
>  }
>
>  static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> @@ -873,7 +873,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
>  	if (err) {
>  		mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
>  	} else {
> -		wait_for_completion(&umr_context.done);
> +		wait_event(umr_context.wq,
> +			   umr_context.status != IB_WC_STATUS_NONE);

How is this simpler?


>  enum ib_wc_status {
> +	IB_WC_STATUS_NONE = -1,
>  	IB_WC_SUCCESS,
>  	IB_WC_LOC_LEN_ERR,
>  	IB_WC_LOC_QP_OP_ERR,
>

Huh? Where did this bogus status come from? IMHO, this is polluting
the verbs interface for no good reason at all, sorry.

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Sagi Grimberg @ 2016-10-30 21:12 UTC (permalink / raw)
  To: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477551554-30349-6-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

> The semaphore 'sem' in isert_device is used as completion, so convert
> it to struct completion. Semaphores are going away in the future.

Umm, this is 100% *not* true. np->sem is designed as a counting
semaphore to sync the iscsi login thread with the connect requests
coming from the initiators. So this is actually a reliable bug insertion :(

NAK from me...

Also, I would appreciate it if you used get_maintainer.pl for your
patch submissions so I won't need to fish these out of the linux-rdma
patch traffic.

^ permalink raw reply

* Re: [PATCH -next] IB/rxe: Use DEFINE_SPINLOCK() for spinlock
From: Leon Romanovsky @ 2016-10-30 21:09 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Moni Shoua, Doug Ledford, Sean Hefty, Hal Rosenstock, Wei Yongjun,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477757973-32052-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

On Sat, Oct 29, 2016 at 04:19:33PM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> spinlock can be initialized automatically with DEFINE_SPINLOCK()
> rather than explicitly calling spin_lock_init().
>
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Thanks.
Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

> ---
>  drivers/infiniband/sw/rxe/rxe_net.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index b8258e4..4cb6378 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -46,7 +46,7 @@
>  #include "rxe_loc.h"
>
>  static LIST_HEAD(rxe_dev_list);
> -static spinlock_t dev_list_lock; /* spinlock for device list */
> +static DEFINE_SPINLOCK(dev_list_lock); /* spinlock for device list */
>
>  struct rxe_dev *net_to_rxe(struct net_device *ndev)
>  {
> @@ -663,8 +663,6 @@ struct notifier_block rxe_net_notifier = {
>
>  int rxe_net_ipv4_init(void)
>  {
> -	spin_lock_init(&dev_list_lock);
> -
>  	recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
>  				htons(ROCE_V2_UDP_DPORT), false);
>  	if (IS_ERR(recv_sockets.sk4)) {
> @@ -680,8 +678,6 @@ int rxe_net_ipv6_init(void)
>  {
>  #if IS_ENABLED(CONFIG_IPV6)
>
> -	spin_lock_init(&dev_list_lock);
> -
>  	recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
>  						htons(ROCE_V2_UDP_DPORT), true);
>  	if (IS_ERR(recv_sockets.sk6)) {
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] IBcore/CM: Issue DREQ when receiving REQ/REP for stale QP
From: Sagi Grimberg @ 2016-10-30 21:06 UTC (permalink / raw)
  To: Hans Westgaard Ry, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Matan Barak, Erez Shitrit, Bart Van Assche, Ira Weiny, Or Gerlitz,
	Hakon Bugge, Yuval Shaia, linux-rdma, linux-kernel
In-Reply-To: <1477653269-27359-1-git-send-email-hans.westgaard.ry@oracle.com>

> from "InfiniBand Architecture Specification Volume 1":
>
>   A QP is said to have a stale connection when only one side has
>   connection information. A stale connection may result if the remote CM
>   had dropped the connection and sent a DREQ but the DREQ was never
>   received by the local CM. Alternatively the remote CM may have lost
>   all record of past connections because its node crashed and rebooted,
>   while the local CM did not become aware of the remote node's reboot
>   and therefore did not clean up stale connections.
>
> and:
>
>    A local CM may receive a REQ/REP for a stale connection. It shall
>    abort the connection issuing REJ to the REQ/REP. It shall then issue
>    DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
>
> This patch solves a problem with reuse of QPN. Current codebase, that
> is IPoIB, relies on a REAP-mechanism to do cleanup of the structures
> in CM. A problem with this is the time constants governing this
> mechanism; they are up to 768 seconds and the interface may look
> unresponsive in that period.  Issuing a DREQ (and receiving a DREP)
> does the necessary cleanup and the interface comes up.

I like this fix, so,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

But I think the CM layer still is buggy in this area.

In vol 1 the state transition table specifically states that DREP
timeouts should move the cm_id to timewait state but the CM doesn't
seem to maintain response timeouts on disconnect requests. If the
DREQ happened to fail (send error completion) things are fine, but
if the DREQ makes it to the peer but it doesn't reply then no one
will take care of it (i.e. we will never see a TIMEWAIT event from
this cm_id)...

I recall a debugging session with Hal in this area about a year ago
with a new iser target (which didn't reply to DREQs on reboot
sequences). The iser initiator waits for DISCONNECTED/TIMEWAIT events
before destroying the cm_id (which never happened because of the
above). I think I ended up working around that in iser to just go
ahead and destroy the cm_id after issuing a DREQ (but now I realize
it was never included so I'll probably dig it up again soon).

^ permalink raw reply

* Re: [PATCH -next] qedr: Use list_move_tail instead of list_del/list_add_tail
From: Leon Romanovsky @ 2016-10-30 20:56 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Ram Amrani,
	Rajesh Borundia, Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477757993-32186-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

On Sat, Oct 29, 2016 at 04:19:53PM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> Using list_move_tail() instead of list_del() + list_add_tail().
>
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Thanks,
Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

> ---
>  drivers/infiniband/hw/qedr/verbs.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..cdaddf9 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2413,8 +2413,7 @@ static void handle_completed_mrs(struct qedr_dev *dev, struct mr_info *info)
>  		 */
>  		pbl = list_first_entry(&info->inuse_pbl_list,
>  				       struct qedr_pbl, list_entry);
> -		list_del(&pbl->list_entry);
> -		list_add_tail(&pbl->list_entry, &info->free_pbl_list);
> +		list_move_tail(&pbl->list_entry, &info->free_pbl_list);
>  		info->completed_handled++;
>  	}
>  }
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply
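
The equivalence the patch above relies on can be shown with a small userspace re-implementation of a circular doubly linked list. This is a sketch only; the demo_ helpers mirror the shape of the kernel's list primitives but are not the kernel's list.h code.

```c
/* Minimal circular doubly linked list node, in the style of list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void demo_list_init(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Unlink an entry from whatever list it is currently on. */
static void demo_list_del(struct demo_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Insert an entry just before the head, i.e. at the tail. */
static void demo_list_add_tail(struct demo_list_head *entry,
			       struct demo_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink from the current list and append to another in one call. */
static void demo_list_move_tail(struct demo_list_head *entry,
				struct demo_list_head *head)
{
	demo_list_del(entry);
	demo_list_add_tail(entry, head);
}
```

Moving the first entry of an "in use" list to a "free" list with demo_list_move_tail() leaves both lists in the same state as the two-call sequence it replaces.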

* Re: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Leon Romanovsky @ 2016-10-30 20:50 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR07MB22076D8F67AB8C78A9565CB6F8AF0-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

On Sun, Oct 30, 2016 at 09:25:09AM +0000, Amrani, Ram wrote:
> > The rdma-core word in the subject is misleading.
>
> Yeah it is. The location of the code (qede) is somewhat different from the content (qedr).
> I don't know how you would have done it, but at least you'll see in a future patch that we've
> changed this  to be fully qedr.

We use the "rdma-core" notation for patches intended for the consolidated
library, while your patch is for the kernel.

>
> Ram
>
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
From: Hefty, Sean @ 2016-10-30 19:28 UTC (permalink / raw)
  To: 'Matan Barak',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1477579398-6875-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

I found this patch very hard to follow.  This was in part due to the output of the patch command itself, but also because it lacked sufficient documentation on what the new data structures were for and the terms being used.  As a result, I had to bounce around the patch to figure things out, adding comments as I went along, until I finally just gave up trying to read it.

> The new ioctl infrastructure supports driver specific objects.
> Each such object type has a free function, allocation size and an

You can replace the allocation size with an alloc function, to pair with the free call.  Then the object can be initialized by the user.

> order of destruction. This information is embedded in the same
> table describing the various actions allowed on the object, similarly
> to object oriented programming.
> 
> When a ucontext is created, a new list is created in this ib_ucontext.
> This list contains all objects created under this ib_ucontext.
> When an ib_ucontext is destroyed, we traverse this list several times
> destroying the various objects by the order mentioned in the object
> type description. If a few object types have the same destruction order,
> they are destroyed in an order opposite to their creation order.

Could we simply walk the list backwards, destroying all objects with a reference count of 1 - repeat if necessary?  Basically avoid complex rules for this.

In fact, it would be great if we could just cleanup the list in the reverse order that items were created.  Maybe this requires supporting a pre-cleanup handler, so that the driver can pluck items out of the list that may need to be destroyed out of order.
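
As an illustration only (not code from the patch or a worked-out proposal), the reverse-walk idea could look roughly like the sketch below. It assumes each object holds at most one reference to another object and that a refcount of 1 means "only the owner still refers to it"; struct demo_uobj and demo_cleanup are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical user object: one optional dependency, one refcount. */
struct demo_uobj {
	int id;
	int refcount;		/* 1 == no other object refers to it */
	int destroyed;
	struct demo_uobj *dep;	/* object this one holds a reference on */
};

/*
 * objs[] is in creation order. Walk it backwards, destroying every object
 * whose refcount is 1, and repeat until nothing is left or no progress is
 * made. Destruction order is recorded in order[]; returns the pass count.
 */
static int demo_cleanup(struct demo_uobj **objs, size_t n,
			int *order, size_t *n_order)
{
	size_t remaining = n;
	int passes = 0;

	*n_order = 0;
	while (remaining > 0) {
		size_t before = remaining;
		size_t i;

		for (i = n; i-- > 0; ) {
			struct demo_uobj *o = objs[i];

			if (o->destroyed || o->refcount != 1)
				continue;
			o->destroyed = 1;
			if (o->dep)
				o->dep->refcount--;	/* drop our reference */
			order[(*n_order)++] = o->id;
			remaining--;
		}
		passes++;
		if (remaining == before)
			break;	/* stuck: leaked reference or a cycle */
	}
	return passes;
}
```

In the common case, where later objects depend on earlier ones, a single backward pass suffices; a second pass is only needed when an earlier-created object holds a reference on a later one.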

> Adding an object is done in two parts.
> First, an object is allocated and added to IDR/fd table. Then, the
> command's handlers (in downstream patches) could work on this object
> and fill in its required details.
> After a successful command, ib_uverbs_uobject_enable is called and
> this user object becomes ucontext visible.

If you have a way to mark that an object is used for exclusive access, you may be able to use that instead of introducing a new variable.  (I.e. acquire the object's write lock).  I think we want to make an effort to minimize the size of the kernel structure needed to track every user space object (within reason).

> Removing a uobject is done by calling ib_uverbs_uobject_remove.
> 
> We should make sure IDR (per-device) and list (per-ucontext) could
> be accessed concurrently without corrupting them.
> 
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---

As a general comment, I do have concerns that the resulting generalized parsing of everything will negatively impact performance for operations that do have to transition into the kernel.  Not all devices offload all operations to user space.  Plus the resulting code is extremely difficult to read and non-trivial to use.  It's equivalent to reading C++ code that has 4 layers of inheritance with overrides to basic operators...

Pre and post operators per command that can do straightforward validation seem like a better option.


>  drivers/infiniband/core/Makefile      |   3 +-
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   | 489
> ++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>  drivers/infiniband/core/uverbs.h      |   1 +
>  drivers/infiniband/core/uverbs_main.c |   2 +-
>  include/rdma/ib_verbs.h               |  28 +-
>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>  8 files changed, 789 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/infiniband/core/rdma_core.c
>  create mode 100644 drivers/infiniband/core/rdma_core.h
>  create mode 100644 include/rdma/uverbs_ioctl.h
> 
> diff --git a/drivers/infiniband/core/Makefile
> b/drivers/infiniband/core/Makefile
> index edaae9f..1819623 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -28,4 +28,5 @@ ib_umad-y :=			user_mad.o
> 
>  ib_ucm-y :=			ucm.o
> 
> -ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o
> uverbs_marshall.o
> +ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o
> uverbs_marshall.o \
> +				rdma_core.o
> diff --git a/drivers/infiniband/core/device.c
> b/drivers/infiniband/core/device.c
> index c3b68f5..43994b1 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>  	spin_lock_init(&device->client_data_lock);
>  	INIT_LIST_HEAD(&device->client_data_list);
>  	INIT_LIST_HEAD(&device->port_list);
> +	INIT_LIST_HEAD(&device->type_list);
> 
>  	return device;
>  }
> diff --git a/drivers/infiniband/core/rdma_core.c
> b/drivers/infiniband/core/rdma_core.c
> new file mode 100644
> index 0000000..337abc2
> --- /dev/null
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -0,0 +1,489 @@
> +/*
> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <linux/file.h>
> +#include <linux/anon_inodes.h>
> +#include <rdma/ib_verbs.h>
> +#include "uverbs.h"
> +#include "rdma_core.h"
> +#include <rdma/uverbs_ioctl.h>
> +
> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> *ibdev,
> +					  uint16_t type)
> +{
> +	const struct uverbs_types_group *groups = ibdev->types_group;
> +	const struct uverbs_types *types;
> +	int ret = groups->dist(&type, groups->priv);
> +
> +	if (ret >= groups->num_groups)
> +		return NULL;
> +
> +	types = groups->type_groups[ret];
> +
> +	if (type >= types->num_types)
> +		return NULL;
> +
> +	return types->types[type];
> +}
> +
> +static int uverbs_lock_object(struct ib_uobject *uobj,
> +			      enum uverbs_idr_access access)
> +{
> +	if (access == UVERBS_IDR_ACCESS_READ)
> +		return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
> +
> +	/* lock is either WRITE or DESTROY - should be exclusive */
> +	return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;

This function could take the lock type directly (read or write), versus inferring it based on some other access type.
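
As a rough sketch of that suggestion (a userspace analogue built on C11 atomics, not kernel code; every `demo_*` name is hypothetical and not part of the posted patch), the caller could pass the lock mode explicitly:

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock modes; the caller states what it wants directly. */
enum demo_lock_mode {
	DEMO_LOCK_SHARED,
	DEMO_LOCK_EXCLUSIVE,
};

/*
 * Stand-in for the uobject. usecnt encodes the lock state:
 * 0 = free, >0 = number of readers, -1 = one writer.
 */
struct demo_uobject {
	atomic_int usecnt;
};

/* Try to take the lock in the requested mode; 0 on success, -EBUSY otherwise. */
static int demo_trylock(struct demo_uobject *uobj, enum demo_lock_mode mode)
{
	int cur;

	if (mode == DEMO_LOCK_EXCLUSIVE) {
		int expected = 0;

		return atomic_compare_exchange_strong(&uobj->usecnt,
						      &expected, -1) ? 0 : -EBUSY;
	}

	cur = atomic_load(&uobj->usecnt);
	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&uobj->usecnt, &cur, cur + 1))
			return 0;
	}
	return -EBUSY;
}

/* Release a lock previously taken in the given mode. */
static void demo_unlock(struct demo_uobject *uobj, enum demo_lock_mode mode)
{
	if (mode == DEMO_LOCK_EXCLUSIVE)
		atomic_store(&uobj->usecnt, 0);
	else
		atomic_fetch_sub(&uobj->usecnt, 1);
}
```

With this shape, the mapping from READ/WRITE/DESTROY to shared or exclusive lives in the caller, and the lock helper no longer needs to know about access types.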

> +}
> +
> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
> *context)
> +{
> +	struct ib_uobject *uobj;
> +
> +	rcu_read_lock();
> +	uobj = idr_find(&context->device->idr, id);
> +	if (uobj && uobj->live) {
> +		if (uobj->context != context)
> +			uobj = NULL;
> +	}
> +	rcu_read_unlock();
> +
> +	return uobj;
> +}
> +
> +struct ib_ucontext_lock {
> +	struct kref  ref;
> +	/* locking the uobjects_list */
> +	struct mutex lock;
> +};
> +
> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
> +{
> +	mutex_init(&lock->lock);
> +	kref_init(&lock->ref);
> +}
> +
> +static void release_uobjects_list_lock(struct kref *ref)
> +{
> +	struct ib_ucontext_lock *lock = container_of(ref,
> +						     struct ib_ucontext_lock,
> +						     ref);
> +
> +	kfree(lock);
> +}
> +
> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
> +		      struct ib_ucontext *context)
> +{
> +	init_rwsem(&uobj->usecnt);
> +	uobj->user_handle = user_handle;
> +	uobj->context     = context;
> +	uobj->live        = 0;
> +}
> +
> +static int add_uobj(struct ib_uobject *uobj)
> +{
> +	int ret;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&uobj->context->device->idr_lock);
> +
> +	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
> GFP_NOWAIT);
> +	if (ret >= 0)
> +		uobj->id = ret;
> +
> +	spin_unlock(&uobj->context->device->idr_lock);
> +	idr_preload_end();
> +
> +	return ret < 0 ? ret : 0;
> +}
> +
> +static void remove_uobj(struct ib_uobject *uobj)
> +{
> +	spin_lock(&uobj->context->device->idr_lock);
> +	idr_remove(&uobj->context->device->idr, uobj->id);
> +	spin_unlock(&uobj->context->device->idr_lock);
> +}
> +
> +static void put_uobj(struct ib_uobject *uobj)
> +{
> +	kfree_rcu(uobj, rcu);
> +}
> +
> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext
> *ucontext,
> +						   const struct
> uverbs_type_alloc_action *type,
> +						   u32 idr,
> +						   enum uverbs_idr_access access)
> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	rcu_read_lock();
> +	uobj = get_uobj(idr, ucontext);
> +	if (!uobj)
> +		goto free;
> +
> +	if (uobj->type != type) {
> +		uobj = NULL;
> +		goto free;
> +	}
> +
> +	ret = uverbs_lock_object(uobj, access);
> +	if (ret)
> +		uobj = ERR_PTR(ret);
> +free:
> +	rcu_read_unlock();
> +	return uobj;
> +
> +	return NULL;
> +}
> +
> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
> +				 const struct uverbs_type_alloc_action
> *uobject_type)
> +{
> +	uobject->type = uobject_type;
> +	return add_uobj(uobject);
> +}
> +
> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> uverbs_type_alloc_action *type,
> +					    struct ib_ucontext *ucontext,
> +					    enum uverbs_idr_access access,
> +					    uint32_t idr)
> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		uobj = kmalloc(type->obj_size, GFP_KERNEL);
> +		if (!uobj)
> +			return ERR_PTR(-ENOMEM);
> +
> +		init_uobj(uobj, 0, ucontext);
> +
> +		/* lock idr */

The comment says to lock the idr, but no lock is obtained.

> +		ret = ib_uverbs_uobject_add(uobj, type);
> +		if (ret) {
> +			kfree(uobj);
> +			return ERR_PTR(ret);
> +		}
> +
> +	} else {
> +		uobj = get_uobject_from_context(ucontext, type, idr,
> +						access);
> +
> +		if (!uobj)
> +			return ERR_PTR(-ENOENT);
> +	}
> +
> +	return uobj;
> +}
> +
> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> uverbs_type_alloc_action *type,
> +					   struct ib_ucontext *ucontext,
> +					   enum uverbs_idr_access access,
> +					   int fd)
> +{
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		int _fd;
> +		struct ib_uobject *uobj = NULL;
> +		struct file *filp;
> +
> +		_fd = get_unused_fd_flags(O_CLOEXEC);
> +		if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
> ib_uobject)))
> +			return ERR_PTR(_fd);
> +
> +		uobj = kmalloc(type->obj_size, GFP_KERNEL);
> +		init_uobj(uobj, 0, ucontext);
> +
> +		if (!uobj)
> +			return ERR_PTR(-ENOMEM);
> +
> +		filp = anon_inode_getfile(type->fd.name, type->fd.fops,
> +					  uobj + 1, type->fd.flags);
> +		if (IS_ERR(filp)) {
> +			put_unused_fd(_fd);
> +			kfree(uobj);
> +			return (void *)filp;
> +		}
> +
> +		uobj->type = type;
> +		uobj->id = _fd;
> +		uobj->object = filp;
> +
> +		return uobj;
> +	} else if (access == UVERBS_IDR_ACCESS_READ) {
> +		struct file *f = fget(fd);
> +		struct ib_uobject *uobject;
> +
> +		if (!f)
> +			return ERR_PTR(-EBADF);
> +
> +		uobject = f->private_data - sizeof(struct ib_uobject);
> +		if (f->f_op != type->fd.fops ||
> +		    !uobject->live) {
> +			fput(f);
> +			return ERR_PTR(-EBADF);
> +		}
> +
> +		/*
> +		 * No need to protect it with a ref count, as fget
> increases
> +		 * f_count.
> +		 */
> +		return uobject;
> +	} else {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +}
> +
> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
> +{
> +	mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_add(&uobject->list, &uobject->context->uobjects);
> +	mutex_unlock(&uobject->context->uobjects_lock->lock);

Why not just insert the object into the list on creation?

> +	uobject->live = 1;

See my comments above on removing the live field.

> +}
> +
> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool
> lock)
> +{
> +	/*
> +	 * Calling remove requires exclusive access, so it's not possible
> +	 * another thread will use our object.
> +	 */
> +	uobject->live = 0;
> +	uobject->type->free_fn(uobject->type, uobject);
> +	if (lock)
> +		mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_del(&uobject->list);
> +	if (lock)
> +		mutex_unlock(&uobject->context->uobjects_lock->lock);
> +	remove_uobj(uobject);
> +	put_uobj(uobject);
> +}
> +
> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
> +			      enum uverbs_idr_access access,
> +			      bool success)
> +{
> +	switch (access) {
> +	case UVERBS_IDR_ACCESS_READ:
> +		up_read(&uobj->usecnt);
> +		break;
> +	case UVERBS_IDR_ACCESS_NEW:
> +		if (success) {
> +			ib_uverbs_uobject_enable(uobj);
> +		} else {
> +			remove_uobj(uobj);
> +			put_uobj(uobj);
> +		}
> +		break;
> +	case UVERBS_IDR_ACCESS_WRITE:
> +		up_write(&uobj->usecnt);
> +		break;
> +	case UVERBS_IDR_ACCESS_DESTROY:
> +		if (success)
> +			ib_uverbs_uobject_remove(uobj, true);
> +		else
> +			up_write(&uobj->usecnt);
> +		break;
> +	}
> +}
> +
> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
> +			     enum uverbs_idr_access access,
> +			     bool success)
> +{
> +	struct file *filp = uobj->object;
> +
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		if (success) {
> +			kref_get(&uobj->context->ufile->ref);
> +			uobj->uobjects_lock = uobj->context->uobjects_lock;
> +			kref_get(&uobj->uobjects_lock->ref);
> +			ib_uverbs_uobject_enable(uobj);
> +			fd_install(uobj->id, uobj->object);

I don't get this.  The function is unlocking something, but there are calls to get krefs?

> +		} else {
> +			fput(uobj->object);
> +			put_unused_fd(uobj->id);
> +			kfree(uobj);
> +		}
> +	} else {
> +		fput(filp);
> +	}
> +}
> +
> +void uverbs_unlock_object(struct ib_uobject *uobj,
> +			  enum uverbs_idr_access access,
> +			  bool success)
> +{
> +	if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
> +		uverbs_unlock_idr(uobj, access, success);
> +	else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
> +		uverbs_unlock_fd(uobj, access, success);
> +	else
> +		WARN_ON(true);
> +}
> +
> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
> +{
> +	/*
> +	 * user should release the uobject in the release
> +	 * callback.
> +	 */
> +	if (uobject->live) {
> +		uobject->live = 0;
> +		list_del(&uobject->list);
> +		uobject->type->free_fn(uobject->type, uobject);
> +		kref_put(&uobject->context->ufile->ref,
> ib_uverbs_release_file);
> +		uobject->context = NULL;
> +	}
> +}
> +
> +void ib_uverbs_close_fd(struct file *f)
> +{
> +	struct ib_uobject *uobject = f->private_data - sizeof(struct
> ib_uobject);
> +
> +	mutex_lock(&uobject->uobjects_lock->lock);
> +	if (uobject->live) {
> +		uobject->live = 0;
> +		list_del(&uobject->list);
> +		kref_put(&uobject->context->ufile->ref,
> ib_uverbs_release_file);
> +		uobject->context = NULL;
> +	}
> +	mutex_unlock(&uobject->uobjects_lock->lock);
> +	kref_put(&uobject->uobjects_lock->ref,
> release_uobjects_list_lock);
> +}
> +
> +void ib_uverbs_cleanup_fd(void *private_data)
> +{
> +	struct ib_uboject *uobject = private_data - sizeof(struct
> ib_uobject);
> +
> +	kfree(uobject);
> +}
> +
> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> +			   size_t num,
> +			   const struct uverbs_action_spec *spec,
> +			   bool success)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < num; i++) {
> +		struct uverbs_attr_array *attr_spec_array = &attr_array[i];
> +		const struct uverbs_attr_group_spec *group_spec =
> +			spec->attr_groups[i];
> +		unsigned int j;
> +
> +		for (j = 0; j < attr_spec_array->num_attrs; j++) {
> +			struct uverbs_attr *attr = &attr_spec_array-
> >attrs[j];
> +			struct uverbs_attr_spec *spec = &group_spec-
> >attrs[j];
> +
> +			if (!attr->valid)
> +				continue;
> +
> +			if (spec->type == UVERBS_ATTR_TYPE_IDR ||
> +			    spec->type == UVERBS_ATTR_TYPE_FD)
> +				/*
> +				 * refcounts should be handled at the object
> +				 * level and not at the uobject level.
> +				 */
> +				uverbs_unlock_object(attr->obj_attr.uobject,
> +						     spec->obj.access, success);
> +		}
> +	}
> +}
> +
> +static unsigned int get_type_orders(const struct uverbs_types_group
> *types_group)
> +{
> +	unsigned int i;
> +	unsigned int max = 0;
> +
> +	for (i = 0; i < types_group->num_groups; i++) {
> +		unsigned int j;
> +		const struct uverbs_types *types = types_group-
> >type_groups[i];
> +
> +		for (j = 0; j < types->num_types; j++) {
> +			if (!types->types[j] || !types->types[j]->alloc)
> +				continue;
> +			if (types->types[j]->alloc->order > max)
> +				max = types->types[j]->alloc->order;
> +		}
> +	}
> +
> +	return max;
> +}
> +
> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> *ucontext,
> +					     const struct uverbs_types_group
> *types_group)
> +{
> +	unsigned int num_orders = get_type_orders(types_group);
> +	unsigned int i;
> +
> +	for (i = 0; i <= num_orders; i++) {
> +		struct ib_uobject *obj, *next_obj;
> +
> +		/*
> +		 * No need to take lock here, as cleanup should be called
> +		 * after all commands finished executing. Newly executed
> +		 * commands should fail.
> +		 */
> +		mutex_lock(&ucontext->uobjects_lock->lock);

It's really confusing to see a comment about 'no need to take lock' immediately followed by a call to lock.

> +		list_for_each_entry_safe(obj, next_obj, &ucontext-
> >uobjects,
> +					 list)
> +			if (obj->type->order == i) {
> +				if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
> +					ib_uverbs_uobject_remove(obj, false);
> +				else
> +					ib_uverbs_remove_fd(obj);
> +			}
> +		mutex_unlock(&ucontext->uobjects_lock->lock);
> +	}
> +	kref_put(&ucontext->uobjects_lock->ref,
> release_uobjects_list_lock);
> +}
> +
> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> *ucontext)

Please work on the function names.  This is horrendously long and still doesn't help describe what it does.

> +{
> +	ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
> >uobjects_lock),
> +					  GFP_KERNEL);
> +	if (!ucontext->uobjects_lock)
> +		return -ENOMEM;
> +
> +	init_uobjects_list_lock(ucontext->uobjects_lock);
> +	INIT_LIST_HEAD(&ucontext->uobjects);
> +
> +	return 0;
> +}
> +
> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> *ucontext)
> +{
> +	kfree(ucontext->uobjects_lock);
> +}

No need to wrap a call to 'free'.

> +
> diff --git a/drivers/infiniband/core/rdma_core.h
> b/drivers/infiniband/core/rdma_core.h
> new file mode 100644
> index 0000000..8990115
> --- /dev/null
> +++ b/drivers/infiniband/core/rdma_core.h
> @@ -0,0 +1,75 @@
> +/*
> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef UOBJECT_H
> +#define UOBJECT_H
> +
> +#include <linux/idr.h>
> +#include <rdma/uverbs_ioctl.h>
> +#include <rdma/ib_verbs.h>
> +#include <linux/mutex.h>
> +
> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> *ibdev,
> +					  uint16_t type);
> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> uverbs_type_alloc_action *type,
> +					    struct ib_ucontext *ucontext,
> +					    enum uverbs_idr_access access,
> +					    uint32_t idr);
> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> uverbs_type_alloc_action *type,
> +					   struct ib_ucontext *ucontext,
> +					   enum uverbs_idr_access access,
> +					   int fd);
> +void uverbs_unlock_object(struct ib_uobject *uobj,
> +			  enum uverbs_idr_access access,
> +			  bool success);
> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> +			   size_t num,
> +			   const struct uverbs_action_spec *spec,
> +			   bool success);
> +
> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> *ucontext,
> +					     const struct uverbs_types_group
> *types_group);
> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> *ucontext);
> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> *ucontext);
> +void ib_uverbs_close_fd(struct file *f);
> +void ib_uverbs_cleanup_fd(void *private_data);
> +
> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
> +{
> +	return uobj + 1;
> +}

This seems like a rather useless function.

> +
> +#endif /* UIDR_H */
> diff --git a/drivers/infiniband/core/uverbs.h
> b/drivers/infiniband/core/uverbs.h
> index 8074705..ae7d4b8 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
> *uverbs_file,
>  					struct ib_device *ib_dev,
>  					int is_async);
> +void ib_uverbs_release_file(struct kref *ref);
>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
> *uverbs_file);
>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
> 
> diff --git a/drivers/infiniband/core/uverbs_main.c
> b/drivers/infiniband/core/uverbs_main.c
> index f783723..e63357a 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
> ib_uverbs_device *dev)
>  	complete(&dev->comp);
>  }
> 
> -static void ib_uverbs_release_file(struct kref *ref)
> +void ib_uverbs_release_file(struct kref *ref)
>  {
>  	struct ib_uverbs_file *file =
>  		container_of(ref, struct ib_uverbs_file, ref);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index b5d2075..7240615 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
> 
>  struct ib_umem;
> 
> +struct ib_ucontext_lock;
> +
>  struct ib_ucontext {
>  	struct ib_device       *device;
> +	struct ib_uverbs_file  *ufile;
>  	struct list_head	pd_list;
>  	struct list_head	mr_list;
>  	struct list_head	mw_list;
> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>  	struct list_head	rwq_ind_tbl_list;
>  	int			closing;
> 
> +	/* lock for uobjects list */
> +	struct ib_ucontext_lock	*uobjects_lock;
> +	struct list_head	uobjects;
> +
>  	struct pid             *tgid;
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>  	struct rb_root      umem_tree;
> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>  #endif
>  };
> 
> +struct uverbs_object_list;
> +
> +#define OLD_ABI_COMPAT
> +
>  struct ib_uobject {
>  	u64			user_handle;	/* handle given to us by userspace
> */
>  	struct ib_ucontext     *context;	/* associated user context
> */
>  	void		       *object;		/* containing object */
>  	struct list_head	list;		/* link to context's list */
> -	int			id;		/* index into kernel idr */
> -	struct kref		ref;
> -	struct rw_semaphore	mutex;		/* protects .live */
> +	int			id;		/* index into kernel idr/fd */
> +#ifdef OLD_ABI_COMPAT
> +	struct kref             ref;
> +#endif
> +	struct rw_semaphore	usecnt;		/* protects exclusive
> access */
> +#ifdef OLD_ABI_COMPAT
> +	struct rw_semaphore     mutex;          /* protects .live */
> +#endif
>  	struct rcu_head		rcu;		/* kfree_rcu() overhead */
>  	int			live;
> +
> +	const struct uverbs_type_alloc_action *type;
> +	struct ib_ucontext_lock	*uobjects_lock;
>  };
> 
>  struct ib_udata {
> @@ -2101,6 +2120,9 @@ struct ib_device {
>  	 */
>  	int (*get_port_immutable)(struct ib_device *, u8, struct
> ib_port_immutable *);
>  	void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
> str_len);
> +	struct list_head type_list;
> +
> +	const struct uverbs_types_group	*types_group;
>  };
> 
>  struct ib_client {
> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
> new file mode 100644
> index 0000000..2f50045
> --- /dev/null
> +++ b/include/rdma/uverbs_ioctl.h
> @@ -0,0 +1,195 @@
> +/*
> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef _UVERBS_IOCTL_
> +#define _UVERBS_IOCTL_
> +
> +#include <linux/kernel.h>
> +
> +struct uverbs_object_type;
> +struct ib_ucontext;
> +struct ib_uobject;
> +struct ib_device;
> +struct uverbs_uobject_type;
> +
> +/*
> + * =======================================
> + *	Verbs action specifications
> + * =======================================
> + */

I intentionally used urdma (though condensed to 3 letters that I don't recall at the moment), rather than uverbs.  This will need to work with non-verbs devices and interfaces; again, consider how this fits with the rdma cm.  Verbs has a very specific meaning, which gets lost if we start referring to everything as 'verbs'.  It's bad enough that we're stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also means ethernet and rdma means nothing.

> +
> +enum uverbs_attr_type {
> +	UVERBS_ATTR_TYPE_PTR_IN,
> +	UVERBS_ATTR_TYPE_PTR_OUT,
> +	UVERBS_ATTR_TYPE_IDR,
> +	UVERBS_ATTR_TYPE_FD,
> +};
> +
> +enum uverbs_idr_access {
> +	UVERBS_IDR_ACCESS_READ,
> +	UVERBS_IDR_ACCESS_WRITE,
> +	UVERBS_IDR_ACCESS_NEW,
> +	UVERBS_IDR_ACCESS_DESTROY
> +};
> +
> +struct uverbs_attr_spec {
> +	u16				len;
> +	enum uverbs_attr_type		type;
> +	struct {
> +		u16			obj_type;
> +		u8			access;

Is access intended to be an enum uverbs_idr_access value?

> +	} obj;

I would remove (flatten) the substructure and re-order the fields for better alignment.
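
A minimal sketch of the flattened layout, using stand-in `demo_*` types (hypothetical, with `<stdint.h>` types in place of the kernel's `u16`/`u8`). Note that on common ABIs where the enum is four bytes, both layouts are likely to come out the same size; the reorder mainly moves the padding hole from the middle of the struct to its tail:

```c
#include <stddef.h>
#include <stdint.h>

enum demo_attr_type {
	DEMO_ATTR_TYPE_PTR_IN,
	DEMO_ATTR_TYPE_PTR_OUT,
	DEMO_ATTR_TYPE_IDR,
	DEMO_ATTR_TYPE_FD,
};

/* Layout as posted: a 16-bit field, the enum, then a nested struct. */
struct demo_attr_spec_nested {
	uint16_t		len;
	enum demo_attr_type	type;
	struct {
		uint16_t	obj_type;
		uint8_t		access;
	} obj;
};

/* Flattened: widest member first, then fields in decreasing size. */
struct demo_attr_spec_flat {
	enum demo_attr_type	type;
	uint16_t		len;
	uint16_t		obj_type;
	uint8_t			access;
};

/* Helper so both sizes can be compared at run time. */
static size_t demo_spec_size_delta(void)
{
	return sizeof(struct demo_attr_spec_nested) -
	       sizeof(struct demo_attr_spec_flat);
}
```

Accessors change from `spec->obj.access` to `spec->access`, which is the readability gain even when the size does not shrink.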

> +};
> +
> +struct uverbs_attr_group_spec {
> +	struct uverbs_attr_spec		*attrs;
> +	size_t				num_attrs;
> +};
> +
> +struct uverbs_action_spec {
> +	const struct uverbs_attr_group_spec		**attr_groups;
> +	/* if > 0 -> validator, otherwise, error */

I'm not sure what this comment means.

> +	int (*dist)(__u16 *attr_id, void *priv);

What does 'dist' stand for?

> +	void						*priv;
> +	size_t						num_groups;
> +};
> +
> +struct uverbs_attr_array;
> +struct ib_uverbs_file;
> +
> +struct uverbs_action {
> +	struct uverbs_action_spec spec;
> +	void *priv;
> +	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
> *ufile,
> +		       struct uverbs_attr_array *ctx, size_t num, void
> *priv);
> +};
> +
> +struct uverbs_type_alloc_action;
> +typedef void (*free_type)(const struct uverbs_type_alloc_action
> *uobject_type,
> +			  struct ib_uobject *uobject);
> +
> +struct uverbs_type_alloc_action {
> +	enum uverbs_attr_type		type;
> +	int				order;

I think this is being used as destroy order, in which case I would rename it to clarify the intent.  Though I'd prefer we come up with a more efficient destruction mechanism than the repeated nested looping.
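
One possible shape for this, sketched in plain userspace C under the assumption that the number of distinct destroy orders is small and bounded: keep one list per order, so cleanup visits every object exactly once instead of rescanning the whole list for each order. All `demo_*` names and `DEMO_MAX_ORDER` are hypothetical:

```c
#include <stddef.h>

#define DEMO_MAX_ORDER 4	/* hypothetical bound on destroy_order values */

struct demo_obj {
	unsigned int	destroy_order;	/* lower values are destroyed first */
	int		id;
	struct demo_obj	*next;
};

/* One singly linked list per destroy order. */
struct demo_ctx {
	struct demo_obj	*buckets[DEMO_MAX_ORDER];
};

/* Small log used by the sample callback to record destruction order. */
struct demo_log {
	int	ids[8];
	size_t	count;
};

/* File the object under its destroy order; -1 if the order is out of range. */
static int demo_ctx_add(struct demo_ctx *ctx, struct demo_obj *obj)
{
	if (obj->destroy_order >= DEMO_MAX_ORDER)
		return -1;
	obj->next = ctx->buckets[obj->destroy_order];
	ctx->buckets[obj->destroy_order] = obj;
	return 0;
}

/* Walk the buckets in ascending order; each object is visited once. */
static size_t demo_ctx_cleanup(struct demo_ctx *ctx,
			       void (*destroy)(struct demo_obj *, void *),
			       void *arg)
{
	size_t destroyed = 0;
	unsigned int order;

	for (order = 0; order < DEMO_MAX_ORDER; order++) {
		struct demo_obj *obj = ctx->buckets[order];

		ctx->buckets[order] = NULL;
		while (obj) {
			struct demo_obj *next = obj->next;

			destroy(obj, arg);
			obj = next;
			destroyed++;
		}
	}
	return destroyed;
}

/* Sample callback standing in for free_fn: records the id it was given. */
static void demo_log_destroy(struct demo_obj *obj, void *arg)
{
	struct demo_log *log = arg;

	if (log->count < 8)
		log->ids[log->count++] = obj->id;
}
```

The cost is one list head per order in the context, in exchange for cleanup that is linear in the number of objects.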

> +	size_t				obj_size;

This could be an alloc_fn instead.

> +	free_type			free_fn;
> +	struct {
> +		const struct file_operations	*fops;
> +		const char			*name;
> +		int				flags;
> +	} fd;
> +};
> +
> +struct uverbs_type_actions_group {
> +	size_t					num_actions;
> +	const struct uverbs_action		**actions;
> +};
> +
> +struct uverbs_type {
> +	size_t					num_groups;
> +	const struct uverbs_type_actions_group	**action_groups;
> +	const struct uverbs_type_alloc_action	*alloc;
> +	int (*dist)(__u16 *action_id, void *priv);
> +	void					*priv;
> +};
> +
> +struct uverbs_types {
> +	size_t					num_types;
> +	const struct uverbs_type		**types;
> +};
> +
> +struct uverbs_types_group {
> +	const struct uverbs_types		**type_groups;
> +	size_t					num_groups;
> +	int (*dist)(__u16 *type_id, void *priv);
> +	void					*priv;
> +};
> +
> +/* =================================================
> + *              Parsing infrastructure
> + * =================================================
> + */
> +
> +struct uverbs_ptr_attr {
> +	void	* __user ptr;
> +	__u16		len;
> +};
> +
> +struct uverbs_fd_attr {
> +	int		fd;
> +};
> +
> +struct uverbs_uobj_attr {
> +	/*  idr handle */
> +	__u32	idr;
> +};
> +
> +struct uverbs_obj_attr {
> +	/* pointer to the kernel descriptor -> type, access, etc */
> +	const struct uverbs_attr_spec *val;
> +	struct ib_uverbs_attr __user	*uattr;
> +	const struct uverbs_type_alloc_action	*type;
> +	struct ib_uobject		*uobject;
> +	union {
> +		struct uverbs_fd_attr		fd;
> +		struct uverbs_uobj_attr		uobj;
> +	};
> +};
> +
> +struct uverbs_attr {
> +	bool valid;
> +	union {
> +		struct uverbs_ptr_attr	cmd_attr;
> +		struct uverbs_obj_attr	obj_attr;
> +	};
> +};

It's odd to have a union that's part of a structure without some field to indicate which union field is accessible.
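
To illustrate, here is a small sketch of the usual tagged-union pattern, with simplified stand-in types; the `kind` field and all `demo_*` names are hypothetical and not in the posted patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Discriminator naming the active union member. */
enum demo_attr_kind {
	DEMO_ATTR_KIND_PTR,
	DEMO_ATTR_KIND_OBJ,
};

struct demo_ptr_attr {
	void		*ptr;
	uint16_t	len;
};

struct demo_obj_attr {
	uint32_t	idr;
};

struct demo_attr {
	bool			valid;
	enum demo_attr_kind	kind;	/* selects cmd_attr or obj_attr */
	union {
		struct demo_ptr_attr	cmd_attr;
		struct demo_obj_attr	obj_attr;
	};
};

/* Hand back the idr handle only when the attribute really holds an object. */
static bool demo_attr_get_idr(const struct demo_attr *attr, uint32_t *idr)
{
	if (!attr->valid || attr->kind != DEMO_ATTR_KIND_OBJ)
		return false;

	*idr = attr->obj_attr.idr;
	return true;
}
```

In the patch as posted, the discriminator appears to live in the matching spec entry (`spec->type`); carrying it in the attribute itself lets accessors check it without the spec in hand.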

> +
> +/* output of one validator */
> +struct uverbs_attr_array {
> +	size_t num_attrs;
> +	/* arrays of attrubytes, index is the id i.e SEND_CQ */
> +	struct uverbs_attr *attrs;
> +};
> +
> +/* =================================================
> + *              Types infrastructure
> + * =================================================
> + */
> +
> +int ib_uverbs_uobject_type_add(struct list_head	*head,
> +			       void (*free)(struct uverbs_uobject_type *type,
> +					    struct ib_uobject *uobject,
> +					    struct ib_ucontext *ucontext),
> +			       uint16_t	obj_type);
> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
> +
> +#endif
> --
> 2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Post userspace patches here, or pull requests to github?
From: Jeff Squyres (jsquyres) @ 2016-10-30 19:11 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161030180356.GA25939-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Oct 30, 2016, at 2:03 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> 
> We have been asking for posting here as well..

Ok.

> Are you still using usnic with verbs? I thought you moved everything
> to libfabric? Should the usnic verbs provider be added to rdma-core?

The usnic_verbs driver is upstream in the kernel, and ib core returns that long string; that's the primary reason for this fix.

I don't think there's a need to have a usnic verbs provider in rdma-core at this time.

-- 
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/


^ permalink raw reply

* Re: Post userspace patches here, or pull requests to github?
From: Jason Gunthorpe @ 2016-10-30 18:03 UTC (permalink / raw)
  To: Jeff Squyres (jsquyres)
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <2DD5ED3D-7F86-4AAF-863B-F2ED30B9A798-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

On Sun, Oct 30, 2016 at 12:57:37PM +0000, Jeff Squyres (jsquyres) wrote:
> I posted a bug fix pull request for libibverbs yesterday (fixing a userspace/kernel space buffer size mismatch):
> 
>    https://github.com/linux-rdma/rdma-core/pull/31
> 
> Is that sufficient, or does it also need to be posted here?

We have been asking for posting here as well..

Are you still using usnic with verbs? I thought you moved everything
to libfabric? Should the usnic verbs provider be added to rdma-core?

Jason

^ permalink raw reply

* Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: David Miller @ 2016-10-30 16:02 UTC (permalink / raw)
  To: saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	talal-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <CALzJLG8cN0VUiTHDdkgibObA970UsAP+E7E=DSgY1RKNefSyzA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Saeed Mahameed <saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Date: Sun, 30 Oct 2016 11:59:57 +0200

> On Fri, Oct 28, 2016 at 7:53 PM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>>
>> I really dislike pull requests of this form.
>>
>> You add lots of datastructures and helper functions but no actual
>> users of these facilities to the driver.
>>
>> Do this instead:
>>
>>         1) Add TSAR infrastructure
>>         2) Add use of TSAR facilities to the driver
>>
>> That's one pull request.
>>
>> I don't care if this is hard, or if there are entanglements with
>> Infiniband or whatever, you must submit changes in this manner.
>>
> 
> It is not hard, it is just not right,  we have lots of IB and ETH
> features that we would like to submit in the same kernel cycle,
> with your suggestion I will have to almost submit every feature (core
> infrastructure and netdev/RDMA usage)
> to you and Doug.

Nobody can properly review an API addition without seeing how that
API is _USED_.

This is a simple fundamental fact.

And I'm not pulling in code that can't be reviewed properly.

Also, so many times people have added new junk to drivers and months
later never added the users of that new code and interfaces.

Forcing you to provide the use with the API addition makes sure that
it is absolutely impossible for that to happen.

Whatever issues you think prevent this are your issues, not mine.  I
want high quality submissions that can be properly reviewed, and you
have to find a way to satisfy that requirement.


^ permalink raw reply

* Post userspace patches here, or pull requests to github?
From: Jeff Squyres (jsquyres) @ 2016-10-30 12:57 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I posted a bug fix pull request for libibverbs yesterday (fixing a userspace/kernel space buffer size mismatch):

   https://github.com/linux-rdma/rdma-core/pull/31

Is that sufficient, or does it also need to be posted here?

-- 
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/


^ permalink raw reply

* RFC if==else in i40iw_virtchnl.
From: Nicholas Mc Guire @ 2016-10-30 12:33 UTC (permalink / raw)
  To: Faisal Latif; +Cc: linux-rdma, linux-kernel

Hi Faisal !

 your commit 4097351a47c5 ("i40iw: virtual channel handling files")
 adds the following lines in i40iw_vchnl_recv_pf()

+               if (vchnl_msg->iw_op_ver != I40IW_VCHNL_OP_GET_VER_V0)
+                       vchnl_pf_send_get_ver_resp(dev, vf_id, vchnl_msg);
+               else
+                       vchnl_pf_send_get_ver_resp(dev, vf_id, vchnl_msg);
+               return I40IW_SUCCESS;

 as if==else here, this looks buggy. If it is intended, it at least needs
 a comment/explanation, but it did not seem to make much sense in this
 form. (Note this is the only place where vchnl_pf_send_get_ver_resp() is
 being called, so the intent is not clear from code review and thus no patch
 can be reasonably suggested.)
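
 To make the concern concrete, here is a minimal stand-alone sketch. It is
 not the driver code: OP_GET_VER_V0, resp_calls and the *_stub/handle_*
 names are hypothetical stand-ins, and the stub only counts calls. It shows
 that when both branches make the same call, the version test has no
 observable effect, so the conditional collapses to a single call.

```c
#include <stdio.h>

/* Hypothetical stand-in for I40IW_VCHNL_OP_GET_VER_V0; not the real value. */
#define OP_GET_VER_V0 0

/* Counts how often the stub responder runs. */
static int resp_calls;

/* Stub standing in for vchnl_pf_send_get_ver_resp(); it only counts calls. */
static void send_get_ver_resp_stub(int vf_id)
{
	(void)vf_id;
	resp_calls++;
}

/* Same shape as the quoted hunk: both branches make the identical call. */
static void handle_get_ver_as_written(int iw_op_ver, int vf_id)
{
	if (iw_op_ver != OP_GET_VER_V0)
		send_get_ver_resp_stub(vf_id);
	else
		send_get_ver_resp_stub(vf_id);
}

/* Behaviourally equivalent form: the version argument is never consulted. */
static void handle_get_ver_collapsed(int iw_op_ver, int vf_id)
{
	(void)iw_op_ver;
	send_get_ver_resp_stub(vf_id);
}
```

 Whether the right fix is to collapse the branch or to send a different
 response for unsupported versions cannot be told from the code alone.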

thx!
hofrat

^ permalink raw reply

* Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed @ 2016-10-30  9:59 UTC (permalink / raw)
  To: David Miller
  Cc: Saeed Mahameed, Doug Ledford, Linux Netdev List,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Leon Romanovsky,
	Tal Alon, Matan Barak
In-Reply-To: <20161028.135309.1712496950641242201.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Fri, Oct 28, 2016 at 7:53 PM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>
> I really dislike pull requests of this form.
>
> You add lots of datastructures and helper functions but no actual
> users of these facilities to the driver.
>
> Do this instead:
>
>         1) Add TSAR infrastructure
>         2) Add use of TSAR facilities to the driver
>
> That's one pull request.
>
> I don't care if this is hard, or if there are entanglements with
> Infiniband or whatever, you must submit changes in this manner.
>

It is not hard, it is just not right. We have lots of IB and ETH
features that we would like to submit in the same kernel cycle;
with your suggestion I will have to submit almost every feature (core
infrastructure and netdev/RDMA usage)
to you and Doug.  Same for rdma features: you will receive PULL
requests for them as well.
I am sure you and the netdev list don't need such noise.  Do not
forget that this will slow down mlx5 progress since
netdev will block rdma and vice versa.

> I will not accept additions to a driver that don't even get really
> used.

For patches containing logic/helper functions, such as "Add TSAR
infrastructure", I agree, and I can find a way to move some code around
to avoid future conflicts and remove them from such pull requests.

But you need to at least accept infrastructure patches that only add
hardware-related structures to shared code such as
include/linux/mlx5/mlx5_ifc.h, where we have only hardware definitions
and those patches are really minimal.

So bottom line, I will do my best to ensure future PULL requests will
contain only include/linux/mlx5/*.h hardware related definitions
or fully implemented features.

Can we agree on that ?

Thanks,
Saeed.

^ permalink raw reply
