Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net-next 07/14] net/mlx5: E-Switch, Have explicit API to delete fwd rules
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Or Gerlitz, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Be symmetric with the e-switch API to add rules which has a
specific function to add fwd rules which are used as part of
vport mirroring.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c           | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h         | 4 ++++
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c    | 8 ++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a4a432f02930..7487bdd55f23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -920,7 +920,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 	if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
 		flow->flags &= ~MLX5E_TC_FLOW_OFFLOADED;
 		if (attr->mirror_count)
-			mlx5_eswitch_del_offloaded_rule(esw, flow->rule[1], attr);
+			mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
 		mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
 	}
 
@@ -996,7 +996,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 
 			flow->flags &= ~MLX5E_TC_FLOW_OFFLOADED;
 			if (attr->mirror_count)
-				mlx5_eswitch_del_offloaded_rule(esw, flow->rule[1], attr);
+				mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
 			mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
 		}
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1698a322a7c4..584e735bbad1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -231,6 +231,10 @@ void
 mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_handle *rule,
 				struct mlx5_esw_flow_attr *attr);
+void
+mlx5_eswitch_del_fwd_rule(struct mlx5_eswitch *esw,
+			  struct mlx5_flow_handle *rule,
+			  struct mlx5_esw_flow_attr *attr);
 
 struct mlx5_flow_handle *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 39932dce15cb..983bb8a80f75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -194,6 +194,14 @@ mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 	esw->offloads.num_flows--;
 }
 
+void
+mlx5_eswitch_del_fwd_rule(struct mlx5_eswitch *esw,
+			  struct mlx5_flow_handle *rule,
+			  struct mlx5_esw_flow_attr *attr)
+{
+	mlx5_eswitch_del_offloaded_rule(esw, rule, attr);
+}
+
 static int esw_set_global_vlan_pop(struct mlx5_eswitch *esw, u8 val)
 {
 	struct mlx5_eswitch_rep *rep;
-- 
2.17.2

^ permalink raw reply related

* [net-next 06/14] net/mlx5: Split FDB fast path prio to multiple namespaces
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Paul Blakey, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Paul Blakey <paulb@mellanox.com>

Towards supporting multi-chains and priorities, split the FDB fast path
to multiple namespaces (sub namespaces), each with multiple priorities.

This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is
like current FS_TYPE_PRIO, but may contain only namespaces, and those
will be in parallel to one another in terms of managing of the flow
tables connections inside them. Meaning, while searching for the next
or previous flow table to connect for a new table inside such namespace
we skip the parallel namespaces in the same level under the
FS_TYPE_PRIO_CHAINS prio we originated from.

We use this new type for splitting the fast path prio into multiple
parallel namespaces, each containing normal prios.
The prios inside them (and their tables) will be connected to one
another, but not from one parallel namespace to another, instead the
last prio in each namespace will be connected to the next prio in
the containing FDB namespace, which is the slow path prio.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 88 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/fs_core.h | 13 +++
 include/linux/mlx5/fs.h                       |  2 +
 5 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 9c893d7d273e..d004957328f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -263,7 +263,7 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
 	esw_debug(dev, "Create FDB log_max_size(%d)\n",
 		  MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
 
-	root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
+	root_ns = mlx5_get_fdb_sub_ns(dev, 0);
 	if (!root_ns) {
 		esw_warn(dev, "Failed to get FDB flow namespace\n");
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index c1b627577003..1698a322a7c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -59,6 +59,9 @@
 #define mlx5_esw_has_fwd_fdb(dev) \
 	MLX5_CAP_ESW_FLOWTABLE(dev, fdb_multi_path_to_table)
 
+#define FDB_MAX_CHAIN 3
+#define FDB_MAX_PRIO 16
+
 struct vport_ingress {
 	struct mlx5_flow_table *acl;
 	struct mlx5_flow_group *allow_untagged_spoofchk_grp;
@@ -319,6 +322,10 @@ static inline void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw) {}
 static inline void mlx5_eswitch_vport_event(struct mlx5_eswitch *esw, struct mlx5_eqe *eqe) {}
 static inline int  mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode) { return 0; }
 static inline void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw) {}
+
+#define FDB_MAX_CHAIN 1
+#define FDB_MAX_PRIO 1
+
 #endif /* CONFIG_MLX5_ESWITCH */
 
 #endif /* __MLX5_ESWITCH_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index cdcbf9d0ae6c..7eb6d58733ac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,6 +40,7 @@
 #include "diag/fs_tracepoint.h"
 #include "accel/ipsec.h"
 #include "fpga/ipsec.h"
+#include "eswitch.h"
 
 #define INIT_TREE_NODE_ARRAY_SIZE(...)	(sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
 					 sizeof(struct init_tree_node))
@@ -713,7 +714,7 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node  *root,
 	struct fs_node *iter = list_entry(start, struct fs_node, list);
 	struct mlx5_flow_table *ft = NULL;
 
-	if (!root)
+	if (!root || root->type == FS_TYPE_PRIO_CHAINS)
 		return NULL;
 
 	list_for_each_advance_continue(iter, &root->children, reverse) {
@@ -1973,6 +1974,18 @@ void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
 			       fg->id);
 }
 
+struct mlx5_flow_namespace *mlx5_get_fdb_sub_ns(struct mlx5_core_dev *dev,
+						int n)
+{
+	struct mlx5_flow_steering *steering = dev->priv.steering;
+
+	if (!steering || !steering->fdb_sub_ns)
+		return NULL;
+
+	return steering->fdb_sub_ns[n];
+}
+EXPORT_SYMBOL(mlx5_get_fdb_sub_ns);
+
 struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 						    enum mlx5_flow_namespace_type type)
 {
@@ -2051,8 +2064,10 @@ struct mlx5_flow_namespace *mlx5_get_flow_vport_acl_namespace(struct mlx5_core_d
 	}
 }
 
-static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
-				      unsigned int prio, int num_levels)
+static struct fs_prio *_fs_create_prio(struct mlx5_flow_namespace *ns,
+				       unsigned int prio,
+				       int num_levels,
+				       enum fs_node_type type)
 {
 	struct fs_prio *fs_prio;
 
@@ -2060,7 +2075,7 @@ static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
 	if (!fs_prio)
 		return ERR_PTR(-ENOMEM);
 
-	fs_prio->node.type = FS_TYPE_PRIO;
+	fs_prio->node.type = type;
 	tree_init_node(&fs_prio->node, NULL, del_sw_prio);
 	tree_add_node(&fs_prio->node, &ns->node);
 	fs_prio->num_levels = num_levels;
@@ -2070,6 +2085,19 @@ static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
 	return fs_prio;
 }
 
+static struct fs_prio *fs_create_prio_chained(struct mlx5_flow_namespace *ns,
+					      unsigned int prio,
+					      int num_levels)
+{
+	return _fs_create_prio(ns, prio, num_levels, FS_TYPE_PRIO_CHAINS);
+}
+
+static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
+				      unsigned int prio, int num_levels)
+{
+	return _fs_create_prio(ns, prio, num_levels, FS_TYPE_PRIO);
+}
+
 static struct mlx5_flow_namespace *fs_init_namespace(struct mlx5_flow_namespace
 						     *ns)
 {
@@ -2374,6 +2402,9 @@ void mlx5_cleanup_fs(struct mlx5_core_dev *dev)
 	cleanup_egress_acls_root_ns(dev);
 	cleanup_ingress_acls_root_ns(dev);
 	cleanup_root_ns(steering->fdb_root_ns);
+	steering->fdb_root_ns = NULL;
+	kfree(steering->fdb_sub_ns);
+	steering->fdb_sub_ns = NULL;
 	cleanup_root_ns(steering->sniffer_rx_root_ns);
 	cleanup_root_ns(steering->sniffer_tx_root_ns);
 	cleanup_root_ns(steering->egress_root_ns);
@@ -2419,27 +2450,64 @@ static int init_sniffer_rx_root_ns(struct mlx5_flow_steering *steering)
 
 static int init_fdb_root_ns(struct mlx5_flow_steering *steering)
 {
-	struct fs_prio *prio;
+	struct mlx5_flow_namespace *ns;
+	struct fs_prio *maj_prio;
+	struct fs_prio *min_prio;
+	int levels;
+	int chain;
+	int prio;
+	int err;
 
 	steering->fdb_root_ns = create_root_ns(steering, FS_FT_FDB);
 	if (!steering->fdb_root_ns)
 		return -ENOMEM;
 
-	prio = fs_create_prio(&steering->fdb_root_ns->ns, 0, 2);
-	if (IS_ERR(prio))
+	steering->fdb_sub_ns = kzalloc(sizeof(steering->fdb_sub_ns) *
+				       FDB_MAX_CHAIN + 1, GFP_KERNEL);
+	if (!steering->fdb_sub_ns)
+		return -ENOMEM;
+
+	levels = 2 * FDB_MAX_PRIO * (FDB_MAX_CHAIN + 1);
+	maj_prio = fs_create_prio_chained(&steering->fdb_root_ns->ns, 0,
+					  levels);
+	if (IS_ERR(maj_prio)) {
+		err = PTR_ERR(maj_prio);
 		goto out_err;
+	}
 
-	prio = fs_create_prio(&steering->fdb_root_ns->ns, 1, 1);
-	if (IS_ERR(prio))
+	for (chain = 0; chain <= FDB_MAX_CHAIN; chain++) {
+		ns = fs_create_namespace(maj_prio);
+		if (IS_ERR(ns)) {
+			err = PTR_ERR(ns);
+			goto out_err;
+		}
+
+		for (prio = 0; prio < FDB_MAX_PRIO * (chain + 1); prio++) {
+			min_prio = fs_create_prio(ns, prio, 2);
+			if (IS_ERR(min_prio)) {
+				err = PTR_ERR(min_prio);
+				goto out_err;
+			}
+		}
+
+		steering->fdb_sub_ns[chain] = ns;
+	}
+
+	maj_prio = fs_create_prio(&steering->fdb_root_ns->ns, 1, 1);
+	if (IS_ERR(maj_prio)) {
+		err = PTR_ERR(maj_prio);
 		goto out_err;
+	}
 
 	set_prio_attrs(steering->fdb_root_ns);
 	return 0;
 
 out_err:
 	cleanup_root_ns(steering->fdb_root_ns);
+	kfree(steering->fdb_sub_ns);
+	steering->fdb_sub_ns = NULL;
 	steering->fdb_root_ns = NULL;
-	return PTR_ERR(prio);
+	return err;
 }
 
 static int init_egress_acl_root_ns(struct mlx5_flow_steering *steering, int vport)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index a06f83c0c2b6..b51ad217da32 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -38,9 +38,21 @@
 #include <linux/rhashtable.h>
 #include <linux/llist.h>
 
+/* FS_TYPE_PRIO_CHAINS is a PRIO that will have namespaces only,
+ * and those are in parallel to one another when going over them to connect
+ * a new flow table. Meaning the last flow table in a TYPE_PRIO prio in one
+ * parallel namespace will not automatically connect to the first flow table
+ * found in any prio in any next namespace, but skip the entire containing
+ * TYPE_PRIO_CHAINS prio.
+ *
+ * This is used to implement tc chains, each chain of prios is a different
+ * namespace inside a containing TYPE_PRIO_CHAINS prio.
+ */
+
 enum fs_node_type {
 	FS_TYPE_NAMESPACE,
 	FS_TYPE_PRIO,
+	FS_TYPE_PRIO_CHAINS,
 	FS_TYPE_FLOW_TABLE,
 	FS_TYPE_FLOW_GROUP,
 	FS_TYPE_FLOW_ENTRY,
@@ -73,6 +85,7 @@ struct mlx5_flow_steering {
 	struct kmem_cache               *ftes_cache;
 	struct mlx5_flow_root_namespace *root_ns;
 	struct mlx5_flow_root_namespace *fdb_root_ns;
+	struct mlx5_flow_namespace	**fdb_sub_ns;
 	struct mlx5_flow_root_namespace **esw_egress_root_ns;
 	struct mlx5_flow_root_namespace **esw_ingress_root_ns;
 	struct mlx5_flow_root_namespace	*sniffer_tx_root_ns;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index a5fc62184195..f8d00872c7d3 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -101,6 +101,8 @@ struct mlx5_flow_destination {
 	};
 };
 
+struct mlx5_flow_namespace *
+mlx5_get_fdb_sub_ns(struct mlx5_core_dev *dev, int n);
 struct mlx5_flow_namespace *
 mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 			enum mlx5_flow_namespace_type type);
-- 
2.17.2

^ permalink raw reply related

* [net-next 05/14] net/mlx5: Add cap bits for multi fdb encap
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Paul Blakey, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Paul Blakey <paulb@mellanox.com>

If set, the firmware supports creating of flow tables with encap
enabled while VFs are configured, if we already created one
(restriction still applies on the first creation).

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 15e36198f85f..963611820006 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -584,7 +584,9 @@ struct mlx5_ifc_flow_table_nic_cap_bits {
 struct mlx5_ifc_flow_table_eswitch_cap_bits {
 	u8      reserved_at_0[0x1c];
 	u8      fdb_multi_path_to_table[0x1];
-	u8      reserved_at_1d[0x1e3];
+	u8      reserved_at_1d[0x1];
+	u8      multi_fdb_encap[0x1];
+	u8      reserved_at_1e[0x1e1];

 	struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_nic_esw_fdb;

-- 
2.17.2

^ permalink raw reply related

* [net-next 03/14] net/mlx5e: Change return type of tc add flow functions
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Rabie Loulou, Shahar Klein, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Rabie Loulou <rabiel@mellanox.com>

Refactor the flow add utility functions to return err code instead of rule
pointers. This will allow for simpler logic when one tc rule is
duplicated to two HW rules in downstream patches.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 86 +++++++++----------
 1 file changed, 39 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5ce87f54852d..861986f82844 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -673,7 +673,7 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
 	}
 }
 
-static struct mlx5_flow_handle *
+static int
 mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		      struct mlx5e_tc_flow_parse_attr *parse_attr,
 		      struct mlx5e_tc_flow *flow,
@@ -689,14 +689,12 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		.reformat_id = 0,
 	};
 	struct mlx5_fc *counter = NULL;
-	struct mlx5_flow_handle *rule;
 	bool table_created = false;
 	int err, dest_ix = 0;
 
 	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN) {
 		err = mlx5e_hairpin_flow_add(priv, flow, parse_attr, extack);
 		if (err) {
-			rule = ERR_PTR(err);
 			goto err_add_hairpin_flow;
 		}
 		if (flow->flags & MLX5E_TC_FLOW_HAIRPIN_RSS) {
@@ -716,7 +714,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		counter = mlx5_fc_create(dev, true);
 		if (IS_ERR(counter)) {
-			rule = ERR_CAST(counter);
+			err = PTR_ERR(counter);
 			goto err_fc_create;
 		}
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
@@ -729,10 +727,8 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
 		flow_act.modify_id = attr->mod_hdr_id;
 		kfree(parse_attr->mod_hdr_actions);
-		if (err) {
-			rule = ERR_PTR(err);
+		if (err)
 			goto err_create_mod_hdr_id;
-		}
 	}
 
 	if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
@@ -758,7 +754,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 					   "Failed to create tc offload table\n");
 			netdev_err(priv->netdev,
 				   "Failed to create tc offload table\n");
-			rule = ERR_CAST(priv->fs.tc.t);
+			err = PTR_ERR(priv->fs.tc.t);
 			goto err_create_ft;
 		}
 
@@ -768,13 +764,15 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	if (attr->match_level != MLX5_MATCH_NONE)
 		parse_attr->spec.match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 
-	rule = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
-				   &flow_act, dest, dest_ix);
+	flow->rule[0] = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
+					    &flow_act, dest, dest_ix);
 
-	if (IS_ERR(rule))
+	if (IS_ERR(flow->rule[0])) {
+		err = PTR_ERR(flow->rule[0]);
 		goto err_add_rule;
+	}
 
-	return rule;
+	return 0;
 
 err_add_rule:
 	if (table_created) {
@@ -790,7 +788,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN)
 		mlx5e_hairpin_flow_del(priv, flow);
 err_add_hairpin_flow:
-	return rule;
+	return err;
 }
 
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
@@ -825,7 +823,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 			      struct mlx5e_tc_flow *flow,
 			      struct netlink_ext_ack *extack);
 
-static struct mlx5_flow_handle *
+static int
 mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		      struct mlx5e_tc_flow_parse_attr *parse_attr,
 		      struct mlx5e_tc_flow *flow,
@@ -834,21 +832,20 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct net_device *out_dev, *encap_dev = NULL;
-	struct mlx5_flow_handle *rule = NULL;
 	struct mlx5_fc *counter = NULL;
 	struct mlx5e_rep_priv *rpriv;
 	struct mlx5e_priv *out_priv;
-	int err;
+	int err = 0, encap_err = 0;
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT) {
 		out_dev = __dev_get_by_index(dev_net(priv->netdev),
 					     attr->parse_attr->mirred_ifindex);
-		err = mlx5e_attach_encap(priv, &parse_attr->tun_info,
-					 out_dev, &encap_dev, flow, extack);
-		if (err) {
-			rule = ERR_PTR(err);
-			if (err != -EAGAIN)
-				goto err_attach_encap;
+		encap_err = mlx5e_attach_encap(priv, &parse_attr->tun_info,
+					       out_dev, &encap_dev, flow,
+					       extack);
+		if (encap_err && encap_err != -EAGAIN) {
+			err = encap_err;
+			goto err_attach_encap;
 		}
 		out_priv = netdev_priv(encap_dev);
 		rpriv = out_priv->ppriv;
@@ -857,49 +854,49 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	}
 
 	err = mlx5_eswitch_add_vlan_action(esw, attr);
-	if (err) {
-		rule = ERR_PTR(err);
+	if (err)
 		goto err_add_vlan;
-	}
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
 		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
 		kfree(parse_attr->mod_hdr_actions);
-		if (err) {
-			rule = ERR_PTR(err);
+		if (err)
 			goto err_mod_hdr;
-		}
 	}
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		counter = mlx5_fc_create(esw->dev, true);
 		if (IS_ERR(counter)) {
-			rule = ERR_CAST(counter);
+			err = PTR_ERR(counter);
 			goto err_create_counter;
 		}
 
 		attr->counter = counter;
 	}
 
-	/* we get here if (1) there's no error (rule being null) or when
+	/* we get here if (1) there's no error or when
 	 * (2) there's an encap action and we're on -EAGAIN (no valid neigh)
 	 */
-	if (rule != ERR_PTR(-EAGAIN)) {
-		rule = mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
-		if (IS_ERR(rule))
+	if (encap_err != -EAGAIN) {
+		flow->rule[0] = mlx5_eswitch_add_offloaded_rule(esw, &parse_attr->spec, attr);
+		if (IS_ERR(flow->rule[0])) {
+			err = PTR_ERR(flow->rule[0]);
 			goto err_add_rule;
+		}
 
 		if (attr->mirror_count) {
 			flow->rule[1] = mlx5_eswitch_add_fwd_rule(esw, &parse_attr->spec, attr);
-			if (IS_ERR(flow->rule[1]))
+			if (IS_ERR(flow->rule[1])) {
+				err = PTR_ERR(flow->rule[1]);
 				goto err_fwd_rule;
+			}
 		}
 	}
-	return rule;
+
+	return encap_err;
 
 err_fwd_rule:
-	mlx5_eswitch_del_offloaded_rule(esw, rule, attr);
-	rule = flow->rule[1];
+	mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
 err_add_rule:
 	mlx5_fc_destroy(esw->dev, counter);
 err_create_counter:
@@ -911,7 +908,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT)
 		mlx5e_detach_encap(priv, flow);
 err_attach_encap:
-	return rule;
+	return err;
 }
 
 static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
@@ -2959,22 +2956,17 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv,
 					   extack);
 		if (err < 0)
 			goto err_free;
-		flow->rule[0] = mlx5e_tc_add_fdb_flow(priv, parse_attr, flow,
-						      extack);
+		err = mlx5e_tc_add_fdb_flow(priv, parse_attr, flow, extack);
 	} else {
 		err = parse_tc_nic_actions(priv, f->exts, parse_attr, flow,
 					   extack);
 		if (err < 0)
 			goto err_free;
-		flow->rule[0] = mlx5e_tc_add_nic_flow(priv, parse_attr, flow,
-						      extack);
+		err = mlx5e_tc_add_nic_flow(priv, parse_attr, flow, extack);
 	}
 
-	if (IS_ERR(flow->rule[0])) {
-		err = PTR_ERR(flow->rule[0]);
-		if (err != -EAGAIN)
-			goto err_free;
-	}
+	if (err && err != -EAGAIN)
+		goto err_free;
 
 	if (err != -EAGAIN)
 		flow->flags |= MLX5E_TC_FLOW_OFFLOADED;
-- 
2.17.2

^ permalink raw reply related

* [net-next 04/14] net/mlx5e: Split TC add rule path for nic vs e-switch
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Roi Dayan, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Roi Dayan <roid@mellanox.com>

Move to have clear separation on the code path to add nic vs e-switch
flows. While here we break the code that deals with adding offloaded
TC tool to few smaller stages, each on helper function.

Besides getting us simpler and readable code, these are pre-steps
for being able to have two HW flows serving one SW TC flow for some
e-switch use cases.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 185 +++++++++++++-----
 1 file changed, 138 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 861986f82844..a4a432f02930 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2907,34 +2907,15 @@ static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv)
 		return &priv->fs.tc.ht;
 }
 
-int mlx5e_configure_flower(struct mlx5e_priv *priv,
-			   struct tc_cls_flower_offload *f, int flags)
+static int
+mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
+		 struct tc_cls_flower_offload *f, u8 flow_flags,
+		 struct mlx5e_tc_flow_parse_attr **__parse_attr,
+		 struct mlx5e_tc_flow **__flow)
 {
-	struct netlink_ext_ack *extack = f->common.extack;
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
-	struct rhashtable *tc_ht = get_tc_ht(priv);
 	struct mlx5e_tc_flow *flow;
-	int attr_size, err = 0;
-	u8 flow_flags = 0;
-
-	get_flags(flags, &flow_flags);
-
-	flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
-	if (flow) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "flow cookie already exists, ignoring");
-		netdev_warn_once(priv->netdev, "flow cookie %lx already exists, ignoring\n", f->cookie);
-		return 0;
-	}
-
-	if (esw && esw->mode == SRIOV_OFFLOADS) {
-		flow_flags |= MLX5E_TC_FLOW_ESWITCH;
-		attr_size  = sizeof(struct mlx5_esw_flow_attr);
-	} else {
-		flow_flags |= MLX5E_TC_FLOW_NIC;
-		attr_size  = sizeof(struct mlx5_nic_flow_attr);
-	}
+	int err;
 
 	flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL);
 	parse_attr = kvzalloc(sizeof(*parse_attr), GFP_KERNEL);
@@ -2948,45 +2929,155 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv,
 	flow->priv = priv;
 
 	err = parse_cls_flower(priv, flow, &parse_attr->spec, f);
-	if (err < 0)
+	if (err)
 		goto err_free;
 
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
-		err = parse_tc_fdb_actions(priv, f->exts, parse_attr, flow,
-					   extack);
-		if (err < 0)
-			goto err_free;
-		err = mlx5e_tc_add_fdb_flow(priv, parse_attr, flow, extack);
-	} else {
-		err = parse_tc_nic_actions(priv, f->exts, parse_attr, flow,
-					   extack);
-		if (err < 0)
-			goto err_free;
-		err = mlx5e_tc_add_nic_flow(priv, parse_attr, flow, extack);
-	}
+	*__flow = flow;
+	*__parse_attr = parse_attr;
+
+	return 0;
 
+err_free:
+	kfree(flow);
+	kvfree(parse_attr);
+	return err;
+}
+
+static int
+mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
+		   struct tc_cls_flower_offload *f,
+		   u8 flow_flags,
+		   struct mlx5e_tc_flow **__flow)
+{
+	struct netlink_ext_ack *extack = f->common.extack;
+	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	struct mlx5e_tc_flow *flow;
+	int attr_size, err;
+
+	flow_flags |= MLX5E_TC_FLOW_ESWITCH;
+	attr_size  = sizeof(struct mlx5_esw_flow_attr);
+	err = mlx5e_alloc_flow(priv, attr_size, f, flow_flags,
+			       &parse_attr, &flow);
+	if (err)
+		goto out;
+
+	err = parse_tc_fdb_actions(priv, f->exts, parse_attr, flow, extack);
+	if (err)
+		goto err_free;
+
+	err = mlx5e_tc_add_fdb_flow(priv, parse_attr, flow, extack);
 	if (err && err != -EAGAIN)
 		goto err_free;
 
-	if (err != -EAGAIN)
+	if (!err)
 		flow->flags |= MLX5E_TC_FLOW_OFFLOADED;
 
-	if (!(flow->flags & MLX5E_TC_FLOW_ESWITCH) ||
-	    !(flow->esw_attr->action &
+	if (!(flow->esw_attr->action &
 	      MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT))
 		kvfree(parse_attr);
 
-	err = rhashtable_insert_fast(tc_ht, &flow->node, tc_ht_params);
-	if (err) {
-		mlx5e_tc_del_flow(priv, flow);
-		kfree(flow);
-	}
+	*__flow = flow;
 
+	return 0;
+
+err_free:
+	kfree(flow);
+	kvfree(parse_attr);
+out:
 	return err;
+}
+
+static int
+mlx5e_add_nic_flow(struct mlx5e_priv *priv,
+		   struct tc_cls_flower_offload *f,
+		   u8 flow_flags,
+		   struct mlx5e_tc_flow **__flow)
+{
+	struct netlink_ext_ack *extack = f->common.extack;
+	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	struct mlx5e_tc_flow *flow;
+	int attr_size, err;
+
+	flow_flags |= MLX5E_TC_FLOW_NIC;
+	attr_size  = sizeof(struct mlx5_nic_flow_attr);
+	err = mlx5e_alloc_flow(priv, attr_size, f, flow_flags,
+			       &parse_attr, &flow);
+	if (err)
+		goto out;
+
+	err = parse_tc_nic_actions(priv, f->exts, parse_attr, flow, extack);
+	if (err)
+		goto err_free;
+
+	err = mlx5e_tc_add_nic_flow(priv, parse_attr, flow, extack);
+	if (err)
+		goto err_free;
+
+	flow->flags |= MLX5E_TC_FLOW_OFFLOADED;
+	kvfree(parse_attr);
+	*__flow = flow;
+
+	return 0;
 
 err_free:
+	kfree(flow);
 	kvfree(parse_attr);
+out:
+	return err;
+}
+
+static int
+mlx5e_tc_add_flow(struct mlx5e_priv *priv,
+		  struct tc_cls_flower_offload *f,
+		  int flags,
+		  struct mlx5e_tc_flow **flow)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	u8 flow_flags;
+	int err;
+
+	get_flags(flags, &flow_flags);
+
+	if (esw && esw->mode == SRIOV_OFFLOADS)
+		err = mlx5e_add_fdb_flow(priv, f, flow_flags, flow);
+	else
+		err = mlx5e_add_nic_flow(priv, f, flow_flags, flow);
+
+	return err;
+}
+
+int mlx5e_configure_flower(struct mlx5e_priv *priv,
+			   struct tc_cls_flower_offload *f, int flags)
+{
+	struct netlink_ext_ack *extack = f->common.extack;
+	struct rhashtable *tc_ht = get_tc_ht(priv);
+	struct mlx5e_tc_flow *flow;
+	int err = 0;
+
+	flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
+	if (flow) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "flow cookie already exists, ignoring");
+		netdev_warn_once(priv->netdev,
+				 "flow cookie %lx already exists, ignoring\n",
+				 f->cookie);
+		goto out;
+	}
+
+	err = mlx5e_tc_add_flow(priv, f, flags, &flow);
+	if (err)
+		goto out;
+
+	err = rhashtable_insert_fast(tc_ht, &flow->node, tc_ht_params);
+	if (err)
+		goto err_free;
+
+	return 0;
+
+err_free:
+	mlx5e_tc_del_flow(priv, flow);
 	kfree(flow);
+out:
 	return err;
 }
 
-- 
2.17.2

^ permalink raw reply related

* [net-next 02/14] net/mlx5: Use flow counter IDs and not the wrapping cache object
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Mark Bloch, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Mark Bloch <markb@mellanox.com>

Currently, when a flow rule is created using the FS core layer, the caller
has to pass the entire flow counter object and not just the counter HW
handle (ID). This requires both the FS core and the caller to have
knowledge about the inner implementation of the FS layer flow counters
cache and limits the possible users.

Move to use the counter ID across the place when dealing with flows.

Doing this decoupling, now can we privatize the inner implementation
of the flow counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                      |  7 +++++--
 .../ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h   |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c        |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c      |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c       |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c      | 10 ++--------
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c  |  6 ++++++
 include/linux/mlx5/fs.h                                |  3 ++-
 9 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5d9b7f62a0ba..5ced0cc46ba1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3320,15 +3320,18 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 	}
 
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
+		struct mlx5_ib_mcounters *mcounters;
+
 		err = flow_counters_set_data(flow_act.counters, ucmd);
 		if (err)
 			goto free;
 
+		mcounters = to_mcounters(flow_act.counters);
 		handler->ibcounters = flow_act.counters;
 		dest_arr[dest_num].type =
 			MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		dest_arr[dest_num].counter =
-			to_mcounters(flow_act.counters)->hw_cntrs_hndl;
+		dest_arr[dest_num].counter_id =
+			mlx5_fc_id(mcounters->hw_cntrs_hndl);
 		dest_num++;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
index e83dda441a81..d027ce00c8ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
@@ -252,10 +252,10 @@ TRACE_EVENT(mlx5_fs_add_rule,
 			   memcpy(__entry->destination,
 				  &rule->dest_attr,
 				  sizeof(__entry->destination));
-			   if (rule->dest_attr.type & MLX5_FLOW_DESTINATION_TYPE_COUNTER &&
-			       rule->dest_attr.counter)
+			   if (rule->dest_attr.type &
+			       MLX5_FLOW_DESTINATION_TYPE_COUNTER)
 				__entry->counter_id =
-				rule->dest_attr.counter->id;
+					rule->dest_attr.counter_id;
 	    ),
 	    TP_printk("rule=%p fte=%p index=%u sw_action=<%s> [dst] %s\n",
 		      __entry->rule, __entry->fte, __entry->index,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 8a27c0813a18..5ce87f54852d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -720,7 +720,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 			goto err_fc_create;
 		}
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		dest[dest_ix].counter = counter;
+		dest[dest_ix].counter_id = mlx5_fc_id(counter);
 		dest_ix++;
 		attr->counter = counter;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index e1d47fa5ab83..9c893d7d273e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1198,7 +1198,7 @@ static int esw_vport_ingress_config(struct mlx5_eswitch *esw,
 	if (counter) {
 		flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 		drop_ctr_dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		drop_ctr_dst.counter = counter;
+		drop_ctr_dst.counter_id = mlx5_fc_id(counter);
 		dst = &drop_ctr_dst;
 		dest_num++;
 	}
@@ -1285,7 +1285,7 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
 	if (counter) {
 		flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 		drop_ctr_dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		drop_ctr_dst.counter = counter;
+		drop_ctr_dst.counter_id = mlx5_fc_id(counter);
 		dst = &drop_ctr_dst;
 		dest_num++;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index a2f2d726c99b..39932dce15cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -91,7 +91,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	}
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		dest[i].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		dest[i].counter = attr->counter;
+		dest[i].counter_id = mlx5_fc_id(attr->counter);
 		i++;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index dc8d7f6b52c2..08a891f9aade 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -419,7 +419,7 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 				continue;
 
 			MLX5_SET(flow_counter_list, in_dests, flow_counter_id,
-				 dst->dest_attr.counter->id);
+				 dst->dest_attr.counter_id);
 			in_dests += MLX5_ST_SZ_BYTES(dest_format_struct);
 			list_size++;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 9e18e6c0a8b3..cdcbf9d0ae6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1474,14 +1474,8 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 	return handle;
 }
 
-static bool counter_is_valid(struct mlx5_fc *counter, u32 action)
+static bool counter_is_valid(u32 action)
 {
-	if (!(action & MLX5_FLOW_CONTEXT_ACTION_COUNT))
-		return !counter;
-
-	if (!counter)
-		return false;
-
 	return (action & (MLX5_FLOW_CONTEXT_ACTION_DROP |
 			  MLX5_FLOW_CONTEXT_ACTION_FWD_DEST));
 }
@@ -1491,7 +1485,7 @@ static bool dest_is_valid(struct mlx5_flow_destination *dest,
 			  struct mlx5_flow_table *ft)
 {
 	if (dest && (dest->type == MLX5_FLOW_DESTINATION_TYPE_COUNTER))
-		return counter_is_valid(dest->counter, action);
+		return counter_is_valid(action);
 
 	if (!(action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST))
 		return true;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index 09206c4acd9a..1329bc5b7969 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -258,6 +258,12 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging)
 }
 EXPORT_SYMBOL(mlx5_fc_create);
 
+u32 mlx5_fc_id(struct mlx5_fc *counter)
+{
+	return counter->id;
+}
+EXPORT_SYMBOL(mlx5_fc_id);
+
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
 {
 	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 74d0ea146c9a..a5fc62184195 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -92,7 +92,7 @@ struct mlx5_flow_destination {
 		u32			tir_num;
 		u32			ft_num;
 		struct mlx5_flow_table	*ft;
-		struct mlx5_fc		*counter;
+		u32			counter_id;
 		struct {
 			u16		num;
 			u16		vhca_id;
@@ -192,6 +192,7 @@ void mlx5_fc_query_cached(struct mlx5_fc *counter,
 			  u64 *bytes, u64 *packets, u64 *lastuse);
 int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 		  u64 *packets, u64 *bytes);
+u32 mlx5_fc_id(struct mlx5_fc *counter);
 
 int mlx5_fs_add_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 int mlx5_fs_remove_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
-- 
2.17.2

^ permalink raw reply related

* [net-next 01/14] net/mlx5: E-Switch, Get counters for offloaded flows from callers
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Mark Bloch, Saeed Mahameed
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>

From: Mark Bloch <markb@mellanox.com>

There's no real reason for the e-switch logic to manage the creation of
counters for offloaded flows. The API already has the directive for the
caller to denote they want to attach a counter to the created flow.
As such, we go and move the management of flow counters to the mlx5e
tc offload logic. This also lets us remove an inelegant interface where
the FS layer had to provide a way to retrieve a counter from a flow rule.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 32 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 20 ++----------
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 15 ---------
 include/linux/mlx5/fs.h                       |  1 -
 5 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index acf7a847f561..8a27c0813a18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -61,6 +61,7 @@ struct mlx5_nic_flow_attr {
 	u32 hairpin_tirn;
 	u8 match_level;
 	struct mlx5_flow_table	*hairpin_ft;
+	struct mlx5_fc		*counter;
 };
 
 #define MLX5E_TC_FLOW_BASE (MLX5E_TC_LAST_EXPORTED_BIT + 1)
@@ -721,6 +722,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
 		dest[dest_ix].counter = counter;
 		dest_ix++;
+		attr->counter = counter;
 	}
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
@@ -797,7 +799,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
 	struct mlx5_fc *counter = NULL;
 
-	counter = mlx5_flow_rule_counter(flow->rule[0]);
+	counter = attr->counter;
 	mlx5_del_flow_rules(flow->rule[0]);
 	mlx5_fc_destroy(priv->mdev, counter);
 
@@ -833,6 +835,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct net_device *out_dev, *encap_dev = NULL;
 	struct mlx5_flow_handle *rule = NULL;
+	struct mlx5_fc *counter = NULL;
 	struct mlx5e_rep_priv *rpriv;
 	struct mlx5e_priv *out_priv;
 	int err;
@@ -868,6 +871,16 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		}
 	}
 
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
+		counter = mlx5_fc_create(esw->dev, true);
+		if (IS_ERR(counter)) {
+			rule = ERR_CAST(counter);
+			goto err_create_counter;
+		}
+
+		attr->counter = counter;
+	}
+
 	/* we get here if (1) there's no error (rule being null) or when
 	 * (2) there's an encap action and we're on -EAGAIN (no valid neigh)
 	 */
@@ -888,6 +901,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	mlx5_eswitch_del_offloaded_rule(esw, rule, attr);
 	rule = flow->rule[1];
 err_add_rule:
+	mlx5_fc_destroy(esw->dev, counter);
+err_create_counter:
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
 err_mod_hdr:
@@ -921,6 +936,9 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
+
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT)
+		mlx5_fc_destroy(esw->dev, attr->counter);
 }
 
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
@@ -992,6 +1010,14 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 	}
 }
 
+static struct mlx5_fc *mlx5e_tc_get_counter(struct mlx5e_tc_flow *flow)
+{
+	if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+		return flow->esw_attr->counter;
+	else
+		return flow->nic_attr->counter;
+}
+
 void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 {
 	struct mlx5e_neigh *m_neigh = &nhe->m_neigh;
@@ -1017,7 +1043,7 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 			continue;
 		list_for_each_entry(flow, &e->flows, encap) {
 			if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
-				counter = mlx5_flow_rule_counter(flow->rule[0]);
+				counter = mlx5e_tc_get_counter(flow);
 				mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
 				if (time_after((unsigned long)lastuse, nhe->reported_lastuse)) {
 					neigh_used = true;
@@ -3019,7 +3045,7 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
 	if (!(flow->flags & MLX5E_TC_FLOW_OFFLOADED))
 		return 0;
 
-	counter = mlx5_flow_rule_counter(flow->rule[0]);
+	counter = mlx5e_tc_get_counter(flow);
 	if (!counter)
 		return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index dfc642de4e6d..c1b627577003 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -266,6 +266,7 @@ struct mlx5_esw_flow_attr {
 	u32	encap_id;
 	u32	mod_hdr_id;
 	u8	match_level;
+	struct mlx5_fc *counter;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 0741683f7d70..a2f2d726c99b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -51,7 +51,6 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	struct mlx5_flow_destination dest[MLX5_MAX_FLOW_FWD_VPORTS + 1] = {};
 	struct mlx5_flow_act flow_act = {0};
 	struct mlx5_flow_table *ft = NULL;
-	struct mlx5_fc *counter = NULL;
 	struct mlx5_flow_handle *rule;
 	int j, i = 0;
 	void *misc;
@@ -91,13 +90,8 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		}
 	}
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
-		counter = mlx5_fc_create(esw->dev, true);
-		if (IS_ERR(counter)) {
-			rule = ERR_CAST(counter);
-			goto err_counter_alloc;
-		}
 		dest[i].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		dest[i].counter = counter;
+		dest[i].counter = attr->counter;
 		i++;
 	}
 
@@ -132,15 +126,11 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 
 	rule = mlx5_add_flow_rules(ft, spec, &flow_act, dest, i);
 	if (IS_ERR(rule))
-		goto err_add_rule;
+		goto out;
 	else
 		esw->offloads.num_flows++;
 
-	return rule;
-
-err_add_rule:
-	mlx5_fc_destroy(esw->dev, counter);
-err_counter_alloc:
+out:
 	return rule;
 }
 
@@ -200,11 +190,7 @@ mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_handle *rule,
 				struct mlx5_esw_flow_attr *attr)
 {
-	struct mlx5_fc *counter = NULL;
-
-	counter = mlx5_flow_rule_counter(rule);
 	mlx5_del_flow_rules(rule);
-	mlx5_fc_destroy(esw->dev, counter);
 	esw->offloads.num_flows--;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 8d340e5181f8..9e18e6c0a8b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1474,21 +1474,6 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 	return handle;
 }
 
-struct mlx5_fc *mlx5_flow_rule_counter(struct mlx5_flow_handle *handle)
-{
-	struct mlx5_flow_rule *dst;
-	struct fs_fte *fte;
-
-	fs_get_obj(fte, handle->rule[0]->node.parent);
-
-	fs_for_each_dst(dst, fte) {
-		if (dst->dest_attr.type == MLX5_FLOW_DESTINATION_TYPE_COUNTER)
-			return dst->dest_attr.counter;
-	}
-
-	return NULL;
-}
-
 static bool counter_is_valid(struct mlx5_fc *counter, u32 action)
 {
 	if (!(action & MLX5_FLOW_CONTEXT_ACTION_COUNT))
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index b1c026f1c8ba..74d0ea146c9a 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -186,7 +186,6 @@ int mlx5_modify_rule_destination(struct mlx5_flow_handle *handler,
 				 struct mlx5_flow_destination *new_dest,
 				 struct mlx5_flow_destination *old_dest);
 
-struct mlx5_fc *mlx5_flow_rule_counter(struct mlx5_flow_handle *handler);
 struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
-- 
2.17.2

^ permalink raw reply related

* [pull request][net-next 00/14] Mellanox, mlx5 updates
From: Saeed Mahameed @ 2018-10-18  0:08 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Saeed Mahameed

Hi Dave,

This series from Paul, Or and Mark provides the support for 
e-switch tc offloading of multiple priorities and chains, the series
is based on a merge commit with mlx5-next branch for patches that are
already submitted and reviewed through the netdev and rdma mailing
lists.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
saeed.

---

The following changes since commit 186daf0c20507072e72a3c74db4ac50a5b6dae07:

  Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next (2018-10-17 14:13:36 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-10-17

for you to fetch changes up to bf07aa730a04a375bc10d09df1e81357af1d4477:

  net/mlx5e: Support offloading tc priorities and chains for eswitch flows (2018-10-17 14:20:49 -0700)

----------------------------------------------------------------
mlx5-updates-2018-10-17

========================================================================

>From Or Gerlitz <ogerlitz@mellanox.com>:

This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.

This is made of four building blocks (along with few minor driver refactors):

[1] Split FDB fast path prio to multiple namespaces

Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
The slow path contains the per representor SQ send-to-vport TX rule and the match-all
RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
to multiple namespaces  (sub namespaces), each with multiple priorities.

[2] E-Switch chains and priorities

A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
and a flow table for each priority in them.

Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to each other (but to the slow path),
and one must use a explicit goto action to reach a different chain.

Flow tables for the priorities are created on demand and destroyed
once not used.

[3] Add a no-append flow insertion mode, use it for TC offloads

Enhance the driver fs core, such that if a no-append flag is set by the caller,
we add a new FTE, instead of appending the actions of the inserted rule when
the same match already exists.

For encap rules, we defer the HW offloading till we have a valid neighbor. This can
result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
before priorities where supported.

[4] Offloading tc priorities and chains for eswitch flows

Using [1], [2] and [3] above we add the support for offloading both chains
and priorities. To get to a new chain, use the tc goto action. We support
a fixed prio range 1-16, and chains 0-3.
=============================================================================

----------------------------------------------------------------
Mark Bloch (2):
      net/mlx5: E-Switch, Get counters for offloaded flows from callers
      net/mlx5: Use flow counter IDs and not the wrapping cache object

Or Gerlitz (2):
      net/mlx5: E-Switch, Have explicit API to delete fwd rules
      net/mlx5e: Avoid duplicated code for tc offloads add/del fdb rule

Paul Blakey (8):
      net/mlx5: Add cap bits for multi fdb encap
      net/mlx5: Split FDB fast path prio to multiple namespaces
      net/mlx5: E-Switch, Add chains and priorities
      net/mlx5: Add a no-append flow insertion mode
      net/mlx5e: For TC offloads, always add new flow instead of appending the actions
      net/mlx5: E-Switch, Enable setting goto slow path chain action
      net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available
      net/mlx5e: Support offloading tc priorities and chains for eswitch flows

Rabie Loulou (1):
      net/mlx5e: Change return type of tc add flow functions

Roi Dayan (1):
      net/mlx5e: Split TC add rule path for nic vs e-switch

 drivers/infiniband/hw/mlx5/main.c                  |  13 +-
 .../mellanox/mlx5/core/diag/fs_tracepoint.h        |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   6 -
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 490 +++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  46 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 410 +++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 122 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  13 +
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |   6 +
 include/linux/mlx5/fs.h                            |  20 +-
 include/linux/mlx5/mlx5_ifc.h                      |   4 +-
 15 files changed, 860 insertions(+), 289 deletions(-)

^ permalink raw reply

* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Francois Romieu @ 2018-10-17 23:30 UTC (permalink / raw)
  To: Holger Hoffstätte
  Cc: Heiner Kallweit, David Miller, Realtek linux nic maintainers,
	netdev@vger.kernel.org
In-Reply-To: <e7607b0f-86ed-6784-638c-37d68d910fb7@applied-asynchrony.com>

Holger Hoffstätte <holger@applied-asynchrony.com> :
[...]
> I continued to use the BQL patch in my private tree after it was reverted
> and also had occasional timeouts, but *only* after I started playing
> with ethtool to change offload settings. Without offloads or the BQL patch
> everything has been rock-solid since then.
> The other weird problem was that timeouts would occur on an otherwise
> *completely idle* system. Since that occasionally borked my NFS server
> over night I ultimately removed BQL as well. Rock-solid since then.

The bug will induce delayed rx processing when a spike of "load" is
followed by an idle period. I do not see how bql would matter per se though.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Francois Romieu @ 2018-10-17 23:30 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David Miller, Realtek linux nic maintainers,
	netdev@vger.kernel.org
In-Reply-To: <8f84fe39-3d8d-396d-3b97-027e0a83f8cb@gmail.com>

Heiner Kallweit <hkallweit1@gmail.com> :
[...]
> This issue has been there more or less forever (at least it exists in
> 3.16 already), so I can't provide a "Fixes" tag. 

Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.

-- 
Ueimor

^ permalink raw reply

* [PATCH v2 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-17 23:36 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu

Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().

This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
some __skb_buff data directly.

Song Liu (2):
  bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
  bpf: add tests for direct packet access from CGROUP_SKB

 kernel/bpf/cgroup.c                         |   4 +
 net/core/filter.c                           |  37 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 159 ++++++++++++++++++++
 3 files changed, 199 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH v2 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-17 23:36 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181017233616.3130909-1-songliubraving@fb.com>

Tests are added to make sure CGROUP_SKB cannot access:
  tc_classid, data_meta, flow_keys

and can read and write:
  mark, prority, and cb[0-4]

and can read other fields.

To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                          |   4 +
 tools/testing/selftests/bpf/test_verifier.c | 170 ++++++++++++++++++++
 2 files changed, 174 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..c7210e2f1ae9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,7 @@
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
 #include <linux/sched/signal.h>
+#include <net/sock.h>
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -115,6 +116,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	u32 retval, duration;
 	int hh_len = ETH_HLEN;
 	struct sk_buff *skb;
+	struct sock sk;
 	void *data;
 	int ret;
 
@@ -142,6 +144,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		kfree(data);
 		return -ENOMEM;
 	}
+	sock_init_data(NULL, &sk);
+	skb->sk = &sk;
 
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	__skb_put(skb, size);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..5bfba7e8afd7 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,176 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
+	{
+		"direct packet read test#1 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, len)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+			    	    offsetof(struct __sk_buff, pkt_type)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, queue_mapping)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, protocol)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_present)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#2 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_tci)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+			    	    offsetof(struct __sk_buff, vlan_proto)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, ingress_ifindex)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_index)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, hash)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#3 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+			    	    offsetof(struct __sk_buff, cb[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#4 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, family)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_port)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of tc_classid for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_classid)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of data_meta for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_meta)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of flow_keys for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, flow_keys)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid write access to napi_id for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
 	{
 		"valid cgroup storage access",
 		.insns = {
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Song Liu @ 2018-10-17 23:36 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181017233616.3130909-1-songliubraving@fb.com>

BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.

In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
to compute proper data_end for the BPF program.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/cgroup.c |  4 ++++
 net/core/filter.c   | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..340d496f35bd 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	save_sk = skb->sk;
 	skb->sk = sk;
 	__skb_push(skb, offset);
+
+	/* compute pointers for the bpf prog */
+	bpf_compute_data_pointers(skb);
+
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
 				 bpf_prog_run_save_cb);
 	__skb_pull(skb, offset);
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..e3ca30bd6840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool cg_skb_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		return false;
+	}
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case bpf_ctx_range(struct __sk_buff, mark):
+		case bpf_ctx_range(struct __sk_buff, priority):
+		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
 
 const struct bpf_verifier_ops cg_skb_verifier_ops = {
 	.get_func_proto		= cg_skb_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
+	.is_valid_access	= cg_skb_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 12/13] tools/bpf: enhance test_btf file testing to test func info
From: Yonghong Song @ 2018-10-17 23:31 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233101.1072913-1-yhs@fb.com>

Change the bpf programs test_btf_haskv.c and test_btf_nokv.c to
have two sections, and enhance test_btf.c test_file feature
to test btf func_info returned by the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/test_btf.c       | 87 ++++++++++++++++++--
 tools/testing/selftests/bpf/test_btf_haskv.c | 16 +++-
 tools/testing/selftests/bpf/test_btf_nokv.c  | 16 +++-
 3 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index e03a8cea4bb7..38ca942eba28 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -2180,13 +2180,13 @@ static struct btf_file_test file_tests[] = {
 },
 };
 
-static int file_has_btf_elf(const char *fn)
+static int file_has_btf_elf(const char *fn, bool *has_btf_ext)
 {
 	Elf_Scn *scn = NULL;
 	GElf_Ehdr ehdr;
+	int ret = 0;
 	int elf_fd;
 	Elf *elf;
-	int ret;
 
 	if (CHECK(elf_version(EV_CURRENT) == EV_NONE,
 		  "elf_version(EV_CURRENT) == EV_NONE"))
@@ -2218,14 +2218,12 @@ static int file_has_btf_elf(const char *fn)
 		}
 
 		sh_name = elf_strptr(elf, ehdr.e_shstrndx, sh.sh_name);
-		if (!strcmp(sh_name, BTF_ELF_SEC)) {
+		if (!strcmp(sh_name, BTF_ELF_SEC))
 			ret = 1;
-			goto done;
-		}
+		if (!strcmp(sh_name, BTF_EXT_ELF_SEC))
+			*has_btf_ext = true;
 	}
 
-	ret = 0;
-
 done:
 	close(elf_fd);
 	elf_end(elf);
@@ -2235,15 +2233,22 @@ static int file_has_btf_elf(const char *fn)
 static int do_test_file(unsigned int test_num)
 {
 	const struct btf_file_test *test = &file_tests[test_num - 1];
+	const char *expected_fnames[] = {"_dummy_tracepoint",
+					 "test_long_fname_1",
+					 "test_long_fname_2"};
+	__u32 func_lens[10], func_types[10], info_len;
+	struct bpf_prog_info info = {};
 	struct bpf_object *obj = NULL;
 	struct bpf_program *prog;
+	bool has_btf_ext = false;
+	struct btf *btf = NULL;
 	struct bpf_map *map;
-	int err;
+	int i, err, prog_fd;
 
 	fprintf(stderr, "BTF libbpf test[%u] (%s): ", test_num,
 		test->file);
 
-	err = file_has_btf_elf(test->file);
+	err = file_has_btf_elf(test->file, &has_btf_ext);
 	if (err == -1)
 		return err;
 
@@ -2271,6 +2276,7 @@ static int do_test_file(unsigned int test_num)
 	err = bpf_object__load(obj);
 	if (CHECK(err < 0, "bpf_object__load: %d", err))
 		goto done;
+	prog_fd = bpf_program__fd(prog);
 
 	map = bpf_object__find_map_by_name(obj, "btf_map");
 	if (CHECK(!map, "btf_map not found")) {
@@ -2285,6 +2291,69 @@ static int do_test_file(unsigned int test_num)
 		  test->btf_kv_notfound))
 		goto done;
 
+	if (!jit_enabled || !has_btf_ext)
+		goto skip_jit;
+
+	info_len = sizeof(struct bpf_prog_info);
+	info.nr_jited_func_types = ARRAY_SIZE(func_types);
+	info.nr_jited_func_lens = ARRAY_SIZE(func_lens);
+	info.jited_func_types = ptr_to_u64(&func_types[0]);
+	info.jited_func_lens = ptr_to_u64(&func_lens[0]);
+
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+
+	if (CHECK(err == -1, "invalid get info errno:%d", errno)) {
+		fprintf(stderr, "%s\n", btf_log_buf);
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.nr_jited_func_lens != 3,
+		  "incorrect info.nr_jited_func_lens %d",
+		  info.nr_jited_func_lens)) {
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.nr_jited_func_types != 3,
+		  "incorrect info.nr_jited_func_types %d",
+		  info.nr_jited_func_types)) {
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.btf_id == 0, "incorrect btf_id = 0")) {
+		err = -1;
+		goto done;
+	}
+
+	err = btf_get_from_id(info.btf_id, &btf);
+	if (CHECK(err, "cannot get btf from kernel, err: %d", err))
+		goto done;
+
+	/* check three functions */
+	for (i = 0; i < 3; i++) {
+		const struct btf_type *t;
+		const char *fname;
+
+		t = btf__type_by_id(btf, func_types[i]);
+		if (CHECK(!t, "btf__type_by_id failure: id %u",
+			  func_types[i])) {
+			err = -1;
+			goto done;
+		}
+
+		fname = btf__name_by_offset(btf, t->name_off);
+		err = strcmp(fname, expected_fnames[i]);
+		/* for the second and third functions in .text section,
+		 * the compiler may order them either way.
+		 */
+		if (i && err)
+			err = strcmp(fname, expected_fnames[3 - i]);
+		if (CHECK(err, "incorrect fname %s", fname ? : "")) {
+			err = -1;
+			goto done;
+		}
+	}
+
+skip_jit:
 	fprintf(stderr, "OK");
 
 done:
diff --git a/tools/testing/selftests/bpf/test_btf_haskv.c b/tools/testing/selftests/bpf/test_btf_haskv.c
index b21b876f475d..e5c79fe0ffdb 100644
--- a/tools/testing/selftests/bpf/test_btf_haskv.c
+++ b/tools/testing/selftests/bpf/test_btf_haskv.c
@@ -24,8 +24,8 @@ struct dummy_tracepoint_args {
 	struct sock *sock;
 };
 
-SEC("dummy_tracepoint")
-int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+__attribute__((noinline))
+static int test_long_fname_2(struct dummy_tracepoint_args *arg)
 {
 	struct ipv_counts *counts;
 	int key = 0;
@@ -42,4 +42,16 @@ int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
 	return 0;
 }
 
+__attribute__((noinline))
+static int test_long_fname_1(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_2(arg);
+}
+
+SEC("dummy_tracepoint")
+int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_1(arg);
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_btf_nokv.c b/tools/testing/selftests/bpf/test_btf_nokv.c
index 0ed8e088eebf..434188c37774 100644
--- a/tools/testing/selftests/bpf/test_btf_nokv.c
+++ b/tools/testing/selftests/bpf/test_btf_nokv.c
@@ -22,8 +22,8 @@ struct dummy_tracepoint_args {
 	struct sock *sock;
 };
 
-SEC("dummy_tracepoint")
-int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+__attribute__((noinline))
+static int test_long_fname_2(struct dummy_tracepoint_args *arg)
 {
 	struct ipv_counts *counts;
 	int key = 0;
@@ -40,4 +40,16 @@ int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
 	return 0;
 }
 
+__attribute__((noinline))
+static int test_long_fname_1(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_2(arg);
+}
+
+SEC("dummy_tracepoint")
+int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_1(arg);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 06/13] tools/bpf: sync kernel uapi bpf.h header to tools directory
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233042.1072050-1-yhs@fb.com>

The kernel uapi bpf.h is synced to tools directory.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f9187b41dff6..7ebbf4f06a65 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -332,6 +332,9 @@ union bpf_attr {
 		 * (context accesses, allowed helpers, etc).
 		 */
 		__u32		expected_attach_type;
+		__u32		prog_btf_fd;	/* fd pointing to BTF type data */
+		__u32		func_info_len;	/* func_info length */
+		__aligned_u64	func_info;	/* func type info */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -2585,6 +2588,9 @@ struct bpf_prog_info {
 	__u32 nr_jited_func_lens;
 	__aligned_u64 jited_ksyms;
 	__aligned_u64 jited_func_lens;
+	__u32 btf_id;
+	__u32 nr_jited_func_types;
+	__aligned_u64 jited_func_types;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -2896,4 +2902,9 @@ struct bpf_flow_keys {
 	};
 };
 
+struct bpf_func_info {
+	__u32	insn_offset;
+	__u32	type_id;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 13/13] tools/bpf: bpftool: add support for jited func types
From: Yonghong Song @ 2018-10-17 23:31 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233101.1072913-1-yhs@fb.com>

This patch added support to print function signature
if btf func_info is available. Note that ksym
now uses function name instead of prog_name as
prog_name has a limit of 16 bytes including
ending '\0'.

The following is a sample output for selftests
test_btf with file test_btf_haskv.o:

  $ bpftool prog dump jited id 1
  int _dummy_tracepoint(struct dummy_tracepoint_args * ):
  bpf_prog_b07ccb89267cf242__dummy_tracepoint:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    3c:   add    $0x28,%rbp
    40:   leaveq
    41:   retq

  int test_long_fname_1(struct dummy_tracepoint_args * ):
  bpf_prog_2dcecc18072623fc_test_long_fname_1:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    3a:   add    $0x28,%rbp
    3e:   leaveq
    3f:   retq

  int test_long_fname_2(struct dummy_tracepoint_args * ):
  bpf_prog_89d64e4abf0f0126_test_long_fname_2:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    80:   add    $0x28,%rbp
    84:   leaveq
    85:   retq

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/bpf/bpftool/btf_dumper.c | 90 ++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/main.h       |  2 +
 tools/bpf/bpftool/prog.c       | 54 ++++++++++++++++++++
 3 files changed, 146 insertions(+)

diff --git a/tools/bpf/bpftool/btf_dumper.c b/tools/bpf/bpftool/btf_dumper.c
index 55bc512a1831..6122b735ddcc 100644
--- a/tools/bpf/bpftool/btf_dumper.c
+++ b/tools/bpf/bpftool/btf_dumper.c
@@ -249,3 +249,93 @@ int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
 {
 	return btf_dumper_do_type(d, type_id, 0, data);
 }
+
+#define BTF_PRINT_ARG(...)						\
+	do {								\
+		pos += snprintf(func_sig + pos, size - pos,		\
+				__VA_ARGS__);				\
+		if (pos >= size)					\
+			return -1;					\
+	} while (0)
+#define BTF_PRINT_TYPE(type)					\
+	do {								\
+		pos = __btf_dumper_type_only(btf, type, func_sig,	\
+					     pos, size);		\
+		if (pos == -1)						\
+			return -1;					\
+	} while (0)
+
+static int __btf_dumper_type_only(struct btf *btf, __u32 type_id,
+				  char *func_sig, int pos, int size)
+{
+	const struct btf_type *t = btf__type_by_id(btf, type_id);
+	const struct btf_array *array;
+	int i, vlen;
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_INT:
+		BTF_PRINT_ARG("%s ", btf__name_by_offset(btf, t->name_off));
+		break;
+	case BTF_KIND_STRUCT:
+		BTF_PRINT_ARG("struct %s ",
+			      btf__name_by_offset(btf, t->name_off));
+		break;
+	case BTF_KIND_UNION:
+		BTF_PRINT_ARG("union %s ",
+			      btf__name_by_offset(btf, t->name_off));
+		break;
+	case BTF_KIND_ENUM:
+		BTF_PRINT_ARG("enum %s ",
+			      btf__name_by_offset(btf, t->name_off));
+		break;
+	case BTF_KIND_ARRAY:
+		array = (struct btf_array *)(t + 1);
+		BTF_PRINT_TYPE(array->type);
+		BTF_PRINT_ARG("[%d]", array->nelems);
+		break;
+	case BTF_KIND_PTR:
+		BTF_PRINT_TYPE(t->type);
+		BTF_PRINT_ARG("* ");
+		break;
+	case BTF_KIND_UNKN:
+	case BTF_KIND_FWD:
+	case BTF_KIND_TYPEDEF:
+		return -1;
+	case BTF_KIND_VOLATILE:
+		BTF_PRINT_ARG("volatile ");
+		BTF_PRINT_TYPE(t->type);
+		break;
+	case BTF_KIND_CONST:
+		BTF_PRINT_ARG("const ");
+		BTF_PRINT_TYPE(t->type);
+		break;
+	case BTF_KIND_RESTRICT:
+		BTF_PRINT_ARG("restrict ");
+		BTF_PRINT_TYPE(t->type);
+		break;
+	case BTF_KIND_FUNC:
+	case BTF_KIND_FUNC_PROTO:
+		BTF_PRINT_TYPE(t->type);
+		BTF_PRINT_ARG("%s(", btf__name_by_offset(btf, t->name_off));
+		vlen = BTF_INFO_VLEN(t->info);
+		for (i = 0; i < vlen; i++) {
+			__u32 arg_type = ((__u32 *)(t + 1))[i];
+
+			if (i)
+				BTF_PRINT_ARG(", ");
+			BTF_PRINT_TYPE(arg_type);
+		}
+		BTF_PRINT_ARG(")");
+		break;
+	default:
+		return -1;
+	}
+
+	return pos;
+}
+
+int btf_dumper_type_only(struct btf *btf, __u32 type_id, char *func_sig,
+			 int size)
+{
+	return __btf_dumper_type_only(btf, type_id, func_sig, 0, size);
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28ee769bd11b..f887564476dc 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -168,6 +168,8 @@ struct btf_dumper {
  */
 int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
 		    const void *data);
+int btf_dumper_type_only(struct btf *btf, __u32 func_type_id, char *func_only,
+			 int size);
 
 struct nlattr;
 struct ifinfomsg;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 335028968dfb..e78632a30ea3 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -47,6 +47,7 @@
 #include <linux/err.h>
 
 #include <bpf.h>
+#include <btf.h>
 #include <libbpf.h>
 
 #include "cfg.h"
@@ -447,9 +448,11 @@ static int do_show(int argc, char **argv)
 static int do_dump(int argc, char **argv)
 {
 	unsigned long *func_ksyms = NULL;
+	unsigned int *func_types = NULL;
 	struct bpf_prog_info info = {};
 	unsigned int *func_lens = NULL;
 	unsigned int nr_func_ksyms;
+	unsigned int nr_func_types;
 	unsigned int nr_func_lens;
 	struct dump_data dd = {};
 	__u32 len = sizeof(info);
@@ -546,6 +549,16 @@ static int do_dump(int argc, char **argv)
 		}
 	}
 
+	nr_func_types = info.nr_jited_func_types;
+	if (nr_func_types) {
+		func_types = malloc(nr_func_types * sizeof(__u32));
+		if (!func_types) {
+			p_err("mem alloc failed");
+			close(fd);
+			goto err_free;
+		}
+	}
+
 	memset(&info, 0, sizeof(info));
 
 	*member_ptr = ptr_to_u64(buf);
@@ -554,6 +567,8 @@ static int do_dump(int argc, char **argv)
 	info.nr_jited_ksyms = nr_func_ksyms;
 	info.jited_func_lens = ptr_to_u64(func_lens);
 	info.nr_jited_func_lens = nr_func_lens;
+	info.jited_func_types = ptr_to_u64(func_types);
+	info.nr_jited_func_types = nr_func_types;
 
 	err = bpf_obj_get_info_by_fd(fd, &info, &len);
 	close(fd);
@@ -577,6 +592,11 @@ static int do_dump(int argc, char **argv)
 		goto err_free;
 	}
 
+	if (info.nr_jited_func_types > nr_func_types) {
+		p_err("too many types returned");
+		goto err_free;
+	}
+
 	if ((member_len == &info.jited_prog_len &&
 	     info.jited_prog_insns == 0) ||
 	    (member_len == &info.xlated_prog_len &&
@@ -585,6 +605,12 @@ static int do_dump(int argc, char **argv)
 		goto err_free;
 	}
 
+	if (info.btf_id &&
+	    info.nr_jited_func_lens != info.nr_jited_func_types) {
+		p_err("unequal jited func lens and types");
+		goto err_free;
+	}
+
 	if (filepath) {
 		fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
 		if (fd < 0) {
@@ -618,7 +644,9 @@ static int do_dump(int argc, char **argv)
 			struct kernel_sym *sym = NULL;
 			char sym_name[SYM_MAX_NAME];
 			unsigned char *img = buf;
+			struct btf *btf = NULL;
 			__u64 *ksyms = NULL;
+			char func_sig[1024];
 			__u32 *lens;
 			__u32 i;
 
@@ -627,6 +655,14 @@ static int do_dump(int argc, char **argv)
 				ksyms = (__u64 *) info.jited_ksyms;
 			}
 
+			if (info.btf_id) {
+				err = btf_get_from_id(info.btf_id, &btf);
+				if (err) {
+					p_err("failed to get btf");
+					goto err_free;
+				}
+			}
+
 			if (json_output)
 				jsonw_start_array(json_wtr);
 
@@ -642,12 +678,28 @@ static int do_dump(int argc, char **argv)
 					strcpy(sym_name, "unknown");
 				}
 
+				func_sig[0] = '\0';
+				if (btf) {
+					err = btf_dumper_type_only(btf,
+								   func_types[i],
+								   func_sig,
+								   sizeof(func_sig));
+					if (err < 0)
+						func_sig[0] = '\0';
+				}
+
 				if (json_output) {
 					jsonw_start_object(json_wtr);
+					if (func_sig[0] != '\0') {
+						jsonw_name(json_wtr, "proto");
+						jsonw_string(json_wtr, func_sig);
+					}
 					jsonw_name(json_wtr, "name");
 					jsonw_string(json_wtr, sym_name);
 					jsonw_name(json_wtr, "insns");
 				} else {
+					if (func_sig[0] != '\0')
+						printf("%s:\n", func_sig);
 					printf("%s:\n", sym_name);
 				}
 
@@ -685,12 +737,14 @@ static int do_dump(int argc, char **argv)
 	free(buf);
 	free(func_ksyms);
 	free(func_lens);
+	free(func_types);
 	return 0;
 
 err_free:
 	free(buf);
 	free(func_ksyms);
 	free(func_lens);
+	free(func_types);
 	return -1;
 }
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 05/13] bpf: get better bpf_prog ksyms based on btf func type_id
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team

This patch added interface to load a program with the following
additional information:
   . prog_btf_fd
   . func_info and func_info_len
where func_info will provides function range and type_id
corresponding to each function.

If verifier agrees with function range provided by the user,
the bpf_prog ksym for each function will use the func name
provided in the type_id, which is supposed to provide better
encoding as it is not limited by 16 bytes program name
limitation and this is better for bpf program which contains
multiple subprograms.

The bpf_prog_info interface is also extended to
return btf_id and jited_func_types, so user spaces can
print out the function prototype for each jited function.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          |  2 +
 include/linux/bpf_verifier.h |  1 +
 include/linux/btf.h          |  2 +
 include/uapi/linux/bpf.h     | 11 +++++
 kernel/bpf/btf.c             |  4 +-
 kernel/bpf/core.c            | 13 ++++++
 kernel/bpf/syscall.c         | 86 +++++++++++++++++++++++++++++++++++-
 kernel/bpf/verifier.c        | 46 +++++++++++++++++++
 8 files changed, 162 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e60fff48288b..a99e038ce9c4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -308,6 +308,8 @@ struct bpf_prog_aux {
 	void *security;
 #endif
 	struct bpf_prog_offload *offload;
+	struct btf *btf;
+	u32 type_id; /* type id for this prog/func */
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9e8056ec20fa..e84782ec50ac 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -201,6 +201,7 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 struct bpf_subprog_info {
 	u32 start; /* insn idx of function entry point */
 	u16 stack_depth; /* max. stack depth used by this function */
+	u32 type_id; /* btf type_id for this subprog */
 };
 
 /* single container for all structs
diff --git a/include/linux/btf.h b/include/linux/btf.h
index e076c4697049..7f2c0a4a45ea 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -46,5 +46,7 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m);
 int btf_get_fd_by_id(u32 id);
 u32 btf_id(const struct btf *btf);
+const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
+const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 
 #endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f9187b41dff6..7ebbf4f06a65 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -332,6 +332,9 @@ union bpf_attr {
 		 * (context accesses, allowed helpers, etc).
 		 */
 		__u32		expected_attach_type;
+		__u32		prog_btf_fd;	/* fd pointing to BTF type data */
+		__u32		func_info_len;	/* func_info length */
+		__aligned_u64	func_info;	/* func type info */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -2585,6 +2588,9 @@ struct bpf_prog_info {
 	__u32 nr_jited_func_lens;
 	__aligned_u64 jited_ksyms;
 	__aligned_u64 jited_func_lens;
+	__u32 btf_id;
+	__u32 nr_jited_func_types;
+	__aligned_u64 jited_func_types;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -2896,4 +2902,9 @@ struct bpf_flow_keys {
 	};
 };
 
+struct bpf_func_info {
+	__u32	insn_offset;
+	__u32	type_id;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 763f8e06bc91..13e83f82374b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -471,7 +471,7 @@ static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
 	return !*src;
 }
 
-static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
+const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 {
 	if (!offset)
 		return "(anon)";
@@ -481,7 +481,7 @@ static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 		return "(invalid-name-offset)";
 }
 
-static const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
+const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
 {
 	if (type_id > btf->nr_types)
 		return NULL;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index defcf4df6d91..4c4d414e030a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -21,12 +21,14 @@
  * Kris Katterjohn - Added many additional checks in bpf_check_classic()
  */
 
+#include <uapi/linux/btf.h>
 #include <linux/filter.h>
 #include <linux/skbuff.h>
 #include <linux/vmalloc.h>
 #include <linux/random.h>
 #include <linux/moduleloader.h>
 #include <linux/bpf.h>
+#include <linux/btf.h>
 #include <linux/frame.h>
 #include <linux/rbtree_latch.h>
 #include <linux/kallsyms.h>
@@ -387,6 +389,8 @@ bpf_get_prog_addr_region(const struct bpf_prog *prog,
 static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
 {
 	const char *end = sym + KSYM_NAME_LEN;
+	const struct btf_type *type;
+	const char *func_name;
 
 	BUILD_BUG_ON(sizeof("bpf_prog_") +
 		     sizeof(prog->tag) * 2 +
@@ -401,6 +405,15 @@ static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
 
 	sym += snprintf(sym, KSYM_NAME_LEN, "bpf_prog_");
 	sym  = bin2hex(sym, prog->tag, sizeof(prog->tag));
+
+	/* prog->aux->name will be ignored if full btf name is available */
+	if (prog->aux->btf) {
+		type = btf_type_by_id(prog->aux->btf, prog->aux->type_id);
+		func_name = btf_name_by_offset(prog->aux->btf, type->name_off);
+		snprintf(sym, (size_t)(end - sym), "_%s", func_name);
+		return;
+	}
+
 	if (prog->aux->name[0])
 		snprintf(sym, (size_t)(end - sym), "_%s", prog->aux->name);
 	else
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f4ecd6ed2252..f8ea6fe5a1a1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -9,6 +9,7 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  * General Public License for more details.
  */
+#include <uapi/linux/btf.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
@@ -1133,6 +1134,7 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog, do_idr_lock);
 		bpf_prog_kallsyms_del_all(prog);
+		btf_put(prog->aux->btf);
 
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 	}
@@ -1356,8 +1358,47 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
 	}
 }
 
+static int prog_check_btf(const struct bpf_prog *prog, const struct btf *btf,
+			  union bpf_attr *attr)
+{
+	struct bpf_func_info __user *uinfo, info;
+	const struct btf_type *type;
+	int i, nfuncs, usize;
+	u32 prev_offset;
+
+	usize = sizeof(struct bpf_func_info);
+	if (attr->func_info_len % usize)
+		return -EINVAL;
+
+	/* func_info section should have increasing and valid insn_offset
+	 * and type should be BTF_KIND_FUNC.
+	 */
+	nfuncs = attr->func_info_len / usize;
+	uinfo = u64_to_user_ptr(attr->func_info);
+	for (i = 0; i < nfuncs; i++) {
+		if (copy_from_user(&info, uinfo, usize))
+			return -EFAULT;
+
+		type = btf_type_by_id(btf, info.type_id);
+		if (!type || BTF_INFO_KIND(type->info) != BTF_KIND_FUNC)
+			return -EINVAL;
+
+		if (i == 0) {
+			if (info.insn_offset)
+				return -EINVAL;
+			prev_offset = 0;
+		} else if (info.insn_offset < prev_offset) {
+			return -EINVAL;
+		}
+
+		prev_offset = info.insn_offset;
+	}
+
+	return 0;
+}
+
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD expected_attach_type
+#define	BPF_PROG_LOAD_LAST_FIELD func_info
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -1444,6 +1485,24 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err)
 		goto free_prog;
 
+	if (attr->func_info_len) {
+		struct btf *btf;
+
+		btf = btf_get_by_fd(attr->prog_btf_fd);
+		if (IS_ERR(btf)) {
+			err = PTR_ERR(btf);
+			goto free_prog;
+		}
+
+		err = prog_check_btf(prog, btf, attr);
+		if (err) {
+			btf_put(btf);
+			goto free_prog;
+		}
+
+		prog->aux->btf = btf;
+	}
+
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
 	if (err < 0)
@@ -1476,6 +1535,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	bpf_prog_kallsyms_del_subprogs(prog);
 	free_used_maps(prog->aux);
 free_prog:
+	btf_put(prog->aux->btf);
 	bpf_prog_uncharge_memlock(prog);
 free_prog_sec:
 	security_bpf_prog_free(prog->aux);
@@ -2121,6 +2181,30 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 		}
 	}
 
+	if (prog->aux->btf) {
+		info.btf_id = btf_id(prog->aux->btf);
+
+		ulen = info.nr_jited_func_types;
+		info.nr_jited_func_types = prog->aux->func_cnt;
+		if (info.nr_jited_func_types && ulen) {
+			if (bpf_dump_raw_ok()) {
+				u32 __user *user_types;
+				u32 func_type, i;
+
+				ulen = min_t(u32, info.nr_jited_func_types,
+					     ulen);
+				user_types = u64_to_user_ptr(info.jited_func_types);
+				for (i = 0; i < ulen; i++) {
+					func_type = prog->aux->func[i]->aux->type_id;
+					if (put_user(func_type, &user_types[i]))
+						return -EFAULT;
+				}
+			} else {
+				info.jited_func_types = 0;
+			}
+		}
+	}
+
 done:
 	if (copy_to_user(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3f93a548a642..01ecfe431cc2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4589,6 +4589,46 @@ static int check_cfg(struct bpf_verifier_env *env)
 	return ret;
 }
 
+static int check_btf_func(struct bpf_prog *prog, struct bpf_verifier_env *env,
+			  union bpf_attr *attr)
+{
+	struct bpf_func_info *data;
+	int i, nfuncs, ret = 0;
+
+	if (!attr->func_info_len)
+		return 0;
+
+	nfuncs = attr->func_info_len / sizeof(struct bpf_func_info);
+	if (env->subprog_cnt != nfuncs) {
+		verbose(env, "number of funcs in func_info doesn't match number of subprogs\n");
+		return -EINVAL;
+	}
+
+	data = kvmalloc(attr->func_info_len, GFP_KERNEL | __GFP_NOWARN);
+	if (!data)
+		return -ENOMEM;
+
+	if (copy_from_user(data, u64_to_user_ptr(attr->func_info),
+			   attr->func_info_len)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+
+	for (i = 0; i < nfuncs; i++) {
+		if (env->subprog_info[i].start != data[i].insn_offset) {
+			verbose(env, "func_info BTF section doesn't match subprog layout in BPF program\n");
+			ret = -EINVAL;
+			goto cleanup;
+		}
+		env->subprog_info[i].type_id = data[i].type_id;
+	}
+
+	prog->aux->type_id = data[0].type_id;
+cleanup:
+	kvfree(data);
+	return ret;
+}
+
 /* check %cur's range satisfies %old's */
 static bool range_within(struct bpf_reg_state *old,
 			 struct bpf_reg_state *cur)
@@ -5873,6 +5913,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
 		func[i]->jit_requested = 1;
+		func[i]->aux->btf = prog->aux->btf;
+		func[i]->aux->type_id = env->subprog_info[i].type_id;
 		func[i] = bpf_int_jit_compile(func[i]);
 		if (!func[i]->jited) {
 			err = -ENOTSUPP;
@@ -6307,6 +6349,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if (ret < 0)
 		goto skip_full_check;
 
+	ret = check_btf_func(env->prog, env, attr);
+	if (ret < 0)
+		goto skip_full_check;
+
 	ret = do_check(env);
 	if (env->cur_state) {
 		free_verifier_state(env->cur_state, true);
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 09/13] tools/bpf: add support to read .BTF.ext sections
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233042.1072050-1-yhs@fb.com>

The .BTF section is already available to encode types.
These types can be used for map
pretty print. The whole .BTF will be passed to the
kernel as well for which kernel can verify and return
to the user space for pretty print etc.

The llvm patch at https://reviews.llvm.org/D53261
will generate .BTF section and one more section .BTF.ext.
The .BTF.ext section encodes function type
information and line information. Note that
this patch set only supports function type info.
The functionality is implemented in libbpf.

The .BTF section can be directly loaded into the
kernel, and the .BTF.ext section cannot. The loader
may need to do some relocation and merging,
similar to merging multiple code sections, before
loading into the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/btf.c    | 232 +++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h    |  48 +++++++++
 tools/lib/bpf/libbpf.c |  53 +++++++++-
 3 files changed, 329 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 33095fc1860b..4748e0bacd2b 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -37,6 +37,11 @@ struct btf {
 	int fd;
 };
 
+struct btf_ext {
+	void *func_info;
+	__u32 func_info_len;
+};
+
 static int btf_add_type(struct btf *btf, struct btf_type *t)
 {
 	if (btf->types_size - btf->nr_types < 2) {
@@ -397,3 +402,230 @@ const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
 	else
 		return NULL;
 }
+
+static int btf_ext_validate_func_info(const struct btf_sec_func_info *sinfo,
+				      __u32 size, btf_print_fn_t err_log)
+{
+	int sec_hdrlen = sizeof(struct btf_sec_func_info);
+	__u32 record_size = sizeof(struct bpf_func_info);
+	__u32 size_left = size, num_records;
+	__u64 total_record_size;
+
+	while (size_left) {
+		if (size_left < sec_hdrlen) {
+			elog("BTF.ext func_info header not found");
+			return -EINVAL;
+		}
+
+		num_records = sinfo->num_func_info;
+		if (num_records == 0) {
+			elog("incorrect BTF.ext num_func_info");
+			return -EINVAL;
+		}
+
+		total_record_size = sec_hdrlen +
+				    (__u64)num_records * record_size;
+		if (size_left < total_record_size) {
+			elog("incorrect BTF.ext num_func_info");
+			return -EINVAL;
+		}
+
+		size_left -= total_record_size;
+		sinfo = (void *)sinfo + total_record_size;
+	}
+
+	return 0;
+}
+static int btf_ext_parse_hdr(__u8 *data, __u32 data_size,
+			     btf_print_fn_t err_log)
+{
+	const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
+	const struct btf_sec_func_info *sinfo;
+	__u32 meta_left, last_func_info_pos;
+
+	if (data_size < sizeof(*hdr)) {
+		elog("BTF.ext header not found");
+		return -EINVAL;
+	}
+
+	if (hdr->magic != BTF_MAGIC) {
+		elog("Invalid BTF.ext magic:%x\n", hdr->magic);
+		return -EINVAL;
+	}
+
+	if (hdr->version != BTF_VERSION) {
+		elog("Unsupported BTF.ext version:%u\n", hdr->version);
+		return -ENOTSUP;
+	}
+
+	if (hdr->flags) {
+		elog("Unsupported BTF.ext flags:%x\n", hdr->flags);
+		return -ENOTSUP;
+	}
+
+	meta_left = data_size - sizeof(*hdr);
+	if (!meta_left) {
+		elog("BTF.ext has no data\n");
+		return -EINVAL;
+	}
+
+	if (meta_left < hdr->func_info_off) {
+		elog("Invalid BTF.ext func_info section offset:%u\n",
+		     hdr->func_info_off);
+		return -EINVAL;
+	}
+
+	if (hdr->func_info_off & 0x02) {
+		elog("BTF.ext func_info section is not aligned to 4 bytes\n");
+		return -EINVAL;
+	}
+
+	last_func_info_pos = sizeof(*hdr) + hdr->func_info_off +
+			     hdr->func_info_len;
+	if (last_func_info_pos > data_size) {
+		elog("Invalid BTF.ext func_info section size:%u\n",
+		     hdr->func_info_len);
+		return -EINVAL;
+	}
+
+	sinfo = (const struct btf_sec_func_info *)(data + sizeof(*hdr) +
+						   hdr->func_info_off);
+	return btf_ext_validate_func_info(sinfo, hdr->func_info_len,
+					  err_log);
+}
+
+void btf_ext__free(struct btf_ext *btf_ext)
+{
+	if (!btf_ext)
+		return;
+
+	free(btf_ext->func_info);
+	free(btf_ext);
+}
+
+struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
+{
+	int hdrlen = sizeof(struct btf_ext_header);
+	const struct btf_ext_header *hdr;
+	struct btf_ext *btf_ext;
+	void *fdata;
+	int err;
+
+	err = btf_ext_parse_hdr(data, size, err_log);
+	if (err)
+		return ERR_PTR(err);
+
+	btf_ext = calloc(1, sizeof(struct btf_ext));
+	if (!btf_ext)
+		return ERR_PTR(-ENOMEM);
+
+	hdr = (const struct btf_ext_header *)data;
+	fdata = malloc(hdr->func_info_len);
+	if (!fdata)
+		return ERR_PTR(-ENOMEM);
+
+	memcpy(fdata, data + hdrlen + hdr->func_info_off, hdr->func_info_len);
+	btf_ext->func_info = fdata;
+	btf_ext->func_info_len = hdr->func_info_len;
+
+	return btf_ext;
+}
+
+int btf_ext_reloc_init(struct btf *btf, struct btf_ext *btf_ext,
+		       const char *sec_name, __u32 *btf_fd,
+		       void **func_info, __u32 *func_info_len)
+{
+	int sec_hdrlen = sizeof(struct btf_sec_func_info);
+	int record_len = sizeof(struct bpf_func_info);
+	struct btf_sec_func_info *sinfo;
+	int i, remain_len, records_len;
+	const char *info_sec_name;
+	void *data;
+
+	if (!btf || !btf_ext) {
+		*btf_fd = 0;
+		*func_info = NULL;
+		*func_info_len = 0;
+		return 0;
+	}
+
+	sinfo = btf_ext->func_info;
+	remain_len = btf_ext->func_info_len;
+	*btf_fd = btf__fd(btf);
+
+	while (remain_len > 0) {
+		records_len = sinfo->num_func_info * record_len;
+		info_sec_name = btf__name_by_offset(btf, sinfo->sec_name_off);
+		if (strcmp(info_sec_name, sec_name)) {
+			remain_len -= sec_hdrlen + records_len;
+			sinfo = (void *)sinfo + sec_hdrlen + records_len;
+			continue;
+		}
+
+		data = malloc(records_len);
+		if (!data)
+			return -ENOMEM;
+
+		memcpy(data, sinfo->data, records_len);
+
+		/* adjust the insn_offset, the data in .BTF.ext is
+		 * the actual byte offset, and the kernel expects
+		 * the offset in term of bpf_insn.
+		 */
+		for (i = 0; i < sinfo->num_func_info; i++)
+			((struct bpf_func_info *)data)[i].insn_offset
+				/= sizeof(struct bpf_insn);
+
+		*func_info = data;
+		*func_info_len = records_len;
+		break;
+	}
+
+	return 0;
+}
+
+int btf_ext_reloc(struct btf *btf, struct btf_ext *btf_ext,
+		  const char *sec_name, __u32 insns_cnt,
+		  void **func_info, __u32 *func_info_len)
+{
+	int sec_hdrlen = sizeof(struct btf_sec_func_info);
+	int record_len = sizeof(struct bpf_func_info);
+	__u32 i, remain_len, records_len;
+	struct btf_sec_func_info *sinfo;
+	struct bpf_func_info *record;
+	const char *info_sec_name;
+	__u32 existing_flen;
+	void *data;
+
+	if (!*func_info)
+		return 0;
+
+	sinfo = btf_ext->func_info;
+	remain_len = btf_ext->func_info_len;
+	while (remain_len > 0) {
+		records_len = sinfo->num_func_info * record_len;
+		info_sec_name = btf__name_by_offset(btf, sinfo->sec_name_off);
+		if (strcmp(info_sec_name, sec_name)) {
+			remain_len -= sec_hdrlen + records_len;
+			sinfo = (void *)sinfo + sec_hdrlen + records_len;
+			continue;
+		}
+
+		existing_flen = *func_info_len;
+		data = realloc(*func_info, existing_flen + records_len);
+		if (!data)
+			return -ENOMEM;
+
+		memcpy(data + existing_flen, sinfo->data, records_len);
+		record = data + existing_flen;
+		for (i = 0; i < sinfo->num_func_info; i++)
+			record[i].insn_offset =
+				record[i].insn_offset/sizeof(struct bpf_insn) +
+				insns_cnt;
+		*func_info = data;
+		*func_info_len = existing_flen + records_len;
+		break;
+	}
+
+	return 0;
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index b77e7080f7e7..513e55c1a928 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -11,10 +11,49 @@
 #endif
 
 #define BTF_ELF_SEC ".BTF"
+#define BTF_EXT_ELF_SEC ".BTF.ext"
 
 struct btf;
+struct btf_ext;
 struct btf_type;
 
+/*
+ * The .BTF.ext ELF section layout defined as
+ *   struct btf_ext_header
+ *   func_info and line_info
+ *
+ * The func_info subsection layout:
+ *   struct btf_sec_func_info for section #1
+ *   a list of bpf_func_info records for section #1
+ *     where struct bpf_func_info is defined in include/uapi/linux/bpf.h
+ *   struct btf_sec_func_info for section #2
+ *   a list of bpf_func_info records for section #2
+ *   ......
+ *
+ * NOTE: the line_info in the btf_ext_header below is included in order
+ *       to sync with what llvm generates. line_info support is not
+ *       in linux yet.
+ */
+struct btf_ext_header {
+	__u16	magic;
+	__u8	version;
+	__u8	flags;
+	__u32	hdr_len;
+
+	/* All offsets are in bytes relative to the end of this header */
+	__u32	func_info_off;
+	__u32	func_info_len;
+	__u32	line_info_off;
+	__u32	line_info_len;
+};
+
+struct btf_sec_func_info {
+	__u32	sec_name_off;
+	__u32	num_func_info;
+	/* followed by num_func_info number of bpf_func_info records */
+	__u8	data[0];
+};
+
 typedef int (*btf_print_fn_t)(const char *, ...)
 	__attribute__((format(printf, 1, 2)));
 
@@ -29,4 +68,13 @@ LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__fd(const struct btf *btf);
 LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
 
+struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log);
+void btf_ext__free(struct btf_ext *btf_ext);
+int btf_ext_reloc_init(struct btf *btf, struct btf_ext *btf_ext,
+		       const char *sec_name, __u32 *btf_fd,
+		       void **func_info, __u32 *func_info_len);
+int btf_ext_reloc(struct btf *btf, struct btf_ext *btf_ext,
+		  const char *sec_name, __u32 insns_cnt, void **func_info,
+		  __u32 *func_info_len);
+
 #endif /* __LIBBPF_BTF_H */
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index bd71efcc53be..2ea3b1467d47 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -151,6 +151,9 @@ struct bpf_program {
 	bpf_program_clear_priv_t clear_priv;
 
 	enum bpf_attach_type expected_attach_type;
+	__u32 btf_fd;
+	void *func_info;
+	__u32 func_info_len;
 };
 
 struct bpf_map {
@@ -207,6 +210,7 @@ struct bpf_object {
 	struct list_head list;
 
 	struct btf *btf;
+	struct btf_ext *btf_ext;
 
 	void *priv;
 	bpf_object_clear_priv_t clear_priv;
@@ -783,6 +787,15 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 					   BTF_ELF_SEC, PTR_ERR(obj->btf));
 				obj->btf = NULL;
 			}
+		} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
+			obj->btf_ext = btf_ext__new(data->d_buf, data->d_size,
+						    __pr_debug);
+			if (IS_ERR(obj->btf_ext)) {
+				pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
+					   BTF_EXT_ELF_SEC,
+					   PTR_ERR(obj->btf_ext));
+				obj->btf_ext = NULL;
+			}
 		} else if (sh.sh_type == SHT_SYMTAB) {
 			if (obj->efile.symbols) {
 				pr_warning("bpf: multiple SYMTAB in %s\n",
@@ -1166,6 +1179,7 @@ bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
 	struct bpf_insn *insn, *new_insn;
 	struct bpf_program *text;
 	size_t new_cnt;
+	int err;
 
 	if (relo->type != RELO_CALL)
 		return -LIBBPF_ERRNO__RELOC;
@@ -1188,6 +1202,16 @@ bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
 			pr_warning("oom in prog realloc\n");
 			return -ENOMEM;
 		}
+
+		err = btf_ext_reloc(obj->btf, obj->btf_ext, text->section_name,
+				    prog->insns_cnt, &prog->func_info,
+				    &prog->func_info_len);
+		if (err) {
+			pr_warning("error in btf_ext_reloc for sec %s\n",
+				   text->section_name);
+			return err;
+		}
+
 		memcpy(new_insn + prog->insns_cnt, text->insns,
 		       text->insns_cnt * sizeof(*insn));
 		prog->insns = new_insn;
@@ -1207,7 +1231,19 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 {
 	int i, err;
 
-	if (!prog || !prog->reloc_desc)
+	if (!prog)
+		return 0;
+
+	err = btf_ext_reloc_init(obj->btf, obj->btf_ext, prog->section_name,
+				 &prog->btf_fd, &prog->func_info,
+				 &prog->func_info_len);
+	if (err) {
+		pr_warning("err in btf_ext_reloc_init for sec %s\n",
+			   prog->section_name);
+		return err;
+	}
+
+	if (!prog->reloc_desc)
 		return 0;
 
 	for (i = 0; i < prog->nr_reloc; i++) {
@@ -1297,7 +1333,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 static int
 load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
 	     const char *name, struct bpf_insn *insns, int insns_cnt,
-	     char *license, __u32 kern_version, int *pfd, int prog_ifindex)
+	     char *license, __u32 kern_version, int *pfd, int prog_ifindex,
+	     u32 btf_fd, struct bpf_func_info *func_info, u32 func_info_len)
 {
 	struct bpf_load_program_attr load_attr;
 	char *cp, errmsg[STRERR_BUFSIZE];
@@ -1313,6 +1350,9 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
 	load_attr.license = license;
 	load_attr.kern_version = kern_version;
 	load_attr.prog_ifindex = prog_ifindex;
+	load_attr.prog_btf_fd = btf_fd;
+	load_attr.func_info = func_info;
+	load_attr.func_info_len = func_info_len;
 
 	if (!load_attr.insns || !load_attr.insns_cnt)
 		return -EINVAL;
@@ -1396,7 +1436,9 @@ bpf_program__load(struct bpf_program *prog,
 		err = load_program(prog->type, prog->expected_attach_type,
 				   prog->name, prog->insns, prog->insns_cnt,
 				   license, kern_version, &fd,
-				   prog->prog_ifindex);
+				   prog->prog_ifindex,
+				   prog->btf_fd, prog->func_info,
+				   prog->func_info_len);
 		if (!err)
 			prog->instances.fds[0] = fd;
 		goto out;
@@ -1428,7 +1470,9 @@ bpf_program__load(struct bpf_program *prog,
 				   prog->name, result.new_insn_ptr,
 				   result.new_insn_cnt,
 				   license, kern_version, &fd,
-				   prog->prog_ifindex);
+				   prog->prog_ifindex,
+				   prog->btf_fd, prog->func_info,
+				   prog->func_info_len);
 
 		if (err) {
 			pr_warning("Loading the %dth instance of program '%s' failed\n",
@@ -1844,6 +1888,7 @@ void bpf_object__close(struct bpf_object *obj)
 	bpf_object__elf_finish(obj);
 	bpf_object__unload(obj);
 	btf__free(obj->btf);
+	btf_ext__free(obj->btf_ext);
 
 	for (i = 0; i < obj->nr_maps; i++) {
 		zfree(&obj->maps[i].name);
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 01/13] bpf: btf: Break up btf_type_is_void()
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233026.1071519-1-yhs@fb.com>

This patch breaks up btf_type_is_void() into
btf_type_is_void() and btf_type_is_fwd().

It also adds btf_type_nosize() to better describe it is
testing a type has nosize info.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/btf.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 378cef70341c..be406d8906ce 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -306,15 +306,22 @@ static bool btf_type_is_modifier(const struct btf_type *t)
 
 static bool btf_type_is_void(const struct btf_type *t)
 {
-	/* void => no type and size info.
-	 * Hence, FWD is also treated as void.
-	 */
-	return t == &btf_void || BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
+	return t == &btf_void;
+}
+
+static bool btf_type_is_fwd(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
+}
+
+static bool btf_type_nosize(const struct btf_type *t)
+{
+	return btf_type_is_void(t) || btf_type_is_fwd(t);
 }
 
-static bool btf_type_is_void_or_null(const struct btf_type *t)
+static bool btf_type_nosize_or_null(const struct btf_type *t)
 {
-	return !t || btf_type_is_void(t);
+	return !t || btf_type_nosize(t);
 }
 
 /* union is only a special case of struct:
@@ -826,7 +833,7 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
 	u32 size = 0;
 
 	size_type = btf_type_by_id(btf, size_type_id);
-	if (btf_type_is_void_or_null(size_type))
+	if (btf_type_nosize_or_null(size_type))
 		return NULL;
 
 	if (btf_type_has_size(size_type)) {
@@ -842,7 +849,7 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
 		size = btf->resolved_sizes[size_type_id];
 		size_type_id = btf->resolved_ids[size_type_id];
 		size_type = btf_type_by_id(btf, size_type_id);
-		if (btf_type_is_void(size_type))
+		if (btf_type_nosize_or_null(size_type))
 			return NULL;
 	}
 
@@ -1164,7 +1171,7 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 	}
 
 	/* "typedef void new_void", "const void"...etc */
-	if (btf_type_is_void(next_type))
+	if (btf_type_is_void(next_type) || btf_type_is_fwd(next_type))
 		goto resolved;
 
 	if (!env_type_is_resolve_sink(env, next_type) &&
@@ -1178,7 +1185,7 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 	 * pretty print).
 	 */
 	if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
-	    !btf_type_is_void(btf_type_id_resolve(btf, &next_type_id))) {
+	    !btf_type_nosize(btf_type_id_resolve(btf, &next_type_id))) {
 		btf_verifier_log_type(env, v->t, "Invalid type_id");
 		return -EINVAL;
 	}
@@ -1205,7 +1212,7 @@ static int btf_ptr_resolve(struct btf_verifier_env *env,
 	}
 
 	/* "void *" */
-	if (btf_type_is_void(next_type))
+	if (btf_type_is_void(next_type) || btf_type_is_fwd(next_type))
 		goto resolved;
 
 	if (!env_type_is_resolve_sink(env, next_type) &&
@@ -1235,7 +1242,7 @@ static int btf_ptr_resolve(struct btf_verifier_env *env,
 	}
 
 	if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
-	    !btf_type_is_void(btf_type_id_resolve(btf, &next_type_id))) {
+	    !btf_type_nosize(btf_type_id_resolve(btf, &next_type_id))) {
 		btf_verifier_log_type(env, v->t, "Invalid type_id");
 		return -EINVAL;
 	}
@@ -1396,7 +1403,7 @@ static int btf_array_resolve(struct btf_verifier_env *env,
 	/* Check array->index_type */
 	index_type_id = array->index_type;
 	index_type = btf_type_by_id(btf, index_type_id);
-	if (btf_type_is_void_or_null(index_type)) {
+	if (btf_type_nosize_or_null(index_type)) {
 		btf_verifier_log_type(env, v->t, "Invalid index");
 		return -EINVAL;
 	}
@@ -1415,7 +1422,7 @@ static int btf_array_resolve(struct btf_verifier_env *env,
 	/* Check array->type */
 	elem_type_id = array->type;
 	elem_type = btf_type_by_id(btf, elem_type_id);
-	if (btf_type_is_void_or_null(elem_type)) {
+	if (btf_type_nosize_or_null(elem_type)) {
 		btf_verifier_log_type(env, v->t,
 				      "Invalid elem");
 		return -EINVAL;
@@ -1615,7 +1622,7 @@ static int btf_struct_resolve(struct btf_verifier_env *env,
 		const struct btf_type *member_type = btf_type_by_id(env->btf,
 								member_type_id);
 
-		if (btf_type_is_void_or_null(member_type)) {
+		if (btf_type_nosize_or_null(member_type)) {
 			btf_verifier_log_member(env, v->t, member,
 						"Invalid member");
 			return -EINVAL;
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 11/13] tools/bpf: refactor to implement btf_get_from_id() in lib/bpf
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233101.1072913-1-yhs@fb.com>

The function get_btf() is implemented in tools/bpf/bpftool/map.c
to get a btf structure given a map_info. This patch
refactored this function to be function btf_get_from_id()
in tools/lib/bpf so that it can be used later.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/bpf/bpftool/map.c | 68 ++--------------------------------------
 tools/lib/bpf/btf.c     | 69 +++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h     | 18 ++++++-----
 3 files changed, 81 insertions(+), 74 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7bf38f0e152e..1b8a75fa0471 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -215,70 +215,6 @@ static int do_dump_btf(const struct btf_dumper *d,
 	return ret;
 }
 
-static int get_btf(struct bpf_map_info *map_info, struct btf **btf)
-{
-	struct bpf_btf_info btf_info = { 0 };
-	__u32 len = sizeof(btf_info);
-	__u32 last_size;
-	int btf_fd;
-	void *ptr;
-	int err;
-
-	err = 0;
-	*btf = NULL;
-	btf_fd = bpf_btf_get_fd_by_id(map_info->btf_id);
-	if (btf_fd < 0)
-		return 0;
-
-	/* we won't know btf_size until we call bpf_obj_get_info_by_fd(). so
-	 * let's start with a sane default - 4KiB here - and resize it only if
-	 * bpf_obj_get_info_by_fd() needs a bigger buffer.
-	 */
-	btf_info.btf_size = 4096;
-	last_size = btf_info.btf_size;
-	ptr = malloc(last_size);
-	if (!ptr) {
-		err = -ENOMEM;
-		goto exit_free;
-	}
-
-	bzero(ptr, last_size);
-	btf_info.btf = ptr_to_u64(ptr);
-	err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
-
-	if (!err && btf_info.btf_size > last_size) {
-		void *temp_ptr;
-
-		last_size = btf_info.btf_size;
-		temp_ptr = realloc(ptr, last_size);
-		if (!temp_ptr) {
-			err = -ENOMEM;
-			goto exit_free;
-		}
-		ptr = temp_ptr;
-		bzero(ptr, last_size);
-		btf_info.btf = ptr_to_u64(ptr);
-		err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
-	}
-
-	if (err || btf_info.btf_size > last_size) {
-		err = errno;
-		goto exit_free;
-	}
-
-	*btf = btf__new((__u8 *)btf_info.btf, btf_info.btf_size, NULL);
-	if (IS_ERR(*btf)) {
-		err = PTR_ERR(*btf);
-		*btf = NULL;
-	}
-
-exit_free:
-	close(btf_fd);
-	free(ptr);
-
-	return err;
-}
-
 static json_writer_t *get_btf_writer(void)
 {
 	json_writer_t *jw = jsonw_new(stdout);
@@ -765,7 +701,7 @@ static int do_dump(int argc, char **argv)
 
 	prev_key = NULL;
 
-	err = get_btf(&info, &btf);
+	err = btf_get_from_id(info.btf_id, &btf);
 	if (err) {
 		p_err("failed to get btf");
 		goto exit_free;
@@ -909,7 +845,7 @@ static int do_lookup(int argc, char **argv)
 	}
 
 	/* here means bpf_map_lookup_elem() succeeded */
-	err = get_btf(&info, &btf);
+	err = btf_get_from_id(info.btf_id, &btf);
 	if (err) {
 		p_err("failed to get btf");
 		goto exit_free;
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 4748e0bacd2b..ab654628e966 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -42,6 +42,11 @@ struct btf_ext {
 	__u32 func_info_len;
 };
 
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64) (unsigned long) ptr;
+}
+
 static int btf_add_type(struct btf *btf, struct btf_type *t)
 {
 	if (btf->types_size - btf->nr_types < 2) {
@@ -403,6 +408,70 @@ const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
 		return NULL;
 }
 
+int btf_get_from_id(__u32 id, struct btf **btf)
+{
+	struct bpf_btf_info btf_info = { 0 };
+	__u32 len = sizeof(btf_info);
+	__u32 last_size;
+	int btf_fd;
+	void *ptr;
+	int err;
+
+	err = 0;
+	*btf = NULL;
+	btf_fd = bpf_btf_get_fd_by_id(id);
+	if (btf_fd < 0)
+		return 0;
+
+	/* we won't know btf_size until we call bpf_obj_get_info_by_fd(). so
+	 * let's start with a sane default - 4KiB here - and resize it only if
+	 * bpf_obj_get_info_by_fd() needs a bigger buffer.
+	 */
+	btf_info.btf_size = 4096;
+	last_size = btf_info.btf_size;
+	ptr = malloc(last_size);
+	if (!ptr) {
+		err = -ENOMEM;
+		goto exit_free;
+	}
+
+	bzero(ptr, last_size);
+	btf_info.btf = ptr_to_u64(ptr);
+	err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
+
+	if (!err && btf_info.btf_size > last_size) {
+		void *temp_ptr;
+
+		last_size = btf_info.btf_size;
+		temp_ptr = realloc(ptr, last_size);
+		if (!temp_ptr) {
+			err = -ENOMEM;
+			goto exit_free;
+		}
+		ptr = temp_ptr;
+		bzero(ptr, last_size);
+		btf_info.btf = ptr_to_u64(ptr);
+		err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
+	}
+
+	if (err || btf_info.btf_size > last_size) {
+		err = errno;
+		goto exit_free;
+	}
+
+	*btf = btf__new((__u8 *)btf_info.btf, btf_info.btf_size, NULL);
+	if (IS_ERR(*btf)) {
+		err = PTR_ERR(*btf);
+		*btf = NULL;
+	}
+
+exit_free:
+	close(btf_fd);
+	free(ptr);
+
+	return err;
+}
+
 static int btf_ext_validate_func_info(const struct btf_sec_func_info *sinfo,
 				      __u32 size, btf_print_fn_t err_log)
 {
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 513e55c1a928..c555decd70f2 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -67,14 +67,16 @@ LIBBPF_API __s64 btf__resolve_size(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__fd(const struct btf *btf);
 LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
+LIBBPF_API int btf_get_from_id(__u32 id, struct btf **btf);
 
-struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log);
-void btf_ext__free(struct btf_ext *btf_ext);
-int btf_ext_reloc_init(struct btf *btf, struct btf_ext *btf_ext,
-		       const char *sec_name, __u32 *btf_fd,
-		       void **func_info, __u32 *func_info_len);
-int btf_ext_reloc(struct btf *btf, struct btf_ext *btf_ext,
-		  const char *sec_name, __u32 insns_cnt, void **func_info,
-		  __u32 *func_info_len);
+LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size,
+					btf_print_fn_t err_log);
+LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
+LIBBPF_API int btf_ext_reloc_init(struct btf *btf, struct btf_ext *btf_ext,
+				  const char *sec_name, __u32 *btf_fd,
+				  void **func_info, __u32 *func_info_len);
+LIBBPF_API int btf_ext_reloc(struct btf *btf, struct btf_ext *btf_ext,
+			     const char *sec_name, __u32 insns_cnt,
+			     void **func_info, __u32 *func_info_len);
 
 #endif /* __LIBBPF_BTF_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 10/13] tools/bpf: do not use pahole if clang/llvm can generate BTF sections
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team

Add additional checks in tools/testing/selftests/bpf and
samples/bpf such that if clang/llvm compiler can generate
BTF sections, do not use pahole.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/Makefile                 | 8 ++++++++
 tools/testing/selftests/bpf/Makefile | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index be0a961450bc..870fe7ee2b69 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -208,12 +208,20 @@ endif
 BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
+BTF_LLVM_PROBE := $(shell echo "int main() { return 0; }" | \
+			  clang -target bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
+			  readelf -S ./llvm_btf_verify.o | grep BTF; \
+			  /bin/rm -f ./llvm_btf_verify.o)
 
+ifneq ($(BTF_LLVM_PROBE),)
+	EXTRA_CFLAGS += -g
+else
 ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
 	EXTRA_CFLAGS += -g
 	LLC_FLAGS += -mattr=dwarfris
 	DWARF2BTF = y
 endif
+endif
 
 # Trick to allow make to be run from this directory
 all:
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index d99dd6fc3fbe..8d5612724db8 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -121,7 +121,14 @@ $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
 BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
+BTF_LLVM_PROBE := $(shell echo "int main() { return 0; }" | \
+			  clang -target bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
+			  readelf -S ./llvm_btf_verify.o | grep BTF; \
+			  /bin/rm -f ./llvm_btf_verify.o)
 
+ifneq ($(BTF_LLVM_PROBE),)
+	CLANG_FLAGS += -g
+else
 ifneq ($(BTF_LLC_PROBE),)
 ifneq ($(BTF_PAHOLE_PROBE),)
 ifneq ($(BTF_OBJCOPY_PROBE),)
@@ -131,6 +138,7 @@ ifneq ($(BTF_OBJCOPY_PROBE),)
 endif
 endif
 endif
+endif
 
 $(OUTPUT)/%.o: %.c
 	$(CLANG) $(CLANG_FLAGS) \
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233026.1071519-1-yhs@fb.com>

This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
support to the type section. BTF_KIND_FUNC_PROTO is used
to specify the type of a function pointer. With this,
BTF has a complete set of C types (except float).

BTF_KIND_FUNC is used to specify the signature of a
defined subprogram. BTF_KIND_FUNC_PROTO can be referenced
by another type, e.g., a pointer type, and BTF_KIND_FUNC
type cannot be referenced by another type.

For both BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO types,
the func return type is in t->type (where t is a
"struct btf_type" object). The func args are an array of
u32s immediately following object "t".

As a concrete example, for the C program below,
  $ cat test.c
  int foo(int (*bar)(int)) { return bar(5); }
with LLVM patch https://reviews.llvm.org/D53261
in Debug mode, we have
  $ clang -target bpf -g -O2 -mllvm -debug-only=btf -c test.c
  Type Table:
  [1] FUNC NameOff=1 Info=0x0c000001 Size/Type=2
          ParamType=3
  [2] INT NameOff=11 Info=0x01000000 Size/Type=4
          Desc=0x01000020
  [3] PTR NameOff=0 Info=0x02000000 Size/Type=4
  [4] FUNC_PROTO NameOff=0 Info=0x0d000001 Size/Type=2
          ParamType=2

  String Table:
  0 :
  1 : foo
  5 : .text
  11 : int
  15 : test.c
  22 : int foo(int (*bar)(int)) { return bar(5); }

  FuncInfo Table:
  SecNameOff=5
          InsnOffset=<Omitted> TypeId=1

  ...

(Eventually we shall have bpftool to dump btf information
 like the above.)

Function "foo" has a FUNC type (type_id = 1).
The parameter of "foo" has type_id 3 which is PTR->FUNC_PROTO,
where FUNC_PROTO refers to function pointer "bar".

In FuncInfo Table, for section .text, the function,
with to-be-determined offset (marked as <Omitted>),
has type_id=1 which refers to a FUNC type.
This way, the function signature is
available to both kernel and user space.
Here, the insn offset is not available during the dump time
as relocation is resolved pretty late in the compilation process.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/btf.h |   9 +-
 kernel/bpf/btf.c         | 280 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 253 insertions(+), 36 deletions(-)

diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index 972265f32871..63f8500e6f34 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -40,7 +40,8 @@ struct btf_type {
 	/* "size" is used by INT, ENUM, STRUCT and UNION.
 	 * "size" tells the size of the type it is describing.
 	 *
-	 * "type" is used by PTR, TYPEDEF, VOLATILE, CONST and RESTRICT.
+	 * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
+	 * FUNC and FUNC_PROTO.
 	 * "type" is a type_id referring to another type.
 	 */
 	union {
@@ -64,8 +65,10 @@ struct btf_type {
 #define BTF_KIND_VOLATILE	9	/* Volatile	*/
 #define BTF_KIND_CONST		10	/* Const	*/
 #define BTF_KIND_RESTRICT	11	/* Restrict	*/
-#define BTF_KIND_MAX		11
-#define NR_BTF_KINDS		12
+#define BTF_KIND_FUNC		12	/* Function	*/
+#define BTF_KIND_FUNC_PROTO	13	/* Function Prototype	*/
+#define BTF_KIND_MAX		13
+#define NR_BTF_KINDS		14
 
 /* For some specific BTF_KIND, "struct btf_type" is immediately
  * followed by extra data.
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index be406d8906ce..763f8e06bc91 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5,6 +5,7 @@
 #include <uapi/linux/types.h>
 #include <linux/seq_file.h>
 #include <linux/compiler.h>
+#include <linux/ctype.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/anon_inodes.h>
@@ -259,6 +260,8 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
 	[BTF_KIND_VOLATILE]	= "VOLATILE",
 	[BTF_KIND_CONST]	= "CONST",
 	[BTF_KIND_RESTRICT]	= "RESTRICT",
+	[BTF_KIND_FUNC]		= "FUNC",
+	[BTF_KIND_FUNC_PROTO]	= "FUNC_PROTO",
 };
 
 struct btf_kind_operations {
@@ -281,6 +284,9 @@ struct btf_kind_operations {
 static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS];
 static struct btf_type btf_void;
 
+static int btf_resolve(struct btf_verifier_env *env,
+		       const struct btf_type *t, u32 type_id);
+
 static bool btf_type_is_modifier(const struct btf_type *t)
 {
 	/* Some of them is not strictly a C modifier
@@ -314,9 +320,20 @@ static bool btf_type_is_fwd(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
 }
 
+static bool btf_type_is_func(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC;
+}
+
+static bool btf_type_is_func_proto(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
+}
+
 static bool btf_type_nosize(const struct btf_type *t)
 {
-	return btf_type_is_void(t) || btf_type_is_fwd(t);
+	return btf_type_is_void(t) || btf_type_is_fwd(t) ||
+	       btf_type_is_func(t) || btf_type_is_func_proto(t);
 }
 
 static bool btf_type_nosize_or_null(const struct btf_type *t)
@@ -433,6 +450,27 @@ static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
 		offset < btf->hdr.str_len;
 }
 
+static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
+{
+	/* offset must be valid */
+	const char *src = &btf->strings[offset];
+	const char *src_limit;
+
+	if (!isalpha(*src) && *src != '_')
+		return false;
+
+	/* set a limit on identifier length */
+	src_limit = src + KSYM_NAME_LEN;
+	src++;
+	while (*src && src < src_limit) {
+		if (!isalnum(*src) && *src != '_')
+			return false;
+		src++;
+	}
+
+	return !*src;
+}
+
 static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 {
 	if (!offset)
@@ -747,7 +785,9 @@ static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
 		/* int, enum or void is a sink */
 		return !btf_type_needs_resolve(next_type);
 	case RESOLVE_PTR:
-		/* int, enum, void, struct or array is a sink for ptr */
+		/* int, enum, void, struct, array or func_ptoto is a sink
+		 * for ptr
+		 */
 		return !btf_type_is_modifier(next_type) &&
 			!btf_type_is_ptr(next_type);
 	case RESOLVE_STRUCT_OR_ARRAY:
@@ -1170,6 +1210,14 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
+	if (btf_type_is_func_proto(next_type)) {
+		if (env->resolve_mode == RESOLVE_PTR)
+			goto resolved;
+
+		btf_verifier_log_type(env, v->t, "Invalid type_id");
+		return -EINVAL;
+	}
+
 	/* "typedef void new_void", "const void"...etc */
 	if (btf_type_is_void(next_type) || btf_type_is_fwd(next_type))
 		goto resolved;
@@ -1185,7 +1233,8 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 	 * pretty print).
 	 */
 	if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
-	    !btf_type_nosize(btf_type_id_resolve(btf, &next_type_id))) {
+	    (!btf_type_nosize(btf_type_id_resolve(btf, &next_type_id)) ||
+	     btf_type_is_func(next_type))) {
 		btf_verifier_log_type(env, v->t, "Invalid type_id");
 		return -EINVAL;
 	}
@@ -1212,7 +1261,8 @@ static int btf_ptr_resolve(struct btf_verifier_env *env,
 	}
 
 	/* "void *" */
-	if (btf_type_is_void(next_type) || btf_type_is_fwd(next_type))
+	if (btf_type_is_void(next_type) || btf_type_is_fwd(next_type) ||
+	    btf_type_is_func_proto(next_type))
 		goto resolved;
 
 	if (!env_type_is_resolve_sink(env, next_type) &&
@@ -1242,7 +1292,8 @@ static int btf_ptr_resolve(struct btf_verifier_env *env,
 	}
 
 	if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
-	    !btf_type_nosize(btf_type_id_resolve(btf, &next_type_id))) {
+	    (!btf_type_nosize(btf_type_id_resolve(btf, &next_type_id)) ||
+	     btf_type_is_func(next_type))) {
 		btf_verifier_log_type(env, v->t, "Invalid type_id");
 		return -EINVAL;
 	}
@@ -1550,6 +1601,14 @@ static s32 btf_struct_check_meta(struct btf_verifier_env *env,
 			return -EINVAL;
 		}
 
+		if (member->name_off &&
+		    !btf_name_valid_identifier(btf, member->name_off)) {
+			btf_verifier_log_member(env, t, member,
+						"Invalid member name:%s",
+						btf_name_by_offset(btf, member->name_off));
+			return -EINVAL;
+		}
+
 		/* A member cannot be in type void */
 		if (!member->type || !BTF_TYPE_ID_VALID(member->type)) {
 			btf_verifier_log_member(env, t, member,
@@ -1787,6 +1846,142 @@ static struct btf_kind_operations enum_ops = {
 	.seq_show = btf_enum_seq_show,
 };
 
+static s32 btf_func_check_meta(struct btf_verifier_env *env,
+			       const struct btf_type *t,
+			       u32 meta_left)
+{
+	u32 meta_needed = btf_type_vlen(t) * sizeof(u32);
+
+	if (meta_left < meta_needed) {
+		btf_verifier_log_basic(env, t,
+				       "meta_left:%u meta_needed:%u",
+				       meta_left, meta_needed);
+		return -EINVAL;
+	}
+
+	btf_verifier_log_type(env, t, NULL);
+
+	return meta_needed;
+}
+
+static void btf_func_log(struct btf_verifier_env *env,
+			 const struct btf_type *t)
+{
+	u16 nr_args = btf_type_vlen(t), i;
+	u32 *args = (u32 *)(t + 1);
+
+	btf_verifier_log(env, "return=%u args=(", t->type);
+	if (!nr_args) {
+		btf_verifier_log(env, "void");
+		goto done;
+	}
+
+	if (nr_args == 1 && !args[0]) {
+		/* Only one vararg */
+		btf_verifier_log(env, "vararg");
+		goto done;
+	}
+
+	btf_verifier_log(env, "%u", args[0]);
+	for (i = 1; i < nr_args - 1; i++)
+		btf_verifier_log(env, ", %u", args[i]);
+	if (nr_args > 1) {
+		u32 last_arg = args[nr_args - 1];
+
+		if (last_arg)
+			btf_verifier_log(env, ", %u", last_arg);
+		else
+			btf_verifier_log(env, ", vararg");
+	}
+
+done:
+	btf_verifier_log(env, ")");
+}
+
+static int btf_func_check(struct btf_verifier_env *env,
+			  const struct btf_type *t,
+			  u32 type_id)
+{
+	u16 nr_args = btf_type_vlen(t), i;
+	const struct btf *btf = env->btf;
+	const struct btf_type *ret_type;
+	u32 *args = (u32 *)(t + 1);
+	u32 ret_type_id;
+	int err;
+
+	/* Check func return type which could be "void" (t->type == 0) */
+	ret_type_id = t->type;
+	if (ret_type_id) {
+		/* return type is not "void" */
+		ret_type = btf_type_by_id(btf, ret_type_id);
+		if (btf_type_needs_resolve(ret_type) &&
+		    !env_type_is_resolved(env, ret_type_id)) {
+			err = btf_resolve(env, ret_type, ret_type_id);
+			if (err)
+				return err;
+		}
+
+		/* Ensure the return type is a type that has a size */
+		if (!btf_type_id_size(btf, &ret_type_id, NULL)) {
+			btf_verifier_log_type(env, t, "Invalid return type");
+			return -EINVAL;
+		}
+	}
+
+	if (!nr_args)
+		return 0;
+
+	/* Last func arg type_id could be 0 if it is a vararg */
+	if (!args[nr_args - 1])
+		nr_args--;
+
+	err = 0;
+	for (i = 0; i < nr_args; i++) {
+		const struct btf_type *arg_type;
+		u32 arg_type_id;
+
+		arg_type_id = args[i];
+		arg_type = btf_type_by_id(btf, arg_type_id);
+		if (!arg_type) {
+			btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
+			err = -EINVAL;
+			break;
+		}
+
+		if (btf_type_needs_resolve(arg_type) &&
+		    !env_type_is_resolved(env, arg_type_id)) {
+			err = btf_resolve(env, arg_type, arg_type_id);
+			if (err)
+				break;
+		}
+
+		if (!btf_type_id_size(btf, &arg_type_id, NULL)) {
+			btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
+			err = -EINVAL;
+			break;
+		}
+	}
+
+	return err;
+}
+
+static struct btf_kind_operations func_ops = {
+	.check_meta = btf_func_check_meta,
+	.resolve = btf_df_resolve,
+	/*
+	 * BPF_KIND_FUNC_PROTO cannot be directly referred by
+	 * a struct's member.
+	 *
+	 * It should be a funciton pointer instead.
+	 * (i.e. struct's member -> BTF_KIND_PTR -> BTF_KIND_FUNC_PROTO)
+	 *
+	 * Hence, there is no btf_func_check_member().
+	 */
+	.check_member = btf_df_check_member,
+	.log_details = btf_func_log,
+	.seq_show = btf_df_seq_show,
+};
+
 static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS] = {
 	[BTF_KIND_INT] = &int_ops,
 	[BTF_KIND_PTR] = &ptr_ops,
@@ -1799,6 +1994,8 @@ static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS] = {
 	[BTF_KIND_VOLATILE] = &modifier_ops,
 	[BTF_KIND_CONST] = &modifier_ops,
 	[BTF_KIND_RESTRICT] = &modifier_ops,
+	[BTF_KIND_FUNC] = &func_ops,
+	[BTF_KIND_FUNC_PROTO] = &func_ops,
 };
 
 static s32 btf_check_meta(struct btf_verifier_env *env,
@@ -1870,30 +2067,6 @@ static int btf_check_all_metas(struct btf_verifier_env *env)
 	return 0;
 }
 
-static int btf_resolve(struct btf_verifier_env *env,
-		       const struct btf_type *t, u32 type_id)
-{
-	const struct resolve_vertex *v;
-	int err = 0;
-
-	env->resolve_mode = RESOLVE_TBD;
-	env_stack_push(env, t, type_id);
-	while (!err && (v = env_stack_peak(env))) {
-		env->log_type_id = v->type_id;
-		err = btf_type_ops(v->t)->resolve(env, v);
-	}
-
-	env->log_type_id = type_id;
-	if (err == -E2BIG)
-		btf_verifier_log_type(env, t,
-				      "Exceeded max resolving depth:%u",
-				      MAX_RESOLVE_DEPTH);
-	else if (err == -EEXIST)
-		btf_verifier_log_type(env, t, "Loop detected");
-
-	return err;
-}
-
 static bool btf_resolve_valid(struct btf_verifier_env *env,
 			      const struct btf_type *t,
 			      u32 type_id)
@@ -1927,6 +2100,39 @@ static bool btf_resolve_valid(struct btf_verifier_env *env,
 	return false;
 }
 
+static int btf_resolve(struct btf_verifier_env *env,
+		       const struct btf_type *t, u32 type_id)
+{
+	u32 save_log_type_id = env->log_type_id;
+	const struct resolve_vertex *v;
+	int err = 0;
+
+	env->resolve_mode = RESOLVE_TBD;
+	env_stack_push(env, t, type_id);
+	while (!err && (v = env_stack_peak(env))) {
+		env->log_type_id = v->type_id;
+		err = btf_type_ops(v->t)->resolve(env, v);
+	}
+
+	env->log_type_id = type_id;
+	if (err == -E2BIG) {
+		btf_verifier_log_type(env, t,
+				      "Exceeded max resolving depth:%u",
+				      MAX_RESOLVE_DEPTH);
+	} else if (err == -EEXIST) {
+		btf_verifier_log_type(env, t, "Loop detected");
+	}
+
+	/* Final sanity check */
+	if (!err && !btf_resolve_valid(env, t, type_id)) {
+		btf_verifier_log_type(env, t, "Invalid resolve state");
+		err = -EINVAL;
+	}
+
+	env->log_type_id = save_log_type_id;
+	return err;
+}
+
 static int btf_check_all_types(struct btf_verifier_env *env)
 {
 	struct btf *btf = env->btf;
@@ -1949,10 +2155,18 @@ static int btf_check_all_types(struct btf_verifier_env *env)
 				return err;
 		}
 
-		if (btf_type_needs_resolve(t) &&
-		    !btf_resolve_valid(env, t, type_id)) {
-			btf_verifier_log_type(env, t, "Invalid resolve state");
-			return -EINVAL;
+		if (btf_type_is_func(t) || btf_type_is_func_proto(t)) {
+			if (btf_type_is_func(t) &&
+			    (!btf_name_offset_valid(btf, t->name_off) ||
+			     !btf_name_valid_identifier(btf, t->name_off))) {
+				btf_verifier_log_type(env, t,
+						      "Invalid type_id");
+				return -EINVAL;
+			}
+
+			err = btf_func_check(env, t, type_id);
+			if (err)
+				return err;
 		}
 	}
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 08/13] tools/bpf: extends test_btf to test load/retrieve func_type info
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233042.1072050-1-yhs@fb.com>

A two function bpf program is loaded with btf and func_info.
After successful prog load, the bpf_get_info syscall is called
to retrieve prog info to ensure the types returned from the
kernel matches the types passed to the kernel from the
user space.

Several negative tests are also added to test loading/retriving
of func_type info.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/test_btf.c | 278 ++++++++++++++++++++++++-
 1 file changed, 275 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index b6461c3c5e11..e03a8cea4bb7 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -5,6 +5,7 @@
 #include <linux/btf.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
+#include <linux/filter.h>
 #include <bpf/bpf.h>
 #include <sys/resource.h>
 #include <libelf.h>
@@ -22,9 +23,13 @@
 #include "bpf_rlimit.h"
 #include "bpf_util.h"
 
+#define MAX_INSNS	512
+#define MAX_SUBPROGS	16
+
 static uint32_t pass_cnt;
 static uint32_t error_cnt;
 static uint32_t skip_cnt;
+static bool jit_enabled;
 
 #define CHECK(condition, format...) ({					\
 	int __ret = !!(condition);					\
@@ -60,6 +65,24 @@ static int __base_pr(const char *format, ...)
 	return err;
 }
 
+static bool is_jit_enabled(void)
+{
+	const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
+	bool enabled = false;
+	int sysctl_fd;
+
+	sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
+	if (sysctl_fd != -1) {
+		char tmpc;
+
+		if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
+			enabled = (tmpc != '0');
+		close(sysctl_fd);
+	}
+
+	return enabled;
+}
+
 #define BTF_INFO_ENC(kind, root, vlen)			\
 	((!!(root) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN))
 
@@ -103,6 +126,7 @@ static struct args {
 	bool get_info_test;
 	bool pprint_test;
 	bool always_log;
+	bool func_type_test;
 } args;
 
 static char btf_log_buf[BTF_LOG_BUF_SIZE];
@@ -2693,16 +2717,256 @@ static int test_pprint(void)
 	return err;
 }
 
+static struct btf_func_type_test {
+	const char *descr;
+	const char *str_sec;
+	__u32 raw_types[MAX_NR_RAW_TYPES];
+	__u32 str_sec_size;
+	struct bpf_insn insns[MAX_INSNS];
+	__u32 prog_type;
+	struct bpf_func_info func_info[MAX_SUBPROGS];
+	__u32 func_info_len;
+	bool expected_prog_load_failure;
+} func_type_test[] = {
+
+{
+	.descr = "func_type test #1",
+	.str_sec = "\0int\0unsigned int\0funcA\0funcB",
+	.raw_types = {
+		BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 32, 4),               /* [2] */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [3] */
+		1, 2,
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [4] */
+		2, 1,
+		BTF_END_RAW,
+	},
+	.str_sec_size = sizeof("\0int\0unsigned int\0funcA\0funcB"),
+	.insns = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	.func_info = { {0, 3}, {3, 4} },
+	.func_info_len = 2 * sizeof(struct bpf_func_info),
+},
+
+{
+	.descr = "func_type test #2",
+	.str_sec = "\0int\0unsigned int\0funcA\0funcB",
+	.raw_types = {
+		BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 32, 4),               /* [2] */
+		/* incorrect func type */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 2), 1),  /* [3] */
+		1, 2,
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [4] */
+		2, 1,
+		BTF_END_RAW,
+	},
+	.str_sec_size = sizeof("\0int\0unsigned int\0funcA\0funcB"),
+	.insns = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	.func_info = { {0, 3}, {3, 4} },
+	.func_info_len = 2 * sizeof(struct bpf_func_info),
+	.expected_prog_load_failure = true,
+},
+
+{
+	.descr = "func_type test #3",
+	.str_sec = "\0int\0unsigned int\0funcA\0funcB",
+	.raw_types = {
+		BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 32, 4),               /* [2] */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [3] */
+		1, 2,
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [4] */
+		2, 1,
+		BTF_END_RAW,
+	},
+	.str_sec_size = sizeof("\0int\0unsigned int\0funcA\0funcB"),
+	.insns = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	.func_info = { {0, 3}, {3, 4} },
+	/* incorrect func_info_len */
+	.func_info_len = sizeof(struct bpf_func_info),
+	.expected_prog_load_failure = true,
+},
+
+{
+	.descr = "func_type test #4",
+	.str_sec = "\0int\0unsigned int\0funcA\0funcB",
+	.raw_types = {
+		BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
+		BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 32, 4),               /* [2] */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [3] */
+		1, 2,
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 1),  /* [4] */
+		2, 1,
+		BTF_END_RAW,
+	},
+	.str_sec_size = sizeof("\0int\0unsigned int\0funcA\0funcB"),
+	.insns = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	/* incorrect func_info insn_offset */
+	.func_info = { {0, 3}, {2, 4} },
+	.func_info_len = 2 * sizeof(struct bpf_func_info),
+	.expected_prog_load_failure = true,
+},
+
+};
+
+static size_t probe_prog_length(const struct bpf_insn *fp)
+{
+	size_t len;
+
+	for (len = MAX_INSNS - 1; len > 0; --len)
+		if (fp[len].code != 0 || fp[len].imm != 0)
+			break;
+	return len + 1;
+}
+
+static int do_test_func_type(int test_num)
+{
+	const struct btf_func_type_test *test = &func_type_test[test_num];
+	__u32 func_lens[4], func_types[4], info_len;
+	struct bpf_load_program_attr attr = {};
+	unsigned int raw_btf_size;
+	struct bpf_prog_info info = {};
+	int btf_fd, prog_fd, err = 0;
+	void *raw_btf;
+
+	fprintf(stderr, "%s......", test->descr);
+	raw_btf = btf_raw_create(&hdr_tmpl, test->raw_types,
+				 test->str_sec, test->str_sec_size,
+				 &raw_btf_size);
+
+	if (!raw_btf)
+		return -1;
+
+	*btf_log_buf = '\0';
+	btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+			      btf_log_buf, BTF_LOG_BUF_SIZE,
+			      args.always_log);
+	free(raw_btf);
+
+	if (CHECK(btf_fd == -1, "invalid btf_fd errno:%d", errno)) {
+		fprintf(stderr, "%s\n", btf_log_buf);
+		err = -1;
+		goto done;
+	}
+
+	attr.prog_type = test->prog_type;
+	attr.insns = test->insns;
+	attr.insns_cnt = probe_prog_length(attr.insns);
+	attr.license = "GPL";
+	attr.prog_btf_fd = btf_fd;
+	attr.func_info_len = test->func_info_len;
+	attr.func_info = test->func_info;
+
+	*btf_log_buf = '\0';
+	prog_fd = bpf_load_program_xattr(&attr, btf_log_buf,
+					 BTF_LOG_BUF_SIZE);
+	if (test->expected_prog_load_failure && prog_fd == -1) {
+		err = 0;
+		goto done;
+	}
+	if (CHECK(prog_fd == -1, "invalid prog_id errno:%d", errno)) {
+		fprintf(stderr, "%s\n", btf_log_buf);
+		err = -1;
+		goto done;
+	}
+	if (!jit_enabled) {
+		skip_cnt++;
+		fprintf(stderr, "SKIPPED, please enable sysctl bpf_jit_enable\n");
+		err = 0;
+		goto done;
+	}
+
+	info_len = sizeof(struct bpf_prog_info);
+	info.nr_jited_func_types = 4;
+	info.nr_jited_func_lens = 4;
+	info.jited_func_types = ptr_to_u64(&func_types[0]);
+	info.jited_func_lens = ptr_to_u64(&func_lens[0]);
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (CHECK(err == -1, "invalid get info errno:%d", errno)) {
+		fprintf(stderr, "%s\n", btf_log_buf);
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.nr_jited_func_lens != 2,
+		  "incorrect info.nr_jited_func_lens %d\n",
+		  info.nr_jited_func_lens)) {
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.nr_jited_func_types != 2,
+		  "incorrect info.nr_jited_func_types %d\n",
+		  info.nr_jited_func_types)) {
+		err = -1;
+		goto done;
+	}
+	if (CHECK(info.btf_id == 0, "incorrect btf_id = 0")) {
+		err = -1;
+		goto done;
+	}
+	if (CHECK(func_types[0] != test->func_info[0].type_id ||
+		  func_types[1] != test->func_info[1].type_id,
+		  "incorrect func_types (%u, %u) expected (%u %d)",
+		  func_types[0], func_types[1],
+		  test->func_info[0].type_id, test->func_info[1].type_id)) {
+		err = -1;
+		goto done;
+	}
+
+done:
+	return err;
+}
+
+static int test_func_type(void)
+{
+	unsigned int i;
+	int err = 0;
+
+	for (i = 0; i < ARRAY_SIZE(func_type_test); i++)
+		err |= count_result(do_test_func_type(i));
+
+	return err;
+}
+
 static void usage(const char *cmd)
 {
-	fprintf(stderr, "Usage: %s [-l] [[-r test_num (1 - %zu)] | [-g test_num (1 - %zu)] | [-f test_num (1 - %zu)] | [-p]]\n",
+	fprintf(stderr, "Usage: %s [-l] [[-r test_num (1 - %zu)] |"
+			" [-g test_num (1 - %zu)] |"
+			" [-f test_num (1 - %zu)] | [-p] | [-k] ]\n",
 		cmd, ARRAY_SIZE(raw_tests), ARRAY_SIZE(get_info_tests),
 		ARRAY_SIZE(file_tests));
 }
 
 static int parse_args(int argc, char **argv)
 {
-	const char *optstr = "lpf:r:g:";
+	const char *optstr = "lpkf:r:g:";
 	int opt;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
@@ -2725,6 +2989,9 @@ static int parse_args(int argc, char **argv)
 		case 'p':
 			args.pprint_test = true;
 			break;
+		case 'k':
+			args.func_type_test = true;
+			break;
 		case 'h':
 			usage(argv[0]);
 			exit(0);
@@ -2778,6 +3045,8 @@ int main(int argc, char **argv)
 	if (args.always_log)
 		libbpf_set_print(__base_pr, __base_pr, __base_pr);
 
+	jit_enabled = is_jit_enabled();
+
 	if (args.raw_test)
 		err |= test_raw();
 
@@ -2790,8 +3059,11 @@ int main(int argc, char **argv)
 	if (args.pprint_test)
 		err |= test_pprint();
 
+	if (args.func_type_test)
+		err |= test_func_type();
+
 	if (args.raw_test || args.get_info_test || args.file_test ||
-	    args.pprint_test)
+	    args.pprint_test || args.func_type_test)
 		goto done;
 
 	err |= test_raw();
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 03/13] tools/bpf: sync kernel btf.h header
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233026.1071519-1-yhs@fb.com>

The kernel uapi btf.h is synced to the tools directory.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/btf.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
index 972265f32871..63f8500e6f34 100644
--- a/tools/include/uapi/linux/btf.h
+++ b/tools/include/uapi/linux/btf.h
@@ -40,7 +40,8 @@ struct btf_type {
 	/* "size" is used by INT, ENUM, STRUCT and UNION.
 	 * "size" tells the size of the type it is describing.
 	 *
-	 * "type" is used by PTR, TYPEDEF, VOLATILE, CONST and RESTRICT.
+	 * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
+	 * FUNC and FUNC_PROTO.
 	 * "type" is a type_id referring to another type.
 	 */
 	union {
@@ -64,8 +65,10 @@ struct btf_type {
 #define BTF_KIND_VOLATILE	9	/* Volatile	*/
 #define BTF_KIND_CONST		10	/* Const	*/
 #define BTF_KIND_RESTRICT	11	/* Restrict	*/
-#define BTF_KIND_MAX		11
-#define NR_BTF_KINDS		12
+#define BTF_KIND_FUNC		12	/* Function	*/
+#define BTF_KIND_FUNC_PROTO	13	/* Function Prototype	*/
+#define BTF_KIND_MAX		13
+#define NR_BTF_KINDS		14
 
 /* For some specific BTF_KIND, "struct btf_type" is immediately
  * followed by extra data.
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next v3 07/13] tools/bpf: add new fields for program load in lib/bpf
From: Yonghong Song @ 2018-10-17 23:30 UTC (permalink / raw)
  To: ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017233042.1072050-1-yhs@fb.com>

The new fields are added for program load in lib/bpf so
application uses api bpf_load_program_xattr() is able
to load program with btf and func_info data.

This functionality will be used in next patch
by bpf selftest test_btf.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c | 3 +++
 tools/lib/bpf/bpf.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d70a255cb05e..d8d48ab34220 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -196,6 +196,9 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	attr.log_level = 0;
 	attr.kern_version = load_attr->kern_version;
 	attr.prog_ifindex = load_attr->prog_ifindex;
+	attr.prog_btf_fd = load_attr->prog_btf_fd;
+	attr.func_info_len = load_attr->func_info_len;
+	attr.func_info = ptr_to_u64(load_attr->func_info);
 	memcpy(attr.prog_name, load_attr->name,
 	       min(name_len, BPF_OBJ_NAME_LEN - 1));
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 258c3c178333..d2bdaffd7712 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -74,6 +74,9 @@ struct bpf_load_program_attr {
 	const char *license;
 	__u32 kern_version;
 	__u32 prog_ifindex;
+	__u32 prog_btf_fd;
+	__u32 func_info_len;
+	const struct bpf_func_info *func_info;
 };
 
 /* Flags to direct loading requirements */
-- 
2.17.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox