Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH bpf-next v2 5/7] bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
From: Martin KaFai Lau @ 2018-05-22 21:57 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Yonghong Song, kernel-team
In-Reply-To: <20180522215723.784040-1-kafai@fb.com>

In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
could cause confusion because the "id" of "btf_id" means the BPF obj id
given to the BTF object while
"btf_key_id" and "btf_value_id" means the BTF type id within
that BTF object.

To make it clear, btf_key_id and btf_value_id are
renamed to btf_key_type_id and btf_value_type_id.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  4 ++--
 include/uapi/linux/bpf.h |  8 ++++----
 kernel/bpf/arraymap.c    |  2 +-
 kernel/bpf/syscall.c     | 18 +++++++++---------
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f6fe3c719ca8..1795eeee846c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -69,8 +69,8 @@ struct bpf_map {
 	u32 pages;
 	u32 id;
 	int numa_node;
-	u32 btf_key_id;
-	u32 btf_value_id;
+	u32 btf_key_type_id;
+	u32 btf_value_type_id;
 	struct btf *btf;
 	bool unpriv_array;
 	/* 55 bytes hole */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..c3e502d06bc3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -284,8 +284,8 @@ union bpf_attr {
 		char	map_name[BPF_OBJ_NAME_LEN];
 		__u32	map_ifindex;	/* ifindex of netdev to create on */
 		__u32	btf_fd;		/* fd pointing to a BTF type data */
-		__u32	btf_key_id;	/* BTF type_id of the key */
-		__u32	btf_value_id;	/* BTF type_id of the value */
+		__u32	btf_key_type_id;	/* BTF type_id of the key */
+		__u32	btf_value_type_id;	/* BTF type_id of the value */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -2219,8 +2219,8 @@ struct bpf_map_info {
 	__u64 netns_dev;
 	__u64 netns_ino;
 	__u32 btf_id;
-	__u32 btf_key_id;
-	__u32 btf_value_id;
+	__u32 btf_key_type_id;
+	__u32 btf_value_type_id;
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 0fd8d8f1a398..544e58f5f642 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -352,7 +352,7 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
 	}
 
 	seq_printf(m, "%u: ", *(u32 *)key);
-	btf_type_seq_show(map->btf, map->btf_value_id, value, m);
+	btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
 	seq_puts(m, "\n");
 
 	rcu_read_unlock();
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2b29ef84ded3..0b4c94551001 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -422,7 +422,7 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 	return 0;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD btf_value_id
+#define BPF_MAP_CREATE_LAST_FIELD btf_value_type_id
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -457,10 +457,10 @@ static int map_create(union bpf_attr *attr)
 	atomic_set(&map->usercnt, 1);
 
 	if (bpf_map_support_seq_show(map) &&
-	    (attr->btf_key_id || attr->btf_value_id)) {
+	    (attr->btf_key_type_id || attr->btf_value_type_id)) {
 		struct btf *btf;
 
-		if (!attr->btf_key_id || !attr->btf_value_id) {
+		if (!attr->btf_key_type_id || !attr->btf_value_type_id) {
 			err = -EINVAL;
 			goto free_map_nouncharge;
 		}
@@ -471,16 +471,16 @@ static int map_create(union bpf_attr *attr)
 			goto free_map_nouncharge;
 		}
 
-		err = map->ops->map_check_btf(map, btf, attr->btf_key_id,
-					      attr->btf_value_id);
+		err = map->ops->map_check_btf(map, btf, attr->btf_key_type_id,
+					      attr->btf_value_type_id);
 		if (err) {
 			btf_put(btf);
 			goto free_map_nouncharge;
 		}
 
 		map->btf = btf;
-		map->btf_key_id = attr->btf_key_id;
-		map->btf_value_id = attr->btf_value_id;
+		map->btf_key_type_id = attr->btf_key_type_id;
+		map->btf_value_type_id = attr->btf_value_type_id;
 	}
 
 	err = security_bpf_map_alloc(map);
@@ -2013,8 +2013,8 @@ static int bpf_map_get_info_by_fd(struct bpf_map *map,
 
 	if (map->btf) {
 		info.btf_id = btf_id(map->btf);
-		info.btf_key_id = map->btf_key_id;
-		info.btf_value_id = map->btf_value_id;
+		info.btf_key_type_id = map->btf_key_type_id;
+		info.btf_value_type_id = map->btf_value_type_id;
 	}
 
 	if (bpf_map_is_dev_bound(map)) {
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v2 2/7] bpf: btf: Change how section is supported in btf_header
From: Martin KaFai Lau @ 2018-05-22 21:57 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Yonghong Song, kernel-team
In-Reply-To: <20180522215723.784040-1-kafai@fb.com>

There are currently unused section descriptions in the btf_header.  Those
sections are here to support future BTF use cases.  For example, the
func section (func_off) is to support function signature (e.g. the BPF
prog function signature).

Instead of spelling out all potential sections up-front in the btf_header.
This patch makes changes to btf_header such that extending it (e.g. adding
a section) is possible later.  The unused ones can be removed for now and
they can be added back later.

This patch:
1. adds a hdr_len to the btf_header.  It will allow adding
sections (and other info like parent_label and parent_name)
later.  The check is similar to the existing bpf_attr.
If a user passes in a longer hdr_len, the kernel
ensures the extra tailing bytes are 0.

2. allows the section order in the BTF object to be
different from its sec_off order in btf_header.

3. each sec_off is followed by a sec_len.  It must not have gap or
overlapping among sections.

The string section is ensured to be at the end due to the 4 bytes
alignment requirement of the type section.

The above changes will allow enough flexibility to
add new sections (and other info) to the btf_header later.

This patch also removes an unnecessary !err check
at the end of btf_parse().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/uapi/linux/btf.h |   8 +-
 kernel/bpf/btf.c         | 209 +++++++++++++++++++++++++++++++++++------------
 2 files changed, 160 insertions(+), 57 deletions(-)

diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index bcb56ee47014..4fa479741a02 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -12,15 +12,11 @@ struct btf_header {
 	__u16	magic;
 	__u8	version;
 	__u8	flags;
-
-	__u32	parent_label;
-	__u32	parent_name;
+	__u32	hdr_len;
 
 	/* All offsets are in bytes relative to the end of this header */
-	__u32	label_off;	/* offset of label section	*/
-	__u32	object_off;	/* offset of data object section*/
-	__u32	func_off;	/* offset of function section	*/
 	__u32	type_off;	/* offset of type section	*/
+	__u32	type_len;	/* length of type section	*/
 	__u32	str_off;	/* offset of string section	*/
 	__u32	str_len;	/* length of string section	*/
 };
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ded10ab47b8a..75da6cbae47d 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -12,6 +12,7 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 #include <linux/idr.h>
+#include <linux/sort.h>
 #include <linux/bpf_verifier.h>
 #include <linux/btf.h>
 
@@ -184,15 +185,13 @@ static DEFINE_IDR(btf_idr);
 static DEFINE_SPINLOCK(btf_idr_lock);
 
 struct btf {
-	union {
-		struct btf_header *hdr;
-		void *data;
-	};
+	void *data;
 	struct btf_type **types;
 	u32 *resolved_ids;
 	u32 *resolved_sizes;
 	const char *strings;
 	void *nohdr_data;
+	struct btf_header hdr;
 	u32 nr_types;
 	u32 types_size;
 	u32 data_size;
@@ -228,6 +227,11 @@ enum resolve_mode {
 
 #define MAX_RESOLVE_DEPTH 32
 
+struct btf_sec_info {
+	u32 off;
+	u32 len;
+};
+
 struct btf_verifier_env {
 	struct btf *btf;
 	u8 *visit_states;
@@ -418,14 +422,14 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
 static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
 {
 	return !BTF_STR_TBL_ELF_ID(offset) &&
-		BTF_STR_OFFSET(offset) < btf->hdr->str_len;
+		BTF_STR_OFFSET(offset) < btf->hdr.str_len;
 }
 
 static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 {
 	if (!BTF_STR_OFFSET(offset))
 		return "(anon)";
-	else if (BTF_STR_OFFSET(offset) < btf->hdr->str_len)
+	else if (BTF_STR_OFFSET(offset) < btf->hdr.str_len)
 		return &btf->strings[BTF_STR_OFFSET(offset)];
 	else
 		return "(invalid-name-offset)";
@@ -536,7 +540,8 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
 	__btf_verifier_log(log, "\n");
 }
 
-static void btf_verifier_log_hdr(struct btf_verifier_env *env)
+static void btf_verifier_log_hdr(struct btf_verifier_env *env,
+				 u32 btf_data_size)
 {
 	struct bpf_verifier_log *log = &env->log;
 	const struct btf *btf = env->btf;
@@ -545,19 +550,16 @@ static void btf_verifier_log_hdr(struct btf_verifier_env *env)
 	if (!bpf_verifier_log_needed(log))
 		return;
 
-	hdr = btf->hdr;
+	hdr = &btf->hdr;
 	__btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
 	__btf_verifier_log(log, "version: %u\n", hdr->version);
 	__btf_verifier_log(log, "flags: 0x%x\n", hdr->flags);
-	__btf_verifier_log(log, "parent_label: %u\n", hdr->parent_label);
-	__btf_verifier_log(log, "parent_name: %u\n", hdr->parent_name);
-	__btf_verifier_log(log, "label_off: %u\n", hdr->label_off);
-	__btf_verifier_log(log, "object_off: %u\n", hdr->object_off);
-	__btf_verifier_log(log, "func_off: %u\n", hdr->func_off);
+	__btf_verifier_log(log, "hdr_len: %u\n", hdr->hdr_len);
 	__btf_verifier_log(log, "type_off: %u\n", hdr->type_off);
+	__btf_verifier_log(log, "type_len: %u\n", hdr->type_len);
 	__btf_verifier_log(log, "str_off: %u\n", hdr->str_off);
 	__btf_verifier_log(log, "str_len: %u\n", hdr->str_len);
-	__btf_verifier_log(log, "btf_total_size: %u\n", btf->data_size);
+	__btf_verifier_log(log, "btf_total_size: %u\n", btf_data_size);
 }
 
 static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
@@ -1754,9 +1756,9 @@ static int btf_check_all_metas(struct btf_verifier_env *env)
 	struct btf_header *hdr;
 	void *cur, *end;
 
-	hdr = btf->hdr;
+	hdr = &btf->hdr;
 	cur = btf->nohdr_data + hdr->type_off;
-	end = btf->nohdr_data + hdr->str_off;
+	end = btf->nohdr_data + hdr->type_len;
 
 	env->log_type_id = 1;
 	while (cur < end) {
@@ -1866,8 +1868,20 @@ static int btf_check_all_types(struct btf_verifier_env *env)
 
 static int btf_parse_type_sec(struct btf_verifier_env *env)
 {
+	const struct btf_header *hdr = &env->btf->hdr;
 	int err;
 
+	/* Type section must align to 4 bytes */
+	if (hdr->type_off & (sizeof(u32) - 1)) {
+		btf_verifier_log(env, "Unaligned type_off");
+		return -EINVAL;
+	}
+
+	if (!hdr->type_len) {
+		btf_verifier_log(env, "No type found");
+		return -EINVAL;
+	}
+
 	err = btf_check_all_metas(env);
 	if (err)
 		return err;
@@ -1881,10 +1895,15 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
 	struct btf *btf = env->btf;
 	const char *start, *end;
 
-	hdr = btf->hdr;
+	hdr = &btf->hdr;
 	start = btf->nohdr_data + hdr->str_off;
 	end = start + hdr->str_len;
 
+	if (end != btf->data + btf->data_size) {
+		btf_verifier_log(env, "String section is not at the end");
+		return -EINVAL;
+	}
+
 	if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET ||
 	    start[0] || end[-1]) {
 		btf_verifier_log(env, "Invalid string section");
@@ -1896,20 +1915,122 @@ static int btf_parse_str_sec(struct btf_verifier_env *env)
 	return 0;
 }
 
-static int btf_parse_hdr(struct btf_verifier_env *env)
+static const size_t btf_sec_info_offset[] = {
+	offsetof(struct btf_header, type_off),
+	offsetof(struct btf_header, str_off),
+};
+
+static int btf_sec_info_cmp(const void *a, const void *b)
+{
+	const struct btf_sec_info *x = a;
+	const struct btf_sec_info *y = b;
+
+	return (int)(x->off - y->off) ? : (int)(x->len - y->len);
+}
+
+static int btf_check_sec_info(struct btf_verifier_env *env,
+			      u32 btf_data_size)
 {
+	const unsigned int nr_secs = ARRAY_SIZE(btf_sec_info_offset);
+	struct btf_sec_info secs[nr_secs];
+	u32 total, expected_total, i;
 	const struct btf_header *hdr;
-	struct btf *btf = env->btf;
-	u32 meta_left;
+	const struct btf *btf;
+
+	btf = env->btf;
+	hdr = &btf->hdr;
 
-	if (btf->data_size < sizeof(*hdr)) {
+	/* Populate the secs from hdr */
+	for (i = 0; i < nr_secs; i++)
+		secs[i] = *(struct btf_sec_info *)((void *)hdr +
+						   btf_sec_info_offset[i]);
+
+	sort(secs, nr_secs, sizeof(struct btf_sec_info),
+	     btf_sec_info_cmp, NULL);
+
+	/* Check for gaps and overlap among sections */
+	total = 0;
+	expected_total = btf_data_size - hdr->hdr_len;
+	for (i = 0; i < nr_secs; i++) {
+		if (expected_total < secs[i].off) {
+			btf_verifier_log(env, "Invalid section offset");
+			return -EINVAL;
+		}
+		if (total < secs[i].off) {
+			/* gap */
+			btf_verifier_log(env, "Unsupported section found");
+			return -EINVAL;
+		}
+		if (total > secs[i].off) {
+			btf_verifier_log(env, "Section overlap found");
+			return -EINVAL;
+		}
+		if (expected_total - total < secs[i].len) {
+			btf_verifier_log(env,
+					 "Total section length too long");
+			return -EINVAL;
+		}
+		total += secs[i].len;
+	}
+
+	/* There is data other than hdr and known sections */
+	if (expected_total != total) {
+		btf_verifier_log(env, "Unsupported section found");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int btf_parse_hdr(struct btf_verifier_env *env, void __user *btf_data,
+			 u32 btf_data_size)
+{
+	const struct btf_header *hdr;
+	u32 hdr_len, hdr_copy;
+	/*
+	 * Minimal part of the "struct btf_header" that
+	 * contains the hdr_len.
+	 */
+	struct btf_min_header {
+		u16	magic;
+		u8	version;
+		u8	flags;
+		u32	hdr_len;
+	} __user *min_hdr;
+	struct btf *btf;
+	int err;
+
+	btf = env->btf;
+	min_hdr = btf_data;
+
+	if (btf_data_size < sizeof(*min_hdr)) {
+		btf_verifier_log(env, "hdr_len not found");
+		return -EINVAL;
+	}
+
+	if (get_user(hdr_len, &min_hdr->hdr_len))
+		return -EFAULT;
+
+	if (btf_data_size < hdr_len) {
 		btf_verifier_log(env, "btf_header not found");
 		return -EINVAL;
 	}
 
-	btf_verifier_log_hdr(env);
+	err = bpf_check_uarg_tail_zero(btf_data, sizeof(btf->hdr), hdr_len);
+	if (err) {
+		if (err == -E2BIG)
+			btf_verifier_log(env, "Unsupported btf_header");
+		return err;
+	}
+
+	hdr_copy = min_t(u32, hdr_len, sizeof(btf->hdr));
+	if (copy_from_user(&btf->hdr, btf_data, hdr_copy))
+		return -EFAULT;
+
+	hdr = &btf->hdr;
+
+	btf_verifier_log_hdr(env, btf_data_size);
 
-	hdr = btf->hdr;
 	if (hdr->magic != BTF_MAGIC) {
 		btf_verifier_log(env, "Invalid magic");
 		return -EINVAL;
@@ -1925,26 +2046,14 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
 		return -ENOTSUPP;
 	}
 
-	meta_left = btf->data_size - sizeof(*hdr);
-	if (!meta_left) {
+	if (btf_data_size == hdr->hdr_len) {
 		btf_verifier_log(env, "No data");
 		return -EINVAL;
 	}
 
-	if (meta_left < hdr->type_off || hdr->str_off <= hdr->type_off ||
-	    /* Type section must align to 4 bytes */
-	    hdr->type_off & (sizeof(u32) - 1)) {
-		btf_verifier_log(env, "Invalid type_off");
-		return -EINVAL;
-	}
-
-	if (meta_left < hdr->str_off ||
-	    meta_left - hdr->str_off < hdr->str_len) {
-		btf_verifier_log(env, "Invalid str_off or str_len");
-		return -EINVAL;
-	}
-
-	btf->nohdr_data = btf->hdr + 1;
+	err = btf_check_sec_info(env, btf_data_size);
+	if (err)
+		return err;
 
 	return 0;
 }
@@ -1987,6 +2096,11 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
 		err = -ENOMEM;
 		goto errout;
 	}
+	env->btf = btf;
+
+	err = btf_parse_hdr(env, btf_data, btf_data_size);
+	if (err)
+		goto errout;
 
 	data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
 	if (!data) {
@@ -1996,18 +2110,13 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
 
 	btf->data = data;
 	btf->data_size = btf_data_size;
+	btf->nohdr_data = btf->data + btf->hdr.hdr_len;
 
 	if (copy_from_user(data, btf_data, btf_data_size)) {
 		err = -EFAULT;
 		goto errout;
 	}
 
-	env->btf = btf;
-
-	err = btf_parse_hdr(env);
-	if (err)
-		goto errout;
-
 	err = btf_parse_str_sec(env);
 	if (err)
 		goto errout;
@@ -2016,16 +2125,14 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
 	if (err)
 		goto errout;
 
-	if (!err && log->level && bpf_verifier_log_full(log)) {
+	if (log->level && bpf_verifier_log_full(log)) {
 		err = -ENOSPC;
 		goto errout;
 	}
 
-	if (!err) {
-		btf_verifier_env_free(env);
-		refcount_set(&btf->refcnt, 1);
-		return btf;
-	}
+	btf_verifier_env_free(env);
+	refcount_set(&btf->refcnt, 1);
+	return btf;
 
 errout:
 	btf_verifier_env_free(env);
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v2 4/7] bpf: btf: Remove unused bits from uapi/linux/btf.h
From: Martin KaFai Lau @ 2018-05-22 21:57 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Yonghong Song, kernel-team
In-Reply-To: <20180522215723.784040-1-kafai@fb.com>

This patch does the followings:
1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k.  We can
   raise it later.

2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID.  They are
   currently encoded at the highest bit of a u32.
   It is because the current use case does not require supporting
   parent type (i.e type_id referring to a type in another BTF file).
   It also does not support referring to a string in ELF.

   The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced
   by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are
   defined in btf.c instead of uapi/linux/btf.h.

3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough.
   There is unused bits headroom if we ever needed it later.

4. The root bit in BTF_INFO is also removed because it is not
   used in the current use case.

5. Remove BTF_INT_VARARGS since func type is not supported now.
   The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits.

The above can be added back later because the verifier
ensures the unused bits are zeros.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/btf.h | 29 +++++++++------------------
 kernel/bpf/btf.c         | 52 ++++++++++++++++++++++++++++++++----------------
 2 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index 4fa479741a02..0b5ddbe135a4 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -22,28 +22,19 @@ struct btf_header {
 };
 
 /* Max # of type identifier */
-#define BTF_MAX_TYPE	0x7fffffff
+#define BTF_MAX_TYPE	0x0000ffff
 /* Max offset into the string section */
-#define BTF_MAX_NAME_OFFSET	0x7fffffff
+#define BTF_MAX_NAME_OFFSET	0x0000ffff
 /* Max # of struct/union/enum members or func args */
 #define BTF_MAX_VLEN	0xffff
 
-/* The type id is referring to a parent BTF */
-#define BTF_TYPE_PARENT(id)	(((id) >> 31) & 0x1)
-#define BTF_TYPE_ID(id)		((id) & BTF_MAX_TYPE)
-
-/* String is in the ELF string section */
-#define BTF_STR_TBL_ELF_ID(ref)	(((ref) >> 31) & 0x1)
-#define BTF_STR_OFFSET(ref)	((ref) & BTF_MAX_NAME_OFFSET)
-
 struct btf_type {
 	__u32 name_off;
 	/* "info" bits arrangement
 	 * bits  0-15: vlen (e.g. # of struct's members)
 	 * bits 16-23: unused
-	 * bits 24-28: kind (e.g. int, ptr, array...etc)
-	 * bits 29-30: unused
-	 * bits    31: root
+	 * bits 24-27: kind (e.g. int, ptr, array...etc)
+	 * bits 28-31: unused
 	 */
 	__u32 info;
 	/* "size" is used by INT, ENUM, STRUCT and UNION.
@@ -58,8 +49,7 @@ struct btf_type {
 	};
 };
 
-#define BTF_INFO_KIND(info)	(((info) >> 24) & 0x1f)
-#define BTF_INFO_ISROOT(info)	(!!(((info) >> 24) & 0x80))
+#define BTF_INFO_KIND(info)	(((info) >> 24) & 0x0f)
 #define BTF_INFO_VLEN(info)	((info) & 0xffff)
 
 #define BTF_KIND_UNKN		0	/* Unknown	*/
@@ -84,15 +74,14 @@ struct btf_type {
 /* BTF_KIND_INT is followed by a u32 and the following
  * is the 32 bits arrangement:
  */
-#define BTF_INT_ENCODING(VAL)	(((VAL) & 0xff000000) >> 24)
+#define BTF_INT_ENCODING(VAL)	(((VAL) & 0x0f000000) >> 24)
 #define BTF_INT_OFFSET(VAL)	(((VAL  & 0x00ff0000)) >> 16)
 #define BTF_INT_BITS(VAL)	((VAL)  & 0x0000ffff)
 
 /* Attributes stored in the BTF_INT_ENCODING */
-#define BTF_INT_SIGNED	0x1
-#define BTF_INT_CHAR	0x2
-#define BTF_INT_BOOL	0x4
-#define BTF_INT_VARARGS	0x8
+#define BTF_INT_SIGNED	(1 << 0)
+#define BTF_INT_CHAR	(1 << 1)
+#define BTF_INT_BOOL	(1 << 2)
 
 /* BTF_KIND_ENUM is followed by multiple "struct btf_enum".
  * The exact number of btf_enum is stored in the vlen (of the
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e388a6598de2..9cbeabb5aca3 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -163,13 +163,16 @@
 #define BITS_ROUNDUP_BYTES(bits) \
 	(BITS_ROUNDDOWN_BYTES(bits) + !!BITS_PER_BYTE_MASKED(bits))
 
+#define BTF_INFO_MASK 0x0f00ffff
+#define BTF_INT_MASK 0x0fffffff
+#define BTF_TYPE_ID_VALID(type_id) ((type_id) <= BTF_MAX_TYPE)
+#define BTF_STR_OFFSET_VALID(name_off) ((name_off) <= BTF_MAX_NAME_OFFSET)
+
 /* 16MB for 64k structs and each has 16 members and
  * a few MB spaces for the string section.
  * The hard limit is S32_MAX.
  */
 #define BTF_MAX_SIZE (16 * 1024 * 1024)
-/* 64k. We can raise it later. The hard limit is S32_MAX. */
-#define BTF_MAX_NR_TYPES 65535
 
 #define for_each_member(i, struct_type, member)			\
 	for (i = 0, member = btf_type_member(struct_type);	\
@@ -383,8 +386,6 @@ static const char *btf_int_encoding_str(u8 encoding)
 		return "CHAR";
 	else if (encoding == BTF_INT_BOOL)
 		return "BOOL";
-	else if (encoding == BTF_INT_VARARGS)
-		return "VARARGS";
 	else
 		return "UNKN";
 }
@@ -421,16 +422,16 @@ static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
 
 static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
 {
-	return !BTF_STR_TBL_ELF_ID(offset) &&
-		BTF_STR_OFFSET(offset) < btf->hdr.str_len;
+	return BTF_STR_OFFSET_VALID(offset) &&
+		offset < btf->hdr.str_len;
 }
 
 static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
 {
-	if (!BTF_STR_OFFSET(offset))
+	if (!offset)
 		return "(anon)";
-	else if (BTF_STR_OFFSET(offset) < btf->hdr.str_len)
-		return &btf->strings[BTF_STR_OFFSET(offset)];
+	else if (offset < btf->hdr.str_len)
+		return &btf->strings[offset];
 	else
 		return "(invalid-name-offset)";
 }
@@ -598,13 +599,13 @@ static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
 		struct btf_type **new_types;
 		u32 expand_by, new_size;
 
-		if (btf->types_size == BTF_MAX_NR_TYPES) {
+		if (btf->types_size == BTF_MAX_TYPE) {
 			btf_verifier_log(env, "Exceeded max num of types");
 			return -E2BIG;
 		}
 
 		expand_by = max_t(u32, btf->types_size >> 2, 16);
-		new_size = min_t(u32, BTF_MAX_NR_TYPES,
+		new_size = min_t(u32, BTF_MAX_TYPE,
 				 btf->types_size + expand_by);
 
 		new_types = kvzalloc(new_size * sizeof(*new_types),
@@ -934,6 +935,12 @@ static s32 btf_int_check_meta(struct btf_verifier_env *env,
 	}
 
 	int_data = btf_type_int(t);
+	if (int_data & ~BTF_INT_MASK) {
+		btf_verifier_log_basic(env, t, "Invalid int_data:%x",
+				       int_data);
+		return -EINVAL;
+	}
+
 	nr_bits = BTF_INT_BITS(int_data) + BTF_INT_OFFSET(int_data);
 
 	if (nr_bits > BITS_PER_U64) {
@@ -947,12 +954,17 @@ static s32 btf_int_check_meta(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
+	/*
+	 * Only one of the encoding bits is allowed and it
+	 * should be sufficient for the pretty print purpose (i.e. decoding).
+	 * Multiple bits can be allowed later if it is found
+	 * to be insufficient.
+	 */
 	encoding = BTF_INT_ENCODING(int_data);
 	if (encoding &&
 	    encoding != BTF_INT_SIGNED &&
 	    encoding != BTF_INT_CHAR &&
-	    encoding != BTF_INT_BOOL &&
-	    encoding != BTF_INT_VARARGS) {
+	    encoding != BTF_INT_BOOL) {
 		btf_verifier_log_type(env, t, "Unsupported encoding");
 		return -ENOTSUPP;
 	}
@@ -1126,7 +1138,7 @@ static int btf_ref_type_check_meta(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (BTF_TYPE_PARENT(t->type)) {
+	if (!BTF_TYPE_ID_VALID(t->type)) {
 		btf_verifier_log_type(env, t, "Invalid type_id");
 		return -EINVAL;
 	}
@@ -1333,12 +1345,12 @@ static s32 btf_array_check_meta(struct btf_verifier_env *env,
 	/* Array elem type and index type cannot be in type void,
 	 * so !array->type and !array->index_type are not allowed.
 	 */
-	if (!array->type || BTF_TYPE_PARENT(array->type)) {
+	if (!array->type || !BTF_TYPE_ID_VALID(array->type)) {
 		btf_verifier_log_type(env, t, "Invalid elem");
 		return -EINVAL;
 	}
 
-	if (!array->index_type || BTF_TYPE_PARENT(array->index_type)) {
+	if (!array->index_type || !BTF_TYPE_ID_VALID(array->index_type)) {
 		btf_verifier_log_type(env, t, "Invalid index");
 		return -EINVAL;
 	}
@@ -1507,7 +1519,7 @@ static s32 btf_struct_check_meta(struct btf_verifier_env *env,
 		}
 
 		/* A member cannot be in type void */
-		if (!member->type || BTF_TYPE_PARENT(member->type)) {
+		if (!member->type || !BTF_TYPE_ID_VALID(member->type)) {
 			btf_verifier_log_member(env, t, member,
 						"Invalid type_id");
 			return -EINVAL;
@@ -1760,6 +1772,12 @@ static s32 btf_check_meta(struct btf_verifier_env *env,
 	}
 	meta_left -= sizeof(*t);
 
+	if (t->info & ~BTF_INFO_MASK) {
+		btf_verifier_log(env, "[%u] Invalid btf_info:%x",
+				 env->log_type_id, t->info);
+		return -EINVAL;
+	}
+
 	if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX ||
 	    BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
 		btf_verifier_log(env, "[%u] Invalid kind:%u",
-- 
2.9.5

^ permalink raw reply related

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 21:54 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <2fcc22b088fd04dafbdc1582d03c4bca21eefade.camel@intel.com>

On Tue, May 22, 2018 at 02:50:32PM -0700, Jeff Kirsher wrote:
> On Tue, 2018-05-22 at 15:33 -0600, Jason Gunthorpe wrote:
> > On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> > > On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > > > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > > > From: Sindhu Devale <sindhu.devale@intel.com>
> > > > > 
> > > > > Currently i40iw is dependent on i40e symbols
> > > > > i40e_register_client and i40e_unregister_client due to
> > > > > which i40iw cannot be loaded without i40e being loaded.
> > > > > 
> > > > > This patch allows RDMA driver to build and load without
> > > > > linking to LAN driver and without LAN driver being loaded
> > > > > first. Once the LAN driver is loaded, the RDMA driver
> > > > > is notified through the netdevice notifiers to register
> > > > > as client to the LAN driver. Add function pointers to IDC
> > > > > register/unregister in the private VSI structure. This
> > > > > allows a RDMA driver to build without linking to i40e.
> > > > 
> > > > Why would you want to do this? The rdma driver is non-functional
> > > > without the ethernet driver, so why on earth would we want to
> > > > defeat
> > > > the module dependency mechanism?
> > > 
> > > This change is driven by the OSV's like Red Hat, where customer's
> > > were
> > > updating the i40e driver, which in turn broke i40iw.
> > 
> > So that isn't a reason to put something into the main line kernel,
> > and
> > I'm deeply skeptical this change is even sane.
> > 
> > It has been a while since I've looked at RH's kernel, but AFAIK,
> > breakage should only happen if the ABIs around
> > i40e_unregister_client/etc change..
> > 
> > So if the i40e module updates breaks the ABI, then stuffing that same
> > broken ABI through a function pointer is *totally* wrong.
> > 
> > Looks like a NAK for the RDMA side.
> 
> The ABI rarely changes, if at all.  The issue OSV's are seeing is that
> upgrading i40e, requires that i40iw be recompiled even though there
> were no updates/changes to the ABI.

Why does it need recompiling? Details please.

Jason

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 21:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522213334.GE7502@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

On Tue, 2018-05-22 at 15:33 -0600, Jason Gunthorpe wrote:
> On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> > On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > > From: Sindhu Devale <sindhu.devale@intel.com>
> > > > 
> > > > Currently i40iw is dependent on i40e symbols
> > > > i40e_register_client and i40e_unregister_client due to
> > > > which i40iw cannot be loaded without i40e being loaded.
> > > > 
> > > > This patch allows RDMA driver to build and load without
> > > > linking to LAN driver and without LAN driver being loaded
> > > > first. Once the LAN driver is loaded, the RDMA driver
> > > > is notified through the netdevice notifiers to register
> > > > as client to the LAN driver. Add function pointers to IDC
> > > > register/unregister in the private VSI structure. This
> > > > allows a RDMA driver to build without linking to i40e.
> > > 
> > > Why would you want to do this? The rdma driver is non-functional
> > > without the ethernet driver, so why on earth would we want to
> > > defeat
> > > the module dependency mechanism?
> > 
> > This change is driven by the OSV's like Red Hat, where customer's
> > were
> > updating the i40e driver, which in turn broke i40iw.
> 
> So that isn't a reason to put something into the main line kernel,
> and
> I'm deeply skeptical this change is even sane.
> 
> It has been a while since I've looked at RH's kernel, but AFAIK,
> breakage should only happen if the ABIs around
> i40e_unregister_client/etc change..
> 
> So if the i40e module updates breaks the ABI, then stuffing that same
> broken ABI through a function pointer is *totally* wrong.
> 
> Looks like a NAK for the RDMA side.

The ABI rarely changes, if at all.  The issue OSV's are seeing is that
upgrading i40e, requires that i40iw be recompiled even though there
were no updates/changes to the ABI.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 21:33 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <079ceee3bc8cd0ea50dd7ddc12b27512ca5ac49e.camel@intel.com>

On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > From: Sindhu Devale <sindhu.devale@intel.com>
> > > 
> > > Currently i40iw is dependent on i40e symbols
> > > i40e_register_client and i40e_unregister_client due to
> > > which i40iw cannot be loaded without i40e being loaded.
> > > 
> > > This patch allows RDMA driver to build and load without
> > > linking to LAN driver and without LAN driver being loaded
> > > first. Once the LAN driver is loaded, the RDMA driver
> > > is notified through the netdevice notifiers to register
> > > as client to the LAN driver. Add function pointers to IDC
> > > register/unregister in the private VSI structure. This
> > > allows a RDMA driver to build without linking to i40e.
> > 
> > Why would you want to do this? The rdma driver is non-functional
> > without the ethernet driver, so why on earth would we want to defeat
> > the module dependency mechanism?
> 
> This change is driven by the OSV's like Red Hat, where customer's were
> updating the i40e driver, which in turn broke i40iw.

So that isn't a reason to put something into the main line kernel, and
I'm deeply skeptical this change is even sane.

It has been a while since I've looked at RH's kernel, but AFAIK,
breakage should only happen if the ABIs around
i40e_unregister_client/etc change..

So if the i40e module updates breaks the ABI, then stuffing that same
broken ABI through a function pointer is *totally* wrong.

Looks like a NAK for the RDMA side.

Jason

^ permalink raw reply

* [PATCH 2/2] rfkill: Create rfkill-none LED trigger
From: João Paulo Rechi Vita @ 2018-05-22 21:29 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller
  Cc: linux, Michał Kępień, João Paulo Rechi Vita,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20180522212932.5357-1-jprvita@endlessm.com>

Creates a new trigger rfkill-none, as a complement to rfkill-any, which
drives LEDs when any radio is enabled. The new trigger is meant to turn
a LED ON whenever all radios are OFF, and turn it OFF otherwise.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
---
 net/rfkill/core.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 6d64d14f4b0a..07235520b00f 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -178,6 +178,7 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 }
 
 static struct led_trigger rfkill_any_led_trigger;
+static struct led_trigger rfkill_none_led_trigger;
 static struct work_struct rfkill_global_led_trigger_work;
 
 static void rfkill_global_led_trigger_worker(struct work_struct *work)
@@ -195,6 +196,8 @@ static void rfkill_global_led_trigger_worker(struct work_struct *work)
 	mutex_unlock(&rfkill_global_mutex);
 
 	led_trigger_event(&rfkill_any_led_trigger, brightness);
+	led_trigger_event(&rfkill_none_led_trigger, brightness == LED_OFF ?
+							LED_FULL : LED_OFF);
 }
 
 static void rfkill_global_led_trigger_event(void)
@@ -202,22 +205,32 @@ static void rfkill_global_led_trigger_event(void)
 	schedule_work(&rfkill_global_led_trigger_work);
 }
 
-static void rfkill_global_led_trigger_activate(struct led_classdev *led_cdev)
-{
-	rfkill_global_led_trigger_event();
-}
-
 static int rfkill_global_led_trigger_register(void)
 {
+	int ret;
+
 	INIT_WORK(&rfkill_global_led_trigger_work,
 			rfkill_global_led_trigger_worker);
+
 	rfkill_any_led_trigger.name = "rfkill-any";
-	rfkill_any_led_trigger.activate = rfkill_global_led_trigger_activate;
-	return led_trigger_register(&rfkill_any_led_trigger);
+	ret = led_trigger_register(&rfkill_any_led_trigger);
+	if (ret)
+		return ret;
+
+	rfkill_none_led_trigger.name = "rfkill-none";
+	ret = led_trigger_register(&rfkill_none_led_trigger);
+	if (ret)
+		led_trigger_unregister(&rfkill_any_led_trigger);
+	else
+		/* Delay activation until all global triggers are registered */
+		rfkill_global_led_trigger_event();
+
+	return ret;
 }
 
 static void rfkill_global_led_trigger_unregister(void)
 {
+	led_trigger_unregister(&rfkill_none_led_trigger);
 	led_trigger_unregister(&rfkill_any_led_trigger);
 	cancel_work_sync(&rfkill_global_led_trigger_work);
 }
-- 
2.17.0

^ permalink raw reply related

* [PATCH 1/2] rfkill: Rename rfkill_any_led_trigger* functions
From: João Paulo Rechi Vita @ 2018-05-22 21:29 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller
  Cc: linux, Michał Kępień, João Paulo Rechi Vita,
	linux-wireless, netdev, linux-kernel

Rename these functions to rfkill_global_led_trigger*, as they are going
to be extended to handle another global rfkill led trigger.

This commit does not change any functionality.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
---
 net/rfkill/core.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 59d0eb960275..6d64d14f4b0a 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -178,9 +178,9 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 }
 
 static struct led_trigger rfkill_any_led_trigger;
-static struct work_struct rfkill_any_work;
+static struct work_struct rfkill_global_led_trigger_work;
 
-static void rfkill_any_led_trigger_worker(struct work_struct *work)
+static void rfkill_global_led_trigger_worker(struct work_struct *work)
 {
 	enum led_brightness brightness = LED_OFF;
 	struct rfkill *rfkill;
@@ -197,28 +197,29 @@ static void rfkill_any_led_trigger_worker(struct work_struct *work)
 	led_trigger_event(&rfkill_any_led_trigger, brightness);
 }
 
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
 {
-	schedule_work(&rfkill_any_work);
+	schedule_work(&rfkill_global_led_trigger_work);
 }
 
-static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+static void rfkill_global_led_trigger_activate(struct led_classdev *led_cdev)
 {
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 }
 
-static int rfkill_any_led_trigger_register(void)
+static int rfkill_global_led_trigger_register(void)
 {
-	INIT_WORK(&rfkill_any_work, rfkill_any_led_trigger_worker);
+	INIT_WORK(&rfkill_global_led_trigger_work,
+			rfkill_global_led_trigger_worker);
 	rfkill_any_led_trigger.name = "rfkill-any";
-	rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
+	rfkill_any_led_trigger.activate = rfkill_global_led_trigger_activate;
 	return led_trigger_register(&rfkill_any_led_trigger);
 }
 
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
 {
 	led_trigger_unregister(&rfkill_any_led_trigger);
-	cancel_work_sync(&rfkill_any_work);
+	cancel_work_sync(&rfkill_global_led_trigger_work);
 }
 #else
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
@@ -234,16 +235,16 @@ static inline void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 }
 
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
 {
 }
 
-static int rfkill_any_led_trigger_register(void)
+static int rfkill_global_led_trigger_register(void)
 {
 	return 0;
 }
 
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
 {
 }
 #endif /* CONFIG_RFKILL_LEDS */
@@ -354,7 +355,7 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	if (prev != curr)
 		rfkill_event(rfkill);
@@ -535,7 +536,7 @@ bool rfkill_set_hw_state(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	if (rfkill->registered && prev != blocked)
 		schedule_work(&rfkill->uevent_work);
@@ -579,7 +580,7 @@ bool rfkill_set_sw_state(struct rfkill *rfkill, bool blocked)
 		schedule_work(&rfkill->uevent_work);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	return blocked;
 }
@@ -629,7 +630,7 @@ void rfkill_set_states(struct rfkill *rfkill, bool sw, bool hw)
 			schedule_work(&rfkill->uevent_work);
 
 		rfkill_led_trigger_event(rfkill);
-		rfkill_any_led_trigger_event();
+		rfkill_global_led_trigger_event();
 	}
 }
 EXPORT_SYMBOL(rfkill_set_states);
@@ -1046,7 +1047,7 @@ int __must_check rfkill_register(struct rfkill *rfkill)
 #endif
 	}
 
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 	rfkill_send_events(rfkill, RFKILL_OP_ADD);
 
 	mutex_unlock(&rfkill_global_mutex);
@@ -1079,7 +1080,7 @@ void rfkill_unregister(struct rfkill *rfkill)
 	mutex_lock(&rfkill_global_mutex);
 	rfkill_send_events(rfkill, RFKILL_OP_DEL);
 	list_del_init(&rfkill->node);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 	mutex_unlock(&rfkill_global_mutex);
 
 	rfkill_led_trigger_unregister(rfkill);
@@ -1332,7 +1333,7 @@ static int __init rfkill_init(void)
 	if (error)
 		goto error_misc;
 
-	error = rfkill_any_led_trigger_register();
+	error = rfkill_global_led_trigger_register();
 	if (error)
 		goto error_led_trigger;
 
@@ -1346,7 +1347,7 @@ static int __init rfkill_init(void)
 
 #ifdef CONFIG_RFKILL_INPUT
 error_input:
-	rfkill_any_led_trigger_unregister();
+	rfkill_global_led_trigger_unregister();
 #endif
 error_led_trigger:
 	misc_deregister(&rfkill_miscdev);
@@ -1362,7 +1363,7 @@ static void __exit rfkill_exit(void)
 #ifdef CONFIG_RFKILL_INPUT
 	rfkill_handler_exit();
 #endif
-	rfkill_any_led_trigger_unregister();
+	rfkill_global_led_trigger_unregister();
 	misc_deregister(&rfkill_miscdev);
 	class_unregister(&rfkill_class);
 }
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-22 21:15 UTC (permalink / raw)
  To: Rajkumar Manoharan
  Cc: Kalle Valo, David S. Miller, ath10k, linux-wireless, netdev,
	linux-kernel, linux-wireless-owner, erik.stromdahl
In-Reply-To: <8195be7603a8cd659d25a9c3d898b891@codeaurora.org>

On Mon, May 21, 2018 at 04:11:38PM -0700, Rajkumar Manoharan wrote:
> On 2018-05-21 13:43, Niklas Cassel wrote:
> > The following problem was observed when running iperf:
> [...]
> > 
> > In order to avoid trying to flush the queue every time we free a frame,
> > only do this when there are 3 or less frames pending, and while we
> > actually have frames in the queue. This logic was copied from
> > mt76_txq_schedule (mt76), one of few other drivers that are actually
> > using wake_tx_queue.
> > 
> > Suggested-by: Toke Høiland-Jørgensen <toke@toke.dk>
> > Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
> > ---
> > Changes since V1: use READ_ONCE() to disallow the compiler
> > optimizing things in undesirable ways.
> > 
> >  drivers/net/wireless/ath/ath10k/txrx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/wireless/ath/ath10k/txrx.c
> > b/drivers/net/wireless/ath/ath10k/txrx.c
> > index cda164f6e9f6..264cf0bd5c00 100644
> > --- a/drivers/net/wireless/ath/ath10k/txrx.c
> > +++ b/drivers/net/wireless/ath/ath10k/txrx.c
> > @@ -95,6 +95,9 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
> >  		wake_up(&htt->empty_tx_wq);
> >  	spin_unlock_bh(&htt->tx_lock);
> > 
> > +	if (READ_ONCE(htt->num_pending_tx) <= 3 && !list_empty(&ar->txqs))
> > +		ath10k_mac_tx_push_pending(ar);
> > +
> Niklas,

Hello Rajkumar

> 
> Sorry for the late response. ath10k_mac_tx_push_pending is already called
> at the end of NAPI handler. Isn't that enough to process pending frames?

This is true for e.g. ATH10K_BUS_PCI and ATH10K_BUS_SNOC,
but not for e.g. ATH10K_BUS_SDIO and ATH10K_BUS_USB.

While there is some SDIO code merged in Kalle's tree already,
this problem was found when merging
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/?h=ath10k-pending-sdio-usb
with Kalle's ath-next branch.

> 
> Earlier we observed performance issues in calling push_pending from each
> tx completion. IMHO this change may introduce the same problem again.

I prefer functional TX over performance issues,
but I agree that it is unfortunate that SDIO doesn't use
ath10k_htt_txrx_compl_task().
Erik, is there a reason for this?

Perhaps it would be possible to call ath10k_mac_tx_push_pending()
from the equivalent to ath10k_htt_txrx_compl_task(),
but from SDIO's point of view.

Another solution might be to change so that we only call
ath10k_mac_tx_push_pending() from ath10k_txrx_tx_unref()
if (htt->num_pending_tx == 0). That should decrease the number
of calls to ath10k_mac_tx_push_pending(), while still avoiding
a "TX deadlock" scenario for SDIO.

Regards,
Niklas

^ permalink raw reply

* [PATCH net-next v5 3/3] selftests: net: initial fib rule tests
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 248 ++++++++++++++++++++++++++
 2 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index e60dddb..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100755
index 0000000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+	local rc=$1
+	local expected=$2
+	local msg="$3"
+
+	if [ ${rc} -eq ${expected} ]; then
+		nsuccess=$((nsuccess+1))
+		printf "\n    TEST: %-50s  [ OK ]\n" "${msg}"
+	else
+		nfail=$((nfail+1))
+		printf "\n    TEST: %-50s  [FAIL]\n" "${msg}"
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "hit enter to continue, 'q' to quit"
+			read a
+			[ "$a" = "q" ] && exit 1
+		fi
+	fi
+}
+
+log_section()
+{
+	echo
+	echo "######################################################################"
+	echo "TEST SECTION: $*"
+	echo "######################################################################"
+}
+
+setup()
+{
+	set -e
+	ip netns add testns
+	$IP link set dev lo up
+
+	$IP link add dummy0 type dummy
+	$IP link set dev dummy0 up
+	$IP address add 198.51.100.1/24 dev dummy0
+	$IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+	set +e
+}
+
+cleanup()
+{
+	$IP link del dev dummy0 &> /dev/null
+	ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+	ip rule help 2>&1 | grep -q $1
+	if [ $? -ne 0 ]; then
+		echo "SKIP: iproute2 iprule too old, missing $1 match"
+		return 1
+	fi
+
+	ip route get help 2>&1 | grep -q $2
+	if [ $? -ne 0 ]; then
+		echo "SKIP: iproute2 get route too old, missing $2 match"
+		return 1
+	fi
+
+	return 0
+}
+
+fib_rule6_del()
+{
+	$IP -6 rule del $1
+	log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+	pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+	$IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+	local match="$1"
+	local getmatch="$2"
+
+	$IP -6 rule add $match table $RTABLE
+	$IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+	log_test $? 0 "rule6 check: $1"
+
+	fib_rule6_del_by_pref "$match"
+	log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+	# setup the fib rule redirect route
+	$IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+	match="oif $DEV"
+	fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+	match="from $SRC_IP6 iif $DEV"
+	fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+	match="tos 0x10"
+	fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+	match="fwmark 0x64"
+	getmatch="mark 0x64"
+	fib_rule6_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+	fib_check_iproute_support "uidrange" "uid"
+	if [ $? -eq 0 ]; then
+		match="uidrange 100-100"
+		getmatch="uid 100"
+		fib_rule6_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+	fi
+
+	fib_check_iproute_support "sport" "sport"
+	if [ $? -eq 0 ]; then
+		match="sport 666 dport 777"
+		fib_rule6_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto tcp"
+		fib_rule6_test_match_n_redirect "$match" "$match" "ipproto match"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto icmp"
+		fib_rule6_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+	fi
+}
+
+fib_rule4_del()
+{
+	$IP rule del $1
+	log_test $? 0 "del $1"
+}
+
+fib_rule4_del_by_pref()
+{
+	pref=$($IP rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+	$IP rule del pref $pref
+}
+
+fib_rule4_test_match_n_redirect()
+{
+	local match="$1"
+	local getmatch="$2"
+
+	$IP rule add $match table $RTABLE
+	$IP route get $GW_IP4 $getmatch | grep -q "table $RTABLE"
+	log_test $? 0 "rule4 check: $1"
+
+	fib_rule4_del_by_pref "$match"
+	log_test $? 0 "rule4 del by pref: $match"
+}
+
+fib_rule4_test()
+{
+	# setup the fib rule redirect route
+	$IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink
+
+	match="oif $DEV"
+	fib_rule4_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+	match="from $SRC_IP iif $DEV"
+	fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+	match="tos 0x10"
+	fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+	match="fwmark 0x64"
+	getmatch="mark 0x64"
+	fib_rule4_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+	fib_check_iproute_support "uidrange" "uid"
+	if [ $? -eq 0 ]; then
+		match="uidrange 100-100"
+		getmatch="uid 100"
+		fib_rule4_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+	fi
+
+	fib_check_iproute_support "sport" "sport"
+	if [ $? -eq 0 ]; then
+		match="sport 666 dport 777"
+		fib_rule4_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto tcp"
+		fib_rule4_test_match_n_redirect "$match" "$match" "ipproto tcp match"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto icmp"
+		fib_rule4_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+	fi
+}
+
+run_fibrule_tests()
+{
+	log_section "IPv4 fib rule"
+	fib_rule4_test
+	log_section "IPv6 fib rule"
+	fib_rule6_test
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit 0
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit 0
+fi
+
+# start clean
+cleanup &> /dev/null
+setup
+run_fibrule_tests
+cleanup
+
+exit $ret
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 2/3] ipv6: support sport, dport and ip_proto in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv6/route.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bcb8785..038d661 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,6 +63,7 @@
 #include <net/lwtunnel.h>
 #include <net/ip_tunnels.h>
 #include <net/l3mdev.h>
+#include <net/ip.h>
 #include <trace/events/fib6.h>
 
 #include <linux/uaccess.h>
@@ -4083,6 +4084,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
 	[RTA_TABLE]		= { .type = NLA_U32 },
+	[RTA_IP_PROTO]		= { .type = NLA_U8 },
+	[RTA_SPORT]		= { .type = NLA_U16 },
+	[RTA_DPORT]		= { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4795,6 +4799,19 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	else
 		fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+	if (tb[RTA_SPORT])
+		fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+	if (tb[RTA_DPORT])
+		fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+	if (tb[RTA_IP_PROTO]) {
+		err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+						  &fl6.flowi6_proto, extack);
+		if (err)
+			goto errout;
+	}
+
 	if (iif) {
 		struct net_device *dev;
 		int flags = 0;
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/ip.h               |   3 +
 include/uapi/linux/rtnetlink.h |   3 +
 net/ipv4/Makefile              |   2 +-
 net/ipv4/fib_frontend.c        |   3 +
 net/ipv4/netlink.c             |  23 +++++++
 net/ipv4/route.c               | 146 ++++++++++++++++++++++++++++++-----------
 6 files changed, 140 insertions(+), 40 deletions(-)
 create mode 100644 net/ipv4/netlink.c

diff --git a/include/net/ip.h b/include/net/ip.h
index bada1f1..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -664,4 +664,7 @@ extern int sysctl_icmp_msgs_burst;
 int ip_misc_proc_init(void);
 #endif
 
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+				struct netlink_ext_ack *extack);
+
 #endif	/* _IP_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@ enum rtattr_type_t {
 	RTA_PAD,
 	RTA_UID,
 	RTA_TTL_PROPAGATE,
+	RTA_IP_PROTO,
+	RTA_SPORT,
+	RTA_DPORT,
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520..13f2ba9 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,7 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o
+	     metrics.o netlink.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..897ae92 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,9 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_ENCAP]		= { .type = NLA_NESTED },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_IP_PROTO]		= { .type = NLA_U8 },
+	[RTA_SPORT]		= { .type = NLA_U16 },
+	[RTA_DPORT]		= { .type = NLA_U16 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 0000000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/types.h>
+#include <net/net_namespace.h>
+#include <net/netlink.h>
+#include <net/ip.h>
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+				struct netlink_ext_ack *extack)
+{
+	*ip_proto = nla_get_u8(attr);
+
+	switch (*ip_proto) {
+	case IPPROTO_TCP:
+	case IPPROTO_UDP:
+	case IPPROTO_ICMP:
+		return 0;
+	default:
+		NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+		return -EOPNOTSUPP;
+	}
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2cfa1b5..0e401dc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2574,11 +2574,10 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+			struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+			struct sk_buff *skb, u32 portid, u32 seq)
 {
-	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
 	unsigned long expires = 0;
@@ -2674,7 +2673,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			}
 		} else
 #endif
-			if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+			if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
 				goto nla_put_failure;
 	}
 
@@ -2689,43 +2688,93 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	return -EMSGSIZE;
 }
 
+static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
+						   u8 ip_proto, __be16 sport,
+						   __be16 dport)
+{
+	struct sk_buff *skb;
+	struct iphdr *iph;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+
+	/* Reserve room for dummy headers, this skb can pass
+	 * through good chunk of routing engine.
+	 */
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	skb->protocol = htons(ETH_P_IP);
+	iph = skb_put(skb, sizeof(struct iphdr));
+	iph->protocol = ip_proto;
+	iph->saddr = src;
+	iph->daddr = dst;
+	iph->version = 0x4;
+	iph->frag_off = 0;
+	iph->ihl = 0x5;
+	skb_set_transport_header(skb, skb->len);
+
+	switch (iph->protocol) {
+	case IPPROTO_UDP: {
+		struct udphdr *udph;
+
+		udph = skb_put_zero(skb, sizeof(struct udphdr));
+		udph->source = sport;
+		udph->dest = dport;
+		udph->len = sizeof(struct udphdr);
+		udph->check = 0;
+		break;
+	}
+	case IPPROTO_TCP: {
+		struct tcphdr *tcph;
+
+		tcph = skb_put_zero(skb, sizeof(struct tcphdr));
+		tcph->source	= sport;
+		tcph->dest	= dport;
+		tcph->doff	= sizeof(struct tcphdr) / 4;
+		tcph->rst = 1;
+		tcph->check = ~tcp_v4_check(sizeof(struct tcphdr),
+					    src, dst, 0);
+		break;
+	}
+	case IPPROTO_ICMP: {
+		struct icmphdr *icmph;
+
+		icmph = skb_put_zero(skb, sizeof(struct icmphdr));
+		icmph->type = ICMP_ECHO;
+		icmph->code = 0;
+	}
+	}
+
+	return skb;
+}
+
 static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 			     struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(in_skb->sk);
-	struct rtmsg *rtm;
 	struct nlattr *tb[RTA_MAX+1];
+	u32 table_id = RT_TABLE_MAIN;
+	__be16 sport = 0, dport = 0;
 	struct fib_result res = {};
+	u8 ip_proto = IPPROTO_UDP;
 	struct rtable *rt = NULL;
+	struct sk_buff *skb;
+	struct rtmsg *rtm;
 	struct flowi4 fl4;
 	__be32 dst = 0;
 	__be32 src = 0;
+	kuid_t uid;
 	u32 iif;
 	int err;
 	int mark;
-	struct sk_buff *skb;
-	u32 table_id = RT_TABLE_MAIN;
-	kuid_t uid;
 
 	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy,
 			  extack);
 	if (err < 0)
-		goto errout;
+		return err;
 
 	rtm = nlmsg_data(nlh);
-
-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
-	if (!skb) {
-		err = -ENOBUFS;
-		goto errout;
-	}
-
-	/* Reserve room for dummy headers, this skb can pass
-	   through good chunk of routing engine.
-	 */
-	skb_reset_mac_header(skb);
-	skb_reset_network_header(skb);
-
 	src = tb[RTA_SRC] ? nla_get_in_addr(tb[RTA_SRC]) : 0;
 	dst = tb[RTA_DST] ? nla_get_in_addr(tb[RTA_DST]) : 0;
 	iif = tb[RTA_IIF] ? nla_get_u32(tb[RTA_IIF]) : 0;
@@ -2735,14 +2784,22 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	else
 		uid = (iif ? INVALID_UID : current_uid());
 
-	/* Bugfix: need to give ip_route_input enough of an IP header to
-	 * not gag.
-	 */
-	ip_hdr(skb)->protocol = IPPROTO_UDP;
-	ip_hdr(skb)->saddr = src;
-	ip_hdr(skb)->daddr = dst;
+	if (tb[RTA_IP_PROTO]) {
+		err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+						  &ip_proto, extack);
+		if (err)
+			return err;
+	}
+
+	if (tb[RTA_SPORT])
+		sport = nla_get_be16(tb[RTA_SPORT]);
 
-	skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
+	if (tb[RTA_DPORT])
+		dport = nla_get_be16(tb[RTA_DPORT]);
+
+	skb = inet_rtm_getroute_build_skb(src, dst, ip_proto, sport, dport);
+	if (!skb)
+		return -ENOBUFS;
 
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.daddr = dst;
@@ -2751,6 +2808,11 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = uid;
+	if (sport)
+		fl4.fl4_sport = sport;
+	if (dport)
+		fl4.fl4_dport = dport;
+	fl4.flowi4_proto = ip_proto;
 
 	rcu_read_lock();
 
@@ -2760,10 +2822,10 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		dev = dev_get_by_index_rcu(net, iif);
 		if (!dev) {
 			err = -ENODEV;
-			goto errout_free;
+			goto errout_rcu;
 		}
 
-		skb->protocol	= htons(ETH_P_IP);
+		fl4.flowi4_iif = iif; /* for rt_fill_info */
 		skb->dev	= dev;
 		skb->mark	= mark;
 		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
@@ -2783,7 +2845,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	}
 
 	if (err)
-		goto errout_free;
+		goto errout_rcu;
 
 	if (rtm->rtm_flags & RTM_F_NOTIFY)
 		rt->rt_flags |= RTCF_NOTIFY;
@@ -2791,34 +2853,40 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
 		table_id = res.table ? res.table->tb_id : 0;
 
+	/* reset skb for netlink reply msg */
+	skb_trim(skb, 0);
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_reset_mac_header(skb);
+
 	if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
 		if (!res.fi) {
 			err = fib_props[res.type].error;
 			if (!err)
 				err = -EHOSTUNREACH;
-			goto errout_free;
+			goto errout_rcu;
 		}
 		err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
 				    nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
 				    rt->rt_type, res.prefix, res.prefixlen,
 				    fl4.flowi4_tos, res.fi, 0);
 	} else {
-		err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
+		err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb,
 				   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
 	}
 	if (err < 0)
-		goto errout_free;
+		goto errout_rcu;
 
 	rcu_read_unlock();
 
 	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
-errout:
-	return err;
 
 errout_free:
+	return err;
+errout_rcu:
 	rcu_read_unlock();
 	kfree_skb(skb);
-	goto errout;
+	goto errout_free;
 }
 
 void ip_rt_multicast_event(struct in_device *in_dev)
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 0/3] fib rule selftest
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get, 
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h                |   2 +
 net/ipv4/route.c                              | 152 ++++++++++++-----
 net/ipv6/route.c                              |  25 +++
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++++++++++++++++++++++++++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 21:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522205612.GD7502@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > From: Sindhu Devale <sindhu.devale@intel.com>
> > 
> > Currently i40iw is dependent on i40e symbols
> > i40e_register_client and i40e_unregister_client due to
> > which i40iw cannot be loaded without i40e being loaded.
> > 
> > This patch allows RDMA driver to build and load without
> > linking to LAN driver and without LAN driver being loaded
> > first. Once the LAN driver is loaded, the RDMA driver
> > is notified through the netdevice notifiers to register
> > as client to the LAN driver. Add function pointers to IDC
> > register/unregister in the private VSI structure. This
> > allows a RDMA driver to build without linking to i40e.
> 
> Why would you want to do this? The rdma driver is non-functional
> without the ethernet driver, so why on earth would we want to defeat
> the module dependency mechanism?

This change is driven by the OSV's like Red Hat, where customer's were
updating the i40e driver, which in turn broke i40iw.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 20:56 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522203831.20624-1-jeffrey.t.kirsher@intel.com>

On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> From: Sindhu Devale <sindhu.devale@intel.com>
> 
> Currently i40iw is dependent on i40e symbols
> i40e_register_client and i40e_unregister_client due to
> which i40iw cannot be loaded without i40e being loaded.
> 
> This patch allows RDMA driver to build and load without
> linking to LAN driver and without LAN driver being loaded
> first. Once the LAN driver is loaded, the RDMA driver
> is notified through the netdevice notifiers to register
> as client to the LAN driver. Add function pointers to IDC
> register/unregister in the private VSI structure. This
> allows a RDMA driver to build without linking to i40e.

Why would you want to do this? The rdma driver is non-functional
without the ethernet driver, so why on earth would we want to defeat
the module dependency mechanism?

Jason

^ permalink raw reply

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-22 20:54 UTC (permalink / raw)
  To: Jiri Pirko, Michael S. Tsirkin
  Cc: stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522161246.GN2149@nanopsycho>



On 5/22/2018 9:12 AM, Jiri Pirko wrote:
> Fixing the subj, sorry about that.
>
> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>> failover infrastructure.
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>> In previous patchset versions, the common code did
>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>
>>>>>> This should be part of the common "failover" code.
>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>> :( This kind of degrades the whole efford of having single solution
>>> in "failover" module. I think that common parts, as
>>> netdev_rx_handler_register() and others certainly is should be inside
>>> the common module. This is not a good time to minimize changes. Let's do
>>> the thing properly and fix the netvsc mess now.
>>>
>>>
>>>> It should be possible to move some of the code from net_failover.c to generic
>>>> failover.c in future if Stephen is ok with it.
>>>>
>>>>
>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>> IFF_FAILOVER_SLAVE should be used.
>>>> Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>>>> in patch 3.
>>> The existing netvsc driver.
>> We really can't change netvsc's flags now, even if it's interface is
>> messy, it's being used in the field. We can add a flag that makes netvsc
>> behave differently, and if this flag also allows enhanced functionality
>> userspace will gradually switch.
> Okay, although in this case, it really does not make much sense, so be
> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
> now. (This once-wrong-forever-wrong policy is flustrating me).
>
> But since this patchset introduces private flag IFF_FAILOVER and
> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
> netdevice to get at least some consistency between virtio_net and
> netvsc.

OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
register/unregister routines so that it is consistent with virtio_net.

Based on your discussion with mst, i think we can even remove IFF_SLAVE
setting on netvsc as it should not impact userspace.  If Stephen is OK
we can make this change too.

Do you see any other items that need to be resolved for this series to go in
this merge window?



>
>> Anything breaking userspace I fully expect Stephen to nack and
>> IMO with good reason.
>>
>> -- 
>> MST

^ permalink raw reply

* Re: [PATCH net-next v2 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Michal @ 2018-05-22 20:50 UTC (permalink / raw)
  To: Rob Herring
  Cc: netdev, linux-kernel, devicetree, f.fainelli, vivien.didelot,
	andrew, mark.rutland, davem, michal.vokac
In-Reply-To: <20180522194039.GA15413@rob-hp-laptop>

On 22.5.2018 21:40, Rob Herring wrote:
> On Tue, May 22, 2018 at 01:16:26PM +0200, Michal Vokáč wrote:
>> Add support for the four-port variant of the Qualcomm QCA833x switch.
>>
>> The CPU port default link settings can be reconfigured using
>> a fixed-link sub-node.
>>
>> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
>> ---
>> Changes in v2:
>>   - Add commit message and document fixed-link binding.
>>
>>   .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++++++++++++++++-
>>   1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> index 9c67ee4..15b9057 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> @@ -2,7 +2,10 @@
>>   
>>   Required properties:
>>   
>> -- compatible: should be "qca,qca8337"
>> +- compatible: should be one of:
>> +    "qca,qca8334"
>> +    "qca,qca8337"
>> +
>>   - #size-cells: must be 0
>>   - #address-cells: must be 1
>>   
>> @@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle
>>   referencing the internal PHY connected to it. The CPU port of this switch is
>>   always port 0.
>>   
>> +A CPU port node has the following optional property:
> 
> s/property/node/
> 
> Otherwise,
> 
> Reviewed-by: Rob Herring <robh@kernel.org>

Good catch, I will correct this.
Thanks for the review Rob.

Michal

^ permalink raw reply

* [PATCH net-next] enic: set DMA mask to 47 bit
From: Govindarajulu Varadarajan @ 2018-05-22 13:37 UTC (permalink / raw)
  To: davem, netdev; +Cc: benve, Govindarajulu Varadarajan

In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then
failover to DMA") DMA mask was changed from 40 bits to 64 bits.
Hardware actually supports only 47 bits.

Fixes: 624dbf55a359b("driver/net: enic: Try DMA 64 first, then failover
to DMA")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 81684acf52af..8a8b12b720ef 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2747,11 +2747,11 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_master(pdev);
 
 	/* Query PCI controller on system for DMA addressing
-	 * limitation for the device.  Try 64-bit first, and
+	 * limitation for the device.  Try 47-bit first, and
 	 * fail to 32-bit.
 	 */
 
-	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(47));
 	if (err) {
 		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (err) {
@@ -2765,10 +2765,10 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			goto err_out_release_regions;
 		}
 	} else {
-		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(47));
 		if (err) {
 			dev_err(dev, "Unable to obtain %u-bit DMA "
-				"for consistent allocations, aborting\n", 64);
+				"for consistent allocations, aborting\n", 47);
 			goto err_out_release_regions;
 		}
 		using_dac = 1;
-- 
2.17.0

^ permalink raw reply related

* [PATCH net] net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
From: Roopa Prabhu @ 2018-05-22 20:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/fib_frontend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..e66172a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_ENCAP]		= { .type = NLA_NESTED },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_TABLE]		= { .type = NLA_U32 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next 0/7] net/ipv6: Fix route append and replace use cases
From: David Ahern @ 2018-05-22 20:44 UTC (permalink / raw)
  To: David Miller, dsahern; +Cc: netdev, Thomas.Winter, idosch, sharpd, roopa
In-Reply-To: <20180522.144617.1003575784676243779.davem@davemloft.net>

On 5/22/18 12:46 PM, David Miller wrote:
> 
> Ok, I'll apply this series.
> 
> But if this breaks things for anyone in a practical way, I am unfortunately
> going to have to revert no matter how silly the current behavior may be.
> 

Understood. I have to try the best option first. I'll look at
regressions if they happen.

^ permalink raw reply

* Re: [PATCH 07/33] iwlwifi: mvm: use match_string() helper
From: Andy Shevchenko @ 2018-05-22 20:41 UTC (permalink / raw)
  To: Yisheng Xie, Luca Coelho
  Cc: Linux Kernel Mailing List, Kalle Valo, Intel Linux Wireless,
	Johannes Berg, Emmanuel Grumbach, open list:TI WILINK WIRELES...,
	netdev
In-Reply-To: <63a78572-b7d3-9cc7-9e22-5bd19cad3333@huawei.com>

On Tue, May 22, 2018 at 6:30 AM, Yisheng Xie <xieyisheng1@huawei.com> wrote:

>> But it's up tu Loca.

Shame on me. I meant Luca, of course!
Luca, sorry.

> OK, I will change it if Loca agree your opinion.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 20:38 UTC (permalink / raw)
  To: davem, dledford, jgg
  Cc: Sindhu Devale, netdev, linux-rdma, nhorman, sassmann, jogreene,
	Shiraz Saleem, Jeff Kirsher

From: Sindhu Devale <sindhu.devale@intel.com>

Currently i40iw is dependent on i40e symbols
i40e_register_client and i40e_unregister_client due to
which i40iw cannot be loaded without i40e being loaded.

This patch allows RDMA driver to build and load without
linking to LAN driver and without LAN driver being loaded
first. Once the LAN driver is loaded, the RDMA driver
is notified through the netdevice notifiers to register
as client to the LAN driver. Add function pointers to IDC
register/unregister in the private VSI structure. This
allows a RDMA driver to build without linking to i40e.

Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/infiniband/hw/i40iw/i40iw.h           |  24 +++
 drivers/infiniband/hw/i40iw/i40iw_main.c      | 141 ++++++++++++++++--
 drivers/infiniband/hw/i40iw/i40iw_utils.c     |   5 +-
 drivers/net/ethernet/intel/i40e/i40e.h        |   1 +
 drivers/net/ethernet/intel/i40e/i40e_client.h |   9 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   6 +
 6 files changed, 173 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
index d5d8c1be345a..c6398b73a8da 100644
--- a/drivers/infiniband/hw/i40iw/i40iw.h
+++ b/drivers/infiniband/hw/i40iw/i40iw.h
@@ -119,6 +119,30 @@
 #define I40IW_CQP_COMPL_SQ_WQE_FLUSHED    3
 #define I40IW_CQP_COMPL_RQ_SQ_WQE_FLUSHED 4
 
+enum I40IW_IDC_STATE {
+	I40IW_STATE_INVALID,
+	I40IW_STATE_VALID,
+	I40IW_STATE_REG_FAILED
+};
+
+struct i40iw_peer {
+	struct module *module;
+#define MAX_PEER_NAME_SIZE 8
+	char name[MAX_PEER_NAME_SIZE];
+	enum I40IW_IDC_STATE state;
+	atomic_t ref_count;
+	int (*idc_reg_peer_driver)(struct i40e_client *i40iw_client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *i40iw_client);
+};
+
+struct i40iw_peer_drv {
+	struct i40e_client i40iw_client;
+	struct i40iw_peer peer;
+};
+
+bool i40iw_is_new_peer(struct net_device *netdev);
+void i40iw_reg_peer(void);
+
 struct i40iw_cqp_compl_info {
 	u32 op_ret_val;
 	u16 maj_err_code;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_main.c b/drivers/infiniband/hw/i40iw/i40iw_main.c
index 9cd0d3ef9057..f4c5be11c1d4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_main.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_main.c
@@ -78,6 +78,7 @@ MODULE_AUTHOR("Intel Corporation, <e1000-rdma@lists.sourceforge.net>");
 MODULE_DESCRIPTION("Intel(R) Ethernet Connection X722 iWARP RDMA Driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static struct i40iw_peer_drv peer_drv;
 static struct i40e_client i40iw_client;
 static char i40iw_client_name[I40E_CLIENT_STR_LENGTH] = "i40iw";
 
@@ -103,6 +104,30 @@ static struct notifier_block i40iw_netdevice_notifier = {
 	.notifier_call = i40iw_netdevice_event
 };
 
+/**
+ * i40iw_open_inc_ref - Increment ref count for a open
+ */
+static void i40iw_open_inc_ref(void)
+{
+	atomic_inc(&peer_drv.peer.ref_count);
+}
+
+/**
+ * i40iw_open_dec_ref - Decrement ref count for a open
+ */
+static void i40iw_open_dec_ref(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID &&
+	    atomic_dec_and_test(&peer->ref_count)) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_find_i40e_handler - find a handler given a client info
  * @ldev: pointer to a client info
@@ -1710,6 +1735,7 @@ static int i40iw_open(struct i40e_info *ldev, struct i40e_client *client)
 		if(iwdev->param_wq == NULL)
 			break;
 		i40iw_pr_info("i40iw_open completed\n");
+		i40iw_open_inc_ref();
 		return 0;
 	} while (0);
 
@@ -1801,6 +1827,7 @@ static void i40iw_close(struct i40e_info *ldev, struct i40e_client *client, bool
 	i40iw_cm_teardown_connections(iwdev, NULL, NULL, true);
 	destroy_workqueue(iwdev->virtchnl_wq);
 	i40iw_deinit_device(iwdev);
+	i40iw_open_dec_ref();
 }
 
 /**
@@ -2024,6 +2051,104 @@ static const struct i40e_client_ops i40e_ops = {
 	.vf_capable = i40iw_vf_capable
 };
 
+/**
+ * i40iw_is_new_peer - check netdev of the peer driver
+ * @netdev: netdev of peer driver
+ */
+bool i40iw_is_new_peer(struct net_device *netdev)
+{
+	struct idc_srv_provider *sp;
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID)
+		return false;
+
+	if (netdev->dev.parent && netdev->dev.parent->driver &&
+	    !strncmp(netdev->dev.parent->driver->name, peer->name, sizeof(peer->name))) {
+		sp = (struct idc_srv_provider *)netdev_priv(netdev);
+		if (sp->signature != IDC_SIGNATURE || sp->version)
+			return false;
+
+		/* Found the driver */
+		peer->idc_reg_peer_driver = sp->idc_reg_peer_driver;
+		peer->idc_unreg_peer_driver = sp->idc_unreg_peer_driver;
+		peer->module = netdev->dev.parent->driver->owner;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40iw_initialize_client - Setup client struct
+ */
+static void i40iw_initialize_client(void)
+{
+	struct i40e_client *i40iw_client = &peer_drv.i40iw_client;
+
+	i40iw_client->version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
+	i40iw_client->version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
+	i40iw_client->version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
+	i40iw_client->ops = &i40e_ops;
+	memcpy(i40iw_client->name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
+	i40iw_client->type = I40E_CLIENT_IWARP;
+	strncpy(peer_drv.peer.name, "i40e", sizeof(peer_drv.peer.name));
+}
+
+/**
+ * i40iw_reg_peer - Register with peer
+ */
+void i40iw_reg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+
+	if (peer->state == I40IW_STATE_VALID)
+		return;
+
+	if (peer->idc_reg_peer_driver &&
+	    !peer->idc_reg_peer_driver(&peer_drv.i40iw_client)) {
+		peer->state = I40IW_STATE_VALID;
+		try_module_get(peer->module);
+	} else {
+		peer->state = I40IW_STATE_REG_FAILED;
+	}
+}
+
+/**
+ * i40iw_find_idc_peer - Search netdevs for a peer driver
+ */
+static void i40iw_find_idc_peer(void)
+{
+	struct net_device *dev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
+		if (i40iw_is_new_peer(dev))
+			break;
+	}
+	rcu_read_unlock();
+	i40iw_reg_peer();
+}
+
+/**
+ * i40iw_unreg_peer - Unregister with peer
+ */
+static void i40iw_unreg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_init_module - driver initialization function
  *
@@ -2032,20 +2157,12 @@ static const struct i40e_client_ops i40e_ops = {
  */
 static int __init i40iw_init_module(void)
 {
-	int ret;
-
-	memset(&i40iw_client, 0, sizeof(i40iw_client));
-	i40iw_client.version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
-	i40iw_client.version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
-	i40iw_client.version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
-	i40iw_client.ops = &i40e_ops;
-	memcpy(i40iw_client.name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
-	i40iw_client.type = I40E_CLIENT_IWARP;
 	spin_lock_init(&i40iw_handler_lock);
-	ret = i40e_register_client(&i40iw_client);
+	i40iw_initialize_client();
+	i40iw_find_idc_peer();
 	i40iw_register_notifiers();
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -2057,7 +2174,7 @@ static int __init i40iw_init_module(void)
 static void __exit i40iw_exit_module(void)
 {
 	i40iw_unregister_notifiers();
-	i40e_unregister_client(&i40iw_client);
+	i40iw_unreg_peer();
 }
 
 module_init(i40iw_init_module);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index a9ea966877f2..264939942da0 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -314,8 +314,11 @@ int i40iw_netdevice_event(struct notifier_block *notifier,
 	event_netdev = netdev_notifier_info_to_dev(ptr);
 
 	hdl = i40iw_find_netdev(event_netdev);
-	if (!hdl)
+	if (!hdl) {
+		if (i40iw_is_new_peer(event_netdev))
+			i40iw_reg_peer();
 		return NOTIFY_DONE;
+	}
 
 	iwdev = &hdl->device;
 	if (iwdev->init_state < RDMA_DEV_REGISTERED || iwdev->closing)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 7a80652e2500..e3171b696848 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -789,6 +789,7 @@ struct i40e_vsi {
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
+	struct idc_srv_provider prov_callbacks;
 	struct i40e_vsi *vsi;
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.h b/drivers/net/ethernet/intel/i40e/i40e_client.h
index 72994baf4941..95a47df9c104 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.h
@@ -44,6 +44,15 @@ struct i40e_client;
 #define I40E_QUEUE_TYPE_PE_AEQ  0x80
 #define I40E_QUEUE_INVALID_IDX	0xFFFF
 
+#define IDC_SIGNATURE 0x494e54454c494443ULL	/* INTELIDC */
+struct idc_srv_provider {
+	u64 signature;
+	u8 version;
+	u8 rsvd[7];
+	int (*idc_reg_peer_driver)(struct i40e_client *client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *client);
+};
+
 struct i40e_qv_info {
 	u32 v_idx; /* msix_vector */
 	u16 ceq_idx;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5daa5c9c7de..984001ae7680 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11913,6 +11913,12 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	np = netdev_priv(netdev);
 	np->vsi = vsi;
 
+	np->prov_callbacks.signature = IDC_SIGNATURE;
+	np->prov_callbacks.version = 0;
+	memset(np->prov_callbacks.rsvd, 0, sizeof(np->prov_callbacks.rsvd));
+	np->prov_callbacks.idc_reg_peer_driver = i40e_register_client;
+	np->prov_callbacks.idc_unreg_peer_driver = i40e_unregister_client;
+
 	hw_enc_features = NETIF_F_SG			|
 			  NETIF_F_IP_CSUM		|
 			  NETIF_F_IPV6_CSUM		|
-- 
2.17.0

^ permalink raw reply related

* Re: [Cake] [PATCH net-next v14 6/7] sch_cake: Add overhead compensation support to the rate shaper
From: Marcelo Ricardo Leitner @ 2018-05-22 20:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake
In-Reply-To: <52F9132E-4FDC-495A-A020-BCD963B3E3CF@toke.dk>

On Tue, May 22, 2018 at 10:44:53AM +0200, Toke Høiland-Jørgensen wrote:
> 
> 
> On 22 May 2018 01:45:13 CEST, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >On Mon, May 21, 2018 at 10:35:58PM +0200, Toke Høiland-Jørgensen wrote:
> >> +static u32 cake_overhead(struct cake_sched_data *q, const struct
> >sk_buff *skb)
> >> +{
> >> +	const struct skb_shared_info *shinfo = skb_shinfo(skb);
> >> +	unsigned int hdr_len, last_len = 0;
> >> +	u32 off = skb_network_offset(skb);
> >> +	u32 len = qdisc_pkt_len(skb);
> >> +	u16 segs = 1;
> >> +
> >> +	q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
> >> +
> >> +	if (!shinfo->gso_size)
> >> +		return cake_calc_overhead(q, len, off);
> >> +
> >> +	/* borrowed from qdisc_pkt_len_init() */
> >> +	hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
> >> +
> >> +	/* + transport layer */
> >> +	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 |
> >> +						SKB_GSO_TCPV6))) {
> >> +		const struct tcphdr *th;
> >> +		struct tcphdr _tcphdr;
> >> +
> >> +		th = skb_header_pointer(skb, skb_transport_offset(skb),
> >> +					sizeof(_tcphdr), &_tcphdr);
> >> +		if (likely(th))
> >> +			hdr_len += __tcp_hdrlen(th);
> >> +	} else {
> >
> >I didn't see some code limiting GSO packets to just TCP or UDP. Is it
> >safe to assume that this packet is an UDP one, and not SCTP or ESP,
> >for example?
> 
> As the comment says, I nicked this from the qdisc init code.
> So I assume it's safe? :)

As long as it doesn't go further than this, it is. As in, it is just
validating if it can contain an UDP header, and if so, account for its
size, without actually reading the header.

Considering everything !TCP as UDP work as an approximation, which is
quite accurate. SCTP header is just 4 bytes bigger than UDP header and
is equal to ESP header size.

> 
> >> +		struct udphdr _udphdr;
> >> +
> >> +		if (skb_header_pointer(skb, skb_transport_offset(skb),
> >> +				       sizeof(_udphdr), &_udphdr))
> >> +			hdr_len += sizeof(struct udphdr);
> >> +	}
> >> +
> >> +	if (unlikely(shinfo->gso_type & SKB_GSO_DODGY))
> >> +		segs = DIV_ROUND_UP(skb->len - hdr_len,
> >> +				    shinfo->gso_size);
> >> +	else
> >> +		segs = shinfo->gso_segs;
> >> +
> >> +	len = shinfo->gso_size + hdr_len;
> >> +	last_len = skb->len - shinfo->gso_size * (segs - 1);
> >> +
> >> +	return (cake_calc_overhead(q, len, off) * (segs - 1) +
> >> +		cake_calc_overhead(q, last_len, off));
> >> +}
> >> +
> 

^ permalink raw reply

* Re: Regression: Approximate 34% performance hit in receive throughput over ixgbe seen due to build_skb patch
From: Alexander Duyck @ 2018-05-22 20:03 UTC (permalink / raw)
  To: William Kucharski
  Cc: LKML, Netdev, intel-wired-lan, Jeff Kirsher, Duyck, Alexander H
In-Reply-To: <4F646FBB-FE0B-4FEE-98E5-3CA2DF0598DE@oracle.com>

On Tue, May 22, 2018 at 12:29 PM, William Kucharski
<william.kucharski@oracle.com> wrote:
>
>
>> On May 22, 2018, at 12:23 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>
>> 3. There should be a private flag that can be updated via "ethtool
>> --set-priv-flags" called "legacy-rx" that you can enable that will
>> roll back to the original that did the copy-break type approach for
>> small packets and the headers of the frame.
>
> With legacy-rx enabled, most of the regression goes away, but it's still present
> as compared to the code without the patch; the regression then drops to about 6%:
>
> # ethtool --show-priv-flags eno1
> Private flags for eno1:
> legacy-rx: on
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     35934709      0     306.64
>  65536           60.00     33791739            288.35
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     39254351      0     334.97
>  65536           60.00     36761069            313.69
>
> Is this variance to be expected, or do you think modification of the
> interrupt delay would achieve better results?
>
>
>     William Kucharski
>

I would think with modification of interrupt delay you could probably
do much better if my assumption is correct and the issue is us sitting
on packets for too long so we overrun the socket buffer and start
dropping packets or stalling the Tx.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-22 19:55 UTC (permalink / raw)
  To: Sandipan Das
  Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao,
	Quentin Monnet
In-Reply-To: <88b61b11ebca5b44bad0c34225b6f2383e5983a5.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:13 +0530, Sandipan Das wrote:
> +		if (info.nr_jited_func_lens && info.jited_func_lens) {
> +			struct kernel_sym *sym = NULL;
> +			unsigned char *img = buf;
> +			__u64 *ksyms = NULL;
> +			__u32 *lens;
> +			__u32 i;
> +
> +			if (info.nr_jited_ksyms) {
> +				kernel_syms_load(&dd);
> +				ksyms = (__u64 *) info.jited_ksyms;
> +			}
> +
> +			lens = (__u32 *) info.jited_func_lens;
> +			for (i = 0; i < info.nr_jited_func_lens; i++) {
> +				if (ksyms) {
> +					sym = kernel_syms_search(&dd, ksyms[i]);
> +					if (sym)
> +						printf("%s:\n", sym->name);
> +					else
> +						printf("%016llx:\n", ksyms[i]);
> +				}
> +
> +				disasm_print_insn(img, lens[i], opcodes, name);
> +				img += lens[i];
> +				printf("\n");
> +			}
> +		} else {

The output doesn't seem to be JSON-compatible :(  We try to make sure
all bpftool command can produce valid JSON when run with -j (or -p)
switch.

Would it be possible to make each function a separate JSON object with
"name" and "insn" array?  Would that work?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox