Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next v7 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps
From: Roman Gushchin @ 2020-11-21  2:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, daniel, netdev, andrii, akpm, linux-mm, linux-kernel,
	kernel-team
In-Reply-To: <20201121025227.dypweojhaz7elwb3@ast-mbp>

On Fri, Nov 20, 2020 at 06:52:27PM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 19, 2020 at 09:37:52AM -0800, Roman Gushchin wrote:
> >  static void bpf_map_put_uref(struct bpf_map *map)
> > @@ -619,7 +562,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> >  		   "value_size:\t%u\n"
> >  		   "max_entries:\t%u\n"
> >  		   "map_flags:\t%#x\n"
> > -		   "memlock:\t%llu\n"
> > +		   "memlock:\t%llu\n" /* deprecated */
> >  		   "map_id:\t%u\n"
> >  		   "frozen:\t%u\n",
> >  		   map->map_type,
> > @@ -627,7 +570,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> >  		   map->value_size,
> >  		   map->max_entries,
> >  		   map->map_flags,
> > -		   map->memory.pages * 1ULL << PAGE_SHIFT,
> > +		   0LLU,
> 
> The set looks great to me overall, but above change is problematic.
> There are tools out there that read this value.
> Returning zero might cause oncall alarms to trigger.
> I think we can be more accurate here.
> Instead of zero the kernel can return
> round_up(max_entries * round_up(key_size + value_size, 8), PAGE_SIZE)
> It's not the same as before, but at least the numbers won't suddenly
> go to zero and comparison between maps is still relevant.
> Of course we can introduce a page size calculating callback per map type,
> but imo that would be overkill. These monitoring tools don't care about
> precise number, but rather about relative value and growth from one
> version of the application to another.
> 
> If Daniel doesn't find other issues this can be fixed in the follow up.

Makes total sense. I'll prepare a follow-up patch.

Thanks!

^ permalink raw reply

* Re: [PATCH v2 00/10] Broadcom b53 YAML bindings
From: Florian Fainelli @ 2020-11-21  2:56 UTC (permalink / raw)
  To: Florian Fainelli, linux-arm-kernel
  Cc: Andrew Lunn, Vivien Didelot, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Rob Herring, Ray Jui, Scott Branden,
	maintainer:BROADCOM IPROC ARM ARCHITECTURE, Hauke Mehrtens,
	Rafał Miłecki, open list:NETWORKING DRIVERS,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	open list, Kurt Kanzenbach
In-Reply-To: <20201112045020.9766-1-f.fainelli@gmail.com>



On 11/11/2020 8:50 PM, Florian Fainelli wrote:
> Hi,
> 
> This patch series fixes the various Broadcom SoCs DTS files and the
> existing YAML binding for missing properties before adding a proper b53
> switch YAML binding from Kurt.
> 
> If this all looks good, given that there are quite a few changes to the
> DTS files, it might be best if I take them through the upcoming Broadcom
> ARM SoC pull requests. Let me know if you would like those patches to be
> applied differently.
> 
> Thanks!

Series applied to devicetree/next, thanks everyone.
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next v7 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps
From: Alexei Starovoitov @ 2020-11-21  2:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, ast, daniel, netdev, andrii, akpm, linux-mm, linux-kernel,
	kernel-team
In-Reply-To: <20201119173754.4125257-33-guro@fb.com>

On Thu, Nov 19, 2020 at 09:37:52AM -0800, Roman Gushchin wrote:
>  static void bpf_map_put_uref(struct bpf_map *map)
> @@ -619,7 +562,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
>  		   "value_size:\t%u\n"
>  		   "max_entries:\t%u\n"
>  		   "map_flags:\t%#x\n"
> -		   "memlock:\t%llu\n"
> +		   "memlock:\t%llu\n" /* deprecated */
>  		   "map_id:\t%u\n"
>  		   "frozen:\t%u\n",
>  		   map->map_type,
> @@ -627,7 +570,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
>  		   map->value_size,
>  		   map->max_entries,
>  		   map->map_flags,
> -		   map->memory.pages * 1ULL << PAGE_SHIFT,
> +		   0LLU,

The set looks great to me overall, but above change is problematic.
There are tools out there that read this value.
Returning zero might cause oncall alarms to trigger.
I think we can be more accurate here.
Instead of zero the kernel can return
round_up(max_entries * round_up(key_size + value_size, 8), PAGE_SIZE)
It's not the same as before, but at least the numbers won't suddenly
go to zero and comparison between maps is still relevant.
Of course we can introduce a page size calculating callback per map type,
but imo that would be overkill. These monitoring tools don't care about
precise number, but rather about relative value and growth from one
version of the application to another.

If Daniel doesn't find other issues this can be fixed in the follow up.

^ permalink raw reply

* Re: [PATCH net-next 4/6] net: ipa: support retries on generic GSI commands
From: Jakub Kicinski @ 2020-11-21  2:49 UTC (permalink / raw)
  To: Alex Elder
  Cc: davem, evgreen, subashab, cpratapa, bjorn.andersson, netdev,
	linux-kernel
In-Reply-To: <20201119224929.23819-5-elder@linaro.org>

On Thu, 19 Nov 2020 16:49:27 -0600 Alex Elder wrote:
> +	do
> +		ret = gsi_generic_command(gsi, channel_id,
> +					  GSI_GENERIC_HALT_CHANNEL);
> +	while (ret == -EAGAIN && retries--);

This may well be the first time I've seen someone write a do while loop
without the curly brackets!

^ permalink raw reply

* [PATCH bpf-next 3/7] libbpf: factor out low-level BPF program loading helper
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Refactor low-level API for BPF program loading to not rely on public API
types. This allows painless extension without constant efforts to cleverly not
break backwards compatibility.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             | 100 ++++++++++++++++++++++----------
 tools/lib/bpf/libbpf.c          |  34 +++++------
 tools/lib/bpf/libbpf_internal.h |  29 +++++++++
 3 files changed, 113 insertions(+), 50 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d27e34133973..e371ef91f5e8 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -214,59 +214,52 @@ alloc_zero_tailing_info(const void *orecord, __u32 cnt,
 	return info;
 }
 
-int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
-			   char *log_buf, size_t log_buf_sz)
+int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr)
 {
 	void *finfo = NULL, *linfo = NULL;
 	union bpf_attr attr;
-	__u32 log_level;
 	int fd;
 
-	if (!load_attr || !log_buf != !log_buf_sz)
+	if (!load_attr->log_buf != !load_attr->log_buf_sz)
 		return -EINVAL;
 
-	log_level = load_attr->log_level;
-	if (log_level > (4 | 2 | 1) || (log_level && !log_buf))
+	if (load_attr->log_level > (4 | 2 | 1) || (load_attr->log_level && !load_attr->log_buf))
 		return -EINVAL;
 
 	memset(&attr, 0, sizeof(attr));
 	attr.prog_type = load_attr->prog_type;
 	attr.expected_attach_type = load_attr->expected_attach_type;
-	if (attr.prog_type == BPF_PROG_TYPE_STRUCT_OPS ||
-	    attr.prog_type == BPF_PROG_TYPE_LSM) {
-		attr.attach_btf_id = load_attr->attach_btf_id;
-	} else if (attr.prog_type == BPF_PROG_TYPE_TRACING ||
-		   attr.prog_type == BPF_PROG_TYPE_EXT) {
-		attr.attach_btf_id = load_attr->attach_btf_id;
-		attr.attach_prog_fd = load_attr->attach_prog_fd;
-	} else {
-		attr.prog_ifindex = load_attr->prog_ifindex;
-		attr.kern_version = load_attr->kern_version;
-	}
-	attr.insn_cnt = (__u32)load_attr->insns_cnt;
+
+	attr.attach_btf_id = load_attr->attach_btf_id;
+	attr.attach_prog_fd = load_attr->attach_prog_fd;
+
+	attr.prog_ifindex = load_attr->prog_ifindex;
+	attr.kern_version = load_attr->kern_version;
+
+	attr.insn_cnt = (__u32)load_attr->insn_cnt;
 	attr.insns = ptr_to_u64(load_attr->insns);
 	attr.license = ptr_to_u64(load_attr->license);
 
-	attr.log_level = log_level;
-	if (log_level) {
-		attr.log_buf = ptr_to_u64(log_buf);
-		attr.log_size = log_buf_sz;
-	} else {
-		attr.log_buf = ptr_to_u64(NULL);
-		attr.log_size = 0;
+	attr.log_level = load_attr->log_level;
+	if (attr.log_level) {
+		attr.log_buf = ptr_to_u64(load_attr->log_buf);
+		attr.log_size = load_attr->log_buf_sz;
 	}
 
 	attr.prog_btf_fd = load_attr->prog_btf_fd;
+	attr.prog_flags = load_attr->prog_flags;
+
 	attr.func_info_rec_size = load_attr->func_info_rec_size;
 	attr.func_info_cnt = load_attr->func_info_cnt;
 	attr.func_info = ptr_to_u64(load_attr->func_info);
+
 	attr.line_info_rec_size = load_attr->line_info_rec_size;
 	attr.line_info_cnt = load_attr->line_info_cnt;
 	attr.line_info = ptr_to_u64(load_attr->line_info);
+
 	if (load_attr->name)
 		memcpy(attr.prog_name, load_attr->name,
-		       min(strlen(load_attr->name), BPF_OBJ_NAME_LEN - 1));
-	attr.prog_flags = load_attr->prog_flags;
+		       min(strlen(load_attr->name), (size_t)BPF_OBJ_NAME_LEN - 1));
 
 	fd = sys_bpf_prog_load(&attr, sizeof(attr));
 	if (fd >= 0)
@@ -306,19 +299,19 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 		}
 
 		fd = sys_bpf_prog_load(&attr, sizeof(attr));
-
 		if (fd >= 0)
 			goto done;
 	}
 
-	if (log_level || !log_buf)
+	if (load_attr->log_level || !load_attr->log_buf)
 		goto done;
 
 	/* Try again with log */
-	attr.log_buf = ptr_to_u64(log_buf);
-	attr.log_size = log_buf_sz;
+	attr.log_buf = ptr_to_u64(load_attr->log_buf);
+	attr.log_size = load_attr->log_buf_sz;
 	attr.log_level = 1;
-	log_buf[0] = 0;
+	load_attr->log_buf[0] = 0;
+
 	fd = sys_bpf_prog_load(&attr, sizeof(attr));
 done:
 	free(finfo);
@@ -326,6 +319,49 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	return fd;
 }
 
+int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
+			   char *log_buf, size_t log_buf_sz)
+{
+	struct bpf_prog_load_params p = {};
+
+	if (!load_attr || !log_buf != !log_buf_sz)
+		return -EINVAL;
+
+	p.prog_type = load_attr->prog_type;
+	p.expected_attach_type = load_attr->expected_attach_type;
+	switch (p.prog_type) {
+	case BPF_PROG_TYPE_STRUCT_OPS:
+	case BPF_PROG_TYPE_LSM:
+		p.attach_btf_id = load_attr->attach_btf_id;
+		break;
+	case BPF_PROG_TYPE_TRACING:
+	case BPF_PROG_TYPE_EXT:
+		p.attach_btf_id = load_attr->attach_btf_id;
+		p.attach_prog_fd = load_attr->attach_prog_fd;
+		break;
+	default:
+		p.prog_ifindex = load_attr->prog_ifindex;
+		p.kern_version = load_attr->kern_version;
+	}
+	p.insn_cnt = load_attr->insns_cnt;
+	p.insns = load_attr->insns;
+	p.license = load_attr->license;
+	p.log_level = load_attr->log_level;
+	p.log_buf = log_buf;
+	p.log_buf_sz = log_buf_sz;
+	p.prog_btf_fd = load_attr->prog_btf_fd;
+	p.func_info_rec_size = load_attr->func_info_rec_size;
+	p.func_info_cnt = load_attr->func_info_cnt;
+	p.func_info = load_attr->func_info;
+	p.line_info_rec_size = load_attr->line_info_rec_size;
+	p.line_info_cnt = load_attr->line_info_cnt;
+	p.line_info = load_attr->line_info;
+	p.name = load_attr->name;
+	p.prog_flags = load_attr->prog_flags;
+
+	return libbpf__bpf_prog_load(&p);
+}
+
 int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     size_t insns_cnt, const char *license,
 		     __u32 kern_version, char *log_buf,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ac1ff4e7741a..3425e594132b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6809,7 +6809,7 @@ static int
 load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	     char *license, __u32 kern_version, int *pfd)
 {
-	struct bpf_load_program_attr load_attr;
+	struct bpf_prog_load_params load_attr = {};
 	char *cp, errmsg[STRERR_BUFSIZE];
 	size_t log_buf_size = 0;
 	char *log_buf = NULL;
@@ -6818,7 +6818,6 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	if (!insns || !insns_cnt)
 		return -EINVAL;
 
-	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
 	load_attr.prog_type = prog->type;
 	/* old kernels might not support specifying expected_attach_type */
 	if (!kernel_supports(FEAT_EXP_ATTACH_TYPE) && prog->sec_def &&
@@ -6829,19 +6828,14 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	if (kernel_supports(FEAT_PROG_NAME))
 		load_attr.name = prog->name;
 	load_attr.insns = insns;
-	load_attr.insns_cnt = insns_cnt;
+	load_attr.insn_cnt = insns_cnt;
 	load_attr.license = license;
-	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
-	    prog->type == BPF_PROG_TYPE_LSM) {
-		load_attr.attach_btf_id = prog->attach_btf_id;
-	} else if (prog->type == BPF_PROG_TYPE_TRACING ||
-		   prog->type == BPF_PROG_TYPE_EXT) {
-		load_attr.attach_prog_fd = prog->attach_prog_fd;
-		load_attr.attach_btf_id = prog->attach_btf_id;
-	} else {
-		load_attr.kern_version = kern_version;
-		load_attr.prog_ifindex = prog->prog_ifindex;
-	}
+	load_attr.attach_btf_id = prog->attach_btf_id;
+	load_attr.attach_prog_fd = prog->attach_prog_fd;
+	load_attr.attach_btf_id = prog->attach_btf_id;
+	load_attr.kern_version = kern_version;
+	load_attr.prog_ifindex = prog->prog_ifindex;
+
 	/* specify func_info/line_info only if kernel supports them */
 	btf_fd = bpf_object__btf_fd(prog->obj);
 	if (btf_fd >= 0 && kernel_supports(FEAT_BTF_FUNC)) {
@@ -6865,7 +6859,9 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 		*log_buf = 0;
 	}
 
-	ret = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size);
+	load_attr.log_buf = log_buf;
+	load_attr.log_buf_sz = log_buf_size;
+	ret = libbpf__bpf_prog_load(&load_attr);
 
 	if (ret >= 0) {
 		if (log_buf && load_attr.log_level)
@@ -6906,9 +6902,9 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 		pr_warn("-- BEGIN DUMP LOG ---\n");
 		pr_warn("\n%s\n", log_buf);
 		pr_warn("-- END LOG --\n");
-	} else if (load_attr.insns_cnt >= BPF_MAXINSNS) {
+	} else if (load_attr.insn_cnt >= BPF_MAXINSNS) {
 		pr_warn("Program too large (%zu insns), at most %d insns\n",
-			load_attr.insns_cnt, BPF_MAXINSNS);
+			load_attr.insn_cnt, BPF_MAXINSNS);
 		ret = -LIBBPF_ERRNO__PROG2BIG;
 	} else if (load_attr.prog_type != BPF_PROG_TYPE_KPROBE) {
 		/* Wrong program type? */
@@ -6916,7 +6912,9 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 
 		load_attr.prog_type = BPF_PROG_TYPE_KPROBE;
 		load_attr.expected_attach_type = 0;
-		fd = bpf_load_program_xattr(&load_attr, NULL, 0);
+		load_attr.log_buf = NULL;
+		load_attr.log_buf_sz = 0;
+		fd = libbpf__bpf_prog_load(&load_attr);
 		if (fd >= 0) {
 			close(fd);
 			ret = -LIBBPF_ERRNO__PROGTYPE;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index e569ae63808e..681073a67ae3 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -151,6 +151,35 @@ int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 			 const char *str_sec, size_t str_len);
 
+struct bpf_prog_load_params {
+	enum bpf_prog_type prog_type;
+	enum bpf_attach_type expected_attach_type;
+	const char *name;
+	const struct bpf_insn *insns;
+	size_t insn_cnt;
+	const char *license;
+	__u32 kern_version;
+	__u32 attach_prog_fd;
+	__u32 attach_btf_id;
+	__u32 prog_ifindex;
+	__u32 prog_btf_fd;
+	__u32 prog_flags;
+
+	__u32 func_info_rec_size;
+	const void *func_info;
+	__u32 func_info_cnt;
+
+	__u32 line_info_rec_size;
+	const void *line_info;
+	__u32 line_info_cnt;
+
+	__u32 log_level;
+	char *log_buf;
+	size_t log_buf_sz;
+};
+
+int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr);
+
 int bpf_object__section_size(const struct bpf_object *obj, const char *name,
 			     __u32 *size);
 int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 7/7] selftests/bpf: add fentry/fexit/fmod_ret selftest for kernel module
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Add new selftest checking attachment of fentry/fexit/fmod_ret (and raw
tracepoint ones for completeness) BPF programs to kernel module function.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/module_attach.c  | 53 +++++++++++++++
 .../selftests/bpf/progs/test_module_attach.c  | 66 +++++++++++++++++++
 2 files changed, 119 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/module_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_module_attach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/module_attach.c b/tools/testing/selftests/bpf/prog_tests/module_attach.c
new file mode 100644
index 000000000000..c871ac073bcf
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/module_attach.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#include <test_progs.h>
+#include "test_module_attach.skel.h"
+
+static int duration;
+
+static int trigger_module_test_read(int read_sz)
+{
+	int fd, err;
+
+	fd = open("/sys/kernel/bpf_sidecar", O_RDONLY);
+	err = -errno;
+	if (CHECK(fd < 0, "sidecar_file_open", "failed: %d\n", err))
+		return err;
+
+	read(fd, NULL, read_sz);
+	close(fd);
+
+	return 0;
+}
+
+void test_module_attach(void)
+{
+	const int READ_SZ = 456;
+	struct test_module_attach* skel;
+	struct test_module_attach__bss *bss;
+	int err;
+
+	skel = test_module_attach__open_and_load();
+	if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+		return;
+
+	bss = skel->bss;
+
+	err = test_module_attach__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	/* trigger tracepoint */
+	ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
+
+	ASSERT_EQ(bss->raw_tp_read_sz, READ_SZ, "raw_tp");
+	ASSERT_EQ(bss->tp_btf_read_sz, READ_SZ, "tp_btf");
+	ASSERT_EQ(bss->fentry_read_sz, READ_SZ, "fentry");
+	ASSERT_EQ(bss->fexit_read_sz, READ_SZ, "fexit");
+	ASSERT_EQ(bss->fexit_ret, -EIO, "fexit_tet");
+	ASSERT_EQ(bss->fmod_ret_read_sz, READ_SZ, "fmod_ret");
+
+cleanup:
+	test_module_attach__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_module_attach.c b/tools/testing/selftests/bpf/progs/test_module_attach.c
new file mode 100644
index 000000000000..4c79261d95e6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_module_attach.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "../bpf_sidecar/bpf_sidecar.h"
+
+__u32 raw_tp_read_sz = 0;
+
+SEC("raw_tp/bpf_sidecar_test_read")
+int BPF_PROG(handle_raw_tp,
+	     struct task_struct *task, struct bpf_sidecar_test_read_ctx *read_ctx)
+{
+	raw_tp_read_sz = BPF_CORE_READ(read_ctx, len);
+	return 0;
+}
+
+__u32 tp_btf_read_sz = 0;
+
+SEC("tp_btf/bpf_sidecar_test_read")
+int BPF_PROG(handle_tp_btf,
+	     struct task_struct *task, struct bpf_sidecar_test_read_ctx *read_ctx)
+{
+	tp_btf_read_sz = read_ctx->len;
+	return 0;
+}
+
+__u32 fentry_read_sz = 0;
+
+SEC("fentry/bpf_sidecar_test_read")
+int BPF_PROG(handle_fentry,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	fentry_read_sz = len;
+	return 0;
+}
+
+__u32 fexit_read_sz = 0;
+int fexit_ret = 0;
+
+SEC("fexit/bpf_sidecar_test_read")
+int BPF_PROG(handle_fexit,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len,
+	     int ret)
+{
+	fexit_read_sz = len;
+	fexit_ret = ret;
+	return 0;
+}
+
+__u32 fmod_ret_read_sz = 0;
+
+SEC("fmod_ret/bpf_sidecar_test_read")
+int BPF_PROG(handle_fmod_ret,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	fmod_ret_read_sz = len;
+	return 0; /* don't override the exit code */
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 2/7] bpf: allow to specify kernel module BTFs when attaching BPF programs
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Add ability for user-space programs to specify non-vmlinux BTF when attaching
BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this,
add attach_btf_obj_id field which contains BTF object ID for either vmlinux or
module. For backwards compatibility (and simplicity) reasons, 0 denotes
vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/btf.h            |  2 ++
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/btf.c               | 25 +++++++++++++++++++------
 kernel/bpf/syscall.c           | 22 +++++++++++++++++++---
 tools/include/uapi/linux/bpf.h |  1 +
 5 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index fb608e4de076..53ce2bc6dc54 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -89,7 +89,9 @@ int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
 			   char *buf, int len, u64 flags);
 
 int btf_get_fd_by_id(u32 id);
+struct btf *btf_get_by_id(int id);
 u32 btf_obj_id(const struct btf *btf);
+bool btf_is_kernel(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3ca6146f001a..782169dc41e0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -558,6 +558,7 @@ union bpf_attr {
 		__u32		line_info_cnt;	/* number of bpf_line_info records */
 		__u32		attach_btf_id;	/* in-kernel BTF type id to attach to */
 		__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+		__u32		attach_btf_obj_id; /* vmlinux/module BTF object ID for BTF type */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 7a19bf5bfe97..12876b272c6b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5652,6 +5652,19 @@ struct btf *btf_get_by_fd(int fd)
 	return btf;
 }
 
+struct btf *btf_get_by_id(int id)
+{
+	struct btf *btf;
+
+	rcu_read_lock();
+	btf = idr_find(&btf_idr, id);
+	if (!btf || !refcount_inc_not_zero(&btf->refcnt))
+		btf = ERR_PTR(-ENOENT);
+	rcu_read_unlock();
+
+	return btf;
+}
+
 int btf_get_info_by_fd(const struct btf *btf,
 		       const union bpf_attr *attr,
 		       union bpf_attr __user *uattr)
@@ -5717,12 +5730,7 @@ int btf_get_fd_by_id(u32 id)
 	struct btf *btf;
 	int fd;
 
-	rcu_read_lock();
-	btf = idr_find(&btf_idr, id);
-	if (!btf || !refcount_inc_not_zero(&btf->refcnt))
-		btf = ERR_PTR(-ENOENT);
-	rcu_read_unlock();
-
+	btf = btf_get_by_id(id);
 	if (IS_ERR(btf))
 		return PTR_ERR(btf);
 
@@ -5738,6 +5746,11 @@ u32 btf_obj_id(const struct btf *btf)
 	return btf->id;
 }
 
+bool btf_is_kernel(const struct btf *btf)
+{
+	return btf->kernel_btf;
+}
+
 static int btf_id_cmp_func(const void *a, const void *b)
 {
 	const int *pa = a, *pb = b;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5ee00611af53..3af073642664 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1968,7 +1968,7 @@ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr)
 static int
 bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 			   enum bpf_attach_type expected_attach_type,
-			   u32 btf_id, u32 prog_fd)
+			   u32 btf_obj_id, u32 btf_id, u32 prog_fd)
 {
 	if (btf_id) {
 		if (btf_id > BTF_MAX_TYPE)
@@ -1985,6 +1985,9 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		}
 	}
 
+	if (btf_obj_id && (!btf_id || prog_fd))
+		return -EINVAL;
+
 	if (prog_fd && prog_type != BPF_PROG_TYPE_TRACING &&
 	    prog_type != BPF_PROG_TYPE_EXT)
 		return -EINVAL;
@@ -2097,7 +2100,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD attach_prog_fd
+#define	BPF_PROG_LOAD_LAST_FIELD attach_btf_obj_id
 
 static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 {
@@ -2146,6 +2149,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
+				       attr->attach_btf_obj_id,
 				       attr->attach_btf_id,
 				       attr->attach_prog_fd))
 		return -EINVAL;
@@ -2158,7 +2162,19 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	prog->expected_attach_type = attr->expected_attach_type;
 	prog->aux->attach_btf_id = attr->attach_btf_id;
 
-	if (attr->attach_btf_id && !attr->attach_prog_fd) {
+	if (attr->attach_btf_obj_id) {
+		struct btf *btf;
+
+		btf = btf_get_by_id(attr->attach_btf_obj_id);
+		if (IS_ERR(btf))
+			return PTR_ERR(btf);
+		if (!btf_is_kernel(btf)) {
+			btf_put(btf);
+			return -EINVAL;
+		}
+
+		prog->aux->attach_btf = btf;
+	} else if (attr->attach_btf_id && !attr->attach_prog_fd) {
 		struct btf *btf;
 
 		btf = bpf_get_btf_vmlinux();
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3ca6146f001a..782169dc41e0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -558,6 +558,7 @@ union bpf_attr {
 		__u32		line_info_cnt;	/* number of bpf_line_info records */
 		__u32		attach_btf_id;	/* in-kernel BTF type id to attach to */
 		__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+		__u32		attach_btf_obj_id; /* vmlinux/module BTF object ID for BTF type */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 5/7] selftests/bpf: add tp_btf CO-RE reloc test for modules
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Add another CO-RE relocation test for kernel module relocations. This time for
tp_btf with direct memory reads.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/core_reloc.c     |  3 +-
 .../bpf/progs/test_core_reloc_module.c        | 32 ++++++++++++++++++-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/core_reloc.c b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
index b3f14e3d9077..85c9cf8e3994 100644
--- a/tools/testing/selftests/bpf/prog_tests/core_reloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
@@ -512,7 +512,8 @@ static struct core_reloc_test_case test_cases[] = {
 	},
 
 	/* validate we can find kernel module BTF types for relocs/attach */
-	MODULES_CASE("module", "raw_tp/bpf_sidecar_test_read", "bpf_sidecar_test_read"),
+	MODULES_CASE("module_probed", "raw_tp/bpf_sidecar_test_read", "bpf_sidecar_test_read"),
+	MODULES_CASE("module_direct", "tp_btf/bpf_sidecar_test_read", NULL),
 
 	/* validate BPF program can use multiple flavors to match against
 	 * single target BTF type
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_module.c b/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
index 4630301de259..f336c0565d47 100644
--- a/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
@@ -36,7 +36,7 @@ struct core_reloc_module_output {
 };
 
 SEC("raw_tp/bpf_sidecar_test_read")
-int BPF_PROG(test_core_module,
+int BPF_PROG(test_core_module_probed,
 	     struct task_struct *task,
 	     struct bpf_sidecar_test_read_ctx *read_ctx)
 {
@@ -64,3 +64,33 @@ int BPF_PROG(test_core_module,
 
 	return 0;
 }
+
+SEC("tp_btf/bpf_sidecar_test_read")
+int BPF_PROG(test_core_module_direct,
+	     struct task_struct *task,
+	     struct bpf_sidecar_test_read_ctx *read_ctx)
+{
+	struct core_reloc_module_output *out = (void *)&data.out;
+	__u64 pid_tgid = bpf_get_current_pid_tgid();
+	__u32 real_tgid = (__u32)(pid_tgid >> 32);
+	__u32 real_pid = (__u32)pid_tgid;
+
+	if (data.my_pid_tgid != pid_tgid)
+		return 0;
+
+	if (task->pid != real_pid || task->tgid != real_tgid)
+		return 0;
+
+	out->len = read_ctx->len;
+	out->off = read_ctx->off;
+
+	out->read_ctx_sz = bpf_core_type_size(struct bpf_sidecar_test_read_ctx);
+	out->read_ctx_exists = bpf_core_type_exists(struct bpf_sidecar_test_read_ctx);
+	out->buf_exists = bpf_core_field_exists(read_ctx->buf);
+	out->off_exists = bpf_core_field_exists(read_ctx->off);
+	out->len_exists = bpf_core_field_exists(read_ctx->len);
+
+	out->comm_len = BPF_CORE_READ_STR_INTO(&out->comm, task, comm);
+
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 1/7] bpf: remove hard-coded btf_vmlinux assumption from BPF verifier
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead,
wherever BTF type IDs are involved, also track the instance of struct btf that
goes along with the type ID. This allows to gradually add support for kernel
module BTFs and using/tracking module types across BPF helper calls and
registers.

This patch also renames btf_id() function to btf_obj_id() to minimize naming
clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID.

Also, altough btf_vmlinux can't get destructed and thus doesn't need
refcounting, module BTFs need that, so apply BTF refcounting universally when
BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This
makes for simpler clean up code.

Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF
trampoline key to include BTF object ID. To differentiate that from target
program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are
not allowed to take full 32 bits, so there is no danger of confusing that bit
with a valid BTF type ID.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h          | 13 ++++--
 include/linux/bpf_verifier.h | 24 ++++++++---
 include/linux/btf.h          |  5 ++-
 kernel/bpf/btf.c             | 65 ++++++++++++++++++++----------
 kernel/bpf/syscall.c         | 24 +++++++++--
 kernel/bpf/verifier.c        | 77 ++++++++++++++++++++++--------------
 net/ipv4/bpf_tcp_ca.c        |  3 +-
 7 files changed, 146 insertions(+), 65 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e1bcb6d7345c..2f3b22887ef3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -421,7 +421,10 @@ struct bpf_insn_access_aux {
 	enum bpf_reg_type reg_type;
 	union {
 		int ctx_field_size;
-		u32 btf_id;
+		struct {
+			struct btf *btf;
+			u32 btf_id;
+		};
 	};
 	struct bpf_verifier_log *log; /* for verbose logs */
 };
@@ -458,6 +461,7 @@ struct bpf_verifier_ops {
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
 	int (*btf_struct_access)(struct bpf_verifier_log *log,
+				 const struct btf *btf,
 				 const struct btf_type *t, int off, int size,
 				 enum bpf_access_type atype,
 				 u32 *next_btf_id);
@@ -771,6 +775,7 @@ struct bpf_prog_aux {
 	u32 ctx_arg_info_size;
 	u32 max_rdonly_access;
 	u32 max_rdwr_access;
+	struct btf *attach_btf;
 	const struct bpf_ctx_arg_aux *ctx_arg_info;
 	struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
 	struct bpf_prog *dst_prog;
@@ -1005,7 +1010,6 @@ struct bpf_event_entry {
 
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
 int bpf_prog_calc_tag(struct bpf_prog *fp);
-const char *kernel_type_name(u32 btf_type_id);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
@@ -1430,12 +1434,13 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info);
-int btf_struct_access(struct bpf_verifier_log *log,
+int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
 		      const struct btf_type *t, int off, int size,
 		      enum bpf_access_type atype,
 		      u32 *next_btf_id);
 bool btf_struct_ids_match(struct bpf_verifier_log *log,
-			  int off, u32 id, u32 need_type_id);
+			  const struct btf *btf, u32 id, int off,
+			  const struct btf *need_btf, u32 need_type_id);
 
 int btf_distill_func_proto(struct bpf_verifier_log *log,
 			   struct btf *btf,
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 306869d4743b..12ded13d2885 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -5,6 +5,7 @@
 #define _LINUX_BPF_VERIFIER_H 1
 
 #include <linux/bpf.h> /* for enum bpf_reg_type */
+#include <linux/btf.h> /* for struct btf and btf_id() */
 #include <linux/filter.h> /* for MAX_BPF_STACK */
 #include <linux/tnum.h>
 
@@ -52,12 +53,19 @@ struct bpf_reg_state {
 		 */
 		struct bpf_map *map_ptr;
 
-		u32 btf_id; /* for PTR_TO_BTF_ID */
+		/* for PTR_TO_BTF_ID */
+		struct {
+			struct btf *btf;
+			u32 btf_id;
+		};
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 
 		/* Max size from any of the above. */
-		unsigned long raw;
+		struct {
+			unsigned long raw1;
+			unsigned long raw2;
+		} raw;
 	};
 	/* Fixed part of pointer offset, pointer types only */
 	s32 off;
@@ -311,7 +319,10 @@ struct bpf_insn_aux_data {
 		struct {
 			enum bpf_reg_type reg_type;	/* type of pseudo_btf_id */
 			union {
-				u32 btf_id;	/* btf_id for struct typed var */
+				struct {
+					struct btf *btf;
+					u32 btf_id;	/* btf_id for struct typed var */
+				};
 				u32 mem_size;	/* mem_size for non-struct typed var */
 			};
 		} btf_var;
@@ -459,9 +470,12 @@ int check_ctx_reg(struct bpf_verifier_env *env,
 
 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
-					     u32 btf_id)
+					     struct btf *btf, u32 btf_id)
 {
-        return tgt_prog ? (((u64)tgt_prog->aux->id) << 32 | btf_id) : btf_id;
+	if (tgt_prog)
+		return ((u64)tgt_prog->aux->id << 32) | btf_id;
+	else
+		return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
 }
 
 int bpf_check_attach_target(struct bpf_verifier_log *log,
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 2bf641829664..fb608e4de076 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -18,6 +18,7 @@ struct btf_show;
 
 extern const struct file_operations btf_fops;
 
+void btf_get(struct btf *btf);
 void btf_put(struct btf *btf);
 int btf_new_fd(const union bpf_attr *attr);
 struct btf *btf_get_by_fd(int fd);
@@ -88,7 +89,7 @@ int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
 			   char *buf, int len, u64 flags);
 
 int btf_get_fd_by_id(u32 id);
-u32 btf_id(const struct btf *btf);
+u32 btf_obj_id(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
@@ -206,6 +207,8 @@ static inline const struct btf_var_secinfo *btf_type_var_secinfo(
 }
 
 #ifdef CONFIG_BPF_SYSCALL
+struct bpf_prog;
+
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6b2d508b33d4..7a19bf5bfe97 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -1524,6 +1524,11 @@ static void btf_free_rcu(struct rcu_head *rcu)
 	btf_free(btf);
 }
 
+void btf_get(struct btf *btf)
+{
+	refcount_inc(&btf->refcnt);
+}
+
 void btf_put(struct btf *btf)
 {
 	if (btf && refcount_dec_and_test(&btf->refcnt)) {
@@ -4555,11 +4560,10 @@ struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
 {
 	struct bpf_prog *tgt_prog = prog->aux->dst_prog;
 
-	if (tgt_prog) {
+	if (tgt_prog)
 		return tgt_prog->aux->btf;
-	} else {
-		return btf_vmlinux;
-	}
+	else
+		return prog->aux->attach_btf;
 }
 
 static bool is_string_ptr(struct btf *btf, const struct btf_type *t)
@@ -4700,6 +4704,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 
 		if (ctx_arg_info->offset == off) {
 			info->reg_type = ctx_arg_info->reg_type;
+			info->btf = btf_vmlinux;
 			info->btf_id = ctx_arg_info->btf_id;
 			return true;
 		}
@@ -4716,6 +4721,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 
 		ret = btf_translate_to_vmlinux(log, btf, t, tgt_type, arg);
 		if (ret > 0) {
+			info->btf = btf_vmlinux;
 			info->btf_id = ret;
 			return true;
 		} else {
@@ -4723,6 +4729,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		}
 	}
 
+	info->btf = btf;
 	info->btf_id = t->type;
 	t = btf_type_by_id(btf, t->type);
 	/* skip modifiers */
@@ -4749,7 +4756,7 @@ enum bpf_struct_walk_result {
 	WALK_STRUCT,
 };
 
-static int btf_struct_walk(struct bpf_verifier_log *log,
+static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 			   const struct btf_type *t, int off, int size,
 			   u32 *next_btf_id)
 {
@@ -4760,7 +4767,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 	u32 vlen, elem_id, mid;
 
 again:
-	tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
+	tname = __btf_name_by_offset(btf, t->name_off);
 	if (!btf_type_is_struct(t)) {
 		bpf_log(log, "Type '%s' is not a struct\n", tname);
 		return -EINVAL;
@@ -4777,7 +4784,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 			goto error;
 
 		member = btf_type_member(t) + vlen - 1;
-		mtype = btf_type_skip_modifiers(btf_vmlinux, member->type,
+		mtype = btf_type_skip_modifiers(btf, member->type,
 						NULL);
 		if (!btf_type_is_array(mtype))
 			goto error;
@@ -4793,7 +4800,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 		/* Only allow structure for now, can be relaxed for
 		 * other types later.
 		 */
-		t = btf_type_skip_modifiers(btf_vmlinux, array_elem->type,
+		t = btf_type_skip_modifiers(btf, array_elem->type,
 					    NULL);
 		if (!btf_type_is_struct(t))
 			goto error;
@@ -4851,10 +4858,10 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 
 		/* type of the field */
 		mid = member->type;
-		mtype = btf_type_by_id(btf_vmlinux, member->type);
-		mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
+		mtype = btf_type_by_id(btf, member->type);
+		mname = __btf_name_by_offset(btf, member->name_off);
 
-		mtype = __btf_resolve_size(btf_vmlinux, mtype, &msize,
+		mtype = __btf_resolve_size(btf, mtype, &msize,
 					   &elem_type, &elem_id, &total_nelems,
 					   &mid);
 		if (IS_ERR(mtype)) {
@@ -4949,7 +4956,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 					mname, moff, tname, off, size);
 				return -EACCES;
 			}
-			stype = btf_type_skip_modifiers(btf_vmlinux, mtype->type, &id);
+			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
 			if (btf_type_is_struct(stype)) {
 				*next_btf_id = id;
 				return WALK_PTR;
@@ -4975,7 +4982,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log,
 	return -EINVAL;
 }
 
-int btf_struct_access(struct bpf_verifier_log *log,
+int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
 		      const struct btf_type *t, int off, int size,
 		      enum bpf_access_type atype __maybe_unused,
 		      u32 *next_btf_id)
@@ -4984,7 +4991,7 @@ int btf_struct_access(struct bpf_verifier_log *log,
 	u32 id;
 
 	do {
-		err = btf_struct_walk(log, t, off, size, &id);
+		err = btf_struct_walk(log, btf, t, off, size, &id);
 
 		switch (err) {
 		case WALK_PTR:
@@ -5000,7 +5007,7 @@ int btf_struct_access(struct bpf_verifier_log *log,
 			 * by diving in it. At this point the offset is
 			 * aligned with the new type, so set it to 0.
 			 */
-			t = btf_type_by_id(btf_vmlinux, id);
+			t = btf_type_by_id(btf, id);
 			off = 0;
 			break;
 		default:
@@ -5016,21 +5023,37 @@ int btf_struct_access(struct bpf_verifier_log *log,
 	return -EINVAL;
 }
 
+/* Check that two BTF types, each specified as an BTF object + id, are exactly
+ * the same. Trivial ID check is not enough due to module BTFs, because we can
+ * end up with two different module BTFs, but IDs point to the common type in
+ * vmlinux BTF.
+ */
+static bool btf_types_are_same(const struct btf *btf1, u32 id1,
+			       const struct btf *btf2, u32 id2)
+{
+	if (id1 != id2)
+		return false;
+	if (btf1 == btf2)
+		return true;
+	return btf_type_by_id(btf1, id1) == btf_type_by_id(btf2, id2);
+}
+
 bool btf_struct_ids_match(struct bpf_verifier_log *log,
-			  int off, u32 id, u32 need_type_id)
+			  const struct btf *btf, u32 id, int off,
+			  const struct btf *need_btf, u32 need_type_id)
 {
 	const struct btf_type *type;
 	int err;
 
 	/* Are we already done? */
-	if (need_type_id == id && off == 0)
+	if (off == 0 && btf_types_are_same(btf, id, need_btf, need_type_id))
 		return true;
 
 again:
-	type = btf_type_by_id(btf_vmlinux, id);
+	type = btf_type_by_id(btf, id);
 	if (!type)
 		return false;
-	err = btf_struct_walk(log, type, off, 1, &id);
+	err = btf_struct_walk(log, btf, type, off, 1, &id);
 	if (err != WALK_STRUCT)
 		return false;
 
@@ -5039,7 +5062,7 @@ bool btf_struct_ids_match(struct bpf_verifier_log *log,
 	 * continue the search with offset 0 in the new
 	 * type.
 	 */
-	if (need_type_id != id) {
+	if (!btf_types_are_same(btf, id, need_btf, need_type_id)) {
 		off = 0;
 		goto again;
 	}
@@ -5710,7 +5733,7 @@ int btf_get_fd_by_id(u32 id)
 	return fd;
 }
 
-u32 btf_id(const struct btf *btf)
+u32 btf_obj_id(const struct btf *btf)
 {
 	return btf->id;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f3fe9f53f93c..5ee00611af53 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1726,6 +1726,8 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 	bpf_prog_uncharge_memlock(aux->prog);
 	security_bpf_prog_free(aux);
 	bpf_prog_free(aux->prog);
+	if (aux->attach_btf)
+		btf_put(aux->attach_btf);
 }
 
 static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
@@ -2155,6 +2157,20 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 	prog->expected_attach_type = attr->expected_attach_type;
 	prog->aux->attach_btf_id = attr->attach_btf_id;
+
+	if (attr->attach_btf_id && !attr->attach_prog_fd) {
+		struct btf *btf;
+
+		btf = bpf_get_btf_vmlinux();
+		if (IS_ERR(btf))
+			return PTR_ERR(btf);
+		if (!btf)
+			return -EINVAL;
+
+		btf_get(btf);
+		prog->aux->attach_btf = btf;
+	}
+
 	if (attr->attach_prog_fd) {
 		struct bpf_prog *dst_prog;
 
@@ -2255,6 +2271,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 free_prog_sec:
 	security_bpf_prog_free(prog->aux);
 free_prog_nouncharge:
+	if (prog->aux->attach_btf)
+		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
 	return err;
 }
@@ -2612,7 +2630,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 
-		key = bpf_trampoline_compute_key(tgt_prog, btf_id);
+		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
 	}
 
 	link = kzalloc(sizeof(*link), GFP_USER);
@@ -3589,7 +3607,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 	}
 
 	if (prog->aux->btf)
-		info.btf_id = btf_id(prog->aux->btf);
+		info.btf_id = btf_obj_id(prog->aux->btf);
 
 	ulen = info.nr_func_info;
 	info.nr_func_info = prog->aux->func_info_cnt;
@@ -3692,7 +3710,7 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	memcpy(info.name, map->name, sizeof(map->name));
 
 	if (map->btf) {
-		info.btf_id = btf_id(map->btf);
+		info.btf_id = btf_obj_id(map->btf);
 		info.btf_key_type_id = map->btf_key_type_id;
 		info.btf_value_type_id = map->btf_value_type_id;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fb2943ea715d..34dba0fba981 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -238,7 +238,9 @@ struct bpf_call_arg_meta {
 	u64 msize_max_value;
 	int ref_obj_id;
 	int func_id;
+	struct btf *btf;
 	u32 btf_id;
+	struct btf *ret_btf;
 	u32 ret_btf_id;
 };
 
@@ -556,10 +558,9 @@ static struct bpf_func_state *func(struct bpf_verifier_env *env,
 	return cur->frame[reg->frameno];
 }
 
-const char *kernel_type_name(u32 id)
+static const char *kernel_type_name(const struct btf* btf, u32 id)
 {
-	return btf_name_by_offset(btf_vmlinux,
-				  btf_type_by_id(btf_vmlinux, id)->name_off);
+	return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off);
 }
 
 static void print_verifier_state(struct bpf_verifier_env *env,
@@ -589,7 +590,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			if (t == PTR_TO_BTF_ID ||
 			    t == PTR_TO_BTF_ID_OR_NULL ||
 			    t == PTR_TO_PERCPU_BTF_ID)
-				verbose(env, "%s", kernel_type_name(reg->btf_id));
+				verbose(env, "%s", kernel_type_name(reg->btf, reg->btf_id));
 			verbose(env, "(id=%d", reg->id);
 			if (reg_type_may_be_refcounted_or_null(t))
 				verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
@@ -1383,7 +1384,8 @@ static void mark_reg_not_init(struct bpf_verifier_env *env,
 
 static void mark_btf_ld_reg(struct bpf_verifier_env *env,
 			    struct bpf_reg_state *regs, u32 regno,
-			    enum bpf_reg_type reg_type, u32 btf_id)
+			    enum bpf_reg_type reg_type,
+			    struct btf *btf, u32 btf_id)
 {
 	if (reg_type == SCALAR_VALUE) {
 		mark_reg_unknown(env, regs, regno);
@@ -1391,6 +1393,7 @@ static void mark_btf_ld_reg(struct bpf_verifier_env *env,
 	}
 	mark_reg_known_zero(env, regs, regno);
 	regs[regno].type = PTR_TO_BTF_ID;
+	regs[regno].btf = btf;
 	regs[regno].btf_id = btf_id;
 }
 
@@ -2764,7 +2767,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
-			    u32 *btf_id)
+			    struct btf **btf, u32 *btf_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
@@ -2782,10 +2785,12 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 		 */
 		*reg_type = info.reg_type;
 
-		if (*reg_type == PTR_TO_BTF_ID || *reg_type == PTR_TO_BTF_ID_OR_NULL)
+		if (*reg_type == PTR_TO_BTF_ID || *reg_type == PTR_TO_BTF_ID_OR_NULL) {
+			*btf = info.btf;
 			*btf_id = info.btf_id;
-		else
+		} else {
 			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
+		}
 		/* remember the offset of last byte accessed in ctx */
 		if (env->prog->aux->max_ctx_offset < off + size)
 			env->prog->aux->max_ctx_offset = off + size;
@@ -3297,8 +3302,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 				   int value_regno)
 {
 	struct bpf_reg_state *reg = regs + regno;
-	const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
-	const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+	const struct btf_type *t = btf_type_by_id(reg->btf, reg->btf_id);
+	const char *tname = btf_name_by_offset(reg->btf, t->name_off);
 	u32 btf_id;
 	int ret;
 
@@ -3319,23 +3324,23 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	}
 
 	if (env->ops->btf_struct_access) {
-		ret = env->ops->btf_struct_access(&env->log, t, off, size,
-						  atype, &btf_id);
+		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
+						  off, size, atype, &btf_id);
 	} else {
 		if (atype != BPF_READ) {
 			verbose(env, "only read is supported\n");
 			return -EACCES;
 		}
 
-		ret = btf_struct_access(&env->log, t, off, size, atype,
-					&btf_id);
+		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
+					atype, &btf_id);
 	}
 
 	if (ret < 0)
 		return ret;
 
 	if (atype == BPF_READ && value_regno >= 0)
-		mark_btf_ld_reg(env, regs, value_regno, ret, btf_id);
+		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id);
 
 	return 0;
 }
@@ -3385,12 +3390,12 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	ret = btf_struct_access(&env->log, t, off, size, atype, &btf_id);
+	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id);
 	if (ret < 0)
 		return ret;
 
 	if (value_regno >= 0)
-		mark_btf_ld_reg(env, regs, value_regno, ret, btf_id);
+		mark_btf_ld_reg(env, regs, value_regno, ret, btf_vmlinux, btf_id);
 
 	return 0;
 }
@@ -3466,6 +3471,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			mark_reg_unknown(env, regs, value_regno);
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
+		struct btf *btf = NULL;
 		u32 btf_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
@@ -3478,7 +3484,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err < 0)
 			return err;
 
-		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
+		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf, &btf_id);
 		if (err)
 			verbose_linfo(env, insn_idx, "; ");
 		if (!err && t == BPF_READ && value_regno >= 0) {
@@ -3500,8 +3506,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				 */
 				regs[value_regno].subreg_def = DEF_NOT_SUBREG;
 				if (reg_type == PTR_TO_BTF_ID ||
-				    reg_type == PTR_TO_BTF_ID_OR_NULL)
+				    reg_type == PTR_TO_BTF_ID_OR_NULL) {
+					regs[value_regno].btf = btf;
 					regs[value_regno].btf_id = btf_id;
+				}
 			}
 			regs[value_regno].type = reg_type;
 		}
@@ -4118,11 +4126,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			arg_btf_id = compatible->btf_id;
 		}
 
-		if (!btf_struct_ids_match(&env->log, reg->off, reg->btf_id,
-					  *arg_btf_id)) {
+		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+					  btf_vmlinux, *arg_btf_id)) {
 			verbose(env, "R%d is of type %s but %s is expected\n",
-				regno, kernel_type_name(reg->btf_id),
-				kernel_type_name(*arg_btf_id));
+				regno, kernel_type_name(reg->btf, reg->btf_id),
+				kernel_type_name(btf_vmlinux, *arg_btf_id));
 			return -EACCES;
 		}
 
@@ -4244,6 +4252,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "Helper has invalid btf_id in R%d\n", regno);
 			return -EACCES;
 		}
+		meta->ret_btf = reg->btf;
 		meta->ret_btf_id = reg->btf_id;
 	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
 		if (meta->func_id == BPF_FUNC_spin_lock) {
@@ -5190,16 +5199,16 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		const struct btf_type *t;
 
 		mark_reg_known_zero(env, regs, BPF_REG_0);
-		t = btf_type_skip_modifiers(btf_vmlinux, meta.ret_btf_id, NULL);
+		t = btf_type_skip_modifiers(meta.ret_btf, meta.ret_btf_id, NULL);
 		if (!btf_type_is_struct(t)) {
 			u32 tsize;
 			const struct btf_type *ret;
 			const char *tname;
 
 			/* resolve the type size of ksym. */
-			ret = btf_resolve_size(btf_vmlinux, t, &tsize);
+			ret = btf_resolve_size(meta.ret_btf, t, &tsize);
 			if (IS_ERR(ret)) {
-				tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+				tname = btf_name_by_offset(meta.ret_btf, t->name_off);
 				verbose(env, "unable to resolve the size of type '%s': %ld\n",
 					tname, PTR_ERR(ret));
 				return -EINVAL;
@@ -5212,6 +5221,7 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			regs[BPF_REG_0].type =
 				fn->ret_type == RET_PTR_TO_MEM_OR_BTF_ID ?
 				PTR_TO_BTF_ID : PTR_TO_BTF_ID_OR_NULL;
+			regs[BPF_REG_0].btf = meta.ret_btf;
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
 	} else if (fn->ret_type == RET_PTR_TO_BTF_ID_OR_NULL ||
@@ -5228,6 +5238,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 				fn->ret_type, func_id_name(func_id), func_id);
 			return -EINVAL;
 		}
+		/* current BPF helper definitions are only coming from
+		 * built-in code with type IDs from  vmlinux BTF
+		 */
+		regs[BPF_REG_0].btf = btf_vmlinux;
 		regs[BPF_REG_0].btf_id = ret_btf_id;
 	} else {
 		verbose(env, "unknown return type %d of func %s#%d\n",
@@ -5627,7 +5641,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		if (reg_is_pkt_pointer(ptr_reg)) {
 			dst_reg->id = ++env->id_gen;
 			/* something was added to pkt_ptr, set range to zero */
-			dst_reg->raw = 0;
+			memset(&dst_reg->raw, 0, sizeof(dst_reg->raw));
 		}
 		break;
 	case BPF_SUB:
@@ -5692,7 +5706,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 			dst_reg->id = ++env->id_gen;
 			/* something was added to pkt_ptr, set range to zero */
 			if (smin_val < 0)
-				dst_reg->raw = 0;
+				memset(&dst_reg->raw, 0, sizeof(dst_reg->raw));
 		}
 		break;
 	case BPF_AND:
@@ -7744,6 +7758,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_PERCPU_BTF_ID:
+			dst_reg->btf = aux->btf_var.btf;
 			dst_reg->btf_id = aux->btf_var.btf_id;
 			break;
 		default:
@@ -9728,6 +9743,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	t = btf_type_skip_modifiers(btf_vmlinux, type, NULL);
 	if (percpu) {
 		aux->btf_var.reg_type = PTR_TO_PERCPU_BTF_ID;
+		aux->btf_var.btf = btf_vmlinux;
 		aux->btf_var.btf_id = type;
 	} else if (!btf_type_is_struct(t)) {
 		const struct btf_type *ret;
@@ -9746,6 +9762,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 		aux->btf_var.mem_size = tsize;
 	} else {
 		aux->btf_var.reg_type = PTR_TO_BTF_ID;
+		aux->btf_var.btf = btf_vmlinux;
 		aux->btf_var.btf_id = type;
 	}
 	return 0;
@@ -11598,7 +11615,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		bpf_log(log, "Tracing programs must provide btf_id\n");
 		return -EINVAL;
 	}
-	btf = tgt_prog ? tgt_prog->aux->btf : btf_vmlinux;
+	btf = tgt_prog ? tgt_prog->aux->btf : prog->aux->attach_btf;
 	if (!btf) {
 		bpf_log(log,
 			"FENTRY/FEXIT program can only be attached to another program annotated with BTF\n");
@@ -11874,7 +11891,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			return ret;
 	}
 
-	key = bpf_trampoline_compute_key(tgt_prog, btf_id);
+	key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);
 	tr = bpf_trampoline_get(key, &tgt_info);
 	if (!tr)
 		return -ENOMEM;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 618954f82764..d520e61649c8 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -95,6 +95,7 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
 }
 
 static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
+					const struct btf *btf,
 					const struct btf_type *t, int off,
 					int size, enum bpf_access_type atype,
 					u32 *next_btf_id)
@@ -102,7 +103,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
 	size_t end;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, t, off, size, atype, next_btf_id);
+		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id);
 
 	if (t != tcp_sock_type) {
 		bpf_log(log, "only read is supported\n");
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 6/7] selftests/bpf: make BPF sidecar traceable function global
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Pahole currently isn't able to generate BTF for static functions in kernel
modules, so make sure traced sidecar function is global.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/bpf_sidecar/bpf_sidecar.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/bpf_sidecar/bpf_sidecar.c b/tools/testing/selftests/bpf/bpf_sidecar/bpf_sidecar.c
index 46f48c20d99b..a2f6d4a15a3f 100644
--- a/tools/testing/selftests/bpf/bpf_sidecar/bpf_sidecar.c
+++ b/tools/testing/selftests/bpf/bpf_sidecar/bpf_sidecar.c
@@ -10,7 +10,7 @@
 #define CREATE_TRACE_POINTS
 #include "bpf_sidecar-events.h"
 
-static noinline ssize_t
+noinline ssize_t
 bpf_sidecar_test_read(struct file *file, struct kobject *kobj,
 		      struct bin_attribute *bin_attr,
 		      char *buf, loff_t off, size_t len)
@@ -25,6 +25,7 @@ bpf_sidecar_test_read(struct file *file, struct kobject *kobj,
 
 	return -EIO; /* always fail */
 }
+EXPORT_SYMBOL(bpf_sidecar_test_read);
 ALLOW_ERROR_INJECTION(bpf_sidecar_test_read, ERRNO);
 
 static struct bin_attribute bin_attr_bpf_sidecar_file __ro_after_init = {
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 4/7] libbpf: support attachment of BPF tracing programs to kernel modules
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team
In-Reply-To: <20201121024616.1588175-1-andrii@kernel.org>

Teach libbpf to search for BTF types in kernel modules for tracing BPF
programs.  This allows attachment of raw_tp/fentry/fexit/fmod_ret/etc BPF
program types to tracepoints and functions in kernel modules.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c             |   1 +
 tools/lib/bpf/libbpf.c          | 105 ++++++++++++++++++++++++++------
 tools/lib/bpf/libbpf_internal.h |   1 +
 3 files changed, 87 insertions(+), 20 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index e371ef91f5e8..dfdad932ef72 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -230,6 +230,7 @@ int libbpf__bpf_prog_load(const struct bpf_prog_load_params *load_attr)
 	attr.prog_type = load_attr->prog_type;
 	attr.expected_attach_type = load_attr->expected_attach_type;
 
+	attr.attach_btf_obj_id = load_attr->attach_btf_obj_id;
 	attr.attach_btf_id = load_attr->attach_btf_id;
 	attr.attach_prog_fd = load_attr->attach_prog_fd;
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3425e594132b..c3ca34998a76 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -278,6 +278,7 @@ struct bpf_program {
 	enum bpf_prog_type type;
 	enum bpf_attach_type expected_attach_type;
 	int prog_ifindex;
+	__u32 attach_btf_obj_id;
 	__u32 attach_btf_id;
 	__u32 attach_prog_fd;
 	void *func_info;
@@ -6832,6 +6833,7 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	load_attr.license = license;
 	load_attr.attach_btf_id = prog->attach_btf_id;
 	load_attr.attach_prog_fd = prog->attach_prog_fd;
+	load_attr.attach_btf_obj_id = prog->attach_btf_obj_id;
 	load_attr.attach_btf_id = prog->attach_btf_id;
 	load_attr.kern_version = kern_version;
 	load_attr.prog_ifindex = prog->prog_ifindex;
@@ -6927,11 +6929,11 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	return ret;
 }
 
-static int libbpf_find_attach_btf_id(struct bpf_program *prog);
+static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_id, int *btf_type_id);
 
 int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver)
 {
-	int err = 0, fd, i, btf_id;
+	int err = 0, fd, i;
 
 	if (prog->obj->loaded) {
 		pr_warn("prog '%s': can't load after object was loaded\n", prog->name);
@@ -6941,10 +6943,14 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver)
 	if ((prog->type == BPF_PROG_TYPE_TRACING ||
 	     prog->type == BPF_PROG_TYPE_LSM ||
 	     prog->type == BPF_PROG_TYPE_EXT) && !prog->attach_btf_id) {
-		btf_id = libbpf_find_attach_btf_id(prog);
-		if (btf_id <= 0)
-			return btf_id;
-		prog->attach_btf_id = btf_id;
+		int btf_obj_id = 0, btf_type_id = 0;
+
+		err = libbpf_find_attach_btf_id(prog, &btf_obj_id, &btf_type_id);
+		if (err)
+			return err;
+
+		prog->attach_btf_id = btf_type_id;
+		prog->attach_btf_obj_id = btf_obj_id;
 	}
 
 	if (prog->instances.nr < 0 || !prog->instances.fds) {
@@ -8796,8 +8802,8 @@ static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
 	return btf__find_by_name_kind(btf, btf_type_name, kind);
 }
 
-static inline int __find_vmlinux_btf_id(struct btf *btf, const char *name,
-					enum bpf_attach_type attach_type)
+static inline int find_attach_btf_id(struct btf *btf, const char *name,
+				     enum bpf_attach_type attach_type)
 {
 	int err;
 
@@ -8828,7 +8834,7 @@ int libbpf_find_vmlinux_btf_id(const char *name,
 		return -EINVAL;
 	}
 
-	err = __find_vmlinux_btf_id(btf, name, attach_type);
+	err = find_attach_btf_id(btf, name, attach_type);
 	if (err <= 0)
 		pr_warn("%s is not found in vmlinux BTF\n", name);
 
@@ -8869,11 +8875,49 @@ static int libbpf_find_prog_btf_id(const char *name, __u32 attach_prog_fd)
 	return err;
 }
 
-static int libbpf_find_attach_btf_id(struct bpf_program *prog)
+static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
+			      enum bpf_attach_type attach_type,
+			      int *btf_obj_id, int *btf_type_id)
+{
+	int ret, i;
+
+	ret = find_attach_btf_id(obj->btf_vmlinux, attach_name, attach_type);
+	if (ret > 0) {
+		*btf_obj_id = 0; /* vmlinux BTF */
+		*btf_type_id = ret;
+		return 0;
+	}
+	if (ret != -ENOENT)
+		return ret;
+
+	ret = load_module_btfs(obj);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < obj->btf_module_cnt; i++) {
+		const struct module_btf *mod = &obj->btf_modules[i];
+
+		ret = find_attach_btf_id(mod->btf, attach_name, attach_type);
+		if (ret > 0) {
+			*btf_obj_id = mod->id;
+			*btf_type_id = ret;
+			return 0;
+		}
+		if (ret == -ENOENT)
+			continue;
+
+		return ret;
+	}
+
+	return -ESRCH;
+}
+
+static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_id, int *btf_type_id)
 {
 	enum bpf_attach_type attach_type = prog->expected_attach_type;
 	__u32 attach_prog_fd = prog->attach_prog_fd;
-	const char *name = prog->sec_name;
+	const char *name = prog->sec_name, *attach_name;
+	const struct bpf_sec_def *sec = NULL;
 	int i, err;
 
 	if (!name)
@@ -8884,17 +8928,37 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog)
 			continue;
 		if (strncmp(name, section_defs[i].sec, section_defs[i].len))
 			continue;
-		if (attach_prog_fd)
-			err = libbpf_find_prog_btf_id(name + section_defs[i].len,
-						      attach_prog_fd);
-		else
-			err = __find_vmlinux_btf_id(prog->obj->btf_vmlinux,
-						    name + section_defs[i].len,
-						    attach_type);
+
+		sec = &section_defs[i];
+		break;
+	}
+
+	if (!sec) {
+		pr_warn("failed to identify BTF ID based on ELF section name '%s'\n", name);
+		return -ESRCH;
+	}
+	attach_name = name + sec->len;
+
+	/* BPF program's BTF ID */
+	if (attach_prog_fd) {
+		err = libbpf_find_prog_btf_id(attach_name, attach_prog_fd);
+		if (err < 0) {
+			pr_warn("failed to find BPF program (FD %d) BTF ID for '%s': %d\n",
+				 attach_prog_fd, attach_name, err);
+			return err;
+		}
+		*btf_obj_id = 0;
+		*btf_type_id = err;
+		return 0;
+	}
+
+	/* kernel/module BTF ID */
+	err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_id, btf_type_id);
+	if (err) {
+		pr_warn("failed to find kernel BTF type ID of '%s': %d\n", attach_name, err);
 		return err;
 	}
-	pr_warn("failed to identify btf_id based on ELF section name '%s'\n", name);
-	return -ESRCH;
+	return 0;
 }
 
 int libbpf_attach_type_by_name(const char *name,
@@ -10783,6 +10847,7 @@ int bpf_program__set_attach_target(struct bpf_program *prog,
 		return btf_id;
 
 	prog->attach_btf_id = btf_id;
+	prog->attach_btf_obj_id = 0;
 	prog->attach_prog_fd = attach_prog_fd;
 	return 0;
 }
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 681073a67ae3..36236b4430ff 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -160,6 +160,7 @@ struct bpf_prog_load_params {
 	const char *license;
 	__u32 kern_version;
 	__u32 attach_prog_fd;
+	__u32 attach_btf_obj_id;
 	__u32 attach_btf_id;
 	__u32 prog_ifindex;
 	__u32 prog_btf_fd;
-- 
2.24.1


^ permalink raw reply related

* [PATCH bpf-next 0/7] Add kernel modules support for tracing BPF program attachments
From: Andrii Nakryiko @ 2020-11-21  2:46 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii, kernel-team

Building on top of two previous patch sets ([0] and not yet landed [1]), this
patch sets extends kernel and libbpf with support for attaching BTF-powered
raw tracepoint (tp_btf) and tracing (fentry/fexit/fmod_ret/lsm) BPF programs
to BPF hooks defined in kernel modules.

Kernel UAPI for BPF_PROG_LOAD is extended with extra parameter
(attach_btf_obj_id) which allows to specify kernel module BTF in which the BTF
type is identifed by attach_btf_id.

From end user perspective there are no extra actions that need to happen.
Libbpf will continue searching across all kernel module BTFs, if desired
attach BTF type is not found in vmlinux. That way it doesn't matter if BPF
hook that user is trying to attach to is built-in into vmlinux image or is
loaded in kernel module.

Currently pahole doesn't generate BTF_KIND_FUNC info for ftrace-able static
functions in kernel modules, so expose traced function in bpf_sidecar.ko. Once
pahole is enhanced, we can go back to static function.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=380759&state=*
  [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=387965&state=*

Andrii Nakryiko (7):
  bpf: remove hard-coded btf_vmlinux assumption from BPF verifier
  bpf: allow to specify kernel module BTFs when attaching BPF programs
  libbpf: factor out low-level BPF program loading helper
  libbpf: support attachment of BPF tracing programs to kernel modules
  selftests/bpf: add tp_btf CO-RE reloc test for modules
  selftests/bpf: make BPF sidecar traceable function global
  selftests/bpf: add fentry/fexit/fmod_ret selftest for kernel module

 include/linux/bpf.h                           |  13 +-
 include/linux/bpf_verifier.h                  |  24 ++-
 include/linux/btf.h                           |   7 +-
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/btf.c                              |  90 ++++++++----
 kernel/bpf/syscall.c                          |  44 +++++-
 kernel/bpf/verifier.c                         |  77 ++++++----
 net/ipv4/bpf_tcp_ca.c                         |   3 +-
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/bpf.c                           | 101 +++++++++----
 tools/lib/bpf/libbpf.c                        | 139 +++++++++++++-----
 tools/lib/bpf/libbpf_internal.h               |  30 ++++
 .../selftests/bpf/bpf_sidecar/bpf_sidecar.c   |   3 +-
 .../selftests/bpf/prog_tests/core_reloc.c     |   3 +-
 .../selftests/bpf/prog_tests/module_attach.c  |  53 +++++++
 .../bpf/progs/test_core_reloc_module.c        |  32 +++-
 .../selftests/bpf/progs/test_module_attach.c  |  66 +++++++++
 17 files changed, 541 insertions(+), 146 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/module_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_module_attach.c

-- 
2.24.1


^ permalink raw reply

* Re: [PATCH bpf-next v2 1/7] samples: bpf: refactor hbm program with libbpf
From: Martin KaFai Lau @ 2020-11-21  2:42 UTC (permalink / raw)
  To: Daniel T. Lee
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, brakmo,
	Jesper Dangaard Brouer, Andrii Nakryiko, Lorenzo Bianconi,
	David Ahern, Yonghong Song, Toke Høiland-Jørgensen,
	Ira Weiny, Thomas Graf, Jakub Kicinski, John Fastabend, bpf,
	netdev, Xdp
In-Reply-To: <20201121023405.tchtyadco4x45sf3@kafai-mbp.dhcp.thefacebook.com>

On Fri, Nov 20, 2020 at 06:34:05PM -0800, Martin KaFai Lau wrote:
> On Thu, Nov 19, 2020 at 03:06:11PM +0000, Daniel T. Lee wrote:
> [ ... ]
> 
> >  static int run_bpf_prog(char *prog, int cg_id)
> >  {
> > -	int map_fd;
> > -	int rc = 0;
> > +	struct hbm_queue_stats qstats = {0};
> > +	char cg_dir[100], cg_pin_path[100];
> > +	struct bpf_link *link = NULL;
> >  	int key = 0;
> >  	int cg1 = 0;
> > -	int type = BPF_CGROUP_INET_EGRESS;
> > -	char cg_dir[100];
> > -	struct hbm_queue_stats qstats = {0};
> > +	int rc = 0;
> >  
> >  	sprintf(cg_dir, "/hbm%d", cg_id);
> > -	map_fd = prog_load(prog);
> > -	if (map_fd  == -1)
> > -		return 1;
> > +	rc = prog_load(prog);
> > +	if (rc != 0)
> > +		return rc;
> >  
> >  	if (setup_cgroup_environment()) {
> >  		printf("ERROR: setting cgroup environment\n");
> > @@ -190,16 +183,25 @@ static int run_bpf_prog(char *prog, int cg_id)
> >  	qstats.stats = stats_flag ? 1 : 0;
> >  	qstats.loopback = loopback_flag ? 1 : 0;
> >  	qstats.no_cn = no_cn_flag ? 1 : 0;
> > -	if (bpf_map_update_elem(map_fd, &key, &qstats, BPF_ANY)) {
> > +	if (bpf_map_update_elem(queue_stats_fd, &key, &qstats, BPF_ANY)) {
> >  		printf("ERROR: Could not update map element\n");
> >  		goto err;
> >  	}
> >  
> >  	if (!outFlag)
> > -		type = BPF_CGROUP_INET_INGRESS;
> > -	if (bpf_prog_attach(bpfprog_fd, cg1, type, 0)) {
> > -		printf("ERROR: bpf_prog_attach fails!\n");
> > -		log_err("Attaching prog");
> > +		bpf_program__set_expected_attach_type(bpf_prog, BPF_CGROUP_INET_INGRESS);
> > +
> > +	link = bpf_program__attach_cgroup(bpf_prog, cg1);
> > +	if (libbpf_get_error(link)) {
> > +		fprintf(stderr, "ERROR: bpf_program__attach_cgroup failed\n");
> > +		link = NULL;
> Again, this is not needed.  bpf_link__destroy() can
> handle both NULL and error pointer.  Please take a look
> at the bpf_link__destroy() in libbpf.c
> 
> > +		goto err;
> > +	}
> > +
> > +	sprintf(cg_pin_path, "/sys/fs/bpf/hbm%d", cg_id);
> > +	rc = bpf_link__pin(link, cg_pin_path);
> > +	if (rc < 0) {
> > +		printf("ERROR: bpf_link__pin failed: %d\n", rc);
> >  		goto err;
> >  	}
> >  
> > @@ -213,7 +215,7 @@ static int run_bpf_prog(char *prog, int cg_id)
> >  #define DELTA_RATE_CHECK 10000		/* in us */
> >  #define RATE_THRESHOLD 9500000000	/* 9.5 Gbps */
> >  
> > -		bpf_map_lookup_elem(map_fd, &key, &qstats);
> > +		bpf_map_lookup_elem(queue_stats_fd, &key, &qstats);
> >  		if (gettimeofday(&t0, NULL) < 0)
> >  			do_error("gettimeofday failed", true);
> >  		t_last = t0;
> > @@ -242,7 +244,7 @@ static int run_bpf_prog(char *prog, int cg_id)
> >  			fclose(fin);
> >  			printf("  new_eth_tx_bytes:%llu\n",
> >  			       new_eth_tx_bytes);
> > -			bpf_map_lookup_elem(map_fd, &key, &qstats);
> > +			bpf_map_lookup_elem(queue_stats_fd, &key, &qstats);
> >  			new_cg_tx_bytes = qstats.bytes_total;
> >  			delta_bytes = new_eth_tx_bytes - last_eth_tx_bytes;
> >  			last_eth_tx_bytes = new_eth_tx_bytes;
> > @@ -289,14 +291,14 @@ static int run_bpf_prog(char *prog, int cg_id)
> >  					rate = minRate;
> >  				qstats.rate = rate;
> >  			}
> > -			if (bpf_map_update_elem(map_fd, &key, &qstats, BPF_ANY))
> > +			if (bpf_map_update_elem(queue_stats_fd, &key, &qstats, BPF_ANY))
> >  				do_error("update map element fails", false);
> >  		}
> >  	} else {
> >  		sleep(dur);
> >  	}
> >  	// Get stats!
> > -	if (stats_flag && bpf_map_lookup_elem(map_fd, &key, &qstats)) {
> > +	if (stats_flag && bpf_map_lookup_elem(queue_stats_fd, &key, &qstats)) {
> >  		char fname[100];
> >  		FILE *fout;
> >  
> > @@ -398,10 +400,10 @@ static int run_bpf_prog(char *prog, int cg_id)
> >  err:
> >  	rc = 1;
> >  
> > -	if (cg1)
> > -		close(cg1);
> > +	bpf_link__destroy(link);
> > +	close(cg1);
> >  	cleanup_cgroup_environment();
> > -
> > +	bpf_object__close(obj);
> The bpf_* cleanup condition still looks wrong.
> 
> I can understand why it does not want to cleanup_cgroup_environment()
> on the success case because the sh script may want to run test under this
> cgroup.
> 
> However, the bpf_link__destroy(), bpf_object__close(), and
> even close(cg1) should be done in both success and error
> cases.
> 
> The cg1 test still looks wrong also.  The cg1 should
> be init to -1 and then test for "if (cg1 == -1)".
oops.  I meant cg1 != -1 .

^ permalink raw reply

* Re: [PATCH net-next] r8169: reduce number of workaround doorbell rings
From: patchwork-bot+netdevbpf @ 2020-11-21  2:40 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: kuba, davem, nic_swsd, netdev
In-Reply-To: <0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Thu, 19 Nov 2020 21:57:27 +0100 you wrote:
> Some chip versions have a hw bug resulting in lost door bell rings.
> To work around this the doorbell is also rung whenever we still have
> tx descriptors in flight after having cleaned up tx descriptors.
> These PCI(e) writes come at a cost, therefore let's reduce the number
> of extra doorbell rings.
> If skb is NULL then this means:
> - last cleaned-up descriptor belongs to a skb with at least one fragment
>   and last fragment isn't marked as sent yet
> - hw is in progress sending the skb, therefore no extra doorbell ring
>   is needed for this skb
> - once last fragment is marked as transmitted hw will trigger
>   a tx done interrupt and we come here again (with skb != NULL)
>   and ring the doorbell if needed
> Therefore skip the workaround doorbell ring if skb is NULL.
> 
> [...]

Here is the summary with links:
  - [net-next] r8169: reduce number of workaround doorbell rings
    https://git.kernel.org/netdev/net-next/c/94d8a98e6235

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] r8169: use dev_err_probe in rtl_get_ether_clk
From: Jakub Kicinski @ 2020-11-21  2:37 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David Miller, Realtek linux nic maintainers,
	netdev@vger.kernel.org
In-Reply-To: <b0c4ebcf-2047-e933-b890-8a20e4bdb19f@gmail.com>

On Thu, 19 Nov 2020 22:00:11 +0100 Heiner Kallweit wrote:
> Tiny improvement, let dev_err_probe() deal with EPROBE_DEFER.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/7] samples: bpf: refactor hbm program with libbpf
From: Martin KaFai Lau @ 2020-11-21  2:34 UTC (permalink / raw)
  To: Daniel T. Lee
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, brakmo,
	Jesper Dangaard Brouer, Andrii Nakryiko, Lorenzo Bianconi,
	David Ahern, Yonghong Song, Toke Høiland-Jørgensen,
	Ira Weiny, Thomas Graf, Jakub Kicinski, John Fastabend, bpf,
	netdev, Xdp
In-Reply-To: <20201119150617.92010-2-danieltimlee@gmail.com>

On Thu, Nov 19, 2020 at 03:06:11PM +0000, Daniel T. Lee wrote:
[ ... ]

>  static int run_bpf_prog(char *prog, int cg_id)
>  {
> -	int map_fd;
> -	int rc = 0;
> +	struct hbm_queue_stats qstats = {0};
> +	char cg_dir[100], cg_pin_path[100];
> +	struct bpf_link *link = NULL;
>  	int key = 0;
>  	int cg1 = 0;
> -	int type = BPF_CGROUP_INET_EGRESS;
> -	char cg_dir[100];
> -	struct hbm_queue_stats qstats = {0};
> +	int rc = 0;
>  
>  	sprintf(cg_dir, "/hbm%d", cg_id);
> -	map_fd = prog_load(prog);
> -	if (map_fd  == -1)
> -		return 1;
> +	rc = prog_load(prog);
> +	if (rc != 0)
> +		return rc;
>  
>  	if (setup_cgroup_environment()) {
>  		printf("ERROR: setting cgroup environment\n");
> @@ -190,16 +183,25 @@ static int run_bpf_prog(char *prog, int cg_id)
>  	qstats.stats = stats_flag ? 1 : 0;
>  	qstats.loopback = loopback_flag ? 1 : 0;
>  	qstats.no_cn = no_cn_flag ? 1 : 0;
> -	if (bpf_map_update_elem(map_fd, &key, &qstats, BPF_ANY)) {
> +	if (bpf_map_update_elem(queue_stats_fd, &key, &qstats, BPF_ANY)) {
>  		printf("ERROR: Could not update map element\n");
>  		goto err;
>  	}
>  
>  	if (!outFlag)
> -		type = BPF_CGROUP_INET_INGRESS;
> -	if (bpf_prog_attach(bpfprog_fd, cg1, type, 0)) {
> -		printf("ERROR: bpf_prog_attach fails!\n");
> -		log_err("Attaching prog");
> +		bpf_program__set_expected_attach_type(bpf_prog, BPF_CGROUP_INET_INGRESS);
> +
> +	link = bpf_program__attach_cgroup(bpf_prog, cg1);
> +	if (libbpf_get_error(link)) {
> +		fprintf(stderr, "ERROR: bpf_program__attach_cgroup failed\n");
> +		link = NULL;
Again, this is not needed.  bpf_link__destroy() can
handle both NULL and error pointer.  Please take a look
at the bpf_link__destroy() in libbpf.c

> +		goto err;
> +	}
> +
> +	sprintf(cg_pin_path, "/sys/fs/bpf/hbm%d", cg_id);
> +	rc = bpf_link__pin(link, cg_pin_path);
> +	if (rc < 0) {
> +		printf("ERROR: bpf_link__pin failed: %d\n", rc);
>  		goto err;
>  	}
>  
> @@ -213,7 +215,7 @@ static int run_bpf_prog(char *prog, int cg_id)
>  #define DELTA_RATE_CHECK 10000		/* in us */
>  #define RATE_THRESHOLD 9500000000	/* 9.5 Gbps */
>  
> -		bpf_map_lookup_elem(map_fd, &key, &qstats);
> +		bpf_map_lookup_elem(queue_stats_fd, &key, &qstats);
>  		if (gettimeofday(&t0, NULL) < 0)
>  			do_error("gettimeofday failed", true);
>  		t_last = t0;
> @@ -242,7 +244,7 @@ static int run_bpf_prog(char *prog, int cg_id)
>  			fclose(fin);
>  			printf("  new_eth_tx_bytes:%llu\n",
>  			       new_eth_tx_bytes);
> -			bpf_map_lookup_elem(map_fd, &key, &qstats);
> +			bpf_map_lookup_elem(queue_stats_fd, &key, &qstats);
>  			new_cg_tx_bytes = qstats.bytes_total;
>  			delta_bytes = new_eth_tx_bytes - last_eth_tx_bytes;
>  			last_eth_tx_bytes = new_eth_tx_bytes;
> @@ -289,14 +291,14 @@ static int run_bpf_prog(char *prog, int cg_id)
>  					rate = minRate;
>  				qstats.rate = rate;
>  			}
> -			if (bpf_map_update_elem(map_fd, &key, &qstats, BPF_ANY))
> +			if (bpf_map_update_elem(queue_stats_fd, &key, &qstats, BPF_ANY))
>  				do_error("update map element fails", false);
>  		}
>  	} else {
>  		sleep(dur);
>  	}
>  	// Get stats!
> -	if (stats_flag && bpf_map_lookup_elem(map_fd, &key, &qstats)) {
> +	if (stats_flag && bpf_map_lookup_elem(queue_stats_fd, &key, &qstats)) {
>  		char fname[100];
>  		FILE *fout;
>  
> @@ -398,10 +400,10 @@ static int run_bpf_prog(char *prog, int cg_id)
>  err:
>  	rc = 1;
>  
> -	if (cg1)
> -		close(cg1);
> +	bpf_link__destroy(link);
> +	close(cg1);
>  	cleanup_cgroup_environment();
> -
> +	bpf_object__close(obj);
The bpf_* cleanup condition still looks wrong.

I can understand why it does not want to cleanup_cgroup_environment()
on the success case because the sh script may want to run test under this
cgroup.

However, the bpf_link__destroy(), bpf_object__close(), and
even close(cg1) should be done in both success and error
cases.

The cg1 test still looks wrong also.  The cg1 should
be init to -1 and then test for "if (cg1 == -1)".

^ permalink raw reply

* Re: [net] net/packet: fix incoming receive for L3 devices without visible hard header
From: Willem de Bruijn @ 2020-11-21  2:23 UTC (permalink / raw)
  To: Eyal Birger
  Cc: David Miller, Jakub Kicinski, Jason A. Donenfeld,
	Network Development, Xie He
In-Reply-To: <20201120030412.1646940-1-eyal.birger@gmail.com>

On Thu, Nov 19, 2020 at 10:05 PM Eyal Birger <eyal.birger@gmail.com> wrote:
>
> In the patchset merged by commit b9fcf0a0d826
> ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
> did not have header_ops were given one for the purpose of protocol parsing
> on af_packet transmit path.
>
> That change made af_packet receive path regard these devices as having a
> visible L3 header and therefore aligned incoming skb->data to point to the
> skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
> reset their mac_header prior to ingress and therefore their incoming
> packets became malformed.
>
> Ideally these devices would reset their mac headers, or af_packet would be
> able to rely on dev->hard_header_len being 0 for such cases, but it seems
> this is not the case.
>
> Fix by changing af_packet RX ll visibility criteria to include the
> existence of a '.create()' header operation, which is used when creating
> a device hard header - via dev_hard_header() - by upper layers, and does
> not exist in these L3 devices.
>
> Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>

Thanks for the fix. This makes sense to me.

Recent discussions on this point also agreed that whether or not
headers are exposed to code above the device driver depends
on dev->hard_header_len and dev->header_ops->create (and
the two have to be consistent with one another).

dev->header_ops->parse_protocol is a best effort approach to
infer a protocol in cases where the caller did not specify it. But
as best effort, its existence or absence does not define the
device header, so testing only header_ops != NULL is insufficient.

> ---
>  net/packet/af_packet.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index cefbd50c1090..a241059fd536 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -93,8 +93,8 @@
>
>  /*
>     Assumptions:
> -   - If the device has no dev->header_ops, there is no LL header visible
> -     above the device. In this case, its hard_header_len should be 0.
> +   - If the device has no dev->header_ops->create, there is no LL header
> +     visible above the device. In this case, its hard_header_len should be 0.
>       The device may prepend its own header internally. In this case, its
>       needed_headroom should be set to the space needed for it to add its
>       internal header.
> @@ -108,21 +108,21 @@
>  On receive:
>  -----------
>
> -Incoming, dev->header_ops != NULL
> +Incoming, dev->header_ops != NULL && dev->header_ops->create != NULL
>     mac_header -> ll header
>     data       -> data
>
> -Outgoing, dev->header_ops != NULL
> +Outgoing, dev->header_ops != NULL && dev->header_ops->create != NULL
>     mac_header -> ll header
>     data       -> ll header
>
> -Incoming, dev->header_ops == NULL
> +Incoming, dev->header_ops == NULL || dev->header_ops->create == NULL
>     mac_header -> data
>       However drivers often make it point to the ll header.
>       This is incorrect because the ll header should be invisible to us.
>     data       -> data
>
> -Outgoing, dev->header_ops == NULL
> +Outgoing, dev->header_ops == NULL || dev->header_ops->create == NULL
>     mac_header -> data. ll header is invisible to us.
>     data       -> data
>
> @@ -272,6 +272,18 @@ static bool packet_use_direct_xmit(const struct packet_sock *po)
>         return po->xmit == packet_direct_xmit;
>  }
>
> +static bool packet_ll_header_rcv_visible(const struct net_device *dev)
> +{
> +       /* The device has an explicit notion of ll header,
> +        * exported to higher levels
> +        *
> +        * Otherwise, the device hides details of its frame
> +        * structure, so that corresponding packet head is
> +        * never delivered to user.
> +        */
> +       return dev->header_ops && dev->header_ops->create;
> +}
> +

Perhaps a dev_has_header(..) in include/linux/netdevice.h

And then the same in the Incoming/Outgoing comments above.

^ permalink raw reply

* Re: [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets
From: Jakub Kicinski @ 2020-11-21  2:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bpf, daniel, kafai, kernel-team, edumazet, brakmo,
	alexanderduyck, weiwan
In-Reply-To: <160582070138.66684.11785214534154816097.stgit@localhost.localdomain>

On Thu, 19 Nov 2020 13:23:43 -0800 Alexander Duyck wrote:
> This patch set is meant to address issues seen with SYN/ACK packets not
> containing the ECT0 bit when DCTCP is configured as the congestion control
> algorithm for a TCP socket.
> 
> A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
> issue obvious. Looking at the packets will result in the SYN/ACK packet
> with an ECT0 bit that does not match the other packets for the flow when
> the congestion control agorithm is switch from the default. So for example
> going from non-DCTCP to a DCTCP congestion control algorithm we will see
> the SYN/ACK IPV6 header will not have ECT0 set while the other packets in
> the flow will. Likewise if we switch from a default of DCTCP to cubic we
> will see the ECT0 bit set in the SYN/ACK while the other packets in the
> flow will not.

Applied, thanks!

^ permalink raw reply

* Re: [RFC V2 net-next 1/2] ethtool: add support for controling the type of adaptive coalescing
From: tanhuazhong @ 2020-11-21  2:00 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: davem, netdev, linux-kernel, linuxarm, kuba, mkubecek
In-Reply-To: <20201120152548.GN1853236@lunn.ch>



On 2020/11/20 23:25, Andrew Lunn wrote:
>> @@ -310,6 +334,13 @@ int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info)
>>   	ret = dev->ethtool_ops->set_coalesce(dev, &coalesce);
>>   	if (ret < 0)
>>   		goto out_ops;
>> +
>> +	if (ops->set_ext_coalesce) {
>> +		ret = ops->set_ext_coalesce(dev, &ext_coalesce);
>> +		if (ret < 0)
>> +			goto out_ops;
>> +	}
>> +
> 
> The problem here is, if ops->set_ext_coalesce() fails, you need to
> undo what dev->ethtool_ops->set_coalesce() did. From the users
> perspective, this should be atomic. It does everything, or it does
> nothing and returns an error code.
> 
> And that is not easy given this structure of two op calls.
> 
>      Andrew
> 

yes, i will try what Michal suggested in V1.

Regards.
Huazhong.

> .
> 


^ permalink raw reply

* Re: [RFC net-next 1/2] ethtool: add support for controling the type of adaptive coalescing
From: tanhuazhong @ 2020-11-21  1:56 UTC (permalink / raw)
  To: Michal Kubecek, Andrew Lunn; +Cc: davem, netdev, linux-kernel, linuxarm, kuba
In-Reply-To: <20201120212243.n7vnwo3ldzisr4hl@lion.mk-sys.cz>



On 2020/11/21 5:22, Michal Kubecek wrote:
> On Fri, Nov 20, 2020 at 02:39:38PM +0100, Andrew Lunn wrote:
>> On Fri, Nov 20, 2020 at 08:23:22AM +0100, Michal Kubecek wrote:
>>> On Fri, Nov 20, 2020 at 10:59:59AM +0800, tanhuazhong wrote:
>>>> On 2020/11/20 6:02, Michal Kubecek wrote:
>>>>> We could use a similar approach as struct ethtool_link_ksettings, e.g.
>>>>>
>>>>> 	struct kernel_ethtool_coalesce {
>>>>> 		struct ethtool_coalesce base;
>>>>> 		/* new members which are not part of UAPI */
>>>>> 	}
>>>>>
>>>>> get_coalesce() and set_coalesce() would get pointer to struct
>>>>> kernel_ethtool_coalesce and ioctl code would be modified to only touch
>>>>> the base (legacy?) part.
>>>>>   > While already changing the ops arguments, we could also add extack
>>>>> pointer, either as a separate argument or as struct member (I slightly
>>>>> prefer the former).
>>>> If changing the ops arguments, each driver who implement
>>>> set_coalesce/get_coalesce of ethtool_ops need to be updated. Is it
>>>> acceptable adding two new ops to get/set ext_coalesce info (like
>>>> ecc31c60240b ("ethtool: Add link extended state") does)? Maybe i can send V2
>>>> in this way, and then could you help to see which one is more suitable?
>>> If it were just this one case, adding an extra op would be perfectly
>>> fine. But from long term point of view, we should expect extending also
>>> other existing ethtool requests and going this way for all of them would
>>> essentially double the number of callbacks in struct ethtool_ops.
>> coccinella might be useful here.
> I played with spatch a bit and it with the spatch and patch below, I got
> only three build failures (with allmodconfig) that would need to be
> addressed manually - these drivers call the set_coalesce() callback on
> device initialization.
> 
> I also tried to make the structure argument const in ->set_coalesce()
> but that was more tricky as adjusting other functions that the structure
> is passed to required either running the spatch three times or repeating
> the same two rules three times in the spatch (or perhaps there is
> a cleaner way but I'm missing relevant knowledge of coccinelle). Then
> there was one more problem in i40e driver which modifies the structure
> before passing it on to its helpers. It could be worked around but I'm
> not sure if constifying the argument is worth these extra complications.
> 
> Michal

will implement it like this in V3.

Regards,
Huazhong.


^ permalink raw reply

* Re: [net-next v3 05/15] ice: create flow profile
From: Alexander Duyck @ 2020-11-21  1:49 UTC (permalink / raw)
  To: Nguyen, Anthony L
  Cc: Cao, Chinh T, davem@davemloft.net, kuba@kernel.org,
	Behera, BrijeshX, Valiquette, Real, sassmann@redhat.com,
	netdev@vger.kernel.org
In-Reply-To: <fd0fc6f95f4c107d1aed18bf58239fda91879b26.camel@intel.com>

On Fri, Nov 20, 2020 at 4:42 PM Nguyen, Anthony L
<anthony.l.nguyen@intel.com> wrote:
>
> On Fri, 2020-11-13 at 15:56 -0800, Alexander Duyck wrote:
> > On Fri, Nov 13, 2020 at 1:46 PM Tony Nguyen <
> > anthony.l.nguyen@intel.com> wrote:
> > >
> > > From: Real Valiquette <real.valiquette@intel.com>
> > >
> > > Implement the initial steps for creating an ACL filter to support
> > > ntuple
> > > masks. Create a flow profile based on a given mask rule and program
> > > it to
> > > the hardware. Though the profile is written to hardware, no actions
> > > are
> > > associated with the profile yet.
> > >
> > > Co-developed-by: Chinh Cao <chinh.t.cao@intel.com>
> > > Signed-off-by: Chinh Cao <chinh.t.cao@intel.com>
> > > Signed-off-by: Real Valiquette <real.valiquette@intel.com>
> > > Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
> >
> > So I see two big issues with the patch.
> >
> > First it looks like there is an anti-pattern of defensive NULL
> > pointer
> > checks throughout. Those can probably all go since all of the callers
> > either use the pointer, or verify it is non-NULL before calling the
> > function in question.
>
> I'm removing those checks that you pointed out and some others as well.
>
> >
> > In addition the mask handling doens't look right to me. It is calling
> > out a partial mask as being the only time you need an ACL and I would
> > think it is any time you don't have a full mask for all
> > ports/addresses since a flow director rule normally pulls in the full
> > 4 tuple based on ice_ntuple_set_input_set() .
>
> Commented below as well.
>
> <snip>
>
> > > +/**
> > > + * ice_is_acl_filter - Checks if it's a FD or ACL filter
> > > + * @fsp: pointer to ethtool Rx flow specification
> > > + *
> > > + * If any field of the provided filter is using a partial mask
> > > then this is
> > > + * an ACL filter.
> > > + *
> >
> > I'm not sure this logic is correct. Can the flow director rules
> > handle
> > a field that is removed? Last I knew it couldn't. If that is the case
> > you should be using ACL for any case in which a full mask is not
> > provided. So in your tests below you could probably drop the check
> > for
> > zero as I don't think that is a valid case in which flow director
> > would work.
> >
>
> I'm not sure what you meant by a field that is removed, but Flow
> Director can handle reduced input sets. Flow Director is able to handle
> 0 mask, full mask, and less than 4 tuples. ACL is needed/used only when
> a partial mask rule is requested.

So historically speaking with flow director you are only allowed one
mask because it determines the inputs used to generate the hash that
identifies the flow. So you are only allowed one mask for all flows
because changing those inputs would break the hash mapping.

Normally this ends up meaning that you have to do like what we did in
ixgbe and disable ATR and only allow one mask for all inputs. I
believe for i40e they required that you always use a full 4 tuple. I
didn't see something like that here. As such you may want to double
check that you can have a mix of flow director rules that are using 1
tuple, 2 tuples, 3 tuples, and 4 tuples as last I knew you couldn't.
Basically if you had fields included they had to be included for all
the rules on the port or device depending on how the tables are set
up.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH net-next v3 00/12] net: dsa: microchip: PTP support for KSZ956x
From: Vladimir Oltean @ 2020-11-21  1:26 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: ceggers, kuba, andrew, richardcochran, robh+dt, vivien.didelot,
	davem, kurt.kanzenbach, george.mccollister, marex, helmut.grohne,
	pbarker, Codrin.Ciubotariu, Woojung.Huh, UNGLinuxDriver, netdev,
	devicetree, linux-kernel
In-Reply-To: <BYAPR11MB35582F880B533EB2EE0CDD1DECE00@BYAPR11MB3558.namprd11.prod.outlook.com>

On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@microchip.com wrote:
> The initial proposal in tag_ksz.c is for the switch driver to provide callback functions
> to handle receiving and transmitting.  Then each switch driver can be added to
> process the tail tag in its own driver and leave tag_ksz.c unchanged.
>
> It was rejected because of wanting to keep tag_ksz.c code and switch driver code
> separate and concern about performance.
>
> Now tag_ksz.c is filled with PTP code that is not relevant for other switches and will
> need to be changed again when another switch driver with PTP function is added.
>
> Can we implement that callback mechanism?

I, too, lack the context here. But it sounds like feedback that Andrew
would give.

If you don't like the #ifdef's, I am not in love with them either. But
maybe Christian is just optimizing too aggressively, and doesn't actually
need to put those #ifdef's there and provide stub implementations, but
could actually just leave the ksz9477_rcv_timestamp and ksz9477_xmit_timestamp
always compiled-in, and "dead at runtime" in the case there is no PTP.

If there is something else you don't like, what is it? If you know that
other KSZ switches don't implement timestamping in the same way, well,
we don't know that. I thought that it's generally up to the second
implementer to recognize which parts of the code are common and should
be reused, not for the first one to guess. I would not add function
pointers for a single implementation if they don't have a clear
justification.

> One issue with transmission with PTP enabled is that the tail tag needs to contain 4
> additional bytes.  When the PTP function is off the bytes are not added.  This should
> be monitored all the time.
>
> The extra 4 bytes are only used for 1-step Pdelay_Resp.  It should contain the receive
> timestamp of previous Pdelay_Req with latency adjusted.  The correction field in
> Pdelay_Resp should be zero.  It may be a hardware bug to have wrong UDP checksum
> when the message is sent.

It "may" be a hardware bug? Are you unsure or polite?
As for the phrase "the correction field in Pdelay_Resp should be zero".
Consider the case where there is an E2E TC switch attached to that port.
It will update the correctionField of the Pdelay_Req message. Then the
application stack running on this ksz9477 switch is forced by the
standard to copy the correctionField as-is from the Pdelay_Req into the
Pdelay_Resp message. So that correctionField is never guaranteed to be
zero, even if Christian doesn't fiddle with it within the driver. Are
you saying that for proper UDP checksum calculation, the driver should
be forcing the correctionField to zero and moving that value into the
tail tag?

> I think the right implementation is for the driver to remember this receive timestamp
> of Pdelay_Req and puts it in the tail tag when it sees a 1-step Pdelay_Resp is sent.

I have mixed feelings about this. IIUC, you're saying "let's implement a
fixed-size FIFO of RX timestamps of Pdelay_Req messages, and let's match
them on TX to Pdelay_Resp messages, by {sequenceId, domainNumber}."

But how deep should we make that FIFO? I.e. how many Pdelay_Req messages
should we expect before the user space will inject back a Pdelay_Resp
for transmission?

Again, consider the case of an E2E TC attached to a ksz9477 port. Even
if we run peer delay, it's not guaranteed that we only have one peer.
That E2E TC might connect us to a plethora of other peers. And the more
peers we are connected to, the higher the chance that the size of this
Pdelay_Req RX timestamp FIFO will not be adequately chosen.

> There is one more requirement that is a little difficult to do.  The calculated peer delay
> needs to be programmed in hardware register, but the regular PTP stack has no way to
> send that command.  I think the driver has to do its own calculation by snooping on the
> Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.

What register, and what does the switch do with this peer delay information?

> The receive and transmit latencies are different for different connected speed.  So the
> driver needs to change them when the link changes.  For that reason the PTP stack
> should not use its own latency values as generally the application does not care about
> the linked speed.

The thing is, ptp4l already has ingressLatency and egressLatency
settings, and I would not be surprised if those config options would get
extended to cover values at multiple link speeds.

In the general case, the ksz9477 MAC could be attached to any external
PHY, having its own propagation delay characteristics, or any number of
other things that cause clock domain crossings. I'm not sure how feasible
it is for the kernel to abstract this away completely, and adjust
timestamps automatically based on any and all combinations of MAC and
PHY. Maybe this is just wishful thinking.

Oh, and by the way, Christian, I'm not even sure if you aren't in fact
just beating around the bush with these tstamp_rx_latency_ns and
tstamp_tx_latency_ns values? I mean, the switch adds the latency value
to the timestamps. And you, from the driver, read the value of the
register, so you can subtract the value from the timestamp, to
compensate for its correction. So, all in all, there is no net latency
compensation seen by the outside world?! If that is the case, can't you
just set the latency registers to zero, do your compensation from the
application stack and call it a day?

^ permalink raw reply

* Re: [net-next v3 08/15] ice: don't always return an error for Get PHY Abilities AQ command
From: Nguyen, Anthony L @ 2020-11-21  0:43 UTC (permalink / raw)
  To: alexander.duyck@gmail.com
  Cc: davem@davemloft.net, kuba@kernel.org, Brown, Aaron F,
	netdev@vger.kernel.org, Stillwell Jr, Paul M, sassmann@redhat.com
In-Reply-To: <CAKgT0Uedphdr1RvdB_Zw5aAH-2CuudwGK8OenrvrQ1bE0XK-pQ@mail.gmail.com>

On Fri, 2020-11-13 at 17:25 -0800, Alexander Duyck wrote:
> On Fri, Nov 13, 2020 at 1:49 PM Tony Nguyen <
> anthony.l.nguyen@intel.com> wrote:
> > 
> > From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > 
> > There are times when the driver shouldn't return an error when the
> > Get
> > PHY abilities AQ command (0x0600) returns an error. Instead the
> > driver
> > should log that the error occurred and continue on. This allows the
> > driver to load even though the AQ command failed. The user can then
> > later determine the reason for the failure and correct it.
> > 
> > Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_common.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_common.c
> > b/drivers/net/ethernet/intel/ice/ice_common.c
> > index 7db5fd977367..3c600808d0da 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_common.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> > @@ -925,7 +925,7 @@ enum ice_status ice_init_hw(struct ice_hw *hw)
> >                                      ICE_AQC_REPORT_TOPO_CAP,
> > pcaps, NULL);
> >         devm_kfree(ice_hw_to_dev(hw), pcaps);
> >         if (status)
> > -               goto err_unroll_sched;
> > +               ice_debug(hw, ICE_DBG_PHY, "Get PHY capabilities
> > failed, continuing anyway\n");
> > 
> >         /* Initialize port_info struct with link information */
> >         status = ice_aq_get_link_info(hw->port_info, false, NULL,
> > NULL);
> > --
> > 2.26.2
> > 
> 
> If we are expecting the user to correct things then we should be
> putting out a warning via dev_warn() rather than a debug message.
> Otherwise the user is just going to have to come back through and
> turn
> on debugging and reload the driver in order to figure out what is
> going on. In my mind it should get the same treatment as an outdated
> NVM image in ice_aq_ver_check().

Will change this to a dev_warn().

Thanks,
Tony

^ permalink raw reply

* Re: [net-next v3 05/15] ice: create flow profile
From: Nguyen, Anthony L @ 2020-11-21  0:42 UTC (permalink / raw)
  To: alexander.duyck@gmail.com
  Cc: Cao, Chinh T, davem@davemloft.net, kuba@kernel.org,
	Behera, BrijeshX, Valiquette, Real, sassmann@redhat.com,
	netdev@vger.kernel.org
In-Reply-To: <CAKgT0UeQ5q2M-uiR0-1G=30syPiO8S5OFHvDuN1XtQg5700hCg@mail.gmail.com>

On Fri, 2020-11-13 at 15:56 -0800, Alexander Duyck wrote:
> On Fri, Nov 13, 2020 at 1:46 PM Tony Nguyen <
> anthony.l.nguyen@intel.com> wrote:
> > 
> > From: Real Valiquette <real.valiquette@intel.com>
> > 
> > Implement the initial steps for creating an ACL filter to support
> > ntuple
> > masks. Create a flow profile based on a given mask rule and program
> > it to
> > the hardware. Though the profile is written to hardware, no actions
> > are
> > associated with the profile yet.
> > 
> > Co-developed-by: Chinh Cao <chinh.t.cao@intel.com>
> > Signed-off-by: Chinh Cao <chinh.t.cao@intel.com>
> > Signed-off-by: Real Valiquette <real.valiquette@intel.com>
> > Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
> 
> So I see two big issues with the patch.
> 
> First it looks like there is an anti-pattern of defensive NULL
> pointer
> checks throughout. Those can probably all go since all of the callers
> either use the pointer, or verify it is non-NULL before calling the
> function in question.

I'm removing those checks that you pointed out and some others as well.

> 
> In addition the mask handling doens't look right to me. It is calling
> out a partial mask as being the only time you need an ACL and I would
> think it is any time you don't have a full mask for all
> ports/addresses since a flow director rule normally pulls in the full
> 4 tuple based on ice_ntuple_set_input_set() .

Commented below as well.

<snip>

> > +/**
> > + * ice_is_acl_filter - Checks if it's a FD or ACL filter
> > + * @fsp: pointer to ethtool Rx flow specification
> > + *
> > + * If any field of the provided filter is using a partial mask
> > then this is
> > + * an ACL filter.
> > + *
> 
> I'm not sure this logic is correct. Can the flow director rules
> handle
> a field that is removed? Last I knew it couldn't. If that is the case
> you should be using ACL for any case in which a full mask is not
> provided. So in your tests below you could probably drop the check
> for
> zero as I don't think that is a valid case in which flow director
> would work.
> 

I'm not sure what you meant by a field that is removed, but Flow
Director can handle reduced input sets. Flow Director is able to handle
0 mask, full mask, and less than 4 tuples. ACL is needed/used only when
a partial mask rule is requested.


> > + * Returns true if ACL filter otherwise false.
> > + */
> > +static bool ice_is_acl_filter(struct ethtool_rx_flow_spec *fsp)
> > +{
> > +       struct ethtool_tcpip4_spec *tcp_ip4_spec;
> > +       struct ethtool_usrip4_spec *usr_ip4_spec;
> > +
> > +       switch (fsp->flow_type & ~FLOW_EXT) {
> > +       case TCP_V4_FLOW:
> > +       case UDP_V4_FLOW:
> > +       case SCTP_V4_FLOW:
> > +               tcp_ip4_spec = &fsp->m_u.tcp_ip4_spec;
> > +
> > +               /* IP source address */
> > +               if (tcp_ip4_spec->ip4src &&
> > +                   tcp_ip4_spec->ip4src != htonl(0xFFFFFFFF))
> > +                       return true;
> > +
> > +               /* IP destination address */
> > +               if (tcp_ip4_spec->ip4dst &&
> > +                   tcp_ip4_spec->ip4dst != htonl(0xFFFFFFFF))
> > +                       return true;
> > +
> 
> Instead of testing this up here you could just skip the break and
> fall
> through since the source and destination IP addresses occupy the same
> spots on usr_ip4_spec and tcp_ip4_spec. You could probably also just
> use tcp_ip4_spec for the entire test.

Will make this change.

Thanks,
Tony

^ permalink raw reply

* Re: [net-next v2 03/15] ice: initialize ACL table
From: Nguyen, Anthony L @ 2020-11-21  0:39 UTC (permalink / raw)
  To: alexander.duyck@gmail.com
  Cc: Cao, Chinh T, kuba@kernel.org, netdev@vger.kernel.org,
	Valiquette, Real, davem@davemloft.neti, Bokkena, HarikumarX,
	sassmann@redhat.com
In-Reply-To: <CAKgT0UcEd4BmyMxBmy2D2vVCWKu3Q=0iYKZ2UTdAPg0gitSiCQ@mail.gmail.com>

On Fri, 2020-11-13 at 14:59 -0800, Alexander Duyck wrote:
> On Fri, Nov 13, 2020 at 1:36 PM Tony Nguyen <
> anthony.l.nguyen@intel.com> wrote:
> > 
> > From: Real Valiquette <real.valiquette@intel.com>
> > 
> > ACL filtering can be utilized to expand support of ntuple rules by
> > allowing
> > mask values to be specified for redirect to queue or drop.
> > 
> > Implement support for specifying the 'm' value of ethtool ntuple
> > command
> > for currently supported fields (src-ip, dst-ip, src-port, and dst-
> > port).
> > 
> > For example:
> > 
> > ethtool -N eth0 flow-type tcp4 dst-port 8880 m 0x00ff action 10
> > or
> > ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.55 m 0.0.0.255
> > action -1
> > 
> > At this time the following flow-types support mask values: tcp4,
> > udp4,
> > sctp4, and ip4.
> 
> So you spend all of the patch description describing how this might
> be
> used in the future. However there is nothing specific to the ethtool
> interface as far as I can tell anywhere in this patch. With this
> patch
> the actual command called out above cannot be performed, correct?
> 
> > Begin implementation of ACL filters by setting up structures,
> > AdminQ
> > commands, and allocation of the ACL table in the hardware.
> 
> This seems to be what this patch is actually doing. You may want to
> rewrite this patch description to focus on this and explain that you
> are enabling future support for ethtool ntuple masks. However save
> this feature description for the patch that actually enables the
> functionality.

I will do this. Thanks.

> > Co-developed-by: Chinh Cao <chinh.t.cao@intel.com>
> > Signed-off-by: Chinh Cao <chinh.t.cao@intel.com>
> > Signed-off-by: Real Valiquette <real.valiquette@intel.com>
> > Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Tested-by: Harikumar Bokkena  <harikumarx.bokkena@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/Makefile       |   2 +
> >  drivers/net/ethernet/intel/ice/ice.h          |   4 +
> >  drivers/net/ethernet/intel/ice/ice_acl.c      | 153 +++++++++
> >  drivers/net/ethernet/intel/ice/ice_acl.h      | 125 +++++++
> >  drivers/net/ethernet/intel/ice/ice_acl_ctrl.c | 311
> > ++++++++++++++++++
> >  .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 215 +++++++++++-
> >  drivers/net/ethernet/intel/ice/ice_flow.h     |   2 +
> >  drivers/net/ethernet/intel/ice/ice_main.c     |  50 +++
> >  drivers/net/ethernet/intel/ice/ice_type.h     |   3 +
> >  9 files changed, 863 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/net/ethernet/intel/ice/ice_acl.c
> >  create mode 100644 drivers/net/ethernet/intel/ice/ice_acl.h
> >  create mode 100644 drivers/net/ethernet/intel/ice/ice_acl_ctrl.c
> > 
> 
> <snip>
> 
> > +/**
> > + * ice_acl_destroy_tbl - Destroy a previously created LEM table
> > for ACL
> > + * @hw: pointer to the HW struct
> > + */
> > +enum ice_status ice_acl_destroy_tbl(struct ice_hw *hw)
> > +{
> > +       struct ice_aqc_acl_generic resp_buf;
> > +       enum ice_status status;
> > +
> > +       if (!hw->acl_tbl)
> > +               return ICE_ERR_DOES_NOT_EXIST;
> > +
> > +       /* call the AQ command to destroy the ACL table */
> > +       status = ice_aq_dealloc_acl_tbl(hw, hw->acl_tbl->id,
> > &resp_buf, NULL);
> > +       if (status) {
> > +               ice_debug(hw, ICE_DBG_ACL, "AQ de-allocation of ACL
> > failed. status: %d\n",
> > +                         status);
> > +               return status;
> > +       }
> > +
> > +       devm_kfree(ice_hw_to_dev(hw), hw->acl_tbl);
> > +       hw->acl_tbl = NULL;
> 
> What are the scenarios where you might see the dealloc_acl_tbl fail?
> I'm just wondering if it makes sense to keep the table just because
> the hardware is refusing to give it up.

The datasheet isn't clear on what would cause a failed allocation, but
we're trying to keep the SW structures in sync with HW.

<snip>

> > +/* This response structure is same in case of alloc/dealloc table,
> > + * alloc/dealloc action-pair
> > + */
> > +struct ice_aqc_acl_generic {
> > +       /* if alloc_id is below 0x1000 then allocation failed due
> > to
> > +        * unavailable resources, else this is set by FW to
> > identify
> > +        * table allocation
> > +        */
> > +       __le16 alloc_id;
> > +
> > +       union {
> > +               /* to be used only in case of alloc/dealloc table
> > */
> > +               struct {
> > +                       /* Index of the first TCAM block, otherwise
> > set to 0xFF
> > +                        * for a failed allocation
> > +                        */
> > +                       u8 first_tcam;
> > +                       /* Index of the last TCAM block. This index
> > shall be
> > +                        * set to the value of first_tcam for
> > single TCAM block
> > +                        * allocation, otherwise set to 0xFF for a
> > failed
> > +                        * allocation
> > +                        */
> > +                       u8 last_tcam;
> > +               } table;
> > +               /* reserved in case of alloc/dealloc action-pair */
> > +               struct {
> > +                       __le16 reserved;
> > +               } act_pair;
> 
> Is there really any need to call out the reserved value? It seems
> like
> you could just leave the struct table in place and not bother with
> the
> union since you would likely just be memsetting the entire ops struct
> to 0 anyway.
> 


Since this is used for table and action-pair calls, the reason we have
the union and reserved value is to make it explicit that for action-
pair calls, nothing is to be written and to clearly differentiate the
fields that should be set for table alloc/dealloc and those for
act_pair alloc/dealloc.

> > +       } ops;
> > +
> > +       /* index of first entry (in both TCAM and action memories),
> > +        * otherwise set to 0xFF for a failed allocation
> > +        */
> > +       __le16 first_entry;
> > +       /* index of last entry (in both TCAM and action memories),
> > +        * otherwise set to 0xFF for a failed allocation
> > +        */
> > +       __le16 last_entry;
> > +
> > +       /* Each act_mem element specifies the order of the memory
> > +        * otherwise 0xFF
> > +        */
> > +       u8 act_mem[ICE_AQC_MAX_ACTION_MEMORIES];
> > +};
> > +
> 
> <snip>
> 
> > +/**
> > + * ice_deinit_acl - Unroll the initialization of the ACL block
> > + * @pf: ptr to PF device
> > + */
> > +static void ice_deinit_acl(struct ice_pf *pf)
> > +{
> > +       ice_acl_destroy_tbl(&pf->hw);
> 
> Why have the ice_acl_destroy_tbl function return a value if it is
> just
> going to be ignored?
> 
> > +}
> > +
> >  /**
> >   * ice_init_fdir - Initialize flow director VSI and configuration
> >   * @pf: pointer to the PF instance
> > @@ -4231,6 +4273,12 @@ ice_probe(struct pci_dev *pdev, const struct
> > pci_device_id __always_unused *ent)
> >         /* Note: Flow director init failure is non-fatal to load */
> >         if (ice_init_fdir(pf))
> >                 dev_err(dev, "could not initialize flow
> > director\n");
> > +       if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
> > +               /* Note: ACL init failure is non-fatal to load */
> > +               err = ice_init_acl(pf);
> > +               if (err)
> > +                       dev_err(dev, "Failed to initialize ACL:
> > %d\n", err);
> > +       }
> > 
> >         /* Note: DCB init failure is non-fatal to load */
> >         if (ice_init_pf_dcb(pf, false)) {
> > @@ -4361,6 +4409,8 @@ static void ice_remove(struct pci_dev *pdev)
> > 
> >         ice_aq_cancel_waiting_tasks(pf);
> > 
> > +       if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
> > +               ice_deinit_acl(pf);
> 
> Looking over the code is there any reason why you need to bother with
> checking the flag? It seems like if ACL is not enabled ice_deinit_acl
> won't do anything. So why bother checking the flag? Also is it really
> okay to just ignore if deallocating the table fails? What are the
> side
> effects?
> 

I will remove the flag check. If the table deallocation fails, there
isn't much we can do to resolve it. I'll use the return value from
ice_acl_destroy_tbl() and add a warning to inform the user of an issue.

> >         mutex_destroy(&(&pf->hw)->fdir_fltr_lock);
> >         if (!ice_is_safe_mode(pf))
> >                 ice_remove_arfs(pf);
> > 

Thanks,
Tony

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox