netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
@ 2018-01-04  7:27 Yonghong Song
  2018-01-04  7:27 ` [PATCH bpf-next 1/2] " Yonghong Song
  2018-01-04  7:27 ` [PATCH bpf-next 2/2] tools/bpf: add a bpf selftest for stacktrace Yonghong Song
  0 siblings, 2 replies; 5+ messages in thread
From: Yonghong Song @ 2018-01-04  7:27 UTC
  To: ast, daniel, netdev; +Cc: kernel-team

The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY
for stacktrace map. Patch #1 is the core implementation
and Patch #2 implements a bpf test at tools/testing/selftests/bpf
directory. Please see individual patch comments for details.

Yonghong Song (2):
  bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
  tools/bpf: add a bpf selftest for stacktrace

 kernel/bpf/stackmap.c                             |  23 +++-
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/test_progs.c          | 127 ++++++++++++++++++++++
 tools/testing/selftests/bpf/test_stacktrace_map.c |  62 +++++++++++
 4 files changed, 211 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_stacktrace_map.c

-- 
2.9.5


* [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
  2018-01-04  7:27 [PATCH bpf-next 0/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map Yonghong Song
@ 2018-01-04  7:27 ` Yonghong Song
  2018-01-04 21:08   ` Jakub Kicinski
  2018-01-04  7:27 ` [PATCH bpf-next 2/2] tools/bpf: add a bpf selftest for stacktrace Yonghong Song
  1 sibling, 1 reply; 5+ messages in thread
From: Yonghong Song @ 2018-01-04  7:27 UTC
  To: ast, daniel, netdev; +Cc: kernel-team

Currently, the bpf syscall command BPF_MAP_GET_NEXT_KEY is not
supported for the stacktrace map. However, there are use cases where
user space wants to enumerate all stacktrace map entries, and the
BPF_MAP_GET_NEXT_KEY command would be really helpful there.
In addition, if user space wants to delete all map entries
to save memory without closing the map file descriptor,
BPF_MAP_GET_NEXT_KEY may help improve performance when the
map entries are sparsely populated.

The implementation follows the API specification of the existing
BPF_MAP_GET_NEXT_KEY implementations. If the user provides
a NULL key pointer, the first key is returned. Otherwise,
the first valid key after the input parameter "key"
is returned, or -ENOENT if no valid key can be found.
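
As an illustration (not part of this patch), a minimal user-space
sketch of the enumerate-and-delete use case. delete_all_stack_ids()
is a hypothetical helper name; it is built on the libbpf wrappers
bpf_map_get_next_key() and bpf_map_delete_elem():

#include <bpf/bpf.h>

static void delete_all_stack_ids(int map_fd)
{
	__u32 key, next_key;
	int err;

	/* a NULL input key asks the kernel for the first key */
	err = bpf_map_get_next_key(map_fd, NULL, &next_key);
	while (!err) {
		key = next_key;
		/* fetch the successor before deleting the current key */
		err = bpf_map_get_next_key(map_fd, &key, &next_key);
		bpf_map_delete_elem(map_fd, &key);
	}
	/* the loop ends with errno == ENOENT once the map is exhausted */
}

Fetching the successor before deleting the current key keeps the
iteration well-defined regardless of how get_next_key treats a
now-deleted input key.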

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a15bc63..207b21c 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return 0;
 }
 
-static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+static int stack_map_get_next_key(struct bpf_map *map, void *key,
+				  void *next_key)
 {
-	return -EINVAL;
+	struct bpf_stack_map *smap = container_of(map,
+						  struct bpf_stack_map, map);
+	u32 id;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (!key)
+		id = 0;
+	else
+		id = *(u32 *)key + 1;
+
+	while (id < smap->n_buckets && !smap->buckets[id])
+		id++;
+
+	if (id >= smap->n_buckets)
+		return -ENOENT;
+
+	*(u32 *)next_key = id;
+	return 0;
 }
 
 static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
-- 
2.9.5


* [PATCH bpf-next 2/2] tools/bpf: add a bpf selftest for stacktrace
  2018-01-04  7:27 [PATCH bpf-next 0/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map Yonghong Song
  2018-01-04  7:27 ` [PATCH bpf-next 1/2] " Yonghong Song
@ 2018-01-04  7:27 ` Yonghong Song
  1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2018-01-04  7:27 UTC
  To: ast, daniel, netdev; +Cc: kernel-team

Add a bpf selftest for stacktrace in test_progs under the tools
directory. The test populates a hashtable map and a stacktrace map
at the same time with the same key, the stackid.
User space then compares the two maps, using the BPF_MAP_LOOKUP_ELEM
and BPF_MAP_GET_NEXT_KEY commands, to ensure that both have the
same set of keys.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/test_progs.c          | 127 ++++++++++++++++++++++
 tools/testing/selftests/bpf/test_stacktrace_map.c |  62 +++++++++++
 3 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_stacktrace_map.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1304753..a8aa7e2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -19,7 +19,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
 	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
-	test_l4lb_noinline.o test_xdp_noinline.o
+	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
 	test_offload.py
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 09087ab..b549308 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -837,6 +837,132 @@ static void test_tp_attach_query(void)
 	free(query);
 }
 
+static int compare_map_keys(int map1_fd, int map2_fd)
+{
+	__u32 key, next_key;
+	char val_buf[PERF_MAX_STACK_DEPTH * sizeof(__u64)];
+	int err;
+
+	err = bpf_map_get_next_key(map1_fd, NULL, &key);
+	if (err)
+		return err;
+	err = bpf_map_lookup_elem(map2_fd, &key, val_buf);
+	if (err)
+		return err;
+
+	while (bpf_map_get_next_key(map1_fd, &key, &next_key) == 0) {
+		err = bpf_map_lookup_elem(map2_fd, &next_key, val_buf);
+		if (err)
+			return err;
+
+		key = next_key;
+	}
+	if (errno != ENOENT)
+		return -1;
+
+	return 0;
+}
+
+static void test_stacktrace_map(void)
+{
+	int control_map_fd, stackid_hmap_fd, stackmap_fd;
+	const char *file = "./test_stacktrace_map.o";
+	int bytes, efd, err, pmu_fd, prog_fd;
+	struct perf_event_attr attr = {};
+	__u32 key, val, duration = 0;
+	struct bpf_object *obj;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
+		goto out;
+
+	/* Get the ID for the sched/sched_switch tracepoint */
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/sched/sched_switch/id");
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf),
+		  "read", "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	/* Open the perf event and attach the bpf program */
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+			 0 /* cpu 0 */, -1 /* group id */,
+			 0 /* flags */);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n",
+		  pmu_fd, errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
+		  err, errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
+		  err, errno))
+		goto disable_pmu;
+
+	/* find map fds */
+	control_map_fd = bpf_find_map(__func__, obj, "control_map");
+	if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+	if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+	if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
+		  err, errno))
+		goto disable_pmu;
+
+	/* give the bpf program some time to run */
+	sleep(1);
+
+	/* disable stack trace collection */
+	key = 0;
+	val = 1;
+	bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+	/* for every element in stackid_hmap, we can find a corresponding one
+	 * in stackmap, and vice versa.
+	 */
+	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		; /* fall through */
+
+disable_pmu:
+	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
+
+close_pmu:
+	close(pmu_fd);
+
+close_prog:
+	bpf_object__close(obj);
+
+out:
+	return;
+}
+
 int main(void)
 {
 	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
@@ -852,6 +978,7 @@ int main(void)
 	test_pkt_md_access();
 	test_obj_name();
 	test_tp_attach_query();
+	test_stacktrace_map();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
diff --git a/tools/testing/selftests/bpf/test_stacktrace_map.c b/tools/testing/selftests/bpf/test_stacktrace_map.c
new file mode 100644
index 0000000..76d85c5d
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_stacktrace_map.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+#ifndef PERF_MAX_STACK_DEPTH
+#define PERF_MAX_STACK_DEPTH         127
+#endif
+
+struct bpf_map_def SEC("maps") control_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") stackid_hmap = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 10000,
+};
+
+struct bpf_map_def SEC("maps") stackmap = {
+	.type = BPF_MAP_TYPE_STACK_TRACE,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
+	.max_entries = 10000,
+};
+
+/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
+struct sched_switch_args {
+	unsigned long long pad;
+	char prev_comm[16];
+	int prev_pid;
+	int prev_prio;
+	long long prev_state;
+	char next_comm[16];
+	int next_pid;
+	int next_prio;
+};
+
+SEC("tracepoint/sched/sched_switch")
+int oncpu(struct sched_switch_args *ctx)
+{
+	__u32 key = 0, val = 0, *value_p;
+
+	value_p = bpf_map_lookup_elem(&control_map, &key);
+	if (value_p && *value_p)
+		return 0; /* skip if non-zero *value_p */
+
+	/* The size of stackmap and stackid_hmap should be the same */
+	key = bpf_get_stackid(ctx, &stackmap, 0);
+	if ((int)key >= 0)
+		bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
-- 
2.9.5


* Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
  2018-01-04  7:27 ` [PATCH bpf-next 1/2] " Yonghong Song
@ 2018-01-04 21:08   ` Jakub Kicinski
  2018-01-04 21:32     ` Yonghong Song
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2018-01-04 21:08 UTC
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team

On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
> Currently, the bpf syscall command BPF_MAP_GET_NEXT_KEY is not
> supported for the stacktrace map. However, there are use cases where
> user space wants to enumerate all stacktrace map entries, and the
> BPF_MAP_GET_NEXT_KEY command would be really helpful there.
> In addition, if user space wants to delete all map entries
> to save memory without closing the map file descriptor,
> BPF_MAP_GET_NEXT_KEY may help improve performance when the
> map entries are sparsely populated.
> 
> The implementation follows the API specification of the existing
> BPF_MAP_GET_NEXT_KEY implementations. If the user provides
> a NULL key pointer, the first key is returned. Otherwise,
> the first valid key after the input parameter "key"
> is returned, or -ENOENT if no valid key can be found.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index a15bc63..207b21c 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>  	return 0;
>  }
>  
> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
> +				  void *next_key)
>  {
> -	return -EINVAL;
> +	struct bpf_stack_map *smap = container_of(map,
> +						  struct bpf_stack_map, map);
> +	u32 id;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	if (!key)
> +		id = 0;
> +	else
> +		id = *(u32 *)key + 1;
> +
> +	while (id < smap->n_buckets && !smap->buckets[id])
> +		id++;
> +
> +	if (id >= smap->n_buckets)
> +		return -ENOENT;

AFAIU for hash maps the semantics of get next are as follows:

get_next(map, key) {
	if (!key)
		return get_first(map);

	elem = lookup(map, key);
	if (!elem)                       // <-- note this branch
		return get_first(map);
	if (elem->next)
		return elem->next->key;
	return -ENOENT;
}

For arrays, elements always exist, hence the elem->next check is
omitted.  Here you are, however, testing !smap->buckets[id], so I
assume elements may not exist.

Is there any precedent for get_next on a non-existent key returning
an element other than the first?  The stacktrace map is a bit special,
and returning id + 1 would defeat what you're trying to do here...  Is
there value in keeping the behaviour consistent across map types?

Anyway, you said in the commit message that "The implementation follows
the API specification of the existing BPF_MAP_GET_NEXT_KEY
implementations," and I find that arguable :)

> +	*(u32 *)next_key = id;
> +	return 0;
>  }
>  
>  static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,


* Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
  2018-01-04 21:08   ` Jakub Kicinski
@ 2018-01-04 21:32     ` Yonghong Song
  0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2018-01-04 21:32 UTC
  To: Jakub Kicinski; +Cc: ast, daniel, netdev, kernel-team



On 1/4/18 1:08 PM, Jakub Kicinski wrote:
> On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
>> Currently, the bpf syscall command BPF_MAP_GET_NEXT_KEY is not
>> supported for the stacktrace map. However, there are use cases where
>> user space wants to enumerate all stacktrace map entries, and the
>> BPF_MAP_GET_NEXT_KEY command would be really helpful there.
>> In addition, if user space wants to delete all map entries
>> to save memory without closing the map file descriptor,
>> BPF_MAP_GET_NEXT_KEY may help improve performance when the
>> map entries are sparsely populated.
>>
>> The implementation follows the API specification of the existing
>> BPF_MAP_GET_NEXT_KEY implementations. If the user provides
>> a NULL key pointer, the first key is returned. Otherwise,
>> the first valid key after the input parameter "key"
>> is returned, or -ENOENT if no valid key can be found.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index a15bc63..207b21c 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>>   	return 0;
>>   }
>>   
>> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
>> +				  void *next_key)
>>   {
>> -	return -EINVAL;
>> +	struct bpf_stack_map *smap = container_of(map,
>> +						  struct bpf_stack_map, map);
>> +	u32 id;
>> +
>> +	WARN_ON_ONCE(!rcu_read_lock_held());
>> +
>> +	if (!key)
>> +		id = 0;
>> +	else
>> +		id = *(u32 *)key + 1;
>> +
>> +	while (id < smap->n_buckets && !smap->buckets[id])
>> +		id++;
>> +
>> +	if (id >= smap->n_buckets)
>> +		return -ENOENT;
> 
> AFAIU for hash maps the semantics of get next are as follows:
> 
> get_next(map, key) {
> 	if (!key)
> 		return get_first(map);
> 
> 	elem = lookup(map, key);
> 	if (!elem)                       // <-- note this branch
> 		return get_first(map);
> 	if (elem->next)
> 		return elem->next->key;
> 	return -ENOENT;
> }
> 
> For arrays, elements always exist, hence the elem->next check is
> omitted.  Here you are, however, testing !smap->buckets[id], so I
> assume elements may not exist.

Right, buckets[id] could be NULL.

> 
> Is there any precedent for get_next on a non-existent key returning
> an element other than the first?  The stacktrace map is a bit special, and

Sorry, I missed this. You are right. The hashtable get_next_key will
return the first key for a non-existing key, and all other implemented
get_next_key functions are variants of arrays where all keys already
exist.

> returning id + 1 would defeat what you're trying to do here...  Is there
> value in keeping the behaviour consistent across map types?

Let us keep the behavior consistent with the hashtable then.

> 
> Anyway, you said in the commit message that "The implementation follows
> the API specification of the existing BPF_MAP_GET_NEXT_KEY
> implementations," and I find that arguable :)

You are right. Will send v2 soon with a re-worded commit message as well.
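
For illustration only, a minimal sketch (my reading of the agreed-upon
direction, not the posted v2) of what hashtable-consistent semantics
could look like here -- fall back to the first valid id whenever the
input key is invalid or no longer present:

static int stack_map_get_next_key(struct bpf_map *map, void *key,
				  void *next_key)
{
	struct bpf_stack_map *smap = container_of(map,
						  struct bpf_stack_map, map);
	u32 id;

	WARN_ON_ONCE(!rcu_read_lock_held());

	if (!key) {
		id = 0;
	} else {
		id = *(u32 *)key;
		if (id >= smap->n_buckets || !smap->buckets[id])
			id = 0;	/* invalid/absent key: restart from first */
		else
			id++;	/* valid key: continue right after it */
	}

	while (id < smap->n_buckets && !smap->buckets[id])
		id++;

	if (id >= smap->n_buckets)
		return -ENOENT;

	*(u32 *)next_key = id;
	return 0;
}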

>> +	*(u32 *)next_key = id;
>> +	return 0;
>>   }
>>   
>>   static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
> 

