From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yonghong Song Subject: Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map Date: Thu, 4 Jan 2018 13:32:10 -0800 Message-ID: <78089cd2-9f12-267a-4b5c-d2e39e47932f@fb.com> References: <20180104072746.1569033-1-yhs@fb.com> <20180104072746.1569033-2-yhs@fb.com> <20180104130822.377721d8@cakuba.netronome.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , To: Jakub Kicinski Return-path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:38598 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752875AbeADVcc (ORCPT ); Thu, 4 Jan 2018 16:32:32 -0500 In-Reply-To: <20180104130822.377721d8@cakuba.netronome.com> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 1/4/18 1:08 PM, Jakub Kicinski wrote: > On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote: >> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not >> supported for stacktrace map. However, there are use cases where >> user space wants to enumerate all stacktrace map entries where >> BPF_MAP_GET_NEXT_KEY command will be really helpful. >> In addition, if user space wants to delete all map entries >> in order to save memory and does not want to close the >> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve >> performance if map entries are sparsely populated. >> >> The implementation follows the API specification of existing >> BPF_MAP_GET_NEXT_KEY implementation. If user provides >> an NULL key pointer, the first key is returned. Otherwise, >> the first valid key after the input parameter "key" >> is returned, or -ENOENT if no valid key can be found. >> >> Signed-off-by: Yonghong Song >> --- >> kernel/bpf/stackmap.c | 23 +++++++++++++++++++++-- >> 1 file changed, 21 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index a15bc63..207b21c 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) >> return 0; >> } >> >> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key) >> +static int stack_map_get_next_key(struct bpf_map *map, void *key, >> + void *next_key) >> { >> - return -EINVAL; >> + struct bpf_stack_map *smap = container_of(map, >> + struct bpf_stack_map, map); >> + u32 id; >> + >> + WARN_ON_ONCE(!rcu_read_lock_held()); >> + >> + if (!key) >> + id = 0; >> + else >> + id = *(u32 *)key + 1; >> + >> + while (id < smap->n_buckets && !smap->buckets[id]) >> + id++; >> + >> + if (id >= smap->n_buckets) >> + return -ENOENT; > > AFAIU for hash maps the semantics of get next are as follows: > > get_next(map, key) { > if (!key) > return get_first(map); > > elem = lookup(map, key); > if (!elem) // <-- note this branch > return get_first(map); > if (elem->next) > return elem->next->key; > return -ENOENT; > } > > For arrays elements always exist, hence the elem->next check is > omitted. Here you are, however, testing !smap->buckets[id] so I assume > elements may not exist. Right, buckets[id] could be NULL. > > Is there any precedent for get_next on non-existent key returning > element other than first? The stacktrace map is a bit special, and Sorry, I miss this. You are right. hashtable get_next_key will return the first for non-existing key. And all other implemented get_next_key is a variant of arrays where all keys already exist. > returning id + 1 would defeat what you're trying to do here.. Is there > value in keeping the behaviour consistent across map types? Let us keep the behavior consistent with hashtable then. > > Anyway, you said in the commit message that "The implementation follows > the API specification of existing BPF_MAP_GET_NEXT_KEY implementation." > and I find that arguable :) You are right. Will send v2 soon with re-wording of commit message as well. >> + *(u32 *)next_key = id; >> + return 0; >> } >> >> static int stack_map_update_elem(struct bpf_map *map, void *key, void *value, >